The AV Era: Based on signatures and heuristics. Highly labor-intensive and effective only against known threats.
The Machine Learning Era: Able to detect zero-day malware. However, detection is based only on human-selected features and is therefore limited.
The Deep Learning Era: Even higher detection rates are achieved, with the ability to skip human feature engineering and analyze all the available raw data in a file.
The era of Antivirus solutions: AV software isolates suspicious files based on existing file signatures, heuristic analysis and file reputation. This is effective only against known threats and vulnerabilities.
As AI technologies mature, we enter the era of Machine Learning: endpoint protection, detection and response are made possible by machine learning-based static analysis, heuristic behavioral analysis, and memory protection. This is a big step forward, but still not optimal. Machine learning systems rely on feature engineering, which is limited by the knowledge of the security expert who must handcraft the features used for detection. As a result, machine learning-based solutions still produce low detection rates for new malware and high false-positive rates.
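To make "feature engineering" concrete, here is a minimal sketch of the kind of hand-crafted features a classical machine learning malware classifier might compute from a file's bytes. The specific features (size, byte entropy, printable-character ratio) are illustrative assumptions for this example, not any vendor's actual feature set:

```python
import math
from collections import Counter

def handcrafted_features(data: bytes) -> dict:
    """Hand-engineered features of the kind a classical ML malware
    classifier might use (illustrative only, not a real product's set)."""
    counts = Counter(data)
    n = len(data)
    # Shannon entropy of the byte distribution; packed or
    # encrypted payloads tend to score close to the maximum of 8.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values()) if n else 0.0
    printable = sum(c for b, c in counts.items() if 32 <= b < 127)
    return {
        "size": n,
        "entropy": entropy,
        "printable_ratio": printable / n if n else 0.0,
    }

# The classifier sees only this small feature vector, not the file itself:
print(handcrafted_features(b"MZ\x90\x00" + bytes(64)))
```

Note that everything not captured by these few numbers is discarded before the classifier ever runs, which is exactly the limitation the next era addresses.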
Enter Deep Learning: the training and prediction stages become fully autonomous, so the algorithm can analyze all the raw data in a file and is not limited by an expert's capabilities. This represents a quantum leap in computer science. For cybersecurity it enables a more advanced level of protection, with higher detection rates for unknown malware, lower false-positive rates, and the ability to detect threats before execution, effectively in zero time.
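As a rough illustration of what "analyzing all the raw data" means, the sketch below applies a minimal 1-D convolution across every byte of an input. This is the core operation of byte-level deep models such as MalConv (named here only as a well-known public example, not as the model the text describes); every byte contributes to the feature maps instead of being summarized away up front:

```python
def conv1d(data: bytes, kernel, stride=1):
    """Minimal 1-D convolution over raw bytes, the basic building block
    of byte-level deep models (illustrative sketch, not a full network)."""
    x = list(data)
    k = len(kernel)
    # Slide the kernel over the whole byte sequence: every input byte
    # participates in at least one output value.
    return [sum(x[i + j] * kernel[j] for j in range(k))
            for i in range(0, len(x) - k + 1, stride)]

# A smoothing kernel applied over the first bytes of a (toy) file:
feature_map = conv1d(b"\x4d\x5a\x90\x00\x03\x00", [0.25, 0.5, 0.25])
print(feature_map)
```

A real model would stack many such learned kernels with non-linearities and pooling, but the point stands: the input is the file's raw bytes, not a hand-picked summary of them.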
Machine Learning: Feature engineering & extraction. Requires a human domain expert to define and engineer the features used for detection.
Deep Learning: Looks at all the raw data in a fully autonomous manner.
Machine Learning: A fraction of the available data is analyzed. By converting the data into a small vector of features, e.g. statistical correlations, most of the data is inevitably thrown away.
Deep Learning: Processes 100% of the available raw data, with a massive number of characteristics (e.g. pixels, waveforms or bytes), in order to make a prediction.
Machine Learning: Features selected by a human domain expert can lean only on simple linear properties, limiting the algorithm and discarding other correlations or patterns that cannot be captured by the pre-defined features.
Deep Learning: Using raw data provides the ability to find non-linear correlations in the data that are too complex for humans to define, even for the top experts in the field.
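A classic toy illustration of the linear-versus-non-linear point is the XOR pattern: no single linear threshold on two inputs can separate the classes, but a tiny two-layer network can. The weights below are hand-set for clarity (an assumption of this sketch; a real network would learn them):

```python
def linear_unit(x1, x2, w1, w2, b):
    """A single linear threshold unit: fires iff w1*x1 + w2*x2 + b > 0."""
    return 1 if w1 * x1 + w2 * x2 + b > 0 else 0

def two_layer_xor(x1, x2):
    """XOR via two stacked layers; no single linear_unit can compute it."""
    h_or = linear_unit(x1, x2, 1, 1, -0.5)    # hidden unit: OR
    h_and = linear_unit(x1, x2, 1, 1, -1.5)   # hidden unit: AND
    # output: OR and not-AND, i.e. exactly one input is set
    return linear_unit(h_or, h_and, 1, -2, -0.5)

print([two_layer_xor(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])
# -> [0, 1, 1, 0]
```

Stacking layers is what lets deep models capture interactions between raw inputs that no pre-defined linear feature would expose.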
Machine Learning: Limited file types are covered (only PE). Today, only PE files are supported; feature extraction is a completely different process for each file format, so the effort and knowledge must start from scratch for every new type.
Deep Learning: Supports any new file type. The approach is agnostic to content (and therefore to file type), without requiring substantial modifications or adaptations.
Machine Learning: Can only detect known attacks. Because specific features must be defined in advance, unknown attack vectors cannot be detected.
Deep Learning: Detects unknown attacks. The algorithm determines by itself what is relevant for predicting attacks.
Machine Learning: Suffers from human errors. A human domain expert must define the features; a human error means a feature can be missed.
Deep Learning: No human errors. The process is fully autonomous, so there are no human errors in feature definition.
Machine Learning: Limited visibility into the content. Focuses mostly on file headers and other statistical attributes, overlooking other data within the content and therefore missing attacks.
Deep Learning: Looks at the whole content of the file. Since no engineered features are used, the algorithm examines the entire content of the file.