The AV Era: signature-based only, labor-intensive. Effective only against known threats.
The Machine Learning Era: feature-based only, a cyber security expert is needed. Limited by the expert's knowledge, high false-positive rate.
The Deep Learning Era: raw-data-based, fully autonomous. Highest detection rates, lowest false-positive rates.
The era of antivirus-based solutions: AV software isolates suspicious files based on existing file signatures, heuristic analysis and file reputation. This is only effective against known malicious threats and vulnerabilities.
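To make this limitation concrete, here is a minimal sketch of signature matching. It assumes a "signature" is simply the SHA-256 hash of a known-malicious file; real AV engines also use byte patterns, heuristics and reputation. The database `KNOWN_BAD_SHA256`, the sample `MALICIOUS_SAMPLE` and the function `is_known_malicious` are hypothetical. Changing a single byte of a sample produces a different hash, so a modified or never-seen file passes the lookup.

```python
import hashlib

# Hypothetical stand-in for a previously seen malware binary.
MALICIOUS_SAMPLE = b"MZ\x90\x00 illustrative bytes of a known malware sample ..."

# Hypothetical signature database: SHA-256 digests of known malware.
KNOWN_BAD_SHA256 = {hashlib.sha256(MALICIOUS_SAMPLE).hexdigest()}


def is_known_malicious(data: bytes) -> bool:
    """Return True only if the file's exact SHA-256 digest is in the database."""
    return hashlib.sha256(data).hexdigest() in KNOWN_BAD_SHA256


if __name__ == "__main__":
    # The exact known sample is flagged.
    print(is_known_malicious(MALICIOUS_SAMPLE))
    # Replacing the last byte yields a new digest, so this variant is missed.
    variant = MALICIOUS_SAMPLE[:-1] + b"?"
    print(is_known_malicious(variant))
```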
As AI technologies start to mature, we enter the era of Machine Learning: endpoint protection, detection and response made possible by machine-learning-based static analysis, heuristic-based behavioral analysis and memory protection. This is indeed a big step forward, but still not optimal: Machine Learning systems rely on feature extraction, are limited by the knowledge of the security expert, and have low detection rates for new malware and high false-positive rates.
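For comparison, a feature-based static analyzer reduces each file to a short, hand-chosen vector. The sketch below assumes four illustrative features (file size, Shannon byte entropy, whether the file starts with the PE/DOS `MZ` magic bytes, and the share of printable ASCII bytes); these are hypothetical stand-ins for whatever features a security expert would actually engineer. However large the file, the classifier only ever sees these few numbers.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Byte-level Shannon entropy in bits per byte (0.0 for empty input)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def extract_features(data: bytes) -> list[float]:
    """Hypothetical hand-engineered features for a static ML classifier."""
    printable = sum(1 for b in data if 32 <= b < 127)
    return [
        float(len(data)),                        # file size in bytes
        shannon_entropy(data),                   # high entropy may hint at packing
        1.0 if data[:2] == b"MZ" else 0.0,       # PE/DOS header magic present
        printable / len(data) if data else 0.0,  # share of printable ASCII bytes
    ]


if __name__ == "__main__":
    sample = b"MZ" + bytes(range(256)) * 4
    features = extract_features(sample)
    # The whole input is summarized by just four numbers.
    print(len(sample), "bytes ->", len(features), "features")
```

Whatever the expert did not encode in the vector (for example, the actual code in the file body) is invisible to the downstream model.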
Enter AI and Deep Learning: harnessing the power of Deep Learning results in a fully autonomous system that can learn from any raw data without being limited by an expert's technological knowledge, which represents a quantum leap in cyber security. This is autonomous, advanced endpoint protection that provides prevention, detection and response with the highest detection rates of unknown malware, the lowest false-positive rate, and zero time to protect any new device, endpoint or server, on any operating system, against any file-based or fileless attack, with low impact on performance.
Machine Learning: Feature engineering & extraction
Requires a human domain expert who defines and engineers the features to conduct …
Deep Learning: Looks at the raw data in a fully autonomous manner.
Machine Learning: Only 2.5-5% of available data is used
By converting the data into a small vector of features, e.g. statistical correlations, it inevitably throws away most of the data.
Deep Learning: Processes 100% of available raw data
Uses all of the data, with a massive number of characteristics (e.g. pixels, waveforms or bytes), in order to make a …
Machine Learning: Features selected by a human domain expert can only capture simple linear properties, which limits the algorithm and ignores any other correlations or patterns that the predefined features cannot express.
Deep Learning: Using raw data makes it possible to find non-linear correlations in the data that are too complex for humans to define, even for the top experts in the field.
Machine Learning: Limited file types are covered (only PE)
Today, only PE (Portable Executable) files are supported. Feature extraction is a completely different process for each file format, so the effort and expertise must start from scratch for every new format.
Deep Learning: Covers any new file type
Content-agnostic (and therefore file-type-agnostic), without requiring substantial modifications or adaptations.
Machine Learning: Can only detect known attacks
Because specific features must be defined in advance, unknown attack vectors cannot be detected.
Deep Learning: Detects unknown attacks
The algorithm determines by itself what is and is not relevant for predicting attacks.
Machine Learning: Suffers from human errors
A human domain expert needs to define the features; if the expert makes an error, a relevant feature can be missed.
Deep Learning: No human errors
Because the process is fully autonomous, no human errors are introduced in defining features.
Machine Learning: Limited visibility of the content
Focuses mostly on file headers and other statistical properties, overlooking the rest of the file content and therefore missing attacks.
Deep Learning: Looks at the whole content of the files
Because no engineered features are used, the algorithm looks at the whole content of the file.
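By contrast, a deep learning model can take the file's bytes themselves as input. The sketch below shows one common way to prepare such input, similar in spirit to published byte-level malware models such as MalConv: every byte becomes an integer token, and the sequence is truncated or padded to a fixed length. The length `MAX_LEN`, the padding token `PAD` and the function `bytes_to_tokens` are hypothetical choices, and the neural network that would consume these tokens is not shown. Nothing in this step depends on the file format, which is what makes the approach content-agnostic.

```python
MAX_LEN = 16  # hypothetical input length; real models use far longer windows
PAD = 256     # padding token chosen outside the 0-255 byte range


def bytes_to_tokens(data: bytes, max_len: int = MAX_LEN) -> list[int]:
    """Map every raw byte to an integer token, truncating or padding to max_len."""
    tokens = list(data[:max_len])  # each byte is already an int in 0..255
    tokens += [PAD] * (max_len - len(tokens))
    return tokens


if __name__ == "__main__":
    # The same function handles a PE header, a PDF header or a shell script.
    for sample in (b"MZ\x90\x00\x03", b"%PDF-1.7", b"#!/bin/sh\necho hi\n"):
        print(bytes_to_tokens(sample))
```

A model would typically look up a learned embedding for each token and let the network discover which byte patterns matter; that part depends on the deep learning framework and is omitted here.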