Deep learning is the most advanced subset of artificial intelligence. Built on “deep neural networks,” it takes inspiration from how the human brain works: the more data that is fed into the machine, the better it becomes at intuitively understanding the meaning of new data. It therefore does not require a human expert to help it understand the significance of new features.
Data scientists prepare data samples that are used for training the ‘brain’, or deep neural network. Those samples contain millions of labeled files, both malicious and benign, including malware “mutations”, scripts, macros, etc.
The “self-learning loop” is a process during which the deep neural network is exposed to all the available “raw data” in a file, from which it learns to instinctively identify malicious code. The incorporation of GPUs (graphics processing units) has dramatically shortened this training phase: training takes 24-48 hours instead of months.
As training progresses, the brain begins to instinctively detect and identify malware by scanning the “DNA” of a file, that is, all of its raw features.
The deep neural network has now reached the prediction stage, where it can quickly and efficiently predict whether or not a file is a threat. The input-agnostic algorithm can apply this knowledge to any sort of file.
This phase compresses the brain, with all its abilities, into a lightweight yet highly powerful agent, turning terabytes of insights into megabytes of instincts.
Because the algorithm is input agnostic, the satellite agent can be deployed in any organizational domain, whether endpoints, servers, or mobile devices.
Once deployed, the agent checks every file, script, macro, etc. before execution. The process is very fast (less than a millisecond), causing no discernible impact on user experience or system performance.
The agent is constantly working to detect and prevent any type of malware, whether it is already known or a zero-day or APT. This means benign files can run while malicious files are prevented from executing.
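The pre-execution check described above can be pictured as a simple gate in front of every file. The following is only a minimal sketch, not Deep Instinct's implementation: the names `pre_execution_verdict` and `toy_score` and the 0.5 threshold are hypothetical, and the toy scoring function merely stands in for the real prediction model.

```python
from typing import Callable

# Hypothetical threshold; the source does not say how scores map to verdicts.
MALICIOUS_THRESHOLD = 0.5


def pre_execution_verdict(data: bytes,
                          score_fn: Callable[[bytes], float],
                          threshold: float = MALICIOUS_THRESHOLD) -> str:
    """Return "prevent" if the model score reaches the threshold, else "allow"."""
    score = score_fn(data)
    if not 0.0 <= score <= 1.0:
        raise ValueError("score_fn must return a probability in [0, 1]")
    return "prevent" if score >= threshold else "allow"


def toy_score(data: bytes) -> float:
    """Stand-in for the prediction model: flags a made-up marker string."""
    return 0.99 if b"EVIL" in data else 0.01
```

Here the gate only decides; in a real agent the "prevent" verdict would have to be enforced by blocking execution at the operating-system level, which is outside the scope of this sketch.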
How does Deep Instinct use deep learning to protect against zero-day threats?
Using deep learning, we are able to detect and prevent unknown, never-before-seen threats such as zero-days and APTs.
Deep Instinct’s solution is based on a two-phase approach, similar to the way the brain first learns and then acts instinctively.
• Training phase: The training process, which takes place at Deep Instinct’s headquarters, is performed on hundreds of millions of malicious and legitimate files. The output of this process is the prediction model.
• Prediction phase: Once a device has the deep learning prediction model (D-Brain), it becomes an autonomous analysis entity that can predict malicious intent in real time and prevent it from executing. There is no need for any supplementary analysis on a remote server or sandboxing appliance.
The entire analysis, including the determination of whether a file is malicious or benign, is done on the device within milliseconds, effectively enabling zero-time detection.
How does Deep Instinct build its dataset for training the neural network algorithm?
Training is performed on hundreds of millions of files, half of them malicious and half benign.
The malicious part of the dataset is sourced from different families, representing different attack scenarios and malicious behaviors.
Files are gathered from the following sources:
• Premium repositories: Third-party threat intelligence malware feeds, premium services, malware exchange collaborations.
• Public repositories: Open-source repositories, trackers, etc.
• Darknet: Specific threats collected and purchased manually, including from known leads such as exploit kits and from specific forums.
• Deep Instinct Research Lab: New threats developed by creating new malware mutations, using proprietary internal tools built by Deep Instinct and third-party tools available in the cybersecurity industry.
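The half-malicious, half-benign split described above can be illustrated with a short sketch. This is an assumption about one simple way to balance a labeled dataset (downsampling the larger class), not a description of Deep Instinct's pipeline; the function name `build_balanced_dataset` and the 1/0 label encoding are hypothetical.

```python
import random
from typing import List, Tuple


def build_balanced_dataset(malicious: List[bytes],
                           benign: List[bytes],
                           seed: int = 0) -> List[Tuple[bytes, int]]:
    """Return shuffled (file_bytes, label) pairs with equal class counts.

    Label 1 marks a malicious file and label 0 a benign one.
    """
    rng = random.Random(seed)
    # Downsample the larger class so both classes contribute n samples.
    n = min(len(malicious), len(benign))
    samples = [(data, 1) for data in rng.sample(malicious, n)]
    samples += [(data, 0) for data in rng.sample(benign, n)]
    rng.shuffle(samples)
    return samples
```

Fixing the seed keeps the sampled subset reproducible between training runs.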
How frequently does Deep Instinct publish a new prediction model?
Approximately twice a year.
When Deep Instinct produces a new deep learning prediction model, the D-Appliance receives the update and distributes the brain to all the D-Clients. This is different from AV solutions, which require several updates per day, and EDR solutions, which require continuous connectivity in order to receive threat intelligence feeds. With Deep Instinct’s solution, an update is provided once every few months, as this is all that is needed to maintain its high prevention rates. According to our tests, if you don’t update the prediction model for 6 months, the detection rate deteriorates by less than 1%.
How does Deep Instinct conduct feature extraction and engineering? What kind of features are analyzed?
Unlike machine learning-based solutions, our deep learning-based solution does not involve any feature extraction at all. Just as image recognition works on the raw data of images (pixels), we use the raw data from files.
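To make the pixel analogy concrete, the sketch below turns a file's raw bytes into a fixed-length numeric vector without computing any hand-crafted features. It is illustrative only: the source does not describe the actual input encoding, so the function name `bytes_to_input`, the truncate-or-pad strategy, and the default length of 16 are all assumptions.

```python
from typing import List


def bytes_to_input(data: bytes, length: int = 16) -> List[float]:
    """Truncate or zero-pad raw bytes to `length`, then scale each to [0, 1].

    No features are engineered: each value is simply one byte of the file,
    much as each input to an image model is one pixel intensity.
    """
    clipped = data[:length]
    padded = clipped + b"\x00" * (length - len(clipped))
    return [byte / 255.0 for byte in padded]
```

A real model would use a far longer input window; the small default here only keeps the example easy to inspect.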
Is there any training of the deep neural network on the customer’s system?
No, Deep Instinct provides the customer with a solution that has already been trained and provides immediate protection. All training is performed at Deep Instinct’s lab.
How long is the training phase of the deep neural network and where does it take place?
With the use of high-performance servers with GPUs, the training phase typically takes about 24-48 hours. Training occurs at Deep Instinct’s Research Lab, and the D-Client on the device contains the prediction model, which is the output of the training phase.
Can Deep Instinct classify malware into categories?
Yes, the deep learning model autonomously classifies identified malware into one of seven categories using the Deep Classification module: Ransomware, Worm, Virus, Dropper, Spyware, Backdoor, and PUA.
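As a rough illustration of a seven-way classification, the sketch below converts a list of raw model scores into a category and a confidence value using a softmax followed by an argmax. The source does not describe how the Deep Classification module works internally, so this is a generic technique shown under that assumption; `classify` and the score format are hypothetical.

```python
import math
from typing import List, Tuple

# The seven categories named in the text.
CATEGORIES = ["Ransomware", "Worm", "Virus", "Dropper",
              "Spyware", "Backdoor", "PUA"]


def classify(scores: List[float]) -> Tuple[str, float]:
    """Map one raw score per category to (best category, softmax probability)."""
    if len(scores) != len(CATEGORIES):
        raise ValueError("expected exactly one score per category")
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return CATEGORIES[best], probs[best]
```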
Does the deep learning process use static or dynamic analysis?
Currently, the deep learning process is applied to static analysis at the endpoint.
Deep Instinct is currently developing the option to also perform a dynamic analysis.
In addition, every malicious file detected or prevented is uploaded to the D-Appliance (optional, as defined in the policy) in order to run additional static and dynamic analyses to provide additional forensic information.