What can deep learning do for you? A better question is, what can’t it do? Compared with the various earlier incarnations of artificial intelligence and machine learning, deep learning really knocks it out of the park.
This article explores why deep learning works so much better in the real world than other methods of machine learning. Then it takes a sector-by-sector journey through the many ways deep learning has had an amazing impact on the world. It details the deep learning advantages in computer vision, and explores how deep learning has advanced the ability of computers to analyze and understand text. It documents the advances deep learning has brought to speech recognition as well as synthesis. It spells out how deep learning is advancing the popular world of computer gaming. And, of vital importance, it outlines why deep learning may be the ultimate answer to the ever-growing threats to cybersecurity.
To apply traditional machine learning to any problem, you first must perform a lot of pre-processing. In particular, you have to determine in advance which are the important properties or features in the problem domain. As explained in more detail in Chapter 2, this process requires manual feature specification, and you end up disregarding most of the raw data.
That chapter’s example of a dog detector, shown here in Figure 5-1, illustrates how this works.
But any dog lover will tell you a dog is a whole lot more than a bunch of numbers. Even with the best feature specifications, it simply isn’t possible to grasp the complex patterns in the data.
Deep learning, on the other hand, doesn’t rely on feature extraction. It’s the first family of methods within machine learning that doesn’t need it, and at the moment it’s still the only one.
Instead of human experts explicitly specifying the features beforehand, deep neural networks use their deep hierarchy of layers to learn the complex features by themselves. The idea is illustrated in Figure 5-2. This is very similar to how the human brain learns new concepts by being exposed to new data.
This ability to learn directly from raw data has brought about great improvements in most benchmarks of computer vision, speech recognition, language understanding, and other domains. Before deep learning, improvements were gradual, spread over the course of many years. Deep learning, by contrast, has been creating benchmark improvements of 20 to 30 percent a year.
With deep learning, many tasks previously viewed as impossible are now achievable. Add it all together and you can view deep learning’s contribution as the greatest leap ever in the history of artificial intelligence.
Here’s how it was summed up by Geoffrey Hinton, considered to be the father of deep learning. Honoring a career dedicated to neural network research, he was presented the IEEE/RSE James Clerk Maxwell Medal in 2016, and this is what he said in his acceptance speech:
Fifty years ago, the fathers of artificial intelligence convinced everybody that logic was the key to intelligence. Somehow we had to get computers to do logical reasoning. The alternative approach, which they thought was crazy, was to forget logic and try and understand how networks of brain cells learn things. Curiously, two people who rejected the logic-based approach to AI were Turing and Von Neumann. If either of them had lived I think things would have turned out differently . . . Now neural networks are everywhere and the crazy approach is winning.
Just what kind of impact has deep learning had in the real world? Read on for examples of how it has revolutionized nearly every field to which it has been applied.
Some of the most dramatic improvements brought about by deep learning have been in the field of computer vision. For decades, computer vision relied heavily on image processing methods, which means a whole lot of manual tuning and specialization. Deep learning, on the other hand, ignores nearly all traditional image processing, and it has resulted in dramatic improvements to every computer vision task.
ImageNet is a great example. It’s one of the largest publicly available datasets of labeled images, with more than 10 million images. Since 2010, there’s been an annual ImageNet Large Scale Visual Recognition Challenge, which uses a subset of those images sorted into a thousand different classes to measure the classification accuracy of different computer vision models. Accuracy is measured on a test set of images that have not previously been used for training the models.
These are real-world images, many of which show more than a single object. Each model is allowed a total of five guesses from that list of a thousand different categories, and if one of them is correct, the image counts as classified correctly. The final results are measured in terms of classification error rate, which is the percentage of images classified incorrectly.
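The five-guess scoring rule described above is usually called the top-5 error rate. Here is a minimal sketch of how it could be computed. The function name, the toy guesses, and the labels are all made up for illustration; they are not taken from the challenge itself.

```python
from typing import Sequence


def top5_error_rate(
    ranked_guesses: Sequence[Sequence[str]],
    true_labels: Sequence[str],
    k: int = 5,
) -> float:
    """Return the fraction of images whose true label is missing from the top-k guesses."""
    if len(ranked_guesses) != len(true_labels):
        raise ValueError("need exactly one list of guesses per image")
    if not true_labels:
        raise ValueError("need at least one image")
    misses = sum(
        1
        for guesses, truth in zip(ranked_guesses, true_labels)
        if truth not in guesses[:k]
    )
    return misses / len(true_labels)


# Hypothetical predictions for four test images, best guess first.
guesses = [
    ["spaniel", "beagle", "terrier", "cat", "fox"],
    ["car", "truck", "bus", "van", "tram"],
    ["dolphin", "shark", "whale", "seal", "otter"],
    ["tulip", "rose", "daisy", "poppy", "lily"],
]
labels = ["beagle", "bicycle", "dolphin", "lily"]

error = top5_error_rate(guesses, labels)
print(f"top-5 error rate: {error:.0%}")  # prints "top-5 error rate: 25%"
```

Only the second image is a miss, because "bicycle" appears nowhere in its five guesses. The first image still counts as correct even though the right answer was the second guess.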
The results are illustrated in Figure 5-3. In 2011, the best computer vision models relying on traditional machine learning and image processing obtained a 25 percent error rate. In 2012, when a deep neural network joined the competition, the error rate dropped to 16 percent, and since then deep learning has cut the error rate to 4 percent or less.
Wow, that’s almost as good as what a person could do, right? Actually, it’s even better. As a comparison, humans typically achieve an error rate of about 5 percent in this challenge. The bottom line is that deep learning has cut the error rate by 20-plus percentage points, and has now even surpassed human accuracy!
So, what kinds of things can computer vision recognize with the help of deep learning? Today, all state-of-the-art object recognition modules rely solely on deep learning. Google Photos is a prime example. It automatically uses deep learning to classify images and group them together. Because of deep learning, you can search your Google Photos albums for “Cavalier King Charles Spaniel,” and it provides all the relevant results, even if you have not done any manual labeling.
Find that hard to believe? Just check out Figure 5-4. As you can see, in most of the images the dog is not clearly visible, but Google Photos saw it. Traditional non-deep learning modules would have great difficulty detecting that there is a dog in the image, let alone accurately classifying its breed.
Although different categories of objects are visually very different from one another — cars, for example, really don’t look like dolphins — faces are much more similar to each other, with differences that often are very subtle. For decades, face recognition software relied on years of image processing methods that improved only gradually and incrementally. Today, deep learning has resulted in a huge improvement in the accuracy of face recognition, without relying on traditional image processing features.
End-to-end deep learning can be applied to practically any computer vision task involving classification. For example, artist classification is an interesting problem: can deep learning look at a painting and identify who painted it? Traditional image processing has worked its way up to 78 percent accuracy on a test set of three painters: Renoir, Rembrandt, and van Gogh. In 2016, deep learning succeeded in improving the accuracy to 96 percent, without relying on any handcrafted image processing features.
Deep learning’s huge accuracy improvement in computer vision has resulted in numerous real-world breakthroughs. These days deep learning is performing on a par with human radiologists in detecting many forms of cancer, and it’s widely used in medical image analysis. A company known as Zebra Medical, for example, is one of the leading organizations using deep learning for medical image analysis.
And then there’s deep learning behind the wheel. All of today’s state-of-the-art autonomous driving modules rely on deep learning, and their accuracy and safety measures will soon exceed those of human drivers.
In all these example areas, traditional machine learning was given a try before deep learning took its turn, and the application of deep learning resulted in a huge improvement. Beyond that, deep learning has been tackling issues that were previously considered completely intractable.
Imagine that you take a nice picture, and want to turn it into something resembling a painting. Your favorite painting is van Gogh’s The Starry Night, or perhaps Edvard Munch’s The Scream. It would be great to turn your photo into a painting in the specific style of those classics.
In 2015, researcher Leon Gatys and colleagues used deep learning for what they called “artistic style transfer.” They described how deep learning can be used to learn the artistic style of a painting, and then use that knowledge to transform another existing picture into a painting. Figure 5-5 shows an experiment using the same technique.
The top-left image is the original photo. Each of the other images is a transformation of the original photo, turned into a painting based on a particular style.
Convolutional neural networks are the most common choice for nearly all computer vision tasks. That’s because image data has strong local correlations: pixels that sit close together tend to be related.
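To see why local structure matters, here is a minimal sketch of the sliding-window operation at the heart of a convolutional layer, written in plain Python. It is an illustration under simplifying assumptions: the kernel is picked by hand, whereas a real network learns its kernel values during training, and the operation shown is the unflipped form (cross-correlation) that deep learning libraries conventionally call convolution.

```python
from typing import List

Matrix = List[List[float]]


def convolve2d(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for row in range(out_h):
        out_row = []
        for col in range(out_w):
            total = 0.0
            for i in range(kh):
                for j in range(kw):
                    total += image[row + i][col + j] * kernel[i][j]
            out_row.append(total)
        output.append(out_row)
    return output


# A toy 4x5 image: dark (0) on the left, bright (1) on the right.
image = [
    [0.0, 0.0, 1.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0, 1.0],
]

# Hand-picked 1x2 kernel that responds to a dark-to-bright step between neighbors.
kernel = [[-1.0, 1.0]]

edges = convolve2d(image, kernel)
for row in edges:
    print(row)  # each row prints [0.0, 1.0, 0.0, 0.0]
```

The output is nonzero only where two neighboring pixels differ, which is exactly the kind of local pattern that the same small kernel can detect anywhere in the image.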
During the past few years, deep learning has been successfully applied to numerous problems in text analysis and understanding. These include document classification, sentiment analysis, and automatic translation, usually with dramatic improvements. Recurrent neural networks are especially useful here, because of the sequential nature of textual data.
One of the most important contributions in this area has been deep learning’s ability to train a language model from raw text data. Imagine that you have a large amount of text in a certain language, say a dataset a billion characters long. You can train a neural net that receives a character and tries to predict what the next character is going to be. At first it simply guesses random characters, but it gradually learns the vocabulary of the language. Then, to improve its prediction accuracy, it learns grammar, context, and other important traits. The higher its accuracy at this “next character prediction” becomes, the better it understands the language. In other words, it is developing a better language model.
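The “next character prediction” task can be made concrete with a deliberately simple stand-in. The sketch below is not a neural network: it only counts which character most often follows each character in a tiny made-up corpus. A recurrent network is trained on the same task, but it conditions on much longer context and learns far richer structure. All names and the sample text are assumptions for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict


def train_next_char_counts(text: str) -> Dict[str, Counter]:
    """Count, for every character, which characters follow it in the text."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for current, following in zip(text, text[1:]):
        counts[current][following] += 1
    return counts


def predict_next_char(counts: Dict[str, Counter], char: str) -> str:
    """Return the most frequently observed follower of `char`."""
    if char not in counts:
        raise KeyError(f"character {char!r} never appeared in the training text")
    return counts[char].most_common(1)[0][0]


def next_char_accuracy(counts: Dict[str, Counter], text: str) -> float:
    """Fraction of positions in `text` where the predicted next character is right."""
    pairs = list(zip(text, text[1:]))
    hits = sum(
        1
        for current, following in pairs
        if current in counts and predict_next_char(counts, current) == following
    )
    return hits / len(pairs)


# Tiny made-up corpus; a real language model would train on vastly more text.
corpus = "the quick queen thinks the quiet thing"
model = train_next_char_counts(corpus)

print(predict_next_char(model, "q"))  # prints "u"
print(predict_next_char(model, "t"))  # prints "h"
print(f"accuracy on the training text: {next_char_accuracy(model, corpus):.2f}")
```

Even this crude counter picks up a spelling regularity such as "q" being followed by "u". Pushing the accuracy higher requires modeling words, grammar, and context, which is where deep networks come in.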
Deep learning language models can even be trained together with deep learning models for computer vision, providing results that until just recently were considered impossible in the near future. For example, image captions can be generated as the result of a deep learning model. They don’t rely on any manual image processing or natural language processing. Just the fact that the caption is a correct English sentence is amazing in itself — after all, nobody taught English to the model. It learned the language by itself by training on large amounts of English text. The understanding of what’s happening in the image, combined with the use of language to describe it, is incredibly close to what humans can do.
Still more amazing are the results of training a deep learning model to answer questions about an image it sees. This problem is more complex, because the model needs to understand the question, know where to look in the image to find the answer, find it, and then use language to accurately provide the answer.
Deep learning can also be used to generate a completely new image based on a text description. These images can be created entirely by a neural network, pixel by pixel, without relying on any previous image.
Speech recognition includes several major families of problems. The most widely researched is voice to text, or taking the spoken word and turning it into text on the screen. The problem may not seem all that complex at first glance, because it seems like it’s just a matter of converting each sound to a corresponding character. In fact, though, it’s one of the most complex areas in signal processing.
The auditory cortex in our brain is trained over several years in childhood to recognize voice and convert it to language, and humans become very good at this, despite the fact that completely different sentences can sound very similar vocally. An example Geoffrey Hinton frequently cites involves the phrases “recognize speech” and “wreck a nice beach.” They certainly sound very similar, but their meaning is completely different, and humans can only tell the difference because they understand the language and are always looking for context clues. In the same way, in order to perform speech recognition, a model needs to have a good understanding of the underlying language and context.
While the progress in speech recognition has been incremental over many decades, in recent years deep learning has revolutionized this field in the same way it has moved others into the future. Traditional speech recognition relied on cumbersome feature extraction processes, which were limited in their nature. Deep learning, on the other hand, is capable of operating directly on raw data and of being trained on large datasets of audio recordings. It can exceed the accuracy of traditional models by a huge margin, with accuracy improvements of 20 to 30 percent.
Today most smart assistants rely on deep learning, and their understanding level is rapidly increasing in question answering tasks. Google Assistant, which relies almost entirely on deep learning, has the highest accuracy in the latest benchmarks, followed by continuously improving smart assistants from Microsoft (Cortana), Amazon (Alexa), and Apple (Siri).
Deep learning has also been successfully applied to speech generation or synthesis, often known as text to voice. Recently, Google DeepMind presented a novel method called WaveNet for directly training deep learning models on raw audio so that they can generate their own raw audio. Their results show near human performance for voice and speech generation.
Speaker recognition, or recognizing who is talking, is another area where deep learning has improved accuracy substantially. This is especially important for national security. Fifth Dimension, one of the leading developers of investigation platforms based on deep learning, successfully employs speaker recognition such that a terrorist making an anonymous phone call can be identified by matching the caller’s voice sample against a large dataset of known voices.
Since the dawn of computer science, computer chess was an especially challenging problem. Goethe called chess “the touchstone of the intellect,” and Alan Turing, the forefather of modern computer science, designed the first chess-playing algorithm before he could even run it on any computer.
Computer chess, while being one of the most researched fields within AI, has not lent itself well to the successful application of conventional learning methods, because of its enormous complexity.
In a recent work titled “DeepChess,” which won the Best Paper Award at the International Conference on Artificial Neural Networks, my co-authors and I demonstrated how end-to-end deep learning could be applied for training a chess-playing program, without any prior knowledge. By merely training on millions of chess positions taken from grandmaster games, the program reaches a super-human performance level. Figure 5-6 shows some moves selected by DeepChess, which cannot be found by most regular chess programs.
The game of Go is another complex game, which for many years could not be tackled by any traditional machine learning approach. Google DeepMind used deep learning to train its “AlphaGo” program and defeat Lee Sedol, one of the strongest human Go players.
One of the most crucial real-world problems today, one that concerns every company, large or small, is cybersecurity. More than a million new pieces of malware (malicious software) are created every single day, and sophisticated attacks are continuously crippling entire companies, or even nations when nation-state cyberattacks target critical national infrastructure.
There are many, many cybersecurity solutions out there, but all are struggling to detect new malware. It’s easy to mutate a piece of malware and evade detection by even the most sophisticated cybersecurity solutions, which perform dynamic analysis on files and use traditional machine learning.
For nearly two decades, antivirus solutions mainly relied on signatures to detect malicious files. In their simplest form, the signatures could be a list of file hashes. In more sophisticated cases, such as most advanced antivirus solutions today, they detect the presence of certain features in files, such as a string that is associated with a malicious file family.
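Both kinds of signature are easy to sketch. The example below is a toy illustration, not how any particular antivirus product works: the sample bytes, the marker string, and the function names are all invented. It checks a file against a list of SHA-256 hashes and against a list of byte strings.

```python
import hashlib
from typing import Iterable, Set


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def matches_signature(
    data: bytes,
    known_bad_hashes: Set[str],
    bad_strings: Iterable[bytes],
) -> bool:
    """Flag a file if its hash is listed or if it contains a listed byte string."""
    if sha256_hex(data) in known_bad_hashes:
        return True
    return any(marker in data for marker in bad_strings)


# Hypothetical "known malicious" sample and the signatures derived from it.
known_sample = b"MZ\x90\x00...EVIL_PAYLOAD_V1..."
known_bad_hashes = {sha256_hex(known_sample)}
bad_strings = [b"EVIL_PAYLOAD_V1"]

# Appending a single byte changes the hash but leaves the marker string intact.
mutated = known_sample + b"\x00"
# Changing the marker as well defeats both signatures.
renamed = mutated.replace(b"EVIL_PAYLOAD_V1", b"EVIL_PAYLOAD_V2")

print(matches_signature(known_sample, known_bad_hashes, []))     # True: hash match
print(matches_signature(mutated, known_bad_hashes, []))          # False: hash changed
print(matches_signature(mutated, known_bad_hashes, bad_strings)) # True: string match
print(matches_signature(renamed, known_bad_hashes, bad_strings)) # False: both evaded
```

The last two lines show why signatures struggle with new variants: a trivial change to the file is enough to slip past a hash list, and a slightly larger one slips past the string check too.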
Although antivirus solutions today are quite effective for protecting against previously existing malware, they are incapable of detecting the millions of new malicious files that are continuously created. Due to these severe limitations, in the past few years a new generation of more advanced solutions has emerged, focusing on the detection of new malware.
Most of these “next gen” cybersecurity solutions use sandboxing, which is the dynamic analysis of suspected files. This is a lengthy process and it can’t be used for threat prevention, only detection. Detection means finding and stopping the malware after it has already started running and has potentially caused damage, while prevention means stopping the malicious file before it is able to start running in the first place.
Many of these solutions also rely on machine learning to increase their detection rates. Applying traditional machine learning in this case can require several years of effort devoted to feature extraction. For example, given a Windows executable file, what are its most important features? The most obvious features would be function calls (API), strings, and tens or hundreds of additional handcrafted features.
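As a rough illustration of what handcrafted features look like, the sketch below pulls printable strings out of raw bytes and turns them into a handful of numbers. The feature set, the list of API names treated as suspicious, and the sample bytes are assumptions chosen for the example; real products use far larger, carefully engineered feature sets.

```python
import re
from typing import Dict, List

# Runs of four or more printable ASCII bytes, similar to what the Unix
# `strings` tool reports.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")

# Hypothetical short list of Windows API names a human expert might flag.
SUSPICIOUS_APIS = (b"CreateRemoteThread", b"WriteProcessMemory", b"VirtualAllocEx")


def extract_strings(data: bytes) -> List[bytes]:
    """Return every run of printable ASCII characters found in the file."""
    return PRINTABLE_RUN.findall(data)


def handcrafted_features(data: bytes) -> Dict[str, float]:
    """Reduce a whole file to a few numbers chosen in advance by a human."""
    strings = extract_strings(data)
    apis_seen = sum(
        1 for api in SUSPICIOUS_APIS if any(api in s for s in strings)
    )
    has_url = any(b"http://" in s or b"https://" in s for s in strings)
    return {
        "file_size": float(len(data)),
        "num_strings": float(len(strings)),
        "num_suspicious_apis": float(apis_seen),
        "has_url": float(has_url),
    }


# Made-up bytes standing in for an executable file.
sample = (
    b"\x00\x01MZ\x90\x00kernel32.dll\x00WriteProcessMemory\x00"
    b"\x02\x03http://example.invalid/a\x00ab\x00"
)

features = handcrafted_features(sample)
print(extract_strings(sample))
print(features)
```

Everything in the file that is not captured by one of these four numbers is invisible to a model trained on them, which is the sense in which feature extraction discards most of the raw data.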
This feature extraction phase has several severe limitations that become particularly evident in cybersecurity:
On the face of it, deep learning addresses all the limitations of traditional machine learning in cybersecurity. Specifically, deep learning processes raw data and does not rely on feature extraction. That doesn’t make it easy, though. Applying deep learning is much more challenging in the domain of cybersecurity.
For example, unlike in computer vision, where different image sizes can be adjusted to a pre-specified size and fed into a neural network, a computer file can be of any size, from a few kilobytes up to many gigabytes. Also, different file formats have different file structures, and none of these structures has any obvious local correlations that could be used by neural network types such as convolutional neural networks.
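To make the size problem concrete, consider the most naive way of forcing raw bytes into a fixed-size input: cut the file off at a chosen length, or pad it if it is shorter. This sketch is only an illustration of the difficulty. It is not the approach used by any framework mentioned in this article, and the function name and parameters are invented.

```python
from typing import List


def bytes_to_fixed_vector(data: bytes, length: int, pad_value: int = 0) -> List[float]:
    """Truncate or pad raw bytes to `length` and scale each byte to the range [0, 1]."""
    if length <= 0:
        raise ValueError("length must be positive")
    clipped = data[:length]
    padded = list(clipped) + [pad_value] * (length - len(clipped))
    return [value / 255 for value in padded]


small = bytes([255, 0, 51])       # a 3-byte "file"
large = bytes(range(256)) * 4000  # roughly a 1 MB "file"

v_small = bytes_to_fixed_vector(small, 8)
v_large = bytes_to_fixed_vector(large, 8)

print(v_small)  # [1.0, 0.0, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0]
print(len(v_large), "values kept out of", len(large), "bytes")
```

An image shrunk to a smaller size still looks like the same image, but a truncated file simply loses everything past the cutoff. Any malicious content beyond that point would never reach the network, which is one reason raw files need more careful handling than images.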
Despite these challenges, deep learning has been successfully applied to cybersecurity. Deep Instinct has demonstrated how a dedicated deep learning framework adapted specifically for cybersecurity can overcome the difficulties mentioned in the preceding section and can train a deep learning model on raw files.
The training phase is performed in the laboratory, using hundreds of millions of malicious and legitimate files of different file formats. This training process takes only a single day or so using GPUs. After the training has converged, the resulting deep learning model is only a few tens of megabytes in size, and it can provide a prediction for any given file within a few milliseconds. And it achieves that speed on the average CPU. The GPU is used only in the training phase, not the prediction phase. Because of that, it can be deployed on any endpoint using only a negligible amount of resources, and provide full pre-execution prevention.
The deep learning-based model is capable of obtaining a much higher detection rate and a much lower false-positive rate for new, previously unseen files, when compared with the best traditional machine learning solutions available. And because deep learning is agnostic to file types, it can be applied to any file format, and even to any operating system, without requiring modifications or adaptations. Compare that to traditional machine learning, where each effort pretty much has to start from scratch, and you can see one more reason why deep learning is so powerful.
In addition to determining whether a file is malicious or not, deep learning can be used to identify what type of malware it is (for example, ransomware or Trojan). Recently my co-authors and I presented a paper at the International Conference on Artificial Neural Networks demonstrating how deep learning can even detect which nation-state is behind an attack (for example, China or Russia).