What exactly is Deep Learning? A beginner's guide to exploring the field.
HEADS UP
This is just an introductory article and hence no full-blown project will be built.
The goal of this article is to get you, the reader, acquainted with the field and to make you aware of just what you need to succeed.
- Keep an open mind, get your pen, and let's get this ball rolling.
- I assume you know just enough Machine Learning to understand what supervised and unsupervised learning are, and what it means to train a model and test it.
Let's dive right in...
I wish my phone could send messages on its own, the way I would want it to; replying to people is stressful.
Been there, trust me. But what if I told you it's possible? In fact, many movie-like magic scenes are turning into reality day by day, and in some years' time the world will look entirely different from how it does today. I must ask, though: which side of the change do you want to be on, the changemakers or the mere consumers? It's not bad to be a consumer, don't get me wrong, but it's also not all that fun.
At the heart of the above is the cutting-edge technology known as Deep Learning.
Any maths-inclined student will recognise the set relationship here: Deep Learning is a subset of Machine Learning, which is in turn a subset of Artificial Intelligence.
IBM defines deep learning as:
Deep learning is a subset of machine learning, which is essentially a neural network with three or more layers. These neural networks attempt to simulate the behavior of the human brain—albeit far from matching its ability—allowing it to “learn” from large amounts of data.
Now, yeah, I know what you're thinking: "What the hell is a neural network?" "How can a piece of metal and semiconductors with a bit of juice (electricity) learn?" I hope you got the inside joke 🙃. All of this will be demystified soon enough!
To begin with, Machine Learning is famously defined as the ability of a computer to learn to perform tasks without being explicitly programmed. Since Deep Learning is a child of its parent (Machine Learning), it inherits this property, which already gives us part of a definition of Deep Learning.
To the programmers reading this: imagine you were tasked with building a product that can tell a cat from a dog (one of the classic starter deep learning projects). How would you go about it? Of course, you would immediately want to curse the task giver, as the task seems very daunting! Imagine the number of classes or functions you would have to write, and don't get me started on conditional statements 😩. Let's explore some of this below for the fun of it, shall we?
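```python
# A deliberately naive, hand-written rule-based classifier
whiskers, wagging_tail = True, False  # hypothetical features of one image
catScore, dogScore = 0.0, 0.0

if whiskers:
    catScore += 0.05
    dogScore -= 0.05
if wagging_tail:
    catScore -= 0.05
    dogScore += 0.05
```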
There are better ways to go about this, of course; I just wanted to make my point clear, and I hope it is now! The term neural network, which we saw earlier, is the crux of deep learning, and it was inspired by how our brain works. In your head right now, as you read this, millions and millions of neurons are firing to help you understand each word. Remember, too, that the optic nerve first transfers a signal to the brain, and it is left to the brain to make sense of it. I will forever be astonished by the speed involved. This behaviour was first modelled mathematically in 1943 by Warren McCulloch and Walter Pitts, and the field has grown ever since!
The above is an image of a perceptron, a.k.a. a single-layer neural network: the simplest kind of neural network! It may amaze you that the neural networks behind the Siri, Alexa or Google Assistant you rely on so much are built from units much like this one, just many more of them, stacked in layers. A network can be as simple as the one above or as complex as the one below:
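To make the perceptron a little less abstract, here is a minimal Python sketch of one. It assumes the classic formulation: take a weighted sum of the inputs, add a bias, and output 1 if the result is positive, otherwise 0 (a step activation). The weights and bias are hand-picked, hypothetical values that happen to make it behave like a logical AND; a real network would learn them from data.

```python
import numpy as np

def perceptron(x, w, b):
    """Single perceptron: weighted sum of the inputs plus a bias, then a step activation."""
    z = np.dot(w, x) + b
    return 1 if z > 0 else 0

# Hand-picked (hypothetical) weights that make the perceptron act like a logical AND
w = np.array([1.0, 1.0])
b = -1.5

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron(np.array(x), w, b))  # only (1, 1) gives 1
```

Only when both inputs are 1 does the weighted sum (1 + 1 - 1.5 = 0.5) rise above zero.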
Someone might be thinking, "So when they said I was drawing 'JAGAJAGA' (a Nigerian term for rubbish) as a kid, I was actually drawing the future!" Yes, you were! And you should be proud.
Remember the definition that goes:
A computer is an electronic device that takes in data as input, processes it, and brings it out as information.
Clap for yourself if you said it along! Anyway, we can use the same framework to define a neural network: it is simply a bunch of neurons that take in data, find patterns in it (process it), and help make predictions, classify objects, generate speech, and much more (bring it out as information). I hope you get the gist!
Terms Associated with Neural Networks
- Input Layer: This is simply the data we humans feed into the network! It could be analytical (tabular) data, text, videos, GIFs, or anything at all. Hold on a minute! Videos, images, text?? I thought computers only work with 1s or 0s (or 1s and 0s at the same time, in the case of quantum qubits). Yeah, yeah. You're right! A neural network can't do all that. Bazinga (BBT fans)! It actually can. At the end of the day, images are just arrays of numbers, and an array can be flattened into a vector, which is just a collection of numbers! We load images into arrays using NumPy, a popular Python library for numerical computing, together with Pillow, an image processing library. Getting them is easy: install them, then import them.
```python
# Run in a Jupyter notebook cell (the ! prefix runs a shell command)
!pip install numpy
!pip install pillow
import numpy as np
from PIL import Image
```
Oh! I made an important assumption: that you already have Jupyter Notebook installed. If you don't, kindly install Anaconda, as it comes with it. I'll go over the full requirements at the end, so don't worry just yet. One more thing: there's no problem statement yet, as this is just an attempt to get you acquainted. For now, let's use the task prompt below to formulate a problem.
"Where's my fish!?" Jibola lamented as he woke up. His fish has disappeared three days in a row, and he doesn't know how it happened. Jibola has a dog, and a neighbour just next door has a cat that always comes to pester him. These are Jibola's two prime suspects! Being a Sherlock Holmes lover and a technology expert, Jibola has contacted you to solve this mystery!
Manner of Approach
Hmm! I would advise him to get a spycam. I would then interface the spycam with a Raspberry Pi and use computer vision techniques to solve the problem. But how? Where did computer vision even come in? Well, well, another word for your vocabulary bag. Computer vision is a fancy way of describing a computer's ability to recognise and interpret images using neural networks and image processing techniques. The goal of the spycam is to capture a video feed of what happens where the fish is kept. What we care about here is image processing and computer vision, and how they relate to the topic at hand: we train our neural network beforehand to classify an image as either a cat or a dog. How can we do this??
Don't forget, we are still in the input layer phase! All we need here to train our model is lots and lots of images of cats and dogs. So we search Google for "images of cats" and "images of dogs", and then scrape thousands of images of each.
NB: This is a supervised learning problem, hence we have to label our training images with what they actually are. It is also a binary classification problem: binary in the sense that each image is classified as one of two things, cat or dog.
So let's say we have 1,000 cat and dog images: we can put 700 of them in the training set and 300 in the test set. Let's leave this part for now, as it isn't the main aim of the problem.
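One simple way to make that 700/300 split is to shuffle the indices and slice them. This is a minimal NumPy sketch that assumes your labels are already in an array; the random labels below are placeholders, not real data.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n_images = 1000
# Placeholder labels: 0 = cat, 1 = dog (real labels would come from your labelled dataset)
labels = rng.integers(0, 2, size=n_images)

indices = rng.permutation(n_images)        # shuffle so both sets get a mix of cats and dogs
train_idx, test_idx = indices[:700], indices[700:]
y_train, y_test = labels[train_idx], labels[test_idx]
print(len(train_idx), len(test_idx))       # 700 300
```

Shuffling before slicing matters: if the scraped images were stored as all cats followed by all dogs, a plain slice would put mostly one class in each set.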
Now, like I said earlier, images are just arrays of numbers. Let me explain: I am sure you have heard of RGB before, either as a front-end developer or even as a primary school student! RGB simply stands for Red, Green, and Blue. Remember how they told you that some colours are primary and some are secondary? Well, yeah. Every pixel in a colour image is made up of red, green, and blue combined in some proportion to form its colour.
```python
# RGB can be represented as a tuple, where the first entry is red, the second green and the third blue
redColor = (255, 0, 0)
greenColor = (0, 255, 0)
blueColor = (0, 0, 255)
```
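To connect those colour tuples to an actual image, here is a hypothetical 2 × 2 image built from them with NumPy. It assumes 8-bit channels (dtype uint8, the usual storage for pixel values); yellow is included as an example of mixing red and green light.

```python
import numpy as np

red, green, blue = (255, 0, 0), (0, 255, 0), (0, 0, 255)
yellow = (255, 255, 0)  # red light + green light

# A tiny hypothetical image: 2 rows x 2 columns of pixels, 3 colour values per pixel
tiny_image = np.array([[red, green],
                       [blue, yellow]], dtype=np.uint8)
print(tiny_image.shape)  # (2, 2, 3): height, width, RGB channels
print(tiny_image[1, 1])  # the yellow pixel in the bottom-right corner
```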
As we can see above, the maximum value is 255. Each colour value of a pixel ranges from 0 to 255, which implies that 8 bits of space are allocated to each colour channel of a pixel: 2^8 = 256 possible values. So let's say we have an image with a 64 × 64 resolution.
Up close, that image would look like the grid above: most likely not the same colours, but the same box-like representation. There would be 64 × 64 = 4,096 values for, say, red alone. Since each pixel is a combination of red, green, and blue, there would be 64 × 64 × 3 = 12,288 values in total for an image of that resolution! It looks like a lot, but it isn't if you want a high-quality image like 1080p; do the maths and see how many values are used for a single frame of that very clear video.
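The arithmetic above is easy to check in Python. This small sketch uses a hypothetical helper, num_values, to count how many numbers an RGB image of a given size needs, including a 1920 × 1080 (1080p) frame.

```python
import numpy as np

# Each colour channel is stored in 8 bits, giving 2**8 = 256 possible values: 0 to 255
print(2 ** 8)                                          # 256
print(np.iinfo(np.uint8).min, np.iinfo(np.uint8).max)  # 0 255

def num_values(width, height, channels=3):
    """Hypothetical helper: how many numbers a width x height RGB image needs."""
    return width * height * channels

print(num_values(64, 64))      # 12288
print(num_values(1920, 1080))  # 6220800 for a single 1080p frame
```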
Conventionally, the input data is represented with x. The target label, what we want to classify (cat or dog), is usually represented as y.
Since it is a binary classification problem, y ∈ {0, 1}, where 0 represents cat and 1 represents dog (or the other way round, based on your preference). Using an image from DeepLearning.ai's introduction to neural networks course:
The 64 × 64 image we had has been unrolled into a single column vector of numbers: 64 × 64 × 3 = 12,288 of them.
Now let's check out the code. Let's turn the image below into a vector. The image has a 400 × 600 resolution, and the file name is dog.jpg.
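```python
# How to convert an image to an input vector
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

image = np.array(Image.open('dog.jpg'))  # load the image as a (height, width, 3) array
plt.imshow(image)                        # display it
image.shape                              # (height, width, 3): the 3 is the RGB channels
```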
As expected! The 3 in the shape is just the three RGB channels (layers).
```python
image
```
The image is now a big array of numbers! But it hasn't yet been reshaped into a column vector like the one we saw earlier. First, I'll resize the image to 64 × 64 and then reshape it.
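```python
# Resize to 64 x 64; convert('RGB') guards against images with an alpha channel
image = np.array(Image.open('dog.jpg').convert('RGB').resize((64, 64)))
# Flatten the (64, 64, 3) array into a (12288, 1) column vector
image2 = image.reshape(64 * 64 * 3, 1)
image2
```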
Job done! Now we have a flattened input layer! Note that this is just one image; if we had, say, 700 training images, the array holding all of them would have had the shape (700, 64, 64, 3).
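Here is a sketch of what that looks like for a whole training set, assuming the 700 images have already been resized to 64 × 64 and stacked into one array (random pixels stand in for real photos). Each image becomes one column, giving a matrix of shape (12288, 700). Dividing by 255 to scale the values into [0, 1] is a common extra preprocessing step, not something the code above does.

```python
import numpy as np

# Placeholder data: 700 "images" of random pixels standing in for real resized photos
rng = np.random.default_rng(seed=0)
train_images = rng.integers(0, 256, size=(700, 64, 64, 3), dtype=np.uint8)

# Flatten each image into one row, then transpose so each image becomes a column
X = train_images.reshape(train_images.shape[0], -1).T
# Scale pixel values from 0-255 down to 0-1
X = X / 255.0
print(train_images.shape, X.shape)  # (700, 64, 64, 3) (12288, 700)
```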
- Hidden Layer: The strength of a neural network often comes from here, so be sure to read this carefully. Note that the first image of a neural network (the perceptron) had no hidden layers, but the next one did. Enter weights! Each connection between nodes (circles) has a weight associated with it. Each input node holds one input feature, so our flattened 64 × 64 image feeds 12,288 input nodes, one per pixel value; it is not one node per image. All 700 training images pass through the same network and share the same weights. The hidden layers find patterns in the data, learn them, store them as weights, and use them to make predictions. Deep networks can have many hidden layers, which is where the "deep" in deep learning comes from.
I'll summarise the next steps below:
- Multiply each weight by its input feature x, sum the results, and add what is called a bias, b. Passing the result through the sigmoid function gives the prediction y_hat:
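```python
import numpy as np

def sigmoid(s):
    return 1 / (1 + np.exp(-s))  # np.exp(-s) is e to the power -s

y_hat = sigmoid(np.dot(w.T, X) + b)  # w, X and b are assumed to be defined already
```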
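- The sigmoid above is called an activation function. It squashes any number into the range 0 to 1, so y_hat can be read as the probability that the image is a dog (label 1), and it introduces non-linearity into the network.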
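Putting the two steps together, here is a minimal runnable sketch of the forward pass for a single sigmoid unit over our 12,288 input features. It assumes five placeholder images (random values already scaled to [0, 1]) and zero-initialised weights, so every prediction comes out as exactly 0.5 until the weights are trained.

```python
import numpy as np

def sigmoid(s):
    """Squash any real number into the range (0, 1)."""
    return 1 / (1 + np.exp(-s))

n_x, m = 64 * 64 * 3, 5                  # 12288 features per image, 5 example images
rng = np.random.default_rng(seed=0)
X = rng.random((n_x, m))                 # placeholder inputs: one column per image

w = np.zeros((n_x, 1))                   # one weight per input feature (zero init for the sketch)
b = 0.0                                  # bias

y_hat = sigmoid(np.dot(w.T, X) + b)      # shape (1, m): one prediction per image
predictions = (y_hat > 0.5).astype(int)  # 1 = dog, 0 = cat under our labelling
print(y_hat.shape, y_hat)
```

With untrained (zero) weights, the network has no opinion, which is exactly why it needs the next step.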
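Cost function and backward propagation: The cost function measures how well (or badly) your neural network performed by comparing its predictions with the true labels, and backward propagation works out how much each weight contributed to the error, so the weights can be updated to do a better job next time.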
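For a taste of what that looks like, here is a minimal sketch of one common choice, the binary cross-entropy cost, together with gradient descent updates for a single sigmoid unit (logistic regression) rather than a full multi-layer network. The tiny dataset, learning rate and number of steps are made-up values for illustration.

```python
import numpy as np

def sigmoid(s):
    return 1 / (1 + np.exp(-s))

def cost(y_hat, y):
    """Binary cross-entropy cost, averaged over the m examples."""
    m = y.shape[1]
    return -np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)) / m

# Made-up toy data: 3 features x 4 examples, labels 0 = cat, 1 = dog
X = np.array([[0.2, 0.9, 0.1, 0.8],
              [0.1, 0.7, 0.3, 0.9],
              [0.5, 0.4, 0.6, 0.3]])
y = np.array([[0, 1, 0, 1]])
m = X.shape[1]
w, b, learning_rate = np.zeros((3, 1)), 0.0, 0.5

costs = []
for step in range(10):
    y_hat = sigmoid(np.dot(w.T, X) + b)  # forward pass
    costs.append(cost(y_hat, y))
    dz = y_hat - y                       # backward pass: gradient of the cost w.r.t. z
    dw = np.dot(X, dz.T) / m             # gradient w.r.t. the weights
    db = np.sum(dz) / m                  # gradient w.r.t. the bias
    w -= learning_rate * dw              # gradient descent update
    b -= learning_rate * db

print(costs[0], costs[-1])  # the cost goes down as the weights improve
```

In a deeper network, backward propagation applies the same chain-rule idea layer by layer, from the output back to the input.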
Output Layer: This layer produces y_hat, the predicted value of the neural network, which is compared against y, the actual value.
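Turning y_hat into an actual answer usually means picking a threshold. This sketch assumes a threshold of 0.5 and uses made-up predictions and labels for five test images to compute accuracy.

```python
import numpy as np

# Made-up network outputs for 5 test images, and their true labels (0 = cat, 1 = dog)
y_hat = np.array([[0.91, 0.12, 0.67, 0.30, 0.55]])
y = np.array([[1, 0, 0, 0, 1]])

predictions = (y_hat > 0.5).astype(int)  # anything above 0.5 is called a dog
accuracy = np.mean(predictions == y)     # fraction of images classified correctly
print(predictions, accuracy)             # [[1 0 1 0 1]] 0.8
```

Here the third image (0.67) was called a dog but is actually a cat, so 4 out of 5 are correct.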
Prerequisites for Learning Deep Learning
- Of course, a laptop.
- A programming language: I would advise Python, but you can also try languages like Julia or C, although I've never used them myself.
- An IDE of your choice; I use Jupyter Notebook.
- A basic understanding of linear algebra
- A basic understanding of calculus, especially derivatives
- A basic understanding of Machine Learning
Conclusion: There is still a whole lot to talk about, especially if we want a complete neural network. An in-depth understanding of the cost function and backward propagation is important, and it is missing from this article; I plan on writing Part II as soon as I can. For now, if you want to learn more, I recommend DeepLearning.ai's courses on Coursera or MIT's Intro to Deep Learning.