Technology

Machine learning explained: what is a neural network and why do we use them?

Calipsa 19 June 2020

In the previous instalment of “Machine Learning Explained”, we covered the significance of true and false positives, and true and false negatives. In video analytics, these four categories enable us to see whether our machine learning model has successfully identified human activity, or not. 

So how does Calipsa’s False Alarm Filtering Platform reach these four possible conclusions? The answer lies in neural networks - the “learning” part of machine learning. 

Want to discover even more about machine learning? Download our latest ebook, Machine learning explained: The 3 essentials for video analytics. 

Download ebook

What is a neural network?

Neural networks are a set of algorithms whose purpose is to find patterns in vast amounts of data. With advances in data science and hardware, it is now possible to analyse any kind of data with the help of neural networks, such as speech, temperature readings, or in our case, images. 

No matter what type of information is inputted, it first has to be converted into numerical data for the network to interpret. In the case of video analytics, this conversion is usually done by sensors built into security cameras, which turn the images they capture into a series of numbers. 
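To make that numerical translation concrete, here is a minimal sketch (in Python with NumPy, purely for illustration) of a tiny greyscale image represented as the grid of numbers a network would actually receive:

```python
import numpy as np

# A tiny 2x2 greyscale "image": each pixel is just a brightness value, 0-255.
image = np.array([
    [  0, 255],
    [128,  64],
], dtype=np.uint8)

# To a neural network, the image is nothing but this grid of numbers,
# usually scaled to the 0-1 range before being fed in.
normalised = image.astype(np.float32) / 255.0
print(normalised.shape)  # (2, 2)
```

Real camera frames are of course far larger, but the principle is the same: the network never sees a “picture”, only an array of numbers.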

Once the network has identified patterns in the numerical translation, it delivers an output: its interpretation of the data we fed in. Ideally, the neural network will have correctly recognised the patterns much more quickly than a human could.

Neural networks offer us other advantages, too: unlike people, the network never gets tired or loses concentration, which reduces the risk of human error. It can also make consistent predictions about new data based on information it has processed before.

Neural network diagram, source: https://i.redd.it/q3n8ip6bqrq11.png

Calipsa’s video analytics platform uses a subset of artificial intelligence known as deep learning, or “stacked” neural networks. In a stacked neural network, information is processed by many layers of nodes; a node is where a single computation takes place. 

While data will pass through all the network layers, the connections between nodes are weighted, so each node has a greater or lesser impact on the final output depending on its significance. We’ll go into more detail on the link between deep learning and video analytics later. 
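The flow of data through stacked, weighted layers can be sketched in a few lines. The example below is a toy forward pass with made-up random weights, not Calipsa’s actual model: each layer takes the previous layer’s outputs, weights them, and applies a non-linearity.

```python
import numpy as np

def layer(inputs, weights, biases):
    """One layer of nodes: weighted sum of inputs, then a ReLU non-linearity."""
    return np.maximum(0.0, inputs @ weights + biases)

rng = np.random.default_rng(0)
x = rng.random(4)                          # 4 input values (e.g. pixel data)
w1, b1 = rng.random((4, 3)), np.zeros(3)   # hidden layer: 3 nodes
w2, b2 = rng.random((3, 1)), np.zeros(1)   # output layer: 1 node

hidden = layer(x, w1, b1)        # the data passes through every layer...
output = layer(hidden, w2, b2)   # ...and the weights decide each node's influence
```

In a real network the weights are not random: they are adjusted during training so that the final output matches the patterns we want the network to find.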

How do neural networks learn?

There are two ways a neural network can learn from the data it is given: supervised learning and unsupervised learning. In unsupervised learning, a neural network finds similarities in data without any direction from a human. This process of finding similarities is called clustering - in other words, the network will independently identify “clusters” of similarities as a pattern. 
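The clustering idea can be illustrated without a neural network at all. The sketch below is a toy one-dimensional k-means (an assumption for illustration, not any particular production algorithm): the data carries no labels, yet two groups emerge on their own.

```python
import numpy as np

# Two obvious "clusters" of 1-D points; no labels are provided.
data = np.array([1.0, 1.2, 0.8, 9.0, 9.5, 8.8])

# A minimal k-means sketch: start with two guessed centres, then repeatedly
# (1) assign each point to its nearest centre,
# (2) move each centre to the mean of the points assigned to it.
centres = np.array([0.0, 5.0])
for _ in range(10):
    assignments = np.abs(data[:, None] - centres[None, :]).argmin(axis=1)
    centres = np.array([data[assignments == k].mean() for k in range(2)])

print(assignments)  # [0 0 0 1 1 1]: the first three points form one cluster
```

Nobody told the algorithm what the groups mean; it simply discovered that the data falls into two “clusters” of similar values.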

In supervised learning, the neural network is assisted by a human, who will add labels to a dataset. By adding labels, we are teaching the network that there is a correlation between our labels and the data, which creates the pattern it has to look for. 

When the machine recognises the pattern we have taught it, it classifies the data. Classification is different from clustering, because the network has been directed to find certain patterns and ignore others. 
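Supervised classification can be sketched with a single trained “node”. The example below uses logistic regression on made-up labelled data (the feature and labels are purely illustrative): because we supplied the labels, the model learns exactly the pattern we directed it to find.

```python
import numpy as np

# Labelled training data: a toy feature (say, "amount of motion"),
# with label 1 = human activity, 0 = no human activity.
x = np.array([0.1, 0.2, 0.3, 0.7, 0.8, 0.9])
y = np.array([0, 0, 0, 1, 1, 1])

# A one-node classifier: learn a weight and bias so that
# sigmoid(w*x + b) matches the labels (gradient descent on logistic loss).
w, b = 0.0, 0.0
for _ in range(5000):
    p = 1.0 / (1.0 + np.exp(-(w * x + b)))   # predicted probability of "activity"
    w -= 0.5 * np.mean((p - y) * x)          # gradient step on the weight
    b -= 0.5 * np.mean(p - y)                # gradient step on the bias

predict = lambda v: int(1.0 / (1.0 + np.exp(-(w * v + b))) > 0.5)
print(predict(0.15), predict(0.85))  # 0 1
```

Unlike the clustering example, the output classes here mean something to us from the start, because we defined them when we labelled the data.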

How does this apply to video analytics?

In video analytics, we are teaching a neural network to identify patterns in images. Specifically, we want it to learn what human activity looks like (i.e. people and vehicles). This is the pattern we want the neural network to discover in security camera alarms. 

However, we don’t want the network to learn what an exact person or an exact vehicle looks like - we want it to learn a generalised concept that it can recognise in any data it’s given. This is much harder than it sounds, because we comprehend images in an entirely different way to a computer.

To a neural network, an image has 3 dimensions. This is because digital images are made up of red, green and blue (RGB) encoding to create all the colours we can see. So, as well as the width and height, a neural network also sees the red, green and blue colours as an additional dimension that makes the image 3 layers deep. These image layers are called channels.
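In code, that 3-layer structure is simply the shape of the array. A short sketch (again in Python/NumPy, for illustration only):

```python
import numpy as np

# A 2x2 colour image: height x width x 3 channels (red, green, blue).
image = np.zeros((2, 2, 3), dtype=np.uint8)
image[0, 0] = [255, 0, 0]   # top-left pixel is pure red
image[1, 1] = [0, 0, 255]   # bottom-right pixel is pure blue

print(image.shape)             # (2, 2, 3): the third dimension is the channel depth
red_channel = image[:, :, 0]   # each channel is itself a 2-D layer of numbers
```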

All the while, the network is looking for patterns in this 3-dimensional object. So actually, when we ask the network to identify a person in an image, it doesn’t look for a flat shape; it looks for a volume that conforms to the pattern it has been taught.

Convolutional neural network diagram

This multi-layered image passes through a multi-layered network, which means it is scanned and sampled in thousands of ways. At first, the image will look a lot like something we would recognise, as the network scans it for features like edges and large shapes. 

Deeper layers contain nodes that respond to more granular components of the image. As it moves through the network, the image is increasingly interpreted in ways that are too abstract for us to understand. Once the image has passed through the entire network, the computer has enough information to classify it as conforming to the pattern, or not. 
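The early-layer edge scanning described above can be sketched directly. The example below slides a hand-written vertical-edge kernel over a toy image; a real convolutional layer does essentially this, except that it learns its kernel values during training (the values here are illustrative, not learned).

```python
import numpy as np

# A 4x5 image with a hard vertical edge: dark on the left, bright on the right.
image = np.array([
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
], dtype=np.float32)

# A 3x3 vertical-edge kernel: it responds where brightness changes
# from left to right, and stays near zero on flat regions.
kernel = np.array([
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
], dtype=np.float32)

# Slide the kernel over the image, no padding. (CNN layers compute this
# cross-correlation and, by convention, still call it "convolution".)
out = np.zeros((image.shape[0] - 2, image.shape[1] - 2), dtype=np.float32)
for i in range(out.shape[0]):
    for j in range(out.shape[1]):
        out[i, j] = np.sum(image[i:i + 3, j:j + 3] * kernel)

print(out)  # zero over the flat region, large values where the edge sits
```

Stacking many such layers, each feeding on the previous one’s output, is what lets deeper nodes respond to increasingly abstract combinations of these simple features.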

In Calipsa’s case, our neural networks have learned the generalised concepts of a person and of a vehicle, so that they can process millions of alarms and quickly identify which are real and which are nuisance alarms.


