Anomalous (adj.) “deviating from a general rule,” 1640s, from Late Latin anomalus, from Greek anōmalos “uneven, irregular,” from an- “not” + homalos “even”.
What if you could tell if your car’s engine was behaving differently than usual and get it to the shop long before a serious engine problem would cost you lots of money? What if a farmer could spot an unwell animal and care for it long before the animal needs a vet? What if a company could spot unusual reaction times in its online store and mobilise additional resources from the cloud long before the customers even notice?
And what if we could use machines to do that work for us?
The field of machine learning concerned with finding behaviour that differs from the norm is called anomaly detection. In a world that grows more complex and entangled every day, detecting and evaluating divergent behaviour in data gives you the insight to react and to make time-critical decisions and interventions.
The first part of this primer will explore the definitions of anomalies and the resulting challenges in their detection. The second part will show how current machine learning approaches answer these challenges.
How do we categorise anomalies?
We all know of moments that were different from what we expected – a chilly day in summer, a bitter coffee from our favourite barista, a long wait to get through an intersection on our way to work. But in a complex world, anomalies do not consist of just a single outlier deviating from a clear trend. To do rigorous data science on anomalies, we have to agree on a definition. With respect to large, modern data sets, a good definition is:
- Anomalies or outliers present features that differ from the norm of the dataset – they are much larger, smaller, longer, shorter, brighter, or dimmer than the other data points
- Anomalies are rare in a dataset – a dataset can have great variance, with many points differing from the calculated average; outliers must fall outside even this range
An interesting take on the idea of anomalies was raised by Guha et al., who pointed out that anomalies have a behavioural consequence: they make the data harder to describe. If the anomaly is a red ball in a bag of blue ones, we suddenly have to account for colour at all when describing the bag.
It is important to recognise that while noise may vary across systems, noise is normal. To identify something as noise is not the same as identifying an anomaly. At the same time, noisy environments can mask anomalies and make detection harder.
In data with multiple dimensions, anomalies can be separated into three categories:
- Point anomalies – The standard outlier. A single data point that deviates strongly, such as one login coming from an IP address in Copenhagen while all others come from Melbourne.
- Contextual anomalies – These are anomalies in certain contexts of the data set, but not in others. Their recognition is guided by two attributes: the contextual attribute and the behavioural attribute. The context is given by the neighbourhood of the data, such as the timestamp in a time series; the behaviour is the non-contextual part of the data point. A congested road is common during rush hour but would be unusual at another time of day.
- Collective anomalies – A collection of data points that is anomalous for the entire dataset, but not on its own. We distinguish two structures here: ordered and unordered collective anomalies. An ordered collective anomaly would be hearing the chorus of a song where a verse should be – by itself the chorus is an ordered, normal collection of data points, but in the place of another expected collection it shows up as an anomaly. An unordered collective anomaly could consist of many small and large credit card purchases in an unusual location.
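To make the point-anomaly case concrete, here is a minimal sketch using only the Python standard library. It flags values whose modified z-score – based on the median absolute deviation (MAD), which the outliers themselves cannot distort as easily as the mean – exceeds a threshold. The function name, data, and threshold are illustrative, not taken from any particular method.

```python
import statistics

def mad_outliers(values, threshold=3.5):
    # Modified z-score using the median absolute deviation (MAD).
    # The median is robust: a single extreme value barely shifts it,
    # so the outlier cannot "mask" itself the way it can with mean/stdev.
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread at all: nothing to flag
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]

readings = [10.1, 9.8, 10.0, 10.3, 9.9, 10.2, 25.0]
mad_outliers(readings)  # → [25.0]
```

Note that a plain mean/standard-deviation z-score on the same data would struggle here: the single extreme value inflates both the mean and the standard deviation, shrinking its own score – one reason robust statistics are popular for point-anomaly detection.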
The challenge of defining normal
What does “business as usual” mean? In an abstract sense, this is easy to formulate, but in the real world, the question becomes much harder. The more one drills down into the answer, the harder it becomes to be precise. We can’t always find a hard line between normal and anomalous. Worse, because of the variability of the underlying processes, a seemingly anomalous data point could be caused by normal behaviour, while a seemingly normal data point could be caused by anomalous behaviour.
An overly precise definition of normal behaviour can even be harmful: malicious actors can exploit it as a guide for adapting and disguising themselves in seemingly normal behaviour.
Furthermore, anomalous means something different in different environments. Some environments are naturally fast-paced and have large fluctuations, such as the stock market. Temperature changes or particle concentrations that are perfectly acceptable in consumer environments may not be tolerable in a production line or development lab. Depending on the circumstances, the rapidity and scale of the fluctuation may make detection of anomalies difficult because of the inherent noise in the system.
These fluctuations can also be periodic or “seasonal”. Traffic during rush hour is not representative of the average flow of traffic but is normal at certain times of the day on certain days of the week. Beyond all that, in our complex and evolving world, the state of “normal” also constantly evolves. What was normal yesterday – such as having a phone with a cord into the wall – might be the anomaly of today.
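The idea of seasonal or contextual normality can be sketched the same way: judge each reading against peers that share its context (here, the hour of day) instead of against the dataset as a whole. This is an illustrative sketch under made-up traffic numbers, not a production method.

```python
from collections import defaultdict
import statistics

def contextual_outliers(readings, threshold=3.5):
    """readings: list of (hour, value) pairs. A value is flagged only if
    it is unusual for its own hour, not for the dataset as a whole."""
    by_hour = defaultdict(list)
    for hour, value in readings:
        by_hour[hour].append(value)
    flagged = []
    for hour, value in readings:
        peers = by_hour[hour]  # same robust z-score idea, per context
        med = statistics.median(peers)
        mad = statistics.median(abs(v - med) for v in peers)
        if mad > 0 and 0.6745 * abs(value - med) / mad > threshold:
            flagged.append((hour, value))
    return flagged

traffic = [(8, 480), (8, 500), (8, 510), (8, 495), (8, 505),  # rush hour: high counts are normal
           (3, 20), (3, 22), (3, 19), (3, 21), (3, 400)]      # 400 cars at 3 a.m. is not
contextual_outliers(traffic)  # → [(3, 400)]
```

A single global threshold on the same data would get this backwards: the quiet 3 a.m. readings sit far from the overall median and would be flagged, while the 3 a.m. spike hides comfortably among the rush-hour values. The contextual attribute (the hour) is what makes the judgment meaningful.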
All of these factors make it especially difficult to label data to train a system to recognise and determine anomalies. However, despite these challenges, there have been a wide variety of applications for anomaly detection in fields ranging from astronomy to industrial machine control.
Figure: Normal is hard. The cluster C3 is a small collection of data points outside the larger groups C1 and C2. Is it a rare but normal occurrence, or a collection of anomalous behaviours? – credit: Goldstein et al. 2016
Who can benefit from this?
Machine learning can be of great benefit for anomaly detection in areas that are too data intensive for human intervention, and beyond that it can even enhance existing human methods.
One of the largest areas of interest for anomaly detection is the prevention of fraud, especially with more and more transactions moving online. Credit card and mobile phone fraud involve such vast quantities of data that human evaluation of transactions is no longer feasible in real time. Similar data-intensive areas where anomaly detection is used to prevent or uncover criminal activity are the evaluation of insurance claims and the detection of insider trading on stock markets. Digital records allow the comparison of data points that would otherwise have been hardly accessible to a single evaluating body. The anomaly detection algorithm then reports a set of suspicious entries to an external agent for further investigation.
Anomaly detection is also of great interest for industrial machinery and sensor applications. Not all machinery can signal its status directly; structural components can instead be scanned for integrity, flagging a device or site as unsafe for use. Structural integrity is especially well suited for anomaly detection because changes over time can be subtle and hard to predict or classify as a particular named state of a system. By instead treating any deviation from a learned norm as suspect, the algorithm can detect strain or damage and alert the appropriate agent to follow up and repair or replace the damaged elements.
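A deviation-from-a-learned-norm monitor of this kind can be sketched very simply: learn a baseline band from a reference window of healthy sensor readings, then flag anything that drifts outside it. The class name, the vibration readings, and the tolerance factor below are illustrative assumptions, not from the source.

```python
import statistics

class DriftMonitor:
    """Learn a 'normal' band from a healthy reference window, then
    flag later sensor readings that drift outside that learned band."""

    def __init__(self, baseline, k=4.0):
        # Baseline statistics captured once from known-healthy readings.
        self.mean = statistics.fmean(baseline)
        self.stdev = statistics.stdev(baseline)
        self.k = k  # tolerance: how many standard deviations count as drift

    def is_anomalous(self, reading):
        # Any deviation from the learned norm beyond the band is flagged,
        # without needing a named fault class for it.
        return abs(reading - self.mean) > self.k * self.stdev

monitor = DriftMonitor(baseline=[1.00, 1.10, 0.90, 1.05, 0.95])  # healthy vibration levels
monitor.is_anomalous(1.05)  # → False, within the learned band
monitor.is_anomalous(2.50)  # → True, drifted far outside it
```

The appeal of this semi-supervised framing is exactly what the paragraph above describes: the monitor never has to enumerate failure modes, only what healthy operation looked like during the reference window.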
Perhaps the most unexpected use of anomaly detection is the use of image recognition to identify man-made items in natural environments to assist search and rescue missions. The camera of a helicopter or drone is linked to an algorithm that can process more data, faster, than a human observer could, and direct the rescue teams to the person in need. This example also shows that we are limited only by our own ingenuity when applying machine learning. There are many undiscovered areas that machine learning can improve; we only have to find out where to apply the computational power.
In the next part of this primer, we will explore which machine learning algorithms are used for anomaly detection, how to decide which algorithm fits which kind of problem, and what your data should look like to use that algorithm.