As Dataminr’s Chief Scientist and SVP of AI, I am excited to use this blog as a new forum to highlight some of Dataminr’s work in AI for Good. While engineers at many tech companies get to work on hard AI challenges, few tech companies put the results of this work to use in life-saving ways that are deployed at scale across the world. At Dataminr, my team and I get to work every day on accelerating emergency response — and this is one of the things that inspires us the most.
Last week, I was fortunate to have been invited to speak at a workshop at the University of Pennsylvania on Multi-Modal Understanding and Summarization of Critical Events for Emergency Response. The audience included a mix of academic and AI industry experts and leaders in several fields of research. I was asked to address specific technical AI topics, the most exciting new applications of AI at Dataminr, and generally talk about my opinions on trends and challenges in AI.
In this blog post, I’d like to cover some of the technical topics I discussed at this workshop and dive into an example of how Dataminr is approaching multi-modal understanding and summarization to create alerts on critical events for emergency response.
Across Dataminr’s private and public sector client base, detecting aircraft-related signals represents one of the most common use cases for Dataminr. The alerts Dataminr delivers to clients on incidents involving aircraft have proven critical for first response, ranging from clients relaying stories of rescuing pilots and passengers after a crash, to detecting in-air security incidents, to even learning of potential hijackings in motion.
I’d like to start with a real-life scenario to illustrate the technical challenges we’re tackling and how using new alternative public data sources, at scale, is helping us identify a broader set of critical incidents in real time for emergency response.
Imagine an evening flight losing several thousand feet of altitude in a relatively short period of time, deviating from its expected flight plan, in an area that is sparsely populated or not populated at all. After the loss of altitude, the plane stops emitting location signals.
This scenario suggests that the plane might have gone down. There might be survivors, and if so, getting help quickly can save lives. But how could this have been detected? Unlike in a populated region of the world, there are no eyewitnesses, no social media posts about a loud bang, and no images posted of smoke from a fire seen in the distance after the crash. How can AI tackle situations like these, which are all too common in crises, where little or no information comes from sparsely populated or unpopulated places?
As anyone who follows the AI industry knows, recent advances in AI have been fueled by the rise of Deep Learning, coupled with the convergence of more computing power and the availability of massive new public datasets.
At Dataminr, we have built an AI platform that is capable of processing billions of public data units every day to determine what’s happening in the world, and detect, in real time, emergency events that require a response from our clients in order to save lives. Dataminr’s platform has delivered alerts from more than 10,000 different public data sources, ranging from global and regional social media platforms, blogs and web forums, local media outlets, and many others.
Most recently, we’ve been integrating sensor data into our platform. That information is not published by a person (on social media, blogs, etc.). Instead, public sensor data streams come directly from machines such as an on-the-ground earthquake detector or from transponder signals streaming from moving objects such as cars, ships, or airplanes. This type of public data is considered to be part of the Internet of Things (IoT), which according to the National Science Foundation is on track to connect 50 billion “smart” things in 2020 and 1 trillion sensors soon after.
One of the strengths of the Dataminr platform is reflected in our ability to make the best use of the unique attributes of any given public dataset that we’ve integrated. As we’ve grown the sources, volumes, and data types we use, my team has discovered ways to use new datasets to detect events that were previously impossible to discover in real time, while also augmenting our broader detection capabilities to generate more valuable signals and better alerts for our clients.
For example, one of our newest data sources consists of airplane transponder signals, called ADS-B signals. These signals stream from the cockpits of planes as they fly, beaming down to air traffic control towers and ADS-B receivers along the route. At any given moment, there are an estimated 8,000 to 20,000 planes in the air across the globe, providing real-time time-series data for about 150,000 flights within a 24-hour period, at a rate of hundreds of data points every second. A January 2020 U.S. mandate requires that all aircraft operating in a controlled airspace adopt the ADS-B standard to broadcast their real-time position, identification, altitude, and vertical and horizontal speed. Today, the Dataminr platform consumes tens of millions of data points per day produced by aircraft and aircraft operators worldwide. These inputs include data from radar, ground and space-based ADS-B receiving stations (latitude and longitude, airplane identification, vertical and horizontal speed) and International Civil Aviation Organization (ICAO) flight plans containing origination, route, planned destination, etc.
Dataminr’s platform uses AI to detect anomalies in the time-series data, ranging from a plane that diverts from its scheduled route to a plane exhibiting sharp and unexpected shifts in altitude or course. The volumes of raw data, sparsity of data in some situations, and complexity pose fascinating problems for AI. To answer these challenges, our work in ADS-B signals includes AI methods that range from statistical event modeling, to Deep Learning and Machine Learning, to techniques for Anomaly Detection. In particular, we combine neural network-based methods (a recurrent neural network, specifically an LSTM sequence model, for early detection of incidents in flights) with trajectory-based techniques that use both supervised and unsupervised Machine Learning.
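To make the idea of a time-series anomaly concrete, here is a minimal sketch in Python. It is not Dataminr’s method: it swaps the LSTM and trajectory models for a plain sliding-window threshold rule, and the `AdsbPoint` fields, the function names, and the default thresholds are all assumptions chosen for illustration. It flags the two conditions from the scenario above: a large altitude loss in a short period, and a track that has gone silent.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AdsbPoint:
    """One hypothetical position report; field names are illustrative only."""
    t: float            # seconds since the start of the track
    altitude_ft: float  # reported altitude in feet


def detect_rapid_descent(points: List[AdsbPoint],
                         window_s: float = 60.0,
                         max_loss_ft: float = 3000.0) -> Optional[float]:
    """Return the time of the first report at which the aircraft has lost more
    than max_loss_ft relative to the highest altitude seen in the preceding
    window_s seconds, or None if no such report exists."""
    start = 0
    for end in range(len(points)):
        # Advance the left edge so the window spans at most window_s seconds.
        while points[end].t - points[start].t > window_s:
            start += 1
        highest = max(p.altitude_ft for p in points[start:end + 1])
        if highest - points[end].altitude_ft > max_loss_ft:
            return points[end].t
    return None


def signal_lost(points: List[AdsbPoint], now: float,
                silence_s: float = 120.0) -> bool:
    """True if the most recent report is older than silence_s seconds."""
    return bool(points) and now - points[-1].t > silence_s


# Level flight at 35,000 ft, then a descent of 1,000 ft every 10 seconds.
track = [AdsbPoint(t, 35000) for t in range(0, 101, 10)]
track += [AdsbPoint(100 + 10 * k, 35000 - 1000 * k) for k in range(1, 6)]

descent_time = detect_rapid_descent(track)
lost = signal_lost(track, now=400)
```

A fixed threshold like this is easy to reason about but misses subtle patterns and produces false positives on normal descents, which is one reason learned sequence models are attractive for this problem.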
However, detecting the signal is only part of creating a valuable alert for our customers. The next step in the process is to use Natural Language Generation techniques to generate a textual description that summarizes the event as an alert. This provides a customer with an easily readable and understandable summary of the anomaly. Dataminr’s AI platform writes this description automatically in a matter of milliseconds, taking the known features extracted from the signal and converting them into text phrases that are stitched together into a cohesive, human-readable sentence.
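The step of converting features into phrases and stitching them together can be illustrated with a simple template-based sketch. This is an assumption about the general shape of such a generator, not a description of Dataminr’s system; the feature keys (`flight_id`, `altitude_loss_ft`, `loss_window_s`, `off_route`, `signal_lost`, `region`) and the wording of each phrase are hypothetical.

```python
def join_phrases(parts):
    """Join phrases into a readable series: 'a', 'a and b', 'a, b, and c'."""
    if len(parts) == 1:
        return parts[0]
    if len(parts) == 2:
        return " and ".join(parts)
    return ", ".join(parts[:-1]) + ", and " + parts[-1]


def describe_anomaly(features: dict) -> str:
    """Turn a dict of extracted features (hypothetical keys) into one sentence."""
    events = []
    if "altitude_loss_ft" in features and "loss_window_s" in features:
        events.append(
            f"lost {features['altitude_loss_ft']:,} ft of altitude "
            f"in {features['loss_window_s']} seconds"
        )
    if features.get("off_route"):
        events.append("deviated from its filed flight plan")
    if features.get("signal_lost"):
        events.append("stopped transmitting its position")
    if not events:
        # Fallback phrase when no specific feature was extracted.
        events.append("showed anomalous behavior")

    sentence = f"Flight {features['flight_id']} {join_phrases(events)}"
    if features.get("region"):
        sentence += f" over {features['region']}"
    return sentence + "."


summary = describe_anomaly({
    "flight_id": "XY123",
    "altitude_loss_ft": 4000,
    "loss_window_s": 60,
    "off_route": True,
    "signal_lost": True,
    "region": "a remote area",
})
# summary == "Flight XY123 lost 4,000 ft of altitude in 60 seconds, deviated
# from its filed flight plan, and stopped transmitting its position over a
# remote area."
```

Templates keep generation fast and predictable, since each phrase appears only when its feature was actually extracted from the signal.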
After an event is identified in this manner and augmented with a text summary that makes it immediately actionable by the customer, an alert is automatically created. But which of Dataminr’s end users need to receive this alert? For those who do, at what criticality level should the alert be delivered? And through which workflow-integrated delivery channel should it be sent? This is the last challenge that AI helps us solve.
The alert needs to be sent to the specific set of Dataminr users who may be impacted or could be in a position to respond to the emergency, and it needs to be sent via a delivery method that can reach them. Dataminr’s real-time routing is based on a combination of a user’s individual previously configured scenarios and Dataminr’s proprietary dynamic routing algorithms. Our users are often in different places: some are at their desk at work, others at home, and others somewhere in between or even on a plane. Dataminr’s alert routing algorithms take into account dynamic factors like the GPS location of a Dataminr Mobile App user (if and when a Dataminr user opts in to share that data with us), and proprietary client metadata such as a customer’s physical assets, headquarters and offices, and traveling employees. These factors determine which user gets which alerts, what the criticality level of each alert is, and which delivery method is used, ranging from a mobile push alert, to an email alert, to an API feeding a proprietary internal interface.
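Dataminr’s routing algorithms are proprietary, so the following is only a rough sketch of how the factors above could combine, under stated assumptions. The `User` fields, the topic match standing in for configured scenarios, the `"high"` and `"standard"` criticality labels, and the distance thresholds are all hypothetical. Proximity is computed with the standard haversine great-circle formula.

```python
import math
from dataclasses import dataclass
from typing import List, Set, Tuple


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


@dataclass
class User:
    """Hypothetical user record; all fields are illustrative."""
    name: str
    topics: Set[str]                       # stands in for configured scenarios
    locations: List[Tuple[float, float]]   # assets, offices, opted-in device position
    channel: str = "email"                 # e.g. "push", "email", "api"


def route_alert(topic, lat, lon, users, radius_km=200.0):
    """Return (user name, criticality, channel) for each user who should be alerted."""
    deliveries = []
    for user in users:
        if topic not in user.topics or not user.locations:
            continue
        nearest = min(haversine_km(lat, lon, ulat, ulon)
                      for ulat, ulon in user.locations)
        if nearest <= radius_km / 4:
            level = "high"       # event is very close to something the user cares about
        elif nearest <= radius_km:
            level = "standard"
        else:
            continue             # too far away to be relevant
        deliveries.append((user.name, level, user.channel))
    return deliveries


users = [
    User("ops_desk", {"aviation"}, [(0.0, 0.1)], "push"),
    User("regional_office", {"aviation"}, [(0.0, 1.5)], "email"),
    User("far_away", {"aviation"}, [(0.0, 10.0)], "email"),
    User("other_topic", {"weather"}, [(0.0, 0.0)], "push"),
]
deliveries = route_alert("aviation", 0.0, 0.0, users)
# deliveries == [("ops_desk", "high", "push"), ("regional_office", "standard", "email")]
```

In this sketch the nearest relevant location drives both whether a user is alerted and how urgently, which mirrors the idea that the same event can matter a great deal to one client and not at all to another.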
Dataminr’s work on ADS-B signals is an amazing example of the power of AI to transform massive machine data streams into a very small set of highly valuable, customer-readable alerts and to deliver those alerts in a hyper-targeted manner, straight into the workflow of the Dataminr end users and clients who need them, all in less than five seconds from signal detection to client delivery.
Dataminr’s most recent AI research is taking our multi-modal approach one step further and synthesizing our work in anomaly detection on machine data streams with our long-standing expertise in Natural Language Processing (NLP) and our expanding work in Computer Vision. Detecting the event described at the start of the blog — a possible plane crash — can often be optimized by processing and cross-correlating different types of data: the ADS-B signals emanating from the plane combined with the text, images, and videos of eyewitnesses on social media, blogs, local sources, combined with streaming audio signals. We’ve been most recently working on multi-modal Deep Learning approaches to detect events from joint combinations of text, visual, sound and machine-generated information — synthesizing NLP, Computer Vision, audio processing and classification, and machine data stream anomaly detection into more multi-faceted multi-variable event detection. We will publish a blog post on that in the future.
Want to work on awesome AI projects like this that make a highly positive impact in the world? Our AI and Engineering team at Dataminr is rapidly expanding! Browse our job posts here to see if a role is right for you.