The world’s most popular mysteries, solved with the help of big data

This blog post is co-authored by Pranav Kumar, Balakarthiga, and Janani.

They say that ancient civilizations would have survived if only they had had big data and the analytics to use it. Luckily, our generation has found ways to harness the immense potential of big data, which can in fact provide answers to some of the world’s most popular mysteries.
Let us take a look at a few cases.

Who will die next in Game of Thrones?

Game of Thrones, with its fantasy theme and plot ambiguity, has won millions of fans worldwide who keep lamenting the dead and speculating about the next victim.

In 2016, researchers at the Technical University of Munich predicted that King Tommen Baratheon had a 97% chance of being killed, which came true in the Season 6 finale. They had extracted 30 different features (gender, title, age, house, whether mother/father/heir is alive, the number of dead characters related to a character, popularity score, etc.) for 2,000 characters from a wiki and compared them against whether each character was dead or alive.

The researchers used John Platt’s sequential minimal optimization algorithm in the WEKA environment to train a supervised learning model (a support vector machine) that analyzed these features to classify each character as dead or alive.

More recently, Milan Janosov from the Central European University published a new model to predict the death of characters in season 7. He created a network of the realm’s social system by studying the dialogues in the scenes of the show. In this network, each node represents a character, and the weight of the link between two nodes represents the strength of their social interaction. The algorithms then calculated the importance of each node based on the number of contacts a character has, the sum of link weights, how often pairs of the node’s contacts are in contact themselves, and the centrality of the node.
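
The node measures listed above can be sketched in a few lines of Python. This is only an illustration, not the model used in the research: the character labels and interaction weights are made-up placeholders, and the helper names are hypothetical. It computes the number of contacts, the sum of link weights, and how often pairs of a node's contacts are themselves in contact (the local clustering coefficient).

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical interaction weights between placeholder characters.
# These values are invented for illustration, not taken from the show.
interactions = [
    ("A", "B", 5),
    ("A", "C", 3),
    ("B", "C", 2),
    ("A", "D", 1),
]

def build_graph(edges):
    """Build an undirected weighted graph as a dict of dicts."""
    graph = defaultdict(dict)
    for u, v, weight in edges:
        graph[u][v] = weight
        graph[v][u] = weight
    return graph

def degree(graph, node):
    """Number of contacts a character has."""
    return len(graph[node])

def strength(graph, node):
    """Sum of the link weights attached to a character."""
    return sum(graph[node].values())

def clustering(graph, node):
    """Fraction of pairs of a character's contacts that are also linked."""
    neighbours = list(graph[node])
    k = len(neighbours)
    if k < 2:
        return 0.0
    linked = sum(1 for a, b in combinations(neighbours, 2) if b in graph[a])
    return linked / (k * (k - 1) / 2)

graph = build_graph(interactions)
for name in sorted(graph):
    print(name, degree(graph, name), strength(graph, name),
          round(clustering(graph, name), 2))
```

A real analysis would add a centrality measure on top of these and feed all of them into a classifier; the sketch stops at the raw node measures.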

Using a model similar to the one described above, the algorithms predicted that Daenerys Targaryen has an almost 95% chance of meeting her end this season. Will it come true? We must wait and watch.

How I learned to love AI

With advancements in big data analytics, AIs are now showing a promising ability to process and replicate human decision-making skills, thoughts, and emotions without human intervention. In fact, Google’s DeepMind is beating the world’s top Go players with a self-taught approach based on neural networks.

In 2016, the chatbot company Luka launched an app called ‘Replika’, which learns and grows through conversations with its users to create a digital persona of them. According to the creators, ‘The more you speak with Replika, the more it shares with you. It will tell you about your personality, will answer questions about you, and at some point, will be able to talk to your friends on your behalf’. Replika came into being when Eugenia Kuyda, one of its co-founders, used her chat history with a deceased friend to create an AI that could reply like him. The AI was quite a hit with family and friends, and Eugenia turned it into an app to benefit others. A chat with the AI can be found here.

Customizing products in an era of mass production

Netflix, the rock star of video streaming, bought the rights to stream ‘House of Cards’ for almost $100 million for the first two seasons. With a subscriber base of nearly 100 million, Netflix is a small data empire in itself. Its algorithms decode patterns in each viewer’s behavior – which genres she prefers, which she dislikes, which movies or series she watches to the end, where she rewinds, pauses, fast-forwards, etc.

With its analytics, Netflix stumbled upon a pattern – viewers who loved the 1990 BBC political drama ‘House of Cards’ also loved watching Kevin Spacey movies and movies directed by David Fincher. When Media Rights Capital bought the rights to remake the British drama with David Fincher as director, Netflix immediately saw the immense potential of the series and outbid other production houses to remake it with Kevin Spacey in the lead. Netflix added icing to the cake by coming up with ten different trailers for the series – one for each category of viewer behavior. The rest, as they say, is history. Netflix added more than 3 million subscribers on the back of this one show while retaining almost 85% of its old customers.

Needless to say, content is the ultimate king, but getting the analytics right can be the foundation on which organizations can build profitable and customized products.

Managing election campaigns

Gone are the days when political campaigns relied on sample data collected from door to door; the current scenario presents them with information about the entire population and not just samples.

‘Micro-targeting’ agencies use analytics on publicly available voter data (registration databases, public records, demographic statistics, etc.) to understand the psyche of the voters. The campaigns then design messages that appeal most to a targeted voter base – youngsters, women, middle-aged workers, students with loans, etc. – thus making them feel “connected and heard”. Another benefit of big data analytics is the lower cost of reaching every voter. For example, the 2012 ‘Obama for America’ campaign used TV viewership data available from firms such as Nielsen to reach targeted audiences at lower cost.
Back home in India, the Prime Minister launched the Narendra Modi app, which regularly asks users to rate government policies and even to put forward their suggestions. This data helps the government decide future policies and set the stage for the next elections.

Predicting earthquakes

Until recently, predicting earthquakes accurately and on time was considered impossible. Over the last few years, companies like Terra Seismic and GeoCosmo have been trying to solve this puzzle.

According to Friedemann Freund, NASA scientist and GeoCosmo chairman, rocks contain dormant electronic charge carriers that can be activated by tectonic movements. Once activated, these charge carriers, also known as “positive holes”, start flowing weeks before an earthquake and travel fast through many kilometers of rock.

On reaching the surface of the earth, positive hole currents produce a number of signals like ultra-low frequency electromagnetic waves, air ionization, total electron content anomalies, thermal infrared anomalies, ozone formation, carbon monoxide release, ground potential changes, and groundwater chemistry changes, that GeoCosmo uses to forecast earthquakes. GeoCosmo claims to have predicted 20 earthquakes in North and South America since 2012.

The Planarian regeneration mystery

Researchers have been struggling for more than a century to understand the regenerative capabilities of planarians (tiny flatworms that live in freshwater and marine environments). Cut a single one into hundreds of pieces and each piece will grow back into an individual planarian!

Now, with the help of big data analytics, researchers from Tufts University are a step closer to solving the mystery. They assembled a system with three parts: an algorithm to crunch through the big data generated by published results on planarian regeneration, an in silico simulator in which the patterning properties of any regulatory network (the genetic expression produced by RNA and protein interactions) could be tested, and a machine learning module. The system reverse-engineered the process and found two new proteins crucial to it. This is reportedly the first time an artificial intelligence has generated its own theory without any help from human beings, and it holds huge potential for organ regeneration in humans.

Predicting and fighting crimes

The speed with which the FBI nabbed the Boston Marathon bombers surprised many. It crowd-sourced photos and videos from people who were near the area and used analytics to comb through them. Once the suspects were identified from a store surveillance video, cross-referencing the footage with facial recognition software helped trace their movements before and after the bombings. They were eventually caught within a week.

Big data analytics is not only helping fight crimes; it is also helping prevent them. The Los Angeles Police Department is using 13 million crime records spanning 80 years to decode patterns of crime. The algorithm it uses is the same as the one used to predict aftershocks following earthquakes. The analysis found that crimes follow a pattern similar to aftershocks – once a major crime is committed in an area, crime rates are highly likely to surge in nearby areas. Once crime hotspots were identified, police officers were assigned areas of about 500 by 500 feet to patrol and look for precursors to criminal activity. The algorithm proved quite successful at decoding crime patterns and reportedly resulted in a 33% reduction in burglaries, a 21% reduction in violent crimes and a 12% reduction in property crime.
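
The aftershock idea can be illustrated with a small "self-exciting" rate function: every past event temporarily raises the expected rate of new events nearby, and the boost fades with time and distance. The sketch below is an assumption-laden illustration in Python, not the algorithm the police actually run. The function name, the exponential and Gaussian decay shapes, and all parameter values (`mu`, `alpha`, `beta`, `sigma`) are hypothetical.

```python
import math

def intensity(t, x, y, events, mu=0.1, alpha=0.5, beta=1.0, sigma=1.0):
    """Expected event rate at time t and location (x, y).

    events is a list of (time, x, y) tuples for past incidents.
    mu    : constant background rate (assumed value)
    alpha : size of the boost each past event contributes (assumed value)
    beta  : how quickly the boost fades with time (assumed value)
    sigma : how quickly the boost fades with distance (assumed value)
    """
    rate = mu
    for event_time, event_x, event_y in events:
        if event_time >= t:
            continue  # only events strictly in the past contribute
        elapsed = t - event_time
        dist_sq = (x - event_x) ** 2 + (y - event_y) ** 2
        rate += (alpha * math.exp(-beta * elapsed)
                 * math.exp(-dist_sq / (2 * sigma ** 2)))
    return rate

# One past incident at the origin at time 0 (made-up data).
past = [(0.0, 0.0, 0.0)]
print(intensity(1.0, 0.0, 0.0, past))  # shortly after, same spot
print(intensity(1.0, 3.0, 0.0, past))  # same time, farther away
print(intensity(5.0, 0.0, 0.0, past))  # same spot, much later
```

Ranking map cells by this rate gives a list of candidate hotspots; in practice the parameters would have to be fitted to historical records rather than picked by hand.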

The perfect swing

Golf is quite different from other sports – a referee might not always be on the field, it can be played alone or as part of a team, and every golf course is different (the equivalent in the cricketing world would be a pitch of varying length, which is unimaginable), among other such nuances. It is thus very difficult to define the perfect swing.

However, big data analytics is changing that. A company called GolfTec analysed almost 90 million golf swings from 13,000 players across 48 different body motions and has come up with a set of body positions that correlate most highly with skill levels ranging from a professional golfer to a 30 handicap (a 5 handicap is better than a 10 handicap, who in turn is better than a 25 handicap, and so on). By analyzing a golfer’s body position, the algorithm can suggest techniques to achieve better swings. More details with videos can be found here.

Similarly, the Arccos Caddie app, built by Arccos Golf in partnership with Microsoft, is helping golfers make better decisions. It combines weather data such as forecasted wind speed, precipitation and temperature with patterns decoded from millions of shots to suggest the best strategies to the golfer, reducing the scope for human error.

Project Artemis

Premature babies are at high risk of contracting infections since their immune systems are not fully developed. Doctors are often baffled by the lack of symptoms in such infants until it is too late. Almost 25% of premature babies develop infections and 10-20% of these cases lead to death.

A life-altering insight in 2009 by Carolyn McGregor, the research chair in health informatics at the University of Ontario Institute of Technology, gave birth to Project Artemis, which changed the way healthcare uses big data to prevent premature deaths. Since premature babies are kept in hospital care for weeks to facilitate normal growth, they generate a huge amount of data. Project Artemis is an advanced stream-processing application that collects this data in real time from monitoring devices to analyze heart rate, respiration, blood oxygen levels, blood pressure, etc., and detects patterns indicating when a baby might be contracting an infection. It was found that when infections start developing in premature babies, the heart rate becomes unusually steady. When this finding is coupled with a few other indicators, Project Artemis gives doctors up to 24 hours of lead time to work on the infection.
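
One way to read the heart-rate finding above is as a drop in beat-to-beat variability. The Python sketch below assumes that interpretation and flags moments in a stream of readings where the recent variability falls below a threshold. It is only an illustration of stream-style pattern detection, not Project Artemis's actual analytics: the function name, window size, threshold and sample readings are all hypothetical and have no clinical meaning.

```python
from collections import deque
from statistics import pstdev

def low_variability_alerts(heart_rates, window=5, min_std=1.0):
    """Return the indexes at which the last `window` readings vary
    by less than `min_std` beats per minute (standard deviation).

    window and min_std are illustrative assumptions, not clinical values.
    """
    recent = deque(maxlen=window)  # keeps only the latest readings
    alerts = []
    for index, bpm in enumerate(heart_rates):
        recent.append(bpm)
        if len(recent) == window and pstdev(recent) < min_std:
            alerts.append(index)
    return alerts

# Made-up readings: variable at first, then suspiciously flat.
readings = [150, 142, 155, 147, 158, 150, 150, 150, 150, 150]
print(low_variability_alerts(readings))
```

Because the function only keeps the latest window of readings, the same logic can run continuously on live monitor data instead of a stored list.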

Fate of life on Earth

For the first time, a team of researchers from Microsoft and the United Nations has built a model, called the Madingley Model, that can simulate all the ecosystems of Earth. The model has helped scientists zero in on five key biological processes, backed by mathematical proofs, that are necessary in all ecosystems to sustain life – eating, metabolism, reproduction, dispersal, and death. It also gave some interesting insights – marine ecosystems showed increasing biomass for all three heterotroph types (carnivores, omnivores, and herbivores) but highly variable biomass of autotrophs (like algae, phytoplankton, etc.), while in the most productive terrestrial ecosystems, carnivores typically had higher biomass densities than omnivores but were almost completely absent in low-productivity desert ecosystems.

The model can also predict what Earth will look like if certain ecosystems are disturbed (by man-made or natural causes), if certain species are absent or certain extinct species are present, whether we can substitute one ecosystem for another, the reasons ecosystems collapse, etc.

We hope you found these interesting and informative. If you know of similar mysteries that were solved with the help of big data, let us know in the comments below!