The emergence of Big Data has meant that everything we do online leaves digital traces. “Big Data” is fairly new. It’s huge and it’s scary – very scary. This revolutionary approach to data-driven communications is said to have played an integral part in the Brexit “Leave” campaign and in U.S. President Donald Trump’s extraordinary win.
What is “Big Data”?
Big data is a term for data sets that are so large or complex that traditional data processing applications are inadequate to deal with them.
The challenges of holding such large sets of data include
Analysis of data sets can help scientists, business executives, medical practitioners, advertising and governmental agencies to
find new correlations between variables,
spot business trends,
research new markets or
The term “Big Data” often refers to the use of predictive user behaviour analytics, or other advanced data analytics methods that extract value from data.
Every purchase with a bank card, every Google search you make, every move you take with a mobile phone in your pocket, every “Like” on Facebook gets stored.
Especially, every “Like.”
For a while, it was unclear what any of this data would be good for, other than showing targeted advertising to social network users and website visitors. Nor was it clear whether “Big Data” would turn out to be a blessing or a curse to humanity.
Blessing or Curse?
Since November 2016, data analysts have known the answer.
It all began in 2014 at the Psychometrics Centre, located at the University of Cambridge.
Psychometrics is a scientific attempt to quantify human personality. In the 1980s, two groups of psychologists were able to demonstrate that the character profile of a person can be measured and expressed in five dimensions, the Big Five:
Openness – How open are you to new experiences?
Conscientiousness – How much of a perfectionist are you?
Extroversion – How sociable are you?
Agreeableness – How considerate and cooperative are you? and
Neuroticism – How sensitive or vulnerable are you?
This so-called OCEAN Method became the standard approach. Using these five dimensions, it is possible to determine fairly precisely what kind of person you are dealing with. You can infer their needs and fears, as well as predict how they are likely to behave.
For a long time, however, the problem was data collection, because producing such a character profile meant asking subjects to fill out a complicated survey with quite personal questions.
Then came the World Wide Web, and Facebook.
And along came Michal Kosinski.
Facebook and MyPersonality
In 2008, Kosinski was chosen to do doctoral work at the Psychometrics Centre, one of the oldest institutions of its kind worldwide. There, he met fellow student David Stillwell, and the pair started to work on a little-known Facebook application.
With the MyPersonality app, a user could fill out psychometric questionnaires and receive a rating, or a “Personality Profile”. The test was designed to provide scores for the Big Five indicators of the OCEAN Method.
As part of the study, the users also allowed their Facebook Likes to be analysed.
Instead of the couple of dozen college friends that Kosinski had expected to take part in the experiment, thousands, then millions, of people began baring their souls. Very soon, the two doctoral students had access to the largest set of psychological data ever produced at the time.
The analysis revealed which Likes equated with higher levels of certain personality traits.
The software was then able to predict their personality accurately.
Better than their work colleagues. Better than their friends.
In fact… even better than their own family!
Private traits and attributes are predictable from digital records of human behaviour.
Kosinski’s team would compare the quiz results to all sorts of other online data about their test subjects – what they liked, what they shared, or what they posted on Facebook. They looked at their gender, age, and location.
The researchers began establishing correlations, and noticed that extraordinarily reliable deductions could be made about a person by scrutinising their online behaviour.
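To make the idea of “establishing correlations” concrete, here is a minimal sketch in Python. The data is invented, the function names `pearson` and `like_trait_correlations` are hypothetical, and this is not a reconstruction of the researchers’ actual model. It simply computes, for each page, how strongly having Liked it goes together with a questionnaire score.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal
    return cov / sqrt(var_x * var_y)

def like_trait_correlations(likes_by_user, trait_scores):
    """Map each page to the correlation between Liking it and the trait score."""
    users = sorted(trait_scores)
    pages = sorted({page for likes in likes_by_user.values() for page in likes})
    scores = [trait_scores[user] for user in users]
    return {
        page: pearson(
            [1 if page in likes_by_user[user] else 0 for user in users],
            scores,
        )
        for page in pages
    }

# Invented toy data: which pages each user Liked, and a made-up
# extraversion score from a questionnaire (scale of 1 to 5).
likes_by_user = {
    "ann": {"Skydiving", "Salsa"},
    "bob": {"Chess"},
    "cat": {"Skydiving", "Chess"},
    "dan": {"Salsa", "Skydiving"},
}
extraversion = {"ann": 4.5, "bob": 1.5, "cat": 3.0, "dan": 4.0}

correlations = like_trait_correlations(likes_by_user, extraversion)
```

In this toy data set, the “Skydiving” page correlates positively with the extraversion score and “Chess” negatively. With four invented users that means nothing, but across millions of real profiles such patterns become statistically meaningful.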
Kosinski and his team continued to refine their models. In 2012, they demonstrated that from a mere 68 Facebook Likes on average, a lot about a Facebook user can be reliably predicted:
- skin colour (95% accuracy),
- sexual orientation (88% accuracy),
- Democrat or Republican voter (85%).
However, there was much more. A user’s intelligence, religious affiliation, and alcohol, cigarette, and drug use could all be calculated. Even whether or not their parents were divorced could be teased out of the data.
Sexual orientation, ethnicity, religious and political views, personality traits, intelligence, happiness, use of addictive substances, parental separation, age, and gender can all be predicted from your Facebook Likes.
The strength of Kosinski’s model depended on how well it could predict a test subject’s answers. Kosinski kept working at it.
Pretty soon, the personality model could appraise a person’s character better than one of his or her co-workers, with only ten Likes as input. With 70 Likes, Kosinski’s model could “know” a subject better than a friend. With 150 Likes, it could guess someone’s personality better than their parents. With 300 Likes, Kosinski’s model could predict a subject’s answers better than their own partner.
Our smartphones are like psychological questionnaires that we are constantly updating, whether consciously or unconsciously.
With even more Likes, the result would exceed what an individual thinks they know about themselves!
The day he published these findings, Kosinski received two phone calls: one was a threat to sue and the other was a job offer. Both came from Facebook.
Weeks later, Facebook Likes became private by default. Previously, anyone on the Internet could see your “Likes”.
However, this was no obstacle to data mining. And while Michal Kosinski and his research team always asked Facebook users for consent to analyse their private data, many online apps and quizzes request access to this sensitive information as a precondition for taking personality tests.
Now, Kosinski’s team could also ascribe Big Five values based on how many profile pictures, or how many contacts, a user has on Facebook – both clear indicators of extraversion. Even when we are not online, the motion sensors on our devices reveal how quickly we move and how far we travel, data that correlates with emotional instability.
And it also worked in reverse. Not only could psychological profiles be created from your data, but your data could also be used to search for specific profiles.
Michal Kosinski had created a kind of people search engine. What he now wanted to do was to share his findings…
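The “reverse” direction can be pictured as a simple filter over stored profiles. The following Python sketch is purely illustrative: the `Profile` class, the `search_profiles` function, and the scores (on an assumed 0-to-1 scale) are all hypothetical and say nothing about how any real system was built.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Profile:
    """Hypothetical Big Five profile; each score is assumed to lie in [0, 1]."""
    user_id: str
    openness: float
    conscientiousness: float
    extraversion: float
    agreeableness: float
    neuroticism: float

def search_profiles(profiles, **minimums):
    """Return the profiles whose named traits all meet the given minimum scores."""
    return [
        profile
        for profile in profiles
        if all(getattr(profile, trait) >= floor for trait, floor in minimums.items())
    ]

# Invented example profiles
profiles = [
    Profile("u1", 0.8, 0.4, 0.2, 0.6, 0.9),
    Profile("u2", 0.3, 0.7, 0.9, 0.5, 0.2),
    Profile("u3", 0.6, 0.5, 0.3, 0.4, 0.8),
]

# "Search" for everyone scoring high on neuroticism
high_neuroticism = [p.user_id for p in search_profiles(profiles, neuroticism=0.7)]
```

Here the query selects users `u1` and `u3`. Swapping the keyword arguments (for example `extraversion=0.8, conscientiousness=0.6`) searches for a different personality type.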
The Internet heralded the beginning of a new era. A gift from Heaven to an entire generation, it has the wonderful ability to transcend the limitations of our Physical World.
Data can be copied. So why shouldn’t everyone benefit from it?
But what would happen if someone used Kosinski’s search engine to manipulate people?
Kosinski warned that his approach could pose a threat to an individual’s freedom, well-being, or even life. Yet no one seemed to grasp what he really meant.
Around 2014, Kosinski was approached by a member of the Psychology Department at Cambridge, who wanted access to the MyPersonality database on behalf of a company, although he was not able to reveal for what purpose. Kosinski and his team considered the offer, but he hesitated.
A threat to individual freedom.
Eventually, the academic, Aleksandr Kogan, revealed the name of his client: Strategic Communication Laboratories (SCL), a leading private British provider of communication research and analysis, also known as Cambridge Analytica in the United States. When Kosinski googled the company, he found that it was involved in the study of mass behaviour and how to change it, and that it described itself as the “premier election management agency”.
The company specialises in marketing based on psychological modelling. Its core focus: influencing elections.
Although it was unclear who exactly owned SCL, some of its offshoots had been involved in elections from Ukraine to Nigeria, helped the Nepalese monarch in a defence project, or developed methods to influence Eastern European and Afghan citizens for NATO.
Kosinski was troubled. What were these people planning to do?
According to a report in The Guardian, it emerged that SCL had learned about Kosinski’s method from Kogan. His company had reproduced the Facebook “Likes”-based personality measurement tool to sell it to this election-influencing firm.
In November 2015, the “Leave EU” Brexit campaign announced that it had commissioned a Big Data company to support its online campaign. Cambridge Analytica’s core strength was an innovative microtargeting technique: political marketing that measures people’s personality from their digital footprints, based on the OCEAN model.
Kosinski was horrified. His methodology was being used on a grand scale for political purposes.
Initially, the digital side of Donald Trump’s presidential campaign had consisted of more or less one person, a marketing entrepreneur who created a rudimentary website for Trump for $1,500. The 70-year-old president is not digitally savvy, although he does have a smartphone and tweets incessantly. Hillary Clinton’s campaign, on the other hand, relied heavily on social media and cutting-edge Big Data analysts.
The same company was behind both Trump’s online ad campaigns and mid-2016’s other shocker, the Brexit “Leave” campaign: Cambridge Analytica, with its CEO Alexander Nix.
Then, in June 2016, Trump’s campaign team announced that they had hired Cambridge Analytica, and with it the power of Big Data and psychographics.
Until now, election campaigns had been organised based on demographic concepts. But the idea that all women should receive the same message because of their gender, or that all African-Americans should receive the same message because of their race, is outdated. While political campaigners had so far relied on demographics, Cambridge Analytica was using psychometrics and the Big Five “OCEAN” model to predict the personality of every single adult in the U.S.
The way Cambridge Analytica is able to do that involves purchasing data from a range of different sources, such as:
what magazines you read,
what churches you attend…
In the United States, almost all personal data is up for sale. (Whereas European privacy laws require a person to “opt in” to a release of data, those in the U.S. permit data to be released unless a user “opts out”.)
Cambridge Analytica aggregated this data with the electoral rolls of the Republican party and online data, and calculated a Big Five personality profile. Digital footprints became real people with fears, needs, interests, residential addresses, and phone numbers. The company also used surveys on social media, and Facebook data.
The company did exactly what Kosinski had warned against. It managed to profile the personality of 220 million people – every single adult in the U.S.A.
Psychographically categorised voters can then be addressed differently.
A Different Message for Every Voter.
Suddenly, Trump’s much criticised fickleness, his striking inconsistencies and resulting array of contradictory messages, turned out to be his greatest asset.
Donald Trump’s presidential campaign team tested 175,000 different targeted ad variations for his arguments. Every message he put out was data-driven to target the recipients in the optimal psychological way. Down to the smallest of groups. Even down to individuals.
One of the goals was to keep potential Clinton voters away from the ballot box, to “suppress” their votes. With Facebook, this was achieved by targeting users with specific profiles with specially-tailored news-feed-style ads.
The days of traditional blanket advertising were over. Trump’s digital troops used less mainstream television, and more advertising on social media and digital TV. The embedded Cambridge Analytica team received $100,000 from Trump in July, $250,000 in August, and $5 million in September 2016. Overall, the company earned a total of over $15 million.
From July 2016, Trump’s canvassers were provided with a computer and smartphone app with which they were able to correlate the political views with the personality types of the inhabitants of any American household. Trump’s people only rang the doorbells of houses that the app had rated as being receptive to his messages. They came prepared with guidelines for conversations tailored to the personality type of each resident. They fed their targets’ reactions into the app, and new data then flowed back to the dashboards of the Trump campaign.
Although the Democrats did similar things, there is no evidence that they relied on psychometric profiling. Cambridge Analytica divided the U.S. population into 32 personality types, and was able to focus on just 17 states. For instance, a preference for cars manufactured in the States was a strong indication of a potential Trump voter. Such findings showed Trump which messages worked best and where. And the decision to focus on the states of Michigan and Wisconsin in the final weeks of his campaign was made entirely on the basis of advanced data analysis.
Exactly to what extent psychometric methods influenced the outcome of the election is impossible to say. However, the surprising rise of Ted Cruz in the primaries, the increased number of voters in rural areas, and the decline in the number of African-American early votes provide some clues.
Trump’s unexpected success may be explained as much by the effectiveness of his personality-targeted advertising as by his greater investment in digital, rather than mainstream TV, campaigning. Facebook also proved to be the ultimate weapon and the best election campaign tool. In fact, it will remain a historical irony that Trump, who often grumbled about scientific research, used a scientific approach in his campaign.
For the sake of a handful of cleverly analysed data items, the World has been turned upside down. The United Kingdom IS leaving the European Union. Donald Trump IS the new leader of the Free World.
Meanwhile, Kosinski has been conducting a series of tests, the results of which will soon be published.
The new study shows the effectiveness of personality targeting by demonstrating that marketers can attract up to 63% more clicks and up to 1,400 more conversions in real-life advertising campaigns on Facebook when matching products and marketing messages to consumers’ personality characteristics. The alarming results further demonstrate that large numbers of consumers can be accurately targeted based on a single Facebook page!
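To put a figure like “63% more clicks” in perspective, the short Python sketch below works through the arithmetic. The click counts are invented for illustration and do not come from the study; `relative_uplift` is a hypothetical helper name.

```python
def relative_uplift(baseline, treated):
    """Relative change of treated over baseline, as a fraction (0.63 means 63% more)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (treated - baseline) / baseline

# Invented numbers: clicks on a generic ad versus a personality-matched ad
baseline_clicks = 1000
matched_clicks = 1630

uplift = relative_uplift(baseline_clicks, matched_clicks)
print(f"{uplift:.0%} more clicks")
```

In other words, an ad that would normally draw 1,000 clicks would draw 1,630 at the upper end of the reported range.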
Many people are guilty of oversharing on Facebook. Even the most reserved users may be giving away far more information about their personality than they realise.
By “mining” Likes on social networks, researchers developed software that can predict how open, conscientious, outgoing and neurotic an individual user is. And, in the majority of cases, these predictions were more accurate than those made by close acquaintances and family members.
What do you think? Does your phone know you better than you know yourself? And should we all be a lot more careful about what we readily disclose about ourselves?
More and more, we use virtual assistants to facilitate our interactions with our electronic devices. We control our TV sets with our voices, we entrust Cortana and Siri with our Internet searches. We invite new devices into our homes to listen and record every one of our requests.
And you voluntarily add more to the database every time you complete one of those Facebook personality quizzes that you do just for fun.
In an increasingly dystopian World, should we not be more aware of what information we give away daily?
The thing is… Big Data is watching YOU.