The power of machine learning – Lessons learned from the recent Facebook and Cambridge Analytica scandal
By Saunthra Thambyrajah
It’s beginning to look like Donald Trump’s electoral campaign was won through the use of machine learning. It’s a great case study of how powerful this technology is when it has a large sample size to play with.
Unless you have been living under a rock, you must have heard of the Facebook scandal involving Cambridge Analytica (CA). At the latest count, CA allegedly profiled over 87 million Facebook users, which allowed it to influence how people voted through the use of ‘strategic communications’.
CA hired a university professor to collect Facebook data. The professor developed a personality survey app to score each person on the ‘Big Five’ personality traits. Users of the app, unwittingly or otherwise, also agreed to share their Facebook page likes when they took the survey. Using machine learning and the results of the survey, CA was allegedly able to analyse those likes and correlate certain personality types with liking a certain combination of pages.
The ethically dodgy bit is where a flaw in Facebook allowed the professor to access the list of liked pages of all the friends of the survey takers. These friends didn’t have to take the survey, and they did not know they were being profiled. All this data then made its way into the hands of CA.
A machine learning algorithm was then used to predict what personality traits these friends might have based on their likes.
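To make the idea concrete, here is a deliberately tiny sketch of how likes could be turned into a personality estimate. It is not CA’s actual method, which has not been published in this article; the page names, the trait scores and the simple averaging model are all made up for illustration. The survey takers supply both likes and a known score, and the friend supplies likes only.

```python
from collections import defaultdict

# Toy, made-up data: survey takers have page likes AND a known trait score (0 to 1).
survey_takers = [
    {"likes": {"hiking", "festivals"}, "extraversion": 0.9},
    {"likes": {"festivals", "karaoke"}, "extraversion": 0.8},
    {"likes": {"chess", "libraries"}, "extraversion": 0.2},
    {"likes": {"libraries", "hiking"}, "extraversion": 0.4},
]


def fit_page_scores(people, trait):
    """For each page, average the trait score of the survey takers who liked it."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for person in people:
        for page in person["likes"]:
            totals[page] += person[trait]
            counts[page] += 1
    return {page: totals[page] / counts[page] for page in totals}


def predict_trait(likes, page_scores, default=0.5):
    """Estimate a trait from likes alone; pages never seen in the survey are ignored."""
    known = [page_scores[page] for page in likes if page in page_scores]
    return sum(known) / len(known) if known else default


page_scores = fit_page_scores(survey_takers, "extraversion")

# A hypothetical friend who never took the survey: only their likes are known.
friend_likes = {"festivals", "karaoke", "gardening"}
friend_estimate = predict_trait(friend_likes, page_scores)
print(f"Estimated extraversion: {friend_estimate:.3f}")
```

A real system would use far more pages and a proper statistical model, but the shape is the same: learn from the people who answered the survey, then apply what was learned to people who never did.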
And as a result CA was able to profile over 87 million Facebook users to deliver targeted messages about electoral candidates in the elections.
Imagine being targeted by messages that were especially crafted to resonate with your personality type. Manipulating people is bad and it becomes especially heinous when it’s done without their knowledge.
Add to this mix allegations that CA also pushed the “Defeat Crooked Hillary” campaign to hundreds of social media users and the implications are frightening.
As a purely theoretical exercise, let’s crunch the numbers to show you how insidiously clever this whole scheme was.
Obviously, 87 million Facebook users didn’t download the app. For the sake of argument, let’s say that the average person has 300 Facebook friends. I know this is very conservative for some of you, but it is an average number. If you divide 87 million by 300, only about 290 000 people had to take the survey for CA to profile over 87 million people.
This is very close to the 270 000 that the CA whistleblower, Chris Wylie, claimed downloaded the app.
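For anyone who wants to check the sums, the back-of-the-envelope calculation can be written out as below. The figure of 300 friends is the assumption made in this article, not a measured value, and the sketch ignores overlapping friend lists, so treat the result as a rough estimate only.

```python
profiled_users = 87_000_000      # users allegedly profiled
avg_friends = 300                # assumed average number of Facebook friends
reported_downloads = 270_000     # figure attributed to Chris Wylie

# How many survey takers would be needed if each one exposed 300 friends?
takers_needed = profiled_users / avg_friends

# Working backwards: what average friend count does the reported figure imply?
implied_friends = profiled_users / reported_downloads

print(f"Survey takers needed: {takers_needed:,.0f}")
print(f"Friends per taker implied by 270 000 downloads: {implied_friends:.0f}")
```

Run the other way round, the reported download figure implies an average of a little over 320 friends per survey taker, which is in the same ballpark as the assumed 300.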
Facebook has also released the graph above showing the top 10 countries affected by the breach. A whopping 81 percent or over 70.6 million users were from the US.
I’m personally astounded by this figure as it seems statistically skewed towards users in the United States. However, I know that it is possible to use Facebook ads to target users who reside in the US. And it stands to reason that most of the friends of the people in this group would also be from the US. More technology use at play here.
70.6 million is a huge sample size, especially if you take into account that 139 million people voted in the 2016 US presidential elections. (Source: McDonald, Michael P. 2016. “2016 November General Election Turnout Rates.” United States Elections Project. 8 April 2018.)
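To put those two numbers side by side, the short calculation below compares the affected US users with the 2016 turnout. It uses only the figures quoted in this article, and it is a rough comparison: it assumes nothing about how many of the affected users were eligible voters or actually voted.

```python
us_users_affected = 70_600_000    # affected US users, per Facebook's figures
total_users_affected = 87_000_000
voters_2016 = 139_000_000         # 2016 turnout, per the United States Elections Project

us_share = us_users_affected / total_users_affected
ratio_to_turnout = us_users_affected / voters_2016

print(f"US share of affected users: {us_share:.0%}")
print(f"Affected US users relative to 2016 turnout: {ratio_to_turnout:.0%}")
```

In other words, the number of affected US users is roughly half the number of people who cast a vote.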
Technology is a powerful tool. It can be used for good. In this case, the way it was used leaves a lot to be desired.
Note: The Wikipedia description for Cambridge Analytica says: “Cambridge Analytica is a British political consulting firm which combines data mining, data brokerage, and data analysis with strategic communication for the electoral process. It was started in 2013 as an offshoot of the SCL Group. The company is partly owned by the family of Robert Mercer, an American hedge-fund manager who supports many politically conservative causes. The firm maintains offices in London, New York City, and Washington, DC.”