Facial Recognition As A Service 📹
A Twitter bot that answers your questions. Facebook removes sources of political misinformation from Israel.
⚙️ Current Project
This week I released @gpt2bot on Twitter. For the first time, you can interact with a state-of-the-art neural net that responds to your questions in plain English.
How it works:
You, a witty and curious Twitter user, compose a question and include @gpt2bot in your tweet
gpt2bot, powered by GPT-2 (a transformer-based language model created by OpenAI), receives your prompt and replies directly to your Tweet with a custom response, as sketched below
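For the curious, the mention-and-reply loop can be expressed in a few lines. This is a minimal sketch assuming the Tweepy library and placeholder credentials; the bot's actual implementation isn't published, so treat the details as illustrative.

```python
import time
import tweepy

# Hypothetical credentials from a Twitter developer account.
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
api = tweepy.API(auth)

def generate_reply(prompt):
    """Placeholder for the GPT-2 generation step (sketched further below)."""
    return "Interesting question!"

since_id = 1
while True:
    # Fetch any new tweets mentioning the bot since the last one handled.
    for mention in api.mentions_timeline(since_id=since_id):
        since_id = max(since_id, mention.id)
        api.update_status(
            status=f"@{mention.user.screen_name} {generate_reply(mention.text)}",
            in_reply_to_status_id=mention.id,
        )
    time.sleep(60)  # poll politely to stay inside Twitter's rate limits
```

A production bot would also need deduplication and error handling, but the core loop really is this small.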
Okay, but how does it actually work?
GPT-2 is a neural net trained on a large dataset of web pages scraped from links shared on Reddit. When you give it a prompt, it uses the context of the sentence and its knowledge of human conversation from the Internet to guess the next word in the sequence.
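To make "guess the next word" concrete, here is a minimal generation sketch. It assumes the Hugging Face transformers library and the small public GPT-2 checkpoint; the bot's actual serving code may differ.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load the publicly released GPT-2 weights and matching tokenizer.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "What is your favorite Spielberg movie?"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# Sample a continuation one token at a time; each step conditions on
# the prompt plus everything generated so far.
with torch.no_grad():
    output_ids = model.generate(
        input_ids,
        max_length=60,
        do_sample=True,                       # sample rather than pick greedily
        top_k=40,                             # restrict to the 40 likeliest tokens
        pad_token_id=tokenizer.eos_token_id,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Each run samples a different continuation, which is why two users asking the same question can get very different replies.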
For example, if you give it a question related to movies, you can expect to find quotes or opinions on Spielberg. If you give it the beginning of a recipe, you may end up with a new ingredient list for lasagna.
Here’s an example from a recent user:
[Embedded tweet: a user’s exchange with @gpt2bot]
Seems like good advice.
Give it a shot! Tweet @gpt2bot for your own AI-crafted conversation.
Does Your Face Belong to You?
Last week, the city of San Francisco voted to pass an ordinance banning city agencies from using facial recognition software. This ban, coming from the epicenter of technological adoption, signals the growing complexity of the public debate over the dual use of machine learning algorithms. Much has been said on the topic (this NYT piece covers the history in detail), but I want to touch on a few points:
How we think about “our” data and what is a reasonable expectation of privacy
The adoption of systems that automate actions formerly reserved for humans
Privacy has exploded as a hot-button topic in recent years with the advent of social media tracking and scandals at Facebook, Marriott, and Experian. When information we consider private is leaked, our outrage is directed at the custodian of that information, not the buyer. If I give you my name and email address, I expect you to use that information to provide me with a better service, not sell it for pennies to an advertising company. My name, Social Security number, and financial details are all things I expect to be kept private. The lines blur when considering information that doesn’t fit into our neat box of traditionally private data.
Facebook is a private company that operates a digital public square. If I post to my timeline that I am in the market for a new car and then see advertisements for Chevrolet of Santa Monica, that seems like a reasonable use of my digital footprint to serve me relevant information. However, if I confide through a private message that I am going through a health scare, and the next week I am targeted with pharmaceutical ads on a separate website, that is a breach of privacy. Whether the violation comes from the scanning of messages or from the distribution of that data outside the platform is open for debate, but it feels as though a line was crossed.
As machine learning starts to make huge areas of previously intractable data valuable, how do we rethink what is private?
Let’s use an example:
A man walks into a store and purchases a pack of cigarettes and a coffee on his way to work.
It seems reasonable to assume that the store tracks the number of customers who entered that day. It might also track what is purchased to help with inventory and future stocking decisions. This is information the store is justified in knowing, and we don’t feel that capturing it violates our privacy.
The store clerk, through dozens of similar interactions, might say hello to the man and comment if he deviates from his normal purchase.
“No cigarettes today?” prompts our clerk, who has a penchant for upselling.
“You know, I’ve been trying to quit,” replies our hypothetical, and now health-conscious, thought experiment.
“Have you considered Juul? These things are supposed to be great for kicking the habit.”
Our store clerk has now used observations about the man to recommend a purchase. Once again, information is being used, but now it is personalized. Since it came from a genuine human interaction, we probably wouldn’t consider it nefarious, although we might choose to shop at a different store with a less observant and pushy clerk.
If we walk into a similar store in a different geographic area and the clerk has our Cool Cucumber Juul pod ready for checkout, is that a violation, or merely impressive coordination?
Machine learning offers a way to scale these interactions independent of human involvement. The clerk’s suggestion for an e-cigarette is replaced by a targeted advertisement, but the outcome is the same. Our perception of the system is different because we can’t neatly trace how the information was captured.
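To see how little machinery that scaling requires in principle, here is a toy version of the clerk’s intuition as code. Everything here is invented for illustration; real ad-targeting systems are vastly more complex, but the structural point stands: once the observation becomes a function, it can run for millions of customers at once.

```python
from collections import Counter

def suggest(purchase_history):
    """A toy rule: recommend based on a customer's most frequent purchase."""
    favorite, count = Counter(purchase_history).most_common(1)[0]
    if favorite == "cigarettes" and count >= 5:
        return "e-cigarette starter kit"  # the clerk's pitch, automated
    return None

# One customer's recent purchases, as our hypothetical store might log them.
history = ["coffee", "cigarettes", "coffee", "cigarettes", "cigarettes",
           "cigarettes", "coffee", "cigarettes"]
print(suggest(history))  # -> e-cigarette starter kit
```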
The point is that an individual’s expectation of privacy is subjective. In the U.S., it is often based on the loose definition of “reasonable” outlined in the Fourth Amendment. As the pace of improvement in these human-like systems increases, our definition of reasonable will be stretched. Furthermore, these systems increasingly cross international boundaries, where no one particularly cares what the U.S. Constitution has to say about the matter.
As a closing thought on the issue, I want to highlight that all of the examples above are actions we can imagine happening today but that require an impractical amount of human coordination to accomplish. It’s convenient to think of today’s machine learning systems as an army of five-year-olds: limited capabilities enabled at previously impossible scale.
As these systems mature, we must consider the possibility of superhuman capabilities, scaled to global proportions.
📚 Reading
An overview of Feynman diagrams created by the illustrious bongo-playing Nobel Laureate. Link.
Facebook’s summary of a recent action removing coordinated political misinformation originating in Israel. The accounts had 2.8 million followers at the time of removal. Link.
Sony and Microsoft team up to take on the cloud gaming market. Other notable players include Google’s Stadia and an upstart YC entrant called Vectordash. Link.
The College Board (the company that administers the SAT) announced a new metric intended to capture “adversity.” File this one under “ideas that should not have left the conference room.” Link.
Learning that someone has a political affiliation that does not match your own causes you to trust them less. Link.
📺 Videos
A Bloomberg video about Facebook, released in 2013.
You will notice the same concerns about privacy that we are discussing today, overshadowed by a soaring stock price and ambitious vision.
Not a subscriber yet? Sign up here.