Disclaimer: the examples in this post are for illustrative purposes only and are not commentary on any specific content policy at any specific company. All views expressed here are my own and do not reflect those of my employer.
Why is there any spam on social media? No one aside from the spammers themselves enjoys clickbait scams or phishing attempts. We have decades of training data to feed machine learning classifiers. So why does spam on every major tech platform feel inevitable? After all these years, why do bot farms still exist?
The answer, in short, is that it is really hard to fight spam at scale, and exponentially harder to do so without harming genuine users and advertisers. In this post, we’ll use precision and recall as a framework for understanding the spam problem. We’ll see that eradicating 100% of spam is impractical, and that there is some “equilibrium” spam prevalence based on finance, regulations, and user sentiment.
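Before we dive in, here is a minimal sketch of what precision and recall mean in this setting. The class names and counts are made up purely for illustration; they are not real platform numbers.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    true_positives: int   # spam we correctly removed
    false_positives: int  # genuine content we wrongly removed
    false_negatives: int  # spam we missed


def precision(c: ConfusionCounts) -> float:
    """Of everything we removed, what fraction was actually spam?"""
    return c.true_positives / (c.true_positives + c.false_positives)


def recall(c: ConfusionCounts) -> float:
    """Of all the spam on the platform, what fraction did we catch?"""
    return c.true_positives / (c.true_positives + c.false_negatives)


# Hypothetical classifier: removes 900 of 1,000 spam videos,
# but also takes down 50 legitimate ones in the process.
counts = ConfusionCounts(true_positives=900, false_positives=50, false_negatives=100)
print(f"precision = {precision(counts):.2f}")  # 0.95: most removals were correct
print(f"recall    = {recall(counts):.2f}")     # 0.90: but 10% of spam slipped through
```

The tension the rest of this post explores is that pushing recall toward 100% (catching every last spam video) tends to drag precision down, which means more genuine users and advertisers getting caught in the net.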
Imagine we’re launching a competitor to TikTok and Instagram. (Forget that they have 1.1 billion and 2 billion monthly active users, respectively; we’re feeling ambitious!) Our key differentiator in this crowded market is a guarantee that users will see only the highest-quality videos: absolutely no “get rich quick” schemes, blatant reposts of existing content, URLs that infect your computer with malware, etc.
Attempt 1: Human Review
To achieve this quality guarantee, we’ve hired a staggering 1,000 reviewers to audit every upload before it’s allowed on the platform. Some things just need a human touch, we argue: video spam is too complex and context-dependent to leave to automated logic. A video that urges users to click on a URL could be a malicious phishing attempt or a benign fundraiser for Alzheimer’s research, for example; the stakes are too high to…