Content moderation system

The Case:

As recent studies suggest:

"Over 2.5 quintillion bytes of data are created every single day, and it’s only going to grow from there. By 2020, it’s estimated that 1.7MB of data will be created every second for every person on earth."

Most of this data is user-generated content such as comments, posts, and photo uploads. Companies that allow user-generated content face a tough balancing act every day: keeping offensive content off their platforms while preserving their users' freedom of speech.

The Challenge:

Moderating content is one of the toughest challenges that tech companies face. Simply blocking “offensive” words is often not enough since users can be very creative and come up with new ways to spell words or use emojis and pictures instead.


Furthermore, blocking single words out of context can annoy users and can be viewed as a violation of their right to free speech and expression.
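
As a rough illustration of both problems, here is a minimal sketch of a naive word filter. The blocklist, the placeholder term, and the function name are all hypothetical and are not taken from any real system.

```python
import re

# Hypothetical blocklist with a single mild placeholder term.
BLOCKLIST = {"idiot"}


def blocklist_flags(message: str) -> bool:
    """Return True if any whole word in the message is on the blocklist."""
    tokens = re.findall(r"[a-z]+", message.lower())
    return any(token in BLOCKLIST for token in tokens)


examples = [
    "You are an idiot",                        # plain spelling
    "You are an id1ot",                        # digit swapped in for a letter
    "You are an i d i o t",                    # letters spaced out
    'She asked what the word "idiot" means',   # neutral mention of the word
]

for text in examples:
    print(f"{blocklist_flags(text)!s:5}  {text}")
```

Only the plain spelling and the neutral mention are flagged. The two obfuscated insults pass through, while a harmless question about the word is blocked.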


Finally, language itself is constantly evolving. Words and phrases can change meaning from year to year, so any filtering system needs to be adaptive.


Usually, this system consists of human moderators who have to review every single piece of text, picture, or video that gets uploaded. This is time-consuming and expensive, and on some platforms it ruins the user experience.

The Solution:

Unlike most other content moderation systems, which take a bag-of-words approach, we decided to build a character-based moderation system on top of convolutional-LSTM neural networks.

Looking at words character by character allows typos and purposefully misspelled words to be handled properly. The LSTM component allows the system to model the context of each sentence and to make more accurate decisions about when a message is intended to offend and when it is just an expression or a quote.
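
To make the character-level idea more concrete, the sketch below shows one plausible way to turn text into a sequence of character IDs, and the overlapping character windows that a width-3 convolution would slide over. It is not the network itself. The alphabet, the padding scheme, the window width, and all function names are assumptions chosen for illustration.

```python
import string

# Assumed alphabet: lowercase letters, digits, punctuation and space.
ALPHABET = string.ascii_lowercase + string.digits + string.punctuation + " "
# ID 0 is reserved for padding and for characters outside the alphabet.
CHAR_TO_ID = {ch: i + 1 for i, ch in enumerate(ALPHABET)}


def encode(text: str, max_len: int = 32) -> list:
    """Map text to a fixed-length list of character IDs, padded with 0."""
    ids = [CHAR_TO_ID.get(ch, 0) for ch in text.lower()[:max_len]]
    return ids + [0] * (max_len - len(ids))


def char_windows(text: str, width: int = 3) -> set:
    """Return the set of overlapping character windows of the given width."""
    t = text.lower()
    return {t[i:i + width] for i in range(len(t) - width + 1)}


def window_overlap(a: str, b: str, width: int = 3) -> float:
    """Jaccard overlap between the character windows of two strings."""
    wa, wb = char_windows(a, width), char_windows(b, width)
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


print(encode("ab", max_len=4))
print("moderation" == "moderatoin")                # word-level comparison
print(window_overlap("moderation", "moderatoin"))  # character-level comparison
```

A word-level comparison treats the misspelled word as a completely different token, whereas the two spellings still share several character windows. That shared signal is what a character-based model can pick up on.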

The Outcome:

The final system catches offensive language with an out-of-the-box accuracy of over 85% (that is, before training on a text corpus for the particular platform) and learns from examples provided by human moderators. Using this feedback, the system can adapt to changes in language and user behavior, reducing the load on human moderators.
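
The feedback loop can be sketched in a few lines. In the toy example below, a small online logistic regression over character trigrams stands in for the convolutional-LSTM network, purely to show how moderator labels could update a model incrementally. The class name, learning rate, features, and sample messages are all hypothetical.

```python
import math


class FeedbackModel:
    """Toy online logistic regression over character trigrams.

    A stand-in for the real network, used only to illustrate how
    moderator decisions can be fed back as training examples.
    """

    def __init__(self, lr: float = 0.1):
        self.weights = {}   # trigram -> weight
        self.bias = 0.0
        self.lr = lr

    @staticmethod
    def features(text: str) -> list:
        padded = f" {text.lower()} "
        return [padded[i:i + 3] for i in range(len(padded) - 2)]

    def score(self, text: str) -> float:
        """Estimated probability that the text is offensive."""
        z = self.bias + sum(self.weights.get(f, 0.0) for f in self.features(text))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, text: str, is_offensive: bool) -> None:
        """Apply one gradient step using a moderator's label."""
        error = float(is_offensive) - self.score(text)
        self.bias += self.lr * error
        for f in self.features(text):
            self.weights[f] = self.weights.get(f, 0.0) + self.lr * error


model = FeedbackModel()

# Hypothetical moderator decisions: (message, is_offensive).
moderator_feedback = [
    ("you are an idi0t", True),
    ("have a nice day", False),
]

for _ in range(20):
    for message, label in moderator_feedback:
        model.update(message, label)

print(model.score("you are an idi0t"))
print(model.score("have a nice day"))
```

After a few rounds of feedback, the score for the flagged spelling rises above 0.5 and the score for the harmless message falls below it, without anyone editing a word list by hand.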