Twitter Algorithm Code Leaked and Released: An Explanation
Twitter's algorithm code was recently leaked online via GitHub. The New York Times reported that parts of Twitter's source code were publicly available on the platform before being taken down, following Twitter's DMCA request to GitHub. The leaked information included "proprietary source code for Twitter's platform and internal tools."
The Algorithm and Its Components
Before the leak, Elon Musk had already announced that Twitter would open-source the code used to recommend tweets, with the release set for March 31, 2023. On Friday, Twitter released the code for the algorithm that determines which tweets appear on a user's "For You" timeline. The company published the code on its official GitHub page, stating that the move was part of its effort to increase transparency and give developers a better understanding of how the platform operates.
The code for the algorithm includes the following components:
- Different recommendation sources: These are the various sources Twitter uses to gather tweets that it thinks a user might be interested in seeing. They include accounts that the user follows, popular accounts, and tweets that are currently trending.
- Machine learning model: This is the tool Twitter uses to rank the tweets it has gathered based on how relevant they are to the user. The model takes into account factors like the user's activity on the platform, the content of the tweets, and the popularity of the tweets.
- Filters: These are the final step in the algorithm and are used to remove tweets that are inappropriate, have been blocked by the user, or have already been seen by the user.
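The three components above can be read as a small pipeline: gather, rank, filter. The following is a minimal sketch of that flow, not Twitter's actual code. Every name in it (`Tweet`, `gather_candidates`, `rank`, `apply_filters`, `build_timeline`) is hypothetical, and a plain like count stands in for the machine learning model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tweet:
    tweet_id: int
    author: str
    likes: int
    flagged: bool = False  # hypothetical stand-in for "inappropriate"


def gather_candidates(sources):
    """Step 1: merge tweets from every source, keeping the first copy of each id."""
    seen_ids = set()
    merged = []
    for source in sources:
        for tweet in source:
            if tweet.tweet_id not in seen_ids:
                seen_ids.add(tweet.tweet_id)
                merged.append(tweet)
    return merged


def rank(candidates, score):
    """Step 2: order candidates by a relevance score, highest first."""
    return sorted(candidates, key=score, reverse=True)


def apply_filters(ranked, blocked_authors, seen_tweet_ids):
    """Step 3: drop flagged tweets, blocked authors, and already-seen tweets."""
    return [
        t for t in ranked
        if not t.flagged
        and t.author not in blocked_authors
        and t.tweet_id not in seen_tweet_ids
    ]


def build_timeline(sources, score, blocked_authors, seen_tweet_ids):
    candidates = gather_candidates(sources)
    return apply_filters(rank(candidates, score), blocked_authors, seen_tweet_ids)


# Toy data: two sources with one overlapping tweet (id 2).
followed = [Tweet(1, "alice", 5), Tweet(2, "bob", 50), Tweet(5, "erin", 30)]
trending = [Tweet(2, "bob", 50), Tweet(3, "carol", 500, flagged=True), Tweet(4, "dave", 20)]

timeline = build_timeline(
    [followed, trending],
    score=lambda t: t.likes,  # assumption: likes stand in for the model's score
    blocked_authors={"dave"},
    seen_tweet_ids={5},
)
print([t.tweet_id for t in timeline])  # [2, 1]
```

In this toy run the most-liked tweet never reaches the timeline because it is flagged, which shows why the filtering step matters even after ranking.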
Twitter's decision to release the algorithm code has been praised by some in the tech industry as a positive step towards greater transparency in social media. However, others have raised concerns about the potential for bad actors to use the code to exploit the platform's vulnerabilities.
Twitter's engineering team explained that the algorithm that determines the "top Tweets that ultimately show up on your device's For You timeline" is "composed of many interconnected services and jobs." The algorithm follows a three-step process: it gathers the best tweets from "different recommendation sources," ranks them using a "machine learning model," and filters out blocked tweets, inappropriate tweets, and posts the user has already seen.
Twitter also noted that the largest source of tweets is "In-Network Sources," meaning the accounts a user follows. The top tweets from that pile are ranked by how likely the user is to engage with each tweet's author. For the "Out-of-Network Sources," Twitter considers tweets that attracted engagement from people the user follows, as well as tweets liked by people whose likes resemble the user's own.
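The split between the two kinds of sources can be sketched with plain sets. This is an illustration under loud assumptions, not Twitter's implementation. The function names, the toy data, and the rule that "similar" means "shares at least one liked tweet" are all hypothetical, and the sketch skips the per-author engagement ranking.

```python
def in_network(user, follows, author_of):
    """Tweets written by accounts the user follows."""
    followed = follows.get(user, set())
    return sorted(t for t, author in author_of.items() if author in followed)


def out_of_network(user, follows, likes, author_of):
    """Tweets from unfollowed authors, surfaced through other people's likes."""
    followed = follows.get(user, set())
    my_likes = likes.get(user, set())

    # (a) Tweets that drew engagement from accounts the user follows.
    via_follows = {t for f in followed for t in likes.get(f, set())}

    # (b) Tweets liked by users whose likes overlap with the user's own.
    # Assumption: sharing a single liked tweet counts as "similar".
    similar_users = {u for u, liked in likes.items() if u != user and liked & my_likes}
    via_similar = {t for u in similar_users for t in likes[u]}

    candidates = (via_follows | via_similar) - my_likes
    return sorted(
        t for t in candidates
        if author_of[t] not in followed and author_of[t] != user
    )


# Toy data (all hypothetical).
author_of = {"t1": "bob", "t2": "carol", "t3": "dave", "t4": "erin"}
follows = {"alice": {"bob"}}
likes = {"alice": {"t1"}, "bob": {"t2"}, "frank": {"t1", "t3"}}

print(in_network("alice", follows, author_of))             # ['t1']
print(out_of_network("alice", follows, likes, author_of))  # ['t2', 't3']
```

Here `t2` arrives because a followed account liked it, and `t3` arrives because another user who liked the same tweet as alice also liked it.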
Identification Values and Categories
When the code was leaked, many users pointed out some questionable considerations in Twitter's recommendation algorithm. For instance, in the "HomeTweetTypePredicates.scala" file, users found seemingly discriminatory categories such as "author_is_elon," "author_is_power_user," "author_is_democrat," and "author_is_republican."
A Twitter engineer clarified that these identification values were "used purely for metrics collection" and to "track how often we are serving Tweets from these authors and how often their tweets are being impressed by users." Twitter uses this information to check that changes tested on its A/B experimentation platform do not negatively impact one group more than another.
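To make the "metrics only" claim concrete, here is a minimal sketch of how labels of this kind could be counted without influencing what gets served. It is not taken from Twitter's code. Only the two label names come from the file mentioned above. The predicate logic, the follower threshold, and the `record_served` helper are hypothetical.

```python
from collections import Counter

# Label names come from the article; the conditions are invented for illustration.
PREDICATES = {
    "author_is_power_user": lambda tweet: tweet["author_followers"] >= 100_000,
    "author_is_elon": lambda tweet: tweet["author"] == "elonmusk",
}


def record_served(tweets, counters):
    """Count how many served tweets match each label; never reorder or drop tweets."""
    for tweet in tweets:
        for name, matches in PREDICATES.items():
            if matches(tweet):
                counters[name] += 1
    return tweets  # returned unchanged: the labels are observed, not used for ranking


served = [
    {"author": "elonmusk", "author_followers": 1_000_000},
    {"author": "alice", "author_followers": 10},
    {"author": "bob", "author_followers": 250_000},
]

counters = Counter()
result = record_served(served, counters)
print(dict(counters))  # {'author_is_power_user': 2, 'author_is_elon': 1}
```

Comparing such counters between the control and treatment arms of an experiment would show whether a change shifts exposure for one group, which matches the use the engineer described.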
However, many users were still concerned about these categories, and during a Twitter Spaces audio session, Elon Musk expressed confusion about and criticism of the categories as well. Musk questioned why categories such as "Republican" and "Democrat" were included and suggested that they should not be there. He added that such categories only served to "divide people" and were "stupid embarrassing things." Musk's appearance on Twitter Spaces highlighted his plans to increase transparency on the platform by releasing the social media site's code.
The recent leak of Twitter's recommendation algorithm code has raised concerns over the platform's use of discriminatory categories in its algorithm. While Twitter says these categories were used only for metrics collection, the revelation has caused many to question the platform's commitment to inclusion and diversity. However, Twitter's release of the algorithm code is also a significant step towards transparency and accountability, giving users and researchers the opportunity to understand how the platform operates.
As someone who studies algorithms intensely, I found this a very interesting revelation, to say the least.