Building Transparent Algorithmic Filters in AT Protocol with Sky Feeder
Devin Gaffney contributes a write-up on Sky Feeder, a project that lets anyone build custom algorithmic feeds.
If the original sin of Web 1.0 was the pop-up ad, the original sin of Web 2.0 was the move to algorithmic feeds. Opaque optimization strategies that maximized private revenue, on platforms otherwise billed to the outside world as public goods, became increasingly toxic, spawning discourse about echo chambers and filter bubbles.
One of the principal drivers, if not the principal driver, of the concentration of power and capital on legacy social media has been the feed. The feed is the mechanism by which users are placed into information silos and made to subconsciously work for the platforms, generating ad revenue as a byproduct of their unwitting addiction. Without the feed, the platforms would largely evaporate, unable to hold the captive audience they need in order to sell ads. With the feed, platforms have been able to race to the bottom of content delivery and wreck our informational ecosystem as a result.
Bluesky's AT Protocol (and federated / open platforms in general) is well positioned to break the back of this model. Because the firehose of data is available to anyone, and because anyone can create a feed that tightly and transparently integrates with the platform, it's possible to democratize the production of these content delivery mechanisms. While any particular server may provide its own feed as a default, it's exceedingly easy to switch to whatever feed you may desire. You can even self-host the production of that feed and literally control the flow of information exposure, free of the constraints of needing to keep an eyeball to sell an ad.
Beyond merely self-hosting one's own feed, we also have the opportunity to create infrastructure that standardizes the operations that select, reject, reduce or favor any particular post. In sky-feeder, I'm working out my own ideas about what those primitive operators may look like on a platform like Bluesky. By landing on a standard set of operators, with a standard set of software packages that produce the same boolean outcomes for any given post being processed, we can grow an ecosystem where users can lean on that infrastructure to create transparent feeds from simple declarative rules that document exactly how each feed is produced.
For example, in sky-feeder, to produce a feed of any message from any member of the Swiftie Starter Pack, we can declare a feed as:
{
  "filter": {
    "and": [
      {
        "starter_pack_member": [
          "https://bsky.app/starter-pack/taylorsglitters.swifties.social/3laf2yzj6lh2y",
          "is_in"
        ]
      }
    ]
  }
}
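To make the idea of operators with "the same boolean outcomes" concrete, here is a minimal Python sketch of how a manifest like the one above might be evaluated against a single post. It is not sky-feeder's actual implementation: the `evaluate` and `matches` functions, the shape of the `post` dictionary, and the hard-coded membership table with placeholder DIDs are all assumptions made for illustration.

```python
from typing import Any, Callable, Dict, List, Set

PACK_URL = "https://bsky.app/starter-pack/taylorsglitters.swifties.social/3laf2yzj6lh2y"

# Hypothetical membership data. A real service would resolve the starter
# pack URL to its member list; these DIDs are placeholders.
STARTER_PACK_MEMBERS: Dict[str, Set[str]] = {
    PACK_URL: {"did:example:alice", "did:example:bob"},
}


def starter_pack_member(post: Dict[str, Any], args: List[Any]) -> bool:
    """True when the post's author belongs to the named starter pack."""
    pack_url, comparator = args
    if comparator != "is_in":
        raise ValueError(f"unsupported comparator: {comparator!r}")
    return post["author"] in STARTER_PACK_MEMBERS.get(pack_url, set())


# Registry mapping operator names in the manifest to Python callables.
OPERATORS: Dict[str, Callable[[Dict[str, Any], List[Any]], bool]] = {
    "starter_pack_member": starter_pack_member,
}


def evaluate(node: Dict[str, Any], post: Dict[str, Any]) -> bool:
    """Recursively evaluate one rule node (a single-key dict) for a post."""
    if len(node) != 1:
        raise ValueError("each rule must have exactly one operator key")
    name, args = next(iter(node.items()))
    if name == "and":
        # Every child rule must accept the post.
        return all(evaluate(child, post) for child in args)
    if name not in OPERATORS:
        raise ValueError(f"unknown operator: {name!r}")
    return OPERATORS[name](post, args)


def matches(manifest: Dict[str, Any], post: Dict[str, Any]) -> bool:
    """Decide whether a post belongs in the feed the manifest declares."""
    return evaluate(manifest["filter"], post)


manifest = {"filter": {"and": [{"starter_pack_member": [PACK_URL, "is_in"]}]}}
```

Calling `matches(manifest, {"author": "did:example:alice"})` accepts the post, while a post from an author outside the placeholder table is rejected. Because each operator is a pure function of the post and its arguments, two independent implementations can be checked against each other on the same inputs.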
If, instead, we want to create a feed based on the accounts a particular user follows, restricted to posts that an open-sourced dril-detector classifies as sounding like they were written by dril, we can declare that feed as:
{
  "filter": {
    "and": [
      {
        "social_graph": [
          "brendannyhan.bsky.social",
          "is_in",
          "follows"
        ]
      },
      {
        "model_probability": [
          {
            "model_name": "dril_detector"
          },
          ">=",
          0.95
        ]
      }
    ]
  },
  "models": [
    {
      "feature_modules": [
        {
          "model_name": "all-MiniLM-L6-v2",
          "type": "vectorizer"
        }
      ],
      "model_name": "dril_detector",
      "training_file": "https://gist.githubusercontent.com/DGaffney/03973a469809eec191d63bd9fb4a78e3/raw/92d9a44d34348e8c7f41e7d30a17ff30e5c8f54d/dril_detector_dataset.json"
    }
  ]
}
In this way, we can declare not just the basic filtering options, but openly document the precise replication conditions for creating the ML models we end up relying on for our decision-making, allowing anyone else to replicate the same results locally.
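One way to see what that documentation buys us is to extract a replication recipe straight from the manifest. The following Python sketch walks the filter, finds every model a `model_probability` rule refers to, and looks up its declared training file and vectorizers. The function names `referenced_models` and `replication_plan` and the shape of the returned recipe are hypothetical; only the manifest fields come from the example above.

```python
from typing import Any, Dict, Iterator, List


def referenced_models(node: Any) -> Iterator[str]:
    """Yield the model_name of every model_probability rule in a filter."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "model_probability":
                # First argument is the model reference, e.g. {"model_name": ...}
                yield value[0]["model_name"]
            else:
                yield from referenced_models(value)
    elif isinstance(node, list):
        for item in node:
            yield from referenced_models(item)


def replication_plan(manifest: Dict[str, Any]) -> List[Dict[str, Any]]:
    """List what is needed to retrain each model the filter depends on."""
    declared = {m["model_name"]: m for m in manifest.get("models", [])}
    plan = []
    for name in referenced_models(manifest["filter"]):
        if name not in declared:
            # A filter that leans on an undeclared model cannot be replicated.
            raise KeyError(f"filter references undeclared model {name!r}")
        spec = declared[name]
        plan.append({
            "model_name": name,
            "training_file": spec["training_file"],
            "vectorizers": [
                module["model_name"]
                for module in spec["feature_modules"]
                if module["type"] == "vectorizer"
            ],
        })
    return plan


manifest = {
    "filter": {"and": [
        {"social_graph": ["brendannyhan.bsky.social", "is_in", "follows"]},
        {"model_probability": [{"model_name": "dril_detector"}, ">=", 0.95]},
    ]},
    "models": [{
        "feature_modules": [{"model_name": "all-MiniLM-L6-v2", "type": "vectorizer"}],
        "model_name": "dril_detector",
        "training_file": "https://gist.githubusercontent.com/DGaffney/03973a469809eec191d63bd9fb4a78e3/raw/92d9a44d34348e8c7f41e7d30a17ff30e5c8f54d/dril_detector_dataset.json",
    }],
}

plan = replication_plan(manifest)
```

Here `plan` contains a single entry for `dril_detector`, naming its training file and the `all-MiniLM-L6-v2` vectorizer. A manifest whose filter mentions a model that is missing from `models` raises an error instead of silently producing a feed nobody else could reproduce.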
By building front-end applications on top of this "Algorithm Manifest" document, we can produce user interfaces that let any user easily mix, match, and remix feeds. Through this open, simple experimentation, we disintermediate the toxic brokerage that has broken legacy social media, and cooperatively find new ways to filter information in a way that is user-focused, replicable, transparent, and independent from any particular host.
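As a rough illustration of what "remixing" could mean at the manifest level, the sketch below combines two manifests so that a post must satisfy both filters, and merges their model declarations. The `remix` function is hypothetical and not part of sky-feeder; it assumes manifests shaped like the examples in this article and uses only the `and` operator shown in them.

```python
import copy
from typing import Any, Dict


def remix(first: Dict[str, Any], second: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new manifest whose feed is the intersection of two feeds."""
    models: Dict[str, Dict[str, Any]] = {}
    for manifest in (first, second):
        for spec in manifest.get("models", []):
            # Keep the first declaration of each model name.
            existing = models.setdefault(spec["model_name"], spec)
            if existing != spec:
                # Two different recipes under one name would be ambiguous.
                raise ValueError(
                    f"conflicting declarations for model {spec['model_name']!r}"
                )
    combined: Dict[str, Any] = {
        "filter": {
            "and": [
                copy.deepcopy(first["filter"]),
                copy.deepcopy(second["filter"]),
            ]
        }
    }
    if models:
        combined["models"] = copy.deepcopy(list(models.values()))
    return combined
```

A front end built on this idea could let a user pick two existing feeds and publish the combined manifest as a third, with the full recipe still visible to anyone who subscribes.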
The last version of the web took 20 years to start falling apart, and in that time it did enormous damage to our informational and civic landscapes. All we have to show for it are some freaky billionaires it produced on the other side. Platforms that are open to the core will be the next iteration of the web, because if there's going to be any value in being online, they must be. We should deeply explore the possibility of extending that openness all the way up the stack to the very mechanisms that ruined the internet the last time around.
Devin Gaffney is a machine learning engineer at the fact-checking software nonprofit Meedan, and lives in Portland, OR. You can find him spending time on Bluesky and can support projects like Sky-Feeder on the GitHub repo or in the discussion about adding standard algorithmic operators to the protocol.