Authors:
(1) Martin Kleppmann, University of Cambridge, Cambridge, UK (martin.kleppmann@cst.cam.ac.uk);
(2) Paul Frazee, Bluesky Social PBC, United States;
(3) Jake Gold, Bluesky Social PBC, United States;
(4) Jay Graber, Bluesky Social PBC, United States;
(5) Daniel Holmgren, Bluesky Social PBC, United States;
(6) Devin Ivy, Bluesky Social PBC, United States;
(7) Jeromy Johnson, Bluesky Social PBC, United States;
(8) Bryan Newbold, Bluesky Social PBC, United States;
(9) Jaz Volpert, Bluesky Social PBC, United States.
3.2 Personal Data Servers (PDS)
A PDS stores repositories and associated media files, and allows anybody to query the data it hosts via an HTTP API. Moreover, a PDS provides a real-time stream of updates for the repositories it hosts via a WebSocket. Indexers (see Section 3.3) subscribe to this stream in order to find out about new or deleted records (posts, likes, follows, etc.) with low latency. This architecture is illustrated in Figure 3.
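The two interfaces described above, a query API and an update stream, can be illustrated with a toy in-memory model. This is a minimal sketch and not the atproto implementation: the `ToyPDS` class, its method names, and the shape of the event dictionaries are all hypothetical, and plain Python callbacks stand in for the HTTP API and the WebSocket stream.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional

# Hypothetical event shape: {"seq", "repo", "action", "key", "record"}.
Event = dict


class ToyPDS:
    """In-memory stand-in for a PDS: hosts repositories and streams updates."""

    def __init__(self) -> None:
        # repository ID -> record key -> record contents
        self._repos: Dict[str, Dict[str, dict]] = defaultdict(dict)
        # Callbacks play the role of WebSocket subscribers (e.g. indexers).
        self._subscribers: List[Callable[[Event], None]] = []
        self._seq = 0

    def subscribe(self, callback: Callable[[Event], None]) -> None:
        """Register a subscriber that receives every subsequent update."""
        self._subscribers.append(callback)

    def _emit(self, repo: str, action: str, key: str,
              record: Optional[dict]) -> None:
        # Each update gets an increasing sequence number before fan-out.
        self._seq += 1
        event = {"seq": self._seq, "repo": repo, "action": action,
                 "key": key, "record": record}
        for callback in self._subscribers:
            callback(event)

    def put_record(self, repo: str, key: str, record: dict) -> None:
        """Store a record (post, like, follow, ...) and announce it."""
        self._repos[repo][key] = record
        self._emit(repo, "create", key, record)

    def delete_record(self, repo: str, key: str) -> None:
        """Remove a record and announce the deletion (KeyError if absent)."""
        self._repos[repo].pop(key)
        self._emit(repo, "delete", key, None)

    def list_records(self, repo: str) -> Dict[str, dict]:
        """Query interface: return a copy of the records in one repository."""
        return dict(self._repos.get(repo, {}))
```

In this sketch an indexer would call `subscribe` once and then learn about each created or deleted record as soon as it is written, without polling `list_records`.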
Hosting a PDS for a small number of users requires only modest computing resources, even if those users have a large number of followers. Users who wish to self-host their own PDS can therefore do so on a cheap virtual machine in the cloud, or even on a Raspberry Pi connected to their home internet router. However, we expect that most users will sign up for an account on a shared PDS run by a professional hosting provider – either Bluesky Social PBC, or another company.
Compared to choosing a Mastodon server, the user’s choice of PDS hosting provider is fairly inconsequential. The PDS URL is internal to the system, and is not normally visible to users. It makes no difference whether two users are on the same PDS or different PDSes, since interaction between users goes via the indexing infrastructure in any case. A user can migrate from one PDS to another by simply copying their repository and media files to the new PDS, and pointing their account ID at the new PDS URL (see Section 3.5). Even if a PDS shuts down without warning, users can upload a backup of their repository to a new PDS, and thus recover their account without losing any of their posts or their social graph.
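The migration and recovery steps in this paragraph can be sketched as follows. This is an illustrative model under stated assumptions, not the actual mechanism: the `Host` class, the `migrate` and `restore_from_backup` functions, and the `directory` dictionary that maps an account ID to its current PDS URL are all hypothetical, and how the account ID is really resolved to a PDS is the subject of Section 3.5.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Host:
    """Hypothetical PDS holding one repository and media set per account."""
    url: str
    repos: Dict[str, dict] = field(default_factory=dict)
    media: Dict[str, dict] = field(default_factory=dict)


def migrate(account_id: str, old: Host, new: Host,
            directory: Dict[str, str]) -> None:
    """Copy the repository and media files, then repoint the account ID."""
    new.repos[account_id] = copy.deepcopy(old.repos[account_id])
    new.media[account_id] = copy.deepcopy(old.media.get(account_id, {}))
    # Assumption: a single mapping from account ID to PDS URL.
    directory[account_id] = new.url


def restore_from_backup(account_id: str, backup_repo: dict, new: Host,
                        directory: Dict[str, str]) -> None:
    """Recover an account after its old PDS vanished, using a backup."""
    new.repos[account_id] = copy.deepcopy(backup_repo)
    directory[account_id] = new.url
```

Neither function touches other users' data in this sketch: followers and followed accounts are unaffected because only the account's own pointer changes, which mirrors the paper's point that interaction goes via the indexing infrastructure rather than between PDSes.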
PDS operators will generally want to perform some basic moderation by deleting any illegal content hosted on their servers. However, PDS-level moderation is much less important than server-level moderation in Mastodon, because in atproto, the primary moderation role is taken on by separate actors in the system – the labelers and feed generators (see Section 3.4). This allows different sets of people to offer server hosting and moderation services, respectively; we believe this separation is valuable since operating a server and moderating a community require largely disjoint sets of skills [46].
At the time of writing, Bluesky’s indexing infrastructure (see Section 3.3) only indexes repositories on PDS instances hosted by Bluesky Social PBC itself; this restriction exists to limit infrastructure load and abuse problems during the beta period. In that sense, Bluesky is not yet fully decentralized. Support for third-party PDS operators is already implemented and enabled in Bluesky’s sandbox (testing) environment, and a PDS implementation suitable for self-hosting is already open source [8]. We plan for the Bluesky indexing infrastructure to begin indexing repositories hosted by other PDS operators (indicated by dashed arrows in Figure 3) in early 2024.
This paper is available on arXiv under the CC BY 4.0 DEED license.