How AT Protocol Repositories Secure and Store User Actions

25 Sept 2024

Authors:

(1) Martin Kleppmann, University of Cambridge, Cambridge, UK (martin.kleppmann@cst.cam.ac.uk);

(2) Paul Frazee, Bluesky Social PBC, United States;

(3) Jake Gold, Bluesky Social PBC, United States;

(4) Jay Graber, Bluesky Social PBC, United States;

(5) Daniel Holmgren, Bluesky Social PBC, United States;

(6) Devin Ivy, Bluesky Social PBC, United States;

(7) Jeromy Johnson, Bluesky Social PBC, United States;

(8) Bryan Newbold, Bluesky Social PBC, United States;

(9) Jaz Volpert, Bluesky Social PBC, United States.

Abstract and 1 Introduction

2 The Bluesky Social App

2.1 Moderation Features

2.2 User Handles

2.3 Custom Feeds and Algorithmic Choice

3 The AT Protocol Architecture

3.1 User Data Repositories

3.2 Personal Data Servers (PDS)

3.3 Indexing Infrastructure

3.4 Labelers and Feed Generators

3.5 User Identity

4 Related Work

5 Conclusions, Acknowledgments, and References

3.1 User Data Repositories

Figure 3: The main services involved in providing Bluesky, and data flows between them. Icons from Flaticon.com.

All data that a user wishes to publish is added to their repository, which stores a collection of records. Whenever a user performs some action – making a post, liking another user’s post, following another user, etc. – that action becomes a record in their repository. Records are encoded in DAG-CBOR [45], a restricted form of CBOR [17], a compact binary data format. The schema of records is defined by the lexicon, and a repository may contain a mixture of records from several different lexicons, representing user actions in different social modes. Media files (e.g. images) are stored outside of the user’s repository, but referenced by their CID [32] (essentially a cryptographic hash) from a record in the repository. Similarly, a reference to a record in another repository (e.g. identifying a post being liked) also includes its CID.
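The idea of referencing content by its hash can be illustrated with a minimal Python sketch. It is not the atproto encoding: canonical JSON stands in for DAG-CBOR, a hex SHA-256 digest stands in for a real CID, and the record types and field names (example.post, example.like, subject, cid) are hypothetical names chosen for illustration.

```python
import hashlib
import json


def content_id(data: bytes) -> str:
    """Hex SHA-256 digest, used here as a simplified stand-in for a CID."""
    return hashlib.sha256(data).hexdigest()


def encode_record(record: dict) -> bytes:
    """Deterministic serialisation (canonical JSON as a stand-in for DAG-CBOR)."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


# A post that refers to an image stored outside the repository.
image_bytes = b"example image bytes"
post = {
    "type": "example.post",  # hypothetical record type
    "text": "Hello, world",
    "image": {"cid": content_id(image_bytes)},
}
post_cid = content_id(encode_record(post))

# A like in another user's repository that pins the exact version of the post.
like = {
    "type": "example.like",  # hypothetical record type
    "subject": {"repo": "alice", "key": "post1", "cid": post_cid},
}


def reference_matches(ref_cid: str, fetched: bytes) -> bool:
    """Check that fetched bytes are exactly what the reference pointed to."""
    return content_id(fetched) == ref_cid
```

Because the reference carries a hash of the target, anyone who fetches the post or the image can check that it is byte-for-byte the content that was liked or embedded; an edited post would no longer match the hash stored in the like.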

Each user account has one repository, and it contains all of the actions they have ever performed, minus any records they have explicitly deleted. A Personal Data Server (PDS) hosts the user’s repository and makes it publicly available as a web service; we discuss PDSes in more detail in Section 3.2.

A user only updates their own repository; for example, if user 𝐴 follows user 𝐵, this results only in a follow record in user 𝐴’s repository, and no change to 𝐵’s repository. To find all followers of user 𝐵 requires indexing the content of all repositories. This design decision is similar to the way hyperlinks work on the web: it is easy to find all the outbound links from a web page at a given URL, but to find all the inbound links to a page requires an index of the entire web, which is maintained by web search engines.
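The asymmetry between outbound and inbound relationships can be sketched in a few lines of Python. The in-memory dictionary of repositories and the record fields (type, subject) are assumptions made for illustration and do not reflect the actual record schema.

```python
from collections import defaultdict

# Each repository contains only the actions of its owner.
repositories = {
    "alice": [{"type": "follow", "subject": "bob"}, {"type": "post", "text": "hi"}],
    "bob": [{"type": "follow", "subject": "carol"}],
    "carol": [{"type": "follow", "subject": "bob"}],
}


def following(repos: dict, user: str) -> list:
    """Outbound edges: answered by reading a single repository."""
    return [r["subject"] for r in repos.get(user, []) if r["type"] == "follow"]


def build_follower_index(repos: dict) -> dict:
    """Inbound edges: requires a pass over every repository."""
    index = defaultdict(set)
    for owner, records in repos.items():
        for record in records:
            if record["type"] == "follow":
                index[record["subject"]].add(owner)
    return index


followers = build_follower_index(repositories)
```

Here following(repositories, "alice") touches only Alice's repository, whereas the followers of "bob" can only be found by scanning all repositories, which is the job of an indexer in the same way that inbound links are the job of a web search engine.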

The AT in atproto stands for Authenticated Transfer, which reflects the fact that repositories are cryptographically authenticated. The records in a repository are organized into a Merkle Search Tree (MST), a type of Merkle tree that remains balanced, even as records are inserted or deleted in arbitrary order [3]. After every change to a repository, the root hash of the MST is signed; the public verification key for this signature is part of the user identity described in Section 3.5. This enables an efficient cryptographic proof that a given record appears within a given user’s repository. Moreover, when a user updates or deletes a record, the MST enables a proof that the old record no longer appears in the repository.
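To show how a root hash allows a compact proof that a record is in a repository, the following Python sketch uses a plain binary Merkle tree over sorted record keys. This is a deliberate simplification and not a Merkle Search Tree: it does not stay balanced under incremental updates, the key names are hypothetical, and the signature over the root is omitted because the Python standard library has no public-key signature scheme.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf_hash(key: str, value: bytes) -> bytes:
    # The 0x00 prefix keeps leaf hashes distinct from interior node hashes.
    return h(b"\x00" + key.encode("utf-8") + b"\x00" + value)


def node_hash(left: bytes, right: bytes) -> bytes:
    return h(b"\x01" + left + right)


def build_levels(records: dict) -> list:
    """Return every tree level, leaves first; records must be non-empty."""
    level = [leaf_hash(k, records[k]) for k in sorted(records)]
    levels = [level]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                nxt.append(node_hash(level[i], level[i + 1]))
            else:
                nxt.append(level[i])  # an unpaired node is promoted unchanged
        level = nxt
        levels.append(level)
    return levels


def merkle_root(records: dict) -> bytes:
    return build_levels(records)[-1][0]


def prove(records: dict, key: str) -> list:
    """Inclusion proof: the sibling hashes on the path from leaf to root."""
    index = sorted(records).index(key)
    path = []
    for level in build_levels(records)[:-1]:
        sibling = index ^ 1
        if sibling < len(level):
            path.append((level[sibling], sibling < index))
        index //= 2
    return path


def verify(root: bytes, key: str, value: bytes, path: list) -> bool:
    """Recompute the root from one record plus its proof."""
    acc = leaf_hash(key, value)
    for sibling, sibling_is_left in path:
        acc = node_hash(sibling, acc) if sibling_is_left else node_hash(acc, sibling)
    return acc == root


records = {"post/1": b"hello", "post/2": b"world", "like/1": b"post/1"}
root = merkle_root(records)
proof = prove(records, "post/2")
```

A verifier who trusts only the (signed) root can call verify(root, "post/2", b"world", proof) without seeing any other record. After a deletion the root changes, so the old proof no longer verifies against the new root. Note that this sketch does not produce a positive proof of absence; the MST described above can, because its keys are stored in sorted order within the tree.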

This paper is available on arXiv under the CC BY 4.0 DEED license.