jzhao.xyz

Distributed Systems

May 03, 2022, 2 min read

#seed

A distributed system is a set of computers (nodes) that communicate over a network to accomplish some task together.

Martin Kleppmann’s Course

Notes from Martin Kleppmann’s Distributed Systems Course. He has a set of course notes on his teaching site as well.

How do we share data amongst different concurrent entities?

Recommended Reading
- “Distributed Systems” by van Steen & Tanenbaum: Implementation detail heavy, more practical
- “Introduction to Reliable and Secure Distributed Programming” (2nd ed) by Cachin, Guerraoui & Rodrigues: Theory heavy
- “Designing Data-Intensive Applications” by Kleppmann: More oriented toward distributed databases
- “Operating Systems: Concurrent and Distributed Software Design” by Bacon & Harris (Addison-Wesley): links distributed systems to operating systems

Why distributed?

- Things are inherently distributed: sending a message from your phone to your friend’s phone
- Reliability: even if one node fails, the system as a whole keeps functioning
- Performance: get data from a nearby node rather than one centralized server halfway around the world
- Solve bigger problems: some amounts of data can’t fit on just one machine

Why not distributed?

- Communication may fail (and we might not even know it has failed)
- Processes may crash (and we might not know)
- All of this can happen nondeterministically
- Thus we need to think about fault tolerance
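
To make the "we might not even know" part concrete, here is a minimal Python sketch (not taken from the course). It assumes a toy model in which a client sends one request and observes either a reply or a timeout. The names `Outcome`, `send_request`, `client_view`, and `was_processed` are hypothetical, and the network is simulated with a seeded random generator rather than real sockets.

```python
import random
from enum import Enum


class Outcome(Enum):
    """What actually happened to a request (hidden from the client)."""
    DELIVERED = "request processed, reply received"
    REQUEST_LOST = "request lost in the network"
    NODE_CRASHED = "node crashed before processing"
    REPLY_LOST = "request processed, reply lost"


def send_request(rng: random.Random) -> Outcome:
    """Simulate one request over an unreliable network (toy model)."""
    return rng.choice(list(Outcome))


def client_view(outcome: Outcome) -> str:
    """The client only ever sees a reply or a timeout, never the cause."""
    return "reply" if outcome is Outcome.DELIVERED else "timeout"


def was_processed(outcome: Outcome) -> bool:
    """Whether the remote node actually executed the request."""
    return outcome in (Outcome.DELIVERED, Outcome.REPLY_LOST)


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so runs are repeatable
    for _ in range(5):
        outcome = send_request(rng)
        print(f"client sees {client_view(outcome):7} | actually: {outcome.value}")
```

Three different causes collapse into the same "timeout" observation, and in one of them the request was in fact executed. A timeout alone therefore says nothing about whether the work was done, which is the ambiguity that fault tolerance techniques have to work around.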

Notes

- RPCs
- Fault Tolerance
  - See Two Generals Problem and Byzantine Generals Problem
- System models
- Physical and Logical Time
- Message ordering and Causality
- Message broadcast
- Replication
- Quorum
- Consensus
- Consistency
