Title: Invariant Boundaries
1 Invariant Boundaries
Dr. Eric A. Brewer
Professor, UC Berkeley
Co-Founder & Chief Scientist, Inktomi
2 Our Perspective
- Inktomi builds two distributed systems
- Global Search Engines
- Distributed Web Caches
- Based on scalable cluster & parallel computing technology
- But very little use of classic DS research...
3 Distributed Systems don't work...
- There exist working DS
- Simple protocols: DNS, WWW
- Inktomi search, Content Delivery Networks
- Napster, Verisign, AOL
- But these are not classic DS
- Not distributed objects
- No RPC
- No modularity
- Complex ones are single owner (except phones)
4 Concept: Invariant Boundaries
- Claim: we don't understand boundaries
- Solution: make the Invariant Boundary explicit
- Goal: simpler, easier, faster to correctness
[Figure labels: "Invariant may not hold", "Invariant holds"]
5 Three Basic Issues
- Where is the state?
- Consistency vs. Availability
- Communication Boundaries
6 Santa Clara Cluster
- Very uniform
- No monitors
- No people
- No cables
- Working power
- Working A/C
- Working BW
7 Boundaries for Kinds of State
- Stateless
- Front ends
- Immutable state
- Soft State (rebuild on restart)
- Durable Single-writer (e.g. user data)
- Emissaries, horizontal partitioning
- Durable Multi-writer
- Fiefdoms, DBMS
8 Persistent State is HARD
- Classic DS focus on the computation, not the data
- this is WRONG, computation is the easy part
- Data centers exist for a reason
- can't have consistency or availability without them
- Other locations are for caching only
- proxies, basestations, set-top boxes, desktops
- phones, PDAs, ...
- Distributed systems can't ignore location
- Invariant Boundary is small
9 Berkeley Ninja Architecture
Base: Scalable, highly-available platform for persistent-state services
Internet
PDAs (e.g. IBM Workpad)
Cellphones, Pagers, etc.
10 Berkeley Ninja Architecture
Base: Scalable, highly-available platform for persistent-state services
Consistent Shared
Single user
Internet
Soft-state, immutable state
PDAs (e.g. IBM Workpad)
Cellphones, Pagers, etc.
11 Three Basic Issues
- Where is the state?
- Consistency vs. Availability
- Communication Boundaries
12 Data is only consistent inside
Data is not consistent (reference data)
Consistent A
13 Data is only consistent inside
Data is not consistent (reference data)
Consistent B
Consistent A
14 The CAP Theorem
Theorem: You can have at most two of these invariants (Consistency, Availability, Partition tolerance) for any shared-data system
15 The CAP Theorem
Theorem: You can have at most two of these invariants (Consistency, Availability, Partition tolerance) for any shared-data system
Corollary: consistency boundary must choose A or P
16 Forfeit Partitions
- Examples
- Single-site databases
- Cluster databases
- LDAP
- Fiefdoms
- Traits
- 2-phase commit
- cache validation protocols
- The inside
17 Forfeit Availability
- Examples
- Distributed databases
- Distributed locking
- Majority protocols
- Traits
- Pessimistic locking
- Make minority partitions unavailable
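A minimal sketch of the "make minority partitions unavailable" trait, assuming a fixed replica count and a caller that already knows how many replicas it can reach. The names `Unavailable`, `has_majority`, and `write` are hypothetical, chosen only for illustration.

```python
class Unavailable(Exception):
    """Raised when this side of a partition cannot reach a majority."""


def has_majority(reachable: int, total: int) -> bool:
    # Strict majority: two disjoint partitions can never both satisfy this.
    return reachable > total // 2


def write(store: dict, key: str, value: str, reachable: int, total: int) -> None:
    """Accept the write only on the majority side of a partition."""
    if not has_majority(reachable, total):
        raise Unavailable(f"only {reachable} of {total} replicas reachable")
    store[key] = value
```

With five replicas split 3/2, the side that reaches three keeps accepting writes and the side that reaches two refuses them: consistency is kept by giving up availability in the minority.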
18 Forfeit Consistency
- Examples
- Coda
- Web caching
- DNS
- Emissaries
- Traits
- expirations/leases
- conflict resolution
- Optimistic
- The outside
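The expirations trait can be sketched as a cache whose entries are served without revalidation until they expire, so a reader may see stale data in between. This is an illustrative sketch only; `ExpiringCache` and its injectable `clock` parameter are assumptions, not part of any system named on the slide.

```python
import time


class ExpiringCache:
    """Serve possibly stale values until their expiration time passes."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so expiry can be tested
        self.entries = {}           # key -> (value, expiry time)

    def put(self, key, value):
        self.entries[key] = (value, self.clock() + self.ttl)

    def get(self, key):
        """Return the cached value, or None once the entry has expired."""
        item = self.entries.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self.clock() >= expires_at:
            del self.entries[key]   # expired: caller must refetch from the origin
            return None
        return value
```

Nothing here checks whether the origin has changed; the cache stays available by accepting weak consistency until the expiration.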
19 ACID vs. BASE
- BASE
- Weak consistency
- stale data OK
- Availability first
- Best effort
- Approximate answers OK
- Aggressive (optimistic)
- Simpler and faster
- Easier evolution (XML)
- wide Invariant Boundary
- Outside consistency boundary
- ACID
- Strong consistency
- Isolation
- Focus on commit
- Nested transactions
- Availability?
- Conservative (pessimistic)
- Difficult evolution (e.g. schema)
- small Invariant Boundary
- The inside
but it's a spectrum
20 Consistency Boundary Summary
- Can have consistency & availability within a cluster. No partitions within boundary!
- OS/Networking better at A than C
- Databases better at C than A
- Wide-area databases can't have both
- Disconnected clients can't have both
21 Three Basic Issues
- Where is the state?
- Consistency vs. Availability
- Communication Boundaries
22 The Boundary
- The interface between two modules
- client/server, peers, libraries, etc.
- Basic boundary: the procedure call
- thread traverses the boundary
- two sides are in the same address space
- What invariants don't hold across?
[Figure labels: C, S]
23 Different Address Spaces
- What if the two sides are NOT in the same address space?
- IPC or LRPC
- Can't do pass-by-reference (pointers)
- Most IPC screws this up: pass by value-result
- There are TWO copies of args, not one
- What if they share some memory?
- Can pass pointers, but
- Need synchronization between client/server
- Not all pointers can be passed
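The "two copies of args" point can be illustrated with a small simulation. This sketch uses `pickle` to stand in for marshalling across address spaces; `call_by_value_result` is a hypothetical helper, not a real IPC mechanism.

```python
import pickle


def append_item(items):
    items.append("x")   # mutates whatever list it was handed


def call_by_value_result(func, arg):
    """Simulate crossing address spaces: the callee works on its own copy
    of the argument, and the (possibly modified) copy is shipped back."""
    remote_copy = pickle.loads(pickle.dumps(arg))    # marshal to the other side
    func(remote_copy)
    return pickle.loads(pickle.dumps(remote_copy))   # marshal the result back


local = []
append_item(local)        # same address space: the caller sees the mutation

original = []
returned = call_by_value_result(append_item, original)
# 'original' is untouched; only the returned copy carries the change.
```

A caller that expects pass-by-reference semantics reads `original` and sees nothing, which is the invariant that silently stops holding at this boundary.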
24 Partial Failure
- Can the two sides fail independently?
- RPC, IPC, LRPC
- Can't be transparent (like RPC)!!
- New exceptions (other side gone)
- Idempotent calls?
- Use Transaction IDs (to solve replay problem)
- Reclaim local resources
- e.g. kernels leak sockets over time => reboot
- RPC tries to hide these issues (but fails)
- Use Level 4/7 switches to hide failures?
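One way to picture transaction IDs solving the replay problem: the server remembers the result for each ID, so a retried request is answered from that record instead of being applied twice. The `Account` class below is a hypothetical example; a real server would also need to persist the table and decide when to forget old IDs.

```python
class Account:
    """Server-side state that applies each transaction ID at most once."""

    def __init__(self):
        self.balance = 0
        self.completed = {}   # transaction id -> result already returned

    def deposit(self, txn_id, amount):
        """Apply the deposit once per txn_id; replays get the saved result."""
        if txn_id in self.completed:
            return self.completed[txn_id]
        self.balance += amount
        self.completed[txn_id] = self.balance
        return self.balance
```

A client that times out and resends `deposit("t1", 10)` gets the same answer both times, and the balance changes only once.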
25 Resource Allocation
- How to reclaim resources allocated for client?
- Usually timeout exception cleans up
- Release locks? (must track them!)
- How to avoid long delays while holding resources?
- How long to remember client?
- Delayed responses (past timeout) must be ignored
- Problem with leases
- Great for servers, but
- Client's lease may expire mid-operation
- Hard to make client updates atomic with multiple leases (2PC?)
- Which things have leases? (can be hidden)
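The rule that delayed responses past the timeout must be ignored can be sketched as a table of outstanding requests with deadlines. `PendingRequests` and its injectable `clock` are assumptions made for this example.

```python
class PendingRequests:
    """Track outstanding requests and drop replies that arrive too late."""

    def __init__(self, timeout, clock):
        self.timeout = timeout
        self.clock = clock
        self.deadlines = {}   # request id -> deadline

    def send(self, request_id):
        self.deadlines[request_id] = self.clock() + self.timeout

    def accept(self, request_id):
        """True if the reply is still wanted; late or unknown replies are dropped."""
        deadline = self.deadlines.pop(request_id, None)
        return deadline is not None and self.clock() <= deadline
```

Popping the entry on the first reply also answers "how long to remember" for this side: the request is forgotten as soon as it is answered, so a duplicate reply is treated as unknown.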
26 Trust the other side?
- What if we don't trust the other side?
- Or partial trust (legal contract), or malicious?
- Have to check args, no pointer passing
- Limited release of information (leaks)
- Kernels get this right
- copy/check args
- use opaque references (e.g. File Descriptors)
- Most systems do not
- TCP, Napster, web browsers
- Security boundaries tend to be explicit
- Holes come from services!
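The kernel approach of checking arguments and handing out opaque references can be sketched in a few lines. `HandleTable` is a hypothetical illustration in the spirit of file descriptors, not a description of any particular kernel.

```python
import itertools


class HandleTable:
    """Map small integer handles to private objects the client never sees."""

    def __init__(self):
        self._objects = {}
        self._next = itertools.count(3)   # arbitrary starting value

    def open(self, obj):
        handle = next(self._next)
        self._objects[handle] = obj
        return handle

    def lookup(self, handle):
        # Check the argument: reject anything that is not a live handle.
        if not isinstance(handle, int) or handle not in self._objects:
            raise ValueError("bad handle")
        return self._objects[handle]

    def close(self, handle):
        self.lookup(handle)               # validates before releasing
        del self._objects[handle]
```

The client holds only an integer, so it cannot forge a pointer into server state, and every call re-validates the handle rather than trusting what was passed in.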
27 Multiplexing clients?
- Does the server have to:
- Deal with high concurrency?
- Say no sometimes (graceful degradation)
- Treat clients equally (fairness)
- Bill for resources (and have audit trail)
- Isolate clients: performance, data, ...
- These all affect the boundary definition
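"Say no sometimes" can be sketched as a simple admission check: past a fixed number of in-flight requests, new ones are refused immediately instead of queued. `AdmissionControl` and `Overloaded` are hypothetical names, and a fixed limit is only one possible policy.

```python
class Overloaded(Exception):
    """Raised to tell a caller the server is refusing new work for now."""


class AdmissionControl:
    def __init__(self, max_in_flight):
        self.max_in_flight = max_in_flight
        self.in_flight = 0

    def enter(self):
        # Refuse early rather than degrade every admitted request.
        if self.in_flight >= self.max_in_flight:
            raise Overloaded("try again later")
        self.in_flight += 1

    def leave(self):
        self.in_flight -= 1
```

The refusal becomes part of the boundary definition: callers must be written to expect `Overloaded` and back off.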
28 Boundary evolution?
- Can the two sides be updated independently? (NO)
- The DLL problem...
- Boundaries need versions
- Negotiation protocol for upgrade?
- Promises of backward compatibility?
- Affects naming too (version number)
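A negotiation protocol for upgrade could be as simple as each side advertising the versions it speaks and both picking the highest one in common. The `negotiate` function below is a sketch under that assumption, with integer version numbers chosen for illustration.

```python
def negotiate(client_versions, server_versions):
    """Return the highest version both sides support, or None if there is none."""
    common = set(client_versions) & set(server_versions)
    return max(common) if common else None
```

A `None` result makes the incompatibility explicit at connection time, rather than letting it surface later as a confusing failure in the middle of a call.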
29 Example: protocols vs. APIs
- Protocols have been more successful than APIs
- Some reasons:
- protocols are pass by value
- protocols designed for partial failure
- not trying to look like local procedure calls
- explicit state machine, rather than call/return (this exposes exceptions well)
- Protocols still not good at trust, billing, evolution
30 Example: XML
- XML doesn't solve any of these issues
- It is RPC with an extensible type system
- It makes evolution better?
- two sides need to agree on schema
- can ignore stuff you don't understand
- Must agree on meaning, not just tags
- Can mislead us to ignore/postpone the real issues
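The "ignore stuff you don't understand" behaviour can be sketched with Python's standard `xml.etree.ElementTree` parser. The `<item>` document and the `KNOWN_FIELDS` set are made up for this example.

```python
import xml.etree.ElementTree as ET

KNOWN_FIELDS = {"name", "price"}   # hypothetical schema this reader agreed to


def read_item(xml_text):
    """Extract the known fields of an <item>; silently skip unknown children."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root if child.tag in KNOWN_FIELDS}


doc = "<item><name>widget</name><price>5</price><color>red</color></item>"
item = read_item(doc)   # the unknown <color> element is ignored
```

The reader survives the added `<color>` tag, but nothing tells it whether `price` is in dollars or cents, tax included or not. The two sides still have to agree on meaning, not just tags.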
31 Example: services
- Claim: you can't magically convert a class to a service
- Behavior depends on the boundaries that callers cross.
- Trusted? Multiplexed? Partial failure? Namespaces?
- Shouldn't TRY to be transparent
- Instead make it easier to state boundary assumptions (and check them)
32 Annotated Interfaces
- IDL can annotate interfaces
- Timeout => client may not respond
- Trusted => no malicious clients
- Malicious => client may be hostile
- Multiplexed => many simultaneous callers
- Idempotent
- Etc.
- Could be checked statically in some cases, dynamically in others
- Annotations + Boundaries => fewer bugs
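As a rough illustration of annotations with a dynamic check, the sketch below attaches boundary assumptions to functions using a decorator. It is not an IDL; `annotate`, `check_retry_allowed`, and the two example functions are hypothetical.

```python
def annotate(**assumptions):
    """Attach boundary assumptions to a function as metadata."""
    def wrap(func):
        func.boundary = dict(getattr(func, "boundary", {}), **assumptions)
        return func
    return wrap


def check_retry_allowed(func):
    """Dynamic check: only calls annotated idempotent may be retried blindly."""
    if not getattr(func, "boundary", {}).get("idempotent", False):
        raise TypeError(f"{func.__name__} is not annotated idempotent")


@annotate(idempotent=True, trusted=False)
def lookup(key):
    return key.upper()


@annotate(trusted=False)
def charge(amount):
    return amount
```

A retry wrapper would call `check_retry_allowed` before resending: it passes for `lookup` and raises for `charge`, turning an implicit assumption about the boundary into a checked one.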
33 Lessons for Applications
- Make boundaries very explicit
- Not just client/server
- Independent systems
- Third-party software
- Third-party services (RPC to vendors/partners)
- Have a few big modules
- Otherwise too many boundaries, and no invariants
- Examples: Apache, Oracle, Inktomi search engine
- Big Modules are well supported
- Big Modules justify their cost (Lampson)
34 Partial checklist
- What is shared? (namespace, schema?)
- What kind of state in each boundary?
- How would you evolve an API?
- Lifetime of references? Expiration impact?
- Graceful degradation as modules go down?
- External persistent names?
- Consistency semantics and boundary?
35 Conclusions
- Most systems are fragile
- Root causes
- False transparency: assuming locality, trust, privacy
- Implicitly changing boundaries: consistency, partial failure, ...
- Some of the causes
- focus on computation, not data
- ignoring location distinctions
- overestimating consistency boundary
- degraded boundaries (RPC is not a PC)
- Invariant Boundaries
- Help understanding, documentation
- Simplify detection
- Simpler and easier than full specifications (but weaker)
36 ACID vs. BASE
- DBMS research is about ACID (mostly)
- But we forfeit C and I for availability, graceful degradation, and performance
- This tradeoff is fundamental.
- BASE
- Basically Available
- Soft-state
- Eventual consistency