Spanner
Spanner
Key contributions:
- Multi versioned data used for lock free reads
- True time used for snapshoting
Consistency semantics:
- Externally consistent(If t1 commits before t2 => TS(t1) < TS(t2) t1 is committed only after t1.after() is true on all machines So a request which originates after this commit on any machine has a timestamp greater than previous operation
- All operations have timestamp.
- Lock free reads
- snapshots
Engineering details
- Granularity of data = directory (All data in a directory has the same replication configuration)
- Group of directory -> Spanner Tablet Tablet maintains Key of < id,timestamp> Paxos instance for every tablet
- Span server contains tablets.
- Location proxies contain map<id,span server>
- Zones are units of physical isolation contain<Location servers,Span servers>
- Placement driver to move around data.
- Provides a data model for a relational databse
TrueTime
- Every datacenter has atomic clocks and GPS clocks.
- Machines maintain time by contacting a subset of this.
- TT.now() -> [earliest,latest]
- TT.after(t) -> if t is definately passed
- TT.before(t) -> if t is definately not passed
Concurrency
- Implements long lived leaders with leases. Lease extensions are piggybacked on write msgs
- INVARIANT: The leaders lease interval is disjoint. group members do not vote for new leaders until lease_interval.after()
- Transactions are timestamped.
- Co-ordinator uses 2 phase commit . Chooses a timestamp greater than all the timestamps received in the prepare phase
- COMMIT_WAIT . co-ordinator choses a timestamp and waits until tt.after() to commit and unlock the transaction. If a client reads a transaction, its current time will be > than timestamp of transaction [Can we save this time if clocks are forwarded to the commited time without waiting.]
- Reads are satisfied if the read t_s < t_last_seen_write
- Snapshot reads are implemented when if a replica is sufficiently update.
Differences with gigapaxos
- Granularity of data.
- Extensibility to cases beyond databases
Still working on this ?? Guarantees of gigapaxos vs spanner
Table1<M1,M2,M3> and Table2<M4,M2,M3>. If Ts_1(M2)<Ts_2(M2) and Ts_1(M3)>Ts_2(M3).This can happen in gigapaxos. Without loss of generality let assume that t1 acquired the lock. If none acquired both the transactions are aborted. Prepare -> Ts_1_M1,Ts_1_M2 Commit -> max(Ts_1_M1,Ts_1_M2) if max is Ts_1_M1. The commit message would have Ts_1_M1. Clocks increase therefore Ts_1_M1<Ts_2_M1 But Ts_1_M2<Ts_1_M1 So it is possible for the next commit to lie in between [Ts_1_M2<Ts_1_M1] for some unrelated transaction in table2 as the locks are held only till table2. This is avoided by waiting till Ts_1_M1.after() at which point it is guaranteed that all clocks are > Ts_1_M1. And all their votes there after will have a higher vote.
Ideas related to the problem being solved.
- Each leader , a transaction manager is implemented. One of the transaction manager becomes the co-ordinator leader and the group members become slaves while the other transaction managers become participants.The transaction managers follow 2 phase commit.
Til next time,
Sandeep Polisetty
at 00:00