Spanner - Sandeep Polisetty Journal

Spanner

Key contributions:

Multi versioned data used for lock free reads
True time used for snapshoting

Consistency semantics:

Externally consistent(If t1 commits before t2 => TS(t1) < TS(t2) t1 is committed only after t1.after() is true on all machines So a request which originates after this commit on any machine has a timestamp greater than previous operation
All operations have timestamp.
Lock free reads
snapshots

Engineering details

Granularity of data = directory (All data in a directory has the same replication configuration)
Group of directory -> Spanner Tablet Tablet maintains Key of < id,timestamp> Paxos instance for every tablet
Span server contains tablets.
Location proxies contain map<id,span server>
Zones are units of physical isolation contain<Location servers,Span servers>
Placement driver to move around data.
Provides a data model for a relational databse

TrueTime

Every datacenter has atomic clocks and GPS clocks.
Machines maintain time by contacting a subset of this.
TT.now() -> [earliest,latest]
TT.after(t) -> if t is definately passed
TT.before(t) -> if t is definately not passed

Concurrency

Implements long lived leaders with leases. Lease extensions are piggybacked on write msgs
INVARIANT: The leaders lease interval is disjoint. group members do not vote for new leaders until lease_interval.after()
Transactions are timestamped.
Co-ordinator uses 2 phase commit . Chooses a timestamp greater than all the timestamps received in the prepare phase
COMMIT_WAIT . co-ordinator choses a timestamp and waits until tt.after() to commit and unlock the transaction. If a client reads a transaction, its current time will be > than timestamp of transaction [Can we save this time if clocks are forwarded to the commited time without waiting.]
Reads are satisfied if the read t_s < t_last_seen_write
Snapshot reads are implemented when if a replica is sufficiently update.

Differences with gigapaxos

Granularity of data.
Extensibility to cases beyond databases

Still working on this ?? Guarantees of gigapaxos vs spanner

Table1<M1,M2,M3> and Table2<M4,M2,M3>. If Ts_1(M2)<Ts_2(M2) and Ts_1(M3)>Ts_2(M3).This can happen in gigapaxos. Without loss of generality let assume that t1 acquired the lock. If none acquired both the transactions are aborted. Prepare -> Ts_1_M1,Ts_1_M2 Commit -> max(Ts_1_M1,Ts_1_M2) if max is Ts_1_M1. The commit message would have Ts_1_M1. Clocks increase therefore Ts_1_M1<Ts_2_M1 But Ts_1_M2<Ts_1_M1 So it is possible for the next commit to lie in between [Ts_1_M2<Ts_1_M1] for some unrelated transaction in table2 as the locks are held only till table2. This is avoided by waiting till Ts_1_M1.after() at which point it is guaranteed that all clocks are > Ts_1_M1. And all their votes there after will have a higher vote.

Ideas related to the problem being solved.

Each leader , a transaction manager is implemented. One of the transaction manager becomes the co-ordinator leader and the group members become slaves while the other transaction managers become participants.The transaction managers follow 2 phase commit.