18 Oct 2017

Spanner

Key contributions:

  1. Multi-versioned data used for lock-free reads
  2. TrueTime used for snapshotting

Consistency semantics:

  1. Externally consistent: if t1 commits before t2 starts, then TS(t1) < TS(t2). t1 is reported committed only after TT.after(TS(t1)) is true on all machines, so a request that originates after this commit on any machine gets a timestamp greater than the previous operation's.
  2. All operations have timestamps.
  3. Lock-free reads (see the sketch after this list).
  4. Snapshot reads.
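
A minimal sketch of how multi-versioned data enables lock-free snapshot reads. This is my own illustration in Python, not Spanner's storage code; names like VersionedCell are invented:

    import bisect

    class VersionedCell:
        # Writers append new (timestamp, value) versions instead of
        # overwriting, so readers never need to take a lock.
        def __init__(self):
            self.timestamps = []  # kept sorted
            self.values = []

        def write(self, ts, value):
            i = bisect.bisect_right(self.timestamps, ts)
            self.timestamps.insert(i, ts)
            self.values.insert(i, value)

        def read(self, ts):
            # Snapshot read: latest version with timestamp <= ts.
            i = bisect.bisect_right(self.timestamps, ts)
            return self.values[i - 1] if i else None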

Engineering details

  1. Granularity of data = directory (all data in a directory has the same replication configuration).
  2. A group of directories -> a Spanner tablet. A tablet maintains mappings keyed by <id, timestamp>, and there is a Paxos instance for every tablet.
  3. A spanserver serves many tablets.
  4. Location proxies maintain a map<id, spanserver> that clients use to locate data.
  5. Zones are the units of physical isolation and contain <location proxies, spanservers>.
  6. A placement driver moves data around between zones.
  7. Provides a data model for a relational database. (A toy model of this layout follows the list.)
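
A toy model of the placement hierarchy above; the dataclass names and fields mirror these notes, not Spanner's actual code:

    from dataclasses import dataclass, field
    from typing import Dict, List

    @dataclass
    class Directory:
        # Unit of data placement; everything inside shares one replication config.
        name: str
        replication_config: str

    @dataclass
    class Tablet:
        # Groups directories; each tablet gets its own Paxos instance.
        directories: List[Directory] = field(default_factory=list)

    @dataclass
    class SpanServer:
        # A spanserver serves many tablets.
        tablets: List[Tablet] = field(default_factory=list)

    # Location proxies: how clients find the spanserver for a given id.
    location_proxy: Dict[str, SpanServer] = {}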

TrueTime

  1. Every datacenter has atomic clocks and GPS clocks.
  2. Machines maintain time by contacting a subset of these clock masters.
  3. TT.now() -> [earliest, latest]
  4. TT.after(t) -> true if t has definitely passed
  5. TT.before(t) -> true if t has definitely not arrived (see the sketch after this list)
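
A sketch of the TrueTime API as a pair of helper functions; EPSILON is an assumed fixed uncertainty bound, whereas the real system measures it dynamically:

    import time

    EPSILON = 0.007  # assumed clock uncertainty in seconds (illustrative value)

    def tt_now():
        # Interval [earliest, latest] guaranteed to contain absolute time.
        t = time.time()
        return (t - EPSILON, t + EPSILON)

    def tt_after(t):
        # True only if t has definitely passed.
        return tt_now()[0] > t

    def tt_before(t):
        # True only if t has definitely not arrived.
        return tt_now()[1] < t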

Concurrency

  1. Implements long-lived leaders with leases. Lease extensions are piggybacked on write messages.
  2. INVARIANT: the leaders' lease intervals are disjoint; group members do not vote for a new leader until lease_interval.after() holds.
  3. Transactions are timestamped.
  4. The coordinator uses two-phase commit and chooses a timestamp greater than all the timestamps received in the prepare phase.
  5. COMMIT_WAIT: the coordinator chooses a timestamp s and waits until TT.after(s) to commit and release the transaction's locks. If a client then reads the transaction, its current time will be greater than the transaction's timestamp. [Can we save this wait if clocks are fast-forwarded to the committed time instead of waiting?]
  6. A read at timestamp t_read is satisfied by a replica if t_read is below the replica's safe time, derived from the last seen write.
  7. Snapshot reads are served by any replica that is sufficiently up to date. (A sketch of the coordinator's commit path follows the list.)
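
A sketch of the coordinator's timestamp choice plus COMMIT_WAIT, reusing the tt_now/tt_after helpers from the TrueTime sketch above; the polling loop is my simplification:

    def coordinator_commit(prepare_timestamps):
        # Pick s >= every prepare timestamp and >= TT.now().latest.
        s = max(max(prepare_timestamps), tt_now()[1])
        # COMMIT_WAIT: hold locks until s has definitely passed, so any
        # transaction that starts after this commit sees a larger timestamp.
        while not tt_after(s):
            time.sleep(0.001)
        return s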

Differences from GigaPaxos

  1. Granularity of data.
  2. Extensibility to cases beyond databases.

Still working on this: guarantees of GigaPaxos vs. Spanner

Consider Table1 replicated on <M1, M2, M3> and Table2 on <M4, M2, M3>, and suppose Ts_1(M2) < Ts_2(M2) while Ts_1(M3) > Ts_2(M3); this can happen in GigaPaxos. Without loss of generality, assume t1 acquired the lock (if neither acquired it, both transactions are aborted). Prepare -> Ts_1(M1), Ts_1(M2); commit -> max(Ts_1(M1), Ts_1(M2)). Say the max is Ts_1(M1), so the commit message carries Ts_1(M1). Clocks only increase, therefore Ts_1(M1) < Ts_2(M1). But Ts_1(M2) < Ts_1(M1), so the commit of some unrelated transaction on Table2 can lie in the interval (Ts_1(M2), Ts_1(M1)), since t1's locks do not extend to Table2. This is avoided by waiting until Ts_1(M1).after(), at which point it is guaranteed that all clocks exceed Ts_1(M1) and every timestamp voted thereafter is higher.
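
A toy numeric walk-through of the scenario above, with invented clock values:

    # t1 prepares on M1 and M2.
    ts1_m1, ts1_m2 = 10, 7
    commit_ts_t1 = max(ts1_m1, ts1_m2)  # = 10, carried in the commit message

    # Without commit wait: M2 releases t1's locks while its clock still
    # reads 8, so an unrelated Table2 transaction going through M2 can
    # commit at 8, inside (7, 10) -- an apparent ordering inversion.
    commit_ts_unrelated_bad = 8

    # With commit wait: t1 holds its locks until TT.after(10), so every
    # clock already exceeds 10 and any later vote is strictly greater.
    commit_ts_unrelated_good = 11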

Ideas related to the problem being solved.

  1. On each Paxos leader, a transaction manager is implemented. One transaction manager becomes the coordinator leader, its group members become slaves, and the other transaction managers become participants. The transaction managers follow two-phase commit (see the sketch after this list).
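
A sketch of that flow, assuming participant objects with prepare()/commit() methods (placeholders, not real Spanner or GigaPaxos APIs) and the coordinator_commit function sketched under Concurrency:

    def run_transaction(participants):
        # Phase 1: each participant leader acquires locks and logs a
        # prepare record through its Paxos group, returning a timestamp.
        prepare_ts = [p.prepare() for p in participants]
        # Phase 2: the coordinator leader picks the commit timestamp
        # (including the commit wait) and broadcasts it.
        s = coordinator_commit(prepare_ts)
        for p in participants:
            p.commit(s)
        return s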

Til next time,
Sandeep Polisetty at 00:00
