On Wed, Nov 15, 2023, at 12:02 PM, William Faulk wrote:
> it isn't necessary to keep track of a list of CSNsIf it doesn't keep track of the CSNs, how does it know what data needs to be replicated?That is, imagine replica A, whose latest CSN is 48, talks to replica B, whose latest CSN is 40. Clearly replica A should send some data to replica B. But if it isn't keeping track of what data is associated with CSNs 41 through 48, how does it know what data to send?
I said it doesn't track a _list_. It has the changes originating from each node, including itself, ordered by CSN, in the changelog. It asks peer servers it connects to what CSN they have seen, and sends the difference if any. Basically a reliable, in-order message delivery mechanism.
> by asking the other node for its current ruv> can determine which if any of the changes it has need to be propagated to the peer.In addition, the CSNs are apparently a timestamp and replica ID. So imagine a simple ring topology of replicas, A-B-C-D-E-(A), all in sync. Now imagine simultaneous changes on replicas A and C. C has a new CSN of, say, 100C, and it replicates that to B and D. At the same time, A replicates its new CSN of 100A to B and E. Now E has a new CSN. Is it100A or 101E?
The CSNs have the property of globally order, meaning you can always compare two (e.g. 100A and 101E in your example) and come to a consistent conclusion about which is "after". All servers pick the one that's "after" as the eventual state of the entry (hence: eventually consistent). Note this is in the context of order theory, not the same as the time of day -- you don't have a guarantee that updates are ordered by wall clock time. You might have to look at the code to determine exactly how order is calculated -- it's usually done by comparing the time stamp first then doing a lexical compare on the node id in the case of a tie. Since node ids are unique this provides a consistent global order.
If E's new max CSN is 100A, then when it checks with D, D has a latest CSN of 100C, which is greater than 100A, so the algorithm would seem to imply that there's nothing to replicate and the change that started at A doesn't get replicated to D.
True, but iirc it doesn't work that way -- the code that propagates changes to another server is only concerned with sending changes the other server hasn't seen. It doesn't consider whether any of those changes might be superseded by other changes sent from other servers. At least that's the way it worked last time I was in this code. Might be different now.
If E's max CSN is 101E, then, when D checks in with its 101D, it thinks it doesn't have anything to send. I suppose in this scenario that the data would get there coming from the other direction. But if E's max CSN is 101E, eventually it's going to check in with A, which has a max CSN of 100A, so it would think that it needed to replicate that same data back to A, but it's already there. This is an obvious infinite loop.
No because see above the propagation scheme doesn't consider the vector timestamp (ruv), only the individual per-node timestamps (csn). Once a given change originating at some particular server has arrived at a server, no peer will send it again. You might have a race, but there is locking to handle that.
I'm certain I'm missing something or misunderstanding something, but I don't understand what, and these details are what I'm trying to unravel.
Understood. I've been through the same process many years ago, mainly by debugging/fixing the code and watching packet traces and logs.
No comments:
Post a Comment