Thursday, February 29, 2024

[389-users] Re: Determining max CSN of running server

Hi William, 

> I don't think it's going backwards. What I'm trying to rule out is that the replica is failing to advance its max CSN in the RUV being used to compare.

Since you see CSNs that are 4 months newer than the max CSN in the RUV, I think that your suspicion is right: the RUV is no longer being updated.
FYI: There is a list of pending operations that ensures the RUV is not advanced while an older operation has not yet completed. I suspect that you hit a bug in the handling of this list. I remember that we fixed something in that area a few years ago ...
As the list is held in memory, I think that simply restarting the server may fix the issue ...
The RUV should then be updated after the next change, and the old changes should get replicated (unless the changelog gets discarded on restart, in which case you will have to reinitialize the other replica ...)


On Thu, Feb 29, 2024 at 5:12 AM William Faulk <> wrote:
> Might be worth re-reading

Well, I still don't really know the details of the replication process.

I have deduced that changes originated on a replica seem to prompt that replica to start a replication process with its peers, but I don't really know what happens then. There's a comparison of the RUVs of the two replicas, but does the initiating system send its RUV to the receiver, or does it go the other way, or do both happen? Does the comparison prompt the comparing system to send the changes it thinks the other system needs, or does it cause the comparing system to request new changes from the other? Maybe none of this really makes much difference, but the lack of technical detail around this makes me just question everything.

> It doesn't send a single CSN, the replication compares the RUVs and determines the
> range of CSNs that are missing from the consumer.

Sure, but notionally any changes that originated on that replica would be reflected in the max CSN for itself in the RUV that is used to compare. And at least one side is sending its RUV to the other during the replication process.
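
To make the "range of CSNs that are missing" idea concrete, here is a minimal Python sketch. It is an illustration only, not the server's actual replication code: the function name `missing_ranges` and the sample CSN values are made up, and each RUV is reduced to a plain mapping of replica ID to max CSN. It assumes CSNs are fixed-width hex strings in a consistent case, so that string comparison matches CSN order.

```python
def missing_ranges(supplier_ruv, consumer_ruv):
    """Return {replica_id: (consumer_max, supplier_max)} for every replica ID
    where the supplier has seen a newer CSN than the consumer.

    Both arguments are hypothetical simplified RUVs: {replica_id: max_csn}.
    """
    ranges = {}
    for rid, supplier_max in supplier_ruv.items():
        consumer_max = consumer_ruv.get(rid)  # None: consumer has no CSN for this ID
        # Fixed-width hex strings in the same case sort in CSN order.
        if consumer_max is None or consumer_max < supplier_max:
            ranges[rid] = (consumer_max, supplier_max)
    return ranges


# Made-up example values: replica 1 is months ahead on the supplier side.
supplier = {1: "65e0a1b2000000010000", 2: "65d00000000000020000"}
consumer = {1: "653f0000000000010000", 2: "65d00000000000020000"}
print(missing_ranges(supplier, consumer))
```

If the supplier's own max CSN in its RUV is stuck at an old value, a comparison like this finds nothing to send, even though newer changes exist in its log.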

> It's also not immediate. Between the server accepting a change (add, mod etc), the
> change is associated to a CSN. But then there may be a delay before the two nodes actually
> communicate and exchange data.

Sure, but the changes originated on this replica haven't made it to other replicas in weeks. This isn't a mere delay in replication.

> Generally you'd need replication logging (errorloglevel 8192). But it's very noisy
> and can be hard to read. What you need to see is the ranges that they agree to send.

Okay. I've done that and haven't had a chance to pore through them yet.

> Also remember CSN's are a monotonic lamport clock. This means they only ever advance
> and can never step backwards. So they have some different properties to what you may
> expect. If they ever go backwards I think the replication handler throws a pretty nasty
> error.

I don't think it's going backwards. What I'm trying to rule out is that the replica is failing to advance its max CSN in the RUV being used to compare.
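
One way to check the "only ever advance" property from outside the server is to decode a series of observed CSNs and look for a step backwards. The sketch below assumes the usual 20-character hex CSN layout (8 hex digits of timestamp, then 4 each for sequence number, replica ID, and sub-sequence number). The names `CSN`, `parse_csn`, and `first_regression` are hypothetical helpers, not part of any 389 DS tool.

```python
from typing import NamedTuple


class CSN(NamedTuple):
    timestamp: int   # seconds since the Unix epoch
    seqnum: int      # orders changes made within the same second
    replica_id: int  # ID of the replica where the change originated
    subseqnum: int


def parse_csn(text):
    """Split a 20-character hex CSN string into its four fields."""
    if len(text) != 20:
        raise ValueError(f"expected 20 hex characters, got {len(text)}")
    return CSN(int(text[0:8], 16), int(text[8:12], 16),
               int(text[12:16], 16), int(text[16:20], 16))


def first_regression(csns):
    """Return the index of the first CSN that does not advance past its
    predecessor, or None if the whole sequence is strictly increasing."""
    parsed = [parse_csn(c) for c in csns]
    for i in range(1, len(parsed)):
        # Tuples compare field by field: timestamp first, then seqnum, and so on.
        if parsed[i] <= parsed[i - 1]:
            return i
    return None
```

Feeding it the max CSN for one replica ID, sampled over time, distinguishes "went backwards" (an index is returned because a value dropped) from "stopped advancing" (an index is returned because a value repeated).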

> I *think* so. It's been a while since I had to look. The nsds50ruv shows the ruv of
> the server, and I think the other replica entries are "what the peers ruv was last
> time".

Well, it's at least nice to hear that my guess isn't asinine. :)

> replication monitoring code in newer versions does this for you, so I'd probably
> advise you attempt to upgrade your environment. 1.3 is really old at this point

I've been trying to get the current environment stable enough that I feel comfortable going through the relatively lengthy upgrade process. I think I'm going to have to adjust my comfort level.

> I'm not sure if even RH or SUSE still support that version anymore).

Red Hat does, as it's what's in RHEL 7.9, which is supported for another, uh, 4 months. They're working on this with me. I'm still just trying to understand the system better so that I can try to be productive while I'm waiting on them to come up with ideas.

> The problem here is that to read the RUV's and then compare them, you need to read
> each RUV from each server and then check if they are advancing (not that they are equal).

The problem is that the changes in my environment are few enough that all the replicas' RUVs _are_ equal the majority of the time. I'm not in front of that system as I respond right now, so my details might be wrong, but I'm asking about all of this because every RUV I see in all of the replicas is the same, and it shows a max CSN for this one replica that's much older than the CSNs I see it reference in the logs about changes originating on the replica. The CSNs I see in the logs when a new change is made are referencing the current time in them, while the max CSN I see in the RUVs is from 4 months ago.
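
The gap described here can be measured directly, since the first 8 hex digits of a CSN are a timestamp in seconds since the Unix epoch. The following sketch assumes that layout; `csn_time` and `ruv_lag` are hypothetical helper names, and the two CSN values are made up to resemble the situation (a log CSN from late February 2024 against a RUV max CSN from about four months earlier).

```python
from datetime import datetime, timezone


def csn_time(csn):
    """Decode the leading 8 hex digits of a CSN as a UTC timestamp."""
    return datetime.fromtimestamp(int(csn[:8], 16), tz=timezone.utc)


def ruv_lag(ruv_max_csn, logged_csn):
    """How far the RUV's max CSN trails a CSN seen in the logs."""
    return csn_time(logged_csn) - csn_time(ruv_max_csn)


# Made-up values for the same replica ID (0001).
lag = ruv_lag("653f0000000000010000", "65e0a1b2000000010000")
print(f"RUV max CSN trails the logged CSN by {lag.days} days")
```

A lag of a few seconds or minutes is ordinary; a lag of months for the replica's own ID suggests the RUV has stopped being updated rather than that replication is merely slow.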

Maybe it *did* go backwards somehow and that's why it's not working. Not that that would really help me understand what actually went wrong any better than I do now.

> If you want to assert that "Some change I made at CSN X is on all servers" then
> you would need to read and parse the ruv and ensure that all of them are at or past that
> CSN for that replica id.

Well, you'd think so. I've got that problem, too, where some CSNs just seem to get missed, but the max CSN in the RUV is well past that. But that's a different problem and not the one I'm working on now.
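
The check suggested in the quoted text could be sketched as below. It assumes the `nsds50ruv` values have already been read from each server (for example with `ldapsearch`) and look like `{replica <id> <url>} <min CSN> <max CSN>`; the server names, values, and the helpers `max_csns` and `servers_behind` are hypothetical. As noted above, a max CSN at or past X only shows that the server has seen something at least that new from that replica ID; it does not prove that change X itself was applied.

```python
import re

# Matches "{replica 1 ldap://host:389} <mincsn> <maxcsn>"; the CSN pair is
# optional because a replica with no changes yet may list none.
RUV_ELEMENT = re.compile(
    r"^\{replica (\d+)(?: [^}]*)?\}(?: ([0-9a-fA-F]{20}) ([0-9a-fA-F]{20}))?")


def max_csns(ruv_values):
    """Map replica ID -> max CSN from a list of nsds50ruv attribute values."""
    result = {}
    for value in ruv_values:
        m = RUV_ELEMENT.match(value)
        if m and m.group(3):  # skips {replicageneration} and CSN-less elements
            result[int(m.group(1))] = m.group(3).lower()
    return result


def servers_behind(csn, ruvs_by_server):
    """Return the servers whose RUV has not reached `csn` for its replica ID."""
    csn = csn.lower()
    rid = int(csn[12:16], 16)  # replica ID field of the CSN
    behind = []
    for server, values in ruvs_by_server.items():
        seen = max_csns(values).get(rid)
        if seen is None or seen < csn:
            behind.append(server)
    return behind


# Made-up RUVs from two hypothetical servers.
ruvs = {
    "a.example.com": ["{replicageneration} 5f000000000000010000",
                      "{replica 1 ldap://a.example.com:389} "
                      "5f000000000000010000 65e0a1b2000000010000"],
    "b.example.com": ["{replicageneration} 5f000000000000010000",
                      "{replica 1 ldap://a.example.com:389} "
                      "5f000000000000010000 653f0000000000010000"],
}
print(servers_behind("65e0a1b2000000010000", ruvs))
```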

Thanks for the input.

William Faulk


389 Directory Server Development Team
