Tuesday, July 29, 2025

[389-users] Re: [Extern] Re: Re: Error message "Can't locate CSN in the changelog", although CSN is present in the changelog

Just as a heads-up: I am also experiencing the same behaviour in version
3.1.2.

Kind regards
Julian

On 11.07.25 at 13:09, Julian Kippels via 389-users wrote:
> Hi,
>
> we are experiencing the exact same error on version 2.4.5.
>
> [11/Jul/2025:12:05:59.340776929 +0200] - ERR - agmt="cn=replication-
> agreement-ldap-consumer-1-test" (ldap-consumer-1-test:636) -
> clcache_load_buffer - Can't locate CSN 6870e207000100010000 in the
> changelog (DB rc=-12797). If replication stops, the consumer may need to
> be reinitialized.
> [11/Jul/2025:12:05:59.349374401 +0200] - ERR - agmt="cn=replication-
> agreement-ldap-consumer-2-test" (ldap-consumer-2-test:636) -
> clcache_load_buffer - Can't locate CSN 6870e207000200010000 in the
> changelog (DB rc=-12797). If replication stops, the consumer may need to
> be reinitialized.
>
> I have seen the error message "clcache_load_buffer - Can't load changelog
> buffer starting at CSN ..." once, but that was while I was experimenting
> with nsds5ReplicaIgnoreMissingChange=once. Since that setting produced
> even worse results than leaving the attribute unset, I have reverted to
> not having it set.
>
> Unfortunately, I am not able to run the dbscan command, because we are
> using mdb and not bdb, and "dbscan -D mdb ..." produces the error message
> "Can't initialize db plugin: mdb".
>
> As for reproducing: It seems that it happens when a sufficiently large
> volume of changes is made in a single connection. I have seen it with as
> few as 40 operations per connection.
>
> I hope this helps in debugging.
>
> Kind regards
> Julian Kippels
>
> On 11.07.25 at 11:47, Thierry Bordaz via 389-users wrote:
>> Hi,
>>
>> This is an interesting finding and something we have not seen so far.
>> Did you also see log messages like "clcache_load_buffer - Can't load
>> changelog buffer starting at CSN..."?
>> It is logged when a replication agreement is preparing an iterator and
>> cannot locate the starting point (CSN) in the replication changelog.
>>
>> Did you dump the changelog with dbscan?
>> For a given missing CSN (from the logs), are you able to retrieve it with:
>> dbscan -f /var/lib/dirsrv/slapd-instance/db/<backend_name>/replication_changelog.db -k <csn>
>>
>> I was not able to reproduce this on 2.6.1-6. Did you identify a
>> reproducible test case (size of the DB, number of updates, how long
>> after the topology was set up the errors first appeared...)?
>>
>>
>> best regards
>> thierry
>>
>> On 7/11/25 8:30 AM, Fl Sch via 389-users wrote:
>>> Hello,
>>>
>>> we have recently upgraded our 389-ds setup to version 2.6.1 running
>>> on AlmaLinux 9.6 (installed from the official AlmaLinux appstream
>>> repo). Our upgrade approach was to build a completely new setup,
>>> import all the data and afterwards switch the IP addresses of the old
>>> and new servers.
>>> Our setup consists of 2 suppliers (mdir01 + mdir02) and 2 consumers
>>> (sdir01 + sdir02). The two suppliers have replication agreements
>>> with each other, as well as agreements to both consumers. Our
>>> provisioning system is designed to write changes only to mdir01; it
>>> falls back to mdir02 only if it can't reach mdir01.
>>> Our clients (DHCP servers) use all 4 directory servers.
>>>
>>> Since the upgrade, we have been seeing the following messages in the
>>> error logs of both suppliers:
>>> <code>
>>> [10/Jul/2025:09:58:55.479913820 +0200] - ERR - agmt="cn=agreement-
>>> mdir01-to-sdir02" (10:1389) - clcache_load_buffer - Can't locate CSN
>>> 686f72bf00050f4b0000 in the changelog (DB rc=-12797). If replication
>>> stops, the consumer may need to be reinitialized.
>>> [10/Jul/2025:09:58:55.484629809 +0200] - ERR - agmt="cn=agreement-
>>> mdir01-to-mdir02" (10:1389) - clcache_load_buffer - Can't locate CSN
>>> 686f72bf00050f4b0000 in the changelog (DB rc=-12797). If replication
>>> stops, the consumer may need to be reinitialized.
>>> [10/Jul/2025:09:58:55.484868342 +0200] - ERR - agmt="cn=agreement-
>>> mdir01-to-sdir01" (10:1389) - clcache_load_buffer - Can't locate CSN
>>> 686f72bf00050f4b0000 in the changelog (DB rc=-12797). If replication
>>> stops, the consumer may need to be reinitialized.
>>> [10/Jul/2025:10:01:07.738009372 +0200] - ERR - agmt="cn=agreement-
>>> mdir01-to-sdir02" (10:1389) - clcache_load_buffer - Can't locate CSN
>>> 686f734300000f4b0000 in the changelog (DB rc=-12797). If replication
>>> stops, the consumer may need to be reinitialized.
>>> [10/Jul/2025:10:01:07.741023198 +0200] - ERR - agmt="cn=agreement-
>>> mdir01-to-mdir02" (10:1389) - clcache_load_buffer - Can't locate CSN
>>> 686f734300000f4b0000 in the changelog (DB rc=-12797). If replication
>>> stops, the consumer may need to be reinitialized.
>>>
>>> [09/Jul/2025:12:18:14.429187195 +0200] - ERR - agmt="cn=agreement-
>>> mdir02-to-sdir01" (10:1389) - clcache_load_buffer - Can't locate CSN
>>> 686eb26600000f4b0000 in the changelog (DB rc=-12797). If replication
>>> stops, the consumer may need to be reinitialized.
>>> [09/Jul/2025:12:18:14.430628311 +0200] - ERR - agmt="cn=agreement-
>>> mdir02-to-sdir02" (10:1389) - clcache_load_buffer - Can't locate CSN
>>> 686eb26600000f4b0000 in the changelog (DB rc=-12797). If replication
>>> stops, the consumer may need to be reinitialized.
>>> [09/Jul/2025:12:18:16.909625172 +0200] - ERR - agmt="cn=agreement-
>>> mdir02-to-sdir01" (10:1389) - clcache_load_buffer - Can't locate CSN
>>> 686eb26800000f4b0000 in the changelog (DB rc=-12797). If replication
>>> stops, the consumer may need to be reinitialized.
>>> [09/Jul/2025:12:18:16.913147068 +0200] - ERR - agmt="cn=agreement-
>>> mdir02-to-sdir02" (10:1389) - clcache_load_buffer - Can't locate CSN
>>> 686eb26800000f4b0000 in the changelog (DB rc=-12797). If replication
>>> stops, the consumer may need to be reinitialized.
>>> [09/Jul/2025:12:42:30.255121122 +0200] - ERR - agmt="cn=agreement-
>>> mdir02-to-sdir01" (10:1389) - clcache_load_buffer - Can't locate CSN
>>> 686f60d500010f4b0000 in the changelog (DB rc=-12797). If replication
>>> stops, the consumer may need to be reinitialized.
>>> </code>
>>>
>>> On mdir01 these messages appear on average every 5 minutes during
>>> peak hours.
>>> On mdir02 they appear much less frequently, on average every 15-20 minutes.
>>>
>>> However, looking through the changelog I can find all the CSNs which
>>> it apparently can't locate:
>>> <code>
>>> changetype: delete
>>> replgen: 62b5bf320000010f0000
>>> csn: 686f72bf00050f4b0000
>>> nsuniqueid: da0de304-593411f0-aef0dc83-b57f4cfd
>>> dn:
>>> ClientIdentifier=00:00:00:00:d9:05,ou=dhcpldap,o=customer,dc=domain,dc=net
>>>
>>> changetype: delete
>>> replgen: 62b5bf320000010f0000
>>> csn: 686f734300000f4b0000
>>> nsuniqueid: 4c315e08-5d6111f0-aef0dc83-b57f4cfd
>>> dn:
>>> ClientIdentifier=00:00:00:00:c6:f4,ou=dhcpldap,o=customer,dc=domain,dc=net
>>>
>>> changetype: delete
>>> replgen: 62b5bf320000010f0000
>>> csn: 686eb26600000f4b0000
>>> nsuniqueid: 54f4e05b-04a311f0-9233a0e8-dc56aea6
>>> dn:
>>> ClientIdentifier=00:00:00:00:26:99,ou=dhcpldap,o=customer,dc=domain,dc=net
>>>
>>> changetype: delete
>>> replgen: 62b5bf320000010f0000
>>> csn: 686eb26800000f4b0000
>>> nsuniqueid: 463ba84b-0af711f0-a597a0e8-dc56aea6
>>> dn:
>>> ClientIdentifier=00:00:00:00:f7:a0,ou=dhcpldap,o=customer,dc=domain,dc=net
>>>
>>> changetype: add
>>> replgen: 62b5bf320000010f0000
>>> csn: 686f60d500010f4b0000
>>> nsuniqueid: e9d45f81-5d5811f0-aef0dc83-b57f4cfd
>>> parentuniqueid: 6167af03-f3c311ec-862ceac3-35201d04
>>> dn:
>>> ClientIdentifier=00:00:00:00:16:68,ou=dhcpldap,o=customer,dc=domain,dc=net
>>> change:: ...
>>> </code>
>>>
>>> All the changes are propagated to all directory servers in the
>>> cluster, so there is no real problem visible.
>>> In general, we have not seen any problems with replication
>>> whatsoever; we just have these seemingly "false" messages in the
>>> error log.
>>>
>>> Changelog trim is currently set to the following values:
>>> <code>
>>> nsslapd-changelogmaxage: 30d
>>> nsslapd-changelogtrim-interval: 3600
>>> </code>
>>>
>>> Does anybody know why these error messages appear, and whether / how
>>> we can get rid of them?
>>> I just want to make sure that there is really no underlying issue
>>> somewhere. And if those messages really are false alarms, I would like
>>> to get rid of them if possible, to avoid confusion and to stop
>>> spamming the error logs.
>>>
>>>
>>> Thank you very much in advance.
>>
>
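
A note on reading the CSNs quoted in this thread: as far as I know, 389-ds
renders a CSN as 20 hex digits, namely an 8-digit Unix timestamp followed by
a 4-digit sequence number, a 4-digit replica ID and a 4-digit sub-sequence
number. Below is a minimal Python sketch under that assumption. The names
decode_csn and CSN are made up for illustration and are not part of any
389-ds tooling.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class CSN(NamedTuple):
    """Decoded fields of a CSN string (illustrative type, not a 389-ds API)."""
    timestamp: datetime   # creation time of the change, in UTC
    seqnum: int           # sequence number within that second
    replica_id: int       # ID of the replica that originated the change
    subseqnum: int        # sub-sequence number


def decode_csn(csn: str) -> CSN:
    """Split a 20-hex-digit CSN into its four fields.

    Assumed layout: 8 hex digits timestamp, then 4 each for
    seqnum, replica ID and subseqnum.
    """
    if len(csn) != 20:
        raise ValueError(f"expected 20 hex digits, got {len(csn)}: {csn!r}")
    # int(..., 16) raises ValueError if a field is not valid hex.
    seconds = int(csn[0:8], 16)
    return CSN(
        timestamp=datetime.fromtimestamp(seconds, tz=timezone.utc),
        seqnum=int(csn[8:12], 16),
        replica_id=int(csn[12:16], 16),
        subseqnum=int(csn[16:20], 16),
    )


if __name__ == "__main__":
    # CSNs taken from the error messages quoted in this thread.
    for value in ("6870e207000100010000", "686f72bf00050f4b0000"):
        print(value, decode_csn(value))
```

Applied to 6870e207000100010000 from Julian's log, the timestamp decodes to
2025-07-11 10:05:59 UTC, i.e. 12:05:59 +0200, which is the same second as the
error line that reports it.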
