Friday, July 11, 2025

[389-users] Re: [Extern] Re: Error message "Can't locate CSN in the changelog", although CSN is present in the changelog

Hi,

We are experiencing the exact same error on version 2.4.5.

[11/Jul/2025:12:05:59.340776929 +0200] - ERR -
agmt="cn=replication-agreement-ldap-consumer-1-test"
(ldap-consumer-1-test:636) - clcache_load_buffer - Can't locate CSN
6870e207000100010000 in the changelog (DB rc=-12797). If replication
stops, the consumer may need to be reinitialized.
[11/Jul/2025:12:05:59.349374401 +0200] - ERR -
agmt="cn=replication-agreement-ldap-consumer-2-test"
(ldap-consumer-2-test:636) - clcache_load_buffer - Can't locate CSN
6870e207000200010000 in the changelog (DB rc=-12797). If replication
stops, the consumer may need to be reinitialized.
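
For anyone comparing these CSNs against the log timestamps, here is a minimal sketch (not taken from the 389-ds tooling) that splits a CSN into its parts. It assumes the usual 389-ds layout of 20 hex digits: 8 for the Unix timestamp, 4 for the sequence number, 4 for the replica ID and 4 for the sub-sequence number. The function name parse_csn is made up for illustration.

```python
import string
from datetime import datetime, timezone


def parse_csn(csn: str) -> dict:
    """Split a 20-hex-digit CSN into its four fields.

    Assumed layout: 8 hex digits timestamp, 4 sequence number,
    4 replica ID, 4 sub-sequence number.
    """
    if len(csn) != 20 or any(c not in string.hexdigits for c in csn):
        raise ValueError(f"not a 20-hex-digit CSN: {csn!r}")
    return {
        # Seconds since the Unix epoch, converted to an aware UTC datetime.
        "timestamp": datetime.fromtimestamp(int(csn[0:8], 16), tz=timezone.utc),
        "seqnum": int(csn[8:12], 16),
        "replica_id": int(csn[12:16], 16),
        "subseqnum": int(csn[16:20], 16),
    }


if __name__ == "__main__":
    # The two CSNs from the error messages above.
    for csn in ("6870e207000100010000", "6870e207000200010000"):
        print(csn, parse_csn(csn))
```

Under that assumption, both CSNs above decode to 10:05:59 UTC on 11 July 2025 (12:05:59 +0200, the second at which the errors were logged) and replica ID 1, with sequence numbers 1 and 2.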

I have seen the error message "clcache_load_buffer - Can't load changelog
buffer starting at CSN ..." once, but that was while I was experimenting
with nsds5ReplicaIgnoreMissingChange=once. Since that produced even worse
results than leaving the attribute unset, I reverted to not setting it.

Unfortunately I am not able to run the dbscan command, because we are
using mdb and not bdb and "dbscan -D mdb ..." produces the error message
"Can't initialize db plugin: mdb"

As for reproducing: it seems to happen when a sufficiently large
volume of changes is made in a single connection. I have seen it with as
few as 40 operations per connection.

I hope this helps in debugging.

Kind regards
Julian Kippels

On 11.07.25 at 11:47, Thierry Bordaz via 389-users wrote:
> Hi,
>
> This is an interesting finding and something we have not seen so far.
> Did you also see any log messages like "clcache_load_buffer - Can't load
> changelog buffer starting at CSN..."?
> It is logged when a replication agreement is preparing an iterator and
> can not locate the starting point (csn) in the replication changelog.
>
> Did you dump the changelog with dbscan?
> For a given missing CSN (from the logs), are you able to retrieve it with:
> dbscan -f /var/lib/dirsrv/slapd-instance/db/<backend_name>/replication_changelog.db -k <csn>
>
> I was not able to reproduce it on 2.6.1-6. Did you identify a reproducible
> test case (size of the DB, number of updates, how long after the topology
> was set up it started to happen...)?
>
>
> best regards
> thierry
>
> On 7/11/25 8:30 AM, Fl Sch via 389-users wrote:
>> Hello,
>>
>> we have recently upgraded our 389-ds setup to version 2.6.1 running on
>> AlmaLinux 9.6 (installed from the official AlmaLinux appstream repo).
>> Our upgrade approach was to build a completely new setup, import all
>> the data and afterwards switch the IP addresses of the old and new
>> servers.
>> Our setup consists of 2 suppliers (mdir01 + mdir02) and 2 consumers
>> (sdir01 + sdir02). The two suppliers have replication agreements with
>> each other, as well as agreements to both consumers. Our
>> provisioning system is designed to only write changes to mdir01, it
>> just uses mdir02 in case it can't reach mdir01.
>> Our clients (DHCP servers) use all 4 directory servers.
>>
>> Since the upgrade, we have been seeing the following messages in the
>> error logs of both suppliers:
>> <code>
>> [10/Jul/2025:09:58:55.479913820 +0200] - ERR - agmt="cn=agreement-
>> mdir01-to-sdir02" (10:1389) - clcache_load_buffer - Can't locate CSN
>> 686f72bf00050f4b0000 in the changelog (DB rc=-12797). If replication
>> stops, the consumer may need to be reinitialized.
>> [10/Jul/2025:09:58:55.484629809 +0200] - ERR - agmt="cn=agreement-
>> mdir01-to-mdir02" (10:1389) - clcache_load_buffer - Can't locate CSN
>> 686f72bf00050f4b0000 in the changelog (DB rc=-12797). If replication
>> stops, the consumer may need to be reinitialized.
>> [10/Jul/2025:09:58:55.484868342 +0200] - ERR - agmt="cn=agreement-
>> mdir01-to-sdir01" (10:1389) - clcache_load_buffer - Can't locate CSN
>> 686f72bf00050f4b0000 in the changelog (DB rc=-12797). If replication
>> stops, the consumer may need to be reinitialized.
>> [10/Jul/2025:10:01:07.738009372 +0200] - ERR - agmt="cn=agreement-
>> mdir01-to-sdir02" (10:1389) - clcache_load_buffer - Can't locate CSN
>> 686f734300000f4b0000 in the changelog (DB rc=-12797). If replication
>> stops, the consumer may need to be reinitialized.
>> [10/Jul/2025:10:01:07.741023198 +0200] - ERR - agmt="cn=agreement-
>> mdir01-to-mdir02" (10:1389) - clcache_load_buffer - Can't locate CSN
>> 686f734300000f4b0000 in the changelog (DB rc=-12797). If replication
>> stops, the consumer may need to be reinitialized.
>>
>> [09/Jul/2025:12:18:14.429187195 +0200] - ERR - agmt="cn=agreement-
>> mdir02-to-sdir01" (10:1389) - clcache_load_buffer - Can't locate CSN
>> 686eb26600000f4b0000 in the changelog (DB rc=-12797). If replication
>> stops, the consumer may need to be reinitialized.
>> [09/Jul/2025:12:18:14.430628311 +0200] - ERR - agmt="cn=agreement-
>> mdir02-to-sdir02" (10:1389) - clcache_load_buffer - Can't locate CSN
>> 686eb26600000f4b0000 in the changelog (DB rc=-12797). If replication
>> stops, the consumer may need to be reinitialized.
>> [09/Jul/2025:12:18:16.909625172 +0200] - ERR - agmt="cn=agreement-
>> mdir02-to-sdir01" (10:1389) - clcache_load_buffer - Can't locate CSN
>> 686eb26800000f4b0000 in the changelog (DB rc=-12797). If replication
>> stops, the consumer may need to be reinitialized.
>> [09/Jul/2025:12:18:16.913147068 +0200] - ERR - agmt="cn=agreement-
>> mdir02-to-sdir02" (10:1389) - clcache_load_buffer - Can't locate CSN
>> 686eb26800000f4b0000 in the changelog (DB rc=-12797). If replication
>> stops, the consumer may need to be reinitialized.
>> [09/Jul/2025:12:42:30.255121122 +0200] - ERR - agmt="cn=agreement-
>> mdir02-to-sdir01" (10:1389) - clcache_load_buffer - Can't locate CSN
>> 686f60d500010f4b0000 in the changelog (DB rc=-12797). If replication
>> stops, the consumer may need to be reinitialized.
>> </code>
>>
>> On mdir01 these messages appear on average every 5 minutes during peak
>> hours.
>> On mdir02 they appear much less frequently, on average every 15-20 minutes.
>>
>> However, looking through the changelog I can find all the CSNs which
>> it apparently can't locate:
>> <code>
>> changetype: delete
>> replgen: 62b5bf320000010f0000
>> csn: 686f72bf00050f4b0000
>> nsuniqueid: da0de304-593411f0-aef0dc83-b57f4cfd
>> dn:
>> ClientIdentifier=00:00:00:00:d9:05,ou=dhcpldap,o=customer,dc=domain,dc=net
>>
>> changetype: delete
>> replgen: 62b5bf320000010f0000
>> csn: 686f734300000f4b0000
>> nsuniqueid: 4c315e08-5d6111f0-aef0dc83-b57f4cfd
>> dn:
>> ClientIdentifier=00:00:00:00:c6:f4,ou=dhcpldap,o=customer,dc=domain,dc=net
>>
>> changetype: delete
>> replgen: 62b5bf320000010f0000
>> csn: 686eb26600000f4b0000
>> nsuniqueid: 54f4e05b-04a311f0-9233a0e8-dc56aea6
>> dn:
>> ClientIdentifier=00:00:00:00:26:99,ou=dhcpldap,o=customer,dc=domain,dc=net
>>
>> changetype: delete
>> replgen: 62b5bf320000010f0000
>> csn: 686eb26800000f4b0000
>> nsuniqueid: 463ba84b-0af711f0-a597a0e8-dc56aea6
>> dn:
>> ClientIdentifier=00:00:00:00:f7:a0,ou=dhcpldap,o=customer,dc=domain,dc=net
>>
>> changetype: add
>> replgen: 62b5bf320000010f0000
>> csn: 686f60d500010f4b0000
>> nsuniqueid: e9d45f81-5d5811f0-aef0dc83-b57f4cfd
>> parentuniqueid: 6167af03-f3c311ec-862ceac3-35201d04
>> dn:
>> ClientIdentifier=00:00:00:00:16:68,ou=dhcpldap,o=customer,dc=domain,dc=net
>> change:: ...
>> </code>
>>
>> All the changes are propagated to all directory servers in the cluster,
>> so there is no real problem visible.
>> In general, we have not seen any problems with replication whatsoever;
>> we just have these seemingly "false" messages in the error log.
>>
>> Changelog trim is currently set to the following values:
>> <code>
>> nsslapd-changelogmaxage: 30d
>> nsslapd-changelogtrim-interval: 3600
>> </code>
>>
>> Does anybody know why these error messages appear? And if / how we can
>> get rid of them?
>> I just want to make sure that there is really no underlying issue
>> somewhere. And if those messages really falsely appear, I would like
>> to get rid of them if possible to avoid confusion and to stop spamming
>> the error logs.
>>
>>
>> Thank you very much in advance.
>
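
The manual cross-check described in the original post (looking up each CSN from the error log in a changelog dump) can be sketched roughly as follows. This is only an illustration under stated assumptions: the error log contains messages of the form "Can't locate CSN <csn>", the changelog dump is text with one "csn: <value>" line per change as in the excerpt quoted above, and the function names are hypothetical.

```python
import re

# Matches the CSN in "clcache_load_buffer - Can't locate CSN <csn> ..." messages.
LOG_CSN_RE = re.compile(r"Can't locate CSN\s+([0-9a-fA-F]{20})")
# Matches "csn: <value>" lines in a textual changelog dump.
DUMP_CSN_RE = re.compile(r"^csn:\s*([0-9a-fA-F]{20})\s*$", re.MULTILINE)


def csns_reported_missing(log_text: str) -> set:
    """CSNs that the error log claims cannot be located."""
    return {c.lower() for c in LOG_CSN_RE.findall(log_text)}


def csns_in_dump(dump_text: str) -> set:
    """CSNs that are actually present in the changelog dump."""
    return {c.lower() for c in DUMP_CSN_RE.findall(dump_text)}


def cross_check(log_text: str, dump_text: str) -> dict:
    """Separate reported CSNs into those present in the dump and those absent."""
    reported = csns_reported_missing(log_text)
    present = csns_in_dump(dump_text)
    return {
        "reported_but_present": sorted(reported & present),
        "reported_and_absent": sorted(reported - present),
    }


if __name__ == "__main__":
    log = (
        'ERR - agmt="cn=agreement-mdir01-to-sdir02" (10:1389) - '
        "clcache_load_buffer - Can't locate CSN 686f72bf00050f4b0000 "
        "in the changelog (DB rc=-12797).\n"
    )
    dump = "changetype: delete\ncsn: 686f72bf00050f4b0000\n"
    print(cross_check(log, dump))
```

A CSN listed under "reported_but_present" corresponds to the situation described in this thread: the error is logged although the change is in the changelog.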
