Monday, December 9, 2024

[389-users] Help with 389 Directory Server Replication

I'm facing issues with replication in the following scenario:

    Three Rocky Linux nodes running 389 Directory Server 2.4.5 B2024.198.0000.
    Replication is configured in a ring topology:
    node01 -> node02 -> node03 -> node01.
    Password changes are made via the PWM-Project web interface.

Problem:
At some point, synchronization between the nodes is lost.
When I try to reinitialize replication, the database on the node being updated crashes.
For example, when initializing replication from node01 to node02, the following errors are logged:
---------
[09/Dec/2024:11:32:30.382466035 -0300] - DEBUG - bdb_ldbm_back_wire_import - bdb_bulk_import_queue returned 0 with entry uid=app.tzv.w,OU=APLICACOES,dc=colorado,dc=local
[09/Dec/2024:11:32:30.387198997 -0300] - DEBUG - bdb_ldbm_back_wire_import - bdb_bulk_import_queue returned 0 with entry uid=app.poc.w,OU=APLICACOES,dc=colorado,dc=local
[09/Dec/2024:11:32:30.390378254 -0300] - ERR - factory_destructor - ERROR bulk import abandoned
[09/Dec/2024:11:32:30.557600717 -0300] - ERR - bdb_import_run_pass - import userroot: Thread monitoring returned: -23

[09/Dec/2024:11:32:30.559453847 -0300] - ERR - bdb_public_bdb_import_main - import userroot: Aborting all Import threads...
[09/Dec/2024:11:32:36.468531612 -0300] - ERR - bdb_public_bdb_import_main - import userroot: Import threads aborted.
[09/Dec/2024:11:32:36.470641812 -0300] - INFO - bdb_public_bdb_import_main - import userroot: Closing files...
[09/Dec/2024:11:32:36.553007637 -0300] - ERR - bdb_public_bdb_import_main - import userroot: Import failed.
[09/Dec/2024:11:32:36.574692177 -0300] - DEBUG - NSMMReplicationPlugin - consumer_connection_extension_destructor - Aborting total update in progress for replicated area dc=colorado,dc=local connid=7019159
[09/Dec/2024:11:32:36.577255941 -0300] - ERR - process_bulk_import_op - NULL target sdn
[09/Dec/2024:11:32:36.579573401 -0300] - DEBUG - NSMMReplicationPlugin - replica_relinquish_exclusive_access - conn=7019159 op=-1 repl="dc=colorado,dc=local": Released replica held by locking_purl=conn=7019159 id=3
[09/Dec/2024:11:32:36.600514849 -0300] - ERR - pw_get_admin_users - Search failed for cn=GRP_SRV_PREHASHED_PASSWORD,ou=389,OU=GRUPOS,ou=colorado,dc=colorado,dc=local: error 10 - Password Policy Administrators can not be set
[09/Dec/2024:11:32:36.757883417 -0300] - DEBUG - NSMMReplicationPlugin - decode_startrepl_extop - decoding payload...
[09/Dec/2024:11:32:36.760105387 -0300] - DEBUG - NSMMReplicationPlugin - decode_startrepl_extop - decoded protocol_oid: 2.16.840.1.113730.3.6.1
[09/Dec/2024:11:32:36.762467539 -0300] - DEBUG - NSMMReplicationPlugin - decode_startrepl_extop - decoded repl_root: dc=colorado,dc=local
[09/Dec/2024:11:32:36.765113155 -0300] - DEBUG - NSMMReplicationPlugin - decode_startrepl_extop - decoded csn: 6756ff84000001910000
[09/Dec/2024:11:32:36.767727935 -0300] - DEBUG - NSMMReplicationPlugin - decode_startrepl_extop: RUV:
[09/Dec/2024:11:32:36.769205061 -0300] - DEBUG - NSMMReplicationPlugin - decode_startrepl_extop: {replicageneration} 6748f91f000001910000
[09/Dec/2024:11:32:36.770721824 -0300] - DEBUG - NSMMReplicationPlugin - decode_startrepl_extop: {replica 401 ldap://node01.ldap.colorado.br:389} 6748f921000001910000 6756ff6f000101910000 00000000
[09/Dec/2024:11:32:36.772753378 -0300] - DEBUG - NSMMReplicationPlugin - decode_startrepl_extop: {replica 403 ldap://node03-ldap:389} 6748f9db000101930000 6756ff79000001930000 00000000
[09/Dec/2024:11:32:36.774289526 -0300] - DEBUG - NSMMReplicationPlugin - decode_startrepl_extop: {replica 402 ldap://node02.ldap.colorado.br:389} 6748f996000101920000 6756ff34000001920000 00000000
[09/Dec/2024:11:32:36.775750926 -0300] - DEBUG - NSMMReplicationPlugin - decode_startrepl_extop - Finshed decoding payload.
[09/Dec/2024:11:32:36.777404849 -0300] - DEBUG - NSMMReplicationPlugin - consumer_connection_extension_acquire_exclusive_access - conn=7019230 op=4 Acquired consumer connection extension
[09/Dec/2024:11:32:36.779856975 -0300] - DEBUG - NSMMReplicationPlugin - multisupplier_extop_StartNSDS50ReplicationRequest - conn=7019230 op=4 repl="dc=colorado,dc=local": Begin incremental protocol
[09/Dec/2024:11:32:36.781999075 -0300] - DEBUG - _csngen_adjust_local_time - gen state before 6756ff7b0001:1733754747:0:0
[09/Dec/2024:11:32:36.784626039 -0300] - DEBUG - _csngen_adjust_local_time - gen state after 6756ff840000:1733754756:0:0
[09/Dec/2024:11:32:36.786708353 -0300] - DEBUG - csngen_adjust_time - gen state before 6756ff840000:1733754756:0:0
[09/Dec/2024:11:32:36.788232997 -0300] - DEBUG - csngen_adjust_time - gen state after 6756ff840001:1733754756:0:0
[09/Dec/2024:11:32:36.790217310 -0300] - DEBUG - NSMMReplicationPlugin - replica_get_exclusive_access - conn=7019230 op=4 repl="dc=colorado,dc=local": Acquired replica
----------
The only way I have found to restore synchronization is to delete the entire replication configuration and recreate it. However, the issue reappears after some time.
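
To make the CSN and RUV values in the log above easier to read, here is a minimal Python sketch that decodes a CSN. It assumes the usual layout of 20 hex digits: 8 for the timestamp (seconds since the epoch), then 4 each for the sequence number, the replica ID, and the sub-sequence number. The names `CSN` and `parse_csn` are illustrative only and are not part of any 389 Directory Server tooling.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class CSN(NamedTuple):
    """Decoded change sequence number (field layout is an assumption, see above)."""
    epoch: int           # seconds since the Unix epoch
    timestamp: datetime  # the same instant as a UTC datetime
    seqnum: int          # sequence number within that second
    replica_id: int      # ID of the replica that originated the change
    subseqnum: int       # sub-sequence number


def parse_csn(csn: str) -> CSN:
    """Split a 20-hex-digit CSN string into its four fields."""
    if len(csn) != 20:
        raise ValueError(f"expected 20 hex digits, got {len(csn)}: {csn!r}")
    epoch = int(csn[0:8], 16)
    return CSN(
        epoch=epoch,
        timestamp=datetime.fromtimestamp(epoch, tz=timezone.utc),
        seqnum=int(csn[8:12], 16),
        replica_id=int(csn[12:16], 16),
        subseqnum=int(csn[16:20], 16),
    )


if __name__ == "__main__":
    # Values copied from the log excerpt: the "decoded csn" line and the
    # maximum CSN of each replica in the RUV.
    for raw in (
        "6756ff84000001910000",
        "6756ff6f000101910000",
        "6756ff79000001930000",
        "6756ff34000001920000",
    ):
        c = parse_csn(raw)
        print(f"{raw}: replica {c.replica_id}, seq {c.seqnum}, {c.timestamp.isoformat()}")
```

With the CSN from the `decoded csn` line, this gives replica ID 401 (node01) and the epoch value 1733754756, the same number that appears in the `gen state` lines. Comparing the decoded maximum CSNs per replica across the three nodes may help show which link in the ring stopped propagating changes.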

I'd appreciate any suggestions on how to identify and resolve this issue permanently.

Thanks.
