Wednesday, December 11, 2024

[389-users] Re: Help with 389 Directory Server Replication

I used the iperf program to check for any issues in data traffic between the nodes, but no abnormalities were detected.

I also adjusted the Linux limits parameters according to the tuning and performance recommendations from the 389 documentation.
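For reference, the limits the 389 tuning documentation cares about can be sanity-checked from a shell; these are standard Linux interfaces, nothing specific to this setup:

```shell
# Per-process open-file limit for the current shell; the 389 DS tuning guide
# recommends raising this well above the typical default of 1024.
ulimit -n

# System-wide ceiling on open file handles.
cat /proc/sys/fs/file-max
```

The limit that matters is the one inherited by the ns-slapd process (e.g. via the systemd unit's LimitNOFILE), which may differ from an interactive shell's.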

Additionally, I upgraded the 389 Directory Server from version 2.4.5 to 2.5.2.

However, despite these changes, the same errors persist when starting the replication process.


On Mon, Dec 9, 2024 at 3:47 PM Pierre Rogier <progier@redhat.com> wrote:
So node2 returns an LDAP_BUSY error, which is relatively rare (and usually happens just after logging the
"Retry count exceeded" error ...), because transactions get aborted too often due to db lock conflicts.
You can try to tweak nsslapd-db-deadlock-policy, but it is a bit puzzling:
    During a bulk import (i.e. importing entries from another supplier), only the import is active on the backend, so
    there should not be db lock conflicts during an import.
And I do not understand why deleting and recreating the agreements could solve such issues,
especially since the agreements toward and from the target replica are disabled while the replication is in progress.
The fact that they exist or not should not change anything ...
Unless there is a bug somewhere and a db lock is leaking. But that does not ring any bells in my mind ...


On Mon, Dec 9, 2024 at 6:37 PM Luiz Quirino via 389-users <389-users@lists.fedoraproject.org> wrote:
Hi, Pierre,

I appreciate your assertive approach to addressing this issue.

I reviewed the logs on node01, which was sending data for replication, and noticed that at the exact moment the error was recorded on node02, node01 logged an LDAP error 51 ("Server is busy").

Given that node01 and node02 are virtual machines on the same network segment, I do not believe this issue is related to the network interface.

This leads me to focus on the minimum resource requirements or internal parameters of the 389 Directory Server as potential causes.

Currently, both nodes are configured with 4 vCPUs and 4 GB of RAM each.

I suspect that some internal parameter in the 389 DS configuration might be contributing to this issue.

Looking forward to your insights.
--
_______________________________________________
389-users mailing list -- 389-users@lists.fedoraproject.org
To unsubscribe send an email to 389-users-leave@lists.fedoraproject.org
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org
Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue


--

389 Directory Server Development Team

Tuesday, December 10, 2024

[389-users] Can not see all entries with admin user

Hi,

I've installed 389ds on Rocky 9 and used the cockpit wizard to create the suffix. This operation also created the ou Groups with some admin entries and the ou People with the user demo, and it created some ACIs for the admin groups over the ou People DN.

I've then created some users; the only thing I changed is the users' RDN from cn to uid, including for the demo user.

Then I chose one of the users and added it to all the admin groups, to use it instead of the Directory Manager.

The problem I am facing is: the admin user can see only the demo user. When doing ldapsearch, only the demo user is displayed.

The newly created users have the same objectClasses as the demo user, and it's not a filter problem.

The demo user doesn't have a special aci, I can't see the reason for this behavior.
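For comparison, this is the general shape of an ACI granting an admin group read access over an ou; every DN and group name below is a placeholder, not taken from the setup described above. It may be worth checking whether the wizard-generated ACIs reference the users' old cn-based DNs anywhere, since the RDN change to uid would take those entries out of scope:

```ldif
# Hypothetical example only -- DNs and the group name are placeholders.
dn: ou=people,dc=example,dc=com
changetype: modify
add: aci
aci: (targetattr="*")(version 3.0; acl "admins can read people"; allow (read, search, compare) groupdn="ldap:///cn=user_admins,ou=groups,dc=example,dc=com";)
```

Reading the existing ACIs back (searching the suffix for the aci attribute as Directory Manager) and comparing them against the demo user's DN versus a new user's DN should show which rule matches one but not the other.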

Regards.

A little Fedora news...

See:

https://fedoramagazine.org/fedora-project-leader-matthew-miller-a-change-of-hats/

--
Matthew Miller
<mattdm@fedoraproject.org>
Fedora Project Leader
--
_______________________________________________
announce mailing list -- announce@lists.fedoraproject.org
To unsubscribe send an email to announce-leave@lists.fedoraproject.org
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/announce@lists.fedoraproject.org
Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue

Monday, December 9, 2024

[389-users] Re: Help with 389 Directory Server Replication

So node2 returns an LDAP_BUSY error, which is relatively rare (and usually happens just after logging the
"Retry count exceeded" error ...), because transactions get aborted too often due to db lock conflicts.
You can try to tweak nsslapd-db-deadlock-policy, but it is a bit puzzling:
    During a bulk import (i.e. importing entries from another supplier), only the import is active on the backend, so
    there should not be db lock conflicts during an import.
And I do not understand why deleting and recreating the agreements could solve such issues,
especially since the agreements toward and from the target replica are disabled while the replication is in progress.
The fact that they exist or not should not change anything ...
Unless there is a bug somewhere and a db lock is leaking. But that does not ring any bells in my mind ...
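For what it's worth, the attribute mentioned above can be changed with an ldapmodify such as the following sketch. The config DN shown is the usual location for a bdb backend on 2.x, and the value is only an example (the default is documented as 9); check the 389 DS configuration reference for the value-to-policy mapping before changing it:

```ldif
# Example only -- verify the backend config DN and the desired policy value
# against the 389 DS configuration reference for your version.
dn: cn=bdb,cn=config,cn=ldbm database,cn=plugins,cn=config
changetype: modify
replace: nsslapd-db-deadlock-policy
nsslapd-db-deadlock-policy: 6
```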


On Mon, Dec 9, 2024 at 6:37 PM Luiz Quirino via 389-users <389-users@lists.fedoraproject.org> wrote:
Hi, Pierre,

I appreciate your assertive approach to addressing this issue.

I reviewed the logs on node01, which was sending data for replication, and noticed that at the exact moment the error was recorded on node02, node01 logged an LDAP error 51 ("Server is busy").

Given that node01 and node02 are virtual machines on the same network segment, I do not believe this issue is related to the network interface.

This leads me to focus on the minimum resource requirements or internal parameters of the 389 Directory Server as potential causes.

Currently, both nodes are configured with 4 vCPUs and 4 GB of RAM each.

I suspect that some internal parameter in the 389 DS configuration might be contributing to this issue.

Looking forward to your insights.


--

389 Directory Server Development Team

[389-users] Re: Help with 389 Directory Server Replication

Hi, Pierre,

I appreciate your assertive approach to addressing this issue.

I reviewed the logs on node01, which was sending data for replication, and noticed that at the exact moment the error was recorded on node02, node01 logged an LDAP error 51 ("Server is busy").

Given that node01 and node02 are virtual machines on the same network segment, I do not believe this issue is related to the network interface.

This leads me to focus on the minimum resource requirements or internal parameters of the 389 Directory Server as potential causes.

Currently, both nodes are configured with 4 vCPUs and 4 GB of RAM each.

I suspect that some internal parameter in the 389 DS configuration might be contributing to this issue.

Looking forward to your insights.

[389-users] Re: Help with 389 Directory Server Replication

Hi,

You are describing two different symptoms:
   - replication gets desynchronized. 
   - replication initialization fails 

We do not have much data about the first one; it could have many causes.
About the second one:

The "bdb_bulk_import_queue returned 0 with entry ..." messages are normal.
The first error is:
[09/Dec/2024:11:32:30.390378254 -0300] - ERR - factory_destructor - ERROR bulk import abandoned

which happens on the server being initialized when the connection handling the replication initialization gets unexpectedly closed.
That is usually because:
   [1] there was a problem on the supplier from which the replication gets initialized
        (are there errors in that supplier's error log, or has it crashed?), or
   [2] the network is unreliable (which could also explain why replication gets desynchronized).
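To check [1], filtering the supplier's error log down to ERR-level lines is usually enough to spot a problem. The real log lives under /var/log/dirsrv/slapd-&lt;instance&gt;/errors; in the sketch below, sample lines are piped in so the command is self-contained:

```shell
# Keep only ERR-severity lines from 389 DS error-log text.
# In practice: grep ' - ERR - ' /var/log/dirsrv/slapd-<instance>/errors
printf '%s\n' \
  '[09/Dec/2024:11:32:30.390378254 -0300] - ERR - factory_destructor - ERROR bulk import abandoned' \
  '[09/Dec/2024:11:32:36.470641812 -0300] - INFO - bdb_public_bdb_import_main - import userroot: Closing files...' \
  | grep ' - ERR - '
```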

regards
  Pierre
 

On Mon, Dec 9, 2024 at 5:20 PM Luiz Gustavo Quirino via 389-users <389-users@lists.fedoraproject.org> wrote:
I'm facing issues with replication in the following scenario:

    3 Linux nodes (Rocky) running version 2.4.5 B2024.198.0000 of 389.
    Replication is configured in a ring topology:
    node01 -> node02 -> node03 -> node01.
    Password changes are made via the PWM-Project web interface.

Problem:
At some point, the synchronization between nodes is lost.
When I attempt to restart replication, the node being updated crashes the database.
For example, when initializing replication from node01 to node02, the following error occurs:
---------
[09/Dec/2024:11:32:30.382466035 -0300] - DEBUG - bdb_ldbm_back_wire_import - bdb_bulk_import_queue returned 0 with entry uid=app.tzv.w,OU=APLICACOES,dc=colorado,dc=local
[09/Dec/2024:11:32:30.387198997 -0300] - DEBUG - bdb_ldbm_back_wire_import - bdb_bulk_import_queue returned 0 with entry uid=app.poc.w,OU=APLICACOES,dc=colorado,dc=local
[09/Dec/2024:11:32:30.390378254 -0300] - ERR - factory_destructor - ERROR bulk import abandoned
[09/Dec/2024:11:32:30.557600717 -0300] - ERR - bdb_import_run_pass - import userroot: Thread monitoring returned: -23

[09/Dec/2024:11:32:30.559453847 -0300] - ERR - bdb_public_bdb_import_main - import userroot: Aborting all Import threads...
[09/Dec/2024:11:32:36.468531612 -0300] - ERR - bdb_public_bdb_import_main - import userroot: Import threads aborted.
[09/Dec/2024:11:32:36.470641812 -0300] - INFO - bdb_public_bdb_import_main - import userroot: Closing files...
[09/Dec/2024:11:32:36.553007637 -0300] - ERR - bdb_public_bdb_import_main - import userroot: Import failed.
[09/Dec/2024:11:32:36.574692177 -0300] - DEBUG - NSMMReplicationPlugin - consumer_connection_extension_destructor - Aborting total update in progress for replicated area dc=colorado,dc=local connid=7019159
[09/Dec/2024:11:32:36.577255941 -0300] - ERR - process_bulk_import_op - NULL target sdn
[09/Dec/2024:11:32:36.579573401 -0300] - DEBUG - NSMMReplicationPlugin - replica_relinquish_exclusive_access - conn=7019159 op=-1 repl="dc=colorado,dc=local": Released replica held by locking_purl=conn=7019159 id=3
[09/Dec/2024:11:32:36.600514849 -0300] - ERR - pw_get_admin_users - Search failed for cn=GRP_SRV_PREHASHED_PASSWORD,ou=389,OU=GRUPOS,ou=colorado,dc=colorado,dc=local: error 10 - Password Policy Administrators can not be set
[09/Dec/2024:11:32:36.757883417 -0300] - DEBUG - NSMMReplicationPlugin - decode_startrepl_extop - decoding payload...
[09/Dec/2024:11:32:36.760105387 -0300] - DEBUG - NSMMReplicationPlugin - decode_startrepl_extop - decoded protocol_oid: 2.16.840.1.113730.3.6.1
[09/Dec/2024:11:32:36.762467539 -0300] - DEBUG - NSMMReplicationPlugin - decode_startrepl_extop - decoded repl_root: dc=colorado,dc=local
[09/Dec/2024:11:32:36.765113155 -0300] - DEBUG - NSMMReplicationPlugin - decode_startrepl_extop - decoded csn: 6756ff84000001910000
[09/Dec/2024:11:32:36.767727935 -0300] - DEBUG - NSMMReplicationPlugin - decode_startrepl_extop: RUV:
[09/Dec/2024:11:32:36.769205061 -0300] - DEBUG - NSMMReplicationPlugin - decode_startrepl_extop: {replicageneration} 6748f91f000001910000
[09/Dec/2024:11:32:36.770721824 -0300] - DEBUG - NSMMReplicationPlugin - decode_startrepl_extop: {replica 401 ldap://node01.ldap.colorado.br:389} 6748f921000001910000 6756ff6f000101910000 00000000
[09/Dec/2024:11:32:36.772753378 -0300] - DEBUG - NSMMReplicationPlugin - decode_startrepl_extop: {replica 403 ldap://node03-ldap:389} 6748f9db000101930000 6756ff79000001930000 00000000
[09/Dec/2024:11:32:36.774289526 -0300] - DEBUG - NSMMReplicationPlugin - decode_startrepl_extop: {replica 402 ldap://node02.ldap.colorado.br:389} 6748f996000101920000 6756ff34000001920000 00000000
[09/Dec/2024:11:32:36.775750926 -0300] - DEBUG - NSMMReplicationPlugin - decode_startrepl_extop - Finshed decoding payload.
[09/Dec/2024:11:32:36.777404849 -0300] - DEBUG - NSMMReplicationPlugin - consumer_connection_extension_acquire_exclusive_access - conn=7019230 op=4 Acquired consumer connection extension
[09/Dec/2024:11:32:36.779856975 -0300] - DEBUG - NSMMReplicationPlugin - multisupplier_extop_StartNSDS50ReplicationRequest - conn=7019230 op=4 repl="dc=colorado,dc=local": Begin incremental protocol
[09/Dec/2024:11:32:36.781999075 -0300] - DEBUG - _csngen_adjust_local_time - gen state before 6756ff7b0001:1733754747:0:0
[09/Dec/2024:11:32:36.784626039 -0300] - DEBUG - _csngen_adjust_local_time - gen state after 6756ff840000:1733754756:0:0
[09/Dec/2024:11:32:36.786708353 -0300] - DEBUG - csngen_adjust_time - gen state before 6756ff840000:1733754756:0:0
[09/Dec/2024:11:32:36.788232997 -0300] - DEBUG - csngen_adjust_time - gen state after 6756ff840001:1733754756:0:0
[09/Dec/2024:11:32:36.790217310 -0300] - DEBUG - NSMMReplicationPlugin - replica_get_exclusive_access - conn=7019230 op=4 repl="dc=colorado,dc=local": Acquired replica
----------
To restore synchronization, I need to delete all replication configurations and recreate them. However, the issue reappears after some time.

I'd appreciate any suggestions on how to identify and resolve this issue permanently.

Thanks.


--

389 Directory Server Development Team

[389-users] Help with 389 Directory Server Replication

I'm facing issues with replication in the following scenario:

    3 Linux nodes (Rocky) running version 2.4.5 B2024.198.0000 of 389.
    Replication is configured in a ring topology:
    node01 -> node02 -> node03 -> node01.
    Password changes are made via the PWM-Project web interface.

Problem:
At some point, the synchronization between nodes is lost.
When I attempt to restart replication, the node being updated crashes the database.
For example, when initializing replication from node01 to node02, the following error occurs:
---------
[09/Dec/2024:11:32:30.382466035 -0300] - DEBUG - bdb_ldbm_back_wire_import - bdb_bulk_import_queue returned 0 with entry uid=app.tzv.w,OU=APLICACOES,dc=colorado,dc=local
[09/Dec/2024:11:32:30.387198997 -0300] - DEBUG - bdb_ldbm_back_wire_import - bdb_bulk_import_queue returned 0 with entry uid=app.poc.w,OU=APLICACOES,dc=colorado,dc=local
[09/Dec/2024:11:32:30.390378254 -0300] - ERR - factory_destructor - ERROR bulk import abandoned
[09/Dec/2024:11:32:30.557600717 -0300] - ERR - bdb_import_run_pass - import userroot: Thread monitoring returned: -23

[09/Dec/2024:11:32:30.559453847 -0300] - ERR - bdb_public_bdb_import_main - import userroot: Aborting all Import threads...
[09/Dec/2024:11:32:36.468531612 -0300] - ERR - bdb_public_bdb_import_main - import userroot: Import threads aborted.
[09/Dec/2024:11:32:36.470641812 -0300] - INFO - bdb_public_bdb_import_main - import userroot: Closing files...
[09/Dec/2024:11:32:36.553007637 -0300] - ERR - bdb_public_bdb_import_main - import userroot: Import failed.
[09/Dec/2024:11:32:36.574692177 -0300] - DEBUG - NSMMReplicationPlugin - consumer_connection_extension_destructor - Aborting total update in progress for replicated area dc=colorado,dc=local connid=7019159
[09/Dec/2024:11:32:36.577255941 -0300] - ERR - process_bulk_import_op - NULL target sdn
[09/Dec/2024:11:32:36.579573401 -0300] - DEBUG - NSMMReplicationPlugin - replica_relinquish_exclusive_access - conn=7019159 op=-1 repl="dc=colorado,dc=local": Released replica held by locking_purl=conn=7019159 id=3
[09/Dec/2024:11:32:36.600514849 -0300] - ERR - pw_get_admin_users - Search failed for cn=GRP_SRV_PREHASHED_PASSWORD,ou=389,OU=GRUPOS,ou=colorado,dc=colorado,dc=local: error 10 - Password Policy Administrators can not be set
[09/Dec/2024:11:32:36.757883417 -0300] - DEBUG - NSMMReplicationPlugin - decode_startrepl_extop - decoding payload...
[09/Dec/2024:11:32:36.760105387 -0300] - DEBUG - NSMMReplicationPlugin - decode_startrepl_extop - decoded protocol_oid: 2.16.840.1.113730.3.6.1
[09/Dec/2024:11:32:36.762467539 -0300] - DEBUG - NSMMReplicationPlugin - decode_startrepl_extop - decoded repl_root: dc=colorado,dc=local
[09/Dec/2024:11:32:36.765113155 -0300] - DEBUG - NSMMReplicationPlugin - decode_startrepl_extop - decoded csn: 6756ff84000001910000
[09/Dec/2024:11:32:36.767727935 -0300] - DEBUG - NSMMReplicationPlugin - decode_startrepl_extop: RUV:
[09/Dec/2024:11:32:36.769205061 -0300] - DEBUG - NSMMReplicationPlugin - decode_startrepl_extop: {replicageneration} 6748f91f000001910000
[09/Dec/2024:11:32:36.770721824 -0300] - DEBUG - NSMMReplicationPlugin - decode_startrepl_extop: {replica 401 ldap://node01.ldap.colorado.br:389} 6748f921000001910000 6756ff6f000101910000 00000000
[09/Dec/2024:11:32:36.772753378 -0300] - DEBUG - NSMMReplicationPlugin - decode_startrepl_extop: {replica 403 ldap://node03-ldap:389} 6748f9db000101930000 6756ff79000001930000 00000000
[09/Dec/2024:11:32:36.774289526 -0300] - DEBUG - NSMMReplicationPlugin - decode_startrepl_extop: {replica 402 ldap://node02.ldap.colorado.br:389} 6748f996000101920000 6756ff34000001920000 00000000
[09/Dec/2024:11:32:36.775750926 -0300] - DEBUG - NSMMReplicationPlugin - decode_startrepl_extop - Finshed decoding payload.
[09/Dec/2024:11:32:36.777404849 -0300] - DEBUG - NSMMReplicationPlugin - consumer_connection_extension_acquire_exclusive_access - conn=7019230 op=4 Acquired consumer connection extension
[09/Dec/2024:11:32:36.779856975 -0300] - DEBUG - NSMMReplicationPlugin - multisupplier_extop_StartNSDS50ReplicationRequest - conn=7019230 op=4 repl="dc=colorado,dc=local": Begin incremental protocol
[09/Dec/2024:11:32:36.781999075 -0300] - DEBUG - _csngen_adjust_local_time - gen state before 6756ff7b0001:1733754747:0:0
[09/Dec/2024:11:32:36.784626039 -0300] - DEBUG - _csngen_adjust_local_time - gen state after 6756ff840000:1733754756:0:0
[09/Dec/2024:11:32:36.786708353 -0300] - DEBUG - csngen_adjust_time - gen state before 6756ff840000:1733754756:0:0
[09/Dec/2024:11:32:36.788232997 -0300] - DEBUG - csngen_adjust_time - gen state after 6756ff840001:1733754756:0:0
[09/Dec/2024:11:32:36.790217310 -0300] - DEBUG - NSMMReplicationPlugin - replica_get_exclusive_access - conn=7019230 op=4 repl="dc=colorado,dc=local": Acquired replica
----------
To restore synchronization, I need to delete all replication configurations and recreate them. However, the issue reappears after some time.
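Incidentally, a full re-initialization of an existing agreement can normally be triggered in place, without deleting and recreating the configuration, by writing nsds5BeginReplicaRefresh on the agreement entry on the supplier. The agreement DN below is a placeholder; the real entries sit under cn=mapping tree,cn=config:

```ldif
# Placeholder agreement DN -- substitute the real agreement entry on the
# supplier (here, node01). This starts a total update of the consumer.
dn: cn=node01-to-node02,cn=replica,cn=dc\3Dcolorado\2Cdc\3Dlocal,cn=mapping tree,cn=config
changetype: modify
replace: nsds5BeginReplicaRefresh
nsds5BeginReplicaRefresh: start
```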

I'd appreciate any suggestions on how to identify and resolve this issue permanently.

Thanks.