The below bug linked to this bug: https://pagure.io/389-ds-base/issue/49915
Running "top -H", I could see that a thread hit 100% CPU on the primary provider during re-initialization of the secondary provider:
top - 19:14:37 up 13 days, 5:00, 2 users, load average: 0.75, 0.39, 0.64
Threads: 406 total, 2 running, 404 sleeping, 0 stopped, 0 zombie
%Cpu(s): 29.4 us, 0.2 sy, 0.0 ni, 70.4 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 41027760 total, 274436 free, 26755384 used, 13997940 buff/cache
KiB Swap: 4095996 total, 4007932 free, 88064 used. 13755740 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1324 dirsrv 20 0 29.3g 26.4g 1.7g S 0.0 67.5 32:09.15 ns-slapd
1422 dirsrv 20 0 29.3g 26.4g 1.7g S 0.0 67.5 7:13.29 ns-slapd
1423 dirsrv 20 0 29.3g 26.4g 1.7g S 0.0 67.5 4:49.31 ns-slapd
1424 dirsrv 20 0 29.3g 26.4g 1.7g S 17.6 67.5 3440:05 ns-slapd
1425 dirsrv 20 0 29.3g 26.4g 1.7g S 0.0 67.5 0:38.04 ns-slapd
1427 dirsrv 20 0 29.3g 26.4g 1.7g S 0.0 67.5 0:00.00 ns-slapd
1428 dirsrv 20 0 29.3g 26.4g 1.7g S 0.0 67.5 0:13.75 ns-slapd
1429 dirsrv 20 0 29.3g 26.4g 1.7g S 0.0 67.5 1:58.45 ns-slapd
1430 dirsrv 20 0 29.3g 26.4g 1.7g S 0.0 67.5 1:53.75 ns-slapd
1431 dirsrv 20 0 29.3g 26.4g 1.7g S 0.0 67.5 1:55.51 ns-slapd
1432 dirsrv 20 0 29.3g 26.4g 1.7g S 0.0 67.5 1:56.32 ns-slapd
1433 dirsrv 20 0 29.3g 26.4g 1.7g S 0.0 67.5 1:55.66 ns-slapd
1434 dirsrv 20 0 29.3g 26.4g 1.7g S 0.0 67.5 2:11.95 ns-slapd
1435 dirsrv 20 0 29.3g 26.4g 1.7g R 99.9 67.5 97:37.64 ns-slapd
1436 dirsrv 20 0 29.3g 26.4g 1.7g S 0.0 67.5 0:00.00 ns-slapd
1437 dirsrv 20 0 29.3g 26.4g 1.7g S 0.0 67.5 0:00.00 ns-slapd
I don't want to do an db2ldif during production time since it will put the db into read-only mode.
The primary provider will do a backup at 01:00 (with -r) so I'll grab that and try to ldif2db the secondary provider with it.
Thanks,
Trev
From: "Fong, Trevor" <trevor.fong@ubc.ca>
Reply-To: "389-users@lists.fedoraproject.org" <389-users@lists.fedoraproject.org>
Date: Thursday, September 17, 2020 at 7:02 PM
To: "389-users@lists.fedoraproject.org" <389-users@lists.fedoraproject.org>
Subject: [389-users] Re: Re-initialization Failure: Bulk Import Abandoned, Thread Monitoring Returned -23
Hi Marc,
Thanks for your suggestions.
We had disabled autotuning, but even re-enabling autotuning we are seeing the same errors.
Just wondering if we are hitting this bug? https://pagure.io/389-ds-base/issue/49796
Thanks everyone!
Trev
From: Marc Sauton <msauton@redhat.com>
Reply-To: "389-users@lists.fedoraproject.org" <389-users@lists.fedoraproject.org>
Date: Thursday, September 17, 2020 at 5:23 PM
To: "389-users@lists.fedoraproject.org" <389-users@lists.fedoraproject.org>
Subject: [389-users] Re: Re-initialization Failure: Bulk Import Abandoned, Thread Monitoring Returned -23
The average rate and error code seem to point to a configuration issue.
Try to verify the import and cache settings in the console, they may be very small if the auto tuning feature is disabled.
for manual settings, the values depend on the amount of RAM available to the LDAP server, and the db, index sizes.
"small" db example of parameters and values:
dn: cn=config,cn=ldbm database,cn=plugins,cn=config
nsslapd-import-cachesize: 500000000
dn: cn=userroot,cn=ldbm database,cn=plugins,cn=config
nsslapd-cachememsize: 134217728
dn: cn=bdb,cn=config,cn=ldbm database,cn=plugins,cn=config
nsslapd-import-cache-autosize: 0
nsslapd-import-cachesize: 500000000
nsslapd-cache-autosize: 10
nsslapd-cache-autosize-split: 25
nsslapd-dbcachesize: 38314475
documentation references:
Thanks,
M.
On Thu, Sep 17, 2020 at 4:10 PM Fong, Trevor <trevor.fong@ubc.ca> wrote:
Hi Everyone,
I'm having an issue re-initializing our secondary muti-master replicated 389 DS provider node via 389-Console > replication agreement > "Initialize Consumer".
It eventually aborts the update with an error "import userRoot: Thread monitoring returned: -23"
Would anyone know how to fix it? Or what the issue may be? What we could try?
Thanks in advance,
Trev
We're running 389-Directory/1.3.10.1 B2020.167.146
Here's that the error log says on the node receiving re-initialization:
[17/Sep/2020:15:21:07.489737002 -0700] - NOTICE - NSMMReplicationPlugin - multimaster_be_state_change - Replica dc=<redacted> is going offline; disabling replication
[17/Sep/2020:15:21:07.514586270 -0700] - INFO - dblayer_instance_start - Import is running with nsslapd-db-private-import-mem on; No other process is allowed to access the database
[17/Sep/2020:15:21:27.554375232 -0700] - INFO - import_monitor_threads - import userRoot: Processed 1 entries -- average rate 0.1/sec, recent rate 0.0/sec, hit ratio 0%
[17/Sep/2020:15:21:47.593876983 -0700] - INFO - import_monitor_threads - import userRoot: Processed 1 entries -- average rate 0.0/sec, recent rate 0.0/sec, hit ratio 0%
[17/Sep/2020:15:22:07.630801176 -0700] - INFO - import_monitor_threads - import userRoot: Processed 1 entries -- average rate 0.0/sec, recent rate 0.0/sec, hit ratio 0%
[17/Sep/2020:15:22:27.667537260 -0700] - INFO - import_monitor_threads - import userRoot: Processed 1 entries -- average rate 0.0/sec, recent rate 0.0/sec, hit ratio 0%
[17/Sep/2020:15:22:47.704917493 -0700] - INFO - import_monitor_threads - import userRoot: Processed 1 entries -- average rate 0.0/sec, recent rate 0.0/sec, hit ratio 0%
[17/Sep/2020:15:23:07.746084506 -0700] - INFO - import_monitor_threads - import userRoot: Processed 1 entries -- average rate 0.0/sec, recent rate 0.0/sec, hit ratio 0%
[17/Sep/2020:15:23:27.785902082 -0700] - INFO - import_monitor_threads - import userRoot: Processed 1 entries -- average rate 0.0/sec, recent rate 0.0/sec, hit ratio 0%
[17/Sep/2020:15:23:47.830564570 -0700] - INFO - import_monitor_threads - import userRoot: Processed 1 entries -- average rate 0.0/sec, recent rate 0.0/sec, hit ratio 0%
[17/Sep/2020:15:24:07.868457613 -0700] - INFO - import_monitor_threads - import userRoot: Processed 1 entries -- average rate 0.0/sec, recent rate 0.0/sec, hit ratio 0%
[17/Sep/2020:15:24:27.907239617 -0700] - INFO - import_monitor_threads - import userRoot: Processed 1 entries -- average rate 0.0/sec, recent rate 0.0/sec, hit ratio 0%
[17/Sep/2020:15:24:47.948025735 -0700] - INFO - import_monitor_threads - import userRoot: Processed 1 entries -- average rate 0.0/sec, recent rate 0.0/sec, hit ratio 0%
[17/Sep/2020:15:25:07.986469285 -0700] - INFO - import_monitor_threads - import userRoot: Processed 1 entries -- average rate 0.0/sec, recent rate 0.0/sec, hit ratio 0%
[17/Sep/2020:15:25:28.022970040 -0700] - INFO - import_monitor_threads - import userRoot: Processed 1 entries -- average rate 0.0/sec, recent rate 0.0/sec, hit ratio 0%
[17/Sep/2020:15:25:48.055641123 -0700] - INFO - import_monitor_threads - import userRoot: Processed 1 entries -- average rate 0.0/sec, recent rate 0.0/sec, hit ratio 0%
[17/Sep/2020:15:26:08.091707793 -0700] - INFO - import_monitor_threads - import userRoot: Processed 1 entries -- average rate 0.0/sec, recent rate 0.0/sec, hit ratio 0%
[17/Sep/2020:15:26:08.093192417 -0700] - INFO - import_throw_in_towel - import userRoot: Decided to end this pass because the progress rate has dropped below the 50% threshold.
[17/Sep/2020:15:26:08.094010231 -0700] - INFO - import_monitor_threads - import userRoot: Ending pass number 1 ...
[17/Sep/2020:15:26:08.195022638 -0700] - INFO - import_monitor_threads - import userRoot: Foreman is done; waiting for workers to finish...
[17/Sep/2020:15:26:08.196221021 -0700] - INFO - import_monitor_threads - import userRoot: Workers finished; cleaning up...
[17/Sep/2020:15:26:08.397372463 -0700] - INFO - import_monitor_threads - import userRoot: Workers cleaned up.
[17/Sep/2020:15:26:08.398525016 -0700] - INFO - import_sweep_after_pass - import userRoot: Sweeping files for merging later...
[17/Sep/2020:15:26:08.409694998 -0700] - INFO - dblayer_instance_start - Import is running with nsslapd-db-private-import-mem on; No other process is allowed to access the database
[17/Sep/2020:15:26:08.411165210 -0700] - INFO - import_sweep_after_pass - import userRoot: Sweep done.
[17/Sep/2020:15:26:08.412153776 -0700] - INFO - import_main_offline - import userRoot: Beginning pass number 2
[17/Sep/2020:15:26:28.444537073 -0700] - INFO - import_monitor_threads - import userRoot: Processed 1 entries (pass 2) -- average rate 214748364.8/sec, recent rate 0.0/sec, hit ratio 0%
[17/Sep/2020:15:26:48.487800198 -0700] - INFO - import_monitor_threads - import userRoot: Processed 1 entries (pass 2) -- average rate 107374182.4/sec, recent rate 0.0/sec, hit ratio 0%
[17/Sep/2020:15:27:08.524672179 -0700] - INFO - import_monitor_threads - import userRoot: Processed 1 entries (pass 2) -- average rate 71582788.2/sec, recent rate 0.0/sec, hit ratio 0%
[17/Sep/2020:15:27:28.560669386 -0700] - INFO - import_monitor_threads - import userRoot: Processed 1 entries (pass 2) -- average rate 53687091.2/sec, recent rate 0.0/sec, hit ratio 0%
[17/Sep/2020:15:27:48.598581175 -0700] - INFO - import_monitor_threads - import userRoot: Processed 1 entries (pass 2) -- average rate 42949673.0/sec, recent rate 0.0/sec, hit ratio 0%
[17/Sep/2020:15:28:08.637175249 -0700] - INFO - import_monitor_threads - import userRoot: Processed 1 entries (pass 2) -- average rate 35791394.1/sec, recent rate 0.0/sec, hit ratio 0%
[17/Sep/2020:15:28:28.674637440 -0700] - INFO - import_monitor_threads - import userRoot: Processed 1 entries (pass 2) -- average rate 30678337.8/sec, recent rate 0.0/sec, hit ratio 0%
[17/Sep/2020:15:28:48.713863865 -0700] - INFO - import_monitor_threads - import userRoot: Processed 1 entries (pass 2) -- average rate 26843545.6/sec, recent rate 0.0/sec, hit ratio 0%
[17/Sep/2020:15:29:08.750649257 -0700] - INFO - import_monitor_threads - import userRoot: Processed 1 entries (pass 2) -- average rate 23860929.4/sec, recent rate 0.0/sec, hit ratio 0%
[17/Sep/2020:15:29:28.788308057 -0700] - INFO - import_monitor_threads - import userRoot: Processed 1 entries (pass 2) -- average rate 21474836.5/sec, recent rate 0.0/sec, hit ratio 0%
[17/Sep/2020:15:29:48.824483405 -0700] - INFO - import_monitor_threads - import userRoot: Processed 1 entries (pass 2) -- average rate 19522578.6/sec, recent rate 0.0/sec, hit ratio 0%
[17/Sep/2020:15:30:08.861454259 -0700] - INFO - import_monitor_threads - import userRoot: Processed 1 entries (pass 2) -- average rate 17895697.1/sec, recent rate 0.0/sec, hit ratio 0%
[17/Sep/2020:15:30:28.898478351 -0700] - INFO - import_monitor_threads - import userRoot: Processed 1 entries (pass 2) -- average rate 16519105.0/sec, recent rate 0.0/sec, hit ratio 0%
[17/Sep/2020:15:30:48.934493905 -0700] - INFO - import_monitor_threads - import userRoot: Processed 1 entries (pass 2) -- average rate 15339168.9/sec, recent rate 0.0/sec, hit ratio 0%
[17/Sep/2020:15:31:08.974176434 -0700] - INFO - import_monitor_threads - import userRoot: Processed 1 entries (pass 2) -- average rate 14316557.7/sec, recent rate 0.0/sec, hit ratio 0%
[17/Sep/2020:15:31:29.026523856 -0700] - INFO - import_monitor_threads - import userRoot: Processed 1 entries (pass 2) -- average rate 13379960.4/sec, recent rate 0.0/sec, hit ratio 0%
[17/Sep/2020:15:31:29.027647083 -0700] - INFO - import_throw_in_towel - import userRoot: Decided to end this pass because the progress rate has dropped below the 50% threshold.
[17/Sep/2020:15:31:29.028635146 -0700] - INFO - import_monitor_threads - import userRoot: Ending pass number 2 ...
[17/Sep/2020:15:31:29.129868576 -0700] - INFO - import_monitor_threads - import userRoot: Foreman is done; waiting for workers to finish...
[17/Sep/2020:15:31:29.131344494 -0700] - INFO - import_monitor_threads - import userRoot: Workers finished; cleaning up...
[17/Sep/2020:15:31:29.333383134 -0700] - INFO - import_monitor_threads - import userRoot: Workers cleaned up.
[17/Sep/2020:15:31:29.334756063 -0700] - INFO - import_sweep_after_pass - import userRoot: Sweeping files for merging later...
[17/Sep/2020:15:31:29.346472780 -0700] - INFO - dblayer_instance_start - Import is running with nsslapd-db-private-import-mem on; No other process is allowed to access the database
[17/Sep/2020:15:31:29.348439407 -0700] - INFO - import_sweep_after_pass - import userRoot: Sweep done.
[17/Sep/2020:15:31:29.349300174 -0700] - INFO - import_main_offline - import userRoot: Beginning pass number 3
[17/Sep/2020:15:31:36.933419931 -0700] - ERR - factory_destructor - ERROR bulk import abandoned
[17/Sep/2020:15:31:36.967711128 -0700] - ERR - import_run_pass - import userRoot: Thread monitoring returned: -23
[17/Sep/2020:15:31:36.970482988 -0700] - ERR - import_main_offline - import userRoot: Aborting all Import threads...
[17/Sep/2020:15:31:44.285307200 -0700] - ERR - import_main_offline - import userRoot: Import threads aborted.
[17/Sep/2020:15:31:44.286444487 -0700] - INFO - import_main_offline - import userRoot: Closing files...
[17/Sep/2020:15:31:44.288138525 -0700] - ERR - import_main_offline - import userRoot: Import failed.
[17/Sep/2020:15:31:44.289342372 -0700] - ERR - process_bulk_import_op - NULL target sdn
[17/Sep/2020:15:31:44.312329369 -0700] - ERR - NSMMReplicationPlugin - replica_replace_ruv_tombstone - Failed to update replication update vector for replica dc=<redacted>: LDAP error - 1
[17/Sep/2020:15:31:47.320263729 -0700] - ERR - NSMMReplicationPlugin - replica_replace_ruv_tombstone - Failed to update replication update vector for replica dc=<redacted>: LDAP error - 1
[17/Sep/2020:15:31:50.325999803 -0700] - ERR - NSMMReplicationPlugin - replica_replace_ruv_tombstone - Failed to update replication update vector for replica dc=<redacted>: LDAP error - 1
Trevor Fong
Senior Programmer Analyst, Identity and Access Management
Cybersecurity | CISO Office
The University of British Columbia | Musqueam Traditional Territory
413 - 6356 Agricultural Road | Vancouver BC | V6T 1Z2 Canada
Phone 604 827 5247
Privacy Matters @ UBC_______________________________________________
389-users mailing list -- 389-users@lists.fedoraproject.org
To unsubscribe send an email to 389-users-leave@lists.fedoraproject.org
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org
No comments:
Post a Comment