Wednesday, July 20, 2022

[389-users] Retro Changelog trimming causes deadlock

Hi,

It's me again, about Retro Changelog trimming :-(. Last time it was about the maxage
configuration, for which I created an issue [1].

This time the problem is a deadlock. With maxage set to 2d (the default), the server
starts trimming soon after a restart.

Unfortunately, it quickly runs into a deadlock. Every access to the server (e.g. ldapsearch)
hangs forever. And because this server is a replica, the other servers are complaining too.

Looking at a gdb stack trace, filtered down to the thread headers and any lock call in
frames #0 and #1, I see the following:
$ sudo cat gdb-trace-ns-slapd-4.txt | grep -E '^(Thread|#[01] .*lock)'
Thread 41 (Thread 0x7fefa3e72700 (LWP 170190)):
#0  0x00007fef9f9b52f5 in pthread_rwlock_wrlock () at target:/lib64/libpthread.so.0
#1  0x00007fef8e9f2750 in map_wrlock () at target:/usr/lib64/dirsrv/plugins/schemacompat-plugin.so
Thread 40 (Thread 0x7feef147d700 (LWP 170184)):
Thread 39 (Thread 0x7feeef2f9700 (LWP 170178)):
Thread 38 (Thread 0x7feef1c7e700 (LWP 170171)):
Thread 37 (Thread 0x7feef247f700 (LWP 170169)):
Thread 36 (Thread 0x7feef37ff700 (LWP 170166)):
Thread 35 (Thread 0x7feef67ff700 (LWP 170165)):
Thread 34 (Thread 0x7feef75fe700 (LWP 170164)):
#0  0x00007fef9f9b4ec2 in pthread_rwlock_rdlock () at target:/lib64/libpthread.so.0
#1  0x00007fef8e9f2612 in map_rdlock () at target:/usr/lib64/dirsrv/plugins/schemacompat-plugin.so
Thread 33 (Thread 0x7feef7dff700 (LWP 170163)):
Thread 32 (Thread 0x7feef89fe700 (LWP 170162)):
#0  0x00007fef9f9b4ec2 in pthread_rwlock_rdlock () at target:/lib64/libpthread.so.0
#1  0x00007fef8e9f2612 in map_rdlock () at target:/usr/lib64/dirsrv/plugins/schemacompat-plugin.so
Thread 31 (Thread 0x7feef91ff700 (LWP 170161)):
#0  0x00007fef9f9b4ec2 in pthread_rwlock_rdlock () at target:/lib64/libpthread.so.0
#1  0x00007fef8e9f2612 in map_rdlock () at target:/usr/lib64/dirsrv/plugins/schemacompat-plugin.so
Thread 30 (Thread 0x7feef9dfe700 (LWP 170160)):
#0  0x00007fef9f9b4ec2 in pthread_rwlock_rdlock () at target:/lib64/libpthread.so.0
#1  0x00007fef8e9f2612 in map_rdlock () at target:/usr/lib64/dirsrv/plugins/schemacompat-plugin.so
Thread 29 (Thread 0x7feefa7ff700 (LWP 170159)):
Thread 28 (Thread 0x7feefb7ff700 (LWP 170158)):
Thread 27 (Thread 0x7feefc3fe700 (LWP 170157)):
#0  0x00007fef9f9b4ec2 in pthread_rwlock_rdlock () at target:/lib64/libpthread.so.0
#1  0x00007fef8e9f2612 in map_rdlock () at target:/usr/lib64/dirsrv/plugins/schemacompat-plugin.so
Thread 26 (Thread 0x7feefcdff700 (LWP 170156)):
#0  0x00007fef9f9b4ec2 in pthread_rwlock_rdlock () at target:/lib64/libpthread.so.0
#1  0x00007fef8e9f2612 in map_rdlock () at target:/usr/lib64/dirsrv/plugins/schemacompat-plugin.so
Thread 25 (Thread 0x7feefe1fe700 (LWP 170155)):
#0  0x00007fef9f9b4ec2 in pthread_rwlock_rdlock () at target:/lib64/libpthread.so.0
#1  0x00007fef8e9f2612 in map_rdlock () at target:/usr/lib64/dirsrv/plugins/schemacompat-plugin.so
Thread 24 (Thread 0x7feefebff700 (LWP 170154)):
#0  0x00007fef9f9b4ec2 in pthread_rwlock_rdlock () at target:/lib64/libpthread.so.0
#1  0x00007fef8e9f2612 in map_rdlock () at target:/usr/lib64/dirsrv/plugins/schemacompat-plugin.so
Thread 23 (Thread 0x7feeff7da700 (LWP 170153)):
Thread 22 (Thread 0x7feefffdb700 (LWP 170152)):
#0  0x00007fef9f9b4ec2 in pthread_rwlock_rdlock () at target:/lib64/libpthread.so.0
#1  0x00007fef8e9f2612 in map_rdlock () at target:/usr/lib64/dirsrv/plugins/schemacompat-plugin.so
Thread 21 (Thread 0x7fef007dc700 (LWP 170151)):
Thread 20 (Thread 0x7fef00fdd700 (LWP 170150)):
#0  0x00007fef9f9b4ec2 in pthread_rwlock_rdlock () at target:/lib64/libpthread.so.0
#1  0x00007fef8e9f2612 in map_rdlock () at target:/usr/lib64/dirsrv/plugins/schemacompat-plugin.so
Thread 19 (Thread 0x7fef02fd9700 (LWP 170148)):
Thread 18 (Thread 0x7fef037da700 (LWP 170147)):
Thread 17 (Thread 0x7fef03fdb700 (LWP 170146)):
Thread 16 (Thread 0x7fef049e3700 (LWP 170145)):
Thread 15 (Thread 0x7fef051e4700 (LWP 170144)):
Thread 14 (Thread 0x7fef059e5700 (LWP 170143)):
Thread 13 (Thread 0x7fef071ff700 (LWP 170140)):
Thread 12 (Thread 0x7fef081ff700 (LWP 170139)):
Thread 11 (Thread 0x7fef08ffe700 (LWP 170138)):
Thread 10 (Thread 0x7fef097ff700 (LWP 170137)):
Thread 9 (Thread 0x7fefa3e93700 (LWP 170136)):
Thread 8 (Thread 0x7fef0a5ff700 (LWP 170135)):
Thread 7 (Thread 0x7fef0b3ff700 (LWP 170134)):
Thread 6 (Thread 0x7fef0ca09700 (LWP 170133)):
Thread 5 (Thread 0x7fef0d20a700 (LWP 170132)):
Thread 4 (Thread 0x7fef0da0b700 (LWP 170131)):
Thread 3 (Thread 0x7fef0e20c700 (LWP 170130)):
Thread 2 (Thread 0x7fef0ea0d700 (LWP 170129)):
Thread 1 (Thread 0x7fefa3f98240 (LWP 170127)):
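
For anyone who wants to tally a trace like this rather than read it by eye, here is a
minimal sketch in Python. It is not part of 389-ds or gdb; the function name
summarize_lock_waits and the SAMPLE excerpt are only illustrative. It assumes the input
is already filtered as above, and it records the function in frame #1 for each thread.
On the full listing above this kind of tally gives one thread in map_wrlock and ten in
map_rdlock. It does not show which thread currently holds the lock.

```python
import re
from collections import Counter

# Matches the thread header lines, e.g. "Thread 41 (Thread 0x... (LWP 170190)):"
THREAD_RE = re.compile(r"^Thread (\d+) \(")
# Matches frame lines, e.g. "#1  0x00007fef8e9f2750 in map_wrlock () at ..."
FRAME_RE = re.compile(r"^#(\d+)\s+0x[0-9a-f]+ in (\w+) \(")


def summarize_lock_waits(trace_text):
    """Return {thread number: function name in frame #1} for threads that have one."""
    waits = {}
    current = None
    for line in trace_text.splitlines():
        thread = THREAD_RE.match(line)
        if thread:
            current = int(thread.group(1))
            continue
        frame = FRAME_RE.match(line)
        if frame and current is not None and frame.group(1) == "1":
            waits[current] = frame.group(2)
    return waits


# Short excerpt of the filtered trace from this post (illustrative input only).
SAMPLE = """\
Thread 41 (Thread 0x7fefa3e72700 (LWP 170190)):
#0  0x00007fef9f9b52f5 in pthread_rwlock_wrlock () at target:/lib64/libpthread.so.0
#1  0x00007fef8e9f2750 in map_wrlock () at target:/usr/lib64/dirsrv/plugins/schemacompat-plugin.so
Thread 40 (Thread 0x7feef147d700 (LWP 170184)):
Thread 34 (Thread 0x7feef75fe700 (LWP 170164)):
#0  0x00007fef9f9b4ec2 in pthread_rwlock_rdlock () at target:/lib64/libpthread.so.0
#1  0x00007fef8e9f2612 in map_rdlock () at target:/usr/lib64/dirsrv/plugins/schemacompat-plugin.so
Thread 32 (Thread 0x7feef89fe700 (LWP 170162)):
#0  0x00007fef9f9b4ec2 in pthread_rwlock_rdlock () at target:/lib64/libpthread.so.0
#1  0x00007fef8e9f2612 in map_rdlock () at target:/usr/lib64/dirsrv/plugins/schemacompat-plugin.so
"""

waits = summarize_lock_waits(SAMPLE)
counts = Counter(waits.values())
for func, n in sorted(counts.items()):
    print(f"{func}: {n} thread(s)")
```

To run it on a real file, read the filtered trace into a string and pass that to
summarize_lock_waits instead of SAMPLE.
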

The version info:
389-ds-base-libs-1.4.3.28-6.module_el8.6.0+1102+fe5d910f.x86_64
389-ds-base-1.4.3.28-6.module_el8.6.0+1102+fe5d910f.x86_64

For the time being I have changed maxage to 200d so that no trimming happens, which avoids
the deadlock. But in the long run this makes the changelog grow and grow. One server
already has over 2 GB in the changelog db, and another has more than 4 GB.

[1] https://github.com/389ds/389-ds-base/issues/5368
--
Kees
