Monday, September 6, 2021

[389-users] Re: Enabling retro changelog maxage with 3 million entries make dirsrv not respond anymore

On 9/6/21 3:40 PM, Kees Bakker wrote:
On 06-09-2021 14:34, Thierry Bordaz wrote:
On 9/6/21 1:55 PM, Kees Bakker wrote:

First a bit of context.

CentOS 7, FreeIPA

A long time ago I was experiencing a deadlock during retro changelog cleanup
and I was advised to disable it as a workaround. Disabling was done by setting
nsslapd-changelogmaxage to -1. SInce then the number of entries grew to
about 3 million.

Last week I enabled maxage again. I set it to 470 days. I was hoping to limit
this pile of old changelog entries., starting by cleaning very old entries.

However, what I noticed is that it was removing entries with a pace of 16 entries
per second. Meanwhile the server was doing nothing. Server load was very low.

The real problem is that dirsrv (LDAP) is not responding to any requests anymore. I
had to disable maxage again, which requires patience restarting the server when
it is not responding ;-)

Now my questions
1) is it normal dat removing repo changelog entries is slooow?
2) why is dirsrv not responding anymore when the cleanup kicks in?
3) are there alternatives to cleanup the old repo changelog entries?


When the server is not responsive, can it process searches like

ldapsearch -b "" -s base ?

ldapsearch -D 'cn=direcrtory manager' -W -b "cn=config" -s base

or ldapsearch D 'cn=direcrtory manager' -W -b "cn=monitor" ?

I'll have to do this when I get a new chance. This LDAP server is
hard coded in several other services, even though we have replica's.
These services will be hanging when I do this.

One thing I can say is that the following command was hanging.

ldap -H ldaps:// -b cn=config
Interesting, I was "guessing" db update+checkpointing+compact being responsible of some temporary slowdown.  cn=config backend (in memory) being also frozen, the thing I can imagine is that others SRCH requests, that go to the db, were frozen by update+checkpoint+compact and there was no more workers to process request that are non database related.

Regarding the low rate of trimming, how did you monitor it ? Are you using internal op logging, plugin log level or something else ?

Just a rough estimate. After 15 minutes I had to disable maxage again.
Before and after I looked at the oldest entry. That way I saw it removed
about 15000 entries.

So you were able to do online update of maxage. So the server were (very) slow but still processing ?

Is there any particular logging you can recommend?
I was concerned that you enable debug level, to monitor the trimming, that would be very noisy.

When the server is not responsive, does it consum CPU ? Could you collect 'top -H -p `pidof ns-slapd` -b' and some pstack ?

As I said above, I'll have to pick the right moment to do this again.
Last time I got a lot of complaints from the users. :-(

Yes, this can be done during calm period.


-- Kees


_______________________________________________  389-users mailing list --  To unsubscribe send an email to  Fedora Code of Conduct:  List Guidelines:  List Archives:  Do not reply to spam on the list, report it:  

_______________________________________________  389-users mailing list --  To unsubscribe send an email to  Fedora Code of Conduct:  List Guidelines:  List Archives:  Do not reply to spam on the list, report it:  

No comments:

Post a Comment