Monday, September 6, 2021

[389-users] Re: Enabling retro changelog maxage with 3 million entries make dirsrv not respond anymore

On 06-09-2021 16:16, Thierry Bordaz wrote:
On 9/6/21 3:40 PM, Kees Bakker wrote:
On 06-09-2021 14:34, Thierry Bordaz wrote:
On 9/6/21 1:55 PM, Kees Bakker wrote:
Hi,

First a bit of context.

CentOS 7, FreeIPA
389-ds-base-snmp-1.3.9.1-13.el7_7.x86_64
389-ds-base-libs-1.3.9.1-13.el7_7.x86_64
389-ds-base-1.3.9.1-13.el7_7.x86_64

A long time ago I was experiencing a deadlock during retro changelog cleanup
and I was advised to disable it as a workaround. Disabling was done by setting
nsslapd-changelogmaxage to -1. SInce then the number of entries grew to
about 3 million.

Last week I enabled maxage again. I set it to 470 days. I was hoping to limit
this pile of old changelog entries., starting by cleaning very old entries.

However, what I noticed is that it was removing entries with a pace of 16 entries
per second. Meanwhile the server was doing nothing. Server load was very low.

The real problem is that dirsrv (LDAP) is not responding to any requests anymore. I
had to disable maxage again, which requires patience restarting the server when
it is not responding ;-)

Now my questions
1) is it normal dat removing repo changelog entries is slooow?
2) why is dirsrv not responding anymore when the cleanup kicks in?
3) are there alternatives to cleanup the old repo changelog entries?

Hi,

When the server is not responsive, can it process searches like

ldapsearch -b "" -s base ?

ldapsearch -D 'cn=direcrtory manager' -W -b "cn=config" -s base

or ldapsearch D 'cn=direcrtory manager' -W -b "cn=monitor" ?

I'll have to do this when I get a new chance. This LDAP server is
hard coded in several other services, even though we have replica's.
These services will be hanging when I do this.

One thing I can say is that the following command was hanging.

ldap -H ldaps://rotte.example.com -b cn=config
Interesting, I was "guessing" db update+checkpointing+compact being responsible of some temporary slowdown.  cn=config backend (in memory) being also frozen, the thing I can imagine is that others SRCH requests, that go to the db, were frozen by update+checkpoint+compact and there was no more workers to process request that are non database related.

Regarding the low rate of trimming, how did you monitor it ? Are you using internal op logging, plugin log level or something else ?

Just a rough estimate. After 15 minutes I had to disable maxage again.
Before and after I looked at the oldest entry. That way I saw it removed
about 15000 entries.


So you were able to do online update of maxage. So the server were (very) slow but still processing ?

No. I started with maxage disabled. As we did since May last year.

Then I did use ldapmodify
restarted dirsrv
First minute or so the system is still responding
Then trimming kicks in and dirsrv does not respond anymore.
However, dirsrv is still running but uses almost no resources
Next, I restart dirsrv which takes 10 - 15 minutes (an exercise in patience)
Immediately after restart I run ldapmodify again with maxage -1
After that again restart dirsrv

Now dirsrv behaves "normal", except it does not do any trimming.


Is there any particular logging you can recommend?
I was concerned that you enable debug level, to monitor the trimming, that would be very noisy.

When the server is not responsive, does it consum CPU ? Could you collect 'top -H -p `pidof ns-slapd` -b' and some pstack ?

As I said above, I'll have to pick the right moment to do this again.
Last time I got a lot of complaints from the users. :-(


Yes, this can be done during calm period.

I wish people stopped working in the weekend :-)

thanks
thierry

-- Kees

thanks
thierry


_______________________________________________  389-users mailing list -- 389-users@lists.fedoraproject.org  To unsubscribe send an email to 389-users-leave@lists.fedoraproject.org  Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/  List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines  List Archives: https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org  Do not reply to spam on the list, report it: https://pagure.io/fedora-infrastructure  


_______________________________________________  389-users mailing list -- 389-users@lists.fedoraproject.org  To unsubscribe send an email to 389-users-leave@lists.fedoraproject.org  Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/  List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines  List Archives: https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org  Do not reply to spam on the list, report it: https://pagure.io/fedora-infrastructure  

_______________________________________________  389-users mailing list -- 389-users@lists.fedoraproject.org  To unsubscribe send an email to 389-users-leave@lists.fedoraproject.org  Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/  List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines  List Archives: https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org  Do not reply to spam on the list, report it: https://pagure.io/fedora-infrastructure  

No comments:

Post a Comment