Wednesday, August 3, 2022

[389-users] Re: Crash with SEGV after compacting

On 8/3/22 1:11 PM, Niklas Schmatloch wrote:
> Hi
>
> My organisation is using a replicated 389-dirsrv. Lately, it has been crashing
> each time compaction runs.
>
> It is reproducible on our instances by lowering the compactdb-interval to
> trigger the compaction:
>
> dsconf -D "cn=Directory Manager" ldap://127.0.0.1 -w 'PASSWORD_HERE' backend config set --compactdb-interval 300

Tip - you can use the server instance name in place of the credentials
and URL.  It will use LDAPI as long as you run it as root:

    dsconf slapd-INSTANCE backend config set --compactdb-interval 300

or even shorter (without the "slapd-" prefix, as long as the instance
name does not match an argument in dsconf):

    dsconf INSTANCE backend config set --compactdb-interval 300

Makes it easier to use the new tools IMHO.
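
If your dsconf build has the matching "get" subcommand (recent 1.4.x
versions should - treat that as an assumption), you can confirm the
setting took effect with:

    dsconf INSTANCE backend config get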


>
> This is the log:
>
> [03/Aug/2022:16:06:38.552781605 +0200] - NOTICE - checkpoint_threadmain - Compacting DB start: userRoot
> [03/Aug/2022:16:06:38.752592692 +0200] - NOTICE - bdb_db_compact_one_db - compactdb: compact userRoot - 8 pages freed
> [03/Aug/2022:16:06:44.172233009 +0200] - NOTICE - bdb_db_compact_one_db - compactdb: compact userRoot - 888 pages freed
> [03/Aug/2022:16:06:44.179315345 +0200] - NOTICE - checkpoint_threadmain - Compacting DB start: changelog
> [03/Aug/2022:16:13:18.020881527 +0200] - NOTICE - bdb_db_compact_one_db - compactdb: compact changelog - 458 pages freed
> dirsrv@auth-alpha.service: Main process exited, code=killed, status=11/SEGV
> dirsrv@auth-alpha.service: Failed with result 'signal'.
> dirsrv@auth-alpha.service: Consumed 2d 6h 22min 1.122s CPU time.
>
> The first steps complete very quickly, but the step before the 458 pages of the
> retro-changelog are freed takes several minutes. During this time dirsrv writes
> more than 10 G and reads more than 7 G (according to iotop).
>
> After this line is printed, dirsrv crashes within seconds.
> What I also noticed is that even though it said it freed a lot of pages, the
> retro-changelog does not seem to change in size.
> The file `/var/lib/dirsrv/slapd-auth-alpha/db/changelog/id2entry.db` is 7.2 G
> before and after the compaction.
>
>
> Debian 11.4
> 389-ds-base/stable,now 1.4.4.11-2 amd64
>
> Does someone have an idea how to debug / fix this?

We definitely need a good stack trace from the crash.  Unfortunately I
think this doc is slightly outdated, but it's mostly accurate (the core
file location is probably wrong):
https://www.port389.org/docs/389ds/FAQ/faq.html#sts=Debugging%C2%A0Crashes
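
For what it's worth, on a systemd-based system where systemd-coredump is
installed and catching the dump (an assumption - Debian does not always
ship it by default), pulling the trace out of the core looks roughly like:

    coredumpctl list ns-slapd
    coredumpctl gdb ns-slapd
    (gdb) thread apply all bt full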

You could also debug it live by attaching gdb to the ns-slapd process
(after installing the devel and debuginfo packages) and waiting for the
compaction to occur.  Then, when it crashes, get the stack of the
crashing thread - or of all threads: (gdb) thread apply all bt full
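
In practice that looks something like this (the debug-symbol package
name follows Debian's dbgsym convention and is an assumption - adjust
for your distro):

    apt install gdb 389-ds-base-dbgsym   # dbgsym package name is an assumption
    gdb -p $(pidof ns-slapd)
    (gdb) continue
    # ... wait for the crash; gdb stops on the SIGSEGV ...
    (gdb) thread apply all bt full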

Question: is there trimming set up on the retro changelog (retrocl)?
How aggressive are the trimming settings?  I'm not sure if trimming more
entries before the next compaction would help or hurt.
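
For reference, the trimming settings live on the plugin entry itself, so
a quick way to inspect them (adjust the bind credentials to your setup):

    ldapsearch -D "cn=Directory Manager" -W -H ldap://127.0.0.1 \
        -b "cn=Retro Changelog Plugin,cn=plugins,cn=config" -s base

and to set, say, a 7-day maximum age (nsslapd-changelogmaxage is the
trimming attribute on that entry):

    ldapmodify -D "cn=Directory Manager" -W -H ldap://127.0.0.1 -f trim.ldif

where trim.ldif contains:

    dn: cn=Retro Changelog Plugin,cn=plugins,cn=config
    changetype: modify
    replace: nsslapd-changelogmaxage
    nsslapd-changelogmaxage: 7d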

Anyway, the server should never crash, so please provide the requested
information and we will take a look at it.

Thanks,

Mark

>
> Thanks

--
Directory Server Development Team
_______________________________________________
389-users mailing list -- 389-users@lists.fedoraproject.org
To unsubscribe send an email to 389-users-leave@lists.fedoraproject.org
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org
Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue
