Wednesday, August 11, 2021

[389-users] Re: Lock table is out of available lock entries

On 11-08-2021 15:01, Mark Reynolds wrote:
> On 8/11/21 8:52 AM, Kees Bakker wrote:
>> On 11-08-2021 14:38, Mark Reynolds wrote:
>>> On 8/11/21 8:30 AM, Kees Bakker wrote:
>>>> On 11-08-2021 11:49, Kees Bakker wrote:
>>>>> On 09-08-2021 20:23, Mark Reynolds wrote:
>>>>>> On 8/9/21 1:47 PM, Kees Bakker wrote:
>>>>>>> On 09-08-2021 19:25, Mark Reynolds wrote:
>>>>>>>> On 8/9/21 1:16 PM, Kees Bakker wrote:
>>>>>>>>> On 09-08-2021 18:43, Mark Reynolds wrote:
>>>>>>>>>> On 8/9/21 11:20 AM, Kees Bakker wrote:
>>>>>>>>>>> On 09-08-2021 16:00, Mark Reynolds wrote:
>>>>>>>>>>>> On 8/9/21 8:09 AM, Kees Bakker wrote:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> When my dirsrv was trying to compact the databases I was
>>>>>>>>>>>>> getting
>>>>>>>>>>>>> this
>>>>>>>>>>>>> error
>>>>>>>>>>>>>
>>>>>>>>>>>>> [07/Aug/2021:23:59:02.715984489 +0200] - NOTICE -
>>>>>>>>>>>>> bdb_compact -
>>>>>>>>>>>>> Compacting databases ...
>>>>>>>>>>>>> [07/Aug/2021:23:59:02.765932397 +0200] - NOTICE -
>>>>>>>>>>>>> bdb_compact -
>>>>>>>>>>>>> Compacting DB start: userRoot
>>>>>>>>>>>>> [07/Aug/2021:23:59:03.518175414 +0200] - NOTICE -
>>>>>>>>>>>>> bdb_compact -
>>>>>>>>>>>>> compactdb: compact userRoot - 417 pages freed
>>>>>>>>>>>>> [07/Aug/2021:23:59:03.576427786 +0200] - NOTICE -
>>>>>>>>>>>>> bdb_compact -
>>>>>>>>>>>>> Compacting DB start: ipaca
>>>>>>>>>>>>> [07/Aug/2021:23:59:03.659941533 +0200] - NOTICE -
>>>>>>>>>>>>> bdb_compact -
>>>>>>>>>>>>> compactdb: compact ipaca - 419 pages freed
>>>>>>>>>>>>> [07/Aug/2021:23:59:03.718445310 +0200] - NOTICE -
>>>>>>>>>>>>> bdb_compact -
>>>>>>>>>>>>> Compacting DB start: changelog
>>>>>>>>>>>>> [08/Aug/2021:00:00:40.807571334 +0200] - NOTICE -
>>>>>>>>>>>>> NSMMReplicationPlugin - changelog program - cl5CompactDBs -
>>>>>>>>>>>>> compacting
>>>>>>>>>>>>> replication changelogs...
>>>>>>>>>>>>> [08/Aug/2021:00:00:54.309357211 +0200] - ERR - libdb -
>>>>>>>>>>>>> BDB2055 Lock
>>>>>>>>>>>>> table is out of available lock entries
>>>>>>>>>>>>> [08/Aug/2021:00:00:54.726504736 +0200] - ERR - bdb_compact -
>>>>>>>>>>>>> compactdb: failed to compact changelog; db error - 12 Cannot
>>>>>>>>>>>>> allocate
>>>>>>>>>>>>> memory
>>>>>>>>>>>>> [08/Aug/2021:00:00:54.801571421 +0200] - ERR - libdb -
>>>>>>>>>>>>> BDB2055 Lock
>>>>>>>>>>>>> table is out of available lock entries
>>>>>>>>>>>>> [08/Aug/2021:00:00:54.876618702 +0200] - ERR -
>>>>>>>>>>>>> NSMMReplicationPlugin -
>>>>>>>>>>>>> changelog program - cl5CompactDBs - Failed to compact
>>>>>>>>>>>>> a797bb0b-be1d11eb-88c0b677-613aa2ad; db error - 12 Cannot
>>>>>>>>>>>>> allocate
>>>>>>>>>>>>> memory
>>>>>>>>>>>>> [08/Aug/2021:00:00:57.253006449 +0200] - NOTICE -
>>>>>>>>>>>>> bdb_compact -
>>>>>>>>>>>>> Compacting databases finished.
>>>>>>>>>>>>>
>>>>>>>>>>>>> There are about 402k entries in cn=changelog.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have a few questions
>>>>>>>>>>>>> 1) is it normal to have so many entries in cn=changelog? On
>>>>>>>>>>>>> another
>>>>>>>>>>>>> replica I have almost 3M entries Isn't this cleaned up?
>>>>>>>>>>>>> 2) the number of locks is 50000 (there are two config items).
>>>>>>>>>>>>> Should I
>>>>>>>>>>>>> increase that number? If so, increase to what?
>>>>>>>>>>>>> 3) is there maybe something else going on, causing the
>>>>>>>>>>>>> exhaustion of
>>>>>>>>>>>>> the locks?
>>>>>>>>>>>>
>>>>>>>>>>>> Ok, so by default there is no changelog trimming enabled. So
>>>>>>>>>>>> the
>>>>>>>>>>>> changelog will grow without bounds, which is bad.
>>>>>>>>>>>
>>>>>>>>>>> How much of this [1] applies? Indeed it says "By default the
>>>>>>>>>>> Replication Changelog does not use any trimming by default."
>>>>>>>>>>> So, how is the trimming actually enabled? I have these entries
>>>>>>>>>>>
>>>>>>>>>>> dn: cn=changelog5,cn=config
>>>>>>>>>>> cn: changelog5
>>>>>>>>>>> nsslapd-changelogdir: /var/lib/dirsrv/slapd-EXAMPLE-COM/cldb
>>>>>>>>>>> nsslapd-changelogmaxage: 30d
>>>>>>>>>>> objectClass: top
>>>>>>>>>>> objectClass: extensibleobject
>>>>>>>>>>
>>>>>>>>>> So trimming is set, but it's set to 30 days (30d), that
>>>>>>>>>> could/should be
>>>>>>>>>> shortened to "7d".
>>>>>>>>>
>>>>>>>>> Somehow I doubt that this will do anything.
>>>>>>>>
>>>>>>>> Ahhh, I thought this was the Replication Changelog, but it is the
>>>>>>>> "Retro" Changelog.  Ok so check this out:
>>>>>>>>
>>>>>>>> https://access.redhat.com/documentation/en-us/red_hat_directory_server/11/html/administration_guide/managing_replication-using_the_retro_changelog_plug_in#Using_the_Retro_Changelog_Plug_in-Trimming_the_Retro_Changelog
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Or update config directly using ldapmodify:
>>>>>>>>
>>>>>>>> # ldapmodify -D ...
>>>>>>>> dn: cn=Retro Changelog Plugin,cn=plugins,cn=config
>>>>>>>> changetype: modify
>>>>>>>> replace: nsslapd-changelogmaxage: 7d
>>>>>>>> nsslapd-changelogmaxage: 7d
>>>>>>>
>>>>>>> (( Typo? The replace line is without the value ))
>>>>>> yes, sorry...
>>>>>>>
>>>>>>> Hmm, that maxage was already at 2 days
>>>>>>
>>>>>> Ok, perhaps you are running into a known bug where indexes created by
>>>>>> plugins are not enabled.  Or might be a config issue?
>>>>>>
>>>>>> Try this first.  Set in the retro changelog entry:
>>>>>>
>>>>>> nsslapd-changelog-trim-interval: 300   <--- 5 minutes
>>>>> (Sorry to be picky, but it is nsslapd-changelogtrim-interval. It's
>>>>> default is 300.)
>>>>>>
>>>>>> Restart the server to make sure it was applied.  You can enable
>>>>>> "plugin"
>>>>>> logging to see what it does.
>>>>>>
>>>>>> nsslapd-errorlog-level: 65536
>>>>>>
>>>>>> wait over 5 minutes, then grep the errors log for "DSRetroclPlugin".
>>>>>> This will tell us what the plugin is doing when it goes to trim the
>>>>>> database.
>>>>>>
>>>>>> IMPORTANT, disable plugin logging immediately after you get your
>>>>>> data.
>>>>>> Set the errorlog level to 0 to go back to default logging.
>>>>>>
>>>>>> If the logging shows it's not finding any candidates then it might be
>>>>>> the indexing issue I mentioned above.  In that case you just need to
>>>>>> reindex the retro changelog:
>>>>>
>>>>> Before changing anything I enabled Plug-in debugging which revealed
>>>>> something interesting
>>>>>
>>>>> [root@linge ~]# grep trim
>>>>> /var/log/dirsrv/slapd-GHS-NL/errors-plugin-debug-20210811-1108
>>>>> [11/Aug/2021:11:01:49.792450548 +0200] - DEBUG - DSRetroclPlugin -
>>>>> cltrim: ldrc=0, first_time=1622032141, cur_time=3702600
>>>>> [11/Aug/2021:11:01:49.842339467 +0200] - DEBUG - DSRetroclPlugin -
>>>>> retrocl_housekeeping - changelog does not need to be trimmed
>>>>> [11/Aug/2021:11:06:49.953401407 +0200] - DEBUG - DSRetroclPlugin -
>>>>> cltrim: ldrc=0, first_time=1622032141, cur_time=3702900
>>>>> [11/Aug/2021:11:06:50.012249703 +0200] - DEBUG - DSRetroclPlugin -
>>>>> retrocl_housekeeping - changelog does not need to be trimmed
>>>>>
>>>>> This shows that the interval is 300 (even if
>>>>> nsslapd-changelogtrim-interval is not explicitly set in
>>>>> cn=changelog5,cn=config
>>>>>
>>>>> Look at cur_time. That is the "uptime" of the system, not UTC.
>>>>>
>>>>> [root@linge ~]# uptime
>>>>>  11:21:42 up 42 days, 20:49,  2 users,  load average: 0.15, 0.14, 0.16
>>>>>
>>>>> Next, assuming this code from retrocl_trim.c is making the decision
>>>>> to trim
>>>>>
>>>>>             first_time = retrocl_getchangetime(SLAPI_SEQ_FIRST,
>>>>> &ldrc);
>>>>>             slapi_log_err(SLAPI_LOG_PLUGIN, RETROCL_PLUGIN_NAME,
>>>>>                           "cltrim: ldrc=%d, first_time=%ld,
>>>>> cur_time=%ld\n",
>>>>>                           ldrc, first_time, cur_time);
>>>>>             if (LDAP_SUCCESS == ldrc && first_time > (time_t)0L &&
>>>>>                 first_time + ts.ts_c_max_age < cur_time) {
>>>>>                 must_trim = 1;
>>>>>             }
>>>>>
>>>>> There is a mixup of absolute (first_time) and relative (cur_time)
>>>>> time. Which of the two is wrong?
>>>>
>>>> This is probably introduced in commit a5029c8091 for Pull Request [1].
>>>> The queue function is now called with a relative time.
>>>>
>>>> [1] https://github.com/389ds/389-ds-base/pull/4450
>>>> [2] https://github.com/389ds/389-ds-base/issues/4384
>>>
>>> Yes I think you found a bug.  Are you building the server yourself?  Can
>>> I send you a code patch to try?
>>
>> No, I'm "just" a CentOS user, with access to the Github source code.
>> And the last time I build an rpm was more than 20 years ago in the
>> early RedHat days.
>>
>> Also, I don't want to mess too much with our production system.
>
> Understood.  What version of 389-ds-base are you using?

1.4.3.23 (CentOS 8 Stream)

>
> Thanks,
> Mark
>
>>
>>>
>>> Mark
>>>
>>>>
>>>>>
>>>>>>
>>>>>> # dsctl slapd-YOUR_INSTANCE stop
>>>>>>
>>>>>> # dsctl slapd-YOUR_INSTANCE db2index changelog
>>>>>>
>>>>>> # dsctl slapd-YOUR_INSTANCE start
>>>>>>
>>>>>> Then wait 5 minutes and see if anything was trimmed.
>>>>>>
>>>>>> Mark
>>>>>>
>>>>>>>
>>>>>>> dn: cn=Retro Changelog Plugin,cn=plugins,cn=config
>>>>>>> cn: Retro Changelog Plugin
>>>>>>> nsslapd-attribute: nsuniqueid:targetUniqueId
>>>>>>> nsslapd-changelogmaxage: 2d
>>>>>>> nsslapd-include-suffix: cn=dns,dc=example,dc=com
>>>>>>> nsslapd-plugin-depends-on-named: Class of Service
>>>>>>> nsslapd-plugin-depends-on-type: database
>>>>>>> nsslapd-pluginDescription: Retrocl Plugin
>>>>>>> nsslapd-pluginEnabled: on
>>>>>>> nsslapd-pluginId: retrocl
>>>>>>> nsslapd-pluginInitfunc: retrocl_plugin_init
>>>>>>> nsslapd-pluginPath: libretrocl-plugin
>>>>>>> nsslapd-pluginType: object
>>>>>>> nsslapd-pluginVendor: 389 Project
>>>>>>> nsslapd-pluginVersion: 1.4.3.231
>>>>>>> nsslapd-pluginbetxn: on
>>>>>>> nsslapd-pluginprecedence: 25
>>>>>>> objectClass: top
>>>>>>> objectClass: nsSlapdPlugin
>>>>>>> objectClass: extensibleObject
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> HTH,
>>>>>>>>
>>>>>>>> Mark
>>>>>>>>
>>>>>>>>>
>>>>>>>>> The first entry in cn=changelog is more than two months old. (May
>>>>>>>>> 26th
>>>>>>>>> is about
>>>>>>>>> the time I installed this server as a FreeIPA replica.)
>>>>>>>>>
>>>>>>>>> dn: changenumber=1,cn=changelog
>>>>>>>>> objectClass: top
>>>>>>>>> objectClass: changelogentry
>>>>>>>>> objectClass: extensibleObject
>>>>>>>>> targetuniqueid: 1e89ebcb-e20111e8-8820f96e-fc0c60a4
>>>>>>>>> changeNumber: 1
>>>>>>>>> targetDn:
>>>>>>>>> idnsname=_ldap._tcp,idnsname=example.com.,cn=dns,dc=example,dc=com
>>>>>>>>> changeTime: 20210526122901Z
>>>>>>>>> changeType: modify
>>>>>>>>> changes:: YWRkOiBzUlZSZWNvc...gplbnRyeXVzbjogMTEyCi0KAA==
>>>>>>>>>
>>>>>>>>> Is there a way to see trimming in action? I'm afraid it is not
>>>>>>>>> doing
>>>>>>>>> anything.
>>>>>>>
>>>>>>> What diagnostic can I enable to see the trimming in action?
>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I recommend setting up the changelog max age to 7 days:
>>>>>>>>>>>>
>>>>>>>>>>>> https://access.redhat.com/documentation/en-us/red_hat_directory_server/11/html/administration_guide/trimming_the_replication_changelog
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Once that the trimming cleans up the changelog the database
>>>>>>>>>>>> compaction
>>>>>>>>>>>> should work without issue.  You can also increase the database
>>>>>>>>>>>> locks to
>>>>>>>>>>>> 1 million (that does require a restart of the server to take
>>>>>>>>>>>> effect).
>>>>>>>>>>>
>>>>>>>>>>> Let's hope that 1 million is enough :-)
>>>>>>>>>>
>>>>>>>>>> It might not be, you can keep bumping it up if needed. There
>>>>>>>>>> is no
>>>>>>>>>> limit, or negative impact on db performance.
>>>>>>>>>>
>>>>>>>>>> Mark
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> HTH,
>>>>>>>>>>>>
>>>>>>>>>>>> Mark
>>>>>>>>>>>>
>>>>>>>>>>> [1]
>>>>>>>>>>> https://directory.fedoraproject.org/docs/389ds/FAQ/changelog-trimming.html
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Directory Server Development Team
>>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> 389-users mailing list -- 389-users@lists.fedoraproject.org
>>>>>>>>> To unsubscribe send an email to
>>>>>>>>> 389-users-leave@lists.fedoraproject.org
>>>>>>>>> Fedora Code of Conduct:
>>>>>>>>> https://docs.fedoraproject.org/en-US/project/code-of-conduct/
>>>>>>>>> List Guidelines:
>>>>>>>>> https://fedoraproject.org/wiki/Mailing_list_guidelines
>>>>>>>>> List Archives:
>>>>>>>>> https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Do not reply to spam on the list, report it:
>>>>>>>>> https://pagure.io/fedora-infrastructure
>>>>>>>>
>>>>>>>> --
>>>>>>>> Directory Server Development Team
>>>>>>>>
>>>>>>>
>>>>>> --
>>>>>> Directory Server Development Team
>>>>>>
>>>>> _______________________________________________
>>>>> 389-users mailing list -- 389-users@lists.fedoraproject.org
>>>>> To unsubscribe send an email to
>>>>> 389-users-leave@lists.fedoraproject.org
>>>>> Fedora Code of Conduct:
>>>>> https://docs.fedoraproject.org/en-US/project/code-of-conduct/
>>>>> List Guidelines:
>>>>> https://fedoraproject.org/wiki/Mailing_list_guidelines
>>>>> List Archives:
>>>>> https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org
>>>>>
>>>>> Do not reply to spam on the list, report it:
>>>>> https://pagure.io/fedora-infrastructure
>>>> _______________________________________________
>>>> 389-users mailing list -- 389-users@lists.fedoraproject.org
>>>> To unsubscribe send an email to 389-users-leave@lists.fedoraproject.org
>>>> Fedora Code of Conduct:
>>>> https://docs.fedoraproject.org/en-US/project/code-of-conduct/
>>>> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
>>>> List Archives:
>>>> https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org
>>>>
>>>> Do not reply to spam on the list, report it:
>>>> https://pagure.io/fedora-infrastructure
>>>
>>> --
>>> Directory Server Development Team
>>>
>>
> --
> Directory Server Development Team
>
_______________________________________________
389-users mailing list -- 389-users@lists.fedoraproject.org
To unsubscribe send an email to 389-users-leave@lists.fedoraproject.org
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org
Do not reply to spam on the list, report it: https://pagure.io/fedora-infrastructure

No comments:

Post a Comment