Wednesday, August 11, 2021

[389-users] Re: Lock table is out of available lock entries

On 8/11/21 8:30 AM, Kees Bakker wrote:
> On 11-08-2021 11:49, Kees Bakker wrote:
>> On 09-08-2021 20:23, Mark Reynolds wrote:
>>> On 8/9/21 1:47 PM, Kees Bakker wrote:
>>>> On 09-08-2021 19:25, Mark Reynolds wrote:
>>>>> On 8/9/21 1:16 PM, Kees Bakker wrote:
>>>>>> On 09-08-2021 18:43, Mark Reynolds wrote:
>>>>>>> On 8/9/21 11:20 AM, Kees Bakker wrote:
>>>>>>>> On 09-08-2021 16:00, Mark Reynolds wrote:
>>>>>>>>> On 8/9/21 8:09 AM, Kees Bakker wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> When my dirsrv was trying to compact the databases I was getting
>>>>>>>>>> this
>>>>>>>>>> error
>>>>>>>>>>
>>>>>>>>>> [07/Aug/2021:23:59:02.715984489 +0200] - NOTICE - bdb_compact -
>>>>>>>>>> Compacting databases ...
>>>>>>>>>> [07/Aug/2021:23:59:02.765932397 +0200] - NOTICE - bdb_compact -
>>>>>>>>>> Compacting DB start: userRoot
>>>>>>>>>> [07/Aug/2021:23:59:03.518175414 +0200] - NOTICE - bdb_compact -
>>>>>>>>>> compactdb: compact userRoot - 417 pages freed
>>>>>>>>>> [07/Aug/2021:23:59:03.576427786 +0200] - NOTICE - bdb_compact -
>>>>>>>>>> Compacting DB start: ipaca
>>>>>>>>>> [07/Aug/2021:23:59:03.659941533 +0200] - NOTICE - bdb_compact -
>>>>>>>>>> compactdb: compact ipaca - 419 pages freed
>>>>>>>>>> [07/Aug/2021:23:59:03.718445310 +0200] - NOTICE - bdb_compact -
>>>>>>>>>> Compacting DB start: changelog
>>>>>>>>>> [08/Aug/2021:00:00:40.807571334 +0200] - NOTICE -
>>>>>>>>>> NSMMReplicationPlugin - changelog program - cl5CompactDBs -
>>>>>>>>>> compacting
>>>>>>>>>> replication changelogs...
>>>>>>>>>> [08/Aug/2021:00:00:54.309357211 +0200] - ERR - libdb -
>>>>>>>>>> BDB2055 Lock
>>>>>>>>>> table is out of available lock entries
>>>>>>>>>> [08/Aug/2021:00:00:54.726504736 +0200] - ERR - bdb_compact -
>>>>>>>>>> compactdb: failed to compact changelog; db error - 12 Cannot
>>>>>>>>>> allocate
>>>>>>>>>> memory
>>>>>>>>>> [08/Aug/2021:00:00:54.801571421 +0200] - ERR - libdb -
>>>>>>>>>> BDB2055 Lock
>>>>>>>>>> table is out of available lock entries
>>>>>>>>>> [08/Aug/2021:00:00:54.876618702 +0200] - ERR -
>>>>>>>>>> NSMMReplicationPlugin -
>>>>>>>>>> changelog program - cl5CompactDBs - Failed to compact
>>>>>>>>>> a797bb0b-be1d11eb-88c0b677-613aa2ad; db error - 12 Cannot
>>>>>>>>>> allocate
>>>>>>>>>> memory
>>>>>>>>>> [08/Aug/2021:00:00:57.253006449 +0200] - NOTICE - bdb_compact -
>>>>>>>>>> Compacting databases finished.
>>>>>>>>>>
>>>>>>>>>> There are about 402k entries in cn=changelog.
>>>>>>>>>>
>>>>>>>>>> I have a few questions
>>>>>>>>>> 1) is it normal to have so many entries in cn=changelog? On
>>>>>>>>>> another
>>>>>>>>>> replica I have almost 3M entries Isn't this cleaned up?
>>>>>>>>>> 2) the number of locks is 50000 (there are two config items).
>>>>>>>>>> Should I
>>>>>>>>>> increase that number? If so, increase to what?
>>>>>>>>>> 3) is there maybe something else going on, causing the
>>>>>>>>>> exhaustion of
>>>>>>>>>> the locks?
>>>>>>>>>
>>>>>>>>> Ok, so by default there is no changelog trimming enabled. So the
>>>>>>>>> changelog will grow without bounds, which is bad.
>>>>>>>>
>>>>>>>> How much of this [1] applies? Indeed it says "By default the
>>>>>>>> Replication Changelog does not use any trimming by default."
>>>>>>>> So, how is the trimming actually enabled? I have these entries
>>>>>>>>
>>>>>>>> dn: cn=changelog5,cn=config
>>>>>>>> cn: changelog5
>>>>>>>> nsslapd-changelogdir: /var/lib/dirsrv/slapd-EXAMPLE-COM/cldb
>>>>>>>> nsslapd-changelogmaxage: 30d
>>>>>>>> objectClass: top
>>>>>>>> objectClass: extensibleobject
>>>>>>>
>>>>>>> So trimming is set, but it's set to 30 days (30d), that
>>>>>>> could/should be
>>>>>>> shortened to "7d".
>>>>>>
>>>>>> Somehow I doubt that this will do anything.
>>>>>
>>>>> Ahhh, I thought this was the Replication Changelog, but it is the
>>>>> "Retro" Changelog.  Ok so check this out:
>>>>>
>>>>> https://access.redhat.com/documentation/en-us/red_hat_directory_server/11/html/administration_guide/managing_replication-using_the_retro_changelog_plug_in#Using_the_Retro_Changelog_Plug_in-Trimming_the_Retro_Changelog
>>>>>
>>>>>
>>>>>
>>>>> Or update config directly using ldapmodify:
>>>>>
>>>>> # ldapmodify -D ...
>>>>> dn: cn=Retro Changelog Plugin,cn=plugins,cn=config
>>>>> changetype: modify
>>>>> replace: nsslapd-changelogmaxage: 7d
>>>>> nsslapd-changelogmaxage: 7d
>>>>
>>>> (( Typo? The replace line is without the value ))
>>> yes, sorry...
>>>>
>>>> Hmm, that maxage was already at 2 days
>>>
>>> Ok, perhaps you are running into a known bug where indexes created by
>>> plugins are not enabled.  Or might be a config issue?
>>>
>>> Try this first.  Set in the retro changelog entry:
>>>
>>> nsslapd-changelog-trim-interval: 300   <--- 5 minutes
>> (Sorry to be picky, but it is nsslapd-changelogtrim-interval. It's
>> default is 300.)
>>>
>>> Restart the server to make sure it was applied.  You can enable
>>> "plugin"
>>> logging to see what it does.
>>>
>>> nsslapd-errorlog-level: 65536
>>>
>>> wait over 5 minutes, then grep the errors log for "DSRetroclPlugin".
>>> This will tell us what the plugin is doing when it goes to trim the
>>> database.
>>>
>>> IMPORTANT, disable plugin logging immediately after you get your data.
>>> Set the errorlog level to 0 to go back to default logging.
>>>
>>> If the logging shows it's not finding any candidates then it might be
>>> the indexing issue I mentioned above.  In that case you just need to
>>> reindex the retro changelog:
>>
>> Before changing anything I enabled Plug-in debugging which revealed
>> something interesting
>>
>> [root@linge ~]# grep trim
>> /var/log/dirsrv/slapd-GHS-NL/errors-plugin-debug-20210811-1108
>> [11/Aug/2021:11:01:49.792450548 +0200] - DEBUG - DSRetroclPlugin -
>> cltrim: ldrc=0, first_time=1622032141, cur_time=3702600
>> [11/Aug/2021:11:01:49.842339467 +0200] - DEBUG - DSRetroclPlugin -
>> retrocl_housekeeping - changelog does not need to be trimmed
>> [11/Aug/2021:11:06:49.953401407 +0200] - DEBUG - DSRetroclPlugin -
>> cltrim: ldrc=0, first_time=1622032141, cur_time=3702900
>> [11/Aug/2021:11:06:50.012249703 +0200] - DEBUG - DSRetroclPlugin -
>> retrocl_housekeeping - changelog does not need to be trimmed
>>
>> This shows that the interval is 300 (even if
>> nsslapd-changelogtrim-interval is not explicitly set in
>> cn=changelog5,cn=config
>>
>> Look at cur_time. That is the "uptime" of the system, not UTC.
>>
>> [root@linge ~]# uptime
>>  11:21:42 up 42 days, 20:49,  2 users,  load average: 0.15, 0.14, 0.16
>>
>> Next, assuming this code from retrocl_trim.c is making the decision
>> to trim
>>
>>             first_time = retrocl_getchangetime(SLAPI_SEQ_FIRST, &ldrc);
>>             slapi_log_err(SLAPI_LOG_PLUGIN, RETROCL_PLUGIN_NAME,
>>                           "cltrim: ldrc=%d, first_time=%ld,
>> cur_time=%ld\n",
>>                           ldrc, first_time, cur_time);
>>             if (LDAP_SUCCESS == ldrc && first_time > (time_t)0L &&
>>                 first_time + ts.ts_c_max_age < cur_time) {
>>                 must_trim = 1;
>>             }
>>
>> There is a mixup of absolute (first_time) and relative (cur_time)
>> time. Which of the two is wrong?
>
> This is probably introduced in commit a5029c8091 for Pull Request [1].
> The queue function is now called with a relative time.
>
> [1] https://github.com/389ds/389-ds-base/pull/4450
> [2] https://github.com/389ds/389-ds-base/issues/4384

Yes I think you found a bug.  Are you building the server yourself?  Can
I send you a code patch to try?

Mark

>
>>
>>>
>>> # dsctl slapd-YOUR_INSTANCE stop
>>>
>>> # dsctl slapd-YOUR_INSTANCE db2index changelog
>>>
>>> # dsctl slapd-YOUR_INSTANCE start
>>>
>>> Then wait 5 minutes and see if anything was trimmed.
>>>
>>> Mark
>>>
>>>>
>>>> dn: cn=Retro Changelog Plugin,cn=plugins,cn=config
>>>> cn: Retro Changelog Plugin
>>>> nsslapd-attribute: nsuniqueid:targetUniqueId
>>>> nsslapd-changelogmaxage: 2d
>>>> nsslapd-include-suffix: cn=dns,dc=example,dc=com
>>>> nsslapd-plugin-depends-on-named: Class of Service
>>>> nsslapd-plugin-depends-on-type: database
>>>> nsslapd-pluginDescription: Retrocl Plugin
>>>> nsslapd-pluginEnabled: on
>>>> nsslapd-pluginId: retrocl
>>>> nsslapd-pluginInitfunc: retrocl_plugin_init
>>>> nsslapd-pluginPath: libretrocl-plugin
>>>> nsslapd-pluginType: object
>>>> nsslapd-pluginVendor: 389 Project
>>>> nsslapd-pluginVersion: 1.4.3.231
>>>> nsslapd-pluginbetxn: on
>>>> nsslapd-pluginprecedence: 25
>>>> objectClass: top
>>>> objectClass: nsSlapdPlugin
>>>> objectClass: extensibleObject
>>>>
>>>>>
>>>>>
>>>>> HTH,
>>>>>
>>>>> Mark
>>>>>
>>>>>>
>>>>>> The first entry in cn=changelog is more than two months old. (May
>>>>>> 26th
>>>>>> is about
>>>>>> the time I installed this server as a FreeIPA replica.)
>>>>>>
>>>>>> dn: changenumber=1,cn=changelog
>>>>>> objectClass: top
>>>>>> objectClass: changelogentry
>>>>>> objectClass: extensibleObject
>>>>>> targetuniqueid: 1e89ebcb-e20111e8-8820f96e-fc0c60a4
>>>>>> changeNumber: 1
>>>>>> targetDn:
>>>>>> idnsname=_ldap._tcp,idnsname=example.com.,cn=dns,dc=example,dc=com
>>>>>> changeTime: 20210526122901Z
>>>>>> changeType: modify
>>>>>> changes:: YWRkOiBzUlZSZWNvc...gplbnRyeXVzbjogMTEyCi0KAA==
>>>>>>
>>>>>> Is there a way to see trimming in action? I'm afraid it is not doing
>>>>>> anything.
>>>>
>>>> What diagnostic can I enable to see the trimming in action?
>>>>
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> I recommend setting up the changelog max age to 7 days:
>>>>>>>>>
>>>>>>>>> https://access.redhat.com/documentation/en-us/red_hat_directory_server/11/html/administration_guide/trimming_the_replication_changelog
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Once that the trimming cleans up the changelog the database
>>>>>>>>> compaction
>>>>>>>>> should work without issue.  You can also increase the database
>>>>>>>>> locks to
>>>>>>>>> 1 million (that does require a restart of the server to take
>>>>>>>>> effect).
>>>>>>>>
>>>>>>>> Let's hope that 1 million is enough :-)
>>>>>>>
>>>>>>> It might not be, you can keep bumping it up if needed. There is no
>>>>>>> limit, or negative impact on db performance.
>>>>>>>
>>>>>>> Mark
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> HTH,
>>>>>>>>>
>>>>>>>>> Mark
>>>>>>>>>
>>>>>>>> [1]
>>>>>>>> https://directory.fedoraproject.org/docs/389ds/FAQ/changelog-trimming.html
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Directory Server Development Team
>>>>>>>
>>>>>> _______________________________________________
>>>>>> 389-users mailing list -- 389-users@lists.fedoraproject.org
>>>>>> To unsubscribe send an email to
>>>>>> 389-users-leave@lists.fedoraproject.org
>>>>>> Fedora Code of Conduct:
>>>>>> https://docs.fedoraproject.org/en-US/project/code-of-conduct/
>>>>>> List Guidelines:
>>>>>> https://fedoraproject.org/wiki/Mailing_list_guidelines
>>>>>> List Archives:
>>>>>> https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org
>>>>>>
>>>>>>
>>>>>> Do not reply to spam on the list, report it:
>>>>>> https://pagure.io/fedora-infrastructure
>>>>>
>>>>> --
>>>>> Directory Server Development Team
>>>>>
>>>>
>>> --
>>> Directory Server Development Team
>>>
>> _______________________________________________
>> 389-users mailing list -- 389-users@lists.fedoraproject.org
>> To unsubscribe send an email to 389-users-leave@lists.fedoraproject.org
>> Fedora Code of Conduct:
>> https://docs.fedoraproject.org/en-US/project/code-of-conduct/
>> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
>> List Archives:
>> https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org
>> Do not reply to spam on the list, report it:
>> https://pagure.io/fedora-infrastructure
> _______________________________________________
> 389-users mailing list -- 389-users@lists.fedoraproject.org
> To unsubscribe send an email to 389-users-leave@lists.fedoraproject.org
> Fedora Code of Conduct:
> https://docs.fedoraproject.org/en-US/project/code-of-conduct/
> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
> List Archives:
> https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org
> Do not reply to spam on the list, report it:
> https://pagure.io/fedora-infrastructure

--
Directory Server Development Team
_______________________________________________
389-users mailing list -- 389-users@lists.fedoraproject.org
To unsubscribe send an email to 389-users-leave@lists.fedoraproject.org
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org
Do not reply to spam on the list, report it: https://pagure.io/fedora-infrastructure

No comments:

Post a Comment