Tuesday, September 12, 2023

[389-users] Re: 389-ds freezes with deadlock

Hi Julian,

Difficult to say. I do not recall a specific issue, but I know we fixed
several bugs in sync_repl.

First, you may install the debuginfo packages; that would help to get a
better understanding of what is happening.

The two busy threads are likely Thread 62 and a trickle thread (threads
2 to 6), because of intensive DB page updates.
Do you know if the server recovers after that high CPU peak?
One possibility is a large update being written back to the changelog.
You may retrieve the problematic csn from the access log (during the
high-CPU period) and dump the corresponding update from the changelog
with dbscan (-k).
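For example, a rough sketch of pulling the csn out of access-log RESULT lines (the sample lines, log format details, and values here are illustrative, not taken from your setup):

```python
import re

# RESULT lines in the 389-ds access log carry a csn=... field for update
# operations (format assumed from typical default access-log output).
CSN_RE = re.compile(r"conn=(\d+) op=(\d+) RESULT .*csn=([0-9a-f]+)")

def extract_csns(lines):
    """Return (conn, op, csn) tuples for update RESULT lines."""
    out = []
    for line in lines:
        m = CSN_RE.search(line)
        if m:
            out.append((int(m.group(1)), int(m.group(2)), m.group(3)))
    return out

# hypothetical sample lines standing in for the real access log
sample = [
    '[12/Sep/2023:10:00:01.123 +0200] conn=35871 op=4 RESULT err=0 '
    'tag=103 nentries=0 etime=5.3 csn=64ff1a2b000000010000',
    '[12/Sep/2023:10:00:02.456 +0200] conn=35872 op=1 SRCH '
    'base="cn=changelog" scope=1',
]
print(extract_csns(sample))  # [(35871, 4, '64ff1a2b000000010000')]
```

Once you have the csn of the suspect update, you can pass it as the key to dbscan -k to dump that changelog record.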

Regarding the unindexed search, you may check whether 'changeNumber' is
indexed (equality). It looks related to a sync_repl search with no
cookie or an old cookie. That search is on a different backend than
Thread 62's, so there is no conflict between the unindexed sync_repl
search and the update on Thread 62.
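On the client side, the external consumer can avoid re-reading from changeNumber>=1 by persisting the last change number it processed and resuming just above it. A minimal sketch (the checkpoint file name is hypothetical; this only builds the filter, it does not perform the LDAP search):

```python
import os

CHECKPOINT = "changelog.ckpt"  # hypothetical local checkpoint file

def load_last_change_number(path=CHECKPOINT):
    """Return the last processed changeNumber, or 0 if none recorded."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (FileNotFoundError, ValueError):
        return 0

def save_last_change_number(n, path=CHECKPOINT):
    with open(path, "w") as f:
        f.write(str(n))

def changelog_filter(path=CHECKPOINT):
    """Build a filter that resumes after the checkpoint instead of
    re-scanning from changeNumber>=1. An equality index on changeNumber
    still matters: range filters like >= are served from the eq index."""
    return f"(changeNumber>={load_last_change_number(path) + 1})"

save_last_change_number(412345)
print(changelog_filter())  # (changeNumber>=412346)
```

The consumer would call save_last_change_number() after each batch it syncs, so a restart picks up where it left off.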

best regards
thierry

On 9/12/23 13:52, Julian Kippels wrote:
> Hi,
>
> there are two threads that are at 100% CPU utilisation. I did not
> start any admin task myself, maybe it is some built-in task that is
> doing this? Or could an unindexed search on the changelog be causing
> this?
>
> I have noticed this message:
> NOTICE - ldbm_back_search - Unindexed search: search
> base="cn=changelog" scope=1 filter="(changeNumber>=1)" conn=35871 op=1
>
> There is an external server that is reading the changelog and syncing
> some stuff depending on that. I don't know why they are starting at
> changeNumber>=1, they probably should start way higher. If it is
> possible that this is the cause I will kick them to stop that ;)
>
> I am running version 2.3.1 on Debian 12, installed from the Debian
> repositories.
>
> Kind regards
> Julian
>
> Am 08.09.23 um 13:23 schrieb Thierry Bordaz:
>> Hi Julian,
>>
>> It looks like an update (Thread 62) is either eating CPU or is
>> blocked while updating the changelog.
>> When it occurs, could you run 'top -H -p <pid>' to see if some
>> threads are eating CPU?
>> Otherwise (no CPU consumption), you may take a pstack and dump the
>> DB lock info (db_stat -N -C A -h /var/lib/dirsrv/<inst>/db)
>>
>> Did you run an admin task (import/export/index...) before it occurred?
>> What version are you running ?
>>
>> best regards
>> Thierry
>>
>> On 9/8/23 09:28, Julian Kippels wrote:
>>> Hi,
>>>
>>> it happened again and now I ran the gdb-command like Mark suggested.
>>> The Stacktrace is attached. Again I got this error message:
>>>
>>> [07/Sep/2023:15:22:43.410333038 +0200] - ERR - ldbm_back_seq -
>>> deadlock retry BAD 1601, err=0 Unexpected dbimpl error code
>>>
>>> and the remote program that called also stopped working at that time.
>>>
>>> Thanks
>>> Julian Kippels
>>>
>>> Am 28.08.23 um 14:28 schrieb Thierry Bordaz:
>>>> Hi Julian,
>>>>
>>>> I agree with Mark's suggestion. If new connections are failing, a
>>>> pstack plus the logged error messages would be helpful.
>>>>
>>>> Regarding the logged error: the LDAP server relies on a database
>>>> that, under pressure from multiple threads, may end up in a db_lock
>>>> deadlock. In such a situation the DB selects one of the deadlocking
>>>> threads and returns a DB_DEADLOCK error to it, while the other
>>>> threads continue to proceed. This is a normal error that is caught
>>>> by the server, which simply retries the DB access. If the same
>>>> thread fails too many times, it stops retrying and returns a fatal
>>>> error for the request.
>>>>
>>>> In your case it reports code 1601, which is a transient deadlock
>>>> with retry. So the impacted request just retried and likely
>>>> succeeded.
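The retry behaviour described above can be sketched generically as follows (an illustration of the pattern only, not the server's actual code; the retry limit and backoff are made up):

```python
import random
import time

class DeadlockError(Exception):
    """Stands in for the DB_DEADLOCK error the database returns to the
    thread it picks as the deadlock victim."""

MAX_RETRIES = 50  # illustrative bound; the server gives up eventually

def with_deadlock_retry(txn, max_retries=MAX_RETRIES):
    """Run txn(), retrying when this thread is chosen as the deadlock
    victim; raise a fatal error only after too many failures."""
    for attempt in range(max_retries):
        try:
            return txn()
        except DeadlockError:
            # brief randomized backoff before retrying the DB access
            time.sleep(random.uniform(0, 0.001) * (attempt + 1))
    raise RuntimeError(f"deadlock retry: giving up after {max_retries} tries")

# demo: the operation is picked as victim twice, then succeeds
attempts = {"n": 0}
def flaky_txn():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise DeadlockError()
    return "ok"

print(with_deadlock_retry(flaky_txn))  # ok
```

So a single "deadlock retry" message usually means the request was retried transparently; it only becomes fatal when the same operation keeps losing the deadlock resolution.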
>>>>
>>>> best regards
>>>> thierry
>>>>
>>>> On 8/24/23 14:46, Mark Reynolds wrote:
>>>>> Hi Julian,
>>>>>
>>>>> It would be helpful to get a pstack/stacktrace so we can see where
>>>>> DS is stuck:
>>>>>
>>>>> https://www.port389.org/docs/389ds/FAQ/faq.html#sts=Debugging%C2%A0Hangs
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Mark
>>>>>
>>>>> On 8/24/23 4:13 AM, Julian Kippels wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I am using 389-ds Version 2.3.1 and have encountered the same
>>>>>> error twice in three days now. There are some MOD operations and
>>>>>> then I get a line like this in the errors-log:
>>>>>>
>>>>>> [23/Aug/2023:13:27:17.971884067 +0200] - ERR - ldbm_back_seq -
>>>>>> deadlock retry BAD 1601, err=0 Unexpected dbimpl error code
>>>>>>
>>>>>> After this the server keeps running, systemctl status says
>>>>>> everything is fine, but new incoming connections are failing with
>>>>>> timeouts.
>>>>>>
>>>>>> Any advice would be welcome.
>>>>>>
>>>>>> Thanks in advance
>>>>>> Julian Kippels
>>>>>>
>>>>>> _______________________________________________
>>>>>> 389-users mailing list -- 389-users@lists.fedoraproject.org
>>>>>> To unsubscribe send an email to
>>>>>> 389-users-leave@lists.fedoraproject.org
>>>>>> Fedora Code of Conduct:
>>>>>> https://docs.fedoraproject.org/en-US/project/code-of-conduct/
>>>>>> List Guidelines:
>>>>>> https://fedoraproject.org/wiki/Mailing_list_guidelines
>>>>>> List Archives:
>>>>>> https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org
>>>>>> Do not reply to spam, report it:
>>>>>> https://pagure.io/fedora-infrastructure/new_issue
>>>>>
>>>
>>>
>
>
