Wednesday, April 29, 2020

[389-users] Re: replication problems


On Wed, Apr 22, 2020 at 11:09 PM William Brown <wbrown@suse.de> wrote:


> On 23 Apr 2020, at 06:59, Alberto Viana <albertocrj@gmail.com> wrote:
>
> Mark,
>
> On frame 9:
>
> It goes up to p *mod->mod_bvalues[20]
>
> (gdb)  p *mod->mod_bvalues[21]
> Cannot access memory at address 0x0
>
> On frame 7:
> It goes up to p *replacevals[20]
>
> (gdb) p *replacevals[21]
> Cannot access memory at address 0x0

Yep, but we need to see all the outputs from 0 -> 20 and 0 -> 21 respectively :) So please copy and paste the full output! Thanks for your patience with this.
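
If it saves some typing, here is a minimal sketch that generates one gdb print command per array index for the three frames discussed in this thread. The expressions (mod->mod_bvalues, replacevals, a->a_present_values.va) come from the thread; the count of 21 per array is an assumption based on index 21 being reported as NULL and on num = 21 in the "print *a" output, so adjust it if your session differs.

```python
def gdb_print_commands(expr, count):
    """Build one gdb 'p' command per array index, 0 .. count-1."""
    return [f"p *{expr}[{i}]" for i in range(count)]


# Expressions are taken from the thread. The counts are ASSUMPTIONS:
# index 21 was reported as NULL, so each array is taken to hold 21
# entries (indexes 0..20).
script = (
    ["frame 9", "p *mod", "print i"]
    + gdb_print_commands("mod->mod_bvalues", 21)
    + ["frame 7", "p *replacevals"]
    + gdb_print_commands("replacevals", 21)
    + ["frame 6", "print *a"]
    + gdb_print_commands("a->a_present_values.va", 21)
)

# Print the commands so they can be redirected into a file.
print("\n".join(script))
```

One way to use it: redirect the output to a file (the name cmds.gdb is just an example) and load it in the gdb session with "source cmds.gdb", then paste everything gdb prints.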

>
> On frame 6:
> (gdb) frame 6
> #6  0x00007ffff7ada6fa in entry_delete_present_values_wsi_multi_valued (e=0x7fff8401f500, type=0x7fff84012780 "memberOf", vals=0x0, csn=0x7fff967fb340, urp=8, mod_op=2, replacevals=0x7fff840127c0)
>     at ldap/servers/slapd/entrywsi.c:777
> 777            valueset_purge(a, &a->a_present_values, csn);
> (gdb) print *a
> $278 = {a_type = 0x7fff84022b30 "memberOf", a_present_values = {num = 21, max = 32, sorted = 0x7fff84023ad0, va = 0x7fff84022b50}, a_flags = 4, a_plugin = 0x6c7e80, a_deleted_values = {num = 0, max = 0,
>     sorted = 0x0, va = 0x0}, a_listtofree = 0x0, a_next = 0x7fff84023c00, a_deletioncsn = 0x7fff840247c0, a_mr_eq_plugin = 0x0, a_mr_ord_plugin = 0x0, a_mr_sub_plugin = 0x0}
> (gdb) print *a->a_present_values
> Structure has no component named operator*.
> (gdb) print *a->a_present_values.va[0]
>
>
> Thanks,
>
> Alberto Viana
>
> On Wed, Apr 22, 2020 at 4:57 PM Mark Reynolds <mreynolds@redhat.com> wrote:
> Go to frame 9 and start printing the mod:
>
> (gdb) p *mod
>
> (gdb) print i
>
> (gdb) p *mod->mod_bvalues[0]
>
> (gdb) p *mod->mod_bvalues[1]
>
> ... Keep doing that until it's NULL
>
>
>
> Then go to frame 7
>
> (gdb) p *replacevals
>
> (gdb) p *replacevals[0]
>
> (gdb) p *replacevals[1]
>
> --- Keep doing this until it's NULL
>
>
>
> Then go to frame 6
>
> (gdb) print *a
>
> (gdb) print *a->a_present_values
>
> (gdb) print *a->a_present_values.va[0]
>
> (gdb) print *a->a_present_values.va[1]
>
> --- Keep doing this until it's NULL
>
>
>
> Thanks,
> Mark
>
>
>
> On 4/22/20 3:43 PM, Alberto Viana wrote:
>> Mark,
>>
>> Yes, I'm in frame 3, and no, I do not know what the modification is, sorry. I think that's what I'm trying to find out: why one of the servers always crashes when I enable replication between the two 389 servers.
>>
>> Maybe I could reconfigure my replication, enable debug logging, and see where it stops?
>>
>> What else can I do?
>>
>> Thanks
>>
>>
>> On Wed, Apr 22, 2020 at 4:34 PM Mark Reynolds <mreynolds@redhat.com> wrote:
>>
>>
>> On 4/22/20 3:27 PM, Alberto Viana wrote:
>>> Mark,
>>>
>>> Here it is:
>>> (gdb) where
>>> #0  0x00007ffff455399f in raise () at /lib64/libc.so.6
>>> #1  0x00007ffff453dcf5 in abort () at /lib64/libc.so.6
>>> #2  0x00007ffff5430cd0 in PR_Assert () at /lib64/libnspr4.so
>>> #3  0x00007ffff7b71627 in slapi_valueset_done (vs=0x7fff8c022aa8) at ldap/servers/slapd/valueset.c:471
>>> #4  0x00007ffff7b72257 in valueset_array_purge (a=0x7fff8c022aa0, vs=0x7fff8c022aa8, csn=0x7fff977fd340) at ldap/servers/slapd/valueset.c:804
>>> #5  0x00007ffff7b723c5 in valueset_purge (a=0x7fff8c022aa0, vs=0x7fff8c022aa8, csn=0x7fff977fd340) at ldap/servers/slapd/valueset.c:834
>>> #6  0x00007ffff7ada6fa in entry_delete_present_values_wsi_multi_valued (e=0x7fff8c01f500, type=0x7fff8c012780 "memberOf", vals=0x0, csn=0x7fff977fd340, urp=8, mod_op=2, replacevals=0x7fff8c0127c0)
>>>     at ldap/servers/slapd/entrywsi.c:777
>>> #7  0x00007ffff7ada20d in entry_delete_present_values_wsi (e=0x7fff8c01f500, type=0x7fff8c012780 "memberOf", vals=0x0, csn=0x7fff977fd340, urp=8, mod_op=2, replacevals=0x7fff8c0127c0)
>>>     at ldap/servers/slapd/entrywsi.c:623
>>> #8  0x00007ffff7adaa7a in entry_replace_present_values_wsi (e=0x7fff8c01f500, type=0x7fff8c012780 "memberOf", vals=0x7fff8c0127c0, csn=0x7fff977fd340, urp=8) at ldap/servers/slapd/entrywsi.c:869
>>> #9  0x00007ffff7adabf1 in entry_apply_mod_wsi (e=0x7fff8c01f500, mod=0x7fff8c0127a0, csn=0x7fff977fd340, urp=8) at ldap/servers/slapd/entrywsi.c:903
>>> #10 0x00007ffff7adae52 in entry_apply_mods_wsi (e=0x7fff8c01f500, smods=0x7fff977fd3c0, csn=0x7fff8c012160, urp=8) at ldap/servers/slapd/entrywsi.c:973
>>> #11 0x00007fffead19364 in modify_apply_check_expand
>>>     (pb=0x7fff8c000b20, operation=0x814160, mods=0x7fff8c012750, e=0x7fff8c01bc90, ec=0x7fff8c01f480, postentry=0x7fff977fd4b0, ldap_result_code=0x7fff977fd434, ldap_result_message=0x7fff977fd4d8)
>>>     at ldap/servers/slapd/back-ldbm/ldbm_modify.c:247
>>> #12 0x00007fffead1a430 in ldbm_back_modify (pb=0x7fff8c000b20) at ldap/servers/slapd/back-ldbm/ldbm_modify.c:665
>>> #13 0x00007ffff7b0cd60 in op_shared_modify (pb=0x7fff8c000b20, pw_change=0, old_pw=0x0) at ldap/servers/slapd/modify.c:1021
>>> #14 0x00007ffff7b0b266 in do_modify (pb=0x7fff8c000b20) at ldap/servers/slapd/modify.c:380
>>> #15 0x000000000041592c in connection_dispatch_operation (conn=0x150e220, op=0x814160, pb=0x7fff8c000b20) at ldap/servers/slapd/connection.c:638
>>> #16 0x0000000000417a0e in connection_threadmain () at ldap/servers/slapd/connection.c:1767
>>> #17 0x00007ffff544a568 in _pt_root () at /lib64/libnspr4.so
>>> #18 0x00007ffff4de52de in start_thread () at /lib64/libpthread.so.0
>>> #19 0x00007ffff46184b3 in clone () at /lib64/libc.so.6
>>> (gdb) print *vs->sorted[0]
>>> Cannot access memory at address 0xffffffffffffffff
>> Are you in the slapi_valueset_done frame?
>>
>> Do you know what the modify operation is doing?  It's something with memberOf, but if you knew the exact operation, and what the entry looks like prior to making that update, it would be very useful to us.
>>
>> Thanks,
>> Mark
>>
>>>
>>> Thanks,
>>>
>>> Alberto Viana
>>>
>>> On Wed, Apr 22, 2020 at 4:22 PM Mark Reynolds <mreynolds@redhat.com> wrote:
>>>
>>>
>>> On 4/22/20 3:15 PM, Alberto Viana wrote:
>>>> William,
>>>>
>>>> Here it is:
>>>>
>>>> (gdb) frame 3
>>>> #3  0x00007ffff7b71627 in slapi_valueset_done (vs=0x7fff8c022aa8) at ldap/servers/slapd/valueset.c:471
>>>> 471        PR_ASSERT((vs->sorted == NULL) || (vs->num < VALUESET_ARRAY_SORT_THRESHOLD) || ((vs->num >= VALUESET_ARRAY_SORT_THRESHOLD) && (vs->sorted[0] < vs->num)));
>>>> (gdb) print *vs
>>>> $1 = {num = 21, max = 32, sorted = 0x7fff8c023ad0, va = 0x7fff8c022b50}
>>> Can you also do a "print *vs->sorted[0]" ?
>>>
>>> And a "where" so we can see the full stack trace that leads up to this assertion?
>>>
>>> Thanks,
>>>
>>> Mark
>>>
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Alberto Viana
>>>>
>>>> On Sun, Apr 19, 2020 at 8:52 PM William Brown <wbrown@suse.de> wrote:
>>>>
>>>>
>>>> > On 18 Apr 2020, at 02:55, Alberto Viana <albertocrj@gmail.com> wrote:
>>>> >
>>>> > Hi Guys,
>>>> >
>>>> > I build my own packages (from source); here's the info:
>>>> > 389-ds-base-1.4.2.8-20200414gitfae920fc8.el8.x86_64.rpm
>>>> > 389-ds-base-debuginfo-1.4.2.8-20200414gitfae920fc8.el8.x86_64.rpm
>>>> > python3-lib389-1.4.2.8-20200414gitfae920fc8.el8.noarch.rpm
>>>> >
>>>> > I'm running on CentOS 8.
>>>> >
>>>> > Here's what I could debug:
>>>> > https://gist.github.com/albertocrj/4d74732e4e357fbc5a27296199127a62
>>>> > https://gist.github.com/albertocrj/94fc3521024c7a508f1726923936e476
>>>>
>>>> So that assert seems to be:
>>>>
>>>> PR_ASSERT((vs->sorted == NULL) || (vs->num < VALUESET_ARRAY_SORT_THRESHOLD) || ((vs->num >= VALUESET_ARRAY_SORT_THRESHOLD) && (vs->sorted[0] < vs->num)));
>>>>
>>>> But it's not clear which condition here is being violated.
>>>>
>>>> It looks like you're catching this in GDB though, so can you go to:
>>>>
>>>> https://gist.github.com/albertocrj/4d74732e4e357fbc5a27296199127a62
>>>>
>>>> (gdb) frame 3
>>>> (gdb) print *vs
>>>>
>>>> That would help to work out what condition is incorrectly being asserted here.
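
As an aside on which clause fails, here is a rough sketch that evaluates each OR-ed clause of that PR_ASSERT separately, using the values reported elsewhere in this thread (num = 21 and a non-NULL sorted pointer). Two inputs are assumptions that the thread does not confirm: the value of VALUESET_ARRAY_SORT_THRESHOLD (taken as 10 here; check valueset.c in your tree) and sorted[0] = 0xffffffffffffffff, which is only inferred from the address in gdb's "Cannot access memory" error for "print *vs->sorted[0]".

```python
SIZE_MAX = 2**64 - 1  # 0xffffffffffffffff on a 64-bit build


def assert_clauses(sorted_is_null, num, sorted0, threshold):
    """Return the truth value of each OR-ed clause of the assertion."""
    return {
        "sorted == NULL": sorted_is_null,
        "num < threshold": num < threshold,
        "num >= threshold and sorted[0] < num": num >= threshold and sorted0 < num,
    }


# From the thread: num = 21, sorted pointer is non-NULL.
# ASSUMPTIONS: threshold = 10, and sorted[0] = SIZE_MAX (inferred from
# the address gdb failed to read, not printed directly).
clauses = assert_clauses(sorted_is_null=False, num=21, sorted0=SIZE_MAX, threshold=10)

for text, ok in clauses.items():
    print(f"{text}: {ok}")
print("assertion holds:", any(clauses.values()))
```

Under these assumptions all three clauses are false, so the assertion fires. The outcome is the same for any threshold up to 21, which would point at a bad first entry in the sorted index rather than at num itself.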
>>>>
>>>> Thanks!
>>>>
>>>>
>>>> >
>>>> >
>>>> > Do you guys need something else?
>>>> >
>>>> > Thanks
>>>> >
>>>> > Alberto Viana
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > On Tue, Mar 31, 2020 at 8:03 PM William Brown <wbrown@suse.de> wrote:
>>>> >
>>>> >
>>>> > > On 1 Apr 2020, at 05:18, Mark Reynolds <mreynolds@redhat.com> wrote:
>>>> > >
>>>> > >
>>>> > > On 3/31/20 1:36 PM, Alberto Viana wrote:
>>>> > >> Hey Guys,
>>>> > >>
>>>> > >> 389-Directory/1.4.2.8
>>>> > >>
>>>> > >> 389 (master) <=> 389 (master)
>>>> > >>
>>>> > >> In a master-to-master replication setup, I started to see this error:
>>>> > >> [31/Mar/2020:17:30:52.610637150 +0000] - WARN - NSMMReplicationPlugin - replica_check_for_data_reload - Disorderly shutdown for replica dc=rnp,dc=local. Check if DB RUV needs to be updated
>>>> >
>>>> > It might also be good to remind us which distro and packages you got 389-ds from.
>>>> >
>>>> > > Looks like the server is crashing which is why you see these disorderly shutdown messages. Please get a core file and take some stack traces from it:
>>>> > >
>>>> > > http://www.port389.org/docs/389ds/FAQ/faq.html#sts=Debugging%C2%A0Crashes
>>>> > >
>>>> > > Can you please provide the complete logs?  Also, you might want to try re-initializing the replication agreement instead of disabling and re-enabling replication (it's less painful and it "might" solve the issue).
>>>> > >
>>>> > > Mark
>>>> > >
>>>> > >>
>>>> > >> Even after restarting the service, the problem persists. I have to disable and re-enable replication (and the replication agreement) on both sides; it works for some time, and then the problem comes back.
>>>> > >>
>>>> > >> Any tips?
>>>> > >>
>>>> > >> Thanks
>>>> > >>
>>>> > >> Alberto Viana
>>>> > >>
>>>> > >>
>>>> > >> _______________________________________________
>>>> > >> 389-users mailing list --
>>>> > >> 389-users@lists.fedoraproject.org
>>>> > >>
>>>> > >> To unsubscribe send an email to
>>>> > >> 389-users-leave@lists.fedoraproject.org
>>>> > >>
>>>> > >> Fedora Code of Conduct:
>>>> > >> https://docs.fedoraproject.org/en-US/project/code-of-conduct/
>>>> > >>
>>>> > >> List Guidelines:
>>>> > >> https://fedoraproject.org/wiki/Mailing_list_guidelines
>>>> > >>
>>>> > >> List Archives:
>>>> > >> https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org
>>>> > > --
>>>> > >
>>>> > > 389 Directory Server Development Team
>>>> > >
>>>> >
>>>> > —
>>>> > Sincerely,
>>>> >
>>>> > William Brown
>>>> >
>>>> > Senior Software Engineer, 389 Directory Server
>>>> > SUSE Labs
>>>> >
>>>>
>>>> —
>>>> Sincerely,
>>>>
>>>> William Brown
>>>>
>>>> Senior Software Engineer, 389 Directory Server
>>>> SUSE Labs
>>>>
>>>>
>>>>
>>> --
>>>
>>> 389 Directory Server Development Team
>>>
>>>
>>>
>> --
>>
>> 389 Directory Server Development Team
>>
> --
>
> 389 Directory Server Development Team
>


Sincerely,

William Brown

Senior Software Engineer, 389 Directory Server
SUSE Labs
