Wednesday, April 22, 2020

[389-users] Re: replication problems

Mark,

Here's: 
(gdb) where
#0  0x00007ffff455399f in raise () at /lib64/libc.so.6
#1  0x00007ffff453dcf5 in abort () at /lib64/libc.so.6
#2  0x00007ffff5430cd0 in PR_Assert () at /lib64/libnspr4.so
#3  0x00007ffff7b71627 in slapi_valueset_done (vs=0x7fff8c022aa8) at ldap/servers/slapd/valueset.c:471
#4  0x00007ffff7b72257 in valueset_array_purge (a=0x7fff8c022aa0, vs=0x7fff8c022aa8, csn=0x7fff977fd340) at ldap/servers/slapd/valueset.c:804
#5  0x00007ffff7b723c5 in valueset_purge (a=0x7fff8c022aa0, vs=0x7fff8c022aa8, csn=0x7fff977fd340) at ldap/servers/slapd/valueset.c:834
#6  0x00007ffff7ada6fa in entry_delete_present_values_wsi_multi_valued (e=0x7fff8c01f500, type=0x7fff8c012780 "memberOf", vals=0x0, csn=0x7fff977fd340, urp=8, mod_op=2, replacevals=0x7fff8c0127c0)
    at ldap/servers/slapd/entrywsi.c:777
#7  0x00007ffff7ada20d in entry_delete_present_values_wsi (e=0x7fff8c01f500, type=0x7fff8c012780 "memberOf", vals=0x0, csn=0x7fff977fd340, urp=8, mod_op=2, replacevals=0x7fff8c0127c0)
    at ldap/servers/slapd/entrywsi.c:623
#8  0x00007ffff7adaa7a in entry_replace_present_values_wsi (e=0x7fff8c01f500, type=0x7fff8c012780 "memberOf", vals=0x7fff8c0127c0, csn=0x7fff977fd340, urp=8) at ldap/servers/slapd/entrywsi.c:869
#9  0x00007ffff7adabf1 in entry_apply_mod_wsi (e=0x7fff8c01f500, mod=0x7fff8c0127a0, csn=0x7fff977fd340, urp=8) at ldap/servers/slapd/entrywsi.c:903
#10 0x00007ffff7adae52 in entry_apply_mods_wsi (e=0x7fff8c01f500, smods=0x7fff977fd3c0, csn=0x7fff8c012160, urp=8) at ldap/servers/slapd/entrywsi.c:973
#11 0x00007fffead19364 in modify_apply_check_expand
    (pb=0x7fff8c000b20, operation=0x814160, mods=0x7fff8c012750, e=0x7fff8c01bc90, ec=0x7fff8c01f480, postentry=0x7fff977fd4b0, ldap_result_code=0x7fff977fd434, ldap_result_message=0x7fff977fd4d8)
    at ldap/servers/slapd/back-ldbm/ldbm_modify.c:247
#12 0x00007fffead1a430 in ldbm_back_modify (pb=0x7fff8c000b20) at ldap/servers/slapd/back-ldbm/ldbm_modify.c:665
#13 0x00007ffff7b0cd60 in op_shared_modify (pb=0x7fff8c000b20, pw_change=0, old_pw=0x0) at ldap/servers/slapd/modify.c:1021
#14 0x00007ffff7b0b266 in do_modify (pb=0x7fff8c000b20) at ldap/servers/slapd/modify.c:380
#15 0x000000000041592c in connection_dispatch_operation (conn=0x150e220, op=0x814160, pb=0x7fff8c000b20) at ldap/servers/slapd/connection.c:638
#16 0x0000000000417a0e in connection_threadmain () at ldap/servers/slapd/connection.c:1767
#17 0x00007ffff544a568 in _pt_root () at /lib64/libnspr4.so
#18 0x00007ffff4de52de in start_thread () at /lib64/libpthread.so.0
#19 0x00007ffff46184b3 in clone () at /lib64/libc.so.6
(gdb) print *vs->sorted[0]
Cannot access memory at address 0xffffffffffffffff

Thanks,

Alberto Viana

On Wed, Apr 22, 2020 at 4:22 PM Mark Reynolds <mreynolds@redhat.com> wrote:


On 4/22/20 3:15 PM, Alberto Viana wrote:
William,

Here's:

(gdb) frame 3
#3  0x00007ffff7b71627 in slapi_valueset_done (vs=0x7fff8c022aa8) at ldap/servers/slapd/valueset.c:471
471        PR_ASSERT((vs->sorted == NULL) || (vs->num < VALUESET_ARRAY_SORT_THRESHOLD) || ((vs->num >= VALUESET_ARRAY_SORT_THRESHOLD) && (vs->sorted[0] < vs->num)));
(gdb) print *vs
$1 = {num = 21, max = 32, sorted = 0x7fff8c023ad0, va = 0x7fff8c022b50}

Can you also do a "print *vs->sorted[0]" ?

And a "where" so we can see the full stack trace that leads up to this assertion?

Thanks,

Mark



Thanks,

Alberto Viana

On Sun, Apr 19, 2020 at 8:52 PM William Brown <wbrown@suse.de> wrote:


> On 18 Apr 2020, at 02:55, Alberto Viana <albertocrj@gmail.com> wrote:
>
> Hi Guys,
>
> I build my own packages (from source); here's the info:
> 389-ds-base-1.4.2.8-20200414gitfae920fc8.el8.x86_64.rpm
> 389-ds-base-debuginfo-1.4.2.8-20200414gitfae920fc8.el8.x86_64.rpm
> python3-lib389-1.4.2.8-20200414gitfae920fc8.el8.noarch.rpm
>
> I'm running on CentOS 8.
>
> Here's what I could debug:
> https://gist.github.com/albertocrj/4d74732e4e357fbc5a27296199127a62
> https://gist.github.com/albertocrj/94fc3521024c7a508f1726923936e476

So that assert seems to be:

PR_ASSERT((vs->sorted == NULL) || (vs->num < VALUESET_ARRAY_SORT_THRESHOLD) || ((vs->num >= VALUESET_ARRAY_SORT_THRESHOLD) && (vs->sorted[0] < vs->num)));

But it's not clear which condition here is being violated.

It looks like you're catching this in GDB, though, so can you go to:

https://gist.github.com/albertocrj/4d74732e4e357fbc5a27296199127a62

(gdb) frame 3
(gdb) print *vs

That would help to work out what condition is incorrectly being asserted here.

Thanks!


>
> Do you guys need anything else?
>
> Thanks
>
> Alberto Viana
>
> On Tue, Mar 31, 2020 at 8:03 PM William Brown <wbrown@suse.de> wrote:
>
>
> > On 1 Apr 2020, at 05:18, Mark Reynolds <mreynolds@redhat.com> wrote:
> >
> >
> > On 3/31/20 1:36 PM, Alberto Viana wrote:
> >> Hey Guys,
> >>
> >> 389-Directory/1.4.2.8
> >>
> >> 389 (master) <=> 389 (master)
> >>
> >> In a master-to-master replication setup, I started to see this error:
> >> [31/Mar/2020:17:30:52.610637150 +0000] - WARN - NSMMReplicationPlugin - replica_check_for_data_reload - Disorderly shutdown for replica dc=rnp,dc=local. Check if DB RUV needs to be updated
>
> It might also be good to remind us which distro and packages you got 389-ds from.
>
> > It looks like the server is crashing, which is why you see these disorderly shutdown messages. Please get a core file and take some stack traces from it:
> >
> > http://www.port389.org/docs/389ds/FAQ/faq.html#sts=Debugging%C2%A0Crashes
> >
> > Can you please provide the complete logs? Also, you might want to try re-initializing the replication agreement instead of disabling and re-enabling replication (it's less painful and it "might" solve the issue).
> >
> > Mark
> >
> >>
> >> Even after restarting the service the problem persists. I have to disable and re-enable replication (and the replication agreements) on both sides; it works for some time, and then the problem comes back.
> >>
> >> Any tips?
> >>
> >> Thanks
> >>
> >> Alberto Viana
> >>
> >>
> >> _______________________________________________
> >> 389-users mailing list --
> >> 389-users@lists.fedoraproject.org
> >>
> >> To unsubscribe send an email to
> >> 389-users-leave@lists.fedoraproject.org
> >>
> >> Fedora Code of Conduct:
> >> https://docs.fedoraproject.org/en-US/project/code-of-conduct/
> >>
> >> List Guidelines:
> >> https://fedoraproject.org/wiki/Mailing_list_guidelines
> >>
> >> List Archives:
> >> https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org
> > --
> >
> > 389 Directory Server Development Team
> >
>
> —
> Sincerely,
>
> William Brown
>
> Senior Software Engineer, 389 Directory Server
> SUSE Labs
>


Sincerely,

William Brown

Senior Software Engineer, 389 Directory Server
SUSE Labs


--     389 Directory Server Development Team
