Wednesday, June 14, 2017

[389-users] Re: Broken replicas and CleanRUV question

On 06/14/2017 08:24 AM, Predrag Zečević - Technical Support Analyst wrote:
> On 06/02/17 16:22, Mark Reynolds wrote:
>>
>>
>> On 06/02/2017 08:47 AM, Predrag Zečević - Technical Support Analyst
>> wrote:
>>> On 05/31/17 20:44, Mark Reynolds wrote:
>>>>
>>>>
>>>> On 05/31/2017 06:00 AM, Predrag Zečević - Technical Support Analyst
>>>> wrote:
>>>>> Hi all,
>>>>>
>>>>> A long time ago we started with 389-DS and, due to lack of
>>>>> experience, I installed and used the admin server (which was
>>>>> abandoned later, because it is too complicated and requires someone
>>>>> at the keyboard).
>>>>>
>>>>> As a consequence, we started to replicate the netscapeRoot
>>>>> space... Over time, we upgraded the software from the initial
>>>>> 389-ds-1.2.1-1.el5 (started from the FDS repository, moved to the
>>>>> EPEL one later) to today's 389-ds-base-1.3.5.14-1.el6.x86_64 (this
>>>>> one is compiled from source, and that was introduced before we
>>>>> migrated the boxes from RHEL5 to RHEL6 - actually CentOS).
>>>>>
>>>>> During various phases of upgrades, the netscapeRoot replicas went out
>>>>> of sync (we did not spot that, because of a bug in the monitoring
>>>>> script - that is another issue).
>>>>>
>>>>> Our setup includes MultiMaster ReadWrite replication (ldap1 <-->
>>>>> ldap2) and one ReadOnly (ldap3, consumes from both suppliers in MMR).
>>>>>
>>>>> Right now, this:
>>>>> $ for ldap in ldap1 ldap2; do
>>>>>   ldapsearch -x -H ldaps://${ldap}.MyDomain.com \
>>>>>     -b "cn=mapping tree,cn=config" \
>>>>>     -D "cn=Directory Manager" -w ${DMPASS} -o ldif-wrap=no \
>>>>>     objectClass=nsDS5ReplicationAgreement |
>>>>>   awk -vLDAP=${ldap} '
>>>>>     /^dn/ {printf("#===== %s =====#\n%s\n", LDAP, $0); next}
>>>>>     /^nsDS5ReplicaHost:/ {printf("%s\n", $0); next}
>>>>>     /^nsds5replicaLastUpdateStatus:/ {printf("%s\n", $0); next}'
>>>>> done
>>>>>
>>>>> returns (I have excluded working MyDomain replicas output):
>>>>> $ #===== ldap1 =====#
>>>>> dn: cn=2eLDAPmmr,cn=replica,cn=o\3Dnetscaperoot,cn=mapping
>>>>> tree,cn=config
>>>>> nsDS5ReplicaHost: ldap2.MyDomain.com
>>>>> nsds5replicaLastUpdateStatus: Error (0) No replication sessions
>>>>> started since server startup
>>>>> #===== ldap1 =====#
>>>>> dn: cn=2eLDAPror,cn=replica,cn=o\3Dnetscaperoot,cn=mapping
>>>>> tree,cn=config
>>>>> nsDS5ReplicaHost: ldap3.MyDomain.com
>>>>> nsds5replicaLastUpdateStatus: Error (0) No replication sessions
>>>>> started since server startup
>>>>> #===== ldap2 =====#
>>>>> dn: cn=2eLDAPmmr,cn=replica,cn=o\3Dnetscaperoot,cn=mapping
>>>>> tree,cn=config
>>>>> nsDS5ReplicaHost: ldap1.MyDomain.com
>>>>> nsds5replicaLastUpdateStatus: Error (0) No replication sessions
>>>>> started since server startup
>>>>> #===== ldap2 =====#
>>>>> dn: cn=2eLDAPror,cn=replica,cn=o\3Dnetscaperoot,cn=mapping
>>>>> tree,cn=config
>>>>> nsDS5ReplicaHost: ldap3.MyDomain.com
>>>>> nsds5replicaLastUpdateStatus: Error (0) No replication sessions
>>>>> started since server startup
>>>>>
>>>>> I have tried various tricks to recover that replication, but w/o
>>>>> luck...
>>>>>
>>>>> When I check (for example ldap1) with this:
>>>>> $ ldapsearch -xLLLo ldif-wrap=no -H ldaps://ldap1.MyDomain.com -D
>>>>> 'cn=directory manager' -w ${DMPASS} -b o=netscapeRoot
>>>>> '(&(nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff)(objectclass=nstombstone))'
>>>>>
>>>>>
>>>>>
>>>>> I get as result:
>>>>> dn: cn=replica,cn=o\3Dnetscaperoot,cn=mapping tree,cn=config
>>>>> objectClass: nsDS5Replica
>>>>> objectClass: top
>>>>> nsDS5ReplicaRoot: o=netscaperoot
>>>>> nsDS5ReplicaType: 3
>>>>> nsDS5Flags: 1
>>>>> nsDS5ReplicaId: 11
>>>>> nsds5ReplicaPurgeDelay: 604800
>>>>> nsDS5ReplicaBindDN: cn=replication manager,cn=config
>>>>> nsDS5ReplicaReferral: ldap://ldap2.MyDomain.com:636/o%3dnetscaperoot
>>>>> cn: replica
>>>>> nsState:: CwAAAAAAAACRKiRZAAAAAAAAAAAAAAAAAQAAAAAAAAAAAAAAAAAAAA==
>>>>> nsDS5ReplicaName: dc964102-1dd111b2-8970c75e-63880000
>>>>> nsds50ruv: {replicageneration} 4dcb9f790000000b0000
>>>>> nsds50ruv: {replica 11 ldap://ldap1.MyDomain.com:0}
>>>>> nsds50ruv: {replica 21 ldap://ldap2.MyDomain.com:0}
>>>>> 4dda4a3a000000150000 4fd5f742000300150000
>>>>> nsds5agmtmaxcsn:
>>>>> o=netscaperoot;2eLDAPror;ldap3.MyDomain.com;636;unavailable
>>>>> nsruvReplicaLastModified: {replica 11 ldap://ldap1.MyDomain.com:0}
>>>>> 00000000
>>>>> nsruvReplicaLastModified: {replica 21 ldap://ldap2.MyDomain.com:0}
>>>>> 00000000
>>>>> nsds5ReplicaChangeCount: 1
>>>>> nsds5replicareapactive: 0
>>>>>
>>>>> Tried to CleanRUV (ldif applied with ldapmodify command to all
>>>>> suppliers and consumers):
>>>>>
>>>>> $ cat /tmp/ldap.cleanRUV-tasks-for-netscapeRoot-replica.11.ldif
>>>>> dn: cn=replica,cn=o\3Dnetscaperoot,cn=mapping tree,cn=config
>>>>> changetype: modify
>>>>> replace: nsds5task
>>>>> nsds5task: CLEANRUV11
>>>>>
>>>>> At some point, ldap1 replied:
>>>>> "ldap_modify: Server is unwilling to perform (53)"
>>>>>
>>>>> which explains nothing, because that error means:
>>>>>
>>>>> "Indicates that the LDAP server cannot process the request because of
>>>>> server-defined restrictions. This error is returned for the following
>>>>> reasons: The add entry request violates the server's structure
>>>>> rules...OR...The modify attribute request specifies attributes that
>>>>> users cannot modify...OR...Password restrictions prevent the
>>>>> action...OR...Connection restrictions prevent the action. "
>>>>>
>>>>> Right now, CleanRUV task is stuck...
>>>> You should be using the cleanAllRUV task:
>>>>
>>>> https://access.redhat.com/documentation/en-us/red_hat_directory_server/10/html/configuration_command_and_file_reference/perl_scripts#cleanallruv.pl
>>>>
>>>>
>>>
>>> Hi Mark,
>>>
>>> I have tried perl script from above:
>>>
>>> LDAP1# /usr/sbin/cleanallruv.pl -v -Z ldap1 -D "cn=directory manager"
>>> -w ${DMPASS} -b "dn:\
>>> cn=2eLDAPmmr,cn=replica,cn=o\3Dnetscaperoot,cn=mapping tree,cn=config"
>>> -r 11 -P LDAPS
>>
>> Hi Predrag,
>>
>> Close, but it's this:
>>
>> /usr/sbin/cleanallruv.pl -v -Z ldap1 -D "cn=directory manager" -w
>> ${DMPASS} -b "o=netscaperoot" -r 11 -P LDAPS
>>
>> Regards,
>> Mark
>
> Hi Mark,
>
> that also did not help.
What happened exactly? Was anything logged in the DS errors log?
/var/log/dirsrv/slapd-INSTANCE/errors
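
A minimal sketch for pulling the relevant lines out of that log. It assumes
the task messages contain the string "cleanruv" or "cleanallruv", as in the
cleanruv_task line quoted further down; the function name
show_cleanruv_lines is made up for illustration, and INSTANCE stays a
placeholder:

```shell
# Print only the CleanRUV / CleanAllRUV lines from an errors log.
# Assumption: the task messages contain "cleanruv" or "cleanallruv"
# (case-insensitive), like the cleanruv_task line quoted in this thread.
show_cleanruv_lines() {
    grep -i -E 'clean(all)?ruv' "$1"
}

# Example call (INSTANCE is a placeholder for the real instance name):
# show_cleanruv_lines /var/log/dirsrv/slapd-INSTANCE/errors
```

Running it on each server should show how far the task got on each one.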

I'm assuming you ran the script on ldap1. The task uses the replication
agreements to contact and clean the other replicas. It always cleans
itself last. So ldap1 is the last server to be cleaned. It looks like
ldap1 could not properly contact ldap2 and clean it, but it did clean
ldap3. So by default the task waits until ldap2 is cleaned before
cleaning itself.

Based on what you said below, we have two options:

[1] Identify and fix the replication issue that is preventing
cleanallruv from contacting/cleaning ldap2. I can elaborate on this if
you decide to go in this direction.

[2] Stop replicating o=netscaperoot. Disabling replication and
removing replication agreements is simple - it's just deleting a few
entries from cn=config.

# ldapdelete -D "cn=directory manager" -W -r
"cn=replica,cn=o\=netscaperoot,cn=mapping tree,cn=config"

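Before deleting, it may help to see what the recursive delete would take
with it. The sketch below only prints an ldapsearch command per server (it
contacts nothing). The host names are taken from this thread, the helper
name preview_cmd is made up, and the subtree search is an assumption about
how you would want to list the replica entry and the agreements under it:

```shell
# DN of the replica entry named in the ldapdelete command above.
REPLICA_DN='cn=replica,cn=o\=netscaperoot,cn=mapping tree,cn=config'

# Print (do not run) an ldapsearch command that lists the replica entry
# and every entry beneath it on the given server.
preview_cmd() {
    printf 'ldapsearch -x -LLL -o ldif-wrap=no -H ldaps://%s -D "cn=directory manager" -W -b "%s" -s sub "(objectClass=*)" dn\n' "$1" "$REPLICA_DN"
}

# Host names as used in this thread.
for host in ldap1.MyDomain.com ldap2.MyDomain.com ldap3.MyDomain.com; do
    preview_cmd "$host"
done
```

Run the printed command on each server, check that only the o=netscaperoot
replica and its agreements are listed, then run the ldapdelete there.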

Regards,
Mark
> Right now, when searching all replicas [using: -b "o=netscapeRoot"
> '(&(nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff)(objectclass=nstombstone))'
> as base and filter]
>
> I get (very confusing, inconsistent) results:
> ### ldap1 ###
> dn: cn=replica,cn=o\3Dnetscaperoot,cn=mapping tree,cn=config
> nsds50ruv: {replica 11 ldap://ldap1.MyDomain.com:0}
> ### ldap2 ###
> dn: cn=replica,cn=o\3Dnetscaperoot,cn=mapping tree,cn=config
> nsds50ruv: {replica 21 ldap://ldap2.MyDomain.com:0}
> nsds50ruv: {replica 11 ldap://ldap1.MyDomain.com:0}
> ### ldap3 ###
> dn: cn=replica,cn=o\3Dnetscaperoot,cn=mapping tree,cn=config
> nsds50ruv: {replica 21 ldap://ldap2.MyDomain.com:0}
> 4dda4a3a000000150000 4fd5f742000300150000
>
> I can afford to live without that replication (since we are not using
> admin server at all), so next question is HOW to permanently remove
> all agreements for NetscapeRoot from all servers involved?
>
> Thank you in advance for your time.
>
> With best regards.
> Predrag Zečević
>>
>>> ldap_initialize( ldaps://ldap1.MyDomain.com:636/??base )
>>> ldap_add: Operations error (1)
>>> additional info: Could not find replica from dn((null))
>>> Failed to add task entry "cn=cleanallruv_2017_6_2_14_31_58,
>>> cn=cleanallruv, cn=tasks, cn=config" error (1)
>>>
>>> LDAP1# /usr/sbin/cleanallruv.pl -v -Z ldap1 -D "cn=directory manager"
>>> -w ${DMPASS} -b
>>> "cn=2eLDAPmmr,cn=replica,cn=o\3Dnetscaperoot,cn=mapping
>>> tree,cn=config" -r 11 -P LDAPS
>>> ldap_initialize( ldaps://ldap1.MyDomain.com:636/??base )
>>> ldap_add: Operations error (1)
>>> additional info: Could not find replica from
>>> dn(cn=2eLDAPmmr,cn=replica,cn=o\3Dnetscaperoot,cn=mapping
>>> tree,cn=config)
>>> Failed to add task entry "cn=cleanallruv_2017_6_2_14_42_1,
>>> cn=cleanallruv, cn=tasks, cn=config" error (1)
>>>
>>> Replica DN specified as '"cn=replica,cn=o\3Dnetscaperoot,cn=mapping
>>> tree,cn=config"' also fails... Although THIS was returned as DN from
>>> ldapsearch command:
>>> $ ldapsearch -xLLLo ldif-wrap=no -H ldaps://ldap1.MyDomain.com -D
>>> 'cn=directory manager' -w ${DMPASS} -b o=netscapeRoot
>>> '(&(nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff)(objectclass=nstombstone))'
>>>
>>>
>>> What string do I have to specify as the replica DN?
>>>
>>> Thanks in advance.
>>>
>>> With best regards.
>>> Predrag Zečević
>>>
>>>> or read up on:
>>>>
>>>> http://www.port389.org/docs/389ds/howto/howto-cleanruv.html#cleanallruv
>>>>
>>>>
>>>> It also looks like your replicas are not initialized - so I would also
>>>> try that after cleaning out the old replica ids(ruvs).
>>>>> and replication is still broken... Similar situation is present on
>>>>> ldap2, with RUV 21 (if not worse):
>>>>>
>>>>> dn: cn=replica,cn=o\3Dnetscaperoot,cn=mapping tree,cn=config
>>>>> objectClass: nsDS5Replica
>>>>> objectClass: top
>>>>> nsDS5ReplicaRoot: o=netscaperoot
>>>>> nsDS5ReplicaType: 3
>>>>> nsDS5Flags: 1
>>>>> nsDS5ReplicaId: 21
>>>>> nsds5ReplicaPurgeDelay: 604800
>>>>> nsDS5ReplicaBindDN: cn=replication manager,cn=config
>>>>> nsDS5ReplicaReferral: ldap://ldap1.MyDomain.com:636/o%3dnetscaperoot
>>>>> cn: replica
>>>>> nsState:: FQAAAAAAAADeiyVZAAAAAAAAAAAAAAAAAQAAAAAAAAABAAAAAAAAAA==
>>>>> nsDS5ReplicaName: cb016902-1dd111b2-821cbcea-f7780000
>>>>> nsds50ruv: {replicageneration} 4dcb9f790000000b0000
>>>>> nsds50ruv: {replica 21 ldap://ldap2.MyDomain.com:0}
>>>>> 4dda4a3a000000150000 4fd5f742000300150000
>>>>> nsds5agmtmaxcsn:
>>>>> o=netscaperoot;2eLDAPmmr;ldap1.MyDomain.com;636;unavailable
>>>>> nsds5agmtmaxcsn:
>>>>> o=netscaperoot;2eLDAPror;ldap3.MyDomain.com;636;unavailable
>>>>> nsruvReplicaLastModified: {replica 21 ldap://ldap2.MyDomain.com:0}
>>>>> 00000000
>>>>> nsds5ReplicaChangeCount: 1
>>>>> nsds5replicareapactive: 0
>>>>>
>>>>>
>>>>> # What would be proper way to get out from this situation?
>>>>> # Do I have to execute CleanAllRUV task and start replication from
>>>>> scratch or there is better way?
>>>>>
>>>>> BTW, loglevel is set to 8192, so from ldap1 logs:
>>>>> $ sudo grep cleanruv_task: /var/log/dirsrv/slapd-ldap?/errors
>>>>> [31/May/2017:09:11:39 +0200] NSMMReplicationPlugin - cleanruv_task:
>>>>> cleaning rid (11)...
>>>>>
>>>>> we see that task is "started" and never finished
>>>>>
>>>>> Any advice or documentation (which is more up-2-date) than:
>>>>> *
>>>>> http://directory.fedoraproject.org/docs/389ds/howto/howto-cleanruv.html#cleanruv
>>>>>
>>>>>
>>>>> *
>>>>> https://access.redhat.com/documentation/en-us/red_hat_directory_server/9.0/html/administration_guide/managing_replication-solving_common_replication_conflicts
>>>>>
>>>>>
>>>>> *
>>>>> http://directory.fedoraproject.org/docs/389ds/FAQ/troubleshoot-cleanallruv.html
>>>>>
>>>>>
>>>>> (CleanRUV FAQ troubleshooting is missing at all)
>>>>>
>>>>> is welcome.
>>>>>
>>>>> With best regards.
>>>>> Predrag Zečević
>>>>
>>>
>>
>
_______________________________________________
389-users mailing list -- 389-users@lists.fedoraproject.org
To unsubscribe send an email to 389-users-leave@lists.fedoraproject.org
