Thursday, March 6, 2025

[389-users] Re: New error messages with EL9

Ok I know what is going now...

We get this error when we try to end a replication session while the start of a replication session request is still in progress.

Replica access log (conn=7 is the supplier):

[05/Mar/2025:21:26:45.651516142 -0500] conn=7 op=19 EXT oid="2.16.840.1.113730.3.5.12" name="replication-multisupplier-extop" --> START SESSION
[05/Mar/2025:21:26:45.666796859 -0500] conn=7 op=20 EXT oid="2.16.840.1.113730.3.5.5" name="replication-multisupplier-extop" --> END SESSION
[05/Mar/2025:21:26:45.694423788 -0500] conn=7 op=20 RESULT err=2 tag=120 nentries=0 wtime=0.000123842 optime=0.027627701 etime=0.027750453 --> END SESSION RESULT

    ---> This "err=2" is what triggers this decoding error on the supplier:  [05/Mar/2025:21:26:45.706707732 -0500] - ERR - decode_repl_ext_response - decoding failed: response_code (Ok) ruv_bervals (Ok) data_guid (Ok) data (Ok) bvdata (No data)

[05/Mar/2025:21:26:45.664625578 -0500] conn=7 op=19 RESULT err=0 tag=120 nentries=0 wtime=0.000175835 optime=0.013111958 etime=0.013287409 --> START SESSION RESULT


So supplier 1 tries to start a replication session, but before the replica can send the response to that "start" request, the same supplier tries to "end" it.  This is what triggers that decoding error.  It still appears to be harmless, but there is some race condition going on here.  I'm going to open a new ticket to investigate it more thoroughly.

Mark


On 3/5/25 4:45 PM, Orion Poplawski wrote:
Mark,    I don't see that string in any of the /var/log/dirsrv/slapd-*/errors* files on  our IPA servers.    Thanks,    Orion    On 3/5/25 14:38, Mark Reynolds wrote:  
Orion,    Can you please check all your replicas to if see any error messages contain  "encoding failed" ?    That is the only case I can find where the bvdata is NULL during a replica  release extended operation.    Thanks,    Mark    On 2/25/25 12:38 PM, Orion Poplawski wrote:  
No sign of replication being broken, but we do get this reasonably regularly.  Would it be worth reporting anywhere or any other info the might be useful?    Thanks for the resonse,    Orion    On 2/3/25 13:27, Mark Reynolds wrote:  
Hi Orion,    These two messages just mean that the replication session did not close/end  correctly.  It is harmless on its own - unless replication is broken.  We have  been seeing these errors recently when a replica is being released, but no way  to reproduce it yet.    Not sure about the content-sync-plugin warning as I'm not familiar with that  plugin.    Mark      On 2/3/25 10:44 AM, Orion Poplawski via 389-users wrote:  
[02/Feb/2025:20:57:56.553507369 -0700] - ERR - decode_repl_ext_response -  decoding failed: response_code (Ok) ruv_bervals (Ok) data_guid (Ok) data  (Ok) bvdata (No data)  [02/Feb/2025:20:57:56.555181048 -0700] - ERR - NSMMReplicationPlugin -  release_replica - agmt="cn=IPA2-IPA1" (IPA1:389): Unable to parse the  response  to the endReplication extended operation.  
  
    
--   Identity Management Development Team

No comments:

Post a Comment