Can you turn on replication logging? I think it's level 8192 in the errorlog-level.
> On 4 Mar 2021, at 11:12, Colin Tulloch <Colin.Tulloch@entrust.com> wrote:
> Hello –
> We are seeing an issue where changes can be very slow to replicate to one of our consumers (up to 15m+). We have a large topology, but in this case the issue is isolated between 2 masters that replicate to 1 consumer.
> In one example entry addition I found, it appears that we see;
> - connection from master1 to consumer1, bunch of changes pushed
> - then master2 connects to consumer1
> - master1 stops doing changes but stays connected
> - master2 does literally 1 change (success, no issues), stays connected
> - 16 minutes goes by, no additional changes or replication EXT ops for that connection are done (master1->consumer1 EXT ops continue normally…)
> - after that long pause, master2 disconnects - B1 bad BER tag code
> - and then Master1 resumes making tons of changes
> Searches in this DB and others continue to take place, and changes in other DBs. So it wasn't as if the server was unresponsive/hung. It is almost as if that DB went "read-only" for a time – I'm unable to tell if something else besides replication was attempting but unable to make writes though.
> Anyone see something like this before? We see lots of B1 codes randomly, I've never understood what may cause that – the description of corruption/physical network problems does not make much sense. Maybe if our directories were replicating to eachother over the internet, or using Wifi….
> The time it takes for that connection to end in the B1 doesn't seem to line up with any dirsrv OR system/TCP timeouts either
> Nothing illuminating in the error logs really.
> Log snips of this happening;
> [03/Mar/2021:13:50:34.395939343 -0600] conn=270076 fd=280 slot=280 connection from master1 to consumer1
> [03/Mar/2021:13:50:34.608149165 -0600] conn=270076 op=13 MOD dn="cn=CRLblahblah,c=US"
> [03/Mar/2021:13:50:34.609565373 -0600] conn=270076 op=13 RESULT err=0 tag=103 nentries=0 etime=0.001451229 csn=603febdd00035aae0000
> [03/Mar/2021:13:50:34.818645762 -0600] conn=270076 op=14 EXT oid="2.16.840.1.1137126.96.36.199" name="replication-multimaster-extop"
> [03/Mar/2021:13:50:34.820544342 -0600] conn=270076 op=14 RESULT err=0 tag=120 nentries=0 etime=0.002004143
> [03/Mar/2021:13:50:34.460210520 -0600] conn=270077 fd=307 slot=307 connection from master2 to consumer1
> [03/Mar/2021:13:50:34.460676562 -0600] conn=270077 op=0 BIND dn="cn=replication manager,cn=config" method=128 version=3
> [03/Mar/2021:13:50:34.460992487 -0600] conn=270077 op=0 RESULT err=0 tag=97 nentries=0 etime=0.000376400 dn="cn=replication manager,cn=config"
> <snipped replication startup jargon>
> [03/Mar/2021:13:50:35.715331361 -0600] conn=270077 op=5 MOD dn="cn=CRLblahblah,c=US"
> [03/Mar/2021:13:50:35.717330540 -0600] conn=270077 op=5 RESULT err=0 tag=103 nentries=0 etime=0.002066162 csn=603febdd00045aae0000
> [03/Mar/2021:13:50:36.828647054 -0600] conn=270076 op=15 EXT oid="2.16.840.1.1137188.8.131.52" name="replication-multimaster-extop"
> [03/Mar/2021:13:50:36.828888493 -0600] conn=270076 op=15 RESULT err=0 tag=120 nentries=0 etime=0.000368293
> [03/Mar/2021:13:50:37.112334122 -0600] conn=270076 op=16 EXT oid="2.16.840.1.1137184.108.40.206" name="replication-multimaster-extop"
> [03/Mar/2021:13:50:37.112782309 -0600] conn=270076 op=16 RESULT err=0 tag=120 nentries=0 etime=0.000624719
> [03/Mar/2021:13:50:38.113792312 -0600] conn=270076 op=17 EXT oid="2.16.840.1.1137220.127.116.11" name="replication-multimaster-extop"
> [03/Mar/2021:13:50:38.114092751 -0600] conn=270076 op=17 RESULT err=0 tag=120 nentries=0 etime=0.000476209
> continued EXT ops on conn=270076, from master1
> [03/Mar/2021:14:06:07.872623403 -0600] conn=270077 op=-1 fd=307 closed - B1
> [03/Mar/2021:14:06:07.916640931 -0600] conn=270076 op=1106 EXT oid="2.16.840.1.113718.104.22.168" name="replication-multimaster-extop"
> [03/Mar/2021:14:06:07.917191372 -0600] conn=270076 op=1106 RESULT err=0 tag=120 nentries=0 etime=0.000662946
> [03/Mar/2021:14:06:07.921837193 -0600] conn=270076 op=1107 MOD dn="cn=CRLblahblah,c=US"
> [03/Mar/2021:14:06:07.923754219 -0600] conn=270076 op=1107 RESULT err=0 tag=103 nentries=0 etime=0.002036469 csn=603febdd00065aae0000
> and a flood of changes now
> Colin Tulloch
> Architect, USmPKI
> 389-users mailing list -- firstname.lastname@example.org
> To unsubscribe send an email to email@example.com
> Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
> List Archives: https://firstname.lastname@example.org
> Do not reply to spam on the list, report it: https://pagure.io/fedora-infrastructure
Senior Software Engineer, 389 Directory Server
SUSE Labs, Australia
389-users mailing list -- email@example.com
To unsubscribe send an email to firstname.lastname@example.org
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://email@example.com
Do not reply to spam on the list, report it: https://pagure.io/fedora-infrastructure