Friday, May 18, 2018

[389-users] Re: Replication Delay

On Fri, 2018-05-18 at 18:39 +0000, Fong, Trevor wrote:
> Hi Everyone,
>
> Huzzah! I've finally licked the slow (and erratic) replication
> between our 1.2 -> 1.3 clusters!
> The problem was that when I was setting up the 1.3 cluster, I'd done
> it with a view to replacing the 1.2 cluster.
> On that assumption, I'd set the cluster up in isolation. Everything
> worked as it was supposed to, but it didn't occur to me to set the
> masters up with replica IDs different from those in the 1.2 cluster.
> When I hooked the 1.3 cluster up to the 1.2 cluster, replication
> into the 1.3 was slow and sometimes it would just break.
>
> Rebuilding the 1.3 cluster with unique replica IDs for all master
> nodes across both clusters resolved the problem.

Great work finding this. I think the docs do say you need unique rids
on all masters, but we don't enforce it at the code level.
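Until the tools allocate rids for you, the uniqueness check is easy to
script yourself. A minimal sketch - the host names are made up, and it
assumes you've already collected each master's nsDS5ReplicaId (e.g.
with ldapsearch against the cn=replica entries) into a dict:

```python
def find_duplicate_rids(rids):
    """Given a mapping of master hostname -> nsDS5ReplicaId, return
    the set of replica IDs used by more than one master."""
    seen = {}
    for host, rid in rids.items():
        seen.setdefault(rid, []).append(host)
    return {rid for rid, hosts in seen.items() if len(hosts) > 1}

# Hypothetical rids gathered from both the 1.2 and 1.3 clusters:
rids = {
    "old-master-1": 1, "old-master-2": 2,
    "new-master-1": 1, "new-master-2": 2,  # clash with the old cluster!
}
print(find_duplicate_rids(rids))  # -> {1, 2}
```

Any non-empty result means two masters will fight over the same rid,
which is exactly the slow/broken replication Trevor saw.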

TBH I really want rids to be allocated by the server tools - there are
some things in the works for this, but they are not yet ready.

A better idea would be for rids to just be GUIDs, but I don't think
Ludwig or I want to rewrite all of replication for that :)

>
> Thanks, everyone, for your helpful comments.
> Trev
>
>
> On 2018-02-20, 4:13 PM, "Mark Reynolds" <mreynolds@redhat.com>
> wrote:
>
>
>
> On 02/20/2018 06:53 PM, William Brown wrote:
> > On Tue, 2018-02-20 at 23:36 +0000, Fong, Trevor wrote:
> >> Hi William,
> >>
> >> Thanks a lot for your reply.
> >>
> >> That's correct - replication schedule is not enabled.
> >> No - there are definitely changes to replicate - I know, I made
> >> the change myself (I changed the "description" attribute on an
> >> account), but it takes up to 15 mins for the change to appear in
> >> the 1.3 master.
> >> That master replicates to another master and a bunch of other
> >> hubs. Those hubs replicate amongst themselves and a bunch of
> >> consumers.
> > So, to confirm my understanding:
> >
> > 1.2 <-> 1.3 --> [ group of hubs/consumers ]
> >
> > Yes?
> >
> >> The update can take up to 15 mins to make it from the 1.2 master
> >> into the 1.3 master; but once it hits the 1.3 master, it is
> >> replicated around the 1.3 cluster within 1 sec.
> >>
> >> Only memberOf is disallowed for fractional replication.
> >>
> >> Can anyone give me any guidance as to the settings of the
> >> "backoff" and other parameters? Any doc links that may be useful?
> > Mark? You wrote this; I can't remember what it's called ....
> Before we adjust the backoff min and max values, we need to
> determine why 1.2.11 is having a hard time updating 1.3.6. 1.3.6 is
> just receiving updates, so it's 1.2.11 that "seems" to be
> misbehaving. So... is there anything in the errors log on 1.2.11?
> It wouldn't hurt to check 1.3.6, but I think 1.2.11 is where we will
> find our answer.
>
> If there is nothing in the log, then turn on replication logging and
> do your test update. Once the update hits 1.3.6, turn replication
> logging off. Then we can look at the logs and see what happens with
> your test update.
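For reference, replication logging is controlled via the error log
level on cn=config - if I remember right, 8192 is the replication
debug level (check the logging docs for your version). Fed to
ldapmodify, something like this should toggle it on:

```ldif
dn: cn=config
changetype: modify
replace: nsslapd-errorlog-level
nsslapd-errorlog-level: 8192
```

Remember to set it back to 0 afterwards, as it is very noisy.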
>
> But as requested, here is the backoff min & max info:
>
> http://www.port389.org/docs/389ds/design/replication-retry-settings.html
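From memory, those retry settings are global attributes on cn=config
(nsds5ReplicaBackoffMin defaulting to 3 seconds, nsds5ReplicaBackoffMax
to 300) - double-check against the design doc above, but tightening
them would look something like this LDIF for ldapmodify:

```ldif
dn: cn=config
changetype: modify
replace: nsds5ReplicaBackoffMin
nsds5ReplicaBackoffMin: 1
-
replace: nsds5ReplicaBackoffMax
nsds5ReplicaBackoffMax: 60
```

That shortens how long a supplier waits before retrying a busy or
unreachable consumer; it won't fix an underlying fault, which is why
the log check above comes first.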
>
> >
> >> Thanks a lot,
> >> Trev
> >>
> >>
> >> On 2018-02-18, 3:32 PM, "William Brown" <william@blackhats.net.au>
> >> wrote:
> >>
> >> On Sat, 2018-02-17 at 01:49 +0000, Fong, Trevor wrote:
> >> > Hi Everyone,
> >> >
> >> > I've set up a new 389 DS cluster (389-Directory/1.3.6.1
> >> > B2018.016.1710) and have set up a replication agreement from
> >> > our old cluster (389-Directory/1.2.11.15 B2014.300.2010) to a
> >> > master node in the new cluster. Problem is that updates in the
> >> > old cluster take up to 15 mins to make it into the new cluster.
> >> > We need it to be near instantaneous, like it normally is. Any
> >> > ideas what I can check?
> >>
> >> I am assuming you don't have a replication schedule enabled?
> >>
> >> In LDAP, replication is always "eventual", so a delay isn't
> >> harmful.
> >>
> >> But there are many things that can influence this. Ludwig is the
> >> expert, and I expect he'll comment here.
> >>
> >> Only one master may be "replicating" to a server at a time. So
> >> if your 1.3 server is replicating with other servers, then your
> >> 1.2 server may have to "wait its turn".
> >>
> >> There is a replication "backoff" timer that sets how long it
> >> waits between retries and how those attempts scale. I'm not sure
> >> whether 1.2 has this, though.
> >>
> >> Another reason could be that there are no changes to be
> >> replicated; replication only runs when there is something to do.
> >> So your 1.2 server may have no changes, or it could be filtering
> >> the changes out with fractional replication.
> >>
> >> Finally, it's very noisy, but you could consider enabling
> >> replication logging to check what's happening.
> >>
> >> I hope that helps,
> >>
> >>
> >>
> >> >
> >> > Thanks a lot,
> >> > Trev
> >> >
> >> > _________________________________________________
> >> > Trevor Fong
> >> > Senior Programmer Analyst
> >> > Information Technology | Engage. Envision. Enable.
> >> > The University of British Columbia
> >> > trevor.fong@ubc.ca | 1-604-827-5247 | it.ubc.ca
> >> >
> >> --
> >> Thanks,
> >>
> >> William Brown
> >>
> >>
>
>
_______________________________________________
389-users mailing list -- 389-users@lists.fedoraproject.org
To unsubscribe send an email to 389-users-leave@lists.fedoraproject.org
Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org/message/L5F3BRALCWW34YFHPEMST2EGSAHZMBVD/
