We have a fix that addresses the incorrect positioning in the changelog (using the CSN of a consumer which is ahead for the given replica ID), and so it would also prevent these messages.
It still has to be tested, but I am wondering if you want to test it as well.
Regards,
Ludwig
On 09/07/2016 03:33 PM, Ivanov Andrey (M.) wrote:
From: "Ludwig Krispenz" <lkrispen@redhat.com>
To: 389-users@lists.fedoraproject.org
Sent: Wednesday, 7 September 2016 12:48:38
Subject: [389-users] Re: 389DS v1.3.4.x after fixes for tickets 48766 and 48954
No more need for this: I found the messages in a deployment where replication logging was enabled. I think it happens when the smallest consumer maxCSN is ahead of the local maxCSN for this replica ID. So far I have not seen any replication problems related to these messages; all generated CSNs seem to be replicated. What makes it a bit more difficult is that most of the updates are updates of lastlogintime and the original MOD is not logged. I still do not understand why we have these messages so frequently; I will try to reproduce.
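To make the condition above concrete, here is a minimal sketch of the comparison, not the server's actual code. It assumes the usual 20-hex-digit CSN string layout (8 digits timestamp, 4 sequence number, 4 replica ID, 4 sub-sequence number); the helper names and the sample CSN values are hypothetical.

```python
from typing import NamedTuple


class CSN(NamedTuple):
    timestamp: int
    seqnum: int
    replica_id: int
    subseqnum: int


def parse_csn(text: str) -> CSN:
    """Split a 20-hex-digit CSN string into its four fields (assumed layout)."""
    if len(text) != 20:
        raise ValueError(f"expected 20 hex digits, got {len(text)}: {text!r}")
    return CSN(
        timestamp=int(text[0:8], 16),
        seqnum=int(text[8:12], 16),
        replica_id=int(text[12:16], 16),
        subseqnum=int(text[16:20], 16),
    )


def consumer_is_ahead(consumer_max_csn: str, local_max_csn: str) -> bool:
    """True if the consumer's maxCSN is newer than the local maxCSN
    for the same replica ID (the situation described above)."""
    consumer = parse_csn(consumer_max_csn)
    local = parse_csn(local_max_csn)
    if consumer.replica_id != local.replica_id:
        raise ValueError("maxCSNs belong to different replica IDs")
    # Same replica ID, so order by timestamp, then seqnum, then subseqnum.
    return (consumer.timestamp, consumer.seqnum, consumer.subseqnum) > (
        local.timestamp, local.seqnum, local.subseqnum)


# Hypothetical values for replica ID 1: the consumer is 13 seconds ahead.
local_max = "57cff1a0000000010000"
consumer_max = "57cff1ad000000010000"
print(consumer_is_ahead(consumer_max, local_max))
```

With these sample values the consumer's maxCSN has the later timestamp, so a supplier that starts positioning in the changelog from it would be looking for a CSN it has not recorded locally as the maximum for that replica ID.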
The fixes for the tickets you mention did change the iteration through the changelog and how it handles situations when the start CSN is not found in the changelog. They also changed the logging, so you might now see messages which were not there or were hidden before.
That was my understanding too.
Or, if it is possible, could you run the servers for just an hour with replication logging enabled?
It should do no harm, but in some scenarios could slow down replication a bit.
I will continue to investigate and work on a fix.
Ok, thank you. And yes, as you say, apparently it does no harm: I check the consistency of the three replicated servers from time to time and there is no data discrepancy between them.
Anyway, enabling replication logging on production servers is not something easily done, mainly for performance reasons. And I was not able to reproduce the problem in our test environment with 2 replicated servers; maybe the load or the frequency of connections updating the lastlogintime attribute was not high enough there. Or the fully replicated three-server topology makes things a bit different too, with one or two additional hops for the same MOD arriving at the consumer by two different paths.
When looking into the provided data set, I did notice three replicated ops with err=50, insufficient access. This should not happen and requires a separate investigation.
Yes, I see the three modifications you are talking about. They are present on only one server of the three. Strange indeed. No more err=50 in replicated ops today on any of the servers, I've just checked.
-- 389-users mailing list 389-users@lists.fedoraproject.org https://lists.fedoraproject.org/admin/lists/389-users@lists.fedoraproject.org
-- Red Hat GmbH, http://www.de.redhat.com/, Registered seat: Grasbrunn, Commercial register: Amtsgericht Muenchen, HRB 153243, Managing Directors: Charles Cachera, Michael Cunningham, Michael O'Neill, Eric Shander