On 13-07-2022 16:31, Thierry Bordaz wrote:
Hi Thierry,*** EXTERNAL E-MAIL ***
On 7/13/22 3:18 PM, Kees Bakker wrote:
On 13-07-2022 13:39, Kees Bakker wrote:
On 13-07-2022 13:01, Kees Bakker wrote:
Hi,
[...]
In other words, with 1.4.3.28 I don't get to see the message with first_time and cur_time. I'm
quite puzzled how that can happen. The code is like this (stripped a bit):
if (!ts.ts_s_trimming) {Puzzled, because I don't understand why "cur_time - ts.ts_s_last_trim >= (ts.ts_c_max_age)"
int must_trim = 0;
/* See if we need to trim */
/* Has enough time elapsed since our last check? */
if (cur_time - ts.ts_s_last_trim >= (ts.ts_c_max_age)) {
/* Is the first entry too old? */
time_t first_time;
...
slapi_log_err(SLAPI_LOG_PLUGIN, RETROCL_PLUGIN_NAME,
"cltrim: ldrc=%d, first_time=%ld, cur_time=%ld\n",
ldrc, first_time, cur_time);
if (LDAP_SUCCESS == ldrc && first_time > (time_t)0L &&
first_time + ts.ts_c_max_age < now_maxage)
{
must_trim = 1;
}
}
if (must_trim) {
...
} else {
slapi_log_err(SLAPI_LOG_PLUGIN, RETROCL_PLUGIN_NAME,
"retrocl_housekeeping - changelog does not need to be trimmed\n");
}
}
is FALSE.
Unless, ...
cur_time is the relative time since start of the server (not sure if this is true,
but the code in eq_call_all_rel() seems to suggest it)
ts.ts_s_last_trim is 0 at startup
Shouldn't we compare "first_time" with the current (non-relative) time?
First an answer to Pierre's question
I had nsslapd-changelogmaxage set to 480d. (The reason for that is that I just want to trim
the small subset.)
I changed maxage back to the default 2d. Now I see the this message, quickly after restart.
Jul 13 14:33:13 iparep4.example.com ns-slapd[141407]: [13/Jul/2022:14:33:13.806790126 +0200] - DEBUG - DSRetroclPlugin - cltrim: ldrc=0, first_time=1615990532, cur_time=12006600I'm now convinced that the logic is (still) flawed. There is still a mix of UTC time and relative time.
int must_trim = 0;
/* See if we need to trim */
/* Has enough time elapsed since our last check? */
if (cur_time - ts.ts_s_last_trim >= (ts.ts_c_max_age)) { <<<< wrong condition
/* Is the first entry too old? */
time_t first_time;
time_t now_maxage = slapi_current_utc_time(); /* real time for trimming candidates */
Hi Kees,
The mix of monotonic/real time was already an issue (https://github.com/389ds/389-ds-base/issues/4869) that was fixed in you version 1.4.3.28-6.
I think 'first_time' is coming from the first registered record in the retroCL. I wonder if it could be because of an old retroCL DB containing an old value, not matching current monotonic time. Could you dump the first record of the retroCL (dbscan) ?
regards
thierry
Here is an example of an old changelog entry.
[root@iparep4 ~]# ldapsearch -H ldaps://$HOSTNAME -D "cn=Directory Manager" -y p -LLL -o ldif-wrap=no -b 'changenumber=20000,cn=changelog'
dn: changenumber=20000,cn=changelog
objectClass: top
objectClass: changelogentry
objectClass: extensibleObject
targetuniqueid: 23f6980a-87f611eb-872490e8-ab7c8ee8
changeNumber: 20000
targetDn: idnsName=110,idnsname=30.16.172.in-addr.arpa.,cn=dns,dc=example,dc=com
changeTime: 20210318152832Z
changeType: delete
Yes, I know about issue 4869. That's why I am now trying 1.4.3.28
But I want you to take another look at this line
if (cur_time - ts.ts_s_last_trim >= (ts.ts_c_max_age)) {This code has the effect of using "maxage" to determine when to do the trim run!!!.
This is wrong.
And a few lines further
if (LDAP_SUCCESS == ldrc && first_time > (time_t)0L &&Here "maxage" is used the way it should be used, as a maximum age for the changelog
first_time + ts.ts_c_max_age < now_maxage)
{
must_trim = 1;
}
entries. This is good.
If I set maxage to 480 then I have to wait until 480 days after restarting the server before it finally
decides to look at first_time.
Maybe now is also a good time to explain why I set a much higher maxage value.
When trimming finally kicks in (since I now have maxage 2d) then entries are being deleted
at an extreemly slow pace, roughly one per second. But the worse part is, the server becomes
unresponsive after a few minutes. All connected LDAP clients hang, without getting a timeout.
And because plugin debug messages are enabled, I can see that it stopped deleting entries.
-- Kees
No comments:
Post a Comment