Wednesday, July 13, 2022

[389-users] Re: Retro Changelog trimming not working



On 13-07-2022 16:31, Thierry Bordaz wrote:
*** EXTERNAL E-MAIL ***


On 7/13/22 3:18 PM, Kees Bakker wrote:
On 13-07-2022 13:39, Kees Bakker wrote:
On 13-07-2022 13:01, Kees Bakker wrote:
Hi,

[...]
In other words, with 1.4.3.28 I don't get to see the message with first_time and cur_time. I'm
quite puzzled how that can happen. The code is like this (stripped a bit):
    if (!ts.ts_s_trimming) {
        int must_trim = 0;
        /* See if we need to trim */
        /* Has enough time elapsed since our last check? */
        if (cur_time - ts.ts_s_last_trim >= (ts.ts_c_max_age)) {
            /* Is the first entry too old? */
            time_t first_time;
...
            slapi_log_err(SLAPI_LOG_PLUGIN, RETROCL_PLUGIN_NAME,
                          "cltrim: ldrc=%d, first_time=%ld, cur_time=%ld\n",
                          ldrc, first_time, cur_time);
            if (LDAP_SUCCESS == ldrc && first_time > (time_t)0L &&
                first_time + ts.ts_c_max_age < now_maxage)
            {
                must_trim = 1;
            }
        }
        if (must_trim) {
...
        } else {
            slapi_log_err(SLAPI_LOG_PLUGIN, RETROCL_PLUGIN_NAME,
                          "retrocl_housekeeping - changelog does not need to be trimmed\n");
        }
    }

Puzzled, because I don't understand why "cur_time - ts.ts_s_last_trim >= (ts.ts_c_max_age)"
is FALSE.

Unless, ...
cur_time is the relative time since start of the server (not sure if this is true,
but the code in eq_call_all_rel() seems to suggest it)
ts.ts_s_last_trim is 0 at startup

Shouldn't we compare "first_time" with the current (non-relative) time?

First an answer to Pierre's question
I had nsslapd-changelogmaxage set to 480d. (The reason for that is that I just want to trim
the small subset.)

I changed maxage back to the default 2d. Now I see the this message, quickly after restart.
Jul 13 14:33:13 iparep4.example.com ns-slapd[141407]: [13/Jul/2022:14:33:13.806790126 +0200] - DEBUG - DSRetroclPlugin - cltrim: ldrc=0, first_time=1615990532, cur_time=12006600
I'm now convinced that the logic is (still) flawed. There is still a mix of UTC time and relative time.
        int must_trim = 0;
        /* See if we need to trim */
        /* Has enough time elapsed since our last check? */
        if (cur_time - ts.ts_s_last_trim >= (ts.ts_c_max_age)) {      <<<< wrong condition
            /* Is the first entry too old? */
            time_t first_time;
            time_t now_maxage = slapi_current_utc_time(); /* real time for trimming candidates */


Hi Kees,

The mix of monotonic/real time was already an issue (https://github.com/389ds/389-ds-base/issues/4869) that was fixed in you version 1.4.3.28-6.

I think 'first_time' is coming from the first registered record in the retroCL.  I wonder if it could be because of an old retroCL DB containing an old value, not matching current monotonic time. Could you dump the first record of the retroCL (dbscan) ?

regards
thierry

Hi Thierry,

Here is an example of an old changelog entry.

[root@iparep4 ~]# ldapsearch -H ldaps://$HOSTNAME -D "cn=Directory Manager" -y p -LLL -o ldif-wrap=no -b 'changenumber=20000,cn=changelog'
dn: changenumber=20000,cn=changelog
objectClass: top
objectClass: changelogentry
objectClass: extensibleObject
targetuniqueid: 23f6980a-87f611eb-872490e8-ab7c8ee8
changeNumber: 20000
targetDn: idnsName=110,idnsname=30.16.172.in-addr.arpa.,cn=dns,dc=example,dc=com
changeTime: 20210318152832Z
changeType: delete

Yes, I know about issue 4869. That's why I am now trying 1.4.3.28

But I want you to take another look at this line
if (cur_time - ts.ts_s_last_trim >= (ts.ts_c_max_age)) {
This code has the effect of using "maxage"  to determine when to do the trim run!!!.
This is wrong.

And a few lines further
            if (LDAP_SUCCESS == ldrc && first_time > (time_t)0L &&
                first_time + ts.ts_c_max_age < now_maxage)
            {
                must_trim = 1;
            }
Here "maxage" is used the way it should be used, as a maximum age for the changelog
entries. This is good.

If I set maxage to 480 then I have to wait until 480 days after restarting the server before it finally
decides to look at first_time.

Maybe now is also a good time to explain why I set a much higher maxage value.
When trimming finally kicks in (since I now have maxage 2d) then entries are being deleted
at an extreemly slow pace, roughly one per second. But the worse part is, the server becomes
unresponsive after a few minutes. All connected LDAP clients hang, without getting a timeout.
And because plugin debug messages are enabled, I can see that it stopped deleting entries.
-- Kees

No comments:

Post a Comment