Wednesday, July 13, 2022

[389-users] Re: Retro Changelog trimming not working


On 7/13/22 5:08 PM, Kees Bakker wrote:


On 13-07-2022 16:31, Thierry Bordaz wrote:
*** EXTERNAL E-MAIL ***


On 7/13/22 3:18 PM, Kees Bakker wrote:
On 13-07-2022 13:39, Kees Bakker wrote:
On 13-07-2022 13:01, Kees Bakker wrote:
Hi,

[...]
In other words, with 1.4.3.28 I don't get to see the message with first_time and cur_time. I'm
quite puzzled how that can happen. The code is like this (stripped a bit):
    if (!ts.ts_s_trimming) {
        int must_trim = 0;
        /* See if we need to trim */
        /* Has enough time elapsed since our last check? */
        if (cur_time - ts.ts_s_last_trim >= (ts.ts_c_max_age)) {


I agree with you, the use of ts_c_max_age is wrong. The trimming mechanism should run periodically, whatever the maxage of the records. Here we should check an interval of time, (each hour or 10 min...). Would you please open a ticket for that ?

thanks
thierry

            /* Is the first entry too old? */
            time_t first_time;
...
            slapi_log_err(SLAPI_LOG_PLUGIN, RETROCL_PLUGIN_NAME,
                          "cltrim: ldrc=%d, first_time=%ld, cur_time=%ld\n",
                          ldrc, first_time, cur_time);
            if (LDAP_SUCCESS == ldrc && first_time > (time_t)0L &&
                first_time + ts.ts_c_max_age < now_maxage)
            {
                must_trim = 1;
            }
        }
        if (must_trim) {
...
        } else {
            slapi_log_err(SLAPI_LOG_PLUGIN, RETROCL_PLUGIN_NAME,
                          "retrocl_housekeeping - changelog does not need to be trimmed\n");
        }
    }

Puzzled, because I don't understand why "cur_time - ts.ts_s_last_trim >= (ts.ts_c_max_age)"
is FALSE.

Unless, ...
cur_time is the relative time since start of the server (not sure if this is true,
but the code in eq_call_all_rel() seems to suggest it)
ts.ts_s_last_trim is 0 at startup

Shouldn't we compare "first_time" with the current (non-relative) time?

First an answer to Pierre's question
I had nsslapd-changelogmaxage set to 480d. (The reason for that is that I just want to trim
the small subset.)

I changed maxage back to the default 2d. Now I see the this message, quickly after restart.
Jul 13 14:33:13 iparep4.example.com ns-slapd[141407]: [13/Jul/2022:14:33:13.806790126 +0200] - DEBUG - DSRetroclPlugin - cltrim: ldrc=0, first_time=1615990532, cur_time=12006600
I'm now convinced that the logic is (still) flawed. There is still a mix of UTC time and relative time.
        int must_trim = 0;
        /* See if we need to trim */
        /* Has enough time elapsed since our last check? */
        if (cur_time - ts.ts_s_last_trim >= (ts.ts_c_max_age)) {      <<<< wrong condition
            /* Is the first entry too old? */
            time_t first_time;
            time_t now_maxage = slapi_current_utc_time(); /* real time for trimming candidates */


Hi Kees,

The mix of monotonic/real time was already an issue (https://github.com/389ds/389-ds-base/issues/4869) that was fixed in you version 1.4.3.28-6.

I think 'first_time' is coming from the first registered record in the retroCL.  I wonder if it could be because of an old retroCL DB containing an old value, not matching current monotonic time. Could you dump the first record of the retroCL (dbscan) ?

regards
thierry

Hi Thierry,

Here is an example of an old changelog entry.

[root@iparep4 ~]# ldapsearch -H ldaps://$HOSTNAME -D "cn=Directory Manager" -y p -LLL -o ldif-wrap=no -b 'changenumber=20000,cn=changelog'
dn: changenumber=20000,cn=changelog
objectClass: top
objectClass: changelogentry
objectClass: extensibleObject
targetuniqueid: 23f6980a-87f611eb-872490e8-ab7c8ee8
changeNumber: 20000
targetDn: idnsName=110,idnsname=30.16.172.in-addr.arpa.,cn=dns,dc=example,dc=com
changeTime: 20210318152832Z
changeType: delete

Yes, I know about issue 4869. That's why I am now trying 1.4.3.28

But I want you to take another look at this line
if (cur_time - ts.ts_s_last_trim >= (ts.ts_c_max_age)) {
This code has the effect of using "maxage"  to determine when to do the trim run!!!.
This is wrong.

And a few lines further
            if (LDAP_SUCCESS == ldrc && first_time > (time_t)0L &&
                first_time + ts.ts_c_max_age < now_maxage)
            {
                must_trim = 1;
            }
Here "maxage" is used the way it should be used, as a maximum age for the changelog
entries. This is good.

If I set maxage to 480 then I have to wait until 480 days after restarting the server before it finally
decides to look at first_time.

Maybe now is also a good time to explain why I set a much higher maxage value.
When trimming finally kicks in (since I now have maxage 2d) then entries are being deleted
at an extreemly slow pace, roughly one per second. But the worse part is, the server becomes
unresponsive after a few minutes. All connected LDAP clients hang, without getting a timeout.
And because plugin debug messages are enabled, I can see that it stopped deleting entries.
-- Kees

No comments:

Post a Comment