Thursday, December 5, 2024

[389-users] Re: high 'wtime' with 2.4.5

Well, this is frustrating.

Our trouble-ticket at Red Hat was updated late in October saying we could expect to see the correction in the "next update". RHDS 12.5 shipped in late November (yay!) and I've been reviewing it. As far as I can tell, the fix for 389-ds-base Issue 6284 was incorporated into code in early August (in commit 15a0b5b9e0c90fc5824752b8d58526ec8e13c256). 

But the RHDS 12.5 release notes say nothing about updates to the connection-handling. And it looks like 12.5 uses 389-ds-base 2.5.2-2, which appears to date from July 2024. So my read is that RHDS 12.5 doesn't contain the fix for this problem, and we're going to have to wait for 12.6 (Q2 2025?) to see it . . . or maybe 12.7, since we need to see a new release of 389-ds-base before it makes its way into RHDS, so maybe Q4 2025.

Can anyone tell me if I'm interpreting the situation and timelines correctly?

Oh . . . and our problematic query still behaves terribly when run against a RHDS 12.5 server :(

--
Do things because you should, not just because you can.

John Thurston    907-465-8591
John.Thurston@alaska.gov
Department of Administration
State of Alaska
On 9/19/2024 12:25 AM, Thierry Bordaz wrote:

Hi John,

This is excellent news that you created a CU case. We will investigate this regression, hopefully soon.
Thanks for the well-described test case; it accelerates resolution. If you have any other details regarding the test case, please update the case with them.

Would you mind giving us the case number, or making sure that support gets in touch with us?

Best regards
thierry

On 9/18/24 21:37, John Thurston wrote:

Thank you for the pointer to the defect, Thierry. I appreciate the very quick and informative response. It certainly smells like this is what is affecting us.

Our case is a single connection, through which ~32,000 sequential queries are passed. To work around this, we have re-created a DS11 replica and re-directed this job to it. On DS12, the job requires ~30 minutes; on DS11, it completes in ~2 minutes.

(Our DS12 instance is actually running RHDS, so we have opened a Red Hat support case with the details.)

--
Do things because you should, not just because you can.

John Thurston    907-465-8591
John.Thurston@alaska.gov
Department of Administration
State of Alaska
On 9/12/2024 3:33 AM, Thierry Bordaz wrote:

Hi John,

Yes, the description is "mostly" correct. We recently found a corner case [1] where large requests (requiring several poll/read cycles) can show a high wtime even though there was no worker starvation.

Would you provide a sample of the access log showing this issue?

[1] https://github.com/389ds/389-ds-base/issues/6284

regards
thierry

On 9/12/24 01:29, John Thurston wrote:

I have a new instance of 2.4.5, on which I'm seeing a very high* 'wtime' in the access log.

From https://www.port389.org/docs/389ds/design/access-log-new-time-stats-design.html I read

  • wtime - This is the amount of time the operation was waiting in the work queue before being picked up by a worker thread.

Is this still an accurate description of 'wtime'?

If so, I suspect the high values I'm seeing have nothing to do with the version of the software I'm running, and everything to do with the system on which the software is running. Work has arrived and been queued, but there aren't enough worker threads to keep the queue serviced in a timely manner.

* 'high' as in 3,000% longer than what I see on a totally different system running 1.4.4
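To put numbers on a comparison like that, one could pull the wtime values out of the RESULT lines and summarize them. A minimal sketch follows; the log lines embedded here are fabricated samples in the general shape described by the access-log time-stats design doc linked above, not real server output, so the regex may need adjusting against an actual log:

```python
import re
import statistics

# Fabricated sample RESULT lines (illustrative only, not real 389-ds output).
sample_log = """\
[05/Dec/2024:10:00:01.123 -0900] conn=42 op=1 RESULT err=0 tag=101 nentries=1 wtime=0.000084321 optime=0.000512345 etime=0.000601234
[05/Dec/2024:10:00:01.456 -0900] conn=42 op=2 RESULT err=0 tag=101 nentries=1 wtime=0.212345678 optime=0.000498765 etime=0.212844443
[05/Dec/2024:10:00:01.789 -0900] conn=42 op=3 RESULT err=0 tag=101 nentries=1 wtime=0.198765432 optime=0.000501234 etime=0.199266666
"""

# Capture the wtime= field from each RESULT line.
wtime_re = re.compile(r"wtime=(\d+\.\d+)")
wtimes = [float(m.group(1)) for m in wtime_re.finditer(sample_log)]

print(f"ops={len(wtimes)}  "
      f"max wtime={max(wtimes):.6f}s  "
      f"mean wtime={statistics.mean(wtimes):.6f}s")
```

Run against the logs from both servers, a summary like this makes the "3,000% longer" comparison concrete, and comparing wtime against optime in the same lines helps distinguish queue-wait from actual operation time.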

--
Do things because you should, not just because you can.

John Thurston    907-465-8591
John.Thurston@alaska.gov
Department of Administration
State of Alaska
