Monday, October 7, 2024

[389-users] Re: Inconsistent Ldap connection issues

Hi Thierry,

OK, I'll decrease the timeout to 15 seconds then. I apologize if you see this email twice.

Reducing the size of the logs will help.
Which log, and how do I do this?

Thanks, Marc and Thierry!

-Gary

On 10/7/24 00:26, Thierry Bordaz wrote:

Hi,

Those slapd_poll errors mean that the server was unable to send a PDU back to the client. This can occur if the client sends a request and does not read the results fast enough. The timeout is high at 30s (30000); could the problem be on the client side (the app)?

I suggest that you focus on the timestamp when the application reports a failure, then look in the access/error logs from 1-3 min before and after that time. Running logconv on that limited scope will be more helpful than a global run.
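To illustrate the narrowing-down step, here is a minimal Python sketch that keeps only the log lines within a few minutes of a reported failure. The function names (`parse_ts`, `lines_near`) are made up for this example and are not part of 389-ds or logconv. It assumes the timestamp format seen in the log excerpts in this thread and an English locale for the month abbreviation.

```python
import re
from datetime import datetime, timedelta

# Leading timestamp of a log line, e.g.
# [30/Sep/2024:16:26:55.987681019 -0700]
TS_RE = re.compile(
    r"^\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2})(?:\.(\d+))? ([+-]\d{4})\]"
)


def parse_ts(line):
    """Return a timezone-aware datetime for the line, or None if it has no timestamp."""
    m = TS_RE.match(line)
    if m is None:
        return None
    base, frac, tz = m.groups()
    # strptime's %f accepts at most 6 digits, so nanoseconds are
    # truncated to microseconds here.
    micros = (frac or "0")[:6].ljust(6, "0")
    return datetime.strptime(f"{base}.{micros} {tz}", "%d/%b/%Y:%H:%M:%S.%f %z")


def lines_near(lines, failure_time, minutes=3):
    """Yield the lines whose timestamp is within +/- `minutes` of failure_time."""
    window = timedelta(minutes=minutes)
    for line in lines:
        ts = parse_ts(line)
        if ts is not None and abs(ts - failure_time) <= window:
            yield line
```

The filtered lines could then be written to a temporary file and given to logconv, so the report covers only the minutes around the failure rather than the whole test run.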

The pattern looks like this: an app opens a connection, switches to a secure connection (StartTLS), issues 6-8 SRCH operations, then closes. etime/wtime/optime look fine, but as they are averages (over 1M ops) they are not helpful. Reducing the size of the logs will help.
I found the abandoned op interesting, as it is possibly related to a performance issue.

best regards
thierry


On 10/4/24 11:54 PM, Gary Waters via 389-users wrote:

Hi Marc,

I have set nsslapd-listen-backlog-size to 512.

For the ioblocktimeout, I increased it because of an error I was seeing:

[30/Sep/2024:16:26:55.987681019 -0700] - ERR - slapd_poll - (743) - Timed out
[30/Sep/2024:16:34:49.646922635 -0700] - ERR - slapd_poll - (568) - Timed out

Googling suggested that I should increase the ioblocktimeout, so I bumped it up from 20000 to 30000.

Since then, those slapd_poll timed out errors have not occurred. Should I have changed something else?
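One way to confirm that the timeouts really stopped after the change is to count them per hour in the errors log. A minimal Python sketch follows; the function name `poll_timeouts_per_hour` is hypothetical, and the pattern assumes the exact line format shown in the two error lines above.

```python
import re
from collections import Counter

# Matches lines such as:
# [30/Sep/2024:16:26:55.987681019 -0700] - ERR - slapd_poll - (743) - Timed out
POLL_RE = re.compile(
    r"^\[(\d{2}/\w{3}/\d{4}):(\d{2}):\d{2}:\d{2}\.\d+ [+-]\d{4}\]"
    r" - ERR - slapd_poll - \(\d+\) - Timed out"
)


def poll_timeouts_per_hour(lines):
    """Count slapd_poll 'Timed out' errors, keyed by 'DD/Mon/YYYY HH:00'."""
    counts = Counter()
    for line in lines:
        m = POLL_RE.match(line)
        if m:
            day, hour = m.groups()
            counts[f"{day} {hour}:00"] += 1
    return counts
```

Passing an open errors log file to the function gives a quick before/after comparison around the time the setting was changed.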

What should I increase these to?

net.core.somaxconn = 4096
net.ipv4.tcp_max_syn_backlog = 4096

Thanks so much for your help!

-Gary

On 10/4/24 11:55, Marc Sauton wrote:
tune up nsslapd-listen-backlog-size
and verify the net.core.somaxconn and net.ipv4.tcp_max_syn_backlog are high enough ( sysctl -a )
possibly tune down the nsslapd-ioblocktimeout value
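For reference, a minimal Python sketch of that check, reading the two kernel settings from /proc/sys instead of scanning `sysctl -a`. The helper names (`read_sysctl`, `effective_backlog`) are made up for this example, and the `root` parameter exists only so the code can be exercised against a test directory. It relies on the Linux behaviour that a listen() backlog larger than net.core.somaxconn is silently capped to that value.

```python
from pathlib import Path


def read_sysctl(name, root="/proc/sys"):
    """Return the integer value of a sysctl such as 'net.core.somaxconn'."""
    path = Path(root, *name.split("."))
    return int(path.read_text().split()[0])


def effective_backlog(listen_backlog, root="/proc/sys"):
    """Backlog the kernel will actually honour for a requested listen() backlog."""
    return min(listen_backlog, read_sysctl("net.core.somaxconn", root))


def report(listen_backlog, root="/proc/sys"):
    """Print the values relevant to nsslapd-listen-backlog-size."""
    for name in ("net.core.somaxconn", "net.ipv4.tcp_max_syn_backlog"):
        print(f"{name} = {read_sysctl(name, root)}")
    print(f"requested backlog = {listen_backlog}, "
          f"effective = {effective_backlog(listen_backlog, root)}")
```

On the server itself, `report(512)` would show whether the configured backlog fits under the kernel limit.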
Thanks,
M.

On Fri, Oct 4, 2024 at 11:06 AM gwaters-web--- via 389-users <389-users@lists.fedoraproject.org> wrote:
Hello,

We have been experiencing a new issue since we upgraded 389-ds-base from
1.4~ish to 2.0.15 on RHEL 8. I couldn't figure out how to fix it, so I
switched to RHEL 9 and am now on 2.4.5-9.

The issue occurs during a performance load test of a web application. The
app logs into a website, does some things that search against LDAP,
and does some transactions. This app has been performing fine for years.
The app has changed, so it could be something there, but I am not sure
about that because of the percentage of the traffic that is successful.

The errors for the web app are "Can't contact Ldap Server" and sometimes
"Can't contact LDAP server. Start TLS request accepted.Server willing to
negotiate SSL. (0xFFFF [-1])". Out of the 128k connections below, these
errors happen only about 5 or 6 times, so it's wildly inconsistent and random.

I did a logconv analysis of 6 hours from a day of testing; see below.
One thing that really stood out to me was peak concurrent
connections = 22. That peak is so low that I don't know how these errors are
happening.

I don't see any errors in the access log (grepping for err=1).
I looked for cache warnings/errors in the access/errors logs, but didn't
find any. I don't see things like unavailable connections in the access logs.

Any suggestions on what to change or look for in the logs?

Thanks,
Gary


information:
Machine size: 16G of RAM, 4-core AMD (it's an EC2.m5.large, gp3 disk type)

kernel:
Linux  5.14.0-427.35.1.el9_4.x86_64 #1 SMP PREEMPT_DYNAMIC
packages:
389-ds-base-libs-2.4.5-9.el9_4.x86_64
389-ds-base-2.4.5-9.el9_4.x86_64

single instance of dirsrv running
dirsrv modifications from default:

nsslapd-logging-backend: dirsrv-log,syslog
nsslapd-maxdescriptors: 8192
nsslapd-listen-backlog-size: 256
nsslapd-allow-hashed-passwords: on
nsslapd-idletimeout: 30
nsslapd-ioblocktimeout: 30000
nsslapd-sizelimit: -1
nsslapd-auditlog-logging-enabled: off
nsslapd-lookthroughlimit: -1

dirsrv.systemd:
limitNOFILE=8192

 >Total Log Lines Analysed:  2694287
 >
 >
 >
 > ---------- Access Log Output ------------
 >
 > Start of Logs:    26/Sep/2024:10:07:32.089983378
 > End of Logs:      26/Sep/2024:15:54:29.895403688
 >
 > Processed Log Time:  5 Hours, 46 Minutes, 57.805426688 Seconds
 >
 > Restarts:                      0
 > Secure Protocol Versions:
 >   - TLS1.2 128-bit AES-GCM (123117 connections)
 >
 > Peak Concurrent Connections:   22
 > Total Operations:              1097043
 > Total Results:                 1097044
 > Overall Performance:           100.0%
 >
 > Total Connections:             128646        (6.18/sec) (370.78/min)
 >  - LDAP Connections:           128646        (6.18/sec) (370.78/min)
 >  - LDAPI Connections:          0             (0.00/sec) (0.00/min)
 >  - LDAPS Connections:          0             (0.00/sec) (0.00/min)
 >  - StartTLS Extended Ops:      123116        (5.91/sec) (354.84/min)
 >
 > Searches:                      845279        (40.60/sec) (2436.22/min)
 > Modifications:                 0             (0.00/sec) (0.00/min)
 > Adds:                          0             (0.00/sec) (0.00/min)
 > Deletes:                       0             (0.00/sec) (0.00/min)
 > Mod RDNs:                      0             (0.00/sec) (0.00/min)
 > Compares:                      0             (0.00/sec) (0.00/min)
 > Binds:                         128647        (6.18/sec) (370.78/min)
 >
 > Average wtime (wait time):     0.001560856
 > Average optime (op time):      0.003310453
 > Average etime (elapsed time):  0.004868040
 >
 > Multi-factor Authentications:  0
 > Proxied Auth Operations:       0
 > Persistent Searches:           0
 > Internal Operations:           0
 > Entry Operations:              0
 > Extended Operations:           123116
 > Abandoned Requests:            1
 > Smart Referrals Received:      0
 >
 > VLV Operations:                0
 > VLV Unindexed Searches:        0
 > VLV Unindexed Components:      0
 > SORT Operations:               0
 >
 > Entire Search Base Queries:    0
 > Paged Searches:                0
 > Unindexed Searches:            0
 > Unindexed Components:          0
 > Invalid Attribute Filters:     0
 > FDs Taken:                     128646
 > FDs Returned:                  129318
 > Highest FD Taken:              968
 >
 > Broken Pipes:                  0
 > Connections Reset By Peer:     0
 > Resource Unavailable:          0
 > Max BER Size Exceeded:         0
 >
 > Binds:                         128647
 > Unbinds:                       119206
 > -------------------------------------
 >  - LDAP v2 Binds:              0
 >  - LDAP v3 Binds:              128647
 >  - AUTOBINDs(LDAPI):           0
 >  - SSL Client Binds:           0
 >  - Failed SSL Client Binds:    0
 >  - SASL Binds:                 0
 >  - Dir

