Thursday, October 17, 2024

[Test-Announce]REMINDER: F41 Go/No-Go in One Week, and other dates

Hi all,

Please be advised of a few dates for the upcoming F41 release, and F42
development.

The Fedora Linux 41 Final Go/No-Go[1] is now happening on Thursday
24th October. You can find details in fedocal[2], and our schedule[3]
has been updated slightly to reflect the new target dates. F41 Final is
still targeting a release date of Tuesday 29th October; however, if
the release candidate is deemed unsuitable, our next target release
date is Tuesday 12th November. We are currently in final freeze, which
means no major changes can land at this time, and updates will not be
pushed to the stable repository unless they fix a release blocker bug.
Please refer to the final freeze section of our updates policy[4] for
more details.

Fedora Linux 39 will go EOL on 26th November 2024.

For Fedora Linux 42 (the answer to life, the universe and everything),
please take note of some important upcoming dates for proposing
changes:

- Changes needing infra changes: 18th December 2024
- System Wide: 24th December 2024
- Self Contained: 14th January 2025
- F42 Branching AND Changes Testable: 4th February 2025

For other dates, please check the full F42 schedule[5].



[1] https://fedoraproject.org/wiki/Go_No_Go_Meeting
[2] https://calendar.fedoraproject.org/meeting/10917
[3] https://fedorapeople.org/groups/schedule/f-41/f-41-key-tasks.html
[4] https://docs.fedoraproject.org/en-US/fesco/Updates_Policy/#final-freeze
[5] https://fedorapeople.org/groups/schedule/f-42/f-42-all-tasks.html

--
Aoife Moloney

Fedora Operations Architect

Fedora Project

Matrix: @amoloney:fedora.im

IRC: amoloney

--
_______________________________________________
test-announce mailing list -- test-announce@lists.fedoraproject.org
To unsubscribe send an email to test-announce-leave@lists.fedoraproject.org
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/test-announce@lists.fedoraproject.org
Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue

Wednesday, October 16, 2024

[389-users] Re: Inconsistent Ldap connection issues

Yes, I can during the next round of testing. I'll see if I can spot anything obvious in Wireshark. I'll look for mis-colored connections, right? (I have not looked for missing syn-acks before, and wanted to check.)
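For the capture review, the tell-tale pattern is a client SYN with no matching SYN-ACK back from the server. The matching logic, as a toy sketch (the event format here is invented for illustration; it is not Wireshark's or tcpdump's actual output):

```python
# Sketch: given simplified TCP events, find client ports whose SYN never
# received a SYN-ACK -- the pattern to look for in the capture.
# Event format: (direction, client_port, flags), direction 'in'/'out'
# relative to the server. This format is invented for illustration.

def unanswered_syns(events):
    """Return client ports that sent a SYN but never got a SYN-ACK."""
    syns, synacks = set(), set()
    for direction, port, flags in events:
        if direction == "in" and flags == "S":
            syns.add(port)
        elif direction == "out" and flags == "SA":
            synacks.add(port)
    return sorted(syns - synacks)

events = [
    ("in", 50001, "S"), ("out", 50001, "SA"),
    ("in", 50002, "S"),                        # never answered
]
print(unanswered_syns(events))  # -> [50002]
```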

-Gary

On 10/16/24 01:02, Thierry Bordaz wrote:


On 10/16/24 2:26 AM, William Brown via 389-users wrote:

These errors are only shown on the client, yes? Is there any evidence of a failed connection in the access log?
Correct, those 2 different "can't contact LDAP server" error issues. I have searched for various things in the logs, but I haven't read them line by line. I don't see "err=1", fd errors, or "Not listening for new connections - too many fds open".

So, that means the error is happening *before* 389-ds gets a chance to accept on the connection. 

Are there any routers, middlewares, firewalls, idp's etc between the client/ldap server? Load balancer? 

We encountered a similar issue recently with another load test, where the load tester wasn't averaging its connections; it would launch 10,000 connections at once and hope they all worked. With your load test, is it actually spreading its connections out, or is it bursting?
It's a ramp-up of 500 users logging in and starting their searches; the initial ramp-up is 60 seconds, but the searches and login/logouts are spread over 6 minutes. I just sliced up the logs to see what that first minute was like:

Peak Concurrent Connections:   689
Total Operations:              18770
Total Results:                 18769
Overall Performance:           100.0%

Total Connections:             2603          (21.66/sec)  (1299.40/min)
 - LDAP Connections:           2603          (21.66/sec)  (1299.40/min)
 - LDAPI Connections:          0             (0.00/sec)  (0.00/min)
 - LDAPS Connections:          0             (0.00/sec)  (0.00/min)
 - StartTLS Extended Ops:      2571          (21.39/sec)  (1283.42/min)

Searches:                      13596         (113.12/sec)  (6787.01/min)
Modifications:                 0             (0.00/sec)  (0.00/min)
Adds:                          0             (0.00/sec)  (0.00/min)
Deletes:                       0             (0.00/sec)  (0.00/min)
Mod RDNs:                      0             (0.00/sec)  (0.00/min)
Compares:                      0             (0.00/sec)  (0.00/min)
Binds:                         2603          (21.66/sec)  (1299.40/min)
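As a quick sanity check on the summary above, the totals and per-second rates can be used to back out the measurement window (a sketch; it assumes the reported rates were computed as total divided by window length):

```python
# Back out the measurement window implied by each counter's rate,
# using the totals and per-second figures from the log summary above.
totals = {"connections": 2603, "searches": 13596, "binds": 2603}
per_sec = {"connections": 21.66, "searches": 113.12, "binds": 21.66}

windows = {name: totals[name] / per_sec[name] for name in totals}
for name, seconds in windows.items():
    print(f"{name}: ~{seconds:.1f} s window")
```

All three counters imply a window of roughly 120 seconds, so the slice may actually cover about the first two minutes rather than just the first one.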

With the settings below, the test results are in: they still get 1 LDAP error per test.

Any chance that you can get a tcpdump over the 6 minutes and try to find the SYN without an ACK around the time of the failure?


net.ipv4.tcp_max_syn_backlog = 8192

net.core.somaxconn = 8192

Suggestions? Should I bump these up more?
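For what it's worth, a toy queue model shows why the *shape* of the connection load can matter more than the backlog tunables themselves (illustrative numbers only, not measurements from this system):

```python
# Toy model of a listen backlog: a burst can overflow a queue that a
# spread-out load of the same total size never fills.

def refused(arrivals, backlog_size, accepts_per_tick):
    """Count connection attempts refused because the backlog was full.

    arrivals: number of new SYNs arriving in each tick.
    """
    queue = 0
    dropped = 0
    for n in arrivals:
        queue = max(0, queue - accepts_per_tick)  # server drains the queue
        space = backlog_size - queue
        accepted = min(n, space)
        dropped += n - accepted
        queue += accepted
    return dropped

total = 10_000
# All 10,000 SYNs in one tick vs. the same total spread over 100 ticks.
burst = refused([total], backlog_size=8192, accepts_per_tick=500)
spread = refused([total // 100] * 100, backlog_size=8192, accepts_per_tick=500)
print(burst, spread)  # -> 1808 0
```

At the rates posted above (~22 new connections/sec, ~690 peak concurrent), even the previous 4096 backlog should be nowhere near full, so a backlog overflow seems unlikely here.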

We still don't know what the cause *is* so just tweaking values won't help. We need to know what layer is triggering the error before we make changes. 

Reading these numbers, this doesn't look like the server should be under any stress at all - I have tested with 2cpu / 4gb ram and can easily get 10,000 simultaneous connections launched and accepted by 389-ds.  

My thinking at this point is there is something in between the client and 389 that is not coping. 



-- 
Sincerely,

William Brown

Senior Software Engineer,
Identity and Access Management
SUSE Labs, Australia


[389-users] Re: Inconsistent Ldap connection issues

Are there any routers, middlewares, firewalls, idp's etc between the client/ldap server? Load balancer? 
When this first started happening, the client, a cluster of containers, just spoke to the LDAP server directly over a peering connection. Since the error was "unable to contact LDAP server", I thought perhaps the one LDAP server could not handle it, so I added a load balancer (AWS NLB) and a second LDAP server. It didn't help. Since this was happening before the load balancer was added, I don't think it's that. There is an ALB in front of the cluster.
-Gary
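To isolate which hop is refusing connections, a minimal TCP probe run during the load test, first against an LDAP server directly and then through the NLB, could help narrow things down. Hostnames and ports below are placeholders, not the real topology:

```python
# Minimal TCP reachability probe: does a plain handshake to (host, port)
# complete? Run it against each hop during the failure window to see
# which layer is dropping connections.
import socket

def can_connect(host, port, timeout=2.0):
    """Return True if a TCP handshake to (host, port) completes."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example usage (placeholder targets, not the real hosts):
# print(can_connect("ldap1.internal", 389))   # direct to one server
# print(can_connect("nlb.internal", 389))     # through the load balancer
```

If the direct probe always succeeds while the NLB path occasionally fails, that would point at the balancer layer rather than 389-ds.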


[Test-Announce]Re: Fedora Linux 41 Final Go/No-Go Meeting on Thursday 17th Oct

Hi folks,

Please be advised this meeting is now *CANCELLED*. The Fedora Linux 41 Go/No-Go meeting will now take place on Thursday 24th October, where we will determine the status of F41, which is currently targeting a release date of Tuesday October 29th. The schedule[1] and fedocal[2] have been updated.


On Fri, Oct 11, 2024 at 1:59 PM Aoife Moloney <amoloney@redhat.com> wrote:
Hi all,

The Fedora Linux 41 Final Go/No-Go meeting[1] will be held next Thursday, 17th October @ 1700 UTC in #meeting:fedoraproject.org on Matrix.

At this time, we will determine the status of the F41 Final for the current target
date[2] of Tuesday October 22nd. For more information about the Go/No-Go meeting, see the wiki[3].

--

Aoife Moloney

Fedora Operations Architect

Fedora Project

Matrix: @amoloney:fedora.im

IRC: amoloney



Tuesday, October 15, 2024


[389-users] Re: Inconsistent Ldap connection issues

Hi William,

These errors are only shown on the client, yes? Is there any evidence of a failed connection in the access log?
Correct, those 2 different "can't contact LDAP server" error issues. I have searched for various things in the logs, but I haven't read them line by line. I don't see "err=1", fd errors, or "Not listening for new connections - too many fds open".

We encountered a similar issue recently with another load test, where the load tester wasn't averaging its connections; it would launch 10,000 connections at once and hope they all worked. With your load test, is it actually spreading its connections out, or is it bursting?
It's a ramp-up of 500 users logging in and starting their searches; the initial ramp-up is 60 seconds, but the searches and login/logouts are spread over 6 minutes. I just sliced up the logs to see what that first minute was like:


Peak Concurrent Connections:   689
Total Operations:              18770
Total Results:                 18769
Overall Performance:           100.0%

Total Connections:             2603          (21.66/sec)  (1299.40/min)
 - LDAP Connections:           2603          (21.66/sec)  (1299.40/min)
 - LDAPI Connections:          0             (0.00/sec)  (0.00/min)
 - LDAPS Connections:          0             (0.00/sec)  (0.00/min)
 - StartTLS Extended Ops:      2571          (21.39/sec)  (1283.42/min)

Searches:                      13596         (113.12/sec)  (6787.01/min)
Modifications:                 0             (0.00/sec)  (0.00/min)
Adds:                          0             (0.00/sec)  (0.00/min)
Deletes:                       0             (0.00/sec)  (0.00/min)
Mod RDNs:                      0             (0.00/sec)  (0.00/min)
Compares:                      0             (0.00/sec)  (0.00/min)
Binds:                         2603          (21.66/sec)  (1299.40/min)

With the settings below, the test results are in: they still get 1 LDAP error per test.

net.ipv4.tcp_max_syn_backlog = 8192

net.core.somaxconn = 8192

Suggestions? Should I bump these up more?

Thanks,

Gary

On 10/14/24 20:42, William Brown wrote:

Ah yes, of course. Here is 1 run of their web app load test; it is 6 minutes long, and it should mostly be only the test itself. I will start looking for

We encountered 2 "Can not contact ldap server" errors during this run.


These errors are only shown on the client, yes? Is there any evidence of a failed connection in the access log? 


After the run I bumped these up from 4096:

net.ipv4.tcp_max_syn_backlog = 6144
net.core.somaxconn = 6144

Yet we still get the LDAP errors (this one and the StartTLS request error previously mentioned).

Should I bump nsslapd-listen-backlog-size, net.ipv4.tcp_max_syn_backlog, or net.core.somaxconn up more?
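Before raising these values further, it may be worth checking whether the kernel is actually overflowing its listen/SYN queues at all; on Linux, the TcpExt ListenOverflows and ListenDrops counters in /proc/net/netstat record that. A small parser sketch (the file path and counter names are standard Linux, but verify on your kernel):

```python
# Parse the TcpExt ListenOverflows / ListenDrops counters from
# /proc/net/netstat-style text. Growing non-zero values would mean the
# listen backlog really is overflowing; flat zeros point elsewhere.

def listen_drop_counters(netstat_text):
    """Return (ListenOverflows, ListenDrops) from /proc/net/netstat text."""
    tcpext = [line for line in netstat_text.splitlines()
              if line.startswith("TcpExt:")]
    names = tcpext[0].split()[1:]                      # header row
    values = [int(v) for v in tcpext[1].split()[1:]]   # matching counters
    counters = dict(zip(names, values))
    return counters.get("ListenOverflows", 0), counters.get("ListenDrops", 0)

# On a live host:
# with open("/proc/net/netstat") as f:
#     overflows, drops = listen_drop_counters(f.read())
```

If these counters stay at zero across a failing run, the tunables above are probably not the bottleneck.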


We encountered a similar issue recently with another load test, where the load tester wasn't averaging its connections; it would launch 10,000 connections at once and hope they all worked. With your load test, is it actually spreading its connections out, or is it bursting?



-- 
Sincerely,

William Brown

Senior Software Engineer,
Identity and Access Management
SUSE Labs, Australia