RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 1305608 - 100% CPU utilization when SSH connections reach ~10 per second
Summary: 100% CPU utilization when SSH connections reach ~10 per second
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: systemd
Version: 7.2
Hardware: Unspecified
OS: Linux
urgent
high
Target Milestone: rc
: ---
Assignee: systemd-maint
QA Contact: Frantisek Sumsal
URL:
Whiteboard:
Depends On:
Blocks: 1360159
TreeView+ depends on / blocked
 
Reported: 2016-02-08 17:14 UTC by Bryan Totty
Modified: 2021-12-10 14:35 UTC (History)
23 users (show)

Fixed In Version: systemd-219-22.el7
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1360159 (view as bug list)
Environment:
Last Closed: 2016-11-04 00:51:20 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2016:2216 0 normal SHIPPED_LIVE systemd bug fix and enhancement update 2016-11-03 13:24:51 UTC

Description Bryan Totty 2016-02-08 17:14:17 UTC
Description of problem:

1. Please reference- 

https://github.com/systemd/systemd/issues/1961 

for a background, as it appears to affect RHEL 7.2's systemd 219-19.

2. A test application is continuously making ssh connections at a rate on the order of 10/sec to execute command to the hosts that experience this problem. Each of these remotely executed commands is typically very short in execution time. 

3. We do not believe that rate limiting parameters in the SSH configuration file /etc/ssh/sshd_config defined, such as "MaxSessions" or "MaxStartups" are the root cause or are related in this scenario. 

4. The problem seems to be introduced somewhere in systemd 208 - 219. The exact same application test does not cause the below problems with systemd 208 or older.

5. When the host receiving the the ssh commands is operating well, the systemd-logind will typically look as follows, here you can see that new sessions are being created and removed soon there after.

[root@hostname ~]# journalctl -u systemd-logind -f
-- Logs begin at Mon 2016-02-01 20:36:09 EST. --
Feb 04 15:35:12 hostname.example.com systemd-logind[571]: New session 759045 of user root.
Feb 04 15:35:12 hostname.example.com systemd-logind[571]: New session 759046 of user root.
Feb 04 15:35:12 hostname.example.com systemd-logind[571]: New session 759048 of user root.
Feb 04 15:35:12 hostname.example.com systemd-logind[571]: New session 759047 of user root.
Feb 04 15:35:12 hostname.example.com systemd-logind[571]: New session 759049 of user root.
Feb 04 15:35:12 hostname.example.com systemd-logind[571]: Removed session 759045.
Feb 04 15:35:12 hostname.example.com systemd-logind[571]: Removed session 759047.
Feb 04 15:35:12 hostname.example.com systemd-logind[571]: Removed session 759046.
Feb 04 15:35:12 hostname.example.com systemd-logind[571]: Removed session 759048.
Feb 04 15:35:12 hostname.example.com systemd-logind[571]: Removed session 759049.
Feb 04 15:35:13 hostname.example.com systemd-logind[571]: New session 759050 of user root.
Feb 04 15:35:13 hostname.example.com systemd-logind[571]: Removed session 759050.
Feb 04 15:35:13 hostname.example.com systemd-logind[571]: New session 759051 of user root.
Feb 04 15:35:13 hostname.example.com systemd-logind[571]: Removed session 759051.
Feb 04 15:35:14 hostname.example.com systemd-logind[571]: New session 759054 of user root.
Feb 04 15:35:14 hostname.example.com systemd-logind[571]: New session 759056 of user root.
Feb 04 15:35:14 hostname.example.com systemd-logind[571]: New session 759053 of user root.
Feb 04 15:35:14 hostname.example.com systemd-logind[571]: New session 759052 of user root.
Feb 04 15:35:14 hostname.example.com systemd-logind[571]: New session 759055 of user root.
Feb 04 15:35:14 hostname.example.com systemd-logind[571]: New session 759057 of user root.
Feb 04 15:35:14 hostname.example.com systemd-logind[571]: New session 759058 of user root.
Feb 04 15:35:14 hostname.example.com systemd-logind[571]: Removed session 759054.
Feb 04 15:35:14 hostname.example.com systemd-logind[571]: Removed session 759053.
Feb 04 15:35:14 hostname.example.com systemd-logind[571]: Removed session 759057.
Feb 04 15:35:14 hostname.example.com systemd-logind[571]: Removed session 759055.
Feb 04 15:35:14 hostname.example.com systemd-logind[571]: Removed session 759052.
Feb 04 15:35:14 hostname.example.com systemd-logind[571]: Removed session 759056.
Feb 04 15:35:14 hostname.example.com systemd-logind[571]: Removed session 759058.

When a host running systemd 219 experience the problem (usually some time after 10's of minutes to a couple of hours of the above loading,) the "Remove session" logs will cease and only "New session" logs will be seen.

At this time, or some time soon there after, the systemd-logind process will begin utilizing near 100% CPU and can hang the system. 

6. One indication of the problem is a large number (10s-100s) of logs indicating abandon sessions as seen with the following command: systemctl |grep "of user root" |grep "abandoned"

7. Another indication of this problem is a large and increasing number files (100s-1000s) indicated via "ls /run/systemd/sessions/ | wc -l". 


Version-Release number of selected component (if applicable):

This issue has been narrowed down to having been introduced some time between systemd version 208 (where the problem is not seen and version 219 where the problem is seen.)

Further the version of openssh-server is the same in both in good and bad hosts.

openssh-6.6.1p1-23.el7_2.x86_64
openssh-clients-6.6.1p1-23.el7_2.x86_64
openssh-server-6.6.1p1-23.el7_2.x86_64


How reproducible:
Always after 10 minutes - a few hours.


Steps to Reproduce:
1. Create a script that opens and closes SSH connections at a rate of about 10 per second. Keep in mind the "MaxSessions" or "MaxStartups" parameters mentioned above as they could play a role if the rate or amount of connections tested exceeds what is allowed by SSHD. If the sshd_config limits are exceeded in your configuration file, then the test could be invalidated/masked.

2. Wait for a period of time and monitor with "journalctl -u systemd-logind -f"


Actual results:
100% CPU utilization and possible system hangs.


Expected results:
Graceful handling of the connections.

Comment 2 Lukáš Nykrýn 2016-02-10 12:36:57 UTC
We are working on this, but there is no known solution -> condnack_patch

Comment 9 Lukáš Nykrýn 2016-05-05 12:32:21 UTC
So there was some progress regarding this bug.
There are two sets of patches in upstream for this issue and I am currently working on their backport for rhel7. Unfortunately the changes are really huge, so it will take some time.

Also there might be additional bug in dbus https://bugzilla.redhat.com/show_bug.cgi?id=1333349

Comment 11 Lukáš Nykrýn 2016-05-10 12:25:54 UTC
So I think we have a patch for this: https://github.com/lnykryn/systemd-rhel/pull/19

I have made a test build https://people.redhat.com/lnykryn/systemd/bz1305608/
but please note that it is highly experimental, untested, not supported, ...

Can you please try it?

Comment 12 Branislav Blaškovič 2016-05-16 08:38:49 UTC
qa_acking.

Comment 36 errata-xmlrpc 2016-11-04 00:51:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2216.html


Note You need to log in before you can comment on or make changes to this bug.