Bug 1947190 - After upgrading to systemd 219-78.el7_9.3, intermittent excessive cpu usage and service timeouts
Summary: After upgrading to systemd 219-78.el7_9.3, intermittent excessive cpu usage a...
Keywords:
Status: CLOSED DUPLICATE of bug 1828758
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: systemd
Version: 7.9
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: systemd-maint
QA Contact: Frantisek Sumsal
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-04-07 21:07 UTC by Matt Flusche
Modified: 2024-10-01 17:52 UTC
CC List: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-06 09:19:02 UTC
Target Upstream Version:
Embargoed:
pm-rhel: mirror+



Description Matt Flusche 2021-04-07 21:07:37 UTC
Description of problem:
After upgrading a set of OpenStack controllers to the latest RHEL 7 patches, systemd is experiencing intermittent issues where it consumes excessive CPU time and calls to systemd time out. This lasts for a period of time, then the system clears and functions normally. This occurs on multiple systems after the upgrade.

CPU usage from sosreport:
$ cat sos_commands/process/ps_auxwww | head -2
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root           1 32.8  0.0 227780 36716 ?        Rs   15:00 161:22 /usr/lib/systemd/systemd --switched-root --system --deserialize 22


Failed systemctl commands from sosreport:
$ cat sos_commands/systemd/systemctl_status_--all 
Failed to read server status: Connection timed out
Failed to list units: Connection timed out

$ cat sos_commands/systemd/systemctl_list-units
Failed to list units: Connection timed out
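
As a rough live check (a sketch, not taken from the sosreport; it assumes pidstat from the sysstat package is installed), the same symptom can be confirmed on an affected node directly:

# Watch CPU usage of PID 1 over a few 5-second intervals
$ pidstat -u -p 1 5 3

# Check whether the systemd manager still answers over D-Bus before the default timeout
$ timeout 30 systemctl is-system-running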



Version-Release number of selected component (if applicable):
RHEL 7.9 - current
ocd03-controller-1]$ cat uname 
Linux ocd03-controller-1 3.10.0-1160.15.2.el7.x86_64 #1 SMP Thu Jan 21 16:15:07 EST 2021 x86_64 x86_64 x86_64 GNU/Linux
systemd-219-78.el7_9.2.x86_64


How reproducible:
intermittent in this specific environment.


Additional info:
I'll provide additional information in private comments.

Comment 5 David Tardon 2021-04-09 09:54:28 UTC
I am pretty sure this is caused (again) by excessive parsing of mountinfo (bug 1819868). I checked one of the sosreports and there were 1731 docker instances running and 4372 mounts. If the docker instances were all started at the same time, that would certainly cause a lot of changes of mountinfo and therefore a lot of churn in systemd. strace of PID 1 would show that.
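
(A sketch of the checks suggested above, assuming strace is installed and attaching to PID 1 is acceptable on the affected node:)

# How many mounts and docker instances are currently present
$ wc -l /proc/1/mountinfo
$ docker ps -q | wc -l

# Attach strace to PID 1 for ~30 seconds; a syscall summary dominated by read(),
# and a full trace showing /proc/self/mountinfo being reopened constantly, would
# confirm the mountinfo-parsing theory
$ timeout 30 strace -c -p 1
$ timeout 30 strace -p 1 -e trace=open,openat,read -tt -o /tmp/pid1.strace
$ grep -c mountinfo /tmp/pid1.strace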

Comment 7 Charles Haithcock 2021-04-09 18:19:17 UTC
I contributed to the kernel side of this case and reviewed some of the perf data supplied. Two notes: 1. The perf data I analysed is from perf2g.data, which is attached to the case but does not seem to be attached to this bz. 2. This is just a profile of kernelspace function names, i.e. it does not have parameter listings for the functions (which is fine, honestly). As such, some of the analysis is making logical assumptions inferred from other bits of the perf data.


Below is a copy/paste of the analysis I put into the case, though I am leaving out the perf data output for legibility reasons. If it is still needed, let me know and I can paste it here as well (though it can be kinda rough to read).

 [A.0] Roughly 83.62% of all stacks are specific to the systemd process 
     [A.0-a] 33.54% of all stacks are systemd in kernelspace 
     [A.0-b] 50.08% of all stacks are systemd in userspace
 [A.1] Systemd is reading from mountinfo a _lot_
     [A.1-a] 24.09% of all stacks are systemd with the following backtrace:
             -> __GI___libc_read (userspace)
               -> system_call_fastpath (kernelspace)
                 -> sys_read
                   -> vfs_read
                     -> seq_read
                       -> m_show
                         -> show_mountinfo
     [A.1-b] Given [A.0-a] and [A.1-a], this means ~71% of all systemd's kernelspace
             stacks are just reading /proc/*/mountinfo 
 [A.2] Perf also shows a number of other stacks in kernelspace doing file 
       read-adjacent operations
     [A.2-a] For example, the next most common kernelspace backtrace is 
             -> __lxstat64 (userspace)
               -> system_call_fastpath
                 -> sys_newlstat
                   -> SYSC_newlstat
                     -> vfs_fstatat
                       -> user_path_at
                         -> user_path_at_empty
                           -> filename_lookup
                             -> path_lookupat
                               -> path_init
     [A.2-b] It is entirely possible the stacks in [A.2-a] are systemd simply
             lstat-ing /proc/1/mountinfo before reading it.
     [A.2-c] That being said, this is not a common stack trace; 3.22% of all
             kernelspace stacks for systemd are the noted trace. That's only ~10%
             of all its kernelspace stacks.
     [A.2-d] Other stacks exist doing filesystem-adjacent things, however, so it
             is very likely systemd is simply reading /proc/*/mountinfo a _ton_.

Overall, the data strongly indicates systemd is running through /proc/*/mountinfo
extremely frequently. As suspected, an strace would likely tell us why.


While I admittedly am _not_ familiar with the bz noted in c.5, the quick blurb about it in that comment seems to match what is above.
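
(For context only, and not necessarily the exact invocation used on the case: a profile like perf2g.data is typically captured and inspected along these lines.)

# Record ~30 seconds of system-wide stacks with call graphs
$ perf record -a -g -o perf2g.data -- sleep 30

# Break the samples down by backtrace, restricted to the systemd comm
$ perf report -i perf2g.data --stdio --comms systemd | head -60

# Rough count of callchain lines that went through show_mountinfo
$ perf script -i perf2g.data | grep -c show_mountinfo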

Comment 9 Renaud Métrich 2021-04-09 19:11:53 UTC
(In reply to David Tardon from comment #5)
> I am pretty sure this is caused (again) by excessive parsing of mountinfo
> (bug 1819868). I checked one of the sosreports and there were 1731 docker
> instances running and 4372 mounts. If the docker instances were all started
> at the same time, that would certainly cause a lot of changes of mountinfo
> and therefore a lot of churn in systemd. strace of PID 1 would show that.

If I'm not mistaken, docker instances do a mount/unmount of some tmpfs filesystem every 3 seconds as a kind of health check.
With that many instances, systemd will clearly just spend its time reading /proc/1/mountinfo ...
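
(As a sketch, the churn itself can be watched directly; findmnt --poll follows changes to /proc/self/mountinfo and is available in util-linux on RHEL 7.)

# Print every mount/umount event for 30 seconds; with ~1700 containers doing a
# tmpfs mount+umount every few seconds this produces a steady stream of lines
$ findmnt --poll=mount,umount --timeout 30000

# Or just get a rough count of events in that window
$ findmnt --poll=mount,umount --timeout 30000 | wc -l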

Comment 18 Renaud Métrich 2021-05-06 09:19:02 UTC
The root cause is exactly BZ #1828758, the difference being that it's not users but scopes that trigger this (which is the same thing, actually).

*** This bug has been marked as a duplicate of bug 1828758 ***

