Bug 1947190
| Summary: | After upgrading to systemd 219-78.el7_9.3, intermittent excessive cpu usage and service timeouts | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Matt Flusche <mflusche> |
| Component: | systemd | Assignee: | systemd-maint |
| Status: | CLOSED DUPLICATE | QA Contact: | Frantisek Sumsal <fsumsal> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 7.9 | CC: | bdobreli, bsawyers, camorris, chaithco, cmuresan, dtardon, rmetrich, systemd-maint-list, tcrider |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-05-06 09:19:02 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Matt Flusche
2021-04-07 21:07:37 UTC
I am pretty sure this is caused (again) by excessive parsing of mountinfo (bug 1819868). I checked one of the sosreports and there were 1731 docker instances running and 4372 mounts. If the docker instances were all started at the same time, that would certainly cause a lot of changes to mountinfo and therefore a lot of churn in systemd. An strace of PID 1 would show that.

I contributed to the kernel side of this case and reviewed some of the perf data supplied. Note: 1. The perf data I analysed (output pasted below) is from perf2g.data, which is attached to the case but does not seem to be attached to the bz. 2. This is just a profile of the kernelspace function names in the samples taken, i.e. it does not include parameter listings for the functions (which is fine, honestly). As such, some of the analysis makes logical assumptions inferred from other bits of the perf data.
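(For anyone re-checking the sosreport numbers above, a trivial sketch of the kind of count I mean is below; the `proc/1/mountinfo` path is just where I would expect the file to sit inside an extracted sosreport, not something confirmed for this case.)

```python
# Count mounts in a captured /proc/1/mountinfo, grouped by filesystem type.
# "proc/1/mountinfo" is a guessed sosreport-relative path; pass the real one.
import sys
from collections import Counter

path = sys.argv[1] if len(sys.argv) > 1 else "proc/1/mountinfo"
by_type = Counter()
with open(path) as f:
    for line in f:
        # mountinfo format: optional fields end at " - ", then the fs type follows
        fstype = line.split(" - ", 1)[1].split()[0]
        by_type[fstype] += 1

print(sum(by_type.values()), "mounts total")
for fstype, count in by_type.most_common():
    print(f"  {fstype}: {count}")
```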
Below is a copy/paste of the analysis I put into the case, though I am leaving out the perf data output for legibility reasons. If it is still needed, let me know and I can paste that here as well (though it can be kinda rough to read).
[A.0] Roughly 83.62% of all stacks are specific to the systemd process
[A.0-a] 33.54% of all stacks are systemd in kernelspace
[A.0-b] 50.08% of all stacks are systemd in userspace
[A.1] Systemd is reading from mountinfo a _lot_
[A.1-a] 24.09% of all stacks are systemd with the following backtrace:
-> __GI___libc_read (userspace)
-> system_call_fastpath (kernelspace)
-> sys_read
-> vfs_read
-> seq_read
-> m_show
-> show_mountinfo
[A.1-b] Given [A.0-a] and [A.1-a], this means ~71% of all systemd's kernelspace
stacks are just reading /proc/*/mountinfo
[A.2] Perf also shows a number of other stacks in kernelspace doing file
read-adjacent operations
[A.2-a] For example, the next most common kernelspace backtrace is
-> __lxstat64 (userspace)
-> system_call_fastpath
-> sys_newlstat
-> SYSC_newlstat
-> vfs_fstatat
-> user_path_at
-> user_path_at_empty
-> filename_lookup
-> path_lookupat
-> path_init
[A.2-b] It is entirely possible the stacks in [A.2-a] are systemd simply
lstat-ing /proc/1/mountinfo before reading it.
[A.2-c] That being said, this is not a common stack trace; 3.22% of all
stacks match the noted trace, which is only ~10% of systemd's
kernelspace stacks (quick arithmetic check after this list).
[A.2-d] Other stacks exist doing filesystem-adjacent things, however, so it
is very likely systemd is simply reading /proc/*/mountinfo a _ton_.
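(A quick sanity check of the ~71% and ~10% figures in [A.1-b] and [A.2-c], using only the percentages quoted above; this is a minimal sketch of the arithmetic, not anything pulled from the raw perf data.)

```python
# Percentages quoted above, each relative to the total number of sampled stacks.
systemd_kernel_pct = 33.54   # [A.0-a] systemd stacks that are in kernelspace
mountinfo_read_pct = 24.09   # [A.1-a] the read -> show_mountinfo backtrace
lstat_path_pct = 3.22        # [A.2-c] the lstat -> path_init backtrace

# Express each backtrace as a share of systemd's kernelspace stacks.
print(f"mountinfo reads: {mountinfo_read_pct / systemd_kernel_pct:.1%}")  # ~71.8%, the "~71%" in [A.1-b]
print(f"lstat path walk: {lstat_path_pct / systemd_kernel_pct:.1%}")      # ~9.6%, the "~10%" in [A.2-c]
```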
Overall, the data strongly indicates systemd is running through /proc/*/mountinfo extremely
frequently. As suspected, an strace would likely tell us why.
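(If an strace of PID 1 does get collected, a rough sketch of how I would tally it is below. The capture command and the `pid1.strace` path are placeholders, e.g. something along the lines of `strace -tt -y -e trace=read -p 1 -o pid1.strace` run for a minute or so; nothing like that is attached to this bz yet.)

```python
# Tally how often PID 1 re-reads mountinfo from a timestamped strace log.
# Assumes a capture roughly like: strace -tt -y -e trace=read -p 1 -o pid1.strace
# (-y resolves fd numbers to paths, so mountinfo reads are greppable by name).
# Note: one full parse of mountinfo usually spans several read() calls.
import sys
from datetime import datetime

log_path = sys.argv[1] if len(sys.argv) > 1 else "pid1.strace"
reads = 0
first_ts = last_ts = None

with open(log_path) as log:
    for line in log:
        fields = line.split()
        for field in fields[:2]:          # timestamp is field 0, or field 1 with -f
            try:
                ts = datetime.strptime(field, "%H:%M:%S.%f")
            except ValueError:
                continue
            first_ts = first_ts or ts
            last_ts = ts
            break
        if "read(" in line and "mountinfo" in line:
            reads += 1

if first_ts and last_ts and last_ts > first_ts:
    elapsed = (last_ts - first_ts).total_seconds()
    print(f"{reads} mountinfo read() calls in {elapsed:.1f}s (~{reads / elapsed:.1f}/s)")
else:
    print(f"{reads} mountinfo read() calls")
```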
While I admittedly am _not_ familiar with the bz noted in c.5, the quick blurb about it in that comment seems to match what is above.
(In reply to David Tardon from comment #5)
> I am pretty sure this is caused (again) by excessive parsing of mountinfo
> (bug 1819868). I checked one of the sosreports and there were 1731 docker
> instances running and 4372 mounts. If the docker instances were all started
> at the same time, that would certainly cause a lot of changes of mountinfo
> and therefore a lot of churn in systemd. strace of PID 1 would show that.

If I'm not mistaken, docker instances do a "mount/unmount" of some tmpfs filesystem every 3 seconds as a kind of health check. With that many instances, systemd will clearly just spend its time reading /proc/1/mountinfo ...

The root cause is exactly BZ #1828758, the difference being that it is not users but scopes that trigger this (which is the same thing, actually).

*** This bug has been marked as a duplicate of bug 1828758 ***
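(To put rough numbers on the churn described above: a back-of-the-envelope sketch, where the 3-second health-check interval is the assumption from the comment, the instance and mount counts are the ones observed in the sosreport, and every mount-table change is assumed to trigger a full re-read of mountinfo, ignoring any coalescing systemd may do.)

```python
# Back-of-the-envelope estimate of the mountinfo churn described above.
instances = 1731              # docker instances seen in the sosreport
mounts = 4372                 # lines in /proc/1/mountinfo at the time
healthcheck_period_s = 3.0    # assumed tmpfs mount/unmount health-check cycle
events_per_cycle = 2          # one mount plus one unmount per cycle

changes_per_s = instances * events_per_cycle / healthcheck_period_s
lines_reparsed_per_s = changes_per_s * mounts   # if every change forces a full re-read

print(f"~{changes_per_s:.0f} mount-table changes/s")             # ~1154/s
print(f"~{lines_reparsed_per_s:,.0f} mountinfo lines parsed/s")  # ~5,045,000/s
```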