Bug 1884262
Summary: | Device properties (in particular the description) are recomputed every time /proc/1/mountinfo is processed | |
---|---|---|---
Product: | Red Hat Enterprise Linux 8 | Reporter: | Renaud Métrich <rmetrich> |
Component: | systemd | Assignee: | systemd-maint |
Status: | CLOSED WONTFIX | QA Contact: | Frantisek Sumsal <fsumsal> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 8.2 | CC: | dtardon, ikent, kwalker, kzak, msekleta, systemd-maint-list |
Target Milestone: | rc | Keywords: | Triaged |
Target Release: | 8.0 | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2022-04-01 07:27:26 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Description
Renaud Métrich
2020-10-01 13:31:59 UTC
Attaching upstream PR (code slightly modified): https://github.com/systemd/systemd/pull/17283

New upstream PR: https://github.com/systemd/systemd/pull/17321

The new implementation uses a Hashmap.

My PR is largely broken; I believe it optimizes too much and leads to issues in the unit states. For example, there are cases where a device will remain as "loading tentative". Fortunately, raven-au is proposing some enhancements, and these seem to work pretty well. The optimization is smaller, but profiling still shows an 8x decrease in mount_load_proc_self_mountinfo(). He should open a PR tomorrow.

Just for the record, the profiling SystemTap script I wrote:

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
#!/usr/bin/env stap

global stats

probe process("/usr/lib/systemd/systemd").function("mount_load_proc_self_mountinfo").return {
    stats <<< gettimeofday_ns() - @entry(gettimeofday_ns())
}

probe end {
    printf("Distribution of %s latencies (in nanoseconds) for %d samples\n",
           "mount_load_proc_self_mountinfo", @count(stats))
    printf("max/avg/min: %d/%d/%d\n", @max(stats), @avg(stats), @min(stats))
    print(@hist_log(stats))
}
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

Usage:

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
# stap ./systemd.stp -x 1 -v
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

(In reply to Renaud Métrich from comment #12)
> My PR is largely broken, I believe it optimizes too much and leads to issues
> in the unit states.
> For example there are cases where device will remain as "loading tentative".
>
> Hopefully raven-au is proposing some enhancements and these seem to work
> pretty well.
> The optimization is less important but still profiling shows a 8x times
> decrease in mount_load_proc_self_mountinfo().
> He should open a PR tomorrow.

That's right, thanks Renaud.
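The Hashmap-based approach of the new PR can be illustrated with a small sketch. This is Python for brevity (the actual change is C inside systemd), the mountinfo lines are abbreviated, and the function name `diff_mountinfo` is invented here for illustration; the idea is simply to key each /proc/self/mountinfo entry by its mount ID and only re-process entries whose line actually changed:

```python
# Illustrative sketch only, NOT systemd's code: cache parsed mountinfo lines
# in a hash map keyed by mount ID so unchanged entries are not re-processed
# on every /proc/self/mountinfo change event.

def diff_mountinfo(cache, lines):
    """Return (changed, removed) given the previous cache and the current
    mountinfo lines; updates the cache in place."""
    seen = set()
    changed = []
    for line in lines:
        mount_id = line.split()[0]       # field 1 of a mountinfo line
        seen.add(mount_id)
        if cache.get(mount_id) != line:  # new or modified entry
            cache[mount_id] = line
            changed.append(line)
    removed = [mid for mid in cache if mid not in seen]
    for mid in removed:                  # entries that were umounted
        del cache[mid]
    return changed, removed

# Example: the second pass only flags the entry that actually changed.
cache = {}
first = ["36 25 0:32 / /mnt rw - tmpfs tmpfs rw",
         "37 25 8:1 / /data rw - ext4 /dev/sda1 rw"]
second = ["36 25 0:32 / /mnt ro - tmpfs tmpfs ro",   # remounted read-only
          "37 25 8:1 / /data rw - ext4 /dev/sda1 rw"]
diff_mountinfo(cache, first)   # first pass: everything counts as changed
changed, removed = diff_mountinfo(cache, second)
print(len(changed), len(removed))  # → 1 0
```

With a flat rescan, both entries would be reprocessed on every event; with the map, only the remounted one is.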
I will need to do some more testing of my own before I submit the PR. I'm still getting familiar with the systemd PR process and what's expected of submissions, so it may take a little while to get that right.

The testing will be more focused on CPU usage under load, since that seems to be the main concern in the reports we are seeing, and it's a bit different from the mount table handling testing I've done before. That testing focused purely on the rate of notifications and the mount table size, but the problem here also needs to cover the effect of the flow-on notifications over D-Bus.

Ian

(In reply to Ian Kent from comment #15)
> (In reply to Ian Kent from comment #14)
> >
> > There is no way to test the effectiveness of any possible fix if the needed
> > parameters to simulate the problem aren't known.
>
> This info. might not be needed, but would be useful if we can get
> it.
>
> I've been doing some testing with the change I have and the result
> is not too good.

So it turns out that I had made a small change to my patches, and that resulted in the negligible improvement. I went back to the patches I originally sent to Renaud and tested again. This time I got results closer to what I expected.

For these tests I've used the new mountbomber utility from Karel Zak's experimental util-linux branch, with a small change to report the number of mounts and umounts performed during the test run. I think this is more representative of normal(ish) usage and so is better for testing.

When looking at the test results, it's important to understand that mountbomber will mount the number of mounts specified for the test at the start and umount them at the end. That skews the results a little: we're concerned with measuring the effect the patches have for changing mounts (new, remounted or umounted mounts during normal activity), and mounting and umounting a bunch of mounts at the start and end will also register as changed mounts.
So I've kept the number of mounts used by mountbomber fairly low and the number of random mounts and umounts fairly high.

First, the system call count test.

mountbomber: 200 mounts, with 10 seconds of random mount and umount.

Unpatched:
[root@localhost test]# wc -l /tmp/systemd.strace
146848 /tmp/systemd.strace

Patched:
[root@localhost test]# wc -l /tmp/systemd.strace
86915 /tmp/systemd.strace

This looks sensible at around a 40% reduction in system calls, given the 200 mount actions at the start and again at the finish, which will count as changed mounts.

Next I used Renaud's SystemTap script to count the number of calls to mount_load_proc_self_mountinfo(), and hence the number of events being caught.

mountbomber: 200 mounts, with 10 seconds of random mount and umount.

Unpatched:
[root@localhost test]# stap ./mount_load_proc_self_mountinfo.stp -x 1 -v
Pass 1: parsed user script and 492 library scripts using 400468virt/170024res/13080shr/156964data kb, in 470usr/150sys/842real ms.
Pass 2: analyzed script: 3 probes, 2 functions, 1 embed, 3 globals using 414068virt/184864res/14092shr/170564data kb, in 180usr/10sys/223real ms.
Pass 3: translated to C into "/tmp/stapcF3vGC/stap_16e25d390818cd77a19693a6f5e4ec02_3054_src.c" using 414068virt/185080res/14308shr/170564data kb, in 20usr/380sys/1492real ms.
Pass 4: compiled C into "stap_16e25d390818cd77a19693a6f5e4ec02_3054.ko" in 2880usr/650sys/3786real ms.
Pass 5: starting run.
Distribution of mount_load_proc_self_mountinfo latencies (in nanoseconds) for 4000 samples
max/avg/min: 4250949/2144625/893277
   value |-------------------------------------------------- count
  131072 |                                                       0
  262144 |                                                       0
  524288 |                                                       2
 1048576 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  2459
 2097152 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                     1538
 4194304 |                                                       1
 8388608 |                                                       0
16777216 |                                                       0
Pass 5: run completed in 20usr/40sys/17907real ms.
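As a quick sanity check on the "around 40%" figure, the reduction can be computed directly from the two `wc -l` counts above:

```python
# Syscall-count reduction computed from the two strace line counts above.
unpatched = 146848
patched = 86915

reduction = (unpatched - patched) / unpatched * 100
print(f"{reduction:.1f}% fewer syscalls")  # → 40.8% fewer syscalls
```

So the measured reduction is just under 41%, consistent with the "around 40%" reading, even with the start/end churn included.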
17:29:11  CPU  %user  %nice  %system  %iowait  %steal   %idle
17:29:15  all   0.00   0.00     0.00     0.00    0.00  100.00
17:29:16  all  21.56   0.00    15.38     0.13    0.00   62.93
17:29:17  all  23.65   0.00    16.94     0.00    0.00   59.41
17:29:18  all  23.67   0.00    17.20     0.00    0.00   59.13
17:29:19  all  23.48   0.00    17.20     0.00    0.19   59.13
17:29:20  all  23.56   0.00    17.23     0.00    0.06   59.15
17:29:21  all  23.99   0.00    16.67     1.33    0.13   57.89
17:29:22  all  23.83   0.00    17.02     0.00    0.06   59.08
17:29:23  all  23.81   0.00    17.29     0.00    0.00   58.90
17:29:24  all  23.93   0.00    16.96     0.00    0.25   58.86
17:29:25  all  23.47   0.00    17.50     0.00    0.00   59.03
17:29:26  all   3.06   0.00     2.50     0.00    0.00   94.45
17:29:27  all   0.00   0.00     0.00     0.00    0.00  100.00

Patched:
[root@localhost test]# stap ./mount_load_proc_self_mountinfo.stp -x 1 -v
Pass 1: parsed user script and 492 library scripts using 400468virt/170124res/13176shr/156964data kb, in 460usr/150sys/821real ms.
Pass 2: analyzed script: 3 probes, 2 functions, 1 embed, 3 globals using 414072virt/185076res/14296shr/170568data kb, in 170usr/20sys/216real ms.
Pass 3: using cached /root/.systemtap/cache/1f/stap_1f5145cb146da316e6083966f10436fc_3054.c
Pass 4: using cached /root/.systemtap/cache/1f/stap_1f5145cb146da316e6083966f10436fc_3054.ko
Pass 5: starting run.
Distribution of mount_load_proc_self_mountinfo latencies (in nanoseconds) for 5409 samples
max/avg/min: 3184960/1662429/222757
  value |-------------------------------------------------- count
  32768 |                                                       0
  65536 |                                                       0
 131072 |                                                       2
 262144 |                                                       3
 524288 |                                                       6
1048576 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  4909
2097152 |@@@@                                                 489
4194304 |                                                       0
8388608 |                                                       0
Pass 5: run completed in 10usr/40sys/17412real ms.
17:29:11  CPU  %user  %nice  %system  %iowait  %steal   %idle
17:38:30  all   0.00   0.00     0.00     0.00    0.00  100.00
17:38:31  all   0.06   0.00     0.00     0.00    0.00   99.94
17:38:32  all  21.59   0.00    14.46     0.25    0.00   63.70
17:38:33  all  23.19   0.00    16.31     0.00    0.00   60.50
17:38:34  all  23.26   0.00    15.68     0.00    0.00   61.06
17:38:35  all  23.64   0.00    15.61     0.00    0.00   60.75
17:38:36  all  23.67   0.00    15.57     0.00    0.19   60.58
17:38:37  all  23.77   0.00    15.60     1.70    0.06   58.87
17:38:38  all  23.51   0.00    15.65     0.00    0.06   60.78
17:38:39  all  23.90   0.00    15.41     0.00    0.06   60.63
17:38:40  all  24.06   0.00    15.62     0.00    0.00   60.33
17:38:41  all  23.65   0.00    15.72     0.25    0.06   60.31
17:38:42  all   3.19   0.00     2.25     0.00    0.00   94.56
17:38:43  all   0.00   0.00     0.00     0.00    0.00  100.00

Not as good as we would like but, again, there are those mounts at the start and end of the run.

The thing that's most interesting about these results is that the CPU usage is pretty much identical for the patched and unpatched runs, but we see around a 26% increase in the number of calls to mount_load_proc_self_mountinfo() in the patched results. That's because, as the mount unit processing is improved, the systemd event handling sub-system processes more change events, negating the improvements that have been made. So, from a user perspective, there's been no improvement in resource usage!

I think this is a clear indication of the importance of Michal Sekletar's notification rate limiting change proposal. Without the ability to manage the rate of notifications that are serviced under load, improvements made won't be realised.

I saw this with the newly proposed kernel API (that was ultimately rejected): there was a significant improvement in resource usage, but to be able to take advantage of the notification queueing feature of the new notifications API, the notification rate needed to be managed.
That then gave a further large improvement, which resulted in reaching the expected improvement for the changes (around 5-10% CPU per mount table monitoring process under heavy load) from the new kernel API.

Anyway, it's an improvement, so I'll construct a PR. But really, it also depends on Michal's change to get us anywhere with this, or with any other improvements for that matter.

Ian

Upstream PR (awaiting review): https://github.com/systemd/systemd/pull/17409

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.
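As a footnote to the rate-limiting discussion above, the general idea of coalescing bursty mountinfo notifications can be sketched as follows. This is an illustrative model, not Michal Sekletar's actual proposal or systemd's implementation; the class name, interval, and event times are all invented for the example. Events arriving inside the window are remembered but not serviced, so the expensive table reload runs at most once per interval:

```python
# Illustrative sketch of notification rate limiting / coalescing; NOT
# systemd's implementation. A burst of mountinfo change events collapses
# into a single (expensive) mount table reload per interval.
class RateLimiter:
    def __init__(self, interval):
        self.interval = interval   # minimum seconds between serviced reloads
        self.last = float("-inf")  # time of the last serviced reload
        self.pending = False       # events arrived during the closed window

    def on_event(self, now):
        """Return True if the reload should run now, else coalesce it."""
        if now - self.last >= self.interval:
            self.last = now
            self.pending = False
            return True
        self.pending = True        # remember the burst; service it later
        return False

# 10 events in a 1-second burst trigger only one reload with a 5s interval.
rl = RateLimiter(interval=5.0)
reloads = sum(rl.on_event(t * 0.1) for t in range(10))
print(reloads)  # → 1
```

A real implementation would also arm a timer to service `pending` once the window reopens, so coalesced changes are never lost, only deferred.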