Bug 1931535
| Summary: | systemd-run executions are delayed by several minutes due to heavy mount activity | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Renaud Métrich <rmetrich> |
| Component: | systemd | Assignee: | systemd-maint |
| Status: | CLOSED DUPLICATE | QA Contact: | Frantisek Sumsal <fsumsal> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | ||
| Version: | 8.3 | CC: | dtardon, joedward, systemd-maint-list, systemd-maint |
| Target Milestone: | rc | Flags: | pm-rhel:
mirror+
|
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2021-03-08 08:45:58 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Renaud Métrich
2021-02-22 16:17:15 UTC
PARTIAL REPRODUCER (not exactly the same, since systemd-run fails in Timeout for me). 1. Create 1000 loopback devices and associated mount points # for i in $(seq 1 1000); do truncate -s 10M /root/lo_$i.img; mkfs.ext4 /root/lo_$i.img; mkdir /mnt/$i; echo "/root/lo_$i.img /mnt/$i ext4 defaults,noauto 0 0" >> /etc/fstab; done 2. Mount the devices # for i in $(seq 1 1000); do mount /mnt/$i; done 3. Strace systemd (to slow it down) # strace -ttTvyy -o /tmp/systemd.strace -s 256 -p 1 & 4. Perform a remount of one device in loop, with a small pause # while :; do mount -o remount /mnt/999; sleep 0.1; done 5. From another terminal, try to execute systemd-run # date; systemd-run --scope -- echo "HELLO from the scope"; date Result: Mon Feb 22 14:38:25 CET 2021 --> Hang (or timeout) until I hit Ctrl-C on the mount running in loop Running scope as unit: run-r0387c9f5da614c26aff4a933b9a65059.scope HELLO from the scope Mon Feb 22 14:39:04 CET 2021 From the strace, I can see that systemd-run executed only once I stopped the mount running in loop, which even indicates that mountinfo is of higher priority over incoming connections. Of course, this reproducer is slightly different than the observed behaviour, since here even systemd-run cannot connect at all. In fact it really depends on the dynamic. We can however mimic the behaviour by allowing systemd-run to connect to systemd before starting the mount in loop, stopping it using gdb, as shown below: 1. Monitor systemd using strace # strace -ttTvyy -o /tmp/systemd_83.strace -s 256 -p 1 & 2. Set a gdb breakpoint on run.c:1219 # gdb --args systemd-run --scope -- echo "HELLO from the scope" [...] (gdb) break run.c:1219 (gdb) run [...] Breakpoint 1, start_transient_scope (argv=<optimized out>, bus=<optimized out>) at ../src/run/run.c:1219 1219 r = sd_bus_call(bus, m, 0, &error, &reply); (gdb) 3. From another terminal, start the mount in loop # while :; do mount -o remount /mnt/999; sleep 0.1; done 4. Continue the systemd-run execution (gdb) cont Continuing. Checking the strace, we then see the connection was made but not handled, due to the mountinfo updates. This looks like a different manifestation of bug 1819868 to me... If the time spent in parsing mountinfo is reduced significantly, this problem will just disappear, won't it? I don't think it's same cause, since this also happens on my 8.3 system (which doesn't send PropertiesChanged anymore). Still I cannot confirm since the customer is on 8.2. The issue here seems more with the event priority: mountinfo seems higher priority than the rest (including starting units) so triggers over and over. Then, when there is no mountinfo change, new incoming connections are handled. Finally when systemd is idle, units are started. (In reply to Renaud Métrich from comment #4) > I don't think it's same cause, since this also happens on my 8.3 system > (which doesn't send PropertiesChanged anymore). But the problem with excessive parsing of mountinfo is still there. > Still I cannot confirm since the customer is on 8.2. > > The issue here seems more with the event priority: mountinfo seems higher > priority than the rest (including starting units) so triggers over and over. > Then, when there is no mountinfo change, new incoming connections are > handled. Finally when systemd is idle, units are started. That "triggers over and over" is the real problem which the fix for bug 1819868 tries to address. If that is fixed, events with lower priority do get a chance to run and everything works fine. *** This bug has been marked as a duplicate of bug 1819868 *** |