Bug 1931535 - systemd-run executions are delayed by several minutes due to heavy mount activity
Keywords:
Status: CLOSED DUPLICATE of bug 1819868
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: systemd
Version: 8.3
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: systemd-maint
QA Contact: Frantisek Sumsal
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-02-22 16:17 UTC by Renaud Métrich
Modified: 2021-04-13 20:48 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-03-08 08:45:58 UTC
Type: Bug
Target Upstream Version:
Embargoed:



Description Renaud Métrich 2021-02-22 16:17:15 UTC
Description of problem:

A customer running OpenShift 4.5 encounters an issue where "systemd-run mount ..." commands hang for several minutes.
These commands are issued by kubelet to mount temporary directories such as the ones used to store secrets.

The strace of systemd shows that systemd is spending its time processing /proc/self/mountinfo and sending PropertiesChanged messages.
It accepts the systemd-run connections but doesn't process them immediately (there are gaps of 1 minute or so between the authentication and the transient unit startup).

It looks to me like the issue is due to /proc/self/mountinfo events having higher priority than "normal" events.
Additionally, each "systemd-run mount ..." execution creates a new mount, which itself triggers another round of /proc/self/mountinfo processing, adding further load on systemd.

I'm not able to reproduce this exactly, but I can reproduce something similar: when remounting a filesystem in a loop with a small delay (0.5 seconds) while stracing systemd (to slow it down), systemd-run commands never execute (they fail with a timeout).

This happens on both 8.2 and 8.3, i.e. even when PropertiesChanged messages are not sent (8.3 apparently has some optimization there).
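As a side note, the mount-table churn described above is easy to observe from the shell. The sketch below is an illustration only (not part of the original report) and assumes a Linux system where /proc/self/mountinfo is available; a rapidly changing entry count points at the heavy mount activity that keeps systemd busy:

```shell
# Print a timestamped mount-entry count once per second for N iterations
# (default 5). A count that jumps around indicates heavy mount activity.
watch_mountinfo() {
    i=0
    while [ "$i" -lt "${1:-5}" ]; do
        printf '%s %s\n' "$(date +%T)" "$(wc -l < /proc/self/mountinfo)"
        i=$((i + 1))
        sleep 1
    done
}
watch_mountinfo 3
```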


Version-Release number of selected component (if applicable):

systemd-239-31.el8_2.2.x86_64


How reproducible:

Easily on customer environment.

Comment 2 Renaud Métrich 2021-02-22 16:19:28 UTC
PARTIAL REPRODUCER (not exactly the same, since systemd-run fails with a timeout for me).

1. Create 1000 loopback devices and associated mount points

  # for i in $(seq 1 1000); do truncate -s 10M /root/lo_$i.img; mkfs.ext4 /root/lo_$i.img; mkdir /mnt/$i; echo "/root/lo_$i.img /mnt/$i ext4 defaults,noauto 0 0" >> /etc/fstab; done

2. Mount the devices

  # for i in $(seq 1 1000); do mount /mnt/$i; done

3. Strace systemd (to slow it down)

  # strace -ttTvyy -o /tmp/systemd.strace -s 256 -p 1 &

4. Perform a remount of one device in loop, with a small pause

  # while :; do mount -o remount /mnt/999; sleep 0.1; done

5. From another terminal, try to execute systemd-run

  # date; systemd-run --scope -- echo "HELLO from the scope"; date

Result:
  
  Mon Feb 22 14:38:25 CET 2021

  --> Hang (or timeout) until I hit Ctrl-C on the mount running in loop

  Running scope as unit: run-r0387c9f5da614c26aff4a933b9a65059.scope
  HELLO from the scope
  Mon Feb 22 14:39:04 CET 2021

From the strace, I can see that systemd-run executed only once I stopped the mount loop, which further indicates that mountinfo has higher priority than incoming connections.
Of course, this reproducer differs slightly from the observed behaviour, since here systemd-run cannot even connect at all.
In fact, it really depends on the timing.
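Finding the stall in a large strace log can be tedious. A small helper (an editor's sketch, not part of the original report) can scan an `strace -tt` log, such as the /tmp/systemd.strace file captured in step 3, and report any gap longer than one second between consecutive syscall timestamps, which is where PID 1 was busy with mountinfo instead of serving the bus connection:

```shell
# Report gaps > 1s between consecutive timestamps in an `strace -tt` log.
# Assumes lines start with an HH:MM:SS.usec timestamp.
find_gaps() {
    awk '{
        split($1, t, /[:.]/)
        now = t[1]*3600 + t[2]*60 + t[3] + ("0." t[4])
        if (NR > 1 && now - prev > 1)
            printf "gap of %.1fs before: %s\n", now - prev, $0
        prev = now
    }' "$1"
}
# Usage (with the log from step 3 of the reproducer):
# find_gaps /tmp/systemd.strace
```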

We can, however, mimic the behaviour by letting systemd-run connect to systemd before starting the mount loop, pausing systemd-run with gdb, as shown below:

1. Monitor systemd using strace

  # strace -ttTvyy -o /tmp/systemd_83.strace -s 256 -p 1 &

2. Set a gdb breakpoint on run.c:1219

  # gdb --args systemd-run --scope -- echo "HELLO from the scope"
  [...]
  (gdb) break run.c:1219
  (gdb) run
  [...]
  Breakpoint 1, start_transient_scope (argv=<optimized out>, bus=<optimized out>) at ../src/run/run.c:1219
  1219	        r = sd_bus_call(bus, m, 0, &error, &reply);
  (gdb)

3. From another terminal, start the mount in loop

  # while :; do mount -o remount /mnt/999; sleep 0.1; done

4. Continue the systemd-run execution

  (gdb) cont
  Continuing.


Checking the strace, we then see that the connection was made but not handled, due to the mountinfo updates.
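A quick sanity check on the captured trace (again an editor's sketch, not part of the original report): compare how often PID 1 touched mountinfo against how many bus connections it accepted. A large ratio supports the starvation theory:

```shell
# Count mountinfo-related lines vs. accepted connections in an strace log.
trace_ratio() {
    printf 'mountinfo events: %s\n' "$(grep -c mountinfo "$1")"
    printf 'accepted connections: %s\n' "$(grep -c accept4 "$1")"
}
# Usage (with the log from step 1 above):
# trace_ratio /tmp/systemd_83.strace
```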

Comment 3 David Tardon 2021-02-23 15:51:34 UTC
This looks like a different manifestation of bug 1819868 to me... If the time spent in parsing mountinfo is reduced significantly, this problem will just disappear, won't it?

Comment 4 Renaud Métrich 2021-02-23 17:47:41 UTC
I don't think it's the same cause, since this also happens on my 8.3 system (which doesn't send PropertiesChanged anymore).
Still, I cannot confirm, since the customer is on 8.2.

The issue here seems to be more about event priority: mountinfo seems to have higher priority than everything else (including starting units), so it triggers over and over.
Then, when there is no mountinfo change, new incoming connections are handled. Finally, when systemd is idle, units are started.
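That strict ordering can be modelled with a toy sketch (illustration only, not systemd code; the priority numbers are made up, with lower running first, mirroring how sd-event dispatches pending sources): as long as any prio-0 "mountinfo" event is pending, the bus connection and the unit start never get a turn.

```shell
# Toy model of strict-priority event dispatch. Events arrive as
# "priority:name" lines; all lower-numbered events drain first.
dispatch() {
    sort -s -t: -k1,1n | while IFS=: read -r prio name; do
        echo "handled $name (prio $prio)"
    done
}
printf '%s\n' '1:connection' '0:mountinfo' '2:start-unit' '0:mountinfo' | dispatch
```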

Comment 5 David Tardon 2021-02-24 09:27:07 UTC
(In reply to Renaud Métrich from comment #4)
> I don't think it's same cause, since this also happens on my 8.3 system
> (which doesn't send PropertiesChanged anymore).

But the problem with excessive parsing of mountinfo is still there.

> Still I cannot confirm since the customer is on 8.2.
> 
> The issue here seems more with the event priority: mountinfo seems higher
> priority than the rest (including starting units) so triggers over and over.
> Then, when there is no mountinfo change, new incoming connections are
> handled. Finally when systemd is idle, units are started.

That "triggers over and over" is the real problem which the fix for bug 1819868 tries to address. If that is fixed, events with lower priority do get a chance to run and everything works fine.

Comment 8 David Tardon 2021-03-08 08:45:58 UTC

*** This bug has been marked as a duplicate of bug 1819868 ***

