Cause:
If the process is short-running (such as an invocation of /bin/true), it may already have died by the time PID 1 adds it to the scope unit.
Consequence:
When systemd picked a recycled name for the scope, the following error could appear:
'Failed to start transient scope unit: Unit run-XXXXX.scope already exists.'
Fix:
Synchronously wait until the scope unit we create is started.
Result:
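systemd-run now waits for the scope unit to start, so the error above should no longer appear for short-running processes.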
Description of problem:
A similar bug was discussed in [1] and [2]. Sometimes, due to a race condition, a unit shows up as already existing even though its process has already finished.
We hit this problem on an oVirt VDSM testing server; sometimes this happens:
$ /usr/sbin/ifdown enp1s0f1
$ /usr/bin/systemd-run --scope --slice=vdsm-dhclient /usr/sbin/ifup enp1s0f1
'Failed to start transient scope unit: Unit run-13034.scope already exists.'
When we wait a second and then retry, it works.
A fix for this problem was introduced in systemd v220, which is not available for EL7. Would it be possible to backport it?
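The wait-and-retry behaviour described above could be scripted roughly as follows. This is only a stopgap sketch, not a fix: `retry` is a hypothetical helper (it is not part of VDSM or systemd), and the attempt count and delay are arbitrary assumptions.

```shell
#!/bin/sh
# Hypothetical helper: run a command, retrying on failure.
#   $1 = maximum number of attempts
#   $2 = seconds to sleep between attempts
#   remaining arguments = the command to run
retry() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while :; do
        # Stop as soon as the command succeeds.
        "$@" && return 0
        # Give up once the attempt budget is used.
        [ "$i" -ge "$attempts" ] && return 1
        i=$((i + 1))
        sleep "$delay"
    done
}

# Example invocation with the command from this report (commented out
# because it needs root and a real interface):
# retry 3 1 /usr/bin/systemd-run --scope --slice=vdsm-dhclient /usr/sbin/ifup enp1s0f1
```

Note that retrying also re-runs the wrapped command if it fails for an unrelated reason, so this is only reasonable for commands that are safe to repeat.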
Version-Release number of selected component (if applicable):
Red Hat Enterprise Linux Server release 7.2 Beta (Maipo)
Linux 3.10.0-322.el7.x86_64
systemd-219-18.el7.x86_64
How reproducible:
Only sometimes on our test server, but the reporters of bugs [1] and [2] were able to reproduce it every time.
Steps to Reproduce [1]:
[root@axis-00408cc563e5 /mnt/flash/root]27929# cat stress
#!/bin/sh
systemd-run --scope /bin/true &
systemd-run --scope /bin/true &
systemd-run --scope /bin/true &
systemd-run --scope /bin/true &
systemd-run --scope /bin/true &
systemd-run --scope /bin/true &
systemd-run --scope /bin/true &
systemd-run --scope /bin/true &
[root@axis-00408cc563e5 /mnt/flash/root]27929# ./stress
[root@axis-00408cc563e5 /mnt/flash/root]27929# Running as unit run-27947.scope.
Running as unit run-27946.scope.
Running as unit run-27945.scope.
Running as unit run-27948.scope.
Running as unit run-27952.scope.
Running as unit run-27950.scope.
Running as unit run-27951.scope.
Running as unit run-27949.scope.
[root@axis-00408cc563e5 /mnt/flash/root]27929# systemctl -t scope
UNIT LOAD ACTIVE SUB DESCRIPTION
run-27945.scope loaded active running /bin/true
run-27946.scope loaded active running /bin/true
run-27947.scope loaded active running /bin/true
run-27948.scope loaded active running /bin/true
run-27949.scope loaded active running /bin/true
run-27950.scope loaded active running /bin/true
run-27951.scope loaded active running /bin/true
run-27952.scope loaded active running /bin/true
LOAD = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB = The low-level unit activation state, values depend on unit type.
8 loaded units listed. Pass --all to see loaded but inactive units, too.
To show all installed unit files use 'systemctl list-unit-files'.
[root@axis-00408cc563e5 /mnt/flash/root]27929# systemctl status run-27945.scope
● run-27945.scope - /bin/true
Loaded: loaded (/run/systemd/system/run-27945.scope; static)
Drop-In: /run/systemd/system/run-27945.scope.d
└─50-Description.conf
Active: active (running) since Fri 2014-10-24 13:13:26 GMT; 15s ago
Steps to reproduce [2]:
[robryk@sharya-rana bin]$ systemctl --user daemon-reload
[robryk@sharya-rana bin]$ systemctl --version
systemd 226
+PAM -AUDIT -SELINUX -IMA -APPARMOR +SMACK -SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID -ELFUTILS +KMOD +IDN
[robryk@sharya-rana bin]$ systemd-run --user --scope --unit=foobar /bin/sleep 5
Running scope as unit foobar.scope.
[robryk@sharya-rana bin]$ systemctl status --user foobar.scope
● foobar.scope - /bin/sleep 5
Loaded: loaded (/run/user/1000/systemd/user/foobar.scope; static; vendor preset: enabled)
Drop-In: /run/user/1000/systemd/user/foobar.scope.d
└─50-Description.conf
Active: active (running) since Wed 2015-09-23 02:51:56 CEST; 30s ago
Sep 23 02:51:56 sharya-rana systemd[763]: Started /bin/sleep 5.
[robryk@sharya-rana bin]$ systemd-run --user --scope --unit=foobar /bin/sleep 5
Failed to start transient scope unit: Unit foobar.scope already exists.
[1] https://bugs.freedesktop.org/show_bug.cgi?id=86520
[2] https://github.com/systemd/systemd/issues/1351
Note this mail from Lennart Poettering. The patch may not be a complete solution, but it is better than nothing:
On Thu, 15.10.15 13:25, Petr Horacek (phoracek) wrote:
> Hello,
>
> recently we encountered strange systemd problem on automated tests of
> networking
> part of oVirt VDSM project.
>
> Sometimes this happens:
> $ /usr/sbin/ifdown enp1s0f1
> $ /usr/bin/systemd-run --scope --slice=vdsm-dhclient /usr/sbin/ifup enp1s0f1
> Failed to start transient scope unit: Unit run-13034.scope already exists.
>
> systemd-run should create a new scope every time it's called, should not
> it? Could it be
> a racefull bug in systemd?
The code for this is actually really naive... the number is just the
PID of the caller, and there's no check at all to ensure it is
unique. PIDs overrun easily, hence this is not nice at all...
What's even worse: when you use -H or -M to invoke things remotely we
still pick the client side PID for the name....
I figure we should rework this to pick some sufficiently large random
token instead, so that this is unlikely to conflict without actually
having to check for conflicts.
In the meantime, you should be able to fix this by explicitly picking
a randomized name for the scope using --unit=. For example, consider
just adding --unit=`uuidgen` to your command line, and the clashes
should not happen anymore.
> I found recently added issue [1] which describes similar problem,
> but with --unit instead of --slice. Note that our machine which
> reproduced it has systemd older than v220.
>
> Is it possible, that this is the same case as described in [1] and
> therefore it should be
> fixed in systemd 220?
>
> Is it possible to backport [1]'s fix to EL7?
Well, there are still cases where we unable to clean up scope units
properly, because we don't get any notifications for them when they
run empty. But yeah the current upstream versions should be better
than older versions.
Lennart
--
Lennart Poettering, Red Hat
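The workaround suggested in the mail above (passing a randomized name via --unit=) might look like the sketch below. `unique_scope_name` is a hypothetical helper, the `run-` prefix is only a cosmetic assumption, and the fallback to /proc/sys/kernel/random/uuid assumes a Linux host without uuidgen installed.

```shell
#!/bin/sh
# Hypothetical helper: print a unit name built from a random UUID
# instead of the caller's PID, so recycled PIDs cannot cause a clash.
unique_scope_name() {
    if command -v uuidgen >/dev/null 2>&1; then
        id=$(uuidgen)
    else
        # Assumption: Linux kernel random UUID source is available.
        id=$(cat /proc/sys/kernel/random/uuid)
    fi
    printf 'run-%s\n' "$id"
}

# Example invocation with the command from this report (commented out
# because it needs root and a real interface):
# /usr/bin/systemd-run --scope --unit="$(unique_scope_name)" \
#     --slice=vdsm-dhclient /usr/sbin/ifup enp1s0f1
```

As the mail notes, this makes a name conflict unlikely rather than impossible, and it does not address scope units that are not cleaned up after their process exits.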
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://rhn.redhat.com/errata/RHBA-2016-2216.html