Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 2108686

Summary: rpm-ostreed: start limit hit easily
Product: OpenShift Container Platform
Reporter: OpenShift BugZilla Robot <openshift-bugzilla-robot>
Component: RHCOS
Assignee: Colin Walters <walters>
Status: CLOSED ERRATA
QA Contact: Michael Nguyen <mnguyen>
Severity: high
Docs Contact:
Priority: high
Version: 4.12
CC: dornelas, jligon, mnguyen, mrussell, nstielau, smilner, sregidor, vlaad
Target Milestone: ---
Target Release: 4.11.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-08-10 11:21:24 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 2108320
Bug Blocks:

Description OpenShift BugZilla Robot 2022-07-19 17:15:25 UTC
+++ This bug was initially created as a clone of Bug #2108320 +++

See https://github.com/openshift/os/pull/898

A recent MCO PR, openshift/machine-config-operator#3243, tipped things
over the edge, and we now see failures much more often.

For example, see https://bugzilla.redhat.com/show_bug.cgi?id=2104978

--- Additional comment from skumari on 2022-07-19 13:14:55 UTC ---

*** Bug 2108488 has been marked as a duplicate of this bug. ***

Comment 3 Michael Nguyen 2022-07-21 13:48:50 UTC
Verified on RHCOS 411.86.202207210724-0. However, this has not yet made it into a nightly OCP build.

[core@cosa-devsh ~]$ cat test.sh
#!/bin/bash
set -euo pipefail
# https://github.com/coreos/rpm-ostree/pull/3523/commits/0556152adb14a8e1cdf6c5d6f234aacbe8dd4e3f
for x in $(seq 100); do rpm-ostree status >/dev/null; done
echo ok
[core@cosa-devsh ~]$ ./test.sh 
ok
[core@cosa-devsh ~]$ rpm-ostree status
State: idle
Deployments:
● 1994ffeef78d96e6af89e03552214df06465d75e3b4f8a4eb37aa6582814c00e
                   Version: 411.86.202207210724-0 (2022-07-21T07:27:48Z)
[core@cosa-devsh ~]$ systemctl status rpm-ostreed
● rpm-ostreed.service - rpm-ostree System Management Daemon
   Loaded: loaded (/usr/lib/systemd/system/rpm-ostreed.service; static; vendor >
  Drop-In: /usr/lib/systemd/system/rpm-ostreed.service.d
           └─startlimit.conf
   Active: active (running) since Thu 2022-07-21 13:45:16 UTC; 52s ago
     Docs: man:rpm-ostree(1)
 Main PID: 2059 (rpm-ostree)
   Status: "clients=0; idle exit in 52 seconds"
    Tasks: 12 (limit: 5559)
   Memory: 8.1M
   CGroup: /system.slice/rpm-ostreed.service
           └─2059 /usr/bin/rpm-ostree start-daemon

Jul 21 13:45:42 cosa-devsh rpm-ostree[2059]: client(id:cli dbus:1.259 unit:sess>
Jul 21 13:45:42 cosa-devsh rpm-ostree[2059]: In idle state; will auto-exit in 6>
Jul 21 13:45:42 cosa-devsh rpm-ostree[2059]: Allowing active client :1.261 (uid>
Jul 21 13:45:42 cosa-devsh rpm-ostree[2059]: client(id:cli dbus:1.261 unit:sess>
Jul 21 13:45:42 cosa-devsh rpm-ostree[2059]: client(id:cli dbus:1.261 unit:sess>
Jul 21 13:45:42 cosa-devsh rpm-ostree[2059]: In idle state; will auto-exit in 6>
Jul 21 13:45:53 cosa-devsh rpm-ostree[2059]: Allowing active client :1.263 (uid>
Jul 21 13:45:53 cosa-devsh rpm-ostree[2059]: client(id:cli dbus:1.263 unit:sess>
Jul 21 13:45:53 cosa-devsh rpm-ostree[2059]: client(id:cli dbus:1.263 unit:sess>
Jul 21 13:45:53 cosa-devsh rpm-ostree[2059]: In idle state; will auto-exit in 6>
[core@cosa-devsh ~]$ systemctl cat rpm-ostreed
# /usr/lib/systemd/system/rpm-ostreed.service
[Unit]
Description=rpm-ostree System Management Daemon
Documentation=man:rpm-ostree(1)
ConditionPathExists=/ostree
RequiresMountsFor=/boot

[Service]
Type=dbus
BusName=org.projectatomic.rpmostree1
# To use the read-only sysroot bits
MountFlags=slave
# We have no business accessing /var/roothome or /var/home.  In general
# the ostree design clearly avoids touching those, but since systemd offers
# us easy tools to toggle on protection, let's use them.  In the future
# it'd be nice to do something like using DynamicUser=yes for the main service,
# and have a system rpm-ostreed-transaction.service that runs privileged
# but as a subprocess.
ProtectHome=true
# Explicitly list paths here which we should never access.  The initial
# entry here ensures that the skopeo process we fork won't interact with
# application containers.
InaccessiblePaths=/var/lib/containers
NotifyAccess=main
ExecStart=/usr/bin/rpm-ostree start-daemon
ExecReload=/usr/bin/rpm-ostree reload

# /usr/lib/systemd/system/rpm-ostreed.service.d/startlimit.conf
[Unit]
# Work around for lack of https://github.com/coreos/rpm-ostree/pull/3523/commit>
# on older RHEL
StartLimitBurst=1000
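
To make the drop-in check above less manual, here is a minimal sketch that reads unit-file text on stdin and prints the StartLimitBurst value that takes effect. The function name effective_start_limit_burst is hypothetical and is not part of rpm-ostree or systemd. It assumes that drop-ins appear after the main unit in `systemctl cat` output (as in the transcript above) and that the last assignment wins.

```shell
#!/bin/bash
set -euo pipefail

# effective_start_limit_burst (hypothetical helper): read unit-file text on
# stdin, for example the output of `systemctl cat rpm-ostreed`, and print the
# value of the last StartLimitBurst= line. Drop-ins are listed after the main
# unit, so the last assignment is assumed to be the effective one.
# Prints nothing if the setting is not present at all.
effective_start_limit_burst() {
    awk -F= '/^StartLimitBurst=/ { value = $2 } END { if (value != "") print value }'
}
```

Usage would be something like `systemctl cat rpm-ostreed | effective_start_limit_burst`. On a live system, `systemctl show -p StartLimitBurst rpm-ostreed` should report the same value without any text parsing.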

Comment 4 Colin Walters 2022-07-21 14:27:52 UTC
I don't think we technically need to update the boot images for this - just machine-os-content.

Firstboot may be a bit less reliable, but we landed retry code in the MCO.
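
For illustration only, the retry idea can be sketched in shell as below. This is not the MCO's actual code (the MCO is not written in shell). The function name retry and its arguments are made up for this example, and the fixed delay between attempts is an assumption.

```shell
#!/bin/bash
set -euo pipefail

# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: run COMMAND until it succeeds, sleeping DELAY_SECONDS
# between attempts, and give up after MAX_ATTEMPTS tries.
# Returns 0 on success and 1 if every attempt failed.
retry() {
    local max_attempts=$1 delay=$2
    shift 2
    local attempt=1
    until "$@"; do
        if (( attempt >= max_attempts )); then
            echo "giving up after ${attempt} attempts: $*" >&2
            return 1
        fi
        echo "attempt ${attempt} failed; retrying in ${delay}s" >&2
        sleep "$delay"
        attempt=$(( attempt + 1 ))
    done
}
```

For example, `retry 5 2 rpm-ostree status` would tolerate a daemon that is briefly unavailable at first boot, as long as it starts within the five attempts.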

Comment 5 Michael Nguyen 2022-07-21 17:46:03 UTC
From the summary in https://bugzilla.redhat.com/show_bug.cgi?id=2104978:

"So it looks like rpm-ostreed didn't start yet (but eventually was successful)"

Since it was eventually successful, I agree with Colin that updating machine-os-content should be enough, because the problem resolves itself.

Comment 9 Michael Nguyen 2022-07-22 18:40:35 UTC
Verified on 4.11.0-rc.5

$ oc get clusterversion
NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-rc.5   True        False         107s    Cluster version is 4.11.0-rc.5
$ oc get nodes
NAME                                       STATUS   ROLES    AGE   VERSION
ci-ln-mlj7g4t-72292-7jqbd-master-0         Ready    master   22m   v1.24.0+9546431
ci-ln-mlj7g4t-72292-7jqbd-master-1         Ready    master   22m   v1.24.0+9546431
ci-ln-mlj7g4t-72292-7jqbd-master-2         Ready    master   22m   v1.24.0+9546431
ci-ln-mlj7g4t-72292-7jqbd-worker-a-plz2t   Ready    worker   12m   v1.24.0+9546431
ci-ln-mlj7g4t-72292-7jqbd-worker-b-4zdxv   Ready    worker   12m   v1.24.0+9546431
$ oc debug node/ci-ln-mlj7g4t-72292-7jqbd-worker-a-plz2t
Warning: would violate PodSecurity "restricted:latest": host namespaces (hostNetwork=true, hostPID=true), privileged (container "container-00" must not set securityContext.privileged=true), allowPrivilegeEscalation != false (container "container-00" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "container-00" must set securityContext.capabilities.drop=["ALL"]), restricted volume types (volume "host" uses restricted volume type "hostPath"), runAsNonRoot != true (pod or container "container-00" must set securityContext.runAsNonRoot=true), runAsUser=0 (container "container-00" must not set runAsUser=0), seccompProfile (pod or container "container-00" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
Starting pod/ci-ln-mlj7g4t-72292-7jqbd-worker-a-plz2t-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.128.2
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# rpm-ostree status
State: idle
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3c978380274ed551b3d6a8ca53ab2fc1408bfad00b8c235cc7dbe523dbc251d8
              CustomOrigin: Managed by machine-config-operator
                   Version: 411.86.202207210724-0 (2022-07-21T07:27:48Z)
sh-4.4# cat test.sh
#!/bin/bash
set -euo pipefail
# https://github.com/coreos/rpm-ostree/pull/3523/commits/0556152adb14a8e1cdf6c5d6f234aacbe8dd4e3f
for x in $(seq 100); do rpm-ostree status >/dev/null; done
echo ok
sh-4.4# chmod +x test.sh
sh-4.4# ./test.sh 
ok
sh-4.4# systemctl cat rpm-ostreed.service
# /usr/lib/systemd/system/rpm-ostreed.service
[Unit]
Description=rpm-ostree System Management Daemon
Documentation=man:rpm-ostree(1)
ConditionPathExists=/ostree
RequiresMountsFor=/boot

[Service]
Type=dbus
BusName=org.projectatomic.rpmostree1
# To use the read-only sysroot bits
MountFlags=slave
# We have no business accessing /var/roothome or /var/home.  In general
# the ostree design clearly avoids touching those, but since systemd offers
# us easy tools to toggle on protection, let's use them.  In the future
# it'd be nice to do something like using DynamicUser=yes for the main service,
# and have a system rpm-ostreed-transaction.service that runs privileged
# but as a subprocess.
ProtectHome=true
# Explicitly list paths here which we should never access.  The initial
# entry here ensures that the skopeo process we fork won't interact with
# application containers.
InaccessiblePaths=/var/lib/containers
NotifyAccess=main
ExecStart=/usr/bin/rpm-ostree start-daemon
ExecReload=/usr/bin/rpm-ostree reload

# /usr/lib/systemd/system/rpm-ostreed.service.d/startlimit.conf
[Unit]
# Work around for lack of https://github.com/coreos/rpm-ostree/pull/3523/commits/0556152adb14a8e1cdf6c5d6f234aacbe8dd4e3f
# on older RHEL
StartLimitBurst=1000
sh-4.4# exit
exit
sh-4.4# exit
exit

Removing debug pod ...
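
A possible variation on test.sh, sketched here as an assumption and not as the upstream test: instead of aborting on the first failure via `set -e`, count how many of the invocations fail. This makes it easier to compare a build with and without the StartLimitBurst drop-in. The function name count_failures is hypothetical.

```shell
#!/bin/bash
set -euo pipefail

# count_failures RUNS COMMAND [ARGS...]
# Hypothetical helper: run COMMAND RUNS times, discard its output, and print
# the number of runs that exited non-zero.
count_failures() {
    local runs=$1
    shift
    local failures=0 i
    for i in $(seq "$runs"); do
        # The command is tested inside `if`, so a failure does not trip `set -e`.
        if ! "$@" >/dev/null 2>&1; then
            failures=$(( failures + 1 ))
        fi
    done
    echo "$failures"
}
```

Running `count_failures 100 rpm-ostree status` should print 0 on a fixed build. A non-zero count would suggest that some invocations still fail, for example because the start limit is still being hit.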

Comment 10 errata-xmlrpc 2022-08-10 11:21:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069