Bug 2108686
| Summary: | rpm-ostreed: start limit hit easily | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | OpenShift BugZilla Robot <openshift-bugzilla-robot> |
| Component: | RHCOS | Assignee: | Colin Walters <walters> |
| Status: | CLOSED ERRATA | QA Contact: | Michael Nguyen <mnguyen> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 4.12 | CC: | dornelas, jligon, mnguyen, mrussell, nstielau, smilner, sregidor, vlaad |
| Target Milestone: | --- | ||
| Target Release: | 4.11.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-08-10 11:21:24 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 2108320 | | |
| Bug Blocks: | | | |
Description
OpenShift BugZilla Robot
2022-07-19 17:15:25 UTC
Verified on RHCOS 411.86.202207210724-0. This however has not made it into a nightly OCP build yet.

[core@cosa-devsh ~]$ cat test.sh
#!/bin/bash
set -euo pipefail
# https://github.com/coreos/rpm-ostree/pull/3523/commits/0556152adb14a8e1cdf6c5d6f234aacbe8dd4e3f
for x in $(seq 100); do rpm-ostree status >/dev/null; done
echo ok
[core@cosa-devsh ~]$ ./test.sh
ok
[core@cosa-devsh ~]$ rpm-ostree status
State: idle
Deployments:
● 1994ffeef78d96e6af89e03552214df06465d75e3b4f8a4eb37aa6582814c00e
    Version: 411.86.202207210724-0 (2022-07-21T07:27:48Z)
[core@cosa-devsh ~]$ systemctl status rpm-ostreed
● rpm-ostreed.service - rpm-ostree System Management Daemon
   Loaded: loaded (/usr/lib/systemd/system/rpm-ostreed.service; static; vendor >
  Drop-In: /usr/lib/systemd/system/rpm-ostreed.service.d
           └─startlimit.conf
   Active: active (running) since Thu 2022-07-21 13:45:16 UTC; 52s ago
     Docs: man:rpm-ostree(1)
 Main PID: 2059 (rpm-ostree)
   Status: "clients=0; idle exit in 52 seconds"
    Tasks: 12 (limit: 5559)
   Memory: 8.1M
   CGroup: /system.slice/rpm-ostreed.service
           └─2059 /usr/bin/rpm-ostree start-daemon

Jul 21 13:45:42 cosa-devsh rpm-ostree[2059]: client(id:cli dbus:1.259 unit:sess>
Jul 21 13:45:42 cosa-devsh rpm-ostree[2059]: In idle state; will auto-exit in 6>
Jul 21 13:45:42 cosa-devsh rpm-ostree[2059]: Allowing active client :1.261 (uid>
Jul 21 13:45:42 cosa-devsh rpm-ostree[2059]: client(id:cli dbus:1.261 unit:sess>
Jul 21 13:45:42 cosa-devsh rpm-ostree[2059]: client(id:cli dbus:1.261 unit:sess>
Jul 21 13:45:42 cosa-devsh rpm-ostree[2059]: In idle state; will auto-exit in 6>
Jul 21 13:45:53 cosa-devsh rpm-ostree[2059]: Allowing active client :1.263 (uid>
Jul 21 13:45:53 cosa-devsh rpm-ostree[2059]: client(id:cli dbus:1.263 unit:sess>
Jul 21 13:45:53 cosa-devsh rpm-ostree[2059]: client(id:cli dbus:1.263 unit:sess>
Jul 21 13:45:53 cosa-devsh rpm-ostree[2059]: In idle state; will auto-exit in 6>
[core@cosa-devsh ~]$ systemctl cat rpm-ostreed
# /usr/lib/systemd/system/rpm-ostreed.service
[Unit]
Description=rpm-ostree System Management Daemon
Documentation=man:rpm-ostree(1)
ConditionPathExists=/ostree
RequiresMountsFor=/boot

[Service]
Type=dbus
BusName=org.projectatomic.rpmostree1
# To use the read-only sysroot bits
MountFlags=slave
# We have no business accessing /var/roothome or /var/home. In general
# the ostree design clearly avoids touching those, but since systemd offers
# us easy tools to toggle on protection, let's use them. In the future
# it'd be nice to do something like using DynamicUser=yes for the main service,
# and have a system rpm-ostreed-transaction.service that runs privileged
# but as a subprocess.
ProtectHome=true
# Explicitly list paths here which we should never access. The initial
# entry here ensures that the skopeo process we fork won't interact with
# application containers.
InaccessiblePaths=/var/lib/containers
NotifyAccess=main
ExecStart=/usr/bin/rpm-ostree start-daemon
ExecReload=/usr/bin/rpm-ostree reload

# /usr/lib/systemd/system/rpm-ostreed.service.d/startlimit.conf
[Unit]
# Work around for lack of https://github.com/coreos/rpm-ostree/pull/3523/commit>
# on older RHEL
StartLimitBurst=1000

I don't think we technically need to update the boot images for this - just machine-os-content. The firstboot may be a bit less reliable, but we landed code to do retries in the MCO code.

From the summary in https://bugzilla.redhat.com/show_bug.cgi?id=2104978: "So it looks like rpm-ostreed didn't start yet (but eventually was successful)"

Since it was eventually successful, I agree with Colin that machine-os-content should be enough since it self resolves.

Verified on 4.11.0-rc.5
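For context on why this workaround helps: each `rpm-ostree` CLI invocation D-Bus-activates `rpm-ostreed`, and the daemon auto-exits when idle, so a burst of quick invocations registers as many unit starts against systemd's default rate limit (5 starts per 10 seconds by default). The shipped `startlimit.conf` drop-in raises that limit. A minimal sketch of an equivalent local administrator override (the `/etc` path is the standard systemd override location, not something from this report) would be:

```ini
# /etc/systemd/system/rpm-ostreed.service.d/startlimit.conf
# Sketch of a local override: raise the start-rate limit so frequent
# D-Bus activations of the auto-exiting daemon do not trip the
# default burst of 5 starts per 10 seconds.
[Unit]
StartLimitBurst=1000
```

A `systemctl daemon-reload` is required after creating or editing the drop-in for it to take effect.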
$ oc get clusterversion
NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-rc.5   True        False         107s    Cluster version is 4.11.0-rc.5
$ oc get nodes
NAME                                       STATUS   ROLES    AGE   VERSION
ci-ln-mlj7g4t-72292-7jqbd-master-0         Ready    master   22m   v1.24.0+9546431
ci-ln-mlj7g4t-72292-7jqbd-master-1         Ready    master   22m   v1.24.0+9546431
ci-ln-mlj7g4t-72292-7jqbd-master-2         Ready    master   22m   v1.24.0+9546431
ci-ln-mlj7g4t-72292-7jqbd-worker-a-plz2t   Ready    worker   12m   v1.24.0+9546431
ci-ln-mlj7g4t-72292-7jqbd-worker-b-4zdxv   Ready    worker   12m   v1.24.0+9546431
$ oc debug node/ci-ln-mlj7g4t-72292-7jqbd-worker-a-plz2t
Warning: would violate PodSecurity "restricted:latest": host namespaces (hostNetwork=true, hostPID=true), privileged (container "container-00" must not set securityContext.privileged=true), allowPrivilegeEscalation != false (container "container-00" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "container-00" must set securityContext.capabilities.drop=["ALL"]), restricted volume types (volume "host" uses restricted volume type "hostPath"), runAsNonRoot != true (pod or container "container-00" must set securityContext.runAsNonRoot=true), runAsUser=0 (container "container-00" must not set runAsUser=0), seccompProfile (pod or container "container-00" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
Starting pod/ci-ln-mlj7g4t-72292-7jqbd-worker-a-plz2t-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.128.2
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# rpm-ostree status
State: idle
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3c978380274ed551b3d6a8ca53ab2fc1408bfad00b8c235cc7dbe523dbc251d8
CustomOrigin: Managed by machine-config-operator
Version: 411.86.202207210724-0 (2022-07-21T07:27:48Z)
sh-4.4# cat test.sh
#!/bin/bash
set -euo pipefail
# https://github.com/coreos/rpm-ostree/pull/3523/commits/0556152adb14a8e1cdf6c5d6f234aacbe8dd4e3f
for x in $(seq 100); do rpm-ostree status >/dev/null; done
echo ok
sh-4.4# chmod +x test.sh
sh-4.4# ./test.sh
ok
sh-4.4# systemctl cat rpm-ostreed.service
# /usr/lib/systemd/system/rpm-ostreed.service
[Unit]
Description=rpm-ostree System Management Daemon
Documentation=man:rpm-ostree(1)
ConditionPathExists=/ostree
RequiresMountsFor=/boot
[Service]
Type=dbus
BusName=org.projectatomic.rpmostree1
# To use the read-only sysroot bits
MountFlags=slave
# We have no business accessing /var/roothome or /var/home. In general
# the ostree design clearly avoids touching those, but since systemd offers
# us easy tools to toggle on protection, let's use them. In the future
# it'd be nice to do something like using DynamicUser=yes for the main service,
# and have a system rpm-ostreed-transaction.service that runs privileged
# but as a subprocess.
ProtectHome=true
# Explicitly list paths here which we should never access. The initial
# entry here ensures that the skopeo process we fork won't interact with
# application containers.
InaccessiblePaths=/var/lib/containers
NotifyAccess=main
ExecStart=/usr/bin/rpm-ostree start-daemon
ExecReload=/usr/bin/rpm-ostree reload
# /usr/lib/systemd/system/rpm-ostreed.service.d/startlimit.conf
[Unit]
# Work around for lack of https://github.com/coreos/rpm-ostree/pull/3523/commits/0556152adb14a8e1cdf6c5d6f234aacbe8dd4e3f
# on older RHEL
StartLimitBurst=1000
sh-4.4# exit
exit
sh-4.4# exit
exit
Removing debug pod ...
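As a convenience not part of the original verification, the presence of the workaround on a node could be checked with a small script. The drop-in path is taken from the transcript above; when the real file is absent (e.g. run off-host), the sketch synthesizes a sample copy so it stays self-contained and runnable:

```shell
#!/bin/bash
# Hypothetical helper (not from the report): confirm the startlimit
# drop-in raises StartLimitBurst to 1000. Falls back to a synthesized
# sample file when the real drop-in is unreadable, so the check can
# be exercised anywhere.
check_startlimit() {
  local conf="${1:-/usr/lib/systemd/system/rpm-ostreed.service.d/startlimit.conf}"
  if [ ! -r "$conf" ]; then
    # Off-host fallback: mirror the drop-in content shown in the transcript.
    conf=$(mktemp)
    printf '[Unit]\nStartLimitBurst=1000\n' > "$conf"
  fi
  if grep -q '^StartLimitBurst=1000$' "$conf"; then
    echo "start-limit workaround present"
  else
    echo "start-limit workaround missing"
    return 1
  fi
}

check_startlimit
```

On a node with the fix, this prints "start-limit workaround present"; passing a different drop-in path as the first argument checks that file instead.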
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069