Bug 2105082
| Field | Value |
|---|---|
| Summary | Cancel rpm-ostree transaction after failed rebase |
| Product | OpenShift Container Platform |
| Component | Machine Config Operator |
| Sub component | Machine Config Operator |
| Reporter | Colin Walters <walters> |
| Assignee | Colin Walters <walters> |
| QA Contact | Rio Liu <rioliu> |
| CC | mkrejci, skumari, sregidor |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | medium |
| Version | 4.7 |
| Target Release | 4.7.z |
| Hardware | Unspecified |
| OS | Unspecified |
| Last Closed | 2022-07-25 14:20:09 UTC |
| Bug Depends On | 2057544 |
Description
Colin Walters
2022-07-07 20:37:34 UTC
Hello, I have tried to verify the BZ using the strace command as we did in 4.8. Unfortunately, in versions 4.6 and 4.7 the installed strace version does not accept the --fault and --inject options.

    sh-4.4# strace -f --fault connect:error=EPERM:when=2 rpm-ostree upgrade
    strace: invalid option -- '-'
    Try 'strace -h' for more information.

    # strace -V
    strace -- version 5.1
    Copyright (c) 1991-2019 The strace developers <https://strace.io>.
    This is free software; see the source for copying conditions. There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
    Optional features enabled: stack-trace=libdw stack-demangle m32-mpers mx32-mpers

Do you know if there is any way to reproduce this error with this strace version? Thank you very much!

It should work to pull a newer strace binary from RHEL; you're not restricted to the stock one. `rpm-ostree usroverlay` will make /usr/ transiently writable, then get the latest from https://access.redhat.com/errata/RHEA-2022:2026 or so and `rpm -Uvh http://url/strace.rpm`

Verified the BZ upgrading 4.6.59 to 4.7.
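Before running the fault-injection command on a node, it can help to check whether the installed strace is at least the version used for verification in this bug. The following is a minimal sketch, not part of the fix: the function names and the `MIN_STRACE` variable are made up for illustration, 5.13 is simply the version that worked here (not necessarily the first one that accepts `--fault`), and `sort -V` assumes GNU coreutils.

```shell
#!/bin/sh
# Extract the version number from the first line of `strace -V` output.
# Expects text like: "strace -- version 5.13"
strace_version_from_output() {
    printf '%s\n' "$1" | sed -n '1s/^strace -- version \([0-9][0-9.]*\).*$/\1/p'
}

# Return 0 if version $1 is greater than or equal to version $2.
# Relies on GNU sort -V for version ordering.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Assumption: 5.13 is the version verified in this bug report.
MIN_STRACE="5.13"

# Only report if strace is actually installed on this host.
if command -v strace >/dev/null 2>&1; then
    have="$(strace_version_from_output "$(strace -V 2>/dev/null)")"
    if [ -n "$have" ] && version_ge "$have" "$MIN_STRACE"; then
        echo "strace $have: ok"
    else
        echo "strace ${have:-unknown}: older than $MIN_STRACE, update it first"
    fi
fi
```

Older strace releases may also accept the `-e fault=connect:error=EPERM:when=2` spelling instead of the long option; that was not tried in this bug, so treat it as an assumption.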
In order to verify the BZ we need to update the strace package on the 4.6.59 cluster nodes.

To update the strace package on 4.6.59 nodes:
1) oc debug node/$NODENAME; chroot /host
2) curl http://mirror.centos.org/centos/8-stream/BaseOS/x86_64/os/Packages/strace-5.13-4.el8.x86_64.rpm -o /tmp/strace-5.13-4.el8.x86_64.rpm
3) rpm-ostree usroverlay
4) rpm-ostree override replace /tmp/strace-5.13-4.el8.x86_64.rpm
5) reboot node
6) verify strace version
   sh-4.4# strace -V
   strace -- version 5.13

Reproducing the bug, upgrade 4.6.59 -> 4.7.54 (does not contain the fix):
1) Update the strace package to 5.13 as described above on master and worker nodes
2) On master and worker nodes execute
   strace -f --fault connect:error=EPERM:when=2 rpm-ostree upgrade
3) On master and worker nodes run "rpm-ostree upgrade"; the command will be stuck, type ctrl+z to continue
4) Upgrade to 4.7.54
   oc adm upgrade --to-image=quay.io/openshift-release-dev/ocp-release:4.7.54-x86_64 --force --allow-explicit-upgrade
5) After the upgrade we can see the pools reporting degraded status because:
   - lastTransitionTime: "2022-07-11T13:39:52Z"
     message: 'Node ip-10-0-133-239.us-east-2.compute.internal is reporting: "failed to update OS to quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:f1d1872a8db06bd1a60c222a8c26d9615e295b6837fa140ea57c0690028858e2 : with stdout output: : error running rpm-ostree rebase --experimental /run/mco-machine-os-content/os-content-523033659/srv/repo:db8b72f0b3fe7f887f9bbe64b2317eaff79aec45e7ee3e0d3f98be32253fb11f --custom-origin-url pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:f1d1872a8db06bd1a60c222a8c26d9615e295b6837fa140ea57c0690028858e2 --custom-origin-description Managed by machine-config-operator: exit status 1\nerror: Transaction in progress: (null)\n"'
     reason: 1 nodes are reporting degraded status on sync
6) Executing `systemctl restart rpm-ostreed` will manually fix the issue and the upgrade will finish OK.
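The degraded condition and the manual workaround above can be tied together in a small helper. This is only a sketch under stated assumptions: `has_stuck_transaction` and `print_workaround` are hypothetical names, the match string is taken from the error message quoted in this comment, and the helper prints the workaround command rather than running it.

```shell
#!/bin/sh
# Return 0 if the text on stdin contains the rpm-ostree error seen in this bug.
has_stuck_transaction() {
    grep -q 'error: Transaction in progress'
}

# Print the manual workaround for one node.
# Hypothetical helper: it only prints the command, it does not execute it.
print_workaround() {
    node="$1"
    echo "oc debug node/$node -- chroot /host systemctl restart rpm-ostreed"
}
```

For example, `oc get mcp worker -o yaml | has_stuck_transaction && print_workaround "$NODENAME"` would print the restart command only when the pool status carries that error.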
Verifying the fix, upgrade 4.6.59 -> 4.7.0-0.nightly-2022-07-08-193842 (contains the fix):
1) Update strace package to 5.13 as described above
2) In master and worker node execute
   strace -f --fault connect:error=EPERM:when=2 rpm-ostree upgrade
3) Run "rpm-ostree upgrade", the command will be stuck, type ctrl+z to continue
4) Upgrade to 4.7.0-0.nightly-2022-07-08-193842
   oc adm upgrade --to-image=registry.ci.openshift.org/ocp/release:4.7.0-0.nightly-2022-07-08-193842 --force --allow-explicit-upgrade
5) The upgrade should finish OK without manual intervention
6) We can see in the MCD logs that the pending transaction is detected and the service was restarted to fix it
   I0711 22:45:54.654047 81279 start.go:108] Version: v4.7.0-202207081845.p0.g54116a9.assembly.stream-dirty (54116a94477f669acb9b00b43069a6c8b1ca6282)
   I0711 22:45:54.656251 81279 start.go:121] Calling chroot("/rootfs")
   I0711 22:45:54.656319 81279 update.go:1969] Running: systemctl start rpm-ostreed
   I0711 22:45:54.664154 81279 rpm-ostree.go:325] Running captured: rpm-ostree status --json
   W0711 22:45:54.700073 81279 rpm-ostree.go:129] Detected active transaction during daemon startup, restarting to clear it
   I0711 22:45:54.700092 81279 update.go:1969] Running: systemctl restart rpm-ostreed
   I0711 22:45:54.760683 81279 rpm-ostree.go:325] Running captured: rpm-ostree status --json
   I0711 22:45:54.795165 81279 daemon.go:222] Booted osImageURL: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:fedf46061ae5b3c7915b6ce5d1b7f4c0f0f5be761d467a95edc6b6e150d3b727 (46.82.202206080340-0)

   $ oc get clusterversion
   NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
   version   4.7.0-0.nightly-2022-07-08-193842   True        False         27s     Cluster version is 4.7.0-0.nightly-2022-07-08-193842

We move the status to VERIFIED.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (OpenShift Container Platform 4.7.55 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2022:5660
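The MCD log lines in the verification comment suggest a simple startup sequence: query `rpm-ostree status --json`, and if a transaction is still active, restart rpm-ostreed to clear it. The sketch below mimics that behaviour in shell; it is not the operator's actual code. It assumes the status JSON carries a top-level `"transaction"` key that is `null` when idle, uses a crude text match instead of a real JSON parser, and the function names are hypothetical.

```shell
#!/bin/sh
# Return 0 if the rpm-ostree status JSON on stdin shows an active transaction.
# Assumption: an idle daemon reports "transaction" : null. This is a crude
# text match, not a JSON parser.
has_active_transaction() {
    # Flatten to one line, then look for a "transaction" key whose value
    # starts with something other than the "n" of null.
    tr -d '\n' | grep -Eq '"transaction"[[:space:]]*:[[:space:]]*[^n[:space:]]'
}

# Restart rpm-ostreed if a leftover transaction is detected.
# Defined here but not called; it needs root on an rpm-ostree based host.
clear_stuck_transaction() {
    if rpm-ostree status --json | has_active_transaction; then
        echo "Detected active transaction, restarting rpm-ostreed to clear it"
        systemctl restart rpm-ostreed
    fi
}
```

Calling `clear_stuck_transaction` on an affected node would do what step 6 of the reproduction did by hand, but only when a leftover transaction is actually reported.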