Bug 2049056 - [tracker] [OCP 4.9] Include fix for xfs metadata corruption in RHCOS
Summary: [tracker] [OCP 4.9] Include fix for xfs metadata corruption in RHCOS
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RHCOS
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: 4.9.z
Assignee: Micah Abbott
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On: 2049055 2049291
Blocks: 2049057
Reported: 2022-02-01 12:46 UTC by Mario Abajo
Modified: 2022-04-12 04:16 UTC
CC List: 12 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 2049055
Environment:
Last Closed: 2022-03-16 11:39:10 UTC
Target Upstream Version:
Embargoed:


Links
System                   ID              Private  Priority  Status  Summary  Last Updated
Red Hat Product Errata   RHBA-2022:0798  0        None      None    None     2022-03-16 11:39:33 UTC

Description Mario Abajo 2022-02-01 12:46:34 UTC
+++ This bug was initially created as a clone of Bug #2049055 +++

Description of the problem:

Request to include the fix for the issue described in Bugzilla #2020764 [OCP node XFS metadata corruption after numerous reboots] in OpenShift RHCOS.

Comment 2 Micah Abbott 2022-02-01 13:54:27 UTC
In order to get this into RHCOS 4.9, we need the fix backported into RHEL 8.4.z EUS.

I've requested the backport here - https://bugzilla.redhat.com/show_bug.cgi?id=2020764#c25

If the z-stream request is accepted, I'll reset the DependsOn field to point to the 8.4.z BZ.

Comment 5 Micah Abbott 2022-02-14 14:50:54 UTC
@Rio, thanks for testing the hotfix, but this tracker BZ cannot be VERIFIED until the fixed kernel build lands in a version of RHCOS that will be shipped to all customers.

Moving this back to ASSIGNED

Comment 7 Micah Abbott 2022-03-08 19:07:10 UTC
The fixed kernel (kernel-4.18.0-305.40.1.el8_4) for 8.4.z was shipped as part of https://access.redhat.com/errata/RHSA-2022:0777

It was included in RHCOS 410.84.202203081640-0 and will be included in a future OCP 4.10.z release payload

Comment 10 Rio Liu 2022-03-10 09:23:06 UTC
Verified the kernel version with 4.9.24:

oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.24    True        False         4m50s   Cluster version is 4.9.24

oc get node -o wide
NAME                                         STATUS   ROLES    AGE   VERSION           INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                 CONTAINER-RUNTIME
ip-10-0-151-81.us-east-2.compute.internal    Ready    master   30m   v1.22.5+5c84e52   10.0.151.81    <none>        Red Hat Enterprise Linux CoreOS 49.84.202203081945-0 (Ootpa)   4.18.0-305.40.1.el8_4.x86_64   cri-o://1.22.2-2.rhaos4.9.gitb030be8.el8
ip-10-0-152-76.us-east-2.compute.internal    Ready    worker   22m   v1.22.5+5c84e52   10.0.152.76    <none>        Red Hat Enterprise Linux CoreOS 49.84.202203081945-0 (Ootpa)   4.18.0-305.40.1.el8_4.x86_64   cri-o://1.22.2-2.rhaos4.9.gitb030be8.el8
ip-10-0-170-224.us-east-2.compute.internal   Ready    master   32m   v1.22.5+5c84e52   10.0.170.224   <none>        Red Hat Enterprise Linux CoreOS 49.84.202203081945-0 (Ootpa)   4.18.0-305.40.1.el8_4.x86_64   cri-o://1.22.2-2.rhaos4.9.gitb030be8.el8
ip-10-0-175-29.us-east-2.compute.internal    Ready    worker   22m   v1.22.5+5c84e52   10.0.175.29    <none>        Red Hat Enterprise Linux CoreOS 49.84.202203081945-0 (Ootpa)   4.18.0-305.40.1.el8_4.x86_64   cri-o://1.22.2-2.rhaos4.9.gitb030be8.el8
ip-10-0-203-38.us-east-2.compute.internal    Ready    master   31m   v1.22.5+5c84e52   10.0.203.38    <none>        Red Hat Enterprise Linux CoreOS 49.84.202203081945-0 (Ootpa)   4.18.0-305.40.1.el8_4.x86_64   cri-o://1.22.2-2.rhaos4.9.gitb030be8.el8
ip-10-0-220-112.us-east-2.compute.internal   Ready    worker   22m   v1.22.5+5c84e52   10.0.220.112   <none>        Red Hat Enterprise Linux CoreOS 49.84.202203081945-0 (Ootpa)   4.18.0-305.40.1.el8_4.x86_64   cri-o://1.22.2-2.rhaos4.9.gitb030be8.el8

for node in $(oc get node -o name); do echo; oc debug $node -- chroot /host uname -r; done

Starting pod/ip-10-0-151-81us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
4.18.0-305.40.1.el8_4.x86_64

Removing debug pod ...

Starting pod/ip-10-0-152-76us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
4.18.0-305.40.1.el8_4.x86_64

Removing debug pod ...

Starting pod/ip-10-0-170-224us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
4.18.0-305.40.1.el8_4.x86_64

Removing debug pod ...

Starting pod/ip-10-0-175-29us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
4.18.0-305.40.1.el8_4.x86_64

Removing debug pod ...

Starting pod/ip-10-0-203-38us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
4.18.0-305.40.1.el8_4.x86_64

Removing debug pod ...

Starting pod/ip-10-0-220-112us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
4.18.0-305.40.1.el8_4.x86_64

Removing debug pod ...

@miabbott, is there any other information that needs to be checked?

Comment 11 Micah Abbott 2022-03-10 13:36:55 UTC
> @miabbott, is there any other information that needs to be checked?

As this is a tracker BZ, we are only verifying that the fixed package was included in RHCOS/OCP.  Verification of the actual problem being fixed is handled by the respective RHEL QE team.

Thanks for the verification; moving to VERIFIED
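Comment 11 limits tracker verification to confirming that the fixed kernel package (4.18.0-305.40.1.el8_4) is present on every node. Comment 10 does this with one debug pod per node. The sketch below is a hypothetical helper, assuming bash; `check_kernels` and `EXPECTED_KERNEL` are illustrative names, not part of `oc`. It reads "NODE KERNEL" pairs on stdin and flags any node whose kernel does not exactly match the fixed build.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of oc): compare each node's reported kernel
# against the fixed 8.4.z build named in this bug.
# Input on stdin: one "NODE KERNEL" pair per line.
EXPECTED_KERNEL="4.18.0-305.40.1.el8_4.x86_64"

check_kernels() {
  local expected="$1" node kernel bad=0
  # `|| [ -n "$node" ]` also processes a final line that lacks a trailing newline.
  while read -r node kernel || [ -n "$node" ]; do
    [ -z "$node" ] && continue            # skip blank lines
    if [ "$kernel" = "$expected" ]; then
      echo "OK   $node $kernel"
    else
      echo "FAIL $node $kernel"
      bad=1
    fi
  done
  return "$bad"                           # non-zero if any node mismatched
}
```

On a live cluster, the input could come from the same kernel data that the KERNEL-VERSION column of `oc get node -o wide` shows, for example `oc get nodes -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.nodeInfo.kernelVersion}{"\n"}{end}' | check_kernels "$EXPECTED_KERNEL"`. Exact string equality suits a tracker check because it confirms that this specific fixed build shipped; it would also flag a node running a later kernel, so a general-purpose check would need a real version comparison instead.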

Comment 13 errata-xmlrpc 2022-03-16 11:39:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.9.24 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:0798

Comment 14 HuijingHei 2022-04-12 04:16:04 UTC
According to https://bugzilla.redhat.com/show_bug.cgi?id=2020764#c48 in the original bug, this fix is covered by an automated test, so I set the qe_test_coverage flag to '+'.

