Description of problem:
A PV backed by an FC LUN is not being unmounted properly, and this leads to I/O errors and XFS corruption. The logs and the Kubernetes 1.12 code show an unclean removal of the paths before the map is removed. I/O errors are expected whenever this state occurs. Will provide notes/logs.

Version-Release number of selected component (if applicable):
3.11.306-1

How reproducible:
Happens any time the node is drained.

Steps to Reproduce:
1. https://docs.openshift.com/container-platform/3.11/admin_guide/manage_nodes.html#evacuating-pods-on-nodes

Actual results:
After a node drain, the FC-backed PVs start showing consistency errors like the ones below:

~~~
[ 6489.218162] XFS (dm-152): Metadata CRC error detected at xfs_inobt_read_verify+0x79/0xb0 [xfs], xfs_inobt block 0x18
[ 6489.218456] XFS (dm-152): metadata I/O error: block 0x18 ("xfs_trans_read_buf_map") error 74 numblks 8
[ 6489.222223] XFS (dm-152): Metadata CRC error detected at xfs_inobt_read_verify+0x79/0xb0 [xfs], xfs_inobt block 0x18
[ 6489.222589] XFS (dm-152): metadata I/O error: block 0x18 ("xfs_trans_read_buf_map") error 74 numblks 8
[ 6489.222890] XFS (dm-152): Metadata CRC error detected at xfs_inobt_read_verify+0x79/0xb0 [xfs], xfs_inobt block 0x18
[ 6489.223154] XFS (dm-152): metadata I/O error: block 0x18 ("xfs_trans_read_buf_map") error 74 numblks 8
[ 6489.505911] XFS (dm-152): Internal error XFS_WANT_CORRUPTED_GOTO at line 1637 of file fs/xfs/libxfs/xfs_alloc.c. Caller xfs_free_extent+0xaa/0x140 [xfs]
[ 6489.506006] [<ffffffffc03a13db>] xfs_error_report+0x3b/0x40 [xfs]
~~~

Expected results:
Pods are drained and storage is consistent.

Additional info:
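The ordering problem described above (paths removed before the map) can be illustrated with a small dry-run sketch. This is not the Kubernetes code and not the upstream fix; it only shows the ordering a safe teardown would presumably follow, assuming the map is a device-mapper multipath map. The function name `teardown_plan`, the map name `mpatha`, and the path devices `sdb`/`sdc` are hypothetical. The commands are only built and returned, never executed.

```python
from typing import List


def teardown_plan(map_name: str, path_devices: List[str]) -> List[List[str]]:
    """Return detach commands for a multipath FC volume in a safe order.

    Dry run only: the commands are returned for inspection, not executed.
    Assumes the filesystem on the map has already been unmounted.
    """
    if not path_devices:
        raise ValueError("expected at least one path device")

    plan: List[List[str]] = []

    # 1. Flush and remove the multipath map while its paths still exist,
    #    so any pending I/O can still reach the LUN.
    plan.append(["multipath", "-f", map_name])

    # 2. Only afterwards delete each underlying SCSI path device.
    for dev in path_devices:
        plan.append(["sh", "-c", f"echo 1 > /sys/block/{dev}/device/delete"])

    return plan


if __name__ == "__main__":
    # Hypothetical map and path names, for illustration only.
    for cmd in teardown_plan("mpatha", ["sdb", "sdc"]):
        print(" ".join(cmd))
```

Reversing steps 1 and 2 would reproduce the state reported here: the map would still be present with no usable paths behind it.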
Thanks for a great explanation and links to code! The fix looks straightforward now.
Upstream PR: https://github.com/kubernetes/kubernetes/pull/97013
Waiting for upstream to un-freeze.
Verified with: 4.7.0-0.nightly-2021-02-02-223803
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633