Cause: The OpenShift iSCSI volume plugin scanned the whole iSCSI session and discovered and mapped all of its LUNs, including LUNs that were neither required nor used. These unused LUNs could be added to the local multipath configuration.
Consequence: When such an unrelated LUN was deleted on the storage backend and a new volume was created with the same LUN number, multipath running on a node could get confused and report that the filesystem on the volume was corrupted.
Fix: The OpenShift iSCSI volume plugin now uses manual iSCSI scanning and discovers and maps only the volumes that actually need to be attached to a node.
Result: Unrelated volumes are no longer added to multipath.
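As a rough illustration of the difference between the two scanning approaches (the host number and LUN are taken from the lsscsi output below; the commands are an illustrative sketch, not the plugin's actual code):

# Session-wide rescan: every LUN exported on the session appears on the node
iscsiadm -m session --rescan

# Targeted (manual) scan: only the required LUN is added, e.g. LUN 49 on host461
echo "0 0 49" > /sys/class/scsi_host/host461/scan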
Description of problem:
This is a NetApp Trident setup, using a virtual NetApp ONTAP device.
We have been seeing random fsck issues with NetApp block storage causing corruption errors:
Apr 16 12:15:04 server.dmz atomic-openshift-node[17920]: I0416 12:15:04.470996 17920 mount_linux.go:488] `fsck` error fsck from util-linux 2.23.2
Apr 16 12:15:04 server.dmz atomic-openshift-node[17920]: fsck.ext2: Bad magic number in super-block while trying to open /dev/mapper/3600a09805a506576375d4f4e754d5434
Apr 16 12:15:04 server.dmz atomic-openshift-node[17920]: /dev/mapper/3600a09805a506576375d4f4e754d5434:
Apr 16 12:15:04 server.dmz atomic-openshift-node[17920]: The superblock could not be read or does not describe a correct ext2
Apr 16 12:15:04 server.dmz atomic-openshift-node[17920]: filesystem. If the device is valid and it really contains an ext2
Apr 16 12:15:04 server.dmz atomic-openshift-node[17920]: filesystem (and not swap or ufs or something else), then the superblock
Apr 16 12:15:04 server.dmz atomic-openshift-node[17920]: is corrupt, and you might try running e2fsck with an alternate superblock:
Apr 16 12:15:04 server.dmz atomic-openshift-node[17920]: e2fsck -b 8193 <device>
Looking at the device, we could see it already had data and a filesystem on it, which is why the fsck in mount_linux.go was failing.
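For illustration, that existing-filesystem check can be reproduced manually (these commands are an assumption, not taken from the case data; the device name comes from the log above):

# Show any existing filesystem signature on the multipath device
blkid /dev/mapper/3600a09805a506576375d4f4e754d5434
file -s /dev/mapper/3600a09805a506576375d4f4e754d5434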
We tried to reproduce the issue and collect data by running the following in a loop until it failed (a rough script sketch follows the list):
1) Collect pre logs
2) Delete PVC / PV
3) Collect deleted logs
4) Create PVC / PV
5) Collect created logs
6) Scale up pod
7) Wait for success / error
8) Scale down pod
9) Collect logs
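A minimal sketch of that loop, assuming the rh-test PVC from the dump below, a DeploymentConfig of the same name, and log file names chosen purely for illustration:

#!/bin/bash
# Rough reproduction loop; resource and file names are assumptions.
while true; do
  oc get pv,pvc -o wide > pre.log            # 1) collect pre logs
  oc delete pvc rh-test                      # 2) delete PVC (PV goes away via reclaimPolicy: Delete)
  oc get pv,pvc -o wide > deleted.log        # 3) collect deleted logs
  oc create -f pvc.yaml                      # 4) create PVC / PV
  oc get pv,pvc -o wide > created.log        # 5) collect created logs
  oc scale dc/rh-test --replicas=1           # 6) scale up pod
  sleep 120                                  # 7) wait for success / error
  oc scale dc/rh-test --replicas=0           # 8) scale down pod
  oc get events > post.log                   # 9) collect logs
done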
What we found was the following:
1) Before the failure mount, the device already existed:
dmsetup ls (pre creation):
3600a09805a506576375d4f4e754d5434 (253:50)
2) Before the failure mount, the dm-50 already existed in multipath:
multipath (pre creation):
3600a09805a506576375d4f4e754d5434 dm-50 NETAPP ,LUN C-Mode
size=5.0G features='4 queue_if_no_path pg_init_retries 50 retain_attached_hw_handle' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| `- 462:0:0:49 sdhe 133:64 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
`- 461:0:0:49 sdbh 67:176 active ready running
3) Before the failure mount, sdhe and sdbh existed
lsscsi (pre creation)
[462:0:0:49] disk NETAPP LUN C-Mode 9600 /dev/sdhe
[461:0:0:49] disk NETAPP LUN C-Mode 9600 /dev/sdbh
4) After deleting the old PVC / PV, the device was not removed:
dmsetup ls (pvc deleted):
3600a09805a506576375d4f4e754d5434 (253:50)
5) After deleting the old PVC / PV, dm-50 still existed and went into an active/faulty/running state
multipath (pvc deleted):
3600a09805a506576375d4f4e754d5434 dm-50 NETAPP ,LUN C-Mode
size=5.0G features='4 queue_if_no_path pg_init_retries 50 retain_attached_hw_handle' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=0 status=active
| `- 462:0:0:49 sdhe 133:64 active faulty running
`-+- policy='service-time 0' prio=0 status=enabled
`- 461:0:0:49 sdbh 67:176 active faulty running
6) After deleting the old PVC / PV, sdhe and sdbh still existed
lsscsi (pvc deleted)
[462:0:0:49] disk NETAPP LUN C-Mode 9600 /dev/sdhe
[461:0:0:49] disk NETAPP LUN C-Mode 9600 /dev/sdbh
7) On the netapp, LUN 49 was removed
8) After recreating the PVC / PV, the device was still there:
dmsetup ls (pvc create):
3600a09805a506576375d4f4e754d5434 (253:50)
9) After recreating the PVC / PV, it reconnected to dm-50 / 3600a09805a506576375d4f4e754d5434 and went back to active/ready/running:
multipath (pvc create):
3600a09805a506576375d4f4e754d5434 dm-50 NETAPP ,LUN C-Mode
size=5.0G features='4 queue_if_no_path pg_init_retries 50 retain_attached_hw_handle' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| `- 462:0:0:49 sdhe 133:64 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
`- 461:0:0:49 sdbh 67:176 active ready running
10) After recreating the PVC / PV, lsscsi is the same
lsscsi (pvc created)
[462:0:0:49] disk NETAPP LUN C-Mode 9600 /dev/sdhe
[461:0:0:49] disk NETAPP LUN C-Mode 9600 /dev/sdbh
11) On the NetApp, a new LUN 49 was created
Another interesting observation: the DM device is 5.0G, but the LUN / PVC is only 1Gi in size.
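For reference, the size mismatch can be confirmed on the node with standard tools (illustrative commands; the device name is taken from the outputs above):

# Reported size of the multipath device vs. the 1Gi requested by the PVC
blockdev --getsize64 /dev/mapper/3600a09805a506576375d4f4e754d5434
lsblk /dev/mapper/3600a09805a506576375d4f4e754d5434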
Version-Release number of selected component (if applicable):
3.11.146
How reproducible:
Random
Steps to Reproduce:
1) Collect pre logs
2) Delete PVC / PV
3) Collect deleted logs
4) Create PVC / PV
5) Collect created logs
6) Scale up pod
7) Wait for success / error
8) Scale down pod
9) Collect logs
10) Repeat
Actual results:
Mostly successful, but randomly you will hit this issue.
Expected results:
Devices should not be getting reused.
Master Log:
Node Log (of failed PODs):
PV Dump:
PVC Dump:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rh-test
  annotations:
    volume.beta.kubernetes.io/storage-class: netapp-block-standard
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
StorageClass Dump (if StorageClass used by PV/PVC):
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  creationTimestamp: 2019-11-25T18:43:15Z
  name: netapp-block-standard
  resourceVersion: "1171291242"
  selfLink: /apis/storage.k8s.io/v1/storageclasses/netapp-block-standard
  uid: 58473621-0fb3-11ea-abeb-1948765234cc
parameters:
  backendType: ontap-san-economy
provisioner: netapp.io/trident
reclaimPolicy: Delete
volumeBindingMode: Immediate
Additional info:
Some related issues / fixes that have happened:
https://github.com/NetApp/trident/issues/101
https://github.com/NetApp/trident/issues/133
https://github.com/kubernetes/kubernetes/issues/59946
https://github.com/kubernetes/kubernetes/issues/60894
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2020:2409