Bug 1825915 - LUN and device mapper not removed and reused on new PVC
Summary: LUN and device mapper not removed and reused on new PVC
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 3.11.0
Hardware: All
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 4.5.0
Assignee: aos-storage-staff@redhat.com
QA Contact: Qin Ping
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-04-20 13:34 UTC by Matthew Robson
Modified: 2020-11-11 02:51 UTC (History)
3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The OpenShift iSCSI volume plugin scanned the whole iSCSI session, discovering and mapping all of its LUNs, including LUNs that were neither required nor used. These unused LUNs could be added to the local multipath configuration. Consequence: When such an unrelated LUN was deleted on the storage backend and a new volume was created with the same LUN number, multipath running on a node could get confused and report that the filesystem on the volume was corrupted. Fix: The OpenShift iSCSI volume plugin now uses manual iSCSI scanning, and discovers and maps only the volumes that actually need to be attached to a node. Result: Unrelated volumes are no longer added to multipath.
Clone Of:
Environment:
Last Closed: 2020-07-13 17:29:07 UTC
Target Upstream Version:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Github openshift origin pull 25007 0 None closed Bug 1825915: UPSTREAM: 90985: Set session scanning to manual to avoid discovering all iSCSI devices during login 2020-12-02 12:47:32 UTC
Red Hat Product Errata RHBA-2020:2409 0 None None None 2020-07-13 17:29:22 UTC

Description Matthew Robson 2020-04-20 13:34:06 UTC
Description of problem:

This is a NetApp Trident setup, using a virtual NetApp ONTAP device.

We have been seeing random fsck failures with NetApp block volumes reporting filesystem corruption:

Apr 16 12:15:04 server.dmz atomic-openshift-node[17920]: I0416 12:15:04.470996   17920 mount_linux.go:488] `fsck` error fsck from util-linux 2.23.2
Apr 16 12:15:04 server.dmz atomic-openshift-node[17920]: fsck.ext2: Bad magic number in super-block while trying to open /dev/mapper/3600a09805a506576375d4f4e754d5434
Apr 16 12:15:04 server.dmz atomic-openshift-node[17920]: /dev/mapper/3600a09805a506576375d4f4e754d5434:
Apr 16 12:15:04 server.dmz atomic-openshift-node[17920]: The superblock could not be read or does not describe a correct ext2
Apr 16 12:15:04 server.dmz atomic-openshift-node[17920]: filesystem.  If the device is valid and it really contains an ext2
Apr 16 12:15:04 server.dmz atomic-openshift-node[17920]: filesystem (and not swap or ufs or something else), then the superblock
Apr 16 12:15:04 server.dmz atomic-openshift-node[17920]: is corrupt, and you might try running e2fsck with an alternate superblock:
Apr 16 12:15:04 server.dmz atomic-openshift-node[17920]: e2fsck -b 8193 <device>

Looking at the device, we could see it already had data and a filesystem on it, which is why the fsck in mount_linux.go was failing.
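For reference, a quick way to confirm that a device-mapper node already carries a filesystem signature before kubelet runs fsck. This is a hedged sketch: the WWID is the one from the logs above, and the commands are echoed (dry run) rather than executed.

```shell
#!/bin/sh
# Sketch: inspection commands for a suspect dm device. Echoed so nothing
# is touched; drop the echo wrappers to run them on a real node.
WWID="3600a09805a506576375d4f4e754d5434"   # WWID from the fsck error above
DEV="/dev/mapper/$WWID"

echo "blkid $DEV"           # shows any existing filesystem signature
echo "dmsetup info $WWID"   # device-mapper state for the map
echo "multipath -ll $WWID"  # SCSI paths backing the map
```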

We tried to reproduce the issue and collect data by running the following in a loop until it failed:

1) Collect pre logs
2) Delete PVC / PV
3) Collect deleted logs
4) Create PVC / PV
5) Collect created logs
6) Scale up pod
7) Wait for success / error
8) Scale down pod
9) Collect logs

What we found was the following:

1) Before the failure mount, the device already existed:
dmsetup ls (pre creation):
3600a09805a506576375d4f4e754d5434       (253:50)

2) Before the failure mount, the dm-50 already existed in multipath:
multipath (pre creation):
3600a09805a506576375d4f4e754d5434 dm-50 NETAPP  ,LUN C-Mode
size=5.0G features='4 queue_if_no_path pg_init_retries 50 retain_attached_hw_handle' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| `- 462:0:0:49  sdhe 133:64  active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  `- 461:0:0:49  sdbh 67:176  active ready running

3) Before the failure mount, sdhe and sdbh existed
lsscsi (pre creation)
[462:0:0:49] disk    NETAPP   LUN C-Mode       9600  /dev/sdhe
[461:0:0:49] disk    NETAPP   LUN C-Mode       9600  /dev/sdbh


4) After deleting the old PVC / PV, the device was not removed:
dmsetup ls (pvc deleted):
3600a09805a506576375d4f4e754d5434       (253:50)

5) After deleting the old PVC / PV, dm-50 still existed and went into an active/faulty/running state
multipath (pvc deleted):
3600a09805a506576375d4f4e754d5434 dm-50 NETAPP  ,LUN C-Mode
size=5.0G features='4 queue_if_no_path pg_init_retries 50 retain_attached_hw_handle' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=0 status=active
| `- 462:0:0:49  sdhe 133:64  active faulty running
`-+- policy='service-time 0' prio=0 status=enabled
  `- 461:0:0:49  sdbh 67:176  active faulty running

6) After deleting the old PVC / PV, sdhe and sdbh still existed
lsscsi (pvc deleted)
[462:0:0:49] disk    NETAPP   LUN C-Mode       9600  /dev/sdhe
[461:0:0:49] disk    NETAPP   LUN C-Mode       9600  /dev/sdbh

7) On the NetApp, LUN 49 was removed

8) After recreating the PVC / PV, the old device mapper entry was still there
dmsetup ls (pvc create):
3600a09805a506576375d4f4e754d5434       (253:50)

9) After recreating the PVC / PV, the new LUN reconnected to the existing dm-50 / 3600a09805a506576375d4f4e754d5434 map, and the paths went back to active/ready/running
multipath (pvc create):
3600a09805a506576375d4f4e754d5434 dm-50 NETAPP  ,LUN C-Mode
size=5.0G features='4 queue_if_no_path pg_init_retries 50 retain_attached_hw_handle' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| `- 462:0:0:49  sdhe 133:64  active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  `- 461:0:0:49  sdbh 67:176  active ready running

10) After recreating the PVC / PV, lsscsi is the same
lsscsi (pvc created)
[462:0:0:49] disk    NETAPP   LUN C-Mode       9600  /dev/sdhe
[461:0:0:49] disk    NETAPP   LUN C-Mode       9600  /dev/sdbh

11) On the NetApp, a new LUN 49 was created

Another interesting detail: the DM device is 5Gi, but the LUN / PVC are only 1Gi in size.
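The observations above suggest the stale map and its SCSI paths should have been flushed before the LUN number was reused. A hedged sketch of what that cleanup would look like on the node, using the WWID and path devices from this report; commands are echoed (dry run) and must only be run when nothing is using the device:

```shell
#!/bin/sh
# Sketch: flush a stale multipath map and delete its SCSI path devices
# so a reused LUN number gets a fresh rescan. Echoed, not executed.
WWID="3600a09805a506576375d4f4e754d5434"

echo "multipath -f $WWID"                         # flush the stale dm map
for dev in sdhe sdbh; do                          # the two paths from multipath -ll
    echo "blockdev --flushbufs /dev/$dev"         # drop cached buffers
    echo "echo 1 > /sys/block/$dev/device/delete" # remove the SCSI device
done
```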


Version-Release number of selected component (if applicable):

3.11.146

How reproducible:

Random

Steps to Reproduce:
1) Collect pre logs
2) Delete PVC / PV
3) Collect deleted logs
4) Create PVC / PV
5) Collect created logs
6) Scale up pod
7) Wait for success / error
8) Scale down pod
9) Collect logs
10) Repeat
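The loop above can be sketched as a script. This is a hedged reconstruction: "rh-test" is the PVC from this report, while DEPLOY and pvc.yaml are hypothetical placeholders; `oc` commands are echoed (dry run) rather than executed.

```shell
#!/bin/sh
# Sketch of the reproduction loop. Swap the echo in run() for "$@"
# to actually execute against a cluster.
run() { echo "+ $*"; }

DEPLOY="rh-test-app"                            # hypothetical deployment name
run oc delete pvc rh-test                       # 2) delete PVC / PV
run oc apply -f pvc.yaml                        # 4) recreate PVC / PV
run oc scale deployment "$DEPLOY" --replicas=1  # 6) scale up pod
run oc rollout status deployment "$DEPLOY"      # 7) wait for success / error
run oc scale deployment "$DEPLOY" --replicas=0  # 8) scale down pod
```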

Actual results:
Mostly successful, but randomly you will hit this issue.

Expected results:
Devices and multipath maps left behind by a deleted PV should be removed, not reused by a new volume.

Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rh-test
  annotations:
    volume.beta.kubernetes.io/storage-class: netapp-block-standard
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi

StorageClass Dump (if StorageClass used by PV/PVC):

allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  creationTimestamp: 2019-11-25T18:43:15Z
  name: netapp-block-standard
  resourceVersion: "1171291242"
  selfLink: /apis/storage.k8s.io/v1/storageclasses/netapp-block-standard
  uid: 58473621-0fb3-11ea-abeb-1948765234cc
parameters:
  backendType: ontap-san-economy
provisioner: netapp.io/trident
reclaimPolicy: Delete
volumeBindingMode: Immediate

Additional info:

Some related issues / fixes that have happened:

https://github.com/NetApp/trident/issues/101
https://github.com/NetApp/trident/issues/133

https://github.com/kubernetes/kubernetes/issues/59946
https://github.com/kubernetes/kubernetes/issues/60894

Comment 20 Qin Ping 2020-05-25 14:34:52 UTC
Verified with: 4.5.0-0.nightly-2020-05-25-052746

Comment 21 errata-xmlrpc 2020-07-13 17:29:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

