Bug 1888894
| Summary: | Unable to deploy pods with different iSCSI devices on different nodes | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Stefan Orth <sorth> |
| Component: | Documentation | Assignee: | Lisa Pettyjohn <lpettyjo> |
| Status: | CLOSED WONTFIX | QA Contact: | Xiaoli Tian <xtian> |
| Severity: | medium | Docs Contact: | Latha S <lmurthy> |
| Priority: | unspecified | ||
| Version: | 4.6 | CC: | hchen, jsafrane, lmurthy, nbziouec |
| Target Milestone: | --- | ||
| Target Release: | 4.7.0 | ||
| Hardware: | s390x | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-09-27 13:01:08 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Initially looking at it, the resources seem to be configured correctly. Assigning to storage for additional help in troubleshooting.

"Multi-Attach error for volume "iscsi-pv0001" Volume is already used by pod(s) so-test001" tells you that you are trying to attach one volume to several nodes. Kubernetes won't do that, in order to protect the data on the volume - one xfs filesystem mounted twice won't do any good.

> # cat iSCSI_PV_001.yaml
> iscsi:
>   targetPortal: 10.209.9.1:3260
>   iqn: iqn.1986-03.com.ibm:2145.v7k06.node1
>   fsType: 'xfs'

> # cat iSCSI_PV_002.yaml
> iscsi:
>   targetPortal: 10.209.9.1:3260
>   iqn: iqn.1986-03.com.ibm:2145.v7k06.node1
>   fsType: 'xfs'

Both PVs use the same IQN, therefore they point to the same volume on the storage backend. Kubernetes sees that it is the same volume and stops. What happens when your PVs actually point to different volumes, i.e. with a different "iqn:" or "lun:" (which defaults to 0 when omitted)?
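For illustration only (a minimal sketch, not taken from the report - the second LUN number is assumed), two PVs that actually point at different volumes on the same target would differ in the "lun:" (or "iqn:") field:

# hypothetical iSCSI_PV_001.yaml fragment
iscsi:
  targetPortal: 10.209.9.1:3260
  iqn: iqn.1986-03.com.ibm:2145.v7k06.node1
  lun: 0
  fsType: 'xfs'

# hypothetical iSCSI_PV_002.yaml fragment - a different LUN (assumed) on the same
# target, so Kubernetes treats it as a separate volume
iscsi:
  targetPortal: 10.209.9.1:3260
  iqn: iqn.1986-03.com.ibm:2145.v7k06.node1
  lun: 1
  fsType: 'xfs'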
We have different disks and they are attached via different initiator names. It works fine manually. I did the test again with an iSCSI server that I set up myself and got the same result as in our standard environment: I am able to attach the different disks manually on different nodes, but if I try it with OCP, the second pod (on another node) fails. Please see my steps and configuration:

iSCSI Configuration:
--------------------

/> ls
o- / ................................................................................. [...]
o- backstores ........................................................................ [...]
| o- block ............................................................ [Storage Objects: 0]
| o- fileio ........................................................... [Storage Objects: 4]
| | o- file1 .............................. [/tmp/disk1.img (200.0MiB) write-back activated]
| | | o- alua ............................................................. [ALUA Groups: 1]
| | | o- default_tg_pt_gp ..................................... [ALUA state: Active/optimized]
| | o- file2 .............................. [/tmp/disk2.img (200.0MiB) write-back activated]
| | | o- alua ............................................................. [ALUA Groups: 1]
| | | o- default_tg_pt_gp ..................................... [ALUA state: Active/optimized]
| | o- file3 .............................. [/tmp/disk3.img (200.0MiB) write-back activated]
| | | o- alua ............................................................. [ALUA Groups: 1]
| | | o- default_tg_pt_gp ..................................... [ALUA state: Active/optimized]
| | o- file4 .............................. [/tmp/disk4.img (200.0MiB) write-back activated]
| | o- alua ............................................................... [ALUA Groups: 1]
| | o- default_tg_pt_gp ....................................... [ALUA state: Active/optimized]
| o- pscsi ............................................................ [Storage Objects: 0]
| o- ramdisk .......................................................... [Storage Objects: 0]
o- iscsi ...................................................................... [Targets: 1]
| o- iqn.2003-01.org.linux-iscsi.m3558001.s390x:sn.491daf9ec05b .................. [TPGs: 1]
| o- tpg1 ......................................................... [no-gen-acls, no-auth]
| o- acls .................................................................... [ACLs: 4]
| | o- iqn.2020-10.com.m3558001.foo:991 ............................... [Mapped LUNs: 1]
| | | o- mapped_lun0 .......................................... [lun1 fileio/file1 (rw)]
| | o- iqn.2020-10.com.m3558001.foo:992 ............................... [Mapped LUNs: 1]
| | | o- mapped_lun0 .......................................... [lun2 fileio/file2 (rw)]
| | o- iqn.2020-10.com.m3558001.foo:993 ............................... [Mapped LUNs: 2]
| | | o- mapped_lun0 .......................................... [lun3 fileio/file3 (rw)]
| | | o- mapped_lun4 .......................................... [lun4 fileio/file4 (rw)]
| | o- iqn.2020-10.com.m3558001.foo:994 ............................... [Mapped LUNs: 0]
| o- luns .................................................................... [LUNs: 4]
| | o- lun1 ......................... [fileio/file1 (/tmp/disk1.img) (default_tg_pt_gp)]
| | o- lun2 ......................... [fileio/file2 (/tmp/disk2.img) (default_tg_pt_gp)]
| | o- lun3 ......................... [fileio/file3 (/tmp/disk3.img) (default_tg_pt_gp)]
| | o- lun4 ......................... [fileio/file4 (/tmp/disk4.img) (default_tg_pt_gp)]
| o- portals ............................................................... [Portals: 1]
| o- 10.107.1.51:3260 ............................................................. [OK]
o- loopback ................................................................... [Targets: 0]
o- qla2xxx .................................................................... [Targets: 0]

------------------------------------------------------------------------

Manually:

(0) Configure /etc/iscsi/initiatorname.iscsi

[core@worker-001 ~]$ cat /etc/iscsi/initiatorname.iscsi
InitiatorName=iqn.2020-10.com.m3558001.foo:991

[core@worker-002 ~]$ cat /etc/iscsi/initiatorname.iscsi
InitiatorName=iqn.2020-10.com.m3558001.foo:992

(1) Map and mount disks to worker-001 and worker-002 manually.
[core@worker-001 ~]$ sudo iscsiadm -m discovery -t sendtargets -p 10.107.1.51:3260
10.107.1.51:3260,1 iqn.2003-01.org.linux-iscsi.m3558001.s390x:sn.491daf9ec05b

[core@worker-002 ~]$ sudo iscsiadm -m discovery -t sendtargets -p 10.107.1.51:3260
10.107.1.51:3260,1 iqn.2003-01.org.linux-iscsi.m3558001.s390x:sn.491daf9ec05b

[core@worker-001 ~]$ sudo iscsiadm -m node -T iqn.2003-01.org.linux-iscsi.m3558001.s390x:sn.491daf9ec05b -p 10.107.1.51:3260 -l
Logging in to [iface: default, target: iqn.2003-01.org.linux-iscsi.m3558001.s390x:sn.491daf9ec05b, portal: 10.107.1.51,3260]
Login to [iface: default, target: iqn.2003-01.org.linux-iscsi.m3558001.s390x:sn.491daf9ec05b, portal: 10.107.1.51,3260] successful.

[core@worker-002 ~]$ sudo iscsiadm -m node -T iqn.2003-01.org.linux-iscsi.m3558001.s390x:sn.491daf9ec05b -p 10.107.1.51:3260 -l
Logging in to [iface: default, target: iqn.2003-01.org.linux-iscsi.m3558001.s390x:sn.491daf9ec05b, portal: 10.107.1.51,3260]
Login to [iface: default, target: iqn.2003-01.org.linux-iscsi.m3558001.s390x:sn.491daf9ec05b, portal: 10.107.1.51,3260] successful.

dmesg worker-001:
-----------------
[402135.437926] scsi host0: iSCSI Initiator over TCP/IP
[402135.444453] scsi 0:0:0:0: Direct-Access     LIO-ORG  file1  4.0  PQ: 0 ANSI: 5
[402135.445409] scsi 0:0:0:0: alua: supports implicit and explicit TPGS
[402135.445415] scsi 0:0:0:0: alua: device naa.6001405238d513b66614f9db344636d0 port group 0 rel port 1
[402135.446229] sd 0:0:0:0: Attached scsi generic sg0 type 0
[402135.455975] sd 0:0:0:0: [sda] 409600 512-byte logical blocks: (210 MB/200 MiB)
[402135.456162] sd 0:0:0:0: [sda] Write Protect is off
[402135.456167] sd 0:0:0:0: [sda] Mode Sense: 43 00 10 08
[402135.456483] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA
[402135.456854] sd 0:0:0:0: [sda] Optimal transfer size 8388608 bytes
[402135.458693] sd 0:0:0:0: alua: transition timeout set to 60 seconds
[402135.458699] sd 0:0:0:0: alua: port group 00 state A non-preferred supports TOlUSNA
[402135.465023] sd 0:0:0:0: [sda] Attached SCSI disk

[core@worker-001 ~]$ cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: LIO-ORG  Model: file1            Rev: 4.0
  Type:   Direct-Access                    ANSI  SCSI revision: 05

dmesg worker-002:
-----------------
[402259.434275] scsi host0: iSCSI Initiator over TCP/IP
[402259.466467] scsi 0:0:0:0: Direct-Access     LIO-ORG  file2  4.0  PQ: 0 ANSI: 5
[402259.506941] scsi 0:0:0:0: alua: supports implicit and explicit TPGS
[402259.506950] scsi 0:0:0:0: alua: device naa.600140513126b434bca4a6c8feb1e49f port group 0 rel port 1
[402259.526232] scsi 0:0:0:0: alua: transition timeout set to 60 seconds
[402259.526238] scsi 0:0:0:0: alua: port group 00 state A non-preferred supports TOlUSNA
[402259.734646] scsi 0:0:0:0: Attached scsi generic sg0 type 0
[402259.894249] sd 0:0:0:0: [sda] 409600 512-byte logical blocks: (210 MB/200 MiB)
[402259.904256] sd 0:0:0:0: [sda] Write Protect is off
[402259.904262] sd 0:0:0:0: [sda] Mode Sense: 43 00 10 08
[402259.914789] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA
[402259.918980] sd 0:0:0:0: [sda] Optimal transfer size 8388608 bytes
[402260.027200] sd 0:0:0:0: [sda] Attached SCSI disk

[core@worker-002 ~]$ cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: LIO-ORG  Model: file2            Rev: 4.0
  Type:   Direct-Access                    ANSI  SCSI revision: 05

(2) Create file on disks

DONE

(3) Remove /etc/iscsi/initiatorname.iscsi, detach disks and stop iscsid
[core@worker-001 ~]$ sudo iscsiadm -m node -T iqn.2003-01.org.linux-iscsi.m3558001.s390x:sn.491daf9ec05b -p 10.107.1.51:3260 -u
Logging out of session [sid: 2, target: iqn.2003-01.org.linux-iscsi.m3558001.s390x:sn.491daf9ec05b, portal: 10.107.1.51,3260]
Logout of [sid: 2, target: iqn.2003-01.org.linux-iscsi.m3558001.s390x:sn.491daf9ec05b, portal: 10.107.1.51,3260] successful.

[core@worker-002 ~]$ sudo iscsiadm -m node -T iqn.2003-01.org.linux-iscsi.m3558001.s390x:sn.491daf9ec05b -p 10.107.1.51:3260 -u
Logging out of session [sid: 1, target: iqn.2003-01.org.linux-iscsi.m3558001.s390x:sn.491daf9ec05b, portal: 10.107.1.51,3260]
Logout of [sid: 1, target: iqn.2003-01.org.linux-iscsi.m3558001.s390x:sn.491daf9ec05b, portal: 10.107.1.51,3260] successful.

[core@worker-001 ~]$ sudo rm -f /etc/iscsi/initiatorname.iscsi
[core@worker-001 ~]$ sudo systemctl stop iscsid
Warning: Stopping iscsid.service, but it can still be activated by:
  iscsid.socket

[core@worker-002 ~]$ sudo rm -f /etc/iscsi/initiatorname.iscsi
[core@worker-002 ~]$ sudo systemctl stop iscsid
Warning: Stopping iscsid.service, but it can still be activated by:
  iscsid.socket

=> Everything works as expected

------------------------------------------------------------------------

OCP:

(4) Create PVs

[root@m3558001 iSCSI_Test]# cat iSCSI_PV_001.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: "iscsi-pv001"
  namespace: "iscsi-test"
spec:
  capacity:
    storage: 100M
  accessModes:
    - ReadWriteOnce
  iscsi:
    targetPortal: 10.107.1.51:3260
    iqn: iqn.2003-01.org.linux-iscsi.m3558001.s390x:sn.491daf9ec05b
    lun: 0
    initiatorName: iqn.2020-10.com.m3558001.foo:991
    fsType: 'xfs'

[root@m3558001 iSCSI_Test]# oc create -f iSCSI_PV_001.yaml
persistentvolume/iscsi-pv001 created

[root@m3558001 iSCSI_Test]# cat iSCSI_PV_002.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: "iscsi-pv002"
  namespace: "iscsi-test"
spec:
  capacity:
    storage: 100M
  accessModes:
    - ReadWriteOnce
  iscsi:
    targetPortal: 10.107.1.51:3260
    iqn: iqn.2003-01.org.linux-iscsi.m3558001.s390x:sn.491daf9ec05b
    lun: 0
    initiatorName: iqn.2020-10.com.m3558001.foo:992

[root@m3558001 iSCSI_Test]# oc create -f iSCSI_PV_002.yaml
persistentvolume/iscsi-pv002 created

(5) Create PVCs

[root@m3558001 iSCSI_Test]# cat iSCSI_PVC_001.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: iscsi-claim001
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 100M

[root@m3558001 iSCSI_Test]# oc create -f iSCSI_PVC_001.yaml
persistentvolumeclaim/iscsi-claim001 created

[root@m3558001 iSCSI_Test]# cat iSCSI_PVC_002.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: iscsi-claim002
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 100M

[root@m3558001 iSCSI_Test]# oc create -f iSCSI_PVC_002.yaml
persistentvolumeclaim/iscsi-claim002 created

[root@m3558001 iSCSI_Test]# oc get pv
NAME            CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                    STORAGECLASS   REASON   AGE
iscsi-pv001     100M       RWO            Retain           Bound    default/iscsi-claim001                           7m59s
iscsi-pv002     100M       RWO            Retain           Bound    default/iscsi-claim002                           6m12s
pvfcpwwid       10Gi       RWO            Retain           Bound    default/pvc-fcp1                                 41h
pvfcpwwnshort   10Gi       RWO            Retain           Bound    default/pvc-fcp2                                 41h

[root@m3558001 iSCSI_Test]# oc get pvc
NAME             STATUS   VOLUME          CAPACITY   ACCESS MODES   STORAGECLASS   AGE
iscsi-claim001   Bound    iscsi-pv001     100M       RWO                           2m
iscsi-claim002   Bound    iscsi-pv002     100M       RWO                           17s
pvc-fcp1         Bound    pvfcpwwid       10Gi       RWO                           41h
pvc-fcp2         Bound    pvfcpwwnshort   10Gi       RWO                           41h

(6) Deploy 2 Pods on 2 workers

[root@m3558001 iSCSI_Test]# cat iSCSI_POD_001.yaml
apiVersion: v1
kind: Pod
metadata:
  name: so-iscsi001
  namespace: default
spec:
  securityContext:
    runAsUser: 0
  nodeSelector:
    kubernetes.io/hostname: worker-001.m3558ocp.lnxne.boe
  containers:
    - name: soiscsitest001
      image: sys-loz-test-team-docker-local.artifactory.swg-devops.com/s390x_blank_base_image:3.0
      volumeMounts:
        - name: iscsiso
          mountPath: "/iscsidata"
  volumes:
    - name: iscsiso
      persistentVolumeClaim:
        claimName: iscsi-claim001

[root@m3558001 iSCSI_Test]# oc create -f iSCSI_POD_001.yaml
pod/so-iscsi001 created

Inside the pod (file from step (2)):
------------------------------------
sh-4.2# cd iscsidata/
sh-4.2# ls
this_is_disk1.txt
sh-4.2#

dmesg on worker-001:
--------------------
[405602.132448] scsi host0: iSCSI Initiator over TCP/IP
[405602.175159] scsi 0:0:0:0: Direct-Access     LIO-ORG  file1  4.0  PQ: 0 ANSI: 5
[405602.204005] scsi 0:0:0:0: alua: supports implicit and explicit TPGS
[405602.204013] scsi 0:0:0:0: alua: device naa.6001405238d513b66614f9db344636d0 port group 0 rel port 1
[405602.211219] sd 0:0:0:0: Attached scsi generic sg0 type 0
[405602.217299] sd 0:0:0:0: [sda] 409600 512-byte logical blocks: (210 MB/200 MiB)
[405602.218720] sd 0:0:0:0: [sda] Write Protect is off
[405602.218724] sd 0:0:0:0: [sda] Mode Sense: 43 00 10 08
[405602.219816] sd 0:0:0:0: alua: transition timeout set to 60 seconds
[405602.219821] sd 0:0:0:0: alua: port group 00 state A non-preferred supports TOlUSNA
[405602.220898] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA
[405602.228456] sd 0:0:0:0: [sda] Optimal transfer size 8388608 bytes
[405602.261973] sd 0:0:0:0: [sda] Attached SCSI disk
[405603.640621] XFS (sda): Mounting V5 Filesystem
[405603.679427] XFS (sda): Ending clean mount

[root@m3558001 iSCSI_Test]# cat iSCSI_POD_002.yaml
apiVersion: v1
kind: Pod
metadata:
  name: so-iscsi002
  namespace: default
spec:
  securityContext:
    runAsUser: 0
  nodeSelector:
    kubernetes.io/hostname: worker-002.m3558ocp.lnxne.boe
  containers:
    - name: soiscsitest002
      image: sys-loz-test-team-docker-local.artifactory.swg-devops.com/s390x_blank_base_image:3.0
      volumeMounts:
        - name: iscsiso
          mountPath: "/iscsidata"
  volumes:
    - name: iscsiso
      persistentVolumeClaim:
        claimName: iscsi-claim002

[root@m3558001 iSCSI_Test]# oc create -f iSCSI_POD_002.yaml
pod/so-iscsi002 created

ERROR:
------
Pod so-iscsi002, Namespace default, a minute ago
Generated from
Successfully assigned default/so-iscsi002 to worker-002.m3558ocp.lnxne.boe

Pod so-iscsi002, Namespace default, less than a minute ago
Generated from attachdetach-controller
Multi-Attach error for volume "iscsi-pv002" Volume is already used by pod(s) so-iscsi001

As written above, OpenShift / Kubernetes uses IQN:LUN as the unique ID of a volume. How could Kubernetes tell whether the storage backend provides two different volumes for the same IQN:LUN (and can therefore safely mount it on multiple nodes) or whether it is the same volume (and therefore must never be mounted twice)?

Is it something s390x specific? We haven't heard about such issues with any other storage backend.

If your storage backend provides different volumes for the same IQN:LUN depending on the initiator name, those volumes cannot be used with the Kubernetes iSCSI in-tree volume plugin. Can you use CSI instead?

I re-configured my iSCSI server with a new target (iqn.2003-01.org.linux-iscsi.m3558001.s390x:sn.9aaf9fc90880), one initiator name, and different LUNs:
/> ls
o- / ......................................................................................................................... [...]
o- backstores .............................................................................................................. [...]
| o- block .................................................................................................. [Storage Objects: 0]
| o- fileio ................................................................................................. [Storage Objects: 7]
| | o- file1 .................................................................... [/tmp/disk1.img (200.0MiB) write-back activated]
| | | o- alua ................................................................................................... [ALUA Groups: 1]
| | | o- default_tg_pt_gp ....................................................................... [ALUA state: Active/optimized]
| | o- file2 .................................................................... [/tmp/disk2.img (200.0MiB) write-back activated]
| | | o- alua ................................................................................................... [ALUA Groups: 1]
| | | o- default_tg_pt_gp ....................................................................... [ALUA state: Active/optimized]
| | o- file3 .................................................................... [/tmp/disk3.img (200.0MiB) write-back activated]
| | | o- alua ................................................................................................... [ALUA Groups: 1]
| | | o- default_tg_pt_gp ....................................................................... [ALUA state: Active/optimized]
| | o- file4 .................................................................... [/tmp/disk4.img (200.0MiB) write-back activated]
| | | o- alua ................................................................................................... [ALUA Groups: 1]
| | | o- default_tg_pt_gp ....................................................................... [ALUA state: Active/optimized]
| | o- file10 .................................................................. [/tmp/disk10.img (200.0MiB) write-back activated]
| | | o- alua ................................................................................................... [ALUA Groups: 1]
| | | o- default_tg_pt_gp ....................................................................... [ALUA state: Active/optimized]
| | o- file11 .................................................................. [/tmp/disk11.img (200.0MiB) write-back activated]
| | | o- alua ................................................................................................... [ALUA Groups: 1]
| | | o- default_tg_pt_gp ....................................................................... [ALUA state: Active/optimized]
| | o- file12 .................................................................. [/tmp/disk12.img (200.0MiB) write-back activated]
| | o- alua ................................................................................................... [ALUA Groups: 1]
| | o- default_tg_pt_gp ....................................................................... [ALUA state: Active/optimized]
| o- pscsi .................................................................................................. [Storage Objects: 0]
| o- ramdisk ................................................................................................ [Storage Objects: 0]
o- iscsi ............................................................................................................ [Targets: 2]
| o- iqn.2003-01.org.linux-iscsi.m3558001.s390x:sn.491daf9ec05b ........................................................ [TPGs: 1]
| | o- tpg1 ............................................................................................... [no-gen-acls, no-auth]
| | o- acls .......................................................................................................... [ACLs: 4]
| | | o- iqn.2020-10.com.m3558001.foo:991 ..................................................................... [Mapped LUNs: 1]
| | | | o- mapped_lun0 ................................................................................ [lun1 fileio/file1 (rw)]
| | | o- iqn.2020-10.com.m3558001.foo:992 ..................................................................... [Mapped LUNs: 1]
| | | | o- mapped_lun0 ................................................................................ [lun2 fileio/file2 (rw)]
| | | o- iqn.2020-10.com.m3558001.foo:993 ..................................................................... [Mapped LUNs: 2]
| | | | o- mapped_lun0 ................................................................................ [lun3 fileio/file3 (rw)]
| | | | o- mapped_lun4 ................................................................................ [lun4 fileio/file4 (rw)]
| | | o- iqn.2020-10.com.m3558001.foo:994 ..................................................................... [Mapped LUNs: 0]
| | o- luns .......................................................................................................... [LUNs: 4]
| | | o- lun1 ............................................................... [fileio/file1 (/tmp/disk1.img) (default_tg_pt_gp)]
| | | o- lun2 ............................................................... [fileio/file2 (/tmp/disk2.img) (default_tg_pt_gp)]
| | | o- lun3 ............................................................... [fileio/file3 (/tmp/disk3.img) (default_tg_pt_gp)]
| | | o- lun4 ............................................................... [fileio/file4 (/tmp/disk4.img) (default_tg_pt_gp)]
| | o- portals .................................................................................................... [Portals: 1]
| | o- 10.107.1.51:3260 ................................................................................................. [OK]
| o- iqn.2003-01.org.linux-iscsi.m3558001.s390x:sn.9aaf9fc90880 ........................................................ [TPGs: 1]
| o- tpg1 ............................................................................................... [no-gen-acls, no-auth]
| o- acls .......................................................................................................... [ACLs: 1]
| | o- iqn.2020-10.com.m3558001.foo:111 ..................................................................... [Mapped LUNs: 3]
| | o- mapped_lun0 ............................................................................... [lun0 fileio/file10 (rw)]
| | o- mapped_lun1 ............................................................................... [lun1 fileio/file11 (rw)]
| | o- mapped_lun2 ............................................................................... [lun2 fileio/file12 (rw)]
| o- luns .......................................................................................................... [LUNs: 3]
| | o- lun0 ............................................................. [fileio/file10 (/tmp/disk10.img) (default_tg_pt_gp)]
| | o- lun1 ............................................................. [fileio/file11 (/tmp/disk11.img) (default_tg_pt_gp)]
| | o- lun2 ............................................................. [fileio/file12 (/tmp/disk12.img) (default_tg_pt_gp)]
| o- portals .................................................................................................... [Portals: 1]
| o- 10.107.1.51:3260 ................................................................................................. [OK]
o- loopback ......................................................................................................... [Targets: 0]
o- qla2xxx .......................................................................................................... [Targets: 0]
/>
With this configuration, I am able to deploy 3 Pods with mapped_lun0, mapped_lun1 and mapped_lun2.
[root@m3558001 iSCSI_Test]# oc get pv
NAME            CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                    STORAGECLASS   REASON   AGE
iscsi-pv1110    100M       RWO            Retain           Bound    default/iscsi-claim001                           39m
iscsi-pv1111 100M RWO Retain Bound default/iscsi-claim002 39m
iscsi-pv1112 100M RWO Retain Bound default/iscsi-claim003 39m
pvfcpwwid 10Gi RWO Retain Bound default/pvc-fcp1 8d
pvfcpwwnshort 10Gi RWO Retain Bound default/pvc-fcp2 8d
[root@m3558001 iSCSI_Test]# oc get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
iscsi-claim001 Bound iscsi-pv1110 100M RWO 39m
iscsi-claim002 Bound iscsi-pv1111 100M RWO 39m
iscsi-claim003 Bound iscsi-pv1112 100M RWO 39m
pvc-fcp1 Bound pvfcpwwid 10Gi RWO 8d
pvc-fcp2 Bound pvfcpwwnshort 10Gi RWO 8d
[root@m3558001 iSCSI_Test]# oc describe pv iscsi-pv1110
Name: iscsi-pv1110
Labels: <none>
Annotations: pv.kubernetes.io/bound-by-controller: yes
Finalizers: [kubernetes.io/pv-protection]
StorageClass:
Status: Bound
Claim: default/iscsi-claim001
Reclaim Policy: Retain
Access Modes: RWO
VolumeMode: Filesystem
Capacity: 100M
Node Affinity: <none>
Message:
Source:
Type: ISCSI (an ISCSI Disk resource that is attached to a kubelet's host machine and then exposed to the pod)
TargetPortal: 10.107.1.51:3260
IQN: iqn.2003-01.org.linux-iscsi.m3558001.s390x:sn.9aaf9fc90880
Lun: 0
ISCSIInterface default
FSType: xfs
ReadOnly: false
Portals: []
DiscoveryCHAPAuth: false
SessionCHAPAuth: false
SecretRef: nil
InitiatorName: iqn.2020-10.com.m3558001.foo:111
Events: <none>
PV configuration with initiatorname (and different luns (0-2)):
---------------------------------------------------------------
apiVersion: v1
kind: PersistentVolume
metadata:
  name: "iscsi-pv1110"
  namespace: "iscsi-test"
spec:
  capacity:
    storage: 100M
  accessModes:
    - ReadWriteOnce
  iscsi:
    targetPortal: 10.107.1.51:3260
    iqn: iqn.2003-01.org.linux-iscsi.m3558001.s390x:sn.9aaf9fc90880
    lun: 0
    initiatorName: iqn.2020-10.com.m3558001.foo:111
    fsType: 'xfs'
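The other two PV definitions (iscsi-pv1111 and iscsi-pv1112 from the listing above) are not shown in the report; presumably they mirror iscsi-pv1110 and differ only in the PV name and the LUN. A sketch under that assumption, showing only the fields that would differ:

# iscsi-pv1111 (assumed) - same targetPortal, iqn and initiatorName, next LUN
metadata:
  name: "iscsi-pv1111"
spec:
  iscsi:
    lun: 1    # matches mapped_lun1 / fileio/file11 on the target

# iscsi-pv1112 (assumed)
metadata:
  name: "iscsi-pv1112"
spec:
  iscsi:
    lun: 2    # matches mapped_lun2 / fileio/file12 on the target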
I guess the problem is not s390x specific, but an issue of the iSCSI server configuration. If you have a configuration that uses ACLs to grant exclusive access to a LUN, as described here (7.1.11):
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/managing_storage_devices/getting-started-with-iscsi_managing-storage-devices
with one initiator name for each LUN in one target, the handling of iSCSI devices will fail. See my configuration example above. Linux at the OS level is able to handle this.
How can we handle different iSCSI configurations on the customer side? Is it possible to update the documentation with some hints about which iSCSI device configurations are supported?
Checked the iSCSI in-tree plugin code; it looks like it is difficult to tell whether two PVs with the same target+LUN are mapped to the same device or not. But I don't know if we can add some limitations to the docs, such as:

1. The hosts in a cluster should use the same initiator name, so that a pod which consumes an iSCSI volume can run on any host and the issue in this bug is avoided.
2. We should recommend setting initiatorName in the PV, so that when LSO and iSCSI volumes are used at the same time, iSCSI volumes that are attached to the host but not consumed by PVs are not discovered by LSO.

Summing up the case for docs:

Kubernetes uses IQN:LUN of an iSCSI PV as the unique ID of the volume. We cannot add the initiator name there, because we don't know whether the initiator name matters (i.e. whether a different initiator would get a different volume or not).

We need to update the docs somewhere around "iSCSI Custom Initiator IQN" to clarify that only IQN + LUN can be used to identify a volume on an iSCSI target. OpenShift / Kubernetes expects that initiatorName is used only for authentication; using the same IQN + LUN with different initiatorNames should still select the same volume.

https://docs.openshift.com/container-platform/4.6/storage/persistent_storage/persistent-storage-iscsi.html#iscsi-custom-iqn_persistent-storage-iscsi
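For troubleshooting such setups, one way to confirm which target and LUN actually back a device on a node is to print the active iSCSI sessions with their attached SCSI devices (a hedged sketch; the exact output layout depends on the open-iscsi version):

# Run on the worker node: for every iSCSI session this prints the target IQN,
# the LUN numbers, and the /dev/sdX device that was attached for each LUN.
sudo iscsiadm -m session -P 3 | grep -E 'Target:|Lun:|Attached scsi disk'

If two PVs resolve to the same IQN and LUN here, the in-tree plugin treats them as the same volume, regardless of the initiator name used to log in.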
Description of problem:

Deploying more than one pod with different iSCSI devices on different nodes fails.

I have 3 worker nodes with 3 iSCSI devices (different InitiatorNames) configured in /etc/iscsi/initiatorname.iscsi:

worker-001: InitiatorName=iqn.2020-06.com.ibm:storage.v7k06.xxx
worker-002: InitiatorName=iqn.2020-06.com.ibm:storage.v7k06.yyy
worker-003: InitiatorName=iqn.2020-06.com.ibm:storage.v7k06.zzz

I am able to access, attach and mount the devices at the CoreOS level with iscsiadm -m node -l:

[Thu Oct 15 12:00:03 2020] scsi host1: iSCSI Initiator over TCP/IP
[Thu Oct 15 12:00:03 2020] scsi 1:0:0:0: Direct-Access     IBM      2145  0000  PQ: 0 ANSI: 6
[Thu Oct 15 12:00:03 2020] scsi 1:0:0:0: alua: supports implicit TPGS
[Thu Oct 15 12:00:03 2020] scsi 1:0:0:0: alua: device naa.600507640081818ab000000000000xxx port group 1 rel port 180
[Thu Oct 15 12:00:03 2020] sd 1:0:0:0: Attached scsi generic sg1 type 0
[Thu Oct 15 12:00:03 2020] sd 1:0:0:0: Power-on or device reset occurred
[Thu Oct 15 12:00:03 2020] sd 1:0:0:0: [sdb] 209715200 512-byte logical blocks: (107 GB/100 GiB)
[Thu Oct 15 12:00:03 2020] sd 1:0:0:0: [sdb] Write Protect is off
[Thu Oct 15 12:00:03 2020] sd 1:0:0:0: [sdb] Mode Sense: 97 00 10 08
[Thu Oct 15 12:00:03 2020] sd 1:0:0:0: [sdb] Write cache: disabled, read cache: enabled, supports DPO and FUA
[Thu Oct 15 12:00:03 2020] sd 1:0:0:0: alua: transition timeout set to 60 seconds
[Thu Oct 15 12:00:03 2020] sd 1:0:0:0: alua: port group 01 state N non-preferred supports tolusna
[Thu Oct 15 12:00:03 2020] sd 1:0:0:0: [sdb] Attached SCSI disk
[Thu Oct 15 12:10:42 2020] XFS (sdb): Mounting V5 Filesystem
[Thu Oct 15 12:10:42 2020] XFS (sdb): Ending clean mount

I am also able to detach the device, change the InitiatorName, restart iscsid, and access a different device (for example iqn.2020-06.com.ibm:storage.v7k06.aaa). All of this works fine on each of the 3 nodes in parallel at the CoreOS level.

########################################################################

Issue 1: configured initiator name in /etc/iscsi/initiatorname.iscsi
--------------------------------------------------------------------

iscsid is stopped on all nodes, so that no old configuration is active. Having the same configuration as above, with 3 different InitiatorNames configured on 3 nodes in /etc/iscsi/initiatorname.iscsi, I am able to deploy one pod with a node selector:

Events:
  Type    Reason                  Age    From                     Message
  ----    ------                  ----   ----                     -------
  Normal  Scheduled               10m    default-scheduler        Successfully assigned default/so-test001 to worker-001.ocp-m3559001.lnxne.boe
  Normal  SuccessfulAttachVolume  10m    attachdetach-controller  AttachVolume.Attach succeeded for volume "iscsi-pv001"
  Normal  AddedInterface          10m    multus                   Add eth0 [10.128.2.17/23]
  Normal  Pulling                 10m    kubelet                  Pulling image "sys-loz-test-team-docker-local.artifactory.swg-devops.com/s390x_blank_base_image:3.0"
  Normal  Pulled                  9m49s  kubelet                  Successfully pulled image "sys-loz-test-team-docker-local.artifactory.swg-devops.com/s390x_blank_base_image:3.0" in 41.507380881s
  Normal  Created                 9m48s  kubelet                  Created container soiscsitest001
  Normal  Started                 9m48s  kubelet                  Started container soiscsitest001

The pod uses the configured iSCSI volume on worker-001.
If I deploy a second pod on another node, I receive the following error:

Events:
  Type     Reason              Age                   From                     Message
  ----     ------              ----                  ----                     -------
  Normal   Scheduled           7m11s                 default-scheduler        Successfully assigned default/so-test002 to worker-002.ocp-m3559001.lnxne.boe
  Warning  FailedAttachVolume  7m11s                 attachdetach-controller  Multi-Attach error for volume "iscsi-pv002" Volume is already used by pod(s) so-test001
  Warning  FailedMount         2m51s (x2 over 5m8s)  kubelet                  Unable to attach or mount volumes: unmounted volumes=[iscsiso], unattached volumes=[iscsiso default-token-hff5z]: timed out waiting for the condition
  Warning  FailedMount         33s                   kubelet                  Unable to attach or mount volumes: unmounted volumes=[iscsiso], unattached volumes=[default-token-hff5z iscsiso]: timed out waiting for the condition

PV and PVC:
-----------
# oc get pv
NAME          CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                    STORAGECLASS   REASON   AGE
iscsi-pv001   10Gi       RWO            Retain           Bound    default/iscsi-claim001                           15h
iscsi-pv002   10Gi       RWO            Retain           Bound    default/iscsi-claim002                           14h

# oc get pvc
NAME             STATUS   VOLUME        CAPACITY   ACCESS MODES   STORAGECLASS   AGE
iscsi-claim001   Bound    iscsi-pv001   10Gi       RWO                           15h
iscsi-claim002   Bound    iscsi-pv002   10Gi       RWO                           14h

Configuration Files:
--------------------
# cat iSCSI_PV_001.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: "iscsi-pv001"
  namespace: "iscsi-test"
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  iscsi:
    targetPortal: 10.209.9.1:3260
    iqn: iqn.1986-03.com.ibm:2145.v7k06.node1
    fsType: 'xfs'

# cat iSCSI_PV_002.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: "iscsi-pv002"
  namespace: "iscsi-test"
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  iscsi:
    targetPortal: 10.209.9.1:3260
    iqn: iqn.1986-03.com.ibm:2145.v7k06.node1
    fsType: 'xfs'

# cat iSCSI_PVC_001.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: iscsi-claim001
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 10Gi

# cat iSCSI_PVC_002.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: iscsi-claim002
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 10Gi

# cat iSCSI_POD_001.yaml
apiVersion: v1
kind: Pod
metadata:
  name: so-test001
  namespace: default
spec:
  securityContext:
    runAsUser: 0
  nodeSelector:
    kubernetes.io/hostname: worker-001.ocp-m3559001.lnxne.boe
  containers:
    - name: soiscsitest001
      image: sys-loz-test-team-docker-local.artifactory.swg-devops.com/s390x_blank_base_image:3.0
      volumeMounts:
        - name: iscsiso
          mountPath: "/iscsidata"
  volumes:
    - name: iscsiso
      persistentVolumeClaim:
        claimName: iscsi-claim001

# cat iSCSI_POD_002.yaml
apiVersion: v1
kind: Pod
metadata:
  name: so-test002
  namespace: default
spec:
  securityContext:
    runAsUser: 0
  nodeSelector:
    kubernetes.io/hostname: worker-002.ocp-m3559001.lnxne.boe
  containers:
    - name: soiscsitest002
      image: sys-loz-test-team-docker-local.artifactory.swg-devops.com/s390x_blank_base_image:3.0
      volumeMounts:
        - name: iscsiso
          mountPath: "/iscsidata"
  volumes:
    - name: iscsiso
      persistentVolumeClaim:
        claimName: iscsi-claim002

########################################################################

Issue 2: configured initiator name in PV configuration
------------------------------------------------------

- removed /etc/iscsi/initiatorname.iscsi on each node
- deploy PVs with different initiator IQNs
- deploy PVCs
- deploy pods with nodeSelector and different PVCs on different nodes

PV and PVC:
-----------
# oc get pv
NAME           CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                    STORAGECLASS   REASON   AGE
iscsi-pv0000   10Gi       RWO            Retain           Bound    default/iscsi-claim001                           111s
iscsi-pv0001   10Gi       RWO            Retain           Bound    default/iscsi-claim002                           104s

# oc get pvc
NAME             STATUS   VOLUME         CAPACITY   ACCESS MODES   STORAGECLASS   AGE
iscsi-claim001   Bound    iscsi-pv0000   10Gi       RWO                           16s
iscsi-claim002   Bound    iscsi-pv0001   10Gi       RWO                           13s

Deploying the first pod on worker-001 works fine:
-------------------------------------------------
[763600.380807] scsi host1: iSCSI Initiator over TCP/IP
[763600.394516] scsi 1:0:0:0: Direct-Access     IBM      2145  0000  PQ: 0 ANSI: 6
[763600.395337] scsi 1:0:0:0: alua: supports implicit TPGS
[763600.395345] scsi 1:0:0:0: alua: device naa.600507640081818ab000000000000xxx port group 1 rel port 180
[763600.395609] sd 1:0:0:0: Attached scsi generic sg1 type 0
[763600.395739] sd 1:0:0:0: Power-on or device reset occurred
[763600.408576] sd 1:0:0:0: alua: transition timeout set to 60 seconds
[763600.408585] sd 1:0:0:0: alua: port group 01 state N non-preferred supports tolusna
[763600.410046] sd 1:0:0:0: [sdb] 209715200 512-byte logical blocks: (107 GB/100 GiB)
[763600.410217] sd 1:0:0:0: [sdb] Write Protect is off
[763600.410222] sd 1:0:0:0: [sdb] Mode Sense: 97 00 10 08
[763600.411317] sd 1:0:0:0: [sdb] Write cache: disabled, read cache: enabled, supports DPO and FUA
[763600.424648] sd 1:0:0:0: [sdb] Attached SCSI disk
[763601.650763] XFS (sdb): Mounting V5 Filesystem
[763601.842052] XFS (sdb): Ending clean mount

Events:
  Type    Reason                  Age    From                     Message
  ----    ------                  ----   ----                     -------
  Normal  Scheduled               5m2s   default-scheduler        Successfully assigned default/so-test001 to worker-001.ocp-m3559001.lnxne.boe
  Normal  SuccessfulAttachVolume  5m2s   attachdetach-controller  AttachVolume.Attach succeeded for volume "iscsi-pv0000"
  Normal  AddedInterface          4m54s  multus                   Add eth0 [10.128.2.18/23]
  Normal  Pulled                  4m53s  kubelet                  Container image "sys-loz-test-team-docker-local.artifactory.swg-devops.com/s390x_blank_base_image:3.0" already present on machine
  Normal  Created                 4m52s  kubelet                  Created container soiscsitest001
  Normal  Started                 4m52s  kubelet                  Started container soiscsitest001

Deploying the second pod on worker-002 fails:
---------------------------------------------
Events:
  Type     Reason              Age    From                     Message
  ----     ------              ----   ----                     -------
  Normal   Scheduled           3m58s  default-scheduler        Successfully assigned default/so-test002 to worker-002.ocp-m3559001.lnxne.boe
  Warning  FailedAttachVolume  3m59s  attachdetach-controller  Multi-Attach error for volume "iscsi-pv0001" Volume is already used by pod(s) so-test001
  Warning  FailedMount         116s   kubelet                  Unable to attach or mount volumes: unmounted volumes=[iscsiso], unattached volumes=[iscsiso default-token-hff5z]: timed out waiting for the condition

Configuration Files:
--------------------
# cat iSCSI_PV_IGN_xxx.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: "iscsi-pv0000"
  namespace: "iscsi-test"
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  iscsi:
    targetPortal: 10.209.9.1:3260
    iqn: iqn.1986-03.com.ibm:2145.v7k06.node1
    lun: 0
    initiatorName: iqn.2020-06.com.ibm:storage.v7k06.xxx
    fsType: 'xfs'

# cat iSCSI_PV_IGN_yyy.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: "iscsi-pv0001"
  namespace: "iscsi-test"
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  iscsi:
    targetPortal: 10.209.9.1:3260
    iqn: iqn.1986-03.com.ibm:2145.v7k06.node1
    lun: 0
    initiatorName: iqn.2020-06.com.ibm:storage.v7k06.yyy
    fsType: 'xfs'

# cat iSCSI_PVC_001.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: iscsi-claim001
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 10Gi

# cat iSCSI_PVC_002.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: iscsi-claim002
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 10Gi

# cat iSCSI_POD_001.yaml
apiVersion: v1
kind: Pod
metadata:
  name: so-test001
  namespace: default
spec:
  securityContext:
    runAsUser: 0
  nodeSelector:
    kubernetes.io/hostname: worker-001.ocp-m3559001.lnxne.boe
  containers:
    - name: soiscsitest001
      image: sys-loz-test-team-docker-local.artifactory.swg-devops.com/s390x_blank_base_image:3.0
      volumeMounts:
        - name: iscsiso
          mountPath: "/iscsidata"
  volumes:
    - name: iscsiso
      persistentVolumeClaim:
        claimName: iscsi-claim001

# cat iSCSI_POD_002.yaml
apiVersion: v1
kind: Pod
metadata:
  name: so-test002
  namespace: default
spec:
  securityContext:
    runAsUser: 0
  nodeSelector:
    kubernetes.io/hostname: worker-002.ocp-m3559001.lnxne.boe
  containers:
    - name: soiscsitest002
      image: sys-loz-test-team-docker-local.artifactory.swg-devops.com/s390x_blank_base_image:3.0
      volumeMounts:
        - name: iscsiso
          mountPath: "/iscsidata"
  volumes:
    - name: iscsiso
      persistentVolumeClaim:
        claimName: iscsi-claim002

Version-Release number of selected component (if applicable):
Client Version: 4.6.0-0.nightly-s390x-2020-10-06-145952
Server Version: 4.6.0-0.nightly-s390x-2020-10-06-145952
Kubernetes Version: v1.19.0+db1fc96

How reproducible:
Deploy two pods with the above configurations.

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:
I am able to deploy pods on different nodes with different iSCSI devices.

Additional info: