Bug 1977792
| Summary: | iscsi volume connection gets stuck if multipath is enabled and "iscsiadm -m session" fails | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Pablo Caruana <pcaruana> |
| Component: | python-os-brick | Assignee: | Pablo Caruana <pcaruana> |
| Status: | CLOSED ERRATA | QA Contact: | Tzach Shefi <tshefi> |
| Severity: | high | Docs Contact: | |
| Priority: | medium | | |
| Version: | 16.1 (Train) | CC: | apevec, devops, geguileo, gregraka, jschluet, kthakre, lhh, ltoscano, ndeevy, nyewale, pcaruana, pkundal, tkajinam, tshefi |
| Target Milestone: | z7 | Keywords: | Triaged |
| Target Release: | 16.1 (Train on RHEL 8.2) | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | python-os-brick-2.10.5-1.20210706143310.634fb4a.el8ost | Doc Type: | Bug Fix |
| Story Points: | --- | | |
| Clone Of: | 1923975 | Environment: | |
| Last Closed: | 2021-12-09 20:20:10 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1923975 | | |
| Bug Blocks: | | | |

Doc Text:

> Before this update, unhandled exceptions during connection to iSCSI portals (for example, failures in `iscsiadm -m session`) could cause the `_connect_vol` threads to abort unexpectedly without recording a result, and subsequent steps then hung while waiting for results from those threads.
>
> With this update, any exception raised while connecting to an iSCSI portal is handled correctly in the `_connect_vol` method, so a thread can no longer abort without updating its result.
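The mechanism in the doc text can be illustrated with a minimal sketch: each `_connect_vol` worker must record an outcome even when the login fails, otherwise the caller waits forever. The names and structure below are simplified and hypothetical, not the actual os-brick implementation.

```python
# Minimal sketch of the hang and its fix; simplified and hypothetical,
# not the actual os-brick implementation.
import threading
import time

def login_to_portal(portal):
    # Hypothetical stand-in for the iscsiadm login sequence; any step,
    # such as `iscsiadm -m session`, may raise.
    raise RuntimeError('iscsiadm -m session failed for %s' % portal)

def _connect_vol(portal, results):
    """Worker thread: attempt one portal login and always record an outcome."""
    try:
        results.append(login_to_portal(portal))
    except Exception:
        # Before the fix, an unhandled exception ended the thread without
        # recording a result, so the wait loop below never finished.
        results.append(None)

def connect_volume(portals):
    results = []
    threads = [threading.Thread(target=_connect_vol, args=(p, results))
               for p in portals]
    for t in threads:
        t.start()
    # Wait until every portal has reported an outcome; if a worker died
    # without appending a result, this loop would spin forever.
    while len(results) < len(portals):
        time.sleep(0.1)
    for t in threads:
        t.join()
    return [r for r in results if r is not None]

print(connect_volume(['10.0.0.1:3260', '10.0.0.2:3260']))  # prints []
```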
Description
Pablo Caruana
2021-06-30 13:42:25 UTC
*** Bug 1977796 has been marked as a duplicate of this bug. ***

*** Bug 1942487 has been marked as a duplicate of this bug. ***

Verified on: python3-os-brick-2.10.5-1.20210706143310.634fb4a.el8ost.noarch

On a multipath deployment using NetApp iSCSI as the Cinder backend, I ran the scripts below in parallel terminals:

```
[root@controller-2 ~]# cat iscsiwatch.sh
# session check loop
while true; do
    date
    iscsiadm -m session
    echo "==================================================================================="
done

[root@controller-2 ~]# cat lin.sh
# login loop
while true; do
    date
    iscsiadm -m node -T iqn.1992-08.com.netapp:sn.83806661cc2f11eba182d039ea28c8f6:vs.19 -p 10.XXXXXXXX:3260 --login
    echo "==================================================================================="
done

[root@controller-2 ~]# cat lout.sh
# logout loop
while true; do
    date
    iscsiadm -m node -T iqn.1992-08.com.netapp:sn.83806661cc2f11eba182d039ea28c8f6:vs.19 -p 10.XXXXXXXXX:3260 --logout
    echo "==================================================================================="
done
```

(10.XXXXXXXX denotes an internal IP.)

With all three loops running simultaneously on the controller hosting c-vol, I successfully created 4 volumes from a RHEL image:

```
(overcloud) [stack@undercloud-0 ~]$ cinder list
+--------------------------------------+-----------+----------+------+-------------+----------+-------------+
| ID                                   | Status    | Name     | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+----------+------+-------------+----------+-------------+
| 69f5db2e-32ac-47ca-a721-290afe36c3a8 | available | rhelvol3 | 10   | tripleo     | true     |             |
| 74125dde-db5e-40ef-bb5f-edc359c01e51 | available | rhelvol1 | 10   | tripleo     | true     |             |
| 9816afd4-3182-4994-8395-c0b9dde2e5d7 | available | rhelvol2 | 10   | tripleo     | true     |             |
| ea8902f4-34cd-4ea5-8d96-5754ad0d6ccd | available | rhelvol4 | 10   | tripleo     | true     |             |
+--------------------------------------+-----------+----------+------+-------------+----------+-------------+
```

Now let's test volume attachment. I'll run the same three loops on the compute node hosting my instance. Before the loops are running:

```
(overcloud) [stack@undercloud-0 ~]$ nova volume-attach inst1 74125dde-db5e-40ef-bb5f-edc359c01e51
+-----------------------+--------------------------------------+
| Property              | Value                                |
+-----------------------+--------------------------------------+
| delete_on_termination | False                                |
| device                | /dev/vdb                             |
| id                    | 74125dde-db5e-40ef-bb5f-edc359c01e51 |
| serverId              | 5a6b5a47-1d33-4324-a486-96a9d02f7f42 |
| tag                   | -                                    |
| volumeId              | 74125dde-db5e-40ef-bb5f-edc359c01e51 |
+-----------------------+--------------------------------------+
```

We see the positive flow, 4 sessions/paths:

```
[root@compute-0 ~]# iscsiadm -m session
tcp: [6] 10.xxxxxxxx:3260,1039 iqn.1992-08.com.netapp:sn.83806661cc2f11eba182d039ea28c8f6:vs.19 (non-flash)
tcp: [7] 10.xxxxxxxx:3260,1045 iqn.1992-08.com.netapp:sn.83806661cc2f11eba182d039ea28c8f6:vs.19 (non-flash)
tcp: [8] 10.xxxxxxxx:3260,1046 iqn.1992-08.com.netapp:sn.83806661cc2f11eba182d039ea28c8f6:vs.19 (non-flash)
tcp: [9] 10.xxxxxxxx:3260,1047 iqn.1992-08.com.netapp:sn.83806661cc2f11eba182d039ea28c8f6:vs.19 (non-flash)
```
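In the detach/reattach test below, one path is lost to the logout loop yet the attach succeeds. This matches the behavior of the fix: a failed portal login is recorded rather than aborting the worker thread, and the attach proceeds with whatever paths did come up. A self-contained, hypothetical illustration of that tolerance (not os-brick code):

```python
# Hypothetical illustration (not os-brick code): a multipath attach can
# proceed with a partial set of paths, e.g. three of four sessions.
def assemble_multipath(paths):
    """Accept whatever paths logged in, as long as there is at least one."""
    usable = [p for p in paths if p is not None]  # None marks a failed login
    if not usable:
        raise RuntimeError('all portal logins failed')
    return usable

# Three of four portals logged in; the attach still succeeds.
print(assemble_multipath(['sda', 'sdb', 'sdc', None]))  # ['sda', 'sdb', 'sdc']
```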
Now let's detach and reattach with the loops running. Attachment works fine; we notice only 3 sessions, the 4th session is missing due to the constant logout/login attempts:

```
| 74125dde-db5e-40ef-bb5f-edc359c01e51 | in-use    | rhelvol1 | 10   | tripleo     | true     | 5a6b5a47-1d33-4324-a486-96a9d02f7f42 |

tcp: [11] 10.xxxxxxxx:3260,1045 iqn.1992-08.com.netapp:sn.83806661cc2f11eba182d039ea28c8f6:vs.19 (non-flash)
tcp: [13] 10.xxxxxxxx:3260,1046 iqn.1992-08.com.netapp:sn.83806661cc2f11eba182d039ea28c8f6:vs.19 (non-flash)
tcp: [14] 10.xxxxxxxx:3260,1047 iqn.1992-08.com.netapp:sn.83806661cc2f11eba182d039ea28c8f6:vs.19 (non-flash)
```

Good to verify.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform 16.1.7 (Train) bug fix and enhancement advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3762