Bug 1387953 - [ceph-iscsi-ansible] removal of a LUN from a client may result in data loss/corruption
Summary: [ceph-iscsi-ansible] removal of a LUN from a client may result in data loss/corruption
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Storage Console
Classification: Red Hat Storage
Component: ceph-ansible
Version: 2
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 2
Assignee: Paul Cuzner
QA Contact: shylesh
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-10-24 02:23 UTC by Paul Cuzner
Modified: 2016-11-22 23:41 UTC (History)
CC List: 10 users

Fixed In Version: ceph-iscsi-ansible-1.4-1.el7scon ceph-iscsi-config-1.4-1.el7cp
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-22 23:41:53 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2016:2817 0 normal SHIPPED_LIVE ceph-iscsi-ansible and ceph-ansible bug fix update 2017-04-18 19:50:43 UTC

Description Paul Cuzner 2016-10-24 02:23:55 UTC
Description of problem:
LUN IDs are not currently stored in the iSCSI gateway configuration object; they are generated dynamically when the client is defined on the gateway. This creates a potential problem if the admin removes a LUN from a client definition and one of the gateways is then rebooted. In this scenario the gateway rebuilds its configuration, adjusting LUN IDs to match the active set of mapped LUNs for the client. This remapping may mean that the client sees device X as LUN 3 on gateway A but LUN 2 on gateway B. Such a mismatch of LUN IDs will trigger issues in the client multipathing layer on RHEL, Windows and ESX.
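The expected behaviour (LUN IDs maintained across mapping changes) implies persisting an explicit LUN ID per mapped image in the shared configuration object rather than regenerating it on each rebuild. The following is a minimal, purely illustrative Python sketch of that idea; the function and field names (map_image, lun_id, etc.) are hypothetical and are not the real ceph-iscsi-config API:

# Illustrative sketch only -- not the actual ceph-iscsi-config code. It shows why
# storing an explicit lun_id per image keeps IDs stable when a LUN is removed.

def allocate_lun_id(client_luns):
    """Return the lowest LUN id not already assigned to this client."""
    used = {entry["lun_id"] for entry in client_luns.values()}
    lun_id = 0
    while lun_id in used:
        lun_id += 1
    return lun_id

def map_image(config, client_iqn, image):
    """Record a stable lun_id for 'image' the first time it is mapped."""
    luns = config.setdefault(client_iqn, {})
    if image not in luns:
        luns[image] = {"lun_id": allocate_lun_id(luns)}
    return luns[image]["lun_id"]

def unmap_image(config, client_iqn, image):
    """Remove the image; the remaining lun_ids are NOT renumbered."""
    config.get(client_iqn, {}).pop(image, None)

cfg = {}
for img in ("rbd.ansible8", "rbd.ansible9", "rbd.ansible10"):
    map_image(cfg, "iqn.1994-05.com.redhat:client1", img)
unmap_image(cfg, "iqn.1994-05.com.redhat:client1", "rbd.ansible9")
print(cfg)  # rbd.ansible8 keeps lun_id 0, rbd.ansible10 keeps lun_id 2

Because each image carries its own stored lun_id, a gateway rebuilding from this configuration after a reboot presents the same LUN numbers as its peers instead of renumbering to close the gap.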

Version-Release number of selected component (if applicable):
ceph-iscsi-ansible 1.3

How reproducible:
Always

Steps to Reproduce:
1. Create a multi-gateway environment with a client using 3 rbd images
2. Remove one of the images
3. Reboot one of the gateways

Actual results:
LUN IDs mismatch between gateways for the same physical device

Expected results:
LUN IDs should be maintained across LUN mapping changes

Additional info:

Comment 2 Paul Cuzner 2016-10-26 02:03:11 UTC
Patch available; currently being tested.

Comment 8 Hemanth Kumar 2016-11-09 19:29:04 UTC
I am still seeing the issue. Moving this back to the assigned state.

Steps I followed to verify 
----------------------------
Deleted 2 images, LUN58 (ansible58) and LUN59 (ansible59), and rebooted the gw node. Once the system was up, I executed "targetcli ls"; the LUN IDs are not consistent across the gw nodes.

=========================================
"targetcli ls" output after deleting 2 images 

--------------------------
 |     | o- lun56 .............................................................. [block/rbd.ansible39 (/dev/mapper/0-30fe12200854)]
  |     | o- lun57 ................................................................ [block/rbd.ansible8 (/dev/mapper/0-28cf216231b)]
  |     | o- lun60 ............................................................... [block/rbd.ansible10 (/dev/mapper/0-7d93216231b)]
  |     | o- lun61 ................................................................ [block/rbd.myimage1 (/dev/mapper/0-895d216231b)]
  |     | o- lun62 ............................................................... [block/rbd.myimage2 (/dev/mapper/0-897112200854)]
  |     | o- lun63 ............................................................... [block/rbd.myimage3 (/dev/mapper/0-898e12200854)]


=====================================
after rebooting the gw node
------------------------------

 |     | o- lun54 .............................................................. [block/rbd.ansible25 (/dev/mapper/0-2ae112200854)]
  |     | o- lun55 .............................................................. [block/rbd.ansible24 (/dev/mapper/0-2ac112200854)]
  |     | o- lun56 ............................................................... [block/rbd.ansible27 (/dev/mapper/0-2b19216231b)]
  |     | o- lun57 ............................................................... [block/rbd.ansible26 (/dev/mapper/0-2b01216231b)]
  |     | o- lun58 ............................................................... [block/rbd.myimage20 (/dev/mapper/0-8b7c216231b)]
  |     | o- lun59 ............................................................... [block/rbd.myimage21 (/dev/mapper/0-8b99216231b)]
  |     | o- lun60 .............................................................. [block/rbd.ansible38 (/dev/mapper/0-30e412200854)]
  |     | o- lun61 .............................................................. [block/rbd.ansible39 (/dev/mapper/0-30fe12200854)]
  |     | o- lun62 ............................................................... [block/rbd.ansible32 (/dev/mapper/0-3031216231b)]
  |     | o- lun63 ............................................................... [block/rbd.ansible33 (/dev/mapper/0-304e216231b)]
  |     | o- lun64 ............................................................... [block/rbd.ansible30 (/dev/mapper/0-2b74216231b)]
  |     | o- lun65 ............................................................... [block/rbd.ansible31 (/dev/mapper/0-301d216231b)]


===========================================================================

The same output on the other gw node shows different LUN IDs mapped to different images. Do I have to reboot the other gw node as well after rebooting one of the gw nodes?

------------------------------

 |     | o- lun55 .............................................................. [block/rbd.ansible38 (/dev/mapper/0-30e412200854)]
  |     | o- lun56 .............................................................. [block/rbd.ansible39 (/dev/mapper/0-30fe12200854)]
  |     | o- lun57 ................................................................ [block/rbd.ansible8 (/dev/mapper/0-28cf216231b)]
  |     | o- lun60 ............................................................... [block/rbd.ansible10 (/dev/mapper/0-7d93216231b)]
  |     | o- lun61 ................................................................ [block/rbd.myimage1 (/dev/mapper/0-895d216231b)]
  |     | o- lun62 ............................................................... [block/rbd.myimage2 (/dev/mapper/0-897112200854)]
  |     | o- lun63 ............................................................... [block/rbd.myimage3 (/dev/mapper/0-898e12200854)]
  |     | o- lun64 ................................................................ [block/rbd.myimage4 (/dev/mapper/0-89ab216231b)]
  |     | o- lun65 ................................................................ [block/rbd.myimage5 (/dev/mapper/0-89c8216231b)]
  |     | o- lun66 ............................................................... [block/rbd.myimage6 (/dev/mapper/0-89e512200854)]
  |     | o- lun67 ................................................................ [block/rbd.myimage7 (/dev/mapper/0-8a02216231b)]
  |     | o- lun68 ............................................................... [block/rbd.myimage8 (/dev/mapper/0-8a1f12200854)]

Comment 9 Mike Christie 2016-11-09 22:57:30 UTC
It's the mapped_lun$ID under the ACL that you need to compare across nodes.

The output is cut off, but it looks like you are looking at the iSCSI TPG's LUN value, which is internal to the local target. It is not exported to an initiator.
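For a quick cross-gateway comparison, the mapped_lun entries can be pulled out of "targetcli ls" output and diffed between nodes. A hypothetical Python sketch, assuming the mapped_lun lines follow the usual targetcli layout (e.g. "o- mapped_lun0 ..... [lun0 block/rbd.ansible8 (rw)]"):

# Sketch only: collect {mapped_lun id: backing store} from `targetcli ls` output.
# Run it on each gateway and compare the dictionaries; they should be identical.
import re
import subprocess

def mapped_luns():
    out = subprocess.run(["targetcli", "ls"],
                         capture_output=True, text=True).stdout
    pairs = {}
    # Matches lines such as: "o- mapped_lun0 ..... [lun0 block/rbd.ansible8 (rw)]"
    for m in re.finditer(r"mapped_lun(\d+)\s+\.*\s*\[lun\d+\s+(\S+)", out):
        pairs[int(m.group(1))] = m.group(2)
    return pairs

print(mapped_luns())   # e.g. {0: 'block/rbd.ansible8', 1: 'block/rbd.ansible10', ...}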

You can also verify on the initiator by doing

sg_inq /dev/sdX

through each path to a device and verifying that the same serial/uuid is returned.
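A small Python wrapper around that check, assuming sg3_utils is installed and that sg_inq's default output includes a "Unit serial number" line (the device paths below are examples only):

# Sketch: compare the unit serial number reported through each path of one
# multipath device; every path should return the same value.
import subprocess

def unit_serial(dev):
    out = subprocess.run(["sg_inq", dev], capture_output=True, text=True).stdout
    for line in out.splitlines():
        if "Unit serial number" in line:
            return line.split(":", 1)[1].strip()
    return None

paths = ["/dev/sdb", "/dev/sdc"]        # example sdX paths of a single mpath device
serials = {dev: unit_serial(dev) for dev in paths}
print(serials)
assert len(set(serials.values())) == 1, "paths disagree -> LUN id mismatch"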

Comment 10 Hemanth Kumar 2016-11-10 05:58:47 UTC
Thanks, Mike, for pointing me to the LUN section to verify.

Verified the mapped LUN IDs across the gateway nodes after reboot; the IDs are consistently maintained.

Comment 12 errata-xmlrpc 2016-11-22 23:41:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:2817

