Bug 1391510

Summary: [RBD:ISCSI]:-Few existing LUNs are corrupted after doing addition and removal of new LUNs through ceph-iscsi-ansible
Product: Red Hat Ceph Storage
Reporter: shylesh <shmohan>
Component: Documentation
Assignee: Aron Gunn <agunn>
Status: CLOSED CURRENTRELEASE
QA Contact: Tejas <tchandra>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 2.1
CC: agunn, ceph-eng-bugs, ceph-qe-bugs, hnallurv, kdreyer, mchristi, nlevine, pcuzner, tchandra
Target Milestone: rc
Keywords: Reopened
Target Release: 2.1
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-11-28 09:38:38 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description shylesh 2016-11-03 13:03:12 UTC
Description of problem:
A few existing LUNs are corrupted after new LUNs are added and then removed through ceph-iscsi-ansible.

Version-Release number of selected component (if applicable):

ceph-iscsi-ansible-1.5-1.el7test.noarch
ceph-iscsi-config-1.5-1.el7test.noarch

ceph: 10.2.3-12.el7cp.x86_64

Steps to Reproduce:
1. Configured 30 iSCSI LUNs on a 3:1 gateway setup.

2. Mounted 15 LUNs on a RHEL 7.3 iSCSI initiator and ran I/O on a few of them.

3. While I/O was in progress, added 30 more LUNs using ceph-iscsi-ansible.

4. On the initiator side, the newly created LUNs were discovered while I/O continued to run successfully on the existing LUN mounts.

5. Removed the newly added LUNs using ceph-iscsi-ansible.
   Note: The removal method was to set the status of the client to "absent" in the "client_connections" section.
   Example: - { client: 'iqn.1994-05.com.redhat:f3ca348ccc5b', image_list: 'rbd.ansible46,rbd.ansible47,rbd.ansible48,rbd.ansible49,rbd.ansible50,rbd.ansible51,rbd.ansible52,rbd.ansible53,rbd.ansible54,rbd.ansible55,rbd.ansible56,rbd.ansible57,rbd.ansible58,rbd.ansible59,rbd.ansible60', chap:  'admin/redhat', status: 'absent' }

6. Performed an iSCSI session rescan after the removal.

Actual results:

1. After the removal, some of the existing LUNs throw I/O errors:
[ubuntu@magna048 mpathm]$ ls
ls: cannot open directory .: Input/output error

2. Some of the mounted LUNs disappeared after the rescan.

From dmesg:
===========
[526559.662821] XFS (dm-3): xfs_log_force: error -5 returned.
[526589.743377] XFS (dm-3): xfs_log_force: error -5 returned.
[526619.823932] XFS (dm-3): xfs_log_force: error -5 returned.
[526649.904499] XFS (dm-3): xfs_log_force: error -5 returned.
[526679.985053] XFS (dm-3): xfs_log_force: error -5 returned.
[526710.065614] XFS (dm-3): xfs_log_force: error -5 returned.
[526740.146196] XFS (dm-3): xfs_log_force: error -5 returned.
[526770.226732] XFS (dm-3): xfs_log_force: error -5 returned.
[526800.307297] XFS (dm-3): xfs_log_force: error -5 returned.
[526830.387842] XFS (dm-3): xfs_log_force: error -5 returned.
[526835.947415] blk_update_request: 111 callbacks suppressed
[526835.952826] blk_update_request: I/O error, dev dm-0, sector 0
[526835.958838] blk_update_request: I/O error, dev dm-0, sector 20971392
[526835.965405] blk_update_request: I/O error, dev dm-0, sector 20971504
[526835.971899] blk_update_request: I/O error, dev dm-0, sector 0
[526835.977763] blk_update_request: I/O error, dev dm-0, sector 8
[526835.983815] blk_update_request: I/O error, dev dm-0, sector 0
[526836.004535] blk_update_request: I/O error, dev dm-2, sector 0
[526836.010475] blk_update_request: I/O error, dev dm-2, sector 62914432
[526836.016964] blk_update_request: I/O error, dev dm-2, sector 62914544
[526836.023521] blk_update_request: I/O error, dev dm-2, sector 0

Comment 3 Paul Cuzner 2016-11-03 21:06:49 UTC
This is not how LUNs are removed from a client.

The LUNs presented to a client are listed in the image_list variable. To remove LUNs, simply update image_list by removing the required entries.

The status field on the client definition is the status of the WHOLE client definition on the gateways, i.e. you only set status=absent to remove the client from the gateway configuration.

This is covered in the deployment guide.

Please retest using the correct process.
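
As a minimal sketch of the process described in this comment: removing LUNs from a client would mean trimming image_list while leaving the client itself defined. The entry format is taken from the example in the description. The value status: 'present' is assumed to be the counterpart of 'absent', and the image names are a shortened, illustrative subset rather than the full list used in the test.

```yaml
# Illustrative fragment of the client_connections section (not the full test config).
client_connections:
  # Before (shown as a comment): the client is mapped to three images.
  # - { client: 'iqn.1994-05.com.redhat:f3ca348ccc5b', image_list: 'rbd.ansible46,rbd.ansible47,rbd.ansible48', chap: 'admin/redhat', status: 'present' }

  # After: rbd.ansible47 and rbd.ansible48 are removed by editing image_list only.
  # status is left as 'present' (assumed value) so the client definition stays on the gateways.
  - { client: 'iqn.1994-05.com.redhat:f3ca348ccc5b', image_list: 'rbd.ansible46', chap: 'admin/redhat', status: 'present' }
```

By contrast, setting status: 'absent' on this entry, as in step 5 of the description, asks for the whole client definition to be removed from the gateway configuration rather than individual LUNs.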

Comment 5 shylesh 2016-11-07 11:27:52 UTC
After following Paul's and Mike's suggestion, this is no longer seen, so I am marking this as not a bug.

Comment 6 shylesh 2016-11-07 13:48:22 UTC
(In reply to shylesh from comment #5)
> After following Paul's and Mike's suggestion, this is no longer seen, so I am
> marking this as not a bug.

I later realised that the LUNs are reshuffled, and that LUNs which were properly mounted and not removed are throwing I/O errors. Please refer to bug https://bugzilla.redhat.com/show_bug.cgi?id=1392378 for more information.

Sorry for the inconvenience; reopening the bug.

Comment 7 Mike Christie 2016-11-07 16:29:17 UTC
In this test, are you removing an image from the client's image_list or marking the client absent?

Comment 13 Tejas 2016-11-09 10:16:25 UTC
The changes look good to me.

Mike,
  Could you take a look at the warning?

Thanks,
Tejas

Comment 15 Tejas 2016-11-10 04:40:11 UTC
Thanks Mike.