Bug 1197292 - [3.5-6.6/7.1] Failed to retrieve iscsi lun from hardware iscsi after register to RHEV-M
Summary: [3.5-6.6/7.1] Failed to retrieve iscsi lun from hardware iscsi after register...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.5.0
Hardware: Unspecified
OS: Unspecified
urgent
urgent
Target Milestone: ovirt-3.6.0-rc
: 3.6.0
Assignee: Amit Aviram
QA Contact: cshao
URL:
Whiteboard:
Depends On:
Blocks: 1209108
TreeView+ depends on / blocked
 
Reported: 2015-02-28 03:33 UTC by cshao
Modified: 2016-03-09 19:32 UTC (History)
26 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1209108 (view as bug list)
Environment:
Last Closed: 2016-03-09 19:32:26 UTC
oVirt Team: Storage
Target Upstream Version:


Attachments (Terms of Use)
iscsi.tar.gz (11.77 MB, application/x-gzip)
2015-02-28 03:33 UTC, cshao
no flags Details
failed to retrieve iscsi lun.png (31.74 KB, image/png)
2015-02-28 03:36 UTC, cshao
no flags Details
failed to active iscsi (21.32 KB, image/png)
2015-11-04 08:21 UTC, cshao
no flags Details
failed-active-iscsi.tar.gz (978.07 KB, application/x-gzip)
2015-11-04 08:25 UTC, cshao
no flags Details


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2016:0362 normal SHIPPED_LIVE vdsm 3.6.0 bug fix and enhancement update 2016-03-09 23:49:32 UTC
oVirt gerrit 38354 master MERGED iscsi: Fix iscsi_host path lookup for hardware iSCSI. Never
oVirt gerrit 39555 ovirt-3.5 MERGED iscsi: Fix iscsi_host path lookup for hardware iSCSI. Never

Description cshao 2015-02-28 03:33:51 UTC
Created attachment 996364 [details]
iscsi.tar.gz

Description of problem:
Failed to retrieve iscsi lun after register to RHEV-M

Version-Release number of selected component (if applicable):
rhev-hypervisor7-7.1-20150226.0.el7ev
ovirt-node-3.2.1-7.el7.noarch
vdsm-4.16.12-2.el7ev.x86_64
RHEV-M vt13.11 (3.5.0-0.32.el6ev)

How reproducible:
100%

Steps to Reproduce:
1. Install RHEV-H on iscsi machine(multipath or single path) via pxe.
2. Register to RHEV-M
3. Create new iscsi storage in RHEVM side.

Actual results:
Failed to retrieve iscsi lun after register to RHEV-M

Expected results:
Create iscsi storage can succeed.

Additional info:
No such issue with previous version(rhev-hypervisor7-7.1-20150213.0.el7ev), so consider this is a regression bug.

All log info include:
/var/log/*.*
/tmp/ovirt.log
engine.log
sosreport-LogCollector (rhevm-log-collector --no-postgresql --host=rhevhip collect)

Comment 1 cshao 2015-02-28 03:36:18 UTC
Created attachment 996365 [details]
failed to retrieve iscsi lun.png

Comment 2 Ying Cui 2015-02-28 08:59:59 UTC
This bug consider a blocker 3.5.1, the iscsi luns can not be discovered via rhevm. And no such issue with previous version(rhev-hypervisor7-7.1-20150213.0.el7ev,vdsm-4.16.8.1-6.el7ev.x86_64) in bug description.
Try to discover iscsi on rhevh manually, the iscsi can be discovery, but failed via rhevm. Move this bug to vdsm component to investigation more. Thanks.

# iscsiadm --mode discoverydb --type sendtargets --portal 10.66.90.115 --discover
10.66.90.115:3260,1000 iqn.1992-08.com.netapp:sn.135053389
10.66.90.116:3260,1001 iqn.1992-08.com.netapp:sn.135053389

# iscsiadm --mode node --targetname iqn.1992-08.com.netapp:sn.135053389 --portal :3260 10.66.90.115 --login
Logging in to [iface: default, target: iqn.1992-08.com.netapp:sn.135053389, portal: 10.66.90.116,3260] (multiple)
Logging in to [iface: default, target: iqn.1992-08.com.netapp:sn.135053389, portal: 10.66.90.115,3260] (multiple)
Login to [iface: default, target: iqn.1992-08.com.netapp:sn.135053389, portal: 10.66.90.116,3260] successful.
Login to [iface: default, target: iqn.1992-08.com.netapp:sn.135053389, portal: 10.66.90.115,3260] successful.

<snip>
engine.log
2015-02-27 05:26:04,291 ERROR [org.ovirt.engine.core.bll.storage.GetDeviceListQuery] (ajp-/127.0.0.1:8702-6) Query GetDeviceListQuery failed. Exception message is VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to GetDeviceListVDS, error = Error block device action: (), code = 600 (Failed with error BlockDeviceActionError and code 600) : org.ovirt.engine.core.common.errors.VdcBLLException: VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to GetDeviceListVDS, error = Error block device action: (), code = 600 (Failed with error BlockDeviceActionError and code 600): org.ovirt.engine.core.common.errors.VdcBLLException: VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to GetDeviceListVDS, error = Error block device action: (), code = 600 (Failed with error BlockDeviceActionError and code 600)
</snip>

more log, you can check iscsi.tar.gz in attachment.

Comment 4 Allon Mureinik 2015-03-02 05:21:34 UTC
Aharon - have we seen this on RHEL based VDSM, or is this just for RHEVH?

Comment 5 Allon Mureinik 2015-03-02 05:22:16 UTC
Ying, have you tried to reproduce this with other storage servers, or just the netapp machine noted in the logs?

Comment 6 cshao 2015-03-02 05:33:02 UTC
(In reply to Allon Mureinik from comment #5)
> Ying, have you tried to reproduce this with other storage servers, or just
> the netapp machine noted in the logs?

Yes, connect NFS/FC/local storage can successful. just failed to retrieve iscsi lun.

Comment 7 Allon Mureinik 2015-03-02 05:41:06 UTC
(In reply to shaochen from comment #6)
> (In reply to Allon Mureinik from comment #5)
> > Ying, have you tried to reproduce this with other storage servers, or just
> > the netapp machine noted in the logs?
> 
> Yes, connect NFS/FC/local storage can successful. just failed to retrieve
> iscsi lun.
I meant are you able to use iSCSI with a different storage machine? I.e., is iSCSI completely broken, or juts unable to connect to this specific netapp?

Comment 8 cshao 2015-03-02 05:50:55 UTC
This iSCSI can work fine on previous version(rhev-hypervisor7-7.1-20150213.0.el7ev), please see my original description(Additional info part), so I guess this should not be iscsi storage issue :)

Comment 9 Ying Cui 2015-03-02 09:31:41 UTC
(In reply to Allon Mureinik from comment #5)
> Ying, have you tried to reproduce this with other storage servers, or just
> the netapp machine noted in the logs?

Hey Allon, tested this issue on _dell_ iSCSI machine, still can not retrieve iscsi lun. 

Thanks
Ying

Comment 10 Liron Aravot 2015-03-02 13:53:55 UTC
Amit and I looked at the issue - 

1. shaochen, the vdsm logs are from incorrect hour - please attach the logs from the correct hour so we can perform RCA of the exact issue.

2. in the attached logs we've noticed the following error:

Thread-39::DEBUG::2015-02-27 10:21:29,958::lvm::348::Storage.OperationMutex::(_reloadpvs) Operation 'lvm reload operation' released the operation mutex
Thread-39::ERROR::2015-02-27 10:21:29,976::task::866::Storage.TaskManager.Task::(_setError) Task=`645d79cc-bb3d-4a8b-90c1-6df4cc8b869f`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
  File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
  File "/usr/share/vdsm/storage/hsm.py", line 1970, in getDeviceList
  File "/usr/share/vdsm/storage/hsm.py", line 2002, in _getDeviceList
  File "/usr/share/vdsm/storage/multipath.py", line 360, in pathListIter
  File "/usr/share/vdsm/storage/iscsi.py", line 87, in getSessionInfo
  File "/usr/share/vdsm/supervdsm.py", line 50, in __call__
  File "/usr/share/vdsm/supervdsm.py", line 48, in <lambda>
  File "<string>", line 2, in readSessionInfo
  File "/usr/lib64/python2.7/multiprocessing/managers.py", line 773, in _callmethod
OSError: [Errno 2] No iscsi_host for session [1]

which is raised by the code added in https://gerrit.ovirt.org/#/c/37312/ (BZ 1178838). which may be caused by few reasons:
unupdated package in RHEVH/code unsuitable to RHEVH packages/problem in the code.

Amador, can you please take a look on that part?

Comment 11 Liron Aravot 2015-03-02 14:02:14 UTC
forgot to mention - it could be a configuration problem as well (pending for amador answer before continue to investigate).

Comment 15 Amador Pahim 2015-03-03 13:55:27 UTC
The issue is triggered by the sessions using hardware iscsi devices present on the system:

[root@hp-z600-03 iscsi]# iscsiadm -m session
qla4xxx: [1] 10.66.90.116:3260,1001 iqn.1992-08.com.netapp:sn.135053389 (flash)
qla4xxx: [2] 10.66.90.115:3260,1000 iqn.1992-08.com.netapp:sn.135053389 (flash)
^ (Qlogic QLA4XXX HBAs)

tcp: [4] 10.66.90.116:3260,1001 iqn.1992-08.com.netapp:sn.135053389 (non-flash)
tcp: [5] 10.66.90.115:3260,1000 iqn.1992-08.com.netapp:sn.135053389 (non-flash)

They are causing the code in https://gerrit.ovirt.org/#/c/37312/ to fall in exception due to the missing "/sys/devices/platform/" structure in the hardware iscsi:

Testing session 1 (hardware iscsi):

>>> import glob
>>> import os
>>> 
>>> def getIscsiHostPath(sessionID):
...     pattern = '/sys/devices/platform/host*/session%s' % sessionID
...     for path in glob.iglob(pattern):
...         host = os.path.basename(os.path.dirname(path))
...         return '/sys/class/iscsi_host/' + host
...     raise OSError(-1, "No iscsi_host for session [%s]" % sessionID)
... 
>>> 
>>> print getIscsiHostPath(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 6, in getIscsiHostPath
OSError: [Errno -1] No iscsi_host for session [1]

Testing session 4 (software iscsi):

>>> import glob
>>> import os
>>> 
>>> def getIscsiHostPath(sessionID):
...     pattern = '/sys/devices/platform/host*/session%s' % sessionID
...     for path in glob.iglob(pattern):
...         host = os.path.basename(os.path.dirname(path))
...         return '/sys/class/iscsi_host/' + host
...     raise OSError(-1, "No iscsi_host for session [%s]" % sessionID)
... 
>>> 
>>> print getIscsiHostPath(4)
/sys/class/iscsi_host/host9


The fix seems to be to skip the "/sys/devices/platform/" when it's not present instead of raise an exception as we do currently. Sending an upstream patch for discussion.

Comment 17 haiyang,dong 2015-03-16 08:38:26 UTC
Also could reproduce this bug in RHEV-H 6.6 for RHEV 3.5.1 UNSIGNED rhev-hypervisor6-6.6-20150312.0 build, ovirt-node version is ovirt-node-3.2.1-10.el6.noarch

Comment 18 Ying Cui 2015-03-17 06:00:53 UTC
(In reply to haiyang,dong from comment #17)
> Also could reproduce this bug in RHEV-H 6.6 for RHEV 3.5.1 UNSIGNED
> rhev-hypervisor6-6.6-20150312.0 build, ovirt-node version is
> ovirt-node-3.2.1-10.el6.noarch

This bug is in vdsm component, should provide vdsm version as well.
# rpm -q vdsm
vdsm-4.16.12-2.el6ev.x86_64

Comment 19 Aharon Canan 2015-03-30 11:56:48 UTC
Hi, 

We (RHEV storage QE) do not have such card to verify, Can you verify this once we will have a fix?

Comment 20 Nir Soffer 2015-03-30 17:42:24 UTC
Jen, can you review the attach patch, and advice about the best way to detect hardware iscsi?

Comment 21 Jeff Nelson 2015-03-30 19:06:09 UTC
In Comment 20, Nir Soffer writes:
>Jen, can you review the attach patch, and advice about the best way to detect >hardware iscsi?

To find a system with an iSCSI hardware adapter, you'll need to issue a beaker query as follows:

    Devices/Device_id is XXXX
    Devices/Vendor_id is YYYY

where XXXX is the device ID of the adapter and YYYY is the vendor's ID.

PCI device and vendor id values are maintained in a repository at http://pciids.sourceforge.net/. Download the pci.ids file and search for iSCSI. Currently there are 17 matches, so that means 17 beaker searches if you want to look for each one.

Comment 23 Nir Soffer 2015-03-30 19:12:11 UTC
Thanks Jeff!

Amit, please follow Jef instructions and get a live session on one of these machines, so we can verify how hardware iscsi looks like when looking at sysfs.

Comment 27 cshao 2015-11-04 08:15:56 UTC
Test version:
rhev-hypervisor7-7.2-20151025.0
ovirt-node-3.3.0-0.18.20151022git82dc52c.el7ev.noarch
vdsm-4.17.10-5.el7ev.noarch
RHEV-M 3.6.0.2-0.1.el6

Test steps:
1. Install RHEV-H 7.2 on iscsi machine via pxe.
2. Register to RHEV-M 3.6
3. Try to create new iscsi storage in RHEVM side.

Test results:
RHEV-H can retrieve iscsi lun, but failed to activate iscsi storage after register to RHEV-M.


Please see new attachment for more details.

Hi aaviram,

  Seem the original issue is gone, but the new issue is coming, can I verify this bug and report a new one? or assigned this bug directly?

Thanks!

Comment 28 cshao 2015-11-04 08:21:31 UTC
Created attachment 1089446 [details]
failed to active iscsi

Comment 29 cshao 2015-11-04 08:25:38 UTC
Created attachment 1089447 [details]
failed-active-iscsi.tar.gz

Comment 30 Ying Cui 2015-11-04 09:38:41 UTC
From vdsm.log, Chen, it seems we have to report new blocker bug.

<snip>
Thread-1017::DEBUG::2015-11-04 08:16:10,941::iscsiadm::97::Storage.Misc.excCmd::(_runCmd) FAILED: <err> = 'iscsiadm: Could not login to [iface: default, target: iqn.2001-05.com.equallogic:0-8a0906-3831f7d03-857f49b26655031e-s1-gouyang-165404-02, portal: 10.66.90.100,3260].\niscsiadm: initiator reported error (24 - iSCSI login failed due to authorization failure)\niscsiadm: Could not log into all portals\n'; <rc> = 24
Thread-1017::DEBUG::2015-11-04 08:16:10,941::iscsiadm::97::Storage.Misc.excCmd::(_runCmd) /usr/bin/sudo -n /sbin/iscsiadm -m iface (cwd None)
Thread-1017::DEBUG::2015-11-04 08:16:11,021::iscsiadm::97::Storage.Misc.excCmd::(_runCmd) SUCCESS: <err> = ''; <rc> = 0
Thread-1017::DEBUG::2015-11-04 08:16:11,022::iscsiadm::97::Storage.Misc.excCmd::(_runCmd) /usr/bin/sudo -n /sbin/iscsiadm -m node -T iqn.2001-05.com.equallogic:0-8a0906-3831f7d03-857f49b26655031e-s1-gouyang-165404-02 -I default -p 10.66.90.100:3260,1 -u (cwd None)
mailbox.SPMMonitor::DEBUG::2015-11-04 08:16:11,040::storage_mailbox::735::Storage.Misc.excCmd::(_checkForMail) dd if=/rhev/data-center/144962d8-1eae-4e04-ae02-4fb63320cc31/mastersd/dom_md/inbox iflag=direct,fullblock count=1 bs=1024000 (cwd None)
mailbox.SPMMonitor::DEBUG::2015-11-04 08:16:11,075::storage_mailbox::735::Storage.Misc.excCmd::(_checkForMail) SUCCESS: <err> = '1+0 records in\n1+0 records out\n1024000 bytes (1.0 MB) copied, 0.0131026 s, 78.2 MB/s\n'; <rc> = 0
Thread-1017::DEBUG::2015-11-04 08:16:11,104::iscsiadm::97::Storage.Misc.excCmd::(_runCmd) FAILED: <err> = 'iscsiadm: No matching sessions found\n'; <rc> = 21
Thread-1017::DEBUG::2015-11-04 08:16:11,104::iscsiadm::97::Storage.Misc.excCmd::(_runCmd) /usr/bin/sudo -n /sbin/iscsiadm -m iface (cwd None)
Thread-1017::DEBUG::2015-11-04 08:16:11,182::iscsiadm::97::Storage.Misc.excCmd::(_runCmd) SUCCESS: <err> = ''; <rc> = 0
Thread-1017::DEBUG::2015-11-04 08:16:11,182::iscsiadm::97::Storage.Misc.excCmd::(_runCmd) /usr/bin/sudo -n /sbin/iscsiadm -m node -T iqn.2001-05.com.equallogic:0-8a0906-3831f7d03-857f49b26655031e-s1-gouyang-165404-02 -I default -p 10.66.90.100:3260,1 --op=delete (cwd None)
Thread-1017::DEBUG::2015-11-04 08:16:11,261::iscsiadm::97::Storage.Misc.excCmd::(_runCmd) SUCCESS: <err> = ''; <rc> = 0
Thread-1017::INFO::2015-11-04 08:16:11,261::iscsi::564::Storage.ISCSI::(setRpFilterIfNeeded) iSCSI iface.net_ifacename not provided. Skipping.
Thread-1017::ERROR::2015-11-04 08:16:11,261::hsm::2465::Storage.HSM::(connectStorageServer) Could not connect to storageServer
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/hsm.py", line 2462, in connectStorageServer
    conObj.connect()
  File "/usr/share/vdsm/storage/storageServer.py", line 473, in connect
    iscsi.addIscsiNode(self._iface, self._target, self._cred)
  File "/usr/share/vdsm/storage/iscsi.py", line 201, in addIscsiNode
    iscsiadm.node_login(iface.name, portalStr, target.iqn)
  File "/usr/share/vdsm/storage/iscsiadm.py", line 312, in node_login
    raise IscsiAuthenticationError(rc, out, err)
IscsiAuthenticationError: (24, ['Logging in to [iface: default, target: iqn.2001-05.com.equallogic:0-8a0906-3831f7d03-857f49b26655031e-s1-gouyang-165404-02, portal: 10.66.90.100,3260] (multiple)'], ['iscsiadm: Could not login to [iface: default, target: iqn.2001-05.com.equallogic:0-8a0906-3831f7d03-857f49b26655031e-s1-gouyang-165404-02, portal: 10.66.90.100,3260].', 'iscsiadm: initiator reported error (24 - iSCSI login failed due to authorization failure)', 'iscsiadm: Could not log into all portals'])
Thread-1017::DEBUG::2015-11-04 08:16:11,262::hsm::2489::Storage.HSM::(connectStorageServer) knownSDs: {82ed3d80-19e2-4870-84db-7c2796d3e5c9: storage.blockSD.findDomain}

</snip>

Comment 31 Amit Aviram 2015-11-04 12:41:27 UTC
It looks like the bug you've found is in the same area as this one- however it is a different bug, with a different solution. I think it worth opening a new bug for it.

Comment 32 cshao 2015-11-05 09:29:08 UTC
(In reply to Amit Aviram from comment #31)
> It looks like the bug you've found is in the same area as this one- however
> it is a different bug, with a different solution. I think it worth opening a
> new bug for it.

Report a new bug 1278321 can trace the new issue.
According #c27 & 31, the original issue is gone, so the bug is fixed, change bug status to VERIFIED.

Comment 34 errata-xmlrpc 2016-03-09 19:32:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0362.html


Note You need to log in before you can comment on or make changes to this bug.