Created attachment 996364 [details]
iscsi.tar.gz

Description of problem:
Failed to retrieve iSCSI LUNs after registering the host to RHEV-M.

Version-Release number of selected component (if applicable):
rhev-hypervisor7-7.1-20150226.0.el7ev
ovirt-node-3.2.1-7.el7.noarch
vdsm-4.16.12-2.el7ev.x86_64
RHEV-M vt13.11 (3.5.0-0.32.el6ev)

How reproducible:
100%

Steps to Reproduce:
1. Install RHEV-H on an iSCSI machine (multipath or single path) via PXE.
2. Register the host to RHEV-M.
3. Create a new iSCSI storage domain on the RHEV-M side.

Actual results:
Failed to retrieve iSCSI LUNs after registering to RHEV-M.

Expected results:
Creating the iSCSI storage domain succeeds.

Additional info:
No such issue with the previous version (rhev-hypervisor7-7.1-20150213.0.el7ev), so this is considered a regression.
Attached logs include:
/var/log/*.*
/tmp/ovirt.log
engine.log
sosreport-LogCollector (rhevm-log-collector --no-postgresql --host=rhevhip collect)
Created attachment 996365 [details] failed to retrieve iscsi lun.png
Considering this bug a blocker for 3.5.1: the iSCSI LUNs cannot be discovered via RHEV-M, while there is no such issue with the previous version (rhev-hypervisor7-7.1-20150213.0.el7ev, vdsm-4.16.8.1-6.el7ev.x86_64) noted in the bug description. Discovering iSCSI on RHEV-H manually works, but it fails via RHEV-M. Moving this bug to the vdsm component for further investigation. Thanks.

# iscsiadm --mode discoverydb --type sendtargets --portal 10.66.90.115 --discover
10.66.90.115:3260,1000 iqn.1992-08.com.netapp:sn.135053389
10.66.90.116:3260,1001 iqn.1992-08.com.netapp:sn.135053389

# iscsiadm --mode node --targetname iqn.1992-08.com.netapp:sn.135053389 --portal 10.66.90.115:3260 --login
Logging in to [iface: default, target: iqn.1992-08.com.netapp:sn.135053389, portal: 10.66.90.116,3260] (multiple)
Logging in to [iface: default, target: iqn.1992-08.com.netapp:sn.135053389, portal: 10.66.90.115,3260] (multiple)
Login to [iface: default, target: iqn.1992-08.com.netapp:sn.135053389, portal: 10.66.90.116,3260] successful.
Login to [iface: default, target: iqn.1992-08.com.netapp:sn.135053389, portal: 10.66.90.115,3260] successful.

<snip> engine.log
2015-02-27 05:26:04,291 ERROR [org.ovirt.engine.core.bll.storage.GetDeviceListQuery] (ajp-/127.0.0.1:8702-6) Query GetDeviceListQuery failed. Exception message is VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to GetDeviceListVDS, error = Error block device action: (), code = 600 (Failed with error BlockDeviceActionError and code 600) : org.ovirt.engine.core.common.errors.VdcBLLException: VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to GetDeviceListVDS, error = Error block device action: (), code = 600 (Failed with error BlockDeviceActionError and code 600): org.ovirt.engine.core.common.errors.VdcBLLException: VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to GetDeviceListVDS, error = Error block device action: (), code = 600 (Failed with error BlockDeviceActionError and code 600)
</snip>

For more logs, see iscsi.tar.gz in the attachments.
Aharon - have we seen this on RHEL-based VDSM, or is this just on RHEV-H?
Ying, have you tried to reproduce this with other storage servers, or just the netapp machine noted in the logs?
(In reply to Allon Mureinik from comment #5)
> Ying, have you tried to reproduce this with other storage servers, or just
> the netapp machine noted in the logs?

Yes - connecting NFS/FC/local storage works; only retrieving iSCSI LUNs fails.
(In reply to shaochen from comment #6)
> (In reply to Allon Mureinik from comment #5)
> > Ying, have you tried to reproduce this with other storage servers, or just
> > the netapp machine noted in the logs?
>
> Yes - connecting NFS/FC/local storage works; only retrieving iSCSI LUNs fails.

I meant: are you able to use iSCSI with a different storage machine? I.e., is iSCSI completely broken, or just unable to connect to this specific netapp?
This iSCSI storage works fine with the previous version (rhev-hypervisor7-7.1-20150213.0.el7ev); please see the Additional info part of my original description. So I guess this is not an issue with the iSCSI storage itself :)
(In reply to Allon Mureinik from comment #5)
> Ying, have you tried to reproduce this with other storage servers, or just
> the netapp machine noted in the logs?

Hey Allon,
I also tested this on a _dell_ iSCSI machine and still cannot retrieve iSCSI LUNs.

Thanks
Ying
Amit and I looked at the issue -

1. shaochen, the vdsm logs are from the wrong hour - please attach logs covering the correct time so we can perform RCA of the exact issue.

2. In the attached logs we noticed the following error:

Thread-39::DEBUG::2015-02-27 10:21:29,958::lvm::348::Storage.OperationMutex::(_reloadpvs) Operation 'lvm reload operation' released the operation mutex
Thread-39::ERROR::2015-02-27 10:21:29,976::task::866::Storage.TaskManager.Task::(_setError) Task=`645d79cc-bb3d-4a8b-90c1-6df4cc8b869f`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
  File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
  File "/usr/share/vdsm/storage/hsm.py", line 1970, in getDeviceList
  File "/usr/share/vdsm/storage/hsm.py", line 2002, in _getDeviceList
  File "/usr/share/vdsm/storage/multipath.py", line 360, in pathListIter
  File "/usr/share/vdsm/storage/iscsi.py", line 87, in getSessionInfo
  File "/usr/share/vdsm/supervdsm.py", line 50, in __call__
  File "/usr/share/vdsm/supervdsm.py", line 48, in <lambda>
  File "<string>", line 2, in readSessionInfo
  File "/usr/lib64/python2.7/multiprocessing/managers.py", line 773, in _callmethod
OSError: [Errno 2] No iscsi_host for session [1]

This error is raised by the code added in https://gerrit.ovirt.org/#/c/37312/ (BZ 1178838). It may have a few causes: an out-of-date package on RHEV-H, code that does not suit the RHEV-H packaging, or a problem in the code itself.

Amador, can you please take a look at that part?
Forgot to mention - it could be a configuration problem as well (pending Amador's answer before continuing to investigate).
The issue is triggered by the sessions using hardware iSCSI devices present on the system:

[root@hp-z600-03 iscsi]# iscsiadm -m session
qla4xxx: [1] 10.66.90.116:3260,1001 iqn.1992-08.com.netapp:sn.135053389 (flash)
qla4xxx: [2] 10.66.90.115:3260,1000 iqn.1992-08.com.netapp:sn.135053389 (flash)
^ (Qlogic QLA4XXX HBAs)
tcp: [4] 10.66.90.116:3260,1001 iqn.1992-08.com.netapp:sn.135053389 (non-flash)
tcp: [5] 10.66.90.115:3260,1000 iqn.1992-08.com.netapp:sn.135053389 (non-flash)

These sessions cause the code added in https://gerrit.ovirt.org/#/c/37312/ to raise an exception, because the "/sys/devices/platform/" structure is missing for hardware iSCSI sessions.

Testing session 1 (hardware iSCSI):

>>> import glob
>>> import os
>>>
>>> def getIscsiHostPath(sessionID):
...     pattern = '/sys/devices/platform/host*/session%s' % sessionID
...     for path in glob.iglob(pattern):
...         host = os.path.basename(os.path.dirname(path))
...         return '/sys/class/iscsi_host/' + host
...     raise OSError(-1, "No iscsi_host for session [%s]" % sessionID)
...
>>> print getIscsiHostPath(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 6, in getIscsiHostPath
OSError: [Errno -1] No iscsi_host for session [1]

Testing session 4 (software iSCSI), with the same function definition:

>>> print getIscsiHostPath(4)
/sys/class/iscsi_host/host9

The fix seems to be to skip "/sys/devices/platform/" when it is not present instead of raising an exception as we do currently. Sending an upstream patch for discussion.
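For reference, a minimal sketch of that direction - not necessarily the exact upstream patch - where the helper returns None for sessions that lack the /sys/devices/platform/ layout (hardware iSCSI HBAs such as the qla4xxx sessions above) and a hypothetical caller simply skips them:

import glob
import os

def getIscsiHostPath(sessionID):
    # Same lookup as above, but fall back to None instead of raising OSError
    # when the session has no /sys/devices/platform/host*/session<ID> entry.
    pattern = '/sys/devices/platform/host*/session%s' % sessionID
    for path in glob.iglob(pattern):
        host = os.path.basename(os.path.dirname(path))
        return '/sys/class/iscsi_host/' + host
    return None

def iterSoftwareIscsiHosts(sessionIDs):
    # Hypothetical caller: yield (sessionID, iscsi_host path) only for
    # software iSCSI sessions; hardware iSCSI sessions are skipped.
    for sessionID in sessionIDs:
        hostPath = getIscsiHostPath(sessionID)
        if hostPath is not None:
            yield sessionID, hostPath

Whether to skip such sessions entirely or to read their attributes from a different sysfs location for hardware iSCSI is exactly what the upstream discussion needs to settle.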
This bug can also be reproduced on RHEV-H 6.6 for RHEV 3.5.1 (UNSIGNED rhev-hypervisor6-6.6-20150312.0 build); the ovirt-node version is ovirt-node-3.2.1-10.el6.noarch.
(In reply to haiyang,dong from comment #17)
> This bug can also be reproduced on RHEV-H 6.6 for RHEV 3.5.1 (UNSIGNED
> rhev-hypervisor6-6.6-20150312.0 build); the ovirt-node version is
> ovirt-node-3.2.1-10.el6.noarch.

Since this bug is in the vdsm component, the vdsm version should be provided as well:

# rpm -q vdsm
vdsm-4.16.12-2.el6ev.x86_64
Hi,
We (RHEV storage QE) do not have such a card to verify with. Can you verify this once we have a fix?
Jen, can you review the attached patch, and advise about the best way to detect hardware iSCSI?
In Comment 20, Nir Soffer writes:
> Jen, can you review the attached patch, and advise about the best way to
> detect hardware iSCSI?

To find a system with an iSCSI hardware adapter, you'll need to issue a beaker query as follows:

Devices/Device_id is XXXX
Devices/Vendor_id is YYYY

where XXXX is the device ID of the adapter and YYYY is the vendor's ID. PCI device and vendor ID values are maintained in a repository at http://pciids.sourceforge.net/. Download the pci.ids file and search for iSCSI. Currently there are 17 matches, so that means 17 beaker searches if you want to look for each one.
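For illustration, a rough script that pulls those vendor/device ID pairs out of a local copy of pci.ids so they can be plugged into the beaker query format above. It assumes the standard pci.ids layout (vendor lines at column 0, device lines indented with a single tab); the script itself is only a sketch, not an official tool.

import re

def find_iscsi_devices(pciids_path='pci.ids'):
    # Scan a local pci.ids file and yield (vendor_id, device_id, name)
    # for every device entry whose name mentions iSCSI.
    vendor = None
    with open(pciids_path) as f:
        for line in f:
            if not line.strip() or line.startswith('#'):
                continue
            if not line.startswith('\t'):
                # Vendor line, e.g. "1077  QLogic Corp."
                m = re.match(r'^([0-9a-f]{4})\s+', line)
                if m:
                    vendor = m.group(1)
            elif not line.startswith('\t\t'):
                # Device line: one tab, 4-hex-digit device ID, then the name.
                m = re.match(r'^\t([0-9a-f]{4})\s+(.*)', line)
                if m and 'iscsi' in m.group(2).lower():
                    yield vendor, m.group(1), m.group(2).strip()

if __name__ == '__main__':
    for vend, dev, name in find_iscsi_devices():
        print('Vendor_id %s, Device_id %s  (%s)' % (vend, dev, name))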
Thanks Jeff! Amit, please follow Jeff's instructions and get a live session on one of these machines, so we can verify what hardware iSCSI looks like in sysfs.
Test version:
rhev-hypervisor7-7.2-20151025.0
ovirt-node-3.3.0-0.18.20151022git82dc52c.el7ev.noarch
vdsm-4.17.10-5.el7ev.noarch
RHEV-M 3.6.0.2-0.1.el6

Test steps:
1. Install RHEV-H 7.2 on an iSCSI machine via PXE.
2. Register the host to RHEV-M 3.6.
3. Try to create a new iSCSI storage domain on the RHEV-M side.

Test results:
RHEV-H can retrieve the iSCSI LUNs, but activating the iSCSI storage domain fails after registering to RHEV-M. Please see the new attachments for more details.

Hi aaviram,
It seems the original issue is gone, but a new issue has appeared. Can I verify this bug and report a new one, or should this bug be reassigned directly?

Thanks!
Created attachment 1089446 [details] failed to active iscsi
Created attachment 1089447 [details] failed-active-iscsi.tar.gz
From vdsm.log, Chen, it seems we have to report a new blocker bug.

<snip>
Thread-1017::DEBUG::2015-11-04 08:16:10,941::iscsiadm::97::Storage.Misc.excCmd::(_runCmd) FAILED: <err> = 'iscsiadm: Could not login to [iface: default, target: iqn.2001-05.com.equallogic:0-8a0906-3831f7d03-857f49b26655031e-s1-gouyang-165404-02, portal: 10.66.90.100,3260].\niscsiadm: initiator reported error (24 - iSCSI login failed due to authorization failure)\niscsiadm: Could not log into all portals\n'; <rc> = 24
Thread-1017::DEBUG::2015-11-04 08:16:10,941::iscsiadm::97::Storage.Misc.excCmd::(_runCmd) /usr/bin/sudo -n /sbin/iscsiadm -m iface (cwd None)
Thread-1017::DEBUG::2015-11-04 08:16:11,021::iscsiadm::97::Storage.Misc.excCmd::(_runCmd) SUCCESS: <err> = ''; <rc> = 0
Thread-1017::DEBUG::2015-11-04 08:16:11,022::iscsiadm::97::Storage.Misc.excCmd::(_runCmd) /usr/bin/sudo -n /sbin/iscsiadm -m node -T iqn.2001-05.com.equallogic:0-8a0906-3831f7d03-857f49b26655031e-s1-gouyang-165404-02 -I default -p 10.66.90.100:3260,1 -u (cwd None)
mailbox.SPMMonitor::DEBUG::2015-11-04 08:16:11,040::storage_mailbox::735::Storage.Misc.excCmd::(_checkForMail) dd if=/rhev/data-center/144962d8-1eae-4e04-ae02-4fb63320cc31/mastersd/dom_md/inbox iflag=direct,fullblock count=1 bs=1024000 (cwd None)
mailbox.SPMMonitor::DEBUG::2015-11-04 08:16:11,075::storage_mailbox::735::Storage.Misc.excCmd::(_checkForMail) SUCCESS: <err> = '1+0 records in\n1+0 records out\n1024000 bytes (1.0 MB) copied, 0.0131026 s, 78.2 MB/s\n'; <rc> = 0
Thread-1017::DEBUG::2015-11-04 08:16:11,104::iscsiadm::97::Storage.Misc.excCmd::(_runCmd) FAILED: <err> = 'iscsiadm: No matching sessions found\n'; <rc> = 21
Thread-1017::DEBUG::2015-11-04 08:16:11,104::iscsiadm::97::Storage.Misc.excCmd::(_runCmd) /usr/bin/sudo -n /sbin/iscsiadm -m iface (cwd None)
Thread-1017::DEBUG::2015-11-04 08:16:11,182::iscsiadm::97::Storage.Misc.excCmd::(_runCmd) SUCCESS: <err> = ''; <rc> = 0
Thread-1017::DEBUG::2015-11-04 08:16:11,182::iscsiadm::97::Storage.Misc.excCmd::(_runCmd) /usr/bin/sudo -n /sbin/iscsiadm -m node -T iqn.2001-05.com.equallogic:0-8a0906-3831f7d03-857f49b26655031e-s1-gouyang-165404-02 -I default -p 10.66.90.100:3260,1 --op=delete (cwd None)
Thread-1017::DEBUG::2015-11-04 08:16:11,261::iscsiadm::97::Storage.Misc.excCmd::(_runCmd) SUCCESS: <err> = ''; <rc> = 0
Thread-1017::INFO::2015-11-04 08:16:11,261::iscsi::564::Storage.ISCSI::(setRpFilterIfNeeded) iSCSI iface.net_ifacename not provided. Skipping.
Thread-1017::ERROR::2015-11-04 08:16:11,261::hsm::2465::Storage.HSM::(connectStorageServer) Could not connect to storageServer
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/hsm.py", line 2462, in connectStorageServer
    conObj.connect()
  File "/usr/share/vdsm/storage/storageServer.py", line 473, in connect
    iscsi.addIscsiNode(self._iface, self._target, self._cred)
  File "/usr/share/vdsm/storage/iscsi.py", line 201, in addIscsiNode
    iscsiadm.node_login(iface.name, portalStr, target.iqn)
  File "/usr/share/vdsm/storage/iscsiadm.py", line 312, in node_login
    raise IscsiAuthenticationError(rc, out, err)
IscsiAuthenticationError: (24, ['Logging in to [iface: default, target: iqn.2001-05.com.equallogic:0-8a0906-3831f7d03-857f49b26655031e-s1-gouyang-165404-02, portal: 10.66.90.100,3260] (multiple)'], ['iscsiadm: Could not login to [iface: default, target: iqn.2001-05.com.equallogic:0-8a0906-3831f7d03-857f49b26655031e-s1-gouyang-165404-02, portal: 10.66.90.100,3260].', 'iscsiadm: initiator reported error (24 - iSCSI login failed due to authorization failure)', 'iscsiadm: Could not log into all portals'])
Thread-1017::DEBUG::2015-11-04 08:16:11,262::hsm::2489::Storage.HSM::(connectStorageServer) knownSDs: {82ed3d80-19e2-4870-84db-7c2796d3e5c9: storage.blockSD.findDomain}
</snip>
It looks like the bug you've found is in the same area as this one - however, it is a different bug with a different solution. I think it is worth opening a new bug for it.
(In reply to Amit Aviram from comment #31)
> It looks like the bug you've found is in the same area as this one - however,
> it is a different bug with a different solution. I think it is worth opening
> a new bug for it.

Reported new bug 1278321 to track the new issue. According to #c27 & #c31, the original issue is gone, so this bug is fixed; changing the status to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0362.html