Created attachment 1247983 [details]
/var/log/ from hypervisor and engine.log

Description of problem:
Installed a clean RHV 4.1.0-4 and tried to add an iSCSI domain. The hosts are exposed to a few LUNs via 4 XtremIO iSCSI portals. Discovered and logged in to all targets from Webadmin, but no LUNs were displayed. In vdsm.log, multipath failed to access the exposed volumes with an InaccessiblePhysDev exception.

Version-Release number of selected component (if applicable):
vdsm-4.19.4-1.el7ev.x86_64
vdsm-python-4.19.4-1.el7ev.noarch
vdsm-cli-4.19.4-1.el7ev.noarch
vdsm-hook-ethtool-options-4.19.4-1.el7ev.noarch
vdsm-jsonrpc-4.19.4-1.el7ev.noarch
vdsm-hook-vmfex-dev-4.19.4-1.el7ev.noarch
vdsm-api-4.19.4-1.el7ev.noarch
vdsm-yajsonrpc-4.19.4-1.el7ev.noarch
vdsm-xmlrpc-4.19.4-1.el7ev.noarch
device-mapper-event-libs-1.02.135-1.el7_3.2.x86_64
device-mapper-libs-1.02.135-1.el7_3.2.x86_64
device-mapper-event-1.02.135-1.el7_3.2.x86_64
device-mapper-multipath-0.4.9-99.el7_3.1.x86_64
device-mapper-1.02.135-1.el7_3.2.x86_64
device-mapper-multipath-libs-0.4.9-99.el7_3.1.x86_64
device-mapper-persistent-data-0.6.3-1.el7.x86_64
iscsi-initiator-utils-6.2.0.873-35.el7.x86_64
iscsi-initiator-utils-iscsiuio-6.2.0.873-35.el7.x86_64
libiscsi-1.9.0-7.el7.x86_64

How reproducible:

Steps to Reproduce:
1. Add an iSCSI domain in a newly installed environment

Actual results:
No LUNs are presented after discovery and login were performed with no error in Webadmin. No iSCSI discoverSendTargets or iSCSI login in vdsm.log for the following connectStorageServer, as there should be.
2017-02-06 01:38:04,917 INFO (jsonrpc/3) [dispatcher] Run and protect: connectStorageServer(domType=3, spUUID='00000000-0000-0000-0000-000000000000', conList=[{'id': '00000000-0000-0000-0000-000000000000', 'connection': '10.35.146.129', 'iqn': 'iqn.2008-05.com.xtremio:xio00153500071-514f0c50023f6c00', 'user': u'', 'tpgt': '1', 'password': '********', 'port': '3260'}], options=None) (logUtils:49)
2017-02-06 01:38:04,977 INFO (jsonrpc/3) [storage.ISCSI] iSCSI iface.net_ifacename not provided. Skipping. (iscsi:590)
2017-02-06 01:38:05,072 INFO (jsonrpc/1) [dispatcher] Run and protect: repoStats(options=None) (logUtils:49)

Multipath exception:

2017-02-06 01:38:11,850 INFO (check/loop) [storage.SANLock] Acquiring host id for domain db9dad41-fd69-4904-a3fc-d51fd315dbb2 (id: 1) (clusterlock:282)
2017-02-06 01:38:12,049 WARN (jsonrpc/6) [storage.LVM] lvm pvs failed: 5 [] ['  WARNING: Not using lvmetad because config setting use_lvmetad=0.', '  WARNING: To avoid corruption, rescan devices to make changes visible (pvscan --cache).', '  Failed to find device for physical volume "/dev/mapper/3514f0c5a516008d4".'] (lvm:324)
2017-02-06 01:38:12,049 WARN (jsonrpc/6) [storage.HSM] getPV failed for guid: 3514f0c5a516008d4 (hsm:1972)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/hsm.py", line 1969, in _getDeviceList
    pv = lvm.getPV(guid)
  File "/usr/share/vdsm/storage/lvm.py", line 856, in getPV
    raise se.InaccessiblePhysDev((pvName,))
InaccessiblePhysDev: Multipath cannot access physical device(s): "devices=('3514f0c5a516008d4',)"

'multipath -ll' output is empty.

Workaround:
After rebooting the hosts, the devices get exposed.

Expected results:
LUNs should be visible

Additional info:
/var/log/ from hypervisor and engine.log
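For illustration, the check that fails above can be sketched in a few lines (the helper names below are hypothetical, not vdsm's actual API): vdsm expects each LUN's multipath GUID to appear as a device-mapper node under /dev/mapper, and raises InaccessiblePhysDev when the node is missing, which is exactly the traceback shown.

```python
import os

DEV_MAPPER = "/dev/mapper"


class InaccessiblePhysDev(Exception):
    """Stand-in for vdsm's exception for an unreachable multipath device."""


def multipath_device_path(guid):
    """Path where a multipath GUID like '3514f0c5a516008d4' should appear."""
    return os.path.join(DEV_MAPPER, guid)


def get_pv(guid):
    """Hypothetical stand-in for lvm.getPV(): fail if the node is missing."""
    path = multipath_device_path(guid)
    if not os.path.exists(path):
        raise InaccessiblePhysDev(guid)
    return path
```

On the affected host, /dev/mapper/3514f0c5a516008d4 did not exist, so the lookup failed even though the iSCSI sessions were up.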
How reproducible: 100%
This bug report has Keywords: Regression or TestBlocker. Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.
Looks like bad deploy - vdsm was started without running "vdsm-tool configure --force":

Feb 6 00:59:07 storage-ge2-vdsm1 vdsmd_init_common.sh: Error:
Feb 6 00:59:07 storage-ge2-vdsm1 vdsmd_init_common.sh: One of the modules is not configured to work with VDSM.
Feb 6 00:59:07 storage-ge2-vdsm1 vdsmd_init_common.sh: To configure the module use the following:
Feb 6 00:59:07 storage-ge2-vdsm1 vdsmd_init_common.sh: 'vdsm-tool configure [--module module-name]'.
Feb 6 00:59:07 storage-ge2-vdsm1 vdsmd_init_common.sh: If all modules are not configured try to use:
Feb 6 00:59:07 storage-ge2-vdsm1 vdsmd_init_common.sh: 'vdsm-tool configure --force'
Feb 6 00:59:07 storage-ge2-vdsm1 vdsmd_init_common.sh: (The force flag will stop the module's service and start it
Feb 6 00:59:07 storage-ge2-vdsm1 vdsmd_init_common.sh: afterwards automatically to load the new configuration.)
Feb 6 00:59:07 storage-ge2-vdsm1 vdsmd_init_common.sh: multipath requires configuration
Feb 6 00:59:07 storage-ge2-vdsm1 vdsmd_init_common.sh: libvirt is not configured for vdsm yet
Feb 6 00:59:07 storage-ge2-vdsm1 vdsmd_init_common.sh: Modules certificates, sebool, multipath, passwd, sanlock, libvirt are not configured

Please attach host-deploy logs from this installation.
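For reference, the configuration step the init script is asking for looks like this (a sketch of the commands named in the log, to be run as root on the host):

```shell
# Reconfigure all vdsm modules (certificates, sebool, multipath, passwd,
# sanlock, libvirt). --force stops the modules' services and restarts
# them so the new configuration is loaded:
vdsm-tool configure --force

# Or configure a single module, e.g. multipath only:
vdsm-tool configure --module multipath

# Then restart vdsm to pick up the configured state:
systemctl restart vdsmd
```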
Created attachment 1248050 [details] host-deploy log
Looking in the host-deploy log, vdsm was deployed successfully on 2017-02-06 01:34:41. The logs from Feb 6 00:59:07 were the result of an unconfigured vdsm installed on this system earlier and are not related to the issue with multipath devices.

2017-02-06 01:34:41 DEBUG otopi.plugins.ovirt_host_deploy.vdsm.packages plugin.execute:921 execute-output: ('/bin/vdsm-tool', 'configure', '--force') stdout:
Checking configuration status...

multipath requires configuration
libvirt is not configured for vdsm yet
FAILED: conflicting vdsm and libvirt-qemu tls configuration.
vdsm.conf with ssl=True requires the following changes:
libvirtd.conf: listen_tcp=0, auth_tcp="sasl", listen_tls=1
qemu.conf: spice_tls=1.

Running configure...
Reconfiguration of sebool is done.
Reconfiguration of multipath is done.
Reconfiguration of passwd is done.
Reconfiguration of sanlock is done.
Reconfiguration of libvirt is done.

Done configuring modules to VDSM.

2017-02-06 01:34:41 DEBUG otopi.plugins.ovirt_host_deploy.vdsm.packages plugin.execute:926 execute-output: ('/bin/vdsm-tool', 'configure', '--force') stderr:

Need to check the rest of the logs after this time.
Tested with previous build [1]; the issue doesn't reproduce.

[1]
rhevm-4.1.0.3-0.1.el7.noarch
vdsm-4.19.4-1.el7ev.x86_64
device-mapper-multipath-0.4.9-99.el7_3.1.x86_64
iscsi-initiator-utils-6.2.0.873-35.el7.x86_64
Created attachment 1248106 [details] all host-deploy logs
(In reply to Elad from comment #0) > How reproducible: Do you know how to reproduce this? I would like to inspect the host in the state before the reboot. Can you reproduce it again with the same build?
Liron, we can see that getDeviceList returns an empty list - no devices are available on the host:

2017-02-06 01:38:10,710 INFO (jsonrpc/0) [dispatcher] Run and protect: getDeviceList, Return response: {'devList': []} (logUtils:52)
2017-02-06 01:38:10,710 INFO (jsonrpc/0) [jsonrpc.JsonRpcServer] RPC call Host.getDeviceList succeeded in 0.22 seconds (__init__:515)

But after that, engine calls getDeviceList again with a specific guid:

2017-02-06 01:38:11,816 INFO (jsonrpc/6) [dispatcher] Run and protect: getDeviceList(storageType=3, guids=['3514f0c5a516008d4'], checkStatus=False, options={}) (logUtils:49)

Regardless of the reason why devices are not available on this host, it seems that engine is ignoring the result of getDeviceList and trying to create a vg with a non-existent device. Can you explain this?
Back to /var/log/messages...

Before vdsm was started, multipath was started as expected (required by vdsm):

Feb 6 01:34:42 storage-ge2-vdsm1 kernel: device-mapper: multipath: version 1.11.0 loaded

vdsm was started successfully:

Feb 6 01:34:45 storage-ge2-vdsm1 systemd: Started Virtual Desktop Server Manager network restoration.
Feb 6 01:34:45 storage-ge2-vdsm1 systemd: Starting Virtual Desktop Server Manager...
...
Feb 6 01:34:46 storage-ge2-vdsm1 vdsmd_init_common.sh: vdsm: Running check_is_configured
Feb 6 01:34:47 storage-ge2-vdsm1 vdsmd_init_common.sh: Current revision of multipath.conf detected, preserving
...
Feb 6 01:34:49 storage-ge2-vdsm1 systemd: Started Virtual Desktop Server Manager.

Then vdsm connected to the iscsi server, and new sessions started:

Feb 6 01:38:05 storage-ge2-vdsm1 kernel: scsi host3: iSCSI Initiator over TCP/IP
...
Feb 6 01:38:05 storage-ge2-vdsm1 iscsid: Connection1:0 to [target: iqn.2008-05.com.xtremio:xio00153500071-514f0c50023f6c00, portal: 10.35.146.129,3260] through [iface: default] is operational now

In vdsm log, we see that vdsm connected to the iscsi target:

2017-02-06 01:38:04,917 INFO (jsonrpc/3) [dispatcher] Run and protect: connectStorageServer(domType=3, spUUID='00000000-0000-0000-0000-000000000000', conList=[{'id': '00000000-0000-0000-0000-000000000000', 'connection': '10.35.146.129', 'iqn': 'iqn.2008-05.com.xtremio:xio00153500071-514f0c50023f6c00', 'user': u'', 'tpgt': '1', 'password': '********', 'port': '3260'}], options=None) (logUtils:49)
2017-02-06 01:38:04,977 INFO (jsonrpc/3) [storage.ISCSI] iSCSI iface.net_ifacename not provided. Skipping. (iscsi:590)
2017-02-06 01:38:05,072 INFO (jsonrpc/1) [dispatcher] Run and protect: repoStats(options=None) (logUtils:49)
2017-02-06 01:38:05,072 INFO (jsonrpc/1) [dispatcher] Run and protect: repoStats, Return response: {'0840c75a-8292-4900-91de-51b6599afe86': {'code': 0, 'actual': True, 'version': 4, 'acquired': True, 'delay': '0.000583175', 'lastCheck': '1.7', 'valid': True}, 'db9dad41-fd69-4904-a3fc-d51fd315dbb2': {'code': 0, 'actual': True, 'version': 4, 'acquired': False, 'delay': '0.00119432', 'lastCheck': '3.2', 'valid': True}, 'ecca1745-5f22-4841-8d1b-fb5b62f0c6ee': {'code': 0, 'actual': True, 'version': 4, 'acquired': True, 'delay': '0.000578069', 'lastCheck': '8.2', 'valid': True}} (logUtils:52)
2017-02-06 01:38:05,091 INFO (jsonrpc/1) [jsonrpc.JsonRpcServer] RPC call Host.getStats succeeded in 0.02 seconds (__init__:515)
2017-02-06 01:38:05,300 INFO (jsonrpc/3) [dispatcher] Run and protect: connectStorageServer, Return response: {'statuslist': [{'status': 0, 'id': '00000000-0000-0000-0000-000000000000'}]} (logUtils:52)

(We see a couple of connections.)

Finally vdsm runs getDeviceList - in this flow we do a scsi rescan, and we should find the devices exposed by the storage server:

2017-02-06 01:38:10,493 INFO (jsonrpc/0) [dispatcher] Run and protect: getDeviceList(storageType=3, guids=(), checkStatus=False, options={}) (logUtils:49)
2017-02-06 01:38:10,710 INFO (jsonrpc/0) [dispatcher] Run and protect: getDeviceList, Return response: {'devList': []} (logUtils:52)

There is no error, so the scan was successful. Since we don't see anything suspicious in vdsm log or in /var/log/messages, my conclusion is that the devices were not accessible to the host at this time, and this is an environment issue. Until we have a host in this state that we can inspect, we cannot do anything with this bug. When you reproduce again, please change log levels to DEBUG so we have more information.

Should be closed as insufficient data.
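As a side note, when a host is caught in this state, a few manual checks (standard iscsi-initiator-utils, util-linux, and device-mapper-multipath tools; a sketch, not a vdsm procedure) can narrow down where the devices disappear:

```shell
# Are the iSCSI sessions actually logged in?
iscsiadm -m session -P 1

# Did the kernel see any SCSI disks over those sessions?
lsblk -S

# Did multipath assemble maps on top of them?
multipath -ll
```

If the sessions are up but lsblk shows no disks, the target is not exposing LUNs to this initiator; if disks exist but multipath maps do not, the problem is on the multipath layer.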
I'm preparing an environment so you'll be able to examine
Environment provided offline
Resetting the need-info for Liron
(In reply to Elad from comment #12) > I'm preparing an environment so you'll be able to examine Does it happen against any other storage?
The issue was an incorrect iSCSI initiator name: the iscsid service was not restarted after /etc/iscsi/initiatorname.iscsi was replaced, so the host kept logging in with the old name.
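For the record, a minimal sketch of the steps that avoid this state (assuming a standard iscsi-initiator-utils setup, and that the target-side LUN mapping matches the new initiator name):

```shell
# Verify the initiator name the host will present:
cat /etc/iscsi/initiatorname.iscsi

# iscsid reads the initiator name at startup, so after replacing the
# file the service must be restarted for the new name to take effect:
systemctl restart iscsid

# Re-establish the sessions so they use the new initiator name:
iscsiadm -m node --logoutall=all
iscsiadm -m node --loginall=all
```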