Description of problem:

RHEV-M: 3.5.1-0.2.el6ev

# cat /etc/redhat-release
Red Hat Enterprise Virtualization Hypervisor 6.6 (20150325.0.el6ev)

# getenforce
Enforcing

# audit2allow -a
#

# rpm -qa | grep -i vdsm
vdsm-jsonrpc-4.16.8.1-6.el6ev.noarch
vdsm-reg-4.16.8.1-6.el6ev.noarch
vdsm-xmlrpc-4.16.8.1-6.el6ev.noarch
vdsm-yajsonrpc-4.16.8.1-6.el6ev.noarch
vdsm-python-4.16.8.1-6.el6ev.noarch
vdsm-cli-4.16.8.1-6.el6ev.noarch
vdsm-hook-vhostmd-4.16.8.1-6.el6ev.noarch
vdsm-4.16.8.1-6.el6ev.x86_64
ovirt-node-plugin-vdsm-0.2.0-18.el6ev.noarch
vdsm-hook-ethtool-options-4.16.8.1-6.el6ev.noarch
vdsm-python-zombiereaper-4.16.8.1-6.el6ev.noarch

# rpm -qa | grep -i libvirt
libvirt-client-0.10.2-46.el6_6.2.x86_64
libvirt-python-0.10.2-46.el6_6.2.x86_64
libvirt-cim-0.6.1-12.el6.x86_64
libvirt-0.10.2-46.el6_6.2.x86_64
libvirt-lock-sanlock-0.10.2-46.el6_6.2.x86_64

vdsm.log
===============
<snip>
Thread-253::DEBUG::2015-03-25 20:05:31,440::lvm::288::Storage.Misc.excCmd::(cmd) FAILED: <err> = ' /dev/mapper/35000144f13886704: read failed after 0 of 4096 at 32212189184: Input/output error\n /dev/mapper/35000144f13886704: read failed after 0 of 4096 at 32212246528: Input/output error\n /dev/mapper/35000144f13886704: read failed after 0 of 4096 at 0: Input/output error\n WARNING: Error counts reached a limit of 3. Device /dev/mapper/35000144f13886704 was disabled\n Volume group "57e7e1dd-fc33-4f2c-88ee-4a73aba77597" not found\n Skipping volume group 57e7e1dd-fc33-4f2c-88ee-4a73aba77597\n'; <rc> = 5
Thread-253::DEBUG::2015-03-25 20:05:31,443::lvm::495::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm invalidate operation' got the operation mutex
Thread-253::DEBUG::2015-03-25 20:05:31,444::lvm::497::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm invalidate operation' released the operation mutex
Thread-253::DEBUG::2015-03-25 20:05:31,444::lvm::505::Storage.OperationMutex::(_invalidatelvs) Operation 'lvm invalidate operation' got the operation mutex
Thread-253::DEBUG::2015-03-25 20:05:31,444::lvm::517::Storage.OperationMutex::(_invalidatelvs) Operation 'lvm invalidate operation' released the operation mutex
Thread-253::ERROR::2015-03-25 20:05:31,445::sp::561::Storage.StoragePool::(create) Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sp.py", line 556, in create
  File "/usr/share/vdsm/storage/blockSD.py", line 840, in validate
  File "/usr/share/vdsm/storage/lvm.py", line 976, in chkVG
StorageDomainAccessError: Domain is either partially accessible or entirely inaccessible: ('57e7e1dd-fc33-4f2c-88ee-4a73aba77597: [\' /dev/mapper/35000144f13886704: read failed after 0 of 4096 at 32212189184: Input/output error\', \' /dev/mapper/35000144f13886704: read failed after 0 of 4096 at 32212246528: Input/output error\', \' /dev/mapper/35000144f13886704: read failed after 0 of 4096 at 0: Input/output error\', \' WARNING: Error counts reached a limit of 3. Device /dev/mapper/35000144f13886704 was disabled\', \' Volume group "57e7e1dd-fc33-4f2c-88ee-4a73aba77597" not found\', \' Skipping volume group 57e7e1dd-fc33-4f2c-88ee-4a73aba77597\']',)
Thread-253::ERROR::2015-03-25 20:05:31,445::task::866::Storage.TaskManager.Task::(_setError) Task=`3e0f52ba-6e99-4995-805d-ac890f16a853`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
  File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
  File "/usr/share/vdsm/storage/hsm.py", line 978, in createStoragePool
  File "/usr/share/vdsm/storage/sp.py", line 562, in create
StorageDomainAccessError: Domain is either partially accessible or entirely inaccessible: ('57e7e1dd-fc33-4f2c-88ee-4a73aba77597',)
</snip>

How reproducible:

Steps to Reproduce:
1. Autoinstall RHEV-H 20150325.0.el6ev and approve the node
2. Try to add an iSCSI data storage domain; it will fail

Actual results:
Cannot add iSCSI storage

Expected results:
Be able to add iSCSI storage
Created attachment 1006468 [details] vdsm logs
Looking at the vdsm log:

The login to the iscsi server looks ok:

Thread-103::INFO::2015-03-25 20:00:50,486::logUtils::44::dispatcher::(wrapper) Run and protect: connectStorageServer(domType=3, spUUID='00000002-0002-0002-0002-0000000001ed', conList=[{'connection': '192.168.1.103', 'iqn': 'iqn.1992-04.com.emc:storage.storage.rbdata', 'user': '', 'tpgt': '1', 'password': '******', 'id': 'e221b3e0-f6c3-4f21-a369-87a1c34de5a6', 'port': '3260'}], options=None)
Thread-103::DEBUG::2015-03-25 20:00:50,487::iscsiadm::92::Storage.Misc.excCmd::(_runCmd) /usr/bin/sudo -n /sbin/iscsiadm -m node -T iqn.1992-04.com.emc:storage.storage.rbdata -I default -p 192.168.1.103:3260,1 --op=new (cwd None)
Thread-103::DEBUG::2015-03-25 20:00:50,498::iscsiadm::92::Storage.Misc.excCmd::(_runCmd) SUCCESS: <err> = ''; <rc> = 0
Thread-103::DEBUG::2015-03-25 20:00:50,498::iscsiadm::92::Storage.Misc.excCmd::(_runCmd) /usr/bin/sudo -n /sbin/iscsiadm -m node -T iqn.1992-04.com.emc:storage.storage.rbdata -I default -p 192.168.1.103:3260,1 -l (cwd None)
Thread-103::DEBUG::2015-03-25 20:00:50,747::iscsiadm::92::Storage.Misc.excCmd::(_runCmd) SUCCESS: <err> = ''; <rc> = 0
Thread-103::DEBUG::2015-03-25 20:00:50,748::iscsiadm::92::Storage.Misc.excCmd::(_runCmd) /usr/bin/sudo -n /sbin/iscsiadm -m node -T iqn.1992-04.com.emc:storage.storage.rbdata -I default -p 192.168.1.103:3260,1 -n node.startup -v manual --op=update (cwd None)
Thread-103::DEBUG::2015-03-25 20:00:50,762::iscsiadm::92::Storage.Misc.excCmd::(_runCmd) SUCCESS: <err> = ''; <rc> = 0

But for some reason lvm cannot read from /dev/mapper/35000144f82378826:

Thread-103::DEBUG::2015-03-25 20:00:50,763::lvm::365::Storage.OperationMutex::(_reloadvgs) Operation 'lvm reload operation' got the operation mutex
Thread-103::DEBUG::2015-03-25 20:00:50,765::lvm::288::Storage.Misc.excCmd::(cmd) /usr/bin/sudo -n /sbin/lvm vgs --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ '\''a|/dev/mapper/35000144f82378826|'\'', '\''r|.*|'\'' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } ' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name (cwd None)
Thread-103::DEBUG::2015-03-25 20:00:50,884::lvm::288::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = ' /dev/mapper/35000144f82378826: read failed after 0 of 4096 at 32212189184: Input/output error\n /dev/mapper/35000144f82378826: read failed after 0 of 4096 at 32212246528: Input/output error\n /dev/mapper/35000144f82378826: read failed after 0 of 4096 at 0: Input/output error\n WARNING: Error counts reached a limit of 3. Device /dev/mapper/35000144f82378826 was disabled\n No volume groups found\n'; <rc> = 0

When this pv is not accessible, it is expected that the new vg will not be found.

We need to understand why /dev/mapper/35000144f82378826 is not accessible after logging in to the iscsi server.

Let's try to log in manually and check what we see on the storage (a sketch that automates the collection steps follows below):

1. Put the host into maintenance
2. Attach the output of these commands:
   mkdir before
   lsblk > before/lsblk
   multipath -ll > before/multipath-ll
   pvdisplay > before/pvdisplay
   vgdisplay > before/vgdisplay
3. Log in to the iscsi target:
   iscsiadm -m node -T iqn.1992-04.com.emc:storage.storage.rbdata -I default -p 192.168.1.103:3260,1 -l
4. Attach the output of these commands:
   mkdir after
   lsblk > after/lsblk
   multipath -ll > after/multipath-ll
   pvdisplay > after/pvdisplay
   vgdisplay > after/vgdisplay
5. Log out from the iscsi server:
   iscsiadm -m session -u
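For convenience, here is a minimal sketch that automates the collection above (a hypothetical helper, not part of vdsm; the command list and the before/after directory names simply mirror the steps above):

#!/usr/bin/env python
# Hypothetical helper for the diagnostic steps above; run as root.
import errno
import os
import subprocess

COMMANDS = {
    "lsblk": ["lsblk"],
    "multipath-ll": ["multipath", "-ll"],
    "pvdisplay": ["pvdisplay"],
    "vgdisplay": ["vgdisplay"],
}

def collect(outdir):
    # Run each diagnostic command, saving stdout+stderr under outdir.
    try:
        os.makedirs(outdir)
    except OSError as e:
        if e.errno != errno.EEXIST:
            raise
    for name, cmd in COMMANDS.items():
        p = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                             stderr=subprocess.STDOUT)
        out, _ = p.communicate()
        with open(os.path.join(outdir, name), "wb") as f:
            f.write(out)

if __name__ == "__main__":
    collect("before")
    # Step 3: log in to the iscsi target here (iscsiadm -m node ... -l),
    # then run collect("after") and log out (iscsiadm -m session -u).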
Looking in messages - it seems that the device is ready for use at 20:00:51, but we try to access it at 20:00:50, which explains why we cannot read from it.

Mar 25 20:00:50 localhost kernel: sd 2:0:0:0: [sdb] Synchronizing SCSI cache
Mar 25 20:00:50 localhost multipathd: sdb: remove path (uevent)
Mar 25 20:00:50 localhost multipathd: 35000144f82378826 Last path deleted, disabling queueing
Mar 25 20:00:50 localhost multipathd: 35000144f82378826: map in use
Mar 25 20:00:50 localhost multipathd: 35000144f82378826: can't flush
Mar 25 20:00:50 localhost multipathd: flush_on_last_del in progress
Mar 25 20:00:50 localhost multipathd: 35000144f82378826: load table [0 62914560 multipath 0 0 0 0]
Mar 25 20:00:50 localhost multipathd: sdb [8:16]: path removed from map 35000144f82378826
Mar 25 20:00:50 localhost kernel: end_request: I/O error, dev dm-5, sector 24
Mar 25 20:00:50 localhost kernel: Buffer I/O error on device dm-5, logical block 3
Mar 25 20:00:50 localhost kernel: end_request: I/O error, dev dm-5, sector 24
Mar 25 20:00:50 localhost kernel: Buffer I/O error on device dm-5, logical block 3
Mar 25 20:00:50 localhost iscsid: Connection1:0 to [target: iqn.1992-04.com.emc:storage.storage.rbdata, portal: 192.168.1.103,3260] through [iface: default] is shutdown.
Mar 25 20:00:50 localhost kernel: device-mapper: table: 253:12: multipath: error getting device
Mar 25 20:00:50 localhost kernel: device-mapper: ioctl: error adding target to table
Mar 25 20:00:50 localhost udevd-work[16801]: setfilecon /dev/dm-12 failed: No such file or directory
Mar 25 20:00:50 localhost multipathd: dm-12: remove map (uevent)
Mar 25 20:00:50 localhost multipathd: dm-12: devmap not registered, can't remove
Mar 25 20:00:50 localhost multipathd: dm-12: remove map (uevent)
Mar 25 20:00:50 localhost multipathd: dm-12: devmap not registered, can't remove
Mar 25 20:00:50 localhost kernel: scsi3 : iSCSI Initiator over TCP/IP
Mar 25 20:00:50 localhost kernel: scsi 3:0:0:0: Direct-Access EMC LIFELINE-DISK 1 PQ: 0 ANSI: 4
Mar 25 20:00:50 localhost kernel: sd 3:0:0:0: Attached scsi generic sg2 type 0
Mar 25 20:00:50 localhost kernel: end_request: I/O error, dev dm-5, sector 62914432
Mar 25 20:00:50 localhost kernel: end_request: I/O error, dev dm-5, sector 62914544
Mar 25 20:00:50 localhost kernel: end_request: I/O error, dev dm-5, sector 0
Mar 25 20:00:50 localhost kernel: sd 3:0:0:0: [sdb] 62914560 512-byte logical blocks: (32.2 GB/30.0 GiB)
Mar 25 20:00:51 localhost kernel: end_request: I/O error, dev dm-5, sector 62914432
Mar 25 20:00:51 localhost kernel: end_request: I/O error, dev dm-5, sector 62914544
Mar 25 20:00:51 localhost kernel: end_request: I/O error, dev dm-5, sector 0
Mar 25 20:00:51 localhost kernel: sd 3:0:0:0: [sdb] Write Protect is off
Mar 25 20:00:51 localhost kernel: sd 3:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Mar 25 20:00:51 localhost kernel: sdb: unknown partition table
Mar 25 20:00:51 localhost kernel: sd 3:0:0:0: [sdb] Attached SCSI disk
Mar 25 20:00:51 localhost multipathd: sdb: add path (uevent)
Mar 25 20:00:51 localhost multipathd: 35000144f82378826: load table [0 62914560 multipath 0 0 1 1 round-robin 0 1 1 8:16 1]
Mar 25 20:00:51 localhost multipathd: sdb [8:16]: path added to devmap 35000144f82378826
Mar 25 20:00:51 localhost iscsid: Could not set session2 priority. READ/WRITE throughout and latency could be affected.
Mar 25 20:00:51 localhost iscsid: Connection2:0 to [target: iqn.1992-04.com.emc:storage.storage.rbdata, portal: 192.168.1.103,3260] through [iface: default] is operational now

Ben, this is another issue with rhev-h 6.6 - can this be related to bug 1191401? How do you suggest debugging this issue?
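To make the race concrete: vdsm starts reading from the multipath device as soon as iscsiadm returns, while multipathd is still re-adding the path to the map. A minimal sketch of the kind of readiness check that would mask the race (a hypothetical illustration, not vdsm code):

import os
import time

def wait_for_readable(device, timeout=10.0, interval=0.5):
    # Poll until the block device can serve a read, or give up.
    # lvm failed exactly this kind of 4096-byte read above.
    deadline = time.time() + timeout
    while True:
        try:
            fd = os.open(device, os.O_RDONLY)
            try:
                os.read(fd, 4096)
                return True
            finally:
                os.close(fd)
        except OSError:
            if time.time() >= deadline:
                return False
            time.sleep(interval)

# Example: wait_for_readable("/dev/mapper/35000144f82378826")

Note that polling only hides the symptom; the cleaner fix is to wait for the udev/multipath events to complete, which is the direction the patches discussed below take.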
Please also attach /etc/multipath.conf
Created attachment 1006616 [details] multipath.conf
Created attachment 1006627 [details] output commands
Hi Nir,

I have re-created the environment; the iscsi storage is now called dakar, and I got the same issue.

My steps:
===========
- Reproduced the issue as in the original report
- Put the host into maintenance
- Executed the commands
- Logged out:

# iscsiadm -m session -u
Logging out of session [sid: 2, target: iqn.1992-04.com.emc:storage.storage.dakar, portal: 192.168.1.103,3260]
Logout of [sid: 2, target: iqn.1992-04.com.emc:storage.storage.dakar, portal: 192.168.1.103,3260] successful.

Attached the commands output and the new messages file.

# rpm -qa | grep -i multipath
device-mapper-multipath-0.4.9-80.el6_6.2.x86_64
device-mapper-multipath-libs-0.4.9-80.el6_6.2.x86_64
(In reply to Douglas Schilling Landgraf from comment #9)

Please attach engine, vdsm, and messages logs showing the pv used in the commands output (attachment 1006627 [details]).

Can you describe how to reproduce this?

1. What rhev-h version was installed before the upgrade - please attach rpm -qa output before and after the upgrade.
2. What is the system state before the upgrade - did we have any setup before, or was the domain created after the upgrade?
3. Does it reproduce after an upgrade to a previous version? e.g. if this happens when upgrading version x to x+1, does it reproduce when upgrading version x-1 to x?
4. Can I have a live session on the failing system - before you upgrade it?
Hi Nir,

(In reply to Nir Soffer from comment #10)
> (In reply to Douglas Schilling Landgraf from comment #9)
> Please attach engine, vdsm, and messages logs showing the pv used in
> the commands output (attachment 1006627 [details]).
>
> Can you describe how to reproduce this?

I have described it in the bug description. You can use autoinstall or even a normal installation of rhev-h, approve the node, and try to add an iSCSI data storage domain.

> 1. What rhev-h version was installed before the upgrade - please attach
> rpm -qa output before and after the upgrade.
> 2. What is the system state before the upgrade - did we have any
> setup before, or was the domain created after the upgrade?
> 3. Does it reproduce after an upgrade to a previous version?
> e.g. if this happens when upgrading version x to x+1,
> does it reproduce when upgrading version x-1 to x?
> 4. Can I have a live session on the failing system - before you upgrade it?

There is no upgrade involved in this bug; this is a normal installation.
(In reply to Douglas Schilling Landgraf from comment #11)
> There is no upgrade involved in this bug; this is a normal installation.

Ok - can you check if the issue is reproducible with the previous iso and the same storage server? I want to eliminate problems with the environment you use.
Hello Nir,

(In reply to Nir Soffer from comment #13)
> (In reply to Douglas Schilling Landgraf from comment #11)
> > There is no upgrade involved in this bug; this is a normal installation.
>
> Ok - can you check if the issue is reproducible with the previous iso
> and the same storage server? I want to eliminate problems with the
> environment you use.

I used the storage server with other RHEV-H isos yesterday, no issues.

Hello Aharon, could you please try a reproducer too? Thanks!
(In reply to Douglas Schilling Landgraf from comment #14)

Douglas, can you attach vdsm.conf from this machine?
Created attachment 1006985 [details] logs March 26 2015
Created attachment 1006986 [details] engine.log 26 March 2015
Douglas, can you test the attached patch? I think this should solve this issue.
Created attachment 1007039 [details] testlogspatchnr1
Hi Nir,

Still no go with the patch you suggested; logs attached.
(In reply to Douglas Schilling Landgraf from comment #20)
> Still no go with the patch you suggested; logs attached.

Douglas, can you attach the output of lsblk from the last setup? We need this to understand the multipath messages in /var/log/messages, because vdsm does not mention the device names (dm-*) in its logs.

If you run another test, please attach lsblk output before and after the test.

Also, what is the latest rhev-h iso that does work? I want to perform the same flow with the broken and the last good iso, and compare the logs. It would be very helpful if you perform the same flow with the last good iso and attach the same logs here.

Thanks!
Douglas, I think the new patches will make a difference. Can you test them?

Also, can you specify which engine version you are using? Testing the patches with engine from master, I see a different flow compared with the flow seen in the logs here.
Hi Nir,

I will test the new patches soon. For now I would like to share that I can also see this issue on:

# rpm -qa | grep -i vdsm
vdsm-reg-4.16.12.1-3.el6ev.noarch
ovirt-node-plugin-vdsm-0.2.0-20.el6ev.noarch
vdsm-python-zombiereaper-4.16.12.1-3.el6ev.noarch
vdsm-jsonrpc-4.16.12.1-3.el6ev.noarch
vdsm-python-4.16.12.1-3.el6ev.noarch
vdsm-cli-4.16.12.1-3.el6ev.noarch
vdsm-hook-vhostmd-4.16.12.1-3.el6ev.noarch
vdsm-4.16.12.1-3.el6ev.x86_64
vdsm-yajsonrpc-4.16.12.1-3.el6ev.noarch
vdsm-hook-ethtool-options-4.16.12.1-3.el6ev.noarch
vdsm-xmlrpc-4.16.12.1-3.el6ev.noarch

# cat /etc/redhat-release
Red Hat Enterprise Virtualization Hypervisor 6.6 (20150327.0.el6ev)
Created attachment 1007481 [details] vdsm-rhevh20150327.0.el6ev.tar.gz
Hi Nir,

FYI, I have attached the logs from the latest iso (rhevh20150327) showing the failure, still without the latest patches you provided.
Hi Nir,

Data from the previous working version; logs attached.

# cat /etc/redhat-release
Red Hat Enterprise Virtualization Hypervisor 6.6 (20150319.42.el6ev)

# rpm -qa | grep -i vdsm
vdsm-reg-4.16.12.1-3.el6ev.noarch
ovirt-node-plugin-vdsm-0.2.0-20.el6ev.noarch
vdsm-python-zombiereaper-4.16.12.1-3.el6ev.noarch
vdsm-jsonrpc-4.16.12.1-3.el6ev.noarch
vdsm-python-4.16.12.1-3.el6ev.noarch
vdsm-cli-4.16.12.1-3.el6ev.noarch
vdsm-hook-vhostmd-4.16.12.1-3.el6ev.noarch
vdsm-4.16.12.1-3.el6ev.x86_64
vdsm-yajsonrpc-4.16.12.1-3.el6ev.noarch
vdsm-hook-ethtool-options-4.16.12.1-3.el6ev.noarch
vdsm-xmlrpc-4.16.12.1-3.el6ev.noarch

[root@localhost log]# rpm -qa | grep -i mapper
device-mapper-event-libs-1.02.90-2.el6_6.1.x86_64
device-mapper-multipath-libs-0.4.9-80.el6_6.3.x86_64
device-mapper-persistent-data-0.3.2-1.el6.x86_64
device-mapper-1.02.90-2.el6_6.1.x86_64
device-mapper-event-1.02.90-2.el6_6.1.x86_64
device-mapper-multipath-0.4.9-80.el6_6.3.x86_64
device-mapper-libs-1.02.90-2.el6_6.1.x86_64
Created attachment 1007534 [details] rhevh20150319.42 worked iscsi data.
Comparing the "good" iso (20150319) with the "bad" iso (20150327), we can see that the vdsm flow is identical, but the underlying platform behaves differently.

Good iso:

Mar 27 22:37:52 localhost kernel: scsi3 : iSCSI Initiator over TCP/IP
Mar 27 22:37:52 localhost kernel: scsi 3:0:0:0: Direct-Access EMC LIFELINE-DISK 1 PQ: 0 ANSI: 4
Mar 27 22:37:52 localhost kernel: sd 3:0:0:0: Attached scsi generic sg2 type 0
Mar 27 22:37:52 localhost kernel: sd 3:0:0:0: [sdb] 62914560 512-byte logical blocks: (32.2 GB/30.0 GiB)
Mar 27 22:37:52 localhost kernel: sd 3:0:0:0: [sdb] Write Protect is off
Mar 27 22:37:52 localhost kernel: sd 3:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Mar 27 22:37:52 localhost kernel: sdb: unknown partition table
Mar 27 22:37:52 localhost kernel: sd 3:0:0:0: [sdb] Attached SCSI disk
Mar 27 22:37:52 localhost multipathd: sdb: add path (uevent)
Mar 27 22:37:53 localhost multipathd: 35000144f02325565: load table [0 62914560 multipath 0 0 1 1 round-robin 0 1 1 8:16 1]
Mar 27 22:37:53 localhost multipathd: sdb [8:16]: path added to devmap 35000144f02325565

Here the device (sdb) backing the multipath device (35000144f02325565) was added, so lvm can read from the multipath device.

Mar 27 22:37:53 localhost kernel: device-mapper: table: 253:12: multipath: error getting device
Mar 27 22:37:53 localhost kernel: device-mapper: ioctl: error adding target to table
Mar 27 22:37:53 localhost iscsid: Connection1:0 to [target: iqn.1992-04.com.emc:storage.storage.data42, portal: 192.168.1.103,3260] through [iface: default] is shutdown.

The old connection is shut down.

Mar 27 22:37:53 localhost iscsid: Connection2:0 to [target: iqn.1992-04.com.emc:storage.storage.data42, portal: 192.168.1.103,3260] through [iface: default] is operational now

And the new connection is established.

Bad iso:

Mar 27 21:01:10 localhost kernel: scsi3 : iSCSI Initiator over TCP/IP
Mar 27 21:01:10 localhost kernel: scsi 3:0:0:0: Direct-Access EMC LIFELINE-DISK 1 PQ: 0 ANSI: 4
Mar 27 21:01:10 localhost kernel: sd 3:0:0:0: Attached scsi generic sg2 type 0
Mar 27 21:01:11 localhost iscsid: Connection1:0 to [target: iqn.1992-04.com.emc:storage.storage.finddata, portal: 192.168.1.103,3260] through [iface: default] is shutdown.

The old connection is shut down.

Mar 27 21:01:11 localhost iscsid: Connection2:0 to [target: iqn.1992-04.com.emc:storage.storage.finddata, portal: 192.168.1.103,3260] through [iface: default] is operational now

And the new connection is established - but the multipath device is not ready!

Mar 27 21:01:11 localhost kernel: end_request: I/O error, dev dm-5, sector 62914432
Mar 27 21:01:11 localhost kernel: end_request: I/O error, dev dm-5, sector 62914544
Mar 27 21:01:11 localhost kernel: end_request: I/O error, dev dm-5, sector 0
Mar 27 21:01:11 localhost kernel: sd 3:0:0:0: [sdb] 62914560 512-byte logical blocks: (32.2 GB/30.0 GiB)
Mar 27 21:01:11 localhost kernel: sd 3:0:0:0: [sdb] Write Protect is off
Mar 27 21:01:11 localhost kernel: end_request: I/O error, dev dm-5, sector 62914432
Mar 27 21:01:11 localhost kernel: end_request: I/O error, dev dm-5, sector 62914544
Mar 27 21:01:11 localhost kernel: end_request: I/O error, dev dm-5, sector 0

Somewhere here, lvm vgck tries to read from the multipath device and fails.

Mar 27 21:01:11 localhost kernel: sd 3:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Mar 27 21:01:11 localhost kernel: sdb: unknown partition table
Mar 27 21:01:11 localhost kernel: sd 3:0:0:0: [sdb] Attached SCSI disk
Mar 27 21:01:11 localhost multipathd: sdb: add path (uevent)
Mar 27 21:01:11 localhost multipathd: 35000144f11907099: load table [0 62914560 multipath 0 0 1 1 round-robin 0 1 1 8:16 1]
Mar 27 21:01:11 localhost multipathd: sdb [8:16]: path added to devmap 35000144f11907099

The device is ready only now!

This may be caused by multipath, iscsid, udevd, or the kernel - let's start with multipath. Ben, what do you think?
Hi Nir,

As we talked, I have backported your patches below to downstream vdsm-4.16.12.1-3.el6ev.x86_64, added them to the latest available build of rhev-h 6.6 (20150327.0.el6ev), and it worked. The host is up and the data storage domain is up. My suggestion is to build a new downstream vdsm with these fixes so we can build a new rhev-h.

multipath: Wait for udev events after rescan
https://gerrit.ovirt.org/#/c/39288/

storageServer: Wait for udev events after connect
https://gerrit.ovirt.org/#/c/39289/

udevadm: Add udevadm wrapper
https://gerrit.ovirt.org/#/c/39287/
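For readers following along, the heart of these patches is waiting for udev to finish processing the new device events before lvm touches the device. A minimal sketch of the idea, assuming the wrapper shells out to udevadm settle (names here are illustrative; see the gerrit links above for the actual implementation):

import logging
import subprocess

_UDEVADM = "/sbin/udevadm"

def settle(timeout):
    # Block until the udev event queue is empty or the timeout expires.
    # udevadm settle exits non-zero on timeout; that is not fatal here,
    # so we only log it - lvm will fail loudly later if the device is
    # really unusable.
    cmd = [_UDEVADM, "settle", "--timeout=%s" % timeout]
    rc = subprocess.call(cmd)
    if rc != 0:
        logging.warning("udevadm settle failed (rc=%d)", rc)

# After iscsi login and multipath rescan, the connect flow would call
# something like settle(5) before scanning for the new VG.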
Created attachment 1007546 [details] logs rhevh20150327.0 patched vdsm worked
Closing this since it only affects el6 and we will only support el7 in 3.6.
allon/douglas - there won't be any official 3.6.0 vdsm builds in d/s until the official switch to d/s releases in August.

QE is using upstream vdsm for current testing of 3.6; can you explain why you need a d/s vdsm at this point?

Also, this bug is on rhev-h 6.6, while in 3.6.0 we dropped support for EL6 hypervisors, so I don't really follow - can you elaborate?
(In reply to Eyal Edri from comment #41)
> allon/douglas - there won't be any official 3.6.0 vdsm builds in d/s until
> the official switch to d/s releases in August.
>
> QE is using upstream vdsm for current testing of 3.6;
> can you explain why you need a d/s vdsm at this point?
>
> Also, this bug is on rhev-h 6.6, while in 3.6.0 we dropped support for EL6
> hypervisors, so I don't really follow - can you elaborate?

Hello Eyal,

I was working on a different bug, which involves rhev-h-hypervisor 7.1 (20150505.0.el7ev), and I hit this report. Basically, I was asking Allon if we have a newer vdsm build which includes this fix, to test there as well.
(In reply to Eyal Edri from comment #41)
> Also, this bug is on rhev-h 6.6, while in 3.6.0 we dropped support for EL6
> hypervisors, so I don't really follow - can you elaborate?

This issue was originally reported on RHEV-H 6.6, but there's nothing EL6-specific in the fix. The same logic applies to EL7 as well.
Douglas, we don't have official vdsm builds yet; you'll have to wait for August for them. If you can use upstream vdsm, just take the latest nightly or wait for the alpha release next week.