Bug 1205877 - StorageDomainAccessError: Domain is either partially accessible or entirely inaccessible when creating an iSCSI storage domain with RHEV-H 6.6
Summary: StorageDomainAccessError: Domain is either partially accessible or entirely inaccessible when creating an iSCSI storage domain with RHEV-H 6.6
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.5.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ovirt-3.6.3
Target Release: 3.6.0
Assignee: Nir Soffer
QA Contact: Aharon Canan
URL:
Whiteboard: storage
Depends On:
Blocks: 1209028
 
Reported: 2015-03-25 20:35 UTC by Douglas Schilling Landgraf
Modified: 2016-03-10 06:23 UTC
CC List: 22 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1209028
Environment:
Last Closed: 2015-05-06 16:26:30 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:


Attachments
vdsm logs (100.62 KB, application/x-gzip)
2015-03-25 20:42 UTC, Douglas Schilling Landgraf
multipath.conf (1.21 KB, text/plain)
2015-03-26 01:03 UTC, Douglas Schilling Landgraf
output commands (21.79 KB, application/x-gzip)
2015-03-26 01:39 UTC, Douglas Schilling Landgraf
logs marc 26 2015 (74.78 KB, application/x-gzip)
2015-03-26 21:23 UTC, Douglas Schilling Landgraf
engine.log 26 marc 2015 (238.02 KB, application/x-gzip)
2015-03-26 21:25 UTC, Douglas Schilling Landgraf
testlogspatchnr1 (72.48 KB, application/x-gzip)
2015-03-27 00:44 UTC, Douglas Schilling Landgraf
vdsm-rhevh20150327.0.el6ev.tar.gz (109.04 KB, application/x-gzip)
2015-03-27 21:22 UTC, Douglas Schilling Landgraf
rhevh20150319.42 worked iscsi data. (124.25 KB, application/x-gzip)
2015-03-27 22:43 UTC, Douglas Schilling Landgraf
logs rhevh20150327.0 patched vdsm worked (166.71 KB, application/x-gzip)
2015-03-27 23:54 UTC, Douglas Schilling Landgraf


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 39256 0 None ABANDONED iscsi: Fix minimal timeout when performing scsi scan Never
oVirt gerrit 39287 0 None None None Never
oVirt gerrit 39288 0 None None None Never
oVirt gerrit 39289 0 master MERGED storageServer: Wait for udev events after connect Never
oVirt gerrit 39594 0 None None None Never
oVirt gerrit 39607 0 ovirt-3.5 MERGED storageServer: Wait for udev events after connect Never

Description Douglas Schilling Landgraf 2015-03-25 20:35:24 UTC
Description of problem:

RHEV-M: 3.5.1-0.2.el6ev

# cat /etc/redhat-release 
Red Hat Enterprise Virtualization Hypervisor 6.6 (20150325.0.el6ev)

# getenforce 
Enforcing

# audit2allow -a
#

# rpm -qa | grep -i vdsm
vdsm-jsonrpc-4.16.8.1-6.el6ev.noarch
vdsm-reg-4.16.8.1-6.el6ev.noarch
vdsm-xmlrpc-4.16.8.1-6.el6ev.noarch
vdsm-yajsonrpc-4.16.8.1-6.el6ev.noarch
vdsm-python-4.16.8.1-6.el6ev.noarch
vdsm-cli-4.16.8.1-6.el6ev.noarch
vdsm-hook-vhostmd-4.16.8.1-6.el6ev.noarch
vdsm-4.16.8.1-6.el6ev.x86_64
ovirt-node-plugin-vdsm-0.2.0-18.el6ev.noarch
vdsm-hook-ethtool-options-4.16.8.1-6.el6ev.noarch
vdsm-python-zombiereaper-4.16.8.1-6.el6ev.noarch

# rpm -qa | grep -i libvirt
libvirt-client-0.10.2-46.el6_6.2.x86_64
libvirt-python-0.10.2-46.el6_6.2.x86_64
libvirt-cim-0.6.1-12.el6.x86_64
libvirt-0.10.2-46.el6_6.2.x86_64
libvirt-lock-sanlock-0.10.2-46.el6_6.2.x86_64


vdsm.log
===============
<snip>
Thread-253::DEBUG::2015-03-25 20:05:31,440::lvm::288::Storage.Misc.excCmd::(cmd) FAILED: <err> = '  /dev/mapper/35000144f13886704: read failed after 0 of 4096 at 32212189184: Input/output error\n  /dev/mapper/35000144f13886704: read failed after 0 of 4096 at 32212246528: Input/output error\n  /dev/mapper/35000144f13886704: read failed after 0 of 4096 at 0: Input/output error\n  WARNING: Error counts reached a limit of 3. Device /dev/mapper/35000144f13886704 was disabled\n  Volume group "57e7e1dd-fc33-4f2c-88ee-4a73aba77597" not found\n  Skipping volume group 57e7e1dd-fc33-4f2c-88ee-4a73aba77597\n'; <rc> = 5
Thread-253::DEBUG::2015-03-25 20:05:31,443::lvm::495::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm invalidate operation' got the operation mutex
Thread-253::DEBUG::2015-03-25 20:05:31,444::lvm::497::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm invalidate operation' released the operation mutex
Thread-253::DEBUG::2015-03-25 20:05:31,444::lvm::505::Storage.OperationMutex::(_invalidatelvs) Operation 'lvm invalidate operation' got the operation mutex
Thread-253::DEBUG::2015-03-25 20:05:31,444::lvm::517::Storage.OperationMutex::(_invalidatelvs) Operation 'lvm invalidate operation' released the operation mutex
Thread-253::ERROR::2015-03-25 20:05:31,445::sp::561::Storage.StoragePool::(create) Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sp.py", line 556, in create
  File "/usr/share/vdsm/storage/blockSD.py", line 840, in validate
  File "/usr/share/vdsm/storage/lvm.py", line 976, in chkVG
StorageDomainAccessError: Domain is either partially accessible or entirely inaccessible: ('57e7e1dd-fc33-4f2c-88ee-4a73aba77597: [\'  /dev/mapper/35000144f13886704: read failed after 0 of 4096 at 32212189184: Input/output error\', \'  /dev/mapper/35000144f13886704: read failed after 0 of 4096 at 32212246528: Input/output error\', \'  /dev/mapper/35000144f13886704: read failed after 0 of 4096 at 0: Input/output error\', \'  WARNING: Error counts reached a limit of 3. Device /dev/mapper/35000144f13886704 was disabled\', \'  Volume group "57e7e1dd-fc33-4f2c-88ee-4a73aba77597" not found\', \'  Skipping volume group 57e7e1dd-fc33-4f2c-88ee-4a73aba77597\']',)
Thread-253::ERROR::2015-03-25 20:05:31,445::task::866::Storage.TaskManager.Task::(_setError) Task=`3e0f52ba-6e99-4995-805d-ac890f16a853`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
  File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
  File "/usr/share/vdsm/storage/hsm.py", line 978, in createStoragePool
  File "/usr/share/vdsm/storage/sp.py", line 562, in create
StorageDomainAccessError: Domain is either partially accessible or entirely inaccessible: ('57e7e1dd-fc33-4f2c-88ee-4a73aba77597',)
</snip>

How reproducible:

Steps to Reproduce:
1. Autoinstall RHEV-H 20150325.0.el6ev and approve the node
2. Try to add iSCSI data storage; it will fail

Actual results:
Cannot add iscsi storage

Expected results:
Be able to add iscsi storage

Comment 2 Douglas Schilling Landgraf 2015-03-25 20:42:43 UTC
Created attachment 1006468 [details]
vdsm logs

Comment 3 Nir Soffer 2015-03-25 21:35:42 UTC
Looking at vdsm log:

The login to the iscsi server looks ok:

Thread-103::INFO::2015-03-25 20:00:50,486::logUtils::44::dispatcher::(wrapper) Run and protect: connectStorageServer(domType=3, spUUID='00000002-0002-0002-0002-0000000001ed', conList=[{'connection': '192.168.1.103', 'iqn': 'iqn.1992-04.com.emc:storage.storage.rbdata', 'user': '', 'tpgt': '1', 'password': '******', 'id': 'e221b3e0-f6c3-4f21-a369-87a1c34de5a6', 'port': '3260'}], options=None)
Thread-103::DEBUG::2015-03-25 20:00:50,487::iscsiadm::92::Storage.Misc.excCmd::(_runCmd) /usr/bin/sudo -n /sbin/iscsiadm -m node -T iqn.1992-04.com.emc:storage.storage.rbdata -I default -p 192.168.1.103:3260,1 --op=new (cwd None)
Thread-103::DEBUG::2015-03-25 20:00:50,498::iscsiadm::92::Storage.Misc.excCmd::(_runCmd) SUCCESS: <err> = ''; <rc> = 0
Thread-103::DEBUG::2015-03-25 20:00:50,498::iscsiadm::92::Storage.Misc.excCmd::(_runCmd) /usr/bin/sudo -n /sbin/iscsiadm -m node -T iqn.1992-04.com.emc:storage.storage.rbdata -I default -p 192.168.1.103:3260,1 -l (cwd None)
Thread-103::DEBUG::2015-03-25 20:00:50,747::iscsiadm::92::Storage.Misc.excCmd::(_runCmd) SUCCESS: <err> = ''; <rc> = 0
Thread-103::DEBUG::2015-03-25 20:00:50,748::iscsiadm::92::Storage.Misc.excCmd::(_runCmd) /usr/bin/sudo -n /sbin/iscsiadm -m node -T iqn.1992-04.com.emc:storage.storage.rbdata -I default -p 192.168.1.103:3260,1 -n node.startup -v manual --op=update (cwd None)
Thread-103::DEBUG::2015-03-25 20:00:50,762::iscsiadm::92::Storage.Misc.excCmd::(_runCmd) SUCCESS: <err> = ''; <rc> = 0

But for some reason lvm cannot read from /dev/mapper/35000144f82378826

Thread-103::DEBUG::2015-03-25 20:00:50,763::lvm::365::Storage.OperationMutex::(_reloadvgs) Operation 'lvm reload operation' got the operation mutex
Thread-103::DEBUG::2015-03-25 20:00:50,765::lvm::288::Storage.Misc.excCmd::(cmd) /usr/bin/sudo -n /sbin/lvm vgs --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ '\''a|/dev/mapper/35000144f82378826|'\'', '\''r|.*|'\'' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 } ' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name (cwd None)
Thread-103::DEBUG::2015-03-25 20:00:50,884::lvm::288::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = '  /dev/mapper/35000144f82378826: read failed after 0 of 4096 at 32212189184: Input/output error\n  /dev/mapper/35000144f82378826: read failed after 0 of 4096 at 32212246528: Input/output error\n  /dev/mapper/35000144f82378826: read failed after 0 of 4096 at 0: Input/output error\n  WARNING: Error counts reached a limit of 3. Device /dev/mapper/35000144f82378826 was disabled\n  No volume groups found\n'; <rc> = 0

When this PV is not accessible, it is expected that the new VG will not be found.

We need to understand why /dev/mapper/35000144f82378826 is not accessible
after logging in to the iSCSI server.

Let's try to log in manually and check what we see on the storage.

1. Put the host into maintenance
2. Attach the output of these commands:

mkdir before
lsblk > before/lsblk
multipath -ll > before/multipath-ll
pvdisplay > before/pvdisplay
vgdisplay > before/vgdisplay

3. Log in to the iSCSI target
   iscsiadm -m node -T iqn.1992-04.com.emc:storage.storage.rbdata -I default -p 192.168.1.103:3260,1 -l

4. Attach the output of these commands:

mkdir after
lsblk > after/lsblk
multipath -ll > after/multipath-ll
pvdisplay > after/pvdisplay
vgdisplay > after/vgdisplay

5. Log out from the iSCSI server
   iscsiadm -m session -u
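
If it is easier, the same output can be collected with a rough helper script
like this (an untested sketch, not part of vdsm); pass "before" or "after" as
the output directory:

import os
import subprocess
import sys

# Diagnostic commands requested above; output file name -> command line.
COMMANDS = {
    "lsblk": ["lsblk"],
    "multipath-ll": ["multipath", "-ll"],
    "pvdisplay": ["pvdisplay"],
    "vgdisplay": ["vgdisplay"],
}

def collect(outdir):
    # Save stdout+stderr of every command under outdir (e.g. "before"/"after").
    os.makedirs(outdir)
    for name, cmd in COMMANDS.items():
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT)
        out, _ = proc.communicate()
        with open(os.path.join(outdir, name), "wb") as f:
            f.write(out)

if __name__ == "__main__":
    collect(sys.argv[1] if len(sys.argv) > 1 else "before")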

Comment 4 Nir Soffer 2015-03-25 21:52:59 UTC
Looking at /var/log/messages - it seems that the device is ready for use only at 20:00:51,
but we try to access it at 20:00:50, which explains why we cannot read from it.

Mar 25 20:00:50 localhost kernel: sd 2:0:0:0: [sdb] Synchronizing SCSI cache
Mar 25 20:00:50 localhost multipathd: sdb: remove path (uevent)
Mar 25 20:00:50 localhost multipathd: 35000144f82378826 Last path deleted, disabling queueing
Mar 25 20:00:50 localhost multipathd: 35000144f82378826: map in use
Mar 25 20:00:50 localhost multipathd: 35000144f82378826: can't flush
Mar 25 20:00:50 localhost multipathd: flush_on_last_del in progress
Mar 25 20:00:50 localhost multipathd: 35000144f82378826: load table [0 62914560 multipath 0 0 0 0]
Mar 25 20:00:50 localhost multipathd: sdb [8:16]: path removed from map 35000144f82378826
Mar 25 20:00:50 localhost kernel: end_request: I/O error, dev dm-5, sector 24
Mar 25 20:00:50 localhost kernel: Buffer I/O error on device dm-5, logical block 3
Mar 25 20:00:50 localhost kernel: end_request: I/O error, dev dm-5, sector 24
Mar 25 20:00:50 localhost kernel: Buffer I/O error on device dm-5, logical block 3
Mar 25 20:00:50 localhost iscsid: Connection1:0 to [target: iqn.1992-04.com.emc:storage.storage.rbdata, portal: 192.168.1.103,3260] through [iface: default] is shutdown.
Mar 25 20:00:50 localhost kernel: device-mapper: table: 253:12: multipath: error getting device
Mar 25 20:00:50 localhost kernel: device-mapper: ioctl: error adding target to table
Mar 25 20:00:50 localhost udevd-work[16801]: setfilecon /dev/dm-12 failed: No such file or directory
Mar 25 20:00:50 localhost multipathd: dm-12: remove map (uevent)
Mar 25 20:00:50 localhost multipathd: dm-12: devmap not registered, can't remove
Mar 25 20:00:50 localhost multipathd: dm-12: remove map (uevent)
Mar 25 20:00:50 localhost multipathd: dm-12: devmap not registered, can't remove
Mar 25 20:00:50 localhost kernel: scsi3 : iSCSI Initiator over TCP/IP
Mar 25 20:00:50 localhost kernel: scsi 3:0:0:0: Direct-Access     EMC      LIFELINE-DISK    1    PQ: 0 ANSI: 4
Mar 25 20:00:50 localhost kernel: sd 3:0:0:0: Attached scsi generic sg2 type 0
Mar 25 20:00:50 localhost kernel: end_request: I/O error, dev dm-5, sector 62914432
Mar 25 20:00:50 localhost kernel: end_request: I/O error, dev dm-5, sector 62914544
Mar 25 20:00:50 localhost kernel: end_request: I/O error, dev dm-5, sector 0
Mar 25 20:00:50 localhost kernel: sd 3:0:0:0: [sdb] 62914560 512-byte logical blocks: (32.2 GB/30.0 GiB)
Mar 25 20:00:51 localhost kernel: end_request: I/O error, dev dm-5, sector 62914432
Mar 25 20:00:51 localhost kernel: end_request: I/O error, dev dm-5, sector 62914544
Mar 25 20:00:51 localhost kernel: end_request: I/O error, dev dm-5, sector 0
Mar 25 20:00:51 localhost kernel: sd 3:0:0:0: [sdb] Write Protect is off
Mar 25 20:00:51 localhost kernel: sd 3:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Mar 25 20:00:51 localhost kernel: sdb: unknown partition table
Mar 25 20:00:51 localhost kernel: sd 3:0:0:0: [sdb] Attached SCSI disk
Mar 25 20:00:51 localhost multipathd: sdb: add path (uevent)
Mar 25 20:00:51 localhost multipathd: 35000144f82378826: load table [0 62914560 multipath 0 0 1 1 round-robin 0 1 1 8:16 1]
Mar 25 20:00:51 localhost multipathd: sdb [8:16]: path added to devmap 35000144f82378826
Mar 25 20:00:51 localhost iscsid: Could not set session2 priority. READ/WRITE throughout and latency could be affected.
Mar 25 20:00:51 localhost iscsid: Connection2:0 to [target: iqn.1992-04.com.emc:storage.storage.rbdata, portal: 192.168.1.103,3260] through [iface: default] is operational now

Ben, this is another issue with rhev-h 6.6 - can this be related to
bug 1191401?

How do you suggest we debug this issue?
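
For reference, a minimal sketch (not vdsm code) of what "ready" means here:
poll until the multipath device node can actually be read - the same kind of
4096-byte read that lvm reports failing above:

import time

def wait_readable(path, timeout=10.0, interval=0.1):
    # Return the number of seconds waited until `path` could be read,
    # or re-raise the last IOError if the timeout expires.
    start = time.time()
    while True:
        try:
            with open(path, "rb") as f:
                f.read(4096)  # same read size lvm reports failing on
            return time.time() - start
        except IOError:
            if time.time() - start > timeout:
                raise
            time.sleep(interval)

# Example, using the device from the log above:
# print(wait_readable("/dev/mapper/35000144f82378826"))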

Comment 5 Nir Soffer 2015-03-25 21:58:37 UTC
Please attach also /etc/multipath.conf

Comment 7 Douglas Schilling Landgraf 2015-03-26 01:03:52 UTC
Created attachment 1006616 [details]
multipath.conf

Comment 8 Douglas Schilling Landgraf 2015-03-26 01:39:29 UTC
Created attachment 1006627 [details]
output commands

Comment 9 Douglas Schilling Landgraf 2015-03-26 01:45:11 UTC
Hi Nir,

I have re-created the environment; the iSCSI storage is now called dakar, and I got the same issue.

My steps:
===========
- Reproduced the issue as in the original report
- Put the host into maintenance
- Executed the commands
- Logged out:
  # iscsiadm -m session -u
  Logging out of session [sid: 2, target: iqn.1992-04.com.emc:storage.storage.dakar, portal: 192.168.1.103,3260]
  Logout of [sid: 2, target: iqn.1992-04.com.emc:storage.storage.dakar, portal: 192.168.1.103,3260] successful.

Attached the commands and new messages file.

# rpm -qa | grep -i multipath
device-mapper-multipath-0.4.9-80.el6_6.2.x86_64
device-mapper-multipath-libs-0.4.9-80.el6_6.2.x86_64

Comment 10 Nir Soffer 2015-03-26 11:31:01 UTC
(In reply to Douglas Schilling Landgraf from comment #9)
Please attach engine, vdsm, and messages logs showing the pv used in
the commands output (attachment 1006627 [details]).

Can you describe how to reproduce this?

1. What rhev-h version was used before the upgrade - please attach
   rpm -qa output from before and after the upgrade.
2. What was the system state before the upgrade - did we have any
   setup before, or was the domain created after the upgrade?
3. Does it reproduce with the previous version?
   e.g. if this happens when upgrading version x to x+1,
   does it reproduce when upgrading version x-1 to x?
4. Can I have a live session on the failing system - before you upgrade it?

Comment 11 Douglas Schilling Landgraf 2015-03-26 14:25:43 UTC
Hi Nir,

(In reply to Nir Soffer from comment #10)
> (In reply to Douglas Schilling Landgraf from comment #9)
> Please attach engine, vdsm, and messages logs showing the pv used in
> the commands output (attachment 1006627 [details]).
> 
> Can you describe how to reproduce this?
> 
I have described it in the bug description. You can use autoinstall or even a normal installation of rhev-h, approve the node, and try to add iSCSI data storage.

> 1. What rhev-h version was used before the upgrade - please attach
>    rpm -qa output from before and after the upgrade.
> 2. What was the system state before the upgrade - did we have any
>    setup before, or was the domain created after the upgrade?
> 3. Does it reproduce with the previous version?
>    e.g. if this happens when upgrading version x to x+1,
>    does it reproduce when upgrading version x-1 to x?
> 4. Can I have a live session on the failing system - before you upgrade it?

There is no upgrade involved in this bug, normal installation.

Comment 13 Nir Soffer 2015-03-26 14:57:57 UTC
(In reply to Douglas Schilling Landgraf from comment #11)
> There is no upgrade involved in this bug, normal installation.

Ok - can you check if the issue is reproducible with the previous iso
and the same storage server? I want to eliminate problems with the
environment you use.

Comment 14 Douglas Schilling Landgraf 2015-03-26 15:02:44 UTC
Hello Nir,

(In reply to Nir Soffer from comment #13)
> (In reply to Douglas Schilling Landgraf from comment #11)
> > There is no upgrade involved in this bug, normal installation.
> 
> Ok - can you check if the issue is reproducible with the previous iso
> and the same storage server? I want to eliminate problems with the
> environment you use.

I have used the storage server with other RHEV-H isos yesterday, with no issues.
Hello Aharon, could you please try to reproduce this too?

Thanks!

Comment 15 Nir Soffer 2015-03-26 20:10:41 UTC
(In reply to Douglas Schilling Landgraf from comment #14)
Douglas, can you attach vdsm.conf on this machine?

Comment 16 Douglas Schilling Landgraf 2015-03-26 21:23:36 UTC
Created attachment 1006985 [details]
logs marc 26 2015

Comment 17 Douglas Schilling Landgraf 2015-03-26 21:25:24 UTC
Created attachment 1006986 [details]
engine.log 26 marc 2015

Comment 18 Nir Soffer 2015-03-26 23:43:03 UTC
Douglas, can you test the attached patch? I think this should solve this issue.

Comment 19 Douglas Schilling Landgraf 2015-03-27 00:44:34 UTC
Created attachment 1007039 [details]
testlogspatchnr1

Comment 20 Douglas Schilling Landgraf 2015-03-27 00:46:34 UTC
Hi Nir,

Still no go with the patch you suggested, attached the logs.

Comment 21 Nir Soffer 2015-03-27 11:11:08 UTC
(In reply to Douglas Schilling Landgraf from comment #20)
> Still no go with the patch you suggested, attached the logs.

Douglas, can you attach the output of lsblk from the last setup?

We need this to understand multipath messages in /var/log/messages, because
vdsm does not mention the device names in its logs (dm-*).

If you run another test, please attach lsblk output before and after the test.

Also, what is the latest rhev-h iso that does work? I want to perform
the same flow with the broken and the last good iso, and compare the logs.
It will be very helpful if you perform the same flow with the last good
iso and attach the same logs here.

Thanks!

Comment 22 Nir Soffer 2015-03-27 17:06:41 UTC
Douglas, I think the new patches will make a difference. Can you test them?

Also, can you specify which engine version you are using? Testing the patches
with engine from master, I see a different flow compared with the flow seen
in the logs here.

Comment 23 Douglas Schilling Landgraf 2015-03-27 21:05:42 UTC
Hi Nir,

I will test the new patches soon. For now, I would like to share that I can also see this issue with:

# rpm -qa | grep -i vdsm
vdsm-reg-4.16.12.1-3.el6ev.noarch
ovirt-node-plugin-vdsm-0.2.0-20.el6ev.noarch
vdsm-python-zombiereaper-4.16.12.1-3.el6ev.noarch
vdsm-jsonrpc-4.16.12.1-3.el6ev.noarch
vdsm-python-4.16.12.1-3.el6ev.noarch
vdsm-cli-4.16.12.1-3.el6ev.noarch
vdsm-hook-vhostmd-4.16.12.1-3.el6ev.noarch
vdsm-4.16.12.1-3.el6ev.x86_64
vdsm-yajsonrpc-4.16.12.1-3.el6ev.noarch
vdsm-hook-ethtool-options-4.16.12.1-3.el6ev.noarch
vdsm-xmlrpc-4.16.12.1-3.el6ev.noarch


# cat /etc/redhat-release 
Red Hat Enterprise Virtualization Hypervisor 6.6 (20150327.0.el6ev)

Comment 24 Douglas Schilling Landgraf 2015-03-27 21:22:50 UTC
Created attachment 1007481 [details]
vdsm-rhevh20150327.0.el6ev.tar.gz

Comment 25 Douglas Schilling Landgraf 2015-03-27 21:23:54 UTC
Hi Nir,

FYI, I have attached the logs from the latest iso (rhevh20150327) with the failure, still not using the latest patches you provided.

Comment 26 Douglas Schilling Landgraf 2015-03-27 22:42:45 UTC
Hi Nir,

Data from the previous working version; logs attached.

# cat /etc/redhat-release 
Red Hat Enterprise Virtualization Hypervisor 6.6 (20150319.42.el6ev)

# rpm -qa | grep -i vdsm
vdsm-reg-4.16.12.1-3.el6ev.noarch
ovirt-node-plugin-vdsm-0.2.0-20.el6ev.noarch
vdsm-python-zombiereaper-4.16.12.1-3.el6ev.noarch
vdsm-jsonrpc-4.16.12.1-3.el6ev.noarch
vdsm-python-4.16.12.1-3.el6ev.noarch
vdsm-cli-4.16.12.1-3.el6ev.noarch
vdsm-hook-vhostmd-4.16.12.1-3.el6ev.noarch
vdsm-4.16.12.1-3.el6ev.x86_64
vdsm-yajsonrpc-4.16.12.1-3.el6ev.noarch
vdsm-hook-ethtool-options-4.16.12.1-3.el6ev.noarch
vdsm-xmlrpc-4.16.12.1-3.el6ev.noarch
[root@localhost log]# rpm -qa | grep -i mapper
device-mapper-event-libs-1.02.90-2.el6_6.1.x86_64
device-mapper-multipath-libs-0.4.9-80.el6_6.3.x86_64
device-mapper-persistent-data-0.3.2-1.el6.x86_64
device-mapper-1.02.90-2.el6_6.1.x86_64
device-mapper-event-1.02.90-2.el6_6.1.x86_64
device-mapper-multipath-0.4.9-80.el6_6.3.x86_64
device-mapper-libs-1.02.90-2.el6_6.1.x86_64

Comment 27 Douglas Schilling Landgraf 2015-03-27 22:43:45 UTC
Created attachment 1007534 [details]
rhevh20150319.42 worked iscsi data.

Comment 28 Nir Soffer 2015-03-27 23:20:15 UTC
Comparing the "good" iso (20150319) with the "bad" iso (20150327), we can see that the vdsm
flow is identical, but the underlying platform behaves differently.

Good iso:


Mar 27 22:37:52 localhost kernel: scsi3 : iSCSI Initiator over TCP/IP
Mar 27 22:37:52 localhost kernel: scsi 3:0:0:0: Direct-Access     EMC      LIFELINE-DISK    1    PQ: 0 ANSI: 4
Mar 27 22:37:52 localhost kernel: sd 3:0:0:0: Attached scsi generic sg2 type 0
Mar 27 22:37:52 localhost kernel: sd 3:0:0:0: [sdb] 62914560 512-byte logical blocks: (32.2 GB/30.0 GiB)
Mar 27 22:37:52 localhost kernel: sd 3:0:0:0: [sdb] Write Protect is off
Mar 27 22:37:52 localhost kernel: sd 3:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Mar 27 22:37:52 localhost kernel: sdb: unknown partition table
Mar 27 22:37:52 localhost kernel: sd 3:0:0:0: [sdb] Attached SCSI disk
Mar 27 22:37:52 localhost multipathd: sdb: add path (uevent)
Mar 27 22:37:53 localhost multipathd: 35000144f02325565: load table [0 62914560 multipath 0 0 1 1 round-robin 0 1 1 8:16 1]
Mar 27 22:37:53 localhost multipathd: sdb [8:16]: path added to devmap 35000144f02325565

    Here the device (sdb) backing the multipath device (35000144f02325565) was added,
    so lvm can read from the multipath device.

Mar 27 22:37:53 localhost kernel: device-mapper: table: 253:12: multipath: error getting device
Mar 27 22:37:53 localhost kernel: device-mapper: ioctl: error adding target to table
Mar 27 22:37:53 localhost iscsid: Connection1:0 to [target: iqn.1992-04.com.emc:storage.storage.data42, portal: 192.168.1.103,3260] through [iface: default] is shutdown.

    The old connection is shutdown

Mar 27 22:37:53 localhost iscsid: Connection2:0 to [target: iqn.1992-04.com.emc:storage.storage.data42, portal: 192.168.1.103,3260] through [iface: default] is operational now

    And the new connection is finished


bad iso:


Mar 27 21:01:10 localhost kernel: scsi3 : iSCSI Initiator over TCP/IP
Mar 27 21:01:10 localhost kernel: scsi 3:0:0:0: Direct-Access     EMC      LIFELINE-DISK    1    PQ: 0 ANSI: 4
Mar 27 21:01:10 localhost kernel: sd 3:0:0:0: Attached scsi generic sg2 type 0
Mar 27 21:01:11 localhost iscsid: Connection1:0 to [target: iqn.1992-04.com.emc:storage.storage.finddata, portal: 192.168.1.103,3260] through [iface: default] is shutdown.

    The old connection is shutdown

Mar 27 21:01:11 localhost iscsid: Connection2:0 to [target: iqn.1992-04.com.emc:storage.storage.finddata, portal: 192.168.1.103,3260] through [iface: default] is operational now

    And the new connection is done - but the multipath device is not ready!

Mar 27 21:01:11 localhost kernel: end_request: I/O error, dev dm-5, sector 62914432
Mar 27 21:01:11 localhost kernel: end_request: I/O error, dev dm-5, sector 62914544
Mar 27 21:01:11 localhost kernel: end_request: I/O error, dev dm-5, sector 0
Mar 27 21:01:11 localhost kernel: sd 3:0:0:0: [sdb] 62914560 512-byte logical blocks: (32.2 GB/30.0 GiB)
Mar 27 21:01:11 localhost kernel: sd 3:0:0:0: [sdb] Write Protect is off
Mar 27 21:01:11 localhost kernel: end_request: I/O error, dev dm-5, sector 62914432
Mar 27 21:01:11 localhost kernel: end_request: I/O error, dev dm-5, sector 62914544
Mar 27 21:01:11 localhost kernel: end_request: I/O error, dev dm-5, sector 0

    Somewhere here, lvm vgck tries to read from the multipath device and fails.

Mar 27 21:01:11 localhost kernel: sd 3:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Mar 27 21:01:11 localhost kernel: sdb: unknown partition table
Mar 27 21:01:11 localhost kernel: sd 3:0:0:0: [sdb] Attached SCSI disk
Mar 27 21:01:11 localhost multipathd: sdb: add path (uevent)
Mar 27 21:01:11 localhost multipathd: 35000144f11907099: load table [0 62914560 multipath 0 0 1 1 round-robin 0 1 1 8:16 1]
Mar 27 21:01:11 localhost multipathd: sdb [8:16]: path added to devmap 35000144f11907099

    The device is ready only now!

This may be caused by multipath, iscsid, udevd, or the kernel - let's start with
multipath. Ben, what do you think?

Comment 29 Douglas Schilling Landgraf 2015-03-27 23:51:39 UTC
Hi Nir,

As we discussed, I have backported the patches below to the downstream vdsm-4.16.12.1-3.el6ev.x86_64 build and added them to the latest available build of rhev-h 6.6 (20150327.0.el6ev), and it worked. The host is up and the data storage domain is up. My suggestion is to build a new downstream vdsm with these fixes so we can build a new rhev-h.

multipath: Wait for udev events after rescan
https://gerrit.ovirt.org/#/c/39288/

storageServer: Wait for udev events after connect
https://gerrit.ovirt.org/#/c/39289/

udevadm: Add udevadm wrapper
https://gerrit.ovirt.org/#/c/39287/

Comment 30 Douglas Schilling Landgraf 2015-03-27 23:54:10 UTC
Created attachment 1007546 [details]
logs rhevh20150327.0 patched vdsm worked

Comment 35 Yaniv Lavi 2015-05-06 16:26:30 UTC
Closing this since it only affects el6 and we will only support el7 in 3.6.

Comment 41 Eyal Edri 2015-05-12 14:30:17 UTC
allon/douglas - there won't be any official 3.6.0 vdsm builds in d/s until official switch to d/s releases in august.

QE is using upstream vdsm for current testing of 3.6,
can you explain why you need a d/s vdsm at this point?

also, bug is on rhev-h 6.6, while in 3.6.0 we dropped support for EL6 hypervisors, so i don't really follow, can you elaborate?

Comment 42 Douglas Schilling Landgraf 2015-05-13 02:31:27 UTC
(In reply to Eyal Edri from comment #41)
> allon/douglas - there won't be any official 3.6.0 vdsm builds in d/s until
> official switch to d/s releases in august.
> 
> QE is using upstream vdsm for current testing of 3.6,
> can you explain why you need a d/s vdsm at this point?
> 
> also, bug is on rhev-h 6.6, while in 3.6.0 we dropped support for EL6
> hypervisors, so i don't really follow, can you elaborate?

Hello Eyal, I was working on a different bug involving rhev-hypervisor 7.1 (20150505.0.el7ev) and hit this issue. Basically, I was asking Allon whether we have a newer vdsm build which includes this fix, so I can test it there as well.

Comment 43 Allon Mureinik 2015-05-13 09:08:38 UTC
(In reply to Eyal Edri from comment #41)
> also, bug is on rhev-h 6.6, while in 3.6.0 we dropped support for EL6
> hypervisors, so i don't really follow, can you elaborate?
This issue was originally reported on RHEV-H 6.6, but there's nothing EL6-specific in the fix. The same logic there also applies to EL7.

Comment 44 Eyal Edri 2015-05-13 10:23:18 UTC
douglas, we don't have official vdsm builds yet; you'll have to wait until august for them.
if you can use upstream vdsm, just take the latest nightly or wait for the alpha release next week.

