834893 – vdsm: vms with shared disk will pause due to I/O errors on double use of PCI Address

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Enterprise Virtualization Manager
Classification:	Red Hat
Component:	ovirt-engine
Sub Component:
Version:	3.1.0
Hardware:	x86_64
OS:	Linux
Priority:	urgent
Severity:	high
Target Milestone:	---
Target Release:	3.1.0
Assignee:	Eli Mesika
QA Contact:	Dafna Ron
Docs Contact:
URL:
Whiteboard:	storage
Depends On:	840386
Blocks:

Reported:	2012-06-24 16:11 UTC by Dafna Ron
Modified:	2016-02-10 16:48 UTC
CC List:	12 users
Fixed In Version:	SI13
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Clones:	840386
Environment:
Last Closed:
oVirt Team:	Storage
Target Upstream Version:
Embargoed:

Attachments
logs (512.70 KB, application/x-gzip) 2012-06-24 16:11 UTC, Dafna Ron	no flags
logs (5.21 MB, application/x-gzip) 2012-06-26 15:30 UTC, Dafna Ron	no flags
logs (13.81 MB, application/x-gzip) 2012-07-29 14:58 UTC, Dafna Ron	no flags

Description Dafna Ron 2012-06-24 16:11:01 UTC

Description of problem:

Running VMs with a shared disk on the same host causes the VMs to pause due to I/O errors, with the XML error: Attempted double use of PCI Address

Version-Release number of selected component (if applicable):

vdsm-4.9.6-16.0.el6.x86_64
si6

How reproducible:

100%

Steps to Reproduce:
1. Create a shared disk and attach it to several VMs
2. Run all VMs on the same host
  
Actual results:

The VMs pause due to I/O errors with the following error:

XML error: Attempted double use of PCI Address

Expected results:

We should be able to run the VMs on the same host.

Additional info: full backend and vdsm logs

Thread-411::ERROR::2012-06-24 18:49:07,264::vm::604::vm.Vm::(_startUnderlyingVm) vmId=`0ffa8e45-f64d-45f4-9df1-6d165c48f8d8`::The vm start process failed
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 570, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/libvirtvm.py", line 1364, in _run
    self._connection.createXML(domxml, flags),
  File "/usr/lib/python2.6/site-packages/vdsm/libvirtconnection.py", line 82, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 2490, in createXML
    if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
libvirtError: XML error: Attempted double use of PCI Address '0:0:4.0'
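
For illustration only, here is a minimal Python sketch of a check that would flag this condition before the domain XML reaches libvirt: it groups devices by PCI address and reports any address claimed by more than one device. The `pci_key` and `find_pci_conflicts` helpers and the sample device list are hypothetical and are not part of vdsm or ovirt-engine. The dict layout is assumed to follow the device dicts quoted in this report.

```python
from collections import defaultdict

PCI_FIELDS = ("domain", "bus", "slot", "function")


def pci_key(address):
    """Turn an address dict with hex-string values into a tuple of ints."""
    return tuple(int(address[field], 16) for field in PCI_FIELDS)


def find_pci_conflicts(devices):
    """Map each PCI address used more than once to the devices claiming it."""
    users = defaultdict(list)
    for dev in devices:
        address = dev.get("address")
        if address and address.get("type") == "pci":
            users[pci_key(address)].append(dev["device"])
    return {key: names for key, names in users.items() if len(names) > 1}


# Hypothetical device list: two entries claim domain 0, bus 0, slot 4, function 0.
shared_slot = {"type": "pci", "domain": "0x0000", "bus": "0x00",
               "slot": "0x04", "function": "0x0"}
devices = [
    {"device": "ich6", "type": "sound", "address": dict(shared_slot)},
    {"device": "disk", "type": "disk", "address": dict(shared_slot)},
    {"device": "cdrom", "type": "disk"},  # no address assigned yet
]
conflicts = find_pci_conflicts(devices)
```

Devices without an address are skipped, on the assumption that libvirt assigns a free slot to them itself.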

Comment 1 Dafna Ron 2012-06-24 16:11:57 UTC

Created attachment 594021 [details]
logs

Comment 2 Dan Kenigsberg 2012-06-25 11:10:24 UTC

{'device': 'ich6', 'specParams': {}, 'type': 'sound'}

is specified twice in the devices list.
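
A duplicated entry like this can be spotted mechanically. The following is a minimal sketch, assuming the devices list is a plain list of JSON-serializable dicts as shown above. The `duplicate_devices` helper is hypothetical and does not exist in vdsm.

```python
import json
from collections import Counter


def duplicate_devices(devices):
    """Return each device spec that appears more than once in the list."""
    # Dicts are not hashable, so count a canonical JSON form of each spec.
    counts = Counter(json.dumps(dev, sort_keys=True) for dev in devices)
    return [json.loads(spec) for spec, n in counts.items() if n > 1]


# Sample list: the sound device from this comment twice, plus a made-up disk entry.
sound = {'device': 'ich6', 'specParams': {}, 'type': 'sound'}
devices = [dict(sound), {'device': 'disk', 'type': 'disk'}, dict(sound)]
dupes = duplicate_devices(devices)
```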

Comment 4 Itamar Heim 2012-06-26 08:25:13 UTC

ich6 is a sound device. Not sure how it is related to the shared disk.

Comment 5 Eli Mesika 2012-06-26 09:50:46 UTC

Tested with 5 VMs using the same shared disk, running on a single host.
The problem is not reproducible.
Looked at the code; it cannot be connected to the shared disk.

Please let me know how to proceed.

Comment 6 Itamar Heim 2012-06-26 09:59:51 UTC

Dafna - per comment 5 - please try to reproduce and provide steps.
thanks

Comment 7 Dafna Ron 2012-06-26 15:26:21 UTC

reproduces on si7 with vdsm-4.9.6-17.0.el6.x86_64

[root@orange-vdsd ~]# vdsClient -s 0 list table
345b7456-c365-4334-87f7-0a5eb6e54ad3  27414  NEW3                 Paused                                   
d9ed9c4a-0892-4992-9005-4a538cdba77b  27321  NEW                  Paused                                   
c790c4ae-edfd-46c0-b79e-7008d748b44f  27464  NEW2                 Up          


event log:

VM NEW2 started on Host orange-vdsd

Logs will be attached again -> I restarted vdsm before the test, so look for "I am" in the logs.

reproduce: 

1. Create several VMs with no disks
2. Create a shared disk
3. Attach the shared disk to all VMs (as a single, non-bootable disk)
4. Run all VMs on the same host

Comment 8 Dafna Ron 2012-06-26 15:30:28 UTC

Created attachment 594521 [details]
logs

Comment 9 Eli Mesika 2012-06-27 07:23:47 UTC

seems like a vdsm issue 

Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 570, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/libvirtvm.py", line 1290, in _run
    self.preparePaths(devices[vm.DISK_DEVICES])
  File "/usr/share/vdsm/vm.py", line 616, in preparePaths
    drive['path'] = self.cif.prepareVolumePath(drive, self.id)
  File "/usr/share/vdsm/clientIF.py", line 190, in prepareVolumePath
    raise vm.VolumeError(drive)
VolumeError: Bad volume specification {'index': '0', 'iface': 'virtio', 'format': 'raw', 'type': 'disk', 'specParams': {}, 'readonly': 'false', 'deviceId': '070fe1ec-18c1-4941-85b8-c857735f0bb4', 'propagateErrors': 'off', 'address': {'bus': '0x00', ' slot': '0x06', ' domain': '0x0000', ' type': 'pci', ' function': '0x0'}, 'device': 'disk', 'shared': 'false', 'GUID': '1Dafna-Direct41340269', 'optional': 'false'}


Seems like the /dev/mapper/1Dafna-Direct41340269 volume is not valid or not accessible.

danken, please recheck.
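
One detail visible in the volume specification above is that several keys in the 'address' dict carry a leading space (' slot', ' domain', ' type', ' function'). Whether this is related to the failure is not established in this report. As a sketch only, such keys could be normalized as follows; `normalize_address` is a hypothetical helper, not vdsm or engine code.

```python
def normalize_address(address):
    """Return a copy of the address dict with whitespace stripped from keys."""
    return {key.strip(): value for key, value in address.items()}


# The address dict exactly as it appears in the VolumeError above.
raw = {'bus': '0x00', ' slot': '0x06', ' domain': '0x0000',
       ' type': 'pci', ' function': '0x0'}
clean = normalize_address(raw)
```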

Comment 10 Eli Mesika 2012-06-27 09:31:49 UTC

The bug was not reproduced on latest, even when using the same vdsm, libvirt and qemu RPMs as Dafna:

vdsm-python-4.9.6-17.0.el6.noarch
vdsm-4.9.6-17.0.el6.x86_64
vdsm-cli-4.9.6-17.0.el6.noarch
libvirt-0.9.10-21.el6.x86_64
qemu-img-rhev-0.12.1.2-2.295.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.295.el6.x86_64

Comment 11 Eli Mesika 2012-06-27 19:38:05 UTC

Tested again with a git branch at git hash 1e1966cfd65cc2008fd2317ef127e3c09fc40d16
(this is the git hash reported in si7).

Did not succeed in reproducing the bug.

So I now have an environment identical to Dafna's (core, kernel, vdsm, libvirt and qemu) and the bug is still not reproducible.

Will need additional information to proceed.
Setting NEEDINFO on Dafna again.

Comment 12 Itamar Heim 2012-06-28 01:59:12 UTC

Eli - you are mentioning seeing a bad volume.
Dafna/danken are discussing a duplicate ich (sound, iirc) device.
Assuming Dafna reproduces the duplicate ich error, please take a look in her DB at the device table to see if ich is defined more than once.
If the error is the bad volume specification, I agree we need to look in vdsm, but we need the environment reproducing this.

Dafna - for the repro steps in comment 7, did this reproduce for you consistently each time you tried to start the VM?

Comment 13 Dafna Ron 2012-06-28 08:30:24 UTC

> 
> dafna - for the repro steps in comment 7, did this reproduce for you
> consistently each time you tried to start the VM?

yes

Comment 14 Eli Mesika 2012-06-28 09:56:55 UTC

Checked again with an iSCSI domain, as Dafna uses (my previous checks were on an NFS domain).

Same result: not reproduced on si7.

Dafna is going to check it on si8 as the next step.

The sound card duplication seems totally unrelated.

Comment 15 Yair Zaslavsky 2012-07-01 07:41:34 UTC

Dafna, following comment #14, can you please reproduce on si8?

Comment 16 Eli Mesika 2012-07-01 12:11:51 UTC

(In reply to comment #15)
> Dafna, following comment #14, can you please reproduce on si8?

Checked on si8 (Kiril's env).
Unable to reproduce the bug; the reported scenario works perfectly.

Comment 18 Eli Mesika 2012-07-08 13:35:20 UTC

Updated scenario:

1. Create 3-4 VMs with a NIC but no disk
2. Go to the disk tab
3. Create a new shared disk
4. Go back to the VM tab
5. Attach the disk you created, one VM at a time :)
6. Run the VMs on the hosts

Comment 20 Eli Mesika 2012-07-15 15:31:19 UTC

http://gerrit.ovirt.org/#/c/6282/
http://gerrit.ovirt.org/#/c/6283/

Comment 21 Eli Mesika 2012-07-16 08:40:54 UTC

Correction: the patch is only:
http://gerrit.ovirt.org/#/c/6282/

Comment 22 Yair Zaslavsky 2012-07-19 14:25:03 UTC

http://gerrit.ovirt.org/gitweb?p=ovirt-engine.git;a=commit;h=e548c72ee1d9275d363c025ae8d61f4990a8445b

Comment 23 Dafna Ron 2012-07-29 14:46:10 UTC

not verified. 
vms still paused due to I/O errors
attaching new logs

Comment 24 Eli Mesika 2012-07-29 14:54:37 UTC

(In reply to comment #23)
> not verified. 
> vms still paused due to I/O errors
> attaching new logs

As you can see, this bug blocks 840386, which is a vdsm bug in POST status, so it will not work until 840386 is merged.

Comment 25 Dafna Ron 2012-07-29 14:58:58 UTC

Created attachment 601042 [details]
logs

si12 - logs attached

Comment 26 Dafna Ron 2012-07-29 15:02:38 UTC

Actually, this bug is marked as if it's blocking 840386, and not the other way around :)

Changing this bug to depend on 840386.

Comment 27 Dafna Ron 2012-08-12 13:46:38 UTC

verified on si13.2 vdsm-4.9.6-27.0.el6_3.x86_64
