Bug 1007980 - Failed to run VM with gluster drives: internal error unexpected address type for ide disk
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: oVirt
Classification: Retired
Component: vdsm
Version: 3.3
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ---
: ---
Assignee: Martin Polednik
QA Contact: Haim
URL:
Whiteboard: virt
Depends On:
Blocks: 918494 1011800
Reported: 2013-09-13 16:32 UTC by Jason Brooks
Modified: 2013-11-07 08:28 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-11-07 08:28:57 UTC
oVirt Team: ---


Attachments
error passage from engine.log (8.52 KB, text/x-log)
2013-09-13 16:32 UTC, Jason Brooks
Engine and VDSM logs at point of failure (37.32 KB, text/plain)
2013-09-16 07:29 UTC, Chris Sullivan
engine log from create vm to second failed run (45.58 KB, application/gzip)
2013-09-18 15:10 UTC, Jason Brooks
engine.log in gzip format (1.81 MB, application/gzip)
2013-10-03 22:49 UTC, Gianluca Cecchi
vdsm for node f18ovn03 in gzip format (3.51 MB, application/gzip)
2013-10-03 22:51 UTC, Gianluca Cecchi
vdsm for node f18ovn01 in gzip format (8.59 MB, application/gzip)
2013-10-03 22:55 UTC, Gianluca Cecchi


Links
System ID Priority Status Summary Last Updated
oVirt gerrit 19906 None None None Never
oVirt gerrit 19949 None None None Never

Description Jason Brooks 2013-09-13 16:32:50 UTC
Created attachment 797438 [details]
error passage from engine.log

Description of problem:

When running a VM for the second time, the VM fails to start, with the error: "Failed to run VM: internal error unexpected address type for ide disk"

Version-Release number of selected component (if applicable):

vdsm 4.12.1-2.fc19
ovirt-engine 3.3.0-3.fc19

How reproducible:


Steps to Reproduce:

1. Create a new VM w/ virtio disk
2. VM runs normally
3. Power down VM
4. Try to start VM

Actual results:

VM won't start, w/ error msg: internal error unexpected address type for ide disk

Expected results:

VM starts

Additional info:

* Changing disk to IDE, removing and re-adding, VM still won't start
* If created w/ IDE disk from the beginning, VM runs and restarts as
expected.

ML thread (a few others are experiencing this): http://lists.ovirt.org/pipermail/users/2013-September/016280.html

It's an AIO engine+host setup, with a second node on a separate machine. Both machines are running F19, both have all current F19 updates and all current ovirt-beta repo updates.

This is on a GlusterFS domain, hosted from a volume on the AIO machine.

Also, I have the neutron external network provider configured, but these
VMs aren't using one of these networks.

selinux permissive on both machines, firewall down on both as well 
(firewall rules for gluster don't appear to be set by the engine)


error passage from vdsm.log:

Thread-47970::ERROR::2013-09-13 12:23:44,558::vm::2062::vm.Vm::(_startUnderlyingVm) vmId=`48bbdaf3-ee25-4ac3-a7ec-ee9512246728`::The vm start process failed
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 2022, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/vm.py", line 2906, in _run
    self._connection.createXML(domxml, flags),
  File "/usr/lib64/python2.7/site-packages/vdsm/libvirtconnection.py", line 76, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 2805, in createXML
    if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
libvirtError: internal error unexpected address type for ide disk

Comment 1 Dan Kenigsberg 2013-09-15 08:14:12 UTC
Would you please attach the 'call vmCreate' line from vdsm.log? It is most likely to be an Engine bug, which should reset the disk address after its bus type is changed.

Comment 2 Chris Sullivan 2013-09-16 07:28:12 UTC
Hi,

I am getting the exact same issue with a non-AIO oVirt 3.3.0-3.fc19 setup. The only workaround I've found so far is to delete the offending VM, recreate, and reattach the disks. The recreated VM will work normally until it is shutdown, after which it will fail to start with the same error.

Engine and VDSM log excerpts below. Versions:
- Fedora 19 (3.10.11-200)
- oVirt 3.3.0-3
- VDSM 4.12.1-2
- libvirt 1.1.2-1
- gluster 3.4.0.8

Engine and VDSM logs uploaded.

Thanks,

Chris

Comment 3 Chris Sullivan 2013-09-16 07:29:19 UTC
Created attachment 798132 [details]
Engine and VDSM logs at point of failure

Comment 4 Dan Kenigsberg 2013-09-16 09:23:30 UTC
Engine asks to create the VM with an IDE cdrom that has a PCI address.

{'index': '2', 'iface': 'ide', 'address': {'bus': '0x00', ' slot': '0x06', ' domain': '0x0000', ' type': 'pci', ' function': '0x0'}, 'specParams': {'path': ''}, 'readonly': 'true', 'deviceId': 'ef25939b-a5ff-456e-978f-53e7600b83ce', 'path': '', 'device': 'cdrom', 'shared': 'false', 'type': 'disk'}

This may be an Engine bug, but it is more probably a Vdsm bug - could you reproduce the first successful startup of this VM? I suspect that somehow, Vdsm returns a faulty PCI address for this cdrom.
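
The mismatch described above can be checked mechanically. The following is only a minimal sketch and not vdsm or Engine code: `has_mismatched_address` is a hypothetical helper, and it assumes that libvirt expects an IDE device to carry an address of type 'drive' (never 'pci'), which is what the "unexpected address type for ide disk" message suggests. The device dict is copied from the vmCreate call quoted above, including the stray leading spaces in some address keys.

```python
def has_mismatched_address(device):
    """Return True if an IDE device carries an address that is not of type 'drive'.

    Hypothetical helper for illustration; not part of vdsm.
    """
    if device.get('iface') != 'ide':
        return False
    address = device.get('address') or {}
    # The logged dict has keys such as ' type' with a leading space,
    # so strip whitespace before looking anything up.
    normalized = {key.strip(): value for key, value in address.items()}
    addr_type = normalized.get('type')
    return addr_type is not None and addr_type != 'drive'


# Device spec as quoted from the vmCreate call in this comment.
cdrom = {
    'index': '2', 'iface': 'ide',
    'address': {'bus': '0x00', ' slot': '0x06', ' domain': '0x0000',
                ' type': 'pci', ' function': '0x0'},
    'specParams': {'path': ''}, 'readonly': 'true',
    'deviceId': 'ef25939b-a5ff-456e-978f-53e7600b83ce',
    'path': '', 'device': 'cdrom', 'shared': 'false', 'type': 'disk',
}

print(has_mismatched_address(cdrom))
```

Run against the quoted cdrom spec, the check reports a mismatch, since the device is on the IDE bus but its address type is 'pci'.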


=============
unrelated to main issue:
P.S., the log still has this awful bug with
Thread-143930::WARNING::2013-09-12 15:01:22,168::clientIF::337::vds::(teardownVolumePath) Drive is not a vdsm image: VOLWM_CHUNK_MB:1024 VOLWM_CHUNK_REPLICATE_MULT:2 VOLWM_FREE_PCT:50 _blockDev:False _checkIoTuneCategories:<bound method Drive._checkIoTuneCategories of <vm.Drive object at 0x7f0fb8a7d610>> _customize:<bound method ...
Traceback (most recent call last):
  File "/usr/share/vdsm/clientIF.py", line 331, in teardownVolumePath
    res = self.irs.teardownImage(drive['domainID'],
  File "/usr/share/vdsm/vm.py", line 1344, in __getitem__
    raise KeyError(key)
KeyError: 'domainID'

which I thought is history.

Comment 5 Michal Skrivanek 2013-09-16 10:28:48 UTC
for the domainID issue - sergey was looking at it in bug 980054

Comment 6 Jason Brooks 2013-09-16 17:25:18 UTC
I've been able to work around this issue just by attaching an iso image to one of my "broken" VMs -- I do one "Run Once" with a CD attached, the VM runs, and then, in subsequent runs, attaching the CD image is no longer necessary.

Seems that something (in the engine db, perhaps) is being set w/ that one with-CD-attached boot that resolves the issue...

Comment 7 Michal Skrivanek 2013-09-17 07:37:02 UTC
we need to figure out if it's on the engine side or vdsm. The second time vdsm gets cdrom it gets the address as if it is a regular disk

Comment 8 Michal Skrivanek 2013-09-18 10:49:42 UTC
still can't reproduce. Need engine&vdsm logs from creation, first and second start of VM...

Comment 9 Jason Brooks 2013-09-18 15:08:35 UTC
(In reply to Michal Skrivanek from comment #8)
> still can't reproduce. Need engine&vdsm logs from creation, first and second
> start of VM...

attaching logs from create to second run of a VM called "1007980"

Comment 10 Jason Brooks 2013-09-18 15:10:56 UTC
Created attachment 799438 [details]
engine log from create vm to second failed run

Comment 11 Martin Polednik 2013-09-18 17:58:12 UTC
It seems that the cdrom is assigned an incorrect address by libvirt, as indicated by line 2078 of the vdsm log. This address is then reported back to the engine, which uses it in the next vmCreate call, causing invalid XML to be generated.

Could you reproduce this bug with a different version of libvirt?
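
To illustrate the feedback loop described in this comment (a wrong address is reported, stored, and then replayed in the next vmCreate call), here is a hedged sketch of one way the replay could be interrupted. It is not the change that was later submitted to gerrit, whose contents are not shown in this report. `drop_stale_ide_addresses` is a hypothetical function, and the sketch assumes that a device sent without an address gets a fresh one assigned by libvirt.

```python
import copy


def drop_stale_ide_addresses(devices):
    """Return a copy of `devices` with non-'drive' addresses removed from IDE devices.

    Hypothetical illustration only; not vdsm or Engine code.
    """
    cleaned = []
    for device in devices:
        device = copy.deepcopy(device)  # leave the caller's data untouched
        address = device.get('address') or {}
        # Some logged keys carry a leading space (' type'), so normalize them.
        addr_type = {k.strip(): v for k, v in address.items()}.get('type')
        if device.get('iface') == 'ide' and addr_type not in (None, 'drive'):
            # Assumption: without an address, libvirt picks a suitable one itself.
            device.pop('address', None)
        cleaned.append(device)
    return cleaned
```

Devices on other buses, and IDE devices whose address is already of type 'drive', pass through unchanged.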

Comment 12 Michal Skrivanek 2013-09-19 08:15:40 UTC
we can't reproduce it with your libvirt version yet, though. Another possibility might be some kind of race during the VM creation, as I see your VM is 21s in WaitForLaunch... can you try with some smaller VM? Does every VM behave as you described?

anyway, seems like the problem is indeed on vdsm side of things, not engine

Comment 13 Jason Brooks 2013-09-19 15:20:37 UTC
(In reply to Martin Polednik from comment #11)
> It seems that the cdrom is assigned incorrect address by libvirt as
> indicated by line 2078 of vdsm log, this address is then reported back to
> engine which uses it in next vmCreate call - causing invalid XML to be
> generated.
> 
> Could you reproduce this bug with different version of libvirt?

Now I'm having trouble reproducing the issue at all. I rebuilt that test setup yesterday, an AIO setup, and tried it with a downgraded libvirt -- no problem. Before commenting on this bug, I tried it again with the upgraded libvirt -- no problem again. I switched my local domain to be an NFS domain, still no problem.

I'm going to convert this into a GlusterFS domain today and give it another shot -- I wasn't the only one to hit this issue, and I'd like to know that it's really gone.

Comment 14 Michal Skrivanek 2013-09-23 14:45:36 UTC
indeed.
We haven't tried with Gluster as it should not be relevant, but having narrowed all other causes please give it a try. It would likely be some libvirt/gluster integration issue

Comment 15 Jason Brooks 2013-09-25 20:30:44 UTC
(In reply to Michal Skrivanek from comment #14)
> indeed.
> We haven't tried with Gluster as it should not be relevant, but having
> narrowed all other causes please give it a try. It would likely be some
> libvirt/gluster integration issue

I just reproduced it. I set up an AIO install following my instructions at http://community.redhat.com/up-and-running-with-ovirt-3-3/ and then converted that install to Gluster using this: http://community.redhat.com/ovirt-3-3-glusterized/.

I imported a VM I'd created in AIO mode to my new, glusterized, setup, and the VM refused to run, with the error:

VM foo is down. Exit message: XML error: Attempted double use of PCI Address '0000:00:06.0'.

I made another VM, it ran normally, then I powered it down, tried to restart it, and back to:

Failed to run VM: internal error unexpected address type for ide disk
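
The "Attempted double use of PCI Address '0000:00:06.0'" message quoted in this comment names the same domain, bus, slot, and function as the PCI address attached to the IDE cdrom in comment 4. As a rough sketch of how such a collision could be spotted in a device list before it is sent to libvirt, the hypothetical helpers below format each PCI address in the `dddd:bb:ss.f` form used by the error and report any that occur more than once. The second device in the sample list is invented for illustration; the report does not say which device collided with the cdrom.

```python
from collections import Counter


def pci_address_string(address):
    """Format a PCI address dict as 'dddd:bb:ss.f'; return None if it is not PCI."""
    # Keys may carry a leading space, as in the logged vmCreate call.
    addr = {key.strip(): value for key, value in address.items()}
    if addr.get('type') != 'pci':
        return None
    return '%04x:%02x:%02x.%x' % (
        int(addr['domain'], 16),
        int(addr['bus'], 16),
        int(addr['slot'], 16),
        int(addr['function'], 16),
    )


def duplicate_pci_addresses(devices):
    """Return the sorted PCI addresses that are claimed by more than one device."""
    counts = Counter()
    for device in devices:
        text = pci_address_string(device.get('address') or {})
        if text is not None:
            counts[text] += 1
    return sorted(addr for addr, count in counts.items() if count > 1)


devices = [
    # Address taken from the cdrom spec quoted in comment 4.
    {'device': 'cdrom', 'iface': 'ide',
     'address': {'bus': '0x00', ' slot': '0x06', ' domain': '0x0000',
                 ' type': 'pci', ' function': '0x0'}},
    # Hypothetical second device claiming the same slot.
    {'device': 'disk', 'iface': 'virtio',
     'address': {'type': 'pci', 'domain': '0x0000', 'bus': '0x00',
                 'slot': '0x06', 'function': '0x0'}},
]

print(duplicate_pci_addresses(devices))  # ['0000:00:06.0']
```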

Comment 16 Jason Brooks 2013-09-25 21:08:19 UTC
Also, I tried downgrading libvirt (following the steps I mentioned above) to libvirt-1.0.5.1-1.fc19

yum downgrade libvirt*
reboot

I tried running my previously-created VMs, same error "unexpected address type for ide disk." Then I created a new VM, started it up, powered it off, and again, same error.

Comment 17 Michal Skrivanek 2013-10-02 14:20:02 UTC
Bala, any idea?

Comment 18 Bala.FA 2013-10-03 06:12:50 UTC
I am not sure about this.  I haven't gone through or studied Gluster Storage Domain code of vdsm.

Deepak, could you help us on this?

Comment 19 Gianluca Cecchi 2013-10-03 22:27:49 UTC
I am also encountering this after today's install.
I'm going to attach my vdsm.log files and engine.log
Environment is f19 engine + 2 x f19 hosts with gluster datacenter.
packages from f19 + ovirt stable as of today

Gianluca

Comment 20 Gianluca Cecchi 2013-10-03 22:47:13 UTC
Start of VM c6s (the only VM present) with error at 00:21
Successfully tried Jason's workaround (Run Once with a CD attached) at 00:28
Then shutdown -h now of the VM at 00:31
Then powered on normally at 00:31, and I don't get the error any more.

Comment 21 Gianluca Cecchi 2013-10-03 22:49:13 UTC
Created attachment 807313 [details]
engine.log in gzip format

Comment 22 Gianluca Cecchi 2013-10-03 22:51:26 UTC
Created attachment 807314 [details]
vdsm for node f18ovn03 in gzip format

Comment 23 Gianluca Cecchi 2013-10-03 22:55:21 UTC
Created attachment 807315 [details]
vdsm for node f18ovn01 in gzip format

Comment 24 Gianluca Cecchi 2013-10-03 22:57:21 UTC
So my engine-setup was run about mid-day 03/10 and I provided full logs for the engine and vdsm from the beginning.
See timestamps in comment #20 to match vm errors.

Comment 25 Deepak C Shetty 2013-10-04 05:26:26 UTC
I don't have an AIO setup myself, but going through the BZ notes, I am not sure this is related to the Gluster storage domain part of the VDSM code, because on the VDSM side we don't touch or modify the disk-specific params. I am not too sure about the Engine side of the Gluster domain code, as that was worked on by Sharad (who isn't working for IBM anymore).

In one of the comments above, Dan indicated that Engine was sending some IDE stuff which looked fishy. I don't understand the Engine stuff fully, but from what I have understood of the problem in reading this BZ, I don't think it's related to the VDSM side of the gluster domain code.

I will try to set up an AIO setup locally and see if I can get more insights.

thanx,
deepak

Comment 26 Bala.FA 2013-10-07 05:45:43 UTC
For sake of complete study of logs, could you send glusterfs logs from gluster servers and mounts?

Comment 27 Michal Skrivanek 2013-10-07 12:58:31 UTC
there should be a workaround when you attach any CDROM, anything would do, probably. can you give it a try?

Comment 28 Gianluca Cecchi 2013-10-07 22:00:56 UTC
Hello, should the change in 
http://gerrit.ovirt.org/#/c/19906/2/vdsm/vm.py
solve this?
And if so, only for new machines or for existing ones too?
Because in my case, after applying it to /usr/share/vdsm/vm.py and restarting vdsmd on both nodes, I still get the error if I don't select Run Once and connect an ISO image as a CD...

Comment 29 Martin Polednik 2013-10-07 22:14:26 UTC
The change will fix newly created VMs. The old ones might be fixed by starting/stopping them through the workaround (this needs confirmation that the engine is able to change the address; if not, the machines would need to be recreated).

Comment 30 Gianluca Cecchi 2013-10-08 00:15:39 UTC
Ok.
I confirm that I created a CentOS 6.4 32bit VM with the original vm.py and I got the problem.
Then, after modifying vm.py and restarting vdsmd, I executed the exact same steps and did not have the problem.
So this solves the problem for me.

Comment 31 Sandro Bonazzola 2013-11-07 08:28:57 UTC
oVirt 3.3.0.1 has been released.

