Created attachment 797438 [details]
error passage from engine.log

Description of problem:
When running a VM for the second time, the VM fails to start, with the error: "Failed to run VM: internal error unexpected address type for ide disk"

Version-Release number of selected component (if applicable):
vdsm 4.12.1-2.fc19
ovirt-engine 3.3.0-3.fc19

How reproducible:

Steps to Reproduce:
1. Create a new VM w/ virtio disk
2. VM runs normally
3. Power down VM
4. Try to start VM

Actual results:
VM won't start, w/ error msg: internal error unexpected address type for ide disk

Expected results:
VM starts

Additional info:
* Changing disk to IDE, removing and re-adding, VM still won't start
* If created w/ IDE disk from the beginning, VM runs and restarts as expected.

ML thread (a few others are experiencing this):
http://lists.ovirt.org/pipermail/users/2013-September/016280.html

It's an AIO engine+host setup, with a second node on a separate machine. Both machines are running F19, both have all current F19 updates and all current ovirt-beta repo updates. This is on a GlusterFS domain, hosted from a volume on the AIO machine. Also, I have the neutron external network provider configured, but these VMs aren't using one of these networks.

selinux is permissive on both machines, and the firewall is down on both as well (firewall rules for gluster don't appear to be set by the engine).

error passage from vdsm.log:

Thread-47970::ERROR::2013-09-13 12:23:44,558::vm::2062::vm.Vm::(_startUnderlyingVm) vmId=`48bbdaf3-ee25-4ac3-a7ec-ee9512246728`::The vm start process failed
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 2022, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/vm.py", line 2906, in _run
    self._connection.createXML(domxml, flags),
  File "/usr/lib64/python2.7/site-packages/vdsm/libvirtconnection.py", line 76, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 2805, in createXML
    if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
libvirtError: internal error unexpected address type for ide disk
Would you please attach the 'call vmCreate' line from vdsm.log? It is most likely an Engine bug: the Engine should reset the disk address after its bus type is changed.
Hi,

I am getting the exact same issue with a non-AIO oVirt 3.3.0-3.fc19 setup. The only workaround I've found so far is to delete the offending VM, recreate it, and reattach the disks. The recreated VM will work normally until it is shut down, after which it will fail to start with the same error. Engine and VDSM log excerpts below.

Versions:
- Fedora 19 (3.10.11-200)
- oVirt 3.3.0-3
- VDSM 4.12.1-2
- libvirt 1.1.2-1
- gluster 3.4.0.8

Engine and VDSM logs uploaded.

Thanks,
Chris
Created attachment 798132 [details] Engine and VDSM logs at point of failure
Engine asks to create the VM with an IDE cdrom that has a PCI address:

{'index': '2', 'iface': 'ide', 'address': {'bus': '0x00', ' slot': '0x06', ' domain': '0x0000', ' type': 'pci', ' function': '0x0'}, 'specParams': {'path': ''}, 'readonly': 'true', 'deviceId': 'ef25939b-a5ff-456e-978f-53e7600b83ce', 'path': '', 'device': 'cdrom', 'shared': 'false', 'type': 'disk'}

This may be an Engine bug, but it is more probably a Vdsm bug - could you reproduce the first successful startup of this VM? I suspect that somehow, Vdsm returns a faulty PCI address for this cdrom.

=============
Unrelated to the main issue: P.S., the log still has this awful bug:

Thread-143930::WARNING::2013-09-12 15:01:22,168::clientIF::337::vds::(teardownVolumePath) Drive is not a vdsm image: VOLWM_CHUNK_MB:1024 VOLWM_CHUNK_REPLICATE_MULT:2 VOLWM_FREE_PCT:50 _blockDev:False _checkIoTuneCategories:<bound method Drive._checkIoTuneCategories of <vm.Drive object at 0x7f0fb8a7d610>> _customize:<bound method ...
Traceback (most recent call last):
  File "/usr/share/vdsm/clientIF.py", line 331, in teardownVolumePath
    res = self.irs.teardownImage(drive['domainID'],
  File "/usr/share/vdsm/vm.py", line 1344, in __getitem__
    raise KeyError(key)
KeyError: 'domainID'

which I thought was history.
For the domainID issue - Sergey was looking at it in bug 980054.
I've been able to work around this issue just by attaching an iso image to one of my "broken" VMs -- I do one "Run Once" with a CD attached, the VM runs, and then, in subsequent runs, attaching the CD image is no longer necessary. Seems that something (in the engine db, perhaps) is being set w/ that one with-CD-attached boot that resolves the issue...
We need to figure out if it's on the engine side or vdsm. The second time around, vdsm gets the cdrom with an address as if it were a regular disk.
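To spell out the inconsistency (illustration only - this is not vdsm code, and the helper name is made up): the cdrom's interface is 'ide', yet its stored address type is 'pci'. The device dict below is copied from the vmCreate params quoted in comment #4.

# Hypothetical helper, not part of vdsm: flag devices whose stored address
# type does not match what their bus would normally use.
EXPECTED_ADDRESS_TYPE = {'ide': 'drive', 'virtio': 'pci'}

def address_mismatch(dev):
    """Return True if the saved address type conflicts with the device bus."""
    expected = EXPECTED_ADDRESS_TYPE.get(dev.get('iface'))
    address = dev.get('address') or {}
    # keys in the logged dict carry stray spaces, hence the strip()
    addr_type = dict((k.strip(), v) for k, v in address.items()).get('type', '').strip()
    return expected is not None and addr_type != '' and addr_type != expected

# cdrom device as sent in the second vmCreate call (comment #4)
cdrom = {'index': '2', 'iface': 'ide', 'device': 'cdrom', 'type': 'disk',
         'address': {'bus': '0x00', ' slot': '0x06', ' domain': '0x0000',
                     ' type': 'pci', ' function': '0x0'}}
print(address_mismatch(cdrom))  # True -- an IDE cdrom should have a 'drive' address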
still can't reproduce. Need engine&vdsm logs from creation, first and second start of VM...
(In reply to Michal Skrivanek from comment #8)
> still can't reproduce. Need engine&vdsm logs from creation, first and second
> start of VM...

Attaching logs from create to second run of a VM called "1007980".
Created attachment 799438 [details] engine log from create vm to second failed run
It seems that the cdrom is assigned an incorrect address by libvirt, as indicated by line 2078 of the vdsm log. This address is then reported back to the engine, which uses it in the next vmCreate call - causing invalid XML to be generated.

Could you reproduce this bug with a different version of libvirt?
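To make the difference concrete (a sketch only, not vdsm's XML-building code): an IDE cdrom in the domain XML normally gets a type='drive' address, whereas the address being fed back here is type='pci'. The drive-address values below are typical/assumed; the PCI values are taken from the device dict in comment #4.

# Illustrative sketch, not vdsm code: compare the address element an IDE
# cdrom normally gets with the PCI-style one being reused on the second run.
import xml.etree.ElementTree as ET

def address_xml(attrs):
    return ET.tostring(ET.Element('address', attrs)).decode()

# Typical drive-type address for an IDE cdrom in libvirt domain XML:
print(address_xml({'type': 'drive', 'controller': '0', 'bus': '1',
                   'target': '0', 'unit': '0'}))

# PCI-type address reported back and reused in the next vmCreate (comment #4):
print(address_xml({'type': 'pci', 'domain': '0x0000', 'bus': '0x00',
                   'slot': '0x06', 'function': '0x0'}))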
We can't reproduce it with your libvirt version yet, though. Another possibility might be some kind of race during VM creation, as I see your VM spends 21s in WaitForLaunch... Can you try with a smaller VM? Does every VM behave like you described?

Anyway, it seems the problem is indeed on the vdsm side of things, not the engine.
(In reply to Martin Polednik from comment #11)
> It seems that the cdrom is assigned incorrect address by libvirt as
> indicated by line 2078 of vdsm log, this address is then reported back to
> engine which uses it in next vmCreate call - causing invalid XML to be
> generated.
>
> Could you reproduce this bug with different version of libvirt?

Now I'm having trouble reproducing the issue at all. I rebuilt that test setup yesterday, an AIO setup, and tried it with a downgraded libvirt -- no problem. Before commenting on this bug, I tried it again with the upgraded libvirt -- no problem again. I switched my local domain to be an NFS domain, still no problem.

I'm going to convert this into a GlusterFS domain today and give it another shot -- I wasn't the only one to hit this issue, and I'd like to know that it's really gone.
Indeed. We haven't tried with Gluster, as it should not be relevant, but since all other causes have been narrowed down, please give it a try. It would likely be some libvirt/gluster integration issue.
(In reply to Michal Skrivanek from comment #14)
> indeed.
> We haven't tried with Gluster as it should not be relevant, but having
> narrowed all other causes please give it a try. It would likely be some
> libvirt/gluster integration issue

I just reproduced it.

I set up an AIO install following my instructions at http://community.redhat.com/up-and-running-with-ovirt-3-3/ and then converted that install to Gluster using this: http://community.redhat.com/ovirt-3-3-glusterized/.

I imported a VM I'd created in AIO mode to my new, glusterized, setup, and the VM refused to run, with the error:

VM foo is down. Exit message: XML error: Attempted double use of PCI Address '0000:00:06.0'.

I made another VM, it ran normally, then I powered it down, tried to restart it, and back to:

Failed to run VM: internal error unexpected address type for ide disk
Also, I tried downgrading libvirt (following the steps I mentioned above) to libvirt-1.0.5.1-1.fc19:

yum downgrade libvirt*
reboot

I tried running my previously-created VMs: same error, "unexpected address type for ide disk." Then I created a new VM, started it up, powered it off, and again, same error.
Bala, any idea?
I am not sure about this. I haven't gone through or studied the Gluster Storage Domain code of vdsm. Deepak, could you help us on this?
I'm also encountering this after an install today. I'm going to attach my vdsm.log files and engine.log.

Environment is an f19 engine + 2 x f19 hosts with a gluster datacenter. Packages are from f19 + ovirt stable as of today.

Gianluca
Start of VM c6s (the only VM present) with the error at 00:21.
Successfully tried Jason's workaround of "Run Once" with a CD attached at 00:28.
Then "shutdown -h now" of the VM at 00:31.
Then powered on normally at 00:31, and I don't get the error any more.
Created attachment 807313 [details] engine.log in gzip format
Created attachment 807314 [details] vdsm for node f18ovn03 in gzip format
Created attachment 807315 [details] vdsm for node f18ovn01 in gzip format
So my engine-setup was run about mid-day on 03/10, and I provided full logs for the engine and vdsm from the beginning. See the timestamps in comment #20 to match the VM errors.
I don't have an AIO setup myself, but going through the BZ notes, I am not sure this is related to the Gluster storage domain part of the VDSM code, because on the VDSM side we don't touch or modify the disk-specific params. I am not too sure about the Engine side of the Gluster domain code, as that was worked on by Sharad (who isn't working for IBM anymore).

In one of the comments above, Dan indicated that the Engine is sending some IDE stuff, which looks fishy. I don't understand the Engine side fully, but from what I have understood of the problem in reading this BZ, I don't think it's related to the VDSM side of the gluster domain code.

I will try to set up an AIO setup locally and see if I can get more insights.

Thanks,
Deepak
For the sake of a complete study of the logs, could you send the glusterfs logs from the gluster servers and mounts?
There should be a workaround: attach any CDROM - anything would do, probably. Can you give it a try?
Hello,
should the change in http://gerrit.ovirt.org/#/c/19906/2/vdsm/vm.py solve this? And if so, only for new machines, or for existing ones too?
Because in my case, after applying it to /usr/share/vdsm/vm.py and restarting vdsmd on both nodes, I still get the error if I don't select "Run Once" and connect an ISO image as a CD...
The change will fix newly created VMs; the old ones might be fixed by starting/stopping them through the workaround (it needs confirmation whether the engine is able to change the address - if not, the machines would need to be recreated).
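For anyone following along, this is roughly the idea (a rough approximation only, not the actual patch from http://gerrit.ovirt.org/#/c/19906/, and the function name is made up): if an IDE cdrom arrives in vmCreate with a leftover non-'drive' address, drop it so libvirt assigns a proper drive-type address when the domain is created.

# Approximation of the fix's idea, shown only to explain the behaviour:
# discard a stale PCI address on an IDE cdrom so libvirt re-assigns a valid
# drive-type address at domain creation time.
def drop_stale_cdrom_address(dev):
    address = dev.get('address') or {}
    addr_type = dict((k.strip(), v) for k, v in address.items()).get('type', '').strip()
    if (dev.get('device') == 'cdrom' and dev.get('iface') == 'ide'
            and addr_type and addr_type != 'drive'):
        dev.pop('address', None)  # let libvirt pick the address
    return dev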
Ok. I confirm that I created a CentOS 6.4 32-bit VM with the original vm.py and got the problem. Then, after modifying vm.py and restarting vdsmd, I executed the exact same steps and did not get the problem. So this solves the problem for me.
oVirt 3.3.0.1 has been released.