Red Hat Bugzilla – Bug 978268
Unable to put a host into maintenance because VMs previously managed by vdsm are running on the host
Last modified: 2015-09-22 09:09 EDT
Created attachment 765472 [details]
engine.log, vdsm.log

Description of problem:
This is caused by a very unusual use case, but anyway. I had a RHEV-M with a couple of VMs running on a host; vdsm knew about these running VMs. Then I ran 'rhevm-cleanup', but the VMs kept running. Then I cleaned the rhevm server completely, reinstalled rhevm, ran rhevm-setup, and re-registered the host with rhevm. Up to this point everything was OK. Then I tried to put the host into maintenance, and it is stuck in 'Preparing For Maintenance'...

I think it is because vdsm knows about "old" VMs running on the host while the engine does not know about them. I see a lot of 'vmGetStats' calls in vdsm.log.

I think this strange case could be solved in two ways:
1. When a host is registered with the engine, the engine should get info about its VMs, and if vdsm has some "old" VMs running, it should forget them, so that the host can be put into maintenance.
2. (better, in the future) The engine could get a list of running VMs on a running host (rhevm-style running hosts) and offer a way to 're-register' such running "old" VMs into the engine. This would of course need different handling of storage domains, i.e. the ability to attach an already-used data domain (as in the VMware vSphere world). So: there are some "unknown" rhevm-like VMs running on the host -> if the engine could get all their attributes (properties, access to storage...) and their health looked OK, it would propose to 're-register' them and use the storage of the VMs' images as a data domain.

Version-Release number of selected component (if applicable):
is2

How reproducible:
100%

Steps to Reproduce:
1. Have an engine, a host, and a couple of VMs running.
2. Run rhevm-cleanup and clean the whole engine server.
3. Install rhevm, run a clean rhevm-setup, and add the host (with the VMs still running).
4. Try to put the host into maintenance.

Actual results:
Unable to put the host into maintenance, because its vdsm knows about some "old" VMs running.

Expected results:
vdsm should probably drop its data about running VMs that are unknown to the engine (or offer some re-register mechanism?).

Additional info:
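The first suggested fix above boils down to a set difference between the VMs vdsm reports as running and the VMs the engine tracks. A minimal sketch of that idea, assuming plain lists of VM UUIDs on both sides (the helper name and arguments are invented for illustration, not actual vdsm or engine code):

```python
def find_unknown_vms(vdsm_vm_ids, engine_vm_ids):
    """Return the IDs of VMs that vdsm reports as running but the
    engine has no record of ("old" VMs left over after rhevm-cleanup)."""
    return sorted(set(vdsm_vm_ids) - set(engine_vm_ids))


# After a full engine reinstall, the engine knows about no VMs at all,
# so every VM still running under vdsm shows up as "unknown":
running_on_host = [
    "89478d8e-48dd-4722-91a9-78530382f5be",
    "294c853b-aa7b-4d41-9436-3f46f82d362d",
]
print(find_unknown_vms(running_on_host, []))
```

In this sketch the host could only move to maintenance once that difference is empty, which matches the symptom described above.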
Killing the qemu-kvm processes was not enough; restarting vdsmd helped, and now the host is in maintenance.
Well, there might be some check for running qemu processes on host deploy... maybe. Or at least an alert in the GUI about unknown VMs on the host?
We probably wouldn't have fixed this particular use case, but we are actually looking now at showing unmanaged VMs - assigning to Oved.
needed for hosted engine
The latest merged patch (see external tracker) imports VMs that run on the engine's hosts as external VMs in the engine. There is a limited list of operations that one can do on such VMs:
1. Connect to the console
2. Migrate the VM
3. Cancel migration

So, in the maintenance use case, one can migrate such a VM to another host in the cluster and then move the host to maintenance.
is13 - migration does not work.

Either something is not fully described, or I doubt this functionality makes sense.

1. What about storage? We do not support importing an already-configured data domain, so how could I migrate the VM without having its storage in the setup? As I could not import the already-configured data domain, I tried to mount the already-mounted data domain on the newly added host before doing the migration...

2. If you decided to import already-running VMs, managed by an already-running vdsmd, into the engine, why can't I stop them? This is a little ridiculous...

Error while executing action:
external-running:
Cannot shutdown VM. This VM is not managed by the engine.

3. Why am I forced to use another host to do something with already-running VMs? The issue in the beginning was obvious and simple... there were forgotten "old" VMs running which used to be managed by an engine.

# doing migration as described in comment #5

* on source host:

Thread-121::ERROR::2013-09-05 13:15:11,650::vm::242::vm.Vm::(_recover) vmId=`89478d8e-48dd-4722-91a9-78530382f5be`::migration destination error: Error creating the requested VM
Thread-121::ERROR::2013-09-05 13:15:11,705::vm::322::vm.Vm::(run) vmId=`89478d8e-48dd-4722-91a9-78530382f5be`::Failed to migrate
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 309, in run
    self._startUnderlyingMigration()
  File "/usr/share/vdsm/vm.py", line 345, in _startUnderlyingMigration
    response['status']['message'])
RuntimeError: migration destination error: Error creating the requested VM
VM Channels Listener::DEBUG::2013-09-05 13:15:12,573::vmChannels::91::vds::(_handle_timeouts) Timeout on fileno 7.
* on destination host:

Thread-94::ERROR::2013-09-05 13:14:36,511::dispatcher::67::Storage.Dispatcher.Protect::(run) {'status': {'message': "Image path does not exist or cannot be accessed/created: ('/rhev/data-center/5849b030-626e-47cb-ad90-3ce782d831b3/de0419b0-bee2-43f7-a100-111a113547ad/images/888b41bd-949e-4eb0-816d-168751728631',)", 'code': 254}}
Thread-94::DEBUG::2013-09-05 13:14:36,511::vm::2038::vm.Vm::(_startUnderlyingVm) vmId=`89478d8e-48dd-4722-91a9-78530382f5be`::_ongoingCreations released
Thread-94::ERROR::2013-09-05 13:14:36,511::vm::2064::vm.Vm::(_startUnderlyingVm) vmId=`89478d8e-48dd-4722-91a9-78530382f5be`::The vm start process failed
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 2024, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/vm.py", line 2833, in _run
    self.preparePaths(devices[DISK_DEVICES])
  File "/usr/share/vdsm/vm.py", line 2086, in preparePaths
    drive['path'] = self.cif.prepareVolumePath(drive, self.id)
  File "/usr/share/vdsm/clientIF.py", line 280, in prepareVolumePath
    raise vm.VolumeError(drive)
VolumeError: Bad volume specification {'address': {'bus': '0', 'controller': '0', 'type': 'drive', 'target': '0', 'unit': '0'}, 'reqsize': '0', 'index': 0, 'iface': 'ide', 'apparentsize': '1073741824', 'specParams': {}, 'imageID': '888b41bd-949e-4eb0-816d-168751728631', 'readonly': 'False', 'shared': 'false', 'truesize': '24576', 'type': 'disk', 'domainID': 'de0419b0-bee2-43f7-a100-111a113547ad', 'volumeInfo': {'path': '/rhev/data-center/5849b030-626e-47cb-ad90-3ce782d831b3/de0419b0-bee2-43f7-a100-111a113547ad/images/888b41bd-949e-4eb0-816d-168751728631/248416e6-e898-4816-b02e-b243a73c46cf', 'volType': 'path'}, 'format': 'raw', 'deviceId': '888b41bd-949e-4eb0-816d-168751728631', 'poolID': '5849b030-626e-47cb-ad90-3ce782d831b3', 'device': 'disk', 'path': '/rhev/data-center/5849b030-626e-47cb-ad90-3ce782d831b3/de0419b0-bee2-43f7-a100-111a113547ad/images/888b41bd-949e-4eb0-816d-168751728631/248416e6-e898-4816-b02e-b243a73c46cf', 'propagateErrors': 'off', 'optional': 'false', 'name': 'hda', 'bootOrder': '2', 'volumeID': '248416e6-e898-4816-b02e-b243a73c46cf', 'alias': 'ide0-0-0', 'volumeChain': [{'path': '/rhev/data-center/5849b030-626e-47cb-ad90-3ce782d831b3/de0419b0-bee2-43f7-a100-111a113547ad/images/888b41bd-949e-4eb0-816d-168751728631/248416e6-e898-4816-b02e-b243a73c46cf', 'domainID': 'de0419b0-bee2-43f7-a100-111a113547ad', 'imageID': '888b41bd-949e-4eb0-816d-168751728631', 'volumeID': '248416e6-e898-4816-b02e-b243a73c46cf', 'vmVolInfo': {'path': '/rhev/data-center/5849b030-626e-47cb-ad90-3ce782d831b3/de0419b0-bee2-43f7-a100-111a113547ad/images/888b41bd-949e-4eb0-816d-168751728631/248416e6-e898-4816-b02e-b243a73c46cf', 'volType': 'path'}}]}

So now I have some imported VM running in the setup, this VM is in the 'Migrating From' state, and the original host carrying this VM is in 'Preparing For Maintenance'...
After the VM migration timed out, the VM is up again on the original host. Attaching some logs...
Created attachment 794233 [details] engine.log, host4-vdsm.log, host3-vdsm.log host4 - source host, host3 - destination host.
(In reply to Jiri Belka from comment #8)
> is13 - migration does not work.
>
> either something is not fully described or i doubt this funcionality makes
> logic.
>
> 1. what about storage? we do not support importing already configured data
> domain, so how could i migrate it without having storage in the setup?
> as i could not import already configured data domain i tried to mount
> already mounted data domain to newly added host before doing migration...

Migration will obviously fail if you don't have the storage on the destination host.

> 2. if you dediced to import already running VMs into engine managed by
> already running vdsmd, why i cannot stop it? this is little ridiculous...
>
> Error while executing action:
> external-running:
> Cannot shutdown VM. This VM is not managed by the engine.

The idea here was to allow you to see these VMs, and even to migrate them, but not to stop them, as someone else is managing them for you. It is true that if these VMs are just "leftovers", then no one manages them, but in the real use case there will be someone in control of these VMs (such as the hosted-engine HA agents).

Perhaps we can apply the following logic:
1. If the VM is the hosted-engine VM, allow only the operations we allow today (migrate, connect to console).
2. If, however, it is just an external VM, then also allow stopping it (and perhaps other operations as well?).

Itamar - thoughts on that?

> 3. why am i forced to use another host to do something with already running
> VMs?
>
> the issue in the beginning was obvious and simple... there were forgotten
> "old" VMs running which used to be managed by an engine.
> [quoted migration logs snipped; they are reproduced in full in comment #8 above]
The patch posted above (http://gerrit.ovirt.org/#/c/18963/) will allow one to stop and remove external VMs (but not hosted-engine VMs). I didn't move the bug to POST yet, as I'm waiting for approval to go with that approach.
Sounds OK to me. Doron/Michal?
OK. Though we want to draw a line somewhere: since it's external, we won't be able to offer the same set of actions. Maybe it's better to work on a proper import into the system, including storage and the rest...
(In reply to Michal Skrivanek from comment #14)
> Ok. Though we want to draw a line somewhere. Since it's external we won't be
> able to do the same set of actions. Maybe better work on a proper import
> into the system including storage&stuff...

Scheduling-wise this is a nightmare, as something is eating up the host's resources. So we should be able to stop it, but I agree with Michal that in the long term a proper import would be more beneficial.
What do you mean by 'external'? I tried to run a VM via qemu-kvm directly, and that VM is not discovered. I suppose the engine gets the list from vdsm; hence in comment #1 I wrote that after killing the processes and restarting vdsm it was OK.
(In reply to Jiri Belka from comment #16)
> What do you mean by 'external'? I tried to run a VM via qemu-kvm directly
> and this VM is not discovered. I suppose the engine gets the list from vdsm,
> thus in comment #1 I wrote that after killing processes and restarting vdsm
> it was OK.

I mean VMs that were run not through the oVirt engine. VDSM reports every VM whose "product" entity is oVirt-related.
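The classification described in the reply can be sketched as a simple product-string check. This is only an illustration of the idea, not engine code: the exact set of product strings is an assumption on my part (the qemu-kvm command line quoted later in this bug does show `-smbios type=1,...,product=RHEV Hypervisor,...` for a vdsm-started VM, while a VM launched by hand via qemu-kvm would normally carry no such string):

```python
# Hypothetical product strings a vdsm-started VM might report via SMBIOS;
# the real list used by the engine may differ.
OVIRT_PRODUCTS = ("oVirt Node", "RHEV Hypervisor")

def is_ovirt_related(product):
    """Heuristic sketch: treat a VM as oVirt-related (and thus reportable
    by VDSM) if its SMBIOS product string matches a known oVirt product."""
    return any(p in product for p in OVIRT_PRODUCTS)


print(is_ovirt_related("RHEV Hypervisor"))  # vdsm-started VM
print(is_ovirt_related("Standard PC"))      # hand-started qemu-kvm VM
```

This would explain why the hand-started qemu-kvm VM in the question above never showed up: it was never reported by vdsm in the first place.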
(In reply to Oved Ourfali from comment #12)
> The patch posted above (http://gerrit.ovirt.org/#/c/18963/) will allow one
> to stop and remove external VMs (but not hosted-engine VMs).
>
> Didn't move the bug to POST yet, as I'm waiting for an approval to go with
> that approach.

This patch is posted now. Moved the bug to MODIFIED. It will allow stopping and deleting external VMs (but not the hosted-engine VM). Putting the host into maintenance might work or might fail, depending on the migration of the VM, but in case of failure the admin can simply stop and delete it.
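The operation policy the thread converges on can be summarized as a small permission table. The origin and action names below are illustrative labels I chose for the sketch, not actual engine enums:

```python
# Hypothetical summary of the allowed operations discussed in this bug:
# plain external VMs can additionally be stopped and deleted, while the
# hosted-engine VM keeps the original restricted set.
ALLOWED_ACTIONS = {
    "external": {"connect_console", "migrate", "cancel_migration",
                 "shutdown", "delete"},
    "hosted_engine": {"connect_console", "migrate", "cancel_migration"},
}

def is_allowed(origin, action):
    """Return True if the given action is permitted for a VM of this origin."""
    return action in ALLOWED_ACTIONS.get(origin, set())


print(is_allowed("external", "shutdown"))       # now permitted
print(is_allowed("hosted_engine", "shutdown"))  # still blocked
```

Under this policy, the maintenance use case from the original report works even when migration fails: the admin stops and deletes the leftover external VM instead.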
Now that your approach is to kill "old external" VMs, what about unmounting the "old" data storage domain from the host (the one with a mount point under /rhev/data-center/mnt)? You cannot work with such a data storage domain from the new engine anyway, as you can't import an already-configured data domain, and IIRC adding the same data domain would end in an error because the mount point already exists. Maybe import storage domains as "external" in the same way as is done with "external" VMs?
In is14 I don't see any "external" VMs imported at all, so I cannot verify the new functionality.

# uptime
 12:38:06 up 21:58,  2 users,  load average: 0.00, 0.00, 0.00

# vdsClient -s 0 list | grep vmName
        vmName = w8-x64

# ps ax | grep '[q]emu-kvm.*w8-x64'
15520 ?  Sl  3:55 /usr/libexec/qemu-kvm -name w8-x64 -S -M rhel6.4.0 -cpu SandyBridge -enable-kvm -m 2048 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 294c853b-aa7b-4d41-9436-3f46f82d362d -smbios type=1,manufacturer=Red Hat,product=RHEV Hypervisor,version=6Server-6.4.0.4.el6,serial=4C4C4544-0058-3410-8058-C2C04F38354A,uuid=294c853b-aa7b-4d41-9436-3f46f82d362d -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/w8-x64.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2013-09-12T08:51:39,driftfix=slew -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x5 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw,serial= -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -drive file=/rhev/data-center/5849b030-626e-47cb-ad90-3ce782d831b3/3ae7b3fe-5d1b-4e64-8c80-a6b69a111425/images/c5937153-de6d-4bf8-806f-73a6eb146a81/46623e7d-dd81-40a8-813f-dff1e87ecce9,if=none,id=drive-virtio-disk0,format=raw,serial=c5937153-de6d-4bf8-806f-73a6eb146a81,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=28,id=hostnet0,vhost=on,vhostfd=29 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:2d:4d:23,bus=pci.0,addr=0x3 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/294c853b-aa7b-4d41-9436-3f46f82d362d.com.redhat.rhevm.vdsm,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/channels/294c853b-aa7b-4d41-9436-3f46f82d362d.org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel2,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=3,chardev=charchannel2,id=channel2,name=com.redhat.spice.0 -spice port=5900,tls-port=5901,addr=0,x509-dir=/etc/pki/vdsm/libvirt-spice,tls-channel=main,tls-channel=display,tls-channel=inputs,tls-channel=cursor,tls-channel=playback,tls-channel=record,tls-channel=smartcard,tls-channel=usbredir,seamless-migration=on -k es -vga qxl -global qxl-vga.ram_size=67108864 -global qxl-vga.vram_size=67108864 -device AC97,id=sound0,bus=pci.0,addr=0x4 -incoming tcp:0.0.0.0:49152 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8
Created attachment 796767 [details] engine.log, vdsm.log
Ha, in the VMs tab there are no VMs visible, but in the Hosts list, the Virtual Machines column shows '1'.
(In reply to Jiri Belka from comment #23)
> Ha, in VM tabs there is not VMs visible but in Hosts list in Virtual
> Machines column there is '1'.

There was a change recently regarding support for a single QXL device. It turns out the default is to have one such device; thus adding the VM fails, as QXL is supported only for Linux VMs, and when I import the VM I don't set the OS. Fixing that now. The patch will be verified and posted soon.
Verified in rhevm-3.3.0-0.24.master.el6ev.noarch (is17).

Verification steps:
1) Create a new VM with a bootable disk and run it.
2) Stop the ovirt-engine service.
3) Make the VM "external" by removing it from the engine's database:
   - psql -U engine engine
   - DELETE FROM snapshots WHERE vm_id='<VM UUID>';
   - DELETE FROM vm_static WHERE vm_name='<VM name>';
4) Start the ovirt-engine service and log into Webadmin.

Results:
The removed VM was added to the VMs grid as an external VM under the name 'external-<VM name>'. This machine can be shut down, powered off, or migrated, and then deleted. The related host can then be successfully put into maintenance.
This bug is currently attached to errata RHEA-2013:15231. If this change is not to be documented in the text for this errata, please either remove it from the errata, set the requires_doc_text flag to minus (-), or leave a "Doc Text" value of "--no tech note required" if you do not have permission to alter the flag.

Otherwise, to aid in the development of relevant and accurate release documentation, please fill out the "Doc Text" field above with these four (4) pieces of information:

* Cause: What actions or circumstances cause this bug to present.
* Consequence: What happens when the bug presents.
* Fix: What was done to fix the bug.
* Result: What now happens when the actions or circumstances above occur. (NB: this is not the same as 'the bug doesn't present anymore')

Once filled out, please set the "Doc Type" field to the appropriate value for the type of change made and submit your edits to the bug.

For further details on the Cause, Consequence, Fix, Result format please refer to:
https://bugzilla.redhat.com/page.cgi?id=fields.html#cf_release_notes

Thanks in advance.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2014-0038.html