Created attachment 989687 [details]
engine.log, vdsm.log, libvirtd.log, qemu vm log

Description of problem:

In BZ1190663 I mentioned that the sysprep payload is attached to a _normal_ Windows VM regardless of whether it is enabled in the VM's settings. Surprisingly, the second start (second Run click; the VM was previously powered off) fails because of a payload file access problem. It is not related to SELinux; I executed 'setenforce 0' (permissive) and hit the same issue.

...
Thread-353536::ERROR::2015-02-09 13:12:27,417::vm::2324::vm.Vm::(_startUnderlyingVm) vmId=`be87998e-8a1d-4817-8543-c7f11b392d43`::The vm start process failed
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 2264, in _startUnderlyingVm
  File "/usr/share/vdsm/virt/vm.py", line 3323, in _run
  File "/usr/lib/python2.6/site-packages/vdsm/libvirtconnection.py", line 111, in wrapper
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 2709, in createXML
libvirtError: internal error process exited while connecting to monitor: 2015-02-09T13:12:26.920792Z qemu-kvm: -drive file=/var/run/vdsm/payload/be87998e-8a1d-4817-8543-c7f11b392d43.8704fc5b5ad0d207a173b6322788285a.img,if=none,id=drive-fdc0-0-0,format=raw,serial=: could not open disk image /var/run/vdsm/payload/be87998e-8a1d-4817-8543-c7f11b392d43.8704fc5b5ad0d207a173b6322788285a.img: Permission denied

Thread-353536::DEBUG::2015-02-09 13:12:27,422::vm::2779::vm.Vm::(setDownStatus) vmId=`be87998e-8a1d-4817-8543-c7f11b392d43`::Changed state to Down: internal error process exited while connecting to monitor: 2015-02-09T13:12:26.920792Z qemu-kvm: -drive file=/var/run/vdsm/payload/be87998e-8a1d-4817-8543-c7f11b392d43.8704fc5b5ad0d207a173b6322788285a.img,if=none,id=drive-fdc0-0-0,format=raw,serial=: could not open disk image /var/run/vdsm/payload/be87998e-8a1d-4817-8543-c7f11b392d43.8704fc5b5ad0d207a173b6322788285a.img: Permission denied (code=1)

I used the following loop to watch the payload image permissions and SELinux context:

# rm -f /tmp/out ; while true; do echo **start**; date "+%Y-%m-%d %H:%M:%S:%N"; su -s /bin/sh - qemu -c "ls -laZ /var/run/vdsm/payload/" ; su -s /bin/sh - qemu -c "cat /var/run/vdsm/payload/*.img 2>/dev/null | strings" ; ps axuZ | grep '[q]emu.*omg'; echo ; sleep 0.1 ; echo **end** ; done 2>&1 >> /tmp/out

It revealed that the payload is created under a different context, but right before qemu uses it, the context is changed to 'svirt_image_t:s0:c739,c875':

**start**
2015-02-09 13:17:57:743047247
drwxr-xr-x. root root system_u:object_r:virt_var_run_t:s0 .
drwxr-xr-x. vdsm kvm  system_u:object_r:virt_var_run_t:s0 ..
-rw-r--r--. vdsm qemu system_u:object_r:svirt_image_t:s0:c739,c875 be87998e-8a1d-4817-8543-c7f11b392d43.8704fc5b5ad0d207a173b6322788285a.img
mkdosfs
T8
FAT12
This is not a bootable disk. Please insert a bootable floppy and press any key to try again ...
SYSPREP INF
d<jIFIF
<jIF
<?xml version="1.0" encoding="utf-8"?>
...content of image...
system_u:system_r:svirt_t:s0:c739,c875 qemu 19714 5.0 0.1 155532 9124 ?
D 13:17 0:00 /usr/libexec/qemu-kvm -name omg -S -M rhel6.5.0 -cpu SandyBridge -enable-kvm -m 1024 -realtime mlock=off -smp 1,maxcpus=16,sockets=16,cores=1,threads=1 -uuid be87998e-8a1d-4817-8543-c7f11b392d43 -smbios type=1,manufacturer=Red Hat,product=RHEV Hypervisor,version=6.6-20150128.0.el6ev,serial=4C4C4544-0034-5310-8052-B3C04F4A354A,uuid=be87998e-8a1d-4817-8543-c7f11b392d43 -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/omg.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2015-02-09T13:17:57,driftfix=slew -no-kvm-pit-reinjection -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x5 -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x6 -drive file=/rhev/data-center/mnt/10.34.63.204:_home_iso_shared/0c78b4d6-ba00-4d3e-9f9f-65c7d5899d71/images/11111111-1111-1111-1111-111111111111/RHEV-toolsSetup_3.5_9.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw,serial= -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -drive file=/var/run/vdsm/payload/be87998e-8a1d-4817-8543-c7f11b392d43.8704fc5b5ad0d207a173b6322788285a.img,if=none,id=drive-fdc0-0-0,format=raw,serial= -global isa-fdc.driveA=drive-fdc0-0-0 -drive file=/rhev/data-center/00000002-0002-0002-0002-0000000001a9/2779099b-7952-4455-ac43-7ec876b9f3a3/images/8de92838-7f7e-4f45-bdfc-d59a9f2ebc97/11d3a0d1-5e4d-43a0-9c45-1e76eeefbb2f,if=none,id=drive-virtio-disk0,format=raw,serial=8de92838-7f7e-4f45-bdfc-d59a9f2ebc97,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=31,id=hostnet0,vhost=on,vhostfd=32 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:d1:5d:06,bus=pci.0,addr=0x3 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/be87998e-8a1d-4817-8543-c7f11b392d43.com.redhat.rhevm.vdsm,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/channels/be87998e-8a1d-4817-8543-c7f11b392d43.org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -device usb-tablet,id=input0 -vnc 10.34.62.205:0,password -k en-us -vga cirrus -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -msg timestamp=on
**end**

I don't understand why qemu-kvm can't access it (SELinux is permissive!).

[root@dell-r210ii-13 ~]# grep denied /var/log/audit/audit.log
[root@dell-r210ii-13 ~]# getenforce
Permissive

2015-02-09T13:17:57.796582Z qemu-kvm: -drive file=/var/run/vdsm/payload/be87998e-8a1d-4817-8543-c7f11b392d43.8704fc5b5ad0d207a173b6322788285a.img,if=none,id=drive-fdc0-0-0,format=raw,serial=: could not open disk image /var/run/vdsm/payload/be87998e-8a1d-4817-8543-c7f11b392d43.8704fc5b5ad0d207a173b6322788285a.img: Permission denied
2015-02-09 13:17:58.087+0000: shutting down

Please inspect the logs to see why it sometimes doesn't start.

Version-Release number of selected component (if applicable):
libvirt-0.10.2-46.el6_6.2.x86_64
qemu-kvm-rhev-0.12.1.2-2.446.el6.x86_64
vdsm-4.16.8.1-6.el6ev.x86_64
Red Hat Enterprise Virtualization Hypervisor 6.6 (20150128.0.el6ev)
rhevm-backend-3.5.0-0.31.el6ev.noarch

How reproducible:
???

Steps to Reproduce:
1. New Windows VM in User Portal/Extended, add a disk.
2. Start via the 'Run' button, power off immediately.
3. Start again via the 'Run' button.

Actual results:
Fails to start; qemu-kvm tries to access the sysprep payload but fails.

Expected results:
Should work.

Additional info:
No idea why there is a sysprep payload at all, see BZ1190663.
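A note on ruling out SELinux for the failure above: with the host in permissive mode and no denials in audit.log, the remaining suspect is plain DAC file permissions. The ls -laZ output shows the payload image as -rw-r--r-- vdsm:qemu, so the qemu user can read but not write it. A minimal check (my own suggestion, not taken from the attached logs) would be to test write access as the qemu user while the payload file still exists during VM start:

# image readable but not writable for qemu => a read-write open by qemu-kvm fails with EACCES
su -s /bin/sh - qemu -c "test -r /var/run/vdsm/payload/be87998e-8a1d-4817-8543-c7f11b392d43.8704fc5b5ad0d207a173b6322788285a.img && echo readable || echo not readable"
su -s /bin/sh - qemu -c "test -w /var/run/vdsm/payload/be87998e-8a1d-4817-8543-c7f11b392d43.8704fc5b5ad0d207a173b6322788285a.img && echo writable || echo not writable"

If the second command reports "not writable", the "Permission denied" on open is consistent with qemu-kvm trying to open the floppy image read-write rather than with an SELinux denial.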
Can you please check, after you shut down the VM and before the second run, what the result of the following DB query is:

select is_initialized from vm_static where vm_name='<name>';
engine=# select is_initialized from vm_static where vm_name='bugtest';
 is_initialized
----------------
 t
(1 row)

I used the exact steps as in comment #0 (VmCreator role on the DC).
The second run mistakenly drops the "readonly" flag. It is a permission problem: the payload image is read-only for qemu, so a read-write open fails. It seems to be the engine's fault...
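For illustration only (the command line from the first, successful run is not in the logs here, so the readonly=on form is an assumption based on this comment), the difference on the qemu-kvm command line would look roughly like:

# first run (assumed): payload floppy attached read-only
-drive file=/var/run/vdsm/payload/<vmid>.<hash>.img,if=none,id=drive-fdc0-0-0,readonly=on,format=raw,serial=
# second run (as seen in the failing log above): readonly dropped, so qemu opens the image read-write and gets EACCES
-drive file=/var/run/vdsm/payload/<vmid>.<hash>.img,if=none,id=drive-fdc0-0-0,format=raw,serial=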
Right, the reason is that the engine mistakenly picks up the floppy (unmanaged) device as a managed device and sends it back to VDSM with bad parameters.
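To confirm this on a specific setup, one possible check (a sketch only; the vm_device column names are assumed from the 3.5-era engine schema and may differ) is to look at how the engine recorded the payload floppy after the first shutdown:

select device, type, is_managed, is_readonly
  from vm_device
 where vm_id = (select vm_guid from vm_static where vm_name='bugtest')
   and device = 'floppy';

A floppy row flagged as managed (and/or not read-only) after the first run would match the behavior described in this comment.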
Reducing severity; sending the payload again and again is not an interesting use case (though it is possible).
Update on the problem symptoms at my customer (case number 1518293, linked in external trackers) and the reason I set this BZ to NeedsInfo.

After extensive testing at the customer site today, it was determined that the pooled Windows VMs show the problem on the second boot exactly as described in this BZ. The VM first-boots and applies the settings from a:\\unattend.xml. Shut down the VM, then power it up again: it fails with the error as described. Sometimes RHEV-M also declares the host in an error state for 30 minutes, which prevents the host from accepting any new VM migrations. But I have so far been unable to replicate the problem in my own test environment.

So the info I'm looking for is:
(1) Do we know what conditions need to exist to trigger the problem?
(2) Is there a more suitable workaround than engine database updates for every single virtual machine (hundreds of new VMs every month)?
(3) Since a fix is already apparently in QA, is it possible to apply a hotfix to 3.5.4 to get our customer past this issue?

Customer details are in the support case mentioned above.

thanks - Greg Scott
Not sure about a hotfix, but the patch looks simple and safe to backport to 3.5. Suggesting 3.5.z; I think we can have it in time for 3.5.5.
Backporting to 3.5.5 is great - thanks - but it still leaves a major customer in a world of hurt. The customer is months into the 3.5.4 testing cycle, and starting again with 3.5.5 means they have to throw out months of work and start over. We've already pitched doing 3.5.5, since it's coming soon, and the answer was a resounding no with some adjectives added.

So would it be possible to also do this as a hotfix for 3.5.4? If a 3.5.4 hotfix is not possible, please at least document the conditions that trigger this bug, since I'm not able to reproduce it.

thanks - Greg
Omer, can we confirm for sure that the fix can go into 3.5.5? I'm pushing the customer to go to 3.5.5 instead of 3.5.4 - can I commit to the customer that this fix will be in there?

thanks - Greg
Unfortunately, I think not. This got delayed because we could not confirm it is the exact same case and that the solution would be the same. Could you upload the relevant logs of your scenario so we can investigate further?
Created attachment 1084405 [details]
Log Collector report from the customer experiencing the problem
I should have updated the title of the attachment I just uploaded. The whole Log Collector report was too big to upload, so I extracted some QEMU logs, libvirtd.log, and vdsm.log from a host, plus engine.log and the database from the RHEV-M server. I know 3.5.5 is due out tomorrow - but if there's any way to get the fix into the 3.5.5 stream, I will be in your debt.

thanks - Greg
I'm sorry, but I think 3.5.5 is missed; as far as I know there are no more builds planned for it (it has already moved to final testing). I could not see the errors in the vdsm/engine logs that you attached, only the final error in the qemu/libvirt logs, which doesn't help in understanding the source of the issue.

Assuming this is the same issue, I have created a patch rebased on top of 3.5. As far as I know, you can ask GSS to create a hotfix using this patch, which you can then apply if needed. I hope it helps.
3.5 patch https://gerrit.ovirt.org/#/c/47562/
Verification build:
rhevm-3.6.0.1-0.1.el6
vdsm-4.17.9-1.el7ev.noarch
qemu-kvm-rhev-2.3.0-31.el7.x86_64
libvirt-client-1.2.17-5.el7.x86_64

Steps to Reproduce (normal VM):
1. New Windows VM in User Portal/Extended, add a disk.
2. Start via the 'Run' button, power off immediately.
3. Start again via the 'Run' button.
4. Repeat steps 2-3 a few times.

Steps to Reproduce (pool VM):
1. Create a pool VM.
2. Run the VM and power it off immediately.
3. Run the VM again.
4. Repeat steps 2-3 a few times.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHEA-2016-0376.html