Bug 1190696
Summary: Second run of Windows VM fails because of access problem to sysprep payload

Product: Red Hat Enterprise Virtualization Manager
Component: ovirt-engine
Version: 3.5.0
Status: CLOSED ERRATA
Severity: urgent
Priority: high
Reporter: Jiri Belka <jbelka>
Assignee: Michal Skrivanek <michal.skrivanek>
QA Contact: Nisim Simsolo <nsimsolo>
CC: agkesos, dornelas, gklein, gscott, inetkach, istein, jbelka, lpeer, lsurette, michal.skrivanek, nashok, nsimsolo, obockows, pzhukov, rbalakri, Rhev-m-bugs, rhodain, sknauss, smelamud, yeylon, ykaul
Target Milestone: ovirt-3.6.0-rc
Target Release: 3.6.0
Keywords: ZStream
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Cloned To: 1274708 (view as bug list)
Bug Blocks: 1274708
oVirt Team: Virt
Type: Bug
Last Closed: 2016-03-09 20:56:58 UTC
Description
Jiri Belka
2015-02-09 13:32:07 UTC
Can you please check, after you shut down the VM and before the second run, what is the result of the following db query:

    select is_initialized from vm_static where vm_name='<name>';

    engine=# select is_initialized from vm_static where vm_name='bugtest';
     is_initialized
    ----------------
     t
    (1 row)

I used the exact steps as in comment #0 (VmCreator role on DC).

The second run by mistake drops the "readonly" flag. It is a permission problem - the image is r/o for qemu. Seems to be the engine's fault...

Right, the reason is that the engine mistakenly picks the floppy (unmanaged) device as a managed device and sends it back to vdsm with bad params.

Reducing severity; sending the payload again and again is not an interesting use case (though it's possible).

Update on the problem symptoms at my customer (case number 1518293 linked to external trackers) and the reason I set this BZ to NeedsInfo. After extensive testing at my customer site today, it was determined that the pooled Windows VMs show the problem on the second boot exactly as described in this BZ. The VM boots for the first time and applies the settings from a:\unattend.xml. Shut down the VM, then power it up again. It fails with the error as described. Sometimes RHEV-M also declares the host in error for 30 minutes, which prohibits the host from accepting any new VM migrations. But I have so far been unable to replicate the problem in my own test environment. So the info I'm looking for is:

(1) Do we know what conditions need to exist to trigger the problem?
(2) Is there a more suitable workaround than engine database updates for every single virtual machine (hundreds of new VMs every month)?
(3) Since a fix is already apparently in QA, is it possible to apply a hotfix to 3.5.4 to get our customer past this issue?

Customer details are in the support case mentioned above.
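The root cause described above - the engine matching the transient sysprep floppy as if it were a managed device and resending it to VDSM with its "readonly" flag dropped - can be sketched in miniature. This is not the actual ovirt-engine code (which is Java); the `Device` class, `resend_devices_buggy`, and `resend_devices_fixed` are invented names for illustration only, under the assumption that the selection matched on device kind rather than on the managed flag:

```python
from dataclasses import dataclass

@dataclass
class Device:
    kind: str        # "disk", "floppy", ...
    managed: bool    # persisted engine device vs. transient run-once payload
    readonly: bool

def resend_devices_buggy(devices):
    """Hypothetical sketch of the reported behavior: every device is
    treated as managed, so the transient sysprep floppy (unmanaged) is
    resent to VDSM with its readonly flag dropped - qemu then fails to
    open the read-only payload image for writing."""
    return [Device(d.kind, managed=True, readonly=False) for d in devices]

def resend_devices_fixed(devices):
    """Sketch of the fix: only genuinely managed devices are resent,
    with their original flags intact; the unmanaged payload floppy is
    left out of the managed set."""
    return [d for d in devices if d.managed]

vm_devices = [
    Device("disk", managed=True, readonly=False),
    Device("floppy", managed=False, readonly=True),  # sysprep payload
]

buggy = resend_devices_buggy(vm_devices)
fixed = resend_devices_fixed(vm_devices)
```

In the buggy selection the floppy comes back with `readonly=False`, matching the symptom in this report; in the fixed selection the payload floppy is simply not part of the managed device set resent on the second run.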
thanks - Greg Scott

Not sure about a hotfix, but the patch looks simple and safe to backport for 3.5; suggesting 3.5.z. I think we can have it in time for 3.5.5.

Backporting to 3.5.5 is great - thanks - but it still leaves a major customer in a world of hurt. The customer is months into the 3.5.4 testing cycle, and to start again with 3.5.5 means they have to throw out months of work and start over. We've already pitched doing 3.5.5 since it's coming soon, and the answer was a resounding no with some adjectives added. So would it be possible to also do this as a hotfix for 3.5.4? If a 3.5.4 hotfix is not possible, at least document the conditions that trigger this bug, since I'm not able to reproduce it.

thanks - Greg

Omer, can we confirm for sure the fix can go into 3.5.5? I'm pushing the customer to go to 3.5.5 instead of 3.5.4 - can I commit to the customer that this fix will be in there?

thanks - Greg

Unfortunately, I think not. This got delayed as we could not confirm it's the exact same case and that the solution would be the same. Could you upload relevant logs of your scenario so we could investigate further?

Created attachment 1084405 [details]
Log Collector report from the customer experiencing the problem
I should have updated the title of the attachment I just uploaded. The whole log collector report was too big to upload, so I extracted some QEMU logs, libvirtd.log, and vdsm.log from a host, plus engine.log and the database from rhevm. I know 3.5.5 is due out tomorrow - but if there's any way to get the fix into that 3.5.5 stream I will be in your debt.

thanks - Greg

I'm sorry, but I think 3.5.5 is missed; as far as I know there are no more builds planned for it (as it has already moved to final testing). I could not see the errors in the vdsm/engine logs that you attached, only the final error in the qemu/libvirt logs, which doesn't help in understanding the source of the issue. Assuming this is the same issue, I have created a patch rebased on top of 3.5. From what I know, you can ask GSS to create a hotfix using this patch, so you can apply it if needed. I hope it helps.

3.5 patch: https://gerrit.ovirt.org/#/c/47562/

Verification build:
rhevm-3.6.0.1-0.1.el6
vdsm-4.17.9-1.el7ev.noarch
qemu-kvm-rhev-2.3.0-31.el7.x86_64
libvirt-client-1.2.17-5.el7.x86_64

Steps to Reproduce:
1. New Windows VM in User Portal/Extended, add a disk.
2. Start via the 'Run' button, power off immediately.
3. Start again via the 'Run' button.
4. Repeat steps 2-3 a few times.

Pool scenario:
1. Create a pool VM.
2. Run the VM and power it off immediately.
3. Run the VM again.
4. Repeat steps 2-3 a few times.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-0376.html