Bug 1419924
Summary: cluster level 4.1 adds Random Generator to all VMs while it may not be presented by cluster
Product: [oVirt] ovirt-engine
Reporter: Evgheni Dereveanchin <ederevea>
Component: BLL.Virt
Assignee: jniederm
Status: CLOSED CURRENTRELEASE
QA Contact: Nisim Simsolo <nsimsolo>
Severity: high
Docs Contact:
Priority: high
Version: 4.1.0.4
CC: bugs, ederevea, eedri, gklein, jniederm, mburman, michal.skrivanek, nsimsolo
Target Milestone: ovirt-4.1.1
Flags: gklein: ovirt-4.1+, gklein: blocker+
Target Release: 4.1.1.3
Hardware: Unspecified
OS: Linux
Whiteboard: upgrade
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: If a VM was running during a cluster upgrade from 4.0 to 4.1, had a /dev/random RNG device set, and had no custom compatibility level set, the RNG device was not updated from `random` to `urandom` on VM shutdown (or power off or restart).
Consequence: The VM was left with an incompatible RNG device, which prevented it from running.
Workaround: Remove and re-add the RNG device to the VM.
Fix: During cluster update, the updated RNG device of each running VM is stored in its next-run configuration, so the RNG device is properly updated on VM shutdown.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-04-21 09:54:08 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Virt
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1337101, 1374227
Attachments:
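
The Doc Text above describes the fix as storing the updated RNG device in the next-run configuration of running VMs. The following is a minimal sketch of that idea only, not the ovirt-engine code: the `Vm` and `RngDevice` classes and both function names are hypothetical, and the real engine logic is written in Java and is more involved.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class RngDevice:
    """Hypothetical stand-in for a VM's RNG device; source is "random" or "urandom"."""
    source: str


@dataclass
class Vm:
    """Hypothetical stand-in for a VM record as seen by the engine."""
    name: str
    running: bool
    rng: Optional[RngDevice] = None
    custom_compat_level: Optional[str] = None
    next_run_rng: Optional[RngDevice] = None  # pending change, applied on shutdown


def upgrade_cluster_to_4_1(vms: List[Vm]) -> None:
    """Switch /dev/random RNG devices to urandom, as the Doc Text describes."""
    for vm in vms:
        if vm.rng is None or vm.rng.source != "random":
            continue  # nothing to convert
        if vm.custom_compat_level is not None:
            continue  # VM has its own compatibility level set
        updated = replace(vm.rng, source="urandom")
        if vm.running:
            # A running VM keeps its current device; the change waits for next run.
            vm.next_run_rng = updated
        else:
            vm.rng = updated


def shut_down(vm: Vm) -> None:
    """Apply any pending next-run RNG device when the VM goes down."""
    vm.running = False
    if vm.next_run_rng is not None:
        vm.rng = vm.next_run_rng
        vm.next_run_rng = None


if __name__ == "__main__":
    vm = Vm(name="vm1", running=True, rng=RngDevice("random"))
    upgrade_cluster_to_4_1([vm])
    shut_down(vm)
    print(vm.rng.source)
```

In this model the bug corresponds to skipping the `vm.running` branch, which leaves a running VM with the `random` source after it is shut down.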
Description
Evgheni Dereveanchin
2017-02-07 12:18:18 UTC
This is probably related to bz#1337101, which enabled this by default, but on pre-existing hosts the flag is not set for some reason; otherwise the VM would start fine. If you have engine.log from the time of the cluster level change, please attach it to the bug.

Created attachment 1248430
engine log

engine log provided, VM name is "vm1" and the first failed reboot is at timestamp 2017-02-06 13:56:36,561
Cluster and DC updates happened a few minutes before that.

*** Bug 1420213 has been marked as a duplicate of this bug. ***

This bug is also relevant for RHEV-H:

2017-02-08 18:58:12,824+02 WARN [org.ovirt.engine.core.bll.RunVmCommand] (default task-30) [21f2ce18-0f42-467a-b4d8-664538fed970] Validation of action 'RunVm' failed for user admin@internal-authz. Reasons: VAR__ACTION__RUN,VAR__TYPE__VM,ACTION_TYPE_FAILED_RNG_SOURCE_NOT_SUPPORTED

engine.log attached

Created attachment 1248645
rhevh engine.log
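
The WARN line quoted above can be picked out of a large engine.log with a small script. This is a minimal sketch that assumes every affected line follows the same layout as the quoted one; the function name and the regular expression are illustrative and not part of any oVirt tooling.

```python
import re

# Matches engine.log validation-failure lines shaped like the one quoted in this bug.
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\S*\s+"
    r"(?P<level>[A-Z]+)\s+\[(?P<logger>[^\]]+)\].*?"
    r"Validation of action '(?P<action>\w+)' failed.*?"
    r"Reasons: (?P<reasons>\S+)"
)

RNG_REASON = "ACTION_TYPE_FAILED_RNG_SOURCE_NOT_SUPPORTED"


def find_validation_failures(lines, reason=RNG_REASON):
    """Return (timestamp, action) pairs for lines whose reason list contains `reason`."""
    hits = []
    for line in lines:
        match = LOG_LINE.search(line)
        if match and reason in match.group("reasons").split(","):
            hits.append((match.group("ts"), match.group("action")))
    return hits


if __name__ == "__main__":
    # The path is an assumption; point it at the attached engine.log.
    with open("engine.log", encoding="utf-8", errors="replace") as log:
        for timestamp, action in find_validation_failures(log):
            print(timestamp, action)
```

Each hit marks a VM start that was rejected because of its RNG source, which helps relate failures to the time of the cluster level change.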

A workaround is fairly simple: for all VMs using the virtio-rng "random" source in 4.0, do the following after the upgrade to 4.1: edit the VM, uncheck virtio-rng, and check it again.

Not fixed. Trying to run a VM after a cluster upgrade from 4.0 to 4.1 failed, and the following message was displayed in webadmin:
"Error while executing action: 1111upgrade1: Cannot run VM. Random Number Generator device is not supported in cluster."

engine.log shows the following WARN:
2017-02-19 16:18:48,705+02 WARN [org.ovirt.engine.core.bll.RunVmCommand] (default task-23) [301d16a7-d572-4d22-90a1-73385eab3a7c] Validation of action 'RunVm' failed for user admin@internal-authz. Reasons: VAR__ACTION__RUN,VAR__TYPE__VM,ACTION_TYPE_FAILED_RNG_SOURCE_NOT_SUPPORTED

Verification builds:
ovirt-engine-4.1.1.2-0.1.el7
qemu-kvm-rhev-2.6.0-28.el7_3.6.x86_64
vdsm-4.19.6-1.el7ev.x86_64
libvirt-client-2.0.0-10.el7_3.5.x86_64
sanlock-3.4.0-1.el7.x86_64

Verification scenario:
1. Create a new DC and cluster with compatibility version 4.0, and add a host and storage to it.
2. Create a new VM, install RHEL 7 on it and verify /dev/random functionality:
   dd count=1 bs=128 if=/dev/random of=/dev/stdout | xxd
3. Change the cluster and DC compatibility version to 4.1 and verify that the delta icon appears on the VM status tab.
4. Power off the VM and try to run it.

engine.log and vdsm.log attached

Created attachment 1255438
reassign engine.log

Created attachment 1255439
reassign vdsm.log

Hi Nisim, this was caused by an incorrect move to MODIFIED by automation. Sorry, this fix is not in 4.1.1.2.

Eyal, this is still happening; it confuses people and wastes their time. We need to be more careful about what is moved to MODIFIED and ON_QA automatically.

A bug is moved to MODIFIED if all its attached external trackers are in MERGED status. The bot can't know if there will be future patches attached to the bug or not. In the past, if this happened, we moved the status back to POST - we got a request from PM to never move bug status backwards, so this was removed. If you choose to leave moving bugs from POST to MODIFIED to a manual process, then we'll end up with many more bugs on POST than on MODIFIED and will deploy fixes to QE that are already fixed but weren't moved due to human error. So unless you have better criteria or a solution for when to move to MODIFIED, I don't think we can do anything different from what we're doing now that will improve things w/o the penalties explained above.

(In reply to Eyal Edri from comment #17)
> A bug is moved to MODIFIED if all its attached external trackers are in
> MERGED status.

Sorry, I wasn't clear, I meant that the MODIFIED->ON_QA move is problematic in this case.

> The bot can't know if there will be future patches attached to the bug or
> not.
> In the past, if this happened, we moved the status back to POST - we got a
> request from PM to never move bug status backwards, so this was removed.

Really? Hm, but that's exactly what a person will do manually now anyway.

> If you choose to leave moving bugs from POST to MODIFIED to a manual process,
> then we'll end up with many more bugs on POST than on MODIFIED and will
> deploy fixes to QE that are already fixed but weren't moved due to human
> error.
>
> So unless you have better criteria or a solution for when to move to
> MODIFIED, I don't think we can do anything different from what we're doing now
> that will improve things w/o the penalties explained above.

Well, I personally advocate for manually moving the bug. But the point here is really about the incorrect ON_QA: we should either verify automatically or at least get the list of MODIFIED candidates at the time of tagging instead of at the time of the build, which takes one or two days more. Everything merged in that time falls through the cracks and ends up in ON_QA without being in

OK, looking at the logs, the bug passed all verification, but it's still not part of the latest tag, so it shouldn't have been moved. This might be a bug in our code that scans the bug's external trackers and verifies the bug is in. I've opened [1] to track debugging this issue; thanks for reporting it.

[1] https://ovirt-jira.atlassian.net/browse/OVIRT-1168

Verification builds:
ovirt-engine-4.1.1.3-0.1.el7
qemu-kvm-rhev-2.6.0-28.el7_3.6.x86_64
vdsm-4.19.7-1.el7ev.x86_64
libvirt-client-2.0.0-10.el7_3.5.x86_64
sanlock-3.4.0-1.el7.x86_64

Verification scenario:
1. Run some VMs.
2. Upgrade the cluster and DC with /dev/random enabled from 3.6 to 4.1.
3. Verify that the delta icon (reboot required) appears on running VMs.
4. Power -> run all VMs.
5. Verify from the VM XML in vdsm.log that the rng source is now:
   <backend model="random">/dev/urandom</backend>
6. Repeat steps 1-5, but this time upgrade the cluster from 4.0 to 4.1.

Verified also with RHEV-H:
qemu-kvm-rhev-2.6.0-28.el7_3.3.x86_64
vdsm-4.19.4-1.el7ev.x86_64
libvirt-client-2.0.0-10.el7_3.4.x86_64
sanlock-3.4.0-1.el7.x86_64
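
Step 5 of the last verification scenario checks the rng backend in the VM XML taken from vdsm.log. A minimal sketch of that check follows, assuming the domain XML has already been copied out of the log into a string; the function name and the sample document are illustrative and are not taken from the attached logs.

```python
import xml.etree.ElementTree as ET


def rng_backends(domain_xml):
    """Return (backend model, source path) for every <rng> device in a libvirt domain XML."""
    root = ET.fromstring(domain_xml)
    backends = []
    for rng in root.iter("rng"):
        for backend in rng.findall("backend"):
            backends.append((backend.get("model"), (backend.text or "").strip()))
    return backends


# Illustrative domain XML reduced to the part that matters here.
SAMPLE_XML = """
<domain type="kvm">
  <devices>
    <rng model="virtio">
      <backend model="random">/dev/urandom</backend>
    </rng>
  </devices>
</domain>
"""

if __name__ == "__main__":
    for model, source in rng_backends(SAMPLE_XML):
        status = "OK" if source == "/dev/urandom" else "still using " + source
        print(model, source, status)
```

A VM that was not converted would report /dev/random as the source here.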