Red Hat Bugzilla – Bug 1269828
[windows 10] [3.6 engine/3.5 cluster] Windows 10 are restarting - SYSTEM_THREAD_EXCEPTION_NOT_HANDLED
Last modified: 2016-02-10 14:24:07 EST
Created attachment 1080958 [details]
windows 10 error screenshot
Description of problem:
I had Windows 10 running on my 3.6 engine env (including RHEL7 hosts with 3.6 vdsm). I needed to test something so I modified Windows 10 VM settings and changed cluster to another one with 3.5 cluster level (no warning, no error - thus seemed to be OK).
Then I started this (previously working OK) Windows 10 on RHEL6 hosts with 3.5 vdsm in this 3.5 cluster.
Windows keeps restarting in a loop with the error SYSTEM_THREAD_EXCEPTION_NOT_HANDLED (see the attached screenshot).
If Windows 10 is not supported on the 3.5 cluster level (which allows RHEL6 hosts and therefore older qemu-kvm etc.), then a check is missing for when a user edits the VM configuration and changes its cluster.
I'm speculating here, but if Windows 10 is supported only on the 3.6 cluster level, then:
- there should always be a check of which OS the VM has defined and whether the requested cluster level supports it
- there should be a warning, or moving specific OS types to a lower cluster level should not be allowed
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. 3.6 engine, 3.6 cluster level, 3.6 host (RHEL7)
2. install Windows 10
3. 3.5 cluster level, 3.5 host (RHEL6)
4. while Windows 10 VM is down, modify cluster to 3.5 level
5. start Windows 10 VM
Actual results:
- no warning, no error about issues with downgrading the cluster level for Windows 10
- Windows 10 keeps restarting in a loop
Expected results:
- either downgrading the cluster level should not be allowed for OS types which require new features from higher cluster levels, or there should be a warning
- running Windows 10 on 3.5 cluster level?
Do we support Windows 10 only with certain machine types?
RHEV 3.5 here means rhel_6.5.0 type, 3.6 would be the latest 7.2 machine type...
(In reply to Michal Skrivanek from comment #2)
> Do we support Windows 10 only with certain machine types?
> RHEV 3.5 here means rhel_6.5.0 type, 3.6 would be the latest 7.2 machine
Yes, I believe Win10 is only supported with the rhel-7.2.0 machine type. Eduardo?
There is also a -cpu flag workaround. Add +fsgsbase. There is a way to do this with libvirt. Jirka?
I'm not sure if it works in this case, though. I'm curious if RHEV allows using this libvirt workaround. Thanks.
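For reference, a minimal sketch of what the libvirt side of such a workaround could look like, written as Python that builds the `<cpu>` element of a domain XML. The `<feature policy='require' name='...'/>` element is standard libvirt domain XML; the helper name `cpu_element_with_feature` and the choice of `Nehalem` as the base model are only illustrative assumptions. Whether this actually helps Win10 here, and whether RHEV lets you pass it through, is exactly the open question above.

```python
import xml.etree.ElementTree as ET


def cpu_element_with_feature(model: str, feature: str) -> str:
    """Build a libvirt <cpu> element that requires one extra CPU feature.

    Hypothetical helper for illustration; not part of vdsm or the engine.
    """
    cpu = ET.Element("cpu", {"mode": "custom", "match": "exact"})
    model_el = ET.SubElement(cpu, "model", {"fallback": "allow"})
    model_el.text = model
    # policy="require" asks libvirt to expose the feature to the guest
    ET.SubElement(cpu, "feature", {"policy": "require", "name": feature})
    return ET.tostring(cpu, encoding="unicode")


if __name__ == "__main__":
    # Base model and feature name taken from the discussion above
    print(cpu_element_with_feature("Nehalem", "fsgsbase"))
```

The resulting element would go inside the `<domain>` definition of the VM.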
Yes, the host has a Xeon 5507, Gainestown series based on the Nehalem microarchitecture (thx wikipedia).
I'm going to search for Westmere host.
So Windows 10 starts OK on a host (rhel 6.7 with 3.5 vdsm) in a 3.5 cluster with the Westmere CPU model.
(Anyway, I think there should be some check whether a specific OS type can be put into a specific cluster.)
And what was repaired? What behaviour should we expect?
Currently you cannot run a Windows 10 VM on any of these CPUs:
conroe, penryn, nehalem, opteron_g1, opteron_g2, opteron_g3, opteron_g4, opteron_g5
- Reassigned. Cluster level cannot be changed to 3.5. Rejected with the following webadmin message:
Error while executing action: Cannot decrease data center compatibility version
Relevant engine logs:
2015-11-05 15:37:58,288 WARN [org.ovirt.engine.core.bll.UpdateVdsGroupCommand] (ajp-/127.0.0.1:8702-5) [1d402ae3] CanDoAction of action 'UpdateVdsGroup' failed for user admin@internal. Reasons: VAR__TYPE__CLUSTER,VAR__ACTION__UPDATE,ACTION_TYPE_FAILED_CANNOT_DECREASE_COMPATIBILITY_VERSION
2015-11-05 15:37:58,571 ERROR [org.ovirt.engine.core.bll.host.provider.foreman.SystemProviderFinder] (ajp-/127.0.0.1:8702-3)  Failed to find host on any provider by host name 'dhcp163-68.scl.lab.tlv.redhat.com'
2015-11-05 15:37:59,289 ERROR [org.ovirt.engine.core.bll.host.provider.foreman.SystemProviderFinder] (ajp-/127.0.0.1:8702-4)  Failed to find host on any provider by host name 'dhcp163-68.scl.lab.tlv.redhat.com'
1. Change data center compatibility version to 3.5
2. Change cluster compatibility version to 3.5 and click ok.
Action rejected by webadmin.
Issue is relevant for both AMD and Intel CPU types (Verified on AMD Opteron G5 and Intel Nehalem).
If https://bugzilla.redhat.com/show_bug.cgi?id=1269828#c12 is the bug fix, then an appropriate error message should appear in webadmin instead of "cannot decrease data center compatibility version" without an explanation.
Target release should be set once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.
BTW, I verified whether a Win10 VM can be moved from a 3.6 cluster to a 3.5 cluster (with a 3.5.6 host) and it functioned as expected (Win10 VM can be run on 3.6 engine with 3.5 cluster)
Host CPU type was Opteron G3.
Created attachment 1090194 [details]
Reassign engine log
Decreasing cluster version is allowed only if:
- there are no hosts in the cluster
- the new version is not smaller than the DC version
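The two conditions above can be sketched as a small predicate. This is only an illustration of the rule as stated in this comment, not the engine's actual `UpdateVdsGroupCommand` validation; the function name and the tuple representation of versions are assumptions.

```python
from typing import Tuple

# Compatibility version as (major, minor), e.g. (3, 5); assumed representation
Version = Tuple[int, int]


def can_decrease_cluster_version(new_version: Version,
                                 dc_version: Version,
                                 host_count: int) -> bool:
    """Return True if a cluster version decrease would be allowed.

    Hypothetical helper mirroring the two stated conditions:
    - there are no hosts in the cluster
    - the new version is not smaller than the DC version
    """
    return host_count == 0 and new_version >= dc_version


if __name__ == "__main__":
    # Cluster still has a host: rejected
    print(can_decrease_cluster_version((3, 5), (3, 5), host_count=1))
    # DC is still at 3.6: rejected
    print(can_decrease_cluster_version((3, 5), (3, 6), host_count=0))
    # Empty cluster, DC already at 3.5: allowed
    print(can_decrease_cluster_version((3, 5), (3, 5), host_count=0))
```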
This is an old behavior which has not been touched by this patch.
If this is the case then it is correct.
Also, what exact version of engine are you testing this on?
The strange thing is that it was running on Opteron...
Can you please double check the cluster's CPU Type?
Because on 3.6 branch I see this:
os.windows_10.cpu.unsupported.value = conroe, penryn, nehalem, opteron_g1, opteron_g2, opteron_g3, opteron_g4, opteron_g5
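To make it explicit how that property is meant to be read, here is a rough sketch that parses the line and checks a CPU id against it. The function names are hypothetical, and it assumes the cluster CPU is already available as the short lowercase id used in the property (e.g. `opteron_g3`); how the engine maps a cluster CPU type such as "AMD Opteron G3" to that id is not shown here.

```python
from typing import Set


def parse_unsupported_cpus(property_line: str) -> Set[str]:
    """Parse a 'key = a, b, c' property line into a set of CPU ids."""
    _key, _sep, value = property_line.partition("=")
    return {name.strip().lower() for name in value.split(",") if name.strip()}


def cpu_supported(cpu_id: str, unsupported: Set[str]) -> bool:
    """Return True if the CPU id is not in the unsupported set."""
    return cpu_id.strip().lower() not in unsupported


if __name__ == "__main__":
    line = ("os.windows_10.cpu.unsupported.value = conroe, penryn, nehalem, "
            "opteron_g1, opteron_g2, opteron_g3, opteron_g4, opteron_g5")
    unsupported = parse_unsupported_cpus(line)
    # Check a few CPU ids mentioned in this bug
    for cpu in ("nehalem", "opteron_g3", "westmere"):
        print(cpu, "supported" if cpu_supported(cpu, unsupported) else "unsupported")
```

With this reading, a cluster whose CPU is Opteron G3 should refuse to start a VM whose OS type is Windows 10.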
I verified it again; this time I first removed all the hosts from the cluster as you mentioned.
Exact verification scenario:
1. Install windows 10 on 3.6 engine, 3.6 host (DC and cluster levels are 3.6).
2. Verify VM is running.
3. Power off VMs, remove host (verify no other hosts are in this cluster).
4. Reduce DC and cluster to compatibility version 3.5
5. Add 3.5 host to cluster.
6. Run Windows 10 VM.
VM is running properly.
engine: rhevm-188.8.131.52-0.1.el6 (3.6.0-17)
- 3.6 host:
- 3.5.6 host:
Cluster CPU type: AMD Opteron G3
Host CPU type: Quad-Core AMD Opteron(tm) Processor 2350
VM OS type was set to Windows 10 x64 in webadmin during new VM creation (see bug https://bugzilla.redhat.com/show_bug.cgi?id=1278442)
Please move bug to ON_QA and I'll verify it.
current expected behavior is that Win10 (32bit, 64bit) does NOT run on (among others) Opteron G1-G5.
I suggest the following test procedure.
1. Make sure DC and Cluster levels are set to 3.5
2. Make sure Opteron G3 host is in this cluster
4. Make sure Cluster CPU is set to Opteron G3
5. Create VM, set guest OS to 'Other OS' (no real installation required)
6. (ASSERT) vm runs
7. Stop vm
8. Set host to maintenance
9. Set Cluster CPU to Haswell
10. Set VM guest OS to Win10 (no real installation required)
11. Set Cluster CPU to Opteron G3
12. Activate Host
13. Run vm
Expected result: VM refuses to run with error message
The guest OS doesn't support the following CPUs: opteron_g5, opteron_g3, opteron_g4, opteron_g1, opteron_g2, conroe, nehalem, penryn. Its possible to change the cluster cpu or set a different one per VM
Bug tickets that are moved to testing must have target release set to make sure tester knows what to test. Please set the correct target release before moving to ON_QA.
working since ~ 3.5.5
Westmere+ CPUS working fine (except el6 hosts with SandyBridge during W10 installation)