Bug 1982111
| Summary: | [WSL2] WSL2 fails to start when boot win vm with 'pc' machine type on Skylake machine | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | Li Xiaohui <xiaohli> |
| Component: | qemu-kvm | Assignee: | Vitaly Kuznetsov <vkuznets> |
| qemu-kvm sub component: | Machine Types | QA Contact: | Li Xiaohui <xiaohli> |
| Status: | CLOSED CURRENTRELEASE | Docs Contact: | |
| Severity: | medium | ||
| Priority: | medium | CC: | ailan, chayang, hhei, jinzhao, ribarry, virt-maint, vkuznets, yafu |
| Version: | 9.0 | Keywords: | Triaged |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2021-12-01 16:14:23 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
(In reply to Li Xiaohui from comment #0) > 3.And on this Skylake-Client-IBRS machine where I find the new issue, > Win10+WSL2 also works well with the new version cpu model > "Skylake-Client-v4" under 'pc' machine type. > 4.Failed to start win10 vm with cpu cmds "-cpu > Skylake-Client-v4,vmx=on,hv_stimer,hv_synic,hv_vpindex,hv_relaxed, > hv_spinlocks=0xfff,hv_crash,hv_vapic,hv_time,hv_frequencies,hv_runtime, > hv_tlbflush,hv_reenlightenment,hv_stimer_direct,hv_ipi,hv_evmcs" under 'pc' > machine type on above host, I'm not sure whether they're same produce issue. I'm a bit confused: do I understand correctly that 'Skylake-Client-v4' works well without 'hv-*' flags and fails with them? 'Skylake-Client-v4' adds 'xsaves' and 'vmx-xsaves' features which are needed by at least WS2016 (https://bugzilla.redhat.com/show_bug.cgi?id=1942914), "Skylake-Client-IBRS" CPU model doesn't have them. (In reply to Vitaly Kuznetsov from comment #1) > (In reply to Li Xiaohui from comment #0) > > > 3.And on this Skylake-Client-IBRS machine where I find the new issue, > > Win10+WSL2 also works well with the new version cpu model > > "Skylake-Client-v4" under 'pc' machine type. > > 4.Failed to start win10 vm with cpu cmds "-cpu > > Skylake-Client-v4,vmx=on,hv_stimer,hv_synic,hv_vpindex,hv_relaxed, > > hv_spinlocks=0xfff,hv_crash,hv_vapic,hv_time,hv_frequencies,hv_runtime, > > hv_tlbflush,hv_reenlightenment,hv_stimer_direct,hv_ipi,hv_evmcs" under 'pc' > > machine type on above host, I'm not sure whether they're same produce issue. > > I'm a bit confused: do I understand correctly that 'Skylake-Client-v4' works > well > without 'hv-*' flags and fails with them? Sorry, described the wrong problem. 
Correct the problem: Failed to start win10 vm with cpu cmds "-cpu Skylake-Client-IBRS,vmx=on,hv_stimer,hv_synic,hv_vpindex,hv_relaxed,hv_spinlocks=0xfff,hv_crash,hv_vapic,hv_time,hv_frequencies,hv_runtime,hv_tlbflush,hv_reenlightenment,hv_stimer_direct,hv_ipi,hv_evmcs" under 'pc' machine type on above host, win10 hang in 'Preparing Automatic Repair', I'm not sure whether they're same product issue. Will open a new bz if they're not same root cause. BTW, Win10+WSL2 works well with 'hv-*' flags under the cpu model 'Skylake-Client-v4'. > > 'Skylake-Client-v4' adds 'xsaves' and 'vmx-xsaves' features which are needed > by at > least WS2016 (https://bugzilla.redhat.com/show_bug.cgi?id=1942914), > "Skylake-Client-IBRS" > CPU model doesn't have them. (In reply to Li Xiaohui from comment #2) > (In reply to Vitaly Kuznetsov from comment #1) > > (In reply to Li Xiaohui from comment #0) > > > > > 3.And on this Skylake-Client-IBRS machine where I find the new issue, > > > Win10+WSL2 also works well with the new version cpu model > > > "Skylake-Client-v4" under 'pc' machine type. > > > 4.Failed to start win10 vm with cpu cmds "-cpu > > > Skylake-Client-v4,vmx=on,hv_stimer,hv_synic,hv_vpindex,hv_relaxed, > > > hv_spinlocks=0xfff,hv_crash,hv_vapic,hv_time,hv_frequencies,hv_runtime, > > > hv_tlbflush,hv_reenlightenment,hv_stimer_direct,hv_ipi,hv_evmcs" under 'pc' > > > machine type on above host, I'm not sure whether they're same produce issue. > > > > I'm a bit confused: do I understand correctly that 'Skylake-Client-v4' works > > well > > without 'hv-*' flags and fails with them? > > Sorry, described the wrong problem. 
Correct the problem: > Failed to start win10 vm with cpu cmds "-cpu > Skylake-Client-IBRS,vmx=on,hv_stimer,hv_synic,hv_vpindex,hv_relaxed, > hv_spinlocks=0xfff,hv_crash,hv_vapic,hv_time,hv_frequencies,hv_runtime, > hv_tlbflush,hv_reenlightenment,hv_stimer_direct,hv_ipi,hv_evmcs" under 'pc' > machine type on above host, win10 hang in 'Preparing Automatic Repair', I'm > not sure whether they're same product issue. Will open a new bz if they're > not same root cause. > > BTW, Win10+WSL2 works well with 'hv-*' flags under the cpu model > 'Skylake-Client-v4'. This sounds exactly like https://bugzilla.redhat.com/show_bug.cgi?id=1942914 then. Note: existing CPU models (like 'Skylake-Client-IBRS') can't be changed to support new feature or we will be breaking migrations so the only solution is to introduce new CPU models (like the above mentioned 'Skylake-Client-v4'). That's what was done in BZ#1942914. So getting back to the original report, is there anything which is not working with 'Skylake-Client-v4'? (In reply to Vitaly Kuznetsov from comment #3) > (In reply to Li Xiaohui from comment #2) > > (In reply to Vitaly Kuznetsov from comment #1) > > > (In reply to Li Xiaohui from comment #0) > > > > > > > 3.And on this Skylake-Client-IBRS machine where I find the new issue, > > > > Win10+WSL2 also works well with the new version cpu model > > > > "Skylake-Client-v4" under 'pc' machine type. > > > > 4.Failed to start win10 vm with cpu cmds "-cpu > > > > Skylake-Client-v4,vmx=on,hv_stimer,hv_synic,hv_vpindex,hv_relaxed, > > > > hv_spinlocks=0xfff,hv_crash,hv_vapic,hv_time,hv_frequencies,hv_runtime, > > > > hv_tlbflush,hv_reenlightenment,hv_stimer_direct,hv_ipi,hv_evmcs" under 'pc' > > > > machine type on above host, I'm not sure whether they're same produce issue. > > > > > > I'm a bit confused: do I understand correctly that 'Skylake-Client-v4' works > > > well > > > without 'hv-*' flags and fails with them? > > > > Sorry, described the wrong problem. 
Correct the problem: > > Failed to start win10 vm with cpu cmds "-cpu > > Skylake-Client-IBRS,vmx=on,hv_stimer,hv_synic,hv_vpindex,hv_relaxed, > > hv_spinlocks=0xfff,hv_crash,hv_vapic,hv_time,hv_frequencies,hv_runtime, > > hv_tlbflush,hv_reenlightenment,hv_stimer_direct,hv_ipi,hv_evmcs" under 'pc' > > machine type on above host, win10 hang in 'Preparing Automatic Repair', I'm > > not sure whether they're same product issue. Will open a new bz if they're > > not same root cause. > > > > BTW, Win10+WSL2 works well with 'hv-*' flags under the cpu model > > 'Skylake-Client-v4'. > > This sounds exactly like https://bugzilla.redhat.com/show_bug.cgi?id=1942914 then. I don't think so. In original problem I found in bz 1942914(following tests all are under 'q35' machine type): 1)Win10+WSL2 works well with cpu model 'Skylake-Client-IBRS' without xsaves related flags or with 'xsaves=off' , the cpu cmd: -cpu Skylake-Client-IBRS,vmx=on \ 2)WSL2 fail to start with 'xsaves=on' appending to cpu cmd: -cpu Skylake-Client-IBRS,vmx=on,xsaves=on \ But now only change 'q35' to 'pc', but will fail to start wsl2 with '-cpu Skylake-Client-IBRS,vmx=on' > Note: existing CPU models (like 'Skylake-Client-IBRS') can't be changed to > support > new feature or we will be breaking migrations so the only solution is to > introduce > new CPU models (like the above mentioned 'Skylake-Client-v4'). That's what > was done > in BZ#1942914. > > So getting back to the original report, is there anything which is not > working with > 'Skylake-Client-v4'? No currently. 
(In reply to Li Xiaohui from comment #4)
>
> In original problem I found in bz 1942914(following tests all are under
> 'q35' machine type):
> 1)Win10+WSL2 works well with cpu model 'Skylake-Client-IBRS' without xsaves
> related flags or with 'xsaves=off' , the cpu cmd:
> -cpu Skylake-Client-IBRS,vmx=on \
> 2)WSL2 fail to start with 'xsaves=on' appending to cpu cmd:
> -cpu Skylake-Client-IBRS,vmx=on,xsaves=on \
>
> But now only change 'q35' to 'pc', but will fail to start wsl2 with '-cpu
> Skylake-Client-IBRS,vmx=on'
>

OK, so let me rephrase to make sure I understood:

In BZ#1942914 it was discovered that WS2016+Hyper-V can't start without xsaves/vmx-xsaves with 'q35' machine type. We didn't check 'pc' back then.

In this BZ it was discovered that Win10+WSL2 can't start without xsaves/vmx-xsaves with 'pc' machine type but works well with 'q35'.

(Note: the difference between Skylake-Client-IBRS and Skylake-Client-v4 is exactly 'xsaves/vmx-xsaves')

Am I correct? If yes, the resolution for this BZ will be the same as for BZ#1942914: use new CPU types which have xsaves/vmx-xsaves by default or manually enable xsaves/vmx-xsaves.

Vitaly - assigning directly to you since you seem to be engaged already.

If the issue doesn't reproduce with the latest CPU models (e.g. Skylake-Client-v4) then I intend to close this as a duplicate of BZ#1942914. Li Xiaohui, please confirm. Thanks.

I'm trying to test under more situations, will list all results here after testing. So let's wait now.

Hi Vitaly,
I have tested the following scenarios (using the same qemu cmds except for the cpu command) on the same host (qemu-kvm-6.0.0-25.module+el8.5.0+11890+8e7c3f51.x86_64).
Please help confirm whether we should keep tracking the following errors in bugzilla.
Notes:
1) error[1]: Please enable the Virtual Machine Platform Windows feature and ensure virtualization is enabled in the BIOS.
2) error[2]: Win10 guest hangs in the booting stage with 'Preparing Automatic Repair'
3) $hyper-v_flags means: hv_stimer,hv_synic,hv_vpindex,hv_relaxed,hv_spinlocks=0xfff,hv_crash,hv_vapic,hv_time,hv_frequencies,hv_runtime,hv_tlbflush,hv_reenlightenment,hv_stimer_direct,hv_ipi,hv_evmcs
| | -cpu Skylake-Client-IBRS,vmx=on | -cpu Skylake-Client-v4,vmx=on |
|---|---|---|
| pc machine type | can't start wsl2 with error[1] | works well with wsl2 |
| q35 machine type | works well with wsl2 | works well with wsl2 |

| | -cpu Skylake-Client-IBRS,vmx=on,$hyper-v_flags | -cpu Skylake-Client-v4,vmx=on,$hyper-v_flags |
|---|---|---|
| pc machine type | win10 hang with error[2] (work well without wsl2) | works well with wsl2 |
| q35 machine type | works well with wsl2 | works well with wsl2 |
Looking at the two errors above under the pc machine type, I'm curious why the errors only happen under pc, while the q35 machine type works well in the same situations. Could you help give some comments?
| | -cpu Skylake-Client-IBRS,vmx=on,xsaves=off,vmx-xsaves=off | -cpu Skylake-Client-IBRS,vmx=on,xsaves=on,vmx-xsaves=on |
|---|---|---|
| pc machine type | can't start wsl2 with error[1] | can't start wsl2 with error[1] |
| q35 machine type | works well with wsl2 | works well with wsl2 |

| | -cpu Skylake-Client-v4,vmx=on,xsaves=off,vmx-xsaves=off | -cpu Skylake-Client-v4,vmx=on,xsaves=on,vmx-xsaves=on |
|---|---|---|
| pc machine type | works well with wsl2 | works well with wsl2 |
| q35 machine type | works well with wsl2 | works well with wsl2 |
As the two charts above show, error[1] also only happens under the pc machine type with the Skylake-Client-IBRS model.
Though win10+wsl2 works well with the Skylake-Client-v4 model, don't we need to care about Skylake-Client-IBRS and fix the errors under the pc machine type, given the different results between the pc and q35 machine types?
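
For reference, the scenarios in the four charts above form a 2 x 2 x 4 matrix (machine type, CPU model, extra flags). A minimal sketch that enumerates the `-machine`/`-cpu` fragments is shown below. The helper names `qemu_fragment` and `test_matrix` are hypothetical and only illustrate how the combinations are built; the flag strings are copied from the notes above, and the rest of the qemu command line is omitted.

```python
from itertools import product

# Flag list copied from note 3) above ($hyper-v_flags).
HYPERV_FLAGS = (
    "hv_stimer,hv_synic,hv_vpindex,hv_relaxed,hv_spinlocks=0xfff,hv_crash,"
    "hv_vapic,hv_time,hv_frequencies,hv_runtime,hv_tlbflush,"
    "hv_reenlightenment,hv_stimer_direct,hv_ipi,hv_evmcs"
)

MACHINE_TYPES = ["pc", "q35"]
CPU_MODELS = ["Skylake-Client-IBRS", "Skylake-Client-v4"]
# Extra flags appended after "vmx=on" in each chart.
EXTRA_FLAGS = [
    "",
    HYPERV_FLAGS,
    "xsaves=off,vmx-xsaves=off",
    "xsaves=on,vmx-xsaves=on",
]


def qemu_fragment(machine, model, extra):
    """Return the -machine/-cpu part of a qemu command line (hypothetical helper)."""
    cpu = f"{model},vmx=on"
    if extra:
        cpu += f",{extra}"
    return ["-machine", machine, "-cpu", cpu]


def test_matrix():
    """Enumerate every machine type / CPU model / extra flag combination."""
    return [
        qemu_fragment(machine, model, extra)
        for machine, model, extra in product(MACHINE_TYPES, CPU_MODELS, EXTRA_FLAGS)
    ]


if __name__ == "__main__":
    for fragment in test_matrix():
        print(" ".join(fragment))
```

Each printed fragment corresponds to one cell of the charts; the results themselves still have to come from booting the guest.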
(In reply to Li Xiaohui from comment #10) > Hi Vitaly, > I have tested following scenarios(using same qemu cmds except cpu command) > on same host(qemu-kvm-6.0.0-25.module+el8.5.0+11890+8e7c3f51.x86_64). > Please help confirm whether we should go on track following errors in > bugzilla? > Thank for for running these additional tests! See below > > Through above two errors under pc machine type, I'm curious why error only > happen under pc machine type, but q35 machine type work well at same > situations? Could you help give some comments? I'm not exactly sure but my guess would be that Windows uses chipset name to enable or disable certain features and some of them require missing CPU features. > > > > | -cpu Skylake-Client-IBRS,vmx=on,xsaves=off,vmx-xsaves=off| > -cpu Skylake-Client-IBRS,vmx=on,xsaves=on,vmx-xsaves=on| > ----------------------------------------------------------------------------- > ----------------------------------------------------------- > pc machine type | can't start wsl2 with error[1] | > can't start wsl2 with error[1] | I forgot one thing, "Skylake-Client-IBRS" is an alias for "Skylake-Client-v2". "Skylake-Client-v3" (alias "Skylake-Client-noTSX-IBRS") also has "hle=off,rtm=off" and "Skylake-Client-v4" also has "xsaves=on,vmx-xsaves=on". If you add "hle=off,rtm=off,xsaves=on,vmx-xsaves=on" to Skylake-Client-IBRS I'd expect it to work, it should look exactly like "Skylake-Client-v4". ... > > See above two charts, error[1] only happen under pc machine type and > Skylake-Client-IBRS model too. > > > Though win10+wsl2 work well with Skylake-Client-v4 model, don't we need care > Skylake-Client-IBRS and fix errors under pc machine type comparing the > different results between pc and q35 machine type? The thing is that the only way to fix the problem (we know of) is to add CPU features but we can't change the list of CPU features for the existing CPU models and not break migration. 
That's why we solve these issues by introducing new CPU models (e.g. Skylake-Client-v4). (In reply to Vitaly Kuznetsov from comment #11) > (In reply to Li Xiaohui from comment #10) > > Hi Vitaly, > > I have tested following scenarios(using same qemu cmds except cpu command) > > on same host(qemu-kvm-6.0.0-25.module+el8.5.0+11890+8e7c3f51.x86_64). > > Please help confirm whether we should go on track following errors in > > bugzilla? > > > > Thank for for running these additional tests! See below > > > > > Through above two errors under pc machine type, I'm curious why error only > > happen under pc machine type, but q35 machine type work well at same > > situations? Could you help give some comments? > > I'm not exactly sure but my guess would be that Windows uses chipset > name to enable or disable certain features and some of them require > missing CPU features. > > > > > > > > > | -cpu Skylake-Client-IBRS,vmx=on,xsaves=off,vmx-xsaves=off| > > -cpu Skylake-Client-IBRS,vmx=on,xsaves=on,vmx-xsaves=on| > > ----------------------------------------------------------------------------- > > ----------------------------------------------------------- > > pc machine type | can't start wsl2 with error[1] | > > can't start wsl2 with error[1] | > > I forgot one thing, "Skylake-Client-IBRS" is an alias for > "Skylake-Client-v2". > "Skylake-Client-v3" (alias "Skylake-Client-noTSX-IBRS") also has > "hle=off,rtm=off" > > and "Skylake-Client-v4" also has "xsaves=on,vmx-xsaves=on". > > If you add "hle=off,rtm=off,xsaves=on,vmx-xsaves=on" to Skylake-Client-IBRS > I'd > expect it to work, it should look exactly like "Skylake-Client-v4". Tried the cpu cmd '-cpu Skylake-Client-IBRS,vmx=on,hle=off,rtm=off,xsaves=on,vmx-xsaves=on', still fail to start wls2 with error[1]. > > ... > > > > See above two charts, error[1] only happen under pc machine type and > > Skylake-Client-IBRS model too. 
> > > > > > Though win10+wsl2 work well with Skylake-Client-v4 model, don't we need care > > Skylake-Client-IBRS and fix errors under pc machine type comparing the > > different results between pc and q35 machine type? > > The thing is that the only way to fix the problem (we know of) is to add CPU > features but we can't change the list of CPU features for the existing CPU > models > and not break migration. That's why we solve these issues by introducing new > CPU models (e.g. Skylake-Client-v4). I'd advise go on analyse this bz. (In reply to Li Xiaohui from comment #12) > Tried the cpu cmd '-cpu > Skylake-Client-IBRS,vmx=on,hle=off,rtm=off,xsaves=on,vmx-xsaves=on', still > fail to start wls2 with error[1]. Hm, this is unexpected as "Skylake-Client-v4,vmx=on" should be equal to "Skylake-Client-IBRS,vmx=on,hle=off,rtm=off,xsaves=on,vmx-xsaves=on" what if you try "Skylake-Client-noTSX-IBRS,vmx=on,xsaves=on,vmx-xsaves=on" ? (In reply to Vitaly Kuznetsov from comment #13) > (In reply to Li Xiaohui from comment #12) > > > Tried the cpu cmd '-cpu > > Skylake-Client-IBRS,vmx=on,hle=off,rtm=off,xsaves=on,vmx-xsaves=on', still > > fail to start wls2 with error[1]. > > Hm, this is unexpected as > > "Skylake-Client-v4,vmx=on" should be equal to > "Skylake-Client-IBRS,vmx=on,hle=off,rtm=off,xsaves=on,vmx-xsaves=on" > > what if you try > > "Skylake-Client-noTSX-IBRS,vmx=on,xsaves=on,vmx-xsaves=on" > > ? Hi, Win10 + WSL2 work well under below three cpu cmds: 1) -cpu Skylake-Client-noTSX-IBRS,vmx=on 2) -cpu Skylake-Client-noTSX-IBRS,vmx=on,xsaves=on,vmx-xsaves=on 3) -cpu Skylake-Client-noTSX-IBRS,vmx=on,$hyper-v_flags (In reply to Li Xiaohui from comment #14) > > Hi, Win10 + WSL2 work well under below three cpu cmds: > 1) -cpu Skylake-Client-noTSX-IBRS,vmx=on > 2) -cpu Skylake-Client-noTSX-IBRS,vmx=on,xsaves=on,vmx-xsaves=on > 3) -cpu Skylake-Client-noTSX-IBRS,vmx=on,$hyper-v_flags Thanks for trying! 
I'm surprised by the result even more as

'-cpu Skylake-Client-IBRS,vmx=on,hle=off,rtm=off,xsaves=on,vmx-xsaves=on'

which was previously reported to be broken is equal to the working

'-cpu Skylake-Client-noTSX-IBRS,vmx=on,xsaves=on,vmx-xsaves=on'

as far as I understand.

P.S. I'll be on PTO for the next two weeks, I plan to take a look when I'm back. Let's keep this open for now.

Could someone help remove the ITR 8.5.0 in this bz, since it's nearly ITM 26 of 8.5.0 (Aug 30) and there is no time to fix it. We could keep tracking this bz in the next release, rhel-8.6.0, because the priority of the bz is not high, and it only happens under the pc machine type on a special cpu model.

Agreed, moving to backlog for now.

(In reply to Li Xiaohui from comment #12)
>
> Tried the cpu cmd '-cpu
> Skylake-Client-IBRS,vmx=on,hle=off,rtm=off,xsaves=on,vmx-xsaves=on', still
> fail to start wls2 with error[1].
>

I got back to this BZ (sorry for the delay) and I'm testing the following:

kernel-4.18.0-339.el8.x86_64
qemu-kvm-5.2.0-16.module+el8.4.0+12393+838d9165.8.x86_64

I've tried the following:

/usr/libexec/qemu-kvm -machine q35,accel=kvm,kernel-irqchip=split -name guest=win10 -cpu Skylake-Client-IBRS,vmx=on,hle=off,rtm=off,xsaves=on,vmx-xsaves=on,hv_stimer,hv_synic,hv_vpindex,hv_relaxed,hv_spinlocks=0xfff,hv_crash,hv_vapic,hv_time,hv_frequencies,hv_runtime,hv_tlbflush,hv_reenlightenment,hv_stimer_direct,hv_ipi -smp 12 -m 8192 -drive file=/var/lib/libvirt/images/win10-64-virtio-scsi.qcow2,format=qcow2,if=none,id=drive-ide0-0-0 -device ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -vnc :0 -rtc base=localtime,driftfix=slew --no-hpet -monitor stdio

and this seems to work well, my Win10+WSL2 guest starts without issues.

Do I understand correctly that the same command line still fails for you? I may need to access your hardware to debug further then.
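
The equivalence claimed in this part of the thread can be checked on paper with a small sketch. It models only the version deltas named in the discussion ("Skylake-Client-IBRS" as an alias for v2, v3 adding "hle=off,rtm=off", v4 adding "xsaves=on,vmx-xsaves=on"); it is not QEMU's real CPU model table, the function name `effective_overrides` is hypothetical, and anything a machine type may add on top of a CPU model is deliberately outside the model.

```python
# Aliases and per-version deltas as described in the thread; assumptions of
# this sketch, not a copy of QEMU's CPU model definitions.
ALIASES = {
    "Skylake-Client-IBRS": "Skylake-Client-v2",
    "Skylake-Client-noTSX-IBRS": "Skylake-Client-v3",
}
VERSION_ORDER = ["Skylake-Client-v2", "Skylake-Client-v3", "Skylake-Client-v4"]
VERSION_DELTAS = {
    "Skylake-Client-v2": {},  # treated as the baseline here
    "Skylake-Client-v3": {"hle": "off", "rtm": "off"},
    "Skylake-Client-v4": {"xsaves": "on", "vmx-xsaves": "on"},
}


def effective_overrides(cpu_arg):
    """Expand a '-cpu' argument into feature overrides relative to v2."""
    model, *props = cpu_arg.split(",")
    model = ALIASES.get(model, model)
    features = {}
    # Apply the deltas of every version up to and including the chosen one.
    for version in VERSION_ORDER[: VERSION_ORDER.index(model) + 1]:
        features.update(VERSION_DELTAS[version])
    # Explicit properties on the command line win over model defaults.
    for prop in props:
        name, _, value = prop.partition("=")
        features[name] = value or "on"  # a bare flag is treated as "on"
    return features


if __name__ == "__main__":
    a = effective_overrides(
        "Skylake-Client-IBRS,vmx=on,hle=off,rtm=off,xsaves=on,vmx-xsaves=on")
    b = effective_overrides(
        "Skylake-Client-noTSX-IBRS,vmx=on,xsaves=on,vmx-xsaves=on")
    c = effective_overrides("Skylake-Client-v4,vmx=on")
    print(a == b == c)
```

Under these assumptions the three command lines expand to the same overrides, so a difference in guest behaviour would have to come from something this model does not capture.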
(In reply to Vitaly Kuznetsov from comment #18)
> (In reply to Li Xiaohui from comment #12)
> >
> > Tried the cpu cmd '-cpu
> > Skylake-Client-IBRS,vmx=on,hle=off,rtm=off,xsaves=on,vmx-xsaves=on', still
> > fail to start wls2 with error[1].
> >
>
> I got back to this BZ (sorry for the delay) and I'm testing the following:
>
> kernel-4.18.0-339.el8.x86_64
> qemu-kvm-5.2.0-16.module+el8.4.0+12393+838d9165.8.x86_64
>
> I've tried the following:
>
> /usr/libexec/qemu-kvm -machine q35,accel=kvm,kernel-irqchip=split -name
> guest=win10 -cpu
> Skylake-Client-IBRS,vmx=on,hle=off,rtm=off,xsaves=on,vmx-xsaves=on,hv_stimer,
> hv_synic,hv_vpindex,hv_relaxed,hv_spinlocks=0xfff,hv_crash,hv_vapic,hv_time,
> hv_frequencies,hv_runtime,hv_tlbflush,hv_reenlightenment,hv_stimer_direct,
> hv_ipi -smp 12 -m 8192 -drive
> file=/var/lib/libvirt/images/win10-64-virtio-scsi.qcow2,format=qcow2,if=none,
> id=drive-ide0-0-0 -device
> ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -vnc :0
> -rtc base=localtime,driftfix=slew --no-hpet -monitor stdio
>
> and this seems to work well, my Win10+WSL2 guest starts without issues.

Win10+WSL2 works well since you use the 'q35' machine type: -machine q35,...

> Do I understand correctly that the same command line still fails for you? I may need to
> access your hardware to debug further then.

Please boot vm with '-machine pc' like the qemu command line in my Comment 0, then you will see WSL2 fail to start.

BTW, I use my machines to handle other issues now. I believe you could reproduce bz when you use similar cmds as mine in Comment 0.

(In reply to Li Xiaohui from comment #19)
>
> Win10+WSL2 works well since you use the 'q35' machine type: -machine q35,...
>
> Please boot vm with '-machine pc' like the qemu command line in my Comment
> 0, then you will see WSL2 fail to start.

Sorry, my bad, I forgot 'pc' was essential. I, however, was finally able to figure out what's wrong (I hope).
In RHEL8, the 'pc' machine type is an alias for pc-i440fx-rhel7.6.0, which is quite old. Unlike upstream (and unlike q35) it is not updated with every major version change. When the pc-i440fx-rhel7.6.0 machine type was implemented, cpu models used to have the 'mpx' feature enabled by default. Namely:

    GlobalProperty pc_rhel_7_6_compat[] = {
        { "Skylake-Client" "-" TYPE_X86_CPU, "mpx", "on" },
        { "Skylake-Client-IBRS" "-" TYPE_X86_CPU, "mpx", "on" },
        { "Skylake-Server" "-" TYPE_X86_CPU, "mpx", "on" },
        { "Skylake-Server-IBRS" "-" TYPE_X86_CPU, "mpx", "on" },
        { "Cascadelake-Server" "-" TYPE_X86_CPU, "mpx", "on" },
        { "Icelake-Client" "-" TYPE_X86_CPU, "mpx", "on" },
        { "Icelake-Server" "-" TYPE_X86_CPU, "mpx", "on" },
    };

So if you pick any of these models, you get the 'mpx' feature enabled by default. It doesn't work well with Windows so you can either:
- Disable it manually ("mpx=off")
- Pick a newer cpu model (Skylake-Client-v4 for example). It'll have "mpx=off" by default.

I don't think there's anything we can do as the issue only applies to legacy machine type + legacy CPU models. We can't change the default without breaking migrations and creating problems for those who need these features, unfortunately.

Could you please verify 'mpx=off' solves the problem for you? E.g. the following should work:

-cpu Skylake-Client-IBRS,mpx=off,vmx=on,hle=off,rtm=off,xsaves=on,vmx-xsaves=on,[all other hyperv flags]

Move RHEL-AV bugs to RHEL9. If necessary to resolve in RHEL8, then clone to the current RHEL8 release.

(In reply to John Ferlan from comment #21)
> Move RHEL-AV bugs to RHEL9. If necessary to resolve in RHEL8, then clone to
> the current RHEL8 release.

I'm not 100% sure the bug exists in RHEL9, Li Xiaohui could you please also verify?

(In reply to Vitaly Kuznetsov from comment #22)
> (In reply to John Ferlan from comment #21)
> > Move RHEL-AV bugs to RHEL9. If necessary to resolve in RHEL8, then clone to
> > the current RHEL8 release.
>
> I'm not 100% sure the bug exists in RHEL9, Li Xiaohui could you please also
> verify?

Thank you Vitaly, I will test with '-cpu Skylake-Client-IBRS,mpx=off,vmx=on,hle=off,rtm=off,xsaves=on,vmx-xsaves=on,[all other hyperv flags]' on rhel8 and rhel9 this week.

Didn't reproduce the bz on rhel8.6.0 (kernel-4.18.0-340.el8.x86_64 & qemu-kvm-6.1.0-1.module+el8.6.0+12721+8d053ff2.x86_64). I tried on the same machine with the same qemu cmds, but win10+wsl2 works well.

Also tested the scenarios below; win10+wsl2 works well under the pc machine type:
1) '-cpu Skylake-Client-IBRS,vmx=on'
2) '-cpu Skylake-Client-IBRS,mpx=off,vmx=on,hv_stimer,hv_synic,hv_vpindex,hv_relaxed,hv_spinlocks=0xfff,hv_crash,hv_vapic,hv_time,hv_frequencies,hv_runtime,hv_tlbflush,hv_reenlightenment,hv_stimer_direct,hv_ipi,hv_evmcs'
3) '-cpu Skylake-Client-IBRS,mpx=off,vmx-xsaves=on,xsaves=on,ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,clflushopt=on,umip=on,md-clear=on,stibp=on,arch-capabilities=on,ssbd=on,pdpe1gb=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,skip-l1dfl-vmentry=on,pschange-mc-no=on,hv_stimer,hv_synic,hv_vpindex,hv_relaxed,hv_spinlocks=0xfff,hv_crash,hv_vapic,hv_time,hv_frequencies,hv_runtime,hv_tlbflush,hv_reenlightenment,hv_stimer_direct,hv_ipi'

BTW, I tested the above three scenarios on rhel9 (kernel-5.14.0-3.el9.x86_64 & qemu-img-6.1.0-2.el9.x86_64); win10+wsl2 works well too under the pc machine type.
The cpu model of the rhel9 machine is Skylake-Server-IBRS; the other qemu cmds are the same as above.

Do we have some code changes on rhel8.6 and rhel9, since rhel8.5 didn't work well with '-cpu Skylake-Client-IBRS,vmx=on'?

Do I need to retest on rhel8.5.0 with '-cpu Skylake-Client-IBRS,mpx=off,vmx-xsaves=on,xsaves=on,vmx=on' and hyper-v flags? I think it's unnecessary as rhel8.6 and rhel9 work well now. Maybe we could close this bz as currentrelease?
PS: sorry about the delay reply since the intel machines were broken last week (In reply to Li Xiaohui from comment #24) > Didn't reproduce bz on the > rhel8.6.0(kernel-4.18.0-340.el8.x86_64&qemu-kvm-6.1.0-1.module+el8.6. > 0+12721+8d053ff2.x86_64), I tried on same machine with same qemu cmds, but > win10+wsl2 works well. > > Also test below scenarios, win10+wsl2 works well under pc machine type: > 1)'-cpu Skylake-Client-IBRS,vmx=on' > 2)'-cpu > Skylake-Client-IBRS,mpx=off,vmx=on,hv_stimer,hv_synic,hv_vpindex,hv_relaxed, > hv_spinlocks=0xfff,hv_crash,hv_vapic,hv_time,hv_frequencies,hv_runtime, > hv_tlbflush,hv_reenlightenment,hv_stimer_direct,hv_ipi,hv_evmcs'; > 3)'-cpu > Skylake-Client-IBRS,mpx=off,vmx-xsaves=on,xsaves=on,ss=on,vmx=on,pdcm=on, > hypervisor=on,tsc-adjust=on,clflushopt=on,umip=on,md-clear=on,stibp=on,arch- > capabilities=on,ssbd=on,pdpe1gb=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on, > skip-l1dfl-vmentry=on,pschange-mc-no=on,hv_stimer,hv_synic,hv_vpindex, > hv_relaxed,hv_spinlocks=0xfff,hv_crash,hv_vapic,hv_time,hv_frequencies, > hv_runtime,hv_tlbflush,hv_reenlightenment,hv_stimer_direct,hv_ipi' > > > BTW, test above three scenarios on > rhel9(kernel-5.14.0-3.el9.x86_64&qemu-img-6.1.0-2.el9.x86_64), win10+wsl2 > works well too under pc machine type. > The cpu model of rhel9 machine is Skylake-Server-IBRS, other qemu cmds are > same with above. > Thanks for testing these! I'd expect RHEL9 to behave similarly as 'pc' machine type is the same legacy 7.6.0. > > > Do we have some code changes on rhel8.6 and rhel9 since rhel8.5 didn't work > well with '-cpu Skylake-Client-IBRS,vmx=on'? Out of top of my head I'm not aware of any differences between 8.5 and 8.6 which would be causing a behavioral change here. Could you check if it's kernel or QEMU change which makes a difference? It would also be great to know the first 'fixed' version. 
> Do I need retest on rhel8.5.0 with '-cpu > Skylake-Client-IBRS,mpx=off,vmx-xsaves=on,xsaves=on,vmx=on' and hyper-v > flags? I think it's unnecessary as rhel8.6 and rhel9 works well now. Maybe > we could close this bz as currentrelease? Yes, in case the result is consistent. Are you testing 8.5 and 8.6 on the same host CPU? > > PS: sorry about the delay reply since the intel machines were broken last > week No problem, thanks for your persistence here! Hope we're getting closer to knowing all the peculiarities here. Hi, as bug doesn't reproduce on current release, I would find out the first kernel or qemu 'fixed' version once finish some function tests on these machines, it needs some days, thanks.
> Yes, in case the result is consistent. Are you testing 8.5 and 8.6 on the same
> host CPU?
Yes, I use the same machine and same qemu cmds.
Li Xiaohui, it seems the BZ got stuck. I'm closing this as CURRENTRELEASE as:
1) Everything seems to work with RHEL8.6/9.0
2) "-cpu Skylake-Client-IBRS,mpx=off,vmx=on,hle=off,rtm=off,xsaves=on,vmx-xsaves=on,[all other hyperv flags]" should resolve the issue for legacy CPU models.

Feel free to reopen if something still seems to be broken. Thanks for your testing efforts!

(In reply to Vitaly Kuznetsov from comment #27)
> Li Xiaohui, it seems the BZ got stuck. I'm closing this as CURRENTRELEASE as:
> 1) Everything seems to work with RHEL8.6/9.0
> 2) "-cpu
> Skylake-Client-IBRS,mpx=off,vmx=on,hle=off,rtm=off,xsaves=on,vmx-xsaves=on,
> [all other hyperv flags]" should resolve the issue for legacy CPU models.
>
> Feel free to reopen if something still seems to be broken. Thanks for your
> testing efforts!

Ok, thank you too.
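
The workaround named in the closing comment can be sketched as a small helper that prepends "mpx=off" when a legacy CPU model is combined with the 'pc' machine type. The function name `with_mpx_workaround` is hypothetical; the model list is copied from the `pc_rhel_7_6_compat` table quoted earlier in the thread, and the sketch assumes, as stated there for RHEL8, that 'pc' resolves to pc-i440fx-rhel7.6.0.

```python
# CPU models that get mpx=on from the pc-i440fx-rhel7.6.0 compat properties,
# copied from the pc_rhel_7_6_compat table quoted in the thread.
LEGACY_MPX_MODELS = {
    "Skylake-Client",
    "Skylake-Client-IBRS",
    "Skylake-Server",
    "Skylake-Server-IBRS",
    "Cascadelake-Server",
    "Icelake-Client",
    "Icelake-Server",
}


def with_mpx_workaround(machine, cpu_arg):
    """Return cpu_arg with 'mpx=off' added when the workaround applies.

    Assumption of this sketch: machine == "pc" means the legacy
    pc-i440fx-rhel7.6.0 machine type, as described for RHEL8 in the thread.
    """
    model, *props = cpu_arg.split(",")
    # Respect an explicit mpx setting chosen by the user.
    has_mpx = any(p == "mpx" or p.startswith("mpx=") for p in props)
    if machine == "pc" and model in LEGACY_MPX_MODELS and not has_mpx:
        return ",".join([model, "mpx=off", *props])
    return cpu_arg


if __name__ == "__main__":
    print(with_mpx_workaround("pc", "Skylake-Client-IBRS,vmx=on"))
    print(with_mpx_workaround("q35", "Skylake-Client-IBRS,vmx=on"))
    print(with_mpx_workaround("pc", "Skylake-Client-v4,vmx=on"))
```

Newer models such as Skylake-Client-v4 and the q35 machine type are left untouched, matching the thread's statement that the issue only applies to the legacy machine type with legacy CPU models.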
Description of problem:
When boot win10 vm with 'pc' machine type, the cpu of host and guest is 'Skylake-Client-IBRS' or 'Skylake-Server-IBRS', fail to start wsl2: "Please enable the Virtual Machine Platform Windows feature and ensure virtualization is enabled in the BIOS."

Version-Release number of selected component (if applicable):
hosts info: kernel-4.18.0-312.el8.x86_64 & qemu-kvm-6.0.0-23.module+el8.5.0+11740+35571f13.x86_64
host cpu:
Model name: Intel(R) Xeon(R) CPU E3-1240 v5 @ 3.50GHz
[root@ibm-x3250m6-07 home]# virsh capabilities | grep mode
<model>Skylake-Client-IBRS</model>
<secmodel>
<model>none</model>
</secmodel>
<secmodel>
<model>dac</model>
</secmodel>

How reproducible:
100%

Steps to Reproduce:
1.Boot a windows 10 vm with qemu cmds[1]
-machine pc \
-cpu Skylake-Client-IBRS,vmx=on \
2.Start wsl2 in windows 10 vm

Actual results:
Fail to start wsl2 in win10 vm: "Please enable the Virtual Machine Platform Windows feature and ensure virtualization is enabled in the BIOS."

Expected results:
WSL2 works well under win10 vm

Additional info:
1.Found this bz when pre-verify following bz, and doesn't hit bz under 'q35' machine type: https://bugzilla.redhat.com/show_bug.cgi?id=1940837#c28
2.found Win10+WSL2 works well on rhelav-8.5.0 Cascadelake-Server-noTSX machine with the legacy 'pc' machine type and the cpu model of guest is specified to Cascadelake-Server-noTSX;
3.And on this Skylake-Client-IBRS machine where I find the new issue, Win10+WSL2 also works well with the new version cpu model "Skylake-Client-v4" under 'pc' machine type.
4.Failed to start win10 vm with cpu cmds "-cpu Skylake-Client-v4,vmx=on,hv_stimer,hv_synic,hv_vpindex,hv_relaxed,hv_spinlocks=0xfff,hv_crash,hv_vapic,hv_time,hv_frequencies,hv_runtime,hv_tlbflush,hv_reenlightenment,hv_stimer_direct,hv_ipi,hv_evmcs" under 'pc' machine type on above host, I'm not sure whether they're same produce issue. I will open a new bz if they're not same root cause.
Qemu command line:
/usr/libexec/qemu-kvm \
-name "mouse-vm" \
-sandbox off \
-machine pc \
-cpu Skylake-Client-IBRS,vmx=on \
-nodefaults \
-vga std \
-chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1,server=on,wait=off \
-chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor,server=on,wait=off \
-mon chardev=qmp_id_qmpmonitor1,mode=control \
-mon chardev=qmp_id_catch_monitor,mode=control \
-device nec-usb-xhci,id=usb1,bus=pci.0,addr=0x3 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x4 \
-device scsi-hd,id=image1,drive=drive_image1,bus=virtio_scsi_pci0.0,channel=0,scsi-id=0,lun=0,bootindex=0 \
-device virtio-net-pci,mac=9a:8a:8b:8c:8d:8e,id=net0,vectors=4,netdev=tap0,bus=pci.0,addr=0x5 \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-blockdev driver=file,cache.direct=on,cache.no-flush=off,filename=/home/test/win10-64-virtio-scsi.qcow2,node-name=drive_sys1 \
-blockdev driver=qcow2,node-name=drive_image1,file=drive_sys1 \
-netdev tap,id=tap0,vhost=on \
-m 4096 \
-smp 4,maxcpus=4,cores=2,threads=1,sockets=2 \
-vnc :10 \
-rtc base=utc,clock=host \
-boot menu=off,strict=off,order=cdn,once=c \
-enable-kvm \
-qmp tcp:0:3333,server=on,wait=off \
-qmp tcp:0:9999,server=on,wait=off \
-qmp tcp:0:9888,server=on,wait=off \
-serial tcp:0:4444,server=on,wait=off \
-monitor stdio \