Bug 1982111

Summary: [WSL2] WSL2 fails to start when booting a Windows VM with the 'pc' machine type on a Skylake machine
Product: Red Hat Enterprise Linux 9
Reporter: Li Xiaohui <xiaohli>
Component: qemu-kvm
Sub Component: Machine Types
Assignee: Vitaly Kuznetsov <vkuznets>
QA Contact: Li Xiaohui <xiaohli>
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: medium
CC: ailan, chayang, hhei, jinzhao, ribarry, virt-maint, vkuznets, yafu
Version: 9.0
Keywords: Triaged
Target Milestone: rc
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2021-12-01 16:14:23 UTC

Description Li Xiaohui 2021-07-14 08:32:59 UTC
Description of problem:
When booting a Win10 VM with the 'pc' machine type, with 'Skylake-Client-IBRS' or 'Skylake-Server-IBRS' as both the host and guest CPU model, WSL2 fails to start:
"Please enable the Virtual Machine Platform Windows feature and ensure virtualization is enabled in the BIOS."


Version-Release number of selected component (if applicable):
Host info: kernel-4.18.0-312.el8.x86_64 & qemu-kvm-6.0.0-23.module+el8.5.0+11740+35571f13.x86_64
Host CPU: Intel(R) Xeon(R) CPU E3-1240 v5 @ 3.50GHz
[root@ibm-x3250m6-07 home]# virsh capabilities | grep mode
      <model>Skylake-Client-IBRS</model>
    <secmodel>
      <model>none</model>
    </secmodel>
    <secmodel>
      <model>dac</model>
    </secmodel>


How reproducible:
100%


Steps to Reproduce:
1. Boot a Windows 10 VM with the QEMU command line [1], which includes:
-machine pc \
-cpu Skylake-Client-IBRS,vmx=on \
2. Start WSL2 in the Windows 10 VM.


Actual results:
WSL2 fails to start in the Win10 VM:
"Please enable the Virtual Machine Platform Windows feature and ensure virtualization is enabled in the BIOS."


Expected results:
WSL2 works well in the Win10 VM.


Additional info:
1. Found this bz while pre-verifying the following bz; it does not reproduce with the 'q35' machine type:
https://bugzilla.redhat.com/show_bug.cgi?id=1940837#c28
2. Win10+WSL2 works well on a RHEL-AV 8.5.0 Cascadelake-Server-noTSX host with the legacy 'pc' machine type when the guest CPU model is set to Cascadelake-Server-noTSX.
3. On this Skylake-Client-IBRS host, where I found the new issue, Win10+WSL2 also works well with the newer CPU model version "Skylake-Client-v4" under the 'pc' machine type.
4. The Win10 VM failed to start with the CPU options "-cpu Skylake-Client-v4,vmx=on,hv_stimer,hv_synic,hv_vpindex,hv_relaxed,hv_spinlocks=0xfff,hv_crash,hv_vapic,hv_time,hv_frequencies,hv_runtime,hv_tlbflush,hv_reenlightenment,hv_stimer_direct,hv_ipi,hv_evmcs" under the 'pc' machine type on the above host. I'm not sure whether this is the same product issue; I will open a new bz if the root cause differs.


[1] QEMU command line:
/usr/libexec/qemu-kvm  \
-name "mouse-vm" \
-sandbox off \
-machine pc \
-cpu Skylake-Client-IBRS,vmx=on \
-nodefaults  \
-vga std \
-chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1,server=on,wait=off \
-chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor,server=on,wait=off \
-mon chardev=qmp_id_qmpmonitor1,mode=control \
-mon chardev=qmp_id_catch_monitor,mode=control \
-device nec-usb-xhci,id=usb1,bus=pci.0,addr=0x3 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x4 \
-device scsi-hd,id=image1,drive=drive_image1,bus=virtio_scsi_pci0.0,channel=0,scsi-id=0,lun=0,bootindex=0 \
-device virtio-net-pci,mac=9a:8a:8b:8c:8d:8e,id=net0,vectors=4,netdev=tap0,bus=pci.0,addr=0x5 \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-blockdev driver=file,cache.direct=on,cache.no-flush=off,filename=/home/test/win10-64-virtio-scsi.qcow2,node-name=drive_sys1 \
-blockdev driver=qcow2,node-name=drive_image1,file=drive_sys1 \
-netdev tap,id=tap0,vhost=on \
-m 4096 \
-smp 4,maxcpus=4,cores=2,threads=1,sockets=2 \
-vnc :10 \
-rtc base=utc,clock=host \
-boot menu=off,strict=off,order=cdn,once=c \
-enable-kvm  \
-qmp tcp:0:3333,server=on,wait=off \
-qmp tcp:0:9999,server=on,wait=off \
-qmp tcp:0:9888,server=on,wait=off \
-serial tcp:0:4444,server=on,wait=off \
-monitor stdio

Comment 1 Vitaly Kuznetsov 2021-07-14 12:23:25 UTC
(In reply to Li Xiaohui from comment #0)

> 3.And on this Skylake-Client-IBRS machine where I find the new issue,
> Win10+WSL2 also works well with the new version cpu model
> "Skylake-Client-v4" under 'pc' machine type.  
> 4.Failed to start win10 vm with cpu cmds "-cpu
> Skylake-Client-v4,vmx=on,hv_stimer,hv_synic,hv_vpindex,hv_relaxed,
> hv_spinlocks=0xfff,hv_crash,hv_vapic,hv_time,hv_frequencies,hv_runtime,
> hv_tlbflush,hv_reenlightenment,hv_stimer_direct,hv_ipi,hv_evmcs" under 'pc'
> machine type on above host, I'm not sure whether they're same product issue.

I'm a bit confused: do I understand correctly that 'Skylake-Client-v4' works well
without 'hv-*' flags and fails with them? 

'Skylake-Client-v4' adds the 'xsaves' and 'vmx-xsaves' features, which are needed by at
least WS2016 (https://bugzilla.redhat.com/show_bug.cgi?id=1942914); the "Skylake-Client-IBRS"
CPU model doesn't have them.

Comment 2 Li Xiaohui 2021-07-14 12:43:55 UTC
(In reply to Vitaly Kuznetsov from comment #1)
> (In reply to Li Xiaohui from comment #0)
> 
> > 3.And on this Skylake-Client-IBRS machine where I find the new issue,
> > Win10+WSL2 also works well with the new version cpu model
> > "Skylake-Client-v4" under 'pc' machine type.  
> > 4.Failed to start win10 vm with cpu cmds "-cpu
> > Skylake-Client-v4,vmx=on,hv_stimer,hv_synic,hv_vpindex,hv_relaxed,
> > hv_spinlocks=0xfff,hv_crash,hv_vapic,hv_time,hv_frequencies,hv_runtime,
> > hv_tlbflush,hv_reenlightenment,hv_stimer_direct,hv_ipi,hv_evmcs" under 'pc'
> > machine type on above host, I'm not sure whether they're same product issue.
> 
> I'm a bit confused: do I understand correctly that 'Skylake-Client-v4' works
> well without 'hv-*' flags and fails with them?

Sorry, I described the wrong problem. Here is the corrected one:
The Win10 VM fails to start with the CPU options "-cpu Skylake-Client-IBRS,vmx=on,hv_stimer,hv_synic,hv_vpindex,hv_relaxed,hv_spinlocks=0xfff,hv_crash,hv_vapic,hv_time,hv_frequencies,hv_runtime,hv_tlbflush,hv_reenlightenment,hv_stimer_direct,hv_ipi,hv_evmcs" under the 'pc' machine type on the above host: Win10 hangs at 'Preparing Automatic Repair'. I'm not sure whether this is the same product issue; I will open a new bz if the root cause differs.

BTW, Win10+WSL2 works well with the 'hv-*' flags and the 'Skylake-Client-v4' CPU model.

> 
> 'Skylake-Client-v4' adds 'xsaves' and 'vmx-xsaves' features which are needed
> by at least WS2016 (https://bugzilla.redhat.com/show_bug.cgi?id=1942914),
> "Skylake-Client-IBRS" CPU model doesn't have them.

Comment 3 Vitaly Kuznetsov 2021-07-14 14:44:17 UTC
(In reply to Li Xiaohui from comment #2)
> (In reply to Vitaly Kuznetsov from comment #1)
> > (In reply to Li Xiaohui from comment #0)
> > 
> > > 3.And on this Skylake-Client-IBRS machine where I find the new issue,
> > > Win10+WSL2 also works well with the new version cpu model
> > > "Skylake-Client-v4" under 'pc' machine type.  
> > > 4.Failed to start win10 vm with cpu cmds "-cpu
> > > Skylake-Client-v4,vmx=on,hv_stimer,hv_synic,hv_vpindex,hv_relaxed,
> > > hv_spinlocks=0xfff,hv_crash,hv_vapic,hv_time,hv_frequencies,hv_runtime,
> > > hv_tlbflush,hv_reenlightenment,hv_stimer_direct,hv_ipi,hv_evmcs" under 'pc'
> > > machine type on above host, I'm not sure whether they're same product issue.
> > 
> > I'm a bit confused: do I understand correctly that 'Skylake-Client-v4' works
> > well without 'hv-*' flags and fails with them?
> 
> Sorry, described the wrong problem. Correct the problem:
> Failed to start win10 vm with cpu cmds "-cpu
> Skylake-Client-IBRS,vmx=on,hv_stimer,hv_synic,hv_vpindex,hv_relaxed,
> hv_spinlocks=0xfff,hv_crash,hv_vapic,hv_time,hv_frequencies,hv_runtime,
> hv_tlbflush,hv_reenlightenment,hv_stimer_direct,hv_ipi,hv_evmcs" under 'pc'
> machine type on above host, win10 hang in 'Preparing Automatic Repair', I'm
> not sure whether they're same product issue. Will open a new bz if they're
> not same root cause.
> 
> BTW, Win10+WSL2 works well with 'hv-*' flags under the cpu model
> 'Skylake-Client-v4'.

This sounds exactly like https://bugzilla.redhat.com/show_bug.cgi?id=1942914 then.
Note: existing CPU models (like 'Skylake-Client-IBRS') can't be changed to support
new features without breaking migration, so the only solution is to introduce
new CPU models (like the above-mentioned 'Skylake-Client-v4'). That's what was done
in BZ#1942914.
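To illustrate why an existing model can't simply gain features, here is a minimal, hypothetical C sketch (not QEMU code). A guest started with "-cpu <name>" must see the same CPU after migration, so the destination has to resolve <name> to the same feature set as the source. The CpuModel table, the feature bits, lookup() and can_migrate() are illustrative assumptions; only the model names come from this bz, and each entry lists just the bits relevant here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical feature bits, limited to the ones discussed in this bz. */
enum {
    F_HLE        = 1u << 0,
    F_RTM        = 1u << 1,
    F_XSAVES     = 1u << 2,
    F_VMX_XSAVES = 1u << 3,
};

typedef struct {
    const char *name;
    uint32_t features;
} CpuModel;

/* Look up a named model in one (hypothetical) QEMU build's model table. */
static bool lookup(const CpuModel *table, size_t n, const char *name, uint32_t *out)
{
    assert(table != NULL && name != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(table[i].name, name) == 0) {
            *out = table[i].features;
            return true;
        }
    }
    return false;
}

/* A guest started with "-cpu <name>" on src can migrate to dst only if dst
 * knows <name> and resolves it to exactly the same features. */
static bool can_migrate(const CpuModel *src, size_t ns,
                        const CpuModel *dst, size_t nd, const char *name)
{
    uint32_t a, b;
    if (!lookup(src, ns, name, &a) || !lookup(dst, nd, name, &b))
        return false;
    return a == b;
}

#define N_MODELS(t) (sizeof(t) / sizeof((t)[0]))

/* Older build. */
static const CpuModel old_build[] = {
    { "Skylake-Client-IBRS", F_HLE | F_RTM },
};

/* Rejected approach: add XSAVES to the existing model in place. */
static const CpuModel edited_build[] = {
    { "Skylake-Client-IBRS", F_HLE | F_RTM | F_XSAVES | F_VMX_XSAVES },
};

/* Approach taken: keep the old model frozen and add a new version instead. */
static const CpuModel versioned_build[] = {
    { "Skylake-Client-IBRS", F_HLE | F_RTM },
    { "Skylake-Client-v4",   F_XSAVES | F_VMX_XSAVES },
};
```

Under these assumptions, editing Skylake-Client-IBRS in place makes old and new builds disagree about the same name, while adding Skylake-Client-v4 leaves existing guests untouched; a v4 guest just can't move back to a build that doesn't know v4.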

So getting back to the original report, is there anything which is not working with
'Skylake-Client-v4'?

Comment 4 Li Xiaohui 2021-07-15 02:48:58 UTC
(In reply to Vitaly Kuznetsov from comment #3)
> (In reply to Li Xiaohui from comment #2)
> > (In reply to Vitaly Kuznetsov from comment #1)
> > > (In reply to Li Xiaohui from comment #0)
> > > 
> > > > 3.And on this Skylake-Client-IBRS machine where I find the new issue,
> > > > Win10+WSL2 also works well with the new version cpu model
> > > > "Skylake-Client-v4" under 'pc' machine type.  
> > > > 4.Failed to start win10 vm with cpu cmds "-cpu
> > > > Skylake-Client-v4,vmx=on,hv_stimer,hv_synic,hv_vpindex,hv_relaxed,
> > > > hv_spinlocks=0xfff,hv_crash,hv_vapic,hv_time,hv_frequencies,hv_runtime,
> > > > hv_tlbflush,hv_reenlightenment,hv_stimer_direct,hv_ipi,hv_evmcs" under 'pc'
> > > > machine type on above host, I'm not sure whether they're same product issue.
> > > 
> > > I'm a bit confused: do I understand correctly that 'Skylake-Client-v4' works
> > > well without 'hv-*' flags and fails with them?
> > 
> > Sorry, described the wrong problem. Correct the problem:
> > Failed to start win10 vm with cpu cmds "-cpu
> > Skylake-Client-IBRS,vmx=on,hv_stimer,hv_synic,hv_vpindex,hv_relaxed,
> > hv_spinlocks=0xfff,hv_crash,hv_vapic,hv_time,hv_frequencies,hv_runtime,
> > hv_tlbflush,hv_reenlightenment,hv_stimer_direct,hv_ipi,hv_evmcs" under 'pc'
> > machine type on above host, win10 hang in 'Preparing Automatic Repair', I'm
> > not sure whether they're same product issue. Will open a new bz if they're
> > not same root cause.
> > 
> > BTW, Win10+WSL2 works well with 'hv-*' flags under the cpu model
> > 'Skylake-Client-v4'.
> 
> This sounds exactly like https://bugzilla.redhat.com/show_bug.cgi?id=1942914 then.

I don't think so.

In the original problem I found in bz 1942914 (all of the following tests used the 'q35' machine type):
1) Win10+WSL2 works well with CPU model 'Skylake-Client-IBRS' without xsaves-related flags or with 'xsaves=off'; the CPU option:
-cpu Skylake-Client-IBRS,vmx=on \
2) WSL2 fails to start with 'xsaves=on' appended to the CPU option:
-cpu Skylake-Client-IBRS,vmx=on,xsaves=on \

But now, changing only 'q35' to 'pc', WSL2 fails to start with '-cpu Skylake-Client-IBRS,vmx=on'.


> Note: existing CPU models (like 'Skylake-Client-IBRS') can't be changed to
> support new feature or we will be breaking migrations so the only solution is
> to introduce new CPU models (like the above mentioned 'Skylake-Client-v4').
> That's what was done in BZ#1942914.
> 
> So getting back to the original report, is there anything which is not
> working with 'Skylake-Client-v4'?

Not currently.

Comment 5 Vitaly Kuznetsov 2021-07-15 08:18:53 UTC
(In reply to Li Xiaohui from comment #4)
> 
> In original problem I found in bz 1942914(following tests all are under
> 'q35' machine type):
> 1)Win10+WSL2 works well with cpu model 'Skylake-Client-IBRS' without xsaves
> related flags or with 'xsaves=off' , the cpu cmd:
> -cpu Skylake-Client-IBRS,vmx=on \ 
> 2)WSL2 fail to start with 'xsaves=on' appending to cpu cmd:
> -cpu Skylake-Client-IBRS,vmx=on,xsaves=on \
> 
> But now only change 'q35' to 'pc', but will fail to start wsl2 with '-cpu
> Skylake-Client-IBRS,vmx=on'
> 

OK, so let me rephrase to make sure I understood:

In BZ#1942914 it was discovered that WS2016+Hyper-V can't start without
xsaves/vmx-xsaves with 'q35' machine type. We didn't check 'pc' back then.

In this BZ it was discovered that Win10+WSL2 can't start without xsaves/
vmx-xsaves with the 'pc' machine type but works well with 'q35'.

(Note: the difference between Skylake-Client-IBRS and Skylake-Client-v4 is
exactly 'xsaves/vmx-xsaves')

Am I correct? If so, the resolution for this BZ will be the same
as for BZ#1942914: use new CPU types which have xsaves/vmx-xsaves by default,
or manually enable xsaves/vmx-xsaves.

Comment 6 John Ferlan 2021-07-22 18:43:41 UTC
Vitaly - assigning directly to you since you seem to be engaged already.

Comment 8 Vitaly Kuznetsov 2021-07-23 11:01:16 UTC
If the issue doesn't reproduce with the latest CPU models (e.g. Skylake-Client-v4), then I intend to close this as a duplicate of BZ#1942914.
Li Xiaohui, please confirm. Thanks.

Comment 9 Li Xiaohui 2021-07-27 07:31:37 UTC
I'm testing more scenarios and will list all results here afterwards, so let's wait for now.

Comment 10 Li Xiaohui 2021-07-28 14:12:12 UTC
Hi Vitaly, 
I have tested the following scenarios (using the same QEMU command line except for the CPU options) on the same host (qemu-kvm-6.0.0-25.module+el8.5.0+11890+8e7c3f51.x86_64).
Please help confirm whether we should keep tracking the following errors in Bugzilla.

Notes:
1) error[1]: Please enable the Virtual Machine Platform Windows feature and ensure virtualization is enabled in the BIOS.
2) error[2]: Win10 guest hangs in the booting stage with error 'Preparing Automatic Repair'
3) $hyper-v_flags means: hv_stimer,hv_synic,hv_vpindex,hv_relaxed,hv_spinlocks=0xfff,hv_crash,hv_vapic,hv_time,hv_frequencies,hv_runtime,hv_tlbflush,hv_reenlightenment,hv_stimer_direct,hv_ipi,hv_evmcs


                 |  -cpu Skylake-Client-IBRS,vmx=on                  | -cpu Skylake-Client-v4,vmx=on                 |
----------------------------------------------------------------------------------------------------------------------
pc machine type  |  can't start wsl2 with error[1]                   |  works well with wsl2                         |
----------------------------------------------------------------------------------------------------------------------
q35 machine type |  works well with wsl2                             |  works well with wsl2                         | 


                 |  -cpu Skylake-Client-IBRS,vmx=on,$hyper-v_flags   | -cpu Skylake-Client-v4,vmx=on,$hyper-v_flags  |
----------------------------------------------------------------------------------------------------------------------
pc machine type  |  win10 hangs with error[2] (works w/o wsl2)       |  works well with wsl2                         |
----------------------------------------------------------------------------------------------------------------------
q35 machine type |  works well with wsl2                             |  works well with wsl2                         |     

Given the two errors above under the pc machine type, I'm curious why they only happen with the pc machine type while the q35 machine type works well in the same situations. Could you give some comments?



                | -cpu Skylake-Client-IBRS,vmx=on,xsaves=off,vmx-xsaves=off| -cpu Skylake-Client-IBRS,vmx=on,xsaves=on,vmx-xsaves=on|
----------------------------------------------------------------------------------------------------------------------------------------
pc machine type | can't start wsl2 with error[1]                           |  can't start wsl2 with error[1]                        |
----------------------------------------------------------------------------------------------------------------------------------------
q35 machine type| works well with wsl2                                     |  works well with wsl2                                  | 


                 |  -cpu Skylake-Client-v4,vmx=on,xsaves=off,vmx-xsaves=off | -cpu Skylake-Client-v4,vmx=on,xsaves=on,vmx-xsaves=on |
-------------------------------------------------------------------------------------------------------------------------------------
pc machine type  |  works well with wsl2                                    |  works well with wsl2                                 |
-------------------------------------------------------------------------------------------------------------------------------------
q35 machine type |  works well with wsl2                                    |  works well with wsl2                                 |

As the two charts above show, error[1] only happens with the pc machine type combined with the Skylake-Client-IBRS model.



Although Win10+WSL2 works well with the Skylake-Client-v4 model, shouldn't we still care about Skylake-Client-IBRS and fix the errors under the pc machine type, given the different results between the pc and q35 machine types?

Comment 11 Vitaly Kuznetsov 2021-07-29 15:51:45 UTC
(In reply to Li Xiaohui from comment #10)
> Hi Vitaly, 
> I have tested following scenarios(using same qemu cmds except cpu command)
> on same host(qemu-kvm-6.0.0-25.module+el8.5.0+11890+8e7c3f51.x86_64).
> Please help confirm whether we should go on track following errors in
> bugzilla?
> 

Thanks for running these additional tests! See below.

> 
> Through above two errors under pc machine type, I'm curious why error only
> happen under pc machine type, but q35 machine type work well at same
> situations? Could you help give some comments?

I'm not exactly sure, but my guess would be that Windows uses the chipset
name to enable or disable certain features, and some of them require CPU
features that the model lacks.

> 
> 
> 
>                 | -cpu Skylake-Client-IBRS,vmx=on,xsaves=off,vmx-xsaves=off| -cpu Skylake-Client-IBRS,vmx=on,xsaves=on,vmx-xsaves=on|
> ----------------------------------------------------------------------------------------------------------------------------------------
> pc machine type | can't start wsl2 with error[1]                           |  can't start wsl2 with error[1]                        |

I forgot one thing: "Skylake-Client-IBRS" is an alias for "Skylake-Client-v2".
"Skylake-Client-v3" (alias "Skylake-Client-noTSX-IBRS") additionally has "hle=off,rtm=off",
and "Skylake-Client-v4" additionally has "xsaves=on,vmx-xsaves=on".

If you add "hle=off,rtm=off,xsaves=on,vmx-xsaves=on" to Skylake-Client-IBRS, I'd
expect it to work, as it should look exactly like "Skylake-Client-v4".
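The version chain described above can be sketched in C as cumulative deltas. This is a hypothetical model, not QEMU's implementation: only hle, rtm, xsaves and vmx-xsaves are tracked, and the v1 base values (TSX on, XSAVES off) are an assumption consistent with the deltas listed. It deliberately ignores machine-type compat properties.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified CPU model: only the features discussed in this bz. */
typedef struct {
    bool hle, rtm, xsaves, vmx_xsaves;
} Features;

/* Assumed base values for Skylake-Client-v1: TSX (hle/rtm) on, XSAVES off. */
static const Features skylake_client_v1 = { true, true, false, false };

/* Set one named feature; returns false for names this sketch does not track
 * (e.g. "vmx"), which are simply ignored. */
static bool set_feature(Features *f, const char *name, bool on)
{
    if (strcmp(name, "hle") == 0)             f->hle = on;
    else if (strcmp(name, "rtm") == 0)        f->rtm = on;
    else if (strcmp(name, "xsaves") == 0)     f->xsaves = on;
    else if (strcmp(name, "vmx-xsaves") == 0) f->vmx_xsaves = on;
    else return false;
    return true;
}

/* Versions are cumulative: vN applies the deltas of v2..vN on top of v1.
 * v2 (alias Skylake-Client-IBRS) is assumed to differ from v1 only in
 * features not tracked here (IBRS/spec-ctrl). */
static Features resolve_version(int version)
{
    assert(version >= 1 && version <= 4);
    Features f = skylake_client_v1;
    if (version >= 3) {                 /* v3, alias Skylake-Client-noTSX-IBRS */
        set_feature(&f, "hle", false);
        set_feature(&f, "rtm", false);
    }
    if (version >= 4) {                 /* v4 */
        set_feature(&f, "xsaves", true);
        set_feature(&f, "vmx-xsaves", true);
    }
    return f;
}

/* Apply a "-cpu"-style list such as "vmx=on,hle=off,xsaves=on". */
static void apply_flags(Features *f, const char *flags)
{
    char buf[256];
    strncpy(buf, flags, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';
    for (char *tok = strtok(buf, ","); tok != NULL; tok = strtok(NULL, ",")) {
        char *eq = strchr(tok, '=');
        if (eq == NULL)
            continue;                   /* bare flags like "hv_time" are ignored */
        *eq = '\0';
        set_feature(f, tok, strcmp(eq + 1, "on") == 0);
    }
}

static bool features_equal(Features a, Features b)
{
    return a.hle == b.hle && a.rtm == b.rtm &&
           a.xsaves == b.xsaves && a.vmx_xsaves == b.vmx_xsaves;
}
```

With these assumptions, Skylake-Client-IBRS plus "hle=off,rtm=off,xsaves=on,vmx-xsaves=on" and Skylake-Client-noTSX-IBRS plus "xsaves=on,vmx-xsaves=on" both resolve to the same tracked features as Skylake-Client-v4, so any remaining difference would have to come from somewhere else, such as the machine type.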

...
> 
> See above two charts, error[1] only happen under pc machine type and
> Skylake-Client-IBRS model too.
> 
> 
> Though win10+wsl2 work well with Skylake-Client-v4 model, don't we need care
> Skylake-Client-IBRS and fix errors under pc machine type comparing the
> different results between pc and q35 machine type?

The thing is that the only way to fix the problem (that we know of) is to add CPU
features, but we can't change the list of CPU features for existing CPU models
without breaking migration. That's why we solve these issues by introducing new
CPU models (e.g. Skylake-Client-v4).

Comment 12 Li Xiaohui 2021-07-30 13:22:32 UTC
(In reply to Vitaly Kuznetsov from comment #11)
> (In reply to Li Xiaohui from comment #10)
> > Hi Vitaly, 
> > I have tested following scenarios(using same qemu cmds except cpu command)
> > on same host(qemu-kvm-6.0.0-25.module+el8.5.0+11890+8e7c3f51.x86_64).
> > Please help confirm whether we should go on track following errors in
> > bugzilla?
> > 
> 
> Thanks for running these additional tests! See below
> 
> > 
> > Through above two errors under pc machine type, I'm curious why error only
> > happen under pc machine type, but q35 machine type work well at same
> > situations? Could you help give some comments?
> 
> I'm not exactly sure but my guess would be that Windows uses chipset
> name to enable or disable certain features and some of them require
> missing CPU features.
> 
> > 
> > 
> > 
> >                 | -cpu Skylake-Client-IBRS,vmx=on,xsaves=off,vmx-xsaves=off| -cpu Skylake-Client-IBRS,vmx=on,xsaves=on,vmx-xsaves=on|
> > ----------------------------------------------------------------------------------------------------------------------------------------
> > pc machine type | can't start wsl2 with error[1]                           |  can't start wsl2 with error[1]                        |
> 
> I forgot one thing, "Skylake-Client-IBRS" is an alias for "Skylake-Client-v2".
> "Skylake-Client-v3" (alias "Skylake-Client-noTSX-IBRS") also has
> "hle=off,rtm=off"
> 
> and "Skylake-Client-v4" also has "xsaves=on,vmx-xsaves=on".
> 
> If you add "hle=off,rtm=off,xsaves=on,vmx-xsaves=on" to Skylake-Client-IBRS
> I'd expect it to work, it should look exactly like "Skylake-Client-v4".

Tried the CPU option '-cpu Skylake-Client-IBRS,vmx=on,hle=off,rtm=off,xsaves=on,vmx-xsaves=on'; WSL2 still fails to start with error[1].

> 
> ...
> > 
> > See above two charts, error[1] only happen under pc machine type and
> > Skylake-Client-IBRS model too.
> > 
> > 
> > Though win10+wsl2 work well with Skylake-Client-v4 model, don't we need care
> > Skylake-Client-IBRS and fix errors under pc machine type comparing the
> > different results between pc and q35 machine type?
> 
> The thing is that the only way to fix the problem (we know of) is to add CPU
> features but we can't change the list of CPU features for the existing CPU
> models and not break migration. That's why we solve these issues by
> introducing new CPU models (e.g. Skylake-Client-v4).

I'd suggest we continue analyzing this bz.

Comment 13 Vitaly Kuznetsov 2021-07-30 14:19:04 UTC
(In reply to Li Xiaohui from comment #12)

> Tried the cpu cmd '-cpu
> Skylake-Client-IBRS,vmx=on,hle=off,rtm=off,xsaves=on,vmx-xsaves=on', still
> fail to start wsl2 with error[1].

Hm, this is unexpected, as

"Skylake-Client-v4,vmx=on" should be equal to "Skylake-Client-IBRS,vmx=on,hle=off,rtm=off,xsaves=on,vmx-xsaves=on"

What if you try

"Skylake-Client-noTSX-IBRS,vmx=on,xsaves=on,vmx-xsaves=on"

?

Comment 14 Li Xiaohui 2021-07-30 14:37:01 UTC
(In reply to Vitaly Kuznetsov from comment #13)
> (In reply to Li Xiaohui from comment #12)
> 
> > Tried the cpu cmd '-cpu
> > Skylake-Client-IBRS,vmx=on,hle=off,rtm=off,xsaves=on,vmx-xsaves=on', still
> > fail to start wsl2 with error[1].
> 
> Hm, this is unexpected as 
> 
> "Skylake-Client-v4,vmx=on" should be equal to
> "Skylake-Client-IBRS,vmx=on,hle=off,rtm=off,xsaves=on,vmx-xsaves=on"
> 
> what if you try
> 
> "Skylake-Client-noTSX-IBRS,vmx=on,xsaves=on,vmx-xsaves=on"
> 
> ?

Hi, Win10 + WSL2 works well with the following three CPU options:
1) -cpu Skylake-Client-noTSX-IBRS,vmx=on
2) -cpu Skylake-Client-noTSX-IBRS,vmx=on,xsaves=on,vmx-xsaves=on
3) -cpu Skylake-Client-noTSX-IBRS,vmx=on,$hyper-v_flags

Comment 15 Vitaly Kuznetsov 2021-07-30 14:44:23 UTC
(In reply to Li Xiaohui from comment #14)
> 
> Hi, Win10 + WSL2 work well under below three cpu cmds:
> 1) -cpu Skylake-Client-noTSX-IBRS,vmx=on
> 2) -cpu Skylake-Client-noTSX-IBRS,vmx=on,xsaves=on,vmx-xsaves=on
> 3) -cpu Skylake-Client-noTSX-IBRS,vmx=on,$hyper-v_flags

Thanks for trying! I'm even more surprised by the result, as

'-cpu Skylake-Client-IBRS,vmx=on,hle=off,rtm=off,xsaves=on,vmx-xsaves=on'

which was previously reported to be broken is equal to the working

'-cpu Skylake-Client-noTSX-IBRS,vmx=on,xsaves=on,vmx-xsaves=on'

as far as I understand.

P.S. I'll be on PTO for the next two weeks; I plan to take a look when I'm back.
Let's keep this open for now.

Comment 16 Li Xiaohui 2021-08-17 07:04:44 UTC
Could someone help remove ITR 8.5.0 from this bz? It's nearly ITM 26 of 8.5.0 (Aug 30), and there is no time left to fix it.

We could keep tracking this bz in the next release, rhel-8.6.0, because its priority is not high and it only happens with the pc machine type on specific CPU models.

Comment 17 Vitaly Kuznetsov 2021-08-17 07:26:39 UTC
Agreed, moving to backlog for now.

Comment 18 Vitaly Kuznetsov 2021-09-03 14:01:56 UTC
(In reply to Li Xiaohui from comment #12)
> 
> Tried the cpu cmd '-cpu
> Skylake-Client-IBRS,vmx=on,hle=off,rtm=off,xsaves=on,vmx-xsaves=on', still
> fail to start wsl2 with error[1].
> 

I got back to this BZ (sorry for the delay) and I'm testing the following:

kernel-4.18.0-339.el8.x86_64
qemu-kvm-5.2.0-16.module+el8.4.0+12393+838d9165.8.x86_64

I've tried the following:

/usr/libexec/qemu-kvm \
-machine q35,accel=kvm,kernel-irqchip=split \
-name guest=win10 \
-cpu Skylake-Client-IBRS,vmx=on,hle=off,rtm=off,xsaves=on,vmx-xsaves=on,hv_stimer,hv_synic,hv_vpindex,hv_relaxed,hv_spinlocks=0xfff,hv_crash,hv_vapic,hv_time,hv_frequencies,hv_runtime,hv_tlbflush,hv_reenlightenment,hv_stimer_direct,hv_ipi \
-smp 12 \
-m 8192 \
-drive file=/var/lib/libvirt/images/win10-64-virtio-scsi.qcow2,format=qcow2,if=none,id=drive-ide0-0-0 \
-device ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 \
-vnc :0 \
-rtc base=localtime,driftfix=slew \
--no-hpet \
-monitor stdio

and this seems to work well: my Win10+WSL2 guest starts without issues. Do I understand
correctly that the same command line still fails for you? If so, I may need access to your
hardware to debug further.

Comment 19 Li Xiaohui 2021-09-08 11:30:51 UTC
(In reply to Vitaly Kuznetsov from comment #18)
> (In reply to Li Xiaohui from comment #12)
> > 
> > Tried the cpu cmd '-cpu
> > Skylake-Client-IBRS,vmx=on,hle=off,rtm=off,xsaves=on,vmx-xsaves=on', still
> > fail to start wsl2 with error[1].
> > 
> 
> I got back to this BZ (sorry for the delay) and I'm testing the following:
> 
> kernel-4.18.0-339.el8.x86_64
> qemu-kvm-5.2.0-16.module+el8.4.0+12393+838d9165.8.x86_64
> 
> I've tried the following:
> 
> /usr/libexec/qemu-kvm -machine q35,accel=kvm,kernel-irqchip=split -name
> guest=win10 -cpu
> Skylake-Client-IBRS,vmx=on,hle=off,rtm=off,xsaves=on,vmx-xsaves=on,hv_stimer,
> hv_synic,hv_vpindex,hv_relaxed,hv_spinlocks=0xfff,hv_crash,hv_vapic,hv_time,
> hv_frequencies,hv_runtime,hv_tlbflush,hv_reenlightenment,hv_stimer_direct,
> hv_ipi -smp 12 -m 8192 -drive
> file=/var/lib/libvirt/images/win10-64-virtio-scsi.qcow2,format=qcow2,if=none,
> id=drive-ide0-0-0 -device
> ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -vnc :0
> -rtc base=localtime,driftfix=slew --no-hpet -monitor stdio
> 
> and this seems to work well, my Win10+WSL2 guest starts without issues.

Win10+WSL2 works well because you used the 'q35' machine type: -machine q35,...


> Do I understand correctly that the same command line still fails for you? I may need to
> access your hardware to debug further then.

Please boot the VM with '-machine pc' as in the QEMU command line in my Comment 0; then you will see WSL2 fail to start.

BTW, I'm using my machines for other issues right now. I believe you can reproduce the bz with a command line similar to mine in Comment 0.

Comment 20 Vitaly Kuznetsov 2021-09-08 12:41:16 UTC
(In reply to Li Xiaohui from comment #19)
> 
> Win10+WSL2 works well since you use the 'q35' machine type: -machine q35,... 
> 
> Please boot vm with '-machine pc' like the qemu command line in my Comment
> 0, then you will see WSL2 fail to start.

Sorry, my bad, I forgot 'pc' was essential. However, I was finally able to
figure out what's wrong (I hope).

In RHEL8, the 'pc' machine type is an alias for pc-i440fx-rhel7.6.0, which is quite
old. Unlike upstream (and unlike q35), it is not updated with every major version
change.

When the pc-i440fx-rhel7.6.0 machine type was implemented, these CPU models had the
'mpx' feature enabled by default, and the machine type's compat properties preserve that. Namely:

GlobalProperty pc_rhel_7_6_compat[] = {
    { "Skylake-Client" "-" TYPE_X86_CPU, "mpx", "on" },
    { "Skylake-Client-IBRS" "-" TYPE_X86_CPU, "mpx", "on" },
    { "Skylake-Server" "-" TYPE_X86_CPU, "mpx", "on" },
    { "Skylake-Server-IBRS" "-" TYPE_X86_CPU, "mpx", "on" },
    { "Cascadelake-Server" "-" TYPE_X86_CPU, "mpx", "on" },
    { "Icelake-Client" "-" TYPE_X86_CPU, "mpx", "on" },
    { "Icelake-Server" "-" TYPE_X86_CPU, "mpx", "on" },
};

So if you pick any of these models with the 'pc' machine type, you get the 'mpx' feature
enabled by default. It doesn't work well with Windows, so you can either:
- Disable it manually ("mpx=off"), or
- Pick a newer CPU model (Skylake-Client-v4, for example), which has "mpx=off" by
default.
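The resulting precedence (CPU model default, then the machine type's compat properties, then explicit '-cpu' flags) can be sketched in C. This is a simplified, hypothetical model, not QEMU code: CompatProp, machine_applies_rhel76_compat() and resolve_mpx() are illustrative names; 'pc' is assumed to apply pc_rhel_7_6_compat while 'q35' is assumed not to; the model default for 'mpx' is assumed to be off; and matching is by exact type name ("<model>-x86_64-cpu", which is what "<model>" "-" TYPE_X86_CPU expands to on an x86_64 build), a simplification of QEMU's type-based matching.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Same shape as QEMU's GlobalProperty: driver (type name), property, value. */
typedef struct {
    const char *driver;
    const char *property;
    const char *value;
} CompatProp;

/* The entries from pc_rhel_7_6_compat above, with "<model>" "-" TYPE_X86_CPU
 * written out as it expands on an x86_64 build. */
static const CompatProp pc_rhel_7_6_compat[] = {
    { "Skylake-Client-x86_64-cpu",      "mpx", "on" },
    { "Skylake-Client-IBRS-x86_64-cpu", "mpx", "on" },
    { "Skylake-Server-x86_64-cpu",      "mpx", "on" },
    { "Skylake-Server-IBRS-x86_64-cpu", "mpx", "on" },
    { "Cascadelake-Server-x86_64-cpu",  "mpx", "on" },
    { "Icelake-Client-x86_64-cpu",      "mpx", "on" },
    { "Icelake-Server-x86_64-cpu",      "mpx", "on" },
};

/* Assumption: 'pc' (pc-i440fx-rhel7.6.0) applies this list, 'q35' does not. */
static bool machine_applies_rhel76_compat(const char *machine)
{
    return strcmp(machine, "pc") == 0;
}

/* Resolve 'mpx' for machine + CPU model + optional user value ("on"/"off"/NULL).
 * Order: model default (assumed off) -> machine compat props -> -cpu flags. */
static bool resolve_mpx(const char *machine, const char *model, const char *user_mpx)
{
    assert(machine != NULL && model != NULL);
    bool mpx = false;
    char type[128];
    snprintf(type, sizeof(type), "%s-x86_64-cpu", model);

    if (machine_applies_rhel76_compat(machine)) {
        size_t n = sizeof(pc_rhel_7_6_compat) / sizeof(pc_rhel_7_6_compat[0]);
        for (size_t i = 0; i < n; i++) {
            const CompatProp *p = &pc_rhel_7_6_compat[i];
            if (strcmp(p->driver, type) == 0 && strcmp(p->property, "mpx") == 0)
                mpx = strcmp(p->value, "on") == 0;
        }
    }
    if (user_mpx != NULL)               /* e.g. "-cpu Skylake-Client-IBRS,mpx=off" */
        mpx = strcmp(user_mpx, "on") == 0;
    return mpx;
}
```

Under these assumptions, only the legacy type names listed above pick up mpx=on under 'pc'. Skylake-Client-v4 and Skylake-Client-noTSX-IBRS are not in the list, which would also be consistent with the Comment 14 results, though that part is a reading of the sketch rather than something verified in this bz.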

I don't think there's anything we can do, as the issue only applies to the legacy machine
type + legacy CPU models. Unfortunately, we can't change the default without breaking
migration and creating problems for those who need these features.

Could you please verify that 'mpx=off' solves the problem for you? E.g., the following should
work:

-cpu Skylake-Client-IBRS,mpx=off,vmx=on,hle=off,rtm=off,xsaves=on,vmx-xsaves=on,[all other hyperv flags]

Comment 21 John Ferlan 2021-09-08 21:28:21 UTC
Move RHEL-AV bugs to RHEL9. If necessary to resolve in RHEL8, then clone to the current RHEL8 release.

Comment 22 Vitaly Kuznetsov 2021-09-09 05:23:22 UTC
(In reply to John Ferlan from comment #21)
> Move RHEL-AV bugs to RHEL9. If necessary to resolve in RHEL8, then clone to
> the current RHEL8 release.

I'm not 100% sure the bug exists in RHEL9. Li Xiaohui, could you please also verify?

Comment 23 Li Xiaohui 2021-09-13 04:46:10 UTC
(In reply to Vitaly Kuznetsov from comment #22)
> (In reply to John Ferlan from comment #21)
> > Move RHEL-AV bugs to RHEL9. If necessary to resolve in RHEL8, then clone to
> > the current RHEL8 release.
> 
> I'm not 100% sure the bug exists in RHEL9, Li Xiaohui could you please also
> verify?

Thank you, Vitaly. I will test with '-cpu Skylake-Client-IBRS,mpx=off,vmx=on,hle=off,rtm=off,xsaves=on,vmx-xsaves=on,[all other hyperv flags]' on RHEL8 and RHEL9 this week.

Comment 24 Li Xiaohui 2021-09-26 12:27:48 UTC
Could not reproduce the bz on rhel8.6.0 (kernel-4.18.0-340.el8.x86_64 & qemu-kvm-6.1.0-1.module+el8.6.0+12721+8d053ff2.x86_64): I tried on the same machine with the same QEMU command line, and win10+wsl2 works well.

I also tested the scenarios below; win10+wsl2 works well under the pc machine type in all of them:
1)'-cpu Skylake-Client-IBRS,vmx=on'
2)'-cpu Skylake-Client-IBRS,mpx=off,vmx=on,hv_stimer,hv_synic,hv_vpindex,hv_relaxed,hv_spinlocks=0xfff,hv_crash,hv_vapic,hv_time,hv_frequencies,hv_runtime,hv_tlbflush,hv_reenlightenment,hv_stimer_direct,hv_ipi,hv_evmcs'
3)'-cpu Skylake-Client-IBRS,mpx=off,vmx-xsaves=on,xsaves=on,ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,clflushopt=on,umip=on,md-clear=on,stibp=on,arch-capabilities=on,ssbd=on,pdpe1gb=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,skip-l1dfl-vmentry=on,pschange-mc-no=on,hv_stimer,hv_synic,hv_vpindex,hv_relaxed,hv_spinlocks=0xfff,hv_crash,hv_vapic,hv_time,hv_frequencies,hv_runtime,hv_tlbflush,hv_reenlightenment,hv_stimer_direct,hv_ipi'


BTW, I tested the above three scenarios on rhel9 (kernel-5.14.0-3.el9.x86_64 & qemu-img-6.1.0-2.el9.x86_64); win10+wsl2 works well under the pc machine type there too.
The CPU model of the rhel9 machine is Skylake-Server-IBRS; the rest of the QEMU command line is the same as above.



Were there code changes in rhel8.6 and rhel9? rhel8.5 didn't work with '-cpu Skylake-Client-IBRS,vmx=on'.
Do I need to retest on rhel8.5.0 with '-cpu Skylake-Client-IBRS,mpx=off,vmx-xsaves=on,xsaves=on,vmx=on' and the hyper-v flags? I think it's unnecessary, as rhel8.6 and rhel9 work well now. Maybe we could close this bz as CURRENTRELEASE?

PS: sorry for the delayed reply; the Intel machines were broken last week.

Comment 25 Vitaly Kuznetsov 2021-09-29 08:41:19 UTC
(In reply to Li Xiaohui from comment #24)
> Didn't reproduce bz on the
> rhel8.6.0(kernel-4.18.0-340.el8.x86_64&qemu-kvm-6.1.0-1.module+el8.6.
> 0+12721+8d053ff2.x86_64), I tried on same machine with same qemu cmds, but
> win10+wsl2 works well. 
> 
> Also test below scenarios, win10+wsl2 works well under pc machine type:
> 1)'-cpu Skylake-Client-IBRS,vmx=on'
> 2)'-cpu
> Skylake-Client-IBRS,mpx=off,vmx=on,hv_stimer,hv_synic,hv_vpindex,hv_relaxed,
> hv_spinlocks=0xfff,hv_crash,hv_vapic,hv_time,hv_frequencies,hv_runtime,
> hv_tlbflush,hv_reenlightenment,hv_stimer_direct,hv_ipi,hv_evmcs';
> 3)'-cpu
> Skylake-Client-IBRS,mpx=off,vmx-xsaves=on,xsaves=on,ss=on,vmx=on,pdcm=on,
> hypervisor=on,tsc-adjust=on,clflushopt=on,umip=on,md-clear=on,stibp=on,arch-
> capabilities=on,ssbd=on,pdpe1gb=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,
> skip-l1dfl-vmentry=on,pschange-mc-no=on,hv_stimer,hv_synic,hv_vpindex,
> hv_relaxed,hv_spinlocks=0xfff,hv_crash,hv_vapic,hv_time,hv_frequencies,
> hv_runtime,hv_tlbflush,hv_reenlightenment,hv_stimer_direct,hv_ipi'
> 
> 
> BTW, test above three scenarios on
> rhel9(kernel-5.14.0-3.el9.x86_64&qemu-img-6.1.0-2.el9.x86_64), win10+wsl2
> works well too under pc machine type.
> The cpu model of rhel9 machine is Skylake-Server-IBRS, other qemu cmds are
> same with above.
> 

Thanks for testing these! I'd expect RHEL9 to behave similarly, as its 'pc' machine
type is the same legacy 7.6.0 one.

> 
> 
> Do we have some code changes on rhel8.6 and rhel9 since rhel8.5 didn't work
> well with '-cpu Skylake-Client-IBRS,vmx=on'?

Off the top of my head, I'm not aware of any differences between 8.5 and 8.6 that
would cause a behavioral change here. Could you check whether it's a kernel or a
QEMU change that makes the difference? It would also be great to know the first
'fixed' version.

> Do I need retest on rhel8.5.0 with '-cpu
> Skylake-Client-IBRS,mpx=off,vmx-xsaves=on,xsaves=on,vmx=on' and hyper-v
> flags? I think it's unnecessary as rhel8.6 and rhel9 works well now. Maybe
> we could close this bz as currentrelease?

Yes, if the result is consistent. Are you testing 8.5 and 8.6 on the same
host CPU?

> 
> PS: sorry about the delay reply since the intel machines were broken last
> week

No problem, thanks for your persistence here! Hope we're getting closer to knowing
all the peculiarities here.

Comment 26 Li Xiaohui 2021-10-13 07:20:42 UTC
Hi, as the bug doesn't reproduce on the current release, I will find the first 'fixed' kernel or QEMU version once I finish some function tests on these machines. It will take a few days, thanks.

> Yes, in case the result is consistent. Are you testing 8.5 and 8.6 on the same
> host CPU?

Yes, I use the same machine and the same QEMU command line.

Comment 27 Vitaly Kuznetsov 2021-12-01 16:14:23 UTC
Li Xiaohui, it seems this BZ got stuck. I'm closing it as CURRENTRELEASE because:
1) Everything seems to work with RHEL8.6/9.0
2) "-cpu Skylake-Client-IBRS,mpx=off,vmx=on,hle=off,rtm=off,xsaves=on,vmx-xsaves=on,[all other hyperv flags]" should resolve the issue for legacy CPU models.

Feel free to reopen if something still seems to be broken. Thanks for your testing efforts!

Comment 28 Li Xiaohui 2021-12-02 01:31:51 UTC
(In reply to Vitaly Kuznetsov from comment #27)
> Li Xiaohui, it seems the BZ got stuck. I'm closing this as CURRENTRELEASE as:
> 1) Everything seems to work with RHEL8.6/9.0
> 2) "-cpu
> Skylake-Client-IBRS,mpx=off,vmx=on,hle=off,rtm=off,xsaves=on,vmx-xsaves=on,
> [all other hyperv flags]" should resolve the issue for legacy CPU models.
> 
> Feel free to reopen if something still seems to be broken. Thanks for your
> testing efforts!

OK, thank you too.