Bug 1235510 - [virtio-win][whql] Win2012 guest could not boot up while running the multiple processor job (reboot case)
Summary: [virtio-win][whql] Win2012 guest could not boot up while running multiple proc...
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: qemu-kvm
Version: 8.0
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: rc
Assignee: Vadim Rozenfeld
QA Contact: Peixiu Hou
URL:
Whiteboard:
Depends On: 1682882
Blocks: 1401400 1558351 1473046
 
Reported: 2015-06-25 03:11 UTC by Yu Wang
Modified: 2020-02-06 07:21 UTC (History)
12 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-02-02 03:36:11 UTC
Type: Bug
Target Upstream Version:




Links
System ID Priority Status Summary Last Updated
Red Hat Bugzilla 1513833 None VERIFIED whql job "Multiple processor group device test" fail when boot with cpu flag "hv_time/hv_relaxed/hv_vapic/hv_spinlocks=0... 2019-11-06 01:15:40 UTC

Internal Links: 1513833

Description Yu Wang 2015-06-25 03:11:08 UTC
Description of problem:
Win2012 guest could not boot up while running the multiple processor job (reboot case)

Version-Release number of selected component (if applicable):
virtio-win-prewhql-105
qemu-kvm-rhev-2.3.0-2.el7.x86_64
kernel-3.10.0-267.el7.x86_64
seabios-1.7.5-9.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot the guest with:
/usr/libexec/qemu-kvm -name 105SCS2012645FH -enable-kvm -m 6G -smp 8 \
-uuid 18052249-f65e-4592-afb6-639b6c8c3730 -nodefconfig -nodefaults \
-chardev socket,id=charmonitor,path=/tmp/105SCS2012645FH,server,nowait \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=localtime,driftfix=slew -boot order=cd,menu=on \
-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
-drive file=105SCS2012645FH,if=none,id=drive-ide0-0-0,format=raw,serial=mike_cao,cache=none \
-device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 \
-drive file=en_windows_server_2012_x64_dvd_915478.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw \
-device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 \
-drive file=105SCS2012645FH.vfd,if=none,id=drive-fdc0-0-0,format=raw,cache=none \
-global isa-fdc.driveA=drive-fdc0-0-0 \
-netdev tap,script=/etc/qemu-ifup,downscript=no,id=hostnet0 \
-device rtl8139,netdev=hostnet0,id=net0,mac=00:52:05:1f:cd:e0,bus=pci.0,addr=0x3 \
-chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=isa_serial0 \
-device usb-tablet,id=input0 -vnc 0.0.0.0:2 -vga cirrus \
-device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x7,num_queues=8 \
-drive file=105SCS2012645FH_test.raw,if=none,id=drive-scsi-disk0,format=raw,serial=mike_cao,cache=none \
-device scsi-hd,bus=scsi0.0,drive=drive-scsi-disk0,id=scsi-disk0 \
-monitor stdio

2. Run the multiple processor job.

Actual results:

The guest hangs during the reboot case.

Expected results:
The guest reboots normally.

Additional info:
After changing -smp 8 to -smp 6, the guest reboots normally.

Comment 2 Yu Wang 2015-06-25 03:36:36 UTC
The NMI dump file is located at http://fileshare.englab.nay.redhat.com/pub/section2/images_backup/virtio-win/bug1235510/


*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

Use !analyze -v to get detailed debugging information.

BugCheck 80, {4f4454, 0, 0, 0}

Probably caused by : ntkrnlmp.exe ( nt!WheaReportHwError+249 )

Followup: MachineOwner
---------

0: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

NMI_HARDWARE_FAILURE (80)
This is typically due to a hardware malfunction.  The hardware supplier should
be called.
Arguments:
Arg1: 00000000004f4454
Arg2: 0000000000000000
Arg3: 0000000000000000
Arg4: 0000000000000000

Debugging Details:
------------------


DEFAULT_BUCKET_ID:  WIN8_DRIVER_FAULT

BUGCHECK_STR:  0x80

PROCESS_NAME:  System

CURRENT_IRQL:  f

ANALYSIS_VERSION: 6.3.9600.16384 (debuggers(dbg).130821-1623) amd64fre

LOCK_ADDRESS:  fffff8011d55b740 -- (!locks fffff8011d55b740)

Resource @ nt!PiEngineLock (0xfffff8011d55b740)    Exclusively owned
    Contention Count = 2
     Threads: fffffa80056ab040-01<*> 
1 total locks, 1 locks currently held

PNP_TRIAGE: 
	Lock address  : 0xfffff8011d55b740
	Thread Count  : 1
	Thread address: 0xfffffa80056ab040
	Thread wait   : 0x2ab

LAST_CONTROL_TRANSFER:  from fffff8011d2468de to fffff8011d300040

STACK_TEXT:  
fffff801`1c702c08 fffff801`1d2468de : 00000000`00000080 00000000`004f4454 00000000`00000000 00000000`00000000 : nt!KeBugCheckEx
fffff801`1c702c10 fffff801`1d3dec09 : 00000000`00000001 fffff801`1d25a030 00000000`00000000 fffffa80`14a27968 : hal!HalBugCheckSystem+0x9a
fffff801`1c702c50 fffff801`1d247204 : 00000000`000006c0 fffff801`1c702e20 fffff801`1d57b100 fffff801`1d25a030 : nt!WheaReportHwError+0x249
fffff801`1c702cb0 fffff801`1d4597a7 : fffff801`1c702e70 00000000`00000010 00000000`80000005 fffff801`1d29a27d : hal!HalHandleNMI+0x150
fffff801`1c702ce0 fffff801`1d2fcd02 : 00000000`b411dd56 fffff801`1c702ef0 00000000`00000000 fffff801`1d57b180 : nt! ?? ::FNODOBFM::`string'+0x13d6d
fffff801`1c702d30 fffff801`1d2fcb73 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KxNmiInterrupt+0x82
fffff801`1c702e70 fffff801`1d3330d1 : fffff880`00c6b080 00000000`8ebe4626 00000000`00000076 00000000`00000000 : nt!KiNmiInterrupt+0x173
fffff880`0373f5d0 fffff801`1d36de0c : fffff801`1d5d4a80 ffffffff`ffffffff fffff880`0373f8d9 00000000`00000000 : nt!KeFlushMultipleRangeTb+0x3c6
fffff880`0373f7d0 fffff801`1d2b3349 : fffffa80`056ab040 fffffa80`00000000 00000000`00000000 fffff880`00000000 : nt!MiFlushPteList+0x2c
fffff880`0373f800 fffff801`1d2b2e1e : fffff880`00fc0000 00000014`00000000 fffffa80`14acdd50 fffff801`1d5d4a80 : nt!MiRemoveMappedPtes+0x151
fffff880`0373f940 fffff801`1d64c147 : 00000000`0007ffff fffffa80`056ab040 00000000`00000001 00000000`00000000 : nt!MiRemoveFromSystemSpace+0x1ba
fffff880`0373f9c0 fffff801`1d64cc6a : fffffa80`14c13130 fffff880`00fb0000 00000000`00000000 00000000`00000001 : nt!MiUnmapImageInSystemSpace+0x4b
fffff880`0373f9f0 fffff801`1d6ee2df : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000002 : nt!MiValidateSectionCreate+0x2fa
fffff880`0373fb70 fffff801`1d6ea900 : fffffa80`14c13130 fffff880`01000002 00000000`00000222 00000000`00000220 : nt!MiCreateNewSection+0x20f
fffff880`0373fc70 fffff801`1d6fcf30 : fffff880`037401e0 00000000`00000020 fffff880`03740180 fffff801`1d3363de : nt!MiCreateSection+0x88c
fffff880`0373fe90 fffff801`1d2ff053 : fffffa80`056ab040 fffff880`03740148 fffff880`0373ff38 fffff880`03740280 : nt!NtCreateSection+0x1af
fffff880`0373ff20 fffff801`1d304230 : fffff801`1d70b3d8 fffff980`0141efd0 00000000`00000030 00000000`20206f49 : nt!KiSystemServiceCopyEnd+0x13
fffff880`03740128 fffff801`1d70b3d8 : fffff980`0141efd0 00000000`00000030 00000000`20206f49 fffff880`037403d8 : nt!KiServiceLinkage
fffff880`03740130 fffff801`1d7170ca : ffffffff`800000e0 fffff801`1d5fbc22 fffff801`1d54fa20 00000000`00000012 : nt!MiCreateSectionForDriver+0xe0
fffff880`037401e0 fffff801`1d716a58 : 00000000`00000000 fffff880`037402e9 00000000`00000000 00000000`20206f49 : nt!MiObtainSectionForDriver+0x8e
fffff880`03740230 fffff801`1d71743e : fffff880`03740390 00000000`00000000 00000000`00000001 00000000`00000000 : nt!MmLoadSystemImage+0x120
fffff880`03740340 fffff801`1d71179c : 00000000`00000000 00000000`00000000 fffff880`03740860 00000000`00000003 : nt!IopLoadDriver+0x2ca
fffff880`03740610 fffff801`1d70ffb6 : fffff8a0`003e7010 fffff880`0097b340 00000000`00000000 00000000`00000000 : nt!PipCallDriverAddDeviceQueryRoutine+0x22c
fffff880`03740730 fffff801`1d70dbc0 : 00000000`00000000 fffff880`00000002 00000000`00000000 00000000`000007ff : nt!PnpCallDriverQueryServiceHelper+0x13e
fffff880`037407b0 fffff801`1d70e11e : fffffa80`056c5c30 fffff880`03740a40 fffffa80`056f0d30 fffffa80`056f54a0 : nt!PipCallDriverAddDevice+0x400
fffff880`03740940 fffff801`1d781c07 : fffff801`1d309700 00000000`00000001 00000000`00000000 fffff801`1d608a52 : nt!PipProcessDevNodeTree+0x1ca
fffff880`03740bc0 fffff801`1d38b81f : 00000001`00000003 00000000`00000000 fffff801`1d5586e0 fffff8a0`000f6258 : nt!PiProcessStartSystemDevices+0x87
fffff880`03740c10 fffff801`1d338391 : fffffa80`056ab040 fffff801`1d38b4bc fffff801`1d5586e0 fffff801`1d309700 : nt!PnpDeviceActionWorker+0x363
fffff880`03740cc0 fffff801`1d2a7521 : 00000000`00000000 00000000`00000080 fffff801`1d338250 fffffa80`056ab040 : nt!ExpWorkerThread+0x142
fffff880`03740d50 fffff801`1d2e5dd6 : fffff801`1d57b180 fffffa80`056ab040 fffffa80`056ca040 fffffa80`056a4040 : nt!PspSystemThreadStartup+0x59
fffff880`03740da0 00000000`00000000 : fffff880`03741000 fffff880`0373b000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x16


STACK_COMMAND:  kb

FOLLOWUP_IP: 
nt!WheaReportHwError+249
fffff801`1d3dec09 eb7c            jmp     nt!WheaReportHwError+0x2c7 (fffff801`1d3dec87)

SYMBOL_STACK_INDEX:  2

SYMBOL_NAME:  nt!WheaReportHwError+249

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: nt

IMAGE_NAME:  ntkrnlmp.exe

DEBUG_FLR_IMAGE_TIMESTAMP:  5010ac4b

IMAGE_VERSION:  6.2.9200.16384

BUCKET_ID_FUNC_OFFSET:  249

FAILURE_BUCKET_ID:  0x80_VRF_nt!WheaReportHwError

BUCKET_ID:  0x80_VRF_nt!WheaReportHwError

ANALYSIS_SOURCE:  KM

FAILURE_ID_HASH_STRING:  km:0x80_vrf_nt!wheareporthwerror

FAILURE_ID_HASH:  {cc66e451-8fe3-4b00-cfe6-2474d51c5874}

Followup: MachineOwner
---------

Comment 4 lijin 2016-07-28 07:57:17 UTC
Is this bug a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1039469?

Comment 5 Vadim Rozenfeld 2016-07-28 12:03:11 UTC
(In reply to lijin from comment #4)
> Is this bug a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1039469?

They seem to be very close but probably not the same.

Comment 7 Peixiu Hou 2016-08-12 01:25:47 UTC
Hi Amnon,

For this issue, I ran the following tests:

1. Tried vioscsi build 102 (the RHEL 7.2 released version) + qemu-kvm-rhev-2.6.0-17.el7.x86_64; could not reproduce this bug.  --rhel7.3
2. Tried vioscsi build 124 + qemu-kvm-rhev-2.6.0-17.el7.x86_64; could reproduce this bug. --rhel7.3
3. According to the RHEL7 vioscsi whql report (https://mojo.redhat.com/docs/DOC-941688), testing with vioscsi build 102 + qemu-kvm-rhev-2.3.0-31.el7.x86_64 did not hit this bug.  --rhel7.2

Based on the above results, this bug is not a regression in qemu-kvm-rhev.


Best Regards~
Peixiu Hou

Comment 8 Vadim Rozenfeld 2016-08-12 02:50:21 UTC
(In reply to Peixiu Hou from comment #7)
> Based on the above results, this bug is not a regression in qemu-kvm-rhev.

I'm afraid we cannot draw any conclusions by comparing these two builds (102 and 124); they are totally different in how they utilize NUMA facilities.

Comment 9 Vadim Rozenfeld 2016-12-27 06:57:32 UTC
Can we give a try to virtio-win build 129?
Thanks,
Vadim.

Comment 10 Yu Wang 2016-12-27 09:42:15 UTC
(In reply to Vadim Rozenfeld from comment #9)
> Can we give a try to virtio-win build 129?
> Thanks,
> Vadim.

Hi Vadim,

Tried with build 129 (with a vioscsi or vioinput device); we still hit this issue.

virtio-win-prewhql-129
qemu-kvm-rhev-2.6.0-29.el7.x86_64
kernel-3.10.0-537.el7.x86_64
seabios-1.9.0-5.el7.x86_64

Thanks
Yu Wang

Comment 16 lijin 2018-07-04 06:37:57 UTC
Setting priority to high, as QE hits this issue on Win2012 100% of the time with all drivers.

Comment 19 Yu Wang 2019-03-27 07:45:08 UTC
Reproduction steps:

1. Boot a Win2012-64 guest (6G memory, 8 vCPUs).

2. In the guest:
bcdedit.exe /set groupsize 1
bcdedit.exe /set maxgroup on
bcdedit.exe /set groupaware on

3. Reboot the guest.

The guest hangs at the beginning of boot.

It still reproduces without any virtio devices, so I am moving this bug to qemu-kvm.
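
For reference, the group settings can be reverted to the Windows defaults with bcdedit's /deletevalue command (a sketch, untested here):

bcdedit.exe /deletevalue groupsize
bcdedit.exe /deletevalue maxgroup
bcdedit.exe /deletevalue groupaware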

Comment 20 Ademar Reis 2019-04-16 17:20:36 UTC
(In reply to Yu Wang from comment #19)
> It still reproduces without any virtio devices, so I am moving this bug to
> qemu-kvm.

Vadim, do you have any insights on what could be causing this? Perhaps you can take this BZ back (assignment)? I don't have any good candidate to take care of this BZ right now.

Comment 21 Ademar Reis 2019-07-26 21:54:14 UTC
(In reply to Ademar Reis from comment #20)
> Vadim, do you have any insights on what could be causing this?

There was an ongoing investigation until this point. We need guidance from the Windows team.

Comment 22 Vadim Rozenfeld 2019-07-29 03:39:01 UTC
(In reply to Yu Wang from comment #19)
> 3. Reboot the guest.
>
> The guest hangs at the beginning of boot.

Does it hang right after you specify the group parameters mentioned above, without running any test? Can you please provide the full qemu command line and the qemu version? I was not able to make my test system hang after changing the group parameters, and the "coreinfo" utility shows the correct number of groups. But that is probably because I am doing something wrong.

Thanks,
Vadim.
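
For reference, the group and NUMA layout can be inspected from inside the guest with the Sysinternals Coreinfo utility (assuming it is installed in the guest):

coreinfo.exe -g    (dump the mapping of logical processors to processor groups)
coreinfo.exe -n    (dump the mapping of logical processors to NUMA nodes)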

Comment 23 Yu Wang 2019-07-29 06:30:45 UTC
(In reply to Vadim Rozenfeld from comment #22)
> Does it hang right after you specify the group parameters mentioned above,
> without running any test? Can you please provide the full qemu command line
> and the qemu version?

No test needs to be run; just follow the steps in comment #19.

The full command line is:
/usr/libexec/qemu-kvm -name 172BLK126435D3I -enable-kvm -m 6G -smp 8 \
-uuid 4f1a308e-33c0-4314-b646-293ad9f227ed -nodefaults \
-cpu Skylake-Server,hv_stimer,hv_synic,hv_time,hv_relaxed,hv_vpindex,hv_spinlocks=0xfff,hv_vapic,hv_reset,hv-tlbflush \
-chardev socket,id=charmonitor,path=/tmp/172BLK126435D3I,server,nowait \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=localtime,driftfix=slew -boot order=cd,menu=on \
-device piix3-usb-uhci,id=usb \
-drive file=172BLK126435D3I,if=none,id=drive-ide0-0-0,format=raw,cache=none \
-device ide-drive,drive=drive-ide0-0-0,id=ide0-0-0,bus=ide.0,unit=0 \
-drive file=/home/kvm_autotest_root/iso/ISO/Win2012/en_windows_server_2012_x64_dvd_915478.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw \
-device ide-drive,drive=drive-ide0-1-0,id=ide0-1-0,bus=ide.1,unit=0 \
-drive file=172BLK126435D3I.iso,if=none,media=cdrom,id=drive-ide0-1-1,readonly=on,format=raw \
-device ide-cd,drive=drive-ide0-1-1,id=ide0-1-1 \
-netdev tap,script=/etc/qemu-ifup,downscript=no,id=hostnet0 \
-device e1000,netdev=hostnet0,id=net0,mac=00:52:2d:25:bb:52 \
-device usb-tablet,id=input0 -vnc 0.0.0.0:1 -vga std -M q35 \
-device pcie-root-port,bus=pcie.0,id=root1.0,slot=1 \
-object iothread,id=thread0 \
-drive file=172BLK126435D3I_test.raw,if=none,id=drive-virtio-disk0,format=raw,cache=none \
-device virtio-blk-pci,iothread=thread0,scsi=off,bus=root1.0,drive=drive-virtio-disk0,id=virtio-disk0,serial=whql_test

Thanks
Yu Wang

Comment 32 Peixiu Hou 2020-02-03 05:39:02 UTC
Tested with "-numa node,nodeid=0,cpus=0 -numa node,nodeid=1,cpus=1 -numa node,nodeid=2,cpus=2 -numa node,nodeid=3,cpus=3 -numa node,nodeid=4,cpus=4 -numa node,nodeid=5,cpus=5 -numa node,nodeid=6,cpus=6 -numa node,nodeid=7,cpus=7 -smp 8" on a Win2012-64 VM; cannot reproduce this bug, and the "Multiple processor group device test" job passes smoothly.

Tested guest os:
Win2012-64

Used versions:
kernel-4.18.0-175.el8.x86_64
qemu-kvm-4.2.0-8.module+el8.2.0+5607+dc756904.x86_64
seabios-1.13.0-1.module+el8.2.0+5520+4e5817f3.x86_64
virtio-win-prewhql-178

Best Regards~
Peixiu

Comment 33 Vadim Rozenfeld 2020-02-03 08:12:21 UTC
(In reply to Peixiu Hou from comment #32)
> Tested with "-numa node,nodeid=0,cpus=0 ... -numa node,nodeid=7,cpus=7 -smp 8"
> on a Win2012-64 VM; cannot reproduce this bug, and the "Multiple processor
> group device test" job passes smoothly.

Good.
Many thanks for testing.
By any chance, could you check if the above configuration helps to solve 
https://bugzilla.redhat.com/show_bug.cgi?id=1039469 ?

Best,
Vadim.

Comment 34 Peixiu Hou 2020-02-04 03:58:56 UTC
(In reply to Vadim Rozenfeld from comment #33)
> By any chance, could you check if the above configuration helps to solve
> https://bugzilla.redhat.com/show_bug.cgi?id=1039469 ?

Yes, this configuration also resolves the bug you mentioned, https://bugzilla.redhat.com/show_bug.cgi?id=1039469.

Due to "Default splitting of RAM between nodes is deprecated, Use '-numa node,memdev' to explictly define RAM allocation per node" on latest qemu version.
Tested with comment#32 cli, qemu-kvm will report warning:
QEMU 4.2.0 monitor - type 'help' for more information
(qemu) qemu-kvm: warning: Default splitting of RAM between nodes is deprecated, Use '-numa node,memdev' to explictly define RAM allocation per node

I resolved the warning by defining per-node RAM explicitly with '-numa node,memdev', as follows:
=======================================================
-m 6G \
-object memory-backend-ram,id=mem0,size=768M \
-object memory-backend-ram,id=mem1,size=768M \
-object memory-backend-ram,id=mem2,size=768M \
-object memory-backend-ram,id=mem3,size=768M \
-object memory-backend-ram,id=mem4,size=768M \
-object memory-backend-ram,id=mem5,size=768M \
-object memory-backend-ram,id=mem6,size=768M \
-object memory-backend-ram,id=mem7,size=768M \
-numa node,memdev=mem0,nodeid=0,cpus=0 \
-numa node,memdev=mem1,nodeid=1,cpus=1 \
-numa node,memdev=mem2,nodeid=2,cpus=2 \
-numa node,memdev=mem3,nodeid=3,cpus=3 \
-numa node,memdev=mem4,nodeid=4,cpus=4 \
-numa node,memdev=mem5,nodeid=5,cpus=5 \
-numa node,memdev=mem6,nodeid=6,cpus=6 \
-numa node,memdev=mem7,nodeid=7,cpus=7 -smp 8 \
======================================================
Tested with this command line on both the vioscsi and vioser drivers; the "Multiple processor group device test" job passes smoothly in both cases.
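
For higher vCPU counts, the repetitive -object/-numa options can also be generated with a small shell loop; a sketch assuming bash and the same one-vCPU-per-node, 768M-per-node layout as above:

=======================================================
NODES=8
NUMA_OPTS=""
for i in $(seq 0 $((NODES - 1))); do
    # one RAM backend and one single-CPU NUMA node per vCPU
    NUMA_OPTS="$NUMA_OPTS -object memory-backend-ram,id=mem$i,size=768M"
    NUMA_OPTS="$NUMA_OPTS -numa node,memdev=mem$i,nodeid=$i,cpus=$i"
done
/usr/libexec/qemu-kvm -m 6G -smp $NODES $NUMA_OPTS [remaining options]
=======================================================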

Best Regards~
Peixiu


Comment 35 Vadim Rozenfeld 2020-02-04 07:06:34 UTC
(In reply to Peixiu Hou from comment #34)
> Tested with this command line on both the vioscsi and vioser drivers; the
> "Multiple processor group device test" job passes smoothly in both cases.

That's good. While it is still not clear whether this is a KVM bug or a Windows limitation, we can use the above configuration as a workaround. Could you add some notes to the QE knowledge base to make sure that we do not hit this problem in the future?

Best,
Vadim.

Comment 36 Peixiu Hou 2020-02-06 07:13:40 UTC
(In reply to Vadim Rozenfeld from comment #35)
> Could you add some notes to the QE knowledge base to make sure that we do
> not hit this problem in the future?

Ok, got it, thanks~


