Bug 1432567
| Summary: | [virtio-win][vioscsi] Crash dump not generated with num_queues=4 | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Ladi Prosek <lprosek> |
| Component: | virtio-win | Assignee: | Ladi Prosek <lprosek> |
| virtio-win sub component: | virtio-win-prewhql | QA Contact: | Virtualization Bugs <virt-bugs> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | unspecified | ||
| Priority: | unspecified | CC: | ailan, coli, lijin, michen, peliu, phou, vrozenfe, wyu, xiagao, xuhan, xuwei, yhong |
| Version: | 7.3 | Keywords: | Regression |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: |
Cause: too many queues are allocated in dump mode.
Consequence: the driver attempts to allocate an excessive
amount of memory and VioScsiFindAdapter fails.
Fix: allocate only one virtqueue in dump mode.
Result: the memory dump can be generated.
|
Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2017-08-01 12:58:08 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Ladi Prosek
2017-03-15 17:03:16 UTC
Looks like regression introduced by
commit 4bf912c46cf3519534ed83a7f22d8c6a4adb211f
Author: Julius Rus <iuliur>
Date: Wed Feb 1 14:22:40 2017 -0800
Fix bluescreen when num_queues > num_cpus.
Can you please confirm that build 131 has no such problem?
Thanks,
Vadim.
(In reply to Vadim Rozenfeld from comment #2)
> Looks like regression introduced by
> commit 4bf912c46cf3519534ed83a7f22d8c6a4adb211f
> Author: Julius Rus <iuliur>
> Date: Wed Feb 1 14:22:40 2017 -0800
> Fix bluescreen when num_queues > num_cpus.
>
> Can you please confirm that build 131 has no such problem?

131 has the same problem. This was introduced in
commit 7cfc971d62a361796d497adbb5c7c3bb7cac635a
Author: Vadim Rozenfeld <vrozenfe>
Date: Tue Jun 28 16:31:55 2016 +1000
[vioscsi] fix regression in Hot-Add Device HCK test
So it is a 7.2 -> 7.3 regression.

Fix has been committed:
https://github.com/virtio-win/kvm-guest-drivers-windows/commit/7184e7110170f87d499e6976878eb3a3614034cd

Hi Ladi,
Could you please help check my reproduced steps? I cannot reproduce this issue with virtio-win-prewhql-134 & 131. The Crash dump is generated.
Steps:
1. Launch a Win7 64-bit VM with qemu cli:
/usr/libexec/qemu-kvm -name win7-64 -enable-kvm -m 3G \
-smp 4,sockets=1,cores=4,threads=1 -cpu SandyBridge \
-uuid ea78071a-f6e4-4347-8077-9cb9f7953a84 \
-nodefconfig --nodefaults -boot order=cd,menu=on \
-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
-chardev pty,id=charserial0 \
-device isa-serial,chardev=charserial0,id=isa_serial0 \
-device usb-tablet,id=input0 \
-device virtio-scsi-pci,id=scsi0,num_queues=4 \
-drive file=133QSRWIN764BWA,if=none,id=drive-scsi-disk0,format=raw,serial=22,cache=none \
-device scsi-hd,bus=scsi0.0,drive=drive-scsi-disk0,id=scsi-disk0 \
-drive file=en_windows_server_2008_datacenter_enterprise_standard_sp2_x64_dvd_342336.iso,media=cdrom,id=cdrom,if=none \
-device ide-drive,drive=cdrom,bootindex=1 \
-netdev tap,id=hostnet0,vhost=on,vhostforce=off \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:83:66:77:88:66,bus=pci.0,addr=0x3,status=on \
-vnc 0.0.0.0:1 -vga std -monitor stdio \
-qmp tcp:0:4446,server,nowait
2. Cause a BSOD with the NotMyFault tool.
Used version:
kernel-3.10.0-606.el7.x86_64
qemu-kvm-rhev-2.8.0-6.el7.x86_64
seabios-1.10.2-1.el7.x86_64
virtio-win-prewhql-134 & 131
Best Regards~
Peixiu

Hi Peixiu Hou,
(In reply to Peixiu Hou from comment #6)
> Hi Ladi,
>
> Could you please help check my reproduced steps? I cannot reproduce this
> issue with virtio-win-prewhql-134 & 131. The Crash dump is generated.

Apologies for incomplete/incorrect repro steps. Turns out my test VM was running with disable-modern=true and modern virtio needs much less physically contiguous memory. Can you please try it with:
-device virtio-scsi-pci,id=scsi0,num_queues=12 (and 12 vcpus)
or
-device virtio-scsi-pci,id=scsi0,num_queues=4,disable-modern=true
Thanks!
Ladi

Hi Ladi,
Tried with "-device virtio-scsi-pci,id=scsi0,num_queues=12 (and 12 vcpus)", reproduced this issue.
Tried with "-device virtio-scsi-pci,id=scsi0,num_queues=4,disable-modern=true", reproduced this issue.
Thanks a lot~
Peixiu Hou

Reproduced this issue on virtio-win-prewhql-131&134 version
Verified this issue on virtio-win-prewhql-135 version
Steps same as comment#7
Actual Results:
on virtio-win-prewhql-134&131 (un-fixed version), Crash dump is not generated.
on virtio-win-prewhql-135 (fix version), Crash dump is generated (expected results).
So this issue has been fixed, thanks.
Version-Release number of selected component
kernel-3.10.0-634.el7.x86_6
qemu-kvm-rhev-2.8.0-6.el7.x86_64
seabios-1.10.2-1.el7.x86_64

change status to verified according to comment#9

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2017:2341

Hi Vadim,
We're reviewing the test case, and I have some questions about this issue.
We have a test case for this issue; the steps are as follows:
1. Start VM with virtio-scsi-pci (system disk)
-object iothread,id=iothread0 \
-device virtio-scsi-pci,id=scsi0,iothread=iothread0 \
-drive file=OS.raw,if=none,id=drive-scsi-disk0,format=raw,serial=22,cache=none \
-device scsi-hd,bus=scsi0.0,drive=drive-scsi-disk0,id=scsi-disk0,share-rw=on \
2. Check whether the verifier is enabled for vioscsi.sys in the guest:
#verifier /querysettings (run as administrator)
If not, enable the verifier for vioscsi.sys:
#verifier.exe /standard /driver vioscsi.sys (run as administrator)
then reboot the guest and recheck:
#verifier /querysettings
3. Cause a BSOD with the NotMyFault tool or NMI.
1). For NotMyFault: open the tool, then click any crash type to trigger a BSOD.
2). For NMI: create an NMICrashDump DWORD under the registry key
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashContro
Set NMICrashDump = 1, then reboot the VM
Open the QMP monitor
# telnet $HostIP $Port
{"execute":"qmp_capabilities"}
{"execute":"inject-nmi"}
4. Quit the VM and boot it up again, then check for the memory dump file (MEMORY.DMP) in the C:\Windows directory
5. Shut down the VM, then restart it with virtio-scsi-pci (system disk) and num_queues=12
CLI:
-smp 12,sockets=1,cores=12,threads=1 \
-device virtio-scsi-pci,id=scsi0,num_queues=12 \
-drive file=OS.raw,if=none,id=drive-scsi-disk0,format=raw,serial=22,cache=none \
-device scsi-hd,bus=scsi0.0,drive=drive-scsi-disk0,id=scsi-disk0
6. Repeat steps 3-4.
The num_queues=12 in step 5 comes from comment #7. I am confused about why num_queues=12 was chosen to reproduce this issue. Does it have any special meaning? Also, do the test case steps need to be adjusted? If so, could you please give some advice?
Thanks a lot~
Peixiu
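
For anyone automating the NMI variant of step 3, the telnet exchange could be scripted roughly as follows. This is only a minimal sketch: the helper names (qmp_line, read_reply, inject_nmi, inject_nmi_over_tcp) are made up for illustration, and it assumes the VM exposes QMP on a TCP socket as in the -qmp tcp:0:4446,server,nowait option from comment #6. It sends the same two commands shown above and skips any asynchronous event messages that arrive in between.

```python
import json
import socket


def qmp_line(command):
    """Encode one argument-less QMP command as a newline-terminated JSON line."""
    return (json.dumps({"execute": command}) + "\n").encode("ascii")


def read_reply(stream):
    """Return the next non-event message, skipping asynchronous QMP events."""
    while True:
        line = stream.readline()
        if not line:
            raise ConnectionError("QMP connection closed unexpectedly")
        message = json.loads(line)
        if "event" not in message:
            return message


def inject_nmi(stream):
    """Run the two commands from step 3 on an open QMP stream.

    `stream` is a binary file-like object with readline/write/flush.
    Returns the replies to qmp_capabilities and inject-nmi, in that order.
    """
    read_reply(stream)  # consume the {"QMP": ...} greeting sent on connect
    replies = []
    for command in ("qmp_capabilities", "inject-nmi"):
        stream.write(qmp_line(command))
        stream.flush()
        replies.append(read_reply(stream))
    return replies


def inject_nmi_over_tcp(host, port, timeout=10.0):
    """Hypothetical convenience wrapper: connect to the QMP TCP port first."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("rwb") as stream:
            return inject_nmi(stream)
```

With the command line from comment #6 this would be called as inject_nmi_over_tcp(host_ip, 4446); a reply containing "error" instead of "return" means the command was rejected.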
(In reply to Peixiu Hou from comment #14)
> The num_queues=12 on step5 is come from comment#7, here I confused why
> choose num_queues=12 to reproduce this issue? what's special meaning have?
> and if the case step need to be adjusted? if yes, could you help to give
> some advise for it?
>
> Thanks a lot~
> Peixiu

Windows needs to pre-allocate some amount of physically contiguous memory to build the virtio queues on. The problem is that Windows needs to do this not only at boot time but also when a crash happens. The number of pages that Windows can allocate as a contiguous region is quite limited. In build 134 the vioscsi driver was trying to allocate memory and create as many virtio queues as specified in num_queues even when running in the dump stack. 12 queues need a fairly large physically contiguous memory region, which is not always available. When the virtio-scsi driver fails to allocate enough memory, it crashes by itself, which makes it impossible to create a valid dump file. IIRC the problem described in this bug was a temporary regression, and should be fixed now.

Best regards,
Vadim.

(In reply to Vadim Rozenfeld from comment #15)
Got it, thanks a lot~
|
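
To put rough numbers on the explanation in comment #15, the ring memory can be estimated with the legacy (split) vring size formula from the virtio specification. This is a back-of-the-envelope sketch, not the driver's code, and it rests on several assumptions that are not stated in this bug: a queue size of 128 entries, page-aligned rings, a control queue and an event queue alongside the request queues (both kept in dump mode), and all rings being carved out of one contiguous allocation. The function names are hypothetical, and per-queue bookkeeping beyond the rings is ignored.

```python
PAGE_SIZE = 4096       # ring alignment used by legacy virtio
DESC_SIZE = 16         # sizeof(struct vring_desc)
USED_ELEM_SIZE = 8     # sizeof(struct vring_used_elem)


def align_up(value, alignment):
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def vring_size(queue_size, alignment=PAGE_SIZE):
    """Bytes occupied by one legacy vring with queue_size descriptors."""
    # Descriptor table followed by the available ring (flags, idx, ring, used_event).
    desc_and_avail = DESC_SIZE * queue_size + 2 * (3 + queue_size)
    # Used ring (flags, idx, avail_event plus one element per descriptor).
    used = 2 * 3 + USED_ELEM_SIZE * queue_size
    return align_up(desc_and_avail, alignment) + used


def request_queues_to_create(num_queues, dump_mode):
    """Model of the fix: set up a single request queue when in dump mode."""
    return 1 if dump_mode else num_queues


def ring_memory(num_queues, dump_mode, queue_size=128):
    """Estimated contiguous bytes needed for all rings of one adapter."""
    # Assumption: control queue + event queue + request queues.
    total_queues = 2 + request_queues_to_create(num_queues, dump_mode)
    return total_queues * align_up(vring_size(queue_size), PAGE_SIZE)


if __name__ == "__main__":
    for dump_mode in (False, True):
        print("dump_mode =", dump_mode, "->", ring_memory(12, dump_mode), "bytes")
```

Under these assumptions, num_queues=12 asks for several times more contiguous ring memory than the single-queue dump-mode setup, which is consistent with the allocation failing only in the crash path before the fix.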