
Bug 1409298

Summary: [virtio-win][balloon] Guest win2008-32 occurs BSoD when running job 'Device Path Exerciser'
Product: Red Hat Enterprise Linux 7
Component: virtio-win
Sub component: virtio-win-prewhql
Version: 7.3
Status: CLOSED ERRATA
Severity: high
Priority: high
Keywords: Regression, TestBlocker
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Reporter: Peixiu Hou <phou>
Assignee: Ladi Prosek <lprosek>
QA Contact: Virtualization Bugs <virt-bugs>
CC: ailan, lijin
Doc Type: If docs needed, set a value
Type: Bug
Last Closed: 2017-08-01 12:55:38 UTC

Description Peixiu Hou 2016-12-31 11:42:30 UTC
Description of problem:
The win2008-32 guest hits a BSoD when running the 'Device Path Exerciser' job. BSoD(8e) occurred twice and BSoD(0a) once.

Version-Release number of selected component (if applicable):
kernel-3.10.0-537.el7.x86_64
qemu-kvm-rhev-2.6.0-29.el7.x86_64
seabios-1.9.1-5.el7.x86_64
virtio-win-prewhql-129

How reproducible:
3/3

Steps to Reproduce:
1. Boot the guest with the following command line:
/usr/libexec/qemu-kvm -name 129BLN200832IIU -enable-kvm -m 4G -smp 4 -uuid 8870cc4f-4ef0-4fc4-b59c-8d3486fb51ad -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/tmp/129BLN200832IIU,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime,driftfix=slew -boot order=cd,menu=on -device piix3-usb-uhci,id=usb -drive file=129BLN200832IIU,if=none,id=drive-ide0-0-0,format=raw,serial=mike_cao,cache=none -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -drive file=en_windows_server_2008_datacenter_enterprise_standard_sp2_x86_dvd_342333.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -drive file=129BLN200832IIU.vfd,if=floppy,id=drive-fdc0-0-0,format=raw,cache=none -netdev tap,script=/etc/qemu-ifup,downscript=no,id=hostnet0 -device e1000,netdev=hostnet0,id=net0,mac=00:52:04:45:10:cd -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=isa_serial0 -device usb-tablet,id=input0 -vnc 0.0.0.0:0 -vga cirrus -M q35 -device ioh3420,bus=pcie.0,id=root1.0,slot=1 -device virtio-balloon-pci,id=balloon0,bus=root1.0
2. Run the job 'Device Path Exerciser'
3. Check the guest status

Actual results:
BSOD

Expected results:
The job passes without a BSoD.

Additional info:
1. Reproduced this issue with '-M q35' and virtio-win-prewhql-129.
2. Reproduced this issue with '-M pc' and virtio-win-prewhql-129.
3. Cannot reproduce this issue with '-M q35' and virtio-win-prewhql-125.
4. Cannot reproduce this issue with '-M pc' and virtio-win-prewhql-125.

Comment 4 Ladi Prosek 2017-01-05 11:41:15 UTC
This is a tight race between request completion and cancellation in the balloon driver. Very likely not a regression but requires specific timing to hit. Fix coming.
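
For illustration only, the general shape of a completion/cancellation race, and one common way to close it, can be sketched as follows. This is not the balloon driver's code and not the actual fix; the Request class, its state names, and race_once are hypothetical. The sketch assumes that each request has a single state field and that whichever path (completion or cancellation) atomically moves it out of "pending" becomes the only one allowed to finalize the request.

```python
import threading


class Request:
    """Hypothetical stand-in for a pending driver request."""

    PENDING, COMPLETED, CANCELLED = "pending", "completed", "cancelled"

    def __init__(self):
        self._lock = threading.Lock()
        self.state = self.PENDING
        self.finalize_count = 0

    def _claim(self, new_state):
        # Atomically move PENDING -> new_state; only one caller can win.
        with self._lock:
            if self.state != self.PENDING:
                return False
            self.state = new_state
            return True

    def complete(self):
        # Completion path: finalize only if cancellation has not claimed it.
        if self._claim(self.COMPLETED):
            self._finalize()

    def cancel(self):
        # Cancellation path: finalize only if completion has not claimed it.
        if self._claim(self.CANCELLED):
            self._finalize()

    def _finalize(self):
        # Reached by exactly one path per request, because of the claim.
        self.finalize_count += 1


def race_once():
    """Run completion and cancellation concurrently on one request."""
    req = Request()
    barrier = threading.Barrier(2)

    def run(action):
        barrier.wait()  # release both threads at roughly the same moment
        action()

    threads = [threading.Thread(target=run, args=(action,))
               for action in (req.complete, req.cancel)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return req
```

Without the claim step, both paths could finalize the same request, which is the kind of double handling that only shows up under specific timing.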

Comment 5 Peixiu Hou 2017-01-16 08:39:33 UTC
Tried this job with virtio-win-prewhql-130 on win2008-32. 

Also reproduced this issue with '-M q35': tried 3 times in total and hit a BSoD 2 times (bsod(d1) and bsod(8e)); the job passed on the last try.

Cannot reproduce this issue with '-M pc'.

Best Regards~
Peixiu

Comment 6 Ladi Prosek 2017-01-16 08:46:52 UTC
(In reply to Peixiu Hou from comment #5)
> Tried this job with virtio-win-prewhql-130 on win2008-32. 
> 
> Also reproduced this issue with '-M q35', tried 3 times totally, hit bsod 2
> times(bsod(d1)+bsod(8e)), and it can be passed on the last time.
> 
> Cannot reproduce this issue with '-M pc'.

Can you please double-check that it reproduces with -M pc and virtio-win 129 and does not reproduce with -M pc and virtio-win 130? Also, would it be possible to share the new dumps?

Thanks!
Ladi

Comment 8 Peixiu Hou 2017-01-16 11:13:31 UTC
(In reply to Ladi Prosek from comment #6)
> (In reply to Peixiu Hou from comment #5)
> > Tried this job with virtio-win-prewhql-130 on win2008-32. 
> > 
> > Also reproduced this issue with '-M q35', tried 3 times totally, hit bsod 2
> > times(bsod(d1)+bsod(8e)), and it can be passed on the last time.
> > 
> > Cannot reproduce this issue with '-M pc'.
> 
> Can you please double-check that it reproduces with -M pc and virtio-win 129
> and does not reproduce with -M pc and virtio-win 130? Also, would it be
> possible to share the new dumps?
> 
> Thanks!
> Ladi

Hi Ladi,

This issue reproduces easily on virtio-win 129 with -M pc: using the same image, I reinstalled the virtio-win 129 balloon driver and ran the job, and bsod(8e) occurred.
After I reinstalled the virtio-win 130 balloon driver, the job passed easily.

With -M q35, the first try hit bsod(d1) and the second try hit bsod(8e); after rebooting the guest, bsod(d1) occurred, and after rebooting again, bsod(7e) occurred. I saved the bsod(d1) dump file from the first test and the bsod(7e) dump file from the second test (the bsod(8e) dump file was overwritten).


Best Regards~
Peixiu

Comment 10 Ladi Prosek 2017-01-16 15:10:26 UTC
Hi Peixiu Hou,

(In reply to Peixiu Hou from comment #8)
> Hi Ladi,
> 
> This issue can be reproduced on virtio-win 129 with -M pc easily, used same
> image, reinstall the virtio-win 129 balloon driver, run the job, the
> bsod(8e) occurred.
> And then, reinstall the virtio-win 130 balloon driver, the job can be easily
> passed.
> 
> With -M q35, first try occurred bsod(d1), second try occurred bsod(8e) and
> then reboot the guest, occurred bsod(d1), reboot the guest again, occurred
> bsod(7e). I saved the dump file bsod(d1) on first test and bsod(7e) on
> second test(the bsod(8e) dump file is covered).

Thank you. The crashes you're getting on Q35 are a different issue, not specific to balloon. It's the same thing we hit in bug 1408771. Two possible workarounds are: 1) use less than 4GB of memory, or 2) do not use cdrom. Thanks!
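
A minimal sketch of how the two workarounds map onto the boot command from the description, assuming a trimmed subset of the original options. The helper name build_qemu_args and the example value "2G" are hypothetical; only option strings that appear in the original command line are used.

```python
def build_qemu_args(mem="4G", with_cdrom=True):
    """Build a trimmed q35 command line for the balloon test guest.

    mem: value passed to -m (the original report used 4G).
    with_cdrom: whether to attach the installation ISO as an IDE cdrom.
    """
    args = [
        "/usr/libexec/qemu-kvm", "-enable-kvm",
        "-m", mem, "-smp", "4", "-M", "q35",
        "-drive", "file=129BLN200832IIU,if=none,id=drive-ide0-0-0,"
                  "format=raw,cache=none",
        "-device", "ide-drive,bus=ide.0,unit=0,"
                   "drive=drive-ide0-0-0,id=ide0-0-0",
    ]
    if with_cdrom:
        args += [
            "-drive", "file=en_windows_server_2008_datacenter_enterprise_"
                      "standard_sp2_x86_dvd_342333.iso,if=none,media=cdrom,"
                      "id=drive-ide0-1-0,readonly=on,format=raw",
            "-device", "ide-drive,bus=ide.1,unit=0,"
                       "drive=drive-ide0-1-0,id=ide0-1-0",
        ]
    # Balloon device behind a PCIe root port, as in the original command.
    args += [
        "-device", "ioh3420,bus=pcie.0,id=root1.0,slot=1",
        "-device", "virtio-balloon-pci,id=balloon0,bus=root1.0",
    ]
    return args
```

For example, build_qemu_args(mem="2G") corresponds to workaround 1 and build_qemu_args(with_cdrom=False) to workaround 2.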

Comment 11 Peixiu Hou 2017-01-19 09:10:05 UTC
(In reply to Ladi Prosek from comment #10)
> Hi Peixiu Hou,
> 
> (In reply to Peixiu Hou from comment #8)
> > Hi Ladi,
> > 
> > This issue can be reproduced on virtio-win 129 with -M pc easily, used same
> > image, reinstall the virtio-win 129 balloon driver, run the job, the
> > bsod(8e) occurred.
> > And then, reinstall the virtio-win 130 balloon driver, the job can be easily
> > passed.
> > 
> > With -M q35, first try occurred bsod(d1), second try occurred bsod(8e) and
> > then reboot the guest, occurred bsod(d1), reboot the guest again, occurred
> > bsod(7e). I saved the dump file bsod(d1) on first test and bsod(7e) on
> > second test(the bsod(8e) dump file is covered).
> 
> Thank you. The crashes you're getting on Q35 are a different issue, not
> specific to balloon. It's the same thing we hit in bug 1408771. Two possible
> workarounds are: 1) use less than 4GB of memory, or 2) do not use cdrom.
> Thanks!

Hi Ladi,

OK, thanks a lot. I tried it without the cdrom on a newly installed image: the job passed and the bug did not reproduce. Then I added the cdrom: a BSoD occurred while the system was booting, and when I ran the job again, it passed.
I also tried it with less than 4GB of memory on a newly installed image: the job passed and the bug did not reproduce. Then I restored the memory to 4G and reran the job: it failed 1 time and passed 1 time.


Best Regards~
Peixiu

Comment 12 lijin 2017-01-22 02:27:49 UTC
This bug tracks only the balloon tight race issue; bug 1408771 will track the q35 issue.

Changing status to verified according to comment 8 and comment 10.

Comment 13 lijin 2017-05-11 05:46:41 UTC
Hi Amnon,

Could you help to ack?

Thanks

Comment 16 errata-xmlrpc 2017-08-01 12:55:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2341