591763 – Guest quits abnormally during write 'zero' to port 49220

Summary: Guest quits abnormally during write 'zero' to port 49220

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	qemu-kvm
Sub Component:
Version:	6.0
Hardware:	All
OS:	Linux
Priority:	low
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Amit Shah
QA Contact:	Virtualization Bugs
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	580953

Reported:	2010-05-13 03:33 UTC by Amos Kong
Modified:	2015-05-25 00:05 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2010-06-08 07:17:22 UTC
Target Upstream Version:
Embargoed:

Description Amos Kong 2010-05-13 03:33:11 UTC

Description of problem:
The test enumerates all I/O port ranges through /proc/ioports and tries to read/write random ports, resetting the guest when it detects a hang.
I found that the guest always quits abnormally when 'zero' is written to port 49220. The qemu process outputs "virtio-net header not in first element".
This bug only reproduces when using a virtio NIC.

Version-Release number of selected component (if applicable):
guest kernel: 2.6.18-196.el5
host kernel: 2.6.32-24.el6.x86_64
# rpm -qa |grep qemu
qemu-img-0.12.1.2-2.51.el6.x86_64
gpxe-roms-qemu-0.9.7-6.3.el6.noarch
qemu-kvm-debuginfo-0.12.1.2-2.51.el6.x86_64
qemu-kvm-tools-0.12.1.2-2.51.el6.x86_64
qemu-kvm-0.12.1.2-2.51.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot a guest with a virtio NIC
2. Try to write 'zero' to port 49220
# echo -e '\0' | dd of=/dev/port seek=49220 bs=1 count=1

Actual results:
guest quits abnormally

Expected results:
guest works well or hangs

Additional info:
1. command line:
# qemu-kvm -name 'vm1' -monitor tcp:0:6001,server,nowait -drive file=/root/autotest/client/tests/kvm/images/RHEL-Server-5.5-64-virtio.qcow2,if=virtio,cache=none,boot=on -net nic,vlan=0,model=virtio,macaddr=00:A9:7C:6C:47:11 -net tap,vlan=0,ifname=virtio_0_6001,script=/root/autotest/client/tests/kvm/scripts/qemu-ifup-switch,downscript=no -m 512 -smp 1 -soundhw ac97 -usbdevice tablet -rtc-td-hack -no-hpet -cpu qemu64,+sse2 -no-kvm-pit-reinjection -redir tcp:5000::22 -vnc :0 -serial unix:/tmp/serial-20100513-104022-p4ix,server,nowait
(qemu)virtio-net header not in first element

2. ioports info of guest
(guest)# cat /proc/ioports
0000-001f : dma1
0020-0021 : pic1
0040-0043 : timer0
0050-0053 : timer1
0060-0060 : keyboard
0064-0064 : keyboard
0070-0077 : rtc
0080-008f : dma page reg
00a0-00a1 : pic2
00c0-00df : dma2
00f0-00ff : fpu
0170-0177 : ide1
0376-0376 : ide1
0378-037a : parport0
03c0-03df : vga+
03f2-03f5 : floppy
03f7-03f7 : floppy DIR
03f8-03ff : serial
0cf8-0cff : PCI conf1
afe0-afe3 : ACPI GPE0_BLK
b000-b03f : 0000:00:01.3
  b000-b003 : ACPI PM1a_EVT_BLK
  b004-b005 : ACPI PM1a_CNT_BLK
  b008-b00b : ACPI PM_TMR
  b010-b015 : ACPI CPU throttle
b100-b10f : 0000:00:01.3
  b100-b107 : piix4_smbus
c000-c00f : 0000:00:01.1
  c000-c007 : ide0
  c008-c00f : ide1
c020-c03f : 0000:00:01.2
  c020-c03f : uhci_hcd
c040-c05f : 0000:00:03.0
  c040-c05f : virtio-pci
c400-c7ff : 0000:00:04.0
  c400-c7ff : Intel 82801AA-ICH
c800-c8ff : 0000:00:04.0
  c800-c8ff : Intel 82801AA-ICH
c900-c93f : 0000:00:05.0
  c900-c93f : virtio-pci

Comment 2 RHEL Program Management 2010-05-13 05:28:44 UTC

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.

Comment 3 Alex Williamson 2010-05-20 14:22:45 UTC

Is this really a valid test?  You're writing 0 to ioport 0xc044, which is clearly assigned to virtio-pci.  There are control structures there that get out of sync if modified outside the driver.  Surely there could be instances of real hardware behaving the same way or worse if a privileged user decides to start poking io space.
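
The mapping from the decimal port in the reproducer to 0xc044 and to the virtio-pci range can be checked mechanically. This is a minimal sketch, assuming only an excerpt of the /proc/ioports listing from the description; the helper name find_owners is made up for illustration and is not part of autotest or qemu.

```python
import re

# Excerpt of the guest's /proc/ioports listing from the description.
IOPORTS = """\
c020-c03f : 0000:00:01.2
  c020-c03f : uhci_hcd
c040-c05f : 0000:00:03.0
  c040-c05f : virtio-pci
c400-c7ff : 0000:00:04.0
  c400-c7ff : Intel 82801AA-ICH
"""

# One listing line: optional indent, "start-end : name", with hex bounds.
LINE_RE = re.compile(r"^\s*([0-9a-f]+)-([0-9a-f]+) : (.+)$")


def find_owners(listing, port):
    """Return (start, end, name) for every range that contains port,
    in the order the ranges appear in the listing (hypothetical helper)."""
    owners = []
    for line in listing.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        start, end = int(m.group(1), 16), int(m.group(2), 16)
        if start <= port <= end:
            owners.append((start, end, m.group(3)))
    return owners


port = 49220  # decimal port number used in the reproducer
print(hex(port))
for start, end, name in find_owners(IOPORTS, port):
    print(f"{start:04x}-{end:04x} : {name} (offset {port - start:#x})")
```

The port falls at offset 4 inside the c040-c05f range, which the listing attributes to PCI device 0000:00:03.0 and the virtio-pci driver.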

Comment 4 Amos Kong 2010-05-20 16:12:10 UTC

I found this 'bug' by executing the iofuzz test case of autotest, and verified it manually.

(http://patchwork.test.kernel.org/patch/2155/)
    The design of iofuzz is simple: it just generate random I/O port
    activity inside the virtual machine. The correctness of the device
    emulation may be verified through this test.
    
    As the instructions are randomly generated, guest may enter the wrong
    state. The test solve this issue by detect the hang and restart the
    virtual machine.
    
    The test duration could also be adjusted through the "fuzz_count". And
    the parameter "skip_devices" is used to specified the devices which
    should not be used to do the fuzzing.
    
    For current version, every activity were logged and the command was
    sent through a session between host and guest. Through this method may
    slow down the whole test but it works well. The enumeration was done
    through /proc/ioports and the scenario of activity is not aggressive.
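
The design quoted above can be sketched roughly as follows. This is not the autotest implementation: the function names are made up, the listing is a short excerpt, and the sketch only builds shell commands as strings without running them. The dd invocations mirror the reproducer in the description; using printf with an octal escape to produce the byte is an assumption of this sketch.

```python
import random
import re

# Short excerpt of a /proc/ioports listing (taken from the description).
LISTING = """\
0060-0060 : keyboard
c020-c03f : 0000:00:01.2
  c020-c03f : uhci_hcd
c040-c05f : 0000:00:03.0
  c040-c05f : virtio-pci
"""

RANGE_RE = re.compile(r"^\s*([0-9a-f]+)-([0-9a-f]+) : (.+)$")


def parse_ranges(listing, skip_devices=()):
    """Parse port ranges, dropping any range that overlaps a skipped device."""
    entries = []
    for line in listing.splitlines():
        m = RANGE_RE.match(line)
        if m:
            entries.append((int(m.group(1), 16), int(m.group(2), 16), m.group(3)))
    skipped = [(s, e) for s, e, name in entries if name in skip_devices]
    return [
        (s, e, name)
        for s, e, name in entries
        if not any(s <= ke and ks <= e for ks, ke in skipped)
    ]


def fuzz_plan(ranges, fuzz_count, rng):
    """Build fuzz_count random (op, port, value) tuples over the given ranges."""
    plan = []
    for _ in range(fuzz_count):
        start, end, _name = rng.choice(ranges)
        port = rng.randint(start, end)
        if rng.random() < 0.5:
            plan.append(("read", port, None))
        else:
            plan.append(("write", port, rng.randint(0, 255)))
    return plan


def to_command(op, port, value):
    """Render one operation as a guest shell command (string only, not run)."""
    if op == "read":
        return f"dd if=/dev/port skip={port} bs=1 count=1"
    return f"printf '\\{value:03o}' | dd of=/dev/port seek={port} bs=1 count=1"


ranges = parse_ranges(LISTING, skip_devices=("uhci_hcd",))
for op, port, value in fuzz_plan(ranges, 3, random.Random(0)):
    print(to_command(op, port, value))
```

A real run would send each command to the guest over a session and log it, as the quoted description says; that part is left out here.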

Comment 5 Alex Williamson 2010-05-20 16:38:11 UTC

Thanks Amos. So if I understand the test, the failing condition is that qemu exits; simply restarting the VM would have been considered acceptable.

Comment 6 Dor Laor 2010-05-20 20:24:06 UTC

IMO this is rather low priority; there is not a huge difference between a guest crash and a reboot. It is not a security issue for either the guest or the host. I'd rather close it as WONTFIX. Amos, please respond if you think otherwise.

Comment 7 Amos Kong 2010-05-21 03:19:38 UTC

I suggest adding some fault tolerance to virtio-net rather than calling exit().
It could recover the virtio device from the error state, or inject an interrupt to let the guest know what happened.

Just my opinion.

Comment 8 Amit Shah 2010-06-08 07:17:22 UTC

Recovering a guest from such external writes is not possible. It's impossible to maintain all the state that would be necessary to recover from such illegal writes.

I think the point of the test is to write to random locations in the IO space and find out the response of the guest or the hypervisor.

I also think it's perfectly valid for qemu to exit. The testsuite can re-start the VM, as mentioned in the link (the testsuite seems to currently only expect guest hangs, not guest shutdowns, and in that case, the testsuite should be fixed).
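
One way a test suite could tell a qemu exit apart from a guest hang is sketched below. This is only an illustration under stated assumptions: classify_vm and guest_responds are hypothetical names, not autotest APIs, and a short-lived Python child process stands in for the qemu process.

```python
import subprocess
import sys


def classify_vm(proc, guest_responds):
    """Classify the VM state after a fuzz step (hypothetical helper).

    proc is the subprocess.Popen object of the hypervisor process;
    guest_responds is a callable returning True if the guest still answers.
    """
    if proc.poll() is not None:
        return "exited"  # the process is gone: start a fresh VM
    if not guest_responds():
        return "hung"    # the process is alive but the guest is stuck: reset it
    return "alive"


# Stand-in for a qemu process that quits on its own.
proc = subprocess.Popen([sys.executable, "-c", "pass"])
proc.wait()
print(classify_vm(proc, lambda: True))
```

With a check like this, both outcomes lead to a restart of the VM, and the test can carry on instead of treating the exit as a failure.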

I'm not really sure this is a bug, closing as NOTABUG. Please re-open with a different summary line and description if any other behaviour is desired.