Bug 591763

Summary: Guest quits abnormally during write 'zero' to port 49220
Product: Red Hat Enterprise Linux 6
Reporter: Amos Kong <akong>
Component: qemu-kvm
Assignee: Amit Shah <amit.shah>
Status: CLOSED NOTABUG
QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium
Priority: low
Version: 6.0
CC: ailan, alex.williamson, ndai, tburke, virt-maint
Target Milestone: rc
Target Release: ---
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2010-06-08 07:17:22 UTC
Bug Blocks: 580953

Description Amos Kong 2010-05-13 03:33:11 UTC
Description of problem:
The test enumerates all I/O port ranges through /proc/ioports and tries to read and write random ports, resetting the guest when it detects a hang.
I found that the guest always quits abnormally when a zero byte is written to port 49220 (0xc044). The qemu process outputs "virtio-net header not in first element".
The bug reproduces only when using a virtio NIC.

Version-Release number of selected component (if applicable):
guest kernel: 2.6.18-196.el5
host kernel: 2.6.32-24.el6.x86_64
# rpm -qa |grep qemu
qemu-img-0.12.1.2-2.51.el6.x86_64
gpxe-roms-qemu-0.9.7-6.3.el6.noarch
qemu-kvm-debuginfo-0.12.1.2-2.51.el6.x86_64
qemu-kvm-tools-0.12.1.2-2.51.el6.x86_64
qemu-kvm-0.12.1.2-2.51.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot up a guest with a virtio NIC.
2. Write a zero byte to port 49220 from inside the guest:
# echo -e '\0' | dd of=/dev/port seek=49220 bs=1 count=1

Actual results:
guest quits abnormally

Expected results:
guest works well or hangs

Additional info:
1. command line:
# qemu-kvm -name 'vm1' -monitor tcp:0:6001,server,nowait -drive file=/root/autotest/client/tests/kvm/images/RHEL-Server-5.5-64-virtio.qcow2,if=virtio,cache=none,boot=on -net nic,vlan=0,model=virtio,macaddr=00:A9:7C:6C:47:11 -net tap,vlan=0,ifname=virtio_0_6001,script=/root/autotest/client/tests/kvm/scripts/qemu-ifup-switch,downscript=no -m 512 -smp 1 -soundhw ac97 -usbdevice tablet -rtc-td-hack -no-hpet -cpu qemu64,+sse2 -no-kvm-pit-reinjection -redir tcp:5000::22 -vnc :0 -serial unix:/tmp/serial-20100513-104022-p4ix,server,nowait
(qemu) virtio-net header not in first element

2. ioports info of guest
(guest)# cat /proc/ioports
0000-001f : dma1
0020-0021 : pic1
0040-0043 : timer0
0050-0053 : timer1
0060-0060 : keyboard
0064-0064 : keyboard
0070-0077 : rtc
0080-008f : dma page reg
00a0-00a1 : pic2
00c0-00df : dma2
00f0-00ff : fpu
0170-0177 : ide1
0376-0376 : ide1
0378-037a : parport0
03c0-03df : vga+
03f2-03f5 : floppy
03f7-03f7 : floppy DIR
03f8-03ff : serial
0cf8-0cff : PCI conf1
afe0-afe3 : ACPI GPE0_BLK
b000-b03f : 0000:00:01.3
  b000-b003 : ACPI PM1a_EVT_BLK
  b004-b005 : ACPI PM1a_CNT_BLK
  b008-b00b : ACPI PM_TMR
  b010-b015 : ACPI CPU throttle
b100-b10f : 0000:00:01.3
  b100-b107 : piix4_smbus
c000-c00f : 0000:00:01.1
  c000-c007 : ide0
  c008-c00f : ide1
c020-c03f : 0000:00:01.2
  c020-c03f : uhci_hcd
c040-c05f : 0000:00:03.0
  c040-c05f : virtio-pci
c400-c7ff : 0000:00:04.0
  c400-c7ff : Intel 82801AA-ICH
c800-c8ff : 0000:00:04.0
  c800-c8ff : Intel 82801AA-ICH
c900-c93f : 0000:00:05.0
  c900-c93f : virtio-pci
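
To see which device owns the failing port, the listing above can be searched for the innermost range that contains it. The following is only a sketch: `find_owner` and the `IOPORTS` excerpt are hypothetical helpers written for this report, not part of autotest or qemu.

```python
import re

# Excerpt of the guest's /proc/ioports listing from this report.
IOPORTS = """\
c020-c03f : 0000:00:01.2
  c020-c03f : uhci_hcd
c040-c05f : 0000:00:03.0
  c040-c05f : virtio-pci
c400-c7ff : 0000:00:04.0
  c400-c7ff : Intel 82801AA-ICH
"""

# Matches "<indent><start>-<end> : <name>" with hexadecimal bounds.
LINE_RE = re.compile(r"^(\s*)([0-9a-f]+)-([0-9a-f]+) : (.+)$")


def find_owner(ioports_text, port):
    """Return (start, end, name) of the innermost range containing port, or None."""
    best = None
    for line in ioports_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        indent, start, end, name = m.groups()
        start, end = int(start, 16), int(end, 16)
        if start <= port <= end:
            depth = len(indent)  # deeper indentation means a nested, more specific range
            if best is None or depth >= best[0]:
                best = (depth, start, end, name.strip())
    return None if best is None else best[1:]


start, end, name = find_owner(IOPORTS, 49220)
print(hex(49220), name, hex(49220 - start))  # prints "0xc044 virtio-pci 0x4"
```

Port 49220 is 0xc044, which is offset 4 inside the virtio-pci range of device 0000:00:03.0.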

Comment 2 RHEL Program Management 2010-05-13 05:28:44 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.

Comment 3 Alex Williamson 2010-05-20 14:22:45 UTC
Is this really a valid test?  You're writing 0 to ioport 0xc044, which is clearly assigned to virtio-pci.  There are control structures there that get out of sync if modified outside the driver.  Surely there could be instances of real hardware behaving the same way or worse if a privileged user decides to start poking io space.
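
As a rough illustration of what lives at that address, the sketch below maps a port to a register of the legacy virtio PCI header. The offsets follow the legacy (pre-1.0) virtio PCI layout as I understand it; that the qemu-kvm build in this report uses exactly this layout, and that 0xc040 is the base of the virtio-net I/O BAR, are assumptions taken from the /proc/ioports listing. `register_at` is a hypothetical helper.

```python
# Legacy virtio PCI header registers: (offset, size in bytes, name).
# Assumption: the device exposes the legacy layout without MSI-X fields.
LEGACY_VIRTIO_PCI_REGS = [
    (0x00, 4, "device (host) features"),
    (0x04, 4, "guest features"),
    (0x08, 4, "queue address (PFN)"),
    (0x0C, 2, "queue size"),
    (0x0E, 2, "queue select"),
    (0x10, 2, "queue notify"),
    (0x12, 1, "device status"),
    (0x13, 1, "ISR status"),
]


def register_at(bar_base, port):
    """Name the legacy virtio header register that covers the given port."""
    offset = port - bar_base
    for reg_off, size, name in LEGACY_VIRTIO_PCI_REGS:
        if reg_off <= offset < reg_off + size:
            return name
    return "device-specific config or outside header"


print(register_at(0xC040, 0xC044))  # prints "guest features"
```

Under these assumptions the write lands in the guest-features register, which would be consistent with the driver and the device disagreeing about the virtio-net header format afterwards. The thread itself does not confirm this as the exact mechanism.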

Comment 4 Amos Kong 2010-05-20 16:12:10 UTC
I found this 'bug' by executing the iofuzz test case of autotest, and verified it manually.

(http://patchwork.test.kernel.org/patch/2155/)
    The design of iofuzz is simple: it just generate random I/O port
    activity inside the virtual machine. The correctness of the device
    emulation may be verified through this test.
    
    As the instructions are randomly generated, guest may enter the wrong
    state. The test solve this issue by detect the hang and restart the
    virtual machine.
    
    The test duration could also be adjusted through the "fuzz_count". And
    the parameter "skip_devices" is used to specified the devices which
    should not be used to do the fuzzing.
    
    For current version, every activity were logged and the command was
    sent through a session between host and guest. Through this method may
    slow down the whole test but it works well. The enumeration was done
    through /proc/ioports and the scenario of activity is not aggressive.
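
The design described above can be sketched roughly as follows. This is not the autotest implementation: `plan_fuzz`, `to_command`, the `RANGES` table, and the shell commands are hypothetical and only modelled on the description and on the reproduction step in this report. Only the `fuzz_count` and `skip_devices` parameter names come from the quoted text.

```python
import random

# (start, end, name) tuples, as they could be parsed from /proc/ioports.
RANGES = [
    (0xC020, 0xC03F, "uhci_hcd"),
    (0xC040, 0xC05F, "virtio-pci"),
    (0xC400, 0xC7FF, "Intel 82801AA-ICH"),
]


def plan_fuzz(ranges, fuzz_count, skip_devices=(), seed=None):
    """Build a list of random ("read"|"write", port, value) operations."""
    rng = random.Random(seed)
    # Drop every range whose name contains one of the skipped device names.
    candidates = [r for r in ranges
                  if not any(skip in r[2] for skip in skip_devices)]
    if not candidates:
        return []
    ops = []
    for _ in range(fuzz_count):
        start, end, _name = rng.choice(candidates)
        port = rng.randint(start, end)
        if rng.random() < 0.5:
            ops.append(("read", port, None))
        else:
            ops.append(("write", port, rng.randint(0, 255)))
    return ops


def to_command(op):
    """Render one operation as a guest shell command using /dev/port."""
    kind, port, value = op
    if kind == "read":
        return "dd if=/dev/port skip=%d bs=1 count=1 | od -An -tx1" % port
    # printf with a three-digit octal escape emits exactly one byte.
    return "printf '\\%03o' | dd of=/dev/port seek=%d bs=1 count=1" % (value, port)


for op in plan_fuzz(RANGES, 3, skip_devices=("virtio-pci",), seed=0):
    print(to_command(op))
```

Passing `skip_devices=("virtio-pci",)` keeps the fuzzer away from the range that triggers the exit reported here.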

Comment 5 Alex Williamson 2010-05-20 16:38:11 UTC
Thanks Amos, so if I understand the test, the failing condition is that qemu exits rather than simply restarting the VM, which would be considered acceptable.

Comment 6 Dor Laor 2010-05-20 20:24:06 UTC
IMO this is rather low priority; there is not a huge difference between a guest crash and a reboot. It is not a security issue for either the guest or the host. I would rather close it as WONTFIX. Amos, please respond if you think otherwise.

Comment 7 Amos Kong 2010-05-21 03:19:38 UTC
I suggest adding some fault tolerance to virtio-net rather than calling exit().
It could recover the virtio device from the error state, or inject an interrupt to let the guest know what happened.

Just my opinion.

Comment 8 Amit Shah 2010-06-08 07:17:22 UTC
Recovering a guest from such external writes is not possible. It's impossible to maintain all the state that would be necessary to recover from such illegal writes.

I think the point of the test is to write to random locations in the IO space and find out the response of the guest or the hypervisor.

I also think it's perfectly valid for qemu to exit. The testsuite can re-start the VM, as mentioned in the link (the testsuite seems to currently only expect guest hangs, not guest shutdowns, and in that case, the testsuite should be fixed).
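
One way a test suite could treat a qemu exit like a guest hang is sketched below. The names are hypothetical: `classify` is not an autotest function, and `guest_responds` stands in for whatever liveness check the suite already uses (for example, a command sent over the guest session).

```python
import subprocess
import sys


def classify(qemu_proc, guest_responds):
    """Return "qemu-exited", "guest-hung", or "ok" for a running test VM."""
    # Popen.poll() returns None while the process is still running.
    if qemu_proc.poll() is not None:
        return "qemu-exited"
    if not guest_responds():
        return "guest-hung"
    return "ok"


# Stand-in for a qemu process that has already exited.
dead = subprocess.Popen([sys.executable, "-c", "pass"])
dead.wait()
print(classify(dead, lambda: True))  # prints "qemu-exited"
```

A suite that restarts the VM on both "qemu-exited" and "guest-hung" would keep fuzzing after the exit described in this report.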

I'm not really sure this is a bug, closing as NOTABUG. Please re-open with a different summary line and description if any other behaviour is desired.