Bug 531958

Summary: kvm guests' network rx (tap+virtio-net) can break during migration
Product: Red Hat Enterprise Linux 5
Component: kvm
Version: 5.6
Hardware: All
OS: Linux
Status: CLOSED INSUFFICIENT_DATA
Severity: medium
Priority: high
Target Milestone: rc
Reporter: Charles Duffy <charles_duffy>
Assignee: Michael S. Tsirkin <mst>
QA Contact: Virtualization Bugs <virt-bugs>
CC: cpelland, jfeeney, koj, llim, tburke, virt-maint
Flags: mst: needinfo+
Doc Type: Bug Fix
Last Closed: 2010-09-12 13:45:19 UTC
Bug Blocks: 580948

Description Charles Duffy 2009-10-30 00:35:06 UTC
Description of problem:

  While migrating paused guests to and from file (as via "virsh save" / "virsh restore"), kvm's networking support can get stuck in a mode in which packets can be sent by the guest but not received. This has been observed with virtio_net; it is not presently known whether the issue can be reproduced with other network adapters (e.g. e1000).

  Running "ip link down" and "ip link up" within the guest does not clear the issue. After removing and reinstalling the virtio_net module, a burst of packets is briefly received, then the prior state resumes. virtio-blk devices configured within the same guest continue to work without issue.

  On the host, the tx_overrun counter for the tap device used by the guest in question occasionally increments if the host attempts to send a sufficiently large number of packets.

  When strace'ing qemu-kvm, no select() calls appear to include the file descriptor for the tap device in their argument list.
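  The strace observation above can be checked mechanically. The following is a minimal Python sketch, not part of the original report: it assumes strace prints select() calls in the shape shown in the comment, and the helper name select_calls_watching is hypothetical. It counts how many select() calls list a given fd (18 for the tap device in this report) in their read or write sets.

```python
import re

# Assumed strace line shape (an illustration, not output from the report):
#   select(19, [3 5 18], [], NULL, {1, 0}) = 1 (in [5], left {0, 999000})
SELECT_RE = re.compile(r"select\(\d+,\s*(\[[^\]]*\]|NULL),\s*(\[[^\]]*\]|NULL),")


def _fds(field):
    """Turn a '[3 5 18]' or 'NULL' field into a set of ints."""
    if field == "NULL":
        return set()
    return {int(tok) for tok in field.strip("[]").split()}


def select_calls_watching(lines, fd):
    """Return (calls_listing_fd, total_select_calls) over the given strace lines.

    Only the read and write fd sets are inspected; the except set is ignored.
    """
    hits = total = 0
    for line in lines:
        match = SELECT_RE.search(line)
        if not match:
            continue
        total += 1
        if fd in _fds(match.group(1)) | _fds(match.group(2)):
            hits += 1
    return hits, total
```

  A result of (0, N) with N > 0 for the tap fd would match the symptom described here: qemu-kvm keeps calling select(), but never with the tap device in the set.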


Version-Release number of selected component (if applicable):

  kvm-83-105.el5_4.9
  kmod-kvm-83-105.el5_4.9
  kernel-2.6.18-164.2.1.el5

  (kernel version is for both guest and host)


How reproducible:

  Happens 100% of the time when restoring a saved VM state on which the issue has triggered, but occurs on a fairly low percentage of save/restore cycles.

  Reproduction is part of an automated QA environment; all OS provisioning and software installation steps leading up to the triggering of this bug are automated (including the points in the install process at which migrate-to-file operations occur), but even so the bug does not trigger with any reliability.

Steps to Reproduce:

1. Load a VM state / disk image combo provided by the Dell MessageOne systems engineering team as exhibiting this problem
    -or-
1. Run save/restore cycles on virtual machines with virtio network adapters using tap-based networking until one restores in such a state as to be unable to communicate with the outside world.
2. Run tcpdump or equivalent tool on both sides
3. Observe that ARP requests from the guest are seen and responded to by the host, but that the receive counter for the guest's ethernet device does not increment.
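
The check in step 3 can be sketched in Python as follows. This is an illustration only: it assumes the guest exposes per-interface counters under /sys/class/net/<iface>/statistics/ and has a Python 3 interpreter available, neither of which is stated in this report. The interface name "eth0" and the function names are hypothetical.

```python
import time
from pathlib import Path


def read_counter(iface, name, sysfs_root="/sys/class/net"):
    """Read one integer counter, e.g. rx_packets, for a network interface."""
    path = Path(sysfs_root) / iface / "statistics" / name
    return int(path.read_text().strip())


def snapshot(iface, sysfs_root="/sys/class/net"):
    """Capture the two counters relevant to this bug."""
    return {
        "rx_packets": read_counter(iface, "rx_packets", sysfs_root),
        "tx_packets": read_counter(iface, "tx_packets", sysfs_root),
    }


def rx_stalled(before, after):
    """True if the guest transmitted in the interval but received nothing."""
    sent = after["tx_packets"] > before["tx_packets"]
    received = after["rx_packets"] > before["rx_packets"]
    return sent and not received


def watch(iface="eth0", interval=5.0):
    """Take two snapshots `interval` seconds apart and classify the result."""
    before = snapshot(iface)
    time.sleep(interval)
    return rx_stalled(before, snapshot(iface))
```

Running watch() inside an affected guest while it sends ARP requests would be expected to return True; on a healthy guest that is receiving traffic it should return False.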

While the reproducer we have right now has a libvirt header on the ramsave file, we are happy to strip that and provide a reproducer which can be run against raw kvm without libvirt if such is preferred.

  
Additional info:

  Marking "Dell Confidential" as reproduction materials may include software confidential to Dell MessageOne.

  This issue was not seen when using upstream qemu-kvm-0.11.0 prior to migration to RHEL5.4's virtualization infrastructure.


Command line:

/usr/libexec/qemu-kvm -S -M pc -m 384 -smp 1 -name fvte-140dd98a761361aea78a6b105ee018413e270738 -uuid 140dd98a-7613-61ae-a78a-6b105ee01841 -monitor unix:/var/lib/libvirt/qemu/fvte-140dd98a761361aea78a6b105ee018413e270738.monitor,server,nowait -no-reboot -boot c -drive file=/local/fvte-q/.fvte/states/vms/140dd98a761361aea78a6b105ee018413e270738/disks/da.qcow2,if=virtio,index=0,boot=on,format=qcow2,cache=none -drive file=,if=floppy,index=0 -drive file=,if=ide,media=cdrom,index=2 -net nic,macaddr=00:16:3e:c6:39:f3,vlan=0,model=virtio -net tap,fd=18,vlan=0 -serial file:/local/fvte-q/.fvte/states/vms/140dd98a761361aea78a6b105ee018413e270738/log/console.log -serial pty -parallel none -usb -vnc 127.0.0.1:0 -vga cirrus -incoming "exec:cat && { echo 'MIGRATION' 'DONE'; } >&2"

Various data from qemu monitor console:

(qemu) info network
VLAN 0 devices:
  tap.0: fd=18
  virtio.0: model=virtio,macaddr=00:16:3e:c6:39:f3
(qemu) info pci
  Bus  0, device   0, function 0:
    Host bridge: PCI device 8086:1237
  Bus  0, device   1, function 0:
    ISA bridge: PCI device 8086:7000
  Bus  0, device   1, function 1:
    IDE controller: PCI device 8086:7010
      BAR4: I/O at 0xc000 [0xc00f].
  Bus  0, device   1, function 2:
    USB controller: PCI device 8086:7020
      IRQ 11.
      BAR4: I/O at 0xc020 [0xc03f].
  Bus  0, device   1, function 3:
    Bridge: PCI device 8086:7113
      IRQ 9.
  Bus  0, device   2, function 0:
    VGA controller: PCI device 1013:00b8
      BAR0: 32 bit memory at 0xc2000000 [0xc3ffffff].
      BAR1: 32 bit memory at 0xc4000000 [0xc4000fff].
  Bus  0, device   3, function 0:
    Ethernet controller: PCI device 1af4:1000
      IRQ 11.
      BAR0: I/O at 0xc040 [0xc05f].
  Bus  0, device   4, function 0:
    SCSI controller: PCI device 1af4:1001
      IRQ 11.
      BAR0: I/O at 0xc080 [0xc0bf].
  Bus  0, device   5, function 0:
    RAM controller: PCI device 1af4:1002
      IRQ 10.
      BAR0: I/O at 0xc0c0 [0xc0df].

Comment 1 Charles Duffy 2009-10-30 16:51:41 UTC
Removing from "Dell Confidential". Will transmit reproducer out-of-band if necessary.

Comment 2 Charles Duffy 2009-11-02 07:13:35 UTC
Built a debug version of kvm (changed -O2 -g in CFLAGS to -O1 -ggdb) to inspect.

Comparing the VirtIONet struct from a single pair of known-good and known-bad savevm images, the following differences jump out at me.

From the good image:

   n->vdev.isr=1
   n->vdev.pci_dev.irq_state = {1,0,0,0}
   n->mergeable_rx_bufs=0

From the bad image:

   n->vdev.isr=0
   n->vdev.pci_dev.irq_state = {0,0,0,0}
   n->mergeable_rx_bufs=1
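
A comparison like the one above can be automated when more fields are dumped. The Python sketch below is an illustration under assumptions: it expects one "name=value" pair per line, as written in this comment, and the function names are hypothetical. The field values are copied from the two listings above.

```python
GOOD_DUMP = """
n->vdev.isr=1
n->vdev.pci_dev.irq_state = {1,0,0,0}
n->mergeable_rx_bufs=0
"""

BAD_DUMP = """
n->vdev.isr=0
n->vdev.pci_dev.irq_state = {0,0,0,0}
n->mergeable_rx_bufs=1
"""


def parse_fields(text):
    """Parse 'name=value' lines into a dict; lines without '=' are skipped."""
    fields = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        name, value = line.split("=", 1)
        fields[name.strip()] = value.strip()
    return fields


def diff_fields(good, bad):
    """Return {name: (good_value, bad_value)} for every field that differs."""
    names = sorted(set(good) | set(bad))
    return {
        name: (good.get(name), bad.get(name))
        for name in names
        if good.get(name) != bad.get(name)
    }


if __name__ == "__main__":
    for name, (g, b) in diff_fields(parse_fields(GOOD_DUMP), parse_fields(BAD_DUMP)).items():
        print(f"{name}: good={g} bad={b}")
```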

Comment 3 Charles Duffy 2009-11-02 16:30:16 UTC
I have a packaged reproducer, with the libvirt dependency removed, containing one known-good and one known-bad sample VM. Its run script has support for, among other things, invoking qemu-kvm through gdb.

As these VMs were taken from our automated testing system, rather than built as minimal reproducers from the ground up, they're a bit larger than what an ideal minimal testcase might comprise -- the archive containing them weighs in at 741MB; with its content decompressed, 4.3GB of working space is needed.

Is this likely to be of use to y'all? If so, is there somewhere I should upload it?

Comment 4 Michael S. Tsirkin 2009-12-01 14:21:33 UTC
Yes, it will be very helpful to get good and bad images
so that we can look at them. You can upload the files to
dropbox:
http://kbase.redhat.com/faq/docs/DOC-2113

After you do, please provide exact filenames, MD5 or SHA1 message digest of the uploaded files.

Comment 5 Charles Duffy 2009-12-08 19:21:28 UTC
(In reply to comment #4)
> Yes, it will be very helpful to get good and bad images
> so that we can look at them. You can upload the files to
> dropbox:
> http://kbase.redhat.com/faq/docs/DOC-2113
> 
> After you do, please provide exact filenames, MD5 or SHA1 message digest of the
> uploaded files.  

rhbz531958-reproducer.pax
MD5:  bc5b1fc8beb1431fcc27d19d1ed7fc50
SHA1: d07029c457d4a3be36d8c85676b09208261d42fc

Comment 6 Michael S. Tsirkin 2009-12-22 09:40:31 UTC
I downloaded the files and the hash matches.
I unpacked the pax archive, but I have trouble decompressing
both disk and ram images. The error I get is: Unexpected end of input

$ sha1sum rhbz531958-reproducer.pax
d07029c457d4a3be36d8c85676b09208261d42fc  rhbz531958-reproducer.pax
$ md5sum rhbz531958-reproducer.pax
bc5b1fc8beb1431fcc27d19d1ed7fc50  rhbz531958-reproducer.pax
$ pax -rvf rhbz531958-reproducer.pax    
rhbz531958-reproducer                                 
rhbz531958-reproducer/run                             
rhbz531958-reproducer/net.setup                       
rhbz531958-reproducer/data.known_good                 
rhbz531958-reproducer/data.known_good/ramsave.xml     
rhbz531958-reproducer/data.known_good/env             
rhbz531958-reproducer/data.known_good/ramsave.raw.xz  
rhbz531958-reproducer/data.known_good/ramsave.argv    
rhbz531958-reproducer/data.known_good/da.qcow2.xz     
rhbz531958-reproducer/data.known_bad                  
rhbz531958-reproducer/data.known_bad/ramsave.xml      
rhbz531958-reproducer/data.known_bad/env              
rhbz531958-reproducer/data.known_bad/ramsave.raw.xz   
rhbz531958-reproducer/data.known_bad/da.qcow2.xz      
rhbz531958-reproducer/README                          
rhbz531958-reproducer/net.up                          
pax: ustar vol 1, 16 files, 775176192 bytes read, 0 bytes written.
$ cd rhbz531958-reproducer/
$ xz -k -d data.known_bad/da.qcow2.xz
xz: data.known_bad/da.qcow2.xz: Unexpected end of input

Am I doing the right thing?
Is this a problem with the uploaded files?

Thanks!
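
A truncated .xz file like the one above can be detected before packaging. The Python sketch below is one possible check using the standard-library lzma module. It is an illustration rather than anything used in this bug, and the function name xz_is_complete is hypothetical. It decompresses the whole stream and discards the output, so it needs no extra disk space.

```python
import lzma


def xz_is_complete(path, chunk_size=1 << 20):
    """Return True if `path` decompresses to the end of the xz stream.

    A truncated file raises EOFError during reading and a corrupt one raises
    lzma.LZMAError; both are reported as False.
    """
    try:
        with lzma.open(path, "rb") as stream:
            # Read and discard the output in chunks to keep memory use flat.
            while stream.read(chunk_size):
                pass
    except (EOFError, lzma.LZMAError):
        return False
    return True
```

Calling xz_is_complete("data.known_bad/da.qcow2.xz") on the first upload would be expected to return False, which corresponds to the "Unexpected end of input" error from xz.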

Comment 7 Charles Duffy 2009-12-22 21:27:12 UTC
The issue is on my end -- the .xz files packaged in that pax archive were indeed corrupt.

A corrected version is uploading presently.

rhbz531958-reproducer.pax
Len:  2528860160 (2.4G)
MD5:  700723a25aec72a66ba725dd0eeace52
SHA1: f085d7ae237df765ce8a6157dba538c4b5be6d12
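
The length and both digests listed above can be verified in a single pass over the file. The Python sketch below uses the standard-library hashlib module. The function names are hypothetical, and the EXPECTED values are copied from this comment.

```python
import hashlib

# Values as posted in this comment for the corrected archive.
EXPECTED = {
    "len": 2528860160,
    "md5": "700723a25aec72a66ba725dd0eeace52",
    "sha1": "f085d7ae237df765ce8a6157dba538c4b5be6d12",
}


def file_digests(path, chunk_size=1 << 20):
    """Return the byte length, MD5 and SHA1 of a file, reading it once."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    size = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            size += len(chunk)
            md5.update(chunk)
            sha1.update(chunk)
    return {"len": size, "md5": md5.hexdigest(), "sha1": sha1.hexdigest()}


def matches(path, expected):
    """True only if length and both digests agree with the expected values."""
    return file_digests(path) == expected
```

For example, matches("rhbz531958-reproducer.pax", EXPECTED) should return True for an intact download of the corrected archive. Comparing the length first is a cheap way to spot an incomplete transfer.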

Comment 9 Michael S. Tsirkin 2010-02-10 15:07:05 UTC
I think this is fixed in latest kvm: kvm-83-105.el5_4.22
Specifically, after running this command:
env DATA_DIR=data.known_bad ./run -nographic
I was able to ssh into guest at address 192.168.0.2

In other words, after updating kvm, it can load the
image and networking works.

Charles, could you confirm this please?

Comment 11 Michael S. Tsirkin 2010-03-02 16:40:50 UTC
No info yet, so postponing to 5.6.

Comment 12 Ludek Smid 2010-03-09 09:00:42 UTC
Since it is too late to address this issue in RHEL 5.5, it has been proposed for RHEL 5.6. Contact your support representative if you need to escalate this issue.