Bug 516022

Summary: virtio-net fails to transmit any packets, gives "Network is unreachable" errors
Product: [Fedora] Fedora
Reporter: Richard W.M. Jones <rjones>
Component: qemu
Assignee: Glauber Costa <gcosta>
Status: CLOSED RAWHIDE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: medium
Version: rawhide
CC: berrange, dwmw2, gcosta, itamar, jaswinder, markmc, mbooth, mst, virt-maint
Hardware: All
OS: Linux
Fixed In Version: qemu-0.10.91-0.5.rc1.fc12
Doc Type: Bug Fix
Last Closed: 2009-08-07 09:25:51 UTC
Bug Blocks: 498969    

Description Richard W.M. Jones 2009-08-06 12:30:07 UTC
Description of problem:

This is on Rawhide, booting an appliance like this:

/usr/bin/qemu-kvm \
  -drive file=/tmp/test.img,cache=off,if=virtio \
  -m 500 -no-reboot \
  -kernel /tmp/libguestfsJMrUXK/kernel \
  -initrd /tmp/libguestfsJMrUXK/initrd \
  -append 'panic=1 console=ttyS0 udevtimeout=300 noapic acpi=off cgroup_disable=memory selinux=0 guestfs=10.0.2.4:6666 guestfs_verbose=1' \
  -nographic -serial stdio \
  -net channel,6666:unix:/tmp/libguestfsJMrUXK/sock,server,nowait \
  -net user,vlan=0,net=10.0.2.0/8 \
  -net nic,model=virtio,vlan=0 \     <===== NB
  -no-hpet -rtc-td-hack

When the appliance boots, it does this inside the guest:

  /sbin/ifconfig lo 127.0.0.1
  /sbin/ifconfig eth0 10.0.2.10
  /sbin/route add default gw 10.0.2.2
  ping -n -v -c 5 10.0.2.2

Ping fails with:

  PING 10.0.2.2 (10.0.2.2) 56(84) bytes of data.
  From 10.0.2.10 icmp_seq=2 Destination Host Unreachable
  From 10.0.2.10 icmp_seq=3 Destination Host Unreachable
  From 10.0.2.10 icmp_seq=4 Destination Host Unreachable

Also connections from inside the guest fail with:

  connect: Network is unreachable

Previously this worked fine to set up the network.
However, in Rawhide it fails most of the time.

The virtio_net module is loaded, and there are no kernel
errors or other messages to indicate any problem.

If I change the line marked "NB" above to:

  -net nic,model=ne2k_pci,vlan=0

(and obviously load the ne2k-pci driver instead of virtio_net)
then the test *works*.

Note: The only change in the test is replacing virtio with ne2k-pci,
so it does appear to be a problem with virtio somewhere.

Version-Release number of selected component (if applicable):

qemu-0.10.91-0.4.rc1.fc12.x86_64
kernel (in guest) 2.6.31-0.24.rc0.git18.fc11.x86_64

How reproducible:

Not 100% of the time, but quite close to it.

Steps to Reproduce:

With libguestfs:
guestfish -v alloc /tmp/test.img 200M : run
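The failure does not happen on every run, so a single pass or failure says little. The minimal POSIX sh sketch below reruns a command N times and prints how many of those runs exited non-zero. The helper name count_failures is hypothetical. The sketch also assumes that guestfish exits with a non-zero status when the appliance fails to come up, which this report does not confirm.

```shell
#!/bin/sh
# count_failures N CMD [ARGS...]
# Hypothetical helper: run CMD N times and print how many runs exited non-zero.
count_failures() {
  n=$1
  shift
  fail=0
  i=0
  while [ "$i" -lt "$n" ]; do
    # Discard output; only the exit status is counted.
    if ! "$@" >/dev/null 2>&1; then
      fail=$((fail + 1))
    fi
    i=$((i + 1))
  done
  echo "$fail"
}

# Usage with the reproducer from this report (assumes guestfish is installed
# and exits non-zero when the appliance fails to start):
#   count_failures 10 guestfish -v alloc /tmp/test.img 200M : run
```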

Comment 1 Richard W.M. Jones 2009-08-06 12:36:28 UTC
Also tried the newest kernel, same problem:

2.6.31-0.125.rc5.git2.fc12.x86_64

Comment 2 Mark McLoughlin 2009-08-06 21:58:56 UTC
Oh, man

First, I haven't been able to reproduce with a 'normal' guest - not using slirp, vmchannel, the same nic config commands libguestfs uses etc. etc.

But I can reproduce with libguestfs, so I bisected it

The first commit that causes this regression is:

  http://git.savannah.gnu.org/cgit/qemu.git/commit/?id=566e2d3e88

That's MSI support. Okay, well, adding vectors=0 to the NIC options (e.g. '-net nic,model=virtio,vectors=0') works around that

Well, it's not as easy as that: there's actually another regression. vectors=0 works right up until this commit:

  http://git.savannah.gnu.org/cgit/qemu.git/commit/?id=9f8bd0421d

That's some slirp refactoring

i.e. we've two regressions to investigate here, which we've only managed to reproduce with libguestfs so far

Fun!
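The comment does not say how the bisection was driven. One way to automate it is `git bisect run`, which treats exit status 0 as good, 125 as skip, and any other status from 1 to 127 as bad. Because the bug shows up on most boots but not all, the minimal POSIX sh sketch below counts a revision as good only if every one of N runs succeeds. The function name bisect_verdict and the ./repro.sh script it would call are hypothetical. Here ./repro.sh stands for the libguestfs reproducer run against the freshly built qemu. For the second bisection it would also need the vectors=0 workaround.

```shell
#!/bin/sh
# bisect_verdict N CMD [ARGS...]
# Hypothetical helper for a `git bisect run` wrapper: returns 0 (good) only if
# CMD succeeds on all N runs, and 1 (bad) as soon as any run fails.
bisect_verdict() {
  n=$1
  shift
  i=0
  while [ "$i" -lt "$n" ]; do
    if ! "$@" >/dev/null 2>&1; then
      return 1    # any failure marks this revision bad
    fi
    i=$((i + 1))
  done
  return 0        # all N runs passed: mark this revision good
}
```

A wrapper script (name hypothetical) might first run `make >/dev/null 2>&1 || exit 125` so that revisions which fail to build are skipped. It would then call `bisect_verdict 5 ./repro.sh` and finish with `exit $?`. The script would be driven by `git bisect start <bad> <good>` followed by `git bisect run ./wrapper.sh`. A larger N lowers the chance of marking a bad revision good by luck, at the cost of slower steps.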

Comment 3 Richard W.M. Jones 2009-08-06 22:24:07 UTC
Heroic!

Comment 4 Mark McLoughlin 2009-08-07 08:56:33 UTC
Okay, this fixes the slirp regression:

* Fri Aug  7 2009 Mark McLoughlin <markmc> - 2:0.10.91-0.5.rc1
- Fix virtio_net with -net user (#516022)

Now it works with vectors=0 on the NIC but is broken otherwise, i.e. there's still something wrong with MSI

Comment 5 Mark McLoughlin 2009-08-07 09:25:51 UTC
Okay, I can't reproduce the MSI issue now. Perhaps I was confused and it has since been fixed

Closing for now

Comment 6 Richard W.M. Jones 2009-08-07 09:41:11 UTC
Thanks Mark.

Comment 7 Mark McLoughlin 2009-08-07 09:57:28 UTC
f12-alpha tag request:

  https://fedorahosted.org/rel-eng/ticket/2060