Bug 658919

Summary: [6.0] netdump client always hangs up on RHEL3.9 kvm guest when e1000 emulation device is selected.
Product: Red Hat Enterprise Linux 6
Reporter: asilva <asilva>
Component: kernel
Assignee: Jiri Olsa <jolsa>
Status: CLOSED DUPLICATE
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: high
Priority: high
Version: 6.0
CC: agospoda, chrisw, cye, jmunilla, jolsa, nhorman, tgraf
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2011-01-19 01:13:43 UTC
Bug Blocks: 662543    
Attachments:
Description                                      Flags
Error Screenshot                                 none
Screenshot                                       none
tcpdump of the netdump packets                   none
RHEL3: disable udp checksum check for netpoll    none

Description asilva 2010-12-01 15:58:41 UTC
Created attachment 464036 [details]
Error Screenshot

> Description of problem:
The netdump client always hangs on a RHEL3.9 KVM guest when the e1000 emulation device is selected.

Here is the log:
------
< netdump activated - performing handshake with the server. >
------
After the above message appears on the console, nothing further is displayed.
See the attached Screenshot.png for all messages on the console.

> Version:
Red Hat Enterprise Linux Version Number: 6.0
Release Number: Partner GA
Architecture: x86_64
Kernel Version: 2.6.32-71.el6.x86_64 

> How reproducible:
Always. 

> Steps to Reproduce:
1. Set up the netdump client and server on RHEL3.9 KVM guests. See the sysreports for details.

2. Execute the following on the client:
# echo c > /proc/sysrq-trigger

  
> Actual results:
The netdump client hangs.

> Expected results:
The netdump completes and the vmcore is collected on the server.

Comment 1 Chao Ye 2010-12-03 10:15:03 UTC
Created attachment 464539 [details]
Screenshot

Hi,

I installed two guests on a RHEL6 host:
====================================================
[root@intel-s3e36-01 ~]# virsh list
 Id Name                 State
----------------------------------
  7 rhel3.9_x86_64_hvm   running
  8 rhel3.9_i386_hvm     running
[root@intel-s3e36-01 ~]# rpm -q kernel
kernel-2.6.32-71.el6.x86_64

rhel3.9_x86_64_hvm was set up as the netdump server and rhel3.9_i386_hvm as the client.
When I trigger a crash, it starts to dump but then seems to hang.
Here is my guest XML:
====================================================
[root@intel-s3e36-01 ~]# cat /etc/libvirt/qemu/rhel3.9_i386_hvm.xml 
<domain type='kvm'>
  <name>rhel3.9_i386_hvm</name>
  <uuid>da097a4c-9798-7145-1b6e-1b87652c9429</uuid>
  <memory>2097152</memory>
  <currentMemory>2097152</currentMemory>
  <vcpu>2</vcpu>
  <os>
    <type arch='x86_64' machine='rhel6.0.0'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source file='/var/lib/libvirt/images/rhel3.9_i386_hvm.img'/>
      <target dev='hda' bus='ide'/>
      <address type='drive' controller='0' bus='0' unit='0'/>
    </disk>
    <controller type='ide' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:79:a4:a7'/>
      <source bridge='br0'/>
      <model type='e1000'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target port='0'/>
    </console>
    <input type='mouse' bus='ps2'/>
    <graphics type='vnc' port='-1' autoport='yes'/>
    <sound model='ac97'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </sound>
    <video>
      <model type='cirrus' vram='9216' heads='1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </memballoon>
  </devices>
</domain>

Both rhel3.9_x86_64_hvm and rhel3.9_i386_hvm use a bridged interface with the e1000 model.
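
To double-check which NIC model a guest uses, the model can be read straight from the domain XML. The following is a minimal sketch in Python, assuming the XML layout shown in this comment; the helper name `nic_models` and the trimmed `DOMAIN_XML` sample are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <interface> section from the guest XML in this comment.
DOMAIN_XML = """
<domain type='kvm'>
  <name>rhel3.9_i386_hvm</name>
  <devices>
    <interface type='bridge'>
      <mac address='52:54:00:79:a4:a7'/>
      <source bridge='br0'/>
      <model type='e1000'/>
    </interface>
  </devices>
</domain>
"""


def nic_models(domain_xml):
    """Return (interface type, model type) for every NIC in a domain XML string."""
    root = ET.fromstring(domain_xml)
    result = []
    for iface in root.findall("./devices/interface"):
        model = iface.find("model")
        # A NIC without an explicit <model> element is reported as None.
        result.append((iface.get("type"), model.get("type") if model is not None else None))
    return result


print(nic_models(DOMAIN_XML))
```

Running the same helper over both guests' XML files would show whether each one is really configured with `bridge` and `e1000`.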

Comment 2 Neil Horman 2010-12-03 12:06:11 UTC
Can you use xendump to retrieve a core of the guest after it hangs?

Comment 3 Neil Horman 2010-12-03 12:07:07 UTC
Scratch that. Given that it's KVM, can you instead use the qemu gdb service to attach to the hung guest and get a dump or backtrace of it in its hung state?

Comment 4 Neil Horman 2010-12-06 14:50:31 UTC
Triage assignment. If you feel this bug doesn't belong to you, or that it cannot be handled in a timely fashion, please contact me for re-assignment.

Comment 5 Jiri Olsa 2011-01-03 10:07:56 UTC
Created attachment 471457 [details]
tcpdump of the netdump packets

192.168.122.55  is the netdump server
192.168.122.135 is the netdump client

Comment 6 Jiri Olsa 2011-01-03 10:10:26 UTC
I was able to reproduce the issue and made a tcpdump capture (attached in comment 5).
I can see some malformed packets being sent from the client.

I also found that netdump has not always supported every device.
I found an old document that does not include e1000 in its list of supported devices:

http://www.redhat.com/support/wpapers/redhat/netdump/setup.html
 (search for "support")

But I haven't found anything for RHEL3 explicitly. Any ideas?
Any input is appreciated; I will continue to work on it.

thanks,
jirka

Comment 7 Jiri Olsa 2011-01-03 18:36:42 UTC
Created attachment 471534 [details]
RHEL3: disable udp checksum check for netpoll

workaround

Comment 8 Jiri Olsa 2011-01-03 18:41:23 UTC
It looks like the e1000 netpoll function fails to properly checksum
received packets. Given that this is qemu's e1000 emulation, it might be a bug
in the emulation itself... hw checksums?

If I disable the UDP checksum validation completely for netdump,
it works and I get the full vmcore on the server.

I probably need to find an e1000 expert. :)

jirka
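
For readers unfamiliar with the check being disabled here, the sketch below shows how UDP checksum validation works in general (RFC 768: a 16-bit ones' complement sum over an IPv4 pseudo-header plus the UDP header and payload). It is an illustration in Python, not the RHEL3 netpoll code. The function names, the port number and the payload are assumptions; the addresses are the server and client from comment 5. The last case feeds the check a buffer that is 4 bytes too long, the kind of length mismatch comment 12 describes; whether the netpoll code summed over exactly such a buffer is an assumption.

```python
import socket
import struct


def ones_complement_sum(data):
    """Folded 16-bit ones' complement sum; odd-length data is zero-padded."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header(src_ip, dst_ip, udp_len):
    """IPv4 pseudo-header: source, destination, zero byte, protocol, UDP length."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, socket.IPPROTO_UDP, udp_len))


def build_udp(src_ip, dst_ip, sport, dport, payload):
    """Build a UDP header plus payload with a correct checksum."""
    length = 8 + len(payload)
    header = struct.pack("!HHHH", sport, dport, length, 0)
    data = pseudo_header(src_ip, dst_ip, length) + header + payload
    csum = (~ones_complement_sum(data)) & 0xFFFF
    if csum == 0:
        csum = 0xFFFF  # zero on the wire means "no checksum"
    return struct.pack("!HHHH", sport, dport, length, csum) + payload


def udp_checksum_ok(src_ip, dst_ip, segment):
    """Validate a received buffer, treating its whole length as the UDP segment."""
    (stored,) = struct.unpack("!H", segment[6:8])
    if stored == 0:
        return True  # sender did not compute a checksum
    data = pseudo_header(src_ip, dst_ip, len(segment)) + segment
    return ones_complement_sum(data) == 0xFFFF


SERVER, CLIENT = "192.168.122.55", "192.168.122.135"
seg = build_udp(SERVER, CLIENT, 6666, 6666, b"netdump!")  # port and payload are illustrative
print(udp_checksum_ok(SERVER, CLIENT, seg))                        # intact segment
print(udp_checksum_ok(SERVER, CLIENT, seg + b"\xde\xad\xbe\xef"))  # 4 stray trailing bytes
```

The intact segment validates, while the buffer with four stray trailing bytes does not, which is consistent with the handshake stalling unless validation is switched off.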

Comment 9 RHEL Program Management 2011-01-07 04:50:24 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unfortunately unable to
address this request at this time. Red Hat invites you to
ask your support representative to propose this request, if
appropriate and relevant, in the next release of Red Hat
Enterprise Linux. If you would like it considered as an
exception in the current release, please ask your support
representative.

Comment 10 Suzanne Logcher 2011-01-07 16:19:07 UTC
This request was erroneously denied for the current release of Red Hat
Enterprise Linux.  The error has been fixed and this request has been
re-proposed for the current release.

Comment 11 Jiri Olsa 2011-01-17 09:58:52 UTC
Being worked on. Update from Chris Wright:

On Fri, Jan 14, 2011 at 09:58:30AM -0800, Chris Wright wrote:
> The upstream sf driver is the same.  But I think I finally have an idea
> of what's going wrong.  I spent way too much time chasing down a red
> herring on the tx path only to realize it's just fine.
> 
> I'm building some test qemu-kvm binaries w/ patches to the e1000
> emulation to test today.  I'll update you when I've got results from
> that.

Comment 12 Chris Wright 2011-01-19 01:13:43 UTC
This is an issue with the length value we put in the rx descriptor.  The guest asked for SECRC (strip Ethernet CRC), but we act as if we are sending the full frame including the final Ethernet CRC.  This should be fixed in anything >= qemu-kvm-0.12.1.2-2.119.el6, and is a duplicate of bug 603413.  Please re-open if testing shows that it's not working with a newer qemu-kvm.  I tested and found I could recreate the problem and fix it with changes similar to the patch associated with bug 603413.

*** This bug has been marked as a duplicate of bug 603413 ***
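
The length problem described in this comment can be sketched as follows. This is a Python illustration of the idea only, not the qemu-kvm patch: the function names are hypothetical, `frame_len` is assumed to be the frame size without the 4-byte Ethernet CRC, and the `E1000_RCTL_SECRC` bit value is the one used in the Linux e1000 driver headers.

```python
E1000_RCTL_SECRC = 0x04000000  # RCTL bit "Strip Ethernet CRC" (value as in the Linux e1000 headers)
ETH_FCS_LEN = 4                # size of the Ethernet CRC (frame check sequence) in bytes


def fcs_len(rctl):
    """CRC bytes the guest expects to see counted in the rx descriptor length."""
    return 0 if rctl & E1000_RCTL_SECRC else ETH_FCS_LEN


def rx_desc_length_buggy(frame_len, rctl):
    """Behaviour described above: always report the frame as if it carried the CRC."""
    return frame_len + ETH_FCS_LEN


def rx_desc_length_fixed(frame_len, rctl):
    """Report the CRC bytes only when the guest did not ask for them to be stripped."""
    return frame_len + fcs_len(rctl)


rctl = E1000_RCTL_SECRC  # the guest driver asked for CRC stripping
print(rx_desc_length_buggy(60, rctl), rx_desc_length_fixed(60, rctl))
```

With SECRC set, the first function reports a length 4 bytes larger than the second, so the guest would see 4 extra bytes at the end of every received frame.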

Comment 13 Jiri Olsa 2011-01-19 10:39:45 UTC
I tried with qemu-kvm-0.12.1.2-2.129 and netdump works properly.

thanks a lot,
jirka