658919 – [6.0] netdump client always hangs up on RHEL3.9 kvm guest when e1000 emulation device is selected.



Keywords:
Status:	CLOSED DUPLICATE of bug 603413
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	6.0
Hardware:	x86_64
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Jiri Olsa
QA Contact:	Red Hat Kernel QE team
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	662543

Reported:	2010-12-01 15:58 UTC by asilva
Modified:	2018-11-14 16:20 UTC
CC List:	7 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2011-01-19 01:13:43 UTC
Target Upstream Version:
Embargoed:

Attachments
Error Screenshot (12.87 KB, image/png) 2010-12-01 15:58 UTC, asilva
Screenshot (27.89 KB, image/png) 2010-12-03 10:15 UTC, Chao Ye
tcpdump of the netdump packets (31.76 KB, application/octet-stream) 2011-01-03 10:07 UTC, Jiri Olsa
RHEL3: disable udp checksum check for netpoll (438 bytes, patch) 2011-01-03 18:36 UTC, Jiri Olsa

Description asilva 2010-12-01 15:58:41 UTC

Created attachment 464036 [details]
Error Screenshot

> Description of problem:
Netdump client always hangs up on RHEL3.9 kvm guest when e1000 emulation device is selected.

Here is the log
------
< netdump activated - performing handshake with the server. >
------
After the message above appears on the console,
nothing further is displayed.
See the attached Screenshot.png for all messages on the console.

> Version:
Red Hat Enterprise Linux Version Number: 6.0
Release Number: Partner GA
Architecture: x86_64
Kernel Version: 2.6.32-71.el6.x86_64 

How reproducible:
Always. 

> Steps to Reproduce:
1. Set up the netdump client and server on RHEL3.9 kvm guests. See sysreports for details.

2. Execute the following on the client:
# echo c > /proc/sysrq-trigger

  
> Actual results:
The netdump client hangs up.

> Expected results:
After completing netdump, the vmcore is normally collected on the server.

Comment 1 Chao Ye 2010-12-03 10:15:03 UTC

Created attachment 464539 [details]
Screenshot

Hi,

I installed two guests on a RHEL6 host:
====================================================
[root@intel-s3e36-01 ~]# virsh list
 Id Name                 State
----------------------------------
  7 rhel3.9_x86_64_hvm   running
  8 rhel3.9_i386_hvm     running
[root@intel-s3e36-01 ~]# rpm -q kernel
kernel-2.6.32-71.el6.x86_64

rhel3.9_x86_64_hvm was set up as the netdump server, rhel3.9_i386_hvm as the client.
When I trigger a crash, the client starts to dump but then seems to hang.
Here is my guest xml:
====================================================
[root@intel-s3e36-01 ~]# cat /etc/libvirt/qemu/rhel3.9_i386_hvm.xml 
<domain type='kvm'>
  <name>rhel3.9_i386_hvm</name>
  <uuid>da097a4c-9798-7145-1b6e-1b87652c9429</uuid>
  <memory>2097152</memory>
  <currentMemory>2097152</currentMemory>
  <vcpu>2</vcpu>
  <os>
    <type arch='x86_64' machine='rhel6.0.0'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source file='/var/lib/libvirt/images/rhel3.9_i386_hvm.img'/>
      <target dev='hda' bus='ide'/>
      <address type='drive' controller='0' bus='0' unit='0'/>
    </disk>
    <controller type='ide' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:79:a4:a7'/>
      <source bridge='br0'/>
      <model type='e1000'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target port='0'/>
    </console>
    <input type='mouse' bus='ps2'/>
    <graphics type='vnc' port='-1' autoport='yes'/>
    <sound model='ac97'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </sound>
    <video>
      <model type='cirrus' vram='9216' heads='1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </memballoon>
  </devices>
</domain>

Both rhel3.9_x86_64_hvm and rhel3.9_i386_hvm use a bridged interface with the e1000 model.
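
To confirm which NIC model a guest uses without reading the whole file, the relevant fields can be pulled out of the domain XML. The following is a minimal sketch using only the Python standard library; the helper name `nic_models` is hypothetical, and the embedded XML is a trimmed copy of the interface stanza from the guest XML above.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <interface> stanza from the guest XML above.
DOMAIN_XML = """
<domain type='kvm'>
  <name>rhel3.9_i386_hvm</name>
  <devices>
    <interface type='bridge'>
      <mac address='52:54:00:79:a4:a7'/>
      <source bridge='br0'/>
      <model type='e1000'/>
    </interface>
  </devices>
</domain>
"""


def nic_models(domain_xml):
    """Return a (mac, bridge, model) tuple for each <interface> element.

    Hypothetical helper for illustration; missing child elements yield None.
    """
    root = ET.fromstring(domain_xml)
    result = []
    for iface in root.findall("./devices/interface"):
        mac = iface.find("mac")
        source = iface.find("source")
        model = iface.find("model")
        result.append((
            mac.get("address") if mac is not None else None,
            source.get("bridge") if source is not None else None,
            model.get("type") if model is not None else None,
        ))
    return result


print(nic_models(DOMAIN_XML))
```

On a real host the same function could be fed the output of `virsh dumpxml` for the guest instead of the embedded string.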

Comment 2 Neil Horman 2010-12-03 12:06:11 UTC

can you use xendump to retrieve a core of the guest after it hangs?

Comment 3 Neil Horman 2010-12-03 12:07:07 UTC

Scratch that. Given it's kvm, can you instead use the qemu gdb service to attach to the hung guest and get a dump or backtrace of it in its hung state?

Comment 4 Neil Horman 2010-12-06 14:50:31 UTC

Triage assignment.  If you feel this bug doesn't belong to you, or that it cannot be handled in a timely fashion, please contact me for re-assignment

Comment 5 Jiri Olsa 2011-01-03 10:07:56 UTC

Created attachment 471457 [details]
tcpdump of the netdump packets

192.168.122.55  is the netdump server
192.168.122.135 is the netdump client

Comment 6 Jiri Olsa 2011-01-03 10:10:26 UTC

I was able to reproduce the issue and made a tcpdump capture (attached in c5).
I can see some malformed packets being sent from the client.

By the same token, I found that not all devices were always supported for netdump.
I found an old doc that does not include e1000 in its list:

http://www.redhat.com/support/wpapers/redhat/netdump/setup.html
 (search for "support")

but I haven't found anything for RHEL3 explicitly... any idea?
Any input appreciated, I'll continue to work on it.

thanks,
jirka

Comment 7 Jiri Olsa 2011-01-03 18:36:42 UTC

Created attachment 471534 [details]
RHEL3: disable udp checksum check for netpoll

workaround

Comment 8 Jiri Olsa 2011-01-03 18:41:23 UTC

It looks like the e1000 netpoll function fails to properly checksum
received packets. Given it's qemu e1000 emulation, it might be a bug
in the emulation itself... hw checksums?

If I disable the udp checksum validation completely for netdump,
it works and I get the full vmcore to the server.

Probably need to find an e1000 expert. :)

jirka
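
For readers unfamiliar with the check being disabled here, the sketch below shows what UDP checksum validation over IPv4 amounts to (RFC 768: a 16-bit one's-complement sum over a pseudo-header plus the UDP segment). It is an illustration in Python, not the RHEL3 netpoll code; the function names, the port number, and the payload are assumptions made for the example. The addresses are the client and server from comment 5.

```python
import socket
import struct


def ones_complement_sum(data):
    """16-bit one's-complement sum of data, zero-padded to an even length."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    # Fold carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header(src_ip, dst_ip, udp_len):
    """IPv4 pseudo-header: source, destination, zero, protocol 17, UDP length."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, 17, udp_len))


def build_udp(src_ip, dst_ip, sport, dport, payload):
    """Build a UDP segment (header + payload) with a valid checksum."""
    length = 8 + len(payload)
    header = struct.pack("!HHHH", sport, dport, length, 0)
    pseudo = pseudo_header(src_ip, dst_ip, length)
    csum = ~ones_complement_sum(pseudo + header + payload) & 0xFFFF
    csum = csum or 0xFFFF  # a computed 0 is transmitted as 0xFFFF
    return struct.pack("!HHHH", sport, dport, length, csum) + payload


def udp_checksum_ok(src_ip, dst_ip, segment):
    """Receiver-side check: the sum including the checksum field must be 0xFFFF."""
    pseudo = pseudo_header(src_ip, dst_ip, len(segment))
    return ones_complement_sum(pseudo + segment) == 0xFFFF


# Port 6666 and the payload are illustrative values only.
CLIENT, SERVER = "192.168.122.135", "192.168.122.55"
good = build_udp(CLIENT, SERVER, 6666, 6666, b"netdump!")
bad = good[:-1] + b"?"  # corrupt the last payload byte
print(udp_checksum_ok(CLIENT, SERVER, good), udp_checksum_ok(CLIENT, SERVER, bad))
```

A receiver that drops every packet failing this check will never see the server's replies if the bytes or length handed to it differ from what was sent, which is consistent with the hang after the handshake message.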

Comment 9 RHEL Program Management 2011-01-07 04:50:24 UTC

This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unfortunately unable to
address this request at this time. Red Hat invites you to
ask your support representative to propose this request, if
appropriate and relevant, in the next release of Red Hat
Enterprise Linux. If you would like it considered as an
exception in the current release, please ask your support
representative.

Comment 10 Suzanne Logcher 2011-01-07 16:19:07 UTC

This request was erroneously denied for the current release of Red Hat
Enterprise Linux.  The error has been fixed and this request has been
re-proposed for the current release.

Comment 11 Jiri Olsa 2011-01-17 09:58:52 UTC

being worked on, update from Chris Wright:

On Fri, Jan 14, 2011 at 09:58:30AM -0800, Chris Wright wrote:
> The upstream sf driver is the same.  But I think I finally have an idea
> of what's going wrong.  I spent way too much time chasing down a red
> herring on the tx path only to realize it's just fine.
> 
> I'm building some test qemu-kvm binaries w/ patches to the e1000
> emulation to test today.  I'll update you when I've got results from
> that.

Comment 12 Chris Wright 2011-01-19 01:13:43 UTC

This is an issue with the length value we put in the rx descriptor.  The guest asked for SECRC, but we act as if we are sending the full frame with final ethernet CRC.  This should be fixed in anything >= qemu-kvm-0.12.1.2-2.119.el6, and is a duplicate of bugzilla 603413.  Please re-open if testing shows that it's not working with newer qemu-kvm.  I tested and found I could recreate and fix the problem with fixes similar to the patch associated with bz 603413.

*** This bug has been marked as a duplicate of bug 603413 ***
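
The length rule described in comment 12 can be sketched as follows. This is a Python illustration under stated assumptions, not the qemu-kvm patch itself: the function name `rx_descriptor_length` is hypothetical, and the `E1000_RCTL_SECRC` value is the one used in the Linux e1000 driver headers, quoted from memory.

```python
# "Strip Ethernet CRC" bit in the e1000 receive control register (RCTL).
# Value as in the Linux e1000 driver headers; treat it as an assumption here.
E1000_RCTL_SECRC = 0x04000000
ETH_FCS_LEN = 4  # length of the trailing Ethernet CRC (FCS) in bytes


def rx_descriptor_length(frame_len_without_fcs, rctl):
    """Length an emulated NIC should report in the rx descriptor.

    If the guest set SECRC, it expects the CRC to be stripped, so the
    reported length must not include the 4 FCS bytes.
    """
    if rctl & E1000_RCTL_SECRC:
        return frame_len_without_fcs
    return frame_len_without_fcs + ETH_FCS_LEN


print(rx_descriptor_length(60, E1000_RCTL_SECRC))  # guest asked for CRC stripping
print(rx_descriptor_length(60, 0))                 # guest expects the full frame
```

Reporting the longer length when SECRC is set hands the guest 4 bytes it did not expect at the end of each frame, which matches the symptom of received packets being rejected.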

Comment 13 Jiri Olsa 2011-01-19 10:39:45 UTC

I tried with qemu-kvm-0.12.1.2-2.129 and the netdump works properly

thanks a lot,
jirka
