Bug 658919
Summary: | [6.0] netdump client always hangs up on RHEL3.9 kvm guest when e1000 emulation device is selected.
---|---
Product: | Red Hat Enterprise Linux 6
Component: | kernel
Status: | CLOSED DUPLICATE
Severity: | high
Priority: | high
Version: | 6.0
Target Milestone: | rc
Hardware: | x86_64
OS: | Linux
Doc Type: | Bug Fix
Reporter: | asilva <asilva>
Assignee: | Jiri Olsa <jolsa>
QA Contact: | Red Hat Kernel QE team <kernel-qe>
CC: | agospoda, chrisw, cye, jmunilla, jolsa, nhorman, tgraf
Last Closed: | 2011-01-19 01:13:43 UTC
Bug Blocks: | 662543
Description
asilva
2010-12-01 15:58:41 UTC
Created attachment 464539 [details]
Screenshot
Hi,
I installed two guests on a RHEL6 host:
====================================================
[root@intel-s3e36-01 ~]# virsh list
Id Name State
----------------------------------
7 rhel3.9_x86_64_hvm running
8 rhel3.9_i386_hvm running
[root@intel-s3e36-01 ~]# rpm -q kernel
kernel-2.6.32-71.el6.x86_64
rhel3.9_x86_64_hvm was set up as the netdump server and rhel3.9_i386_hvm as the client.
When I trigger a crash, the client starts to dump, but then it appears to hang.
Here is my guest XML:
====================================================
[root@intel-s3e36-01 ~]# cat /etc/libvirt/qemu/rhel3.9_i386_hvm.xml
<domain type='kvm'>
  <name>rhel3.9_i386_hvm</name>
  <uuid>da097a4c-9798-7145-1b6e-1b87652c9429</uuid>
  <memory>2097152</memory>
  <currentMemory>2097152</currentMemory>
  <vcpu>2</vcpu>
  <os>
    <type arch='x86_64' machine='rhel6.0.0'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source file='/var/lib/libvirt/images/rhel3.9_i386_hvm.img'/>
      <target dev='hda' bus='ide'/>
      <address type='drive' controller='0' bus='0' unit='0'/>
    </disk>
    <controller type='ide' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:79:a4:a7'/>
      <source bridge='br0'/>
      <model type='e1000'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target port='0'/>
    </console>
    <input type='mouse' bus='ps2'/>
    <graphics type='vnc' port='-1' autoport='yes'/>
    <sound model='ac97'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </sound>
    <video>
      <model type='cirrus' vram='9216' heads='1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </memballoon>
  </devices>
</domain>
Both rhel3.9_x86_64_hvm and rhel3.9_i386_hvm use a bridge and the e1000 model.
Can you use xendump to retrieve a core of the guest after it hangs? Scratch that: given it's KVM, can you instead use the qemu gdb service to attach to the hung guest and get a dump or backtrace of it in its hung state?
Triage assignment. If you feel this bug doesn't belong to you, or that it cannot be handled in a timely fashion, please contact me for re-assignment.
Created attachment 471457 [details]
tcpdump of the netdump packets
192.168.122.55 is the netdump server
192.168.122.135 is the netdump client
I was able to reproduce the issue and made a tcpdump capture (attached in c5). I can see some malformed packets being sent from the client.
By the same token, I found that not all devices were always supported for netdump. I found an old doc that does not include e1000 in its list: http://www.redhat.com/support/wpapers/redhat/netdump/setup.html (search for "support"), but I haven't found anything for RHEL3 explicitly. Any idea? Any input is appreciated; I'll continue to work on it.
thanks,
jirka
Created attachment 471534 [details]
RHEL3: disable udp checksum check for netpoll
workaround
It looks like the e1000 netpoll function fails to properly checksum received packets. Given that this is qemu's e1000 emulation, it might be a bug in the emulation itself (hw checksums?).
If I disable the UDP checksum validation completely for netdump, it works and I get the full vmcore on the server.
Need to find some e1000 master probably.. :)
jirka
This request was evaluated by Red Hat Product Management for inclusion in the current release of Red Hat Enterprise Linux. Because the affected component is not scheduled to be updated in the current release, Red Hat is unfortunately unable to address this request at this time. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux. If you would like it considered as an exception in the current release, please ask your support representative.
This request was erroneously denied for the current release of Red Hat Enterprise Linux. The error has been fixed and this request has been re-proposed for the current release.
Being worked on; update from Chris Wright:
On Fri, Jan 14, 2011 at 09:58:30AM -0800, Chris Wright wrote:
> The upstream sf driver is the same. But I think I finally have an idea
> of what's going wrong. I spent way too much time chasing down a red
> herring on the tx path only to realize it's just fine.
>
> I'm building some test qemu-kvm binaries w/ patches to the e1000
> emulation to test today. I'll update you when I've got results from
> that.
This is an issue with the length value we put in the rx descriptor. The guest asked for SECRC, but we act as if we are sending the full frame with the final ethernet CRC. This should be fixed in anything >= qemu-kvm-0.12.1.2-2.119.el6, and is a duplicate of bugzilla 603413. Please re-open if testing shows that it's not working with newer qemu-kvm.
I tested and found I could recreate and fix the problem with fixes similar to the patch associated with bz 603413.
*** This bug has been marked as a duplicate of bug 603413 ***
I tried with qemu-kvm-0.12.1.2-2.129 and netdump works properly.
thanks a lot,
jirka
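For readers following the root cause, here is a minimal sketch of the rx descriptor length issue described in this thread. It is an illustration only, not the qemu-kvm source and not the patch from bz 603413. The function names are hypothetical. The only values taken from outside the thread are the e1000 RCTL "strip Ethernet CRC" bit (SECRC, 0x04000000) and the 4-byte Ethernet CRC size.

```python
# Illustrative sketch only: hypothetical helper names, not the qemu-kvm code.

E1000_RCTL_SECRC = 0x04000000  # RCTL bit: guest asks the NIC to strip the Ethernet CRC
ETH_FCS_LEN = 4                # size of the Ethernet CRC (frame check sequence) in bytes


def fcs_len(rctl: int) -> int:
    """Number of CRC bytes the guest expects to be counted in the rx length."""
    return 0 if rctl & E1000_RCTL_SECRC else ETH_FCS_LEN


def rx_desc_length_buggy(frame_len: int, rctl: int) -> int:
    """Behaviour described in the thread: rctl is ignored and the CRC is always counted."""
    return frame_len + ETH_FCS_LEN


def rx_desc_length_fixed(frame_len: int, rctl: int) -> int:
    """Count the CRC only when the guest did not request stripping."""
    return frame_len + fcs_len(rctl)


if __name__ == "__main__":
    frame_len = 60            # example frame size without CRC (assumed value)
    rctl = E1000_RCTL_SECRC   # guest driver enabled CRC stripping
    print("buggy length:", rx_desc_length_buggy(frame_len, rctl))
    print("fixed length:", rx_desc_length_fixed(frame_len, rctl))
```

With SECRC set, the buggy variant reports four bytes more than the guest expects, so a receive path that trusts the descriptor length would process trailing bytes that are not part of the packet.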