Bug 477012
| Summary: | network hangs with xen_vnif in FV RHEL5 guest | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Jeff Layton <jlayton> | ||||
| Component: | kernel | Assignee: | Herbert Xu <herbert.xu> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Red Hat Kernel QE team <kernel-qe> | ||||
| Severity: | medium | Docs Contact: | |||||
| Priority: | low | ||||||
| Version: | 5.3 | CC: | ddutile, dzickus, steved, xen-maint | ||||
| Target Milestone: | rc | ||||||
| Target Release: | --- | ||||||
| Hardware: | All | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2009-09-02 08:57:36 UTC | Type: | --- | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
Description
Jeff Layton
2008-12-18 16:57:25 UTC
What does netstat -nto show on the server? Also, any chance you can let me log into the server (via serial console, presumably) when this is happening? Thanks!

I switched the guest back to using xen_vnif, and the first ssh session into the box hung within a few seconds:

    # netstat -nto
    Active Internet connections (w/o servers)
    Proto Recv-Q Send-Q Local Address            Foreign Address          State       Timer
    tcp        0   1968 ::ffff:10.11.231.179:22  ::ffff:10.11.12.60:46360 ESTABLISHED on (27.84/7/0)

...guest kernel is 2.6.18-131.el5.jtltest.61debug
...and host kernel is 2.6.18-128.el5virttest5xen. That one is a -128.el5 kernel with the patches for bug 470035.

I'll leave this box in this state for the time being. Find me on IRC if you want access to it.

btw: the jtltest guest kernel is basically a -131.el5 kernel with some NFS and CIFS patches. Nothing that should affect the lower networking layers.

    11:08 <herbert> ok i think it's a bug in netfront that's causing an incorrectly laid out packet to be sent to netback
    11:08 <herbert> which then discards it because it fails one of the sanity checks, e.g., by crossing a page boundary
    11:08 <herbert> as the same packet is then retransmitted over and over again it'll never make it across, thus stalling the connection
    11:09 <herbert> so what we need to find out now is exactly how the packet is broken
    11:09 <herbert> could you please rebuild the netback module on dom0 after adding #define DEBUG to the top of the file?
    11:09 <herbert> that way we can get the backend to print out what exactly is wrong with the packet
    11:09 <herbert> thanks!

Ok, I added #define DEBUG to the top of drivers/xen/netback/netback.c, rebuilt that driver, and installed netback.ko and netloop.ko in the dom0's kernel dir. I logged into the rhel5 guest and poked around a bit, but didn't see any interesting messages. If I didn't do the correct thing, please send a patch and I'll give it another go.

Created attachment 331815
net: Handle non-linear packets in skb_checksum_setup
This patch fixes the problem on Jeff's machine.
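The attachment title says the fix makes skb_checksum_setup handle non-linear packets, meaning packets whose headers are not all in the skb's linear data area. That fits Herbert's description of a packet being dropped and then retransmitted indefinitely. What follows is a minimal userspace sketch of the general idea, not the actual patch. The names `struct pkt`, `pkt_may_pull` and `checksum_setup` are hypothetical stand-ins for an skb, the kernel's pskb_may_pull() and skb_checksum_setup(). The sketch also assumes IPv4 carrying TCP or UDP. Before reading a header field, the parser pulls the needed bytes into the linear area instead of rejecting the packet.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical packet model: a linear head area plus one out-of-line
 * fragment, loosely mirroring an skb's linear data and paged frags. */
struct pkt {
    uint8_t linear[128];   /* headers are parsed only from here */
    size_t linear_len;     /* bytes currently valid in linear[] */
    const uint8_t *frag;   /* remaining bytes held out of line */
    size_t frag_len;
};

/* Ensure at least len bytes are linear, copying them out of the
 * fragment if needed (the idea behind the kernel's pskb_may_pull). */
static bool pkt_may_pull(struct pkt *p, size_t len)
{
    if (len <= p->linear_len)
        return true;
    if (len > sizeof(p->linear) || len > p->linear_len + p->frag_len)
        return false;  /* packet is genuinely too short */
    size_t need = len - p->linear_len;
    memcpy(p->linear + p->linear_len, p->frag, need);
    p->frag += need;
    p->frag_len -= need;
    p->linear_len = len;
    return true;
}

/* Return the offset of the TCP/UDP checksum field in an IPv4 packet,
 * or -1 if the packet is malformed or of an unsupported protocol. */
static long checksum_setup(struct pkt *p)
{
    if (!pkt_may_pull(p, 20))                 /* minimal IPv4 header */
        return -1;
    size_t ihl = (size_t)(p->linear[0] & 0x0f) * 4;
    if ((p->linear[0] >> 4) != 4 || ihl < 20 || !pkt_may_pull(p, ihl))
        return -1;

    size_t csum_field;
    switch (p->linear[9]) {                   /* IPv4 protocol field */
    case 6:  csum_field = 16; break;          /* TCP checksum offset */
    case 17: csum_field = 6;  break;          /* UDP checksum offset */
    default: return -1;
    }

    /* Pull the checksum field into the linear area rather than failing
     * when the transport header lives in the fragment. */
    if (!pkt_may_pull(p, ihl + csum_field + 2))
        return -1;
    return (long)(ihl + csum_field);
}
```

Example: suppose the linear area holds only a 20-byte IPv4 header and the TCP header sits in the fragment. checksum_setup() then pulls 18 more bytes and returns 36 (20 + 16). A parser that refused to look past the linear area would reject that same packet on every retransmission.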
I've been testing this and can confirm that it seems to work well. There have been no connection hangs since this patch has been in place.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1243.html