Bug 666453
Summary: | RHEL5.3 - xen host - random memory crash | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Douglas Schilling Landgraf <dougsland> |
Component: | kernel-xen | Assignee: | Andrew Jones <drjones> |
Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Red Hat Kernel QE team <kernel-qe> |
Severity: | medium | Docs Contact: | |
Priority: | low | ||
Version: | 5.3.z | CC: | drjones, pbonzini, xen-maint |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2011-05-09 12:07:45 UTC | Type: | --- |
Bug Blocks: | 514489 |
Description
Douglas Schilling Landgraf
2010-12-30 21:53:38 UTC
Does the NIC use scatter/gather? From dom0, check 'ethtool -i peth0'. If so, we could try turning it off.

Err, use 'ethtool -k peth0' to check.

Also, just to note, I think the suggestion to run with ballooning turned off is a good idea, as it should alleviate some risk of running into problems with a flipping netfront. Or just switch the vifs to copying.

Andrew, note that we do have a similar PTE=0 bug with copying: https://bugzilla.redhat.com/show_bug.cgi?id=629213

Changing scatter/gather to off (ethtool -K peth0 sg off) is definitely a good idea, since Douglas's analysis points at problems in the fragments beyond the skb.

As another semi-wild shot, I wonder if any of these net{front,back} corruptions we're seeing might be caused by BZ630129, which has been backported to 5.4.z and 5.5.z, but not 5.3.z. Therefore, I wonder what kernel versions are running in the _guests_ instead, for both the crashing and the non-crashing hosts. (Maybe the corruption in the guest breaks some invariant in the host as well. That seems a bit far-fetched, but the information should be easy to get.)

Ping?

There's really nothing I can do with this bug without being able to experiment with the particular box, or to get a system set up locally that reproduces the issue. Turning off scatter/gather also has a good chance of being a workaround, and it should certainly be attempted as soon as possible.

I'll leave this bug open for now for 5.7, but without further information I'll eventually have to close it as insufficient data.
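
For anyone trying the scatter/gather workaround suggested above, here is a rough sketch of the check-then-disable step. The helper name `sg_state` is made up for illustration, and it assumes that `ethtool -k` prints a line of the form `scatter-gather: on` or `scatter-gather: off`; the exact output layout can differ between ethtool versions, so treat this as a starting point rather than a verified procedure.

```shell
#!/bin/sh
# sg_state (hypothetical helper): read `ethtool -k` output on stdin and
# print the scatter-gather state ("on" or "off").
# Assumption: the output has a line whose first field is "scatter-gather:".
sg_state() {
    awk '$1 == "scatter-gather:" { print $2; exit }'
}

# Usage on dom0 (peth0 is the interface named in this bug):
#   ethtool -k peth0 | sg_state
#   [ "$(ethtool -k peth0 | sg_state)" = "on" ] && ethtool -K peth0 sg off
```

The setting made with `ethtool -K` does not survive a reboot or a driver reload, so it would need to be reapplied if the host is restarted while testing.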