Bug 154445
Summary: | oops in dlm_sendd after removing nodes | ||
---|---|---|---|
Product: | [Retired] Red Hat Cluster Suite | Reporter: | Corey Marthaler <cmarthal> |
Component: | dlm | Assignee: | David Teigland <teigland> |
Status: | CLOSED WORKSFORME | QA Contact: | Cluster QE <mspqa-list> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 4 | CC: | ccaulfie, cluster-maint |
Hardware: | All | ||
OS: | Linux | ||
Fixed In Version: | Doc Type: | Bug Fix | |
Last Closed: | 2006-05-04 16:50:22 UTC | Type: | --- |
Description
Corey Marthaler
2005-04-11 19:08:52 UTC
We've seen the -105 (ENOBUFS) error before as a symptom of bz 139738. I'm not sure whether that's what's happening here; we usually see some other CMAN message indicating that's what's happened. It doesn't appear that this version of cman (built Apr 5) includes the latest fixes for bz 139738.

Unless there are some interesting messages missing from that console log, this looks like it might be a genuine OOM condition. There's nothing in there that suggests that tank-06 has been kicked out of the cluster by anyone else.

(slightly later) It looks like it might be the case that sendpage() can't send highmem pages (there's some code in NFS to trap for this), which would possibly explain the oops. lowcomms buffers have two parts - a kmalloced bit and a page_alloced bit. So it could be that when low memory ran out, the page was allocated from highmem and the sendpage oopsed. The next kmalloc call then failed because low memory had run out. All hypothesis, of course. The sendpage oops I can get round if highmem is the cause. The rest is harder...

This has either been fixed or the machine just ran out of memory.
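For anyone trying to picture the highmem hypothesis above, here is a minimal userspace sketch of the kind of check that would "get round" the sendpage oops: look at where the page lives and fall back to a copying send when it is in highmem. This is only a model and an assumption about the shape of a workaround, not the dlm lowcomms code and not the fix that was (or wasn't) applied. `struct model_page`, `enum send_path` and `choose_send_path()` are hypothetical names invented for illustration; in the kernel the equivalent test would be something like `PageHighMem()` on the real `struct page`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel page; only its memory zone matters here. */
struct model_page {
    bool highmem;   /* true if the page was allocated from the highmem zone */
};

/* The two transmit paths this model distinguishes (names are illustrative). */
enum send_path {
    SEND_ZERO_COPY, /* models handing the page directly to sendpage() */
    SEND_COPY       /* models a fallback that maps and copies the page */
};

/*
 * Pick the transmit path for one buffer page.
 * Assumption: a highmem page is not safe for the zero-copy path,
 * so it is routed to the copying fallback instead.
 */
enum send_path choose_send_path(const struct model_page *page)
{
    assert(page != NULL);
    return page->highmem ? SEND_COPY : SEND_ZERO_COPY;
}
```

With this model, a page from low memory takes the zero-copy path and a page from highmem takes the copying path. It says nothing about the second half of the problem (the kmalloc failure once low memory is exhausted), which is the part described above as harder.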