Bug 570263
Summary: | GFS2: journal recovery stuck after multiple node failures
---|---
Product: | Red Hat Enterprise Linux 5
Component: | kernel
Version: | 5.5
Hardware: | All
OS: | Linux
Status: | CLOSED INSUFFICIENT_DATA
Severity: | medium
Priority: | low
Target Milestone: | rc
Reporter: | Nate Straz <nstraz>
Assignee: | Steve Whitehouse <swhiteho>
QA Contact: | Cluster QE <mspqa-list>
CC: | adas, bmarzins, rpeterso, teigland
Doc Type: | Bug Fix
Last Closed: | 2011-01-20 09:47:18 UTC
Bug Blocks: | 553803, 672600

Description
Nate Straz
2010-03-03 17:26:21 UTC
Created attachment 397619
Complete logs from morph cluster
GFS2 uses the transaction lock for one reason only: suspending the filesystem. Assuming no suspends were in progress at the time, only PR (protected read) requests should have been going to the DLM, so there should have been no conflicting locks. It looks like the lock has been requested with all the right flags, so I think we are probably waiting on dlm recovery in this case. Dave, can you spot anything that looks odd in the above logs?

We really need more data to figure this out; nothing seems very interesting in /var/log/messages. It looks like userspace recovery may not have re-enabled dlm/gfs in the kernel; data from group_tool would tell us. Or, if it's dlm kernel recovery that's not completing, then we need <dlm log_debug="1"/>, or at least the output of ps -o pid,stat,cmd,wchan.

Nate, can you reproduce this in order to get the info that Dave is requesting?

This seems to have been stuck for a long time now. I'm going to close this if no more information is available.
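
For anyone who does reproduce this, the ps output requested above can be filtered for tasks stuck in uninterruptible sleep (a STAT value starting with D), which is where a stalled recovery thread would normally show up. The following is a minimal sketch rather than part of any cluster tooling: the function names parse_ps and blocked_tasks are hypothetical, and it assumes the text was captured with exactly the columns pid,stat,cmd,wchan, so that WCHAN is the last field on each line.

```python
def parse_ps(text):
    """Parse text captured from `ps -o pid,stat,cmd,wchan`.

    Assumption: the columns are exactly pid, stat, cmd, wchan, so the first
    token is the PID, the second is STAT, the last is WCHAN, and everything
    in between is the command (which may itself contain spaces).
    """
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        # Skip blank lines and the header row, whose first token is "PID".
        if len(tokens) < 4 or not tokens[0].isdigit():
            continue
        rows.append({
            "pid": int(tokens[0]),
            "stat": tokens[1],
            "cmd": " ".join(tokens[2:-1]),
            "wchan": tokens[-1],
        })
    return rows


def blocked_tasks(text):
    """Return only the tasks in uninterruptible sleep (STAT starts with 'D')."""
    return [row for row in parse_ps(text) if row["stat"].startswith("D")]
```

Running blocked_tasks over the ps output saved from each node gives a short list of PIDs, commands and wait channels to attach to the bug. The wait channel shows where in the kernel each task is sleeping, which can help tell a stall in dlm kernel recovery apart from one in userspace recovery.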