Bug 164324
Summary: | gfs oops in gfs_wipe_buffers | ||
---|---|---|---|
Product: | [Retired] Red Hat Cluster Suite | Reporter: | Curtis Zinzilieta <curtisz> |
Component: | gfs | Assignee: | Ben Marzinski <bmarzins> |
Status: | CLOSED ERRATA | QA Contact: | GFS Bugs <gfs-bugs> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 4 | CC: | djansa, rkenna |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | i686 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | RHBA-2005-740 | Doc Type: | Bug Fix |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2005-10-07 16:57:08 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 165449 |
Description
Curtis Zinzilieta
2005-07-26 22:02:35 UTC
Please let me know if this is reproducible.

*** Bug 166293 has been marked as a duplicate of this bug. ***

O.k., I guess it is reproducible.

O.k., I found a bug in the depend_sync_old code that could definitely cause this error. The only problem is that I'm not totally sure that it *IS* causing this error, and I'm even less sure how it would cause 166293. My best guess is that the stack trace for 166293 is incomplete, and that it is exactly the same bug.

Here's the dilemma. In depend_sync_old, if syncing all the old dependent inodes to disk takes longer than "depend_secs" (a tunable parameter, 60 seconds by default), bad things happen, and you end up overwriting the resource group descriptor (rgd) structure. If you manage to trash this structure without crashing, this oops is exactly what you would see on the next loop. This explains why we saw it with gnbd: over gnbd, syncing the inodes to disk takes longer.

I lowered depend_secs to 0, and I can hit this bug within minutes, every time. The problem is that I always crash while mucking with the structure. However, I don't think you must always crash: when you access what you think should be a pointer, it is actually a pointer in the rgd structure, and every location you access can end up holding a valid value. I think the reason I always crash early has something to do with lowering depend_secs, so that other parts of the rgd don't have time to be set to valid values.

If we could reproduce this bug reliably, we could verify a fix. But I can't see another way for this error to happen, and this bug could definitely cause it. Unless someone can recreate this problem with my change in, I'm calling this bug fixed.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA.
For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2005-740.html
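The depend_sync_old comment says the trouble starts when a sync pass runs past its depend_secs budget, and that setting depend_secs to 0 triggers it within minutes. To make that timing concrete, here is a minimal, hypothetical C sketch of a time-budgeted sync pass. It is not the GFS code: `struct dep_inode`, `sync_old_deps`, the cursor, and the simulated per-inode cost are all invented for illustration, and the report does not describe the actual defect, so the sketch makes no claim about it. It only shows why a budget of 0 sends nearly every pass down the timeout path, and what a well-behaved timeout path looks like (save the resume position, touch nothing else).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an "old dependent inode" awaiting sync. */
struct dep_inode {
    int synced;          /* set to 1 once "written to disk" */
    unsigned cost_secs;  /* simulated time the write takes */
};

/*
 * Hypothetical time-budgeted sync pass (NOT the GFS depend_sync_old code).
 * Starting at *cursor, mark entries synced until every entry is done or the
 * simulated elapsed time reaches budget_secs with work still left. When work
 * remains, at least one entry is synced per call, so a budget of 0 still
 * makes progress but takes the timeout path on every pass that leaves work
 * behind. On timeout, only *cursor is updated; no other state is touched.
 * Returns the number of entries synced by this call.
 */
size_t sync_old_deps(struct dep_inode *deps, size_t count,
                     size_t *cursor, unsigned budget_secs)
{
    unsigned elapsed = 0;
    size_t done = 0;

    assert(cursor != NULL && *cursor <= count);

    while (*cursor < count) {
        struct dep_inode *d = &deps[*cursor];

        d->synced = 1;            /* pretend to write the inode */
        elapsed += d->cost_secs;  /* charge its simulated cost */
        (*cursor)++;
        done++;

        /* Budget exhausted with work remaining: keep position and stop. */
        if (*cursor < count && elapsed >= budget_secs)
            break;
    }
    return done;
}
```

With three entries costing 1 simulated second each, a budget of 0 syncs exactly one entry per call (the cursor advances to 1, 2, then 3), while a budget of 60 syncs all three in a single pass. Syncing one entry before checking the budget is a design choice in this sketch, made so that a zero budget cannot produce a pass that makes no progress.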