Bug 161808
Summary: | nodes don't wait for first mounter to finish recovery | ||
---|---|---|---|
Product: | [Retired] Red Hat Cluster Suite | Reporter: | David Teigland <teigland> |
Component: | gfs | Assignee: | David Teigland <teigland> |
Status: | CLOSED ERRATA | QA Contact: | GFS Bugs <gfs-bugs> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 4 | ||
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | RHBA-2005-740 | Doc Type: | Bug Fix |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2005-10-07 16:56:51 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
David Teigland
2005-06-27 15:15:58 UTC
Email from Ken:

There is a chance of corruption. The scenario I see is:

1) 3 machines are mounted.
2) They all fail at the same time.
3) Machine A comes back up and starts replay on the three journals serially.
4) Machine B comes back up and replays its own journal really quickly while Machine A is still working on the first journal.
5) Machine B starts a workload and comes across blocks that are inconsistent because the third journal hasn't been replayed yet.

Because all the machines died, there are no expired locks to protect the data.

In order to hit the failure case, you always need at least three nodes to have been mounted at one time or another. But not all three nodes need to be running at the power failure time. (The key is that there must be a dirty journal beyond the first two to be mounted.)

Fixed on RHEL4 and STABLE branches.

The likelihood of this bug causing a problem or corruption is even smaller than originally thought. Even if the lock module doesn't prevent other mounts until first recovery is done, there's a gfs lock the other mounters block on that has nearly the same effect.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2005-740.html
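The race in the scenario above, and the effect of making later mounters wait for the first mounter, can be illustrated with a small toy model. This is only a sketch under stated assumptions, not GFS or lock module code: `ToyFilesystem`, `first_mount`, `later_mount`, and the `pause` event are hypothetical names invented for illustration, journals are modelled as a set of dirty names, and a `threading.Event` stands in for whatever mechanism holds back other mounts until first recovery is done.

```python
import threading


class ToyFilesystem:
    """Toy model of journal replay; all names here are hypothetical."""

    def __init__(self, dirty_journals):
        self._lock = threading.Lock()
        self._dirty = set(dirty_journals)
        self.first_done = threading.Event()  # set once the first mounter is finished

    def replay(self, journal):
        with self._lock:
            self._dirty.discard(journal)

    def dirty(self):
        with self._lock:
            return set(self._dirty)

    def first_mount(self, pause=None):
        # The first mounter replays every dirty journal serially.
        for i, journal in enumerate(sorted(self.dirty())):
            self.replay(journal)
            if i == 0 and pause is not None:
                pause.wait()  # simulate a slow first mounter after one journal
        self.first_done.set()

    def later_mount(self, own_journal, wait_for_first):
        # Assumed fix: block here until the first mounter has finished.
        if wait_for_first:
            self.first_done.wait()
        self.replay(own_journal)
        # Journals still dirty at the moment this node starts its workload.
        return self.dirty()


def run(wait_for_first):
    fs = ToyFilesystem({"A", "B", "C"})
    pause = threading.Event()
    seen = {}
    a = threading.Thread(target=fs.first_mount, args=(pause,))
    b = threading.Thread(
        target=lambda: seen.update(b=fs.later_mount("B", wait_for_first))
    )
    a.start()
    b.start()
    if not wait_for_first:
        b.join()  # B finishes mounting while A is still held up
    pause.set()   # let A replay the remaining journals
    a.join()
    b.join()
    return seen["b"]


if __name__ == "__main__":
    print("without waiting:", run(wait_for_first=False))
    print("with waiting:   ", run(wait_for_first=True))
```

Without the wait, Machine B begins its workload while journal C is still dirty, which is the window for corruption described in step 5. With the wait, B sees no dirty journals when it starts.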