Description of problem:
This ability would allow QA to continue running revolver with the debug mount option (in order to acquire better debugging in case of real failures) while at the same time causing I/O errors on the filesystem as one of the failure injection methods.
Version-Release number of selected component (if applicable):
gfs-utils-0.1.18-1.el5
kmod-gfs-0.1.31-3.el5
What "better debugging" are you looking for? Today, the "-o debug" mount option for GFS doesn't provide additional GFS messages. It only causes GFS to call BUG() and dump the call stack when it does a withdraw. Are you saying that you don't want the subsequent kernel panic()?
We definitely need the call stack for when legitimate corruption is detected. So if that's the only way to grab it now, then it would be nice if it didn't panic afterwards. I wonder if anyone besides QA mounts with the debug flag and what their thoughts would be about just removing the panic altogether.
I need to dig into this a little deeper. I know that RHEL4 handles withdrawing on errors differently from RHEL5. RHEL4 did a BUG_ON but RHEL5 is a little more graceful. Adding Dave T and Steve W to the cc list to get a historical perspective on the topic. I wasn't part of the design process so I don't know the impact of changing this. This seems more like a design issue to me, although I can see its value in debugging. Is this something that we should do for RHEL6 rather than RHEL5.x?
The error handling in gfs is pretty poor really. I don't really expect that it will make a lot of difference what is done. As Bob says, the debug option doesn't change any of the messages which are printed, and if the fs withdraws then it usually means something pretty serious has gone wrong and there is not anything which can be done to recover from it. We hope to improve on this in gfs2 by adding an errors= mount option along the lines of ext2/3/4 and being able to handle more of the possible fs errors by returning I/O errors to the user rather than panicking. I'm not at all sure that there is anything that we can reasonably change in gfs though at this stage.
There are two issues here:
1. mount option to enable extra messages (unfortunately there are very few)
2. mount option to enable panic instead of withdraw (on i/o errors)
Both make sense, and we need both (I've seen customer requests for each). The "problem" is quite trivial: they don't have distinct mount options. We should let "debug" mean one of them (probably 1), and add a new option for the other.
*** Bug 509233 has been marked as a duplicate of this bug. ***
There are really two general classes of GFS errors: (1) withdraw problems due to file system inconsistency, and (2) run-time errors (for example, memory corruption) that cause an assertion error. So really there should be two mount options: ar_debug, which means to BUG on withdraw, and -o panic_on_assert (versus BUG() on assert).
Created attachment 355866 [details] First crack at a patch This is a first stab and is in no way complete. At the very least I need to change the man pages. Just thought I'd post what I have so far. This implements "errors=panic|continue|remount-ro" similar to ext3. If none of the three is specified, the default behavior remains unchanged from how it is today. That can be restored with errors=default.
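For illustration only, here is a minimal sketch of how the `errors=` values described in this comment could be interpreted. It is written in Python rather than kernel C, it is not taken from the attached patch, and the function name `parse_errors_option` and the "last occurrence wins" rule are assumptions.

```python
# Hypothetical helper; the attached patch itself is kernel C and is not shown here.
VALID_ERRORS_POLICIES = {"panic", "continue", "remount-ro", "default"}


def parse_errors_option(mount_options):
    """Return the errors= policy named in a comma-separated mount option string."""
    policy = "default"  # no errors= given: behavior stays as it is today
    for opt in mount_options.split(","):
        opt = opt.strip()
        if not opt.startswith("errors="):
            continue  # some other mount option, e.g. "debug" or "noatime"
        value = opt[len("errors="):]
        if value not in VALID_ERRORS_POLICIES:
            raise ValueError("unknown errors= value: %r" % value)
        policy = value  # ASSUMPTION: the last errors= occurrence wins
    return policy
```

With this sketch, `parse_errors_option("debug,errors=panic")` yields `"panic"`, and an option string with no `errors=` entry yields `"default"`.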
Created attachment 356067 [details] Second prototype patch This patch adds the missing man page elements and fixes a bug I spotted.
*** Bug 461065 has been marked as a duplicate of this bug. ***
As to comment #2, I used the debug mount option by default in all Dell cluster deployments. The reason is that if GFS only withdraws, then the system didn't get fenced, so any running services simply stopped working, instead of failing over. The panic resolved this by causing it to stop responding to the cluster, and get fenced.
Patch tested and works fine, thanks!
I think that we need to be a bit careful here.... There are some issues which we need to resolve before we can go ahead with this in order to avoid shooting ourselves in the foot with gfs2 compatibility. The most important is that there is a failure to distinguish between two important classes of error. One class is those errors caused by reading something on disk which is known to be wrong. Another class of error relates to the internal state being incorrect for some reason. In the case of the first class, then it is ok to do things like "continue", or "remount-ro" for example. We know that the information that GFS2 is referring to is in the main correct and can be used to make sensible decisions as to how to recover from the error. In the second case, we must not rely on the apparent state of the filesystem since it may well be wrong. In specific cases it might be possible to recover from errors in certain ways, but we cannot really place these under any generic error handling scheme and we must at least withdraw/BUG() to inform the user. Failure to stop the execution path in these cases can lead to losing all the data on an otherwise correct filesystem. So the main task here is not in adding the options to the kernel command line, but in auditing each and every caller in order to put them in the correct class. Also, I'd prefer to call the "default" option "withdraw" since that describes what it does. I'm also rather concerned at doing this development in gfs1 rather than gfs2. It really ought to be done in gfs2 first and back-ported if required.
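The distinction drawn in this comment can be sketched as a small policy filter. This is an illustrative Python model only, not GFS or GFS2 code: the names `ErrorClass` and `effective_policy` are hypothetical, and allowing "panic" for internal-state errors is an assumption based on the requirement that the execution path must be stopped.

```python
from enum import Enum


class ErrorClass(Enum):
    ON_DISK = "something read from disk is known to be wrong"
    INTERNAL_STATE = "internal (in-memory) state is incorrect"


def effective_policy(error_class, requested):
    """Pick the action for an error, given the administrator's errors= choice."""
    if error_class is ErrorClass.INTERNAL_STATE:
        # The apparent filesystem state cannot be trusted, so "continue" and
        # "remount-ro" are never honored; execution must stop.
        # ASSUMPTION: an explicit "panic" request is still acceptable here.
        return "panic" if requested == "panic" else "withdraw"
    # On-disk errors: the rest of the state is mostly correct, so the
    # requested policy can be applied as given.
    return requested
```

The hard part identified in the comment is not this filter but the audit that assigns each caller to one of the two classes.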
Panic on errors instead of withdraw is an essential feature, and a lot of people depend on it. It's the only way to get reliable recovery when errors originate in the fs. And it's the way gfs worked for ages before the ill-advised withdraw "feature" came along. Hiding this old behavior under an option named "debug" was an unfortunate choice; it should have had its own option with a better name. (I didn't realize the panic behavior had been removed altogether at some point; that's a major regression.) I'd urge not adding the other forms of error handling to gfs1; they will turn into big cans of worms when people try to use them. Just add back the simple panic behavior under -o debug or an option with a better name. People should also be encouraged to use this panic behavior since it results in much more reliable failure handling. (This is all independent of the fence_scsi issues.)
I have to agree (comment #16) that withdraw is not an easy thing to do correctly and the current implementation doesn't really work for many (any?) cases. Also I'd be quite happy to be rid of it at some future stage if there is no pressing need to retain that feature. If we do keep withdraw, then I'd prefer to change the way it operates so that the fs would internally ensure that it would no longer send any I/O before sending the uevent, so that the current system of using dm to turn off I/O through the device would not be needed. Ideally, if the cause of the withdraw was not related to the journal, the fs would write a final record into the journal describing what went wrong, and flush any I/O which could still be processed correctly. I suspect that (comment #19) errors=remount-ro will not work correctly in gfs1 because gfs_controld assumes that it can know the ro/rw state of the filesystem by catching the state changes via mount.gfs. Obviously if this change happens internally to the fs, it will no longer know whether it's ok to ask the node to recover another node's journal. Recently in gfs2 I have added an ONLINE uevent which could, potentially, be used by gfs_controld to monitor that state change. Currently none of the userland tools make use of it though. As I mentioned in the earlier comment (comment #13) we do need to review all of the error handling carefully and be sure to distinguish the different types of error from each other. Some can be recovered from, and others cannot, and we will have to look at them on a case by case basis. Applying a blanket policy is unlikely to have the desired effect. Returning to the issue which sparked this off, it appears from the parallel email exchange that there may be other issues to consider too (wrt fence_scsi) and that the originally proposed patch will not fix the whole problem anyway.
I've never configured a machine to automatically reboot after a panic myself; it looks like you have to set the kernel.panic sysctl to get that behavior. A withdraw will definitely not reboot -- the whole idea of withdraw is to leave the machine running and in the cluster, but disable the specific fs/storage causing errors.
One thought - withdraws generate a uevent. It would be trivial to write a userland program to watch for those and call reboot if one occurs. Would that fix the issue? It also has the advantage of not needing a hot fix to the kernel.
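A rough sketch of such a watcher, in Python, might look like the following. It listens on the kernel uevent netlink socket and forces a reboot on the first matching event. The assumption that a withdraw appears as `ACTION=offline` with `SUBSYSTEM=gfs` has not been checked against the GFS source, and the function names and the use of `/sbin/reboot -f` are illustrative choices, not an existing tool.

```python
import socket
import subprocess

NETLINK_KOBJECT_UEVENT = 15  # protocol number from <linux/netlink.h>


def parse_uevent(payload):
    """Split a raw kernel uevent (NUL-separated fields) into a header and a dict."""
    fields = [f for f in payload.split(b"\0") if f]
    header = fields[0].decode(errors="replace") if fields else ""
    env = {}
    for field in fields[1:]:
        key, sep, value = field.decode(errors="replace").partition("=")
        if sep:
            env[key] = value
    return header, env


def is_gfs_withdraw(env):
    # ASSUMPTION: a withdraw is reported as ACTION=offline from SUBSYSTEM=gfs.
    return env.get("ACTION") == "offline" and env.get("SUBSYSTEM") == "gfs"


def watch_and_reboot():
    """Block on kernel uevents and reboot on the first GFS withdraw (Linux only)."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_DGRAM,
                         NETLINK_KOBJECT_UEVENT)
    sock.bind((0, 1))  # pid 0: kernel assigns an id; group 1: kernel uevents
    while True:
        _, env = parse_uevent(sock.recv(16384))
        if is_gfs_withdraw(env):
            # Hypothetical reaction; a fence request could be issued here instead.
            subprocess.call(["/sbin/reboot", "-f"])
            return
```

Only `watch_and_reboot()` touches the system; the parsing and matching helpers can be exercised on captured event payloads without any privileges.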
We need a gfs mount option that will result in panic on i/o errors instead of a withdraw. Like -o debug did; it's a simple regression on its own. There is nothing more needed AFAIK.
Agree with Comment #25. Although, as per Comment #24, if the uevent could trigger a fence of the node, that would also suffice. The main issue I saw on the withdraw is that any service using that storage would not relocate to another node on I/O errors. Fencing/rebooting/panic would provide that failover.
Adding Nate Straz to the cc list as per this morning's gfs meeting.
Release note added. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents: A new mount option (-o errors=continue) will be added to the GFS file system for Red Hat Enterprise Linux version 5.5. It will not be available in prior releases. The option controls how GFS behaves in the unlikely event that a file system error occurs. The normal behaviour is to withdraw from the file system and make it inaccessible until the next reboot. If -o errors=continue is specified, the file system will report the error as a kernel error but the error will be otherwise ignored. This mount option is intended for file system developers and quality testers only and is not intended for general use.
The new mount options introduced in the name of this bugzilla record won't be available until 5.5. Therefore, we may need a release note for 5.5. Since the options are not available in 5.4 I don't think it warrants a release note for 5.4.
I don't think we can get this done by the 5.6 cutoff. We have a prototype patch but there are too many potential pitfalls, so it requires a lot of testing. Punting it to 5.7.
This request was evaluated by Red Hat Product Management for inclusion in the current release of Red Hat Enterprise Linux. Because the affected component is not scheduled to be updated in the current release, Red Hat is unfortunately unable to address this request at this time. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux.
This request was erroneously denied for the current release of Red Hat Enterprise Linux. The error has been fixed and this request has been re-proposed for the current release.
Although we've got a working patch for this, there really isn't a demand for this change that warrants the amount of work. I spoke with Corey and Nate about it and they agreed we could close it, at least for GFS1. We'll keep the options open for GFS2. Closing as WONTFIX until there's a customer need.