Bug 142369 - ext3 filesystem panic on error fails to work
Summary: ext3 filesystem panic on error fails to work
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Stephen Tweedie
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2004-12-09 10:14 UTC by Morten Sylvest Olsen
Modified: 2007-11-30 22:07 UTC
CC List: 2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-10-19 19:11:48 UTC
Target Upstream Version:
Embargoed:



Description Morten Sylvest Olsen 2004-12-09 10:14:25 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0

Description of problem:
I've tuned the filesystem to panic on error instead of remounting
read-only. After I pull out the FC cables so that all I/O to the device is
suspended, ext3 tries to panic at line 196 of super.c.

But the kernel doesn't panic. Isn't it supposed to use ext3_panic,
which makes sure that panic doesn't try to sync the filesystem?


Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
0. echo "10" >/proc/sys/kernel/panic
1. Mount an ext3 filesystem that is tuned to panic on error
2. Disable I/O to the device (pull out the Fibre Channel cables)
3. Generate I/O to the filesystem
4. ext3 says it will panic, but the kernel never actually panics

Actual Results:  Everything continues normally except the filesystem
on the borked device

Expected Results:  The kernel should have panicked

Additional info:

Comment 1 Ernie Petrides 2004-12-09 20:54:25 UTC
The functionality of /proc/sys/kernel/panic is described in the proc(5)
manual page as follows:

  "The file panic gives read/write access to the kernel variable
   panic_timeout.  If this is zero, the kernel will loop on a
   panic; if nonzero it indicates that the kernel should autoreboot
   after this number of seconds."

Perhaps you want to use the "errors=panic" mount option.


Comment 2 Morten Sylvest Olsen 2004-12-13 08:50:22 UTC
I believe I understand the meaning of this variable: I set it to 10 to get a
10-second timeout before the automatic reboot. I did not mount with errors=panic;
instead I used tune2fs to set the flag in the superblock.

As you can see from the following log excerpt, the ext3 filesystem wanted to
panic, but nothing happened and everything else continued to run, because the
kernel was hung in sys_sync trying to sync the very filesystem that invoked
panic in the first place. This may happen because super.c calls panic()
directly instead of its own ext3_panic, which disables syncing of the
panicking filesystem itself.


Dec  9 10:12:14 aasdcm04a kernel: SCSI disk error : host 2 channel 0 id 1 lun 0 return code = 10000
Dec  9 10:12:14 aasdcm04a kernel:  I/O error: dev 08:31, sector 32
Dec  9 10:12:14 aasdcm04a kernel: raid1: sdd1: rescheduling block 32
Dec  9 10:12:14 aasdcm04a kernel: raid1: sdd1: unrecoverable I/O read error for block 32
Dec  9 10:12:14 aasdcm04a kernel: EXT3-fs error (device md(9,1)): ext3_get_inode_loc: unable to read inode block - inode=2, block=4
Dec  9 10:12:14 aasdcm04a kernel: Aborting journal on device md(9,1).
Dec  9 10:12:14 aasdcm04a kernel: SCSI disk error : host 2 channel 0 id 1 lun 0 return code = 10000
Dec  9 10:12:14 aasdcm04a kernel:  I/O error: dev 08:31, sector 3936
Dec  9 10:12:14 aasdcm04a kernel: Kernel panic: EXT3-fs (device md(9,1)): panic forced after error
Dec  9 10:12:14 aasdcm04a kernel:
Dec  9 10:12:16 aasdcm04a heartbeat[5226]: info: Resetting node (null) with [external STONITH device]
Dec  9 10:12:16 aasdcm04a heartbeat[5226]: info: Host (null) external-reset initiating
Dec  9 10:12:16 aasdcm04a heartbeat[5226]: ERROR: command '/usr/local/lib/xseries.sh aasdcm09a /etc/ha.d/stonith.nodes' failed
Dec  9 10:12:16 aasdcm04a heartbeat[5226]: ERROR: Host (null) not reset!
Dec  9 10:12:16 aasdcm04a heartbeat[3310]: ERROR: Exiting STONITH (null) process 5226 killed by signal 11.
Dec  9 10:12:16 aasdcm04a heartbeat[3310]: ERROR: STONITH of (null) failed.  Retrying...
Dec  9 10:12:20 aasdcm04a watchdog[3718]: cannot stat /shared/AASTEST-1/lost+found (errno = 2 = 'No such file or directory')
Dec  9 10:12:21 aasdcm04a heartbeat[5229]: info: Resetting node (null) with [external STONITH device]
Dec  9 10:12:21 aasdcm04a heartbeat[5229]: info: Host (null) external-reset initiating
Dec  9 10:12:21 aasdcm04a heartbeat[5229]: ERROR: command '/usr/local/lib/xseries.sh aasdcm09a /etc/ha.d/stonith.nodes' failed
Dec  9 10:12:21 aasdcm04a heartbeat[5229]: ERROR: Host (null) not reset!
Dec  9 10:12:26 aasdcm04a heartbeat[5231]: info: Resetting node (null) with [external STONITH device]
Dec  9 10:12:26 aasdcm04a heartbeat[5231]: info: Host (null) external-reset initiating
Dec  9 10:12:26 aasdcm04a heartbeat[5231]: ERROR: command '/usr/local/lib/xseries.sh aasdcm09a /etc/ha.d/stonith.nodes' failed
Dec  9 10:12:26 aasdcm04a heartbeat[5231]: ERROR: Host (null) not reset!
Dec  9 10:12:26 aasdcm04a heartbeat[3310]: ERROR: Exiting STONITH (null) process 5231 killed by signal 11.
Dec  9 10:12:26 aasdcm04a heartbeat[3310]: ERROR: STONITH of (null) failed.  Retrying...
Dec  9 10:12:31 aasdcm04a heartbeat[5233]: info: Resetting node (null) with [external STONITH device]
Dec  9 10:12:31 aasdcm04a heartbeat[5233]: info: Host (null) external-reset initiating
Dec  9 10:12:31 aasdcm04a heartbeat[5233]: ERROR: command '/usr/local/lib/xseries.sh aasdcm09a /etc/ha.d/stonith.nodes' failed
Dec  9 10:12:31 aasdcm04a heartbeat[5233]: ERROR: Host (null) not reset!
Dec  9 10:12:31 aasdcm04a heartbeat[3310]: ERROR: Exiting STONITH (null) process 5233 killed by signal 11.
Dec  9 10:12:31 aasdcm04a heartbeat[3310]: ERROR: STONITH of (null) failed.  Retrying...
Dec  9 10:12:35 aasdcm04a watchdog[3718]: cannot stat /shared/AASTEST-1/lost+found (errno = 2 = 'No such file or directory')
Dec  9 10:12:36 aasdcm04a heartbeat[5236]: info: Resetting node (null) with [external STONITH device]
Dec  9 10:12:36 aasdcm04a heartbeat[5236]: info: Host (null) external-reset initiating
Dec  9 10:12:36 aasdcm04a heartbeat[5236]: ERROR: command '/usr/local/lib/xseries.sh aasdcm09a /etc/ha.d/stonith.nodes' failed
Dec  9 10:12:36 aasdcm04a heartbeat[5236]: ERROR: Host (null) not reset!
Dec  9 10:12:36 aasdcm04a heartbeat[3310]: ERROR: Exiting STONITH (null) process 5236 killed by signal 11.
Dec  9 10:12:36 aasdcm04a heartbeat[3310]: ERROR: STONITH of (null) failed.  Retrying...
Dec  9 10:12:41 aasdcm04a heartbeat[5238]: info: Resetting node (null) with [external STONITH device]
Dec  9 10:12:41 aasdcm04a heartbeat[5238]: info: Host (null) external-reset initiating
Dec  9 10:12:41 aasdcm04a heartbeat[5238]: ERROR: command '/usr/local/lib/xseries.sh aasdcm09a /etc/ha.d/stonith.nodes' failed
Dec  9 10:12:41 aasdcm04a heartbeat[5238]: ERROR: Host (null) not reset!
Dec  9 10:12:41 aasdcm04a heartbeat[3310]: ERROR: Exiting STONITH (null) process 5238 killed by signal 11.
Dec  9 10:12:41 aasdcm04a heartbeat[3310]: ERROR: STONITH of (null) failed.  Retrying...
Dec  9 10:12:46 aasdcm04a login(pam_unix)[3583]: session opened for user root by LOGIN(uid=0)
Dec  9 10:12:46 aasdcm04a  -- root[3583]: ROOT LOGIN ON tty2

Comment 3 Ernie Petrides 2004-12-13 21:00:37 UTC
Okay, Morten.  Thanks for the clarification.  Bug assigned to Stephen.

Comment 4 RHEL Program Management 2007-10-19 19:11:48 UTC
This bug is filed against RHEL 3, which is in the maintenance phase.
During the maintenance phase, only security errata and select mission-critical
bug fixes will be released for enterprise products. Since
this bug does not meet those criteria, it is now being closed.

For more information on the RHEL errata support policy, please visit:
http://www.redhat.com/security/updates/errata/
 
If you feel this bug is indeed mission critical, please contact your
support representative. You may be asked to provide detailed
information on how this bug is affecting you.

