Bug 494927
Summary: Read-only filesystem after 'ext3_free_blocks_sb: bit already cleared for block' errors
Product: Red Hat Enterprise Linux 5
Component: kernel
Version: 5.3
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: urgent
Priority: urgent
Target Milestone: rc
Target Release: ---
Keywords: Reopened, ZStream
Reporter: Gordon Messmer <gordon.messmer>
Assignee: Eric Sandeen <esandeen>
QA Contact: Eryu Guan <eguan>
CC: alberto.munoz, albert.solana, cww, daniel.brnak, devin.nate, dharmong, dhoward, dwysocha, eguan, esandeen, gatis.visnevskis, gbailey, gdr, herrold, izdotson, jaeshin, james.brown, jbacik, jbrier, jduncan, jpallich, justin.gargano, lfarkas, lmcilroy, l.wandrebeck, madhusudan-h.kulkarni, mark, mgoodwin, mounesh.b, pcaldes, qcai, richlv, rwheeler, scott.mohnkern, sleepingprocess, sputhenp, tao, tengfei.lin
Doc Type: Bug Fix
Doc Text:
Previously, a race condition between the journal_write_metadata_buffer() and jbd_unlock_bh_state() functions could occur. Consequently, another thread could call the get_write_access() function on the buffer head and cause the wrong data to be written into the journal. If the system terminated unexpectedly or was shut down incorrectly, subsequent file system journal replay could result in file system corruption. This update fixes the race condition and the file system corruption no longer occurs in the described scenario.
Last Closed: 2011-07-21 09:46:27 UTC
Bug Depends On: 696843
Bug Blocks: 833764
Description
Gordon Messmer
2009-04-08 18:20:39 UTC
Fortunately, on the second host, the /var partition wasn't the one that went read-only, so a little more information is available from /var/log/messages:

Apr 7 01:39:54 bucket kernel: Linux version 2.6.18-128.1.6.el5 (mockbuild.org) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)) #1 SMP Wed Apr 1 09:10:25 EDT 2009
...
Apr 8 03:41:54 bucket kernel: EXT3-fs error (device sdc1): ext3_free_blocks_sb: bit already cleared for block 170872073
Apr 8 03:41:54 bucket kernel: Aborting journal on device sdc1.
Apr 8 03:41:54 bucket kernel: ext3_abort called.
Apr 8 03:41:54 bucket kernel: EXT3-fs error (device sdc1): ext3_journal_start_sb: Detected aborted journal
Apr 8 03:41:54 bucket kernel: Remounting filesystem read-only
Apr 8 03:41:54 bucket kernel: EXT3-fs error (device sdc1) in ext3_reserve_inode_write: Journal has aborted
Apr 8 03:41:54 bucket kernel: EXT3-fs error (device sdc1) in ext3_truncate: Journal has aborted
Apr 8 03:41:54 bucket kernel: EXT3-fs error (device sdc1) in ext3_reserve_inode_write: Journal has aborted
Apr 8 03:41:54 bucket kernel: EXT3-fs error (device sdc1) in ext3_orphan_del: Journal has aborted
Apr 8 03:41:54 bucket kernel: EXT3-fs error (device sdc1) in ext3_reserve_inode_write: Journal has aborted
Apr 8 03:41:54 bucket kernel: __journal_remove_journal_head: freeing b_committed_data
Apr 8 03:41:54 bucket last message repeated 18 times

I set up a test environment under KVM on Fedora 10. The test system was running CentOS 5.3 with the latest kernel:

# uname -a
Linux backup 2.6.18-128.1.6.el5 #1 SMP Wed Apr 1 09:19:18 EDT 2009 i686 athlon i386 GNU/Linux

This is an identical setup to the VM that I first saw fail. Overnight, I ran two instances of bonnie++ in a loop in different directories under /var/tmp. Additionally, I ran a loop copying /usr to a directory under /var/tmp and then removing it. After around 12 hours, the rsync loop quit when it could not remove one of the files.
The kernel recorded:

Apr 14 09:10:44 backup kernel: attempt to access beyond end of device
Apr 14 09:10:44 backup kernel: hda3: rw=0, want=1719008864, limit=3357585

I booted the system into single user mode to run fsck on the filesystems and found a considerable amount of corruption on the /var partition, presumably caused by the two bonnie++ loops and the rsync loop. Running "fsck /dev/hda3" reported that the filesystem was clean, despite the fact that the kernel had noticed an attempt to access beyond the end of the device. I ran "fsck -n -f /dev/hda3" to get a list of the errors on the filesystem; its output is attached. The damaged filesystem is still available if any further information about it would be helpful.

While the systems exhibiting the problem are not RHEL systems, I believe it is likely that RHEL is also affected. Please help me confirm this bug.

Created attachment 339534 [details]
Output from "fsck -n -f /dev/hda3"
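The abort sequences quoted in this bug share a fixed message signature. As an aside (not part of the original report), a small illustrative parser like the following can pull the device and block number out of a syslog dump for correlation across incidents; the function name and regex are my own:

```python
import re

# Matches the error lines quoted throughout this bug, e.g.
#   EXT3-fs error (device sdc1): ext3_free_blocks_sb: bit already cleared for block 170872073
PATTERN = re.compile(
    r"EXT3-fs error \(device (?P<dev>\S+)\): "
    r"ext3_free_blocks_sb: bit already cleared for block (?P<block>\d+)"
)

def find_bit_cleared_errors(log_text):
    """Return (device, block) pairs for every matching line in a syslog dump."""
    hits = []
    for line in log_text.splitlines():
        m = PATTERN.search(line)
        if m:
            hits.append((m.group("dev"), int(m.group("block"))))
    return hits

if __name__ == "__main__":
    sample = (
        "Apr 8 03:41:54 bucket kernel: EXT3-fs error (device sdc1): "
        "ext3_free_blocks_sb: bit already cleared for block 170872073\n"
        "Apr 8 03:41:54 bucket kernel: Aborting journal on device sdc1.\n"
    )
    print(find_bit_cleared_errors(sample))  # [('sdc1', 170872073)]
```

Feeding it the /var/log/messages excerpts attached to this bug would show at a glance whether different reporters are hitting the same device and block.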
The problem in comment #2 looks like a different problem, FWIW - at least a different end result. Both could be caused by any sort of corruption, including kernel in-memory corruption, or hardware problems. Or bugs... I can try the same test mentioned in #2 to see if I can reproduce on RHEL5.3.

This problem also exists in kernel version 2.6.18-128.1.10.el5xen.

Similar problems show up on RHEL4.7 (kernel 2.6.9-78.0.17smp). Also verified with the "fsstress" utility (thanks to LKML). Any ext3 filesystem shows errors if run with arguments like this:

mkfs.ext3 /dev/ram0
mount -t ext3 /dev/ram0 /mnt/test_file_system
mkdir -p /mnt/test_file_system/work
fsstress -d /mnt/test_file_system/work -p 3 -l 0 -n 100000000

Will try to test other kernel versions shortly.

Verified on the latest RHEL5 update: 2.6.18-128.1.14.el5 #1 SMP Mon Jun 1 15:52:36 EDT 2009 i686 i686 i386 GNU/Linux - problem exists.

Verified on old unpatched RHEL3: 2.4.21-4.ELsmp #1 SMP Fri Oct 3 17:52:56 EDT 2003 i686 i686 i386 GNU/Linux - kernel panic trying to umount the ext3 fs.

Verified on very old RedHat 7.2 :) 2.4.7-10 #1 Thu Sep 6 17:27:27 EDT 2001 i686 unknown - no problems. OK, unfair, there is only ext2.

(In reply to comment #7)
> Similar problems show up on RHEL4.7 (kernel 2.6.9-78.0.17smp).

Similar, or same? Bug #476533 reports this should be fixed in 78.0.14.EL, so perhaps there is yet another root cause if it's the exact same problem.

(In reply to comment #7)
> Verified on latest RHEL5 update:
> 2.6.18-128.1.14.el5 #1 SMP Mon Jun 1 15:52:36 EDT 2009 i686 i686 i386 GNU/Linux
> problem exists.

This kernel also has the same fix as bug #476533, fixed in bug #225172. You did the same test as in comment #6? Thanks, -Eric

OK, short summary: we have a pre-production RHEL4.7 system (kernel -78.0.17). Got the error "ext3_free_blocks_sb: bit already cleared for block" on two different servers. The database server produces a lot of Oracle archivelog files; the 100 GB filesystem sometimes gets full, and then older files are deleted.
The error appears after 2 months of testing. The application server writes to some 20-30 logfiles simultaneously; there the error appears after performance testing, within a couple of days. This error "ext3_free_blocks_sb: bit already cleared for block" should be fixed in 78.0.14, right?

Then I started to search for test tools. Found this: http://lkml.org/lkml/2008/11/14/121 - fixed in vanilla kernel 2.6.29. Changelog here: http://www.kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.29

Example:

# uname -a
Linux gasha-rhel5 2.6.18-128.1.14.el5

fsck errors:

/dev/ram2: Inode 251, i_size is 205449, should be 995328. FIXED.
/dev/ram2: Inode 3416, i_size is 654163, should be 1212416. FIXED.
/dev/ram2: Inode 4032, i_size is 868310, should be 1138688. FIXED.
/dev/ram2: Inode 3530, i_size is 1458533, should be 1900544. FIXED.

Compiled a fresh vanilla 2.6.30 kernel with "make oldconfig":

gasha-rhel5 kernel: Linux version 2.6.30 (root@gasha-rhel5) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)) #1 SMP Sun Jun 28 00:39:51 EEST 2009

No fsck errors. I cannot say that these two error messages have the same root cause; I need to run these stress tests longer.

Ok, this bug is ranging all over now... It was opened for a very distinct error message:

Read-only filesystem after 'ext3_free_blocks_sb: bit already cleared for block' errors

But the testcase in comment #9, while it does seem to expose an issue in RHEL5 that I'll investigate, looks like a different problem. It also appears to provoke a problem still present upstream, though manifesting differently. However, I'm still trying to reproduce anything on RHEL5 that looks like the original bug reported here. -Eric

Unfortunately, I think this bug started out ranging all over, and I take responsibility for that. I believe that I initially saw this bug on a system early in my deployment of 5.3. I was working on a system whose RAID array had failed, and didn't save information on the failure because I suspected the hardware.
Subsequent hardware tests did not reveal any problems. This system wasn't mentioned in my initial report, because I had no data. Later, I saw a similar failure on another system with a 3ware controller, which I did mention. I've recently determined that this system had hardware problems, so that data should be ignored. That was "bucket", mentioned in comment #1. Unfortunately, since I opened the ticket with information from this host, the summary and half of the information I provided is incorrect.

I'm still fairly confident that the KVM guest mentioned in comments #2 and #3 hit a legitimate bug in Linux. Since the nature of the bug isn't clear to me, I'm not sure what kinds of corruption may result, so it may not be possible to determine whether or not all of the commenters are seeing the same corruption bug. We should probably focus on the test case in comment #9. Please let me know what additional information I can provide, or what testing I can do.

Another data point
------------------
We've seen a similar problem (I think) on one RHEL4 machine (we've got 7 identical ones in terms of hardware, and it's just this one). What we see is that the /var partition will just suddenly go read-only. Of course the problem is, the minute it goes read-only it can't write to the log files. A mount -o remount,rw /var does not clear the problem; only a reboot will do it. We're currently running the 2.6.9-89 SMP kernel.

Scott, if there's any chance for a serial console, or a look at a physical console, maybe you can get more information about what's gone wrong? Or is "bit already cleared for block" the only info... Thanks, -Eric

We're actually not even seeing that; it just suddenly flips to read-only. Next time it happens I'll look at the physical console and see what I see. Unfortunately, it appears to be non-repeatable, and inconsistent as to when it happens.

Created attachment 354320 [details]
ext3 error after sync
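The fsstress recipe quoted earlier (comment #6's mkfs/mount/fsstress sequence) lends itself to a repeatable harness. This is a hedged sketch, not from the bug: the device, mountpoint, and iteration parameters are assumptions, and actually running it requires root and a scratch ramdisk, so the commands are built as data and only executed on explicit request:

```python
import subprocess

DEVICE = "/dev/ram0"                  # assumed scratch device from the recipe
MOUNTPOINT = "/mnt/test_file_system"  # assumed mountpoint from the recipe

def reproducer_commands(nops=100000000, procs=3):
    """Return the reproducer's shell commands, one argv list each."""
    return [
        ["mkfs.ext3", DEVICE],
        ["mount", "-t", "ext3", DEVICE, MOUNTPOINT],
        ["mkdir", "-p", MOUNTPOINT + "/work"],
        ["fsstress", "-d", MOUNTPOINT + "/work",
         "-p", str(procs), "-l", "0", "-n", str(nops)],
    ]

def run_reproducer():
    """Execute the steps in order; needs root and the tools installed."""
    for argv in reproducer_commands():
        subprocess.run(argv, check=True)

if __name__ == "__main__":
    # Dry run: just show what would be executed.
    for argv in reproducer_commands():
        print(" ".join(argv))
```

Keeping the commands as data makes it easy to vary the process count or operation count per run, as several commenters did when narrowing down which configurations fail.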
We found ext3 errors on 4 identical Linux machines (installed via jumpstart, kernel 2.6.18-128.el5, x86_64),
always in /var, which was remounted read-only.
Console messages are attached; nothing more was produced on the serial console.
I'm still trying to reproduce the error using the "fsstress" utility on a dedicated stress system. No luck. I think it has the wrong pattern. The real thing should be a 2-CPU server with simultaneous writes to MANY open files at the same time. Some kind of race condition.

For what it's worth, I've been poking around this all day today (i.e. the testcase in comment #9 with fsstress). This is what I've found so far.

Testing with: fsstress -d /mnt/test_file_system/work -p 3 -l 0 -n 100000000 -X. Some tests re-run with -p 1 to remove the possibility of parallel tasks impacting the test. RHEL 5.4 beta kernel version 2.6.18-157.el5 (otherwise a stock 5.3 install). No other processes are accessing the test_file_system. RHEL is running as a vmware esxi 4.0 guest, 2 vcpu, 2 GB ram. No errors in dmesg.

1. fails on ext2, blocksizes 1024, 2048, 4096, when mounted on a loop device (loop0) losetup against a file in /dev/shm (tmpfs).
2. fails on ext2, blocksizes 1024, 2048, 4096, when mounted on a ramdisk (ram0) using the new brd driver (part of rhel 5.4).
3. fails on ext3, blocksizes 1024, 2048, 4096, when mounted on a loop device (loop0) losetup against a file in /dev/shm (tmpfs), ext3 mount options data=writeback and data=ordered.
4. fails on ext3, blocksizes 1024, 2048, 4096, when mounted on a ramdisk (ram0) using the new brd driver (part of rhel 5.4), ext3 mount options data=writeback and data=ordered.
5. fails on ext3, blocksize default, when mounted on a loop device (loop0) losetup against a file in / (existing ext3 filesystem).
6. succeeds on ext3 for the above, with the data=journal mount option.
7. Basically, as best I can tell, ext3 with data=journal always works, and nothing else does.
8. Doesn't appear to be limited to ramdisk (or at least, loop devices on the actual hard drive fail also).
9. I tried ext4; it also failed. I don't recall which variation I used above. It was mounted with default mount options.

Note: "fail" is evidenced by some combination of errors similar to:

+ e2fsck -fvp /dev/ram0
/dev/ram0: HTREE directory inode 61596 has an invalid root node. HTREE INDEX CLEARED.
/dev/ram0: HTREE directory inode 61604 has an invalid root node. HTREE INDEX CLEARED.
/dev/ram0: HTREE directory inode 61632 has an invalid root node. HTREE INDEX CLEARED.
/dev/ram0: HTREE directory inode 61644 has an invalid root node. HTREE INDEX CLEARED.
/dev/ram0: HTREE directory inode 61660 has an invalid root node. HTREE INDEX CLEARED.
/dev/ram0: Inode 69395, i_size is 902294, should be 1906688. FIXED.
/dev/ram0: Inode 69723, i_size is 1277952, should be 1280000. FIXED.
/dev/ram0: Inode 69781, i_size is 0, should be 939008. FIXED.
/dev/ram0: Inode 70098, i_size is 662593, should be 1267712. FIXED.
/dev/ram0: Inode 70227, i_size is 0, should be 330752. FIXED.
/dev/ram0: Inode 70418, i_size is 892928, should be 896000. FIXED.
/dev/ram0: Inode 70541, i_size is 380928, should be 382976. FIXED.
/dev/ram0: Inode 71272, i_size is 503808, should be 506880. FIXED.
/dev/ram0: Inode 71663, i_size is 2002944, should be 2004992. FIXED.
/dev/ram0: Inode 72421, i_size is 348160, should be 350208. FIXED.
/dev/ram0: Inode 73392, i_size is 958464, should be 960512. FIXED.
/dev/ram0: Inode 73980, i_size is 434176, should be 437248. FIXED.
/dev/ram0: Inode 74468, i_size is 466944, should be 470016. FIXED.
/dev/ram0: Inode 76077, i_size is 200704, should be 202752. FIXED.
/dev/ram0: Inode 71114, i_size is 0, should be 103424. FIXED.
/dev/ram0: Inode 70462, i_size is 0, should be 72704. FIXED.
/dev/ram0: Entry 'c638' in /work/p0/d1/d141/d3d7/d4ed (61644) has an incorrect filetype (was 3, should be 1).
/dev/ram0: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY. (i.e., without -a or -p options)

"Success" was evidenced by no errors after repeated iterations (in my case, I let it run a couple hundred runs).
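The pass/fail judgment in a loop like the one above can also be made mechanically: e2fsck's exit status is a documented bitmask (see e2fsck(8)). The following sketch is not part of Devin's harness, just an illustration of decoding that status:

```python
# e2fsck(8) documents its exit status as a bitmask of these flags;
# 0 means the filesystem was clean.
E2FSCK_FLAGS = {
    1: "errors corrected",
    2: "errors corrected, reboot recommended",
    4: "errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "cancelled by user",
    128: "shared-library error",
}

def classify_e2fsck(exit_code):
    """Map an e2fsck exit status to ('pass'|'fail', [reasons])."""
    if exit_code == 0:
        return ("pass", [])
    reasons = [msg for bit, msg in E2FSCK_FLAGS.items() if exit_code & bit]
    # Any corrected or uncorrected error means the filesystem was not clean.
    return ("fail", reasons)

# classify_e2fsck(0) -> ("pass", [])
# classify_e2fsck(1) -> ("fail", ["errors corrected"])
```

A stress loop can then stop on the first non-zero status instead of scraping the "FIXED." lines out of the output.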
Furthermore, I put a bunch of links into a comment I filed against the RHEL 5.4 brd driver info, discussing various aspects I found today regarding the use of the new brd ramdisk driver in rhel 5.4: https://bugzilla.redhat.com/show_bug.cgi?id=480663 Thanks, Devin

This bug has attracted several seemingly unrelated issues - please open distinct bugzilla tickets when you change the storage/virt/etc stack, since all of these can produce unique issues.

I'll open 30 tickets if necessary ;) I was kind of hoping that since Red Hat can see evidence of corruption in ext2, ext3, and ext4, your team might be more interested in the internal processes than me. Please confirm you want new tickets for each combo? FYI, I'm now testing against a physical box (IBM server) as well. Same results. Thanks, Devin Nate

Devin, your testcase & error seem to be on-disk-corruption and/or fsck related, with no mention of "bit already cleared for block" in the kernel messages, which was the original error report for this bug. That's why we'd like a new bug to track this seemingly distinct problem. If your failure case is always more or less the same basic corruption, I'd be happy with 1 new bug from you, in which you clearly describe the testcase (attaching or pasting the test script you used would be great), the test environment, and the problematic results you encountered after the tests. Thanks, -Eric

Hi Eric, thank you very much. What/where/how would you like it opened? I appreciate the response, just want to make sure it gets appropriate attention (and really have no idea how to open a bug to accomplish that - in fact I've never opened a new rhel bug before). Any guidance would be helpful, and I'll upload all info. Currently compiling a stock kernel 2.6.30.4 to test there, based on the rhel 5.4 .config for the kernel. Thanks, Devin

Devin, you can go here: https://bugzilla.redhat.com/enter_bug.cgi?product=Red%20Hat%20Enterprise%20Linux%205 and file a bug against the kernel component. Or if you're a RHEL customer, going through your support contacts is often encouraged as well.

Thanks Eric. We are a RHEL customer, but I have no idea who our support contact is. I've opened a new bug: https://bugzilla.redhat.com/show_bug.cgi?id=515529 If I could trouble you to take a glance at it to make sure I've opened it properly. Thanks, Devin

I've just deployed a system with RedHat 5.3 and we are seeing the same issue:

------------------------------------------------------------------------
EXT3-fs error (device sdb1): ext3_free_blocks_sb: bit already cleared for block 731165448
Aborting journal on device sdb1.
EXT3-fs error (device sdb1): ext3_free_blocks_sb: bit already cleared for block 731165449
EXT3-fs error (device sdb1) in ext3_free_blocks_sb: Journal has aborted
EXT3-fs error (device sdb1) in ext3_free_blocks_sb: Journal has aborted
EXT3-fs error (device sdb1) in ext3_free_blocks_sb: Journal has aborted
EXT3-fs error (device sdb1) in ext3_free_blocks_sb: Journal has aborted
EXT3-fs error (device sdb1) in ext3_free_blocks_sb: Journal has aborted
EXT3-fs error (device sdb1) in ext3_free_blocks_sb: Journal has aborted
EXT3-fs error (device sdb1) in ext3_reserve_inode_write: Journal has aborted
EXT3-fs error (device sdb1) in ext3_truncate: Journal has aborted
EXT3-fs error (device sdb1) in ext3_reserve_inode_write: Journal has aborted
EXT3-fs error (device sdb1) in ext3_orphan_del: Journal has aborted
EXT3-fs error (device sdb1) in ext3_reserve_inode_write: Journal has aborted
EXT3-fs error (device sdb1) in ext3_delete_inode: Journal has aborted
__journal_remove_journal_head: freeing b_committed_data
    (message repeated 16 times)
ext3_abort called.
EXT3-fs error (device sdb1): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
------------------------------------------------------------------------

The issue occurs under heavy disk I/O while performing an Oracle RMAN restore. Our configuration is a Dell with a PERC 6 (LSI/megasas) RAID controller with an external eSATA storage device, MD1000 (JBOD). Please let me know if there is anything I can do to help provide some insight into the problem.

As a general plea for folks who are RHEL customers, please also ping your support channel about this problem; it'll help prioritize the bug. Thanks, -Eric

Oh, and for anyone hitting the i_size problems on fsck post-fsstress, that issue has been moved over to bug #515529; if that's your issue you might want to cc: over there. Justin, I'll try to make some headway on this original bug soon, too. And for folks w/ RHEL4 problems, you need to open a different bug, or get your support folks to do it. Sorry, but we have to keep things focused here ;) And now, back to "bit already cleared for block" on RHEL5....
Thanks, -Eric

For those hitting the "bit already cleared for block" error: let's try a jprobe to see what path we're hitting the error on; it may at least discern between the types of blocks being freed and give a clue to what's happening.

http://people.redhat.com/esandeen/bz494927/ has a pre-built jprobe module for x86_64, plus source and a Makefile so it can be rebuilt on any arch. Put the Makefile and ext3_error_jprobe.c in a directory, cd into that directory, and run:

# make -C /lib/modules/`uname -r`/build M=`pwd`

to build the module. Then you can:

# insmod ./ext3_error_jprobe.ko

It will dump a stack backtrace to dmesg whenever ext3_error() is hit, while the module is loaded. Thanks, -Eric

Oh, for the above "make" to work you also need kernel-devel installed for the kernel matching:

# uname -r

If anyone gets a nice backtrace when the "bit already cleared for block" error pops up, please add it to this bug. Thanks, -Eric

(In reply to comment #24) We tracked this problem down to a faulty PERC 6 RAID controller. Once it was replaced, the error went away.

Event posted on 11-05-2009 11:27am EST by mgoodwin

Hi all, comparing the previous event (Aug 28 2009) with this new event (Oct 31 2009), a smoking gun has emerged. Both events occurred at exactly the same time (23:53:01) on exactly the same filesystem (/var mounted on vg00-var, which is dm-7), and both reported free-list corruption for exactly the same block (149827). Furthermore, in /etc/cron.d/sysstat we have:

# generate a daily summary of process accounting at 23:53
53 23 * * * root /usr/lib64/sa/sa2 -A

This site is running sysstat-7.0.2-3.el5-x86_64.
Here are the two log snippets.

This event, Oct 31:

Oct 31 23:51:04 srpamp35 dhclient: DHCPREQUEST on eth2 to 148.126.2.5 port 67
Oct 31 23:51:43 srpamp35 last message repeated 3 times
Oct 31 23:52:47 srpamp35 last message repeated 5 times
Oct 31 23:52:52 srpadm62 snmpd[3108]: Connection from UDP: [148.126.25.56]:1725
Oct 31 23:53:01 srpamp74 kernel: jprobe: ext3_error
Oct 31 23:53:01 srpamp74 syslogd: /var/log/messages: Read-only file system
Oct 31 23:53:01 srpamp74 kernel:
Oct 31 23:53:01 srpamp74 kernel: Call Trace:
Oct 31 23:53:01 srpamp74 kernel: [<ffffffff8850b017>] :ext3_error_jprobe:jext3_error+0x17/0x20
Oct 31 23:53:01 srpamp74 kernel: [<ffffffff8804a93c>] :ext3:ext3_free_blocks_sb+0x31b/0x48b
Oct 31 23:53:01 srpamp74 kernel: [<ffffffff8804aada>] :ext3:ext3_free_blocks+0x2e/0x79
Oct 31 23:53:01 srpamp74 kernel: [<ffffffff8804f07c>] :ext3:ext3_free_data+0x84/0xfe
Oct 31 23:53:01 srpamp74 kernel: [<ffffffff8804f839>] :ext3:ext3_truncate+0x567/0x8e1
Oct 31 23:53:01 srpamp74 kernel: [<ffffffff8003fa10>] pagevec_lookup+0x17/0x1e
Oct 31 23:53:01 srpamp74 kernel: [<ffffffff88032152>] :jbd:journal_start+0xc9/0x100
Oct 31 23:53:01 srpamp74 kernel: [<ffffffff88050256>] :ext3:ext3_delete_inode+0x7d/0xd4
Oct 31 23:53:01 srpamp74 kernel: [<ffffffff880501d9>] :ext3:ext3_delete_inode+0x0/0xd4
Oct 31 23:53:01 srpamp74 kernel: [<ffffffff8002f49e>] generic_delete_inode+0xc6/0x143
Oct 31 23:53:01 srpamp74 kernel: [<ffffffff8003c75c>] do_unlinkat+0xd5/0x141
Oct 31 23:53:01 srpamp74 kernel: [<ffffffff8005dde9>] error_exit+0x0/0x84
Oct 31 23:53:01 srpamp74 kernel: [<ffffffff8005d116>] system_call+0x7e/0x83
Oct 31 23:53:01 srpamp74 kernel:
Oct 31 23:53:01 srpamp74 kernel: EXT3-fs error (device dm-7): ext3_free_blocks_sb: bit already cleared for block 149827
Oct 31 23:53:01 srpamp74 kernel: Aborting journal on device dm-7.
Oct 31 23:53:01 srpamp74 kernel: EXT3-fs error (device dm-7) in ext3_free_blocks_sb: Journal has aborted
Oct 31 23:53:01 srpamp74 kernel: EXT3-fs error (device dm-7) in ext3_free_blocks_sb: Journal has aborted
Oct 31 23:53:01 srpamp74 kernel: EXT3-fs error (device dm-7) in ext3_reserve_inode_write: Journal has aborted
Oct 31 23:53:01 srpamp74 kernel: EXT3-fs error (device dm-7) in ext3_truncate: Journal has aborted
Oct 31 23:53:01 srpamp74 kernel: EXT3-fs error (device dm-7) in ext3_reserve_inode_write: Journal has aborted
Oct 31 23:53:01 srpamp74 kernel: EXT3-fs error (device dm-7) in ext3_orphan_del: Journal has aborted
Oct 31 23:53:01 srpamp74 kernel: EXT3-fs error (device dm-7) in ext3_reserve_inode_write: Journal has aborted
Oct 31 23:53:01 srpamp74 kernel: EXT3-fs error (device dm-7) in ext3_delete_inode: Journal has aborted
Oct 31 23:53:01 srpamp74 kernel: __journal_remove_journal_head: freeing b_committed_data
Oct 31 23:53:01 srpamp74 kernel: journal commit I/O error
Oct 31 23:53:01 srpamp74 kernel: ext3_abort called.
Oct 31 23:53:01 srpamp74 kernel: EXT3-fs error (device dm-7): ext3_journal_start_sb: Detected aborted journal
Oct 31 23:53:01 srpamp74 kernel: Remounting filesystem read-only

Previous event, on August 28:

Aug 28 23:51:26 srpamp74 kernel: scsi(0): Asynchronous PORT UPDATE ignored 0000/0007/0b00.
Aug 28 23:53:01 srpamp74 syslogd: /var/log/messages: Read-only file system
Aug 28 23:53:01 srpamp74 kernel: EXT3-fs error (device dm-7): ext3_free_blocks_sb: bit already cleared for block 149827
Aug 28 23:53:01 srpamp74 kernel: Aborting journal on device dm-7.
Aug 28 23:53:01 srpamp74 kernel: ext3_abort called.
Aug 28 23:53:01 srpamp74 kernel: EXT3-fs error (device dm-7): ext3_journal_start_sb: Detected aborted journal
Aug 28 23:53:01 srpamp74 kernel: Remounting filesystem read-only
Aug 28 23:53:01 srpamp74 kernel: EXT3-fs error (device dm-7) in ext3_free_blocks_sb: Journal has aborted
Aug 28 23:53:01 srpamp74 kernel: EXT3-fs error (device dm-7) in ext3_reserve_inode_write: Journal has aborted
Aug 28 23:53:01 srpamp74 kernel: EXT3-fs error (device dm-7) in ext3_truncate: Journal has aborted
Aug 28 23:53:01 srpamp74 kernel: EXT3-fs error (device dm-7) in ext3_reserve_inode_write: Journal has aborted
Aug 28 23:53:01 srpamp74 kernel: EXT3-fs error (device dm-7) in ext3_orphan_del: Journal has aborted
Aug 28 23:53:01 srpamp74 kernel: EXT3-fs error (device dm-7) in ext3_reserve_inode_write: Journal has aborted
Aug 28 23:53:01 srpamp74 kernel: __journal_remove_journal_head: freeing b_committed_data
Aug 28 23:53:01 srpamp74 syslogd: /var/log/maillog: Read-only file system
Aug 28 23:53:01 srpamp74 syslogd: /var/log/cron: Read-only file system
Aug 28 23:53:01 srpamp74 auditd[4261]: Record was not written to disk (Read-only file system)
Aug 28 23:53:01 srpamp74 auditd[4261]: write: Audit daemon detected an error writing an event to disk (Read-only file system)
Aug 28 23:53:01 srpamp74 auditd[13430]: Audit daemon failed to exec (null)
Aug 28 23:53:01 srpamp74 auditd[13430]: The audit daemon is exiting.

John, known bugs in the /usr/lib64/sa/sa2 script could be related here, maybe BZ 101574? (That BZ has an easy reproducer recipe; maybe give it a try.) Also, since /dev/vg00/var is less than 1G, could we please get a compressed dd image? It might be worth investigating whether the /var/log/sa directory inode is stored near block 149827. Cheers -- Mark Goodwin

Internal Status set to 'Waiting on Support'
This event sent from IssueTracker by mgoodwin, issue 347967

Same problem here in a production virtual server running on vSphere 4.0.0 (build 208111) servers.
Virtual server info: Red Hat Enterprise Linux Server release 5.3 (Tikanga), kernel 2.6.18-128.2.1.el5. The issue only appears in filesystems mounted on SAN. Today I had three incidents in two different filesystems (each one a LUN on an EMC SAN).

- message log:

Jul 26 12:29:02 nobel kernel: EXT3-fs error (device sde1): ext3_free_blocks_sb: bit already cleared for block 70722697
Jul 26 12:29:02 nobel kernel: Aborting journal on device sde1.
Jul 26 12:29:02 nobel kernel: ext3_abort called.
Jul 26 12:29:02 nobel kernel: EXT3-fs error (device sde1): ext3_journal_start_sb: Detected aborted journal
Jul 26 12:29:02 nobel kernel: Remounting filesystem read-only
Jul 26 12:29:02 nobel kernel: EXT3-fs error (device sde1) in ext3_reserve_inode_write: Journal has aborted
Jul 26 12:29:02 nobel kernel: EXT3-fs error (device sde1) in ext3_truncate: Journal has aborted
Jul 26 12:29:02 nobel kernel: EXT3-fs error (device sde1) in ext3_reserve_inode_write: Journal has aborted
Jul 26 12:29:02 nobel kernel: EXT3-fs error (device sde1) in ext3_orphan_del: Journal has aborted
Jul 26 12:29:02 nobel kernel: EXT3-fs error (device sde1) in ext3_reserve_inode_write: Journal has aborted
Jul 26 12:29:02 nobel kernel: __journal_remove_journal_head: freeing b_committed_data
Jul 26 12:35:24 nobel kernel: __journal_remove_journal_head: freeing b_committed_data
Jul 26 12:35:24 nobel kernel: ext3_abort called.
Jul 26 12:35:24 nobel kernel: EXT3-fs error (device sde1): ext3_put_super: Couldn't clean up the journal

==> Command to check filesystem: fsck.ext3 -f -C 0 -y /dev/sde1

Jul 26 14:43:05 nobel kernel: ext3_abort called.
Jul 26 14:43:05 nobel kernel: EXT3-fs error (device sdg1): ext3_journal_start_sb: Detected aborted journal
Jul 26 14:43:05 nobel kernel: Remounting filesystem read-only
Jul 26 14:52:50 nobel kernel: ext3_abort called.
Jul 26 14:52:50 nobel kernel: EXT3-fs error (device sdg1): ext3_put_super: Couldn't clean up the journal

==> Command to check filesystem: fsck.ext3 -C 0 -y /dev/sdg1

Jul 26 15:48:29 nobel kernel: EXT3-fs error (device sdg1): ext3_lookup: unlinked inode 22093862 in dir #18956368
Jul 26 15:48:29 nobel kernel: Aborting journal on device sdg1.
Jul 26 15:48:29 nobel kernel: __journal_remove_journal_head: freeing b_committed_data
Jul 26 15:48:29 nobel kernel: ext3_abort called.
Jul 26 15:48:29 nobel kernel: EXT3-fs error (device sdg1): ext3_journal_start_sb: Detected aborted journal
Jul 26 15:48:29 nobel kernel: Remounting filesystem read-only
Jul 26 16:05:06 nobel kernel: ext3_abort called.
Jul 26 16:05:06 nobel kernel: EXT3-fs error (device sdg1): ext3_put_super: Couldn't clean up the journal
Jul 26 16:05:39 nobel kernel: kjournald starting. Commit interval 5 seconds
Jul 26 16:05:39 nobel kernel: EXT3-fs warning (device sdg1): ext3_clear_journal_err: Filesystem error recorded from previous mount: IO failure
Jul 26 16:05:39 nobel kernel: EXT3-fs warning (device sdg1): ext3_clear_journal_err: Marking fs in need of filesystem check.
Jul 26 16:05:39 nobel kernel: EXT3-fs warning: mounting fs with errors, running e2fsck is recommended
Jul 26 16:05:39 nobel kernel: EXT3 FS on sdg1, internal journal
Jul 26 16:05:39 nobel kernel: EXT3-fs: recovery complete.
Jul 26 16:05:39 nobel kernel: EXT3-fs: mounted filesystem with ordered data mode.

==> Command to check filesystem: fsck.ext3 -f -C 0 -y /dev/sdg1

The original poster to this bugzilla said that fsck did in fact fix this problem. For the person who commented in c #51, please open a support ticket so they can debug the problem further.
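The "smoking gun" comparison from the earlier comment (two crashes at exactly 23:53:01, same device dm-7, same block 149827, matching a 23:53 cron job) can be automated across log snippets. A hedged, illustrative sketch (not part of the bug analysis) that keys each event on time of day, device, and block, and reports keys seen more than once:

```python
import re
from collections import Counter

# Matches syslog-formatted kernel lines like the ones quoted above.
LINE = re.compile(
    r"^\w+ +\d+ (?P<time>\d\d:\d\d:\d\d) \S+ kernel: "
    r"EXT3-fs error \(device (?P<dev>\S+)\): "
    r"ext3_free_blocks_sb: bit already cleared for block (?P<block>\d+)"
)

def recurring_events(log_lines):
    """Return (time-of-day, device, block) keys seen more than once."""
    keys = Counter()
    for line in log_lines:
        m = LINE.match(line)
        if m:
            keys[(m.group("time"), m.group("dev"), m.group("block"))] += 1
    return [k for k, n in keys.items() if n > 1]

logs = [
    "Oct 31 23:53:01 srpamp74 kernel: EXT3-fs error (device dm-7): "
    "ext3_free_blocks_sb: bit already cleared for block 149827",
    "Aug 28 23:53:01 srpamp74 kernel: EXT3-fs error (device dm-7): "
    "ext3_free_blocks_sb: bit already cleared for block 149827",
]
print(recurring_events(logs))  # [('23:53:01', 'dm-7', '149827')]
```

A recurring key at a fixed time of day points toward a scheduled job (such as the sysstat cron entry) rather than random hardware corruption.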
Benefits of being at a storage conference, I was just reminded of this fix upstream: commit f25f624263445785b94f39739a6339ba9ed3275d Author: Jan Kara <jack> Date: Mon Jul 12 21:04:31 2010 +0200 ext3: Avoid filesystem corruption after a crash under heavy delete load It can happen that ext3_free_branches calls ext3_forget() for an indirect block in an earlier transaction than a transaction in which we clear pointer to this indirect block. Thus if we crash before a transaction clearing the block pointer is committed, we will see indirect block pointing to already freed blocks and complain during orphan list cleanup. The fix is simple: Make sure ext3_forget() is called in the transaction doing block pointer clearing. This is a backport of an ext4 fix by Amir G. <amir73il.net> Signed-off-by: Jan Kara <jack> which may well be the problem here. It would be interesting to investigate the image with debugfs and see if it is indeed an indirect block which points to a freed block. -Eric ext4 version upstream: commit 40389687382bf0ae71458e7c0f828137a438a956 Author: Amir G <amir73il.net> Date: Tue Jul 27 11:56:05 2010 -0400 ext4: Fix block bitmap inconsistencies after a crash when deleting files We have experienced bitmap inconsistencies after crash during file delete under heavy load. The crash is not file system related and I the following patch in ext4_free_branches() fixes the recovery problem. If the transaction is restarted and there is a crash before the new transaction is committed, then after recovery, the blocks that this indirect block points to have been freed, but the indirect block itself has not been freed and may still point to some of the free blocks (because of the ext4_forget()). So ext4_forget() should be called inside ext4_free_blocks() to avoid this problem. 
Signed-off-by: Amir Goldstein <amir73il.net>
Signed-off-by: "Theodore Ts'o" <tytso>

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

Eric, I'm setting up a bridged VM for you running RHEL 5.6, with the unpacked image available. Be ready in ~1 hour.

There are plenty of ways to hit this warning; I keep finding new ones. For example, from yet another image:

Multiply-claimed block(s) in inode 32802: 69654 69654

If that inode is unlinked, it will try to free block 69654 twice, and the 2nd one would issue this error. There is more than one root cause which could get us to this error message.

In my testing, commit f1015c447781729060c415f5133164c638561f25 has reduced a 1%-2% occurrence of journal replay corruption to 0 (out of about 3500 replays). In most cases it's mild corruption, but over hours of testing I have seen "bit already cleared" errors and some other more severe things.

I will be pushing that commit for RHEL5 soon and I expect that it will resolve most of these issues, at least for customers who have the problem at any time after experiencing a crash. (Note that the corruption could lie dormant until it is stumbled on again at runtime. Think for example of a rotated log that only gets deleted after 6 months, and -that- is when the problem shows up, 6 months after the crash...)

-Eric

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products.
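To make the "bit already cleared" failure mode concrete, here is a toy model (purely illustrative, not kernel code) of the bitmap check that ext3_free_blocks_sb performs: freeing a block clears its bit, and clearing an already-clear bit is exactly the inconsistency the kernel logs. A multiply-claimed block reaches this state when two inodes both point at it and both get deleted.

```python
# Toy model (illustrative only, not kernel code) of the block-bitmap check
# behind "ext3_free_blocks_sb: bit already cleared for block N".
class BlockBitmap:
    def __init__(self, nblocks):
        self.bits = [True] * nblocks  # True = allocated
        self.errors = []

    def free_block(self, n):
        if not self.bits[n]:
            # This is the inconsistency the kernel logs; ext3 then
            # aborts the journal and remounts read-only.
            self.errors.append(f"bit already cleared for block {n}")
            return
        self.bits[n] = False

bm = BlockBitmap(100)
# Two inodes both claim block 69 in this toy setup, so deleting
# both files frees the same block twice.
bm.free_block(69)
bm.free_block(69)
print(bm.errors)
```

The real kernel obviously does much more (group descriptors, journaled bitmap buffers), but the double-free on a shared block is the essence of the error message in this bug's title.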
This request is not yet committed for inclusion in an Update release.

Created attachment 493679 [details]
script to generate dirty snapshots
Invocation:
./dirty_snapshot -s $SOURCE_LVNAME /dev/$COW_DEVICE $SNAPSHOTNAME

i.e.
./dirty_snapshot -s testvg-testlv /dev/sdb3 testlv_snap
to create a new dirty snapshot called "testlv_snap" from /dev/mapper/testvg-testlv, with /dev/sdb3 as the cow device.
./dirty_snapshot -d testvg-testlv /dev/sdb3 testlv_snap
removes it again.
Created attachment 493681 [details]
script to drive dirty_snapshot script
This script runs fsstress (which I grabbed from xfstests) constantly, while creating snapshots that require recovery.
It checks the recovered snapshot and if it's corrupted, makes a copy of it, and tries mounting it and removing all files to hit runtime errors.
You'll need to edit the invocation of the dirty snapshot script to give it the right parameters. It also expects to have fsstress in the same dir, and mountpoints mnt/ and mnt2/ as well. Sorry, it's a bit rough.
The main lv I tested was 128M in size, made with 4k blocks.
I saw some sort of corruption in about 1%-2% of the replayed filesystems.
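As a back-of-envelope sanity check on those numbers (my own arithmetic, not from the report): if journal replay really still corrupted 1-2% of filesystems, observing zero corruptions in roughly 3500 replays after the fix would be astronomically unlikely, so 0/3500 is strong statistical evidence the commit addresses the problem.

```python
# If the true per-replay corruption rate were still p, the chance of
# seeing zero corruptions in n independent replays is (1 - p)**n.
def p_zero_failures(p, n):
    return (1 - p) ** n

n = 3500
for p in (0.01, 0.02):
    print(f"p={p}: P(0 corruptions in {n} replays) = {p_zero_failures(p, n):.2e}")
```

Even at the low end of the pre-fix rate (1%), the probability of a clean 3500-replay run by chance is far below one in a trillion.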
Eric, you mentioned the following commits - do you still think we may need them?

40389687382bf0ae71458e7c0f828137a438a956
86963918965eb8fe0c8ae009e7c1b4c630f533d5

(In reply to comment #87)
> Eric, you mentioned the following commits - do you still think we may need
> them?
>
> 40389687382bf0ae71458e7c0f828137a438a956
> 86963918965eb8fe0c8ae009e7c1b4c630f533d5

Ultimately they probably should go in, but not as a last-minute 5.7 update, I think.

40389687382bf0ae71458e7c0f828137a438a956 requires a crash right in the middle of restarting a transaction while unlinking a large file, if I read it right.

86963918965eb8fe0c8ae009e7c1b4c630f533d5 is also explicitly related to crashes while deleting.

In short, much narrower windows, and the commit I sent is most likely to be hit. Mark tells me you have your own testcase, though, so if you can still hit any problems with the one commit, we can talk again. :)

Patch(es) available in kernel-2.6.18-261.el5

You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.

Hello,

I have the same problem, but with RHEL 5.5 x86. It occurred 3 times on a server in less than two weeks in production.

(In reply to comment #109)
> Hello,
>
> i have the same problem but with RHEL 5.5 x86. Ocurried 3 times in a server in
> less than two weeks in production.
> Here is the lines in /var/log/message
>
> May 27 02:31:16 hostname kernel: EXT3-fs error (device sdh): ext3_lookup: unlinked inode 3918926 in dir #5570561
> May 27 02:31:16 hostname kernel: Aborting journal on device sdh.
> May 27 02:31:16 hostname kernel: __journal_remove_journal_head: freeing b_committed_data
> May 27 02:31:16 hostname kernel: ext3_abort called.
> May 27 02:31:16 hostname kernel: EXT3-fs error (device sdh): ext3_journal_start_sb: Detected aborted journal
> May 27 02:31:16 hostname kernel: Remounting filesystem read-only
>
> To solve this only works rebooting the machine.
>
> Is this the same problem?

Probably not. You might try to run e2fsck -f on the fs to see if there is corruption. If the error is ongoing, please open a case with your RHEL support contacts.

Thanks,
-Eric

(In reply to comment #110)
> Probably not. You might try to run e2fsck -f on the fs to see if there is
> corruption. If the error is ongoing, please open a case with your RHEL support
> contacts.
> Thanks,
> -Eric

Really I only saw this error one time; the other two times the system crashed and everything froze: ping worked, ssh requested login and password, but typing the password didn't work. I think those other times the system was in read-only mode on the "/var" and "/" volumes, and I could not see the same error because it could not write to the log.

This system was in tests for three weeks, connected via fibre channel to DAS, always OK without any error. Two weeks in production and three crashes. I have kernel 2.6.18-194.el5PAE.

Thanks

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-1065.html

Same problem again in a production virtual server running on vSphere 4.0.0 build 208167 servers.

Virtual server info: Red Hat Enterprise Linux Server release 5.3 (Tikanga)

The issue only appears in filesystems mounted on SAN.

We installed patched kernel 2.6.18-265.el5PAE on the 23rd of May. An issue appeared on the 30th of September while the server still had patched kernel 2.6.18-265.el5PAE installed (almost 3). That day I installed patched kernel 2.6.18-285.el5PAE. But another issue appeared yesterday (with patched kernel 2.6.18-285.el5PAE installed). Then I installed the latest patched kernel: 2.6.18-286.el5PAE.

- message log:

EXT3-fs error (device sdh1): ext3_free_blocks_sb: bit already cleared for block 43055512
Aborting journal on device sdh1.
EXT3-fs error (device sdh1) in ext3_reserve_inode_write: Journal has aborted
EXT3-fs error (device sdh1) in ext3_truncate: Journal has aborted
EXT3-fs error (device sdh1) in ext3_reserve_inode_write: Journal has aborted
EXT3-fs error (device sdh1) in ext3_orphan_del: Journal has aborted
EXT3-fs error (device sdh1) in ext3_reserve_inode_write: Journal has aborted
EXT3-fs error (device sdh1) in ext3_delete_inode: Journal has aborted
__journal_remove_journal_head: freeing b_committed_data
__journal_remove_journal_head: freeing b_committed_data
__journal_remove_journal_head: freeing b_committed_data
__journal_remove_journal_head: freeing b_committed_data
__journal_remove_journal_head: freeing b_committed_data
__journal_remove_journal_head: freeing b_committed_data
__journal_remove_journal_head: freeing b_committed_data
__journal_remove_journal_head: freeing b_committed_data
__journal_remove_journal_head: freeing b_committed_data
ext3_abort called.
EXT3-fs error (device sdh1): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only

==> Procedure to solve the problem:

1. Unmount the file system (/dev/sdc14)
2. Nuke the ext3 journal: tune2fs -O ^has_journal /dev/sdc14
3. Check the disk: nohup e2fsck -fy -C 0 /dev/sdc14 &
4. Rebuild the journal: tune2fs -j /dev/sdc14
5. Mount the file system (/dev/sdc14)

It's not clear to me why you would need to nuke the journal...?

In any case, this bug is closed; based on our investigations we have shipped a fix for the root cause encountered by the original reporter. As Josef already mentioned in comment #52, if you have a new issue, please contact your Red Hat support channel and get a new issue logged.

Thanks,
-Eric

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.
New Contents:
Previously, a race condition between the journal_write_metadata_buffer() and jbd_unlock_bh_state() functions could occur. Consequently, another thread could call the get_write_access() function on the buffer head and cause the wrong data to be written into the journal. If the system terminated unexpectedly or was shut down incorrectly, subsequent file system journal replay could result in file system corruption. This update fixes the race condition and the file system corruption no longer occurs in the described scenario.