Description of problem: Kernel oops in assertion checking during the removal of a huge file on an EXT3 fs. Version-Release number of selected component (if applicable): Latest 7.3 kernel : 2.4.18-19.7.x How reproducible: always reproducible. Steps to Reproduce: I have a RAID5 IDE-SCSI system, providing 1.2To free space. I partition it with a single partition, format it as EXT3 fs with standard options (mke2fs -j /dev/sda1), I mount it, and create a file that uses all the available space : mount /dev/sda1 /mnt/raid -text3 cd /mnt/raid dd if=/dev/zero of=foo bs=4096 --> the commands stops when no space left is available. Then I remove the file. root@dhcp-14 /mnt/raid > ls -l total 1211403296 -rw-r--r-- 1 root root 1239265550336 Jan 18 21:58 foo drwx------ 2 root root 16384 Jan 18 15:21 lost+found root@dhcp-14 /mnt/raid > df -k Filesystem 1k-blocks Used Available Use% Mounted on /dev/hda1 6190664 1943744 3932452 34% / /dev/hda3 51882048 70480 49176108 1% /home none 451460 0 451460 0% /dev/shm /dev/sda1 1211436108 1211436108 0 100% /mnt/raid root@dhcp-14 /mnt/raid > /bin/rm -f /mnt/raid/foo Segmentation fault root@dhcp-14 /mnt/raid > Message from syslogd@dhcp-14 at Mon Jan 20 10:53:10 2003 ... dhcp-14 kernel: Assertion failure in do_get_write_access() at transaction.c:715: "handle->h_buffer_credits > 0" The oops is : kernel BUG at transaction.c:715! invalid operand: 0000 agpgart nfsd lockd sunrpc autofs 8139too mii ide-cd cdrom ext3 jbd aic7xxx sd_mo d scsi_mod CPU: 0 EIP: 0010:[<f884daee>] Not tainted EFLAGS: 00010286 EIP is at do_get_write_access [jbd] 0x336 (2.4.18-19.7.x) eax: 00000063 ebx: 00000000 ecx: 00000000 edx: f6e68000 esi: e0803b10 edi: f4311e00 ebp: c334e9f0 esp: ee78fcc4 ds: 0018 es: 0018 ss: 0018 Process rm (pid: 2020, stackpage=ee78f000) Stack: f8853ce0 f8853622 f8853570 000002cb f8853636 00000000 00000000 ef99dee0 d467d000 f6d28800 cdc9c8a0 f884de39 ef99dee0 cdc9c8a0 00000001 f4311e00 00000000 d467d000 f6d28800 ef99dee0 f885a468 ef99dee0 e0803b10 00000001 Call Trace: [<f8853ce0>] .rodata.str1.32 [jbd] 0x20 (0xee78fcc4)) [<f8853622>] .rodata.str1.1 [jbd] 0xc2 (0xee78fcc8)) [<f8853570>] .rodata.str1.1 [jbd] 0x10 (0xee78fccc)) [<f8853636>] .rodata.str1.1 [jbd] 0xd6 (0xee78fcd4)) [<f884de39>] journal_get_undo_access_Rc7bbb223 [jbd] 0x3d (0xee78fcf0)) [<f885a468>] ext3_free_blocks [ext3] 0x174 (0xee78fd14)) [<f8851f6f>] log_start_commit_R335a5fa9 [jbd] 0x3b (0xee78fd50)) [<f884d6a5>] journal_restart_R3772af02 [jbd] 0xc1 (0xee78fd58)) [<f884d272>] start_this_handle [jbd] 0x122 (0xee78fd5c)) [<c0114bfd>] schedule [kernel] 0x20d (0xee78fd78)) [<f885e7ef>] ext3_clear_blocks [ext3] 0x137 (0xee78fd90)) [<c011eb7f>] update_process_times [kernel] 0x23 (0xee78fda8)) [<c011ea56>] update_wall_time [kernel] 0xe (0xee78fdbc)) [<c011ebc7>] timer_bh [kernel] 0x27 (0xee78fdc4)) [<c011ef41>] do_timer [kernel] 0x45 (0xee78fdd0)) [<f885e903>] ext3_free_data [ext3] 0x10b (0xee78fdf4)) [<c0139f54>] try_to_free_buffers [kernel] 0x8c (0xee78fe40)) [<f885eccc>] ext3_truncate [ext3] 0x128 (0xee78fe6c)) [<f884d272>] start_this_handle [jbd] 0x122 (0xee78fea8)) [<f884d379>] journal_start_R2d67426b [jbd] 0x7d (0xee78fec0)) [<f885cc23>] start_transaction [ext3] 0x57 (0xee78fed8)) [<f885cd64>] ext3_delete_inode [ext3] 0xa0 (0xee78fefc)) [<f8868de0>] ext3_sops [ext3] 0x0 (0xee78ff10)) [<f885ccc4>] ext3_delete_inode [ext3] 0x0 (0xee78ff18)) [<f8868de0>] ext3_sops [ext3] 0x0 (0xee78ff1c)) [<c0148ccf>] iput [kernel] 0xf7 (0xee78ff20)) [<f88618ad>] ext3_unlink [ext3] 0x1a5 (0xee78ff30)) [<c014735a>] d_delete [kernel] 0x4e (0xee78ff44)) [<c0140950>] vfs_unlink [kernel] 0x154 (0xee78ff64)) [<c0140a09>] sys_unlink [kernel] 0x85 (0xee78ff88)) [<c01138dc>] do_page_fault [kernel] 0x0 (0xee78ffb0)) [<c0108623>] system_call [kernel] 0x33 (0xee78ffc0)) Code: 0f 0b cb 02 70 35 85 f8 83 c4 14 8b 4c 24 1c 8b 41 04 48 8b Running the oops through ksymoops adds this information : >>EIP; f884daee <[jbd]do_get_write_access+336/464> <===== Trace; f8853ce0 <[jbd].rodata.end+781/4e2d> Trace; f8853622 <[jbd].rodata.end+c3/4e2d> Trace; f8853570 <[jbd].rodata.end+11/4e2d> Trace; f8853636 <[jbd].rodata.end+d7/4e2d> Trace; f884de39 <[jbd]journal_get_undo_access+3d/fc> Trace; f885a468 <[ext3].text.start+408/a397> Trace; f8851f6f <[jbd]log_start_commit+3b/40> Trace; f884d6a5 <[jbd]journal_restart+c1/d8> Trace; f884d272 <[jbd]__kstrtab_journal_init_inode+18/26> Trace; c0114bfd <schedule+20d/230> Trace; f885e7ef <[ext3].text.start+478f/a397> Trace; c0138000 <getblk+18/3c> Trace; f885d85f <[ext3].text.start+37ff/a397> Trace; f885e903 <[ext3].text.start+48a3/a397> Trace; f884dbfc <[jbd]do_get_write_access+444/464> Trace; f885eccc <[ext3].text.start+4c6c/a397> Trace; f884d272 <[jbd]__kstrtab_journal_init_inode+18/26> Trace; f884d379 <[jbd]__kstrtab_journal_destroy+a/1a> Trace; f885cc23 <[ext3].text.start+2bc3/a397> Trace; f885cd64 <[ext3].text.start+2d04/a397> Trace; f884e54d <[jbd]journal_stop+1b1/1c0> Trace; f885ccc4 <[ext3].text.start+2c64/a397> Trace; f8868de0 <[ext3].data.start+200/2bf> Trace; c0148ccf <iput+f7/1d8> Trace; f88618ad <[ext3].text.start+784d/a397> Trace; c014735a <d_delete+4e/70> Trace; c0140950 <vfs_unlink+154/188> Trace; c0140a09 <sys_unlink+85/f0> Trace; c01138dc <do_page_fault+0/428> Trace; c0108623 <system_call+33/38> Code; f884daee <[jbd]do_get_write_access+336/464> 00000000 <_EIP>: Code; f884daee <[jbd]do_get_write_access+336/464> <===== 0: 0f 0b ud2a <===== Code; f884daf0 <[jbd]do_get_write_access+338/464> 2: cb lret Code; f884daf1 <[jbd]do_get_write_access+339/464> 3: 02 70 35 add 0x35(%eax),%dh Code; f884daf4 <[jbd]do_get_write_access+33c/464> 6: 85 f8 test %edi,%eax Code; f884daf6 <[jbd]do_get_write_access+33e/464> 8: 83 c4 14 add $0x14,%esp Code; f884daf9 <[jbd]do_get_write_access+341/464> b: 8b 4c 24 1c mov 0x1c(%esp,1),%ecx Code; f884dafd <[jbd]do_get_write_access+345/464> f: 8b 41 04 mov 0x4(%ecx),%eax Code; f884db00 <[jbd]do_get_write_access+348/464> 12: 48 dec %eax Code; f884db01 <[jbd]do_get_write_access+349/464> 13: 8b 00 mov (%eax),%eax 2 warnings and 5 errors issued. Results may not be reliable.
I obtain the same behaviour on phoebe too, kernel 2.4.20-2.21.
Thanks very much for this report --- it has allowed me to find a problem that I've been chasing for some time. The bug occurs when the "i_blocks" count in the file's inode overflows past 2^31. That works fine most of the time, because i_blocks is an unsigned long, and should go up to 2^32; but there's a place in truncate where ext3 calculates the size of the next transaction chunk for the delete, and that mistakenly uses a signed long instead. Because the huge i_blocks gets cast to a negative value, ext3 does not reserve enough credits for the transaction and the above error results. The attached patch fixes things for me in testing, and it will be in a subsequent errata.
Created attachment 89813 [details] FIx signed use of unsigned i_blocks
*** Bug 52378 has been marked as a duplicate of this bug. ***
The patch you submitted at https://listman.redhat.com/pipermail/ext3-users/2003-February/004726.html, applied on the kernel from phoebe, solved this problem for me. Thank you Stephen.
Thanks for testing it!
Which RH 7.3 kernel incorporates this change? Is it in 8.0+? We've just seen it on an IBM x345 w/ ServeRAID 5i controller, and were sticking with the stock 7.3 kernel for IBM's supported SR driver. I need to know what RH kernel includes this fix, so I can find out if IBM supports it yet. Thanks, Chris Pepper
It's definitely in 9, and is probably in recent errata for older kernels as a result, but I don't have information handy about exactly which errata it might have first been included in.