Bug 58833

Summary: ext3 BUG: transaction.c:708
Product: [Retired] Red Hat Linux
Reporter: Michael K. Johnson <johnsonm>
Component: kernel
Assignee: Stephen Tweedie <sct>
Status: CLOSED CURRENTRELEASE
QA Contact: Brock Organ <borgan>
Severity: medium
Priority: medium
Version: 7.2
CC: shishz
Hardware: i386
OS: Linux
Doc Type: Bug Fix
Last Closed: 2004-01-05 20:28:43 UTC
Attachments:
  The ksyms file corresponding to the original BUG() message (flags: none)
  Oops from after reboot, trying to access the same filesystem (/home) (flags: none)
  ksyms corresponding to the oops after reboot (flags: none)

Description Michael K. Johnson 2002-01-25 15:20:52 UTC
With 2.4.9-13, athlon kernel, all filesystems ext3; during the night I
got a BUG() and then lots of processes went into disk wait.  The
following message was in the logs:

assertion failure in do_get_write_access() at transaction.c:708:
"handle->h_buffer_credits > 0"
------------[ cut here ]------------
kernel bug at transaction.c:708!
invalid operand: 0000
cpu:    0
eip:   
0010:[tulip:__insmod_tulip_o/lib/modules/2.4.9-13/kernel/drivers/net/tu+-1078144/96]
   not tainted
eip:    0010:[<d0805c80>]    not tainted
eflags: 00010282
eax: 00000021   ebx: 00000000   ecx: 00000001   edx: 0000e13b
esi: cf8f6200   edi: c3a3fe20   ebp: c4f98210   esp: ce0e7da0
ds: 0018   es: 0018   ss: 0018
process nautilus (pid: 4045, stackpage=ce0e7000)
stack: d080d9d0 000002c4 00000000 00000000 cf8f6200 cf7498f0 cf8f6294 cf8f6200 
       c3a3fe20 c4f98210 d0805db5 c3a3fe20 c4f98210 00000000 00000000 c1b2c000 
       c3a3fe20 c2f3c0b0 d0816ecf c3a3fe20 cf94ea80 00000000 d0805276 cf8f6200 
call trace:
[tulip:__insmod_tulip_o/lib/modules/2.4.9-13/kernel/drivers/net/tu+-1046064/96]
__insmod_jbd_s.rodata_l96 [jbd] 0x2370 
call trace: [<d080d9d0>] __insmod_jbd_s.rodata_l96 [jbd] 0x2370 
[tulip:__insmod_tulip_o/lib/modules/2.4.9-13/kernel/drivers/net/tu+-1077835/96]
journal_get_write_access_r2e4c654a [jbd] 0x35 
[<d0805db5>] journal_get_write_access_r2e4c654a [jbd] 0x35 
[tulip:__insmod_tulip_o/lib/modules/2.4.9-13/kernel/drivers/net/tu+-1007921/96]
__insmod_ext3_s.text_l40820 [ext3] 0x6e6f 
[<d0816ecf>] __insmod_ext3_s.text_l40820 [ext3] 0x6e6f 
[tulip:__insmod_tulip_o/lib/modules/2.4.9-13/kernel/drivers/net/tu+-1080714/96]
__insmod_jbd_s.text_l25316 [jbd] 0x216 
[<d0805276>] __insmod_jbd_s.text_l25316 [jbd] 0x216 
[tulip:__insmod_tulip_o/lib/modules/2.4.9-13/kernel/drivers/net/tu+-1080469/96]
journal_start_r15fe2587 [jbd] 0xb7 
[<d080536b>] journal_start_r15fe2587 [jbd] 0xb7 
[tulip:__insmod_tulip_o/lib/modules/2.4.9-13/kernel/drivers/net/tu+-1017067/96]
__insmod_ext3_s.text_l40820 [ext3] 0x4ab5 
[<d0814b15>] __insmod_ext3_s.text_l40820 [ext3] 0x4ab5 
[tulip:__insmod_tulip_o/lib/modules/2.4.9-13/kernel/drivers/net/tu+-1017044/96]
__insmod_ext3_s.text_l40820 [ext3] 0x4acc 
[<d0814b2c>] __insmod_ext3_s.text_l40820 [ext3] 0x4acc 
[truncate_list_pages+337/352] truncate_list_pages [kernel] 0x151 
[<c0124691>] truncate_list_pages [kernel] 0x151 
[truncate_inode_pages+119/136] truncate_inode_pages [kernel] 0x77 
[<c0124717>] truncate_inode_pages [kernel] 0x77 
[tulip:__insmod_tulip_o/lib/modules/2.4.9-13/kernel/drivers/net/tu+-1017264/96]
__insmod_ext3_s.text_l40820 [ext3] 0x49f0 
[<d0814a50>] __insmod_ext3_s.text_l40820 [ext3] 0x49f0 
[vmtruncate+357/380] vmtruncate [kernel] 0x165 
[<c012282d>] vmtruncate [kernel] 0x165 
[generic_file_write+1501/1520] generic_file_write [kernel] 0x5dd 
[<c01274bd>] generic_file_write [kernel] 0x5dd 
[tulip:__insmod_tulip_o/lib/modules/2.4.9-13/kernel/drivers/net/tu+-1029941/96]
__insmod_ext3_s.text_l40820 [ext3] 0x186b 
[<d08118cb>] __insmod_ext3_s.text_l40820 [ext3] 0x186b 
[sys_write+149/204] sys_write [kernel] 0x95 
[<c0131bd9>] sys_write [kernel] 0x95 
[system_call+51/56] system_call [kernel] 0x33 
[<c0106e03>] system_call [kernel] 0x33 


Code: 0f 0b 59 5e 8b 4c 24 24 8b 41 04 48 8b 54 24 24 89 42 04 8b 


I'm not convinced that klogd did a very good job of figuring out the
addresses.
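
One way to double-check klogd's symbol resolution is to look up each raw
address in the matching ksyms dump by hand. The following is a minimal sketch
of that lookup, assuming each ksyms line has the form
"<hex address> <symbol> [module]". The helper names are illustrative, and the
two sample entries are reconstructed from address/offset pairs in the trace
above (for example d0805db5 - 0x35), not copied from the attached ksyms file.

```python
import bisect

# Sample entries derived from the trace above (address minus reported offset);
# they are NOT taken from the attached ksyms file.
SAMPLE_KSYMS = """\
d08052b4 journal_start_r15fe2587 [jbd]
d0805d80 journal_get_write_access_r2e4c654a [jbd]
"""

def parse_ksyms(text):
    """Return a sorted list of (address, symbol) pairs from ksyms-style text."""
    table = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        table.append((int(parts[0], 16), parts[1]))
    table.sort()
    return table

def resolve(table, addr):
    """Map addr to (symbol, offset) using the nearest symbol at or below it."""
    addrs = [a for a, _ in table]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None  # address lies below every known symbol
    base, name = table[i]
    return name, addr - base
```

A lookup such as resolve(parse_ksyms(SAMPLE_KSYMS), 0xd0805db5) can then be
compared against the symbol and offset that klogd printed for the same
address. Note that the nearest exported symbol is only a guess for code in
non-exported (static) functions.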

The machine (remote to me) appears to have hung on reboot; it may be at
a "your filesystem is very hosed, please fsck by hand" prompt, but I do
not currently have anyone who can look and tell me; I only know for sure
that I have lost remote access.

Comment 1 Michael K. Johnson 2002-01-25 15:49:13 UTC
Created attachment 43494 [details]
The ksyms file corresponding to the original BUG() message

Comment 2 Michael K. Johnson 2002-01-25 15:53:08 UTC
Created attachment 43495 [details]
Oops from after reboot, trying to access the same filesystem (/home)

Comment 3 Michael K. Johnson 2002-01-25 16:00:42 UTC
Created attachment 43496 [details]
ksyms corresponding to the oops after reboot

Comment 4 Michael K. Johnson 2002-01-25 16:06:55 UTC
Finally, after oops on reboot, I did
touch /forcefsck
and rebooted again, and got one error message regarding /home:

fsck: /boot: 46/13832 files (4.3% non-contiguous), 9777/55296 blocks
fsck: /home: recovering journal
fsck: Inode 32775, i_blocks is 2147778496, should be 83381808.  FIXED.
fsck: /home: 17658/5881856 files (0.9% non-contiguous), 11165842/11753520 blocks

Inode 32775 is, at least now, the .xsession-errors file of the user who was
logged in at the time of the original BUG().  I believe the inode number has
not changed since then.  The file has now been truncated by restarting X, so
I do not have the broken contents.
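
To put the fsck numbers in perspective, here is a small worked calculation.
It relies on i_blocks being counted in 512-byte units and assumes a 4 KiB
filesystem block size for /home (smaller ext3 block sizes would make the
corrected file larger than the whole filesystem). The function and constant
names are illustrative only.

```python
SECTOR_BYTES = 512  # ext2/ext3 count i_blocks in 512-byte units

RECORDED_I_BLOCKS = 2147778496  # value fsck found on disk
CORRECT_I_BLOCKS = 83381808     # value fsck computed and wrote back
HOME_TOTAL_BLOCKS = 11753520    # from the fsck summary line for /home
HOME_USED_BLOCKS = 11165842     # from the same summary line

def i_blocks_to_bytes(i_blocks):
    """Disk space implied by an i_blocks value, in bytes."""
    return i_blocks * SECTOR_BYTES

def i_blocks_to_fs_blocks(i_blocks, fs_block_size=4096):
    """Same quantity in filesystem blocks (the 4 KiB size is an assumption)."""
    return i_blocks * SECTOR_BYTES // fs_block_size

def summarize():
    """Collect the derived figures for inode 32775."""
    file_blocks = i_blocks_to_fs_blocks(CORRECT_I_BLOCKS)
    return {
        "file_bytes": i_blocks_to_bytes(CORRECT_I_BLOCKS),
        "file_fs_blocks": file_blocks,
        "share_of_used_blocks": file_blocks / HOME_USED_BLOCKS,
        "recorded_exceeds_fs":
            i_blocks_to_fs_blocks(RECORDED_I_BLOCKS) > HOME_TOTAL_BLOCKS,
        "recorded_hex": hex(RECORDED_I_BLOCKS),
    }
```

Under these assumptions the corrected count works out to about 42.7 GB, or
roughly 93% of the blocks in use on /home, while the value found on disk
describes more blocks than the filesystem contains.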


For all I know, this is an instance of hardware failure, but I wanted to
record it in case it provides useful data later relative to some other
bug report.  I am therefore putting it in NEEDINFO state.

Comment 5 dirk Ketels 2002-06-17 11:26:25 UTC
I've got the same sort of problem.

I was testing a system with lucifer writing to the root filesystem.

At the time of the problem it is quite possible that the root filesystem (on
which lucifer was running) was getting full.

I had to run fsck to recover (I forgot to save the output).

We will rerun lucifer on another filesystem (/var) that is not getting full.

I am just adding this to give more info.


Jun 16 17:25:34 dwalin3 kernel: Assertion failure in do_get_write_access() at 
transaction.c:590: "handle->h_buffer_credits > 0"
Jun 16 17:25:34 dwalin3 kernel: ------------[ cut here ]------------
Jun 16 17:25:34 dwalin3 kernel: kernel BUG at transaction.c:590!
Jun 16 17:25:34 dwalin3 kernel: invalid operand: 0000
Jun 16 17:25:34 dwalin3 kernel: CPU:    0
Jun 16 17:25:34 dwalin3 kernel: EIP:    0010:
[3c59x:__insmod_3c59x_O/lib/modules/2.4.9-13/kernel/drivers/net/3c+-
1402422/96]    Not tainted
Jun 16 17:25:34 dwalin3 kernel: EIP:    0010:[<c88009ca>]    Not tainted
Jun 16 17:25:34 dwalin3 kernel: EFLAGS: 00010296
Jun 16 17:25:34 dwalin3 kernel: eax: 00000021   ebx: 00000000   ecx: 00000001   
edx: 00001d61
Jun 16 17:25:34 dwalin3 kernel: esi: c6e261e0   edi: c1548ce0   ebp: c06a12e0   
esp: c7917d9c
Jun 16 17:25:34 dwalin3 kernel: ds: 0018   es: 0018   ss: 0018
Jun 16 17:25:34 dwalin3 kernel: Process lucifer (pid: 8784, stackpage=c7917000)
Jun 16 17:25:34 dwalin3 kernel: Stack: c88091b0 0000024e 00000000 00000000 
0006d953 c1483000 c6e26ae0 c1483094 
Jun 16 17:25:34 dwalin3 kernel:        c1483000 c36e9320 c06a12e0 c8800eb6 
c36e9320 c06a12e0 00000000 c1548ce0 
Jun 16 17:25:34 dwalin3 kernel:        c147e400 c36e9320 c5c79c80 c8813468 
c36e9320 c1548ce0 00000000 c8800284 
Jun 16 17:25:34 dwalin3 kernel: Call Trace: 
[3c59x:__insmod_3c59x_O/lib/modules/2.4.9-13/kernel/drivers/net/3c+-1367632/96] 
__insmod_jbd_S.rodata_L96 [jbd] 0x25c0 
Jun 16 17:25:34 dwalin3 kernel: Call Trace: [<c88091b0>] 
__insmod_jbd_S.rodata_L96 [jbd] 0x25c0 
Jun 16 17:25:34 dwalin3 kernel: [3c59x:__insmod_3c59x_O/lib/modules/2.4.9-
13/kernel/drivers/net/3c+-1401162/96] journal_get_write_access_R74245737 [jbd] 
0x36 
Jun 16 17:25:34 dwalin3 kernel: [<c8800eb6>] journal_get_write_access_R74245737 
[jbd] 0x36 
Jun 16 17:25:34 dwalin3 kernel: [3c59x:__insmod_3c59x_O/lib/modules/2.4.9-
13/kernel/drivers/net/3c+-1325976/96] __insmod_ext3_S.text_L42880 [ext3] 0x7408 
Jun 16 17:25:34 dwalin3 kernel: [<c8813468>] __insmod_ext3_S.text_L42880 [ext3] 
0x7408 
Jun 16 17:25:34 dwalin3 kernel: [3c59x:__insmod_3c59x_O/lib/modules/2.4.9-
13/kernel/drivers/net/3c+-1404284/96] __insmod_jbd_S.text_L26736 [jbd] 0x224 
Jun 16 17:25:34 dwalin3 kernel: [<c8800284>] __insmod_jbd_S.text_L26736 [jbd] 
0x224 
Jun 16 17:25:34 dwalin3 kernel: [3c59x:__insmod_3c59x_O/lib/modules/2.4.9-
13/kernel/drivers/net/3c+-1404033/96] journal_start_R168f288d [jbd] 0xbf 
Jun 16 17:25:34 dwalin3 kernel: [<c880037f>] journal_start_R168f288d [jbd]0xbf 
Jun 16 17:25:35 dwalin3 kernel: [3c59x:__insmod_3c59x_O/lib/modules/2.4.9-
13/kernel/drivers/net/3c+-1335338/96] __insmod_ext3_S.text_L42880 [ext3] 0x4f76 
Jun 16 17:25:35 dwalin3 kernel: [<c8810fd6>] __insmod_ext3_S.text_L42880 [ext3] 
0x4f76 
Jun 16 17:25:35 dwalin3 kernel: [3c59x:__insmod_3c59x_O/lib/modules/2.4.9-
13/kernel/drivers/net/3c+-1335314/96] __insmod_ext3_S.text_L42880 [ext3] 0x4f8e 
Jun 16 17:25:35 dwalin3 kernel: [<c8810fee>] __insmod_ext3_S.text_L42880 [ext3] 
0x4f8e 
Jun 16 17:25:35 dwalin3 kernel: [truncate_list_pages+477/496] 
truncate_list_pages [kernel] 0x1dd 
Jun 16 17:25:35 dwalin3 kernel: [<c012797d>] truncate_list_pages [kernel] 0x1dd 
Jun 16 17:25:35 dwalin3 kernel: [truncate_inode_pages+143/160] 
truncate_inode_pages [kernel] 0x8f 
Jun 16 17:25:35 dwalin3 kernel: [<c0127a1f>] truncate_inode_pages [kernel] 0x8f 
Jun 16 17:25:35 dwalin3 kernel: [3c59x:__insmod_3c59x_O/lib/modules/2.4.9-
13/kernel/drivers/net/3c+-1335536/96] __insmod_ext3_S.text_L42880 [ext3] 0x4eb0 
Jun 16 17:25:35 dwalin3 kernel: [<c8810f10>] __insmod_ext3_S.text_L42880 [ext3] 
0x4eb0 
Jun 16 17:25:35 dwalin3 kernel: [vmtruncate+361/384] vmtruncate [kernel] 0x169 
Jun 16 17:25:35 dwalin3 kernel: [<c01258a9>] vmtruncate [kernel] 0x169 
Jun 16 17:25:35 dwalin3 kernel: [generic_file_write+1522/1552] 
generic_file_write [kernel] 0x5f2 
Jun 16 17:25:35 dwalin3 kernel: [<c012ac12>] generic_file_write [kernel] 0x5f2 
Jun 16 17:25:35 dwalin3 kernel: [3c59x:__insmod_3c59x_O/lib/modules/2.4.9-
13/kernel/drivers/net/3c+-1349118/96] __insmod_ext3_S.text_L42880 [ext3] 0x19a2 
Jun 16 17:25:35 dwalin3 kernel: [<c880da02>] __insmod_ext3_S.text_L42880 [ext3] 
0x19a2 
Jun 16 17:25:35 dwalin3 kernel: [sys_write+150/208] sys_write [kernel] 0x96 
Jun 16 17:25:35 dwalin3 kernel: [<c01365c6>] sys_write [kernel] 0x96 
Jun 16 17:25:35 dwalin3 kernel: [system_call+51/56] system_call [kernel] 0x33 
Jun 16 17:25:35 dwalin3 kernel: [<c0106f3b>] system_call [kernel] 0x33 
Jun 16 17:25:35 dwalin3 kernel: 
Jun 16 17:25:35 dwalin3 kernel: 
Jun 16 17:25:35 dwalin3 kernel: Code: 0f 0b 59 5b 8b 54 24 28 8b 42 04 8b 4c 24 
28 48 89 41 04 6a

Comment 6 Stephen Tweedie 2002-06-17 13:02:49 UTC
I think that what happened here is related to the disk full situation.  Older
versions of ext3 could get their block accounting confused if the disk became
full.  The ext3 truncate code relies on that block accounting to know how much
journal space it needs to reserve to successfully truncate a file.

Having said that, ext3 should really recover from a case where i_blocks is
wrong, but the current kernels should not have that problem when running out of
disk space.
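
The dependency described above can be illustrated with a toy model. This is
a simplified sketch and not the jbd or ext3 code: it assumes every buffer
taken for writing costs one credit, that truncate reserves its credits up
front from the inode's accounted block count, and it uses hypothetical names
(Handle, credits_for_truncate, the overhead value) throughout.

```python
class OutOfCredits(Exception):
    """Stands in for the failed assertion in do_get_write_access()."""

class Handle:
    """Toy transaction handle that tracks only the remaining buffer credits."""
    def __init__(self, buffer_credits):
        self.buffer_credits = buffer_credits

def get_write_access(handle):
    # Simplified rule: each buffer modified under the handle costs one credit.
    if handle.buffer_credits <= 0:
        raise OutOfCredits('"handle->h_buffer_credits > 0"')
    handle.buffer_credits -= 1

def credits_for_truncate(accounted_blocks, overhead=8):
    # Hypothetical estimate: reserve one credit per accounted block plus a
    # fixed overhead. The overhead value is arbitrary.
    return accounted_blocks + overhead

def truncate(actual_blocks_touched, accounted_blocks):
    """Reserve credits from the accounting, then touch the real block count."""
    handle = Handle(credits_for_truncate(accounted_blocks))
    for _ in range(actual_blocks_touched):
        get_write_access(handle)
    return handle.buffer_credits  # credits left over
```

In this model truncate(20, 20) completes with credits to spare, while
truncate(20, 5) runs out partway through and raises OutOfCredits, which
mirrors how an estimate based on bad block accounting could leave a handle
without credits.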