86035 – Assertion Failure in do_get_write_access()

Summary: Assertion Failure in do_get_write_access()

Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Linux
Classification:	Retired
Component:	kernel
Version:	7.3
Hardware:	i686
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Stephen Tweedie
QA Contact:	Brian Brock
Duplicates (1):	99517

Reported:	2003-03-12 19:16 UTC by Kris Reilly
Modified:	2007-04-18 16:51 UTC
CC List:	1 user
Last Closed:	2004-09-10 12:16:46 UTC

Description Kris Reilly 2003-03-12 19:16:15 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20021003

Description of problem:
Under heavy load I have been experiencing a 50% failure rate.  The
problem has appeared in machines configured with both SCSI and IDE
drives.  The test configuration in question is the IDE setup.  

We pound the machines with web requests, which generates large logs, and
then we crunch those logs.  Crunching is very disk-intensive, and during
it the drives stop responding.  The errors that appear in the logs are
attached below.

The machines are 4 x 1.2 GHz P4 Xeon with 6 GB RAM.  The drives that are
crashing are 120 GB IDE drives.  They fail both as secondary on IDE0 and
as primary on IDE1, and they show the same failure under both ext2 and
ext3.  The machines are running Red Hat Linux 7.3, kernel version
2.4.18-18.7.xbigmem.  I have just updated one of the boxes to
2.4.18-24.7.x, custom-compiling the kernel to leave out unnecessary
cruft, and am waiting to see when it crashes again.

My next step is to try hdparm and/or experiment with settings under
/proc, although the logs suggest that the problem is a hardware fault
rather than an operating-system limitation.

Does anyone have any suggestions?  

Thanks!
Kris Reilly      

**Disks that are crashing:

http://wdc.custhelp.com/cgi-bin/wdc.cfg/php/enduser/std_adp.php?p_faqid=703&p_created=1037222838

**Disks crash with this error in the logs:

Message from syslogd@105 at Fri Mar  7 19:10:18 2003 ...
105 kernel: Assertion failure in do_get_write_access() at transaction.c:737: "(((jh2bh(jh))->b_state & (1UL << BH_Uptodate)) != 0)"

Message from syslogd@103 at Sat Mar  8 06:19:41 2003 ...
103 kernel: Assertion failure in do_get_write_access() at transaction.c:737: "(((jh2bh(jh))->b_state & (1UL << BH_Uptodate)) != 0)"

**Just before the crash, dmesg shows:

))
[<c0146534>] bread [kernel] 0x24 (0xd58f3d2c))
[<f881e5a5>] ext3_get_branch [ext3] 0x55 (0xd58f3d50))
[<f880dd6f>] journal_get_write_access_Rsmp_78dc75e5 [jbd] 0x3f (0xd58f3d68))
[<f881ed55>] ext3_get_block_handle [ext3] 0x205 (0xd58f3d7c))
[<f880e241>] journal_dirty_metadata_Rsmp_fb9ecae4 [jbd] 0x61 (0xd58f3de4))
[<c0146772>] create_buffers [kernel] 0x62 (0xd58f3de8))
[<f881ee7c>] ext3_get_block [ext3] 0x5c (0xd58f3e0c))
[<c0146d19>] __block_prepare_write [kernel] 0xe9 (0xd58f3e2c))
[<f8821555>] ext3_mark_iloc_dirty [ext3] 0x25 (0xd58f3e5c))
[<f8816310>] .rodata.str1.1 [jbd] 0x30 (0xd58f3e6c))
[<c0147675>] block_prepare_write [kernel] 0x25 (0xd58f3e80))
[<f881ee20>] ext3_get_block [ext3] 0x0 (0xd58f3e94))
[<f880d39d>] journal_start_Rsmp_171b1921 [jbd] 0x7d (0xd58f3ea0))
[<f881f3a5>] ext3_prepare_write [ext3] 0xd5 (0xd58f3eb0))
[<f881ee20>] ext3_get_block [ext3] 0x0 (0xd58f3ec0))
[<c01343ed>] generic_file_write [kernel] 0x4ed (0xd58f3ee8))
[<c0156be4>] fcntl_setlk [kernel] 0x1a4 (0xd58f3f3c))
[<f881cc32>] ext3_file_write [ext3] 0x22 (0xd58f3f5c))
[<c01440f6>] sys_write [kernel] 0x96 (0xd58f3f7c))
[<c0152e9d>] sys_fcntl64 [kernel] 0x8d (0xd58f3fac))
[<c0108c73>] system_call [kernel] 0x33 (0xd58f3fc0))


Code: 0f 0b e1 02 f0 62 81 f8 83 c4 14 8b 44 24 34 8b 08 b8 00 e0 
end_request: I/O error, dev 03:41 (hdb), sector 67895384
EXT3-fs error (device ide0(3,65)): ext3_get_inode_loc: unable to read inode block - inode=4243506, block=8486923
end_request: I/O error, dev 03:41 (hdb), sector 181670032
end_request: I/O error, dev 03:41 (hdb), sector 181670040
end_request: I/O error, dev 03:41 (hdb), sector 181670096
end_request: I/O error, dev 03:41 (hdb), sector 181670128
end_request: I/O error, dev 03:41 (hdb), sector 0
EXT3-fs error (device ide0(3,65)) in ext3_reserve_inode_write: IO failure
end_request: I/O error, dev 03:41 (hdb), sector 18752
end_request: I/O error, dev 03:41 (hdb), sector 37224528
EXT3-fs error (device ide0(3,65)): ext3_get_inode_loc: unable to read inode block - inode=2326537, block=4653066
EXT3-fs error (device ide0(3,65)): ext3_get_inode_loc: unable to read inode block - inode=2326532, block=4653066
end_request: I/O error, dev 03:41 (hdb), sector 63941840
end_request: I/O error, dev 03:41 (hdb), sector 176160848
EXT3-fs error (device ide0(3,65)): ext3_get_inode_loc: unable to read inode block - inode=11010060, block=22020106
end_request: I/O error, dev 03:41 (hdb), sector 181669984
end_request: I/O error, dev 03:41 (hdb), sector 181670016
end_request: I/O error, dev 03:41 (hdb), sector 181670032
end_request: I/O error, dev 03:41 (hdb), sector 181670040
end_request: I/O error, dev 03:41 (hdb), sector 181670096
end_request: I/O error, dev 03:41 (hdb), sector 0
EXT3-fs error (device ide0(3,65)) in ext3_reserve_inode_write: IO failure
end_request: I/O error, dev 03:41 (hdb), sector 181670016
end_request: I/O error, dev 03:41 (hdb), sector 181670032
end_request: I/O error, dev 03:41 (hdb), sector 181670040

... many more of these end_request errors ...
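The block numbers in the ext3_get_inode_loc errors match sectors in the end_request errors exactly if the filesystem uses 4 KiB blocks (8 sectors of 512 bytes). For example, block 8486923 x 8 = sector 67895384, block 4653066 x 8 = sector 37224528, and block 22020106 x 8 = sector 176160848. The 4 KiB block size is inferred from these numbers and is not stated in the report. The helper names below are hypothetical, and the log lines are rejoined onto single lines. This sketch checks the correspondence, which is consistent with the inode-block reads failing because of disk I/O errors:

```python
import re

SECTOR_SIZE = 512           # bytes per sector in the end_request messages
ASSUMED_BLOCK_SIZE = 4096   # assumption: ext3 block size inferred from the log
SECTORS_PER_BLOCK = ASSUMED_BLOCK_SIZE // SECTOR_SIZE

# Excerpt of the log above, with wrapped lines rejoined.
LOG = """\
end_request: I/O error, dev 03:41 (hdb), sector 67895384
EXT3-fs error (device ide0(3,65)): ext3_get_inode_loc: unable to read inode block - inode=4243506, block=8486923
end_request: I/O error, dev 03:41 (hdb), sector 37224528
EXT3-fs error (device ide0(3,65)): ext3_get_inode_loc: unable to read inode block - inode=2326537, block=4653066
end_request: I/O error, dev 03:41 (hdb), sector 176160848
EXT3-fs error (device ide0(3,65)): ext3_get_inode_loc: unable to read inode block - inode=11010060, block=22020106
"""


def failed_sectors(log: str) -> set:
    """Sectors named in end_request I/O error lines."""
    return {int(s) for s in re.findall(r"I/O error, .*?sector (\d+)", log)}


def inode_blocks(log: str) -> list:
    """Filesystem blocks that ext3 reported it could not read."""
    return [int(b) for b in re.findall(r"unable to read inode block - .*?block=(\d+)", log)]


def block_to_sector(block: int) -> int:
    """First 512-byte sector of a block, numbered as the log reports it."""
    return block * SECTORS_PER_BLOCK


def match_blocks(log: str) -> dict:
    """Map each unreadable block to whether its first sector had an I/O error."""
    bad = failed_sectors(log)
    return {blk: block_to_sector(blk) in bad for blk in inode_blocks(log)}


if __name__ == "__main__":
    for blk, hit in match_blocks(LOG).items():
        status = "I/O error" if hit else "no match"
        print(f"block {blk} -> sector {block_to_sector(blk)}: {status}")
```

If the block-size assumption were wrong, none of the three blocks would land on a reported sector. An exact match on all three is good evidence for it.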

Version-Release number of selected component (if applicable):
kernel-2.4.18-18.7.xbigmem

How reproducible:
Sometimes

Steps to Reproduce:
1. Use OpenSTA to generate load on the machines.
2. Under heavy load, the drives crash in 3 out of 5 boxes.

    

Additional info:

Comment 1 Joe Acosta 2004-01-07 20:14:02 UTC

I'm not sure whether this is related, but I am running Red Hat Advanced
Server and getting LOTS of disk crashes.

The excerpt below is taken from my logs.  messages.1 and messages.2 are
full of these messages, and messages.3 is where they start.

The machine is a dual PIII 600, one of our development boxes, running:

Linux swallow.mccue.com 2.4.9-e.35enterprise #1 SMP Tue Dec 23 00:06:16 EST 2003 i686 unknown

######################################
from /var/log/messages.3
######################################
Dec 18 11:42:01 swallow kernel: attempt to access beyond end of device
Dec 18 11:42:01 swallow kernel: 03:09: rw=0, want=970312140, limit=81923436
Dec 18 11:42:01 swallow kernel: EXT3-fs error (device ide0(3,9)): ext3_get_inode_loc: unable to read inode block - inode=32769, block=779448946
Dec 18 11:42:01 swallow kernel: attempt to access beyond end of device
Dec 18 11:42:01 swallow kernel: 03:09: rw=0, want=970312140, limit=81923436
Dec 18 11:42:01 swallow kernel: EXT3-fs error (device ide0(3,9)): ext3_get_inode_loc: unable to read inode block - inode=32769, block=779448946
Dec 18 11:42:01 swallow kernel: EXT3-fs error (device ide0(3,9)) in ext3_reserve_inode_write: IO failure
Dec 18 11:58:38 swallow kernel: attempt to access beyond end of device
Dec 18 11:58:38 swallow kernel: 03:09: rw=0, want=970312140, limit=81923436
Dec 18 11:58:38 swallow kernel: EXT3-fs error (device ide0(3,9)): ext3_get_inode_loc: unable to read inode block - inode=32769, block=779448946
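The "want=" value in this log can be reproduced exactly from the bad block number under these assumptions:

- The filesystem uses 4 KiB blocks.
- The start sector is computed as block x 8 in a 32-bit unsigned long, as on i686.
- "want" and "limit" are reported in 1 KiB units, with want = (sector + count) >> 1, as in the generic 2.4 block layer. Red Hat's 2.4.9-e kernel may differ.

Under these assumptions, block 779448946 wraps to exactly want=970312140. It also lies far beyond the roughly 20.5 million 4 KiB blocks implied by limit=81923436. That would point to a corrupt block number read from on-disk metadata rather than a failing sector. The function names below are hypothetical. A minimal sketch:

```python
MASK32 = 0xFFFFFFFF  # models a 32-bit unsigned long on i686 (assumption)


def reported_want(block: int, block_size: int = 4096) -> int:
    """Model of the "want=" value (1 KiB units) for reading one block.

    Assumes sector = block * (block_size // 512) computed in 32 bits and
    want = (sector + count) >> 1, also computed in 32 bits.
    """
    count = block_size // 512                # sectors per block
    sector = (block * count) & MASK32        # wraps modulo 2**32
    return ((sector + count) & MASK32) >> 1  # end of request, in 1 KiB units


def device_blocks(limit_kib: int, block_size: int = 4096) -> int:
    """Device size in filesystem blocks, from the "limit=" value in KiB."""
    return limit_kib * 1024 // block_size


BAD_BLOCK = 779448946   # block= from the ext3_get_inode_loc error
LIMIT_KIB = 81923436    # limit= from the "beyond end of device" line

if __name__ == "__main__":
    print("modelled want =", reported_want(BAD_BLOCK))
    print("device size in 4 KiB blocks =", device_blocks(LIMIT_KIB))
    print("block beyond device:", BAD_BLOCK > device_blocks(LIMIT_KIB))
```

Without the 32-bit wraparound the modelled value would be 3117795788 rather than the logged 970312140. The exact match therefore supports both the 4 KiB block size and the overflow assumption.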

Comment 2 Stephen Tweedie 2004-09-10 12:16:46 UTC

The "transaction.c:737" assert failure should be fixed in all current
kernels.

The root problem was that ext3's internal debugging checks made
assumptions about a buffer's "uptodate" flag that do not hold in the
presence of I/O errors.  These assert failures have been downgraded to
warnings in later kernels.
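A simplified user-space model of the check in the assertion text may help illustrate this point. A read that fails with an I/O error leaves the buffer's uptodate bit clear. A hard assertion on that bit therefore turns a disk error into a kernel BUG (the "0f 0b" at the start of the "Code:" line in the original report is the x86 ud2 instruction that BUG() uses). The following are illustrative assumptions, not the actual jbd code:

- The bit positions.
- The BufferModel class and function names.
- Returning -EIO after the warning.

```python
import errno
import logging
from dataclasses import dataclass

# Illustrative bit positions only; not taken from the kernel headers.
BH_UPTODATE = 0
BH_LOCK = 2


@dataclass
class BufferModel:
    """Hypothetical stand-in for a buffer_head: a block number plus state bits."""
    blocknr: int
    b_state: int = 0


def buffer_uptodate(bh: BufferModel) -> bool:
    # Same test as the assertion text: (b_state & (1UL << BH_Uptodate)) != 0
    return (bh.b_state & (1 << BH_UPTODATE)) != 0


class AssertionFailure(RuntimeError):
    """Models the J_ASSERT-style failure that stopped the machine."""


def get_write_access_strict(bh: BufferModel) -> int:
    """Older behaviour: a non-uptodate buffer is treated as an internal bug."""
    if not buffer_uptodate(bh):
        raise AssertionFailure(f"block {bh.blocknr}: buffer not uptodate")
    return 0


def get_write_access_lenient(bh: BufferModel) -> int:
    """Later behaviour: warn instead of asserting.

    Returning -EIO after the warning is an assumption made for this sketch.
    """
    if not buffer_uptodate(bh):
        logging.warning("block %d: buffer not uptodate (I/O error?)", bh.blocknr)
        return -errno.EIO
    return 0


if __name__ == "__main__":
    failed = BufferModel(blocknr=8486923, b_state=1 << BH_LOCK)    # read failed
    good = BufferModel(blocknr=4653066, b_state=1 << BH_UPTODATE)  # read succeeded
    print(get_write_access_lenient(good), get_write_access_lenient(failed))
```

With the strict check, the failed buffer raises; with the lenient one it produces a warning and an error code, and the caller decides what to do next.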

Comment 3 Stephen Tweedie 2004-09-10 12:23:27 UTC

*** Bug 99517 has been marked as a duplicate of this bug. ***
