Bug 137237

Summary: kernel crashes under heavy load when using ext3.
Product: [Fedora] Fedora Reporter: Shervin <shervin01>
Component: kernelAssignee: Stephen Tweedie <sct>
Status: CLOSED DUPLICATE QA Contact: Brian Brock <bbrock>
Severity: high Docs Contact:
Priority: medium    
Version: 3CC: barryn, tburke, wtogami
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2004-11-19 15:04:24 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 135876    
Attachments:
Description Flags
dmesg log after restart.
none
Fix journaling for ext3 xattr updates none

Description Shervin 2004-10-26 21:04:19 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3)
Gecko/20041001 Firefox/0.10.1

Description of problem:
I am simulating a heavy load on this server (Supermicro SYS-6013P-8+
Dual Xeon 3.0 (HT On) with adaptec 2015s SCSI adapter and 4GB RAM)
after a while (4-8 minutes) it crashes. I have resintalled FC3 Test3
6-7 times (Reformatted and repartitioned) but same thing happens.

The problem only happens when using ext3. If I use ext2 then crash
does not happen. If I run the test with less connections to the server
(Lower server load) this does not happen. Tried runnong it for 4 hours
with about 100-150 connections per second and did not crash, but when
i run it at about 350-400 connections per second it crashes.

When the crash happens nothing works anymore (I can still Shift-PGUP).
and after restart (Hardware) there are no messages in
/var/log/messages. So, I am going to try my best and type the error
message. There might be some spelling errors and some lines might be
missing, sorry about those.

Here we go:


Unable to handle kernel NULL pointer dereference at virtual address
00000014 Printing eip: f885c1d7 *pde = 00004001
Oops: 0000 [#1]
SMP
Modules linked in: i2o_config md5 ipv6 autofs4 sunrpc iptable_nat
iptable_mangle ipt_REJECT ipt_state ip_conntrack iptable_filter
ip_tables button battery ac uh ci_hcd hw_random e1000 floppy
dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod i2o_block i2o_core
sd_mod scsi_mod
CPU:	2
EIP	0060:[<f885c1d7>]	Not tainted VLI
EFLAGS: 00010246  (2.6.9-1.640smp)
EIP is at journal_dirty_metadata+0x3d/0x155 [jbd]
eax: 00000000  ebx: f7febf00  ecx: 0556ba00  edx: f5a955c4
esi: f5a955c4  edi: 00000000  ebp: 05571200  esp: 6acebde4
ds: 007b  es: 007b  ss: 0068
Process httpd (pid: 3862, threadinfo=6aceb000 task=6a3838f0)
Stack: 
69d12710 00000000 f5a955c4 00000000 69b64bc0 f89b3977 f5a955c4
0556ba00 00000000 69d12710 ffffff86 00000000 00000007 f5a6aba0
f58d1000 f58d0020 f89b36ed f58d0000 00000000 69b64b7c 69b64af0
00000007 00000001 00001000 

Call Trace: 
[<f89b3977>] ext3_xattr_set_handle2+0x23d/0x417 [ext3] 
[<f89b36ed>] ext3_xattr_set_handle+0x6bd/0x728 [ext3] 
[<f89b3b9b>] ext3_xattr_set+0x4a/0x83 [ext3] 
[<f89b50ca>] ext3_xattr_security_set+0x3c/0x65 [ext3] 
[<0216e3af>] generic_setxattr+0x48/0x50 
[<0219bdde>] post_create+0x1b7/0x203 
[<021601fe>] vfs_create+0xe7/0xef 
[<02160597>] open_namei +0x177/0x5b8 
[<0215320f>] file_open+0x23/0x3c 
[<02153521>] sys_open+0x31/0x7d
Code: <3>Debug: sleeping function called from invalid context at
include/linux/rwsem.h:43 
In_atomic():0 [expected: 0], irqs_disabled():1
[<0211dbf3>] __might_sleep+0x7d/0x8a
[<0214edd9>] rw_vm+0xe5/0x28c
[<f885c1ac>] journal_dirty_metadata+0x12/0x155 [jbd] 
[<f885c1ac>] journal_dirty_metadata+0x12/0x155 [jbd] 
[<0214f23d>] get_user_size+0x30/0x57 
[<f885c1ac>] journal_dirty_metadata+0x12/0x155 [jbd] 
[<021061c4>] show_registers+0x115/0x16c 
[<021061c4>] die+0xdb/0x16b 
[<0212028c>] vprintk+0x136/0x14a 
[<02119997>] do_page_fault+0x421/0x5e7 
[<f885c1d7>] journal_dirty_metadata+0x3d/0x155 [jbd] 
[<f885bcdb>] do_get_write_access+0x516/0x532 [jbd] 
[<022b7b3b>] __crond_resched+0x14/0x39 
[<02155f1a>] __getblk+0x2b/0x49 
[<f89aa36f>] ext3_get_inod_loc+0x4f/0x226 [ext3] 
[<f89aab94>] ext3_do_update_inode+0x2f7/0x31e [ext3] 
[<f885bd1c>] journal_get_write_access+0x25/0x2c [jbd] 
[<02119576>] do_page_fault+oxo/0x5e7 
[<f885c1d7>] journal_dirty_metadata+0x3d/0x155 [jbd] 
[<f89b3977>] ext3_xattr_set_handle2+0x23d/0x417 [ext3] 
[<f89b36ed>] ext3_xattr_set_handle+0x6bd/0x728 [ext3] 
[<f89b3b9b>] ext3_xattr_set+0x4a/0x83 [ext3] 
[<f89b50ca>] ext3_xattr_security_set+0x3c/0x65 [ext3] 
[<0216e3af>] generic_setxattr+0x48/0x50 
[<0219bdde>] post_create+0x1b7/0x203 
[<021601fe>] vfs_create+0xe7/0xef 
[<0215320f>] file_open+0x23/0x3c 
[<02153521>] sys_open+0x31/0x7d 
Bad EIP value.


Version-Release number of selected component (if applicable):
kernel-2.6.9-1.640smp, kernel-2.6.8-1.541smp

How reproducible:
Always

Steps to Reproduce:
1. Install FC3 test3 with ext3
2. Start up apache. and simulate a high load on the server.
    

Actual Results:  It crashes after 4-8 minutes.

Additional info:

Comment 1 Stephen Tweedie 2004-10-27 10:23:30 UTC
If you have another box nearby, I'd recommend serial console as a way
of getting the oops recorded.  Is the crash always the same, with the
same backtrace?  Have you managed to record any other instances?  Thanks.


Comment 2 Shervin 2004-10-27 20:44:00 UTC
Created attachment 105867 [details]
dmesg log after restart.

Yes the crash us always the same. I have got this message atleast 20 times now
(Everytime that I did this test).

I cannot run any programs so couldn't get any messages other than the one that
you see. I have attached the dmesg log after restart (Hardware reset after a
crash).

Comment 3 Stephen Tweedie 2004-10-27 20:46:41 UTC
OK, I'm looking into this.

On the chance that it's an on-disk corruption that is causing the
repeatable problems, have you forced a fsck of your filesystems since
you started seeing this?


Comment 4 Shervin 2004-10-27 20:59:18 UTC
Yes. I have done e2fsck and also reinstalled FC3 test3 (Respartitioned
with/without Logical Volumes and reformatted) atleast 6-7 times and
everytime the result is the same.

I have also been in contact with Mr. Markus Lidel (Current
maintainer/developer) of the I2O driver for scsi controller but so far
we haven't been able to find if I2O is the actual cause or not.

I am going to install redhat 9 today to see if the problem is caused
by I2O or not.

Comment 5 Shervin 2004-10-27 22:13:20 UTC
Ok. My hunch was correct. When I disabled SELinux the problem stopped.
I was running it in permissive mode. Ran the box for an hour with high
load and it did not crash (It usually crashes within 10 minutes).

I had this hunch because in dmesg i get:

SELinux: initialized (dev i2o/hda1, type ext3), uses xattr

and when the crash happens we get:
[<f89b3977>] ext3_xattr_set_handle2+0x23d/0x417 [ext3] 
[<f89b36ed>] ext3_xattr_set_handle+0x6bd/0x728 [ext3] 
[<f89b3b9b>] ext3_xattr_set+0x4a/0x83 [ext3] 
[<f89b50ca>] ext3_xattr_security_set+0x3c/0x65 [ext3] 
[<0216e3af>] generic_setxattr+0x48/0x50 
[<0219bdde>] post_create+0x1b7/0x203 
[<021601fe>] vfs_create+0xe7/0xef 
[<02160597>] open_namei +0x177/0x5b8 
[<0215320f>] file_open+0x23/0x3c 
[<02153521>] sys_open+0x31/0x7d
Code: <3>Debug: sleeping function called from invalid context at
include/linux/rwsem.h:43

They are both related to xattr on ext3. So that combination is what's
causing the crash.

I think you should move this under SELinux category.

Comment 6 Stephen Tweedie 2004-10-28 09:19:56 UTC
Well, the filesystem is the one actually oopsing; I need to eliminate
that before it's safe to call this an SELinux problem.  SELinux is
likely just triggering an existing fs problem due to its heavy use of
extended attributes (all SELinux file context labels are stored as
xattrs on disk). 

Comment 7 Stephen Tweedie 2004-10-29 16:00:58 UTC
It looks like a genuine ext3 xattr bug --- I'm currently testing a
patch, although I haven't been able to reproduce the initial error.


Comment 8 Stephen Tweedie 2004-10-29 16:19:52 UTC
Created attachment 105948 [details]
Fix journaling for ext3 xattr updates

Could you please see if this patch fixes your problem?	Let me know if you need
me to build you a custom kernel rpm with this patch in place.

Comment 9 Shervin 2004-10-31 01:54:24 UTC
I would be great if you could build an RPM for me to test.

Comment 11 Stephen Tweedie 2004-11-03 18:57:03 UTC
SRPM and i686 smp kernel is online at

http://people.redhat.com/sct/.private/test-kernels/kernel-2.6.9-1.650.2.2ext3test/

Please give this a test, and let me know if you need any other
architecture variants.

Thanks!

Comment 12 Shervin 2004-11-03 21:17:49 UTC
Thank you for the RPM. I installed the RPM and enabled SELinux.

Under high load it still crashed with the same message. So problem
still remains. 

Comment 13 Stephen Tweedie 2004-11-03 23:05:05 UTC
Hmm.  Well, if what I _think_ is going on is going on, it's a missing
get_write_access, and I can only see that one path near the failure
point that is missing that call.  

Could you please copy down an example of the oops with the modified
kernel so that I can try to work out exactly what path led to this
failure?  (Everything between the initial "unable to handle..." line
until the "Code:" line is useful; the remainder is unlikely to help much.)

Thanks.

Comment 14 Shervin 2004-11-04 00:12:59 UTC
Unable to handle kernel NULL pointer dereference at virtual address
00000014 
Printing eip: f885c1d7 
*pde = 00004001
Oops: 0000 [#1]
SMP
Modules linked in: md5 ipv6 autofs4 sunrpc iptable_nat
iptable_mangle ipt_REJECT ipt_state ip_conntrack iptable_filter
ip_tables dm_mod button battery ac uhci_hcd hw_random e1000 floppy
ext3 jbd i2o_block i2o_core sd_mod scsi_mod
CPU:	0
EIP	0060:[<f88701d7>]	Not tainted VLI
EFLAGS: 00010246  (2.6.9-1.650.2.2ext3testsmp)
EIP is at journal_dirty_metadata+0x3d/0x155 [jbd]
eax: 00000000  ebx: 5caa4980  ecx: 05550a00  edx: ac13f9a0
esi: ac13f9a0  edi: 00000000  ebp: 05550800  esp: ab862de4
ds: 007b  es: 007b  ss: 0068
Process httpd (pid: 3767, threadinfo=ab862000 task=5ec6ec10)
Stack: 
aa58325c 00000000 00001000 00000000 ac13f9a0 f89b39a2 05550a00
00000000
64a95320 aa58325c ffffff86 00000000 00000007 f7c06420 f6065000 f6064020
f89b36ed f6064000 00000000 64a952dc 64a95250 00000007 00000001 00001000 

Call Trace: 
[<f89b3977>] ext3_xattr_set_handle2+0x268/0x452 [ext3] 
[<f89b36ed>] ext3_xattr_set_handle+0x6bd/0x728 [ext3] 
[<f89b3b9b>] ext3_xattr_set+0x4a/0x83 [ext3] 
[<f89b50ca>] ext3_xattr_security_set+0x3c/0x65 [ext3] 
[<0216e3af>] generic_setxattr+0x48/0x50 
[<0219bdde>] post_create+0x1b7/0x203 
[<021601fe>] vfs_create+0xe7/0xef 
[<02160597>] open_namei +0x177/0x5b8 
[<0215320f>] file_open+0x23/0x3c 
[<02153521>] sys_open+0x31/0x7d
Code: <3>Debug: sleeping function called from invalid context at
include/linux/rwsem.h:43

Comment 15 Stephen Tweedie 2004-11-04 20:40:15 UTC
I'm still looking at the code but can't see any obvious problems.  Two things:

Have you tried a full fsck of the partition concerned, in case there's an
on-disk xattr corruption that is confusing things?

Secondly, how exactly are you stressing the server, so that I can try
reproducing it here? 

Thanks.

Comment 16 Shervin 2004-11-04 21:05:40 UTC
Yes I have reformatted/repartitioned and ran fsck many times. I have
also resintalled FC3 test3 many times. But still I am getting hte same
result.

Here is how I am doing this:
I have two computers (1 winodws box and 1 Linux server). From the
windows box I start a stress test on a web page (Login page of
Squirlmail) using Microsoft web application stress tool. I start it
with 40 processes each with 2 threads GETting that page as fast as
they can. After about 4-8 minutes it crashes.

Here is a run result with SELinux off:
Overview
===================================================================
Report name:                  10/27/2004 3:34:45 PM
Run on:                       10/27/2004 3:34:45 PM
Run length:                   02:00:26

Web Application Stress Tool Version:1.1.293.1

Number of test clients:       1

Number of hits:               2906386
Requests per Second:          402.30

Socket Statistics
-------------------------------------------------------------------
Socket Connects:              2906458
Total Bytes Sent (in KB):     669847.74
Bytes Sent Rate (in KB/s):    92.72
Total Bytes Recv (in KB):     7268803.27
Bytes Recv Rate (in KB/s):    1006.14

Socket Errors
-------------------------------------------------------------------
Connect:                      142
Send:                         0
Recv:                         0
Timeouts:                     0

RDS Results
-------------------------------------------------------------------
Successful Queries:           0



And here is one with SELinux on:

Overview
===================================================================
Report name:                  11/3/2004 3:57:13 PM
Run on:                       11/3/2004 3:57:13 PM
Run length:                   00:06:35

Web Application Stress Tool Version:1.1.293.1

Number of test clients:       1

Number of hits:               145087
Requests per Second:          366.33

Socket Statistics
-------------------------------------------------------------------
Socket Connects:              145207
Total Bytes Sent (in KB):     33465.68
Bytes Sent Rate (in KB/s):    84.50
Total Bytes Recv (in KB):     362856.69
Bytes Recv Rate (in KB/s):    916.18

Socket Errors
-------------------------------------------------------------------
Connect:                      115
Send:                         0
Recv:                         0
Timeouts:                     0

RDS Results
-------------------------------------------------------------------
Successful Queries:           0

Note that I stoppped the test manually after the crash happened.

Comment 17 Stephen Tweedie 2004-11-05 11:20:11 UTC
OK, there's still clearly a problem here.  And it doesn't look like it's nearly
as simple or straightforward as the bug addressed by the first patch.

I'm wondering if the extended attribute cache itself is causing the problems,
and returning us an invalid (or incorrectly-locked) buffer.  I'll code up a
debug kernel to check for that and to output diagnostic information if we do
indeed hit that sort of problem.

Comment 18 Shervin 2004-11-09 21:03:48 UTC
We will be moving this server to production after friday (2004-11-12),
after that I will not be able to run any tests on it. I am sure there
are many people out there with the same problem. If you think you can
have something by friday I might be able to keep it for a couple of
more days to test.

Thank you for your efforts.

Comment 19 Stephen Tweedie 2004-11-09 22:39:32 UTC
Unfortunately, I've had no other reports of this at all.

I'm currently looking at the locking in the extended attribute cache.
 The transaction logic for the attribute block being modified in your
oops trace has gone wrong, but I can't see anything wrong with the
journaling itself.  So the hypothesis is that the cache is actually
returning a buffer from a different transaction --- and the only way
that could happen is if it's from a different disk entirely.

How heavily does your workload stress different disks?

Anyway, if you can't test things beyond Friday then we may not be able
to get a proper fix tested for you, but I'd at least like to get a
debug kernel tested to see if we can narrow down the problem.  I'm
building a test kernel that should identify if the attribute cache is
indeed returning bogus buffers when the oops occurs.

Comment 20 Stephen Tweedie 2004-11-10 15:59:59 UTC
I'm still looking at the xattr code, but in the mean time there's a test kernel
up at

2.6.9-1.650.2.3ext3test/i686/kernel-2.6.9-1.650.2.3ext3test.i686.rpm

which contains a very simple extra test to check for what looks to be the most
likely remaining cause for the failure you're seeing.

Comment 21 Stephen Tweedie 2004-11-10 16:05:14 UTC
Sorry, pasted an incomplete URL:

http://people.redhat.com/sct/.private/test-kernels/2.6.9-1.650.2.3ext3test/

has both the UP and SMP i686 test kernel.


Comment 22 Shervin 2004-11-10 21:39:26 UTC
I installed kernel-smp-2.6.9-1.650.2.3ext3test.i686.rpm on Fedora Core
3 (Not test3 but the finall version) with the same settings as before.
The crash happened again. I have written down the log. Do you need
anything else? 

here is the log:

Unable to handle kernel NULL pointer dereference at virtual address
00000014 
Printing eip: f88701d7 
*pde = 00004001
Oops: 0000 [#1]
SMP
Modules linked in: md5 ipv6 autofs4 i2c_dev i2c_core sunrpc
iptable_nat iptable_mangle ipt_REJECT ipt_multiport ipt_state
ip_conntrack iptable_filter ip_tables dm_mod button battery ac
uhci_hcd hw_random e1000 floppy ext3 jbd i2o_block i2o_core sd_mod
scsi_mod
CPU:	1
EIP	0060:[<f88701d7>]	Not tainted VLI
EFLAGS: 00010246  (2.6.9-1.650.2.3ext3testsmp)
EIP is at journal_dirty_metadata+0x3d/0x155 [jbd]
eax: 00000000  ebx: 056a1f00 ecx: f67fb938 edx: f67fb938
esi: f67fb938 edi: 00000000  ebp: 0554d600 esp: f5e30de4
ds: 007b  es: 007b  ss: 0068
Process httpd (pid: 3247, threadinfo=f5e30000 task=f5f618f0)
Stack: 
f67a741c 00000000 00001000 00000000 f60840f8 f89b3a28 f67fb938
0554d800
00000000 f67a741c ffffff86 00000000 00000007 f79b34a0 f60af000 f60ae020
f89b36ed f60ae000 00000000 f60840b4 f6084028 00000007 00000001 00001000 

Call Trace: 
[<f89b3a28>] ext3_xattr_set_handle2+0x2ee/0x4c8 [ext3] 
[<f89b36ed>] ext3_xattr_set_handle+0x6db/0x728 [ext3] 
[<f89b3c4c>] ext3_xattr_set+0x4a/0x83 [ext3] 
[<f89b517a>] ext3_xattr_security_set+0x3c/0x65 [ext3] 
[<0216e3ab>] generic_setxattr+0x48/0x50 
[<0219bdda>] post_create+0x1b7/0x203 
[<021601fa>] vfs_create+0xe7/0xef 
[<02160593>] open_namei +0x177/0x5b8 
[<0215320b>] file_open+0x23/0x3c 
[<0215351d>] sys_open+0x31/0x7d
Code: <3>Debug: sleeping function called from invalid context at
include/linux/rwsem.h:43

Comment 25 Stephen Tweedie 2004-11-16 17:46:26 UTC
I'm really confused on this one.  It's the same code as we've been using on both
2.4 and 2.6 for a while for extended attributes, and I've *never* seen this
reported before.  And all of the obvious paths to this error are dealt with by
the debug tests I added in that 2.3ext3test kernel.

Can you characterise the workload a bit better?  Are you using multiple disks on
the server?  What part of the workload is creating files --- we're failing in a
create path, but you said your stress test consists mostly of reading web pages?

Finally, is there any chance you might be able to reproduce this with the
netdump system enabled, so that we can capture a kernel core dump for it?

Thanks.


Comment 26 Shervin 2004-11-17 07:52:52 UTC
Sorry we have already moved this server to production and I won't be
able to do any more testing on it.

As for the tests, It was a very simple, requests were being made from
a windows machine using Microsoft's application stress tool to a php
page on the server (login page of squirellmail to be precise). server
had 3 harddrives (two 36 GB SCSI raid-0 and one 73 for backup) we were
only using the raid-0 (I2O driver). There shouldn't have been any load
on the drive other than the apache log (the php page seems pretty static).

Nothing else was being done on the server at the time of these tests.
Again, I had done the same test many times with SELinux disabled and
it never crashed, but when SELinux was turned on it would crash 100%
of the time.

Let me know if you need more info (I have kept most of the reports
from the stress tests).

Thank you

Comment 28 Stephen Tweedie 2004-11-17 23:08:01 UTC
Thanks, but I've just today received another report of the same
problem, so I'm following up there.  I'll update this report if I get
any further with it.


Comment 29 Tim Burke 2004-11-19 15:04:24 UTC

*** This bug has been marked as a duplicate of 138951 ***

Comment 30 Stephen Tweedie 2004-11-19 23:22:12 UTC
The bug this is a duplicate of is a private bug, but I'll update this public bug
too once it is resolved.

We now have an internal reproducer for this problem and I expect that to make it
much easier to start digging further into the cause of the bug.