Bug 137237
Summary: | kernel crashes under heavy load when using ext3. | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Shervin <shervin01> | ||||||
Component: | kernel | Assignee: | Stephen Tweedie <sct> | ||||||
Status: | CLOSED DUPLICATE | QA Contact: | Brian Brock <bbrock> | ||||||
Severity: | high | Docs Contact: | |||||||
Priority: | medium | ||||||||
Version: | 3 | CC: | barryn, tburke, wtogami | ||||||
Target Milestone: | --- | ||||||||
Target Release: | --- | ||||||||
Hardware: | i686 | ||||||||
OS: | Linux | ||||||||
Whiteboard: | |||||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||||
Doc Text: | Story Points: | --- | |||||||
Clone Of: | Environment: | ||||||||
Last Closed: | 2004-11-19 15:04:24 UTC | Type: | --- | ||||||
Regression: | --- | Mount Type: | --- | ||||||
Documentation: | --- | CRM: | |||||||
Verified Versions: | Category: | --- | |||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||
Embargoed: | |||||||||
Bug Depends On: | |||||||||
Bug Blocks: | 135876 | ||||||||
Attachments: |
|
Description
Shervin
2004-10-26 21:04:19 UTC
If you have another box nearby, I'd recommend serial console as a way of getting the oops recorded. Is the crash always the same, with the same backtrace? Have you managed to record any other instances? Thanks. Created attachment 105867 [details]
dmesg log after restart.
Yes the crash us always the same. I have got this message atleast 20 times now
(Everytime that I did this test).
I cannot run any programs so couldn't get any messages other than the one that
you see. I have attached the dmesg log after restart (Hardware reset after a
crash).
OK, I'm looking into this. On the chance that it's an on-disk corruption that is causing the repeatable problems, have you forced a fsck of your filesystems since you started seeing this? Yes. I have done e2fsck and also reinstalled FC3 test3 (Respartitioned with/without Logical Volumes and reformatted) atleast 6-7 times and everytime the result is the same. I have also been in contact with Mr. Markus Lidel (Current maintainer/developer) of the I2O driver for scsi controller but so far we haven't been able to find if I2O is the actual cause or not. I am going to install redhat 9 today to see if the problem is caused by I2O or not. Ok. My hunch was correct. When I disabled SELinux the problem stopped. I was running it in permissive mode. Ran the box for an hour with high load and it did not crash (It usually crashes within 10 minutes). I had this hunch because in dmesg i get: SELinux: initialized (dev i2o/hda1, type ext3), uses xattr and when the crash happens we get: [<f89b3977>] ext3_xattr_set_handle2+0x23d/0x417 [ext3] [<f89b36ed>] ext3_xattr_set_handle+0x6bd/0x728 [ext3] [<f89b3b9b>] ext3_xattr_set+0x4a/0x83 [ext3] [<f89b50ca>] ext3_xattr_security_set+0x3c/0x65 [ext3] [<0216e3af>] generic_setxattr+0x48/0x50 [<0219bdde>] post_create+0x1b7/0x203 [<021601fe>] vfs_create+0xe7/0xef [<02160597>] open_namei +0x177/0x5b8 [<0215320f>] file_open+0x23/0x3c [<02153521>] sys_open+0x31/0x7d Code: <3>Debug: sleeping function called from invalid context at include/linux/rwsem.h:43 They are both related to xattr on ext3. So that combination is what's causing the crash. I think you should move this under SELinux category. Well, the filesystem is the one actually oopsing; I need to eliminate that before it's safe to call this an SELinux problem. SELinux is likely just triggering an existing fs problem due to its heavy use of extended attributes (all SELinux file context labels are stored as xattrs on disk). It looks like a genuine ext3 xattr bug --- I'm currently testing a patch, although I haven't been able to reproduce the initial error. Created attachment 105948 [details]
Fix journaling for ext3 xattr updates
Could you please see if this patch fixes your problem? Let me know if you need
me to build you a custom kernel rpm with this patch in place.
I would be great if you could build an RPM for me to test. SRPM and i686 smp kernel is online at http://people.redhat.com/sct/.private/test-kernels/kernel-2.6.9-1.650.2.2ext3test/ Please give this a test, and let me know if you need any other architecture variants. Thanks! Thank you for the RPM. I installed the RPM and enabled SELinux. Under high load it still crashed with the same message. So problem still remains. Hmm. Well, if what I _think_ is going on is going on, it's a missing get_write_access, and I can only see that one path near the failure point that is missing that call. Could you please copy down an example of the oops with the modified kernel so that I can try to work out exactly what path led to this failure? (Everything between the initial "unable to handle..." line until the "Code:" line is useful; the remainder is unlikely to help much.) Thanks. Unable to handle kernel NULL pointer dereference at virtual address 00000014 Printing eip: f885c1d7 *pde = 00004001 Oops: 0000 [#1] SMP Modules linked in: md5 ipv6 autofs4 sunrpc iptable_nat iptable_mangle ipt_REJECT ipt_state ip_conntrack iptable_filter ip_tables dm_mod button battery ac uhci_hcd hw_random e1000 floppy ext3 jbd i2o_block i2o_core sd_mod scsi_mod CPU: 0 EIP 0060:[<f88701d7>] Not tainted VLI EFLAGS: 00010246 (2.6.9-1.650.2.2ext3testsmp) EIP is at journal_dirty_metadata+0x3d/0x155 [jbd] eax: 00000000 ebx: 5caa4980 ecx: 05550a00 edx: ac13f9a0 esi: ac13f9a0 edi: 00000000 ebp: 05550800 esp: ab862de4 ds: 007b es: 007b ss: 0068 Process httpd (pid: 3767, threadinfo=ab862000 task=5ec6ec10) Stack: aa58325c 00000000 00001000 00000000 ac13f9a0 f89b39a2 05550a00 00000000 64a95320 aa58325c ffffff86 00000000 00000007 f7c06420 f6065000 f6064020 f89b36ed f6064000 00000000 64a952dc 64a95250 00000007 00000001 00001000 Call Trace: [<f89b3977>] ext3_xattr_set_handle2+0x268/0x452 [ext3] [<f89b36ed>] ext3_xattr_set_handle+0x6bd/0x728 [ext3] [<f89b3b9b>] ext3_xattr_set+0x4a/0x83 [ext3] [<f89b50ca>] ext3_xattr_security_set+0x3c/0x65 [ext3] [<0216e3af>] generic_setxattr+0x48/0x50 [<0219bdde>] post_create+0x1b7/0x203 [<021601fe>] vfs_create+0xe7/0xef [<02160597>] open_namei +0x177/0x5b8 [<0215320f>] file_open+0x23/0x3c [<02153521>] sys_open+0x31/0x7d Code: <3>Debug: sleeping function called from invalid context at include/linux/rwsem.h:43 I'm still looking at the code but can't see any obvious problems. Two things: Have you tried a full fsck of the partition concerned, in case there's an on-disk xattr corruption that is confusing things? Secondly, how exactly are you stressing the server, so that I can try reproducing it here? Thanks. Yes I have reformatted/repartitioned and ran fsck many times. I have also resintalled FC3 test3 many times. But still I am getting hte same result. Here is how I am doing this: I have two computers (1 winodws box and 1 Linux server). From the windows box I start a stress test on a web page (Login page of Squirlmail) using Microsoft web application stress tool. I start it with 40 processes each with 2 threads GETting that page as fast as they can. After about 4-8 minutes it crashes. Here is a run result with SELinux off: Overview =================================================================== Report name: 10/27/2004 3:34:45 PM Run on: 10/27/2004 3:34:45 PM Run length: 02:00:26 Web Application Stress Tool Version:1.1.293.1 Number of test clients: 1 Number of hits: 2906386 Requests per Second: 402.30 Socket Statistics ------------------------------------------------------------------- Socket Connects: 2906458 Total Bytes Sent (in KB): 669847.74 Bytes Sent Rate (in KB/s): 92.72 Total Bytes Recv (in KB): 7268803.27 Bytes Recv Rate (in KB/s): 1006.14 Socket Errors ------------------------------------------------------------------- Connect: 142 Send: 0 Recv: 0 Timeouts: 0 RDS Results ------------------------------------------------------------------- Successful Queries: 0 And here is one with SELinux on: Overview =================================================================== Report name: 11/3/2004 3:57:13 PM Run on: 11/3/2004 3:57:13 PM Run length: 00:06:35 Web Application Stress Tool Version:1.1.293.1 Number of test clients: 1 Number of hits: 145087 Requests per Second: 366.33 Socket Statistics ------------------------------------------------------------------- Socket Connects: 145207 Total Bytes Sent (in KB): 33465.68 Bytes Sent Rate (in KB/s): 84.50 Total Bytes Recv (in KB): 362856.69 Bytes Recv Rate (in KB/s): 916.18 Socket Errors ------------------------------------------------------------------- Connect: 115 Send: 0 Recv: 0 Timeouts: 0 RDS Results ------------------------------------------------------------------- Successful Queries: 0 Note that I stoppped the test manually after the crash happened. OK, there's still clearly a problem here. And it doesn't look like it's nearly as simple or straightforward as the bug addressed by the first patch. I'm wondering if the extended attribute cache itself is causing the problems, and returning us an invalid (or incorrectly-locked) buffer. I'll code up a debug kernel to check for that and to output diagnostic information if we do indeed hit that sort of problem. We will be moving this server to production after friday (2004-11-12), after that I will not be able to run any tests on it. I am sure there are many people out there with the same problem. If you think you can have something by friday I might be able to keep it for a couple of more days to test. Thank you for your efforts. Unfortunately, I've had no other reports of this at all. I'm currently looking at the locking in the extended attribute cache. The transaction logic for the attribute block being modified in your oops trace has gone wrong, but I can't see anything wrong with the journaling itself. So the hypothesis is that the cache is actually returning a buffer from a different transaction --- and the only way that could happen is if it's from a different disk entirely. How heavily does your workload stress different disks? Anyway, if you can't test things beyond Friday then we may not be able to get a proper fix tested for you, but I'd at least like to get a debug kernel tested to see if we can narrow down the problem. I'm building a test kernel that should identify if the attribute cache is indeed returning bogus buffers when the oops occurs. I'm still looking at the xattr code, but in the mean time there's a test kernel up at 2.6.9-1.650.2.3ext3test/i686/kernel-2.6.9-1.650.2.3ext3test.i686.rpm which contains a very simple extra test to check for what looks to be the most likely remaining cause for the failure you're seeing. Sorry, pasted an incomplete URL: http://people.redhat.com/sct/.private/test-kernels/2.6.9-1.650.2.3ext3test/ has both the UP and SMP i686 test kernel. I installed kernel-smp-2.6.9-1.650.2.3ext3test.i686.rpm on Fedora Core 3 (Not test3 but the finall version) with the same settings as before. The crash happened again. I have written down the log. Do you need anything else? here is the log: Unable to handle kernel NULL pointer dereference at virtual address 00000014 Printing eip: f88701d7 *pde = 00004001 Oops: 0000 [#1] SMP Modules linked in: md5 ipv6 autofs4 i2c_dev i2c_core sunrpc iptable_nat iptable_mangle ipt_REJECT ipt_multiport ipt_state ip_conntrack iptable_filter ip_tables dm_mod button battery ac uhci_hcd hw_random e1000 floppy ext3 jbd i2o_block i2o_core sd_mod scsi_mod CPU: 1 EIP 0060:[<f88701d7>] Not tainted VLI EFLAGS: 00010246 (2.6.9-1.650.2.3ext3testsmp) EIP is at journal_dirty_metadata+0x3d/0x155 [jbd] eax: 00000000 ebx: 056a1f00 ecx: f67fb938 edx: f67fb938 esi: f67fb938 edi: 00000000 ebp: 0554d600 esp: f5e30de4 ds: 007b es: 007b ss: 0068 Process httpd (pid: 3247, threadinfo=f5e30000 task=f5f618f0) Stack: f67a741c 00000000 00001000 00000000 f60840f8 f89b3a28 f67fb938 0554d800 00000000 f67a741c ffffff86 00000000 00000007 f79b34a0 f60af000 f60ae020 f89b36ed f60ae000 00000000 f60840b4 f6084028 00000007 00000001 00001000 Call Trace: [<f89b3a28>] ext3_xattr_set_handle2+0x2ee/0x4c8 [ext3] [<f89b36ed>] ext3_xattr_set_handle+0x6db/0x728 [ext3] [<f89b3c4c>] ext3_xattr_set+0x4a/0x83 [ext3] [<f89b517a>] ext3_xattr_security_set+0x3c/0x65 [ext3] [<0216e3ab>] generic_setxattr+0x48/0x50 [<0219bdda>] post_create+0x1b7/0x203 [<021601fa>] vfs_create+0xe7/0xef [<02160593>] open_namei +0x177/0x5b8 [<0215320b>] file_open+0x23/0x3c [<0215351d>] sys_open+0x31/0x7d Code: <3>Debug: sleeping function called from invalid context at include/linux/rwsem.h:43 I'm really confused on this one. It's the same code as we've been using on both 2.4 and 2.6 for a while for extended attributes, and I've *never* seen this reported before. And all of the obvious paths to this error are dealt with by the debug tests I added in that 2.3ext3test kernel. Can you characterise the workload a bit better? Are you using multiple disks on the server? What part of the workload is creating files --- we're failing in a create path, but you said your stress test consists mostly of reading web pages? Finally, is there any chance you might be able to reproduce this with the netdump system enabled, so that we can capture a kernel core dump for it? Thanks. Sorry we have already moved this server to production and I won't be able to do any more testing on it. As for the tests, It was a very simple, requests were being made from a windows machine using Microsoft's application stress tool to a php page on the server (login page of squirellmail to be precise). server had 3 harddrives (two 36 GB SCSI raid-0 and one 73 for backup) we were only using the raid-0 (I2O driver). There shouldn't have been any load on the drive other than the apache log (the php page seems pretty static). Nothing else was being done on the server at the time of these tests. Again, I had done the same test many times with SELinux disabled and it never crashed, but when SELinux was turned on it would crash 100% of the time. Let me know if you need more info (I have kept most of the reports from the stress tests). Thank you Thanks, but I've just today received another report of the same problem, so I'm following up there. I'll update this report if I get any further with it. *** This bug has been marked as a duplicate of 138951 *** The bug this is a duplicate of is a private bug, but I'll update this public bug too once it is resolved. We now have an internal reproducer for this problem and I expect that to make it much easier to start digging further into the cause of the bug. |