Bug 138951 - [RHEL4 beta2] System occasionally hangs while under heavy load on IBM x360
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: i686
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Stephen Tweedie
QA Contact: Brian Brock
URL:
Whiteboard:
Duplicates: 137237 143020
Depends On:
Blocks: 135876
Reported: 2004-11-12 02:12 UTC by john stultz
Modified: 2007-11-30 22:07 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-01-15 12:22:25 UTC
Target Upstream Version:
Embargoed:


Attachments
console panic image (36.83 KB, image/jpeg)
2004-11-12 22:17 UTC, john stultz
netconsole logs from x360 that hung (3.91 KB, text/plain)
2004-11-15 18:30 UTC, john stultz
netconsole logs from x440 that hung (207.04 KB, text/plain)
2004-11-15 18:34 UTC, john stultz
Fix for mbcache/xattr races (9.07 KB, patch)
2004-12-10 22:24 UTC, Stephen Tweedie
Fix possible transaction overflow in journal_release_buffer() (1.91 KB, patch)
2004-12-10 22:30 UTC, Stephen Tweedie

Description john stultz 2004-11-12 02:12:56 UTC
Description of problem: 
Using IBM's "pounder" test (sorry, it's not distributable), I've been
seeing hangs w/ RHEL4 Beta2. So far it looks like the hangs have
occurred only on an IBM x360 and possibly an IBM x440 (although a
different x440 passed without a problem).

Looking over vmstat/slabinfo/meminfo logs from the test run, I don't
see any lowmem exhaustion (lowmem sticks around 2megs).

The hangs are somewhat odd. The system responds to pings but is not
ssh'able. In at least one case the keyboard numlock worked and the
mouse would still move in X; however, the GDM application would not
respond to keypresses.
 
Version-Release number of selected component (if applicable): 
kernel-smp-2.6.9-1.648_EL 
 
How reproducible: 
Sometimes 
 
Steps to Reproduce: 
1. Install RHEL4 beta2 
2. Run pounder test case 
3. Wait about 2-3 hours 
     
 
Actual Results:  System hangs, but is pingable. X even responds to 
mouse movement, but applications don't seem to respond (GDM doesn't 
get keypresses and displays time of the hang, ssh doesn't work). 
 
Expected Results:  System continues running w/o issue. 
 
Additional info: 
 
I'm working to narrow down the issue to a single test I can 
distribute.

Comment 1 john stultz 2004-11-12 22:17:57 UTC
Created attachment 106613
console panic image

Yes, I am the king of annoying bug reports! Here is a camera capture of the
panic seen on the system. I'm going to try to get a netconsole dump to lessen
my lameness.

Comment 2 john stultz 2004-11-12 22:36:37 UTC
More usable text based output captured from the service processor: 
 
esi: c97fb6fc   edi: 00000000   ebp: f7e8a600   esp: c349ade4                    
ds: 007b   es: 007b   ss: 0068                                                   
Process cp (pid: 27211, threadinfo=c349a000 task=f5115770)                       
Stack: cc115828 00000000 c97fb6fc 00000000 ed3390f8 f88ec9df c97fb6fc f7e92200
       00000000 cc115828 ffffff86 00000000 00000007 f4e0f8f4 ce664000 ce663020
       f88ec755 ce663000 00000000 ed3390b4 ed339028 00000007 00000001 00001000
Call Trace:                                                                      
 [<f88ec9df>] ext3_xattr_set_handle2+0x23d/0x417 [ext3]                          
 [<f88ec755>] ext3_xattr_set_handle+0x6db/0x728 [ext3]                           
 [<f88ecc03>] ext3_xattr_set+0x4a/0x83 [ext3]                                    
 [<f88ee132>] ext3_xattr_security_set+0x3c/0x83 [ext3]                           
 [<c016f716>] generic_setxattr+0x48/0x50                                         
 [<c019e0ae>] post_create+0x1b7/0x203                                            
 [<c0161116>] vfs_create+0xe7/0xef                                               
 [<c01614af>] open_namei+0x177/0x5b8                                             
 [<c0153e8d>] filp_open+0x23/0x3c                                                
 [<c02bdaa4>] __cond_resched+0x14/0x39                                           
 [<c01b5a5a>] direct_strncpy_from_user+0x3e/0x5d                                 
 [<c015419f>] sys_open+0x31/0x7d                                                 
 [<c02bf487>] syscall_call+0x7/0xb                                               
 [<c02b007b>] cookie_v4_check+0xd9/0x3ca                                         
Code: 04 8b 2b 0f 85 32 01 00 00 f6 45 00 02 0f 85 28 01 00 00 eb 0b f3 90 8b 06
 a9 00 00 08 00 75 f5 f0 0f ba 2e 13 19 c0 85 c0 75 ec <39> 5f 14 75 37 83 7f 08
 02 75 31 3b 5d 38 0f 84 e6 00 00 00 68

Comment 3 john stultz 2004-11-15 18:30:20 UTC
Created attachment 106732
netconsole logs from x360 that hung

Here are the netconsole logs. Looks like there are two oopses somewhat tangled
together.

Comment 4 john stultz 2004-11-15 18:34:25 UTC
Created attachment 106733
netconsole logs from x440 that hung

This is from a different box that has seen the hangs as well. This log is
somewhat different, however. No panic, but lots of ext3 errors.

Comment 5 Stephen Tweedie 2004-11-17 20:11:01 UTC
There _is_ a panic in that last report --- but it seems to be from
netconsole.  Might be worth opening that in a separate bugzilla, it's
clearly distinct from the ext3 problems.

I've seen this xattr bug reported only once before, against FC3, but
adding debug code to the kernel there did not help me get any further
with it:

https://bugzilla.redhat.com/beta2/show_bug.cgi?id=137237

This is exactly the same footprint.

Would it be possible for you to capture a netdump for this, please?


Comment 6 Stephen Tweedie 2004-11-17 20:30:12 UTC
Also, could you please characterise the workload that you're using to
recreate this?  Thanks.


Comment 7 john stultz 2004-11-17 23:09:20 UTC
Does the attachment for comment #3 not have what you're asking for? 
 
As far as the workload goes, we're basically doing lots of disk->disk 
copies, NFS->disk copies, and running a large number of dd processes.  

Comment 8 Stephen Tweedie 2004-11-18 15:24:28 UTC
No, the attachment is just an oops log --- I'd like a complete vmcore if
possible, please.


Comment 9 john stultz 2004-11-18 18:59:52 UTC
Ok, I'll need to read up on how to capture netdumps (sorry for the 
confusion). The systems need to be reloaded because they've moved on 
to testing other distro releases, so I'll probably not have this for 
you till next week. 

Comment 11 Tim Burke 2004-11-19 15:04:38 UTC
*** Bug 137237 has been marked as a duplicate of this bug. ***

Comment 14 Stephen Tweedie 2004-11-19 22:57:46 UTC
Andrew Tridgell has reported this under Samba stress loads, and has
added an xattr test option to his dbench stress tool.

cvs -d :pserver:cvs.org:/cvsroot co dbench

and run dbench with the "-x" option.

With this, I was able to reproduce the problem within a dozen or so
dbench cycles.  This greatly reduces the pressure for external help
--- with a local reproducer I should be able to get further into the
problem.


Comment 16 Stephen Tweedie 2004-12-03 17:27:28 UTC
We've got a candidate fix for this.  I'm reviewing that now, and will start
testing on it shortly.

Fortunately, the original problem case is fairly easy to reproduce, so targeted
testing should minimise the risk from this fix.

Comment 17 Stephen Tweedie 2004-12-10 01:19:48 UTC
The initial fix had an easily fixed flaw.

The second version exposed a bug elsewhere in the jbd journaling
buffer-release mechanism (because it allowed the existing xattr code
to make much higher use of the buffer-release code than it was doing
before.)

The buffer-release code has a relatively simple fix too, and that fix
is also needed for a couple of other cases including handling races
when allocating on a heavily fragmented filesystem.  But we've never
seen a report of that case in practice, so in reality it's probably
just going to be needed for this xattr case.  The buffer-release fix
looks obviously correct, but could conceivably trigger other problems,
and could also have some performance impact.  I believe the risk is
low, but it deserves testing.

Anyway, the combined fix survived two 50-process xattr-enabled dbench
runs on two separate disks in parallel for over 13 hours last night,
so the fix is definitely an improvement over the existing code. 
Reassuringly, dbench throughput was not affected in the slightest by
the buffer-release fix.


Comment 19 Stephen Tweedie 2004-12-10 22:24:20 UTC
Created attachment 108362
Fix for mbcache/xattr races

This is the fix currently being tested.  It is identical to the version tested
last night for 14 hours except for one critical fix found during code review,
plus one other fix for an error path that cannot be hit except when
encountering on-disk corruption.

Comment 20 Stephen Tweedie 2004-12-10 22:30:30 UTC
Created attachment 108363
Fix possible transaction overflow in journal_release_buffer()

Using the previous patch, Andrew Tridgell was able to trigger a latent problem
in ext3's jbd layer.  

journal_release_buffer() is used by the xattr code to deal with a race
condition --- if a process is looking to share an attribute block but by the
time it has acquired journaling rights the attr block has been deleted, it
would release the buffer again.  
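
As a rough illustration of that race, here is a toy Python model (not the ext3 code; AttrBlock, Handle, share_or_allocate and the race hook are all hypothetical names made up for this sketch).  A process picks a candidate block to share, takes write access to it, and only then finds out whether the block was deleted in the meantime; if it was, the buffer is released again.

```python
from dataclasses import dataclass


@dataclass
class AttrBlock:
    """Toy stand-in for a shared xattr block (hypothetical)."""
    refcount: int = 1
    deleted: bool = False


@dataclass
class Handle:
    """Toy journal handle that only tracks buffer credits (hypothetical)."""
    credits: int

    def get_write_access(self, block: AttrBlock) -> None:
        self.credits -= 1  # reserve journal space for this buffer

    def release_buffer(self, block: AttrBlock) -> None:
        self.credits += 1  # hand the unused reservation back


def share_or_allocate(handle, candidate, race=lambda: None):
    """Try to share `candidate`; fall back to a fresh block if it vanished."""
    race()  # another process may delete the block at this point
    handle.get_write_access(candidate)
    if candidate.deleted:
        # Lost the race: give the buffer back and use a new block instead.
        handle.release_buffer(candidate)
        return AttrBlock()
    candidate.refcount += 1
    return candidate
```

In this model the release path is only taken when the race is lost, which is why it was rarely exercised before the xattr fix made that path more common.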

But journal_release_buffer() was not safe in all cases.  If you take write
access to a buffer, then other processes attempting to take write access to the
same buffer were allowed to piggy-back on the original process's credits, and
would not take their own journal buffer credit.  So if the initial process did a
journal_release_buffer(), we'd end up with *no* credits outstanding against a
buffer being journaled, and that mis-accounting can lead to overflowing the
journal.

The fix is to always take a buffer credit in do_get_write_access(), unless the
buffer is already part of the running transaction *AND* is dirty.  The latter
part of that condition was not tested for previously.

The risk is that if you have many processes modifying the same buffer and not
doing journal_release_buffer(), we end up accounting for the same buffer
multiple times; so the pessimistic accounting may cause us to create shorter
transactions, impacting performance.
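
The accounting above can be sketched with a small Python model.  This is only an illustration under simplifying assumptions, not the jbd implementation: Buffer, Transaction, race() and contention() are hypothetical names, and a "credit" is just a counter.  The old rule lets a caller ride for free whenever the buffer is already in the transaction; the fixed rule requires the buffer to be dirty as well.

```python
class Buffer:
    """Toy journaled buffer (hypothetical model, not the kernel struct)."""

    def __init__(self):
        self.in_transaction = False
        self.dirty = False


class Transaction:
    """Counts buffer credits reserved against the running transaction."""

    def __init__(self, fixed):
        self.fixed = fixed
        self.credits = 0

    def get_write_access(self, buf):
        """Return True if the caller was charged a credit of its own."""
        if self.fixed:
            free_ride = buf.in_transaction and buf.dirty
        else:
            free_ride = buf.in_transaction  # old rule: piggy-back on first caller
        buf.in_transaction = True
        if not free_ride:
            self.credits += 1
        return not free_ride

    def release_buffer(self, charged):
        """Give back the caller's credit, if it was charged one."""
        if charged:
            self.credits -= 1

    def mark_dirty(self, buf):
        buf.dirty = True


def race(fixed):
    """A takes access, B takes access, A releases, B dirties the buffer."""
    txn, buf = Transaction(fixed), Buffer()
    a_charged = txn.get_write_access(buf)
    txn.get_write_access(buf)
    txn.release_buffer(a_charged)
    txn.mark_dirty(buf)
    return txn.credits  # credits left covering one dirty buffer


def contention(fixed, writers):
    """Several writers take access to one clean buffer; nobody releases."""
    txn, buf = Transaction(fixed), Buffer()
    for _ in range(writers):
        txn.get_write_access(buf)
    return txn.credits
```

In this model race(False) leaves 0 credits against a buffer that will be journaled, while race(True) leaves 1.  The cost shows up in contention(): three writers on one clean buffer reserve 1 credit under the old rule and 3 under the fixed rule, which is the pessimistic accounting described above.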

Comment 21 Red Hat CVS System 2004-12-11 03:35:11 UTC
Created attachment 108382
Patch committed to cvs for this bug

Comment 23 Stephen Tweedie 2004-12-16 13:43:13 UTC
*** Bug 143020 has been marked as a duplicate of this bug. ***

Comment 24 Jay Turner 2005-01-06 15:35:43 UTC
IBM, can this issue be closed out?

Comment 25 john stultz 2005-01-15 00:35:54 UTC
I'd say yes, it can be closed out. I've not seen this issue for a while.

Comment 26 Stephen Tweedie 2005-01-15 12:22:25 UTC
Fixed in final RC.  

There has been a bit more work done on the patch upstream: the patch
here allowed for the xattr code to avoid the use of ext3's
journal_release code entirely, which AG has done; and it has been
combined with extra patches to allow for in-inode xattr storage.  The
combined patch set has been merged into the -mm tree for later
inclusion in 2.6 mainline.

