608770 – Load average increases up to system crash when using sander.mpi (http://ambermd.org/) in a Sun Grid Engine Open MPI environment

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	12
Hardware:	x86_64
OS:	Linux
Priority:	low
Severity:	high
Target Milestone:	---
Assignee:	Eric Sandeen
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2010-06-28 16:08 UTC by Florian Eggenhofer
Modified:	2010-08-23 22:02 UTC
CC List:	9 users
Fixed In Version:	kernel-2.6.32.19-163.fc12
Clone Of:
Environment:
Last Closed:	2010-08-23 22:02:13 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
full dmesg output shortly before machine crashed (122.16 KB, text/plain) 2010-06-28 16:08 UTC, Florian Eggenhofer	no flags
Crash with kernel-2.6.32.16-151.fc12.x86_64 (6.74 KB, text/plain) 2010-08-03 06:55 UTC, Norman Gaywood	no flags
Another more verbose crash with kernel-2.6.32.16-151.fc12.x86_64 (66.79 KB, text/plain) 2010-08-03 06:58 UTC, Norman Gaywood	no flags
kernel-2.6.32.16-150.fc12 crash (119.53 KB, text/plain) 2010-08-03 21:10 UTC, Norman Gaywood	no flags
fstab (1.60 KB, text/plain) 2010-08-03 21:44 UTC, Norman Gaywood	no flags

Description Florian Eggenhofer 2010-06-28 16:08:07 UTC

Created attachment 427451 [details]
full dmesg output shortly before machine crashed

Description of problem:
Load average increases up to system crash when using sander.mpi (http://ambermd.org/) in a Sun Grid Engine Open MPI environment.

Version-Release number of selected component (if applicable):
2.6.32.12-115.fc12.x86_64

How reproducible:
hard

Steps to Reproduce:
1. Install Sun Grid Engine 6.2u3-3.fc12 with processing environment
   Open MPI 1.4.1-4.fc12 and Amber10.
2. Submit the job mpirun sander.mpi 
3. Wait a week
  
Actual results:
kernel bug, see attachment

Expected results:
Load average remains in normal range (< 20 on an 8 core machine).
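
The expected bound above can be turned into a simple check. This is a minimal sketch, assuming Python is available on the compute node; the threshold of 20 is the figure given in this report for an 8-core machine, and the function names `load_is_normal` and `current_load_is_normal` are hypothetical.

```python
import os

def load_is_normal(load1, threshold=20.0):
    """Return True if a 1-minute load average is below the threshold.

    The default of 20 is the bound the report gives for an 8-core machine.
    """
    return load1 < threshold

def current_load_is_normal(threshold=20.0):
    """Check the live system; os.getloadavg() returns the 1-, 5- and
    15-minute load averages and is only available on Unix-like systems."""
    load1, _load5, _load15 = os.getloadavg()
    return load_is_normal(load1, threshold)
```

Calling `current_load_is_normal()` periodically from a cron job or a monitoring hook would flag the slow climb described here well before the machine becomes unresponsive.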

Comment 1 Florian Eggenhofer 2010-06-29 11:18:20 UTC

Technical note added. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
kernel BUG at fs/ext4/inode.c:1852!

Comment 2 Chuck Ebbert 2010-06-30 20:41:47 UTC

------------[ cut here ]------------
kernel BUG at fs/ext4/inode.c:1852!
invalid opcode: 0000 [#1] SMP 
last sysfs file: /sys/devices/system/cpu/cpu15/topology/physical_package_id
CPU 0 
Modules linked in: nfs fscache nfsd lockd nfs_acl auth_rpcgss exportfs autofs4 coretemp adm1021 ipmi_si ipmi_msghandler sunrpc cpufreq_ondemand acpi_cpufreq freq_table ip6table_filter ip6_tables ipv6 dm_multipath igb i2c_i801 i2c_core ses iTCO_wdt iTCO_vendor_support ioatdma dca joydev enclosure aacraid [last unloaded: microcode]
Pid: 7139, comm: sander.MPI Not tainted 2.6.32.14-127.fc12.x86_64 #1 S5520UR
RIP: 0010:[<ffffffff8119861d>]  [<ffffffff8119861d>] ext4_da_get_block_prep+0xeb/0x244
RSP: 0000:ffff880c4efcdb38  EFLAGS: 00010297
RAX: 0000000000000003 RBX: ffff88065045db60 RCX: 0000000000000154
RDX: 0000000000000004 RSI: 0000000000000003 RDI: 0000000000000153
RBP: ffff880c4efcdb98 R08: ffff88065045db60 R09: 0000000000000000
R10: ffff880c4ef9dd80 R11: 0000000000004000 R12: ffff880c4b4800b0
R13: 0000000000000000 R14: ffff880c4b480000 R15: ffff880c4b480380
FS:  00002af84b28fa40(0000) GS:ffff880017000000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00002af8502c9000 CR3: 0000000651dc9000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process sander.MPI (pid: 7139, threadinfo ffff880c4efcc000, task ffff880c4ef9dd80)
Stack:
 ffff880c52c2b800 ffffea002aea9260 0000000000000000 ffffffffffff0000
<0> ffff880c4b4800b0 0000000000001000 ffff880c4efcdb98 ffffea002aea9260
<0> ffff880c4efcdbe8 0000000000001000 ffff880c4b4800b0 0000000000000000
Call Trace:
 [<ffffffff81141df5>] __block_prepare_write+0x133/0x289
 [<ffffffff81198532>] ? ext4_da_get_block_prep+0x0/0x244
 [<ffffffff810d632f>] ? lock_page+0x29/0x41
 [<ffffffff811420cd>] block_write_begin+0x80/0xd2
 [<ffffffff8119824f>] ext4_da_write_begin+0x18e/0x21d
 [<ffffffff81198532>] ? ext4_da_get_block_prep+0x0/0x244
 [<ffffffff81191f7c>] ext4_page_mkwrite+0x111/0x162
 [<ffffffff810ee3e1>] __do_fault+0x172/0x3f1
 [<ffffffff810f054f>] handle_mm_fault+0x35a/0x7bd
 [<ffffffff810748eb>] ? autoremove_wake_function+0x0/0x39
 [<ffffffff81459287>] do_page_fault+0x288/0x2a0
 [<ffffffff81457165>] page_fault+0x25/0x30
Code: 48 89 45 a0 4c 89 ff e8 06 e7 2b 00 41 8b b6 70 03 00 00 4c 89 e7 ff c6 e8 a2 bc ff ff 41 8b 96 74 03 00 00 48 98 48 39 d0 73 04 <0f> 0b eb fe 48 29 d0 48 89 45 a8 66 41 ff 07 49 8b 94 24 08 01 
RIP  [<ffffffff8119861d>] ext4_da_get_block_prep+0xeb/0x244
 RSP <ffff880c4efcdb38>
---[ end trace a69df0c7dea73af5 ]---

Comment 3 Chuck Ebbert 2010-06-30 21:00:35 UTC

line 1852:
        BUG_ON(mdblocks < EXT4_I(inode)->i_reserved_meta_blocks);
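
To make the failed assertion concrete, here is a toy model in Python. It is not the ext4 code. It assumes that `mdblocks` is a freshly recomputed estimate of the metadata blocks needed, that `i_reserved_meta_blocks` is what the inode has already reserved, and that the difference is what still has to be reserved. The function name `metadata_still_needed` is hypothetical.

```python
def metadata_still_needed(mdblocks, reserved_meta_blocks):
    """Toy model of the check quoted above.

    Raises AssertionError (standing in for BUG_ON) when the new estimate
    is smaller than what is already reserved; otherwise returns the
    number of additional metadata blocks to reserve.
    """
    if mdblocks < reserved_meta_blocks:
        raise AssertionError(
            f"BUG: estimate {mdblocks} < reserved {reserved_meta_blocks}"
        )
    return mdblocks - reserved_meta_blocks
```

In this model the assertion fires only when the accounting of already reserved blocks has drifted above the new estimate, which is why a bookkeeping error elsewhere can surface here.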

Comment 4 Eric Sandeen 2010-06-30 21:13:56 UTC

        BUG_ON(mdblocks < EXT4_I(inode)->i_reserved_meta_blocks);

hmm blast from the past.

Are you using quota on this filesystem?

Comment 5 Florian Eggenhofer 2010-07-01 16:20:28 UTC

(In reply to comment #4)
>         BUG_ON(mdblocks < EXT4_I(inode)->i_reserved_meta_blocks);
> 
> hmm blast from the past.
> 
> Are you using quota on this filesystem?    

Yes, the output was written to a subfolder of /home, which has a quota setting of about 20 GB (19532M) per user.

Comment 6 Eric Sandeen 2010-07-01 17:10:37 UTC

Ok, I think this is a bug fixed upstream then.  I thought the fixes had made it to 2.6.32.y, but I see that they haven't.

As a workaround, disabling quotas, if you are able, should avoid this BUG().

I'll see about getting the fixes into F12 one way or another.

Thanks,
-Eric
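
Before applying the workaround it helps to know which filesystems are mounted with quota options. This is a minimal sketch that scans fstab-format text. It assumes quotas are enabled through the usual `usrquota`, `grpquota`, `quota`, `usrjquota=` or `grpjquota=` mount options, and the function name `quota_mounts` is hypothetical.

```python
QUOTA_OPTIONS = {"usrquota", "grpquota", "quota", "usrjquota", "grpjquota"}

def quota_mounts(fstab_text):
    """Return mount points whose options field enables quotas."""
    found = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete fstab entry
        mount_point, options = fields[1], fields[3].split(",")
        # Strip "=value" so "usrjquota=aquota.user" matches "usrjquota".
        names = {opt.split("=", 1)[0] for opt in options}
        if names & QUOTA_OPTIONS:
            found.append(mount_point)
    return found
```

Feeding it the contents of /etc/fstab lists the mount points that are candidates for the workaround.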

Comment 7 Eric Sandeen 2010-07-01 21:34:17 UTC

I could build a scratch kernel for you to test, are you interested?

I have a collection of patches assembled that should fix the problem.

-Eric

Comment 8 Florian Eggenhofer 2010-07-02 11:52:16 UTC

Wow, thanks for solving this so fast. I would be very interested in testing.

Comment 9 Eric Sandeen 2010-07-02 17:58:55 UTC

Ok, let me whip up a scratch kernel, I'll post a link when done.

Comment 10 Eric Sandeen 2010-07-02 21:07:15 UTC

Ok, give http://kojipkgs.fedoraproject.org/scratch/sandeen/task_2290752/ a try.  It's missing kernel-firmware, sorry, but a --nodeps is probably ok there.

I booted it and did some sanity tests, but you get to keep both pieces if it breaks, as usual!

Comment 11 Florian Eggenhofer 2010-07-07 08:05:39 UTC

OK, we started testing the patched kernel on Monday. I will write again when we have finished; so far everything looks good.

Comment 12 Florian Eggenhofer 2010-07-26 08:09:10 UTC

OK, the patch works great! Thanks for fixing this.
Do you know when the fixes will be included in F12?

Comment 13 Eric Sandeen 2010-07-26 17:36:36 UTC

I've committed, tagged, & built for F12 now.  Please also test the version that hits -testing just for sanity, plus it's a more official build :)

thanks,
-Eric

Comment 14 Norman Gaywood 2010-08-03 06:55:45 UTC

Created attachment 436193 [details]
Crash with kernel-2.6.32.16-151.fc12.x86_64

I installed kernel-2.6.32.16-151.fc12.x86_64 and a few days later the attached Oops started happening.

I don't know if it is related to a problem I'm having with Xen (bug #550724) where writes to disk start to hang. But I don't recall seeing this sort of crash before, and I've been having the Xen crashes all year.

The problem got worse so I've turned off quotas and gone back to kernel-2.6.32.16-141.fc12.x86_64 for now.

Comment 15 Norman Gaywood 2010-08-03 06:58:15 UTC

Created attachment 436195 [details]
Another more verbose crash with kernel-2.6.32.16-151.fc12.x86_64

Here is another crash, shortly after the previous one, that has more stack traces.

Comment 16 Eric Sandeen 2010-08-03 16:44:10 UTC

Argh.  Norman, can you verify that kernel-2.6.32.16-150.fc12 doesn't have this behavior, and -151 does, if you're willing to soak up a couple more oopses?

Comment 17 Eric Sandeen 2010-08-03 19:38:38 UTC

static inline void dquot_resv_space(struct dquot *dquot, qsize_t number)
{
        dquot->dq_dqb.dqb_rsvspace += number;
ffffffff81166d5f:       48 01 81 c0 00 00 00    add    %rax,0xc0(%rcx)  <-------- oopsed here

so we got a null dquot down this path:

 [<ffffffff81199d02>] vfs_dq_init+0x3f/0x47
 [<ffffffff8119d461>] ext4_unlink+0x25/0x1e0
 [<ffffffff81045afa>] ? __might_sleep+0x28/0xef
 [<ffffffff81126a17>] vfs_unlink+0x7a/0xb7
 [<ffffffff811263ba>] ? lookup_hash+0x3b/0x3f
 [<ffffffff8112850b>] do_unlinkat+0xcd/0x15b
 [<ffffffff8145a3d5>] ? do_page_fault+0x2c2/0x2f2
 [<ffffffff810a930d>] ? audit_syscall_entry+0x11e/0x14a
 [<ffffffff811285af>] sys_unlink+0x16/0x18
 [<ffffffff81011d32>] system_call_fastpath+0x16/0x1b

hrm, still looking.
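
Traces like the one above are easier to compare across crashes once they are reduced to function names. This is a minimal sketch, assuming the `[<address>] symbol+0xoffset/0xsize` layout shown in this report, where a leading `?` marks an entry the kernel could not verify as part of the call chain. The name `parse_call_trace` is hypothetical.

```python
import re

FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<maybe>\?\s+)?"
    r"(?P<sym>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)

def parse_call_trace(text, reliable_only=True):
    """Return function names in the order printed (innermost frame first).

    With reliable_only=True, entries marked with '?' are dropped.
    """
    names = []
    for match in FRAME_RE.finditer(text):
        if reliable_only and match.group("maybe"):
            continue
        names.append(match.group("sym"))
    return names
```

Comparing the resulting lists from two oopses shows quickly whether they went down the same path.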

Comment 18 Norman Gaywood 2010-08-03 21:10:59 UTC

Created attachment 436384 [details]
kernel-2.6.32.16-150.fc12  crash

This also happened with 2.6.32.16-141.fc12.x86_64 when I rebooted into it after the previous crashes with kernel-2.6.32.16-151.fc12.x86_64. The full boot session is attached. During this session I edited /etc/fstab to disable quotas and rebooted.

I figured, probably wrongly, that my quota files were corrupted by kernel-2.6.32.16-151.fc12 and I would eventually have to rebuild them with quotacheck.

Note that this system crashes a lot due to the bug I've referenced above.

I recall quota warnings previously but nothing as severe as this. I'll see if I can find them in my logs.

There are limits to what I can do here. This system is in production and at its busiest time.

Comment 19 Eric Sandeen 2010-08-03 21:21:47 UTC

Hm looks like the .32 stable series has pulled in the same patches I backported.

For sanity's sake would you both mind testing the kernel at:

http://kojipkgs.fedoraproject.org/packages/kernel/2.6.32.17/156.fc12/

?

Thanks,
-Eric

Comment 20 Eric Sandeen 2010-08-03 21:23:34 UTC

Norman, we appear to have hit a null dquot down this path in dquot_initialize()

                if (!inode->i_dquot[cnt]) {
                        inode->i_dquot[cnt] = got[cnt];
                        got[cnt] = NULL;
                        /*
                         * Make quota reservation system happy if someone
                         * did a write before quota was turned on
                         */
                        rsv = inode_get_rsv_space(inode);
                        if (unlikely(rsv))
                                dquot_resv_space(inode->i_dquot[cnt], rsv);
                }

and dquot_resv_space got a null dquot.

Note the comment though; this usually happens on the root fs since writes happen at bootup time.  Do you have root fs quotas?

Thanks,
-Eric
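
The failure mode described above can be illustrated with a toy Python model of the quoted loop body. It is not the kernel code and not necessarily the fix that shipped. `None` stands in for a NULL dquot, and the names `Dquot`, `Inode`, `attach_dquots` and the `guard` flag are hypothetical.

```python
class Dquot:
    def __init__(self):
        self.rsvspace = 0  # stands in for dq_dqb.dqb_rsvspace

class Inode:
    def __init__(self, rsv_space, quota_types=2):
        self.rsv_space = rsv_space           # space reserved before quota was on
        self.i_dquot = [None] * quota_types  # one slot per quota type

def attach_dquots(inode, got, guard=False):
    """Attach looked-up dquots to the inode, carrying over reserved space."""
    for cnt in range(len(inode.i_dquot)):
        if inode.i_dquot[cnt] is None:
            if guard and got[cnt] is None:
                continue  # nothing was looked up for this quota type
            inode.i_dquot[cnt] = got[cnt]
            got[cnt] = None
            if inode.rsv_space:
                # With a None dquot this raises AttributeError,
                # the analogue of the NULL dereference in the oops.
                inode.i_dquot[cnt].rsvspace += inode.rsv_space
```

Without the guard, an inode that already has reserved space and a quota type with no dquot fails on the last line; with the guard that slot is simply skipped.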

Comment 21 Norman Gaywood 2010-08-03 21:44:08 UTC

Created attachment 436390 [details]
fstab

No I've never had quotas enabled on the root filesystem.

I just noticed however that there are two root filesystem lines in /etc/fstab:

/dev/mapper/SYSTEM-root /       ext4    defaults,relatime    1 1
/dev/SYSTEM/root	/	ext4	defaults,relatime    1 1

Don't know if that can cause a problem.
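
Duplicate entries like these are easy to miss by eye. This is a minimal sketch that reports mount points appearing more than once in fstab-format text. The function name `duplicate_mount_points` is hypothetical, and whether such duplicates are harmful is not established in this report.

```python
from collections import defaultdict

def duplicate_mount_points(fstab_text):
    """Map each mount point listed more than once to its device fields."""
    devices = defaultdict(list)
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        # Swap entries legitimately share a placeholder mount point.
        if len(fields) < 3 or fields[2] == "swap":
            continue
        devices[fields[1]].append(fields[0])
    return {mp: devs for mp, devs in devices.items() if len(devs) > 1}
```

Run against the two lines quoted above, it returns `/` with both device paths.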

Comment 22 Norman Gaywood 2010-08-03 21:55:40 UTC

I just recalled something however. Just after 2.6.32.16-141.fc12.x86_64 booted above (I think in this boot) I typed quotaoff -a and the command hung. I then edited /etc/fstab in another terminal and rebooted.

Perhaps that explains the write with quota off?

Comment 23 Eric Sandeen 2010-08-03 22:08:07 UTC

Maybe ... still should not have oopsed :)

Comment 24 Chuck Ebbert 2010-08-10 15:19:19 UTC

Maybe the quota files have been corrupted and need to be rebuilt.

Comment 25 Norman Gaywood 2010-08-11 01:32:28 UTC

(In reply to comment #24)
> Maybe the quota files have been corrupted and need to be rebuilt.    

OK. The quota files were corrupt, probably due to crashes (bug #550724) and problems with the quota files (bug #578674).

I've rebuilt some of my quota files and reenabled quotas on those filesystems and rebooted to 2.6.32.16-141.fc12.x86_64. So far so good.

My report probably should have gone to bug #550724 and not here.

Comment 26 Fedora Update System 2010-08-18 10:50:25 UTC

kernel-2.6.32.19-162.fc12 has been submitted as an update for Fedora 12.
http://admin.fedoraproject.org/updates/kernel-2.6.32.19-162.fc12

Comment 27 Fedora Update System 2010-08-18 20:45:57 UTC

kernel-2.6.32.19-163.fc12 has been submitted as an update for Fedora 12.
http://admin.fedoraproject.org/updates/kernel-2.6.32.19-163.fc12

Comment 28 Fedora Update System 2010-08-20 02:00:07 UTC

kernel-2.6.32.19-163.fc12 has been pushed to the Fedora 12 testing repository.  If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
 su -c 'yum --enablerepo=updates-testing update kernel'
You can provide feedback for this update here: http://admin.fedoraproject.org/updates/kernel-2.6.32.19-163.fc12

Comment 29 Fedora Update System 2010-08-23 22:01:56 UTC

kernel-2.6.32.19-163.fc12 has been pushed to the Fedora 12 stable repository.  If problems still persist, please make note of it in this bug report.
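
A quick way to tell whether a machine already runs a kernel at or beyond the fixed build is to compare release strings. This is a minimal sketch, assuming Fedora 12 release strings of the form `2.6.32.19-163.fc12.x86_64`. It compares only the numeric version and build fields and is not a substitute for RPM's own version comparison. The names `release_key` and `has_fix` are hypothetical.

```python
import re

FIXED_IN = "2.6.32.19-163.fc12"  # "Fixed In Version" from this report

def release_key(release):
    """Turn '2.6.32.19-163.fc12.x86_64' into (2, 6, 32, 19, 163)."""
    match = re.match(r"(\d+(?:\.\d+)*)-(\d+)", release)
    if not match:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    version = tuple(int(part) for part in match.group(1).split("."))
    return version + (int(match.group(2)),)

def has_fix(release, fixed_in=FIXED_IN):
    """True if the release sorts at or after the fixed build."""
    return release_key(release) >= release_key(fixed_in)
```

Passing the output of `uname -r` (or `platform.release()` in Python) checks the running kernel; the release originally reported here, 2.6.32.12-115.fc12.x86_64, compares as older than the fixed build.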
