Created attachment 427451 [details]
full dmesg output shortly before machine crashed

Description of problem:
Load average increases until the system crashes when running sander.mpi (http://ambermd.org/) in a Sun Grid Engine Open MPI environment.

Version-Release number of selected component (if applicable):
2.6.32.12-115.fc12.x86_64

How reproducible:
hard

Steps to Reproduce:
1. Install Sun Grid Engine 6.2u3-3.fc12 with processing environment Open MPI 1.4.1-4.fc12 and Amber10.
2. Submit the job mpirun sander.mpi
3. Wait a week

Actual results:
kernel bug, see attachment

Expected results:
Load average remains in normal range (< 20 on an 8 core machine).
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents: kernel BUG at fs/ext4/inode.c:1852!
------------[ cut here ]------------
kernel BUG at fs/ext4/inode.c:1852!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu15/topology/physical_package_id
CPU 0
Modules linked in: nfs fscache nfsd lockd nfs_acl auth_rpcgss exportfs autofs4 coretemp adm1021 ipmi_si ipmi_msghandler sunrpc cpufreq_ondemand acpi_cpufreq freq_table ip6table_filter ip6_tables ipv6 dm_multipath igb i2c_i801 i2c_core ses iTCO_wdt iTCO_vendor_support ioatdma dca joydev enclosure aacraid [last unloaded: microcode]
Pid: 7139, comm: sander.MPI Not tainted 2.6.32.14-127.fc12.x86_64 #1 S5520UR
RIP: 0010:[<ffffffff8119861d>] [<ffffffff8119861d>] ext4_da_get_block_prep+0xeb/0x244
RSP: 0000:ffff880c4efcdb38 EFLAGS: 00010297
RAX: 0000000000000003 RBX: ffff88065045db60 RCX: 0000000000000154
RDX: 0000000000000004 RSI: 0000000000000003 RDI: 0000000000000153
RBP: ffff880c4efcdb98 R08: ffff88065045db60 R09: 0000000000000000
R10: ffff880c4ef9dd80 R11: 0000000000004000 R12: ffff880c4b4800b0
R13: 0000000000000000 R14: ffff880c4b480000 R15: ffff880c4b480380
FS: 00002af84b28fa40(0000) GS:ffff880017000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00002af8502c9000 CR3: 0000000651dc9000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process sander.MPI (pid: 7139, threadinfo ffff880c4efcc000, task ffff880c4ef9dd80)
Stack:
 ffff880c52c2b800 ffffea002aea9260 0000000000000000 ffffffffffff0000
<0> ffff880c4b4800b0 0000000000001000 ffff880c4efcdb98 ffffea002aea9260
<0> ffff880c4efcdbe8 0000000000001000 ffff880c4b4800b0 0000000000000000
Call Trace:
 [<ffffffff81141df5>] __block_prepare_write+0x133/0x289
 [<ffffffff81198532>] ? ext4_da_get_block_prep+0x0/0x244
 [<ffffffff810d632f>] ? lock_page+0x29/0x41
 [<ffffffff811420cd>] block_write_begin+0x80/0xd2
 [<ffffffff8119824f>] ext4_da_write_begin+0x18e/0x21d
 [<ffffffff81198532>] ? ext4_da_get_block_prep+0x0/0x244
 [<ffffffff81191f7c>] ext4_page_mkwrite+0x111/0x162
 [<ffffffff810ee3e1>] __do_fault+0x172/0x3f1
 [<ffffffff810f054f>] handle_mm_fault+0x35a/0x7bd
 [<ffffffff810748eb>] ? autoremove_wake_function+0x0/0x39
 [<ffffffff81459287>] do_page_fault+0x288/0x2a0
 [<ffffffff81457165>] page_fault+0x25/0x30
Code: 48 89 45 a0 4c 89 ff e8 06 e7 2b 00 41 8b b6 70 03 00 00 4c 89 e7 ff c6 e8 a2 bc ff ff 41 8b 96 74 03 00 00 48 98 48 39 d0 73 04 <0f> 0b eb fe 48 29 d0 48 89 45 a8 66 41 ff 07 49 8b 94 24 08 01
RIP [<ffffffff8119861d>] ext4_da_get_block_prep+0xeb/0x244
 RSP <ffff880c4efcdb38>
---[ end trace a69df0c7dea73af5 ]---
line 1852: BUG_ON(mdblocks < EXT4_I(inode)->i_reserved_meta_blocks);
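To make the failing check concrete, here is a minimal user-space sketch of the invariant that BUG_ON() enforces. It is not the kernel code: struct resv_model and md_blocks_needed() are hypothetical names invented for illustration, and the only thing taken from the report is the comparison itself. The values 3 and 4 in the usage note come from RAX and RDX in the register dump, on the assumption that those registers held the two operands of the comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-inode reservation counter. */
struct resv_model {
    uint64_t reserved_meta_blocks; /* metadata blocks already reserved */
};

/*
 * Models the check at line 1852. Returns false in the case where the
 * kernel's BUG_ON(mdblocks < i_reserved_meta_blocks) would fire;
 * otherwise stores how many additional metadata blocks would need to
 * be reserved and returns true.
 */
bool md_blocks_needed(const struct resv_model *m, uint64_t mdblocks,
                      uint64_t *needed)
{
    assert(m != NULL && needed != NULL);

    if (mdblocks < m->reserved_meta_blocks)
        return false; /* the estimate fell below what is already reserved */

    *needed = mdblocks - m->reserved_meta_blocks;
    return true;
}
```

With reserved_meta_blocks set to 4 and mdblocks set to 3, md_blocks_needed() returns false, which is the situation the BUG_ON() traps. Returning a status instead of aborting is a choice made for this sketch only, so the failing case can be exercised without killing the process.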
BUG_ON(mdblocks < EXT4_I(inode)->i_reserved_meta_blocks);

hmm blast from the past.

Are you using quota on this filesystem?
(In reply to comment #4)
> BUG_ON(mdblocks < EXT4_I(inode)->i_reserved_meta_blocks);
>
> hmm blast from the past.
>
> Are you using quota on this filesystem?

Yes, the output was written to a subfolder of /home, which has a quota of about 20 GB (19532M) per user.
Ok, I think this is a bug fixed upstream then. I thought the fixes had made it to 2.6.32.y, but I see that they haven't. As a workaround, disabling quotas, if you are able, should avoid this BUG(). I'll see about getting the fixes into F12 one way or another. Thanks, -Eric
I could build a scratch kernel for you to test, are you interested? I have a collection of patches assembled that should fix the problem. -Eric
Wow, thanks for solving this so fast. I would be very interested in testing.
Ok, let me whip up a scratch kernel, I'll post a link when done.
Ok, give http://kojipkgs.fedoraproject.org/scratch/sandeen/task_2290752/ a try. It's missing kernel-firmware, sorry, but a --nodeps is probably ok there. I booted it and did some sanity tests, but you get to keep both pieces if it breaks, as usual!
Ok, we started testing the patched kernel on Monday. I will write again when we have finished; so far everything looks good.
Ok, the patch works great! Thanks for fixing this. Do you know when the fixes will be included in F12?
I've committed, tagged, & built for F12 now. Please also test the version that hits -testing just for sanity, plus it's a more official build :) thanks, -Eric
Created attachment 436193 [details]
Crash with kernel-2.6.32.16-151.fc12.x86_64

I installed kernel-2.6.32.16-151.fc12.x86_64 and a few days later the attached Oops started happening. I don't know if it is related to a problem I'm having with Xen (bug #550724) where writes to disk start to hang. But I don't recall seeing this sort of crash before, and I've been having the Xen crashes all year. The problem got worse, so I've turned off quotas and gone back to kernel-2.6.32.16-141.fc12.x86_64 for now.
Created attachment 436195 [details]
Another more verbose crash with kernel-2.6.32.16-151.fc12.x86_64

Here is another crash, shortly after the previous one, that has more stack traces.
Argh. Norman, can you verify that kernel-2.6.32.16-150.fc12 doesn't have this behavior, and -151 does, if you're willing to soak up a couple more oopses?
static inline void dquot_resv_space(struct dquot *dquot, qsize_t number)
{
        dquot->dq_dqb.dqb_rsvspace += number;
ffffffff81166d5f: 48 01 81 c0 00 00 00 add %rax,0xc0(%rcx) <-------- oopsed here

so we got a null dquot down this path:

 [<ffffffff81199d02>] vfs_dq_init+0x3f/0x47
 [<ffffffff8119d461>] ext4_unlink+0x25/0x1e0
 [<ffffffff81045afa>] ? __might_sleep+0x28/0xef
 [<ffffffff81126a17>] vfs_unlink+0x7a/0xb7
 [<ffffffff811263ba>] ? lookup_hash+0x3b/0x3f
 [<ffffffff8112850b>] do_unlinkat+0xcd/0x15b
 [<ffffffff8145a3d5>] ? do_page_fault+0x2c2/0x2f2
 [<ffffffff810a930d>] ? audit_syscall_entry+0x11e/0x14a
 [<ffffffff811285af>] sys_unlink+0x16/0x18
 [<ffffffff81011d32>] system_call_fastpath+0x16/0x1b

hrm, still looking.
Created attachment 436384 [details]
kernel-2.6.32.16-150.fc12 crash

This also happened with 2.6.32.16-141.fc12.x86_64 after I rebooted to it following the previous crashes with kernel-2.6.32.16-151.fc12.x86_64. The full boot session is attached. During this I edited /etc/fstab to disable quotas and rebooted. I figured, probably wrongly, that my quota files were corrupted by kernel-2.6.32.16-151.fc12 and I would eventually have to rebuild them with quotacheck.

Note that this system crashes a lot due to the bug I've referenced above. I recall quota warnings previously but nothing as severe as this. I'll see if I can find them in my logs.

There are limits to what I can do here. This system is in production and at its busiest time.
Hm looks like the .32 stable series has pulled in the same patches I backported. For sanity's sake would you both mind testing the kernel at: http://kojipkgs.fedoraproject.org/packages/kernel/2.6.32.17/156.fc12/ ? Thanks, -Eric
Norman, we appear to have hit a null dquot down this path in dquot_initialize():

        if (!inode->i_dquot[cnt]) {
                inode->i_dquot[cnt] = got[cnt];
                got[cnt] = NULL;
                /*
                 * Make quota reservation system happy if someone
                 * did a write before quota was turned on
                 */
                rsv = inode_get_rsv_space(inode);
                if (unlikely(rsv))
                        dquot_resv_space(inode->i_dquot[cnt], rsv);
        }

and dquot_resv_space got a null dquot. Note the comment though; this usually happens on the root fs since writes happen at bootup time. Do you have root fs quotas?

Thanks,
-Eric
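For illustration only, the failure mode described above can be modelled in user space: if the lookup for a quota type returns NULL, the assignment stores NULL into the inode slot and the reservation transfer then dereferences it. The sketch below shows one defensive pattern, skipping slots whose lookup returned nothing. Every name in it (dquot_model, inode_model, MAXQUOTAS_MODEL, attach_dquots) is hypothetical, and the guard is an assumption about how such a case could be handled, not a statement of what was changed in any kernel build mentioned in this report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAXQUOTAS_MODEL 2 /* hypothetical: user and group slots */

/* Hypothetical, trimmed-down stand-in for struct dquot. */
struct dquot_model {
    int64_t rsvspace; /* plays the role of dq_dqb.dqb_rsvspace */
};

/* Hypothetical, trimmed-down stand-in for the inode's quota state. */
struct inode_model {
    struct dquot_model *dquot[MAXQUOTAS_MODEL];
    int64_t rsv_space; /* space reserved before quota was attached */
};

/*
 * Attach looked-up dquots to the inode. Slots whose lookup returned
 * NULL are skipped, so the reservation transfer never dereferences a
 * null pointer. Returns the number of slots that received the
 * pre-existing reservation.
 */
int attach_dquots(struct inode_model *inode,
                  struct dquot_model *got[MAXQUOTAS_MODEL])
{
    int transferred = 0;

    assert(inode != NULL && got != NULL);

    for (int cnt = 0; cnt < MAXQUOTAS_MODEL; cnt++) {
        if (got[cnt] == NULL) /* assumed guard: nothing to attach */
            continue;
        if (inode->dquot[cnt] == NULL) {
            inode->dquot[cnt] = got[cnt];
            got[cnt] = NULL;
            /* carry over space reserved before quota was attached */
            if (inode->rsv_space != 0) {
                inode->dquot[cnt]->rsvspace += inode->rsv_space;
                transferred++;
            }
        }
    }
    return transferred;
}
```

Calling attach_dquots() with a valid entry in slot 0 and NULL in slot 1 transfers the reservation to slot 0 and leaves slot 1 untouched; without the guard, the same input would reach the += on a null pointer, which matches the oops location quoted earlier in this thread.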
Created attachment 436390 [details]
fstab

No, I've never had quotas enabled on the root filesystem. I just noticed, however, that there are two root filesystem lines in /etc/fstab:

/dev/mapper/SYSTEM-root / ext4 defaults,relatime 1 1
/dev/SYSTEM/root / ext4 defaults,relatime 1 1

Don't know if that can cause a problem.
I just recalled something however. Just after 2.6.32.16-141.fc12.x86_64 booted above (I think in this boot) I typed quotaoff -a and the command hung. I then edited /etc/fstab in another terminal and rebooted. Perhaps that explains the write with quota off?
Maybe ... still should not have oopsed :)
Maybe the quota files have been corrupted and need to be rebuilt.
(In reply to comment #24)
> Maybe the quota files have been corrupted and need to be rebuilt.

OK. The quota files were corrupt, probably due to crashes (bug #550724) and problems with the quota files (bug #578674). I've rebuilt some of my quota files, re-enabled quotas on those filesystems, and rebooted to 2.6.32.16-141.fc12.x86_64. So far so good.

My report probably should have gone to bug #550724 and not here.
kernel-2.6.32.19-162.fc12 has been submitted as an update for Fedora 12. http://admin.fedoraproject.org/updates/kernel-2.6.32.19-162.fc12
kernel-2.6.32.19-163.fc12 has been submitted as an update for Fedora 12. http://admin.fedoraproject.org/updates/kernel-2.6.32.19-163.fc12
kernel-2.6.32.19-163.fc12 has been pushed to the Fedora 12 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update kernel'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/kernel-2.6.32.19-163.fc12
kernel-2.6.32.19-163.fc12 has been pushed to the Fedora 12 stable repository. If problems still persist, please make note of it in this bug report.