Description of problem:

The kernel is panicking when running the SPECsfs (NFS) benchmark. The configuration is 4 clients, each communicating with the server over GigE with jumbo frames enabled. Each client communicates with a dedicated NIC/subnet on the server. The server is a 4-socket, dual-core/HT Xeon box with 16GB of memory. The kernel is RHEL4-U3 x86_64 largeSMP, since we are at 16 logical CPUs. There are 16 ext3 filesystems, created from storage presented by 2 dual-ported FC adapters and 4 HP MSAs. Each MSA presents a single large LUN that RHEL partitions into 4 partitions. There are 256 NFS threads running.

I have set up netdump but only seem to be able to get the console log of the failure; I'm investigating why we don't get a core file.

Running the benchmark with the largeSMP kernel but with only 8 logical processors booted succeeds. The console log is in the "Actual results" below.

Version-Release number of selected component (if applicable):
RHEL4-U3 largeSMP

How reproducible:
So far, every time. It can take an hour or 5 hours, depending on which workload the benchmark is laying down.

Steps to Reproduce:
1. See Barry Marson. The procedure is very simple.

Actual results:

Unable to handle kernel NULL pointer dereference at 0000000000000020
RIP: <ffffffffa00aefc6>{:jbd:journal_dirty_metadata+71}
PML4 3f0302067 PGD 3f03e4067 PMD 3eeac4067 PTE 0
Oops: 0000 [1] SMP
CPU 13
Modules linked in: nfsd exportfs lockd nfs_acl md5 ipv6 parport_pc lp parport netconsole netdump autofs4 i2c_dev i2c_core sunrpc ds yenta_socket pcmcia_core dm_multipath button battery ac uhci_hcd ehci_hcd hw_random e1000 tg3 dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod qla2300 qla2xxx scsi_transport_fc cciss sd_mod scsi_mod
Pid: 9262, comm: nfsd Not tainted 2.6.9-34.ELlargesmp
RIP: 0010:[<ffffffffa00aefc6>] <ffffffffa00aefc6>{:jbd:journal_dirty_metadata+71}
RSP: 0018:000001025d071b58  EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000010070baa6c0 RCX: 00000000ffffffff
RDX: 000000000000000f RSI: 000001021c6a39d0 RDI: 00000101dbe4d580
RBP: 000001021c6a39d0 R08: 0000000000000000 R09: 000001000f10af80
R10: 000001021c6a39d0 R11: 000001021c6a39d0 R12: 0000000000000000
R13: 00000103ff737e00 R14: 00000101dbe4d580 R15: 000001037b38ba08
FS:  0000002a958a0b00(0000) GS:ffffffff804eb100(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000020 CR3: 00000000dfc7a000 CR4: 00000000000006e0
Process nfsd (pid: 9262, threadinfo 000001025d070000, task 00000103fe0cb7f0)
Stack: 000001039f334000 0000000000000000 000001037b38bb18 000001037b38bb18
       0000000000008000 ffffffffa00c68f5 000001021c6a39d0 000001025d071bd8
       00000101dbe4d580 000001037b38bb18
Call Trace: <ffffffffa00c68f5>{:ext3:ext3_mark_iloc_dirty+740}
       <ffffffffa00c6a47>{:ext3:ext3_mark_inode_dirty+65}
       <ffffffffa00c4f3a>{:ext3:ext3_new_inode+2867}
       <ffffffffa00ae3c4>{:jbd:start_this_handle+964}
       <ffffffff8013347f>{__wake_up+54}
       <ffffffffa00cafe0>{:ext3:ext3_create+102}
       <ffffffff80185a77>{vfs_create+214}
       <ffffffffa0282f89>{:nfsd:nfsd_create_v3+811}
       <ffffffffa0289bdc>{:nfsd:nfsd3_proc_create+307}
       <ffffffffa027d7bd>{:nfsd:nfsd_dispatch+219}
       <ffffffffa019a39e>{:sunrpc:svc_process+1197}
       <ffffffff801333d8>{default_wake_function+0}
       <ffffffffa027d2fc>{:nfsd:nfsd+0}
       <ffffffffa027d534>{:nfsd:nfsd+568}
       <ffffffff8013212e>{schedule_tail+55}
       <ffffffff80110e17>{child_rip+8}
       <ffffffffa027d2fc>{:nfsd:nfsd+0}
       <ffffffffa027d2fc>{:nfsd:nfsd+0}
       <ffffffff80110e0f>{child_rip+0}

Code: 49 39 5c 24 20 75 4b 41 83 7c 24 0c 02 75 43 49 3b 5d 50 0f
RIP <ffffffffa00aefc6>{:jbd:journal_dirty_metadata+71} RSP <000001025d071b58>
CR2: 0000000000000020

Expected results:

Additional info:

Whatever is failing is not occurring right before the panic.

One side effect of running this benchmark with 16 logical processors is that the rate at which data laydown occurs is significantly diminished: in the 8-CPU config the data rate is about 200MB/sec, while with 16 CPUs it drops to 130MB/sec.
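For reference, a minimal user-space sketch of the fault pattern above: a fault address of 0x20 with a NULL base register is consistent with reading a structure member at byte offset 0x20 through a NULL pointer. The struct and field names below are hypothetical, chosen only to illustrate the offset arithmetic; this is not the actual jbd code.

/*
 * Illustrative sketch only -- hypothetical struct, not the jbd source.
 * Shows how loading a member through a NULL struct pointer faults at an
 * address equal to the member's offset (here 0x20, as in CR2 above).
 */
#include <stddef.h>
#include <stdio.h>

struct fake_transaction {
    long t_state;          /* offset 0x00 */
    long t_tid;            /* offset 0x08 */
    void *t_reserved[2];   /* offsets 0x10, 0x18 */
    void *t_checkpoint;    /* offset 0x20 -- the field being read */
};

int main(void)
{
    struct fake_transaction *t = NULL;  /* stands in for a stale/cleared pointer */

    /* The offset the CPU would report in CR2 on the bad load. */
    printf("faulting offset would be %#zx\n",
           offsetof(struct fake_transaction, t_checkpoint));

    /* A real access like t->t_checkpoint here would be the NULL+0x20 load. */
    if (t != NULL)
        printf("checkpoint = %p\n", t->t_checkpoint);
    return 0;
}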
Any clue if this is a regression?
QE ack for fixing this in U4. It is significant enough to get resolved after the code freeze, in my opinion.
If I were going to speculate, I would say no, this is not a regression, since this is the first time we've tried to run the SPECsfs benchmark. But with that said... a server crash has recently been reported on the nahant mailing list that happened in a different place but in the same journal code: https://www.redhat.com/archives/nahant-list/2006-May/msg00041.html
Created attachment 128685: crash log file for SPECsfs run RHEL4-U3-largeSMP 16CPU 256 nfsd threads ext3
Running with NFSD set to 64 threads ran to completion, but the results are not comparable yet because we don't have enough threads to handle the incoming requests at the high end of the benchmark. The attachment below is a crash log from running with 128 NFSD threads; we got significantly more crash stack data. A vmcore-incomplete file was created, but no data was written. I will try to reduce the threads to a level where I can get a core dump. I successfully ran with ext2 and 256 NFSD threads, but the performance was abysmal: negative scaling at 16 CPUs compared to the 8-CPU run.
Man, it's getting late. An early run today that I thought had died due to the benchmark (which sometimes happens) actually died from the panic. This time I have a log AND a vmcore. It's 16GB; what do I do with it? This was with 128 NFS threads :) Barry
The core file has been pushed to:
http://ubrew.boston.redhat.com/benchmarks/SPEC/SPECsfs/bugzilla-189508/
It's compressed to under 1GB but expands to 16GB on the test system. I have verified that it is accessible. Barry
Barry, can you reproduce this one? I have a hunch that it is a dup of Bugzilla Bug 199667 (ext3 file system crashed in my IA64 box), which is fixed in the latest release. Since you said you could reproduce it at will, it should be easy enough to verify that it is fixed. Thanks, -Eric
I have not been able to reproduce this. The system was re-installed this past summer, and even though the bits were technically the same, the problem, which used to be easily recreated, no longer occurs. Nor does it occur with RHEL4-U4. I'm closing this since I cannot recreate it. Barry
worksforbarry :)