Bug 206212
| Summary: | kernel oops in cman:process_startdone_barrier_new during the attempt of many mounts | ||
|---|---|---|---|
| Product: | [Retired] Red Hat Cluster Suite | Reporter: | Corey Marthaler <cmarthal> |
| Component: | cman | Assignee: | Christine Caulfield <ccaulfie> |
| Status: | CLOSED ERRATA | QA Contact: | Cluster QE <mspqa-list> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 4 | CC: | cluster-maint, teigland |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | RHBA-2007-0135 | Doc Type: | Bug Fix |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2007-05-10 21:22:04 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| Embargoed: | |||
The immediate cause of this is simple to fix: the sg passed to process_startdone_barrier_new() has a NULL sevent in some cases (the deeper question is why). The function immediately dereferences sev->flags without checking for NULL, which leads to the oops in the description. We now just print an error and return if the sevent is NULL instead of oopsing.

cvs commit: Examining .
Checking in sm_barrier.c;
/cvs/cluster/cluster/cman-kernel/src/Attic/sm_barrier.c,v <-- sm_barrier.c
new revision: 1.1.2.2; previous revision: 1.1.2.1
done

The reason process_startdone_barrier_new() is being called with a NULL sevent is bug 206193.

Just mounted 65 GFS filesystems simultaneously on a 4 node cluster. Marking verified.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0135.html
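The guard described in the first comment can be sketched in user-space C as follows. This is only an illustration of the pattern (check the sevent pointer, log, return early), not the actual change committed to sm_barrier.c. The struct layouts, the `SEFL_EXAMPLE_BIT` flag, the `int` return value, and the use of `fprintf` in place of a kernel log call are all assumptions made for the example.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-ins for the cman service-manager types.
 * The real structures in cman-kernel have many more fields. */
struct sm_sevent {
    unsigned long flags;
};

struct sm_group {
    const char *name;
    struct sm_sevent *sevent;   /* may be NULL, as seen in this bug */
};

#define SEFL_EXAMPLE_BIT 6      /* hypothetical flag bit, for illustration only */

/* Returns 0 if the barrier completion was processed, -1 if the group had
 * no sevent. The return value is an assumption made so the sketch is
 * easy to test. */
static int process_startdone_barrier_new(struct sm_group *sg, int status)
{
    struct sm_sevent *sev = sg->sevent;

    /* The guard: report the problem and bail out instead of
     * dereferencing a NULL pointer. */
    if (!sev) {
        fprintf(stderr,
                "process_startdone_barrier_new: no sevent for group %s (status %d)\n",
                sg->name, status);
        return -1;
    }

    /* Only now is it safe to touch sev->flags. */
    sev->flags &= ~(1UL << SEFL_EXAMPLE_BIT);
    return 0;
}
```

Calling the function with a group whose `sevent` is NULL prints the error and returns -1. Calling it with a valid sevent clears the example flag bit and returns 0.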
Description of problem:
After a fresh reboot of all 4 nodes in the taft cluster, I brought them back up and attempted to mount all 60 GFS on all four nodes simultaneously. This caused taft-01 to panic.

Unable to handle kernel NULL pointer dereference at 0000000000000020 RIP:
<ffffffffa021c43a>{:cman:process_startdone_barrier_new+7}
PML4 20c02e067 PGD 20db06067 PMD 0
Oops: 0002 [1] SMP
CPU 2
Modules linked in: lock_dlm(U) gfs(U) lock_harness(U) dlm(U) cman(U) md5 ipv6 parport_pc lp parpord
Pid: 4706, comm: cman_serviced Not tainted 2.6.9-42.0.2.ELsmp
RIP: 0010:[<ffffffffa021c43a>] <ffffffffa021c43a>{:cman:process_startdone_barrier_new+7}
RSP: 0018:0000010212ea3f00 EFLAGS: 00010246
RAX: 0000000000000001 RBX: 0000010217b00480 RCX: 00000100dfde5800
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000100dfde5800
RBP: ffffffffa021c831 R08: 0000010212ea2000 R09: 0000000000000000
R10: 0000000000000000 R11: 000000000000000a R12: 00000102128a97c8
R13: 00000000fffffffc R14: 00000102128a97b8 R15: ffffffff8014b4f0
FS: 0000000000000000(0000) GS:ffffffff804e5280(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000020 CR3: 00000000dff9e000 CR4: 00000000000006e0
Process cman_serviced (pid: 4706, threadinfo 0000010212ea2000, task 0000010215cb47f0)
Stack: ffffffffa021c5aa 0000000000000000 ffffffffa021c873 0000000000000018
ffffffff8014b4c7 ffffffffffffffff 00000102128a97b8 00000102128a9750
00000100dff613c0 0000000000000216
Call Trace:<ffffffffa021c5aa>{:cman:process_barriers+146} <ffffffffa021c873>{:cman:serviced+66}
<ffffffff8014b4c7>{kthread+200} <ffffffff80110f47>{child_rip+8}
<ffffffff8014b4f0>{keventd_create_kthread+0} <ffffffff8014b3ff>{kthread+0}
<ffffffff80110f3f>{child_rip+0}
Code: f0 0f ba 72 20 06 19 c0 85 c0 75 12 48 8b 7a 18 89 f2 48 c7
RIP <ffffffffa021c43a>{:cman:process_startdone_barrier_new+7} RSP <0000010212ea3f00>
CR2: 0000000000000020
<0>Kernel panic - not syncing: Oops

Version-Release number of selected component (if applicable):
[root@taft-01 ~]# uname -ar
Linux taft-01 2.6.9-42.0.2.ELsmp #1 SMP Thu Aug 17 17:57:31 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux
[root@taft-01 ~]# rpm -q cman
cman-1.0.11-0
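The oops itself is consistent with a NULL sevent, which a small decoder for the first bytes of the Code: line can show. The sketch below assumes those bytes start at the faulting RIP (plausible here, since the result matches CR2). It handles only the one encoding needed: an optional lock prefix, the 0F BA bit-instruction group, a register base with an 8-bit displacement, and an immediate bit index. The type and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Fields of a decoded x86-64 "group 8" bit instruction (opcode 0F BA /r ib). */
struct bit_insn {
    int     has_lock;  /* 1 if an F0 lock prefix precedes the opcode */
    int     op;        /* ModRM.reg: 4=BT, 5=BTS, 6=BTR, 7=BTC */
    int     base_reg;  /* ModRM.rm: 0=rax, 1=rcx, 2=rdx, 3=rbx, ... */
    int8_t  disp;      /* signed 8-bit displacement from the base register */
    uint8_t bit;       /* immediate bit index */
};

/* Decodes [F0] 0F BA modrm disp8 imm8 where ModRM.mod == 01 and no SIB
 * byte is present. Returns 0 on success, -1 for anything else. */
static int decode_bit_insn(const uint8_t *code, size_t len, struct bit_insn *out)
{
    size_t i = 0;

    out->has_lock = 0;
    if (i < len && code[i] == 0xf0) {
        out->has_lock = 1;
        i++;
    }
    if (len - i < 5)                       /* need 0F BA modrm disp8 imm8 */
        return -1;
    if (code[i] != 0x0f || code[i + 1] != 0xba)
        return -1;

    uint8_t modrm = code[i + 2];
    int mod = modrm >> 6;
    int rm  = modrm & 7;
    if (mod != 1 || rm == 4)               /* only [reg + disp8], no SIB */
        return -1;

    out->op       = (modrm >> 3) & 7;
    out->base_reg = rm;
    out->disp     = (int8_t)code[i + 3];
    out->bit      = code[i + 4];
    return 0;
}

/* Address the instruction touches, given the base register's value. */
static uint64_t effective_address(uint64_t base, int8_t disp)
{
    return base + (uint64_t)(int64_t)disp;
}
```

Fed the bytes `f0 0f ba 72 20 06`, this yields a locked BTR (op 6) of bit 6 at displacement 0x20 from register 2 (RDX). The register dump shows RDX as 0, so the write targets address 0x20, which matches CR2 and the write fault indicated by error code 0002. That fits an atomic bit clear on sev->flags with sev NULL, although the 0x20 offset of flags within the sevent structure is inferred from the dump rather than confirmed from the source.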