Bug 467040 - dlm kernel reference counting bug
Summary: dlm kernel reference counting bug
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: dlm-kernel
Version: 4
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: beta
Assignee: David Teigland
QA Contact: Cluster QE
URL:
Whiteboard:
Duplicates: 483361
Depends On:
Blocks:
Reported: 2008-10-15 12:35 UTC by Bryn M. Reeves
Modified: 2018-10-20 02:21 UTC (History)
CC List: 8 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-05-18 21:17:16 UTC
Embargoed:


Attachments
patch to prevent dlm module unload following emergency shutdown (745 bytes, patch)
2008-10-15 16:26 UTC, Bryn M. Reeves


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2009:1051 | 0 | normal | SHIPPED_LIVE | dlm-kernel bug-fix update | 2009-05-18 21:17:04 UTC

Description Bryn M. Reeves 2008-10-15 12:35:28 UTC
Description of problem:
The dlm module allows itself to be unloaded while there are still objects allocated in its slab caches. This causes the calls to kmem_cache_destroy from cleanup_module to fail, leaving the slab caches present but with pointers to the (now unallocated) vmalloc region where the module was previously loaded:

    WARNING: dlm_emergency_shutdown
    slab error in kmem_cache_destroy(): cache `dlm_lvb/range': Can't free all objects

    Call Trace:<ffffffff8016369f>{kmem_cache_destroy+202} <ffffffffa03bb00d>{:dlm:cleanup_module+23}
           <ffffffff8014eaf4>{sys_delete_module+487} <ffffffff801ed2a9>{__up_write+20}
           <ffffffff8016e31c>{sys_munmap+94} <ffffffff8011026a>{system_call+126}

If anything then accesses /proc/slabinfo (or something else that will follow those pointers), we'll get an oops:

NET: Unregistered protocol family 30
Unable to handle kernel paging request at ffffffffa03bd8d7 RIP:
<ffffffff801ed647>{strnlen+12}
PML4 103027 PGD 105027 PMD 21ab87067 PTE 0
Oops: 0000 [1] SMP
CPU 9
Modules linked in: mptctl ide_dump scsi_dump diskdump nfsd exportfs lockd nfs_acl netconsole i2c_dev i2c_core ipmi_devintf ipmi_si ipmi_msghandler sunrpc md5 ipv6 zlib_deflate dm_round_robin dm_multipath button battery ac edac_mc shpchp e1000 bonding(U) sata_nv libata dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod qla2300 mptscsih mptsas mptspi mptscsi qla2400 qla2xxx scsi_transport_fc mptbase usb_storage uhci_hcd sd_mod scsi_mod ohci_hcd ehci_hcd
Pid: 23946, comm: agentmgr Not tainted 2.6.9-67.0.15.ELlargesmp
RIP: 0010:[<ffffffff801ed647>] <ffffffff801ed647>{strnlen+12}
RSP: 0018:0000010a1c669d20  EFLAGS: 00010097
RAX: ffffffffa03bd8d7 RBX: 0000010a1c669d78 RCX: 000000000000000a
RDX: 0000010a1c669da8 RSI: fffffffffffffffe RDI: ffffffffa03bd8d7
RBP: ffffffffa03bd8d7 R08: 00000000ffffffff R09: 0000000000000020
R10: 0000010a1c668000 R11: 0000000000000246 R12: 0000010c1ba150cc
R13: 0000000000000011 R14: 0000000000000010 R15: 0000010c1ba15fff
FS:  0000000040401960(0000) GS:ffffffff80506d00(005b) knlGS:00000000f7e17b80
CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
CR2: ffffffffa03bd8d7 CR3: 000000081cf9c000 CR4: 00000000000006e0
Process agentmgr (pid: 23946, threadinfo 0000010a1c668000, task 00000109bc6e97f0)
Stack: ffffffff801edee9 0000000000000f34 0000010c1ba150cc ffffffff8032b71d
       00000105d80afa00 0000000000000000 ffffffffa03bd8d7 0000000000003575
       00000105d80afa00 0000000000000114
Call Trace:<ffffffff801edee9>{vsnprintf+848} <ffffffff8019847a>{seq_printf+165}
       <ffffffff8015ebca>{__pagevec_free+39} <ffffffff80164dea>{release_pages+366}
       <ffffffff801695d2>{unmap_vmas+1238} <ffffffff8016466b>{s_show+434}
       <ffffffff80197fac>{seq_read+267} <ffffffff8017a730>{vfs_read+207}
       <ffffffff8017a98c>{sys_read+69} <ffffffff801264a7>{cstar_do_call+27}


Code: 80 3f 00 74 11 48 ff ce 48 ff c0 48 83 fe ff 74 05 80 38 00
RIP <ffffffff801ed647>{strnlen+12} RSP <0000010a1c669d20>
CR2: ffffffffa03bd8d7

Version-Release number of selected component (if applicable):
2.6.9-67.0.15.ELlargesmp & corresponding dlm-kernel version

How reproducible:
Unknown, seen once in production. Will attempt to reproduce & add details here.

Steps to Reproduce:
1. Attempt to rmmod dlm while active objects exist in the slab caches
2. Access /proc/slabinfo after unloading dlm

  
Actual results:
After 1. - above slab warning
After 2. - above oops

Expected results:
No warning, no oops. The module should refuse to unload while the number of active objects in its slab caches is greater than zero.

Additional info:

Comment 2 David Teigland 2008-10-15 14:32:48 UTC
Emergency shutdown implies by definition that things can't be cleaned up
and the node needs to be rebooted.  So the solution is to not try to unload
the dlm module, and just reboot the machine.

Comment 4 Bryn M. Reeves 2008-10-15 15:13:32 UTC
Running reboot/shutdown -r on a node that is in this state will trigger the problem, because the cman initscript tries to unload the dlm module on stop:

# grep dlm /etc/init.d/*
/etc/init.d/cman:               # try to load the dlm module
/etc/init.d/cman:               modprobe dlm &> /dev/null
/etc/init.d/cman:               # try to unload dlm module
/etc/init.d/cman:               modprobe -r dlm &>/dev/null

Comment 5 David Teigland 2008-10-15 15:32:41 UTC
An orderly shutdown of cman/dlm/gfs/etc is simply not possible after cman
has been forced out of the cluster, abandoning instances of dlm and gfs in
the kernel.

One solution to allow them to run "reboot" might be for init.d/cman to
recognize that the cluster cannot be shut down and to just quit without doing anything.

Comment 7 David Teigland 2008-10-15 16:07:27 UTC
When the cluster goes away from underneath dlm and gfs, all expectations
are tossed out the window; in fact we used to call panic() intentionally
when we saw this happen.  The rule of thumb is that when a node has write
access to shared storage, and it discovers it has lost connection/synchronization
with the other nodes in the cluster, it should fail as hard and as fast as
possible.  That's not always necessary, but it should be a common expectation
among people using this kind of software.

Another reason why this is something of a moot point is that the node is
going to be fenced (generally power cycled) very quickly after cman shuts
down -- you often won't have time to even notice the emergency shutdown,
much less do anything about it, before you're power cycled.  If SAN fencing
is being used, then I do have a degree of sympathy for this scenario, and I
think that doing something to init.d/cman to skip all steps in the case of
a failure would be a good thing to do.

Comment 9 Bryn M. Reeves 2008-10-15 16:26:13 UTC
Created attachment 320456 [details]
patch to prevent dlm module unload following emergency shutdown

How about something like this? This patch takes a reference on the DLM module at the top of the emergency shutdown routine, preventing the modprobe -r in the initscript from unloading the module (it may throw an error, but that's harmless). I'm sure there's a cleaner way to prevent the module unloading, but I can't remember it at the moment and can't find an example; this at least illustrates the idea.
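The idea can be sketched outside the kernel. The following is a minimal userspace model, not the attached patch: struct model_module, model_get(), model_unload() and model_emergency_shutdown() are hypothetical names, and -EBUSY is simply the error code chosen for the model. It only shows that a reference taken at emergency shutdown, and never dropped, makes every later unload attempt fail.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a kernel module's state. */
struct model_module {
    int refcount;   /* references currently held on the module */
    bool loaded;    /* false once the module has been unloaded */
};

/* Models taking an unconditional reference on a module that is loaded. */
static void model_get(struct model_module *m)
{
    assert(m->loaded);  /* taking a reference on an unloaded module is a bug */
    m->refcount++;
}

/* Models "modprobe -r": refuse to unload while any reference remains. */
static int model_unload(struct model_module *m)
{
    if (m->refcount > 0)
        return -EBUSY;  /* error code chosen for this model only */
    m->loaded = false;
    return 0;
}

/* Models the patched emergency shutdown: pin the module and never unpin it. */
static void model_emergency_shutdown(struct model_module *m)
{
    model_get(m);
    /* Lockspaces would be abandoned here; their objects are not freed. */
}
```

In this model, calling model_unload() after model_emergency_shutdown() returns an error and leaves the module loaded, which is the behaviour wanted from the initscript's modprobe -r.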

The alternative (kernel fix) seems to be something like keeping a count of all slab allocations and maintaining a non-zero reference count on the module whenever this counter is non-zero. I think that could get quite messy; discussing this with Fabio, we felt grabbing a reference in dlm_emergency_shutdown() looked cleaner/simpler (although I think it has a narrow race: an unload could arrive between our determining that an emergency shutdown is required and actually taking the reference).
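For comparison, the allocation-counting alternative might look roughly like the userspace sketch below. This is an assumption-laden model rather than dlm code: counted_alloc(), counted_free(), live_objects and model_unload() are hypothetical names, malloc() stands in for the slab allocator, and the model ignores locking. It hints at why the approach gets messy, since every allocation and free path has to go through the counting wrappers.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical count of objects currently allocated from the module's caches. */
static int live_objects;

/* Allocation wrapper: counts each successful allocation. */
static void *counted_alloc(size_t size)
{
    void *p = malloc(size);

    if (p)
        live_objects++;
    return p;
}

/* Free wrapper: every free must pass through here to keep the count right. */
static void counted_free(void *p)
{
    if (!p)
        return;
    assert(live_objects > 0);   /* a free without a matching alloc is a bug */
    live_objects--;
    free(p);
}

/* Unload is refused while any counted object is still outstanding. */
static int model_unload(void)
{
    return live_objects > 0 ? -EBUSY : 0;
}
```

Here the unload check is only as reliable as the counter, so a single allocation site that bypasses the wrappers would reintroduce the original problem.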

Comment 10 David Teigland 2008-10-15 16:40:31 UTC
I think the patch is excellent.

Comment 11 David Teigland 2009-01-20 20:25:50 UTC
pushed to RHEL4 branch in cluster.git
fe92396a2237644cc75adc85472d332760d57782

Comment 12 Nate Straz 2009-01-30 22:56:50 UTC
*** Bug 483361 has been marked as a duplicate of this bug. ***

Comment 13 Nate Straz 2009-01-30 22:58:13 UTC
As per bug 483361, this caused an unknown symbol problem.  Moving back to assigned for a fix.

Comment 14 David Teigland 2009-01-30 22:59:15 UTC
From bug 483361

While trying to get testing going for RHEL 4.8 with cman and dlm, on loading
the dlm kernel module I got the following error:

FATAL: Error inserting dlm
(/lib/modules/2.6.9-78.19.ELhugemem/kernel/cluster/dlm.ko): Unknown symbol in
module, or unknown parameter (see dmesg)

Version-Release number of selected component (if applicable):
kernel-2.6.9-78.19.EL
dlm-kernel-2.6.9-56.2.el4

Comment 15 Nate Straz 2009-02-02 19:00:25 UTC
We can't check out anything that uses dlm until this is resolved.  That includes all tests with cman cluster configs.

Comment 16 David Teigland 2009-02-02 19:49:48 UTC
commit 89a5f2ec8c20c252296e214cf60d32ad74e32a75
Author: David Teigland <teigland>
Date:   Mon Feb 2 13:46:11 2009 -0600

    dlm-kernel: fix prevent dlm module unload following emergency shutdown
    
    bz 467040
    
    Fix to original commit that incorrectly used module_get() instead
    of __module_get().

Comment 17 David Teigland 2009-02-04 21:10:36 UTC
commit e253ffddacd851318cacf777a0d95bee757ebd1e
Author: David Teigland <teigland>
Date:   Wed Feb 4 15:06:34 2009 -0600

    dlm-kernel: fix2 prevent dlm module unload following emergency shutdown
    
    bz 467040
    
    It seems that dlm_emergency_shutdown() is called even when shutdowns are
    normal/clean, but doesn't do anything.  So, only do the __module_get()
    if dlm_emergency_shutdown finds lockspaces to clear.  Include the number
    of lockspaces cleared in the printk at the end.
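The conditional behaviour described in this commit message can be modelled in userspace as follows. This is a sketch under stated assumptions, not the committed code: struct model_module, model_emergency_shutdown() and model_can_unload() are hypothetical names, and the lockspace list is reduced to a simple count.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the module and its lockspaces. */
struct model_module {
    int refcount;    /* references currently held on the module */
    int lockspaces;  /* lockspaces still present when shutdown is called */
};

/*
 * Models the second fix: clear whatever lockspaces exist and pin the
 * module only if there was something to clear. Returns the number of
 * lockspaces cleared, which the real routine reports via printk.
 */
static int model_emergency_shutdown(struct model_module *m)
{
    int cleared = m->lockspaces;

    assert(cleared >= 0);
    m->lockspaces = 0;
    if (cleared > 0)
        m->refcount++;  /* pin: this reference is never dropped */
    return cleared;
}

/* The module may be unloaded only when no references are held. */
static bool model_can_unload(const struct model_module *m)
{
    return m->refcount == 0;
}
```

With no lockspaces present, the model leaves the reference count untouched, so a normal, clean shutdown can still unload the module; with lockspaces present, the module stays pinned.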

Comment 21 errata-xmlrpc 2009-05-18 21:17:16 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1051.html

