Bug 198659 - slab error in kmem_cache_destroy() on mirror server after creation
slab error in kmem_cache_destroy() on mirror server after creation
Status: CLOSED CURRENTRELEASE
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: cmirror (Show other bugs)
4
All Linux
high Severity high
: ---
: ---
Assigned To: Jonathan Earl Brassow
Cluster QE
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2006-07-12 11:45 EDT by Corey Marthaler
Modified: 2010-04-27 10:56 EDT (History)
3 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2010-04-27 10:56:36 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description Corey Marthaler 2006-07-12 11:45:21 EDT
Description of problem:
I hope to reproduce this issue and gather more information but here is what I
have so far. I was running lvm_config and after creating a mirror, it appeared
that the mirror server hit this kmem_cache_destroy() issue and then both taft-02
and taft-04 attempted to become the new server.

lvm_config output:
creating lvm devices...
make PV out of all avail partitions on taft-04
make VG out of all avail partitions on taft-03
activate VG on taft-03
make needed LVs on taft-02
lvcreate -l 34875 -n mirror_1_7870 -m 1 mirror_1_787

[ Here is where the problem appears to happen ]

REDUCE size of lv from 34875 to 3607 on taft-02
EXTEND size of lv from 3607 to 22464 on taft-01
RENAME lv from /dev/mirror_1_787/mirror_1_7870 to rename_828 on taft-01

REDUCE size of lv from 22464 to 17987 on taft-04
EXTEND size of lv from 17987 to 30406 on taft-03

REDUCE size of lv from 30406 to 23283 on taft-02
EXTEND size of lv from 23283 to 24244 on taft-03
RENAME lv from /dev/mirror_1_787/rename_828 to rename_566 on taft-02

REDUCE size of lv from 24244 to 22147 on taft-03
EXTEND size of lv from 22147 to 22447 on taft-04

REDUCE size of lv from 22447 to 17591 on taft-03
EXTEND size of lv from 17591 to 22852 on taft-03

REDUCE size of lv from 22852 to 907 on taft-03
EXTEND size of lv from 907 to 14926 on taft-02



taft-03 (mirror server) after the creation on taft-02:
Jul 11 11:56:30 taft-03 kernel: device-mapper: I'm the cluster mirror log server
for dtbZMTPS
Jul 11 11:56:30 taft-03 kernel: device-mapper: Disk Resume::
Jul 11 11:56:30 taft-03 dmeventd[4768]: Monitoring mirror device,
mirror_1_787-mirror_1_7870 for events
Jul 11 11:56:30 taft-03 kernel: device-mapper:   Live nodes        :: 1
Jul 11 11:56:30 taft-03 kernel: device-mapper:   In-Use Regions    :: 0
Jul 11 11:56:30 taft-03 kernel: device-mapper:   Good IUR's        :: 0
Jul 11 11:56:30 taft-03 kernel: device-mapper:   Bad IUR's         :: 0
Jul 11 11:56:30 taft-03 kernel: device-mapper:   Sync count        :: 254424
Jul 11 11:56:30 taft-03 kernel: device-mapper:   Disk Region count ::
18446744073709551615
Jul 11 11:56:30 taft-03 kernel: device-mapper:   Region count      :: 279000
Jul 11 11:56:30 taft-03 kernel: device-mapper:   NOTE:  Mapping has changed.
Jul 11 11:56:30 taft-03 kernel: device-mapper: Marked regions::
Jul 11 11:56:30 taft-03 kernel: device-mapper:   0 - 24575
Jul 11 11:56:30 taft-03 kernel: device-mapper:   Total = 24576
Jul 11 11:56:30 taft-03 kernel: device-mapper: Out-of-sync regions::
Jul 11 11:56:30 taft-03 kernel: device-mapper:   0 - 24575
Jul 11 11:56:30 taft-03 kernel: device-mapper:   Total = 24576
Jul 11 11:56:46 taft-03 hald[3232]: Timed out waiting for hotplug event 529.
Rebasing to 517
Jul 11 11:56:50 taft-03 kernel: slab error in kmem_cache_destroy(): cache
`dm-io-bio': Can't free all objects
Jul 11 11:56:50 taft-03 kernel:
Jul 11 11:56:50 taft-03 kernel: Call
Trace:<ffffffff8016238f>{kmem_cache_destroy+202}
<ffffffffa0093cba>{:dm_mod:bio_set_exit+34}
Jul 11 11:56:50 taft-03 kernel:       
<ffffffffa0093e49>{:dm_mod:resize_pool+61} <ffffffffa00928c9>{:dm_mod:dev_remove+0}
Jul 11 11:56:50 taft-03 kernel:       
<ffffffffa00953b8>{:dm_mod:kcopyd_client_destroy+146}
Jul 11 11:56:50 taft-03 kernel:        <ffffffff801ea575>{__up_write+20}
<ffffffffa00d8a11>{:dm_mirror:mirror_dtr+101}
Jul 11 11:56:50 taft-03 kernel:       
<ffffffffa009090d>{:dm_mod:table_destroy+72} <ffffffffa0090020>{:dm_mod:dm_put+149}
Jul 11 11:56:50 taft-03 kernel:       
<ffffffffa009297d>{:dm_mod:dev_remove+180} <ffffffffa0093c34>{:dm_mod:ctl_ioctl+600}
Jul 11 11:56:50 taft-03 kernel:        <ffffffff8018ad41>{sys_ioctl+853}
<ffffffff8011024a>{system_call+126}
Jul 11 11:56:50 taft-03 kernel:
Jul 11 11:56:55 taft-03 kernel: device-mapper: A cluster mirror log member has left.
Jul 11 11:56:55 taft-03 kernel: device-mapper: Log context can not be found for
request
Jul 11 11:56:55 taft-03 kernel: device-mapper: Log context can not be found for
request
Jul 11 11:57:05 taft-03 kernel: device-mapper: Cluster mirror log server is
shutting down.


taft-02 then becomes the mirror server:
Jul 11 11:57:38 taft-02 kernel: device-mapper: A cluster mirror log member has left.
Jul 11 11:57:39 taft-02 kernel: device-mapper: server tells us it no longer
controls the log
Jul 11 11:57:44 taft-02 kernel: device-mapper: A cluster mirror log member has left.
Jul 11 11:57:44 taft-02 kernel: device-mapper: unable to notify server of
completed resync work
Jul 11 11:57:49 taft-02 kernel: device-mapper: A cluster mirror log member has left.
Jul 11 11:57:49 taft-02 kernel: device-mapper: I'm the cluster mirror log server
for dtbZMTPS
Jul 11 11:57:49 taft-02 kernel: device-mapper: Disk Resume::
Jul 11 11:57:49 taft-02 kernel: device-mapper:   Live nodes        :: 1
Jul 11 11:57:49 taft-02 kernel: device-mapper:   In-Use Regions    :: 0
Jul 11 11:57:49 taft-02 kernel: device-mapper:   Good IUR's        :: 0
Jul 11 11:57:49 taft-02 kernel: device-mapper:   Bad IUR's         :: 0
Jul 11 11:57:49 taft-02 kernel: device-mapper:   Sync count        :: 254708
Jul 11 11:57:49 taft-02 kernel: device-mapper:   Disk Region count :: 279000
Jul 11 11:57:49 taft-02 kernel: device-mapper:   Region count      :: 279000
Jul 11 11:57:49 taft-02 kernel: device-mapper: Marked regions::
Jul 11 11:57:49 taft-02 kernel: device-mapper:   284 - 24575
Jul 11 11:57:49 taft-02 kernel: device-mapper:   Total = 24292
Jul 11 11:57:49 taft-02 kernel: device-mapper: Out-of-sync regions::
Jul 11 11:57:49 taft-02 kernel: device-mapper:   284 - 24575
Jul 11 11:57:49 taft-02 kernel: device-mapper:   Total = 24292



taft-04 then also becomes the mirror server:
Jul 11 13:05:22 taft-04 kernel: device-mapper: A cluster mirror log member has left.
Jul 11 13:05:24 taft-04 kernel: device-mapper: I'm the cluster mirror log server
for dtbZMTPS
Jul 11 13:05:24 taft-04 kernel: device-mapper: Disk Resume::
Jul 11 13:05:24 taft-04 kernel: device-mapper:   Live nodes        :: 2
Jul 11 13:05:24 taft-04 kernel: device-mapper:   In-Use Regions    :: 0
Jul 11 13:05:24 taft-04 kernel: device-mapper:   Good IUR's        :: 0
Jul 11 13:05:24 taft-04 kernel: device-mapper:   Bad IUR's         :: 0
Jul 11 13:05:24 taft-04 kernel: device-mapper:   Sync count        :: 254708
Jul 11 13:05:24 taft-04 kernel: device-mapper:   Disk Region count :: 279000
Jul 11 13:05:24 taft-04 kernel: device-mapper:   Region count      :: 279000
Jul 11 13:05:24 taft-04 kernel: device-mapper: Marked regions::
Jul 11 13:05:24 taft-04 kernel: device-mapper:   284 - 24575
Jul 11 13:05:24 taft-04 kernel: device-mapper:   Total = 24292
Jul 11 13:05:24 taft-04 kernel: device-mapper: Out-of-sync regions::
Jul 11 13:05:24 taft-04 kernel: device-mapper:   284 - 24575
Jul 11 13:05:24 taft-04 kernel: device-mapper:   Total = 24292
Jul 11 13:05:24 taft-04 kernel: device-mapper: A cluster mirror log member has left.
Jul 11 13:05:24 taft-04 kernel: device-mapper: Clean-up required due to new server
Jul 11 13:05:24 taft-04 kernel: device-mapper:  - Wiping clear region list
Jul 11 13:05:24 taft-04 kernel: device-mapper:  - 1 clear region requests wiped
Jul 11 13:05:24 taft-04 kernel: device-mapper: Disk Resume::
Jul 11 13:05:24 taft-04 kernel: device-mapper:  - Resending all mark region requests
Jul 11 13:05:24 taft-04 kernel: device-mapper: Clean-up complete
Jul 11 13:05:24 taft-04 kernel: device-mapper: Continuing request::
LRT_COMPLETE_RESYNC_WORK
Jul 11 13:05:24 taft-04 kernel: device-mapper:   Live nodes        :: 2
Jul 11 13:05:24 taft-04 kernel: device-mapper:   In-Use Regions    :: 0
Jul 11 13:05:24 taft-04 kernel: device-mapper:   Good IUR's        :: 0
Jul 11 13:05:24 taft-04 kernel: device-mapper:   Bad IUR's         :: 0
Jul 11 13:05:24 taft-04 kernel: device-mapper:   Sync count        :: 254708
Jul 11 13:05:24 taft-04 kernel: device-mapper:   Disk Region count :: 279000
Jul 11 13:05:24 taft-04 kernel: device-mapper:   Region count      :: 279000
Jul 11 13:05:24 taft-04 kernel: device-mapper: Marked regions::
Jul 11 13:05:24 taft-04 kernel: device-mapper:   284 - 24575
Jul 11 13:05:24 taft-04 kernel: device-mapper:   Total = 24292
Jul 11 13:05:24 taft-04 kernel: device-mapper: Out-of-sync regions::
Jul 11 13:05:24 taft-04 kernel: device-mapper:   284 - 24575
Jul 11 13:05:24 taft-04 kernel: device-mapper:   Total = 24292
Jul 11 13:05:34 taft-04 kernel: device-mapper: Cluster mirror log server is
shutting down.


Version-Release number of selected component (if applicable):
[root@taft-04 ~]# rpm -q device-mapper
device-mapper-1.02.07-3.0.RHEL4
[root@taft-04 ~]# rpm -q cmirror
cmirror-1.0.1-0
[root@taft-04 ~]# rpm -q cmirror-kernel
cmirror-kernel-2.6.9-10.0
[root@taft-04 ~]# rpm -q lvm2
lvm2-2.02.06-4.0.RHEL4
[root@taft-04 ~]# rpm -q lvm2-cluster
lvm2-cluster-2.02.06-4.0.RHEL4


How reproducible:
once once so far
Comment 1 Jonathan Earl Brassow 2006-07-13 10:29:35 EDT
How are you resizing the mirror?

(Original mirror is 500M)
[root@tng1-1 ~]# lvresize -L 800M vg/lv
  Extending 2 mirror images.
  Mirrors cannot be resized while active yet.
[root@tng1-1 ~]# lvresize -L 400M vg/lv
  Mirrors cannot be resized while active yet.
Comment 2 Corey Marthaler 2006-07-13 11:02:50 EDT
I deactivate them first, then do it.
Comment 3 Corey Marthaler 2006-07-18 15:51:42 EDT
Updated reproduction on kernel 2.6.9-42.BRsmp:

Jul 18 09:54:26 taft-04 dmeventd[4406]: Monitoring mirror device,
mirror_1_7505-mirror1 for events
Jul 18 09:54:26 taft-04 kernel: device-mapper:   0 - 2048
Jul 18 09:54:26 taft-04 kernel: device-mapper:   Total = 2048
Jul 18 09:54:27 taft-04 kernel: device-mapper: 2000 out-of-sync regions
remaining for IhV5Xlh2.
Jul 18 09:54:59 taft-04 kernel: device-mapper: Cluster mirror log server is
shutting down.
Jul 18 09:55:10 taft-04 kernel: device-mapper: I'm the cluster mirror log server
for lYxvWdXF
Jul 18 09:55:10 taft-04 kernel: device-mapper: Disk Resume::
Jul 18 09:55:10 taft-04 kernel: device-mapper:   Live nodes        :: 1
Jul 18 09:55:10 taft-04 kernel: device-mapper:   In-Use Regions    :: 0
Jul 18 09:55:10 taft-04 kernel: device-mapper:   Good IUR's        :: 0
Jul 18 09:55:10 taft-04 kernel: device-mapper:   Bad IUR's         :: 0
Jul 18 09:55:10 taft-04 kernel: device-mapper:   Sync count        :: 0
Jul 18 09:55:10 taft-04 kernel: device-mapper:   Disk Region count ::
18446744073709551615
Jul 18 09:55:10 taft-04 kernel: device-mapper:   Region count      :: 2048
Jul 18 09:55:10 taft-04 kernel: device-mapper:   NOTE:  Mapping has changed.
Jul 18 09:55:10 taft-04 kernel: device-mapper: Marked regions::
Jul 18 09:55:10 taft-04 kernel: device-mapper:   0 - 2048
Jul 18 09:55:10 taft-04 kernel: device-mapper:   Total = 2048
Jul 18 09:55:10 taft-04 dmeventd[4406]: Monitoring mirror device,
mirror_1_7505-mirror1 for events
Jul 18 09:55:10 taft-04 kernel: device-mapper: Out-of-sync regions::
Jul 18 09:55:10 taft-04 kernel: device-mapper:   0 - 2048
Jul 18 09:55:10 taft-04 kernel: device-mapper:   Total = 2048
Jul 18 09:55:15 taft-04 kernel: device-mapper: 2000 out-of-sync regions
remaining for lYxvWdXF.
slab error in kmem_cache_destroy(): cache `dm-io-bio': Can't free all objects

Call Trace:<ffffffff80162443>{kmem_cache_destroy+202}
<ffffffffa0093cba>{:dm_mod:bio_set_exit+34}
       <ffffffffa0093e70>{:dm_mod:resize_pool+100}
<ffffffffa00928c9>{:dm_mod:dev_remove+0}
       <ffffffffa0095424>{:dm_mod:kcopyd_client_destroy+146}
       <ffffffff801ea645>{__up_write+20}
<ffffffffa00d8a7a>{:dm_mirror:mirror_dtr+101}
       <ffffffffa009090d>{:dm_mod:table_destroy+72}
<ffffffffa0090020>{:dm_mod:dm_put+149}
       <ffffffffa009297d>{:dm_mod:dev_remove+180}
<ffffffffa0093c34>{:dm_mod:ctl_ioctl+600}
       <ffffffff8018ae05>{sys_ioctl+853} <ffffffff8011026a>{system_call+126}

Jul 18 09:55:31 taft-04 kernel: slab error in kmem_cache_destroy(): cache
`dm-io-bio': Can't free as
Jul 18 09:55:31 taft-04 kernel:
Jul 18 09:55:31 taft-04 kernel: Call
Trace:<ffffffff80162443>{kmem_cache_destroy+202} <ffffffffa009
Jul 18 09:55:31 taft-04 kernel:       
<ffffffffa0093e70>{:dm_mod:resize_pool+100} <ffffffffa00928c
Jul 18 09:55:31 taft-04 kernel:       
<ffffffffa0095424>{:dm_mod:kcopyd_client_destroy+146}
Jul 18 09:55:31 taft-04 kernel:        <ffffffff801ea645>{__up_write+20}
<ffffffffa00d8a7a>{:dm_mir
Jul 18 09:55:31 taft-04 kernel:       
<ffffffffa009090d>{:dm_mod:table_destroy+72} <ffffffffa00900
Jul 18 09:55:31 taft-04 kernel:       
<ffffffffa009297d>{:dm_mod:dev_remove+180} <ffffffffa0093c34
Jul 18 09:55:31 taft-04 kernel:        <ffffffff8018ae05>{sys_ioctl+853}
<ffffffff8011026a>{system_
Jul 18 09:55:31 taft-04 kernel:
device-mapper: Log context can not be found for request
device-mapper: Log context can not be found for request
Jul 18 09:55:36 taft-04 kernel: device-mapper: A cluster mirror log member has left.
Jul 18 09:55:36 taft-04 kernel: device-mapper: Log context can not be found for
request
Jul 18 09:55:36 taft-04 kernel: device-mapper: Log context can not be found for
request
Jul 18 09:55:41 taft-04 kernel: device-mapper: Cluster mirror log server is
shutting down.
kmem_cache_create: duplicate cache dm-io-bio
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at slab:1453
invalid operand: 0000 [1] SMP
CPU 0
Modules linked in: dm_cmirror(U) dlm(U) cman(U) md5 ipv6 parport_pc lp parport
autofs4 i2c_dev i2c_d
Pid: 4336, comm: clvmd Not tainted 2.6.9-42.BRsmp
RIP: 0010:[<ffffffff80162edc>] <ffffffff80162edc>{kmem_cache_create+1330}
RSP: 0000:000001021236dbe8  EFLAGS: 00010202
RAX: 000000000000002d RBX: 000001020c4c48b0 RCX: ffffffff803e1ee8
RDX: ffffffff803e1ee8 RSI: 0000000000000246 RDI: ffffffff8048e440
RBP: 000001020c462e80 R08: ffffffff803e1ee8 R09: 000001020c4c48b0
R10: 0000000100000000 R11: 0000000000000004 R12: ffffffff8048e440
R13: 0000000000002000 R14: 0000010000000000 R15: ffffffffa009f8c0
FS:  0000000041401960(005b) GS:ffffffff804e5080(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000031fa38fae0 CR3: 0000000000101000 CR4: 00000000000006e0
Process clvmd (pid: 4336, threadinfo 000001021236c000, task 00000102127937f0)
Stack: 0000000000000202 0000010213802380 0000000000000000 0000000000000000
       0000000000000000 0000000000000000 00000102138a8600 00000102170472c0
       0000000000000040 0000000000000100
Call Trace:<ffffffffa0093f23>{:dm_mod:resize_pool+279}
<ffffffffa00952e7>{:dm_mod:kcopyd_client_cre
       <ffffffffa00d8981>{:dm_mirror:mirror_ctr+1119}
<ffffffffa00910bb>{:dm_mod:dm_table_add_targe
       <ffffffffa009329c>{:dm_mod:table_load+225}
<ffffffffa00931bb>{:dm_mod:table_load+0}
       <ffffffffa0093c34>{:dm_mod:ctl_ioctl+600} <ffffffff80181d5b>{sys_newstat+32}
       <ffffffff8018ae05>{sys_ioctl+853} <ffffffff8011026a>{system_call+126}


Code: 0f 0b b2 5d 32 80 ff ff ff ff ad 05 48 8b 1b eb 96 65 48 8b
RIP <ffffffff80162edc>{kmem_cache_create+1330} RSP <000001021236dbe8>
 <0>Kernel panic - not syncing: Oops
Comment 4 Jonathan Earl Brassow 2006-07-18 17:48:34 EDT
have potential fix.  will check in after testing.
Comment 6 Jonathan Earl Brassow 2006-07-26 15:26:02 EDT
It  was possible for a 'dm_io_sync_vm' to do I/O after the kcopyd client was destroyed.  What i've 
learned is that any process doing dm_io_* calls _must_ also do dm_io_get() on initialization and 
dm_io_put() on exit.

diff -u -r1.1.2.9.2.3 -r1.1.2.9.2.4
--- cluster/cmirror-kernel/src/Attic/dm-cmirror-server.c        2006/06/29 19:46:37     1.1.2.9.2.3
+++ cluster/cmirror-kernel/src/Attic/dm-cmirror-server.c        2006/07/19 14:38:20     1.1.2.9.2.4
@@ -1058,6 +1058,7 @@
                DMWARN("Cluster mirror log server thread failed to start");
                return -1;
        }
+       dm_io_get(32);
        return 0;
 }
 
@@ -1066,6 +1067,7 @@
        atomic_set(&server_run, 0);
 
        wait_for_completion(&server_completion);
+       dm_io_put(32);
 }
 /*
  * Overrides for Emacs so that we follow Linus's tabbing style.
Comment 8 Corey Marthaler 2006-08-03 11:44:34 EDT
This issue appears to be fixed.
Comment 10 Alasdair Kergon 2010-04-27 10:56:36 EDT
Assuming this VERIFIED fix got released.  Closing.
Reopen if it's not yet resolved.

Note You need to log in before you can comment on or make changes to this bug.