Bug 198659

Summary: slab error in kmem_cache_destroy() on mirror server after creation
Product: [Retired] Red Hat Cluster Suite Reporter: Corey Marthaler <cmarthal>
Component: cmirrorAssignee: Jonathan Earl Brassow <jbrassow>
Status: CLOSED CURRENTRELEASE QA Contact: Cluster QE <mspqa-list>
Severity: high Docs Contact:
Priority: high    
Version: 4CC: agk, dwysocha, mbroz
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2010-04-27 14:56:36 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Corey Marthaler 2006-07-12 15:45:21 UTC
Description of problem:
I hope to reproduce this issue and gather more information but here is what I
have so far. I was running lvm_config and after creating a mirror, it appeared
that the mirror server hit this kmem_cache_destroy() issue and then both taft-02
and taft-04 attempted to become the new server.

lvm_config output:
creating lvm devices...
make PV out of all avail partitions on taft-04
make VG out of all avail partitions on taft-03
activate VG on taft-03
make needed LVs on taft-02
lvcreate -l 34875 -n mirror_1_7870 -m 1 mirror_1_787

[ Here is where the problem appears to happen ]

REDUCE size of lv from 34875 to 3607 on taft-02
EXTEND size of lv from 3607 to 22464 on taft-01
RENAME lv from /dev/mirror_1_787/mirror_1_7870 to rename_828 on taft-01

REDUCE size of lv from 22464 to 17987 on taft-04
EXTEND size of lv from 17987 to 30406 on taft-03

REDUCE size of lv from 30406 to 23283 on taft-02
EXTEND size of lv from 23283 to 24244 on taft-03
RENAME lv from /dev/mirror_1_787/rename_828 to rename_566 on taft-02

REDUCE size of lv from 24244 to 22147 on taft-03
EXTEND size of lv from 22147 to 22447 on taft-04

REDUCE size of lv from 22447 to 17591 on taft-03
EXTEND size of lv from 17591 to 22852 on taft-03

REDUCE size of lv from 22852 to 907 on taft-03
EXTEND size of lv from 907 to 14926 on taft-02



taft-03 (mirror server) after the creation on taft-02:
Jul 11 11:56:30 taft-03 kernel: device-mapper: I'm the cluster mirror log server
for dtbZMTPS
Jul 11 11:56:30 taft-03 kernel: device-mapper: Disk Resume::
Jul 11 11:56:30 taft-03 dmeventd[4768]: Monitoring mirror device,
mirror_1_787-mirror_1_7870 for events
Jul 11 11:56:30 taft-03 kernel: device-mapper:   Live nodes        :: 1
Jul 11 11:56:30 taft-03 kernel: device-mapper:   In-Use Regions    :: 0
Jul 11 11:56:30 taft-03 kernel: device-mapper:   Good IUR's        :: 0
Jul 11 11:56:30 taft-03 kernel: device-mapper:   Bad IUR's         :: 0
Jul 11 11:56:30 taft-03 kernel: device-mapper:   Sync count        :: 254424
Jul 11 11:56:30 taft-03 kernel: device-mapper:   Disk Region count ::
18446744073709551615
Jul 11 11:56:30 taft-03 kernel: device-mapper:   Region count      :: 279000
Jul 11 11:56:30 taft-03 kernel: device-mapper:   NOTE:  Mapping has changed.
Jul 11 11:56:30 taft-03 kernel: device-mapper: Marked regions::
Jul 11 11:56:30 taft-03 kernel: device-mapper:   0 - 24575
Jul 11 11:56:30 taft-03 kernel: device-mapper:   Total = 24576
Jul 11 11:56:30 taft-03 kernel: device-mapper: Out-of-sync regions::
Jul 11 11:56:30 taft-03 kernel: device-mapper:   0 - 24575
Jul 11 11:56:30 taft-03 kernel: device-mapper:   Total = 24576
Jul 11 11:56:46 taft-03 hald[3232]: Timed out waiting for hotplug event 529.
Rebasing to 517
Jul 11 11:56:50 taft-03 kernel: slab error in kmem_cache_destroy(): cache
`dm-io-bio': Can't free all objects
Jul 11 11:56:50 taft-03 kernel:
Jul 11 11:56:50 taft-03 kernel: Call
Trace:<ffffffff8016238f>{kmem_cache_destroy+202}
<ffffffffa0093cba>{:dm_mod:bio_set_exit+34}
Jul 11 11:56:50 taft-03 kernel:       
<ffffffffa0093e49>{:dm_mod:resize_pool+61} <ffffffffa00928c9>{:dm_mod:dev_remove+0}
Jul 11 11:56:50 taft-03 kernel:       
<ffffffffa00953b8>{:dm_mod:kcopyd_client_destroy+146}
Jul 11 11:56:50 taft-03 kernel:        <ffffffff801ea575>{__up_write+20}
<ffffffffa00d8a11>{:dm_mirror:mirror_dtr+101}
Jul 11 11:56:50 taft-03 kernel:       
<ffffffffa009090d>{:dm_mod:table_destroy+72} <ffffffffa0090020>{:dm_mod:dm_put+149}
Jul 11 11:56:50 taft-03 kernel:       
<ffffffffa009297d>{:dm_mod:dev_remove+180} <ffffffffa0093c34>{:dm_mod:ctl_ioctl+600}
Jul 11 11:56:50 taft-03 kernel:        <ffffffff8018ad41>{sys_ioctl+853}
<ffffffff8011024a>{system_call+126}
Jul 11 11:56:50 taft-03 kernel:
Jul 11 11:56:55 taft-03 kernel: device-mapper: A cluster mirror log member has left.
Jul 11 11:56:55 taft-03 kernel: device-mapper: Log context can not be found for
request
Jul 11 11:56:55 taft-03 kernel: device-mapper: Log context can not be found for
request
Jul 11 11:57:05 taft-03 kernel: device-mapper: Cluster mirror log server is
shutting down.


taft-02 then becomes the mirror server:
Jul 11 11:57:38 taft-02 kernel: device-mapper: A cluster mirror log member has left.
Jul 11 11:57:39 taft-02 kernel: device-mapper: server tells us it no longer
controls the log
Jul 11 11:57:44 taft-02 kernel: device-mapper: A cluster mirror log member has left.
Jul 11 11:57:44 taft-02 kernel: device-mapper: unable to notify server of
completed resync work
Jul 11 11:57:49 taft-02 kernel: device-mapper: A cluster mirror log member has left.
Jul 11 11:57:49 taft-02 kernel: device-mapper: I'm the cluster mirror log server
for dtbZMTPS
Jul 11 11:57:49 taft-02 kernel: device-mapper: Disk Resume::
Jul 11 11:57:49 taft-02 kernel: device-mapper:   Live nodes        :: 1
Jul 11 11:57:49 taft-02 kernel: device-mapper:   In-Use Regions    :: 0
Jul 11 11:57:49 taft-02 kernel: device-mapper:   Good IUR's        :: 0
Jul 11 11:57:49 taft-02 kernel: device-mapper:   Bad IUR's         :: 0
Jul 11 11:57:49 taft-02 kernel: device-mapper:   Sync count        :: 254708
Jul 11 11:57:49 taft-02 kernel: device-mapper:   Disk Region count :: 279000
Jul 11 11:57:49 taft-02 kernel: device-mapper:   Region count      :: 279000
Jul 11 11:57:49 taft-02 kernel: device-mapper: Marked regions::
Jul 11 11:57:49 taft-02 kernel: device-mapper:   284 - 24575
Jul 11 11:57:49 taft-02 kernel: device-mapper:   Total = 24292
Jul 11 11:57:49 taft-02 kernel: device-mapper: Out-of-sync regions::
Jul 11 11:57:49 taft-02 kernel: device-mapper:   284 - 24575
Jul 11 11:57:49 taft-02 kernel: device-mapper:   Total = 24292



taft-04 then also becomes the mirror server:
Jul 11 13:05:22 taft-04 kernel: device-mapper: A cluster mirror log member has left.
Jul 11 13:05:24 taft-04 kernel: device-mapper: I'm the cluster mirror log server
for dtbZMTPS
Jul 11 13:05:24 taft-04 kernel: device-mapper: Disk Resume::
Jul 11 13:05:24 taft-04 kernel: device-mapper:   Live nodes        :: 2
Jul 11 13:05:24 taft-04 kernel: device-mapper:   In-Use Regions    :: 0
Jul 11 13:05:24 taft-04 kernel: device-mapper:   Good IUR's        :: 0
Jul 11 13:05:24 taft-04 kernel: device-mapper:   Bad IUR's         :: 0
Jul 11 13:05:24 taft-04 kernel: device-mapper:   Sync count        :: 254708
Jul 11 13:05:24 taft-04 kernel: device-mapper:   Disk Region count :: 279000
Jul 11 13:05:24 taft-04 kernel: device-mapper:   Region count      :: 279000
Jul 11 13:05:24 taft-04 kernel: device-mapper: Marked regions::
Jul 11 13:05:24 taft-04 kernel: device-mapper:   284 - 24575
Jul 11 13:05:24 taft-04 kernel: device-mapper:   Total = 24292
Jul 11 13:05:24 taft-04 kernel: device-mapper: Out-of-sync regions::
Jul 11 13:05:24 taft-04 kernel: device-mapper:   284 - 24575
Jul 11 13:05:24 taft-04 kernel: device-mapper:   Total = 24292
Jul 11 13:05:24 taft-04 kernel: device-mapper: A cluster mirror log member has left.
Jul 11 13:05:24 taft-04 kernel: device-mapper: Clean-up required due to new server
Jul 11 13:05:24 taft-04 kernel: device-mapper:  - Wiping clear region list
Jul 11 13:05:24 taft-04 kernel: device-mapper:  - 1 clear region requests wiped
Jul 11 13:05:24 taft-04 kernel: device-mapper: Disk Resume::
Jul 11 13:05:24 taft-04 kernel: device-mapper:  - Resending all mark region requests
Jul 11 13:05:24 taft-04 kernel: device-mapper: Clean-up complete
Jul 11 13:05:24 taft-04 kernel: device-mapper: Continuing request::
LRT_COMPLETE_RESYNC_WORK
Jul 11 13:05:24 taft-04 kernel: device-mapper:   Live nodes        :: 2
Jul 11 13:05:24 taft-04 kernel: device-mapper:   In-Use Regions    :: 0
Jul 11 13:05:24 taft-04 kernel: device-mapper:   Good IUR's        :: 0
Jul 11 13:05:24 taft-04 kernel: device-mapper:   Bad IUR's         :: 0
Jul 11 13:05:24 taft-04 kernel: device-mapper:   Sync count        :: 254708
Jul 11 13:05:24 taft-04 kernel: device-mapper:   Disk Region count :: 279000
Jul 11 13:05:24 taft-04 kernel: device-mapper:   Region count      :: 279000
Jul 11 13:05:24 taft-04 kernel: device-mapper: Marked regions::
Jul 11 13:05:24 taft-04 kernel: device-mapper:   284 - 24575
Jul 11 13:05:24 taft-04 kernel: device-mapper:   Total = 24292
Jul 11 13:05:24 taft-04 kernel: device-mapper: Out-of-sync regions::
Jul 11 13:05:24 taft-04 kernel: device-mapper:   284 - 24575
Jul 11 13:05:24 taft-04 kernel: device-mapper:   Total = 24292
Jul 11 13:05:34 taft-04 kernel: device-mapper: Cluster mirror log server is
shutting down.


Version-Release number of selected component (if applicable):
[root@taft-04 ~]# rpm -q device-mapper
device-mapper-1.02.07-3.0.RHEL4
[root@taft-04 ~]# rpm -q cmirror
cmirror-1.0.1-0
[root@taft-04 ~]# rpm -q cmirror-kernel
cmirror-kernel-2.6.9-10.0
[root@taft-04 ~]# rpm -q lvm2
lvm2-2.02.06-4.0.RHEL4
[root@taft-04 ~]# rpm -q lvm2-cluster
lvm2-cluster-2.02.06-4.0.RHEL4


How reproducible:
once once so far

Comment 1 Jonathan Earl Brassow 2006-07-13 14:29:35 UTC
How are you resizing the mirror?

(Original mirror is 500M)
[root@tng1-1 ~]# lvresize -L 800M vg/lv
  Extending 2 mirror images.
  Mirrors cannot be resized while active yet.
[root@tng1-1 ~]# lvresize -L 400M vg/lv
  Mirrors cannot be resized while active yet.


Comment 2 Corey Marthaler 2006-07-13 15:02:50 UTC
I deactivate them first, then do it.

Comment 3 Corey Marthaler 2006-07-18 19:51:42 UTC
Updated reproduction on kernel 2.6.9-42.BRsmp:

Jul 18 09:54:26 taft-04 dmeventd[4406]: Monitoring mirror device,
mirror_1_7505-mirror1 for events
Jul 18 09:54:26 taft-04 kernel: device-mapper:   0 - 2048
Jul 18 09:54:26 taft-04 kernel: device-mapper:   Total = 2048
Jul 18 09:54:27 taft-04 kernel: device-mapper: 2000 out-of-sync regions
remaining for IhV5Xlh2.
Jul 18 09:54:59 taft-04 kernel: device-mapper: Cluster mirror log server is
shutting down.
Jul 18 09:55:10 taft-04 kernel: device-mapper: I'm the cluster mirror log server
for lYxvWdXF
Jul 18 09:55:10 taft-04 kernel: device-mapper: Disk Resume::
Jul 18 09:55:10 taft-04 kernel: device-mapper:   Live nodes        :: 1
Jul 18 09:55:10 taft-04 kernel: device-mapper:   In-Use Regions    :: 0
Jul 18 09:55:10 taft-04 kernel: device-mapper:   Good IUR's        :: 0
Jul 18 09:55:10 taft-04 kernel: device-mapper:   Bad IUR's         :: 0
Jul 18 09:55:10 taft-04 kernel: device-mapper:   Sync count        :: 0
Jul 18 09:55:10 taft-04 kernel: device-mapper:   Disk Region count ::
18446744073709551615
Jul 18 09:55:10 taft-04 kernel: device-mapper:   Region count      :: 2048
Jul 18 09:55:10 taft-04 kernel: device-mapper:   NOTE:  Mapping has changed.
Jul 18 09:55:10 taft-04 kernel: device-mapper: Marked regions::
Jul 18 09:55:10 taft-04 kernel: device-mapper:   0 - 2048
Jul 18 09:55:10 taft-04 kernel: device-mapper:   Total = 2048
Jul 18 09:55:10 taft-04 dmeventd[4406]: Monitoring mirror device,
mirror_1_7505-mirror1 for events
Jul 18 09:55:10 taft-04 kernel: device-mapper: Out-of-sync regions::
Jul 18 09:55:10 taft-04 kernel: device-mapper:   0 - 2048
Jul 18 09:55:10 taft-04 kernel: device-mapper:   Total = 2048
Jul 18 09:55:15 taft-04 kernel: device-mapper: 2000 out-of-sync regions
remaining for lYxvWdXF.
slab error in kmem_cache_destroy(): cache `dm-io-bio': Can't free all objects

Call Trace:<ffffffff80162443>{kmem_cache_destroy+202}
<ffffffffa0093cba>{:dm_mod:bio_set_exit+34}
       <ffffffffa0093e70>{:dm_mod:resize_pool+100}
<ffffffffa00928c9>{:dm_mod:dev_remove+0}
       <ffffffffa0095424>{:dm_mod:kcopyd_client_destroy+146}
       <ffffffff801ea645>{__up_write+20}
<ffffffffa00d8a7a>{:dm_mirror:mirror_dtr+101}
       <ffffffffa009090d>{:dm_mod:table_destroy+72}
<ffffffffa0090020>{:dm_mod:dm_put+149}
       <ffffffffa009297d>{:dm_mod:dev_remove+180}
<ffffffffa0093c34>{:dm_mod:ctl_ioctl+600}
       <ffffffff8018ae05>{sys_ioctl+853} <ffffffff8011026a>{system_call+126}

Jul 18 09:55:31 taft-04 kernel: slab error in kmem_cache_destroy(): cache
`dm-io-bio': Can't free as
Jul 18 09:55:31 taft-04 kernel:
Jul 18 09:55:31 taft-04 kernel: Call
Trace:<ffffffff80162443>{kmem_cache_destroy+202} <ffffffffa009
Jul 18 09:55:31 taft-04 kernel:       
<ffffffffa0093e70>{:dm_mod:resize_pool+100} <ffffffffa00928c
Jul 18 09:55:31 taft-04 kernel:       
<ffffffffa0095424>{:dm_mod:kcopyd_client_destroy+146}
Jul 18 09:55:31 taft-04 kernel:        <ffffffff801ea645>{__up_write+20}
<ffffffffa00d8a7a>{:dm_mir
Jul 18 09:55:31 taft-04 kernel:       
<ffffffffa009090d>{:dm_mod:table_destroy+72} <ffffffffa00900
Jul 18 09:55:31 taft-04 kernel:       
<ffffffffa009297d>{:dm_mod:dev_remove+180} <ffffffffa0093c34
Jul 18 09:55:31 taft-04 kernel:        <ffffffff8018ae05>{sys_ioctl+853}
<ffffffff8011026a>{system_
Jul 18 09:55:31 taft-04 kernel:
device-mapper: Log context can not be found for request
device-mapper: Log context can not be found for request
Jul 18 09:55:36 taft-04 kernel: device-mapper: A cluster mirror log member has left.
Jul 18 09:55:36 taft-04 kernel: device-mapper: Log context can not be found for
request
Jul 18 09:55:36 taft-04 kernel: device-mapper: Log context can not be found for
request
Jul 18 09:55:41 taft-04 kernel: device-mapper: Cluster mirror log server is
shutting down.
kmem_cache_create: duplicate cache dm-io-bio
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at slab:1453
invalid operand: 0000 [1] SMP
CPU 0
Modules linked in: dm_cmirror(U) dlm(U) cman(U) md5 ipv6 parport_pc lp parport
autofs4 i2c_dev i2c_d
Pid: 4336, comm: clvmd Not tainted 2.6.9-42.BRsmp
RIP: 0010:[<ffffffff80162edc>] <ffffffff80162edc>{kmem_cache_create+1330}
RSP: 0000:000001021236dbe8  EFLAGS: 00010202
RAX: 000000000000002d RBX: 000001020c4c48b0 RCX: ffffffff803e1ee8
RDX: ffffffff803e1ee8 RSI: 0000000000000246 RDI: ffffffff8048e440
RBP: 000001020c462e80 R08: ffffffff803e1ee8 R09: 000001020c4c48b0
R10: 0000000100000000 R11: 0000000000000004 R12: ffffffff8048e440
R13: 0000000000002000 R14: 0000010000000000 R15: ffffffffa009f8c0
FS:  0000000041401960(005b) GS:ffffffff804e5080(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000031fa38fae0 CR3: 0000000000101000 CR4: 00000000000006e0
Process clvmd (pid: 4336, threadinfo 000001021236c000, task 00000102127937f0)
Stack: 0000000000000202 0000010213802380 0000000000000000 0000000000000000
       0000000000000000 0000000000000000 00000102138a8600 00000102170472c0
       0000000000000040 0000000000000100
Call Trace:<ffffffffa0093f23>{:dm_mod:resize_pool+279}
<ffffffffa00952e7>{:dm_mod:kcopyd_client_cre
       <ffffffffa00d8981>{:dm_mirror:mirror_ctr+1119}
<ffffffffa00910bb>{:dm_mod:dm_table_add_targe
       <ffffffffa009329c>{:dm_mod:table_load+225}
<ffffffffa00931bb>{:dm_mod:table_load+0}
       <ffffffffa0093c34>{:dm_mod:ctl_ioctl+600} <ffffffff80181d5b>{sys_newstat+32}
       <ffffffff8018ae05>{sys_ioctl+853} <ffffffff8011026a>{system_call+126}


Code: 0f 0b b2 5d 32 80 ff ff ff ff ad 05 48 8b 1b eb 96 65 48 8b
RIP <ffffffff80162edc>{kmem_cache_create+1330} RSP <000001021236dbe8>
 <0>Kernel panic - not syncing: Oops


Comment 4 Jonathan Earl Brassow 2006-07-18 21:48:34 UTC
have potential fix.  will check in after testing.

Comment 6 Jonathan Earl Brassow 2006-07-26 19:26:02 UTC
It  was possible for a 'dm_io_sync_vm' to do I/O after the kcopyd client was destroyed.  What i've 
learned is that any process doing dm_io_* calls _must_ also do dm_io_get() on initialization and 
dm_io_put() on exit.

diff -u -r1.1.2.9.2.3 -r1.1.2.9.2.4
--- cluster/cmirror-kernel/src/Attic/dm-cmirror-server.c        2006/06/29 19:46:37     1.1.2.9.2.3
+++ cluster/cmirror-kernel/src/Attic/dm-cmirror-server.c        2006/07/19 14:38:20     1.1.2.9.2.4
@@ -1058,6 +1058,7 @@
                DMWARN("Cluster mirror log server thread failed to start");
                return -1;
        }
+       dm_io_get(32);
        return 0;
 }
 
@@ -1066,6 +1067,7 @@
        atomic_set(&server_run, 0);
 
        wait_for_completion(&server_completion);
+       dm_io_put(32);
 }
 /*
  * Overrides for Emacs so that we follow Linus's tabbing style.


Comment 8 Corey Marthaler 2006-08-03 15:44:34 UTC
This issue appears to be fixed.

Comment 10 Alasdair Kergon 2010-04-27 14:56:36 UTC
Assuming this VERIFIED fix got released.  Closing.
Reopen if it's not yet resolved.