Bug 198563

Summary: clvmd panic in dm_mod:resize_pool while doing mirroring creation
Product: [Retired] Red Hat Cluster Suite Reporter: Corey Marthaler <cmarthal>
Component: cmirrorAssignee: Jonathan Earl Brassow <jbrassow>
Status: CLOSED CURRENTRELEASE QA Contact: Cluster QE <mspqa-list>
Severity: high Docs Contact:
Priority: high    
Version: 4CC: agk, bstevens, dwysocha, heinzm, mbroz
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2010-04-27 14:56:19 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Corey Marthaler 2006-07-11 21:59:59 UTC
Description of problem:
I was running lvm_config which had just created new pvs on taft-02, vgs on
taft-03, and then a mirror on taft-04, and that caused taft-03 to panic. 

kmem_cache_create: duplicate cache dm-io-2
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at slab:1453
invalid operand: 0000 [1] SMP
CPU 1
Modules linked in: dm_cmirror(U) dlm(U) cman(U) md5 ipv6 parport_pc lp parport
autofs4 i2c_ded
Pid: 6774, comm: clvmd Not tainted 2.6.9-39.1.ELsmp
RIP: 0010:[<ffffffff80162e28>] <ffffffff80162e28>{kmem_cache_create+1330}
RSP: 0000:0000010212901be8  EFLAGS: 00010202
RAX: 000000000000002b RBX: 0000010217982670 RCX: ffffffff803e1be8
RDX: ffffffff803e1be8 RSI: 0000000000000246 RDI: ffffffff8048e140
RBP: 0000010217982680 R08: ffffffff803e1be8 R09: 0000010217982670
R10: 0000000100000000 R11: 0000000000000004 R12: ffffffff8048e140
R13: 0000000000002000 R14: 0000010000000000 R15: ffffffffa009f844
FS:  0000000041401960(005b) GS:ffffffff804e4e00(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000006c1980 CR3: 00000001fffa8000 CR4: 00000000000006e0
Process clvmd (pid: 6774, threadinfo 0000010212900000, task 00000102137cf7f0)
Stack: 0000010217982480 0000000000000202 0000000000000000 0000000000000000
       0000000000000080 ffffffffa009f844 0000000000000100 0000000000000002
       ffffffffa009f820 0000000000000100
Call Trace:<ffffffffa0093f99>{:dm_mod:resize_pool+397}
<ffffffffa009527b>{:dm_mod:kcopyd_clie
       <ffffffffa00d8918>{:dm_mirror:mirror_ctr+1119}
<ffffffffa00910bb>{:dm_mod:dm_table_add
       <ffffffffa009329c>{:dm_mod:table_load+225}
<ffffffffa00931bb>{:dm_mod:table_load+0}
       <ffffffffa0093c34>{:dm_mod:ctl_ioctl+600} <ffffffff80181c97>{sys_newstat+32}
       <ffffffff8018ad41>{sys_ioctl+853} <ffffffff8011024a>{system_call+126}


Code: 0f 0b 6f 5a 32 80 ff ff ff ff ad 05 48 8b 1b eb 96 65 48 8b
RIP <ffffffff80162e28>{kmem_cache_create+1330} RSP <0000010212901be8>
 <0>Kernel panic - not syncing: Oops


Version-Release number of selected component (if applicable):
[root@taft-04 ~]# rpm -q lvm2
lvm2-2.02.06-4.0.RHEL4
[root@taft-04 ~]# rpm -q lvm2-cluster
lvm2-cluster-2.02.06-4.0.RHEL4
[root@taft-04 ~]# rpm -q device-mapper
device-mapper-1.02.07-3.0.RHEL4
[root@taft-04 ~]# rpm -q cmirror
cmirror-1.0.1-0
[root@taft-04 ~]# rpm -q cmirror-kernel
cmirror-kernel-2.6.9-10.0

Comment 1 Corey Marthaler 2006-07-12 16:48:16 UTC
Reproduced this on taft-01. Here's a more complete stack trace:

kmem_cache_create: duplicate cache dm-io-bio
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at slab:1453
invalid operand: 0000 [1] SMP
CPU 2
Modules linked in: dm_cmirror(U) dlm(U) cman(U) md5 ipv6 parport_pc lp parport
autofs4 i2c_dev i2c_core sunrpc ds yenta_socket pcmcia_core buttond
Pid: 6828, comm: clvmd Not tainted 2.6.9-39.1.ELsmp
RIP: 0010:[<ffffffff80162e28>] <ffffffff80162e28>{kmem_cache_create+1330}
RSP: 0018:00000102136e3be8  EFLAGS: 00010202
RAX: 000000000000002d RBX: 0000010037df06b0 RCX: ffffffff803e1be8
RDX: ffffffff803e1be8 RSI: 0000000000000246 RDI: ffffffff8048e140
RBP: 0000010217827380 R08: ffffffff803e1be8 R09: 0000010037df06b0
R10: 0000000100000000 R11: 0000000000000004 R12: ffffffff8048e140
R13: 0000000000002000 R14: 0000010000000000 R15: ffffffffa009f7c0
FS:  0000000041401960(005b) GS:ffffffff804e4e80(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000002a9556c000 CR3: 00000000dff9e000 CR4: 00000000000006e0
Process clvmd (pid: 6828, threadinfo 00000102136e2000, task 0000010216bf4030)
Stack: 0000000000000202 00000102152f6600 0000000000000000 0000000000000000
       0000000000000000 0000000000000000 0000010214673e00 000001021747b520
       0000000000000040 0000000000000100
Call Trace:<ffffffffa0093ee7>{:dm_mod:resize_pool+219}
<ffffffffa009527b>{:dm_mod:kcopyd_client_create+598}
       <ffffffffa00d8918>{:dm_mirror:mirror_ctr+1119}
<ffffffffa00910bb>{:dm_mod:dm_table_add_target+328}
       <ffffffffa009329c>{:dm_mod:table_load+225}
<ffffffffa00931bb>{:dm_mod:table_load+0}
       <ffffffffa0093c34>{:dm_mod:ctl_ioctl+600} <ffffffff80181c97>{sys_newstat+32}
       <ffffffff8018ad41>{sys_ioctl+853} <ffffffff8011024a>{system_call+126}


Code: 0f 0b 6f 5a 32 80 ff ff ff ff ad 05 48 8b 1b eb 96 65 48 8b
RIP <ffffffff80162e28>{kmem_cache_create+1330} RSP <00000102136e3be8>
 <0>Kernel panic - not syncing: Oops


Comment 2 Jonathan Earl Brassow 2006-07-12 18:07:23 UTC
This is most likely a kernel bug.

We'll need to determine the average user impact.  If it's really unlikely,
perhaps it can be pushed to RHEL4U5.

It be interesting to know if this affects single machine mirrors as well...


Comment 3 Jonathan Earl Brassow 2006-07-12 20:52:23 UTC
Any idea as to the number of mirrors that where present at the time?


Comment 4 Corey Marthaler 2006-07-12 21:06:14 UTC
just one mirror was present.

Comment 5 Corey Marthaler 2006-07-17 18:36:04 UTC
Just a note that I am still seeing this quite frequently on the taft cluster
while doing clustered mirroring operations. I'm still trying to pinpoint what
exact operation is causing this and whether or not the average user will see
this panic. I'd like to be able to reproduce this on another cluster as well. 

These are the rpms that I'm running with:
[root@taft-04 ~]# uname -ar
Linux taft-04 2.6.9-42.ELsmp #1 SMP Wed Jul 12 23:32:02 EDT 2006 x86_64 x86_64
x86_64 GNU/Linux
[root@taft-04 ~]# rpm -q lvm2
lvm2-2.02.06-6.0.RHEL4
[root@taft-04 ~]# rpm -q lvm2-cluster
lvm2-cluster-2.02.06-6.0.RHEL4
[root@taft-04 ~]# rpm -q device-mapper
device-mapper-1.02.07-4.0.RHEL4
[root@taft-04 ~]# rpm -q cmirror
cmirror-1.0.1-0
[root@taft-04 ~]# rpm -q cmirror-kernel
cmirror-kernel-2.6.9-10.2


Comment 6 Corey Marthaler 2006-07-17 19:42:27 UTC
I can reproduce this with just the following loop:

while true; do lvcreate -L 1G -n mirror1 -m 1 group; sleep 5; lvremove -f
/dev/group/mirror1; done


Comment 7 Corey Marthaler 2006-07-18 19:50:50 UTC
Updated reproduction on kernel 2.6.9-42.BRsmp:

Jul 18 09:54:26 taft-04 dmeventd[4406]: Monitoring mirror device,
mirror_1_7505-mirror1 for events
Jul 18 09:54:26 taft-04 kernel: device-mapper:   0 - 2048
Jul 18 09:54:26 taft-04 kernel: device-mapper:   Total = 2048
Jul 18 09:54:27 taft-04 kernel: device-mapper: 2000 out-of-sync regions
remaining for IhV5Xlh2.
Jul 18 09:54:59 taft-04 kernel: device-mapper: Cluster mirror log server is
shutting down.
Jul 18 09:55:10 taft-04 kernel: device-mapper: I'm the cluster mirror log server
for lYxvWdXF
Jul 18 09:55:10 taft-04 kernel: device-mapper: Disk Resume::
Jul 18 09:55:10 taft-04 kernel: device-mapper:   Live nodes        :: 1
Jul 18 09:55:10 taft-04 kernel: device-mapper:   In-Use Regions    :: 0
Jul 18 09:55:10 taft-04 kernel: device-mapper:   Good IUR's        :: 0
Jul 18 09:55:10 taft-04 kernel: device-mapper:   Bad IUR's         :: 0
Jul 18 09:55:10 taft-04 kernel: device-mapper:   Sync count        :: 0
Jul 18 09:55:10 taft-04 kernel: device-mapper:   Disk Region count ::
18446744073709551615
Jul 18 09:55:10 taft-04 kernel: device-mapper:   Region count      :: 2048
Jul 18 09:55:10 taft-04 kernel: device-mapper:   NOTE:  Mapping has changed.
Jul 18 09:55:10 taft-04 kernel: device-mapper: Marked regions::
Jul 18 09:55:10 taft-04 kernel: device-mapper:   0 - 2048
Jul 18 09:55:10 taft-04 kernel: device-mapper:   Total = 2048
Jul 18 09:55:10 taft-04 dmeventd[4406]: Monitoring mirror device,
mirror_1_7505-mirror1 for events
Jul 18 09:55:10 taft-04 kernel: device-mapper: Out-of-sync regions::
Jul 18 09:55:10 taft-04 kernel: device-mapper:   0 - 2048
Jul 18 09:55:10 taft-04 kernel: device-mapper:   Total = 2048
Jul 18 09:55:15 taft-04 kernel: device-mapper: 2000 out-of-sync regions
remaining for lYxvWdXF.
slab error in kmem_cache_destroy(): cache `dm-io-bio': Can't free all objects

Call Trace:<ffffffff80162443>{kmem_cache_destroy+202}
<ffffffffa0093cba>{:dm_mod:bio_set_exit+34}
       <ffffffffa0093e70>{:dm_mod:resize_pool+100}
<ffffffffa00928c9>{:dm_mod:dev_remove+0}
       <ffffffffa0095424>{:dm_mod:kcopyd_client_destroy+146}
       <ffffffff801ea645>{__up_write+20}
<ffffffffa00d8a7a>{:dm_mirror:mirror_dtr+101}
       <ffffffffa009090d>{:dm_mod:table_destroy+72}
<ffffffffa0090020>{:dm_mod:dm_put+149}
       <ffffffffa009297d>{:dm_mod:dev_remove+180}
<ffffffffa0093c34>{:dm_mod:ctl_ioctl+600}
       <ffffffff8018ae05>{sys_ioctl+853} <ffffffff8011026a>{system_call+126}

Jul 18 09:55:31 taft-04 kernel: slab error in kmem_cache_destroy(): cache
`dm-io-bio': Can't free as
Jul 18 09:55:31 taft-04 kernel:
Jul 18 09:55:31 taft-04 kernel: Call
Trace:<ffffffff80162443>{kmem_cache_destroy+202} <ffffffffa009
Jul 18 09:55:31 taft-04 kernel:       
<ffffffffa0093e70>{:dm_mod:resize_pool+100} <ffffffffa00928c
Jul 18 09:55:31 taft-04 kernel:       
<ffffffffa0095424>{:dm_mod:kcopyd_client_destroy+146}
Jul 18 09:55:31 taft-04 kernel:        <ffffffff801ea645>{__up_write+20}
<ffffffffa00d8a7a>{:dm_mir
Jul 18 09:55:31 taft-04 kernel:       
<ffffffffa009090d>{:dm_mod:table_destroy+72} <ffffffffa00900
Jul 18 09:55:31 taft-04 kernel:       
<ffffffffa009297d>{:dm_mod:dev_remove+180} <ffffffffa0093c34
Jul 18 09:55:31 taft-04 kernel:        <ffffffff8018ae05>{sys_ioctl+853}
<ffffffff8011026a>{system_
Jul 18 09:55:31 taft-04 kernel:
device-mapper: Log context can not be found for request
device-mapper: Log context can not be found for request
Jul 18 09:55:36 taft-04 kernel: device-mapper: A cluster mirror log member has left.
Jul 18 09:55:36 taft-04 kernel: device-mapper: Log context can not be found for
request
Jul 18 09:55:36 taft-04 kernel: device-mapper: Log context can not be found for
request
Jul 18 09:55:41 taft-04 kernel: device-mapper: Cluster mirror log server is
shutting down.
kmem_cache_create: duplicate cache dm-io-bio
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at slab:1453
invalid operand: 0000 [1] SMP
CPU 0
Modules linked in: dm_cmirror(U) dlm(U) cman(U) md5 ipv6 parport_pc lp parport
autofs4 i2c_dev i2c_d
Pid: 4336, comm: clvmd Not tainted 2.6.9-42.BRsmp
RIP: 0010:[<ffffffff80162edc>] <ffffffff80162edc>{kmem_cache_create+1330}
RSP: 0000:000001021236dbe8  EFLAGS: 00010202
RAX: 000000000000002d RBX: 000001020c4c48b0 RCX: ffffffff803e1ee8
RDX: ffffffff803e1ee8 RSI: 0000000000000246 RDI: ffffffff8048e440
RBP: 000001020c462e80 R08: ffffffff803e1ee8 R09: 000001020c4c48b0
R10: 0000000100000000 R11: 0000000000000004 R12: ffffffff8048e440
R13: 0000000000002000 R14: 0000010000000000 R15: ffffffffa009f8c0
FS:  0000000041401960(005b) GS:ffffffff804e5080(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000031fa38fae0 CR3: 0000000000101000 CR4: 00000000000006e0
Process clvmd (pid: 4336, threadinfo 000001021236c000, task 00000102127937f0)
Stack: 0000000000000202 0000010213802380 0000000000000000 0000000000000000
       0000000000000000 0000000000000000 00000102138a8600 00000102170472c0
       0000000000000040 0000000000000100
Call Trace:<ffffffffa0093f23>{:dm_mod:resize_pool+279}
<ffffffffa00952e7>{:dm_mod:kcopyd_client_cre
       <ffffffffa00d8981>{:dm_mirror:mirror_ctr+1119}
<ffffffffa00910bb>{:dm_mod:dm_table_add_targe
       <ffffffffa009329c>{:dm_mod:table_load+225}
<ffffffffa00931bb>{:dm_mod:table_load+0}
       <ffffffffa0093c34>{:dm_mod:ctl_ioctl+600} <ffffffff80181d5b>{sys_newstat+32}
       <ffffffff8018ae05>{sys_ioctl+853} <ffffffff8011026a>{system_call+126}


Code: 0f 0b b2 5d 32 80 ff ff ff ff ad 05 48 8b 1b eb 96 65 48 8b
RIP <ffffffff80162edc>{kmem_cache_create+1330} RSP <000001021236dbe8>
 <0>Kernel panic - not syncing: Oops


Comment 8 Jonathan Earl Brassow 2006-07-18 21:48:31 UTC
have potential fix.  will check in after testing.

Comment 9 Corey Marthaler 2006-08-03 15:22:51 UTC
This issue appears to be fixed.

Comment 11 Alasdair Kergon 2010-04-27 14:56:19 UTC
Assuming this VERIFIED fix got released.  Closing.
Reopen if it's not yet resolved.