Bug 235686

Summary:	Kernel BUG at dm_cmirror_server while recovering region
Product:	[Retired] Red Hat Cluster Suite	Reporter:	Corey Marthaler <cmarthal>
Component:	cmirror	Assignee:	Jonathan Earl Brassow <jbrassow>
Status:	CLOSED CURRENTRELEASE	QA Contact:	Cluster QE <mspqa-list>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	4	CC:	agk, cfeist, dwysocha, jbrassow, kanderso, mbroz, prockai
Target Milestone:	---
Target Release:	---
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2008-08-05 21:41:53 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Corey Marthaler 2007-04-09 16:27:16 UTC

Description of problem:
I was running a standard cmirror failure scenario and while the recovering was
taking place, link-08 died:

Senario: Kill primary leg of synced 2 leg mirror

****** Mirror hash info for this scenario ******
* name:      fail_primary_synced_2_legs
* sync:      1
* disklog:   1
* failpv:    /dev/sdb1
* legs:      2
* pvs:       /dev/sdb1 /dev/sdf1 /dev/sde1
************************************************

Creating mirror on link-02...
qarsh root@link-02 lvcreate -m 1 -n fail_primary_synced_2_legs -L 800M
helter_skelter /dev/sdb1:0-500 /dev/sdf1:0-500 /dev/sde1:0-50

Waiting for mirror to sync
Verifying that the mirror is fully syncd, currently at
 ...6.50% ...17.50% ...28.50% ...39.00% ...49.50% ...60.50% ...70.50% ...81.00%
...91.50% ...100.00%

Creating gfs on top of mirror on link-02...
Creating mnt point /mnt/fail_primary_synced_2_legs on link-02...
Mounting gfs on link-02...
Creating mnt point /mnt/fail_primary_synced_2_legs on link-04...
Mounting gfs on link-04...
Creating mnt point /mnt/fail_primary_synced_2_legs on link-07...
Mounting gfs on link-07...
Creating mnt point /mnt/fail_primary_synced_2_legs on link-08...
Mounting gfs on link-08...

Disabling device sdb on link-02
Disabling device sda on link-04
Disabling device sdb on link-07
Disabling device sdb on link-08

Attempting I/O to cause mirror down conversion on link-02
10+0 records in
10+0 records out
Verifying the down conversion from mirror to linear
  /dev/sdb1: open failed: No such device or address
[HANG]



LINK-04:
GFS: fsid=LINK_128:gfs.3: Joined cluster. Now mounting FS...
GFS: fsid=LINK_128:gfs.3: jid=3: Trying to acquire journal lock...
GFS: fsid=LINK_128:gfs.3: jid=3: Looking at journal...
GFS: fsid=LINK_128:gfs.3: jid=3: Done
Apr  9 06:23:23 link-08 kernel: GFS: fsid=LINK_128:gfs.3: Joined cluster. Now
mounting FS...
Apr  9 06:23:23 link-08 kernel: GFS: fsid=LINK_128:gfs.3: jid=3: Trying to
acquire journal lock...
Apr  9 06:23:23 link-08 kernel: GFS: fsid=LINK_128:gfs.3: jid=3: Looking at
journal...
Apr  9 06:23:23 link-08 kernel: GFS: fsid=LINK_128:gfs.3: jid=3: Done
Apr  9 06:23:23 link-08 qarshd[5509]: That's enough
Apr  9 06:23:28 link-08 qarshd[5541]: Talking to peer 10.15.80.47:35237
Apr  9 06:23:28 link-08 qarshd[5541]: Running cmdline: echo offline >
/sys/block/sdb/device/state
Apr  9 06:23:28 link-08 qarshd[5541]: That's enough
dm-cmirror: Recovering region mismatch: (5/18446744073709551615)
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at dm_cmirror_server:703
invalid operand: 0000 [1] SMP
CPU 0
Modules linked in: lock_dlm(U) gfs(U) lock_harness(U) dm_cmirror(U) dlm(U)
cman(U) md5 ipv6 parport_pc lp parport autofs4 sunrpc ds yenta_socket
pcmcia_core button battery ac ohci_hcd hw_random k8_edac edac_mc tg3 floppy
dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod qla2300 qla2xxx scsi_transport_fc
mptscsih mptsas mptspi mptscsi mptbase sd_mod scsi_mod
Pid: 5467, comm: cluster_log_ser Not tainted 2.6.9-50.ELsmp
RIP: 0010:[<ffffffffa025be6e>]
<ffffffffa025be6e>{:dm_cmirror:cluster_log_serverd+3721}
RSP: 0018:000001001a805e28  EFLAGS: 00010212
RAX: 0000000000000044 RBX: 000001003d66c920 RCX: ffffffff803e5768
RDX: ffffffff803e5768 RSI: 0000000000000246 RDI: ffffffff803e5760
RBP: 000001003c1f2680 R08: ffffffff803e5768 R09: 000001003d66c920
R10: 0000000100000000 R11: ffffffff8011e9a4 R12: 000001001f789400
R13: ffffffffa02617a0 R14: 0000000000000001 R15: 000001003c1f26a0
FS:  0000002a95562b00(0000) GS:ffffffff804ee200(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000031d2f2f8d8 CR3: 0000000000101000 CR4: 00000000000006e0
Process cluster_log_ser (pid: 5467, threadinfo 000001001a804000, task
000001001b6887f0)
Stack: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
       0000000000000000 0000000000000000 000001003b5434c0 000001003b5434c0
       000001001a805eb8 0000000000000010
Call Trace:<ffffffff80134660>{default_wake_function+0}
<ffffffff80110f47>{child_rip+8}
       <ffffffffa025afe5>{:dm_cmirror:cluster_log_serverd+0}
       <ffffffff80110f3f>{child_rip+0}

Code: 0f 0b 52 e1 25 a0 ff ff ff ff bf 02 49 c7 44 24 40 ff ff ff
RIP <ffffffffa025be6e>{:dm_cmirror:cluster_log_serverd+3721} RSP <000001001a805e28>
 <0>Kernel panic - not syncing: Oops


Version-Release number of selected component (if applicable):
2.6.9-50.ELsmp
cmirror-kernel-2.6.9-28.0

How reproducible:
Only once so far

Comment 1 Jonathan Earl Brassow 2007-04-09 16:54:59 UTC

no machines killed during this test?

Comment 2 Corey Marthaler 2007-04-09 17:55:57 UTC

no machines were killed as apart of this test.

Comment 3 Jonathan Earl Brassow 2007-04-09 17:59:19 UTC

Here is the code causing the machine to panic:

			if (lr->u.lr_region != lc->recovering_region) {
				DMERR("Recovering region mismatch: (%Lu/%Lu)",
				      lr->u.lr_region, lc->recovering_region);
				BUG();
			}

The question is, how do we have a record state saying that there is a region in
recovery _and_ lc->recovering_region saying that there is not a region in recovery.

Comment 4 Jonathan Earl Brassow 2007-04-10 07:14:00 UTC

Several fixes have gone in to fix the handling of this bug:
1) During server relocation (which can happen due to machine failure
or normal mirror suspension), the server value could get set before
the client had a chance to clean-up.  This caused the server to
become confused and issue a BUG().

2) perform a flush of the log before suspending.  This ensures
that regions which are in-sync get correctly flushed to the disk
log.  Without this, there will always be recovery work to be done
when a mirror starts up - even if it was properly in-sync during
shutdown.

3) clean-up memory used to record region users when a mirror is
shutdown.  It was possible for some regions to be left over
(causing a memory leak) during certain fault scenarios.

4) properly initialize the state field (ru_rw) in the region
user structure when a mark occurs.  Without the initialization,
it was sometimes possible for the region to be misinterpretted
as recovering instead of marked.

5) resolve and unhandled case in server_complete_resync_work

6) reset a variable in cluster_complete_resync_work.  Failure
to do so was causing a retry to include the wrong value for
the completion of the resync work - confusing the server.

assigned -> post

Comment 5 Rob Kenna 2007-04-10 10:55:44 UTC

Important repair for cluster mirror release in 4.5.  pm-ack

Comment 6 Corey Marthaler 2007-04-10 18:20:07 UTC

This BUG can still be tripped when attempting the 'non synced secondary leg
failure' scenario. Marking back to ASSIGNED, with the QA Whiteboard tag.

Comment 7 Jonathan Earl Brassow 2007-04-10 18:23:26 UTC

Here are the comments I put into the code validating the scenario encountered
and obviating the need for the BUG();

<               BUG();
---
>               /*
>                * This is a valid case, when the following happens:
>                * 1) a region is recovering and has waiting writes
>                * 2) recovery fails and calls complete_resync_work (w/ failure)
>                * 2.1) RU is removed from our list
>                * 3) waiting writes are released
>                * 3.1) writes do not mark, because b/c region state != RH_CLEAN
>                * 4) write fails and calls complete_resync_work (w/ failure)
>                * 5) boom, we are here.
>                *
>                * Not a bug to be here
>                */

assigned -> post

Comment 8 Jonathan Earl Brassow 2007-08-22 17:37:25 UTC

post -> modified.

Comment 9 Corey Marthaler 2007-08-24 18:42:06 UTC

Fix verified with the latest code.
cmirror-kernel-2.6.9-33.2
2.6.9-56.ELsmp

Comment 10 Chris Feist 2008-08-05 21:41:53 UTC

Fixed in current release (4.7).