Bug 235686 - Kernel BUG at dm_cmirror_server while recovering region
Kernel BUG at dm_cmirror_server while recovering region
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: cmirror (Show other bugs)
All Linux
medium Severity medium
: ---
: ---
Assigned To: Jonathan Earl Brassow
Cluster QE
Depends On:
  Show dependency treegraph
Reported: 2007-04-09 12:27 EDT by Corey Marthaler
Modified: 2010-01-11 21:03 EST (History)
7 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2008-08-05 17:41:53 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments (Terms of Use)

  None (edit)
Description Corey Marthaler 2007-04-09 12:27:16 EDT
Description of problem:
I was running a standard cmirror failure scenario and while the recovering was
taking place, link-08 died:

Senario: Kill primary leg of synced 2 leg mirror

****** Mirror hash info for this scenario ******
* name:      fail_primary_synced_2_legs
* sync:      1
* disklog:   1
* failpv:    /dev/sdb1
* legs:      2
* pvs:       /dev/sdb1 /dev/sdf1 /dev/sde1

Creating mirror on link-02...
qarsh root@link-02 lvcreate -m 1 -n fail_primary_synced_2_legs -L 800M
helter_skelter /dev/sdb1:0-500 /dev/sdf1:0-500 /dev/sde1:0-50

Waiting for mirror to sync
Verifying that the mirror is fully syncd, currently at
 ...6.50% ...17.50% ...28.50% ...39.00% ...49.50% ...60.50% ...70.50% ...81.00%
...91.50% ...100.00%

Creating gfs on top of mirror on link-02...
Creating mnt point /mnt/fail_primary_synced_2_legs on link-02...
Mounting gfs on link-02...
Creating mnt point /mnt/fail_primary_synced_2_legs on link-04...
Mounting gfs on link-04...
Creating mnt point /mnt/fail_primary_synced_2_legs on link-07...
Mounting gfs on link-07...
Creating mnt point /mnt/fail_primary_synced_2_legs on link-08...
Mounting gfs on link-08...

Disabling device sdb on link-02
Disabling device sda on link-04
Disabling device sdb on link-07
Disabling device sdb on link-08

Attempting I/O to cause mirror down conversion on link-02
10+0 records in
10+0 records out
Verifying the down conversion from mirror to linear
  /dev/sdb1: open failed: No such device or address

GFS: fsid=LINK_128:gfs.3: Joined cluster. Now mounting FS...
GFS: fsid=LINK_128:gfs.3: jid=3: Trying to acquire journal lock...
GFS: fsid=LINK_128:gfs.3: jid=3: Looking at journal...
GFS: fsid=LINK_128:gfs.3: jid=3: Done
Apr  9 06:23:23 link-08 kernel: GFS: fsid=LINK_128:gfs.3: Joined cluster. Now
mounting FS...
Apr  9 06:23:23 link-08 kernel: GFS: fsid=LINK_128:gfs.3: jid=3: Trying to
acquire journal lock...
Apr  9 06:23:23 link-08 kernel: GFS: fsid=LINK_128:gfs.3: jid=3: Looking at
Apr  9 06:23:23 link-08 kernel: GFS: fsid=LINK_128:gfs.3: jid=3: Done
Apr  9 06:23:23 link-08 qarshd[5509]: That's enough
Apr  9 06:23:28 link-08 qarshd[5541]: Talking to peer
Apr  9 06:23:28 link-08 qarshd[5541]: Running cmdline: echo offline >
Apr  9 06:23:28 link-08 qarshd[5541]: That's enough
dm-cmirror: Recovering region mismatch: (5/18446744073709551615)
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at dm_cmirror_server:703
invalid operand: 0000 [1] SMP
Modules linked in: lock_dlm(U) gfs(U) lock_harness(U) dm_cmirror(U) dlm(U)
cman(U) md5 ipv6 parport_pc lp parport autofs4 sunrpc ds yenta_socket
pcmcia_core button battery ac ohci_hcd hw_random k8_edac edac_mc tg3 floppy
dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod qla2300 qla2xxx scsi_transport_fc
mptscsih mptsas mptspi mptscsi mptbase sd_mod scsi_mod
Pid: 5467, comm: cluster_log_ser Not tainted 2.6.9-50.ELsmp
RIP: 0010:[<ffffffffa025be6e>]
RSP: 0018:000001001a805e28  EFLAGS: 00010212
RAX: 0000000000000044 RBX: 000001003d66c920 RCX: ffffffff803e5768
RDX: ffffffff803e5768 RSI: 0000000000000246 RDI: ffffffff803e5760
RBP: 000001003c1f2680 R08: ffffffff803e5768 R09: 000001003d66c920
R10: 0000000100000000 R11: ffffffff8011e9a4 R12: 000001001f789400
R13: ffffffffa02617a0 R14: 0000000000000001 R15: 000001003c1f26a0
FS:  0000002a95562b00(0000) GS:ffffffff804ee200(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000031d2f2f8d8 CR3: 0000000000101000 CR4: 00000000000006e0
Process cluster_log_ser (pid: 5467, threadinfo 000001001a804000, task
Stack: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
       0000000000000000 0000000000000000 000001003b5434c0 000001003b5434c0
       000001001a805eb8 0000000000000010
Call Trace:<ffffffff80134660>{default_wake_function+0}

Code: 0f 0b 52 e1 25 a0 ff ff ff ff bf 02 49 c7 44 24 40 ff ff ff
RIP <ffffffffa025be6e>{:dm_cmirror:cluster_log_serverd+3721} RSP <000001001a805e28>
 <0>Kernel panic - not syncing: Oops

Version-Release number of selected component (if applicable):

How reproducible:
Only once so far
Comment 1 Jonathan Earl Brassow 2007-04-09 12:54:59 EDT
no machines killed during this test?
Comment 2 Corey Marthaler 2007-04-09 13:55:57 EDT
no machines were killed as apart of this test.
Comment 3 Jonathan Earl Brassow 2007-04-09 13:59:19 EDT
Here is the code causing the machine to panic:

			if (lr->u.lr_region != lc->recovering_region) {
				DMERR("Recovering region mismatch: (%Lu/%Lu)",
				      lr->u.lr_region, lc->recovering_region);

The question is, how do we have a record state saying that there is a region in
recovery _and_ lc->recovering_region saying that there is not a region in recovery.

Comment 4 Jonathan Earl Brassow 2007-04-10 03:14:00 EDT
Several fixes have gone in to fix the handling of this bug:
1) During server relocation (which can happen due to machine failure
or normal mirror suspension), the server value could get set before
the client had a chance to clean-up.  This caused the server to
become confused and issue a BUG().

2) perform a flush of the log before suspending.  This ensures
that regions which are in-sync get correctly flushed to the disk
log.  Without this, there will always be recovery work to be done
when a mirror starts up - even if it was properly in-sync during

3) clean-up memory used to record region users when a mirror is
shutdown.  It was possible for some regions to be left over
(causing a memory leak) during certain fault scenarios.

4) properly initialize the state field (ru_rw) in the region
user structure when a mark occurs.  Without the initialization,
it was sometimes possible for the region to be misinterpretted
as recovering instead of marked.

5) resolve and unhandled case in server_complete_resync_work

6) reset a variable in cluster_complete_resync_work.  Failure
to do so was causing a retry to include the wrong value for
the completion of the resync work - confusing the server.

assigned -> post
Comment 5 Rob Kenna 2007-04-10 06:55:44 EDT
Important repair for cluster mirror release in 4.5.  pm-ack
Comment 6 Corey Marthaler 2007-04-10 14:20:07 EDT
This BUG can still be tripped when attempting the 'non synced secondary leg
failure' scenario. Marking back to ASSIGNED, with the QA Whiteboard tag.
Comment 7 Jonathan Earl Brassow 2007-04-10 14:23:26 EDT
Here are the comments I put into the code validating the scenario encountered
and obviating the need for the BUG();

<               BUG();
>               /*
>                * This is a valid case, when the following happens:
>                * 1) a region is recovering and has waiting writes
>                * 2) recovery fails and calls complete_resync_work (w/ failure)
>                * 2.1) RU is removed from our list
>                * 3) waiting writes are released
>                * 3.1) writes do not mark, because b/c region state != RH_CLEAN
>                * 4) write fails and calls complete_resync_work (w/ failure)
>                * 5) boom, we are here.
>                *
>                * Not a bug to be here
>                */

assigned -> post
Comment 8 Jonathan Earl Brassow 2007-08-22 13:37:25 EDT
post -> modified.
Comment 9 Corey Marthaler 2007-08-24 14:42:06 EDT
Fix verified with the latest code.
Comment 10 Chris Feist 2008-08-05 17:41:53 EDT
Fixed in current release (4.7).

Note You need to log in before you can comment on or make changes to this bug.