Description of problem:
I was running a standard cmirror failure scenario and while the recovery was taking place, link-08 died:

Scenario: Kill primary leg of synced 2 leg mirror

****** Mirror hash info for this scenario ******
* name:    fail_primary_synced_2_legs
* sync:    1
* disklog: 1
* failpv:  /dev/sdb1
* legs:    2
* pvs:     /dev/sdb1 /dev/sdf1 /dev/sde1
************************************************

Creating mirror on link-02...
qarsh root@link-02 lvcreate -m 1 -n fail_primary_synced_2_legs -L 800M helter_skelter /dev/sdb1:0-500 /dev/sdf1:0-500 /dev/sde1:0-50

Waiting for mirror to sync
Verifying that the mirror is fully synced, currently at
 ...6.50% ...17.50% ...28.50% ...39.00% ...49.50% ...60.50% ...70.50% ...81.00% ...91.50% ...100.00%

Creating gfs on top of mirror on link-02...
Creating mnt point /mnt/fail_primary_synced_2_legs on link-02...
Mounting gfs on link-02...
Creating mnt point /mnt/fail_primary_synced_2_legs on link-04...
Mounting gfs on link-04...
Creating mnt point /mnt/fail_primary_synced_2_legs on link-07...
Mounting gfs on link-07...
Creating mnt point /mnt/fail_primary_synced_2_legs on link-08...
Mounting gfs on link-08...

Disabling device sdb on link-02
Disabling device sda on link-04
Disabling device sdb on link-07
Disabling device sdb on link-08

Attempting I/O to cause mirror down conversion on link-02
10+0 records in
10+0 records out

Verifying the down conversion from mirror to linear
/dev/sdb1: open failed: No such device or address
[HANG]

LINK-04:
GFS: fsid=LINK_128:gfs.3: Joined cluster. Now mounting FS...
GFS: fsid=LINK_128:gfs.3: jid=3: Trying to acquire journal lock...
GFS: fsid=LINK_128:gfs.3: jid=3: Looking at journal...
GFS: fsid=LINK_128:gfs.3: jid=3: Done

Apr 9 06:23:23 link-08 kernel: GFS: fsid=LINK_128:gfs.3: Joined cluster. Now mounting FS...
Apr 9 06:23:23 link-08 kernel: GFS: fsid=LINK_128:gfs.3: jid=3: Trying to acquire journal lock...
Apr 9 06:23:23 link-08 kernel: GFS: fsid=LINK_128:gfs.3: jid=3: Looking at journal...
Apr 9 06:23:23 link-08 kernel: GFS: fsid=LINK_128:gfs.3: jid=3: Done
Apr 9 06:23:23 link-08 qarshd[5509]: That's enough
Apr 9 06:23:28 link-08 qarshd[5541]: Talking to peer 10.15.80.47:35237
Apr 9 06:23:28 link-08 qarshd[5541]: Running cmdline: echo offline > /sys/block/sdb/device/state
Apr 9 06:23:28 link-08 qarshd[5541]: That's enough

dm-cmirror: Recovering region mismatch: (5/18446744073709551615)
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at dm_cmirror_server:703
invalid operand: 0000 [1] SMP
CPU 0
Modules linked in: lock_dlm(U) gfs(U) lock_harness(U) dm_cmirror(U) dlm(U) cman(U) md5 ipv6 parport_pc lp parport autofs4 sunrpc ds yenta_socket pcmcia_core button battery ac ohci_hcd hw_random k8_edac edac_mc tg3 floppy dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod qla2300 qla2xxx scsi_transport_fc mptscsih mptsas mptspi mptscsi mptbase sd_mod scsi_mod
Pid: 5467, comm: cluster_log_ser Not tainted 2.6.9-50.ELsmp
RIP: 0010:[<ffffffffa025be6e>] <ffffffffa025be6e>{:dm_cmirror:cluster_log_serverd+3721}
RSP: 0018:000001001a805e28  EFLAGS: 00010212
RAX: 0000000000000044 RBX: 000001003d66c920 RCX: ffffffff803e5768
RDX: ffffffff803e5768 RSI: 0000000000000246 RDI: ffffffff803e5760
RBP: 000001003c1f2680 R08: ffffffff803e5768 R09: 000001003d66c920
R10: 0000000100000000 R11: ffffffff8011e9a4 R12: 000001001f789400
R13: ffffffffa02617a0 R14: 0000000000000001 R15: 000001003c1f26a0
FS:  0000002a95562b00(0000) GS:ffffffff804ee200(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000031d2f2f8d8 CR3: 0000000000101000 CR4: 00000000000006e0
Process cluster_log_ser (pid: 5467, threadinfo 000001001a804000, task 000001001b6887f0)
Stack: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
       0000000000000000 0000000000000000 000001003b5434c0 000001003b5434c0
       000001001a805eb8 0000000000000010
Call Trace:
 <ffffffff80134660>{default_wake_function+0} <ffffffff80110f47>{child_rip+8}
 <ffffffffa025afe5>{:dm_cmirror:cluster_log_serverd+0} <ffffffff80110f3f>{child_rip+0}

Code: 0f 0b 52 e1 25 a0 ff ff ff ff bf 02 49 c7 44 24 40 ff ff ff
RIP <ffffffffa025be6e>{:dm_cmirror:cluster_log_serverd+3721} RSP <000001001a805e28>
<0>Kernel panic - not syncing: Oops

Version-Release number of selected component (if applicable):
2.6.9-50.ELsmp
cmirror-kernel-2.6.9-28.0

How reproducible:
Only once so far
No machines were killed during this test?
No machines were killed as part of this test.
Here is the code causing the machine to panic:

	if (lr->u.lr_region != lc->recovering_region) {
		DMERR("Recovering region mismatch: (%Lu/%Lu)",
		      lr->u.lr_region, lc->recovering_region);
		BUG();
	}

The question is: how do we have a region-user record saying that there is a region in recovery _and_ lc->recovering_region saying that there is not a region in recovery?
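To make the failing state concrete, here is a minimal userspace sketch of the check above. The struct definitions, the NO_REGION name, and the server_complete_resync_work signature are simplified stand-ins for illustration, not the actual dm-cmirror code. lc->recovering_region holds the (uint64_t)-1 sentinel (the 18446744073709551615 in the panic message) meaning "nothing recovering", yet a completion request for region 5 still arrives:

	#include <stdint.h>
	#include <stdio.h>

	/* "no region recovering" sentinel == 18446744073709551615 */
	#define NO_REGION ((uint64_t)-1)

	struct log_ctx { uint64_t recovering_region; };
	struct log_req { uint64_t lr_region; };

	static void server_complete_resync_work(struct log_ctx *lc,
						struct log_req *lr)
	{
		if (lr->lr_region != lc->recovering_region) {
			/* The check that fired: region 5 vs. the sentinel */
			fprintf(stderr,
				"Recovering region mismatch: (%llu/%llu)\n",
				(unsigned long long)lr->lr_region,
				(unsigned long long)lc->recovering_region);
			/* BUG(); -- in the kernel module this panics the node */
		}
	}

	int main(void)
	{
		/* Recovery already failed and cleared recovering_region... */
		struct log_ctx lc = { .recovering_region = NO_REGION };
		/* ...but a stale completion for region 5 still arrives. */
		struct log_req lr = { .lr_region = 5 };

		server_complete_resync_work(&lc, &lr);
		return 0;
	}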
Several fixes have gone in to fix the handling of this bug:

1) During server relocation (which can happen due to machine failure or normal mirror suspension), the server value could get set before the client had a chance to clean up. This caused the server to become confused and issue a BUG().

2) Perform a flush of the log before suspending. This ensures that regions which are in-sync get correctly flushed to the disk log. Without this, there will always be recovery work to be done when a mirror starts up - even if it was properly in-sync during shutdown.

3) Clean up memory used to record region users when a mirror is shut down. It was possible for some regions to be left over (causing a memory leak) during certain fault scenarios.

4) Properly initialize the state field (ru_rw) in the region user structure when a mark occurs. Without the initialization, it was sometimes possible for the region to be misinterpreted as recovering instead of marked. (See the sketch after this list.)

5) Resolve an unhandled case in server_complete_resync_work.

6) Reset a variable in cluster_complete_resync_work. Failure to do so was causing a retry to include the wrong value for the completion of the resync work - confusing the server.

assigned -> post
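As an illustration of fix #4, here is a minimal sketch of initializing the region-user state on mark. The enum values, the mark_region helper, and the struct layout are assumptions for illustration only and differ from the real dm-cmirror code:

	#include <stdint.h>
	#include <stdlib.h>

	/* Simplified region-user bookkeeping; the real dm-cmirror
	 * structure and allocator differ.
	 */
	enum ru_state { RU_READ, RU_WRITE, RU_RECOVER };

	struct region_user {
		uint64_t      ru_region;
		uint32_t      ru_nodeid;
		enum ru_state ru_rw;	/* the field fix #4 initializes */
	};

	static struct region_user *mark_region(uint64_t region,
					       uint32_t nodeid)
	{
		struct region_user *ru = malloc(sizeof(*ru));

		if (!ru)
			return NULL;

		ru->ru_region = region;
		ru->ru_nodeid = nodeid;

		/* Fix #4: without this assignment the field held
		 * whatever was left in the allocation, so a stale
		 * RU_RECOVER value could make a freshly marked region
		 * look like it was recovering.
		 */
		ru->ru_rw = RU_WRITE;

		return ru;
	}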
Important fix for the cluster mirror release in 4.5. pm-ack
This BUG can still be tripped when attempting the 'non synced secondary leg failure' scenario. Marking back to ASSIGNED, with the QA Whiteboard tag.
Here are the comments I put into the code validating the scenario encountered and obviating the need for the BUG():

< 		BUG();
---
> 		/*
> 		 * This is a valid case, when the following happens:
> 		 * 1) a region is recovering and has waiting writes
> 		 * 2) recovery fails and calls complete_resync_work (w/ failure)
> 		 *  2.1) RU is removed from our list
> 		 * 3) waiting writes are released
> 		 *  3.1) writes do not mark, because region state != RH_CLEAN
> 		 * 4) write fails and calls complete_resync_work (w/ failure)
> 		 * 5) boom, we are here.
> 		 *
> 		 * Not a bug to be here
> 		 */

assigned -> post
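For context, a minimal sketch of what the control flow looks like once the BUG() is gone - the stale completion is tolerated instead of panicking the node. The types, the NO_REGION sentinel, and the return convention are simplified assumptions, not the actual patch:

	#include <stdint.h>

	#define NO_REGION ((uint64_t)-1)

	struct log_ctx { uint64_t recovering_region; };
	struct log_req { uint64_t lr_region; };

	static int server_complete_resync_work(struct log_ctx *lc,
					       struct log_req *lr)
	{
		if (lr->lr_region != lc->recovering_region) {
			/*
			 * Valid case (see the comment block above):
			 * recovery of this region already failed and
			 * cleared recovering_region, then a released
			 * write failed and sent a second
			 * complete_resync_work for the same region.
			 * Ignore the stale completion instead of
			 * calling BUG().
			 */
			return 0;
		}

		lc->recovering_region = NO_REGION;
		/* ... update the on-disk log state for this region ... */
		return 0;
	}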
post -> modified.
Fix verified with the latest code.

cmirror-kernel-2.6.9-33.2
2.6.9-56.ELsmp
Fixed in current release (4.7).