Bug 228067

Summary: GFS I/O during mirror leg failure on three-legged mirror panicked in dm_mirror:mirror_end_io
Product: [Retired] Red Hat Cluster Suite
Component: cmirror
Version: 4
Status: CLOSED WORKSFORME
Severity: medium
Priority: high
Hardware: All
OS: Linux
Reporter: Corey Marthaler <cmarthal>
Assignee: Jonathan Earl Brassow <jbrassow>
QA Contact: Cluster QE <mspqa-list>
CC: agk, dwysocha, jbrassow, jnomura, mbroz, prockai, rkenna
Target Milestone: ---
Target Release: ---
Doc Type: Bug Fix
Last Closed: 2007-08-24 18:56:33 UTC

Attachments:
  Analysis of oops
  panic trace

Description Corey Marthaler 2007-02-09 20:19:29 UTC
Description of problem:
I had 4 (3-legged) cluster mirrors (created out of 2 separate VGs), each with GFS
on top and all running filesystem I/O. I then failed /dev/hda, which was set up
to be one of the three legs in all 4 mirrors. Just after the leg failed I hit
the following panic on link-04. I also hit a different assertion on link-01 (bug
to be filed next).

 Feb  9 08:58:46 link-04 kernel: SCSI error : <1 0 1 1> return code = 0x10000
Feb  9 08:58:46 link-04 kernel: end_request: I/O error, dev sdh, sector 1277144231
Feb  9 08:58:46 link-04 kernel: SCSI error : <1 0 1 1> return code = 0x10000
Feb  9 08:58:46 link-04 kernel: end_request: I/O error, dev sdh, sector 1277144239
Feb  9 08:58:46 link-04 kernel: device-mapper: Write error during recovery
(error = 0x1)
Feb  9 08:58:46 link-04 kernel: dm-cmirror: Error while listening for server
response: -110
device-mapper: recovery failed on region 10346
SCSI error : <1 0 1 1> return code = 0x10000
end_request: I/O error, dev sdh, sector 1293230263
Unable to handle kernel NULL pointer dereference at 0000000000000038 RIP:
<ffffffffa00a4caa>{:dm_mirror:mirror_end_io+102}
PML4 32f45067 PGD 32f46067 PMD 0
Oops: 0002 [1] SMP
CPU 0
Modules linked in: lock_dlm(U) gfs(U) lock_harness(U) mptfc dm_cmirror(U) dlm(U)
cman(U) md5 ipv6 parport_pc lp parport autofs4 sunrpc ds yenta_socket
pcmcia_core button battery ac ohci_hcd hw_random k8_edac edac_mc tg3 dm_snapshot
dm_zero dm_mirror ext3 jbd dm_mod mptscsih mptsas mptspi mptscsi mptbase sd_mod
scsi_mod
Pid: 5430, comm: xdoio Not tainted 2.6.9-46.ELsmp
RIP: 0010:[<ffffffffa00a4caa>] <ffffffffa00a4caa>{:dm_mirror:mirror_end_io+102}
RSP: 0018:ffffffff8045ad18  EFLAGS: 00010046
RAX: 0000000000000216 RBX: 000001003eef2c30 RCX: 0000000000000216
RDX: 000000000000286a RSI: 0000000000000216 RDI: 000001003eef2c50
RBP: 000001003eef2c18 R08: ffffff000017a000 R09: 00000000000000ff
R10: ffffffffa00582d2 R11: ffffffffa00582d2 R12: 0000000000000000
R13: 000001003eef2c00 R14: 0000000000000000 R15: 0000000000000000
FS:  0000002a95563f00(0000) GS:ffffffff804ed500(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000038 CR3: 0000000000101000 CR4: 00000000000006e0
Process xdoio (pid: 5430, threadinfo 0000010032fac000, task 0000010032bf27f0)
Stack: 0000000100000000 00000100326ced90 0000000000000000 0000010039b06080
       0000000000000000 0000010031fce8e0 000001003efbc600 ffffffffa0058338
       0000000000000000 00000100326ced90
Call Trace:<IRQ> <ffffffffa0058338>{:dm_mod:clone_endio+102}
<ffffffffa005844f>{:dm_mod:clone_endio+381}
       <ffffffff80251b17>{__end_that_request_first+238}
<ffffffffa000703c>{:scsi_mod:scsi_end_request+40}
       <ffffffffa0007351>{:scsi_mod:scsi_io_completion+497}
       <ffffffffa0002d45>{:scsi_mod:scsi_softirq+213}
<ffffffff8013cfa8>{__do_softirq+88}
       <ffffffff8013d051>{do_softirq+49} <ffffffff80113247>{do_IRQ+328}
       <ffffffff80110833>{ret_from_intr+0}  <EOI>
<ffffffffa0227e80>{:gfs:get_block+0}
       <ffffffff801ed5a2>{copy_user_generic_c+8}
<ffffffff8015c2bd>{generic_file_buffered_write+720}
       <ffffffffa005ac27>{:dm_mod:dm_table_any_congested+68}
       <ffffffff8013cf22>{current_fs_time+89}
<ffffffff8015ca01>{__generic_file_aio_write_nolock+731}
       <ffffffff8015cc9f>{generic_file_aio_write_nolock+32}
       <ffffffff8015ce6d>{generic_file_write_nolock+158}
<ffffffff80136020>{autoremove_wake_function+0}
       <ffffffffa022a5a2>{:gfs:do_write_buf+997}
<ffffffffa02294a3>{:gfs:walk_vm+278}
       <ffffffffa022a1bd>{:gfs:do_write_buf+0} <ffffffff8016a3a9>{do_no_page+1023}
       <ffffffffa022a1bd>{:gfs:do_write_buf+0}
<ffffffff8016a7f6>{handle_mm_fault+373}
       <ffffffffa022a88e>{:gfs:__gfs_write+201} <ffffffffa022a8b5>{:gfs:gfs_write+0}
       <ffffffff8017a585>{do_readv_writev+463} <ffffffff8030c4d1>{thread_return+0}
       <ffffffff8030c529>{thread_return+88} <ffffffff8016d797>{do_mmap_pgoff+1560}
       <ffffffff8017a72d>{sys_writev+69} <ffffffff8011026a>{system_call+126}


Code: f0 41 ff 4c 24 38 0f 94 c0 84 c0 74 66 41 8b 44 24 10 83 f8
RIP <ffffffffa00a4caa>{:dm_mirror:mirror_end_io+102} RSP <ffffffff8045ad18>
CR2: 0000000000000038
 <0>Kernel panic - not syncing: Oops



link-07:

dm-cmirror: Failed to receive election results from server: -110
Feb  9 14:13:42 link-07 last message repeated 4 times
dm-cmirror: Failed to receive election results from server: -110
dm-cmirror: Failed to receive election results from server: -110
dm-cmirror: Failed to receive election results from server: -110
dm-cmirror: Failed to receive election results from server: -110
Feb  9 14:15:02 link-07 last message repeated 4 times
dm-cmirror: Failed to receive election results from server: -110
dm-cmirror: Failed to receive election results from server: -110



link-08:
Feb  9 08:59:45 link-08 kernel: SCSI error : <2 0 1 1> return code = 0x10000
Feb  9 08:59:45 link-08 kernel: end_request: I/O error, dev sdh, sector 10318201
Feb  9 08:59:45 link-08 kernel: device-mapper: Write error during recovery
(error = 0x1)
Feb  9 08:59:45 link-08 kernel: dlm: clvmd: dlm_dir_rebuild_wait failed -1
Feb  9 08:59:45 link-08 kernel: device-mapper: recovery failed on region 10075
Feb  9 08:59:45 link-08 kernel: dlm: gfs1: dlm_dir_rebuild_wait failed -1
Feb  9 08:59:45 link-08 kernel: dlm: gfs3: dlm_dir_rebuild_wait failed -1
Feb  9 08:59:45 link-08 kernel: dlm: gfs2: dlm_dir_rebuild_wait failed -1
Feb  9 08:59:45 link-08 kernel: device-mapper: recovery failed on region 5575
Feb  9 08:59:45 link-08 kernel: SCSI error : <2 0 1 1> return code = 0x10000
Feb  9 08:59:45 link-08 kernel: end_request: I/O error, dev sdh, sector 20971905


Version-Release number of selected component (if applicable):
2.6.9-46.ELsmp
lvm2-2.02.21-1.el4
lvm2-cluster-2.02.21-3.el4
cmirror-kernel-smp-2.6.9-19.1
cmirror-1.0.1-1

Comment 1 Corey Marthaler 2007-02-09 20:31:12 UTC
bz 228070 is the issue that happened on link-02 during this panic.

Comment 2 Corey Marthaler 2007-02-09 21:16:36 UTC
Feb  9 08:31:37 link-02 kernel: device-mapper: Unable to read from primary
mirror during recovery
Feb  9 08:31:37 link-02 kernel: device-mapper: Primary mirror device has failed
while mirror is out of sync.
Feb  9 08:31:37 link-02 kernel: device-mapper: Unable to choose alternative
primary device
Feb  9 08:31:37 link-02 kernel: device-mapper: recovery failed on region 5576


Comment 3 Dave Wysochanski 2007-02-09 23:39:50 UTC
Created attachment 147826 [details]
Analysis of oops

The call chain of the oops looks like:
mirror_end_io() -> rh_dec() -> __rh_lookup()

It looks like the call to __rh_lookup() inside rh_dec() fails to find the region
and returns NULL, and the atomic_dec_and_test(&reg->pending) that follows is what
faults; this is consistent with the faulting address (CR2 = 0x38) being the offset
of the pending member within a NULL struct region pointer (full analysis in the
attachment).


static void rh_dec(struct region_hash *rh, region_t region)
{
	unsigned long flags;
	struct region *reg;
	int should_wake = 0;

	read_lock(&rh->hash_lock);
	reg = __rh_lookup(rh, region);
	read_unlock(&rh->hash_lock);

	spin_lock_irqsave(&rh->region_lock, flags);
>>>OOPS>>>	if (atomic_dec_and_test(&reg->pending)) {
		/*
		 * There is no pending I/O for this region.
		 * We can move the region to corresponding list for next action.
		 * ...
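
For illustration only (an editor's sketch, not a patch attached to this bug): a
defensive variant of the tail of rh_dec() would check the __rh_lookup() result
before touching reg->pending, turning the NULL dereference into a logged error
and an early return. The names (struct region_hash, struct region, __rh_lookup,
region_t) follow the snippet above; rh_dec_guarded is a made-up name and DMERR
is the device-mapper logging macro.

static void rh_dec_guarded(struct region_hash *rh, region_t region)
{
	unsigned long flags;
	struct region *reg;

	read_lock(&rh->hash_lock);
	reg = __rh_lookup(rh, region);
	read_unlock(&rh->hash_lock);

	if (unlikely(!reg)) {
		/* Region vanished from the hash while I/O was still pending;
		 * complain instead of faulting on a NULL ->pending. */
		DMERR("rh_dec: no hash entry for region %lu",
		      (unsigned long) region);
		return;
	}

	spin_lock_irqsave(&rh->region_lock, flags);
	if (atomic_dec_and_test(&reg->pending)) {
		/* ... move the region to the appropriate list, as in the
		 * original code above ... */
	}
	spin_unlock_irqrestore(&rh->region_lock, flags);
}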

Comment 4 Corey Marthaler 2007-02-12 21:55:56 UTC
Was able to reproduce this panic; bumping priority.

Comment 6 Jonathan Earl Brassow 2007-02-13 18:51:49 UTC
I propose setting a restriction that mirrors are limited to 2 sides for 4.5.
This would defuse this bug.  Once we agree on that, I'll open an RFE for 4.6 and
make this bug dependent on that.


Comment 7 Jonathan Earl Brassow 2007-02-13 19:40:35 UTC
This bug looks more like a problem with a reg struct being forgotten (if the
analysis is correct).


Comment 8 Jonathan Earl Brassow 2007-02-13 19:45:07 UTC
Once the machine (presumably the server) goes down with the panic, the server
seems unable to migrate...

Seems a lot like bug 217895 after the panic.



Comment 9 Jonathan Earl Brassow 2007-02-14 17:46:41 UTC
228104 and 228056 have been fixed and are waiting for a package build.

I believe this bug will still exist after that fix, but it would be nice to confirm.

Comment 10 Jonathan Earl Brassow 2007-02-15 20:37:40 UTC
continuing on comment #3...

rh_dec should always find the reg... the only way it can't is if it's not on the
hash_list.  The only way it gets removed from the hash_list is if it somehow got
onto the clean_regions, recovered_regions, or failed_recovered_regions lists...

Recovery can't start until all writes for the region have cleared; and new
writes can't start on a region until recovery is complete...  That would seem
to rule out the latter two reasons for getting off the hash list.  In order to
get onto the clean_regions list before the final write completes, the
reg->pending count would have to be screwed up.  I've reviewed the pending
variable and I don't see a way it can get screwed up...

That pretty much means that we looked up a region by the wrong number and there
was no memory to allocate a new region.  Very unlikely.

Looking at it another way... forget about whether the reg is lost or not.  We
would have to fail a lookup and fail an allocation to get a NULL reg.
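
To make the pending-count rule above concrete, here is a tiny userspace model
(an editor's illustration, not the kernel code; the struct and helpers here are
simplified stand-ins): a region stays findable in the hash while writes are in
flight, and may only be unhashed once the final rh_dec() drops its pending
count to zero.  The assert() marks the invariant this comment relies on; the
oops suggests the real rh_dec() was handed a region number whose hash entry no
longer existed.

/* Simplified model of the pending-count invariant; compiles standalone. */
#include <assert.h>
#include <stdio.h>

struct region {
	int pending;   /* writes in flight against this region */
	int hashed;    /* still findable via the region hash?  */
};

static void rh_inc(struct region *reg)       /* before a write is issued */
{
	reg->pending++;
}

static void rh_dec(struct region *reg)       /* at mirror_end_io() time */
{
	assert(reg->hashed);                 /* the invariant under discussion */
	if (--reg->pending == 0)
		reg->hashed = 0;             /* only now may it leave the hash */
}

int main(void)
{
	struct region reg = { .pending = 0, .hashed = 1 };

	rh_inc(&reg);   /* write A mapped */
	rh_inc(&reg);   /* write B mapped */
	rh_dec(&reg);   /* write A completes; region must still be hashed */
	rh_dec(&reg);   /* write B completes; region may now go clean     */

	printf("pending=%d hashed=%d\n", reg.pending, reg.hashed);
	return 0;
}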


Comment 11 Jonathan Earl Brassow 2007-02-21 20:36:39 UTC
corey: have you been able to reproduce this one?

It is _very_ interesting to see the node oops in mirror_end_io...

We might be able to add some debugging to the kernel, but we need to be able to
easily reproduce it first.


Comment 13 Jonathan Earl Brassow 2007-03-14 04:05:28 UTC
Waiting for reproduction.


Comment 14 Corey Marthaler 2007-04-12 16:43:53 UTC
This has not been reproduced with the latest packages. Marking verified.

Comment 15 Corey Marthaler 2007-04-26 15:16:22 UTC
Reproduced this last night on link-07 during cmirror failure, reopening...

Scenario: Kill secondary leg of non-synced 2-leg mirror

****** Mirror hash info for this scenario ******
* name:      fail_secondary_non_synced_2_legs
* sync:      0
* disklog:   1
* failpv:    /dev/sdb1
* legs:      2
* pvs:       /dev/sda1 /dev/sdb1 /dev/sdg1
************************************************

Creating mirror on link-07...
qarsh root@link-07 lvcreate -m 1 -n fail_secondary_non_synced_2_legs -L 800M
helter_skelter /dev/sda1:0-500 /dev/sdb1:0-500 /dev/sdg1:0-50

mirror is only 6.00% synced right now

Creating gfs on top of mirror on link-07...
Creating mnt point /mnt/fail_secondary_non_synced_2_legs on link-02...
Mounting gfs on link-02...
Creating mnt point /mnt/fail_secondary_non_synced_2_legs on link-04...
Mounting gfs on link-04...
Creating mnt point /mnt/fail_secondary_non_synced_2_legs on link-07...
Mounting gfs on link-07...
Creating mnt point /mnt/fail_secondary_non_synced_2_legs on link-08...
Mounting gfs on link-08...
Writing files to gfs on...
        link-02
checkit starting with:
CREATE
Num files:          100
Random Seed:        29592
Verify XIOR Stream: /tmp/checkit_fail_secondary_non_synced_2_legs
Working dir:        /mnt/fail_secondary_non_synced_2_legs
        link-04
checkit starting with:
CREATE
Num files:          100
Random Seed:        29684
Verify XIOR Stream: /tmp/checkit_fail_secondary_non_synced_2_legs
Working dir:        /mnt/fail_secondary_non_synced_2_legs
        link-07
checkit starting with:
CREATE
Num files:          100
Random Seed:        30312
Verify XIOR Stream: /tmp/checkit_fail_secondary_non_synced_2_legs
Working dir:        /mnt/fail_secondary_non_synced_2_legs
        link-08
checkit starting with:
CREATE
Num files:          100
Random Seed:        28879
Verify XIOR Stream: /tmp/checkit_fail_secondary_non_synced_2_legs
Working dir:        /mnt/fail_secondary_non_synced_2_legs
Disabling device sdb on link-02
Disabling device sdb on link-04
Disabling device sdb on link-07
Disabling device sdb on link-08

Attempting I/O to cause mirror down conversion on link-07
10+0 records in
10+0 records out
Didn't receive heartbeat for 120 seconds
Verifying the down conversion from mirror to linear
  connect() failed on local socket: Connection refused
  Skipping clustered volume group helter_skelter
down conversion didn't appear to work as a simple lvs fails
run_iteration() method failed at
/home/msp/cmarthal/work/rhel4/sts-root/lib/FI_engine.pm line 21.

[PANIC]

Apr 25 19:09:43 link-07 lvm[5439]: No longer monitoring mirror device
helter_skelter-fail_secondary_non_synced_2_legs for events

device-mapper: recovery failed on region 959
scsi2 (0:2): rejecting I/O to offline device
Unable to handle kernel NULL pointer dereference at 0000000000000038 RIP:
<ffffffffa00f8caa>{:dm_mirror:mirror_end_io+102}
PML4 1dc36067 PGD 1dc35067 PMD 0
Oops: 0002 [1] SMP
CPU 0
Modules linked in: lock_dlm(U) gfs(U) lock_harness(U) dm_cmirror(U) dlm(U)
cman(U) md5 ipv6 parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc ds
yenta_socket pcmcia_core button battery ac ohci_hcd hw_random k8_edac edac_mc
tg3 floppy dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod qla2300 qla2xxx
scsi_transport_fc mptscsih mptsas mptspi mptscsi mptbase sd_mod scsi_mod
Pid: 0, comm: swapper Not tainted 2.6.9-55.ELsmp
RIP: 0010:[<ffffffffa00f8caa>] <ffffffffa00f8caa>{:dm_mirror:mirror_end_io+102}
RSP: 0018:ffffffff8045a898  EFLAGS: 00010046
RAX: 0000000000000216 RBX: 000001003f47a830 RCX: 0000000000000216
RDX: 00000000000003bf RSI: 0000000000000216 RDI: 000001003f47a850
RBP: 000001003f47a818 R08: ffffff0000059000 R09: 000000000000003f
R10: ffffffffa00ac2d2 R11: ffffffffa00ac2d2 R12: 0000000000000000
R13: 000001003f47a800 R14: 0000000000000000 R15: 0000000000000000
FS:  0000002a95562b00(0000) GS:ffffffff804ed700(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000038 CR3: 0000000000101000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffffffff804f0000, task ffffffff803d7280)
Stack: 0000000100000000 0000010036794750 0000000000000000 00000100367f9980
       0000000000000000 00000100367fc628 000001001f63f600 ffffffffa00ac338
       0000000000000000 0000010036794750
Call Trace:<IRQ> <ffffffffa00ac338>{:dm_mod:clone_endio+102}
       <ffffffffa00ac44f>{:dm_mod:clone_endio+381}
       <ffffffff802512cf>{__end_that_request_first+238}
       <ffffffffa000703c>{:scsi_mod:scsi_end_request+40}
       <ffffffffa0007351>{:scsi_mod:scsi_io_completion+497}
       <ffffffffa0002d45>{:scsi_mod:scsi_softirq+213}
       <ffffffff8013cbec>{__do_softirq+88}
       <ffffffff8013cc95>{do_softirq+49} <ffffffff80113247>{do_IRQ+328}
       <ffffffff80110833>{ret_from_intr+0}  <EOI>
       <ffffffff8010e749>{default_idle+0}
       <ffffffff8010e769>{default_idle+32} <ffffffff8010e7dc>{cpu_idle+26}
       <ffffffff804f367b>{start_kernel+470} <ffffffff804f31d5>{_sinittext+469}

Code: f0 41 ff 4c 24 38 0f 94 c0 84 c0 74 66 41 8b 44 24 10 83 f8
RIP <ffffffffa00f8caa>{:dm_mirror:mirror_end_io+102} RSP <ffffffff8045a898>
CR2: 0000000000000038
 <0>Kernel panic - not syncing: Oops




Comment 16 Corey Marthaler 2007-04-26 15:25:22 UTC
Created attachment 153521 [details]
panic trace

Attaching panic as a text file since it didn't look good in comment #15.

Comment 17 Jonathan Earl Brassow 2007-04-30 14:59:53 UTC
bug 238031 was fixed/built shortly after this bug was re-filed.  That bug addressed issues of conflicting
recovery/writes.  Complain all you want about the performance of cluster mirrors during recovery, but you
can't argue with the results... no more GFS panics.

This bug may not be addressed by the latest packages... the mirror_end_io business seems new to me.
I'd like to see it reproduced though.

Comment 18 Kiersten (Kerri) Anderson 2007-05-21 22:16:54 UTC
Moving defect to 4.6 list to track for the next release if more information is
uncovered.

Comment 19 Kiersten (Kerri) Anderson 2007-08-14 16:03:45 UTC
Moving to 4.7 consideration until more data is provided.

Comment 20 Corey Marthaler 2007-08-24 18:56:33 UTC
Closing this bug due to the inability to reproduce.