Bug 165242
| Summary: | mirrors possibly reporting invalid blocks to the filesystem | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 4 | Reporter: | Corey Marthaler <cmarthal> |
| Component: | kernel | Assignee: | Jonathan Earl Brassow <jbrassow> |
| Status: | CLOSED ERRATA | QA Contact: | |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 4.0 | CC: | agk |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | RHSA-2005-514 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2005-10-05 13:48:56 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 156323 | | |
Description (Corey Marthaler, 2005-08-05 19:24:04 UTC)

lvms:

```
[root@link-12 ~]# pvscan
  PV /dev/sdb1   VG coreyvg   lvm2 [40.71 GB / 216.00 MB free]
  PV /dev/sdb2   VG coreyvg   lvm2 [40.71 GB / 216.00 MB free]
  PV /dev/sdb3   VG coreyvg   lvm2 [40.71 GB / 40.71 GB free]
  PV /dev/sdb5   VG coreyvg   lvm2 [40.71 GB / 40.71 GB free]
  PV /dev/sda1   VG gfs       lvm2 [270.96 GB / 988.00 MB free]
  Total: 5 [433.81 GB] / in use: 5 [433.81 GB] / in no VG: 0 [0]
[root@link-12 ~]# vgscan
  Reading all physical volumes. This may take a while...
  Found volume group "coreyvg" using metadata type lvm2
  Found volume group "gfs" using metadata type lvm2
[root@link-12 ~]# lvscan
  ACTIVE   '/dev/coreyvg/coreymirror' [40.50 GB] inherit
  ACTIVE   '/dev/coreyvg/coreymirror_mlog' [4.00 MB] inherit
  ACTIVE   '/dev/coreyvg/coreymirror_mimage_0' [40.50 GB] inherit
  ACTIVE   '/dev/coreyvg/coreymirror_mimage_1' [40.50 GB] inherit
  ACTIVE   '/dev/gfs/gfs' [270.00 GB] inherit
[root@link-12 ~]# lvs --all
  LV                     VG      Attr   LSize   Origin Snap%  Move Log              Copy%
  coreymirror            coreyvg mwi-a-  40.50G                    coreymirror_mlog  1.96
  [coreymirror_mimage_0] coreyvg iwi-ao  40.50G
  [coreymirror_mimage_1] coreyvg iwi-ao  40.50G
  [coreymirror_mlog]     coreyvg lwi-ao   4.00M
  gfs                    gfs     -wi-a- 270.00G
```

I was just playing around with four mirrors and ext3 and saw errors, and this time a hang of the fs.

```
[...]
Aug  5 10:36:36 link-12 kernel: device-mapper: All replicated volumes dead, failing I/O
Aug  5 10:36:36 link-12 kernel: device-mapper: A read failure occurred on a mirror device.
Aug  5 10:36:36 link-12 kernel: device-mapper: incrementing error_count on 253:7
Aug  5 10:36:36 link-12 kernel: device-mapper: All mirror devices are dead. Unable to choose mirror.
Aug  5 10:36:36 link-12 kernel: device-mapper: All replicated volumes dead, failing I/O
Aug  5 10:36:36 link-12 kernel: device-mapper: All mirror devices are dead. Unable to choose mirror.
Aug  5 10:36:36 link-12 kernel: EXT2-fs error (device dm-8): read_inode_bitmap: Cannot read inode bitmap - block_group = 0, inode_bitmap = 1028
[...]
```
```
[root@link-12 ~]# lvs --all
  LV                      VG      Attr   LSize   Origin Snap%  Move Log               Copy%
  coreymirror0            coreyvg mwi-ao  20.00G                    coreymirror0_mlog 17.79
  [coreymirror0_mimage_0] coreyvg iwi-ao  20.00G
  [coreymirror0_mimage_1] coreyvg iwi-ao  20.00G
  [coreymirror0_mlog]     coreyvg lwi-ao   4.00M
  coreymirror1            coreyvg mwi-ao  20.00G                    coreymirror1_mlog 17.42
  [coreymirror1_mimage_0] coreyvg iwi-ao  20.00G
  [coreymirror1_mimage_1] coreyvg iwi-ao  20.00G
  [coreymirror1_mlog]     coreyvg lwi-ao   4.00M
  coreymirror2            coreyvg mwi-ao  20.00G                    coreymirror2_mlog 13.67
  [coreymirror2_mimage_0] coreyvg iwi-ao  20.00G
  [coreymirror2_mimage_1] coreyvg iwi-ao  20.00G
  [coreymirror2_mlog]     coreyvg lwi-ao   4.00M
  coreymirror3            coreyvg mwi-ao  20.00G                    coreymirror3_mlog 12.68
  [coreymirror3_mimage_0] coreyvg iwi-ao  20.00G
  [coreymirror3_mimage_1] coreyvg iwi-ao  20.00G
  [coreymirror3_mlog]     coreyvg lwi-ao   4.00M
  gfs                     gfs     -wi-a- 270.00G
[root@link-12 bin]# df -h
Filesystem                        Size  Used Avail Use% Mounted on
/dev/hda2                          36G  2.6G   32G   8% /
/dev/hda1                          99M   16M   79M  17% /boot
none                              248M     0  248M   0% /dev/shm
/dev/mapper/coreyvg-coreymirror0   20G   48M   19G   1% /mnt/mirror0
/dev/mapper/coreyvg-coreymirror1   20G   45M   19G   1% /mnt/mirror1
/dev/mapper/coreyvg-coreymirror2   20G   45M   19G   1% /mnt/mirror2
/dev/mapper/coreyvg-coreymirror3   20G   49M   19G   1% /mnt/mirror3
[root@link-12 bin]# touch /mnt/mirror0/sdkfjs
[root@link-12 bin]# touch /mnt/mirror1/sdkfjs
[hang]
```

Try using something other than ext[23] - just to make sure it isn't the FS. Also, in addition to 'lvs --all', do a 'dmsetup status'. Are you sure your drives aren't spitting out any errors?

Exactly which kernel was this with? (If it didn't include the recently-added patches like linux-2.6.9-dm-raid1-race.patch and linux-2.6.9-bio-clone.patch then please retest.)

Indeed, that kernel would have preceded the bio_clone fix. 2.6.9-15.EL has these fixes. I've asked Corey to retest with these.
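The advice above (check 'lvs --all' and 'dmsetup status' when the mirror misbehaves) can be scripted so that state is captured with timestamps while a test runs. A minimal sketch, not from this bug thread: the log path, interval, and iteration count are arbitrary defaults, and either command is simply skipped if the LVM tools are not installed.

```shell
#!/bin/sh
# Periodically record mirror state so the configuration at failure time
# can be reconstructed afterwards. LOG/INTERVAL/COUNT are placeholder
# defaults for this sketch; a real run would loop for much longer.
LOG=${LOG:-/tmp/mirror-status.log}
INTERVAL=${INTERVAL:-1}
COUNT=${COUNT:-2}

i=0
while [ "$i" -lt "$COUNT" ]; do
    {
        date '+%Y-%m-%d %H:%M:%S'
        # Both commands appear in the thread; skip them when unavailable.
        command -v lvs >/dev/null 2>&1 && lvs --all 2>&1
        command -v dmsetup >/dev/null 2>&1 && dmsetup status 2>&1
        echo '---'
    } >>"$LOG"
    i=$((i + 1))
    [ "$i" -lt "$COUNT" ] && sleep "$INTERVAL"
done
```

With captures bracketing a failure, the timestamps show which mirror legs were still active when the kernel started logging errors.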
Hit this after about 50 minutes of running the fsstress load on a gfs filesystem with no_lock:

```
Aug  8 10:16:58 link-08 kernel: GFS: fsid=dm-5.0: fatal: invalid metadata block
Aug  8 10:16:58 link-08 kernel: GFS: fsid=dm-5.0: bh = 8050939 (magic)
Aug  8 10:16:58 link-08 kernel: GFS: fsid=dm-5.0: function = gfs_rgrp_read
Aug  8 10:16:58 link-08 kernel: GFS: fsid=dm-5.0: file = /usr/src/build/600062-x86_64/BUILD/gfs-kernel-2.6.9-37/smp/src/gfs/rgrp.c, line = 830
Aug  8 10:16:58 link-08 kernel: GFS: fsid=dm-5.0: time = 1123514218
Aug  8 10:16:58 link-08 kernel: GFS: fsid=dm-5.0: about to withdraw from the cluster
Aug  8 10:16:58 link-08 kernel: GFS: fsid=dm-5.0: waiting for outstanding I/O
Aug  8 10:16:58 link-08 kernel: GFS: fsid=dm-5.0: telling LM to withdraw
Aug  8 10:16:58 link-08 kernel: GFS: fsid=dm-5.0: withdrawn
```

```
[root@link-08 ~]# df -h
Filesystem                       Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00   52G  5.4G   45G  11% /
/dev/hda5                         99M   35M   59M  38% /boot
none                             501M     0  501M   0% /dev/shm
df: `/mnt/mirror': Input/output error
[root@link-08 mirror]# uname -ar
Linux link-08 2.6.9-15.ELsmp #1 SMP Fri Aug 5 19:00:35 EDT 2005 x86_64 x86_64 x86_64 GNU/Linux
```

GFS 2.6.9-37.1 (built Aug 8 2005 13:59:40) installed

I reproduced this as well with a smaller ext fs (25M) on a single-CPU machine running a UP kernel.

Next we probably need to build one-off kernels with some of the recent patches removed to see if it goes away or not. Using older kernels is unlikely to help as they include other bugs we fixed. Could also try using Linus's latest 2.6.13* kernel. [Again, stick with UP 25Meg for further testing.]

Corey is now working on 2.6.9-15.EL with linux-2.6.9-dm-mirroring.patch removed. This will tell us if it was a pre-existing condition.
When reproducing, please list the following in your comment:

1) kernel + any changes (like the removal of linux-2.6.9-dm-mirroring.patch)
2) UP or SMP
3) size of FS if one is used
4) tests run
5) time to reproduce

See also bug 164630: Is this because readahead that returns EWOULDBLOCK is treated as a hard error and causes the device to be disabled?

Testing update of this afternoon's activities:

- x86 UP machine (link-12) running a 2.6.9-15.EL kernel compiled without linux-2.6.9-dm-mirroring.patch. Running ext3 on a 28M linear and running 6 iterations of fsstress while doing pvmoves at the same time to simulate mirroring. This ran around 1.5 - 2 hours before tripping 164630.
- x86 UP machine (link-12) running a 2.6.9-15.EL kernel compiled without linux-2.6.9-dm-mirroring.patch. Running ext3 on a bigger 25G linear and running 6 iterations of fsstress while doing pvmoves at the same time to simulate mirroring. This continues to run.
- x86 UP machine (link-11) running a regular 2.6.9-15.EL kernel. Running ext3 on a 28M mirror and running 6 iterations of fsstress. This _always_ trips 165242 instantaneously.
- x86 UP machine (link-11) running a regular 2.6.9-15.EL kernel. Running b_iogen/b_doio (a block-level I/O generator/write verifier) straight to a 28M mirror. b_iogen/b_doio are this time doing overlapping I/O to random offsets with random transfer sizes between 1k and 4m. This continues to run.
- x86_64 SMP machine (link-08) running a regular 2.6.9-15.EL kernel. Running GFS on a 500M mirror and doing 8 iterations of fsstress. This continues to run but is causing the following errors every minute or so:

```
Aug  9 11:43:37 link-08 kernel: GFS: fsid=dm-5.0: quotad: error = -28
Aug  9 11:48:41 link-08 kernel: GFS: fsid=dm-5.0: quotad: error = -28
```

Reproduced this finally after 2 hours of running a much bigger I/O load to a much bigger filesystem.
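The 28M-mirror case that trips this bug instantly can be sketched as a script. This is not from the bug report: the volume group, LV name, and mount point are placeholders, and the fsstress options are an assumption (they vary by build), so by default the script only prints what it would do. Set DRYRUN= (empty) and run as root on a disposable test machine to execute it.

```shell
#!/bin/sh
# Hypothetical reproduction sketch for the small-mirror case. All names
# are placeholders; DRYRUN=echo (the default) prints the plan instead of
# running privileged LVM commands.
DRYRUN=${DRYRUN:-echo}
VG=${VG:-coreyvg}
LV=${LV:-repromirror}
MNT=${MNT:-/mnt/repro}

repro_plan() {
    $DRYRUN lvcreate -m1 -L 28M -n "$LV" "$VG"   # 2-way mirror, as in the tests above
    $DRYRUN mkfs.ext3 "/dev/$VG/$LV"
    $DRYRUN mkdir -p "$MNT"
    $DRYRUN mount "/dev/$VG/$LV" "$MNT"
    $DRYRUN fsstress -d "$MNT" -n 1000 -p 6      # fsstress options are an assumption
}
repro_plan
```

When filing results from a run like this, the five items in the checklist above (kernel, UP/SMP, FS size, tests, time to reproduce) all fall out of the script's parameters.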
This is with the latest patches from Jon:

```
patch -p1 < /pub/164630-1.patch
patch -p1 < /pub/165242-2.patch
patch -p1 < /pub/spinlock_cleanup-3.patch
```

On x86_64 (still have yet to reproduce on x86). The filesystem/mirror size was 10G. The I/O load consisted of many fsstress/genesis/accordion processes along with pvmoves going on. With these latest patches, this bug is definitely much harder to see, as it used to happen seconds after starting I/O (and, as stated earlier, I still haven't seen it on x86).

Which patches, and applied to which kernel? Please attach patches to this bug so we can see what you've tested! And is this a true reproduction - exactly the same error messages as in comment #1? If not, please be more specific and quote the new messages.

164630-1.patch:

```
diff -purN linux-2.6.9-12.EL-01/drivers/md/dm-raid1.c linux-2.6.9-12.EL-02/drivers/md/dm-raid1.c
--- linux-2.6.9-12.EL-01/drivers/md/dm-raid1.c  2005-08-03 15:28:40.031213364 -0500
+++ linux-2.6.9-12.EL-02/drivers/md/dm-raid1.c  2005-08-03 15:30:24.220524638 -0500
@@ -378,6 +378,11 @@ static void rh_inc(struct region_hash *r
 	read_lock(&rh->hash_lock);
 	reg = __rh_find(rh, region);
+
+	spin_lock_irq(&rh->region_lock);
+	atomic_inc(&reg->pending);
+	spin_unlock_irq(&rh->region_lock);
+
 	if (reg->state == RH_CLEAN) {
 		rh->log->type->mark_region(rh->log, reg->key);
@@ -387,7 +392,6 @@ static void rh_inc(struct region_hash *r
 		spin_unlock_irq(&rh->region_lock);
 	}
-	atomic_inc(&reg->pending);
 	read_unlock(&rh->hash_lock);
 }
@@ -409,17 +413,17 @@ static void rh_dec(struct region_hash *r
 	reg = __rh_lookup(rh, region);
 	read_unlock(&rh->hash_lock);
+	spin_lock_irqsave(&rh->region_lock, flags);
 	if (atomic_dec_and_test(&reg->pending)) {
-		spin_lock_irqsave(&rh->region_lock, flags);
 		if (reg->state == RH_RECOVERING) {
 			list_add_tail(&reg->list, &rh->quiesced_regions);
 		} else {
 			reg->state = RH_CLEAN;
 			list_add(&reg->list, &rh->clean_regions);
 		}
-		spin_unlock_irqrestore(&rh->region_lock, flags);
 		should_wake = 1;
 	}
+	spin_unlock_irqrestore(&rh->region_lock, flags);
 	if (should_wake)
 		wake();
```

165242-2.patch:

```
diff -purN linux-2.6.9-15.EL-race/drivers/md/dm-raid1.c linux-2.6.9-15.EL-01/drivers/md/dm-raid1.c
--- linux-2.6.9-15.EL-race/drivers/md/dm-raid1.c  2005-08-10 10:53:04.267420165 -0500
+++ linux-2.6.9-15.EL-01/drivers/md/dm-raid1.c    2005-08-10 10:58:45.888278366 -0500
@@ -1371,7 +1371,7 @@ static int mirror_map(struct dm_target *
 	 */
 	m = choose_mirror(ms, NULL);
 	if (likely(m)) {
-		bmi = mempool_alloc(bio_map_info_pool, GFP_KERNEL);
+		bmi = mempool_alloc(bio_map_info_pool, GFP_NOIO);
 		if (likely(bmi)) {
 			/* without this, a read is not retryable */
@@ -1408,6 +1408,7 @@ static int mirror_end_io(struct dm_targe
 	int rw = bio_rw(bio);
 	struct mirror_set *ms = (struct mirror_set *) ti->private;
 	struct mirror *m = NULL;
+	struct dm_bio_details *bd = NULL;
 	/*
 	 * We need to dec pending if this was a write.
@@ -1417,9 +1418,13 @@ static int mirror_end_io(struct dm_targe
 		return error;
 	}
-	if (unlikely(error)) {
-		struct dm_bio_details *bd = NULL;
+	if (error == -EOPNOTSUPP)
+		goto out;
+
+	if ((error == -EWOULDBLOCK) && bio_rw_ahead(bio))
+		goto out;
+
+	if (unlikely(error)) {
 		DMERR("A read failure occurred on a mirror device.");
 		if (!map_context->ptr) {
 			/*
@@ -1437,7 +1442,7 @@ static int mirror_end_io(struct dm_targe
 			 * to the daemon for another shot to
 			 * one (if any) intact mirrors.
 			 */
-			if (rw == READ && default_ok(m)) {
+			if (default_ok(m)) {
 				bd = &(((struct bio_map_info *)map_context->ptr)->bmi_bd);
 				DMWARN("Trying different device.");
@@ -1450,6 +1455,7 @@ static int mirror_end_io(struct dm_targe
 		DMERR("All replicated volumes dead, failing I/O");
 	}
+ out:
 	if (map_context->ptr)
 		mempool_free(map_context->ptr, bio_map_info_pool);
```

spinlock_cleanup-3.patch:

```
diff -purN linux-2.6.9-15.EL-01/drivers/md/dm-raid1.c linux-2.6.9-15.EL-02/drivers/md/dm-raid1.c
--- linux-2.6.9-15.EL-01/drivers/md/dm-raid1.c  2005-08-10 10:58:45.888278366 -0500
+++ linux-2.6.9-15.EL-02/drivers/md/dm-raid1.c  2005-08-10 11:21:48.808704093 -0500
@@ -1067,12 +1067,12 @@ static void do_mirror(struct mirror_set
 {
 	struct bio_list reads, writes;
-	spin_lock(&ms->lock);
+	spin_lock_irq(&ms->lock);
 	reads = ms->reads;
 	writes = ms->writes;
 	bio_list_init(&ms->reads);
 	bio_list_init(&ms->writes);
-	spin_unlock(&ms->lock);
+	spin_unlock_irq(&ms->lock);
 	rh_update_states(&ms->rh);
 	do_recovery(ms);
@@ -1320,14 +1320,15 @@ static void mirror_dtr(struct dm_target
 static void queue_bio(struct mirror_set *ms, struct bio *bio, int rw)
 {
+	unsigned long flags;
 	int should_wake = 0;
 	struct bio_list *bl;
 	bl = (rw == WRITE) ? &ms->writes : &ms->reads;
-	spin_lock(&ms->lock);
+	spin_lock_irqsave(&ms->lock, flags);
 	should_wake = !(bl->head);
 	bio_list_add(bl, bio);
-	spin_unlock(&ms->lock);
+	spin_unlock_irqrestore(&ms->lock, flags);
 	if (should_wake)
 		wake();
```

I could probably never be positive that this is an exact reproduction, but it sure seems similar if not the same:

```
Aug 10 09:23:18 link-08 kernel: attempt to access beyond end of device
Aug 10 09:23:18 link-08 kernel: dm-5: rw=0, want=16304032720, limit=20971520
Aug 10 09:23:18 link-08 kernel: attempt to access beyond end of device
Aug 10 09:23:18 link-08 kernel: dm-5: rw=0, want=16304032720, limit=20971520
Aug 10 09:23:18 link-08 kernel: attempt to access beyond end of device
Aug 10 09:23:18 link-08 kernel: dm-5: rw=0, want=16304032720, limit=20971520
Aug 10 09:37:17 link-08 kernel: attempt to access beyond end of device
Aug 10 09:37:17 link-08 kernel: dm-5: rw=0, want=16304032720, limit=20971520
Aug 10 09:45:37 link-08 kernel: attempt to access beyond end of device
Aug 10 09:45:37 link-08 kernel: dm-5: rw=0, want=16304032720, limit=20971520
Aug 10 09:45:37 link-08 kernel: EXT3-fs error (device dm-5): ext3_free_blocks: Freeing blocks not in datazone - block = 2038004089, count = 1
Aug 10 09:45:37 link-08 kernel: Aborting journal on device dm-5.
Aug 10 09:45:37 link-08 kernel: ext3_abort called.
```
```
Aug 10 09:45:37 link-08 kernel: EXT3-fs error (device dm-5): ext3_journal_start_sb: Detected aborted journal
Aug 10 09:45:37 link-08 kernel: Remounting filesystem read-only
Aug 10 09:45:37 link-08 kernel: EXT3-fs error (device dm-5) in ext3_reserve_inode_write: Journal has aborted
Aug 10 09:45:37 link-08 kernel: EXT3-fs error (device dm-5) in ext3_reserve_inode_write: Journal has aborted
Aug 10 09:45:37 link-08 kernel: EXT3-fs error (device dm-5) in ext3_orphan_del: Journal has aborted
Aug 10 09:45:37 link-08 kernel: EXT3-fs error (device dm-5) in ext3_truncate: Journal has aborted
Aug 10 09:45:37 link-08 kernel: EXT3-fs error (device dm-5) in ext3_reserve_inode_write: Journal has aborted
Aug 10 09:45:37 link-08 kernel: EXT3-fs error (device dm-5) in ext3_dirty_inode: Journal has aborted
Aug 10 09:45:37 link-08 kernel: journal commit I/O error
Aug 10 09:45:37 link-08 kernel: EXT3-fs error (device dm-5) in start_transaction: Journal has aborted
Aug 10 09:45:37 link-08 kernel: journal commit I/O error
Aug 10 09:45:37 link-08 kernel: EXT3-fs error (device dm-5) in start_transaction: Journal has aborted
Aug 10 09:45:37 link-08 kernel: journal commit I/O error
Aug 10 09:45:37 link-08 kernel: EXT3-fs error (device dm-5) in start_transaction: Journal has aborted
Aug 10 09:45:48 link-08 kernel: EXT3-fs error (device dm-5): ext3_free_blocks: Freeing blocks not in datazone - block = 2038004089, count = 1
Aug 10 09:45:48 link-08 last message repeated 859 times
Aug 10 09:45:48 link-08 kernel: EXT3-fs error (device dm-5) in ext3_reserve_inode_write: Journal has aborted
Aug 10 09:45:48 link-08 kernel: EXT3-fs error (device dm-5) in ext3_reserve_inode_write: Journal has aborted
Aug 10 09:45:48 link-08 kernel: EXT3-fs error (device dm-5) in ext3_orphan_del: Journal has aborted
Aug 10 09:45:48 link-08 kernel: EXT3-fs error (device dm-5) in ext3_truncate: Journal has aborted
Aug 10 09:45:48 link-08 kernel: __journal_remove_journal_head: freeing b_committed_data
Aug 10 09:46:07 link-08 kernel: attempt to access beyond end of device
Aug 10 09:46:07 link-08 kernel: dm-5: rw=0, want=16304032720, limit=20971520
Aug 10 09:46:07 link-08 kernel: attempt to access beyond end of device
Aug 10 09:46:07 link-08 kernel: dm-5: rw=0, want=16304032720, limit=20971520
Aug 10 09:46:07 link-08 kernel: attempt to access beyond end of device
Aug 10 09:46:07 link-08 kernel: dm-5: rw=0, want=16304032720, limit=20971520
Aug 10 09:46:07 link-08 kernel: attempt to access beyond end of device
[...]
```

But curiously no device-mapper error messages in there this time. And UP 2.6.9-15.EL still, I presume? That makes it a different problem from the original one. And can you confirm that the *first* error in the log was:

```
Aug 10 09:23:18 link-08 kernel: attempt to access beyond end of device
Aug 10 09:23:18 link-08 kernel: dm-5: rw=0, want=16304032720, limit=20971520
```

If so, next time it happens please dump the configuration of the device corresponding to the error message at the time of the error. dm-5 means the minor number is 5, so something like 'dmsetup table -m 5 -j 253' should do this (if 253 is the dm major number). [If that's difficult to do, you could run 'dmsetup table' in a loop with a sleep in between, capturing timestamped output, so you can work it out after the event.]

This was on link-08, a machine with one processor and 2.6.9-15.EL, correct. Yes, that was the very first message as things started going south:

```
Aug 10 09:23:18 link-08 kernel: attempt to access beyond end of device
Aug 10 09:23:18 link-08 kernel: dm-5: rw=0, want=16304032720, limit=20971520
```

If this is a different problem, should I be opening a new bug then?

Yes, please open a new bug. This bug is dedicated to the problem that I wasn't handling READA responses correctly. Thus, all the devices would go missing - giving the problems described. The bug you have after the latest patches has nothing to do with the mirror marking all devices as disabled. Rather, for some reason ext[23] is asking for blocks way outside your range... could be corruption, not sure.

Submitted 165717 for comments #15 - #23.

Which means that after applying those patches, this bug has not been seen in about 16 hours of testing.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2005-514.html
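As a footnote, the "attempt to access beyond end of device" messages quoted in this bug can be checked mechanically: each pairs the requested sector (want=) against the device size in sectors (limit=). A hypothetical log filter, not part of the bug or its fix; the field layout it assumes matches the syslog lines shown above.

```shell
#!/bin/sh
# Scan a kernel log for dm "want=N, limit=M" lines and report how far past
# the end of the device each request was. Assumes lines shaped like:
#   Aug 10 09:23:18 link-08 kernel: dm-5: rw=0, want=16304032720, limit=20971520
scan_beyond_eod() {
    awk '
        /want=[0-9]+, limit=[0-9]+/ {
            dev = "?"; want = 0; limit = 0
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^dm-[0-9]+:?$/) { dev = $i; sub(/:$/, "", dev) }
                if ($i ~ /^want=/)  { split($i, w, "="); want  = w[2] + 0 }
                if ($i ~ /^limit=/) { split($i, l, "="); limit = l[2] + 0 }
            }
            # %.0f avoids integer truncation of 64-bit sector numbers in awk
            if (want > limit)
                printf "%s: %.0f sectors past end (want=%.0f, limit=%.0f)\n",
                       dev, want - limit, want, limit
        }
    ' "$@"
}
```

Usage would be something like `scan_beyond_eod /var/log/messages`; on the log above it would show dm-5 being asked for a sector roughly 16 billion sectors past a 20971520-sector device, which is what pointed at corruption rather than an off-by-one.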