Bug 889368 - LVM RAID: I/O can hang if entire stripe (mirror group) of RAID10 LV is killed while under snapshot
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.4
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Mikuláš Patočka
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On: 886658
Blocks: 1217621 1268411
 
Reported: 2012-12-20 23:48 UTC by Jonathan Earl Brassow
Modified: 2016-05-10 21:47 UTC
CC List: 14 users

Fixed In Version: kernel-2.6.32-609.el6
Doc Type: Bug Fix
Doc Text:
Clone Of: 886658
Environment:
Last Closed: 2016-05-10 21:47:36 UTC
Target Upstream Version:
Embargoed:


Attachments
RHEL 6 patch (4.47 KB, patch)
2016-01-09 00:11 UTC, Mikuláš Patočka


Links
System: Red Hat Product Errata
ID: RHSA-2016:0855
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: kernel security, bug fix, and enhancement update
Last Updated: 2016-05-10 22:43:57 UTC

Comment 2 Jonathan Earl Brassow 2012-12-21 17:49:46 UTC
This bug is a direct result of the way snapshots handle failed writes to the origin.  Specifically, 'retry_origin_bios' does not allow the failures to propagate, causing I/O to hang indefinitely.
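
If you are triaging a suspected instance of this hang, a minimal shell sketch for confirming that the writer is stuck in uninterruptible sleep rather than just slow (standard tooling, nothing here is specific to this bug):

# Writers blocked on endlessly retried bios sit in uninterruptible sleep (D state):
ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /^D/'

# Dump stacks of all blocked tasks to the kernel log (requires sysrq to be enabled):
echo w > /proc/sysrq-trigger
dmesg | tail -n 60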

Comment 6 Jonathan Earl Brassow 2014-08-27 03:52:09 UTC
Making this a 6.7 discussion.

Comment 10 Jonathan Earl Brassow 2015-10-20 19:52:15 UTC
I don't remember the specifics of the snapshot code, but from comment 2 it is related to 'retry_origin_bios' in dm-snap.c.

You don't need RAID to reproduce this either; any device will do.  In this case, I used a striped LV (named 'stripe') as the origin.
[root@bp-01 ~]# lvs vg
  LV     VG   Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  snap   vg   swi-I-s---  50.00g      stripe 100.00
  stripe vg   owi-aos--- 100.00g

Steps to reproduce (a scripted sketch follows at the end of this comment):
1) create an LV
2) create a snapshot of it
3) start I/O to the origin (dd)
4) kill a device in the origin
**) The 'dd' will never complete due to the indefinite retrying of bios; it should emit errors instead.

I also get a lot of the following messages:
Oct 20 14:41:10 bp-01 kernel: lost page write due to I/O error on dm-3
Oct 20 14:41:10 bp-01 kernel: Buffer I/O error on device dm-3, logical block 572824
Oct 20 14:41:10 bp-01 kernel: lost page write due to I/O error on dm-3
Oct 20 14:41:10 bp-01 kernel: Buffer I/O error on device dm-3, logical block 572825
Oct 20 14:41:10 bp-01 kernel: lost page write due to I/O error on dm-3

[root@bp-01 ~]# ls -l /dev/vg
total 0
lrwxrwxrwx. 1 root root 7 Oct 20 14:33 snap -> ../dm-6
lrwxrwxrwx. 1 root root 7 Oct 20 14:33 stripe -> ../dm-3
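
The steps above can be scripted.  The following is a sketch only: the VG name (vg), the device to kill (/dev/sdb), and the sizes are taken or assumed from the transcripts in this bug, and the sysfs offline trick mirrors comment 20.  It destroys data on the VG.  On a fixed kernel the dd fails with I/O errors; on a broken one the timeout guard reports the hang.

#!/bin/bash
# Reproducer sketch for bug 889368; assumes a VG named "vg" with one origin
# leg on /dev/sdb, as in the transcripts in this bug.  Destroys data on vg.
set -x

lvcreate -L 20G -i 2 -n stripe vg                        # 1) create an LV
lvcreate -s vg/stripe -n snap -L 5G                      # 2) snapshot it
dd if=/dev/urandom of=/dev/vg/stripe bs=1M count=2000 &  # 3) I/O to the origin
dd_pid=$!
sleep 5
echo offline > /sys/block/sdb/device/state               # 4) kill a device in the origin

# Broken kernels retry the origin bios forever, so dd never exits; wait up to
# two minutes before declaring a hang.  A hung dd sits in D state and cannot
# be killed; a reboot may be needed to clean up.
if ! timeout 120 tail --pid="$dd_pid" -f /dev/null; then
    echo "dd (pid $dd_pid) still blocked after device failure: hang reproduced" >&2
fi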

Comment 13 Mikuláš Patočka 2016-01-09 00:09:48 UTC
Upstream patch: https://www.redhat.com/archives/dm-devel/2016-January/msg00090.html

Comment 14 Mikuláš Patočka 2016-01-09 00:11:36 UTC
Created attachment 1113054
RHEL 6 patch

The patch, backported to RHEL 6

Comment 16 Aristeu Rozanski 2016-01-28 22:16:32 UTC
Patch(es) available on kernel-2.6.32-609.el6
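
To check whether a given RHEL 6 machine already carries the fix, a rough sketch (the changelog grep is a guess at the usual convention of citing Bugzilla numbers; exact formatting varies):

# The fix ships in kernel-2.6.32-609.el6 and later; compare the running kernel:
uname -r

# RHEL kernel changelogs normally reference the Bugzilla number of each fix:
rpm -q --changelog kernel | grep 889368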

Comment 20 Corey Marthaler 2016-04-13 17:17:58 UTC
Marking verified based on the test case given in comment #10.

2.6.32-639.el6.x86_64
lvm2-2.02.143-7.el6    BUILT: Wed Apr  6 10:08:33 CDT 2016
lvm2-libs-2.02.143-7.el6    BUILT: Wed Apr  6 10:08:33 CDT 2016
lvm2-cluster-2.02.143-7.el6    BUILT: Wed Apr  6 10:08:33 CDT 2016
udev-147-2.72.el6    BUILT: Tue Mar  1 06:14:05 CST 2016
device-mapper-1.02.117-7.el6    BUILT: Wed Apr  6 10:08:33 CDT 2016
device-mapper-libs-1.02.117-7.el6    BUILT: Wed Apr  6 10:08:33 CDT 2016
device-mapper-event-1.02.117-7.el6    BUILT: Wed Apr  6 10:08:33 CDT 2016
device-mapper-event-libs-1.02.117-7.el6    BUILT: Wed Apr  6 10:08:33 CDT 2016
device-mapper-persistent-data-0.6.2-0.1.rc7.el6    BUILT: Tue Mar 22 08:58:09 CDT 2016
cmirror-2.02.143-7.el6    BUILT: Wed Apr  6 10:08:33 CDT 2016



[root@host-113 ~]# lvcreate -L 20G -i 2 -n stripe vg
  Using default stripesize 64.00 KiB.
  Logical volume "stripe" created.
[root@host-113 ~]# lvcreate -s vg/stripe -n snap -L 5G 
  Logical volume "snap" created.
[root@host-113 ~]# lvs -a -o +devices
  LV      VG    Attr       LSize   Pool Origin Data%  Devices                  
  snap    vg    swi-a-s---   5.00g      stripe 0.00   /dev/sda1(2560)          
  stripe  vg    owi-a-s---  20.00g                    /dev/sda1(0),/dev/sdb1(0)

[root@host-113 ~]# dd if=/dev/urandom of=/dev/vg/stripe  bs=1M count=2000

# Takes a while, but this does eventually finish well after the device failure
2000+0 records in
2000+0 records out


[root@host-113 ~]# echo offline > /sys/block/sdb/device/state


# Additional writes now either work or fail with an I/O error, depending on the size
[root@host-113 ~]# dd if=/dev/urandom of=/dev/vg/stripe count=10
10+0 records in
10+0 records out
5120 bytes (5.1 kB) copied, 0.00153896 s, 3.3 MB/s

[root@host-113 ~]# dd if=/dev/urandom of=/dev/vg/stripe count=1000
dd: writing to `/dev/vg/stripe': Input/output error
137+0 records in
136+0 records out
69632 bytes (70 kB) copied, 0.0250647 s, 2.8 MB/s


[root@host-113 ~]# lvs -a -o +devices
  /dev/sdb1: read failed after 0 of 4096 at 0: Input/output error
  /dev/vg/stripe: read failed after 0 of 4096 at 0: Input/output error
  /dev/vg/stripe: read failed after 0 of 4096 at 21474770944: Input/output error
  /dev/vg/stripe: read failed after 0 of 4096 at 21474828288: Input/output error
  /dev/vg/snap: read failed after 0 of 4096 at 0: Input/output error
  /dev/vg/snap: read failed after 0 of 4096 at 21474770944: Input/output error
  /dev/vg/snap: read failed after 0 of 4096 at 21474828288: Input/output error
  /dev/vg/snap: read failed after 0 of 4096 at 4096: Input/output error
  /dev/sdb1: read failed after 0 of 4096 at 26838958080: Input/output error
  /dev/sdb1: read failed after 0 of 4096 at 26839048192: Input/output error
  /dev/sdb1: read failed after 0 of 4096 at 4096: Input/output error
  Couldn't find device with uuid q6iRyy-YT4M-kqdr-2qZW-oR4u-f19u-lAX6gQ.
  Couldn't find device for segment belonging to vg/stripe while checking used and assumed devices.
  LV      VG         Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices                       
  snap    vg         swi-I-s---   5.00g      stripe 100.00                                  /dev/sda1(2560)               
  stripe  vg         owi-aos-p-  20.00g                                                     /dev/sda1(0),unknown device(0)
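
For completeness, a cleanup sketch after a verification run like the one above (device and LV names are the ones used in this comment; 'running' is the sysfs state that brings an offlined SCSI device back):

# Bring the offlined disk back and rescan for PVs:
echo running > /sys/block/sdb/device/state
pvscan

# The snapshot is now invalid (the 'I' in its attrs); remove both LVs:
lvremove -f vg/snap vg/stripe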

Comment 22 errata-xmlrpc 2016-05-10 21:47:36 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-0855.html

