Bug 1267042

Summary: XFS needs to better handle EIO and ENOSPC
Product: Red Hat Enterprise Linux 7
Component: kernel (sub component: XFS)
Version: 7.2
Hardware: x86_64
OS: Linux
Reporter: Corey Marthaler <cmarthal>
Assignee: Carlos Maiolino <cmaiolin>
QA Contact: Zorro Lang <zlang>
Status: CLOSED ERRATA
Severity: low
Priority: high
CC: cmaiolin, dshaw, eguan, esandeen, jaeshin, msnitzer, swhiteho, vgoyal, vikumar, wevans, xzhou
Target Milestone: rc
Fixed In Version: kernel-3.10.0-428.el7
Doc Type: Bug Fix
Type: Bug
Last Closed: 2016-11-03 12:33:09 UTC
Bug Blocks: 1113511, 1203710, 1295577, 1313485
Attachments: dmesg

Description Corey Marthaler 2015-09-28 23:11:43 UTC
Description of problem:

host-110.virt.lab.msp.redhat.com: pvcreate /dev/sdc1 /dev/sda1 /dev/sdd1 /dev/sde1 /dev/sdf1
host-110.virt.lab.msp.redhat.com: vgcreate  snapper_thinp /dev/sdc1 /dev/sda1 /dev/sdd1 /dev/sde1 /dev/sdf1

SCENARIO - [full_XFS_thin_snap_verification_threshold_off]
Verify full snapshots on top of XFS can be unmounted and removed
Making pool volume
lvcreate  --thinpool POOL  --zero n -L 1G --poolmetadatasize 4M snapper_thinp

Sanity checking pool device (POOL) metadata
examining superblock
examining devices tree
examining mapping tree
checking space map counts

Making origin volume
lvcreate  --virtualsize 2G -T snapper_thinp/POOL -n origin
  WARNING: Sum of all thin volume sizes (2.00 GiB) exceeds the size of thin pool snapper_thinp/POOL (1.00 GiB)!
lvcreate  -V 2G -T snapper_thinp/POOL -n other1
  WARNING: Sum of all thin volume sizes (4.00 GiB) exceeds the size of thin pool snapper_thinp/POOL (1.00 GiB)!
lvcreate  -V 2G -T snapper_thinp/POOL -n other2
  WARNING: Sum of all thin volume sizes (6.00 GiB) exceeds the size of thin pool snapper_thinp/POOL (1.00 GiB)!
lvcreate  --virtualsize 2G -T snapper_thinp/POOL -n other3
  WARNING: Sum of all thin volume sizes (8.00 GiB) exceeds the size of thin pool snapper_thinp/POOL (1.00 GiB)!
lvcreate  --virtualsize 2G -T snapper_thinp/POOL -n other4
  WARNING: Sum of all thin volume sizes (10.00 GiB) exceeds the size of thin pool snapper_thinp/POOL (1.00 GiB)!
lvcreate  --virtualsize 2G -T snapper_thinp/POOL -n other5
  WARNING: Sum of all thin volume sizes (12.00 GiB) exceeds the size of thin pool snapper_thinp/POOL (1.00 GiB)!
Placing an XFS filesystem on origin volume
Making snapshot of origin volume
lvcreate  -k n -s /dev/snapper_thinp/origin -n full_snap
Mounting snapshot volume

Filling snapshot /dev/snapper_thinp/full_snap
dd if=/dev/zero of=/mnt/full_snap/fill_file count=1500 bs=1M oflag=direct
dd: error writing ‘/mnt/full_snap/fill_file’: Input/output error
1011+0 records in
1010+0 records out
1059061760 bytes (1.1 GB) copied, 82.4476 s, 12.8 MB/s

Attempt to create other virtual volumes while pool is full and in RO mode
lvcreate  -V 1G -T snapper_thinp/POOL -n other6
  WARNING: Sum of all thin volume sizes (15.00 GiB) exceeds the size of thin pool snapper_thinp/POOL (1.00 GiB)!
  Free space in pool snapper_thinp/POOL is above threshold, new volumes are not allowed.

Extend the pool to kick it back into RW mode
lvextend -L +2G snapper_thinp/POOL
  WARNING: Sum of all thin volume sizes (14.00 GiB) exceeds the size of thin pools (1.00 GiB)!
Again, attempt to create other virtual volumes while pool is full back in RW mode
lvcreate  -V 1G -T snapper_thinp/POOL -n other7
  WARNING: Sum of all thin volume sizes (15.00 GiB) exceeds the size of thin pool snapper_thinp/POOL (3.00 GiB)!
umount /mnt/full_snap
[deadlock]


[  497.883690] XFS (dm-12): Unmounting Filesystem
[  720.513158] INFO: task umount:4862 blocked for more than 120 seconds.
[  720.514625] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  720.516349] umount          D ffff88003ce55610     0  4862   4860 0x00000080
[  720.517961]  ffff88003a853d90 0000000000000086 ffff8800367cae00 ffff88003a853fd8
[  720.519741]  ffff88003a853fd8 ffff88003a853fd8 ffff8800367cae00 ffff88003ce55600
[  720.521509]  ffff880036c8ce40 ffff88003ce55640 ffff88003ce55668 ffff88003ce55610
[  720.523278] Call Trace:
[  720.523817]  [<ffffffff81638e49>] schedule+0x29/0x70
[  720.524977]  [<ffffffffa0154b61>] xfs_ail_push_all_sync+0xc1/0x110 [xfs]
[  720.526472]  [<ffffffff810a5aa0>] ? wake_up_atomic_t+0x30/0x30
[  720.527792]  [<ffffffffa013faa8>] xfs_unmountfs+0x68/0x190 [xfs]
[  720.529166]  [<ffffffffa014058b>] ? xfs_mru_cache_destroy+0x6b/0x90 [xfs]
[  720.530679]  [<ffffffffa0142962>] xfs_fs_put_super+0x32/0x90 [xfs]
[  720.532051]  [<ffffffff811df5a6>] generic_shutdown_super+0x56/0xe0
[  720.533421]  [<ffffffff811df9f7>] kill_block_super+0x27/0x70
[  720.534673]  [<ffffffff811dfd39>] deactivate_locked_super+0x49/0x60
[  720.536057]  [<ffffffff811e0336>] deactivate_super+0x46/0x60
[  720.537319]  [<ffffffff811fd205>] mntput_no_expire+0xc5/0x120
[  720.538583]  [<ffffffff811fe33f>] SyS_umount+0x9f/0x3c0
[  720.539746]  [<ffffffff81643e09>] system_call_fastpath+0x16/0x1b



Version-Release number of selected component (if applicable):
3.10.0-313.el7.x86_64

lvm2-2.02.130-2.el7    BUILT: Tue Sep 15 07:15:40 CDT 2015
lvm2-libs-2.02.130-2.el7    BUILT: Tue Sep 15 07:15:40 CDT 2015
lvm2-cluster-2.02.130-2.el7    BUILT: Tue Sep 15 07:15:40 CDT 2015
device-mapper-1.02.107-2.el7    BUILT: Tue Sep 15 07:15:40 CDT 2015
device-mapper-libs-1.02.107-2.el7    BUILT: Tue Sep 15 07:15:40 CDT 2015
device-mapper-event-1.02.107-2.el7    BUILT: Tue Sep 15 07:15:40 CDT 2015
device-mapper-event-libs-1.02.107-2.el7    BUILT: Tue Sep 15 07:15:40 CDT 2015
device-mapper-persistent-data-0.5.5-1.el7    BUILT: Thu Aug 13 09:58:10 CDT 2015
cmirror-2.02.130-2.el7    BUILT: Tue Sep 15 07:15:40 CDT 2015
sanlock-3.2.4-1.el7    BUILT: Fri Jun 19 12:48:49 CDT 2015
sanlock-lib-3.2.4-1.el7    BUILT: Fri Jun 19 12:48:49 CDT 2015
lvm2-lockd-2.02.130-2.el7    BUILT: Tue Sep 15 07:15:40 CDT 2015

How reproducible:
Every time

Comment 1 Corey Marthaler 2015-09-28 23:13:55 UTC
Created attachment 1078082: dmesg

Comment 3 Carlos Maiolino 2016-03-14 13:25:58 UTC
Until now, XFS hasn't been able to gracefully handle thin-provisioned volumes, and can't react appropriately to errors reported by dm-thin.

In the case reported here, there was not enough free space in the thin volume for XFS to write back the changes in its AIL, and XFS currently keeps retrying that writeback forever.

There is a patchset from Dave to fix this problem, already being discussed upstream.

Comment 4 Carlos Maiolino 2016-04-28 11:09:07 UTC
*** Bug 1326954 has been marked as a duplicate of this bug. ***

Comment 5 Carlos Maiolino 2016-04-28 11:17:46 UTC
*** Bug 1318400 has been marked as a duplicate of this bug. ***

Comment 12 Rafael Aquini 2016-06-08 23:48:59 UTC
Patch(es) committed to the kernel repository; an interim kernel build is undergoing testing

Comment 14 Rafael Aquini 2016-06-10 14:49:59 UTC
Patch(es) available in kernel-3.10.0-428.el7

Comment 26 Eric Sandeen 2016-07-21 23:29:19 UTC
*** Bug 1103792 has been marked as a duplicate of this bug. ***

Comment 28 errata-xmlrpc 2016-11-03 12:33:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2574.html

Comment 29 Corey Marthaler 2017-05-31 15:35:01 UTC
I'm still seeing the same test case I used in comment #0 fail in exactly the same way. Was something else fixed/verified? Is the following test case still expected to fail, or should this no longer deadlock?


3.10.0-672.el7.x86_64

lvm2-2.02.171-3.el7    BUILT: Wed May 31 08:36:29 CDT 2017
lvm2-libs-2.02.171-3.el7    BUILT: Wed May 31 08:36:29 CDT 2017
lvm2-cluster-2.02.171-3.el7    BUILT: Wed May 31 08:36:29 CDT 2017
device-mapper-1.02.140-3.el7    BUILT: Wed May 31 08:36:29 CDT 2017
device-mapper-libs-1.02.140-3.el7    BUILT: Wed May 31 08:36:29 CDT 2017
device-mapper-event-1.02.140-3.el7    BUILT: Wed May 31 08:36:29 CDT 2017
device-mapper-event-libs-1.02.140-3.el7    BUILT: Wed May 31 08:36:29 CDT 2017
device-mapper-persistent-data-0.7.0-0.1.rc6.el7    BUILT: Mon Mar 27 10:15:46 CDT 2017


SCENARIO - [full_XFS_thin_snap_verification_threshold_off]
Verify full snapshots on top of XFS can be unmounted and removed
Making pool volume
lvcreate  --thinpool POOL -L 1G --profile thin-performance --zero y --poolmetadatasize 4M snapper_thinp

Sanity checking pool device (POOL) metadata
thin_check /dev/mapper/snapper_thinp-meta_swap.547
examining superblock
examining devices tree
examining mapping tree
checking space map counts


Making origin volume
lvcreate  --virtualsize 2G -T snapper_thinp/POOL -n origin
  WARNING: Sum of all thin volume sizes (2.00 GiB) exceeds the size of thin pool snapper_thinp/POOL (1.00 GiB)!
lvcreate  -V 2G -T snapper_thinp/POOL -n other1
  WARNING: Sum of all thin volume sizes (4.00 GiB) exceeds the size of thin pool snapper_thinp/POOL (1.00 GiB)!
lvcreate  --virtualsize 2G -T snapper_thinp/POOL -n other2
  WARNING: Sum of all thin volume sizes (6.00 GiB) exceeds the size of thin pool snapper_thinp/POOL (1.00 GiB)!
lvcreate  -V 2G -T snapper_thinp/POOL -n other3
  WARNING: Sum of all thin volume sizes (8.00 GiB) exceeds the size of thin pool snapper_thinp/POOL (1.00 GiB)!
lvcreate  -V 2G -T snapper_thinp/POOL -n other4
  WARNING: Sum of all thin volume sizes (10.00 GiB) exceeds the size of thin pool snapper_thinp/POOL (1.00 GiB)!
lvcreate  -V 2G -T snapper_thinp/POOL -n other5
  WARNING: Sum of all thin volume sizes (12.00 GiB) exceeds the size of thin pool snapper_thinp/POOL (1.00 GiB)!
Placing an XFS filesystem on origin volume
Making snapshot of origin volume

lvcreate  -k n -s /dev/snapper_thinp/origin -n full_snap
Mounting snapshot volume

Filling snapshot /dev/snapper_thinp/full_snap
dd if=/dev/zero of=/mnt/full_snap/fill_file count=1500 bs=1M oflag=direct
dd: error writing ‘/mnt/full_snap/fill_file’: Input/output error
1012+0 records in
1011+0 records out
1060110336 bytes (1.1 GB) copied, 70.604 s, 15.0 MB/s

Attempt to create other virtual volumes while pool is full and in RO mode
lvcreate  -V 1G -T snapper_thinp/POOL -n other6
  Cannot create new thin volume, free space in thin pool snapper_thinp/POOL reached threshold.
Extend the pool to kick it back into RW mode
lvextend -L +2G snapper_thinp/POOL
  WARNING: Sum of all thin volume sizes (14.00 GiB) exceeds the size of thin pools (3.00 GiB)!
Again, attempt to create other virtual volumes while pool is full back in RW mode
lvcreate  -V 1G -T snapper_thinp/POOL -n other7
  WARNING: Sum of all thin volume sizes (15.00 GiB) exceeds the size of thin pool snapper_thinp/POOL (3.00 GiB)!

Attempt umount...
[ deadlock ]

[root@host-073 ~]# lvs -a -o +devices
  LV              VG            Attr       LSize   Pool Origin Data%  Meta%  Devices
  POOL            snapper_thinp twi-aotz--   3.00g             33.34  14.55  POOL_tdata(0)
  [POOL_tdata]    snapper_thinp Twi-ao----   3.00g                           /dev/sdc1(1)
  [POOL_tmeta]    snapper_thinp ewi-ao----   4.00m                           /dev/sde1(0)
  full_snap       snapper_thinp Vwi-aotz--   2.00g POOL origin 49.90
  [lvol0_pmspare] snapper_thinp ewi-------   4.00m                           /dev/sdc1(0)
  origin          snapper_thinp Vwi-a-tz--   2.00g POOL        0.52
  other1          snapper_thinp Vwi-a-tz--   2.00g POOL        0.00
  other2          snapper_thinp Vwi-a-tz--   2.00g POOL        0.00
  other3          snapper_thinp Vwi-a-tz--   2.00g POOL        0.00
  other4          snapper_thinp Vwi-a-tz--   2.00g POOL        0.00
  other5          snapper_thinp Vwi-a-tz--   2.00g POOL        0.00
  other7          snapper_thinp Vwi-a-tz--   1.00g POOL        0.00


May 31 10:21:10 host-073 kernel: XFS (dm-12): Unmounting Filesystem
May 31 10:22:30 host-073 systemd: Starting Cleanup of Temporary Directories...
May 31 10:22:30 host-073 systemd: Started Cleanup of Temporary Directories.
May 31 10:25:08 host-073 kernel: INFO: task umount:4181 blocked for more than 120 seconds.
May 31 10:25:08 host-073 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 31 10:25:08 host-073 kernel: umount          D ffff95ec3bb7f310     0  4181   4180 0x00000080
May 31 10:25:08 host-073 kernel: ffff95ec3baefd78 0000000000000082 ffff95ec369e6dd0 ffff95ec3baeffd8
May 31 10:25:08 host-073 kernel: ffff95ec3baeffd8 ffff95ec3baeffd8 ffff95ec369e6dd0 ffff95ec3bb7f300
May 31 10:25:08 host-073 kernel: ffff95ec3bb7f340 ffff95ec26668460 ffff95ec3bb7f368 ffff95ec3bb7f310
May 31 10:25:08 host-073 kernel: Call Trace:
May 31 10:25:08 host-073 kernel: [<ffffffff930a5709>] schedule+0x29/0x70
May 31 10:25:08 host-073 kernel: [<ffffffffc041a2ba>] xfs_ail_push_all_sync+0xba/0x110 [xfs]
May 31 10:25:08 host-073 kernel: [<ffffffff92ab18e0>] ? wake_up_atomic_t+0x30/0x30
May 31 10:25:08 host-073 kernel: [<ffffffffc04032b1>] xfs_unmountfs+0x71/0x1c0 [xfs]
May 31 10:25:08 host-073 kernel: [<ffffffffc0403dbd>] ? xfs_mru_cache_destroy+0x6d/0xa0 [xfs]
May 31 10:25:08 host-073 kernel: [<ffffffffc0405e62>] xfs_fs_put_super+0x32/0x90 [xfs]
May 31 10:25:08 host-073 kernel: [<ffffffff92c029b2>] generic_shutdown_super+0x72/0x100
May 31 10:25:08 host-073 kernel: [<ffffffff92c02df7>] kill_block_super+0x27/0x70
May 31 10:25:08 host-073 kernel: [<ffffffff92c03139>] deactivate_locked_super+0x49/0x60
May 31 10:25:08 host-073 kernel: [<ffffffff92c03746>] deactivate_super+0x46/0x60
May 31 10:25:08 host-073 kernel: [<ffffffff92c2064f>] cleanup_mnt+0x3f/0x80
May 31 10:25:08 host-073 kernel: [<ffffffff92c206e2>] __cleanup_mnt+0x12/0x20
May 31 10:25:08 host-073 kernel: [<ffffffff92aad1a5>] task_work_run+0xc5/0xf0
May 31 10:25:08 host-073 kernel: [<ffffffff92a2ab62>] do_notify_resume+0x92/0xb0
May 31 10:25:08 host-073 kernel: [<ffffffff930b14bd>] int_signal+0x12/0x17
May 31 10:27:08 host-073 kernel: INFO: task umount:4181 blocked for more than 120 seconds.
May 31 10:27:08 host-073 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 31 10:27:08 host-073 kernel: umount          D ffff95ec3bb7f310     0  4181   4180 0x00000080
May 31 10:27:08 host-073 kernel: ffff95ec3baefd78 0000000000000082 ffff95ec369e6dd0 ffff95ec3baeffd8
May 31 10:27:08 host-073 kernel: ffff95ec3baeffd8 ffff95ec3baeffd8 ffff95ec369e6dd0 ffff95ec3bb7f300
May 31 10:27:08 host-073 kernel: ffff95ec3bb7f340 ffff95ec26668460 ffff95ec3bb7f368 ffff95ec3bb7f310
May 31 10:27:08 host-073 kernel: Call Trace:
May 31 10:27:08 host-073 kernel: [<ffffffff930a5709>] schedule+0x29/0x70
May 31 10:27:08 host-073 kernel: [<ffffffffc041a2ba>] xfs_ail_push_all_sync+0xba/0x110 [xfs]
May 31 10:27:08 host-073 kernel: [<ffffffff92ab18e0>] ? wake_up_atomic_t+0x30/0x30
May 31 10:27:08 host-073 kernel: [<ffffffffc04032b1>] xfs_unmountfs+0x71/0x1c0 [xfs]
May 31 10:27:08 host-073 kernel: [<ffffffffc0403dbd>] ? xfs_mru_cache_destroy+0x6d/0xa0 [xfs]
May 31 10:27:08 host-073 kernel: [<ffffffffc0405e62>] xfs_fs_put_super+0x32/0x90 [xfs]
May 31 10:27:08 host-073 kernel: [<ffffffff92c029b2>] generic_shutdown_super+0x72/0x100
May 31 10:27:08 host-073 kernel: [<ffffffff92c02df7>] kill_block_super+0x27/0x70
May 31 10:27:08 host-073 kernel: [<ffffffff92c03139>] deactivate_locked_super+0x49/0x60
May 31 10:27:08 host-073 kernel: [<ffffffff92c03746>] deactivate_super+0x46/0x60
May 31 10:27:08 host-073 kernel: [<ffffffff92c2064f>] cleanup_mnt+0x3f/0x80
May 31 10:27:08 host-073 kernel: [<ffffffff92c206e2>] __cleanup_mnt+0x12/0x20
May 31 10:27:08 host-073 kernel: [<ffffffff92aad1a5>] task_work_run+0xc5/0xf0
May 31 10:27:08 host-073 kernel: [<ffffffff92a2ab62>] do_notify_resume+0x92/0xb0
May 31 10:27:08 host-073 kernel: [<ffffffff930b14bd>] int_signal+0x12/0x17

Comment 30 Corey Marthaler 2017-05-31 22:39:09 UTC
I tried this test case with /sys/fs/xfs/$dm/error/fail_at_unmount set to both 1 and 0, and it deadlocked both times. Am I missing how to properly set fail_at_unmount=1 as mentioned in comment #23? Or should a 7.4 version of this bug be filed?

Comment 31 Eric Sandeen 2017-05-31 23:00:16 UTC
Hi Corey - I think Carlos is working on some remaining kinks in the error handling - I'll let him chime in on this one...

Comment 32 Carlos Maiolino 2017-06-01 09:16:45 UTC
Hi Corey,

Please look at BZ 1352385.

This BZ was used for the implementation of the error handlers. The unmount hang is actually just a symptom of the real problem; BZ 1352385 has been created to track that, although most of the discussion is happening upstream.