Bug 852812

Summary: Automatic umount (in theory) of full thinp snapshots (in one of the many cases that exist)
Product: Red Hat Enterprise Linux 6 Reporter: Corey Marthaler <cmarthal>
Component: lvm2 Assignee: Zdenek Kabelac <zkabelac>
Status: CLOSED ERRATA QA Contact: Cluster QE <mspqa-list>
Severity: high Docs Contact:
Priority: high    
Version: 6.3 CC: agk, coughlan, ddumas, dwysocha, heinzm, jbrassow, msnitzer, prajnoha, prockai, thornber, zkabelac
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: lvm2-2.02.98-1.el6 Doc Type: Known Issue
Doc Text:
When a thin pool is filled to 100% by writing to a thin volume device, access to all thin volumes using this thin pool can become blocked. To prevent this, try not to overfill the pool. If the pool is overfilled and this error occurs, extend the thin pool with new space to continue using the pool.
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-02-21 08:13:16 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Corey Marthaler 2012-08-29 16:26:17 UTC
Description of problem:
This functionality is already available and tested for normal snapshot volumes (bug 189462). It should also be applied to thinp snapshots.


SCENARIO - [full_XFS_snap_verification]
Verify full snapshots on top of XFS are automatically unmounted and can be removed
Making origin volume
Creating thinpool and corresponding thin origin volume
lvcreate --thinpool POOL -L 300M snapper_thinp
lvcreate --virtualsize 1G --thinpool snapper_thinp/POOL -n origin
Placing an XFS filesystem on origin volume
Making snapshot of origin volume
lvcreate -s /dev/snapper_thinp/origin -n full_snap
Mounting snapshot volume

Filling snapshot /dev/snapper_thinp/full_snap
110000+0 records in
110000+0 records out
56320000 bytes (56 MB) copied, 21.5313 s, 2.6 MB/s
Verify snapshot was auto-unmounted due to corruption
snapshot was not auto-unmounted


Version-Release number of selected component (if applicable):
2.6.32-279.el6.x86_64

lvm2-2.02.95-10.el6    BUILT: Fri May 18 03:26:00 CDT 2012
lvm2-libs-2.02.95-10.el6    BUILT: Fri May 18 03:26:00 CDT 2012
lvm2-cluster-2.02.95-10.el6    BUILT: Fri May 18 03:26:00 CDT 2012
udev-147-2.41.el6    BUILT: Thu Mar  1 13:01:08 CST 2012
device-mapper-1.02.74-10.el6    BUILT: Fri May 18 03:26:00 CDT 2012
device-mapper-libs-1.02.74-10.el6    BUILT: Fri May 18 03:26:00 CDT 2012
device-mapper-event-1.02.74-10.el6    BUILT: Fri May 18 03:26:00 CDT 2012
device-mapper-event-libs-1.02.74-10.el6    BUILT: Fri May 18 03:26:00 CDT 2012
cmirror-2.02.95-10.el6    BUILT: Fri May 18 03:26:00 CDT 2012


How reproducible:
Every time

Comment 1 Zdenek Kabelac 2012-08-29 17:43:49 UTC
The problem here is that thin volumes/snapshots are different: while an old-style snapshot was a separate device with its own life, overfilling a thin snapshot currently kills all related thin volumes. Thus, leaving enough free space into which the thin pool can grow via the dmeventd lvextend policy is the key element here. We need to decide precisely what should happen in several scenarios.

We can end up with either a full data device and/or a full metadata device.
All thin volumes from one thin pool share the same space in those devices.

If we want to safely unmount the device, we need to try unmounting it well before 100% fullness is reported (so that all metadata and data can still be stored).
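
A quick way to watch how close the shared data and metadata devices are to being full is sketched below (field names such as metadata_percent may differ between lvm2 versions, so treat this only as an illustration):

  lvs -a -o name,lv_size,data_percent,metadata_percent snapper_thinp
  dmsetup status snapper_thinp-POOL-tpool    # raw used/total block counts for metadata and data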

Comment 2 Zdenek Kabelac 2012-10-15 17:59:43 UTC
The upstream solution for 6.4 went into the 2.02.98 release.

https://www.redhat.com/archives/lvm-devel/2012-October/msg00126.html

The system detects when the thin pool usage goes above the lvm.conf value:

activation/thin_pool_autoextend_threshold

When lvextend --use-policies fails to extend the pool LV (which can now also be tested by setting the lvm.conf value

activation/thin_pool_autoextend_percent

to 0, since an extension by 0% returns an error), dmeventd will try to run:

umount -fl

on all thin volumes using this pool.
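
For reference, a minimal lvm.conf sketch of these settings (the values here are examples only, not recommendations):

  activation {
      # Try to extend the thin pool once its usage crosses this percentage.
      thin_pool_autoextend_threshold = 70
      # How much to grow the pool by on each autoextend; setting this to 0
      # makes the policy-driven extension fail, which triggers the forced umount.
      thin_pool_autoextend_percent = 20
  }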

For now this check is based only on the dmeventd timer, so it may take up to 10 seconds to observe the umount reaction.

A new bugzilla for 6.5 will be opened to cover further fixes for related problems.

Comment 4 Zdenek Kabelac 2012-10-16 13:15:07 UTC
Bug 866971 should take care of the lvm2 thin pool policies.

Comment 5 Corey Marthaler 2012-12-04 00:11:25 UTC
It's now impossible to fill a thinp snapshot. So was this part of the fix? Or is there no longer a need to auto-unmount a "full" thin snapshot if filling it is no longer possible?

SCENARIO - [full_EXT_snap_verification]
Verify full snapshots on top of EXT are automatically unmounted and can be removed
Making origin volume
Creating thinpool and corresponding thin virtual volumes (one to be used as an origin)
lvcreate --thinpool POOL -L 1G snapper_thinp
lvcreate --virtualsize 800M --thinpool snapper_thinp/POOL -n origin
Placing an EXT filesystem on origin volume
mke2fs 1.41.12 (17-May-2010)
Making snapshot of origin volume
lvcreate -s /dev/snapper_thinp/origin -n full_snap
Mounting snapshot volume

Filling snapshot /dev/snapper_thinp/full_snap
dd if=/dev/zero of=/mnt/full_snap/fill_file count=801 bs=1M oflag=direct
dd: writing `/mnt/full_snap/fill_file': No space left on device
786+0 records in
785+0 records out
823132160 bytes (823 MB) copied, 16.042 s, 51.3 MB/s
snapshot didn't fill completely up

[root@taft-01 ~]# lvs -a -o +devices
  LV           VG            Attr      LSize   Pool Origin Data%  Devices        
  POOL         snapper_thinp twi-a-tz-   1.00g              78.19 POOL_tdata(0)             
  [POOL_tdata] snapper_thinp Twi-aot--   1.00g                    /dev/sdg1(0)              
  [POOL_tmeta] snapper_thinp ewi-aot--   4.00m                    /dev/sdc1(0)              
  full_snap    snapper_thinp Vwi-aotz- 800.00m POOL origin  99.95                                          
  origin       snapper_thinp Vwi-a-tz- 800.00m POOL          1.65                                          


[root@taft-01 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/snapper_thinp-full_snap
                      788M  787M     0 100% /mnt/full_snap

Comment 6 Zdenek Kabelac 2013-01-22 13:54:02 UTC
From the lvs output:

The pool size is 1G, while the origin and its snapshot are only 800M each.

The origin has provisioned 1.65% of its 800M (~13.2MB).
Its snapshot, after the 'dd', has provisioned 99.95% of its 800M (~799.6MB).

This all fits within the pool, which is now 78.19% used (~800.7MB of 1024MB).
The two volumes together account for ~13.2MB + ~799.6MB = ~812.8MB, so since
the pool reports only ~800.7MB consumed, around 12MB of blocks are shared.

So to test filling the snapshot (which is now limited only by the thin pool size), you need to use a smaller pool size, and there must be no free space for pool extension, or extension must be disabled, so that you actually overfill the thin pool. A sketch of such a setup follows below.
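
A minimal sketch of such an overfill setup, assuming pool autoextension cannot succeed (no free space left in the VG, or activation/thin_pool_autoextend_percent set to 0 as described in comment 2); the pool is deliberately smaller than the thin volume so the pool fills before the filesystem does:

  lvcreate --thinpool POOL -L 200M snapper_thinp
  lvcreate --virtualsize 800M --thinpool snapper_thinp/POOL -n origin
  mkfs.ext4 /dev/snapper_thinp/origin
  lvcreate -s /dev/snapper_thinp/origin -n full_snap
  mkdir -p /mnt/full_snap
  mount /dev/snapper_thinp/full_snap /mnt/full_snap
  # Write ~300M: more than the 200M pool, less than the 800M filesystem.
  dd if=/dev/zero of=/mnt/full_snap/fill_file count=300 bs=1M oflag=direct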

(This is where thin snapshots and old-style snapshots differ significantly:
only the thin pool size now limits all snapshots inside the pool. This
may change with the 'external-origin' snapshot planned for 6.5.)

This, however, leads to a whole new area of problems with pool recovery
after the pool has been overfilled.

Comment 7 Corey Marthaler 2013-01-25 20:50:37 UTC
I must still be missing something. Just like in comment #5, I can fill the file system but cannot completely fill the actual snapshot volume, and again, the full file system is not auto-unmounted.

Is there actually a devel-tested scenario where a full thinp snapshot volume is auto-unmounted? If so, please post the results.


SCENARIO - [full_EXT_snap_verification]
Verify full snapshots on top of EXT are automatically unmounted and can be removed
Making origin volume
Creating thinpool and corresponding thin virtual volumes (one to be used as an origin)
lvcreate --thinpool POOL -L 800M snapper_thinp
lvcreate --virtualsize 800M --thinpool snapper_thinp/POOL -n origin
lvcreate --virtualsize 800M --thinpool snapper_thinp/POOL -n other1
lvcreate --virtualsize 800M --thinpool snapper_thinp/POOL -n other2
lvcreate --virtualsize 800M --thinpool snapper_thinp/POOL -n other3
lvcreate --virtualsize 800M --thinpool snapper_thinp/POOL -n other4
lvcreate --virtualsize 800M --thinpool snapper_thinp/POOL -n other5
Placing an EXT filesystem on origin volume
mke2fs 1.41.12 (17-May-2010)
Making snapshot of origin volume
lvcreate -s /dev/snapper_thinp/origin -n full_snap
Mounting snapshot volume

Filling snapshot /dev/snapper_thinp/full_snap
dd if=/dev/zero of=/mnt/full_snap/fill_file count=850 bs=1M oflag=direct

[root@hayes-01 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/snapper_thinp-full_snap
                      788M  788M     0 100% /mnt/full_snap

[root@hayes-01 ~]# lvs -a -o +devices
 LV           Attr      LSize   Pool Origin Data%  Devices
 POOL         twi-a-tz- 800.00m             100.00 POOL_tdata(0)
 [POOL_tdata] Twi-aot-- 800.00m                    /dev/etherd/e1.1p2(0)
 [POOL_tmeta] ewi-aot--   4.00m                    /dev/etherd/e1.1p1(0)
 full_snap    Vwi-aotz- 800.00m POOL origin  99.90
 origin       Vwi-a-tz- 800.00m POOL          1.65
 other1       Vwi-a-tz- 800.00m POOL          0.00
 other2       Vwi-a-tz- 800.00m POOL          0.00
 other3       Vwi-a-tz- 800.00m POOL          0.00
 other4       Vwi-a-tz- 800.00m POOL          0.00
 other5       Vwi-a-tz- 800.00m POOL          0.00

Jan 25 14:19:46 hayes-01 lvm[2516]: Thin snapper_thinp-POOL-tpool is now 80% full.
Jan 25 14:20:55 hayes-01 lvm[2516]: Thin snapper_thinp-POOL-tpool is now 85% full.
Jan 25 14:22:05 hayes-01 lvm[2516]: Thin snapper_thinp-POOL-tpool is now 90% full.
Jan 25 14:23:35 hayes-01 lvm[2516]: Thin snapper_thinp-POOL-tpool is now 95% full.
Jan 25 14:24:31 hayes-01 kernel: device-mapper: thin: 253:5: reached low water mark, sending event.
Jan 25 14:24:31 hayes-01 lvm[2516]: Thin snapper_thinp-POOL-tpool is now 100% full.
Jan 25 14:24:31 hayes-01 kernel: device-mapper: thin: 253:5: no free space available.
Jan 25 14:27:09 hayes-01 kernel: INFO: task dd:2618 blocked for more than 120 seconds.
Jan 25 14:27:09 hayes-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 25 14:27:09 hayes-01 kernel: dd            D 0000000000000002     0  2618   2617 0x00000080
Jan 25 14:27:09 hayes-01 kernel: ffff88011d54fa78 0000000000000082 0000000000000000 ffffffffa00043ec
Jan 25 14:27:09 hayes-01 kernel: ffff88011d54fa48 00000000ed7583d7 0000000000000000 ffff88011cebb540
Jan 25 14:27:09 hayes-01 kernel: ffff88011c3165f8 ffff88011d54ffd8 000000000000fb88 ffff88011c3165f8
Jan 25 14:27:09 hayes-01 kernel: Call Trace:
Jan 25 14:27:09 hayes-01 kernel: [<ffffffffa00043ec>] ? dm_table_unplug_all+0x5c/0x100 [dm_mod]
Jan 25 14:27:09 hayes-01 kernel: [<ffffffff8150d9c3>] io_schedule+0x73/0xc0
Jan 25 14:27:09 hayes-01 kernel: [<ffffffff811be84e>] __blockdev_direct_IO_newtrunc+0x6de/0xb30
Jan 25 14:27:09 hayes-01 kernel: [<ffffffff811becfe>] __blockdev_direct_IO+0x5e/0xd0
Jan 25 14:27:09 hayes-01 kernel: [<ffffffffa047c380>] ? ext2_get_block+0x0/0x50 [ext2]
Jan 25 14:27:09 hayes-01 kernel: [<ffffffffa047afde>] ext2_direct_IO+0x5e/0x60 [ext2]
Jan 25 14:27:09 hayes-01 kernel: [<ffffffffa047c380>] ? ext2_get_block+0x0/0x50 [ext2]
Jan 25 14:27:09 hayes-01 kernel: [<ffffffff8111a8f2>] generic_file_direct_write+0xc2/0x190
Jan 25 14:27:09 hayes-01 kernel: [<ffffffff8111c221>] __generic_file_aio_write+0x3a1/0x490
Jan 25 14:27:09 hayes-01 kernel: [<ffffffff810572f0>] ? __dequeue_entity+0x30/0x50
Jan 25 14:27:09 hayes-01 kernel: [<ffffffff8111c398>] generic_file_aio_write+0x88/0x100
Jan 25 14:27:09 hayes-01 kernel: [<ffffffff81180baa>] do_sync_write+0xfa/0x140
Jan 25 14:27:09 hayes-01 kernel: [<ffffffff81510025>] ? page_fault+0x25/0x30
Jan 25 14:27:09 hayes-01 kernel: [<ffffffff81096c80>] ? autoremove_wake_function+0x0/0x40
Jan 25 14:27:09 hayes-01 kernel: [<ffffffff81283512>] ? __clear_user+0x42/0x70
Jan 25 14:27:09 hayes-01 kernel: [<ffffffff81228a6b>] ? selinux_file_permission+0xfb/0x150
Jan 25 14:27:09 hayes-01 kernel: [<ffffffff8121b946>] ? security_file_permission+0x16/0x20
Jan 25 14:27:09 hayes-01 kernel: [<ffffffff81180ea8>] vfs_write+0xb8/0x1a0
Jan 25 14:27:09 hayes-01 kernel: [<ffffffff811821ab>] ? fget_light+0x3b/0x90
Jan 25 14:27:09 hayes-01 kernel: [<ffffffff811817a1>] sys_write+0x51/0x90
Jan 25 14:27:09 hayes-01 kernel: [<ffffffff810dc565>] ? __audit_syscall_exit+0x265/0x290
Jan 25 14:27:09 hayes-01 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b

Comment 8 Zdenek Kabelac 2013-01-28 08:06:04 UTC
Unmounting of thin volumes only works for the case where an attempt is made to extend the pool and that extension fails. If writes to the thin pool are very fast and the pool is small, there can be a latency problem in getting any reaction before the thin pool becomes 100% full. At that moment the device driver is stuck and expects more space to be added to the pool so that in-flight operations can finish (since a snapshot is the same as a normal thin volume, this also applies to regular writes to a single thin volume in the thin pool).

So the user should ensure the pool does not become 100% full; if it does, extra space needs to be added to the VG (vgextend) and the pool must be extended (lvextend) to unblock the blocked pool. A recovery sketch follows below.
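
A minimal recovery sketch, assuming a spare PV is available to add to the VG (/dev/sdh1 here is only a placeholder):

  vgextend snapper_thinp /dev/sdh1        # add free space to the volume group
  lvextend -L +200M snapper_thinp/POOL    # grow the full thin pool so blocked I/O can complete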

Comment 18 errata-xmlrpc 2013-02-21 08:13:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0501.html