Bug 1121736 - docker should clear the 'needs-check' flag on a thin-pool transparently.
Summary: docker should clear the 'needs-check' flag on a thin-pool transparently.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: docker
Version: 7.0
Hardware: All
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Assignee: Vivek Goyal
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On: 1165246
Blocks: 1070851 1113141 1133060
 
Reported: 2014-07-21 18:29 UTC by Karl Hastings
Modified: 2019-04-16 14:14 UTC
CC List: 9 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1165246 (view as bug list)
Environment:
Last Closed: 2015-03-18 21:52:01 UTC


Attachments
[PATCH] devmapper: Add option for specifying an lvm2 created thin-pool device (3.79 KB, patch)
2014-10-25 00:49 UTC, Mike Snitzer

Description Karl Hastings 2014-07-21 18:29:31 UTC
Description of problem:
If any I/O issues occur with docker's thin-pool, the kernel will flag it as 'needs-check' and set the pool read-only.

Docker's error messages aren't very clear as to what the issue is here, and it seems that lvm2 is able to handle this situation transparently.

Version-Release number of selected component (if applicable):
docker-0.11.1-19.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. While docker is running fill up the /var partition
2. writes to the thin-pool will error, and the thin-pool will get flagged read-only
3. free up space on /var and restart the docker daemon

Actual results:
thin-pool is still flagged as read-only

Expected results:
needs-check flag should be cleared transparently and the thin pool should be read/write.

Additional info:
# docker run rhel7 echo "hello world"
2014/07/09 20:20:54 Error: Error running DeviceCreate (createSnapDevice) dm_task_run failed
# dmsetup status
docker-253:3-276151140-pool: 0 209715200 thin-pool 474 1973/524288 144182/1638400 - ro discard_passdown queue_if_no_space
# systemctl stop docker.service
# thin_check --clear-needs-check-flag /var/lib/docker/devicemapper/devicemapper/metadata
examining superblock
examining devices tree
examining mapping tree
# systemctl start docker.service
# dmsetup status
docker-253:1-33555176-pool: 0 209715200 thin-pool 34 279/524288 7495/1638400 - rw discard_passdown queue_if_no_space 
# docker run rhel7 echo "hello world"
hello world

Comment 2 Daniel Walsh 2014-09-12 19:21:45 UTC
Can you verify the failure still exists with docker-1.2?

Comment 8 Daniel Walsh 2014-09-17 17:59:07 UTC
Mike any chance you could submit a patch?

Comment 9 Mike Snitzer 2014-09-17 22:50:13 UTC
(In reply to Daniel Walsh from comment #8)
> Mike any chance you could submit a patch?

As much as I'd love to see this issue resolved ASAP I unfortunately cannot make implementing a fix for this BZ a priority.  Working on Docker (with the Go language in particular) is a pretty awful context switch from my primary duties as upstream Device Mapper maintainer -- especially with the Linux 3.18 merge window approaching.

If Alex cannot handle this then I think we _really_ need to hire another Docker developer with storage expertise.

Comment 10 Matthias Clasen 2014-09-30 14:25:58 UTC
moving docker bugs off alexl

Comment 11 Daniel Walsh 2014-10-24 17:59:11 UTC
Michal want to take a stab at this one?

Comment 12 Mike Snitzer 2014-10-24 22:37:09 UTC
In speaking with Tom Coughlan about this issue I arrived at the idea of using a hybrid model for how to support thin-provisioning in docker.  The primary goals being:
1) leverage all the enterprise-oriented thin-pool functionality lvm2 provides, e.g.:
   - automatic monitoring and resizing of the thin-pool's metadata and data areas
   - automatic checking of the metadata and repair if the needs_check flag needs clearing
2) allow docker to continue to create/destroy dm thin LVs and snapshot thin LVs like it already does -- translation: minimalist changes to docker

So the idea I hashed out while discussing with Tom is that:
1) lvm2 would be used to create the thin-pool
2) a new docker --storage-opt dm.thinpooldev=<device> will be added to hand the lvm2 created thin-pool to docker for its exclusive use

This allows for the best of both worlds: a feature rich thin-pool management/monitoring interface and a docker that doesn't need an intrusive dm-thin-provisioning focused re-write to use lvm2.

I'll see if I can implement the small docker patch needed to realize this vision as a POC.  And will defer to others on hardening the docker code in this area (e.g. make sure docker has negative checks for conflicting --storage-opt args, etc).

Comment 13 Mike Snitzer 2014-10-25 00:47:35 UTC
// show existing data and metadata LVs in the 'thin' volume group:

# lvs thin
  LV       VG   Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  metadata thin -wi-------  16.00g                                                    
  pool     thin -wi------- 100.00g                                                    

// create the lvm2 thin-pool using the existing LVs:

# lvconvert --chunksize 64K --thinpool thin/pool --poolmetadata thin/metadata
  WARNING: Converting logical volume thin/pool and thin/metadata to pool's data and metadata volumes.
  THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)
Do you really want to convert thin/pool and thin/metadata? [y/n]: y
  Logical volume "lvol0" created
  Converted thin/pool to thin pool.

// activate the pool:

# vgchange -ay thin
Oct 24 20:18:44 rhel-storage-02 systemd: Starting Device-mapper event daemon...
Oct 24 20:18:44 rhel-storage-02 systemd: Started Device-mapper event daemon.
Oct 24 20:18:44 rhel-storage-02 dmeventd[27164]: dmeventd ready for processing.
  1 logical volume(s) in volume group "thin" now active
Oct 24 20:18:44 rhel-storage-02 lvm[27164]: Monitoring thin thin-pool-tpool.

# dmsetup table thin-pool-tpool
0 209715200 thin-pool 253:2 253:3 128 0 0 

// get docker setup to use this lvm2 created pool:

Modify /usr/lib/systemd/system/docker.service's ExecStart:

ExecStart=/usr/bin/docker -d --selinux-enabled -H fd:// --storage-opt dm.basesize=512G --storage-opt dm.thinpooldev=thin-pool-tpool --storage-opt dm.fs=xfs

# systemctl daemon-reload
# systemctl start docker

// docker created the base thin volume:

# dmsetup table --target thin-pool
thin-pool-tpool: 0 209715200 thin-pool 253:2 253:3 128 0 0 
# dmsetup table --target thin
docker-253:1-1476395215-base: 0 1073741824 thin 253:7 0

# dmsetup ls --tree
docker-253:1-1476395215-base (253:9)
 └─thin-pool-tpool (253:7)
    ├─thin-pool_tdata (253:3)
    │  └─ (8:16)
    └─thin-pool_tmeta (253:2)
       └─ (8:16)

// Looks like everything worked... here are some tests:

# docker run busybox df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/docker-253:1-1476395215-3f68936c5292009e9eba2cdb6eda87607156b82540901a2265eccf706b89f57b
                     536607744     36172 536571572   0% /
tmpfs                  6065208         0   6065208   0% /dev
shm                      65536         0     65536   0% /dev/shm
/dev/mapper/rhel_rhel--storage--02-root
                     366976000  41656580 325319420  11% /etc/resolv.conf
/dev/mapper/rhel_rhel--storage--02-root
                     366976000  41656580 325319420  11% /etc/hostname
/dev/mapper/rhel_rhel--storage--02-root
                     366976000  41656580 325319420  11% /etc/hosts
tmpfs                  6065208        16   6065192   0% /run/secrets
tmpfs                  6065208         0   6065208   0% /proc/kcore


// Here I'm loading a custom container image I got from jeder
# cat r7perf_150.img | docker load
# docker run -d r7perf_150 /root/read.sh 1GB
ac71431c2c26bf3ebfe8456e9790d4dcce6edf8a3a2c81fee65d678a71609ad3
# docker run -d r7perf_150 /root/read.sh 1GB
adf8b8816221db1a215e735a75883e6953a7d174f4ba0fb6a3c50693dc65b784
# docker run -d r7perf_150 /root/read.sh 1GB
b9d1dddac0e12fa2db1db7e5d094e00a7dff1b6945217845dcda12024bc2578b

// you can clearly see that docker is creating the devices for the containers from lvm2-created thin-pool:

# dmsetup table --target thin-pool
thin-pool-tpool: 0 209715200 thin-pool 253:2 253:3 128 0 0 
# dmsetup table --target thin
docker-253:1-1476395215-adf8b8816221db1a215e735a75883e6953a7d174f4ba0fb6a3c50693dc65b784: 0 1073741824 thin 253:7 30
docker-253:1-1476395215-ac71431c2c26bf3ebfe8456e9790d4dcce6edf8a3a2c81fee65d678a71609ad3: 0 1073741824 thin 253:7 28
docker-253:1-1476395215-base: 0 1073741824 thin 253:7 0
docker-253:1-1476395215-b9d1dddac0e12fa2db1db7e5d094e00a7dff1b6945217845dcda12024bc2578b: 0 1073741824 thin 253:7 32

Comment 14 Mike Snitzer 2014-10-25 00:49:03 UTC
Created attachment 950563 [details]
[PATCH] devmapper: Add option for specifying an lvm2 created  thin-pool device

Docker will not activate/deactivate the specified thin-pool device but
it will exclusively manage/create thin and snapshot thin volumes in it.

Docker-DCO-1.1-Signed-off-by: Mike Snitzer <snitzer@redhat.com> (github: snitm)

Comment 15 Mike Snitzer 2014-10-25 01:09:00 UTC
Continuing example from comment#13:

// lvm can easily see the space getting consumed by docker:

# lvs thin
  LV   VG   Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  pool thin twi-a-tz-- 100.00g             6.01   0.03                            

// BUT there is further work needed to get lvm2 to work with docker's pool changes, e.g.:
// grow the thin-pool's data area while docker is using it:

# lvextend -L+10G thin/pool
  Size of logical volume thin/pool_tdata changed from 100.00 GiB (25600 extents) to 110.00 GiB (28160 extents).
  Thin pool transaction_id is 33, while expected 0.
  Problem reactivating pool
  Releasing activation in critical section.
  libdevmapper exiting with 2 device(s) still suspended.

# lvs thin
  LV   VG   Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  pool thin twi-s-tz-- 110.00g             5.46   0.03                            

You can clearly see above that docker has been bumping the transaction_id and lvm2 didn't like it, resulting in "Problem reactivating pool".

# dmsetup info thin-pool
Name:              thin-pool
State:             SUSPENDED
Read Ahead:        8192
Tables present:    LIVE & INACTIVE
Open count:        0
Event number:      0
Major, minor:      253, 8
Number of targets: 1
UUID: LVM-as1MecIoHZfYEWb81GI6Semauq8MSn2hN2IQVhHg83H4o8KS1In3IGy8hr5JeFqa-pool

# dmsetup table thin-pool
0 209715200 linear 253:7 0
# dmsetup table --inactive thin-pool
0 230686720 linear 253:7 0

Comment 16 Mike Snitzer 2014-10-25 01:28:16 UTC
So while the POC patch from comment#14 shows docker can happily be handed an lvm2-created thin-pool... things get rocky once lvm2 is asked to perform operations against the thin-pool.

So docker and lvm2 need to coordinate operations (Alasdair suggested maybe using a shared flock between lvm2 and docker, e.g.:  --storage-opt dm.filelock=/var/lock/lvm/V_vgname).

BUT there are a lot of details to get worked out.  Not sure there is a viable way forward here after all (despite docker already using libdm).

Time will tell, to be continued next week.

Comment 17 Zdenek Kabelac 2014-10-25 13:16:04 UTC
I wanted to actually 'optimize' thin-pool operations for pools that are not used by any thin volume.

I don't think it's a good idea to support a thin-pool used outside of lvm2's scope. Thin-pool volumes are purely meant to be used by lvm2 thin volumes, as they are technically 'internal' lvm2 volumes - and their 'user' visibility is mainly there to allow complex construction of data & metadata volumes.

Is there any 'real' issue why docker should be sending 'dm messages' instead of using lvm2 commands?

Comment 18 Mike Snitzer 2014-10-25 14:49:50 UTC
(In reply to Zdenek Kabelac from comment #17)
> I wanted to actually 'optimize' thin-pool operations for pools that are
> not used by any thin volume.
> 
> I don't think it's a good idea to support a thin-pool used outside of lvm2's
> scope. Thin-pool volumes are purely meant to be used by lvm2 thin volumes, as
> they are technically 'internal' lvm2 volumes - and their 'user' visibility is
> mainly there to allow complex construction of data & metadata volumes.

In an ideal world we'd have an effective liblvm that docker could use.  Unfortunately we don't have that and won't for some time.  The idea from comment#12, and the POC that followed, was more to do with arriving at a quick solution that enables docker to get the most out of lvm2's more advanced thin-pool management capabilities _but_ still support quick container creation.

> Is there any 'real' issue why docker should be sending 'dm messages' instead
> of using lvm2 commands?

Yes, docker is able to more efficiently start many more containers at once by using libdm based messages to create its thin snapshot volumes (one per container).
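
For reference, the libdm calls docker issues boil down to the same messages one could send by hand with dmsetup (rough sketch only; the pool name is from comment#13 and the device ids are made up):

// create a new thin device with id 42 in the pool, then an internal snapshot of it with id 43:
# dmsetup message thin-pool-tpool 0 "create_thin 42"
# dmsetup message thin-pool-tpool 0 "create_snap 43 42"
// activate the snapshot as its own device (length in sectors must match the base device size):
# dmsetup create docker-snap-43 --table "0 1073741824 thin /dev/mapper/thin-pool-tpool 43"
// drop a device when its container goes away:
# dmsetup message thin-pool-tpool 0 "delete 43"

Doing this through libdm avoids forking an lvm2 command (and its metadata handling) per container, which is where the startup-time win comes from.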

Comment 19 Zdenek Kabelac 2014-10-25 20:12:17 UTC
(In reply to Mike Snitzer from comment #18)
> (In reply to Zdenek Kabelac from comment #17)
> > I wanted to actually 'optimize' thin-pool operations for pools that are
> > not used by any thin volume.
> > 
> > I don't think it's a good idea to support a thin-pool used outside of lvm2's
> > scope. Thin-pool volumes are purely meant to be used by lvm2 thin volumes, as
> > they are technically 'internal' lvm2 volumes - and their 'user' visibility is
> > mainly there to allow complex construction of data & metadata volumes.
> 
> In an ideal world we'd have an effective liblvm that docker could use. 
> Unfortunately we don't have that and won't for some time.  The idea from
> comment#12, and the POC that followed, was more to do with arriving at a
> quick solution that enables docker to get the most out of lvm2's more
> advanced thin-pool management capabilities _but_ still support quick
> container creation.

A lot of the internals inside the lvm code simply do not count on the idea that someone would use subvolumes.

Technically there is not a big benefit for docker to use purely thin-pool devices - if the docker code is skilled enough to create thin volumes, it's likely pretty easy to just build 2 volumes for data & metadata.

The major logic is in the creation of thin volumes, maintaining a synchronous state between kernel & user space, and handling some basic repair operations, with the more advanced ones slowly added.  If docker is not using those, there is not much point IMHO in using the rest - it would probably be mostly equivalent to using a plain data LV and metadata LV if docker still wants to leave basic volume management out of them.

> 
> > Is there any 'real' issue why docker should be sending 'dm messages' instead
> > of using lvm2 commands?
> 
> Yes, docker is able to more efficiently start many more containers at once
> by using libdm based messages to create its thin snapshot volumes (one per
> container).

Now that's more interesting - of course lvm2 is not super efficient in terms of creating 10000 volumes in a second.  But there are several ways to reach better speed for 'specific' loads.

A user may ask for creation of inactive thins (lvcreate -an) and then activate them in a single command (vgchange -ay), which avoids the 'slowest' part of the process: udev synchronization.

I'm just wondering how docker deals with udev anyway - since the synchronization with udev is another complex part of lvm2 code - maybe they don't and that's why docker believes it's much more efficient?

It would be good to put some numbers here:

What speed of device creation does docker expect?
What does current lvm2 give, and what is the 'good' value for docker to start using lvm2?
Is docker actually dealing with udev at all?
(lvm2 could run with udev sync disabled (--noudevsync) - normally ~90% of volume creation time is just waiting on udev to know the device is ready in /dev and has all its links.)
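
To illustrate the batching idea with commands (a sketch using the thin/pool names from comment#13; sizes and volume names are made up):

// create many thin volumes without activating them (no per-volume udev wait):
# for i in $(seq 1 100); do lvcreate -an -V 10G --thin -n container$i thin/pool; done
// then activate them all in a single command:
# vgchange -ay thin
// or, alternatively, skip the udev synchronization entirely:
# lvcreate --noudevsync -V 10G --thin -n container101 thin/pool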

Comment 20 Mike Snitzer 2014-10-26 00:50:48 UTC
Kabi, docker needs lvm2's more capable thin-pool management (as I noted in comment#12: particularly the auto-resizing and the metadata checking on activation).  Just because it isn't convenient for you to have lvm2 support a standalone thin-pool that docker then consumes doesn't mean it shouldn't be supported.

Fact is, all the thin-pool resizing operations I'm asking for lvm2 to support in this hybrid setup can be supported in the kernel without requiring userspace to be too complex.  lvm2 _should_ be able to easily handle this hybrid setup.  

After all, I'm just wanting lvm2 to handle:
- resizing the thin-pool (metadata or data) when the user asks
- resizing the thin-pool (metadata or data) when lvm2 monitoring suggests it is warranted
- checking/repairing metadata on activation
- re-configuring thin-pool features (e.g. disable block zeroing)

And docker would handle:
- creating/deleting thin volumes
- creating/deleting thin snapshot volumes
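
In lvm2 command terms, the operations lvm2 would keep owning look roughly like this (a sketch against the thin/pool VG from comment#13; the sizes are illustrative):

// grow the pool's data or metadata area, either on demand or when dmeventd monitoring triggers it:
# lvextend -L +10G thin/pool
# lvextend --poolmetadatasize +1G thin/pool
// reconfigure pool features, e.g. disable zeroing of newly provisioned blocks:
# lvchange -Z n thin/pool
// deactivation/activation is where lvm2 runs thin_check (and repair, if needed):
# lvchange -an thin/pool
# lvchange -ay thin/pool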

We need lvm2 to be more tolerant of things it doesn't quite understand (e.g. unexpected transaction_id).  In all likelihood the only reason lvm2 blew up (like I showed in comment#15) is because it overreacted to the transaction_id not being what it expected.

Alasdair said that maybe docker needs to use the same vg flock as lvm2 for synchronization of operations.  Thinking further, I see no obvious reason why that'd need to be the case.  docker isn't _ever_ going to resize the data or metadata volumes in this setup; nor is it ever going to activate the thin-pool (lvm2 will always do those things).  There should _not_ be any overlap between functions performed by lvm2 vs docker in this hybrid setup -- and the kernel (thin-pool DM target) should be able to handle concurrent operations by both lvm2 and docker, e.g. a resize occurring (via lvm2) at the same time a snapshot or thin is created/deleted (via docker), just through the normal locking in the thin-pool target.

In short: please have an open-mind and help make this work.

Comment 21 Mike Snitzer 2014-10-26 14:59:02 UTC
lvm2 doesn't increment the thin-pool metadata's transaction_id when resizing either the thin-pool's data or metadata volumes.  It also doesn't increment it when reconfiguring the thin-pool's features.

It also doesn't modify the transaction_id on normal activation/deactivation of the thin-pool.  AFAIK the transaction_id might only get modified as a side-effect of lvm2 running the thin_check + thin_repair.

So basically all we need is for lvm2 to be tolerant of a transaction_id that is larger than it expects IFF the pool is created with a new flag, e.g.:
lvconvert --standalone-pool --chunksize 64K --thinpool thin/pool --poolmetadata thin/metadata

If --standalone-pool (or some better named flag) is specified then lvm2's transaction_id checks should either be relaxed to tolerate transaction_id greater than expected or, more likely, disabled entirely.

Docker already manages the thin-pool metadata's transaction_id when it creates/deletes thin/snapshot volumes.
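
For context, the transaction_id lives in the thin-pool metadata and is visible (and bumpable) through plain device-mapper calls, which is what docker does via libdm.  A sketch, with illustrative values and the pool name from comment#13:

// the first field after "thin-pool" in the status line is the current transaction id:
# dmsetup status thin-pool-tpool
// e.g. "0 209715200 thin-pool 33 279/524288 7495/1638400 - rw ..."
// docker bumps it when it creates or deletes a thin/snapshot device, equivalent to:
# dmsetup message thin-pool-tpool 0 "set_transaction_id 33 34"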

Comment 22 Zdenek Kabelac 2014-10-26 15:06:17 UTC
I'd propose a better idea - to introduce a new type:

standalone-pool  (let's pick a name here)
(docker-pool, ...)

So thin-pool will remain dedicated to lvm and thin volumes.

Then this 'standalone' pool will get separate behavior, supporting the functions docker needs.

Creation is then like:

lvcreate --type standalone-pool -L1T  --poolmetadatasize 100M  vg

(soon to be also supported with option   --pooldatasize 1T)


Then lvm2 will not try to manage transaction_id on this pool device.

Comment 23 Mike Snitzer 2014-10-26 16:43:37 UTC
(In reply to Zdenek Kabelac from comment #22)
> I'd propose better idea - to introduce new  type -
> 
> standalone-pool  (let's pick name here)
> (docker-pool, ...)
> 
> So thin-pool will remain to be dedicated for lvm and thin volumes.
> 
> Then this 'standalone' pool will get separate behavior with supported
> behavior for functions docker needs.
> 
> Creation is then like:
> 
> lvcreate --type standalone-pool -L1T  --poolmetadatasize 100M  vg
> 
> (soon to be also supported with option   --pooldatasize 1T)
> 
> 
> Then lvm2 will not try to manage transaction_id on this pool device.

That sounds good to me.  We do want lvconvert thin-pool syntax to work too though (so that a data and metadata lv can be paired into a standalone thin-pool).  And as we discussed on irc (for the benefit of others):

12:29 <snitm> do you understand that I want all the thin-pool services that lvm2 provides?
12:29 <zkabelac> yes
12:29 <snitm> I want auto-resize based on monitor
12:30 <zkabelac> but lvm2 must prohibit i.e.  creation of thin volume on such pool
12:30 <snitm> I want thin-pool activation, and thin_check on activation

So if a new 'standalone-thin-pool' type is the best way to "flag" the thin-pool creation (rather than a --standalone-pool flag I suggested in comment#21) then that is fine.

Comment 24 Michal Minar 2014-10-27 05:44:41 UTC
(In reply to Daniel Walsh from comment #11)
> Michal want to take a stab at this one?
It seems like the work is in progress. I'd rather not interfere to avoid redundant effort. Mike, if you want some help on docker side, please let me know.

Comment 25 Mike Snitzer 2014-10-27 12:58:25 UTC
(In reply to Michal Minar from comment #24)
> (In reply to Daniel Walsh from comment #11)
> > Michal want to take a stab at this one?
> It seems like the work is in progress. I'd rather not interfere to avoid
> redundant effort. Mike, if you want some help on docker side, please let me
> know.

Sure, will do.

Comment 26 Daniel Walsh 2014-10-27 12:59:36 UTC
Yes, that sounds great.  Any help you can give Mike would be great, and if we have patches, shepherd them through getting them upstream in docker.

Comment 27 Vivek Goyal 2014-10-27 13:01:36 UTC
I am beginning to wonder why we should introduce this additional mode at all, and not train docker to use standard thin volumes instead.

I see there have been two arguments.

- We need a quick fix so that docker can deal with the needs_check flag.
- A libdm based solution is more scalable than an lvm command-line based solution.

In one of the comments Mike mentioned the possibility of calling thin_check/thin_repair directly from docker. I kind of like that solution as a quick fix.

W.r.t. the scalability issue, I feel that we should really do some experiments and see how slow lvm is and whether it is really a problem.

IOW, at this point in time, I am a little hesitant to increase complexity further and introduce another hybrid mode. I would rather stick with the standard thin volume creation approach.

Comment 30 Daniel Walsh 2014-10-27 13:13:19 UTC
The main reason I believe that Docker will move towards overlayfs is the memory sharing capability, BTW, which allows for greater scaling.

Comment 32 Mike Snitzer 2014-10-27 13:37:32 UTC
(In reply to Daniel Walsh from comment #30)
> The main reason I believe that Docker will move towards overlayfs is the
> memory sharing capability, BTW, which allows for greater scaling.

Sure, I'm aware, makes sense.  And that is perfectly fine/expected.  Best solution to all requirements wins!

Comment 33 Daniel Walsh 2015-01-19 15:05:08 UTC
Any update on this?

Comment 34 Vivek Goyal 2015-01-19 15:43:05 UTC
Now docker and lvm2 support this hybrid mode, where the thin pool can be set up by lvm2 and used by docker.

Mike Snitzer did the docker changes, which have been merged, and the lvm2 team
has finished the lvm2 changes.

To make this easier for users to set up, I have modified the docker-storage-setup script and sent in a pull request.

https://github.com/projectatomic/docker-storage-setup/pull/4

Once this is merged, it will become easier to set up an lvm thin pool and let docker use it.

But I think one question still remains. How will this needs_check flag be cleared automatically even in this new model?

"man lvmthin" says following.

       When  a  thin  pool LV is activated, lvm runs the thin_check command to
       check the correctness of the metadata on the pool metadata LV.

So in this case once I have freed space on /var, there is no fresh activation
of thin pool LV. That means lvm will not run thin_check on pool and that
should mean that needs_check will not be cleared.

I think problem should go away after a reboot as a pool LV will be reactivated again and thin_check will be run. Otherwise I think user will have to run thin_check manually on pool or do some lvm action to trigger it.
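
To make that concrete, the "lvm action" could simply be a reactivation of the pool LV, which makes lvm2 run thin_check again (a sketch using the thin/pool names from comment#13; it assumes docker's thin devices have been removed so the pool is not held open):

# systemctl stop docker
# lvchange -an thin/pool
# lvchange -ay thin/pool
// activation runs thin_check and clears needs_check if the metadata is fine;
// if real damage is reported, lvconvert --repair thin/pool is the lvm2-level repair path
# systemctl start docker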

Comment 35 Mike Snitzer 2015-01-19 18:10:20 UTC
(In reply to Vivek Goyal from comment #34)

> But I think one question still remains. How will this needs_check flag be
> cleared automatically even in this new model?
> 
> "man lvmthin" says following.
> 
>        When  a  thin  pool LV is activated, lvm runs the thin_check command
> to
>        check the correctness of the metadata on the pool metadata LV.
> 
> So in this case once I have freed space on /var, there is no fresh activation
> of thin pool LV. That means lvm will not run thin_check on pool and that
> should mean that needs_check will not be cleared.
> 
> I think problem should go away after a reboot as a pool LV will be
> reactivated again and thin_check will be run. Otherwise I think user will
> have to run thin_check manually on pool or do some lvm action to trigger it.

You need to be precise about why the needs_check flag got set in the first place in your hypothetical case you're concerned about.

A properly configured thin-pool will reside in an lvm2 VG that has free space available that the lvm2 monitor will quickly resize the thin-pool to use if/when space is exhausted.  If that rescue via resize doesn't occur the kernel will degenerate to "out-of-space" mode.  But it will _not_ set the needs_check flag.

We don't set needs_check purely because the pool runs out of space.  If needs_check is set it is because the kernel encountered a potentially serious problem (e.g. corruption).  The appropriate response to needs_check being set is to take the pool down for checking and possible repair (on the next activation lvm2 would handle doing so "for free").  Doing so will clear needs_check if all was fine, or if lvm2's call out to thin_repair fixed the problem.

Bottom line: docker's thin-pool volume should be managed using the exact same management workflows and lvm2 commands as are supported on RHEL7.  If you think there are gaps, they need to be addressed in the context of lvm2 improvements, etc.

Comment 36 Vivek Goyal 2015-01-20 16:54:59 UTC
Ok, Colin Walters just merged my patches into upstream docker-storage-setup. Now if that script is run during boot, it should automatically set up an lvm thin pool for docker use.

https://github.com/projectatomic/docker-storage-setup/pull/4

Comment 37 Vivek Goyal 2015-01-21 21:26:01 UTC
(In reply to Mike Snitzer from comment #35)
> You need to be precise about why the needs_check flag got set in the first
> place in your hypothetical case you're concerned about.
> 
> A properly configured thin-pool will reside in an lvm2 VG that has free
> space available that the lvm2 monitor will quickly resize the thin-pool to
> use if/when space is exhausted.  If that rescue via resize doesn't occur the
> kernel will degenerate to "out-of-space" mode.  But it will _not_ set the
> needs_check flag.

Ok, Agreed. Pool being out of space and pool having needs_check set are two
different issues and should be handled accordingly.

I did some testing with pool being out of space scenario. 

I enabled automatic pool size extension by setting thin_pool_autoextend_threshold to 80 in /etc/lvm/lvm.conf. That way whenever
pool was running short on space it got extended and docker continued to
run with failures. Which is good.
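
For reference, the relevant knobs live in the activation section of /etc/lvm/lvm.conf (the 20% growth step below is just an example value):

activation {
    # extend a monitored thin pool once it is 80% full...
    thin_pool_autoextend_threshold = 80
    # ...growing it by 20% of its current size each time
    thin_pool_autoextend_percent = 20
}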

In another test, I disabled automatic extension (the default) and tried a manual
resize of the pool after it was full. The extension succeeded but further
docker operations failed. I opened a bug to track that.

https://bugzilla.redhat.com/show_bug.cgi?id=1184592

I will do some testing with setting and clearing of the needs-check flag. Not sure though how to get to a state where the pool sets the needs-check flag. Do I need to fill up the metadata device?

Comment 38 Mike Snitzer 2015-01-21 22:41:19 UTC
(In reply to Vivek Goyal from comment #37)
> (In reply to Mike Snitzer from comment #35)
> > You need to be precise about why the needs_check flag got set in the first
> > place in your hypothetical case you're concerned about.
> > 
> > A properly configured thin-pool will reside in an lvm2 VG that has free
> > space available that the lvm2 monitor will quickly resize the thin-pool to
> > use if/when space is exhausted.  If that rescue via resize doesn't occur the
> > kernel will degenerate to "out-of-space" mode.  But it will _not_ set the
> > needs_check flag.
> 
> Ok, Agreed. Pool being out of space and pool having needs_check set are two
> different issues and should be handled accordingly.
> 
> I did some testing with pool being out of space scenario. 
> 
> I enabled automatic pool size extension by setting
> thin_pool_autoextend_threshold to 80 in /etc/lvm/lvm.conf. That way whenever
> pool was running short on space it got extended and docker continued to
> run with failures. Which is good.

Think you mean s/with/without/ failures? ;)

> In another test, I disabled automatic extension (the default) and tried a
> manual resize of the pool after it was full. The extension succeeded but
> further docker operations failed. I opened a bug to track that.
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1184592
> 
> I will do some testing with setting and clearing of the needs-check flag. Not
> sure though how to get to a state where the pool sets the needs-check flag. Do
> I need to fill up the metadata device?

Yes, that'll set needs_check.  There are ways to reload the metadata device with an error target to force an error on metadata IO which triggers needs_check too.
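
One way to do the latter, sketched with dmsetup against the comment#13 device names (<length> is whatever "dmsetup table thin-pool_tmeta" reports; only try this on a throwaway pool):

// note the current thin-pool_tmeta mapping (start/length/target) so it can be restored afterwards:
# dmsetup table thin-pool_tmeta
// swap in an error target of the same length; metadata I/O then fails and the kernel sets needs_check:
# dmsetup suspend thin-pool_tmeta
# dmsetup load thin-pool_tmeta --table "0 <length> error"
# dmsetup resume thin-pool_tmeta
// generate some pool activity, then load/resume the original linear table to restore the device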

Comment 40 Mike Snitzer 2015-03-11 17:06:26 UTC
This was already resolved when we switched over to using lvm2 to manage the thin-pool that is passed to docker (lvm2 will be used to trap needs_check, and the user will need to use lvm2 to clear needs_check).

Comment 41 Daniel Walsh 2015-03-11 19:46:59 UTC
Fixed in docker-1.5

