Bug 1121279

Summary: running thin-check should clear the needs-check flag or at least tell you how to clear it.
Product: Red Hat Enterprise Linux 7 Reporter: Karl Hastings <kazen>
Component: device-mapper-persistent-dataAssignee: Peter Rajnoha <prajnoha>
Status: CLOSED CURRENTRELEASE QA Contact: Bruno Goncalves <bgoncalv>
Severity: medium Docs Contact:
Priority: medium    
Version: 7.0CC: agk, bgoncalv, heinzm, jbrassow, lvm-team, msnitzer, thornber
Target Milestone: rcKeywords: Reopened, Triaged
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-06-02 13:30:12 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1113141, 1203710, 1295577, 1313485    

Description Karl Hastings 2014-07-18 20:56:33 UTC
Description of problem:
Upon any io-error the kernel will set the 'needs-check' flag on a thin pool, causing the pool to start as read-only even on future activations.  

There is no error message telling you that the pool is activated read-only because of a flag or error.

running thin_check on a pool with no other errors will not clear the flag, nor does it tell you the flag is set.

Version-Release number of selected component (if applicable):
device-mapper-persistent-data-0.3.2-1.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
starting with:
# dmsetup status
docker-253:1-33555176-pool: 0 209715200 thin-pool 34 279/524288 7495/1638400 - ro discard_passdown queue_if_no_space 
# systemctl stop docker.service
# thin_check /var/lib/docker/devicemapper/devicemapper/metadata
examining superblock
examining devices tree
examining mapping tree
# systemctl start docker.service
# dmsetup status
docker-253:1-33555176-pool: 0 209715200 thin-pool 34 279/524288 7495/1638400 - ro discard_passdown queue_if_no_space 

Actual results:
The thin pool is still 'ro' because the needs-check flag is still set.

Expected results:
Since we've now run the check, the 'needs-check' flag should be turned off.
(similar to a filesystem that's flagged dirty, once you run 'fsck' on the FS, even if there were no errors, the dirty flag is cleared).

Additional info:
It would be awesome if dmsetup status indicated what the issue is.  I.e.

# dmsetup status
docker-253:1-33555176-pool: 0 209715200 thin-pool 34 279/524288 7495/1638400 - ro discard_passdown queue_if_no_space [needs_check]

Comment 2 Alasdair Kergon 2014-07-18 22:47:11 UTC
I think there should be some indication of the state of this flag when thin_check is run.

Separately (kernel), this is indeed important internal target state and so I think the target status line ought to report it.

Comment 5 Mike Snitzer 2016-01-22 18:54:04 UTC
(In reply to Alasdair Kergon from comment #2)
> I think there should be some indication of the state of this flag when
> thin_check is run.
> 
> Separately (kernel), this is indeed important internal target state and so I
> think the target status line ought to report it.

http://git.kernel.org/linus/e4c78e210da added 'needs_check' to the thin-pool status output.

As for rhel, it was included in this rhel7.git commit:
# git describe 11ed68fce6af5a6e660f37f0261bf12706be6c9f
kernel-3.10.0-297.el7-6-g11ed68f

Comment 6 Jonathan Earl Brassow 2016-01-22 19:00:35 UTC
'thin_check --clear-needs-check-flag' can be used, which will clear the flag after use.

Comment 7 Jonathan Earl Brassow 2016-01-22 19:07:53 UTC
peter, can you make sure this is packaged in rhel7.3

Comment 8 Peter Rajnoha 2016-01-25 14:53:11 UTC
(In reply to Jonathan Earl Brassow from comment #7)
> peter, can you make sure this is packaged in rhel7.3

The --clear-needs-check-flag is already there early since RHEL 7.0 so I'm closing this one.

Comment 9 Karl Hastings 2016-01-25 16:48:42 UTC
This BZ was not about adding '--clear-needs-check-flag".  

But rather, why must I pass --clear-needs-check-flag at all?

If a filesystem has a problem and it's flagged dirty you run a fsck, and if there are no errors it removes the dirty flag, you don't also have to pass in --clear-dirty-flag... 

Shouldn't that be the default?

And if not, (since there is still a state preventing activation) it should at least provide more output, along the lines of: "needs_check flag found.  run 'thin_check --clear-needs-check-flag' to remove."   (rather than claiming the thin_check competed successfully)

Stated another way:  It says it needs to be checked, so I ran thin_check, why does it still say it needs to be checked?  Why would I *ever* NOT want the needs_check flag cleared?  As the man page states:  "If the metadata check failed, the flag is not cleared and a thin_repair run is needed to fix any issues." 


(The additional request to have dmsetup status report 'needs_check' appears to have made it in according to c#5  "dm thin: display 'needs_check' in status if it is set"  Which is handled in BZ#1282900 )

Comment 10 Peter Rajnoha 2016-06-02 13:30:12 UTC
(In reply to Karl Hastings from comment #9)
> This BZ was not about adding '--clear-needs-check-flag".  
> 
> But rather, why must I pass --clear-needs-check-flag at all?

By design, thin_check is not supposed to change anything, only do the checks by default. So that's the reason why you need to pass the --clear-needs-check-flag to to the change in this case (to clear the flag if the check is successful and thin_repair is not needed).

> 
> If a filesystem has a problem and it's flagged dirty you run a fsck, and if
> there are no errors it removes the dirty flag, you don't also have to pass
> in --clear-dirty-flag... 
> 
> Shouldn't that be the default?
> 
> And if not, (since there is still a state preventing activation) it should
> at least provide more output, along the lines of: "needs_check flag found. 
> run 'thin_check --clear-needs-check-flag' to remove."   (rather than
> claiming the thin_check competed successfully)
> 
> Stated another way:  It says it needs to be checked, so I ran thin_check,
> why does it still say it needs to be checked?  Why would I *ever* NOT want
> the needs_check flag cleared?  As the man page states:  "If the metadata
> check failed, the flag is not cleared and a thin_repair run is needed to fix
> any issues." 
> 

It's a design decision to do it this way (and yes, maybe differently from fsck) and at the same time, thin_check is a low-level tool (just like dmsetup) so the user is fully responsible for the actions done. The higher level tools (like LVM2) make sure this is handled correctly and users don't need to care.

Also, this behaviour is documented in thin_check man page.