Bug 1248972 - Attaching a detached cache pool makes lvm/device-mapper think the old blocks from the pool should be flushed
Summary: Attaching a detached cache pool makes lvm/device-mapper think the old blocks from the pool should be flushed
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: lvm2
Version: 23
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Zdenek Kabelac
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-07-31 08:57 UTC by Vratislav Podzimek
Modified: 2016-11-24 12:36 UTC
CC List: 15 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-24 12:36:15 UTC
Type: Bug
Embargoed:



Description Vratislav Podzimek 2015-07-31 08:57:13 UTC
Description of problem:
When a cache pool is detached (with lvconvert --splitcache) and then attached back after some use of the original (now uncached) LV, LVM/device-mapper treats the blocks still held in the cache as the current/correct ones. This damages the file system on the LV, and if the pool is detached again, LVM flushes all of the old/stale blocks back to the origin.

Version-Release number of selected component (if applicable):
lvm2-2.02.116-3.fc21.x86_64

How reproducible:
cannot say, but it happened to me twice yesterday

Steps to Reproduce:
1. create a cached LV
2. detach the cache with --splitcache (not destroying it)
3. use the LV without the cache
4. attach the cache back
5. see if the file system on the cached LV is broken, or run --splitcache again and see if the old/stale blocks get flushed (see the command sketch below)
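
For reference, a minimal command sketch of these steps, assuming hypothetical names (VG "vg", origin LV "origin", cache pool "pool") and hypothetical PVs (/dev/sdb as the HDD, /dev/sdc as the SSD); exact syntax may differ between lvm2 versions:

  # 1. create a cached LV
  lvcreate -n origin -L 100G vg /dev/sdb
  lvcreate --type cache-pool -n pool -L 10G vg /dev/sdc
  lvconvert --type cache --cachepool vg/pool vg/origin
  # 2. detach the cache pool without destroying it
  lvconvert --splitcache vg/pool
  # 3. use vg/origin while it is uncached (mkfs, mount, write data)
  # 4. attach the stale cache pool back
  lvconvert --type cache --cachepool vg/pool vg/origin
  # 5. check the file system, or run --splitcache again and watch the stale blocks get flushed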

Actual results:
file system damaged, old/stale blocks flushed

Expected results:
old/stale blocks discarded and replaced with up-to-date/current ones in the cache

Comment 1 Alasdair Kergon 2015-07-31 13:36:17 UTC
What were you intending to achieve by this?  It's obviously not intended to be used in this way, so we can add some prompts (and wipe cache by default) to steps 2 and 4 to make the user confirm they understand what they are doing.

Comment 2 Alasdair Kergon 2015-07-31 13:40:10 UTC
Obviously the code cannot magically know what you are changing once you have separated the devices. (With --splitmirrors we have a --trackchanges option, but would that sort of thing really have any benefit for a mere cache?)

Comment 3 Vratislav Podzimek 2015-08-04 13:14:14 UTC
(In reply to Alasdair Kergon from comment #1)
> What were you intending to achieve by this?  It's obviously not intended to
> be used in this way, so we can add some prompts (and wipe cache by default)
> to steps 2 and 4 to make the user confirm they understand what they are
> doing.

My use case was that I wanted to compare two different SSDs being used as a cache for my HDD. So I had been using the first one for some time, then split/detached the cache and started using the second one. Then I found out that the first one was better, so I split/detached the second one and attached the first one back. Because I knew it would again take a lot of time to populate the cache for the first time (may be fixed now), I intentionally didn't destroy the first cache pool.

Some prompts explaining that the above won't work would be nice. Or at least some explanation/warning in the lvmcache(8) manpage. It really didn't occur to me that attaching a (writethrough) cache pool to an LV could damage a file system on it.

Comment 4 Vratislav Podzimek 2015-08-04 13:16:17 UTC
(In reply to Alasdair Kergon from comment #2)
> Obviously the code cannot magically know what you are changing once you have
> separated the devices. (With --splitmirrors we have a --trackchanges option,
> but would that sort of thing really have any benefit for a mere cache?)

I think --trackchanges could be useful. I don't quite understand the purpose of the --splitcache option without it. If I understand it correctly now, when somebody uses it and later attaches the pool back, it's only a matter of luck whether nothing bad happens.

Comment 5 Zdenek Kabelac 2015-08-04 18:24:42 UTC
--splitcache has a clear bug inside.

There is a missing 'clearing' of the metadata when the cache pool is attached to a new cached LV.

The current primary use case for --splitcache is exploration of the used cache metadata; another possible use case is to 'reuse' a prepared cache pool for another LV.

However, the second part will be much better handled with the upcoming profile support, so you could just specify --commandprofile during cache creation.

Anyway, as such, it's a clear bug in the lvm2 code, which fails to zero the metadata before the pool's first use as a cache (it currently clears the metadata only when the cache pool is created).

And it's been a known bug for a while...

Comment 6 Jonathan Earl Brassow 2015-08-04 21:21:29 UTC
The --splitcache operation does flush the cache before the split (especially if it is in writeback mode), right?

Comment 7 Corey Marthaler 2015-08-06 22:46:13 UTC
Appears so.

[root@harding-03 ~]# lvconvert --splitcache cache_sanity/pool
  Flushing cache for origin.
  Logical volume cache_sanity/origin is not cached and cache pool cache_sanity/pool is unused.

Comment 8 Fedora End Of Life 2015-11-04 15:25:27 UTC
This message is a reminder that Fedora 21 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 21. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '21'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 21 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 9 Vratislav Podzimek 2015-11-12 10:05:36 UTC
I don't see anything done in newer releases that could be fixing this issue.

Comment 10 Tony Asleson 2016-01-27 14:46:14 UTC
(In reply to Alasdair Kergon from comment #1)
> What were you intending to achieve by this?  It's obviously not intended to
> be used in this way, so we can add some prompts (and wipe cache by default)
> to steps 2 and 4 to make the user confirm they understand what they are
> doing.

Please explain what the intended use of this option is.

From the man page I interpreted it as a way to split an LV from its cache, which would allow the cache to be reused, like the result of step #3 of the man page.

Comment 11 Zdenek Kabelac 2016-11-04 13:39:01 UTC
I believe this bug is now fixed.

From version 2.02.162 on, you need to explicitly specify -Zn with lvconvert to enforce non-zeroing of the cache metadata when a cache pool is being reattached to a new LV.
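
A minimal sketch of the changed behaviour, assuming hypothetical names vg/pool and vg/origin:

  # lvm2 >= 2.02.162: reattaching a previously used pool zeroes its metadata by default
  lvconvert --type cache --cachepool vg/pool vg/origin
  # only an explicit -Zn (--zero n) keeps the old metadata, i.e. the previously dangerous behaviour
  lvconvert --type cache --cachepool vg/pool -Zn vg/origin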

Comment 12 Vratislav Podzimek 2016-11-04 14:27:53 UTC
(In reply to Zdenek Kabelac from comment #11)
> I believe this bug is now fixed.
> 
> From version 2.02.162 on, you need to explicitly specify -Zn with lvconvert
> to enforce non-zeroing of the cache metadata when a cache pool is being
> reattached to a new LV.

Yes, I can confirm this.

Comment 13 Fedora End Of Life 2016-11-24 12:15:36 UTC
This message is a reminder that Fedora 23 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 23. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '23'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 23 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

