Bug 840174 - LVM caching broken after preupgrade to F16
Status: CLOSED NOTABUG
Product: Fedora
Classification: Fedora
Component: lvm2
Version: 17
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Assigned To: LVM and device-mapper development team
QA Contact: Fedora Extras Quality Assurance
Reported: 2012-07-14 01:08 EDT by bob mckay
Modified: 2012-07-16 04:47 EDT
Doc Type: Bug Fix
Last Closed: 2012-07-16 04:47:39 EDT
Type: Bug

Attachments
lvm dumpconfig output (1.68 KB, application/octet-stream)
2012-07-14 01:08 EDT, bob mckay
lvm vgscan -vvv stderr output (471 bytes, application/octet-stream)
2012-07-14 01:09 EDT, bob mckay
lvm vgscan -vvv stdout output (471 bytes, application/octet-stream)
2012-07-14 01:10 EDT, bob mckay
lvm vgscan -vvv stderr output (75.10 KB, application/octet-stream)
2012-07-14 01:12 EDT, bob mckay

Description bob mckay 2012-07-14 01:08:51 EDT
Created attachment 598207 [details]
lvm dumpconfig output

Description of problem: LVM caching seems to have been broken by some preupgrade (probably F15->F16). It also shows up on all my F17 systems, but since these had already been preupgraded from F15 to F16 six months ago, I assume that the problem was already present in F16.

Version-Release number of selected component (if applicable):
F16:
lvm lvchange --version
  LVM version:     2.02.86(2) (2011-07-08)
  Library version: 1.02.65 (2011-07-08)
  Driver version:  4.22.0
F17:
lvm lvchange --version
  LVM version:     2.02.95(2) (2012-03-06)
  Library version: 1.02.74 (2012-03-06)
  Driver version:  4.22.0

How reproducible:
Reliable (4 systems checked, all exhibit problem)

Steps to Reproduce:
1. Create F15 system
2. Preupgrade F15->F16[->F17]
  
Actual results:
LVM does not cache to /etc/lvm/cache/.cache

[root@sc1 LVCHK]# ls /etc/lvm/cache -a
.  ..

Expected results:
LVM caches to /etc/lvm/cache/.cache

Additional info: I'm reporting this against lvm rather than preupgrade because whatever fixing is needed probably needs to come from lvm updates rather than preupgrade. I can't be absolutely certain which preupgrade created this problem (because I can't recall the exact history of all machines which show it), but I strongly suspect F15->F16. Output from vgscan -vvv and dumpconfig attached.
Comment 1 bob mckay 2012-07-14 01:09:41 EDT
Created attachment 598208 [details]
lvm vgscan -vvv stderr output
Comment 2 bob mckay 2012-07-14 01:10:40 EDT
Created attachment 598209 [details]
lvm vgscan -vvv stdout output
Comment 3 bob mckay 2012-07-14 01:12:20 EDT
Created attachment 598210 [details]
lvm vgscan -vvv stderr output

Oops, sorry for the finger trouble on the previous stderr attachment.
Comment 4 bob mckay 2012-07-14 01:35:02 EDT
Just to clarify, this shows up in 
1. An F16 system that was probably originally created before F14 (but I don't remember exactly when) and has been preupgraded at every subsequent stage
2. An F17 system that was also originally created before F14 and has also been preupgraded at every subsequent stage
3. Another F17 system that, to the best of my recollection, was originally installed as F15 (but it could possibly have been F14) and has been sequentially upgraded F15->F16->F17.
4. ... and other systems

The lack of caching isn't directly causing problems (the systems all run perfectly OK, though I guess in theory they might be a little slower), which I suppose is why it hasn't been reported before. However it's the kind of bit-rot that needs to be fixed (on the thousands of systems that presumably are currently suffering it) before it causes more pain down the track.

In my case, I am encountering great difficulties in upgrading the one remaining F16 system, and the root cause seems to link back to this lack of LVM caching. I have an lv which I need to delete (two lvs with the identical volume name lv_swap are causing the upgrade to crash), but can't. The reason I can't delete it is that the initramfs contains a reference to the deleted lv_swap and refuses to boot when it doesn't exist. This should be fixable by re-creating the initramfs with dracut. In fact, however, even when dracut re-creates the initramfs after I delete the problematic lv_swap, it still creates a dependency on the missing lv_swap (it shouldn't, according to the documentation). I'm assuming that this arises from the same root cause that is preventing the lvm cache from being stored. Unfortunately I can't figure out where to go from here in diagnosing this.
Comment 5 Peter Rajnoha 2012-07-14 18:36:10 EDT
(In reply to comment #4)
> Just to clarify, this shows up in 
> 1. An F16 system that was probably originally created before F14 (but I
> don't remember exactly when) and has been preupgraded at every subsequent
> stage
> 2. An F17 system that was also originally created before F14 and has also
> been preupgraded at every subsequent stage
> 3. Another F17 system that to the best of my recollection was originally
> installed as F15 (but it could possibly have been F14) and has been
> sequentially upgraded F15->F16->F17.
> 4. ... and other systems
> 
> The lack of caching isn't directly causing problems (the systems all run
> perfectly OK, though I guess in theory they might be a little slower), which
> I suppose is why it hasn't been reported before. However it's the kind of
> bit-rot that needs to be fixed (on the thousands of systems that presumably
> are currently suffering it) before it causes more pain down the track.
> 

The old-style cache in /etc/lvm/cache/.cache became obsolete with the introduction of the "obtain_device_list_from_udev" feature. You can find this setting in the "devices" section of /etc/lvm/lvm.conf. It was introduced in lvm2 version 2.02.85 and is enabled and used by default.

Before this change, we cached block device names together with all their aliases in the .cache file. Now we use udev to get the list of block devices, which is also the recommended way.

Also, the current direction is to rely on udev and its event system even more, with the recent introduction of lvmetad, the LVM metadata caching daemon (since lvm2 version 2.02.89, but available only in current Fedora rawhide at the moment). This daemon provides a much more advanced caching scheme, and it's updated on the fly as devices appear and disappear.
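
As a minimal sketch of how one might check which device-list source a given system is configured to use, the shell helpers below read obtain_device_list_from_udev from an lvm.conf-style file. The function names udev_setting and describe_device_source are hypothetical and not part of lvm2, and the parsing assumes the usual one-setting-per-line "name = value" layout.

```shell
#!/bin/sh
# Hypothetical helpers (not part of lvm2) for inspecting an lvm.conf-style file.

# Print the numeric value of obtain_device_list_from_udev, if it is set.
# Commented-out lines do not match because the setting name must be the
# first non-blank text on the line.
udev_setting() {
    conf="${1:-/etc/lvm/lvm.conf}"
    sed -n 's/^[[:space:]]*obtain_device_list_from_udev[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$conf" | head -n 1
}

# Summarize which device-list source the file selects.
describe_device_source() {
    value=$(udev_setting "$1")
    case "$value" in
        1) echo "udev" ;;    # device list comes from udev
        0) echo "cache" ;;   # old-style scanning with the .cache file, per the comment above
        *) echo "unset" ;;   # not set explicitly in this file
    esac
}
```

For example, running describe_device_source /etc/lvm/lvm.conf prints udev when the setting is 1. A result of unset only means the file does not set the value explicitly; in that case the default described above (on) would presumably apply.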

> In my case, I am encountering great difficulties in upgrading the one
> remaining F16 system, and the root cause seems to link back to this lack of
> LVM caching. I have an lv which I need to delete (two lvs with identical
> volume name lv_swap are causing the upgrade to crash), but can't delete. The
> reason I can't delete it is that the initramfs contains a reference to the
> deleted lv_swap, and refuses to boot when it doesn't exist. This should be
> fixable by re-creating the initramfs with dracut. In fact, however, even if
> dracut re-creates the initramfs after I delete the problematic lv_swap, it
> still creates a dependency on the missing lv_swap (it shouldn't, according
> to the documentation). I'm assuming that this arises from the same root
> cause as is causing the lvm cache not to be stored. Unfortunately I can't
> figure where to go from here in diagnosing this.

Please provide the list/sequence of commands you used. Also, please attach the output of the "lvs" command and the dracut debug output (the "dracut --debug" option). This might be a bug in dracut, but we should be able to see more from its debug log. Thanks.
Comment 6 bob mckay 2012-07-14 21:20:16 EDT
(In reply to comment #5)
Peter, thank you for your very helpful and thoughtful comments. I really appreciate it.
> 
> The old-style cache in /etc/lvm/cache/.cache became obsolete with the
> introduction of the "obtain_device_list_from_udev" feature. You can find
> this setting in the "devices" section of /etc/lvm/lvm.conf. It was
> introduced in lvm2 version 2.02.85 and is enabled and used by default.
> 
> Before this change, we cached block device names together with all their
> aliases in the .cache file. Now we use udev to get the list of block
> devices, which is also the recommended way.
> 
> Also, the current direction is to rely on udev and its event system even
> more, with the recent introduction of lvmetad, the LVM metadata caching
> daemon (since lvm2 version 2.02.89, but available only in current Fedora
> rawhide at the moment). This daemon provides a much more advanced caching
> scheme, and it's updated on the fly as devices appear and disappear.

Would it be possible to document this change in lvm.conf in the comments to the cache_dir, cache_file_prefix and write_cache_state parameters? Something like 

# These settings are ignored if obtain_device_list_from_udev is set

At the moment, I don't think it's very clear from lvm.conf, unless you are already an expert in lvm caching and udev, that obtain_device_list_from_udev overrides caching. I think documenting it there is likely to save a lot of people grief, because there is still a lot of documentation describing /etc/lvm/cache/.cache caching, but it all points to /etc/lvm/lvm.conf, so if the change were clearly described there, most problems would be caught.

I'd mark this as 'notabug', but if I did so I'm not sure whether you would see the above request, so I'm leaving it unchanged for now.

> 
> > In my case, I am encountering great difficulties in upgrading the one
> > remaining F16 system, and the root cause seems to link back to this lack of
> > LVM caching. I have an lv which I need to delete (two lvs with identical
> > volume name lv_swap are causing the upgrade to crash), but can't delete. The
> > reason I can't delete it is that the initramfs contains a reference to the
> > deleted lv_swap, and refuses to boot when it doesn't exist. This should be
> > fixable by re-creating the initramfs with dracut. In fact, however, even if
> > dracut re-creates the initramfs after I delete the problematic lv_swap, it
> > still creates a dependency on the missing lv_swap (it shouldn't, according
> > to the documentation). I'm assuming that this arises from the same root
> > cause as is causing the lvm cache not to be stored. Unfortunately I can't
> > figure where to go from here in diagnosing this.
> 
> Please provide the list/sequence of commands you used. Also, please attach
> the output of the "lvs" command and the dracut debug output ("dracut --debug"
> option). This might be a bug in dracut, but we should be able to see more
> from its debug log. Thanks.

I'm sorry, I finally found the source of the problem late last night. The only bug was in my memory. I had forgotten that, some while ago, I had included the offending lv in a boot parameter with rd.lvm.lv in /etc/default/grub. Every time I ran dracut, it was picking it up from there (or from /boot/grub2/grub.cfg, which is effectively the same). So there is no bug in dracut; it worked perfectly once I removed the offending parameter. I have now removed the duplicate lv and rerun the preupgrade install, so there's every chance that when I go to check the machine today, it will already be upgraded to F17.
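
For anyone chasing a similar stale reference, here is a rough sketch of how the rd.lvm.lv parameters could be listed from the two files mentioned above. The helper name list_rd_lvm_lv is hypothetical, and the volume group name in the usage note is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: list every rd.lvm.lv=<vg>/<lv> kernel parameter
# found in the given files, one per line, without duplicates.
list_rd_lvm_lv() {
    for f in "$@"; do
        # Skip files that are missing or unreadable.
        [ -r "$f" ] || continue
        # -o prints only the matching parameter rather than the whole line.
        grep -o 'rd\.lvm\.lv=[^[:space:]"]*' "$f"
    done | sort -u
}
```

Running list_rd_lvm_lv /etc/default/grub /boot/grub2/grub.cfg would then print entries such as rd.lvm.lv=vg_example/lv_swap, which can be compared against the output of lvs. After removing a stale entry from /etc/default/grub, the usual Fedora steps would be to regenerate the configuration with grub2-mkconfig -o /boot/grub2/grub.cfg and rebuild the initramfs with dracut -f; treat that sequence as an assumption to verify on the system at hand.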

Thank you again for your careful explanation. All the best.
Comment 7 Peter Rajnoha 2012-07-16 04:47:39 EDT
(In reply to comment #6)
> Would it be possible to document this change in lvm.conf in the comments to
> the cache_dir, cache_file_prefix and write_cache_state parameters? Something
> like 

OK, I've added a short comment to lvm.conf about this:

http://git.fedorahosted.org/git/?p=lvm2.git;a=commit;h=35ebc5d343e2e01b719ce15c801d522a36e54ac4
