Bug 804135

Summary: vgscan does not populate lvmetad with metadata information
Product: Red Hat Enterprise Linux 6
Component: lvm2
Version: 6.3
Reporter: Peter Rajnoha <prajnoha>
Assignee: Peter Rajnoha <prajnoha>
QA Contact: Cluster QE <mspqa-list>
Status: CLOSED ERRATA
Severity: high
Priority: high
CC: agk, cmarthal, dwysocha, heinzm, jbrassow, mbroz, msnitzer, prajnoha, prockai, thornber, zkabelac
Target Milestone: rc
Hardware: All
OS: Linux
Fixed In Version: lvm2-2.02.95-3.el6
Doc Type: Bug Fix
Doc Text: No Documentation needed.
Last Closed: 2012-06-20 15:02:47 UTC

Description Peter Rajnoha 2012-03-16 15:39:04 UTC
There are actually two problems, both with the same consequence:

1 - vgscan bug
==============
If lvmetad is running but does not yet contain any metadata, the initial vgscan does not populate it with the existing PV/VG information.

This causes a problem mainly in the lvmetad init script, which calls vgscan to initialise lvmetad. If vgscan does not populate it, the PVs/VGs are invisible to any subsequent LVM commands.

However, "pvscan --cache" works correctly. Any device that appears after lvmetad is initialised is also processed correctly, since the udev rule already calls "pvscan --cache". So it's only vgscan that does not work with lvmetad.

I've changed vgscan to pvscan in the init script (as per Mornfall's recommendation), but we should fix vgscan as well.
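For illustration, the init-script workaround amounts to something like this (function name and structure are illustrative; this is not the shipped RHEL 6 script):

```shell
# Sketch of the lvmetad init script's start action after the workaround
# (names are illustrative, not taken from the shipped script).
start_lvmetad() {
    lvmetad                # start the daemon
    # "vgscan" alone did not push PV/VG metadata into an empty lvmetad;
    # "pvscan --cache" rescans all devices and feeds the results to it.
    pvscan --cache
}
```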

2 - init script not synchronized (the problem I haven't actually hit, but I can imagine someone could hit it sooner or later)
==============
There's a possible race when running the init script: we start the lvmetad daemon and then call vgscan/pvscan. It's possible that the socket is not ready yet, in which case vgscan/pvscan would fall back to normal scanning without populating lvmetad, and lvmetad would then miss this information.

A quick solution here would be to (actively) wait for the socket to appear and call vgscan/pvscan only once we're really sure the daemon is ready to accept a connection.
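Such an active wait could look roughly like this (the socket path and retry counts are assumptions, not taken from the actual init script):

```shell
# Sketch of the proposed fix: poll until the lvmetad socket appears,
# then give up after a bounded number of attempts.
wait_for_socket() {
    path=$1
    tries=${2:-50}                     # ~5 s at 0.1 s per attempt
    while [ "$tries" -gt 0 ]; do
        [ -S "$path" ] && return 0     # socket exists: daemon is listening
        tries=$((tries - 1))
        sleep 0.1
    done
    return 1                           # timed out
}

# Intended use in the init script:
#   wait_for_socket /var/run/lvm/lvmetad.socket && pvscan --cache
```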

(Note: systemd is OK here since the socket is already prepared and any request on the socket is buffered until the daemon is ready to process it.)
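Under systemd, the daemon's socket is declared in a socket unit along these lines (a sketch; the contents are an assumption, not copied from the shipped lvm2 units). systemd creates and listens on the socket itself, so requests queue up until lvmetad is started:

```ini
# lvm2-lvmetad.socket (illustrative)
[Socket]
ListenStream=/run/lvm/lvmetad.socket

[Install]
WantedBy=sockets.target
```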

Comment 1 Peter Rajnoha 2012-03-16 15:50:00 UTC
(In reply to comment #0)
> 2 - init script not synchronized (the problem I haven't actually hit, but I can
> imagine someone could hit it sooner or later)
> ==============

Well, looking at the code, we have:

        if (!_systemd_activation && s.socket_path) {
                s.socket_fd = _open_socket(s);
                if (s.socket_fd < 0)
                        failed = 1;
        }

        /* Signal parent, letting them know we are ready to go. */
        if (!s.foreground)
                kill(getppid(), SIGTERM);

The problem should not arise then, fortunately: the daemon signals its parent only after the socket has been opened, so by the time the init script continues, the socket already exists. So the only remaining problem is problem #1.

Comment 2 Peter Rajnoha 2012-03-22 11:35:08 UTC
The patch for vgscan is proposed here:
  https://www.redhat.com/archives/lvm-devel/2012-March/msg00137.html

It's debatable whether this is the correct approach: given that we have separate "pvscan" and "pvscan --cache" commands, vgscan should probably follow the same principle.

In any case, I've already changed the init script to call "pvscan --cache" instead, which definitely does the job we need.

Comment 3 Peter Rajnoha 2012-03-27 11:10:29 UTC
I've added the "--cache" option to vgscan as well, so it behaves the same way as pvscan/"pvscan --cache". This way it's more consistent and not misleading for users.

(Anyway, the original bug is already fixed with using "pvscan --cache" in the init script.)

Comment 4 Peter Rajnoha 2012-03-27 12:49:35 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
No Documentation needed.

Comment 5 Peter Rajnoha 2012-03-29 11:08:02 UTC
So now we have (using preexisting "vg" as testing volume group):

global/use_lvmetad=0 + lvmetad not running
------------------------------------------
[0] devel/~ # vgscan
  Reading all physical volumes.  This may take a while...
  Found volume group "vg" using metadata type lvm2

[0] devel/~ # vgscan --cache
  Cannot proceed since lvmetad is not active.


global/use_lvmetad=0 + lvmetad running
--------------------------------------
[0] devel/~ # vgscan
  Reading all physical volumes.  This may take a while...
  Found volume group "vg" using metadata type lvm2
[0] devel/~ # vgscan --cache
  Cannot proceed since lvmetad is not active.


global/use_lvmetad=1 + lvmetad running
--------------------------------------
-> first vgscan run after running lvmetad

[0] devel/~ # vgscan
  Reading all physical volumes.  This may take a while...
  No volume groups found
[0] devel/~ # vgscan --cache
  Reading all physical volumes.  This may take a while...
  Found volume group "vg" using metadata type lvm2

-> second and further vgscan after running lvmetad

[0] devel/~ # vgscan
  Reading all physical volumes.  This may take a while...
  Found volume group "vg" using metadata type lvm2
[0] devel/~ # vgscan --cache
  Reading all physical volumes.  This may take a while...
  Found volume group "vg" using metadata type lvm2


global/use_lvmetad=1 + lvmetad not running
------------------------------------------
[0] devel/~ # vgscan
  WARNING: Failed to connect to lvmetad: No such file or directory. Falling back to internal scanning.
  Reading all physical volumes.  This may take a while...
  Found volume group "vg" using metadata type lvm2

[0] devel/~ # vgscan --cache
  WARNING: Failed to connect to lvmetad: No such file or directory. Falling back to internal scanning.
  Cannot proceed since lvmetad is not active.


That's the same sort of behaviour as we already have in pvscan.

Comment 7 Corey Marthaler 2012-04-25 21:53:22 UTC
Verified the test case listed in comment #5 works as expected in the latest rpms.

2.6.32-268.el6.x86_64

lvm2-2.02.95-6.el6    BUILT: Wed Apr 25 04:39:34 CDT 2012
lvm2-libs-2.02.95-6.el6    BUILT: Wed Apr 25 04:39:34 CDT 2012
lvm2-cluster-2.02.95-6.el6    BUILT: Wed Apr 25 04:39:34 CDT 2012
udev-147-2.41.el6    BUILT: Thu Mar  1 13:01:08 CST 2012
device-mapper-1.02.74-6.el6    BUILT: Wed Apr 25 04:39:34 CDT 2012
device-mapper-libs-1.02.74-6.el6    BUILT: Wed Apr 25 04:39:34 CDT 2012
device-mapper-event-1.02.74-6.el6    BUILT: Wed Apr 25 04:39:34 CDT 2012
device-mapper-event-libs-1.02.74-6.el6    BUILT: Wed Apr 25 04:39:34 CDT 2012
cmirror-2.02.95-6.el6    BUILT: Wed Apr 25 04:39:34 CDT 2012


global/use_lvmetad=0 + lvmetad not running
------------------------------------------
[root@taft-01 ~]# vgscan
  Reading all physical volumes.  This may take a while...
  Found volume group "taft" using metadata type lvm2
  Found volume group "vg_taft01" using metadata type lvm2
[root@taft-01 ~]# vgscan --cache
  Cannot proceed since lvmetad is not active.

global/use_lvmetad=0 + lvmetad running
--------------------------------------
[root@taft-01 ~]# lvmetad
[root@taft-01 ~]# ps -ef | grep lvmetad
root     30221     1  4 16:47 ?        00:00:00 lvmetad
root     30223  1945  0 16:47 pts/0    00:00:00 grep lvmetad
[root@taft-01 ~]# vgscan
  Reading all physical volumes.  This may take a while...
  Found volume group "taft" using metadata type lvm2
  Found volume group "vg_taft01" using metadata type lvm2
[root@taft-01 ~]# vgscan --cache
  Cannot proceed since lvmetad is not active.

global/use_lvmetad=1 + lvmetad running
--------------------------------------
-> first vgscan run after running lvmetad

[root@taft-01 ~]# vgscan
  Reading all physical volumes.  This may take a while...
  No volume groups found
[root@taft-01 ~]# vgscan --cache
  Reading all physical volumes.  This may take a while...
  Found volume group "taft" using metadata type lvm2
  Found volume group "vg_taft01" using metadata type lvm2

-> second and further vgscan after running lvmetad

[root@taft-01 ~]# vgscan
  Reading all physical volumes.  This may take a while...
  Found volume group "vg_taft01" using metadata type lvm2
  Found volume group "taft" using metadata type lvm2
[root@taft-01 ~]# vgscan --cache
  Reading all physical volumes.  This may take a while...
  Found volume group "taft" using metadata type lvm2
  Found volume group "vg_taft01" using metadata type lvm2

global/use_lvmetad=1 + lvmetad not running
------------------------------------------
[root@taft-01 ~]# killall lvmetad
[root@taft-01 ~]# ps -ef | grep lvmetad
root     30242  1945  0 16:49 pts/0    00:00:00 grep lvmetad
[root@taft-01 ~]# vgscan
  WARNING: Failed to connect to lvmetad: No such file or directory. Falling back to internal scanning.
  Reading all physical volumes.  This may take a while...
  Found volume group "taft" using metadata type lvm2
  Found volume group "vg_taft01" using metadata type lvm2
[root@taft-01 ~]# vgscan --cache
  WARNING: Failed to connect to lvmetad: No such file or directory. Falling back to internal scanning.
  Cannot proceed since lvmetad is not active.

Comment 8 errata-xmlrpc 2012-06-20 15:02:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0962.html