Bug 889465

Summary: lvcreate core dumps if attempted on a VG with missing PV while lvmetad is running
Product: Red Hat Enterprise Linux 6
Reporter: Nenad Peric <nperic>
Component: lvm2
Assignee: Petr Rockai <prockai>
Status: CLOSED CURRENTRELEASE
QA Contact: Cluster QE <mspqa-list>
Severity: urgent
Priority: urgent
Version: 6.4
CC: agk, dwysocha, heinzm, jbrassow, mcsontos, msnitzer, prajnoha, prockai, thornber, zkabelac
Target Milestone: rc
Target Release: 6.4
Hardware: x86_64
OS: Linux
Fixed In Version: lvm2-2.02.100-1.el6
Doc Type: Bug Fix
Last Closed: 2014-07-22 12:02:31 UTC
Type: Bug
Bug Blocks: 892991

Description Nenad Peric 2012-12-21 10:09:26 UTC
Description of problem:

lvcreate core dumps when attempting to create an LV inside a VG with a missing PV.


Version-Release number of selected component (if applicable):

lvm2-2.02.98-6.el6.x86_64
lvm2-libs-2.02.98-6.el6.x86_64
device-mapper-1.02.77-6.el6.x86_64
device-mapper-libs-1.02.77-6.el6.x86_64


How reproducible:
Every time

Steps to Reproduce:

Make sure lvmetad is enabled in lvm.conf and lvm2-lvmetad is running. 

 use_lvmetad = 1

service lvm2-lvmetad status

1. vgcreate vg /dev/sda1 /dev/sdb1
2. echo "offline" >/sys/block/sdb/device/state 
3. lvcreate vg -n test_lv -L1G

  
Actual results:

(04:02:04) [root@r6-node01:~]$ /etc/init.d/lvm2-lvmetad status
lvmetad (pid  6001) is running...
(04:02:14) [root@r6-node01:~]$ vgcreate vg /dev/sda1 /dev/sdb1
  No physical volume found in lvmetad cache for /dev/sdb1
  Physical volume "/dev/sdb1" successfully created
  Volume group "vg" successfully created
(04:02:22) [root@r6-node01:~]$ vgs
  VG       #PV #LV #SN Attr   VSize  VFree 
  VolGroup   1   2   0 wz--n-  9.51g     0 
  vg         2   0   0 wz--n- 19.98g 19.98g
(04:02:24) [root@r6-node01:~]$ echo "offline" >/sys/block/sdb/device/state 
(04:02:28) [root@r6-node01:~]$ vgs
  No device found for PV FFcDQr-vaB3-WNau-382i-ivo3-kKvI-dv2XGj.
  No device found for PV FFcDQr-vaB3-WNau-382i-ivo3-kKvI-dv2XGj.
  VG       #PV #LV #SN Attr   VSize  VFree 
  VolGroup   1   2   0 wz--n-  9.51g     0 
  vg         2   0   0 wz--n- 19.98g 19.98g
(04:02:30) [root@r6-node01:~]$ lvcreate vg -n test_lv -L1G
  No device found for PV FFcDQr-vaB3-WNau-382i-ivo3-kKvI-dv2XGj.
lvcreate: metadata/pv_map.c:140: _create_maps: Assertion `pvl->pv->dev' failed.
Aborted (core dumped)


Expected results:

The command should either:
- succeed and create LV on existing physical volume if it fits, 
or 
- refuse to run at all, reporting that the VG is missing a PV and that it needs to be repaired first.

Additional info:

Comment 2 RHEL Program Management 2012-12-25 06:48:07 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 3 Peter Rajnoha 2013-01-08 12:35:53 UTC
Well, this bug is caused by using the "echo offline > /sys...". The thing is that no uevent is generated for this change (the device going running/offline), so the udev rules never update lvmetad.

It's questionable what the proper behaviour is here, as with "echo offline > /sys" the device is still accessible via /sys/<device>; the item is not removed from the kernel! Also, the kernel does not generate any event when the state changes from running to offline and vice versa (which imho it should!). This should not happen in a normal hotplug environment, as events are always generated there. The running/offline device state is a special case.

To fix this for now, you can use:

echo offline > /sys/<dev>/device/state
echo remove > /sys/<dev>/uevent

...

echo running > /sys/<dev>/device/state
echo add > /sys/<dev>/uevent

Comment 4 Peter Rajnoha 2013-01-08 12:36:50 UTC
(In reply to comment #0)
>   No device found for PV FFcDQr-vaB3-WNau-382i-ivo3-kKvI-dv2XGj.
> lvcreate: metadata/pv_map.c:140: _create_maps: Assertion `pvl->pv->dev'
> failed.
> Aborted (core dumped)

...anyway, we should fix this as well - it should not fail this way!

Comment 5 Peter Rajnoha 2013-01-08 12:54:14 UTC
There's also another echo that removes the dev altogether (including proper uevents):

  echo 1 > /sys/block/<dev>/device/delete

...and then this call to bring it back:

  echo "- - -" > /sys/class/scsi_host/hostN/scan

However, this way you are not guaranteed that the device is assigned the same name again (but that is OK and expected in a hotplug system).

Comment 6 Marian Csontos 2013-01-08 14:29:25 UTC
(In reply to comment #5)
> There's also another echo that removes the dev altogether (including proper
> uevents):
> 
>   echo 1 > /sys/block/<dev>/device/delete

Verified: using this method the lvmetad content is refreshed, and the issue is likely to go away.

Comment 7 Peter Rajnoha 2013-01-10 14:04:16 UTC
(In reply to comment #0)
> (04:02:30) [root@r6-node01:~]$ lvcreate vg -n test_lv -L1G
>   No device found for PV FFcDQr-vaB3-WNau-382i-ivo3-kKvI-dv2XGj.
> lvcreate: metadata/pv_map.c:140: _create_maps: Assertion `pvl->pv->dev'
> failed.

The problem here lies in using two sources of information while working with the VG.


*Without lvmetad*, the vg_read call returns with an error state ("missing PVs") which is properly recognized via a vg_read_error() call:

int lvcreate(struct cmd_context *cmd, int argc, char **argv)
...
  vg = vg_read_for_update(cmd, lp.vg_name, NULL, 0);
...
  if (vg_read_error(vg)) {
    release_vg(vg);
    stack;
    return ECMD_FAILED;
  }

And this causes the lvcreate operation to properly error out.


However, *with lvmetad* in play, we first get the VG metadata from lvmetad which is stale as lvmetad was not updated properly because of the missing event as mentioned in comment #3. We have this sequence in the code:

...
_vg_read
lvmcache_get_vg
lvmetad_vg_lookup

The lvmetad_vg_lookup then calls:

if ((pvcn = dm_config_find_node(top, "metadata/physical_volumes")))
        for (pvcn = pvcn->child; pvcn; pvcn = pvcn->sib)
                _pv_populate_lvmcache(cmd, pvcn, 0);

So it takes the metadata from lvmetad and tries to update lvmcache with the PV info. However, one of the PVs is missing!

The problem is that we don't check the return value of _pv_populate_lvmcache; we silently continue, and so the lvmetad info and the lvmcache info end up diverging.

We should probably error out here directly, mark the VG as "unreliable" and exit. Or, alternatively, we should fix the information from lvmetad in situ and additionally mark the VG with a "this VG has missing PVs" flag.

Mornfall, any opinion?

Comment 8 Peter Rajnoha 2013-01-10 14:20:40 UTC
(In reply to comment #7)
>   vg = vg_read_for_update(cmd, lp.vg_name, NULL, 0);
> ...
>   if (vg_read_error(vg)) {
>     release_vg(vg);
>     stack;
>     return ECMD_FAILED;
>   }

(...with lvmetad, this passes as VG is considered to be OK and complete...)

Comment 9 Petr Rockai 2013-03-18 14:18:52 UTC
Well, I suggest just checking the validity of the dev pointers and complaining earlier. A write would fail anyway, so we basically check whether we have all the dev pointers and, if not, set the PV_MISSING flag or such, ensuring that the usual policies apply (no lvcreate, among others).

Comment 10 Petr Rockai 2013-04-03 10:59:23 UTC
This should now work correctly upstream (devices not visible to the lvm command but visible to lvmetad are now treated as MISSING by the command).

Comment 11 Nenad Peric 2013-07-12 07:02:54 UTC
Tested with lvm2-2.02.99-0.13.el6 and marking verified. 
The command now refuses to execute while a PV is missing from the VG, rather than just core dumping.


[root@tardis-01 ~]# echo offline >/sys/block/sdc/device/state 
[root@tardis-01 ~]# lvcreate vg -n lv_missing_pv -L 5G 
  No device found for PV aAyY75-bLyv-gFgJ-D8Ri-8Cuc-xgdH-eqrZyC.
  Cannot change VG vg while PVs are missing.
  Consider vgreduce --removemissing.