Bug 889465
| Summary: | lvcreate core dumps if attempted on a VG with missing PV while lvmetad is running | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Nenad Peric <nperic> |
| Component: | lvm2 | Assignee: | Petr Rockai <prockai> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Cluster QE <mspqa-list> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 6.4 | CC: | agk, dwysocha, heinzm, jbrassow, mcsontos, msnitzer, prajnoha, prockai, thornber, zkabelac |
| Target Milestone: | rc | | |
| Target Release: | 6.4 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | lvm2-2.02.100-1.el6 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2014-07-22 12:02:31 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 892991 | | |
Description
Nenad Peric
2012-12-21 10:09:26 UTC
This request was not resolved in time for the current release. Red Hat invites you to ask your support representative to propose this request, if still desired, for consideration in the next release of Red Hat Enterprise Linux.

Well, this bug is triggered by using "echo offline > /sys/...". The thing is that no uevent is generated for this change (the transition between the running and offline states), so the udev rules never update lvmetad. It's questionable what the proper behaviour is here: with "echo offline > /sys", the device is still accessible via /sys/<device>; the item is not removed from the kernel! Also, the kernel does not generate any event when changing state from running to offline and vice versa (which imho it should!). This should not happen in a normal hotplug environment, where events are always generated; the running/offline device state is a special case.

To fix this for now, you can generate the uevents manually:

```shell
echo offline > /sys/<dev>/device/state
echo remove > /sys/<dev>/uevent
...
echo running > /sys/<dev>/device/state
echo add > /sys/<dev>/uevent
```

(In reply to comment #0)
> No device found for PV FFcDQr-vaB3-WNau-382i-ivo3-kKvI-dv2XGj.
> lvcreate: metadata/pv_map.c:140: _create_maps: Assertion `pvl->pv->dev' failed.
> Aborted (core dumped)

...anyway, we should fix this as well; it should not fail this way!

There's also another echo that removes the dev altogether (including proper uevents):

```shell
echo 1 > /sys/block/<dev>/device/delete
```

...and then this call to bring it back:

```shell
echo "- - -" > /sys/class/scsi_host/hostN/scan
```

However, this way you're not sure that the dev is assigned the same name again (but that is OK and expected in a hotplug system).

(In reply to comment #5)
> There's also another echo that removes the dev altogether (including proper
> uevents):
>
> echo 1 > /sys/block/<dev>/device/delete

Verified that using this method the lvmetad content is refreshed and the issue is likely to go away.
(In reply to comment #0)
> (04:02:30) [root@r6-node01:~]$ lvcreate vg -n test_lv -L1G
> No device found for PV FFcDQr-vaB3-WNau-382i-ivo3-kKvI-dv2XGj.
> lvcreate: metadata/pv_map.c:140: _create_maps: Assertion `pvl->pv->dev' failed.

The problem here lies in using two sources of information while working with the VG.

*Without lvmetad*, the vg_read call returns with an error state ("missing PVs") which is properly recognized via the vg_read_error call:

```c
int lvcreate(struct cmd_context *cmd, int argc, char **argv)
...
        vg = vg_read_for_update(cmd, lp.vg_name, NULL, 0);
...
        if (vg_read_error(vg)) {
                release_vg(vg);
                stack;
                return ECMD_FAILED;
        }
```

And this causes the lvcreate operation to properly error out.

However, *with lvmetad* in play, we first get the VG metadata from lvmetad, which is stale because lvmetad was not updated properly due to the missing event mentioned in comment #3. We have this sequence in the code:

```
...
_vg_read
  lvmcache_get_vg
    lvmetad_vg_lookup
```

lvmetad_vg_lookup then calls:

```c
if ((pvcn = dm_config_find_node(top, "metadata/physical_volumes")))
        for (pvcn = pvcn->child; pvcn; pvcn = pvcn->sib)
                _pv_populate_lvmcache(cmd, pvcn, 0);
```

So it takes the metadata from lvmetad and tries to update the lvmcache with the PV info. However, one of the PVs is missing! The problem is that we don't check the return value of _pv_populate_lvmcache; we silently continue, and so the lvmetad info and the lvmcache info drift apart.

We should probably error out here directly, mark the VG as "unreliable" and exit. OR, alternatively, we should fix the information given by lvmetad in situ and additionally mark the VG with a "this VG has missing PVs" flag.

Mornfall, any opinion?

(In reply to comment #7)
> vg = vg_read_for_update(cmd, lp.vg_name, NULL, 0);
> ...
> if (vg_read_error(vg)) {
>         release_vg(vg);
>         stack;
>         return ECMD_FAILED;
> }

(...with lvmetad, this passes as the VG is considered to be OK and complete...)
Well, I suggest just checking the validity of the dev pointers and complaining earlier. A write would fail anyway, so we basically check whether we have all the dev pointers and, if not, set the PV_MISSING flag or such, ensuring that the usual policies apply (no lvcreate, among others).

This should now work correctly upstream (devices not visible to the lvm command but visible to lvmetad are now treated as MISSING by the command).

Tested with lvm2-2.02.99-0.13.el6 and marking verified. The command fails to execute while a PV is missing from the VG, rather than just core dumping:

```
[root@tardis-01 ~]# echo offline >/sys/block/sdc/device/state
[root@tardis-01 ~]# lvcreate vg -n lv_missing_pv -L 5G
  No device found for PV aAyY75-bLyv-gFgJ-D8Ri-8Cuc-xgdH-eqrZyC.
  Cannot change VG vg while PVs are missing.
  Consider vgreduce --removemissing.
```