Bug 1268190 - LVMError: vgreduce failed for vg_guests: running lvm vgreduce --config devices { preferred_names=["^/dev/mapper/", "^/dev/md/", "^/dev/sd"] } --removemissing --force vg_guests failed [NEEDINFO]
Summary: LVMError: vgreduce failed for vg_guests: running lvm vgreduce --config devic...
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: anaconda
Version: 7.2
Hardware: ppc64
OS: Unspecified
Target Milestone: rc
Assignee: David Lehman
QA Contact: Release Test Team
URL:
Whiteboard: abrt_hash:ebf51ceb98e127bc0ba22edee14...
Depends On:
Blocks:
 
Reported: 2015-10-02 05:14 UTC by Peter Robinson
Modified: 2020-02-14 17:34 UTC
11 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-02-22 15:36:12 UTC
Target Upstream Version:
mkolbas: needinfo? (mkolbas)


Attachments
File: anaconda-tb (919.42 KB, text/plain)
2015-10-02 05:14 UTC, Peter Robinson
no flags Details
File: anaconda.log (8.84 KB, text/plain)
2015-10-02 05:14 UTC, Peter Robinson
no flags Details
File: environ (492 bytes, text/plain)
2015-10-02 05:14 UTC, Peter Robinson
no flags Details
File: ks.cfg (2.82 KB, text/plain)
2015-10-02 05:14 UTC, Peter Robinson
no flags Details
File: lsblk_output (6.06 KB, text/plain)
2015-10-02 05:14 UTC, Peter Robinson
no flags Details
File: nmcli_dev_list (5.22 KB, text/plain)
2015-10-02 05:14 UTC, Peter Robinson
no flags Details
File: os_info (522 bytes, text/plain)
2015-10-02 05:14 UTC, Peter Robinson
no flags Details
File: program.log (72.62 KB, text/plain)
2015-10-02 05:14 UTC, Peter Robinson
no flags Details
File: storage.log (594.78 KB, text/plain)
2015-10-02 05:14 UTC, Peter Robinson
no flags Details
File: syslog (145.15 KB, text/plain)
2015-10-02 05:14 UTC, Peter Robinson
no flags Details
File: ifcfg.log (39.03 KB, text/plain)
2015-10-02 05:14 UTC, Peter Robinson
no flags Details
File: packaging.log (2.05 KB, text/plain)
2015-10-02 05:14 UTC, Peter Robinson
no flags Details

Description Peter Robinson 2015-10-02 05:14:36 UTC
Description of problem:
Install 7.2 Beta on a IBM Power 8 8247-22L using mdraid on 12 disks with LVM on top using a kickstart. Disk config looks like:

zerombr
clearpart --all
bootloader --location=mbr --driveorder=sda

part prepboot --fstype="prepboot" --size=5 --ondisk=sda --asprimary 
part prepboot --fstype="prepboot" --size=5 --ondisk=sdb --asprimary 
part /boot --size 1000 --fstype ext4 --ondisk=sda --asprimary 
part /boot/whocares --size 1000 --fstype ext4 --ondisk=sdb --asprimary 

part raid.008011 --size=65536 --ondisk=sda --grow
part raid.008012 --size=65536 --ondisk=sdb --grow

part raid.008021 --grow --size=200 --ondisk=sdc
part raid.008022 --grow --size=200 --ondisk=sdd
part raid.008023 --grow --size=200 --ondisk=sde
part raid.008024 --grow --size=200 --ondisk=sdf
part raid.008025 --grow --size=200 --ondisk=sdg
part raid.008026 --grow --size=200 --ondisk=sdh
part raid.008027 --grow --size=200 --ondisk=sdi
part raid.008028 --grow --size=200 --ondisk=sdj
part raid.008029 --grow --size=200 --ondisk=sdk
part raid.008030 --grow --size=200 --ondisk=sdl

raid pv.009001 --level=1 --device=md1 raid.008011 raid.008012 
raid pv.009002 --level=6 --device=md2 raid.008021 raid.008022 raid.008023 raid.008024 raid.008025 raid.008026 raid.008027 raid.008028 raid.008029 raid.008030

volgroup vg_guests --pesize=4096 pv.009002
volgroup vg_server --pesize=4096 pv.009001
logvol / --fstype=xfs --name=LogVol00 --vgname=vg_server --size=32768
logvol swap --name=LogVol01 --vgname=vg_server --size=8192

Version-Release number of selected component:
anaconda-21.48.22.35-1

The following was filed automatically by anaconda:
anaconda 21.48.22.35-1 exception report
Traceback (most recent call first):
  File "/usr/lib/python2.7/site-packages/blivet/devicelibs/lvm.py", line 408, in vgreduce
    raise LVMError("vgreduce failed for %s: %s" % (vg_name, msg))
  File "/usr/lib/python2.7/site-packages/blivet/devices/lvm.py", line 230, in _destroy
    lvm.vgreduce(self.name, None, missing=True)
  File "/usr/lib/python2.7/site-packages/blivet/devices/storage.py", line 512, in destroy
    self._destroy()
  File "/usr/lib/python2.7/site-packages/blivet/deviceaction.py", line 344, in execute
    self.device.destroy()
  File "/usr/lib/python2.7/site-packages/blivet/devicetree.py", line 377, in processActions
    action.execute(callbacks)
  File "/usr/lib/python2.7/site-packages/blivet/__init__.py", line 373, in doIt
    self.devicetree.processActions(callbacks)
  File "/usr/lib/python2.7/site-packages/blivet/__init__.py", line 223, in turnOnFilesystems
    storage.doIt(callbacks)
  File "/usr/lib64/python2.7/site-packages/pyanaconda/install.py", line 186, in doInstall
    turnOnFilesystems(storage, mountOnly=flags.flags.dirInstall, callbacks=callbacks_reg)
  File "/usr/lib64/python2.7/threading.py", line 764, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib64/python2.7/site-packages/pyanaconda/threads.py", line 227, in run
    threading.Thread.run(self, *args, **kwargs)
LVMError: vgreduce failed for vg_guests: running lvm vgreduce --config  devices { preferred_names=["^/dev/mapper/", "^/dev/md/", "^/dev/sd"] }  --removemissing --force vg_guests failed

Additional info:
addons:         com_redhat_kdump, org_fedora_oscap
cmdline:        /usr/bin/python  /sbin/anaconda
cmdline_file:   ks initrd=images/RHEL/7.2b/ppc64/initrd.img method=http://10.5.126.23/repo/rhel/RHEL7.2-beta-ppc64/ ip=dhcp ks=http://10.5.126.23/repo/rhel/ks/hardware-rhel-7-power8-BE-12disk.cfg text net.ifnames=0 biosdevname=0
executable:     /sbin/anaconda
hashmarkername: anaconda
kernel:         3.10.0-306.0.1.el7.ppc64
product:        Red Hat Enterprise Linux 7
release:        Red Hat Enterprise Linux ComputeNode release 7.2 Beta (Maipo)
release_type:   pre-release
type:           anaconda
uid:            0
version:        7.2

Comment 1 Peter Robinson 2015-10-02 05:14:38 UTC
Created attachment 1079329 [details]
File: anaconda-tb

Comment 2 Peter Robinson 2015-10-02 05:14:38 UTC
Created attachment 1079330 [details]
File: anaconda.log

Comment 3 Peter Robinson 2015-10-02 05:14:39 UTC
Created attachment 1079331 [details]
File: environ

Comment 4 Peter Robinson 2015-10-02 05:14:40 UTC
Created attachment 1079332 [details]
File: ks.cfg

Comment 5 Peter Robinson 2015-10-02 05:14:41 UTC
Created attachment 1079333 [details]
File: lsblk_output

Comment 6 Peter Robinson 2015-10-02 05:14:41 UTC
Created attachment 1079334 [details]
File: nmcli_dev_list

Comment 7 Peter Robinson 2015-10-02 05:14:42 UTC
Created attachment 1079335 [details]
File: os_info

Comment 8 Peter Robinson 2015-10-02 05:14:43 UTC
Created attachment 1079336 [details]
File: program.log

Comment 9 Peter Robinson 2015-10-02 05:14:44 UTC
Created attachment 1079337 [details]
File: storage.log

Comment 10 Peter Robinson 2015-10-02 05:14:45 UTC
Created attachment 1079338 [details]
File: syslog

Comment 11 Peter Robinson 2015-10-02 05:14:45 UTC
Created attachment 1079339 [details]
File: ifcfg.log

Comment 12 Peter Robinson 2015-10-02 05:14:46 UTC
Created attachment 1079340 [details]
File: packaging.log

Comment 13 Peter Robinson 2015-10-02 05:16:25 UTC
To clarify, the kickstart disk config (worked on other identical hosts) is:

zerombr
clearpart --all
bootloader --location=mbr --driveorder=sda

part prepboot --fstype="prepboot" --size=5 --ondisk=sda --asprimary 
part prepboot --fstype="prepboot" --size=5 --ondisk=sdb --asprimary 
part /boot --size 1000 --fstype ext4 --ondisk=sda --asprimary 
part /boot/whocares --size 1000 --fstype ext4 --ondisk=sdb --asprimary 

part raid.008011 --size=65536 --ondisk=sda --grow
part raid.008012 --size=65536 --ondisk=sdb --grow

part raid.008021 --grow --size=200 --ondisk=sdc
part raid.008022 --grow --size=200 --ondisk=sdd
part raid.008023 --grow --size=200 --ondisk=sde
part raid.008024 --grow --size=200 --ondisk=sdf
part raid.008025 --grow --size=200 --ondisk=sdg
part raid.008026 --grow --size=200 --ondisk=sdh
part raid.008027 --grow --size=200 --ondisk=sdi
part raid.008028 --grow --size=200 --ondisk=sdj
part raid.008029 --grow --size=200 --ondisk=sdk
part raid.008030 --grow --size=200 --ondisk=sdl

raid pv.009001 --level=1 --device=md1 raid.008011 raid.008012 
raid pv.009002 --level=6 --device=md2 raid.008021 raid.008022 raid.008023 raid.008024 raid.008025 raid.008026 raid.008027 raid.008028 raid.008029 raid.008030

volgroup vg_guests --pesize=4096 pv.009002
volgroup vg_server --pesize=4096 pv.009001
logvol / --fstype=xfs --name=LogVol00 --vgname=vg_server --size=32768
logvol swap --name=LogVol01 --vgname=vg_server --size=8192

Comment 14 Peter Robinson 2015-10-02 05:16:56 UTC
Oh and then it crashed again once it'd reported this bug:

Logging out
Status: NEW https://bugzilla.redhat.com/show_bug.cgi?id=1268190
===============================================================================
===============================================================================

===============================================================================
An unknown error has occurred
===============================================================================
anaconda 21.48.22.35-1 exception report
Traceback (most recent call first): 
  File "/usr/lib/python2.7/site-packages/blivet/devicelibs/lvm.py", line 408, in vgreduce
    raise LVMError("vgreduce failed for %s: %s" % (vg_name, msg))
  File "/usr/lib/python2.7/site-packages/blivet/devices/lvm.py", line 230, in _destroy
    lvm.vgreduce(self.name, None, missing=True)
  File "/usr/lib/python2.7/site-packages/blivet/devices/storage.py", line 512, in destroy
    self._destroy()
  File "/usr/lib/python2.7/site-packages/blivet/deviceaction.py", line 344, in execute
    self.device.destroy()
  File "/usr/lib/python2.7/site-packages/blivet/devicetree.py", line 377, in processActions
    action.execute(callbacks)
  File "/usr/lib/python2.7/site-packages/blivet/__init__.py", line 373, in doIt
    self.devicetree.processActions(callbacks)
  File "/usr/lib/python2.7/site-packages/blivet/__init__.py", line 223, in turnOnFilesystems
    storage.doIt(callbacks)
  File "/usr/lib64/python2.7/site-packages/pyanaconda/install.py", line 186, in doInstall
    turnOnFilesystems(storage, mountOnly=flags.flags.dirInstall, callbacks=callbacks_reg)
  File "/usr/lib64/python2.7/threading.py", line 764, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib64/python2.7/site-packages/pyanaconda/threads.py", line 227, in run
    threading.Thread.run(self, *args, **kwargs)
LVMError: vgreduce failed for vg_guests: running lvm vgreduce --config  devices { preferred_names=["^/dev/mapper/", "^/dev/md/", "^/dev/sd"] }  --removemissing --force vg_guests failed

What do you want to do now?
1) Report Bug
2) Debug
3) Quit

Comment 16 Brian Lane 2015-10-02 23:29:17 UTC
It looks like this is having trouble removing a previous install: vg_guests appears to vanish before it can run vgreduce on it. You should be able to work around this by making sure the disks have been cleaned up before starting the installation.

Comment 17 Peter Robinson 2015-10-06 11:15:37 UTC
(In reply to Brian Lane from comment #16)
> It looks like this is having trouble removing a previous install, vg_guests
> appears to vanish before it can run vg_reduce on it. You should be able to
> work around this by making sure the disks have been cleaned up before
> starting the installation.

I wrote out 100 MB of /dev/zero to all hard disks (a-l) and it still crashes in the same way. I have 3 identical servers being installed with a single kickstart; two worked just fine, and this one crashes in the same way even with the zeros written out to all the disks.

Comment 18 Chris Lumens 2016-01-11 21:33:06 UTC
> I wrote out 100 MB of /dev/zero to all hard disks (a-l) and it still crashes
> in the same way. I have 3 identical servers being installed with a single
> kickstart; two worked just fine, and this one crashes in the same way even
> with the zeros written out to all the disks.

Writing to the beginning isn't necessarily good enough.  Backup copies of metadata tend to get scattered all over the place and are sometimes picked up.

Also are you still seeing this with 7.2 final?  I see the original report was against the beta.

Comment 19 Peter Robinson 2016-01-12 03:26:17 UTC
> Writing to the beginning isn't necessarily good enough.  Backup copies of
> metadata tend to get scattered all over the place and are sometimes picked
> up.

I'm aware of that; I'd also tried dd'ing /dev/zero to the entire set of disks. It's LVM on a two-disk RAID-1 md array. The non-OS disks are md RAID-6.

> Also are you still seeing this with 7.2 final?  I see the original report
> was against the beta.

I need to rebuild all four of the machines with a clean RHEL 7.2 but have not had a change window to do it yet (we're awaiting some network changes in PXH2).

Comment 20 Dan Horák 2016-01-12 08:16:58 UTC
Can't the wipefs tool do the job? It should be able to clean up everything, all the metadata.
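
A hedged sketch of what such a cleanup could look like, going further than zeroing the start of each disk as discussed in comment 18: wipefs erases every metadata signature it recognizes, and mdadm --zero-superblock clears md superblocks, which older metadata formats keep near the end of the device. The sda..sdl range mirrors this report's 12-disk layout; since the commands are destructive, the sketch is wrapped in a function you must call explicitly.

```shell
# Destructive sketch: wipe LVM/md/filesystem metadata from all 12 data disks.
# wipefs -a erases every signature it recognizes (LVM2 PV, mdraid, fs, ...);
# mdadm --zero-superblock also clears md superblocks, which older metadata
# formats (v0.90/1.0) store near the END of the device, out of reach of a
# head-of-disk zero. The device range is illustrative; adjust before use.
wipe_all_disks() {
    for d in /dev/sd{a..l}; do
        wipefs -a "$d"
        mdadm --zero-superblock --force "$d" || true   # ignore non-md disks
    done
}
# Run from the installer shell before starting the install:
#   wipe_all_disks
```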

Comment 21 David Lehman 2016-03-15 18:27:31 UTC
I think this is a duplicate of bug 1261621. Is the bug reproducible on 7.2 release or 7.3 nightlies?

Comment 22 Peter Robinson 2016-03-30 14:37:57 UTC
(In reply to David Lehman from comment #21)
> I think this is a duplicate of bug 1261621. Is the bug reproducible on 7.2
> release or 7.3 nightlies?

I will be rebuilding the Power8 infrastructure in Fedora from clean 7.2+updates in early April so I'll be able to report back then. We're in freeze at the moment for F-24 Alpha, and as it's production I don't have the resources to test 7.3 nightlies.

Comment 23 David Lehman 2016-03-30 14:53:56 UTC
That works for me -- thanks. I'll mark this as NEEDINFO until then so we don't forget.

Comment 24 Peter Robinson 2016-04-18 10:39:17 UTC
Getting the following; I'm just trying to get the rest of the details off the server.

An unknown error has occured, look at the /tmp/anaconda-tb* file(s) for more details

===============================================================================
An unknown error has occurred
===============================================================================
anaconda 21.48.22.56-1 exception report
Traceback (most recent call first):
  File "/usr/lib/python2.7/site-packages/blivet/devicelibs/lvm.py", line 538, in lvactivate
    raise LVMError("lvactivate failed for %s: %s" % (lv_name, msg))
  File "/usr/lib/python2.7/site-packages/blivet/devices/lvm.py", line 690, in _setup
    lvm.lvactivate(self.vg.name, self._name)
  File "/usr/lib/python2.7/site-packages/blivet/devices/storage.py", line 413, in setup
    self._setup(orig=orig)
  File "/usr/lib/python2.7/site-packages/blivet/deviceaction.py", line 650, in execute
    self.device.setup(orig=True)
  File "/usr/lib/python2.7/site-packages/blivet/devicetree.py", line 377, in processActions
    action.execute(callbacks)
  File "/usr/lib/python2.7/site-packages/blivet/__init__.py", line 374, in doIt
    self.devicetree.processActions(callbacks)
  File "/usr/lib/python2.7/site-packages/blivet/__init__.py", line 224, in turnOnFilesystems
    storage.doIt(callbacks)
  File "/usr/lib64/python2.7/site-packages/pyanaconda/install.py", line 186, in doInstall
    turnOnFilesystems(storage, mountOnly=flags.flags.dirInstall, callbacks=callbacks_reg)
  File "/usr/lib64/python2.7/threading.py", line 764, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib64/python2.7/site-packages/pyanaconda/threads.py", line 227, in run
    threading.Thread.run(self, *args, **kwargs)
LVMError: lvactivate failed for LogVol01: running lvm lvchange -a y --config  devices { preferred_names=["^/dev/mapper/", "^/dev/md/", "^/dev/sd"] }  vg_server/LogVol01 failed

Comment 27 David Lehman 2016-04-19 14:20:24 UTC
Okay, I think I can see what's generally happening. When an md pv gets activated, udev fires off an asynchronous task to run pvscan on the newly-active pv. Until that pvscan has run, lvm has no knowledge of the md pv's existence (even if it was previously active on the system). Since that happens asynchronously, we need to find a way to tell when it's done. I will update the bug when I have more information.
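
Given that asynchrony, one shell-level workaround sketch (not the fix anaconda eventually shipped) is to drain the udev event queue and then poll until LVM actually knows about the VG, rather than assuming it is visible the instant the md array comes up. The function name, VG name, and 30-second budget are all illustrative.

```shell
# Sketch: wait until udev/lvmetad have registered a VG before operating on it.
wait_for_vg() {
    vg=$1
    tries=${2:-30}                       # ~30s budget by default
    udevadm settle --timeout=10 || true  # let queued udev events (pvscan) finish
    while [ "$tries" -gt 0 ]; do
        vgs "$vg" >/dev/null 2>&1 && return 0
        sleep 1
        tries=$((tries - 1))
    done
    return 1
}
# usage: wait_for_vg vg_guests && vgreduce --removemissing --force vg_guests
```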

Comment 28 David Lehman 2016-08-15 19:03:07 UTC
Peter, this is similar to bug 1261621 (how to know when lvmetad has auto-activated lvm on a just-activated md pv) except that the vg appears to contain no lvs. So how can we know when lvmetad has finished its work in this case?

Comment 29 Peter Rajnoha 2016-08-16 10:15:40 UTC
(In reply to David Lehman from comment #28)
> Peter, this is similar to bug 1261621 (how to know when lvmetad has
> auto-activated lvm on a just-activated md pv) except that the vg appears to
> contain no lvs. So how can we know when lvmetad has finished its work in
> this case?

We can't - the activation part is an asynchronous operation. We call "pvscan --cache -aay" for each PV that appears (either from a udev rule directly [RHEL6] or as part of lvm2-pvscan@major:minor.service [RHEL7]). Each pvscan informs lvmetad about the new PV present in the system, and lvmetad sends back information on whether the VG is now complete. If it is complete, that pvscan additionally runs the VG activation (while lvmetad continues in parallel with any other queries it may receive from other clients - LVM commands).

Now, the important part is that lvmetad does not get information back from pvscan once pvscan has activated the VG - lvmetad just caches LVM metadata; it doesn't store runtime information such as the activation state. So lvmetad does not wait for pvscan to do the activation part. If any LVM command queries lvmetad *after* lvmetad got the PV scan result and *before* the pvscan activates the VG, it will see the VG with all its LVs, and the LVM command itself will look up the activation state (it doesn't get this information from lvmetad).

If some concrete device is expected within a systemd environment, we can hook a service onto the .device systemd unit representing the device itself and wait for it to appear that way (much like a .mount unit, which waits for its backing .device unit to appear, with a timeout, before doing the actual mount).
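
The .device-unit approach described here might look like the following hypothetical ordering fragment. Every name and path is illustrative, not something lvm2 or anaconda ships; systemd's default device job timeout (normally 90s) bounds the wait.

```ini
# Hypothetical unit: run a job only once the LV's device node exists, the way
# a .mount unit waits on its backing .device unit.
[Unit]
Description=Task that needs vg_guests/somelv active
Requires=dev-vg_guests-somelv.device
After=dev-vg_guests-somelv.device

[Service]
Type=oneshot
ExecStart=/usr/local/sbin/use-vg-guests
```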

Comment 30 Peter Rajnoha 2016-08-16 10:27:35 UTC
(In reply to David Lehman from comment #28)
> Peter, this is similar to bug 1261621 (how to know when lvmetad has
> auto-activated lvm on a just-activated md pv) except that the vg appears to
> contain no lvs. So how can we know when lvmetad has finished its work in
> this case?

(sorry, I misread the "contain no lvs")

...well, if there are no LVs in the VG at all, there's nothing extra that pvscan can do - there's nothing to activate. So pvscan does just the scanning part. Then it's only a question of whether the VG is complete, and we know that from the "partial" state of the VG (vgs -o+partial, or the corresponding "p" character as the fourth bit in the vg_attr column).
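
The check described in this comment can be scripted; a minimal sketch, where the exact vgs invocation and VG name are illustrative, and the vg_attr layout is as described above:

```shell
# Sketch: test whether a VG is "partial" (has missing PVs) without relying on
# activation state. The partial flag is the fourth character of vg_attr:
# 'p' means PVs are missing, '-' means the VG is complete.
vg_is_partial() {
    attr=$(vgs --noheadings -o vg_attr "$1" | tr -d ' ')
    [ "${attr:3:1}" = "p" ]
}
# usage: vg_is_partial vg_guests && echo "vg_guests is missing PVs"
```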

Comment 31 David Lehman 2016-08-17 15:20:29 UTC
Please try this updates image and report back with the results:

  http://people.redhat.com/dlehman/updates-1268190.0.img

Comment 32 Peter Robinson 2016-08-17 16:19:35 UTC
(In reply to David Lehman from comment #31)
> Please try this updates image and report back with the results:
> 
>   http://people.redhat.com/dlehman/updates-1268190.0.img

I don't have spare hardware to test this at the moment; all 4 devices where we were seeing this problem are currently in production with numerous production services running on top, and it would take weeks to schedule maintenance to migrate them or arrange downtime.

I can provide the kickstart that recreated it if that's useful.

Comment 33 David Lehman 2016-08-17 16:58:39 UTC
I imagine the kickstart would be useful to QE.

Comment 49 Jiri Konecny 2019-02-22 15:36:12 UTC
This issue has not been active for more than a year, so I'm closing it now. If a fix is still needed, please feel free to re-open it.

