Bug 481698

Summary: vgremove failed on s390*

Product: Red Hat Enterprise Linux 4
Component: anaconda
Version: 4.8
Hardware: s390
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: high
Keywords: Regression, TestBlocker
Target Milestone: beta
Target Release: ---
Reporter: Alexander Todorov <atodorov>
Assignee: Anaconda Maintenance Team <anaconda-maint-list>
QA Contact: Alexander Todorov <atodorov>
CC: borgan, jgranado, mbroz
Doc Type: Bug Fix
Last Closed: 2009-05-18 20:16:07 UTC

Attachments:
anacdump.txt (flags: none)
anacdump.txt with anaconda-10.1.1.94 (flags: none)

Description Alexander Todorov 2009-01-27 07:33:19 UTC
Description of problem:
Installation fails with a traceback on s390 and s390x

Traceback (most recent call last):
  File "/var/tmp/anaconda-10.1.1.91//usr/lib/anaconda/gui.py", line 1074, in handleRenderCallback
    self.currentWindow.renderCallback()
  File "/var/tmp/anaconda-10.1.1.91//usr/lib/anaconda/iw/progress_gui.py", line 242, in renderCallback
    self.intf.icw.nextClicked()
  File "/var/tmp/anaconda-10.1.1.91//usr/lib/anaconda/gui.py", line 789, in nextClicked
    self.dispatch.gotoNext()
  File "/var/tmp/anaconda-10.1.1.91//usr/lib/anaconda/dispatch.py", line 171, in gotoNext
    self.moveStep()
  File "/var/tmp/anaconda-10.1.1.91//usr/lib/anaconda/dispatch.py", line 239, in moveStep
    rc = apply(func, self.bindArgs(args))
  File "/var/tmp/anaconda-10.1.1.91//usr/lib/anaconda/packages.py", line 564, in turnOnFilesystems
    partitions.doMetaDeletes(diskset)
  File "/var/tmp/anaconda-10.1.1.91//usr/lib/anaconda/partitions.py", line 1236, in doMetaDeletes
    lvm.vgremove(delete.name)
  File "/var/tmp/anaconda-10.1.1.91//usr/lib/anaconda/lvm.py", line 169, in vgremove
    raise SystemError, "vgremove failed"
SystemError: vgremove failed


Version-Release number of selected component (if applicable):
anaconda-10.1.1.91

How reproducible:
Always

Steps to Reproduce:
1. Start the installer and let it auto partition. Relevant kickstart:
bootloader
clearpart --all
autopart

2.
3.
  
Actual results:
Traceback. Install fails.

Expected results:
Install completes.

Additional info:
Seen only on s390.

Comment 2 Alexander Todorov 2009-01-27 09:27:24 UTC
Created attachment 330073 [details]
anacdump.txt

Notice the 'unknown device' value

Local variables in innermost frame:
pvs: ['/dev/dasdb1', '/dev/dasdc1', '/dev/dasdd1', '/dev/dasde1', '/dev/dasdf1', '/dev/dasdg1', '/dev/dasdh1', 'unknown device']
args: ['lvm', 'vgremove', 'VolGroup00']
pv: ('unknown device', 'VolGroup00', '2348810240')
vgname: VolGroup00
rc: 1280



and the leaking file descriptors:


/tmp/lvmout:
File descriptor 3 (/tmp/anaconda.log) leaked on lvm invocation. Parent PID 346: /usr/bin/python
File descriptor 4 (/tmp/product/.buildstamp) leaked on lvm invocation. Parent PID 346: /usr/bin/python
File descriptor 5 (socket:[959]) leaked on lvm invocation. Parent PID 346: /usr/bin/python
File descriptor 6 (/proc/cmdline) leaked on lvm invocation. Parent PID 346: /usr/bin/python
File descriptor 7 (socket:[454]) leaked on lvm invocation. Parent PID 346: /usr/bin/python
File descriptor 8 (socket:[459]) leaked on lvm invocation. Parent PID 346: /usr/bin/python
File descriptor 9 (socket:[460]) leaked on lvm invocation. Parent PID 346: /usr/bin/python
File descriptor 10 (socket:[464]) leaked on lvm invocation. Parent PID 346: /usr/bin/python
File descriptor 11 (/.buildstamp) leaked on lvm invocation. Parent PID 346: /usr/bin/python
File descriptor 12 (socket:[473]) leaked on lvm invocation. Parent PID 346: /usr/bin/python
File descriptor 13 (socket:[477]) leaked on lvm invocation. Parent PID 346: /usr/bin/python
File descriptor 14 (socket:[478]) leaked on lvm invocation. Parent PID 346: /usr/bin/python
File descriptor 15 (socket:[960]) leaked on lvm invocation. Parent PID 346: /usr/bin/python
File descriptor 16 (socket:[487]) leaked on lvm invocation. Parent PID 346: /usr/bin/python
File descriptor 17 (socket:[961]) leaked on lvm invocation. Parent PID 346: /usr/bin/python
File descriptor 18 (pipe:[964]) leaked on lvm invocation. Parent PID 346: /usr/bin/python
File descriptor 19 (pipe:[964]) leaked on lvm invocation. Parent PID 346: /usr/bin/python
File descriptor 20 (socket:[1240]) leaked on lvm invocation. Parent PID 346: /usr/bin/python
File descriptor 21 (socket:[1229]) leaked on lvm invocation. Parent PID 346: /usr/bin/python
File descriptor 22 (socket:[1230]) leaked on lvm invocation. Parent PID 346: /usr/bin/python
File descriptor 24 (socket:[1231]) leaked on lvm invocation. Parent PID 346: /usr/bin/python
File descriptor 25 (socket:[1241]) leaked on lvm invocation. Parent PID 346: /usr/bin/python
File descriptor 26 (socket:[1242]) leaked on lvm invocation. Parent PID 346: /usr/bin/python
File descriptor 27 (socket:[2037]) leaked on lvm invocation. Parent PID 346: /usr/bin/python
File descriptor 28 (socket:[2038]) leaked on lvm invocation. Parent PID 346: /usr/bin/python
File descriptor 29 (socket:[2039]) leaked on lvm invocation. Parent PID 346: /usr/bin/python
File descriptor 30 (socket:[2044]) leaked on lvm invocation. Parent PID 346: /usr/bin/python
File descriptor 31 (socket:[2045]) leaked on lvm invocation. Parent PID 346: /usr/bin/python
File descriptor 32 (socket:[2046]) leaked on lvm invocation. Parent PID 346: /usr/bin/python
File descriptor 33 (socket:[2050]) leaked on lvm invocation. Parent PID 346: /usr/bin/python
File descriptor 34 (socket:[2051]) leaked on lvm invocation. Parent PID 346: /usr/bin/python
File descriptor 35 (socket:[2052]) leaked on lvm invocation. Parent PID 346: /usr/bin/python
  Couldn't find device with uuid 'ShV8tU-PTSa-kOLw-hQmw-k3Fi-FKQ0-A2xPoN'.
  Couldn't find device with uuid 'ShV8tU-PTSa-kOLw-hQmw-k3Fi-FKQ0-A2xPoN'.
  Couldn't find device with uuid 'ShV8tU-PTSa-kOLw-hQmw-k3Fi-FKQ0-A2xPoN'.
  Couldn't find device with uuid 'ShV8tU-PTSa-kOLw-hQmw-k3Fi-FKQ0-A2xPoN'.
  Couldn't find device with uuid 'ShV8tU-PTSa-kOLw-hQmw-k3Fi-FKQ0-A2xPoN'.
  Couldn't find device with uuid 'ShV8tU-PTSa-kOLw-hQmw-k3Fi-FKQ0-A2xPoN'.
  Couldn't find device with uuid 'ShV8tU-PTSa-kOLw-hQmw-k3Fi-FKQ0-A2xPoN'.
  Volume group "VolGroup00" not found, is inconsistent or has PVs missing.
  Consider vgreduce --removemissing if metadata is inconsistent.

Comment 3 Alexander Todorov 2009-01-27 11:27:08 UTC
This bug started happening around 0120.

0120.nightly has anaconda-10.1.1.92 and the test case passes on s390 but fails on s390x (bug #480793).

0121.nightly has anaconda-10.1.1.92 and fails on both s390 and s390x.

0123.nightly has anaconda-10.1.1.91 and fails on s390 only.

The 0126.2 tree has anaconda-10.1.1.93 and fails on s390 (s390x not tested yet).

Something other than anaconda is failing. Did lvm change?

Comment 4 Joel Andres Granados 2009-01-27 11:35:05 UTC
anaconda-10.1.1.91 is the version used in rhel4.7.
anaconda-10.1.1.92 is the anaconda version built around Jan 16.
anaconda-10.1.1.93 is the final anaconda version with some fixes to partitioning, built on Jan 26.

I think that, if the bug is present in all three versions of anaconda, it must be somewhere else. The solution can still be in anaconda, though. I will compare the rhel5 and rhel4 code bases and see what I can back-port from rhel5.

Comment 5 Joel Andres Granados 2009-01-27 14:07:46 UTC
Found what could be a solution in the rhel5 tree. This was previously handled by doing an lvm vgreduce before the vgremove. This will be present in the next version of anaconda, anaconda-10.1.1.94.
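
For reference, a minimal sketch of that ordering (not the actual anaconda lvm.py code; it uses plain subprocess instead of anaconda's exec helpers, and the exact vgreduce flags are an assumption based on comment 13 below):

import subprocess

def vgremove(vgname):
    # Make the VG consistent first by dropping references to missing PVs
    # (see comment 13: vgreduce --removemissing --force).
    subprocess.call(["lvm", "vgreduce", "--removemissing", "--force", vgname])
    # Then remove the volume group itself.
    rc = subprocess.call(["lvm", "vgremove", "-f", vgname])
    if rc != 0:
        raise SystemError("vgremove failed")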

Comment 6 RHEL Program Management 2009-01-27 14:11:57 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 9 Alexander Todorov 2009-01-29 09:03:43 UTC
Created attachment 330336 [details]
anacdump.txt with anaconda-10.1.1.94

The traceback now is:

Traceback (most recent call last):
  File "/var/tmp/anaconda-10.1.1.94//usr/lib/anaconda/gui.py", line 1074, in handleRenderCallback
    self.currentWindow.renderCallback()
  File "/var/tmp/anaconda-10.1.1.94//usr/lib/anaconda/iw/progress_gui.py", line 242, in renderCallback
    self.intf.icw.nextClicked()
  File "/var/tmp/anaconda-10.1.1.94//usr/lib/anaconda/gui.py", line 789, in nextClicked
    self.dispatch.gotoNext()
  File "/var/tmp/anaconda-10.1.1.94//usr/lib/anaconda/dispatch.py", line 171, in gotoNext
    self.moveStep()
  File "/var/tmp/anaconda-10.1.1.94//usr/lib/anaconda/dispatch.py", line 239, in moveStep
    rc = apply(func, self.bindArgs(args))
  File "/var/tmp/anaconda-10.1.1.94//usr/lib/anaconda/packages.py", line 564, in turnOnFilesystems
    partitions.doMetaDeletes(diskset)
  File "/var/tmp/anaconda-10.1.1.94//usr/lib/anaconda/partitions.py", line 1236, in doMetaDeletes
    lvm.vgremove(delete.name)
  File "/var/tmp/anaconda-10.1.1.94//usr/lib/anaconda/lvm.py", line 200, in vgremove
    raise SystemError, "pvremove failed"
SystemError: pvremove failed

Local variables in innermost frame:
vgname: VolGroup00
pv: ('unknown device', 'VolGroup00', '2348810240')
args: ['lvm', 'pvremove', '-ff', '-y', '-v', 'unknown device']
pvs: ['/dev/dasdb1', '/dev/dasdc1', '/dev/dasdd1', '/dev/dasde1', '/dev/dasdf1', '/dev/dasdg1', '/dev/dasdh1', 'unknown device']
pvname: unknown device
rc: 1280


I believe this is another manifestation of the same issue. Please advise if you need another bug report.

Comment 10 Joel Andres Granados 2009-01-29 11:07:11 UTC
Alex:

Can this be found on other arches, or is it only s390 specific?

Comment 11 Alexander Todorov 2009-01-29 11:26:41 UTC
Joel,
the new traceback is seen on i386 and s390 with the latest compose.

Comment 12 Joel Andres Granados 2009-01-29 12:33:49 UTC
The "unknown_device" comes from calling `lvm vgdisplay -C --noheadings --units b --separator : --nosuffix --options vg_name,vg_size,vg_extent_size'.  apparently, vg_name is left wih "unknown_device" and it happens with dasda (it the only one missing from the device list).  How can this happen with lvm and how can it be avoided through the command line?

Comment 13 Milan Broz 2009-01-29 14:18:21 UTC
(btw, why parse vgdisplay output when vgs is designed for scripting and is much better suited for it?)

Anyway, if you have a VG consisting of several PVs and you lose some of them, the metadata is still read from the remaining PVs (every PV contains a full copy of the metadata by default).

So LVM knows that there should be a PV with a known UUID but cannot find it. In vgs output this device will be marked as "unknown device".

If you just need to remove all the devices (to wipe the metadata), you can probably ignore such devices.

Or call vgreduce --removemissing --force first.

Comment 14 Joel Andres Granados 2009-01-29 17:23:57 UTC
(In reply to comment #12)
> The "unknown_device" comes from calling `lvm vgdisplay -C --noheadings --units
This should be pvdisplay instead of vgdisplay.

> b --separator : --nosuffix --options vg_name,vg_size,vg_extent_size'
This should be pv_name,vg_name,pv_size instead of vg_name,vg_size,vg_extent_size.

Sorry for the confusion.

Does comment 13 still apply considering my mistake?
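
For clarity, here is a minimal sketch (a hypothetical helper, not the anaconda code) of how the corrected pvdisplay call produces the (pv_name, vg_name, pv_size) tuples seen in the traceback's local variables:

import subprocess

def get_pvs():
    # Run the corrected command and split each colon-separated line
    # into a (pv_name, vg_name, pv_size) tuple.
    out = subprocess.check_output(
        ["lvm", "pvdisplay", "-C", "--noheadings", "--units", "b",
         "--separator", ":", "--nosuffix",
         "--options", "pv_name,vg_name,pv_size"])
    pvs = []
    for line in out.decode().splitlines():
        line = line.strip()
        if not line:
            continue
        # A PV whose device cannot be found is reported as "unknown device".
        pv_name, vg_name, pv_size = line.split(":")
        pvs.append((pv_name, vg_name, pv_size))
    return pvs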

Comment 15 Joel Andres Granados 2009-01-29 17:32:44 UTC
Also notice: I'm pretty sure, from looking at the log files, that there is a dasda device in the system. For some reason pvdisplay is not seeing it properly. I guess this is because dasda does not have the correct metadata, and so it shows up as "unknown device".

In any case, I don't think that skipping "unknown device" when it happens is going to cause a major error.

Comment 16 Milan Broz 2009-01-29 18:27:13 UTC
Yes, comment #13 still applies; you can easily simulate this situation (it is not arch dependent, s390 just often uses a lot of small devices combined into one VG).

Create VG over several devices:
# pvcreate /dev/sd[bcd]
  Physical volume "/dev/sdb" successfully created
  Physical volume "/dev/sdc" successfully created
  Physical volume "/dev/sdd" successfully created
# vgcreate vg_test /dev/sd[bcd]
  Volume group "vg_test" successfully created
# pvdisplay -C --noheadings --units b --separator : --nosuffix --options pv_name,vg_name,pv_size
  /dev/sdb:vg_test:209715200
  /dev/sdc:vg_test:209715200
  /dev/sdd:vg_test:209715200

Now we destroy the metadata on sdb, but sdc + sdd still contain metadata:

# dd if=/dev/zero of=/dev/sdb bs=1M count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.0290812 s, 36.1 MB/s

# pvdisplay -C --noheadings --units b --separator : --nosuffix --options pv_name,vg_name,pv_size
  Couldn't find device with uuid 'LzIgRJ-r8Pc-1AL9-qW6k-6aOo-VlxP-7EB2hR'.
  ...
  Couldn't find device with uuid 'LzIgRJ-r8Pc-1AL9-qW6k-6aOo-VlxP-7EB2hR'.
  /dev/sdc:vg_test:209715200
  /dev/sdd:vg_test:209715200
  unknown device:vg_test:209715200

(this is upstream lvm2 code, but 4.8 should be exactly the same here)

Skipping "unknown device" for pvremove -ff is probably OK for now.

(just a note: when lvm cannot find a device referenced from the metadata, it tries to scan more and more aggressively to find it; the last resort is to scan all block devices in the system)
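
To make "skip unknown device" concrete, a minimal sketch (hypothetical, not the actual anaconda partitioning code) of the pvremove loop with that check:

import subprocess

def wipe_pvs(pvs):
    # pvs is a list of (pv_name, vg_name, pv_size) tuples such as the ones
    # shown in the traceback's local variables above.
    for pv_name, _vg_name, _pv_size in pvs:
        # A PV whose underlying device is gone shows up as "unknown device";
        # there is nothing left to wipe, so just skip it.
        if pv_name == "unknown device":
            continue
        rc = subprocess.call(["lvm", "pvremove", "-ff", "-y", "-v", pv_name])
        if rc != 0:
            raise SystemError("pvremove failed")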

Comment 17 Brock Organ 2009-02-04 21:30:21 UTC
changing status from ON_QA to FAILS_QA (the issue is still open, see comment #11) ...

Comment 18 Joel Andres Granados 2009-02-05 13:11:18 UTC
A change went in on Jan 29 for this. Is it still failing with the latest tree?

Comment 19 Alexander Todorov 2009-02-05 13:16:44 UTC
All nightlies since Feb 1st pass and there's no traceback. Moving to VERIFIED.

Comment 21 errata-xmlrpc 2009-05-18 20:16:07 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-0978.html