Bug 876789

Summary: FormatDestroyError: error wiping old signatures from /dev/md/Volume0_0p2: 1
Product: Fedora
Component: lvm2
Version: 18
Hardware: x86_64
OS: Unspecified
Status: CLOSED ERRATA
Severity: unspecified
Priority: unspecified
Reporter: Adam Williamson <awilliam>
Assignee: LVM and device-mapper development team <lvm-team>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: agk, bmarzins, bmr, dwysocha, g.kaviyarasu, heinzm, Jes.Sorensen, jkortus, joerg.lechner, jonathan, kay, kparal, kzak, lpoetter, lvm-team, mschmidt, msnitzer, prajnoha, prockai, robatino, rsterenb, rvykydal, sbueno, stephent98, vanmeeuwen+fedora, zkabelac
Keywords: CommonBugs
Whiteboard: abrt_hash:ef387c87fc27789661a312ada280a0285dba4e5df7e4e8fca1f49fa37c659b4b https://fedoraproject.org/wiki/Common_F18_bugs#old-signatures-crash AcceptedBlocker
Doc Type: Bug Fix
Target Milestone: ---
Target Release: ---
Bug Blocks: 752661
Last Closed: 2012-12-19 00:13:19 UTC
Attachments (flags: none for all):
File: anaconda-tb
File: product
File: type
File: ifcfg.log
File: storage.log
File: version
File: environ
File: executable
File: anaconda.log
File: syslog
File: hashmarkername
File: packaging.log
File: cmdline_file
File: release
File: program.log
program.log from successful f17 install (for comparison)
program.log from successful f17 install (for comparison)
storage.log from successful f17 install (for comparison)
ps aux output after the bug happens (and while the command is still erroring when run manually)
storage.log from the install-over-softRAID case
anaconda.log with old lvm
program.log with existing lvm
storage.log with old lvm
syslog with old lvm
ls -l /dev/mapper output confirming the old lvm volumes present
anaconda-tb
syslog luks
anaconda.log luks
anaconda-tb luks
program.log luks
storage.log luks
lsblk luks
F18 TC1 logs

Description Adam Williamson 2012-11-15 00:23:09 UTC
Description of problem:
After working around #873576, I continued with an install to an IMSM RAID-1 array, using autopart. It claimed I had enough space. The array did not contain any data or partitions and never had, but I had booted the installer into the fail state from 873576 with the array present several times. The crash happened during the partitioning phase of the actual installation.

Version-Release number of selected component:
anaconda-18.29

Additional info:
libreport version: 2.0.18
cmdline:        /usr/bin/python  /sbin/anaconda
kernel:         3.6.6-3.fc18.x86_64

description:
:The following was filed automatically by anaconda:
:anaconda 18.29 exception report
:Traceback (most recent call first):
:  File "/usr/lib64/python2.7/site-packages/pyanaconda/storage/formats/__init__.py", line 322, in destroy
:    raise FormatDestroyError(msg)
:  File "/usr/lib64/python2.7/site-packages/pyanaconda/storage/devices.py", line 1478, in _postCreate
:    DeviceFormat(device=self.path, exists=True).destroy()
:  File "/usr/lib64/python2.7/site-packages/pyanaconda/storage/devices.py", line 789, in create
:    self._postCreate()
:  File "/usr/lib64/python2.7/site-packages/pyanaconda/storage/deviceaction.py", line 241, in execute
:    self.device.create()
:  File "/usr/lib64/python2.7/site-packages/pyanaconda/storage/devicetree.py", line 323, in processActions
:    action.execute()
:  File "/usr/lib64/python2.7/site-packages/pyanaconda/storage/__init__.py", line 336, in doIt
:    self.devicetree.processActions()
:  File "/usr/lib64/python2.7/site-packages/pyanaconda/storage/__init__.py", line 174, in turnOnFilesystems
:    storage.doIt()
:  File "/usr/lib64/python2.7/site-packages/pyanaconda/install.py", line 113, in doInstall
:    turnOnFilesystems(storage)
:  File "/usr/lib64/python2.7/threading.py", line 504, in run
:    self.__target(*self.__args, **self.__kwargs)
:  File "/usr/lib64/python2.7/site-packages/pyanaconda/threads.py", line 91, in run
:    threading.Thread.run(self, *args, **kwargs)
:FormatDestroyError: error wiping old signatures from /dev/md/Volume0_0p2: 1

Comment 1 Adam Williamson 2012-11-15 00:23:13 UTC
Created attachment 645262 [details]
File: anaconda-tb

Comment 2 Adam Williamson 2012-11-15 00:23:14 UTC
Created attachment 645263 [details]
File: product

Comment 3 Adam Williamson 2012-11-15 00:23:16 UTC
Created attachment 645264 [details]
File: type

Comment 4 Adam Williamson 2012-11-15 00:23:18 UTC
Created attachment 645265 [details]
File: ifcfg.log

Comment 5 Adam Williamson 2012-11-15 00:23:20 UTC
Created attachment 645266 [details]
File: storage.log

Comment 6 Adam Williamson 2012-11-15 00:23:22 UTC
Created attachment 645267 [details]
File: version

Comment 7 Adam Williamson 2012-11-15 00:23:23 UTC
Created attachment 645268 [details]
File: environ

Comment 8 Adam Williamson 2012-11-15 00:23:25 UTC
Created attachment 645269 [details]
File: executable

Comment 9 Adam Williamson 2012-11-15 00:23:27 UTC
Created attachment 645270 [details]
File: anaconda.log

Comment 10 Adam Williamson 2012-11-15 00:23:28 UTC
Created attachment 645271 [details]
File: syslog

Comment 11 Adam Williamson 2012-11-15 00:23:30 UTC
Created attachment 645272 [details]
File: hashmarkername

Comment 12 Adam Williamson 2012-11-15 00:23:31 UTC
Created attachment 645273 [details]
File: packaging.log

Comment 13 Adam Williamson 2012-11-15 00:23:33 UTC
Created attachment 645274 [details]
File: cmdline_file

Comment 14 Adam Williamson 2012-11-15 00:23:35 UTC
Created attachment 645275 [details]
File: release

Comment 15 Adam Williamson 2012-11-15 00:23:36 UTC
Created attachment 645276 [details]
File: program.log

Comment 16 Adam Williamson 2012-11-15 23:44:13 UTC
Re-testing install to a newly-created IMSM RAID-1 array, with the short-term fix for 873576 in place. I seem to hit this bug reliably. It may be an interaction with the resync process, but F17 did not have that problem: it can install just fine to a newly-created array while the array is re-syncing.

Package: anaconda-18.29
OS Release: Fedora release 18

Comment 17 Adam Williamson 2012-11-15 23:45:42 UTC
Nominating as a blocker; this seems to be a showstopper for the fairly simple case of creating an IMSM RAID-1 array and installing to it.

Comment 18 Adam Williamson 2012-11-15 23:50:29 UTC
This appears to be the problem, from program.log:

19:16:48,018 INFO program: Running... wipefs -a /dev/md/Volume0_0p2
19:16:48,026 ERR program: wipefs: error: /dev/md/Volume0_0p2: probing initialization failed: Device or resource busy

Note that the last test was on a freshly-created array, so it had no partitions or data. For some reason, anaconda appears to be running wipefs on a partition it has only just created.
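
To make the failure mode concrete, here is a minimal sketch (not pyanaconda's actual code) of how a non-zero wipefs exit status can end up in an error like the one in the traceback, "error wiping old signatures from /dev/md/Volume0_0p2: 1":

    import subprocess

    class FormatDestroyError(Exception):
        pass

    def wipe_signatures(device):
        # "wipefs -a" erases all filesystem/RAID/partition-table signatures it finds
        rc = subprocess.call(["wipefs", "-a", device])
        if rc:
            # the trailing "1" in the bug summary is simply this return code
            raise FormatDestroyError(
                "error wiping old signatures from %s: %s" % (device, rc))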

Comment 19 Adam Williamson 2012-11-16 00:30:24 UTC
Created attachment 646083 [details]
program.log from successful f17 install (for comparison)

I did an F17 run for comparison; attaching program.log and storage.log. It looks like F17 calls wipefs too, but the call succeeds without problems.

Comment 20 Adam Williamson 2012-11-16 00:33:27 UTC
Created attachment 646096 [details]
program.log from successful f17 install (for comparison)

I did an identical F17 install (freshly created RAID-1 array, LVM autopart) for comparison. It works fine. Attaching program.log and storage.log. Note that wipefs calls do happen, but they work fine, with no 'busy' errors.

Comment 21 Adam Williamson 2012-11-16 00:35:32 UTC
Created attachment 646097 [details]
storage.log from successful f17 install (for comparison)

Comment 22 Adam Williamson 2012-11-16 00:47:08 UTC
dlehman asked for the contents of /dev/md/ when the bug happens, though I think he may have been thinking of another bug when he asked for that. Just to confirm, /dev/md/Volume0_0p2 does exist (the contents are Volume0_0, Volume0_0p1, Volume0_0p2 and imsm0). Note that the error is that the device is 'busy', not that it doesn't exist.

Comment 23 Adam Williamson 2012-11-16 00:51:35 UTC
Note that resync is in progress in both the F17 and F18 cases when partitioning happens.

Comment 24 Adam Williamson 2012-11-16 23:25:49 UTC
Further info asked for by dlehman: after the bug happens, manually running the wipefs command from ctrl-alt-f2 results in the same error. I'll attach the result of running 'ps aux' at that point.

I will wait out the resync process and see if the command succeeds after that.

Comment 25 Adam Williamson 2012-11-16 23:27:46 UTC
Created attachment 646653 [details]
ps aux output after the bug happens (and while the command is still erroring when run manually)

Comment 26 Adam Williamson 2012-11-17 02:05:30 UTC
After waiting out resync, wipefs still fails. And if I reboot and re-attempt the install, without resync happening at all during the install process, I still hit this traceback. So it's not resync interfering; it's something else.

Comment 27 Adam Williamson 2012-11-17 02:57:38 UTC
This doesn't happen with a software RAID install on the same system, by the way. It's a pretty different case, but it might be useful data.

Comment 28 Jes Sorensen 2012-11-19 10:10:06 UTC
Can you provide the output of /proc/mdstat?

Thanks,
Jes

Comment 29 Jes Sorensen 2012-11-19 10:12:26 UTC
In addition, please check that mdmon is indeed running:

ps aux | grep dmon 

Don't grep for mdmon as it may be @dmon if launched with --off-root

Jes

Comment 30 Jes Sorensen 2012-11-19 11:37:05 UTC
Adam,

When you say the array never had any data on it - do you mean it had never
been created as a RAID array and you just created the container in the BIOS,
or did you have the RAID-1 array already created?

Testing with an IMSM setup here with two RAID-1 arrays on it, I don't see
the problem you report.

Jes

Comment 31 Jes Sorensen 2012-11-19 12:29:13 UTC
Note here it runs wipefs like this

wipefs -a /dev/md/IMSM00_0p3

Comment 32 Adam Williamson 2012-11-19 16:14:04 UTC
jes: it happens either way for me, but mostly I've been testing with a completely freshly created array: go into the Intel RAID BIOS, remove the existing array, reboot (probably not necessary, but eh), create a new array, boot straight into the installer, and try to install.

Comment 33 Adam Williamson 2012-11-19 16:15:06 UTC
jes: that's about the same - I believe the Volume0 / IMSM00 difference is just how our BIOSes by default name the created array. I could change mine to anything else at the creation step if I want to.

Comment 34 Adam Williamson 2012-11-19 16:38:04 UTC
Discussed at 2012-11-19 QA meeting, acting as a blocker review meeting. Rejected as a blocker on the basis that it's working for Jes - we're pretty generous with the RAID criteria for Beta, and read it more as 'it at least has to work for someone' than 'it must work for everyone'. This can be re-visited if we figure out what's going wrong in more detail and it looks serious. Accepted as NTH.

Comment 35 Jes Sorensen 2012-11-19 17:15:53 UTC
Adam: Ok, I'll try and blow away the arrays I have on this scratch box and
report back.

Comment 36 Jes Sorensen 2012-11-19 17:38:04 UTC
Adam: I still need your /proc/mdstat output from just after you get the error.

However, I have a theory which matches my rather limited testing here:
if the array is clean, the installation process succeeds fine; if the array
is dirty and in the process of being resynchronized, wipefs fails for me.

Jes

Comment 37 Adam Williamson 2012-11-19 20:12:25 UTC
Nope. I already tested that; see above. I've tested running the wipefs command manually after resync completed, and also waiting for resync to complete, rebooting, verifying that resync was not running, and trying the install: I hit the bug, and manually running wipefs fails. See https://bugzilla.redhat.com/show_bug.cgi?id=876789#c26. That was what I explicitly spent three hours testing that day.

Comment 38 Adam Williamson 2012-11-19 20:14:01 UTC
Oh, and for completeness, I also checked that it works in F17 even if resync is in progress: the attached logs from an F17 install are from an install that was run with resync in progress the whole time. The entire install succeeds, including the same wipefs calls that F18 uses. My testing suggests pretty solidly that this has nothing to do with resync.

Comment 39 Karel Zak 2012-11-20 09:37:41 UTC
Hmm... maybe I know where the problem is: wipefs opens devices with O_EXCL (in F18 and Rawhide). If there is a partition table or a mounted filesystem, then it returns EBUSY. See bug #872831 for more details.

Comment 40 Jes Sorensen 2012-11-20 09:53:00 UTC
Karel, I just came to the same conclusion here.

The problem is as follows: wipefs uses O_EXCL to open the device it is
trying to wipe. However, O_EXCL will fail on a device if it is already open,
per 'man 2 open'.

If one has existing LVM volume groups on the disk(s), Anaconda will assemble
those, and they will leave the open count > 0, causing the O_EXCL to fail.

I just managed to reproduce the problem here. If I install onto LVM, then
try to do another install onto LVM, it fails for me too. If I jump to the
shell and manually wipe out the pv and vg, I can run wipefs again on the
partition.

It looks like Anaconda needs to make sure it doesn't leave anything
assembled on a device it is intending to run wipefs on. I don't think this is
in fact limited to IMSM RAID arrays; it ought to happen on anything where
there can be LVM devices on top, which raises the severity of the problem.

Jes
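
To illustrate the mechanism Jes and Karel describe, here is a minimal sketch (the device path is only an example) of how an exclusive open on a block device fails with EBUSY while something such as an assembled LVM volume still holds it:

    import errno
    import os

    def try_exclusive_open(device="/dev/md/Volume0_0p2"):
        # O_EXCL on a block device asks the kernel for an exclusive open; it
        # fails with EBUSY if the device is already held, e.g. by device-mapper.
        try:
            fd = os.open(device, os.O_RDWR | os.O_EXCL)
        except OSError as e:
            if e.errno == errno.EBUSY:
                print("%s: device or resource busy" % device)
                return False
            raise
        os.close(fd)
        return True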

Comment 41 Karel Zak 2012-11-20 11:01:59 UTC
rjones suggested --force for wipefs to avoid O_EXCL. Especially for anaconda, where we want to delete everything, that seems like a good idea.

I'll probably improve wipefs; the O_EXCL is pretty useless on whole devices with a partition table (e.g. sda), because the kernel considers such a device in use and open returns EBUSY.

Comment 42 Jes Sorensen 2012-11-20 11:21:08 UTC
It might be useful to have --force; however, I still believe that trying to
wipe out the underlying RAID array while LVM volumes are still assembled on
top of it is a rather serious problem here. Bypassing the issue by using
--force would just risk causing data corruption, IMHO.

Comment 43 Karel Zak 2012-11-20 15:27:35 UTC
I agree :-)

Comment 44 David Lehman 2012-11-20 16:28:04 UTC
(In reply to comment #40)
> If one has existing LVM volume groups on the disk(s), Anaconda will assemble
> those, and they will leave the open count > 0, causing the O_EXCL to fail.

Anaconda is not activating any lvm in the attached logs.

Are we talking about lvm on top of degraded md? The case of reinstalling over a sane lvm configuration has been done more than a few times, so I rather doubt that this is a case of anaconda just not having a clue how to manage an install with preexisting lvm.

Comment 45 Jes Sorensen 2012-11-20 16:50:32 UTC
We're talking about LVM volumes created on top of a clean IMSM RAID array.
If I do an install to IMSM RAID-1 with LVM and then try to re-do the same
install, it will fail.

You can see it happen quite easily by performing the first install, then
re-launching the install, selecting the same device, reclaiming the space used,
and then going to the point just before you launch the install. At this point,
jump to the shell and check /proc/partitions or /dev/mapper - you'll see the old
LVM devices are still visible.

Jes

Comment 46 David Lehman 2012-11-20 17:19:27 UTC
Please provide the following logs from the scenario you just described:

 /tmp/syslog
 /tmp/storage.log
 /tmp/program.log
 /tmp/anaconda.log

Thanks.

Comment 47 Adam Williamson 2012-11-20 19:52:01 UTC
So this is interesting info, but I'm still not sure it captures my case. Remember, each time I test, I wipe the existing RAID-1 array and create a new one. Are you saying that somehow there's LVM config data from some previous install remaining even after I do that? Or is the stuff above not actually what's causing the problem for me?

Comment 48 Jes Sorensen 2012-11-20 21:44:17 UTC
David: I'll get it to you, but I can't do it before Thursday. It's 22h40 here
now and I am not near the test box tomorrow.

Adam: That would explain things, especially if you create the same RAID device
over and over again. It should be easy to check, though: just jump to the shell
before clicking go and check what you have in /dev/mapper.

Jes

Comment 49 Adam Williamson 2012-11-22 06:55:22 UTC
Tried to do a custom btrfs install over top of a previous software RAID install.

Package: anaconda-18.29.2
OS Release: Fedora release 18-Beta

Comment 50 Adam Williamson 2012-11-22 06:56:39 UTC
So that last time I hit it a different way: I did a software RAID install, and then I tried to do an install with a different layout on top of that install. Basically, it looks like wipefs is no longer reliable for destroying existing RAID devices. I don't believe LVM was involved, but I may be missing something. Will attach storage.log.

Comment 51 Adam Williamson 2012-11-22 06:58:49 UTC
Created attachment 649549 [details]
storage.log from the install-over-softRAID case

Comment 52 Adam Williamson 2012-11-22 07:07:33 UTC
As you seem to be able to hit this with various 'install over existing RAID' cases, adding the CommonBugs keyword. We should document that you need to zero out the drives...

Comment 53 Jes Sorensen 2012-11-22 11:02:53 UTC
Created attachment 649692 [details]
anaconda.log with old lvm

Comment 54 Jes Sorensen 2012-11-22 11:04:09 UTC
Created attachment 649693 [details]
program.log with existing lvm

Comment 55 Jes Sorensen 2012-11-22 11:05:02 UTC
Created attachment 649694 [details]
storage.log with old lvm

Comment 56 Jes Sorensen 2012-11-22 11:05:54 UTC
Created attachment 649695 [details]
syslog with old lvm

Comment 57 Jes Sorensen 2012-11-22 11:07:21 UTC
Created attachment 649696 [details]
ls -l /dev/mapper output confirming the old lvm volumes present

Comment 58 Jes Sorensen 2012-11-22 11:44:01 UTC
Adam,

It can be a real headache for people to zero out their drives. There are going
to be users who simply wish to reinstall on top of an existing install rather
than doing an upgrade. They will get hit by this, and it may not be that simple
for them to zero out the drives.

It's still fine to document 'if a RAID install crashes, try zeroing your drives
as a workaround', but I do not think it should be documented as a must-do.

Jes

Comment 59 Adam Williamson 2012-11-22 11:47:25 UTC
Yes, that was the idea.

Comment 60 Kamil Páral 2012-11-22 15:44:26 UTC
I have started a netinst+inst.repo=http:// installation. This happened when "creating ext4 on /dev/sda1". In the partitioning step I used guided partitioning and marked all existing partitions to be deleted.

Package: anaconda-18.29.2
Architecture: i686
OS Release: Fedora release 18-Beta

Comment 61 Kamil Páral 2012-11-22 15:51:58 UTC
Created attachment 649905 [details]
anaconda-tb

Traceback from the crash. I _wasn't_ using any RAID, just a single standard disk.

Comment 62 Kamil Páral 2012-11-22 15:56:59 UTC
Can't reproduce, because my existing layout was destroyed in the process.

Comment 63 Adam Williamson 2012-11-22 20:23:21 UTC
From Kamil's log:

10:41:40,761 INFO program: Running... wipefs -a /dev/sda2
10:41:40,790 ERR program: wipefs: error: /dev/sda2: probing initialization failed: Device or resource busy

So we can hit this with no RAID involved at all, it seems...

Comment 64 Jes Sorensen 2012-11-23 09:35:27 UTC
Adam: I can confirm this finding; reproducing this without RAID is rather
easy. It matches my theory spot on.

1) Do a fresh install onto a fresh disk, and click yes to encrypt it
2) Reboot back into the installer, now select to install back onto the same disk
3) Select reclaim of all the partitions that were created in the previous install
4) Watch the fireworks

I'll post logs in a minute.

Speaking of debug output - please add lsblk output to the list of default
generated output when anaconda runs.

Jes

Comment 65 Jes Sorensen 2012-11-23 09:36:31 UTC
Created attachment 650285 [details]
syslog luks

Comment 66 Jes Sorensen 2012-11-23 09:37:31 UTC
Created attachment 650286 [details]
anaconda.log luks

Comment 67 Jes Sorensen 2012-11-23 09:38:22 UTC
Created attachment 650287 [details]
anaconda-tb luks

Comment 68 Jes Sorensen 2012-11-23 09:39:04 UTC
Created attachment 650288 [details]
program.log luks

Comment 69 Jes Sorensen 2012-11-23 09:39:42 UTC
Created attachment 650289 [details]
storage.log luks

Comment 70 Jes Sorensen 2012-11-23 09:40:10 UTC
Created attachment 650290 [details]
lsblk luks

Comment 71 Karel Zak 2012-11-23 09:52:40 UTC
Guys,

Running... wipefs -a /dev/sda2
wipefs: error: /dev/sda2: probing initialization failed: Device or resource busy

lsblk:

`-sda2                   8:2    0  29.5G  0 part 
  |-fedora-swap (dm-1) 253:1    0   7.8G  0 lvm  
  `-fedora-root (dm-2) 253:2    0  21.8G  0 lvm  

You cannot call wipefs in a situation where the device (sda2) is actively in use; just deactivate the LVM volumes first.

Thanks Jes!
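
As an illustration of the workaround Karel describes, here is a minimal sketch; the volume group name "fedora" is taken from the lsblk output above, and the device path is only an example:

    import subprocess

    def deactivate_and_wipe(vg="fedora", device="/dev/sda2"):
        # "vgchange -an" tears down the device-mapper mappings (fedora-root,
        # fedora-swap) so nothing holds the partition open any more
        subprocess.check_call(["vgchange", "-an", vg])
        # with the volume group deactivated, the exclusive open inside wipefs succeeds
        subprocess.check_call(["wipefs", "-a", device])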

Comment 72 Kamil Páral 2012-11-23 15:58:45 UTC
This is most certainly a Final blocker; proposing it as one. The relevant criterion: "The installer must be able to create and install to any workable partition layout using any file system offered in a default installer configuration, LVM, software, hardware or BIOS RAID, or combination of the above."
https://fedoraproject.org/wiki/Fedora_18_Final_Release_Criteria

Comment 73 Adam Williamson 2012-11-24 08:13:02 UTC
Jes: "please add lsblk output to the list of default generated output when anaconda runs" - can you please file a separate bug for that? The request will likely get lost if it's just mentioned as part of a separate bug.

Comment 74 Jes Sorensen 2012-11-25 16:29:24 UTC
(In reply to comment #73)
> Jes: "please add lsblk output to the list of default generated output when
> anaconda runs" - can you please file a separate bug for that? The request
> will likely get lost if it's just mentioned as part of a separate bug.

Done BZ#879940

Jes

Comment 75 David Lehman 2012-11-26 18:51:29 UTC
Apparently someone started automatically activating all lvm between TC8 and TC9.

Also from Kamil's logs:

15:41:36,945 ERR systemd-udevd: '/usr/sbin/lvm pvscan --cache --activate ay --major 8 --minor 4' [1272] terminated by signal 6 (Aborted)


That is almost certainly the problem.

Comment 76 David Lehman 2012-11-26 19:52:32 UTC
Comment 61 appears to be caused by us trying to wipe any potential old metadata from sda2 immediately after creating the partition. It fails because something is running on that partition, triggered automatically by its appearance, thereby preventing us from wiping it. What a joke.

Here's a theory on why there's no lvm data in sda2's udev db entry.
From syslog provided in comment 65:

09:26:51,257 INFO kernel:[   16.113655] lvm[928]: segfault at 38 ip 0000000000442ca0 sp 00007fff79186d18 error 4 in lvm[400000+e5000]
09:26:51,257 ERR systemd-udevd: '/sbin/lvm pvs --config 'global {locking_type=4}' --units k --nosuffix --nameprefixes --rows --unquoted --noheadings -opv_name,pv_uuid,pv_size,pv_pe_count,pv_pe_alloc_count,pe_start,vg_name,vg_uuid,vg_size,vg_free,vg_extent_size,vg_extent_count,vg_free_count,pv_count /dev/sda2' [928] terminated by signal 11 (Segmentation fault)
09:26:51,265 ERR systemd-udevd: '/sbin/lvm pvs --config 'global {locking_type=4}' --units k --nosuffix --nameprefixes --rows --unquoted --noheadings -olv_name,lv_uuid,lv_size,lv_attr /dev/sda2' [930] terminated by signal 11 (Segmentation fault)
09:26:51,265 INFO kernel:[   16.121496] lvm[930]: segfault at 38 ip 0000000000442ca0 sp 00007fffeb7ac468 error 4 in lvm[400000+e5000]
09:26:51,580 INFO kernel:[   16.436417] lvm[943]: segfault at 38 ip 0000000000442ca0 sp 00007ffff6ce5c98 error 4 in lvm[400000+e5000]
09:26:51,581 ERR systemd-udevd: '/sbin/lvm pvs --config 'global {locking_type=4}' --units k --nosuffix --nameprefixes --rows --unquoted --noheadings -opv_name,pv_uuid,pv_size,pv_pe_count,pv_pe_alloc_count,pe_start,vg_name,vg_uuid,vg_size,vg_free,vg_extent_size,vg_extent_count,vg_free_count,pv_count /dev/sdb2' [943] terminated by signal 11 (Segmentation fault)
09:26:51,588 INFO kernel:[   16.444455] lvm[945]: segfault at 38 ip 0000000000442ca0 sp 00007fff2a205288 error 4 in lvm[400000+e5000]
09:26:51,589 ERR systemd-udevd: '/sbin/lvm pvs --config 'global {locking_type=4}' --units k --nosuffix --nameprefixes --rows --unquoted --noheadings -olv_name,lv_uuid,lv_size,lv_attr /dev/sdb2' [945] terminated by signal 11 (Segmentation fault)

I would recommend that you open a bug against lvm to address this.

Comment 77 Adam Williamson 2012-11-27 09:59:22 UTC
"Apparently someone started automatically activating all lvm between TC8 and TC9."

That sounds very much like it came from the RAID 'fix':

https://bugzilla.redhat.com/show_bug.cgi?id=873576

which went in for TC9 and could certainly have the effect described, if it's triggered by udev. (The 'fix' was to revert a udev patch which kills processes forked off from udev rules - see https://bugzilla.redhat.com/show_bug.cgi?id=873576#c57 - which came in with systemd 183.) Still, we had mdmon activated by udev in F17, and this problem apparently didn't crop up. Did we suppress the activation of LVM by udev in previous releases somehow? Or did udev have nothing to activate LVM with in previous releases?

CCing Lennart, Kay and the LVM folks. This is starting to look an awful lot like a clusterf**k; we might need to get the systemd, udev, anaconda and storage folks all together to thrash out how this ought to actually work.

Comment 78 Peter Rajnoha 2012-11-27 10:33:01 UTC
Please try the latest lvm2-2.02.98-3.fc18 and see if you can still hit the problem (it should be in the testing repo at the moment).

Comment 79 Peter Rajnoha 2012-11-27 10:38:24 UTC
The previous version used lvmetad by default if lvm.conf was not present and/or the use_lvmetad configuration setting was not defined. This should be disabled by default in F18 (lvmetad is only used if directly enabled in the config).

However, if this is the case, then we need to analyze the problem further, as it shouldn't fail even if lvmetad is used!

Comment 80 Peter Rajnoha 2012-11-27 10:43:40 UTC
If lvmetad is used, activation is done automatically based on incoming udev events; once the VG is complete, the volumes are all activated unless they're restricted by the activation/auto_activation_volume_list lvm.conf setting.
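
For reference, the two lvm.conf settings mentioned in comments 79 and 80 look roughly like this (a sketch only; the volume group name is just an example and the values shown are not necessarily the F18 defaults):

    global {
        # comment 79: lvmetad is only used if explicitly enabled here
        use_lvmetad = 0
    }
    activation {
        # comment 80: limit which volumes udev-triggered autoactivation may bring up
        auto_activation_volume_list = [ "fedora" ]
    }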

Comment 81 Adam Williamson 2012-11-29 19:22:11 UTC
Discussed at 2012-11-29 blocker review meeting: http://meetbot.fedoraproject.org/fedora-qa/2012-11-29/f18final-blocker-review-1.1.2012-11-29-17.01.log.txt . Accepted as a blocker per criterion "The installer's custom partitioning mode must be capable of the following: Creating, destroying and assigning mount points to partitions of any specified size using most commonly-used filesystem types ..." (though actually this is not custom-specific).

Comment 82 Chris Lumens 2012-11-30 02:55:54 UTC
Looks like we're waiting for a retest involving a tree with the version of the lvm2 package mentioned in comment #78, right?

Comment 83 Adam Williamson 2012-11-30 05:35:46 UTC
Yes, but annoyingly, I'm having trouble reproducing this reliably. Jes' reproduction steps don't seem to be working for me. I found *another* bug, but couldn't make this one happen on demand...

Comment 84 joerg.lechner 2012-11-30 17:43:03 UTC
Installing Fedora 18 Beta, LXDE desktop; options: LXDE tools, administration, LibreOffice.

Package: anaconda-18.29.2
Architecture: i686
OS Release: Fedora release 18-Beta

Comment 85 rsterenb 2012-12-05 20:48:50 UTC
I'm installing x86_64 F18 in KVM on x86_64 F17.
VM configuration: 1 vCPU, 2 GB RAM, 50 GB disk (virtio) and qxl video.

When installing F18 Beta, I just ran Anaconda: select timezone, packages, partitioning. The latter is quirky in itself, as I'm unable to reclaim existing partitions (the reclaim button stays disabled no matter what I do): I had to create a new LV or erase any partition on the LV before the install. The error occurs at the point where the disk should be partitioned and the root password must be entered (I don't get the chance to do that).


Package: anaconda-18.29.2
OS Release: Fedora release 18-Beta

Comment 86 Brian Lane 2012-12-06 17:17:41 UTC
http://dl.fedoraproject.org/pub/alt/qa/20121205_f18-smoke4/ was built with lvm2-2.02.98-3.fc18; please retest with it.

Comment 87 Adam Williamson 2012-12-11 00:31:24 UTC
-3 is stable now. If no one is able to confirm directly, we can probably just close this after a week or two with no more duplicate reports against Final TC1 or later.

Comment 88 Jes Sorensen 2012-12-12 14:20:07 UTC
Created attachment 662386 [details]
F18 TC1 logs

Sorry for spoiling the party - I finally got my new IMSM test box assembled,
pulled TC1, and I see what looks to me like the same problem.

Comment 89 David Lehman 2012-12-12 14:56:00 UTC
Jes, the crash you hit is totally unrelated to this bug. Please open a separate bug.

anaconda 18.36 exception report
Traceback (most recent call first):
  File "/usr/lib/python2.7/site-packages/dbus/connection.py", line 651, in call_blocking
    message, timeout)
  File "/usr/lib/python2.7/site-packages/dbus/proxies.py", line 145, in __call__
    **keywords)
  File "/usr/lib/python2.7/site-packages/dbus/proxies.py", line 70, in __call__
    return self._proxy_method(*args, **keywords)
  File "/usr/lib64/python2.7/site-packages/pyanaconda/network.py", line 467, in createMissingDefaultIfcfgs
    setting = con.GetSettings()
  File "/usr/lib64/python2.7/site-packages/pyanaconda/network.py", line 1028, in networkInitialize
    createMissingDefaultIfcfgs()
  File "/usr/lib64/python2.7/threading.py", line 504, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib64/python2.7/site-packages/pyanaconda/threads.py", line 91, in run
    threading.Thread.run(self, *args, **kwargs)
DBusException: org.freedesktop.DBus.Error.UnknownMethod: Method "GetSettings" with signature "" on interface "(null)" doesn't exist

Comment 90 Radek Vykydal 2012-12-12 15:03:57 UTC
(In reply to comment #89)
> Jes, the crash you hit is totally unrelated to this bug. Please open a

The traceback from comment #89 is fixed in bug #884660 in version 18.37 of anaconda.

Comment 91 Jes Sorensen 2012-12-12 15:20:10 UTC
If this is already fixed, I won't bother with a new bug for it.

Any ETA for when there will be an install image with the update I can pull
for testing?

Thanks,
Jes

Comment 92 Adam Williamson 2012-12-12 22:46:22 UTC
There have been a few already, but 18.37 keeps having much more serious bugs that would probably block you from testing this. Another smoke build with http://koji.fedoraproject.org/koji/buildinfo?buildID=372476 should show up soon; if it doesn't seem to have any obvious facepalms, I'll let you know.

Comment 93 Adam Williamson 2012-12-13 04:28:12 UTC
smoke6 looks to be basically workable, so give it a shot:

https://dl.fedoraproject.org/pub/alt/qa/20121212_f18-smoke6/

Comment 94 Adam Williamson 2012-12-18 03:31:17 UTC
TC3 should also include the fix:

https://dl.fedoraproject.org/pub/alt/stage/18-TC3/

Comment 95 Jes Sorensen 2012-12-18 17:00:29 UTC
I've spent the day running a number of tests with 18-TC3 on my IMSM test
box, and I can no longer reproduce the install problem. Note that this is not
the old box that I reproduced it on earlier; that one is currently in pieces.

Note that I still see the hang-on-reboot problem caused by / not being unmounted
correctly, which we see in BZ#834245, but it is less critical than
not being able to install, and I think we can fix that post-release.

I would opt for closing this bug for now.

Jes

Comment 96 Adam Williamson 2012-12-19 00:13:19 UTC
OK, let's go with that. 18.37.3 is now stable.