Bug 480793
Summary: [RHEL4.8][Anaconda] raise SystemError, "lvcreate failed for %s" %(self.name,)
Product: Red Hat Enterprise Linux 4
Reporter: Jeff Burke <jburke>
Component: anaconda
Assignee: Anaconda Maintenance Team <anaconda-maint-list>
Status: CLOSED ERRATA
QA Contact: Alexander Todorov <atodorov>
Severity: high
Priority: high
Version: 4.8
CC: atodorov, benl, borgan, ddumas, hdegoede, jgranado, k-keiichi, kkii, lwang, m-ikeda, mikeda, pbunyan, rvykydal, vgoyal
Target Milestone: beta
Target Release: ---
Hardware: All
OS: Linux
URL: http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=6017187
Doc Type: Bug Fix
Last Closed: 2009-05-18 20:15:59 UTC
Description
Jeff Burke
2009-01-20 16:21:41 UTC
Sorry about the confusion. This is a 4.8 BZ, but in some of the description fields I mention RHEL5.3-Server-20081120.1. The correct version is RHEL4-U8-re20090120.nightly.

Created attachment 329499 [details]
anaconda.log file

Log for the information requested.
I've seen this with virtio installs too. A possible cause is that the partition table does not get rescanned by the kernel after it was written. In the virtio case this happened because parted did not recognize the type of disk it was using, assumed it was a file, and therefore did not issue the rescan-partition-table ioctl on the block device after writing the partition table out. If you can also attach the kernel messages from this install I might be able to confirm my suspicion.

Created attachment 329502 [details]
Console log
Just a note. All of this data is also located in the URL section of the BZ. That is a link to the RHTS logs.
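The rescan that parted reportedly skips corresponds to the Linux BLKRRPART ioctl; here is a minimal sketch of triggering it by hand. The helper name is illustrative; the ioctl number comes from <linux/fs.h>.

```python
import fcntl
import os

# BLKRRPART (from <linux/fs.h>): ask the kernel to re-read the
# partition table of a block device, e.g. after writing it out.
BLKRRPART = 0x125F

def reread_partition_table(device):
    # Open the whole-disk device (not a partition) read-only and
    # issue the rescan; the kernel returns EBUSY if a partition
    # on the disk is still mounted or otherwise in use.
    fd = os.open(device, os.O_RDONLY)
    try:
        fcntl.ioctl(fd, BLKRRPART)
    finally:
        os.close(fd)
```

This is roughly what `blockdev --rereadpt /dev/sdX` does. If the installer's partitioning tool skips it, the kernel keeps serving the stale partition table, and later lvcreate/mkfs calls operate on devices that no longer match the on-disk layout.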
Judging from these logs it indeed looks like the partition table is not getting rescanned.

Jeff Burke: Does the install succeed with rhel4.7? This is important to pinpoint the place where the bug was introduced.

This is a regression. I cannot tell you exactly when it started because we have not had a valid RHEL4.U8 install tree since Jan 12 2009.

OK, but if the install worked in Jan 2009, that means that the bug was probably introduced around Jan 15 when anaconda was built. So it probably *is* a regression. I'm thinking this is somewhere inside all the partitioning changes we did for rhel4.8.

I could not reproduce it with RHEL4-U8-re20090120.nightly in a VM with a clean disk. It might be some old LVM metadata corner case; we fixed some of those in RHEL 5.3. Can you provide information about the system you are trying to install on? (Especially its install history, the disks used, and the possibility of LVM metadata from previous installs being present on the disks.)

This is not just one system I am seeing it on. RHEL4-U8-re20090120.nightly:

x86_64
dell-pe2900-02.rhts.bos.redhat.com http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=6016811
dell-pe2900-03.rhts.bos.redhat.com http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=6016813
hp-xw8400-01.rhts.bos.redhat.com http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=6016816

ia64
altix4.rhts.bos.redhat.com http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=6017166
squad6-lp1.rhts.bos.redhat.com http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=6017187

s/390
z209.z900.redhat.com http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=6017188
http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=6036380

i386
bigisis.rhts.bos.redhat.com http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=6017204
ibm-hermes-n1.rhts.bos.redhat.com http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=6017205

Jeff: Thanks for the logs. But still, I cannot reproduce what you are seeing with the rhts ks file.
The only way I can reproduce something close to what you are seeing is when I do an interactive install with a Volume Group present on one of the devices on the system that is not the device to which I want to install. When I use the same ks file (without the post or pre sections) I always get a successful install, no matter how I organize the underlying devices.

About the logs:

lvmout.log: None of the logs show anything related to the error (they should, as the error is LVM related). The logs I see from lvmout.log on my test machine correlate the error seen in tty1 with the lvm error. I'm guessing this is due to the way rhts collects the info.

anaconda.log: In some installations I see parted warning messages about a filesystem not having the correct label. I have seen this before and it has not prevented a correct install. I *really* doubt that this is the reason for the failure. Moreover, we made no changes to the device/filesystem detection code; our changes were mostly to patch the grow functions in the rhel4 installer.

Alas, my tests continue. Moreover, what I have found, to add to the strangeness of this issue, is that the only way I get a traceback is with an LVM volume created on /dev/sdb (on a disk) as opposed to /dev/sdb1 (on a partition). Could this be what is happening in rhts?

What is stated in comment 12 is also present in rhel4.7, so it is not a regression, and it is too much of a corner case to worry about at this moment. Still no luck reproducing the originally reported issue.

Reply to Comment #13: I have attached an anaconda.log file to the BZ. I would hope that it has some data that would help answer your questions. If you click on the links in Comment #10, they also have logs in the "Binary Logs" section. Looking at the kickstarts, they have the following:

zerombr
clearpart --all --initlabel
autopart

So if for some reason it is trying to create LVM on /dev/sdb, it is being done by the autopart magic, nothing that RHTS is doing.
Since you can't reproduce this in your test environment, I highly recommend you use the systems in Comment #10 to reproduce the issue.

(In reply to comment #14)
> Reply to Comment #13
> I have attached a anaconda.log file to the BZ. I would hope that would have
> some data in it that would help answer your questions.

And this has been *very* helpful, but my analysis of the logs does not tell me anything concrete.

> If you click on the links in Comment #10 they also have logs in the "Binary
> Logs" section. Looking at the kickstarts they have the following:
>
> zerombr
> clearpart --all --initlabel
> autopart
>
> So if for some reason it is trying to create LVM on /dev/sdb it is being done
> by the autopart magic. Nothing that RHTS is doing.

Agreed, the lvm call is done by anaconda, but the reason it fails does not necessarily lie within anaconda. We use a bunch of libraries for storage and any one of them can be doing something that is just not right. With the given logs I cannot say anything for certain. My failed attempts to reproduce (using the same rhts ks) tell me that it is one of two things, or both at the same time:

1. the libraries and secondary packages that are being used have a bug
2. the specific storage hardware cannot be correctly handled.

> Since you can't reproduce this in your test environment. I highly recommend you
> use the systems in Comment #10 to reproduce the issue.

Yep, that would be my next step. It is something I'd rather avoid, as I am in the Czech Republic. I will get back with more insight once I have tested with said machines.
This looks like a dramatic change in frequency of bad behavior between the 0119.nightly tree:
http://hank.test.redhat.com/kate/auto.cgi?action=tree_report&d=30&m=&t=RHEL4-U8-re20090119.nightly&b=Show+tree
and the 0120.nightly tree:
http://hank.test.redhat.com/kate/auto.cgi?action=tree_report&d=30&m=&t=RHEL4-U8-re20090120.nightly&b=Show+tree

Here are the failing cases showing the above behavior in kate, by arch:

         0119.nightly  0120.nightly
i386     1             6
ia64     0             4
ppc      0             3
s390     0             1
s390x    0             6
x86_64   0             8

So the behavior did exist in nightly trees prior to 0120.nightly, but the number of errors increased dramatically with 0120.nightly and later trees ...

(In reply to comment #16)
> This looks like a dramatic change in frequency of bad behavior between the
> 0119.nightly tree:

In some other cases this information would be very helpful, but in this case I do not know what to make of it. I'll explain. Anaconda was first built for rhel 4.8 on Jan 15. It was built and tagged for 4.8 along with parted and booty. This means that the three builds were probably present on Jan 19 as well as on Jan 20. The relative versions are anaconda-10.1.1.92-1, parted-1.6.19-23.EL and booty-0.44.9-1. On Jan 22 it was determined that the nightlies would have the anaconda version previous to the Jan 15 build (which has not been reverted yet). This means that from Jan 22 on (this might mean Jan 23 or Jan 24, not really sure here) the anaconda version is anaconda-10.1.1.91-1. The other two apps (parted and booty) were not changed and the builds from Jan 15 were maintained (parted-1.6.19-23.EL and booty-0.44.9-1). Now all these changes to these three installer-related apps (anaconda, parted and booty) occurred on dates different from Jan 19-20, which tells me that this issue is just not related.

TODO:
* Get the tree from Jan 25 (every package has the latest version except anaconda). Will test with that and see if I can reproduce. (My previous tests were with the rhel4.7 tree and anaconda changes only; I didn't have the rhel4.8 nightlies.)
* Dmach is going to build an i386 tree with all the latest tags. It is going to take some hours, but I'll test with that as well.

The test with the latest tree allowed me to reproduce the bug. After some troubleshooting I think I have nailed the issue. I have a patch being reviewed at anaconda-devel-list. I'll probably commit later in the day.

This reminds me, last Friday I looked at some of the latest 4.8 anaconda changes and found this (perhaps related?) error:

@@ -564,11 +572,14 @@ def growLogicalVolumes(diskset, requests):
             fraction = float(req.getStartSize())/float(totsize)
+            newsize = lvm.clampPVSize(vgfree*fraction, vgreq.pesize)
+            newsize += cursize[req.logicalVolumeName]
+
             newsize = cursize[req.logicalVolumeName] + vgfree*fraction
             if req.maxSizeMB:
                 newsize = min(newsize, req.maxSizeMB)
-            req.size = lvm.clampLVSizeRequest(newsize, vgreq.pesize)
+            req.size = newsize

             if req.size != cursize[req.logicalVolumeName]:
                 nochange = 0

Note how we calculate a new size (the first 3 added lines), and then overwrite the calculated size on the line immediately after those 3 added lines. Also notice that we now use the overwritten and thus old newsize without clamping it, so no clamping is done at all now.

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

(In reply to comment #21) Hans, you're right.
I ported one of the patches for bug #233050 from 5.3 badly; the overwriting line should have been removed:

@@ -564,11 +572,14 @@ def growLogicalVolumes(diskset, requests):
             fraction = float(req.getStartSize())/float(totsize)
+            newsize = lvm.clampPVSize(vgfree*fraction, vgreq.pesize)
+            newsize += cursize[req.logicalVolumeName]
+
-            newsize = cursize[req.logicalVolumeName] + vgfree*fraction
             if req.maxSizeMB:
                 newsize = min(newsize, req.maxSizeMB)
-            req.size = lvm.clampLVSizeRequest(newsize, vgreq.pesize)
+            req.size = newsize

             if req.size != cursize[req.logicalVolumeName]:
                 nochange = 0

VERIFIED with RHEL4-U8-re20090128.1/anaconda-10.1.1.94-1. All reported failures from Jeff Burke are for the /distribution/updateDistro test case, which passes with the latest build.

While looking at the RHEL4-U8-re20090202.nightly tree on ia64 we are seeing this:

--------------------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/bin/anaconda", line 1190, in ?
    intf.run(id, dispatch, configFileData)
  File "/var/tmp/anaconda-10.1.1.95//usr/lib/anaconda/cmdline.py", line 117, in run
    dispatch.gotoNext()
  File "/var/tmp/anaconda-10.1.1.95//usr/lib/anaconda/dispatch.py", line 171, in gotoNext
    self.moveStep()
  File "/var/tmp/anaconda-10.1.1.95//usr/lib/anaconda/dispatch.py", line 239, in moveStep
    rc = apply(func, self.bindArgs(args))
  File "/var/tmp/anaconda-10.1.1.95//usr/lib/anaconda/packages.py", line 570, in turnOnFilesystems
    thefsset.createLogicalVolumes(instPath)
  File "/var/tmp/anaconda-10.1.1.95//usr/lib/anaconda/fsset.py", line 1560, in createLogicalVolumes
    entry.device.setupDevice(chroot)
  File "/var/tmp/anaconda-10.1.1.95//usr/lib/anaconda/fsset.py", line 2199, in setupDevice
    raise SystemError, "lvcreate failed for %s" %(self.name,)
SystemError: lvcreate failed for swap0

install exited abnormally
--------------------------------------------------------------------------------

This looks like it is the same issue as before.
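The effect of the dropped clamping discussed above can be illustrated with a simplified model. This is not anaconda's actual lvm.clampPVSize/clampLVSizeRequest code, just an assumed sketch of rounding a grown size down to whole physical extents:

```python
def clamp_to_extents(size_mb, pe_size_kb):
    # Round a requested LV size down to a whole number of physical
    # extents. LVM allocates in extents, so a grown request that is
    # not clamped can ask for more space than the VG has free.
    pe_mb = pe_size_kb / 1024.0
    return int(size_mb / pe_mb) * pe_mb
```

For example, with a 32768 KB (32 MB) extent size, a grown request of 100 MB clamps to 96 MB (3 extents); passing the raw 100 MB through, as the mis-ported hunk did, over-asks the volume group, which is consistent with the intermittent lvcreate failures reported here.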
Here is a link to the logs:
http://rhts.redhat.com/testlogs/44423/152018/1266237/anaconda.log
http://rhts.redhat.com/testlogs/44423/152018/1266237/sys.log
http://rhts.redhat.com/testlogs/44423/152018/1266237/ks.cfg

NOTE about the above logs: after several days RHTS will gzip the files to save space. If the links do not work, append .gz to the end of the URL.

Moving back to assigned per comment #27.

I fail to see this in i386 tests. Is this error present on other arches?

In Comment #27 I say "ia64 we are seeing this".

I completely understand comment 27. But comment #27 does not tell me that it is *only* happening on ia64; it just tells me that it is happening on ia64. The subtlety is important because if it is *only* occurring on ia64 it is probably lvm related; if it is occurring on other arches besides ia64, then it probably *is* anaconda. It can also be that this particular device configuration is rendering the grow logic unusable. As per comment 30, I'm assuming this has been seen only on ia64. Moreover: does this happen with all the ia64 installs? Just a few? Did it happen only once?

The only thing I know for sure is that it _is_ happening on ia64. The ES distro failed with this error, but another ia64 system did install the AS distro. Maybe it is specific to the host it was using when it tried to do the install. I am not sure. RHTS Host: hp-matterhorn1.rhts.bos.redhat.com

I just finished a test using `console -M rhts.test.redhat.com roentgen` and the ks that is in comment 27. Nothing out of the ordinary occurred. Jeff: Can I get access to the specific machine that the test used, with the specific devices that were used? I would need console access and an efi target that has the latest vmlinuz and initrd (Feb 03). Can I use hp-matterhorn1.rhts.bos.redhat.com?

When testing with hp-matterhorn1.rhts.bos.redhat.com I have verified that it is an insufficient free extents thing.
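The "insufficient free extents" failure mode can be sketched with a small model (an assumed helper, not LVM's real accounting; the 32768 KB default extent size is also an assumption): lvcreate rounds the requested size up to whole extents, so an unclamped request can need more extents than the volume group has free.

```python
import math

def lvcreate_would_succeed(request_mb, vg_free_mb, pe_size_kb=32768):
    # lvcreate rounds a -L size *up* to whole physical extents and
    # fails with "insufficient free extents" if the result exceeds
    # the free extent count of the volume group.
    pe_mb = pe_size_kb / 1024.0
    needed_extents = math.ceil(request_mb / pe_mb)
    free_extents = int(vg_free_mb / pe_mb)
    return needed_extents <= free_extents
```

A request of exactly the free space succeeds, but even a tiny unclamped overshoot pushes the extent count over the limit, which would explain why the failure depends on the exact disk sizes of each machine.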
The patch should be present in the new version of anaconda (anaconda-10.1.1.96-1).

atodorov: Should be ok now. I personally installed on hp-matterhorn1.rhts.bos.redhat.com. I'll tell jburke to put this system into rhts again so you guys can test.

Joel,
We ran into this issue again last night using RHEL4-U8-re20090223.nightly. It was using anaconda-10.1.1.96-1. I am moving this BZ back to FAILS_QA based on Comment#35.
http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=7030322
http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=7030329
One of the failures is with hp-matterhorn1.rhts.bos.redhat.com and the other is with hp-rx1620-01.rhts.bos.redhat.com

Traceback (most recent call last):
  File "/usr/bin/anaconda", line 1190, in ?
    intf.run(id, dispatch, configFileData)
  File "/var/tmp/anaconda-10.1.1.96//usr/lib/anaconda/cmdline.py", line 117, in run
    dispatch.gotoNext()
  File "/var/tmp/anaconda-10.1.1.96//usr/lib/anaconda/dispatch.py", line 171, in gotoNext
    self.moveStep()
  File "/var/tmp/anaconda-10.1.1.96//usr/lib/anaconda/dispatch.py", line 239, in moveStep
    rc = apply(func, self.bindArgs(args))
  File "/var/tmp/anaconda-10.1.1.96//usr/lib/anaconda/packages.py", line 570, in turnOnFilesystems
    thefsset.createLogicalVolumes(instPath)
  File "/var/tmp/anaconda-10.1.1.96//usr/lib/anaconda/fsset.py", line 1560, in createLogicalVolumes
    entry.device.setupDevice(chroot)
  File "/var/tmp/anaconda-10.1.1.96//usr/lib/anaconda/fsset.py", line 2199, in setupDevice
    raise SystemError, "lvcreate failed for %s" %(self.name,)
SystemError: lvcreate failed for swap0

install exited abnormally
sending termination signals...done

jburke, atodorov: Current anaconda is 10.1.1.97. It specifically targets this issue. Please retest with the current anaconda.

Joel,
We also ran into the same issue using RHEL4-U8-re20090225.nightly with anaconda-10.1.1.97 on an i386/x86_64 machine. It occurred on nec-em15.rhts.bos.redhat.com.
http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=48071

But the problem didn't occur on another i386/x86_64 machine under the same conditions (RHEL4-U8-re20090225.nightly, anaconda-10.1.1.97). The machine that has no problems is nec-em12.rhts.bos.redhat.com.
http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=48072
So maybe this is a machine-dependent problem as you say. If you need more information or tests, please feel free to ask me anything.

[ Test Log on i386/x86_64 ]
Traceback (most recent call last):
  File "/usr/bin/anaconda", line 1190, in ?
    intf.run(id, dispatch, configFileData)
  File "/var/tmp/anaconda-10.1.1.97//usr/lib/anaconda/cmdline.py", line 117, in run
    dispatch.gotoNext()
  File "/var/tmp/anaconda-10.1.1.97//usr/lib/anaconda/dispatch.py", line 171, in gotoNext
    self.moveStep()
  File "/var/tmp/anaconda-10.1.1.97//usr/lib/anaconda/dispatch.py", line 239, in moveStep
    rc = apply(func, self.bindArgs(args))
  File "/var/tmp/anaconda-10.1.1.97//usr/lib/anaconda/packages.py", line 570, in turnOnFilesystems
    thefsset.createLogicalVolumes(instPath)
  File "/var/tmp/anaconda-10.1.1.97//usr/lib/anaconda/fsset.py", line 1560, in createLogicalVolumes
    entry.device.setupDevice(chroot)
  File "/var/tmp/anaconda-10.1.1.97//usr/lib/anaconda/fsset.py", line 2199, in setupDevice
    raise SystemError, "lvcreate failed for %s" %(self.name,)
SystemError: lvcreate failed for LogVol00

install exited abnormally
sending termination signals...done

Committed a different patch. It should be available in anaconda-10.1.1.98-1.

Joel,
Can you make sure that anaconda-10.1.1.98-1 gets into tonight's nightly? RHEL4-U8-re20090302.nightly failed again last night. It still has anaconda-10.1.1.96-1 in it.
Thanks,
Jeff

ping. Has this been tested with anaconda-10.1.1.99?

jburke: I have tested on matterhorn1 and anaconda-10.1.1.98 takes care of the issue that was discussed in this bug. I am told that to verify this issue we must get feedback from you.
Can you please test matterhorn1 with anaconda-10.1.1.98 or later and give feedback?

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-0978.html