Red Hat Bugzilla – Bug 879338
RC1 crashes nearly instantly in NFSISO install on system with many existing partitions (appears to be confused about what to mount as sysimage)
Last modified: 2013-06-03 10:46:42 EDT
Created attachment 649910
Description of problem:
I tried to PXE-install 18-Beta-RC1 with an NFS-mounted
ISO source and it crashed very soon, before the language window.
(I don't understand why /dev/sda10 is shown as sysimage at
that point, since it is an older Fedora (14) partition.)
I manually mounted /dev/sda12 (an ext2 partition where I had
planned to have F18-beta-RC1 install itself) and saved the
/tmp/*log files I found, with a manually-created "df" output.
Here is the extract of my pxelinux.cfg/default file I used:
# 186. pre-beta F18 RC1
append initrd=F18/20121120/initrd.img ramdisk_size=10000 repo=nfs:proto=udp:172.22.101.50:/home/me/F/F18/20121120 ip=dhcp noipv6 loglevel=debug keymap=us lang=en_US.UTF-8 selinux=0
Version-Release number of selected component (if applicable):
How reproducible:
I only tried it once
Steps to Reproduce:
1. PXE-install an NFS-mounted ISO image
Actual results:
crash dialog box
Created attachment 649911
output of df
Created attachment 649912
Created attachment 649913
Created attachment 649917
Created attachment 649927
Created attachment 649928
proposing as F18 Beta blocker
I think this can be easily duped to bug 879187. The whole NFS and NFSISO functionality got broken in Beta RC1, we know that and bug 879187 should summarize it.
Thanks for reporting.
*** This bug has been marked as a duplicate of bug 879187 ***
15:32:12,422 INFO program: Running... /bin/umount /mnt/sysimage
15:32:12,447 ERR program: umount: /mnt/sysimage: target is busy.
(In some cases useful info about processes that use
the device is found by lsof(8) or fuser(1))
The reporter says "it crashed very soon, before the language window".
packaging.log is almost empty.
None of these results really matches 879187, so I'm tentatively un-duping. Paul, can you try with Beta TC9 and see if the behaviour is the same or different?
Yes, I will do that, but I will have to download TC9 first
(so it will take me a while); I went from TC8 to RC1, sigh.
I don't suppose there is any command or file I could give
you from RC1 (since of course booting it takes only a few
minutes), in the meantime? I see I forgot to do a journalctl
for instance. Or I could add another PXE boot choice with
rd.debug turned on; that too would be pretty trivial.
But I'll get started, downloading TC9.
Thanks for being willing to consider reopening this.
I don't know what if any other data from RC1 would be useful, sorry. The logs attached already may well be enough.
This was implicitly rejected as a blocker as part of the deliberations on 879187, so marking as such.
I downloaded Beta TC9 and was able to successfully PXE-install
it, with repo= pointing to an NFSISO directory with TC9 in it.
The boot/install sequence seemed normal to me (language choice,
the dialog asking you to give up all your rights to sue, the
main ("spoke") window, which after the usual pause gave my
server number as the software source, and GNOME, etc.).
There was no "early" crash, the way I saw with RC1.
I am assuming this knowledge is all you need, and that you don't
need me to upload any files and so on, but say so if you do.
If this is not going to be allowed as a Beta blocker
because there is some workaround to install F18 Beta,
please tell me how to install F18 Beta (as I do not
understand 879187) -- some way to install it which is
not using the Internet and does use the downloaded
image, not burning it -- RC1 at the moment. Thanks.
By the way, I don't understand why you didn't mark it
as being an F18 (=final) blocker.
For your specific case I'm not sure if there *is* a workaround, but then I haven't looked into it much.
879187 is a more general bug: a typical (non-UDP, boot.iso not PXE) NFS install does not work in RC1 (Beta) - either passing inst.repo=nfs:blah or interactively specifying an NFS repo. The workaround for 879187 is to use PXE boot (which is not affected by that bug) or to replace the default 'inst.stage2=blah' parameter that's present when booting from boot.iso with 'inst.repo=nfs:blah' - i.e. you don't just append inst.repo=nfs, you wipe out the inst.stage2 parameter and replace it with inst.repo=nfs.
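As a rough illustration of the second workaround, the substitution on the boot command line could be sketched like this. The function name, the sample command line, and the NFS path are made up for the example; only the inst.stage2 and inst.repo parameter names come from the comment above.

```python
def replace_stage2_with_repo(cmdline, repo):
    """Return cmdline with any inst.stage2=/inst.repo= argument replaced
    by a single inst.repo=<repo> argument (appended if neither was present).

    Hypothetical helper: it only edits the string, it does not boot anything.
    """
    new_arg = "inst.repo=" + repo
    out = []
    placed = False
    for arg in cmdline.split():
        if arg.startswith(("inst.stage2=", "inst.repo=")):
            # Wipe out the old source parameter; put the new one in its place once.
            if not placed:
                out.append(new_arg)
                placed = True
            continue
        out.append(arg)
    if not placed:
        out.append(new_arg)
    return " ".join(out)


# Assumed sample boot line; the label and the NFS server/path are placeholders.
example = "vmlinuz initrd=initrd.img inst.stage2=hd:LABEL=EXAMPLE quiet"
print(replace_stage2_with_repo(example, "nfs:server.example:/export/f18"))
```

The point of the sketch is that inst.repo takes the place of inst.stage2 rather than being added next to it.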
If neither of these works for your specific case, then we're definitely looking at a different bug, but we don't really have a lot of detail on it yet.
Thank you for your response. I appreciate it.
I understand that this bug hasn't been investigated yet.
(That's why I was so unhappy when it was just marked as a
duplicate, apparently because of a few identical terms.)
So it is premature to speculate on its cause.
But it seems to me that if there is no workaround, then it
should be a blocker. I understand that for your scheduling
reasons there is probably resistance to marking *anything*
as a Beta blocker, this close to Beta release, but I don't
see how this behavior is acceptable at Beta, and certainly
not for Final.
I can't tell from reading 879187 but my current interpretation
is that its problems will not be addressed at Beta, in Beta RC2
or Beta-anything else.
But I'm guessing that when its problems are fixed, in Final-TC2
or whatever, that only then will this one be addressed, and I
am guessing: only to the point of asking me (then) to see if
this bug still happens, on Final TC2 or whatever.
Oh well, F18 will happen when it happens. I'll be able to use
it when I'm able to use it. If I can't install it now then I
can't install it now. Life will go on.
It's only really a blocker if it's going to cause significant inconvenience for lots of people. So far you seem to be the only one who's hit it. And I rather suspect it may not be to do with NFS at all, but to do with your giant pile of partitions in some way.
I just took another look at the logs and anaconda.log indicates there was actually a crash and there should be a traceback file (anaconda-tb-XXXX) in /tmp along with the logs. Could you check and attach that if possible? Thanks.
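For anyone gathering the same files, a minimal sketch of copying both the logs and the traceback out of /tmp might look like the following. The function name and the destination path are assumptions; the two file patterns (*log and anaconda-tb-*) are the ones mentioned in this report.

```python
import shutil
from pathlib import Path
from typing import List


def collect_installer_files(src: str, dest: str) -> List[str]:
    """Copy *log and anaconda-tb-* files from src into dest.

    Returns the copied file names, sorted by name. Hypothetical helper,
    not part of anaconda.
    """
    src_dir, dest_dir = Path(src), Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Matching on "log" alone would miss the traceback, so glob for both.
    wanted = set(src_dir.glob("*log")) | set(src_dir.glob("anaconda-tb-*"))
    copied = []
    for path in sorted(wanted, key=lambda p: p.name):
        if path.is_file():
            shutil.copy2(path, dest_dir / path.name)
            copied.append(path.name)
    return copied


if __name__ == "__main__":
    # Assumed destination: a directory on a partition mounted by hand.
    print(collect_installer_files("/tmp", "/mnt/save/anaconda-logs"))
```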
Yes, I saw that file but I never looked at it, to see what it was.
And I didn't copy it because it didn't have "log" in its name.
In case you haven't guessed, I don't install these things when I
am connected to the Internet, so that's why I didn't just click
on the dialog box when it asked me if I wanted to submit a bug.
Anyway, I recreated the test and I am now appending the resulting
file. As you say, it seems to be cycling through my existing linux
installations on that disk when it crashes (which worked fine in
TC9 of course). I don't suppose you have some command-line flag I
can give, to tell it not to do that? (I doubt it: no demand.)
Thank you again, for caring about it.
Created attachment 651183
requested anaconda traceback file
(Off topic. I apologize. Speaking of "demand", how do
I go about making a feature request to Fedora? (I'd
like to see an editor like nano or pico included in the
initial squashfs.img, as it only has "vi".) Thanks.)
"So far you seem to be the only one who's hit it."
Perhaps that's because many (most? all?) of your "regular"
alpha and beta testers do it in a virtual image, on an empty
virtual disk, whereas I am trying it on a real machine,
with real existing partitions? Not that it really matters.
You can't really have enough test cases specified to cover
every possible case out in the real world. You clearly need
to specify those which are most prevalent and just hope that
the corner cases get checked, somehow. Like this one was.
Besides, I can easily imagine that most (all?) of your test
cases were problems that you hit using the old anaconda, and
so wanted to check for each time. With this new anaconda I
would guess there are many things which used to work which
don't now, or at least didn't, before they were fixed.
No, that's not at all true. We do a lot of testing on metal, with various partition configurations. Our test cases are not based on previous bugs. I don't think this discussion is taking us anywhere productive. The beta release is already signed off. The anaconda developers will look at your bug when they're back from Thanksgiving.
I don't see anything obviously wrong from looking at the log files and source code, nor do I see that this should be nfsiso specific. It could, however, be something specific to your existing storage setup. Have you tried other installation methods to rule out that it's nfsiso specific?
This kind of has the feel of some program digging around in /mnt/sysimage while anaconda's working and keeping us from being able to unmount it, but I don't have any evidence to back that up.
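One way to gather that evidence, besides the lsof(8) or fuser(1) hint in the umount error, would be to walk /proc and list processes whose working directory, root, or open file descriptors point under the mount point. This is only a sketch under assumptions: the function name is made up, it usually needs root to read other processes' entries, and it does not catch memory-mapped files or mounts nested below /mnt/sysimage.

```python
import os
from typing import Dict, List


def find_holders(mountpoint: str, proc_root: str = "/proc") -> Dict[int, List[str]]:
    """Map PID -> paths under mountpoint referenced by its cwd, root, or fds.

    Hypothetical diagnostic helper; proc_root is a parameter only so the
    function can be exercised against a fake tree.
    """
    mountpoint = os.path.normpath(mountpoint)
    prefix = mountpoint + os.sep
    holders: Dict[int, List[str]] = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        pid_dir = os.path.join(proc_root, entry)
        links = [os.path.join(pid_dir, "cwd"), os.path.join(pid_dir, "root")]
        fd_dir = os.path.join(pid_dir, "fd")
        try:
            links += [os.path.join(fd_dir, fd) for fd in sorted(os.listdir(fd_dir))]
        except OSError:
            pass  # process exited or permission denied
        for link in links:
            try:
                target = os.readlink(link)
            except OSError:
                continue
            if target == mountpoint or target.startswith(prefix):
                holders.setdefault(int(entry), []).append(target)
    return holders


if __name__ == "__main__":
    for pid, paths in sorted(find_holders("/mnt/sysimage").items()):
        print(pid, paths)
```

Running it right after the failed umount would show whether anything other than anaconda itself is sitting in /mnt/sysimage.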
As always, I'll be happy to try anything you want.
Just tell me what to do.
What I did a few days ago was to remove that disk,
with its half-dozen linux partitions, and insert a disk
with only one linux (and a WXP and a grub partition)
and then Beta RC1 was able to do the NFSISO install
just fine. Then I installed my original disk, making
the new one sdb (since that's where the Beta F18 had
installed its grub), and installed a grub stanza for it
(using my original/ancient grub), as well as adjusting
its /etc/fstab from some other partition (changing swap
from sda6 to sdb6 for instance). And then once I'd gone
through booting into single-user, doing a "touch" of a
new "/.autorelabel" and then finally an enforcing=0 boot,
I was then able to boot Beta just fine. Although that
was only a day or so ago and I haven't had time to try
much with it yet.
But it's easy to remove that second disk and restore the
test conditions to the ones which fail, anytime you tell
me what you want me to try. As I mentioned in Comment #10
I could pretty easily boot it with rd.debug turned on for
instance. The only other thing which comes to mind to
mention is that as I also said above, TC9 worked just fine,
the same NFSISO way, with the multi-linux disk.
Thanks for looking into it.
Paul: it would help if you could try with the 'problem' disk with a different installation method, I think that's what Chris is asking, so we can tell if the fact that you're installing via NFSISO is at all relevant or not.
Yeah, that's exactly what I was getting at.
On the other hand, the comment that TC9 worked fine while the beta does not makes me wonder if it's related to those several other NFS bugs, though.
Well, that machine is behind a firewall, on a LAN where I
don't normally do any Internet access. (The linuxes there
are typically old, and thus not secure. On the second disk,
for instance, where the F18beta was successfully installed,
the installer noticed the old linux and overwrote it, but
it was a Fedora 5 so I didn't really care anymore.)
So I am loath to try any installation methods which involve
connecting to the Internet, on that machine, on that LAN.
So the two possibilities which occur to my mind (feel free
to suggest something I haven't thought of) are (1) burning
that F18beta ISO image onto a real/physical DVD and then
doing the install that way, and (2) copying that ISO image
to some partition on that "problem disk" and then trying to
do a hard-disk-install from it.
I lean towards number two since I am not enthusiastic about
wasting a physical DVD on a beta version of anything. (One
reason I like PXE installs, from ISO images, is no optical
disk ever needs to be burned.)
But I've never done an install from a disk, at least not in a
very long time; it seems to me that years ago I tried it but
the installer then wanted the disk partition to be a FAT one.
Also I wondered if it would work, since bug 873647 hasn't
been marked as fixed or anything, at least that I could see.
So I removed the second disk from that machine, to restore
the test condition, then created a new (ext2) partition on
it and copied the F18beta ISO image over to that partition.
It took two tries to guess at what my PXE stanza should say:
append initrd=F18/20121120/initrd.img ramdisk_size=10000 repo=hd:/dev/sda14 ip=dhcp noipv6 loglevel=debug keymap=us lang=en_US.UTF-8 selinux=0
But number 189 worked fine. It booted and then started
18.29.2 and then I got the language-selection screen, then
the give-up-all-rights screen, and then the main one, which
after a little time settled down and said the installation
source was Fedora-18-Beta-i386-DVD.iso and the software
selection was the GNOME desktop.
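To make the comparison with the failing NFSISO boot explicit, the two append lines quoted in this report can be parsed and diffed. The helper names below are made up for the illustration; the two stanza strings are copied from the report.

```python
def parse_append(line):
    """Split a pxelinux 'append' line into a {key: value} dict.

    Arguments without '=' (such as noipv6) map to None. Only the first '='
    separates key from value, so repo=nfs:proto=udp:... stays intact.
    """
    args = line.split()
    if args and args[0] == "append":
        args = args[1:]
    params = {}
    for arg in args:
        key, sep, value = arg.partition("=")
        params[key] = value if sep else None
    return params


def diff_params(a, b):
    """Return {key: (value_in_a, value_in_b)} for keys whose values differ."""
    return {k: (a.get(k), b.get(k))
            for k in sorted(set(a) | set(b))
            if a.get(k) != b.get(k)}


NFS_STANZA = ("append initrd=F18/20121120/initrd.img ramdisk_size=10000 "
              "repo=nfs:proto=udp:172.22.101.50:/home/me/F/F18/20121120 "
              "ip=dhcp noipv6 loglevel=debug keymap=us lang=en_US.UTF-8 selinux=0")
HD_STANZA = ("append initrd=F18/20121120/initrd.img ramdisk_size=10000 "
             "repo=hd:/dev/sda14 "
             "ip=dhcp noipv6 loglevel=debug keymap=us lang=en_US.UTF-8 selinux=0")

print(diff_params(parse_append(NFS_STANZA), parse_append(HD_STANZA)))
```

The only key that differs is repo, so the initrd and every other boot argument were the same in the crashing and the working runs.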
So I didn't go on and actually try to install anything since
I figured it had gone far enough past the trouble point to
tell you whatever you were trying to find out. For a
similar reason I didn't upload any logs or whatever (but of
course I could, if you want to see something).
So I hope that helps. Let me know if there is anything else
you want me to try, or do. Thanks.