Bug 544177 - udev settle timeout too low causing LVM scanning issues (with plain ATA drives)
Summary: udev settle timeout too low causing LVM scanning issues (with plain ATA drives)
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: anaconda
Version: 14
Hardware: i386
OS: Linux
Priority: low
Severity: high
Target Milestone: ---
Assignee: Anaconda Maintenance Team
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 533047
Depends On:
Blocks:
Reported: 2009-12-04 05:12 UTC by bob mckay
Modified: 2010-12-14 22:01 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-12-14 22:01:20 UTC
Type: ---
Embargoed:


Attachments
Dump of /tmp file at time of anaconda exit (21.22 KB, application/tgz)
2009-12-04 10:48 UTC, bob mckay

Description bob mckay 2009-12-04 05:12:25 UTC
This looks _awfully_ like bug 499321. However based on the request there:
   If you are continuing to see a problem along these lines, please do open a 
   new bug report as this one's getting confused and unwieldy.  Thanks.
I'm doing precisely that.

Description of problem:
Preupgrade reboot fails, doesn't find lvm root.

Version-Release number of selected component (if applicable):
12-4 I think (will check later)

How reproducible:
Always

Steps to Reproduce:
1. Run preupgrade
2. Reboot
3.
  
Actual results:
Cannot find root

Expected results:
Finds root

Additional info:
I assume I should record this against the anaconda version number, rather than the release I'm trying to upgrade?

lvm filesystem, but no RAID or anything fancy. However it's _very_ old hardware, it could be just a timeout (running lvm gui in the old system also takes forever to bring up the gui screen). Is there any way to change the anaconda timeout?

Chris, any chance you could put back
    http://clumens.fedorapeople.org/499321.img
so I can check whether it fixes the problem?

More info later (when the system finally reboots itself)

Comment 1 bob mckay 2009-12-04 09:42:06 UTC
Current system: F10
Current kernel: 2.6.27.38-170.2.113.fc10-i686
CPU: Pentium 3 (500 MHz)

lvm gui startup time: around 400 sec (display of gui window to display of lvm data) - I can see this might cause a timeout 8^)

I'll add a traceback if I can figure out how (some problems there also, may be network-related).

Comment 2 bob mckay 2009-12-04 10:46:09 UTC
Hmmm, generating a traceback didn't seem to work. I tried as per comment 63 of Bug 499321 (as in that case, I am getting an error message, not an exception). Switching to vt2 and issuing killall didn't work, because killall couldn't be found (either manually or in the path), so I manually sent
   kill -USR2
to the two anaconda processes I could see. However, this didn't generate the anaconda dump file in /tmp. I will attach the contents of my /tmp directory in case it's of any use (dumped to USB stick), minus the image file, which is a bit large, but I still have it if it's of use.

Comment 3 bob mckay 2009-12-04 10:48:50 UTC
Created attachment 376055 [details]
Dump of /tmp file at time of anaconda exit

dump of /tmp (minus system image) at time of anaconda failure - i.e. at this point, anaconda was displaying an error screen saying that root could not be found.

Comment 4 Chris Lumens 2009-12-06 14:57:37 UTC
What happens if you add "upgradeany" to the boot arguments?

Comment 5 bob mckay 2009-12-07 06:44:36 UTC
Exactly the same: the root for the installation isn't found. Is there any way to change the anaconda timeout for the root to be found? I'm pretty sure the anaconda search for the root times out in less than the 400 sec. that it takes the lvm gui to come ready when I try to use it from F10, and I think that is probably a reasonable proxy for the lvm activation time.

Comment 6 Hans de Goede 2009-12-07 09:58:06 UTC
Hi,

I've prepared an updates.img forcing our udevadm settle timeout to 500 seconds for all udevadm settle calls. This is the only place where we wait for things
to settle atm, so I hope this helps.
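
For illustration only (this is not the code shipped in the updates.img), a call to udevadm settle with an explicit timeout could be sketched in Python roughly as below. The function names are hypothetical; `--timeout=` is the standard udevadm settle option, and treating a non-zero exit status as "did not settle in time" is an assumption of this sketch.

```python
import subprocess


def build_settle_command(timeout_seconds=500):
    """Return the argv for 'udevadm settle' with an explicit timeout.

    500 is the value mentioned in this comment; the helper itself is
    hypothetical and only illustrates the idea.
    """
    if timeout_seconds < 0:
        raise ValueError("timeout must be non-negative")
    return ["udevadm", "settle", "--timeout=%d" % timeout_seconds]


def udev_settle(timeout_seconds=500):
    """Wait for the udev event queue to drain.

    Returns True if udevadm exited with status 0. A non-zero status is
    treated here as 'not settled' (an assumption of this sketch).
    """
    result = subprocess.run(build_settle_command(timeout_seconds), check=False)
    return result.returncode == 0
```

On very slow hardware the point is simply that the wait must be longer than the time the disks need to finish probing, otherwise the LVM scan runs before the devices exist.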

You can get the updates.img here:
http://people.fedoraproject.org/~jwrdegoede/updates-544177.img

And you can use it by adding:
updates=http://people.fedoraproject.org/~jwrdegoede/updates-544177.img

to the end of the syslinux command line (separated from the pre-filled boot command
with a space).
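
As a small sketch of what "adding it to the end, separated by a space" amounts to, the helper below appends the argument to an existing command line string. The helper name and the sample command line are hypothetical; only the updates= URL comes from this comment.

```python
UPDATES_URL = "http://people.fedoraproject.org/~jwrdegoede/updates-544177.img"


def append_boot_arg(cmdline, arg):
    """Append 'arg' to a boot command line, separated by a single space.

    Splitting on whitespace also collapses repeated spaces, and the
    argument is not added a second time if it is already present.
    """
    parts = cmdline.split()
    if arg not in parts:
        parts.append(arg)
    return " ".join(parts)


# Hypothetical pre-filled command line, for illustration only.
example = append_boot_arg("vmlinuz initrd=initrd.img", "updates=" + UPDATES_URL)
```

The same string manipulation applies if the argument is added to the kernel line in grub.conf instead of being typed at the boot prompt.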

Regards,

Hans

Comment 7 bob mckay 2009-12-07 11:52:17 UTC
Hi Hans; thank you. I will try it now. I was writing a further update which collided with yours, may be irrelevant now but I'll include anyway (the boot times on this machine are so long, it takes forever to do these checks)

> I'm now more confused than ever.

> Looking in the logs of the F10 restart after the failure to install, I see
> quite a few error messages from the disk startup (I'm guessing this is just
> trying various access methods till it realises just how old the hardware is).
> But these all take < 1 second to go through, it brings up the filesystems on
> the lvm volumes less than 1 second after the first disk-related call. 

> Many of the messages are about ata exceptions, followed by ata DRDY ERR and ata
> UNC. However these only occur before the lvm is opened, I do not see any errors
> in normal running (except when trying to open the lvm gui - see below).

> On the other hand, when I try to start the lvm gui, there are close to 1000
> lines of disk-related error messages, over about 6 minutes, before the lvm gui
> comes ready and I get to see the gui. 

> I've done _a lot_ of googling about these, without finding anything relevant
> (the most promising was a link about adding all_generic_ide to the kernel
> options, but it didn't help).

Comment 8 bob mckay 2009-12-07 13:58:35 UTC
Hi Hans; it looks like the timeout is definitely the problem. With your image, it got into the install process, but eventually hung with 

     You need more space on the following file systems:
     12 M on /mnt/sysimage/usr

I'm not sure whether /mnt/sysimage/usr means the initrd /usr or my installed /usr; the latter seems to have 2.7GB free, but just in case, I'll try increasing it by a couple more GB (it will then take a few hours to get through preupgrade, so as it's 11pm now, testing probably won't happen till tomorrow). So it would be great if you could leave the .img there a bit longer.

If it's the initrd /usr, I'm not too sure how I change its size. If this is something you can easily fix (or tell me how to fix) in the img, that would be great (I should mention one further constraint: I can edit grub.conf to add kernel parameters, but I can't do anything at boot time - for some reason I haven't been able to figure out, at boot time my keyboard is only recognised as typing numbers, not alpha characters; this is consistent across usb and ps/2 keyboards, googling hasn't brought up anything similar - bottom line is, I can get into the boot edit menus, but can't do anything useful once I'm in them). 

But I think the upshot is, this timeout is pretty clearly due to the HD problems, and thus the problem is unrelated to Bug 499321, right? In the end, I'm going to have to find a fix for the HD errors - but it would be good if I could get the upgrade to F12 working first, just in case they have been automagically fixed in a later kernel. (as an example of the weirdness, the actual resize of the lvm took about 5 seconds - but then lvm-manager took its usual 5-6 minutes to reload).

Comment 9 Hans de Goede 2009-12-07 14:46:25 UTC
(In reply to comment #8)
> Hi Hans; it looks like the timeout is definitely the problem. With your image,
> it got into the install process, but eventually hung with 
> 
>      You need more space on the following file systems:
>      12 M on /mnt/sysimage/usr
> 
> I'm not sure whether /mnt/sysimage/usr means the initrd /usr or my installed
> /usr; the latter seems to have 2.7GB free,

It is your installed /usr, during the update first all new files get installed
before old files get cleaned up, so you can need quite a bit of free space
during the upgrade.
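
To make the arithmetic concrete: since new files land before old ones are removed, the space that matters is the peak requirement minus what is currently free. A rough sketch follows; the function names and the example figures are hypothetical, and the installer's own calculation may differ.

```python
import shutil

MIB = 1024 * 1024


def shortfall_mib(free_mib, required_mib):
    """Return how many MiB are still missing (0 if there is enough room)."""
    return max(0, required_mib - free_mib)


def free_mib(path):
    """Free space on the filesystem holding 'path', in whole MiB."""
    return shutil.disk_usage(path).free // MIB


# Hypothetical figures: 100 MiB free against a 112 MiB peak requirement
# would leave a 12 MiB shortfall.
example_shortfall = shortfall_mib(100, 112)
```

Checking `free_mib("/usr")` on the installed system before rebooting into the upgrade gives an idea of how much headroom there is.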

Regards,

Hans

Comment 10 bob mckay 2009-12-09 04:32:46 UTC
Hans rocks, OK! I'm writing this from my newly installed F12. 

So the problem was definitely the timeout (i.e. unrelated to bug 499321), and not really a bug (well, not in anaconda - there's clearly a problem in some aspect of ATA handling). Installing F12 cured the lvm gui problem also - the gui starts up in about 20 seconds and there are no error messages in the logs. However, there are still plenty of the same error messages in the logs during boot-up. I'm guessing that someone somewhere knows about the problem, and has fixed it as far as possible. It always seems to get over the errors during boot, and once it gets the vg activated, there are no more errors. So I think I will ignore them for now. 

I'll edit this further to clarify the situation, and put a brief comment in bug 499321. Hans, are you able to leave the .img file where it is for a while, in case anyone else finds this and needs it?

Comment 11 Hans de Goede 2009-12-09 09:28:27 UTC
(In reply to comment #10)
> Hans rocks, OK! I'm writing this from my newly installed F12. 
> 

I'm glad I could help :)

> So the problem was definitely the timeout (i.e. unrelated to bug 499321), and
> not really a bug (well not in anaconda - there's clearly a problem in some
> aspect of ATA handling).

Still, I'm afraid we might hit this on other systems too, so I'm going to reopen this, and start a discussion with some fellow anaconda devs on how to resolve this, so that things will work out of the box.

> Hans, are you able to leave the .img file where it is for a while, in
> case anyone else finds this and needs it?  

Sure.

Comment 12 Hans de Goede 2009-12-09 19:58:05 UTC
*** Bug 533047 has been marked as a duplicate of this bug. ***

Comment 13 bob mckay 2009-12-10 01:57:34 UTC
Sure; anyway, thank you again for your help in getting me upgraded. Please let me know if any further information is needed (I will retain the old F10 kernel in case you need me to boot it up again)
    Thanks and Best Wishes

> (In reply to comment #10)

> Still I'm afraid we might hit this on other systems too, so I'm going to reopen
> this, and start a discussion with some fellow anaconda devs on how to resolve
> this, so that things will work out of the box.
>

Comment 14 Hans de Goede 2009-12-10 18:59:25 UTC
Fixed for F-13 by this commit, closing:

http://git.fedorahosted.org/git/?p=anaconda.git;a=commitdiff;h=75bf0aac951ccd79576d06b5f2c2fa4c30435e4a

Comment 15 bob mckay 2010-12-03 07:03:18 UTC
Dear fedora folks (fedorators?)
Sorry to raise this again folks. Unfortunately what looks very like the same problem seems to have re-emerged with F13-F14 preupgrade ("The root for the previously installed system was not found", yet everything is where it is supposed to be). Is there anything I should supply to confirm that this really is the cause? It's quite possible that even 5 minutes isn't enough for my system. Is there anywhere I can pass this as a parameter, or does it have to be compiled in?

Apologies Hans, I had to re-mark it as assigned to you, that and 'closed' seem to be the only options I am allowed.

Comment 16 Chris Lumens 2010-12-14 22:01:20 UTC
No, you can't pass this as a parameter anywhere.  If five minutes is seriously not long enough for your system, you need to take it up with the driver people, as that seems to be a bit ridiculous as far as I'm concerned.

