497240 – anaconda loops endlessly when trying to save traceback

Keywords:
Status:	CLOSED RAWHIDE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	anaconda
Sub Component:
Version:	11
Hardware:	All
OS:	Linux
Priority:	low
Severity:	low
Target Milestone:	---
Assignee:	David Lehman
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2009-04-22 22:52 UTC by Clyde E. Kunkel
Modified:	2009-08-06 14:51 UTC
CC List:	3 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2009-07-25 16:17:12 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
requested anacdump.txt after killall -USR2 anaconda (1.40 MB, text/plain) 2009-06-19 18:51 UTC, Clyde E. Kunkel
correct dump after killall -USR2 anaconda (1.48 MB, text/plain) 2009-06-19 21:46 UTC, Clyde E. Kunkel
Image of frozen anaconda screen (104.62 KB, image/jpeg) 2009-06-19 21:47 UTC, Clyde E. Kunkel
updates to skip save-exception-to-disk if original exception occurred during storage scan (107.70 KB, application/octet-stream) 2009-06-29 20:13 UTC, David Lehman
from killall -USR2 anaconda with updates.img (1.57 MB, text/plain) 2009-06-30 00:44 UTC, Clyde E. Kunkel

Description Clyde E. Kunkel 2009-04-22 22:52:29 UTC

Description of problem:
If a traceback occurs, selecting Save puts anaconda into an endless loop looking for storage devices.

Version-Release number of selected component (if applicable):
anaconda-11.5.0.47

How reproducible:
every time

Steps to Reproduce:
1. boot.iso, askmethod
2. install.img from download.fedora.redhat.com
3. custom partitioning
4. when a traceback occurs, regardless of when or where, anaconda loops looking for storage devices
  
Actual results:
infinite loop

Expected results:
Should be able to create a bz from the dialogue (that doesn't work either), or a dialogue should come up with candidate storage locations.

Additional info:
Workaround is to go to tty2, mount a partition, and copy the files from /tmp.

Comment 1 Chris Lumens 2009-05-06 14:59:40 UTC

Are there any messages on tty3 or tty4 when it's stuck in this loop?  These bugs are especially hard to reproduce for us, and I haven't heard of any other reports of it from QA lately.

Comment 2 Clyde E. Kunkel 2009-05-06 17:35:24 UTC

Sorry, this happened several tests ago and I don't recall the tty msgs, and I have had no tracebacks recently.  The tracebacks were occurring back when encrypted file systems were causing tracebacks.  I will revisit if I get a future traceback.

Feel free to close with insufficient information or can't duplicate.

Comment 3 Clyde E. Kunkel 2009-05-07 15:30:25 UTC

No problem when bz 499662 encountered, so would say this is fixed or OBE.

Comment 4 Clyde E. Kunkel 2009-05-08 15:41:22 UTC

Well, this popped up again, but I have more information.

This is probably not a loop, but instead an overwritten buffer or other clobbered memory locations.

When bz 499854 "Error when probing exception disks: filedescriptor out of range in select()" occurred, I clicked save in the traceback dialogue so I could create the automatic bz.  I got the selection box for bz, but the Finding Storage Devices box also popped up and never quit, even after waiting over 30 minutes.  X was still active, but I couldn't type in the bz dialogue fields.

Seems that Finding Storage Devices should not be invoked unless the user specifically asks for local storage of tracebacks.  In this case it was a problem with the storage routines that led to the traceback.

I have changed the status to assigned.  Thanks.

Comment 5 Clyde E. Kunkel 2009-05-08 15:43:24 UTC

Sorry, forgot the request in comment 1: tty3 contained the error described in comment 4, and tty4 had nothing unexpected.

Comment 6 Chris Lumens 2009-05-11 14:54:26 UTC

The original intention was that we wouldn't display the option to save to a local device unless any were found.  Moving the probing to after the option is selected would shift the problem to a smaller subset of people, but I don't think it really helps.

Can you describe your hardware setup here?  I haven't heard of any problems like this since the floppy disk probing was removed, and I have not personally been able to reproduce it.

Comment 7 Clyde E. Kunkel 2009-05-11 18:01:56 UTC

ASUS P5K-Ewifi MOBO

4 SATA HDs and 1 SATA DVD on the Intel SATA controller, 1 IDE drive on the MOBO IDE controller and 1 IDE HD on a SIL680 type on PCI.  

This is a test system used to test many distros and so is set up with numerous small partitions for /boot and the rest as mostly VG/LVs.  There is a WinVista installation on the system also.

sda = 12 partitions
sdb = 10 partitions
sdc = 9 partitions
sdd = 6 partitions
sde = 15 partitions
sdf = 10 partitions

VG 00, 6 LVs
VG 01, 7 LVs
VG 02, 2 LVs
VG 03, 5 LVs

00:00.0 Host bridge: Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller (rev 02)
00:01.0 PCI bridge: Intel Corporation 82G33/G31/P35/P31 Express PCI Express Root Port (rev 02)
00:1a.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4 (rev 02)
00:1a.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #5 (rev 02)
00:1a.2 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #6 (rev 02)
00:1a.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 (rev 02)
00:1b.0 Audio device: Intel Corporation 82801I (ICH9 Family) HD Audio Controller (rev 02)
00:1c.0 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 1 (rev 02)
00:1c.4 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 5 (rev 02)
00:1c.5 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 6 (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 92)
00:1f.0 ISA bridge: Intel Corporation 82801IR (ICH9R) LPC Interface Controller (rev 02)
00:1f.2 SATA controller: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA AHCI Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 02)
01:00.0 VGA compatible controller: ATI Technologies Inc RV630 [Radeon HD 2600 Series]
01:00.1 Audio device: ATI Technologies Inc RV630/M76 audio device [Radeon HD 2600 Series]
02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8056 PCI-E Gigabit Ethernet Controller (rev 12)
03:00.0 SATA controller: JMicron Technologies, Inc. 20360/20363 Serial ATA Controller (rev 03)
03:00.1 IDE interface: JMicron Technologies, Inc. 20360/20363 Serial ATA Controller (rev 03)
05:01.0 RAID bus controller: Silicon Image, Inc. PCI0680 Ultra ATA-133 Host Controller (rev 02)
05:03.0 FireWire (IEEE 1394): Agere Systems FW322/323 (rev 70)

Comment 8 Clyde E. Kunkel 2009-05-14 18:38:45 UTC

With anaconda .52, "Detecting storage devices" again came up before I could make a selection where to copy the traceback.  While no problems were encountered this time, I still wonder if the opportunity to select a traceback save location should be presented before detecting storage devices.  That way, if the storage device routine(s) have been clobbered, you might still be able to send to bugzilla.  Of course a knowledgeable user can go to tty2 and save the dump to a local device for later bz'ing.

Comment 9 Clyde E. Kunkel 2009-05-19 20:16:46 UTC

.53, same problem.  Can't use the bz dialogue to save a traceback because it probes for storage devices first, and that routine was clobbered.

Comment 10 Clyde E. Kunkel 2009-06-03 18:21:50 UTC

Same problem with all versions up to and including .59.  Locks forever on finding storage devices.  The original traceback error that led to this attempt to use anaconda to report on anaconda was:
AttributeError: 'StorageDevice' object has no attribute 'geometry'

TTY-3 shows the following error, probably related to this dialogue and not the original traceback:
Error when probing exception disks: filedescriptor out of range in select()
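
For reference, the tty3 text matches the message of the ValueError that CPython's select.select() raises when it is handed a descriptor number at or above FD_SETSIZE (typically 1024 on Linux). The sketch below only reproduces that message. The helper name select_error_for is hypothetical, and whether anaconda's disk probing calls select.select() directly or through a library is an assumption, not something this report confirms.

```python
import select


def select_error_for(fd):
    """Return the ValueError text select.select() raises for fd, or None.

    Hypothetical helper for illustration only; not anaconda code.
    """
    try:
        # Timeout 0 makes this a non-blocking check.
        select.select([fd], [], [], 0)
    except ValueError as exc:
        # Raised before the system call when fd >= FD_SETSIZE.
        return str(exc)
    except OSError:
        # fd is in range but not an open descriptor.
        return None
    return None
```

Calling select_error_for(1 << 20) on Linux should return the same "filedescriptor out of range in select()" text. select.poll() does not have the FD_SETSIZE restriction, which is one common way such code is made robust against high descriptor numbers.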

Comment 11 Bug Zapper 2009-06-09 14:26:16 UTC

This bug appears to have been reported against 'rawhide' during the Fedora 11 development cycle.
Changing version to '11'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 12 David Lehman 2009-06-19 15:51:37 UTC

Can you switch to tty2 and run 'killall -USR2 anaconda', then scp /tmp/anacdump.txt somewhere and attach it to this bug report?
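
As background on why this request works: a long-running Python process can install a SIGUSR2 handler that writes its current state to a file, so a dump can be captured even while the UI is frozen. The sketch below illustrates that general mechanism only. It is not anaconda's actual handler, the function name install_dump_handler and the anacdump-sketch.txt path are hypothetical, and it dumps just the interrupted stack rather than the full state found in /tmp/anacdump.txt.

```python
import os
import signal
import tempfile
import traceback

# Hypothetical path for this sketch; the real dump in this report
# is /tmp/anacdump.txt.
DUMP_PATH = os.path.join(tempfile.gettempdir(), "anacdump-sketch.txt")


def install_dump_handler(path=DUMP_PATH):
    """On SIGUSR2, write the interrupted code's stack to `path`."""

    def handler(signum, frame):
        # `frame` is the frame that was executing when the signal arrived.
        with open(path, "w") as out:
            out.write("".join(traceback.format_stack(frame)))

    # Must be called from the main thread.
    signal.signal(signal.SIGUSR2, handler)
    return path
```

With such a handler installed, sending USR2 from another terminal (as killall -USR2 does) produces the dump file without needing any interaction with the stuck dialogue.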

Comment 13 Clyde E. Kunkel 2009-06-19 18:51:44 UTC

Created attachment 348697 [details]
requested anacdump.txt after killall -USR2 anaconda

Comment 14 David Lehman 2009-06-19 19:26:06 UTC

At what point did you obtain the dump? It looks to me like it was still in the middle of the second storage device scan. If it is hanging, I need to see a dump captured during the hang.

Comment 15 Clyde E. Kunkel 2009-06-19 21:46:16 UTC

Created attachment 348716 [details]
correct dump after killall -USR2 anaconda

Too fast on the trigger before.  This one is surely after the failure.  Also, the next file will be a screenshot taken after all activity ceased and anaconda froze.

Comment 16 Clyde E. Kunkel 2009-06-19 21:47:22 UTC

Created attachment 348717 [details]
Image of frozen anaconda screen

Comment 17 David Lehman 2009-06-25 16:39:56 UTC

It seems as though you have DDF raid metadata on partitions sdb12, sdb13, sde13, sdg7, and sdg8. This is a configuration we have not encountered previously. Can you describe your setup? This is relevant to the first traceback.

I am also working on the immediate problem of doing a storage scan to identify exception dump target disks when the initial failure prevented a previous attempt at storage detection from completing. I hope to have an updates image soon with a proposed fix.

Comment 18 Clyde E. Kunkel 2009-06-26 18:57:54 UTC

I now have a DDF raid container which I did not have with the first traceback.  I also now have md0, raid 5 and md1, raid 10.  All of those were added after the first traceback appeared.  I do not use those raid devices and can gladly remove them for testing purposes since they were created as experiments in learning how various raid configurations and software have matured.

I was working towards LVM over raid, which I used to use several fedora releases ago, but had so much trouble with installations that I decided to let things mature a bit before returning to them.

Based on what I am reading in anaconda-dev list, that seems apropos at this time.

Comment 19 David Lehman 2009-06-29 20:13:28 UTC

Created attachment 349868 [details]
updates to skip save-exception-to-disk if original exception occurred during storage scan

Please try this updates image and see if it resolves the freeze problem. The basic approach is to not offer save-to-disk if the exception originated during the storage scanning routines.

See http://fedoraproject.org/wiki/Anaconda/Updates if necessary.
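
The approach described in this comment can be sketched roughly as follows. This is a minimal illustration, not the contents of the attached updates image: the function names, the "/storage/" path marker, and the destination labels are all assumptions made for the example.

```python
import traceback

# Assumed path fragment identifying the storage scanning code.
STORAGE_MARKER = "/storage/"


def originated_in_storage(tb, marker=STORAGE_MARKER):
    """True if any frame of the traceback lives under the storage code."""
    return any(marker in frame.filename
               for frame in traceback.extract_tb(tb))


def save_destinations(tb):
    """Return the save targets to offer in the exception dialogue."""
    destinations = ["bugzilla"]
    # Offering local disk requires another storage scan, so skip it
    # when the storage code is what failed in the first place.
    if not originated_in_storage(tb):
        destinations.append("disk")
    return destinations
```

The point of the design is that the exception dialogue never re-enters the code that just failed: a traceback that passes through the storage routines gets only the bugzilla option, while any other traceback still gets both.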

Comment 20 Clyde E. Kunkel 2009-06-30 00:44:11 UTC

Created attachment 349892 [details]
from killall -USR2 anaconda with updates.img

Same failure as before.  After the first traceback, I clicked save and anaconda began searching for storage devices and eventually looped.  Traceback attached.  Screen image same as 348717.

Note:

Could not make the instructions from the wiki work, specifically: dd if=updates-497240.img of=/dev/fd0 bs=72k count=20.  Had to create an ext2 filesystem on a floppy and copy updates-497240.img to it.  Otherwise, anaconda gave a msg saying it could not mount /dev/fd0.  Tried the wiki method on a CF card and a USB key, same result: couldn't mount device.  So, hopefully, my method allowed the updates.img file to be read.  I did see a msg saying that the updates were being read, and the floppy ground away for the duration of the msg.

Final note:  I have deleted all raid partitions and containers.  They were created under rawhide, but I was getting bizarre results between Fedora 10 and rawhide WRT them.  I am going to recreate them under Fedora 11 (Fedora 10 install updated to Fedora 11 using Fedora 11 DVD) to test the next version of anaconda.

Comment 21 David Lehman 2009-06-30 16:10:38 UTC

By far the easiest way to specify updates is by url, i.e.: updates=http://....

To verify that the updates are being used, look in the /tmp/updates directory -- you should see two files with a '.py' extension and a directory named 'storage'.

Comment 22 Clyde E. Kunkel 2009-06-30 21:40:23 UTC

OK, success.  Bz 509018 was automatically created.

I don't have access to a web server and entering the URL of the attachment failed.  So I expanded the .img file and simply copied the resulting files and directories to an ext2 floppy and it worked.

I think the wiki needs some updating.  How is that accomplished and by whom?  Do you bz the wiki?

Thank you for your patience with this.  I am glad I am able to help with Fedora.

Comment 23 Bart Nauwelaerts 2009-07-01 22:18:51 UTC

Hi,

I've just read this thread hoping to find a workaround for exactly this problem, but I guess no such luck. I've no logfiles/dumps/whatever handy right now from my system, but I'll try to get them if they can help.

The system I'm trying to install has the following disk layout (which worked without any problem at all under Fedora 10):

1 IDE DVD-RW
7 Sata-2 HD of 1.5TB (sda-sdg) with identical partitioning
sda1 - sdg1 => md1 (RAID1, 100MB) for /boot (ext2)
sda2 - sdg2 => md2 (RAID10f3, 400MB) for / (ext4)
sda3 - sdg3 => md3 (RAID10f3, 200GB) for LVM (vg: vg_raid10)
sda5 - sdg5 => md5 (RAID6, 200GB) for LVM (vg: vg_raid6)
sda6 - sdg6 => md6 (RAID6, 200GB) for LVM (vg: vg_raid6)
sda7 - sdg7 => md7 (RAID6, 200GB) for LVM (vg: vg_raid6)
sda8 - sdg8 => md8 (RAID6, 200GB) for LVM (vg: vg_raid6)
sda9 - sdg9 => md9 (RAID6, 200GB) for LVM (vg: vg_raid6)
sda10-sdg10 => unused 200GB partitions, tagged fd

vg_raid10: 1 lv (lv_swap)
vg_raid6: 3 lvs (lv_usr, lv_var, lv_home) - all ext4

I've encountered two other errors that might be related to this one (since I didn't have those in F10 either):
1) creating a partition layout on one disk and copying this layout to the other six resulted in an error stating that my source disk had partitions spanning more than 1 disk. This error remained even after all 7 disks were zeroed out before trying to install, so no remains of previous partitions could have been present. In the end, I created the layout using the F10 install DVD.
2) on two occasions I managed to get past the error described here (don't remember how, but I think it was by having no lvs existing when entering custom layout). This resulted in the installation getting stuck at the "checking dependencies" step, before any progress was shown in the progress bar.

At this point, I'm considering falling back to F10 for half a year hoping F12 will be behaving better, but I'd like it much more if I could get F11 working as it should be. Please let me know what - if anything - I can do to help this bug get fixed.

Comment 24 Clyde E. Kunkel 2009-07-25 16:17:12 UTC

This is now fixed.  Closing.

Comment 25 Clyde E. Kunkel 2009-08-04 22:09:19 UTC

Ugh...back again.  Ran into it with 499854.  Got a traceback at what looked like the end of the storage scan, and when I clicked save, the storage scan started up again and just sat there.

Could the triager please reopen, thanks.  Anaconda 12.7.

Comment 26 Clyde E. Kunkel 2009-08-06 14:51:21 UTC

This bug does not appear when using a text mode install.

Please reopen...
