Bug 115394

Summary: New kernel 2.4.22-1.2166 hangs on firewire initialization
Product: [Fedora] Fedora
Reporter: Alfredo Ferrari <alfredo.maria.ferrari>
Component: kernel
Assignee: Arjan van de Ven <arjanv>
Status: CLOSED WONTFIX
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: medium    
Version: 1   
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2004-09-29 20:05:03 UTC

Description Alfredo Ferrari 2004-02-11 21:38:45 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.1) Gecko/20031114

Description of problem:
With kernel-2.4.22-1.2166.nptl, the firewire initialization
procedure hangs a DELL Latitude D800 (Centrino 1.7 GHz) solid,
while no issue whatsoever is present with 2.4.22-1.2149.nptl.

Version-Release number of selected component (if applicable):
kernel-2.4.22-1.2166.nptl

How reproducible:
Always

Steps to Reproduce:
1. Boot under 2.4.22-1.2166
2. Errors about sbp2 are spit out almost immediately
3. When the boot reaches the initialization of the firewire modules,
   the machine hangs solid
    

Actual Results:  The machine hangs

Expected Results:  The boot should go on

Additional info:

There was no problem with 2.4.22-1.2149, but clearly something has
been changed (now with 2166 the kernel seems to "see" the firewire
devices almost immediately, failing miserably)

If I disconnect all firewire devices (two Lacie hard drives and one
Fujitsu magneto-optical drive) the boot doesn't hang. If I reconnect
them later, the machine hangs often (not always) when scanning the
scsi bus for device recognition (I use rescan-scsi-bus.sh)

If I put "alias ieee1394-controller off" inside /etc/modules.conf,
the boot doesn't hang (it skips the initialization where it hangs),
but it still spits error messages about sbp2 at the very beginning,
and again further scanning of the devices is often troublesome.

Comment 1 Alfredo Ferrari 2004-02-12 08:18:26 UTC
Further details: setting "alias ieee1394-controller off" doesn't
really help much. 9 out of 10 boots hang when "checking for new
hardware" instead of when initializing firewire, not such a big
improvement indeed.
Clearly kudzu scans the firewire bus and hangs.

The only reliable way to boot is:
a) physically disconnect the firewire cable
b) connect the cable again after boot completion (timed-out and failed
reconnect errors all over the place, as during the very beginning of the
boot process)
c) rmmod sbp2 (otherwise rescan-scsi-bus.sh hangs the machine)
d) modprobe sbp2
e) manually echo "scsi add-single-device x y z w" > /proc/scsi/scsi
   this because the order in which the 3 firewire disks show up
   is random... (they always showed up with 2149 in the "right"
   order -> sda <-> the hd connected first, sdb <-> the second etc,
   while with 2166, after all the mess for booting, there is
   a random permutation of the three disks)
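
Steps c) to e) above can be sketched as a small shell helper. This is only
an illustration, not a tested fix: the function names, the DRY_RUN and
PROC_SCSI variables, and the example host/channel/LUN values are
assumptions. The helper takes host, channel and LUN first, followed by the
SCSI ids in the order the disks should be registered.

```shell
#!/bin/sh
# Sketch of the manual re-attach sequence (steps c to e).
# Assumption: names and defaults below are illustrative only.

# Print one "scsi add-single-device <host> <channel> <id> <lun>" line
# per SCSI id, in the order the ids are given.
# Usage: add_device_cmds HOST CHANNEL LUN ID [ID...]
add_device_cmds() {
    host=$1; channel=$2; lun=$3
    shift 3
    for id in "$@"; do
        printf 'scsi add-single-device %s %s %s %s\n' \
            "$host" "$channel" "$id" "$lun"
    done
}

# Reload sbp2, then register the disks one by one.
# DRY_RUN defaults to 1, so nothing is touched unless DRY_RUN=0 is set.
reattach_firewire() {
    proc_file=${PROC_SCSI:-/proc/scsi/scsi}
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "rmmod sbp2"
        echo "modprobe sbp2"
        add_device_cmds "$@" | while read -r cmd; do
            echo "echo \"$cmd\" > $proc_file"
        done
    else
        rmmod sbp2
        modprobe sbp2
        add_device_cmds "$@" | while read -r cmd; do
            echo "$cmd" > "$proc_file"
        done
    fi
}
```

For example, "reattach_firewire 1 0 0 0 1 2" prints the commands for three
disks on host 1, channel 0, LUN 0 with ids 0, 1 and 2; running it as root
with DRY_RUN=0 would execute them.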

Some sort of (private) reverse patch, or an indication of how to bring
2166 back to 2149 as far as firewire is concerned, would be EXTREMELY
WELCOME while waiting for an answer (I have bugzilla reports on previous
firewire problems still NEW after years...). I can manage to patch and
compile a new kernel myself; 2166 is clearly a no-go for my hardware.



Comment 2 Dave Jones 2004-02-13 16:21:17 UTC
A 'reverse patch' isn't quite so simple, as nothing changed in ieee1394
between 2149 and 2166.


Comment 3 Alfredo Ferrari 2004-02-13 17:05:09 UTC
I already realized that the issue must be in some other change (I
diff'ed all sources). I am not really an expert. Excluding the
video-related changes between 2149 and 2166, I do not know which other
change could indirectly trigger the problem. If you have some good
suggestions about which patches are possible candidates to be removed
and/or brought back selectively to the 2149 status, I can try to narrow
down the problem. I have no problem compiling and building modified
rpm's for the kernel, provided it doesn't imply patching some code by
hand.
I would like to stress that it is a fully deterministic situation:
with the firewire HDs connected there is no chance to boot the machine.
Connecting them after boot and doing gymnastics with sbp2 and
/proc/scsi/scsi results in a perfectly stable and working system.
There were firewire issues all the time with psyche/shrike, but never
so serious. 2149 was perfect, 2166 is a nightmare...
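
One possible way to shortlist candidate patches, sketched here under
assumptions: if the spec files of the two kernel source rpm's are at hand,
the PatchN: entries declared only in the newer one can be listed. The
function names and the spec file paths are hypothetical, and a patch whose
content changed under the same file name would not show up this way.

```shell
#!/bin/sh
# Sketch: list patches declared in a newer spec file but not in an older one.
# Assumption: function names and file paths are illustrative only.

# Print the file names of all PatchN: entries in a spec file, sorted.
spec_patches() {
    sed -n 's/^[Pp]atch[0-9]*:[[:space:]]*//p' "$1" | sort -u
}

# Print the patch names that appear only in the second spec file.
# Usage: new_patches OLD.spec NEW.spec
new_patches() {
    old_list=$(mktemp) && new_list=$(mktemp) || return 1
    spec_patches "$1" > "$old_list"
    spec_patches "$2" > "$new_list"
    # comm -13 keeps only the lines unique to the second (sorted) file.
    comm -13 "$old_list" "$new_list"
    rm -f "$old_list" "$new_list"
}
```

For example, "new_patches kernel-2149.spec kernel-2166.spec" (hypothetical
file names) would print one patch file name per line, each of which could
then be disabled in turn in a test build.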


Comment 4 Johnny Hughes 2004-02-16 15:45:38 UTC
I have a similar problem with the kernel-smp-2.4.22-1.2166.nptl.i686
(on an Intel 860 chipset motherboard, dual P4 Xeon 1.7 GHz processors).
It hangs on boot up if there is something attached to the firewire
card ...

It hangs (hard lock) if the scsi bus is scanned during operation
(whether or not something is attached to the firewire bus).

Shifting back to kernel-smp-2.4.22-1.2149.nptl.i686 fixes the problem.

I have the proprietary nvidia drivers installed for a GeForce 5200 FX
video card.

Comment 5 Alfredo Ferrari 2004-02-19 11:22:06 UTC
A few comments related to 2.4.22-1.2173.nptl and 2.4.22-1.2174.nptl

2.4.22-1.2173.nptl : suddenly booting with firewire devices connected
works again, no "time-out" or "failed to reconnect" messages, all
three HDs are recognized and configured. However, using 0-1-2 as the
numbering sequence for the devices as they are physically connected (0
being the one connected to the computer, 1 the following one etc),
they are recognized as 2-1-0 (2 gets /dev/sda, 1 gets /dev/sdb, 0 gets
/dev/sdc). Not a major hassle.

Summarizing:

2.4.22-1.2149: no problem at boot, devices configured as 0-1-2,
I didn't care whether the usb scsi cdrom appeared on bus 1 and the
firewire devices on scsi bus 0 or vice versa (I could check if it is
of interest)

2.4.22-1.2166: no chance to boot when firewire devices are connected;
when they are connected afterwards (with various error messages) they
get randomly configured as 2-0-1 or 2-1-0 (never 0-1-2), the external
usb cdrom is on scsi bus 0 (0 0 0 0) while the firewire devices are on
bus 1 (1 0 x 0, x=0,1,2)

2.4.22-1.2173: no problem at boot, devices configured always (5 boots
up to now) as 2-1-0, the external usb cdrom is on scsi bus 1 (1 0 0 0)
while the firewire devices are on bus 0 (0 0 x 0, x=0,1,2)

2.4.22-1.2174: ... the machine doesn't boot at all, kernel panic
apparently on acpi (see bug 116232)
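
The bus/id assignments compared above can be read from /proc/scsi/scsi.
The following is a minimal sketch for printing them in a compact form; the
function name is an assumption, and the parsing relies on the usual
"Host: scsiN Channel: .. Id: .. Lun: .." and "Vendor: .. Model: .. Rev: .."
line layout of that file.

```shell
#!/bin/sh
# Sketch: print "host channel id lun vendor model" for each attached device.
# Assumption: the function name is illustrative; the input defaults to
# /proc/scsi/scsi but any file with the same layout can be passed.
list_scsi_devices() {
    awk '
        /^Host:/ {
            # Example line: Host: scsi1 Channel: 00 Id: 02 Lun: 00
            host = $2; sub(/^scsi/, "", host)
            channel = $4 + 0; id = $6 + 0; lun = $8 + 0
        }
        /^ *Vendor:/ {
            # Keep vendor and model, drop the revision and extra padding.
            line = $0
            sub(/^ *Vendor: */, "", line)
            sub(/ *Rev:.*$/, "", line)
            sub(/ *Model: */, " ", line)
            gsub(/  +/, " ", line)
            printf "%s %d %d %d %s\n", host, channel, id, lun, line
        }
    ' "${1:-/proc/scsi/scsi}"
}
```

Saving the output of "list_scsi_devices" after booting each kernel and
comparing the saved files with diff would show how the ordering differs
between kernels.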



Comment 6 David Lawrence 2004-09-29 20:05:03 UTC
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/