Bug 444694 - ALi Corporation M5253 P1394 OHCI 1.1 Controller driver causing problems in kernels newer than 2.6.24.3-50
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 9
Hardware: All
OS: Linux
Priority: low
Severity: high
Target Milestone: ---
Assignee: Jarod Wilson
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2008-04-30 04:55 UTC by Naveed Hasan
Modified: 2008-07-04 03:40 UTC (History)

Fixed In Version: 2.6.25.9-76.fc9
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-07-02 06:34:50 UTC
Type: ---
Embargoed:



Links
System        ID     Private  Priority  Status  Summary  Last Updated
Linux Kernel  10796  0        None      None    None     Never
Linux Kernel  10935  0        None      None    None     Never

Description Naveed Hasan 2008-04-30 04:55:16 UTC
kernel-2.6.24.4-64.fc8.x86_64
kernel-2.6.24.5-85.fc8.x86_64

Neither of the above kernels boots on my box. The system freezes after
activating /etc/fstab swaps and before entering non-interactive startup. INIT
NEVER enters runlevel 2. I haven't even been able to get a console at runlevel 1.

This always happens. It never happened before and I am currently using
kernel-2.6.24.3-50.fc8.x86_64 without any issues. There is at least one more
person having this problem as per the bodhi feedback -

https://admin.fedoraproject.org/updates/F8/FEDORA-2008-2871
https://admin.fedoraproject.org/updates/F8/FEDORA-2008-3260

This may be the same issue - https://bugzilla.redhat.com/show_bug.cgi?id=441161
- the only wireless hardware I have on my machine is a Bluetooth dongle. Any ideas?

Comment 1 Chuck Ebbert 2008-05-01 05:57:43 UTC
Can you try adding "nmi_watchdog=1" or "nmi_watchdog=2" to the kernel boot
options and see if you get a stack trace when it locks up? Let it sit for a few
minutes when it freezes to give the watchdog time to activate.

Comment 2 Naveed Hasan 2008-05-02 04:07:16 UTC
No such luck. I tried both the kernels mentioned above as well as the F9-Preview
2008-04-17 live image. None of them do anything even with nmi_watchdog set to 1
or 2. At one point, I even left the desktop for about an hour with no stack
trace, just a hard hang. No response from the keyboard or mouse and only a cold
reboot back to kernel-2.6.24.3-50 works. What else can I try? Thanks for your help.

Comment 3 Naveed Hasan 2008-05-20 16:51:02 UTC
The problem persists with kernel-2.6.24.7-92.fc8.x86_64 and
kernel-2.6.25-3.fc9.i686 from the live image released with Fedora 9.

https://admin.fedoraproject.org/updates/F8/FEDORA-2008-3873


Comment 4 Chuck Ebbert 2008-05-23 05:48:15 UTC
Did you try the workarounds?
http://fedoraproject.org/wiki/KernelCommonProblems



Comment 5 Naveed Hasan 2008-05-27 18:04:25 UTC
No luck with the workarounds so far. I have been focusing on section 1.4
Crashes/Hangs and no set of options has yielded a different result except for
acpi=off which causes a kernel panic (not syncing: videodev: bad unregister)
even earlier than the usual hang.

Comment 6 Naveed Hasan 2008-05-27 20:14:31 UTC
An interesting thing I've noticed while rebooting to test all these times is
that when the system hangs, the keyboard completely stops responding BUT the
kernel seems to be alive to some extent in that USB device attach and detach
messages appear on the console. This is without any special kernel options being
used. I noticed this when trying to use a live USB image of Fedora 9, hanging on
boot and pulling out the USB device before hard resetting the machine.

Comment 8 Naveed Hasan 2008-06-06 19:37:51 UTC
Same issue with kernel-2.6.25.4-10.fc8 which just hit stable.

https://admin.fedoraproject.org/updates/F8/FEDORA-2008-4484


Comment 9 Chuck Ebbert 2008-06-09 03:35:43 UTC
Hmm... can you try adding:

  pci=rom

to the kernel boot options? The default was changed between -50 and -64...


Comment 10 Naveed Hasan 2008-06-10 00:28:25 UTC
pci=rom does not make any difference. What is the new default?


Comment 11 Naveed Hasan 2008-06-10 04:17:59 UTC
Based on helpful information in
https://bugzilla.redhat.com/show_bug.cgi?id=446763 I was able to narrow down the
problem to my ALi Corporation M5253 P1394 OHCI 1.1 Controller. This is a generic
PCI card that provides both USB2 and FW1 ports. The Firewire component has NEVER
worked in Fedora, while the USB has worked fine.

After disabling the Firewire OHCI driver by renaming
/lib/modules/2.6.25.4-10.fc8/kernel/drivers/firewire/firewire-ohci.ko I was able
to get past the "Enabling /etc/fstab swaps: OK" message and boot into runlevel 5.

So it appears that something was changed in firewire-ohci between
kernel-2.6.24.3-50 and kernel-2.6.24.4-64 that causes my nonfunctional (in
Fedora) Firewire card to prevent the entire system from booting.

I hope that at the end of this process I will have both a booting machine
without renaming stock driver files AND my Firewire ports will work. One can
dream, no?

Thanks for all your help, Chuck!


Comment 12 Naveed Hasan 2008-06-13 21:35:10 UTC
I have since upgraded to Fedora 9, which needed a nofirewire kernel option to
get anaconda going. With the latest released kernel-2.6.25.6-55.fc9, my keyboard
stops responding during boot up. This happens when the system is "Starting udev"
and before it says "[OK]" for that item. I cannot type anything afterwards, but
can Ctrl-Alt-Delete on one of the ttys to trigger a reboot.

Instead of renaming the firewire-ohci.ko file, I added 'blacklist firewire-ohci'
to modprobe.conf and can boot successfully into the newest kernel. The 'ALi
Corporation M5253 P1394 OHCI 1.1 Controller', of course, does not work.

Comment 13 Chuck Ebbert 2008-06-15 20:43:30 UTC
Maybe we should just add a temporary workaround, printing an error and skipping
initialization when we hit one of these devices?

Comment 14 Stefan Richter 2008-06-17 18:15:50 UTC
proposed fix for part of the problem:
patch "firewire: deadline for PHY config transmission"
http://marc.info/?l=linux1394-devel&m=121372642105480

see also https://bugzilla.redhat.com/show_bug.cgi?id=446763#c38

Comment 15 Jarod Wilson 2008-06-17 20:01:57 UTC
I will attempt to put a test kernel together this evening and post it somewhere
for folks to try out. Also planning to acquire an ALi card to beat on myself...

Comment 16 Jarod Wilson 2008-06-18 04:13:19 UTC
x86_64 test kernel w/patch in comment #14 here:

http://people.redhat.com/jwilson/kernels/2.6.25.7-64.fw.fc9/

Comment 17 Stefan Richter 2008-06-18 09:33:18 UTC
I got a Belkin F5U508 PCI card with ALi M5271 now.  lspci says:

05:00.4 FireWire (IEEE 1394) [0c00]: ALi Corporation M5253 P1394 OHCI 1.1
Controller [10b9:5253] (prog-if 10 [OHCI])
        Subsystem: Belkin Unknown device [1799:0519]

firewire-ohci starts fine with it.  I have the deadline patch applied.  When I
plug something in, no bus reset IRQ happens.  I.e. the PHY is completely dead.

However, ohci1394 does work; I can access an SBP-2 disk through it.

When I then "modprobe firewire-ohci debug=7" with the disk already plugged in, I
get:

firewire_ohci: Added fw-ohci device 0000:02:02.0, OHCI version 1.10
firewire_ohci: IRQ 00010010 selfID AR_req
firewire_ohci: 1 selfIDs, generation 1, local node ID ffc0
firewire_ohci: selfID 0: 807fcc56, phy 0 [---] beta gc=63 -3W Lci
firewire_ohci: AR evt_bus_reset, generation 1

(this was from a FireWire 800 card)

firewire_ohci: Added fw-ohci device 0000:05:00.4, OHCI version 1.10
firewire_ohci: IRQ 00000010 AR_req
firewire_ohci: AR evt_bus_reset, generation 1
firewire_ohci: IRQ 00010000 selfID
firewire_ohci: 2 selfIDs, generation 1, local node ID ffc0
firewire_ohci: selfID 0: 807f8c66, phy 0 [-p-] S400 gc=63 -3W Lci
firewire_ohci: selfID 0: 817f8470, phy 1 [-c.] S400 gc=63 -3W L

(this is the ALi card, i.e. selfID reception now works, probably thanks to
ohci1394's twiddling with it)

firewire_ohci: Added fw-ohci device 0000:05:04.0, OHCI version 1.10
firewire_ohci: IRQ 00000010 AR_req
firewire_ohci: AR evt_bus_reset, generation 1
firewire_ohci: IRQ 00010000 selfID
firewire_ohci: 1 selfIDs, generation 1, local node ID ffc0
firewire_ohci: selfID 0: 807f8952, phy 0 [--.] S400 gc=63 +15W Lci

(this is an onboard VT6307)
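
For anyone decoding the selfID lines above by hand: the flags that firewire-ohci prints can be recovered from the raw quadlet. The following is a minimal Python sketch, assuming the standard IEEE 1394 layout of self-ID packet zero. decode_self_id and the lookup tables are illustrative names, not driver code, and only the power classes that occur in this report are labelled.

```python
# Sketch only: field positions follow the IEEE 1394 self-ID packet zero layout.
SPEEDS = {0: "S100", 1: "S200", 2: "S400", 3: "beta"}
# Port status: not present, present but unconnected, connected to parent, to child.
PORTS = {0: ".", 1: "-", 2: "p", 3: "c"}
# Assumption: only the power classes seen in the logs of this report are named.
POWER = {0: "+0W", 1: "+15W", 4: "-3W"}


def decode_self_id(quadlet):
    """Render a self-ID quadlet in the style of the firewire_ohci debug log."""
    phy_id = (quadlet >> 24) & 0x3F
    link_active = (quadlet >> 22) & 1
    gap_count = (quadlet >> 16) & 0x3F
    speed = (quadlet >> 14) & 0x3
    contender = (quadlet >> 11) & 1
    power = (quadlet >> 8) & 0x7
    ports = [(quadlet >> shift) & 0x3 for shift in (6, 4, 2)]
    initiated_reset = (quadlet >> 1) & 1

    flags = ""
    flags += "L" if link_active else ""
    flags += "c" if contender else ""
    flags += "i" if initiated_reset else ""

    port_str = "".join(PORTS[p] for p in ports)
    power_str = POWER.get(power, "pwr=%d" % power)
    text = "phy %d [%s] %s gc=%d %s %s" % (
        phy_id, port_str, SPEEDS[speed], gap_count, power_str, flags)
    return text.rstrip()


if __name__ == "__main__":
    for raw in (0x807FCC56, 0x807F8C66, 0x817F8470, 0x807F8952):
        print("%08x: %s" % (raw, decode_self_id(raw)))
```

For example, decode_self_id(0x807f8c66) gives "phy 0 [-p-] S400 gc=63 -3W Lci", which matches the first selfID line of the ALi card above.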

firewire_core: created device fw0: GUID 080028560000319b, S800
firewire_core: created device fw1: GUID 0030bd051800064f, S400
firewire_ohci: AT spd 0 tl 17, ffc0 -> ffc1, pending/cancelled, QR req, fffff0000400
firewire_core: created device fw2: GUID 0010dc5600fed2d4, S400
firewire_ohci: AT spd 0 tl 18, ffc0 -> ffc1, pending/cancelled, QR req, fffff0000400
firewire_ohci: AT spd 0 tl 19, ffc0 -> ffc1, pending/cancelled, QR req, fffff0000400
firewire_ohci: AT spd 0 tl 1a, ffc0 -> ffc1, pending/cancelled, QR req, fffff0000400
firewire_ohci: AT spd 0 tl 1b, ffc0 -> ffc1, pending/cancelled, QR req, fffff0000400
firewire_ohci: AT spd 0 tl 1c, ffc0 -> ffc1, pending/cancelled, QR req, fffff0000400
firewire_ohci: AT spd 0 tl 1d, ffc0 -> ffc1, pending/cancelled, QR req, fffff0000400
firewire_ohci: AT spd 0 tl 1e, ffc0 -> ffc1, pending/cancelled, QR req, fffff0000400
firewire_ohci: AT spd 0 tl 1f, ffc0 -> ffc1, pending/cancelled, QR req, fffff0000400
firewire_ohci: AT spd 0 tl 00, ffc0 -> ffc1, pending/cancelled, QR req, fffff0000400
firewire_ohci: AT spd 0 tl 01, ffc0 -> ffc1, pending/cancelled, QR req, fffff0000400
firewire_core: giving up on config rom for node id ffc1

(these mean split transaction timeouts when fw-device.c attempts to read the
SBP-2 disk's config ROM)
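
To make the AT lines above easier to read, here is a small Python sketch that splits one of them into its parts. It assumes the usual IEEE 1394 addressing conventions: a 16-bit node ID is a 10-bit bus ID plus a 6-bit PHY ID, and the config ROM occupies offsets 0x400 to 0x7ff of the CSR register space at 0xfffff0000000. parse_at_line and split_node_id are hypothetical helper names, and the regular expression is fitted only to the log format shown in this comment.

```python
import re

CSR_REGISTER_BASE = 0xFFFFF0000000  # start of CSR register space (48-bit offset)
CSR_CONFIG_ROM = 0x400              # config ROM start, relative to the base
CSR_CONFIG_ROM_END = 0x800          # config ROM is 1 KB long

# Assumption: matches lines such as
# "firewire_ohci: AT spd 0 tl 17, ffc0 -> ffc1, pending/cancelled, QR req, fffff0000400"
AT_LINE = re.compile(
    r"AT spd (\d+) tl ([0-9a-f]+), ([0-9a-f]{4}) -> ([0-9a-f]{4}), "
    r"([^,]+), ([^,]+), ([0-9a-f]+)"
)


def split_node_id(node_id):
    """Return (bus_id, phy_id) for a 16-bit node ID."""
    return (node_id >> 6) & 0x3FF, node_id & 0x3F


def parse_at_line(line):
    """Parse one asynchronous-transmit log line, or return None if it does not match."""
    m = AT_LINE.search(line)
    if m is None:
        return None
    offset = int(m.group(7), 16)
    rom_offset = offset - CSR_REGISTER_BASE
    return {
        "tlabel": int(m.group(2), 16),
        "src": split_node_id(int(m.group(3), 16)),
        "dst": split_node_id(int(m.group(4), 16)),
        "status": m.group(5),
        "tcode": m.group(6),
        "offset": offset,
        "reads_config_rom": CSR_CONFIG_ROM <= rom_offset < CSR_CONFIG_ROM_END,
    }


if __name__ == "__main__":
    sample = ("firewire_ohci: AT spd 0 tl 17, ffc0 -> ffc1, "
              "pending/cancelled, QR req, fffff0000400")
    print(parse_at_line(sample))
```

Read this way, each of the lines above is a quadlet read request from node 0 to node 1 on the local bus (bus ID 0x3ff), aimed at the first quadlet of node 1's config ROM.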

firewire_core: phy config: card 1, new root=ffc0, gap_count=5

(now fw-card.c tries to select an IRM capable root node and to perform gap count
optimization)

------------[ cut here ]------------
WARNING: at drivers/firewire/fw-transaction.c:352 fw_card_bm_work+0x176/0x380
[firewire_core]()
Modules linked in: firewire_ohci firewire_core crc_itu_t i915 drm
cpufreq_ondemand acpi_cpufreq freq_table snd_pcm_oss snd_mixer_oss snd_seq_oss
snd_seq_midi_event snd_seq snd_seq_device nfsd lockd sunrpc exportfs coretemp
w83627ehf hwmon_vid sg sd_mod usbhid hid ehci_hcd ata_piix processor libata
snd_hda_intel snd_pcm yenta_socket rsrc_nonstatic pcmcia_core thermal_sys hwmon
uhci_hcd snd_timer e1000 usbcore snd snd_page_alloc dock rtc [last unloaded:
ieee1394]
Pid: 9, comm: events/0 Tainted: G        W 2.6.26-rc6 #26
 [<c0121e7f>] warn_on_slowpath+0x5f/0x90
 [<c02f62e5>] _spin_lock_irqsave+0x45/0x60
 [<c012b197>] lock_timer_base+0x27/0x60
 [<c012b197>] lock_timer_base+0x27/0x60
 [<c02f662a>] _spin_unlock_irqrestore+0x2a/0x50
 [<c012b218>] try_to_del_timer_sync+0x48/0x50
 [<c012b22e>] del_timer_sync+0xe/0x20
 [<c02f42d2>] schedule_timeout+0x52/0xd0
 [<c02f6255>] _spin_lock_irq+0x35/0x40
 [<c02f6676>] _spin_unlock_irq+0x26/0x40
 [<c02f38b0>] wait_for_common+0x120/0x180
 [<c011af50>] default_wake_function+0x0/0x10
 [<f92a1f4b>] fw_send_phy_config+0xab/0xf0 [firewire_core]
 [<f92a07d6>] fw_card_bm_work+0x176/0x380 [firewire_core]
 [<f92a18e0>] transmit_complete_callback+0x0/0xa0 [firewire_core]
 [<f92a2fb0>] fw_device_init+0x0/0x2a0 [firewire_core]
 [<c0119fde>] __wake_up+0x3e/0x60
 [<f92a2fb0>] fw_device_init+0x0/0x2a0 [firewire_core]
 [<f92a2f8b>] fw_device_release+0x5b/0x80 [firewire_core]
 [<f92a31a9>] fw_device_init+0x1f9/0x2a0 [firewire_core]
 [<c02f6255>] _spin_lock_irq+0x35/0x40
 [<f92a0660>] fw_card_bm_work+0x0/0x380 [firewire_core]
 [<c0131d79>] run_workqueue+0x159/0x1f0
 [<c0131d21>] run_workqueue+0x101/0x1f0
 [<c0135570>] autoremove_wake_function+0x0/0x50
 [<c0132818>] worker_thread+0x98/0xf0
 [<c0135570>] autoremove_wake_function+0x0/0x50
 [<c0132780>] worker_thread+0x0/0xf0
 [<c0135262>] kthread+0x42/0x70
 [<c0135220>] kthread+0x0/0x70
 [<c0103d6f>] kernel_thread_helper+0x7/0x18
 =======================
---[ end trace 03dad1d6d51fa423 ]---

(the PHY config packet transmission from fw-card.c timed out)
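
The trace above shows the wait sitting in the events/0 worker thread, which is presumably why an unbounded wait there could stall unrelated work. As a loose Python analogy of "wait with a deadline" (not the kernel code; send_with_deadline, the callback convention and the timeout value are all made up for illustration):

```python
import threading


def send_with_deadline(transmit, timeout_s):
    """Start a transmission and wait at most timeout_s seconds for it to finish.

    transmit(on_complete) is expected to call on_complete() once the packet
    has gone out. Returns True on completion and False if the deadline
    passed first, so the caller can warn and carry on instead of blocking.
    """
    done = threading.Event()
    transmit(done.set)
    return done.wait(timeout_s)


def dead_phy(on_complete):
    """Stand-in for a controller that never signals completion."""
    pass


def working_phy(on_complete):
    """Stand-in for a controller that completes immediately."""
    on_complete()


if __name__ == "__main__":
    for name, phy in (("working", working_phy), ("dead", dead_phy)):
        ok = send_with_deadline(phy, timeout_s=0.1)
        print(name, "completed" if ok else "timed out")
```

The point of the pattern is only that the caller gets control back after the deadline, whatever the hardware does.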

Comment 18 Stefan Richter 2008-06-18 16:34:15 UTC
I posted an update of the patch which improves it for working controllers but is
effectively unchanged for non-working controllers.
http://marc.info/?l=linux1394-devel&m=121380606431945

Comment 19 Naveed Hasan 2008-06-19 17:32:09 UTC
From lspci -vvv output -

04:01.4 FireWire (IEEE 1394): ALi Corporation M5253 P1394 OHCI 1.1 Controller
(prog-if 10 [OHCI])
        Subsystem: ALi Corporation M5253 P1394 OHCI 1.1 Controller
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr-
Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 64 (750ns max), Cache Line Size: 64 bytes
        Interrupt: pin C routed to IRQ 9
        Region 0: Memory at dccf8800 (32-bit, non-prefetchable) [size=2K]
        Expansion ROM at dcc00000 [disabled] [size=64K]
        Capabilities: [80] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA
PME(D0-,D1+,D2+,D3hot+,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Kernel modules: firewire-ohci


Comment 20 Naveed Hasan 2008-06-19 17:43:40 UTC
(In reply to comment #16)
> x86_64 test kernel w/patch in comment #14 here:
> 
> http://people.redhat.com/jwilson/kernels/2.6.25.7-64.fw.fc9/

I cold booted with this kernel and 'options firewire-ohci debug=7' in
modprobe.conf and nothing plugged into the firewire bus of the card. Here's what
I see -

firewire_ohci: Added fw-ohci device 0000:04:01.4, OHCI version 1.10
firewire_ohci: IRQ 00000010 AR_req
firewire_ohci: IRQ 00010000 selfID
firewire_ohci: 1 selfIDs, generation 1, local node ID ffc0
firewire_core: created device fw0: GUID 0090e639000000f4, S400
firewire_ohci: IRQ 00200000 cycle64Seconds
firewire_ohci: IRQ 00200000 cycle64Seconds
firewire_ohci: IRQ 00200000 cycle64Seconds
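
For reference, the hex value after "IRQ" is a bit mask and the words after it name the bits that are set. Below is a minimal Python sketch that decodes it, using only the three bit/name pairs that can be read off the logs in this report (00000010 AR_req, 00010000 selfID, 00200000 cycle64Seconds). decode_irq is an illustrative name, and any other bit is reported as unknown rather than guessed.

```python
# Bit/name pairs taken from the log lines in this bug report only.
IRQ_BITS = [
    (0x00010000, "selfID"),
    (0x00000010, "AR_req"),
    (0x00200000, "cycle64Seconds"),
]


def decode_irq(mask):
    """Return the names of the known event bits set in mask."""
    names = [name for bit, name in IRQ_BITS if mask & bit]
    known = 0
    for bit, _name in IRQ_BITS:
        known |= bit
    leftover = mask & ~known
    if leftover:
        # Bits not covered by the table above are shown as raw hex.
        names.append("unknown(%08x)" % leftover)
    return names


if __name__ == "__main__":
    for raw in (0x00000010, 0x00010000, 0x00010010, 0x00200000):
        print("IRQ %08x %s" % (raw, " ".join(decode_irq(raw))))
```

So the combined value 00010010 seen in comment #17 is simply selfID and AR_req arriving in the same interrupt.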


Comment 21 Naveed Hasan 2008-06-19 17:51:01 UTC
Another cold boot, now with an external Firewire hub already plugged into the card -

ACPI: PCI Interrupt 0000:04:01.4[C] -> GSI 19 (level, low) -> IRQ 19
firewire_ohci: Added fw-ohci device 0000:04:01.4, OHCI version 1.10
firewire_ohci: IRQ 00000010 AR_req
AR evt_bus_reset, generation 1
firewire_ohci: IRQ 00010000 selfID
firewire_ohci: 2 selfIDs, generation 1, local node ID ffc1
selfID 0: 803f8c64, phy 0 [-p-] S400 gc=63 -3W c
selfID 0: 817f8872, phy 1 [-c.] S400 gc=63 +0W Lci
firewire_core: created device fw0: GUID 0090e639000000f4, S400
firewire_core: phy config: card 0, new root=ffc1, gap_count=5
------------[ cut here ]------------
WARNING: at drivers/firewire/fw-transaction.c:350 fw_send_phy_config+0xdf/0xeb
[firewire_core]() (Not tainted)
Modules linked in: v4l2_common tveeprom dcdbas parport_pc parport i2c_i801
i2c_core pcspkr iTCO_wdt iTCO_vendor_support sg snd_intel8x0 snd_ac97_codec
ac97_bus firewire_ohci shpchp firewire_core snd_seq_dummy crc_itu_t
pata_pdc2027x snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss
snd_pcm snd_timer snd_page_alloc snd_usb_lib snd_rawmidi snd_seq_device
snd_hwdep tg3 hci_usb button pwc joydev snd compat_ioctl32 videodev v4l1_compat
bluetooth soundcore ahci dm_snapshot dm_zero dm_mirror dm_mod ata_piix pata_acpi
ata_generic libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd
[last unloaded: scsi_wait_scan]
Pid: 9, comm: events/0 Not tainted 2.6.25.7-64.fw.fc9.x86_64 #1

Call Trace:
 [<ffffffff81033571>] warn_on_slowpath+0x60/0x91
 [<ffffffff8103cc79>] ? process_timeout+0x0/0xb
 [<ffffffff8128d5f9>] ? schedule_timeout+0x88/0xb4
 [<ffffffff8128d48c>] ? wait_for_common+0x10b/0x137
 [<ffffffff8102b55f>] ? default_wake_function+0x0/0xf
 [<ffffffff881fdc04>] :firewire_core:fw_send_phy_config+0xdf/0xeb
 [<ffffffff881fc52a>] :firewire_core:fw_card_bm_work+0x356/0x3c9
 [<ffffffff81043718>] ? insert_work+0x5b/0x5f
 [<ffffffff81043aff>] ? __queue_work+0x36/0x3f
 [<ffffffff81043b8b>] ? queue_work+0x47/0x50
 [<ffffffff81043ffd>] ? queue_delayed_work+0x33/0x4f
 [<ffffffff81044045>] ? schedule_delayed_work+0x2c/0x33
 [<ffffffff881ff2c2>] ? :firewire_core:fw_device_init+0x255/0x273
 [<ffffffff881fc1d4>] ? :firewire_core:fw_card_bm_work+0x0/0x3c9
 [<ffffffff8104351d>] run_workqueue+0x84/0x10c
 [<ffffffff81043682>] worker_thread+0xdd/0xee
 [<ffffffff81046b83>] ? autoremove_wake_function+0x0/0x38
 [<ffffffff810435a5>] ? worker_thread+0x0/0xee
 [<ffffffff81046863>] kthread+0x49/0x76
 [<ffffffff8100ccf8>] child_rip+0xa/0x12
 [<ffffffff8104681a>] ? kthread+0x0/0x76
 [<ffffffff8100ccee>] ? child_rip+0x0/0x12

---[ end trace 0f103775ac4c7d8a ]---
firewire_ohci: IRQ 00200000 cycle64Seconds
firewire_ohci: IRQ 00200000 cycle64Seconds
firewire_ohci: IRQ 00200000 cycle64Seconds


Comment 22 Naveed Hasan 2008-06-19 17:59:17 UTC
(In reply to comment #14)
> proposed fix for part of the problem:
> patch "firewire: deadline for PHY config transmission"
> http://marc.info/?l=linux1394-devel&m=121372642105480
> 
> see also https://bugzilla.redhat.com/show_bug.cgi?id=446763#c38

Your patch does fix the locked up keyboard problem nicely. With
kernel-2.6.25.7-64.fw.fc9 I no longer need to blacklist firewire-ohci. Thanks!


Comment 23 Stefan Richter 2008-06-21 09:37:49 UTC
upstream bug status:
10796 - closed, fixed in v2.6.26-rc7
10935 - open, fw-ohci's bus reset tasklet is unable to finish the first bus
reset event

Comment 24 Fedora Update System 2008-06-30 16:33:47 UTC
kernel-2.6.25.9-76.fc9 has been submitted as an update for Fedora 9

Comment 25 Fedora Update System 2008-07-01 05:28:20 UTC
kernel-2.6.25.9-76.fc9 has been pushed to the Fedora 9 testing repository.  If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
  su -c 'yum --enablerepo=updates-testing update kernel'
You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F9/FEDORA-2008-5893

Comment 26 Fedora Update System 2008-07-02 06:34:34 UTC
kernel-2.6.25.9-76.fc9 has been pushed to the Fedora 9 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 27 Fedora Update System 2008-07-04 03:40:10 UTC
kernel-2.6.25.9-76.fc9 has been pushed to the Fedora 9 stable repository.  If problems still persist, please make note of it in this bug report.

