Bug 228017

Summary:	firewire noise on connection
Product:	[Fedora] Fedora	Reporter:	Bill Nottingham <notting>
Component:	kernel	Assignee:	Kristian Høgsberg <krh>
Status:	CLOSED RAWHIDE	QA Contact:	Brian Brock <bbrock>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	rawhide	CC:	cebbert, davej, fenlason, jarod, krh, rvokal, stefan-r-rhbz, wtogami, zaitcev
Target Milestone:	---
Target Release:	---
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2008-03-04 16:27:00 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Bill Nottingham 2007-02-09 16:07:46 UTC

2.6.20-1.2922.fc7, x86_64.

I connected the only FW device I have around (an IDE DVD/CD-RW in a crappy
enclosure.) This is what I got:

fw_core: created new fw device fw0 (0 config rom retries)
fw_sbp2: logged in to sbp2 unit fw0.0
fw_sbp2:  - management_agent_address: 0xfffff0030000
fw_sbp2:  - command_block_agent_address: 0xffc0fffff0100000
fw_sbp2:  - status write address: 0x000100000000
fw_sbp2: agent reset write rcode=0
scsi7 : SBP-2 IEEE-1394
scsi 7:0:0:0: CD-ROM            COMBI    RW32/DVD         B1.0 PQ: 0 ANSI: 0
sr1: scsi3-mmc drive: 0x/0x caddy
sr 7:0:0:0: Attached scsi CD-ROM sr1
sr 7:0:0:0: Attached scsi generic sg2 type 5
fw_sbp2: management write failed, rcode 0x10
fw_sbp2: no command orb status, rcode=16
fw_sbp2: no command orb status, rcode=16
fw_sbp2: no command orb status, rcode=16
fw_sbp2: no command orb status, rcode=16
fw_sbp2: no command orb status, rcode=16
fw_sbp2: no command orb status, rcode=16
fw_sbp2: no command orb status, rcode=16
fw_sbp2: no command orb status, rcode=16
fw_sbp2: removed sbp2 unit fw0.0

=================================
[ INFO: inconsistent lock state ]
2.6.20-1.2922.fc7 #1
---------------------------------
inconsistent {softirq-on-W} -> {in-softirq-W} usage.
swapper/0 [HC0[0]:SC1[1]:HE0:SE0] takes:
 (&k->k_lock){-+..}, at: [<ffffffff80457ef4>] klist_next+0x24/0x8a
{softirq-on-W} state was registered at:
  [<ffffffff802a51b8>] __lock_acquire+0x454/0xa50
  [<ffffffff8020b39b>] kfree+0x26c/0x27e
  [<ffffffff802a5a91>] lock_acquire+0x4c/0x65
  [<ffffffff80457ef4>] klist_next+0x24/0x8a
  [<ffffffff80263a05>] _spin_lock+0x25/0x31
  [<ffffffff80457ef4>] klist_next+0x24/0x8a
  [<ffffffff803b0678>] __device_attach+0x0/0xb
  [<ffffffff803af887>] bus_for_each_drv+0x52/0x7d
  [<ffffffff803b070a>] device_attach+0x6a/0x7e
  [<ffffffff803af80d>] bus_attach_device+0x24/0x4c
  [<ffffffff803ae6c3>] device_add+0x3b6/0x55b
  [<ffffffff8029d936>] init_waitqueue_head+0x1f/0x30
  [<ffffffff803ae881>] device_register+0x19/0x1d
  [<ffffffff8067d348>] acpi_scan_init+0xda/0x1a4
  [<ffffffff80269c43>] init+0x1e7/0x3ae
  [<ffffffff802a48e1>] trace_hardirqs_on+0x11a/0x13e
  [<ffffffff8025cf38>] child_rip+0xa/0x12
  [<ffffffff80263cb0>] _spin_unlock_irq+0x2b/0x31
  [<ffffffff8025c6bc>] restore_args+0x0/0x30
  [<ffffffff80269a5c>] init+0x0/0x3ae
  [<ffffffff8025cf2e>] child_rip+0x0/0x12
  [<ffffffffffffffff>] 0xffffffffffffffff
irq event stamp: 671234127
hardirqs last  enabled at (671234126): [<ffffffff80263d26>]
_spin_unlock_irqrestore+0x3f/0x47
hardirqs last disabled at (671234127): [<ffffffff80263d88>]
_spin_lock_irqsave+0x13/0x3c
softirqs last  enabled at (671234116): [<ffffffff80211c21>] __do_softirq+0xda/0xe3
softirqs last disabled at (671234119): [<ffffffff8025d2ac>] call_softirq+0x1c/0x28

other info that might help us debug this:
1 lock held by swapper/0:
 #0:  (&card->lock){.+..}, at: [<ffffffff8816f95f>]
fw_core_handle_bus_reset+0x3b/0x548 [fw_core]

stack backtrace:

Call Trace:
 <IRQ>  [<ffffffff802a3bc1>] print_usage_bug+0x162/0x173
 [<ffffffff802a43db>] mark_lock+0x139/0x411
 [<ffffffff802a5139>] __lock_acquire+0x3d5/0xa50
 [<ffffffff80256a89>] klist_children_put+0x0/0xf
 [<ffffffff802a5a91>] lock_acquire+0x4c/0x65
 [<ffffffff80457ef4>] klist_next+0x24/0x8a
 [<ffffffff80263a05>] _spin_lock+0x25/0x31
 [<ffffffff80457ef4>] klist_next+0x24/0x8a
 [<ffffffff8817149e>] :fw_core:update_unit+0x0/0x2f
 [<ffffffff803adb02>] next_device+0x9/0x19
 [<ffffffff803adb80>] device_for_each_child+0x36/0x59
 [<ffffffff8817126e>] :fw_core:fw_node_event+0x110/0x17b
 [<ffffffff8816f8cb>] :fw_core:fw_node_create+0x2a/0x83
 [<ffffffff8816fcd9>] :fw_core:fw_core_handle_bus_reset+0x3b5/0x548
 [<ffffffff881a4baf>] :fw_ohci:bus_reset_tasklet+0x1ae/0x1bb
 [<ffffffff802915f7>] tasklet_action+0x5e/0xb2
 [<ffffffff80211ba6>] __do_softirq+0x5f/0xe3
 [<ffffffff8025d2ac>] call_softirq+0x1c/0x28
 [<ffffffff8026bdfc>] do_softirq+0x39/0x9f
 [<ffffffff802914cd>] irq_exit+0x4e/0x50
 [<ffffffff8026bfb0>] do_IRQ+0x14e/0x170
 [<ffffffff80255c77>] mwait_idle+0x0/0x50
 [<ffffffff8025c666>] ret_from_intr+0x0/0xf
 <EOI>  [<ffffffff80260f3e>] __sched_text_start+0xafe/0xb36
 [<ffffffff80255cbd>] mwait_idle+0x46/0x50
 [<ffffffff802481e1>] enter_idle+0x22/0x24
 [<ffffffff802483dc>] cpu_idle+0xa1/0xc4
 [<ffffffff80275c70>] start_secondary+0x46e/0x47d

fw_sbp2: logged in to sbp2 unit fw1.0
fw_sbp2:  - management_agent_address: 0xfffff0030000
fw_sbp2:  - command_block_agent_address: 0xffc1fffff0100000
fw_sbp2:  - status write address: 0x000100000000
fw_sbp2: agent reset write rcode=0
scsi8 : SBP-2 IEEE-1394
scsi 8:0:0:0: CD-ROM            COMBI    RW32/DVD         B1.0 PQ: 0 ANSI: 0
sr1: scsi3-mmc drive: 0x/0x caddy
sr 8:0:0:0: Attached scsi CD-ROM sr1
sr 8:0:0:0: Attached scsi generic sg2 type 5
fw_core: created new fw device fw1 (4 config rom retries)


It appears to write a CD OK, however, it gets stuck at the end and reports:

cdrecord: Input/output error. prevent/allow medium removal: scsi sendcmd:
retryable error
CDB:  1E 00 00 00 00 00
status: 0x0 (GOOD STATUS)
cmd finished after 0.000s timeout 200s

Comment 1 Bill Nottingham 2007-02-09 16:11:19 UTC

... and the disc a) won't eject b) appears to be a coaster.

Comment 2 Bill Nottingham 2007-02-13 21:22:08 UTC

Still happens with:

* Mon Feb 12 2007 Kristian Høgsberg <krh>
- Update firewire patch with latest usptream changes.

Comment 3 Bill Nottingham 2007-02-13 21:29:22 UTC

Also, it now is unable to burn CDs:

fw_sbp2: no command orb status, rcode=17
fw_sbp2: no command orb status, rcode=17
fw_sbp2: status write for unknown orb
fw_sbp2: no command orb status, rcode=17

Comment 4 Bill Nottingham 2007-02-13 21:31:47 UTC

Also also, powering off the drive yields:

fw_core: phy config: card 0, new root=ffc1, gap_count=5
fw_sbp2: management write failed, rcode 0x11
fw_sbp2: reconnect attempt 1 for fw1.0 failed, rescheduling
fw_sbp2: management write failed, rcode 0x13
fw_sbp2: removed sbp2 unit fw1.0
list_add corruption. prev->next should be next (ffff81007e6841c0), but was
6b6b6b6b6b6b6b6b. (prev=ffff810076089480).
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:33!
invalid opcode: 0000 [1] SMP 
last sysfs file: /class/drm/card0/dev
CPU 1 
Modules linked in: nfs lockd nfs_acl i915 drm netconsole autofs4 hidp rfcomm
l2cap bluetooth sunrpc nf_conntrack_netbios_ns nf_conntrack_ipv4 xt_state
nf_conntrack nfnetlink ipt_REJECT iptable_filter ip_tables xt_tcpudp ip6t_REJECT
ip6table_filter ip6_tables x_tables cpufreq_ondemand dm_multipath video sbs
i2c_ec dock button battery asus_acpi backlight ac ipv6 lp fw_sbp2 snd_hda_intel
snd_hda_codec snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq
snd_seq_device snd_pcm_oss snd_mixer_oss parport_pc fw_ohci fw_core parport
snd_pcm i2c_i801 serio_raw pcspkr ata_generic i2c_core snd_timer iTCO_wdt snd
sr_mod iTCO_vendor_support cdrom shpchp e1000 soundcore snd_page_alloc sg
dm_snapshot dm_zero dm_mirror dm_mod pata_marvell ahci libata sd_mod scsi_mod
ext3 jbd mbcache ehci_hcd ohci_hcd uhci_hcd
Pid: 9, comm: events/1 Not tainted 2.6.20-1.2925.fc7 #1
RIP: 0010:[<ffffffff80342f27>]  [<ffffffff80342f27>] __list_add+0x47/0x5b
RSP: 0018:ffff81007e629c40  EFLAGS: 00010086
RAX: 0000000000000079 RBX: ffff810076886868 RCX: ffffffff806e07d9
RDX: ffff81007e3c6080 RSI: 0000000000000000 RDI: ffffffff80568ac0
RBP: ffff81007e629c40 R08: ffffffff80568af0 R09: 0000000000000086
R10: ffffffff80289d0b R11: 0000000000000000 R12: ffff81007e684000
R13: ffff81007e684000 R14: 0000000100009c18 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff81007e6f20a0(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00002aaaab1b9030 CR3: 0000000058e50000 CR4: 00000000000006e0
Process events/1 (pid: 9, threadinfo ffff81007e628000, task ffff81007e3c6080)
Stack:  ffff81007e629c50 ffffffff80293f77 ffff81007e629ca0 ffffffff8021c8a1
 0000000000000282 0000000000000046 0000000000000286 ffff810076886848
 ffff810003f714d0 ffff810076886000 ffff810076886478 0000000000000282
Call Trace:
 [<ffffffff80293f77>] internal_add_timer+0xbb/0xbd
 [<ffffffff8021c8a1>] __mod_timer+0x9d/0xbb
 [<ffffffff8029a97c>] queue_delayed_work+0x7f/0x88
 [<ffffffff8029a99e>] schedule_delayed_work+0x19/0x1b
 [<ffffffff881bc639>] :fw_core:fw_card_bm_work+0x1c1/0x2c6
 [<ffffffff80263cb0>] _spin_unlock_irq+0x2b/0x31
 [<ffffffff881bc478>] :fw_core:fw_card_bm_work+0x0/0x2c6
 [<ffffffff8024c6fd>] run_workqueue+0xa2/0x187
 [<ffffffff8024909f>] worker_thread+0x0/0x150
 [<ffffffff802491b8>] worker_thread+0x119/0x150
 [<ffffffff8028ae43>] default_wake_function+0x0/0xf
 [<ffffffff8024909f>] worker_thread+0x0/0x150
 [<ffffffff802330cc>] kthread+0xd7/0x10a
 [<ffffffff8025cf38>] child_rip+0xa/0x12
 [<ffffffff80263cb0>] _spin_unlock_irq+0x2b/0x31
 [<ffffffff8025c6bc>] restore_args+0x0/0x30
 [<ffffffff80289d0b>] task_rq_lock+0x42/0x74
 [<ffffffff80232ff5>] kthread+0x0/0x10a
 [<ffffffff8025cf2e>] child_rip+0x0/0x12


Code: 0f 0b eb fe 48 89 7e 08 48 89 37 48 89 57 08 48 89 3a c9 c3 
RIP  [<ffffffff80342f27>] __list_add+0x47/0x5b
 RSP <ffff81007e629c40>

Comment 5 Kristian Høgsberg 2007-02-14 00:22:44 UTC

It the crash always reproducible on unplug?  Do you have other devices connected
or is it only the box and the drive?  I have an idea what this could be, I'll
take a look.

The inconsistent lock state is a known problem, I just need to come up with
better way of iterating over device sub-units.

Comment 6 Bill Nottingham 2007-02-14 18:19:21 UTC

The crash isn't always reproducible. I assume it's related to the drive going
off into la-la land when attempting to write a CD.

Comment 7 Jarod Wilson 2007-02-16 20:12:38 UTC

I'm seeing some of the same initial spew (but not quite as bad) when I hook up a
burner (before even powering it on).

Interestingly, the first time I hooked up the burner and powered it on, CDs
wouldn't mount anywhere and I couldn't find the device to burn on. The following
is the only feedback I get on the console when I power the drive off and on:

fw_core: phy config: card 0, new root=ffc2, gap_count=7

After completely removing the device from the firewire bus, hooking it back up
and powering it on, I now get this:

fw_core: phy config: card 0, new root=ffc2, gap_count=7
fw_core: phy config: card 0, new root=ffc2, gap_count=7
fw_core: phy config: card 0, new root=ffc2, gap_count=7
fw_core: created new fw device fw1 (5 config rom retries)
fw_core: giving up on config rom for node id ffc0
fw_sbp2: logged in to sbp2 unit fw1.0
fw_sbp2:  - management_agent_address: 0xfffff0030000
fw_sbp2:  - command_block_agent_address: 0xffc1fffff0100000
fw_sbp2:  - status write address: 0x000100000000
fw_sbp2: agent reset write rcode=0
scsi3 : SBP-2 IEEE-1394
scsi 3:0:0:0: CD-ROM            TEAC     CD-W540E         1.0B PQ: 0 ANSI: 0
sr2: scsi3-mmc drive: 48x/48x writer cd/rw xa/form2 cdda tray
sr 3:0:0:0: Attached scsi CD-ROM sr2
sr 3:0:0:0: Attached scsi generic sg4 type 5


That looks much better. However, I still don't seem to be able to access the
drive at all.


Nb: just to eliminate any concerns over the device being bad, it does work when
hooked up to an FC6 machine w/the old firewire stack, as well as under Mac OS X.

Comment 8 Jarod Wilson 2007-02-16 20:56:36 UTC

Hooking up a firewire hard drive to the same system yields the following on the
console:

fw_sbp2: management write failed, rcode 0x12
fw_sbp2: reconnect attempt 1 for fw1.0 failed, rescheduling
fw_sbp2: reconnected to unit fw1.0
fw_sbp2: agent reset write rcode=0
fw_sbp2: reconnected to unit fw1.0
fw_sbp2: agent reset write rcode=0

Drive never shows its face though.

At least the box doesn't freak out when I power things off and disconnect
them... Bouncing box w/o the firewire burner in the picture to see if behavior
is any different. Both of these devices are in ADS Pyro cases w/Oxford 911
chipsets, fwiw.

Okay, w/o the burner involved, same spew at first connection. Upon powering the
drive on, I see two 'fw_core: phy config: card 0, new root=ffc1, gap_count=5'
messages and then nada. Powering off and on does nothing. Powering off,
unplugging, replugging and powering back up results in some extra output:

fw_core: phy config: card 0, new root=ffc1, gap_count=5
fw_core: giving up on config rom for node id ffc0
fw_core: phy config: card 0, new root=ffc1, gap_count=5

But still can't get at the drive.

Should this be in a different bugzilla, perhaps?

Comment 9 Jarod Wilson 2007-02-16 21:25:21 UTC

At notting's suggestion, I rolled back to 2922 (all prior stuff was w/2925).
Much less spew just connecting devices:

fw_core: giving up on config rom for node id 1
fw_core: giving up on config rom for node id 0

Powering on both devices (hard drive, then burner) results in no additional
output though. Disconnecting and reconnecting, then powering up again results in
the same two lines of output as above.

Comment 10 Kristian Høgsberg 2007-02-16 23:25:47 UTC

There's three different things going on in this bug:

  - the 'lock inconsistent' warning which is dues to calling
device_for_each_child() in IRQ context.

  - the oops, which looks like the reconnect tasklet got scheduled and then the
device got freed, screwing up the keventd job queue

  - failure to connect to oxford fw911 based drives.

And we should probably have seperate bugzillas.  Btw. this log output

  fw_core: phy config: card 0, new root=ffc2, gap_count=7

it harmless and just means that the stack is reconfiguring the bus.

Jarod are you here in Westford?  I'd like to take a look at those drives.

Comment 11 Jarod Wilson 2007-02-19 15:31:31 UTC

Yep, I'm in Westford.

Comment 12 Kristian Høgsberg 2007-03-16 00:52:00 UTC

The lock order warning is fixed, the crash on power down should be fixed, I'll
increase the timeouts and number of retries so the devices are detected from
powerup.  I'm using Jarods harddisk and cdrom without problems now, just booted
off of a fedora rescue cd burned on the burner.

Comment 13 Jarod Wilson 2007-03-16 19:20:37 UTC

Ooh, cool! In that case, I'd like to swipe 'em back from ya and start playing
with 'em again, see if anything blows up on different arches or firewire
controllers... :)

Comment 14 Jarod Wilson 2007-03-21 20:33:50 UTC

Unfortunately, neither the CD-RW or HD seem to want to play nice on the box I'd
had them hooked to where I previously captured output (now running kernel
2.6.20-1.2999.fc6.i686). All I see while powering up, powering down, unplugging,
plugging, powering back up, etc. is:

fw_core: phy config: card 0, new root=ffc1, gap_count=5
fw_core: phy config: card 0, new root=ffc2, gap_count=7
fw_core: phy config: card 0, new root=ffc2, gap_count=7
fw_core: giving up on config rom for node id ffc1
fw_core: giving up on config rom for node id ffc0
fw_core: phy config: card 0, new root=ffc2, gap_count=7
fw_core: phy config: card 0, new root=ffc2, gap_count=7
fw_core: phy config: card 0, new root=ffc2, gap_count=7

Firewire stack hates my controller, perhaps? Its the firewire port on a
SoundBlaster Audigy.

I'll move the drives to another system and see what I can see there...

Comment 15 Jarod Wilson 2007-03-21 21:10:15 UTC

Okay, same kernel and arch, different box and firewire controller, things look
much better:

07:01.0 FireWire (IEEE 1394): Texas Instruments TSB43AB23 IEEE-1394a-2000
Controller (PHY/Link) (prog-if 10 [OHCI])
        Subsystem: Intel Corporation Unknown device e302
        Flags: bus master, medium devsel, latency 32, IRQ 21
        Memory at 50204000 (32-bit, non-prefetchable) [size=2K]
        Memory at 50200000 (32-bit, non-prefetchable) [size=16K]
        Capabilities: [44] Power Management version 2

With that controller, I've got access to both the cd-rw drive and hard drive.

Comment 16 Kristian Høgsberg 2007-03-22 14:02:01 UTC

Very interesting - can you lend me the audigy controller for a few days?

Comment 17 Jarod Wilson 2007-04-05 21:55:47 UTC

So... The audigy works fine in Kristian's box, and the card that worked in one
box does NOT work in the same box where the audigy originally resided.

In semi-related news, I tried out the same devices with my PowerBook. Ouch.
First time I've ever seen xmon on a PowerBook... :) (will open another bz for
that one).

Comment 18 Jarod Wilson 2007-04-05 22:09:49 UTC

Upon unplugging and replugging the devices, I got the CD-RW drive to show up
when it was powered up. Once I powered up the hard drive though, I got a kernel
panic. Smells very similar to what happened with the PowerBook. That's all filed
as bug 235463 now.

Comment 19 Dave Jones 2007-08-27 14:18:58 UTC

Bill, all better with current builds?

Comment 20 Bill Nottingham 2007-08-29 16:51:15 UTC

fw bridge isn't available ATM. Feel free to close.

Comment 21 Bill Nottingham 2007-08-29 16:52:16 UTC

Whoops, found it. No strange noise anymore.

Comment 22 Stefan Richter 2008-03-04 16:24:06 UTC

Could someone put this bug entry to rest please?  :-)