Bug 182005

Summary:	problems and kernel oops when accessing IEEE1394 drive thru soundblaster audigy 4 pro
Product:	[Fedora] Fedora	Reporter:	Ronny Bremer <rbremer>
Component:	kernel	Assignee:	Dave Jones <davej>
Status:	CLOSED ERRATA	QA Contact:	Brian Brock <bbrock>
Severity:	high	Docs Contact:
Priority:	medium
Version:	5	CC:	pfrields, stefan-r-rhbz, wtogami
Target Milestone:	---
Target Release:	---
Hardware:	i686
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2006-10-18 06:29:41 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	175429

Description Ronny Bremer 2006-02-18 21:43:52 UTC

Description of problem:
when plugging in the IEEE drive into the SB firewire port only the message
"kernel: ieee1394: Error parsing configrom for node 0-00:1023" is displayed in
the system log. After manually loading SBP2 module, the attached dump happens.
Even though I can access the data on the drive, it should not generate the
kernel dump and also should be detected automatically.

Hint: the drive was detected without problems during the installation phase of
Fedora core 5 test 2.

Version-Release number of selected component (if applicable):
Kernel 2.6.15-1.1826.2.10_FC5smp

HW: Audigy 4 Pro Soundblaster device
lspci -v output:
02:0b.0 Multimedia audio controller: Creative Labs SB Audigy (rev 04)
        Subsystem: Creative Labs Unknown device 2007
        Flags: bus master, medium devsel, latency 64, IRQ 193
        I/O ports at e400 [size=64]
        Capabilities: [dc] Power Management version 2

02:0b.2 FireWire (IEEE 1394): Creative Labs SB Audigy FireWire Port (rev 04)
(prog-if 10 [OHCI])
        Subsystem: Creative Labs SB Audigy FireWire Port
        Flags: bus master, medium devsel, latency 64, IRQ 217
        Memory at fbffb800 (32-bit, non-prefetchable) [size=2K]
        Memory at fbff4000 (32-bit, non-prefetchable) [size=16K]
        Capabilities: [44] Power Management version 2

attached drive: FREECOM 350GB Firewire hdd with two partitions

How reproducible:
Always

Steps to Reproduce:
1. boot the system
2. plug in the drive
3. load sbp2
  
Actual results:
after step 2 the error message " kernel: ieee1394: Error parsing configrom for
node 0-00:1023" is shown but nothing else happens
after step 3 the following dump will be logged:

Feb 18 22:30:17 fgrbhw01 kernel: sbp2: $Rev: 1306 $ Ben Collins
<bcollins>
Feb 18 22:30:17 fgrbhw01 kernel: ieee1394: sbp2: Driver forced to serialize I/O
(serialize_io=1)
Feb 18 22:30:17 fgrbhw01 kernel: ieee1394: sbp2: Try serialize_io=0 for better
performance
Feb 18 22:30:17 fgrbhw01 kernel: scsi2 : SCSI emulation for IEEE-1394 SBP-2 Devices
Feb 18 22:30:17 fgrbhw01 kernel: ieee1394: sbp2: Node 0-00:1023: Using 36byte
inquiry workaround
Feb 18 22:30:18 fgrbhw01 kernel: ieee1394: sbp2: Logged into SBP-2 device
Feb 18 22:30:18 fgrbhw01 kernel:   Vendor: Initio    Model: 0KLAT80          
Rev: 2.05
Feb 18 22:30:18 fgrbhw01 kernel:   Type:   Direct-Access                     
ANSI SCSI revision: 00
Feb 18 22:30:18 fgrbhw01 kernel: SCSI device sdb: 781422768 512-byte hdwr
sectors (400088 MB)
Feb 18 22:30:18 fgrbhw01 kernel: slab error in cache_free_debugcheck(): cache
`sgpool-8': double free, or memory outside object was overwritten
Feb 18 22:30:18 fgrbhw01 kernel:  [<c014d8bf>] cache_free_debugcheck+0xce/0x1b9
    [<c01486cb>] mempool_free+0x5f/0x63
Feb 18 22:30:18 fgrbhw01 kernel:  [<c014e230>] kmem_cache_free+0x2a/0x5c    
[<c01486cb>] mempool_free+0x5f/0x63
Feb 18 22:30:18 fgrbhw01 kernel:  [<f8864f65>] scsi_io_completion+0x65/0x3ce
[scsi_mod]     [<f8860bb3>] scsi_finish_command+0xb8/0xbd [scsi_mod]
Feb 18 22:30:18 fgrbhw01 kernel:  [<f8860ab6>] scsi_softirq+0x109/0x128
[scsi_mod]     [<c0127098>] __do_softirq+0x58/0xc2
Feb 18 22:30:18 fgrbhw01 kernel:  [<c0105f75>] do_softirq+0x46/0x4e
Feb 18 22:30:18 fgrbhw01 kernel:  =======================
Feb 18 22:30:18 fgrbhw01 kernel:  [<c0105e9a>] do_IRQ+0x72/0x7b     [<c01048fe>]
common_interrupt+0x1a/0x20
Feb 18 22:30:18 fgrbhw01 kernel:  [<f88c940b>] ext3_get_block_handle+0x0/0x2a5
[ext3]     [<f88c9714>] ext3_get_block+0x64/0x6c [ext3]
Feb 18 22:30:18 fgrbhw01 kernel:  [<f88c9f0f>] ext3_bmap+0x0/0x6d [ext3]    
[<c0165dec>] generic_block_bmap+0x28/0x35
Feb 18 22:30:18 fgrbhw01 kernel:  [<c02f599a>] io_schedule+0x26/0x30    
[<c02f5cd3>] out_of_line_wait_on_bit_lock+0x75/0x7d
Feb 18 22:30:18 fgrbhw01 kernel:  [<c01631d3>] sync_buffer+0x0/0x33    
[<f88c9f75>] ext3_bmap+0x66/0x6d [ext3]
Feb 18 22:30:18 fgrbhw01 kernel:  [<f88c96b0>] ext3_get_block+0x0/0x6c [ext3]  
  [<f88c9f0f>] ext3_bmap+0x0/0x6d [ext3]
Feb 18 22:30:18 fgrbhw01 kernel:  [<c0178e14>] bmap+0x23/0x27     [<f88961e9>]
journal_bmap+0x1d/0x64 [jbd]
Feb 18 22:30:18 fgrbhw01 kernel:  [<c01347cd>] wake_bit_function+0x0/0x3c    
[<c014d9a2>] cache_free_debugcheck+0x1b1/0x1b9
Feb 18 22:30:18 fgrbhw01 kernel:  [<f88961bd>] journal_next_log_block+0x74/0x83
[jbd]     [<f889623f>] journal_get_descriptor_buffer+0xf/0x8d [jbd]
Feb 18 22:30:19 fgrbhw01 kernel:  [<f8893709>]
journal_commit_transaction+0x61c/0xdbf [jbd]     [<c02f6269>]
_spin_lock_irqsave+0x9/0xd
Feb 18 22:30:19 fgrbhw01 kernel:  [<c012a32b>] try_to_del_timer_sync+0x44/0x4a 
   [<f88959aa>] kjournald+0xbd/0x20e [jbd]
Feb 18 22:30:19 fgrbhw01 kernel:  [<c011d4c9>] schedule_tail+0x36/0x8b    
[<f88958e8>] commit_timeout+0x0/0x5 [jbd]
Feb 18 22:30:19 fgrbhw01 kernel:  [<c01347a0>] autoremove_wake_function+0x0/0x2d
    [<f88958ed>] kjournald+0x0/0x20e [jbd]
Feb 18 22:30:19 fgrbhw01 kernel:  [<c01023a9>] kernel_thread_helper+0x5/0xb
Feb 18 22:30:19 fgrbhw01 kernel: f3fa3888: redzone 1: 0x170fc2a5, redzone 2:
0xc01485d0.
Feb 18 22:30:19 fgrbhw01 kernel: slab error in cache_alloc_debugcheck_after():
cache `sgpool-8': double free, or memory outside object was overwritten
Feb 18 22:30:19 fgrbhw01 kernel:  [<c014dd0e>]
cache_alloc_debugcheck_after+0x87/0x128     [<c014e03c>] kmem_cache_alloc+0x7d/0x86
Feb 18 22:30:19 fgrbhw01 kernel:  [<c01485d0>] mempool_alloc+0x37/0xd3    
[<c01485d0>] mempool_alloc+0x37/0xd3
Feb 18 22:30:19 fgrbhw01 kernel:  [<f88600c3>] __scsi_get_command+0x13/0x57
[scsi_mod]     [<f8865321>] scsi_init_io+0x53/0xf6 [scsi_mod]
Feb 18 22:30:19 fgrbhw01 kernel:  [<f8865652>] scsi_prep_fn+0x150/0x1cc
[scsi_mod]     [<c01c830a>] elv_next_request+0x79/0x13d
Feb 18 22:30:19 fgrbhw01 kernel:  [<f8865778>] scsi_request_fn+0x5b/0x30a
[scsi_mod]     [<c012a2e1>] del_timer+0x41/0x47
Feb 18 22:30:19 fgrbhw01 kernel:  [<c01ca11f>] __generic_unplug_device+0x1d/0x1f
    [<c01ca136>] generic_unplug_device+0x15/0x21
Feb 18 22:30:19 fgrbhw01 kernel:  [<f88a34f9>] dm_table_unplug_all+0x22/0x2b
[dm_mod]     [<f88a18c3>] dm_unplug_all+0x17/0x21 [dm_mod]
Feb 18 22:30:19 fgrbhw01 kernel:  [<c01ca14e>] blk_backing_dev_unplug+0xc/0xd  
  [<c01631fe>] sync_buffer+0x2b/0x33
Feb 18 22:30:19 fgrbhw01 kernel:  [<c02f5b6a>] __wait_on_bit+0x33/0x58    
[<c01631d3>] sync_buffer+0x0/0x33
Feb 18 22:30:19 fgrbhw01 kernel:  [<c02f5c04>] out_of_line_wait_on_bit+0x75/0x7d
    [<c01631d3>] sync_buffer+0x0/0x33
Feb 18 22:30:19 fgrbhw01 kernel: slab error in cache_free_debugcheck(): cache
`sgpool-8': double free, or memory outside object was overwritten
Feb 18 22:30:19 fgrbhw01 kernel:  [<c01347cd>] wake_bit_function+0x0/0x3c
[<c014d8bf>]     cache_free_debugcheck+0xce/0x1b9     [<c01486cb>]
mempool_free+0x5f/0x63
Feb 18 22:30:19 fgrbhw01 kernel:  [<c0163255>] __wait_on_buffer+0x1c/0x1f
Feb 18 22:30:19 fgrbhw01 kernel:  [<c014e230>] [<c0166086>]
kmem_cache_free+0x2a/0x5c     sync_dirty_buffer+0xa4/0xd6     [<c01486cb>]
[<f88930e1>] mempool_free+0x5f/0x63
Feb 18 22:30:19 fgrbhw01 kernel:  [<f8864f65>]
journal_write_commit_record+0xd5/0xe1 [jbd]
Feb 18 22:30:19 fgrbhw01 kernel:  scsi_io_completion+0x65/0x3ce [scsi_mod]    
[<f8893ab9>] journal_commit_transaction+0x9cc/0xdbf [jbd]     [<f8860bb3>]
scsi_finish_command+0xb8/0xbd [scsi_mod]
Feb 18 22:30:19 fgrbhw01 kernel:  [<c02f6269>] _spin_lock_irqsave+0x9/0xd
Feb 18 22:30:19 fgrbhw01 kernel:  [<f8860ab6>] scsi_softirq+0x109/0x128
[scsi_mod]     [<f88959aa>] [<c0127098>] kjournald+0xbd/0x20e [jbd]    
__do_softirq+0x58/0xc2
Feb 18 22:30:19 fgrbhw01 kernel:  [<c0105f75>] do_softirq+0x46/0x4e
Feb 18 22:30:19 fgrbhw01 kernel:  [<c011d4c9>] =======================
Feb 18 22:30:19 fgrbhw01 kernel:  schedule_tail+0x36/0x8b
Feb 18 22:30:19 fgrbhw01 kernel:  [<c0105e9a>] do_IRQ+0x72/0x7b     [<f88958e8>]
[<c01048fe>] common_interrupt+0x1a/0x20
Feb 18 22:30:19 fgrbhw01 kernel:  commit_timeout+0x0/0x5 [jbd]     [<c01347a0>]
autoremove_wake_function+0x0/0x2d
Feb 18 22:30:19 fgrbhw01 kernel:  [<c0155053>] do_no_page+0x98/0x2af    
[<f88958ed>] kjournald+0x0/0x20e [jbd]     [<c01023a9>]
kernel_thread_helper+0x5/0xbFeb 18 22:30:19 fgrbhw01 kernel: f3fa3934: redzone
1: 0x6b6b6b6b, redzone 2: 0x5a2cf071.
Feb 18 22:30:19 fgrbhw01 kernel:  [<c01554cf>] __handle_mm_fault+0x16f/0x28b
Feb 18 22:30:19 fgrbhw01 kernel:  [<c02f7109>] do_page_fault+0x17f/0x5e2    
[<c0107c16>] do_syscall_trace+0x57/0x14a
Feb 18 22:30:19 fgrbhw01 kernel:  [<c02f6f8a>] do_page_fault+0x0/0x5e2    
[<c0104a57>] error_code+0x4f/0x54
Feb 18 22:30:19 fgrbhw01 kernel: f3fa3c90: redzone 1: 0x170fc2a5, redzone 2:
0xc01485d0.
Feb 18 22:30:19 fgrbhw01 kernel: sdb: got wrong page
Feb 18 22:30:19 fgrbhw01 kernel: sdb: assuming drive cache: write through
Feb 18 22:30:19 fgrbhw01 kernel: SCSI device sdb: 781422768 512-byte hdwr
sectors (400088 MB)
Feb 18 22:30:19 fgrbhw01 kernel: slab error in cache_free_debugcheck(): cache
`sgpool-8': double free, or memory outside object was overwritten
Feb 18 22:30:19 fgrbhw01 kernel:  [<c014d8bf>] cache_free_debugcheck+0xce/0x1b9
    [<c01486cb>] mempool_free+0x5f/0x63
Feb 18 22:30:19 fgrbhw01 kernel:  [<c014e230>] kmem_cache_free+0x2a/0x5c    
[<c01486cb>] mempool_free+0x5f/0x63
Feb 18 22:30:19 fgrbhw01 kernel:  [<f8864f65>] scsi_io_completion+0x65/0x3ce
[scsi_mod]     [<f8860bb3>] scsi_finish_command+0xb8/0xbd [scsi_mod]
Feb 18 22:30:19 fgrbhw01 kernel:  [<f8860ab6>] scsi_softirq+0x109/0x128
[scsi_mod]     [<c0127098>] __do_softirq+0x58/0xc2
Feb 18 22:30:19 fgrbhw01 kernel:  [<c0105f75>] do_softirq+0x46/0x4e
Feb 18 22:30:19 fgrbhw01 kernel:  =======================
Feb 18 22:30:19 fgrbhw01 kernel:  [<c0105e9a>] do_IRQ+0x72/0x7b     [<c01048fe>]
common_interrupt+0x1a/0x20
Feb 18 22:30:19 fgrbhw01 kernel:  [<c011c388>] kunmap_atomic+0x57/0x5d    
[<c0149649>] buffered_rmqueue+0x193/0x1c3
Feb 18 22:30:19 fgrbhw01 kernel:  [<c01497a4>] get_page_from_freelist+0x90/0xa8
    [<c0149823>] __alloc_pages+0x67/0x275
Feb 18 22:30:19 fgrbhw01 kernel:  [<c01593cb>] anon_vma_prepare+0x9b/0xbf    
[<c0154df8>] do_anonymous_page+0x63/0x226
Feb 18 22:30:19 fgrbhw01 kernel:  [<c01554b3>] __handle_mm_fault+0x153/0x28b   
 [<c02f7109>] do_page_fault+0x17f/0x5e2
Feb 18 22:30:19 fgrbhw01 kernel:  [<c02f6f8a>] do_page_fault+0x0/0x5e2    
[<c0104a57>] error_code+0x4f/0x54
Feb 18 22:30:19 fgrbhw01 kernel:  [<c01462fe>] file_read_actor+0x27/0xe0    
[<c0146011>] do_generic_mapping_read+0x1ac/0x472
Feb 18 22:30:19 fgrbhw01 kernel:  [<c0146524>]
__generic_file_aio_read+0x16d/0x1b4     [<c01462d7>] file_read_actor+0x0/0xe0
Feb 18 22:30:19 fgrbhw01 kernel:  [<c01465a6>] generic_file_aio_read+0x3b/0x42 
   [<c0162092>] do_sync_read+0xb0/0xeb
Feb 18 22:30:20 fgrbhw01 kernel:  [<c01b52bb>]
selinux_file_permission+0x10f/0x11c     [<c01347a0>]
autoremove_wake_function+0x0/0x2d
Feb 18 22:30:20 fgrbhw01 kernel:  [<c0161fe2>] do_sync_read+0x0/0xeb    
[<c0162174>] vfs_read+0xa7/0x149
Feb 18 22:30:20 fgrbhw01 kernel:  [<c0162488>] sys_read+0x3c/0x63    
[<c0103eb9>] syscall_call+0x7/0xb
Feb 18 22:30:20 fgrbhw01 kernel: f3fa3c90: redzone 1: 0x170fc2a5, redzone 2:
0xc01485d0.
Feb 18 22:30:20 fgrbhw01 kernel: slab error in cache_free_debugcheck(): cache
`sgpool-8': double free, or memory outside object was overwritten
Feb 18 22:30:20 fgrbhw01 kernel:  [<c014d8bf>] cache_free_debugcheck+0xce/0x1b9
    [<c01486cb>] mempool_free+0x5f/0x63
Feb 18 22:30:20 fgrbhw01 kernel:  [<c014e230>] kmem_cache_free+0x2a/0x5c    
[<c01486cb>] mempool_free+0x5f/0x63
Feb 18 22:30:20 fgrbhw01 kernel:  [<f8864f65>] scsi_io_completion+0x65/0x3ce
[scsi_mod]     [<f8860bb3>] scsi_finish_command+0xb8/0xbd [scsi_mod]
Feb 18 22:30:20 fgrbhw01 kernel:  [<f8860ab6>] scsi_softirq+0x109/0x128
[scsi_mod]     [<c0127098>] __do_softirq+0x58/0xc2
....


Expected results:
the drive should be recognized when plugging in
no kernel dump should occur

Additional info:
both the drive and the IEE1394 controller worked fine with previous linux
kernels ( but I am unfortunately not sure up to which release)

Comment 1 Dave Jones 2006-02-19 00:32:57 UTC

2.6.15-1.1826.2.10_FC5smp is quite old now, and a lot of change has happened in
the scsi layer recently.  Can you retry with the latest kernel? 

(people.redhat.com/davej/kernels/Fedora/devel has the latest one -- rawhide is
currently frozen until after test3 is released)

Comment 2 Ronny Bremer 2006-02-19 14:50:50 UTC

(In reply to comment #1)
Yes, did that. I upgraded to the kernel
2.6.15-1.1955_FC5smp (it might not be the one you referred me to, but I will
check that one out later)

The kernel oops still happens, but the sbp2 module now gets loaded automatically
during boot, which is an improvement :-)

I will check the other kernel and post the results.

Thanks!

Comment 3 Ronny Bremer 2006-02-19 17:16:04 UTC

downloaded latest kernel (1969)

still the same issue. anything I can get you to narrow it down?

Comment 4 Stefan Richter 2006-02-19 23:09:07 UTC

See my comments in http://marc.theaimsgroup.com/?l=linux-scsi&m=114038995826749

Comment 5 Ronny Bremer 2006-02-21 07:27:14 UTC

Stefan,

your assumption was absolutely right, passing the parameters to scsi_mod will
fix the kernel dump. Thank you!

Please close this and I will open up a new bug report, because the hotplug
system (hal) will not detect the IEEE drive being plugged in, I have to load
SBP2 manually.

Ron

Comment 6 Stefan Richter 2006-02-21 20:20:22 UTC

Ronny, thanks for testing and confirming that the workaround helps.

We wrapped it into a patch which is now in the -mm patchset. Thereby the
necessary blacklist flag is set automatically for these Initio bridges without
extra scsi_mod parameters. The patch should go to Linus soon, hopefully before
2.6.16 is released. Linus' kernel has its own bug entry now:
http://bugzilla.kernel.org/show_bug.cgi?id=6114

However the Fedora Core bug should of course _not_ be closed until the patch
made it also downstream into Fedora Core's kernel.

BTW, the Hal issue was already posted in bug 174723 . Similar bug reports were
made for udev which suffers the same regression since Linux 2.6.14.

Comment 7 Ronny Bremer 2006-02-21 23:10:25 UTC

Stefan, I agree. Please leave this bug open. I will constantly update to the
latest kernel from Dave see when it disappears.

Ron

Comment 8 Dave Jones 2006-02-24 02:26:39 UTC

Stefan, rawhide kernels are usually running Linus' -git of the day, so as soon
as it gets upstream, we should pick it up within a day.

Thanks for your input on this.

Comment 9 Stefan Richter 2006-02-24 05:23:43 UTC

Linus now has a patch which lets sbp2 suppress the problematic SCSI command if
an Initio bridge was detected. This should cover most Initio based devices. Al
Viro suggested an actual fix for sd_mod to get rid of the corruption
independently of device detection; this is not yet merged.

Comment 10 Ronny Bremer 2006-02-24 07:11:48 UTC

Ok, gonna give the KOTD a try.

Guys, great work, this is why I love open source :-) Thank you all.

Ron

Comment 11 Stefan Richter 2006-02-26 23:23:02 UTC

Complete fix for all Initio bridges and similarly broken bridges arrived in
Linus' tree now:
http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=489708007785389941a89fa06aedc5ec53303c96

Comment 12 Dave Jones 2006-10-16 21:36:43 UTC

A new kernel update has been released (Version: 2.6.18-1.2200.fc5)
based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.

This bug has been placed in NEEDINFO state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

In the last few updates, some users upgrading from FC4->FC5
have reported that installing a kernel update has left their
systems unbootable. If you have been affected by this problem
please check you only have one version of device-mapper & lvm2
installed.  See bug 207474 for further details.

If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.

If this bug has been fixed, but you are now experiencing a different
problem, please file a separate bug for the new problem.

Thank you.

Comment 13 Ronny Bremer 2006-10-17 09:26:01 UTC

I tested it with the latest kernel from FC5 and it works fine now. Thank you!!!