Description of problem:
When I plug the IEEE 1394 drive into the SB FireWire port, only the message "kernel: ieee1394: Error parsing configrom for node 0-00:1023" is displayed in the system log. After I manually load the sbp2 module, the attached dump happens. Even though I can access the data on the drive, this should not generate a kernel dump, and the drive should be detected automatically. Hint: the drive was detected without problems during the installation phase of Fedora Core 5 test 2.

Version-Release number of selected component (if applicable):
Kernel 2.6.15-1.1826.2.10_FC5smp

HW: Audigy 4 Pro Soundblaster device

lspci -v output:
02:0b.0 Multimedia audio controller: Creative Labs SB Audigy (rev 04)
        Subsystem: Creative Labs Unknown device 2007
        Flags: bus master, medium devsel, latency 64, IRQ 193
        I/O ports at e400 [size=64]
        Capabilities: [dc] Power Management version 2

02:0b.2 FireWire (IEEE 1394): Creative Labs SB Audigy FireWire Port (rev 04) (prog-if 10 [OHCI])
        Subsystem: Creative Labs SB Audigy FireWire Port
        Flags: bus master, medium devsel, latency 64, IRQ 217
        Memory at fbffb800 (32-bit, non-prefetchable) [size=2K]
        Memory at fbff4000 (32-bit, non-prefetchable) [size=16K]
        Capabilities: [44] Power Management version 2

attached drive: FREECOM 350GB Firewire hdd with two partitions

How reproducible:
Always

Steps to Reproduce:
1. boot the system
2. plug in the drive
3. load sbp2

Actual results:
After step 2, the error message "kernel: ieee1394: Error parsing configrom for node 0-00:1023" is shown but nothing else happens.
After step 3, the following dump is logged:

Feb 18 22:30:17 fgrbhw01 kernel: sbp2: $Rev: 1306 $ Ben Collins <bcollins>
Feb 18 22:30:17 fgrbhw01 kernel: ieee1394: sbp2: Driver forced to serialize I/O (serialize_io=1)
Feb 18 22:30:17 fgrbhw01 kernel: ieee1394: sbp2: Try serialize_io=0 for better performance
Feb 18 22:30:17 fgrbhw01 kernel: scsi2 : SCSI emulation for IEEE-1394 SBP-2 Devices
Feb 18 22:30:17 fgrbhw01 kernel: ieee1394: sbp2: Node 0-00:1023: Using 36byte inquiry workaround
Feb 18 22:30:18 fgrbhw01 kernel: ieee1394: sbp2: Logged into SBP-2 device
Feb 18 22:30:18 fgrbhw01 kernel: Vendor: Initio Model: 0KLAT80 Rev: 2.05
Feb 18 22:30:18 fgrbhw01 kernel: Type: Direct-Access ANSI SCSI revision: 00
Feb 18 22:30:18 fgrbhw01 kernel: SCSI device sdb: 781422768 512-byte hdwr sectors (400088 MB)
Feb 18 22:30:18 fgrbhw01 kernel: slab error in cache_free_debugcheck(): cache `sgpool-8': double free, or memory outside object was overwritten
Feb 18 22:30:18 fgrbhw01 kernel: [<c014d8bf>] cache_free_debugcheck+0xce/0x1b9 [<c01486cb>] mempool_free+0x5f/0x63
Feb 18 22:30:18 fgrbhw01 kernel: [<c014e230>] kmem_cache_free+0x2a/0x5c [<c01486cb>] mempool_free+0x5f/0x63
Feb 18 22:30:18 fgrbhw01 kernel: [<f8864f65>] scsi_io_completion+0x65/0x3ce [scsi_mod] [<f8860bb3>] scsi_finish_command+0xb8/0xbd [scsi_mod]
Feb 18 22:30:18 fgrbhw01 kernel: [<f8860ab6>] scsi_softirq+0x109/0x128 [scsi_mod] [<c0127098>] __do_softirq+0x58/0xc2
Feb 18 22:30:18 fgrbhw01 kernel: [<c0105f75>] do_softirq+0x46/0x4e
Feb 18 22:30:18 fgrbhw01 kernel: =======================
Feb 18 22:30:18 fgrbhw01 kernel: [<c0105e9a>] do_IRQ+0x72/0x7b [<c01048fe>] common_interrupt+0x1a/0x20
Feb 18 22:30:18 fgrbhw01 kernel: [<f88c940b>] ext3_get_block_handle+0x0/0x2a5 [ext3] [<f88c9714>] ext3_get_block+0x64/0x6c [ext3]
Feb 18 22:30:18 fgrbhw01 kernel: [<f88c9f0f>] ext3_bmap+0x0/0x6d [ext3] [<c0165dec>] generic_block_bmap+0x28/0x35
Feb 18 22:30:18 fgrbhw01 kernel: [<c02f599a>] io_schedule+0x26/0x30 [<c02f5cd3>] out_of_line_wait_on_bit_lock+0x75/0x7d
Feb 18 22:30:18 fgrbhw01 kernel: [<c01631d3>] sync_buffer+0x0/0x33 [<f88c9f75>] ext3_bmap+0x66/0x6d [ext3]
Feb 18 22:30:18 fgrbhw01 kernel: [<f88c96b0>] ext3_get_block+0x0/0x6c [ext3] [<f88c9f0f>] ext3_bmap+0x0/0x6d [ext3]
Feb 18 22:30:18 fgrbhw01 kernel: [<c0178e14>] bmap+0x23/0x27 [<f88961e9>] journal_bmap+0x1d/0x64 [jbd]
Feb 18 22:30:18 fgrbhw01 kernel: [<c01347cd>] wake_bit_function+0x0/0x3c [<c014d9a2>] cache_free_debugcheck+0x1b1/0x1b9
Feb 18 22:30:18 fgrbhw01 kernel: [<f88961bd>] journal_next_log_block+0x74/0x83 [jbd] [<f889623f>] journal_get_descriptor_buffer+0xf/0x8d [jbd]
Feb 18 22:30:19 fgrbhw01 kernel: [<f8893709>] journal_commit_transaction+0x61c/0xdbf [jbd] [<c02f6269>] _spin_lock_irqsave+0x9/0xd
Feb 18 22:30:19 fgrbhw01 kernel: [<c012a32b>] try_to_del_timer_sync+0x44/0x4a [<f88959aa>] kjournald+0xbd/0x20e [jbd]
Feb 18 22:30:19 fgrbhw01 kernel: [<c011d4c9>] schedule_tail+0x36/0x8b [<f88958e8>] commit_timeout+0x0/0x5 [jbd]
Feb 18 22:30:19 fgrbhw01 kernel: [<c01347a0>] autoremove_wake_function+0x0/0x2d [<f88958ed>] kjournald+0x0/0x20e [jbd]
Feb 18 22:30:19 fgrbhw01 kernel: [<c01023a9>] kernel_thread_helper+0x5/0xb
Feb 18 22:30:19 fgrbhw01 kernel: f3fa3888: redzone 1: 0x170fc2a5, redzone 2: 0xc01485d0.
Feb 18 22:30:19 fgrbhw01 kernel: slab error in cache_alloc_debugcheck_after(): cache `sgpool-8': double free, or memory outside object was overwritten
Feb 18 22:30:19 fgrbhw01 kernel: [<c014dd0e>] cache_alloc_debugcheck_after+0x87/0x128 [<c014e03c>] kmem_cache_alloc+0x7d/0x86
Feb 18 22:30:19 fgrbhw01 kernel: [<c01485d0>] mempool_alloc+0x37/0xd3 [<c01485d0>] mempool_alloc+0x37/0xd3
Feb 18 22:30:19 fgrbhw01 kernel: [<f88600c3>] __scsi_get_command+0x13/0x57 [scsi_mod] [<f8865321>] scsi_init_io+0x53/0xf6 [scsi_mod]
Feb 18 22:30:19 fgrbhw01 kernel: [<f8865652>] scsi_prep_fn+0x150/0x1cc [scsi_mod] [<c01c830a>] elv_next_request+0x79/0x13d
Feb 18 22:30:19 fgrbhw01 kernel: [<f8865778>] scsi_request_fn+0x5b/0x30a [scsi_mod] [<c012a2e1>] del_timer+0x41/0x47
Feb 18 22:30:19 fgrbhw01 kernel: [<c01ca11f>] __generic_unplug_device+0x1d/0x1f [<c01ca136>] generic_unplug_device+0x15/0x21
Feb 18 22:30:19 fgrbhw01 kernel: [<f88a34f9>] dm_table_unplug_all+0x22/0x2b [dm_mod] [<f88a18c3>] dm_unplug_all+0x17/0x21 [dm_mod]
Feb 18 22:30:19 fgrbhw01 kernel: [<c01ca14e>] blk_backing_dev_unplug+0xc/0xd [<c01631fe>] sync_buffer+0x2b/0x33
Feb 18 22:30:19 fgrbhw01 kernel: [<c02f5b6a>] __wait_on_bit+0x33/0x58 [<c01631d3>] sync_buffer+0x0/0x33
Feb 18 22:30:19 fgrbhw01 kernel: [<c02f5c04>] out_of_line_wait_on_bit+0x75/0x7d [<c01631d3>] sync_buffer+0x0/0x33
Feb 18 22:30:19 fgrbhw01 kernel: slab error in cache_free_debugcheck(): cache `sgpool-8': double free, or memory outside object was overwritten
Feb 18 22:30:19 fgrbhw01 kernel: [<c01347cd>] wake_bit_function+0x0/0x3c [<c014d8bf>] cache_free_debugcheck+0xce/0x1b9 [<c01486cb>] mempool_free+0x5f/0x63
Feb 18 22:30:19 fgrbhw01 kernel: [<c0163255>] __wait_on_buffer+0x1c/0x1f
Feb 18 22:30:19 fgrbhw01 kernel: [<c014e230>] [<c0166086>] kmem_cache_free+0x2a/0x5c sync_dirty_buffer+0xa4/0xd6 [<c01486cb>] [<f88930e1>] mempool_free+0x5f/0x63
Feb 18 22:30:19 fgrbhw01 kernel: [<f8864f65>] journal_write_commit_record+0xd5/0xe1 [jbd]
Feb 18 22:30:19 fgrbhw01 kernel: scsi_io_completion+0x65/0x3ce [scsi_mod] [<f8893ab9>] journal_commit_transaction+0x9cc/0xdbf [jbd] [<f8860bb3>] scsi_finish_command+0xb8/0xbd [scsi_mod]
Feb 18 22:30:19 fgrbhw01 kernel: [<c02f6269>] _spin_lock_irqsave+0x9/0xd
Feb 18 22:30:19 fgrbhw01 kernel: [<f8860ab6>] scsi_softirq+0x109/0x128 [scsi_mod] [<f88959aa>] [<c0127098>] kjournald+0xbd/0x20e [jbd] __do_softirq+0x58/0xc2
Feb 18 22:30:19 fgrbhw01 kernel: [<c0105f75>] do_softirq+0x46/0x4e
Feb 18 22:30:19 fgrbhw01 kernel: [<c011d4c9>] =======================
Feb 18 22:30:19 fgrbhw01 kernel: schedule_tail+0x36/0x8b
Feb 18 22:30:19 fgrbhw01 kernel: [<c0105e9a>] do_IRQ+0x72/0x7b [<f88958e8>] [<c01048fe>] common_interrupt+0x1a/0x20
Feb 18 22:30:19 fgrbhw01 kernel: commit_timeout+0x0/0x5 [jbd] [<c01347a0>] autoremove_wake_function+0x0/0x2d
Feb 18 22:30:19 fgrbhw01 kernel: [<c0155053>] do_no_page+0x98/0x2af [<f88958ed>] kjournald+0x0/0x20e [jbd] [<c01023a9>] kernel_thread_helper+0x5/0xb
Feb 18 22:30:19 fgrbhw01 kernel: f3fa3934: redzone 1: 0x6b6b6b6b, redzone 2: 0x5a2cf071.
Feb 18 22:30:19 fgrbhw01 kernel: [<c01554cf>] __handle_mm_fault+0x16f/0x28b
Feb 18 22:30:19 fgrbhw01 kernel: [<c02f7109>] do_page_fault+0x17f/0x5e2 [<c0107c16>] do_syscall_trace+0x57/0x14a
Feb 18 22:30:19 fgrbhw01 kernel: [<c02f6f8a>] do_page_fault+0x0/0x5e2 [<c0104a57>] error_code+0x4f/0x54
Feb 18 22:30:19 fgrbhw01 kernel: f3fa3c90: redzone 1: 0x170fc2a5, redzone 2: 0xc01485d0.
Feb 18 22:30:19 fgrbhw01 kernel: sdb: got wrong page
Feb 18 22:30:19 fgrbhw01 kernel: sdb: assuming drive cache: write through
Feb 18 22:30:19 fgrbhw01 kernel: SCSI device sdb: 781422768 512-byte hdwr sectors (400088 MB)
Feb 18 22:30:19 fgrbhw01 kernel: slab error in cache_free_debugcheck(): cache `sgpool-8': double free, or memory outside object was overwritten
Feb 18 22:30:19 fgrbhw01 kernel: [<c014d8bf>] cache_free_debugcheck+0xce/0x1b9 [<c01486cb>] mempool_free+0x5f/0x63
Feb 18 22:30:19 fgrbhw01 kernel: [<c014e230>] kmem_cache_free+0x2a/0x5c [<c01486cb>] mempool_free+0x5f/0x63
Feb 18 22:30:19 fgrbhw01 kernel: [<f8864f65>] scsi_io_completion+0x65/0x3ce [scsi_mod] [<f8860bb3>] scsi_finish_command+0xb8/0xbd [scsi_mod]
Feb 18 22:30:19 fgrbhw01 kernel: [<f8860ab6>] scsi_softirq+0x109/0x128 [scsi_mod] [<c0127098>] __do_softirq+0x58/0xc2
Feb 18 22:30:19 fgrbhw01 kernel: [<c0105f75>] do_softirq+0x46/0x4e
Feb 18 22:30:19 fgrbhw01 kernel: =======================
Feb 18 22:30:19 fgrbhw01 kernel: [<c0105e9a>] do_IRQ+0x72/0x7b [<c01048fe>] common_interrupt+0x1a/0x20
Feb 18 22:30:19 fgrbhw01 kernel: [<c011c388>] kunmap_atomic+0x57/0x5d [<c0149649>] buffered_rmqueue+0x193/0x1c3
Feb 18 22:30:19 fgrbhw01 kernel: [<c01497a4>] get_page_from_freelist+0x90/0xa8 [<c0149823>] __alloc_pages+0x67/0x275
Feb 18 22:30:19 fgrbhw01 kernel: [<c01593cb>] anon_vma_prepare+0x9b/0xbf [<c0154df8>] do_anonymous_page+0x63/0x226
Feb 18 22:30:19 fgrbhw01 kernel: [<c01554b3>] __handle_mm_fault+0x153/0x28b [<c02f7109>] do_page_fault+0x17f/0x5e2
Feb 18 22:30:19 fgrbhw01 kernel: [<c02f6f8a>] do_page_fault+0x0/0x5e2 [<c0104a57>] error_code+0x4f/0x54
Feb 18 22:30:19 fgrbhw01 kernel: [<c01462fe>] file_read_actor+0x27/0xe0 [<c0146011>] do_generic_mapping_read+0x1ac/0x472
Feb 18 22:30:19 fgrbhw01 kernel: [<c0146524>] __generic_file_aio_read+0x16d/0x1b4 [<c01462d7>] file_read_actor+0x0/0xe0
Feb 18 22:30:19 fgrbhw01 kernel: [<c01465a6>] generic_file_aio_read+0x3b/0x42 [<c0162092>] do_sync_read+0xb0/0xeb
Feb 18 22:30:20 fgrbhw01 kernel: [<c01b52bb>] selinux_file_permission+0x10f/0x11c [<c01347a0>] autoremove_wake_function+0x0/0x2d
Feb 18 22:30:20 fgrbhw01 kernel: [<c0161fe2>] do_sync_read+0x0/0xeb [<c0162174>] vfs_read+0xa7/0x149
Feb 18 22:30:20 fgrbhw01 kernel: [<c0162488>] sys_read+0x3c/0x63 [<c0103eb9>] syscall_call+0x7/0xb
Feb 18 22:30:20 fgrbhw01 kernel: f3fa3c90: redzone 1: 0x170fc2a5, redzone 2: 0xc01485d0.
Feb 18 22:30:20 fgrbhw01 kernel: slab error in cache_free_debugcheck(): cache `sgpool-8': double free, or memory outside object was overwritten
Feb 18 22:30:20 fgrbhw01 kernel: [<c014d8bf>] cache_free_debugcheck+0xce/0x1b9 [<c01486cb>] mempool_free+0x5f/0x63
Feb 18 22:30:20 fgrbhw01 kernel: [<c014e230>] kmem_cache_free+0x2a/0x5c [<c01486cb>] mempool_free+0x5f/0x63
Feb 18 22:30:20 fgrbhw01 kernel: [<f8864f65>] scsi_io_completion+0x65/0x3ce [scsi_mod] [<f8860bb3>] scsi_finish_command+0xb8/0xbd [scsi_mod]
Feb 18 22:30:20 fgrbhw01 kernel: [<f8860ab6>] scsi_softirq+0x109/0x128 [scsi_mod] [<c0127098>] __do_softirq+0x58/0xc2
....

Expected results:
The drive should be recognized when it is plugged in.
No kernel dump should occur.

Additional info:
Both the drive and the IEEE 1394 controller worked fine with previous Linux kernels (but I am unfortunately not sure up to which release).
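For anyone triaging the dump above, here is a minimal Python sketch that pulls out the "redzone" lines and labels each value. The marker constants are an assumption: they are the slab debug values as I recall them from mm/slab.c in 2.6-era kernels, so please check them against the source of the kernel in question. The helper names (`describe`, `classify_redzones`) are made up for this illustration.

```python
import re

# Assumed values, recalled from mm/slab.c of 2.6-era kernels; verify against
# the source tree of the kernel that produced the log.
RED_ACTIVE = 0x170FC2A5        # red-zone marker while the object is allocated
RED_INACTIVE = 0x5A2CF071      # red-zone marker while the object is free
POISON_FREE_WORD = 0x6B6B6B6B  # four bytes of the 0x6b free-object poison

# Matches e.g. "f3fa3888: redzone 1: 0x170fc2a5, redzone 2: 0xc01485d0."
REDZONE_RE = re.compile(
    r"([0-9a-f]{8}): redzone 1: 0x([0-9a-f]+), redzone 2: 0x([0-9a-f]+)"
)


def describe(value):
    """Label a red-zone word using the assumed marker constants."""
    if value == RED_ACTIVE:
        return "active marker"
    if value == RED_INACTIVE:
        return "inactive marker"
    if value == POISON_FREE_WORD:
        return "free poison"
    return "unexpected value"


def classify_redzones(log_text):
    """Return (address, redzone 1 label, redzone 2 label) for each report."""
    results = []
    for match in REDZONE_RE.finditer(log_text):
        address = match.group(1)
        rz1 = int(match.group(2), 16)
        rz2 = int(match.group(3), 16)
        results.append((address, describe(rz1), describe(rz2)))
    return results


# Two lines copied from the dump in this report.
sample = """\
Feb 18 22:30:19 fgrbhw01 kernel: f3fa3888: redzone 1: 0x170fc2a5, redzone 2: 0xc01485d0.
Feb 18 22:30:19 fgrbhw01 kernel: f3fa3934: redzone 1: 0x6b6b6b6b, redzone 2: 0x5a2cf071.
"""

for row in classify_redzones(sample):
    print(row)
```

Under those assumed constants, redzone 2 of f3fa3888 (and of f3fa3c90 later in the dump) is neither marker. Its value, 0xc01485d0, is the same address the traces print for mempool_alloc+0x37/0xd3, which fits the "memory outside object was overwritten" wording of the slab error.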
2.6.15-1.1826.2.10_FC5smp is quite old now, and a lot has changed in the SCSI layer recently. Can you retry with the latest kernel? (people.redhat.com/davej/kernels/Fedora/devel has the latest one; rawhide is currently frozen until after test3 is released.)
(In reply to comment #1) Yes, did that. I upgraded to kernel 2.6.15-1.1955_FC5smp (it might not be the one you referred me to, but I will check that one out later). The kernel oops still happens, but the sbp2 module now gets loaded automatically during boot, which is an improvement :-) I will check the other kernel and post the results. Thanks!
Downloaded the latest kernel (1969); still the same issue. Is there anything I can get you to narrow it down?
See my comments in http://marc.theaimsgroup.com/?l=linux-scsi&m=114038995826749
Stefan, your assumption was absolutely right: passing the parameters to scsi_mod fixes the kernel dump. Thank you! Please close this and I will open a new bug report, because the hotplug system (hal) does not detect the IEEE 1394 drive being plugged in; I have to load sbp2 manually. Ron
Ronny, thanks for testing and confirming that the workaround helps. We wrapped it into a patch which is now in the -mm patchset. With it, the necessary blacklist flag is set automatically for these Initio bridges without extra scsi_mod parameters. The patch should go to Linus soon, hopefully before 2.6.16 is released. Linus' kernel has its own bug entry now: http://bugzilla.kernel.org/show_bug.cgi?id=6114 However, the Fedora Core bug should of course _not_ be closed until the patch has also made it downstream into Fedora Core's kernel. BTW, the hal issue was already posted in bug 174723. Similar bug reports were made for udev, which has suffered from the same regression since Linux 2.6.14.
Stefan, I agree. Please leave this bug open. I will keep updating to the latest kernel from Dave to see when it disappears. Ron
Stefan, rawhide kernels are usually running Linus' -git of the day, so as soon as it gets upstream, we should pick it up within a day. Thanks for your input on this.
Linus now has a patch which lets sbp2 suppress the problematic SCSI command if an Initio bridge was detected. This should cover most Initio based devices. Al Viro suggested an actual fix for sd_mod to get rid of the corruption independently of device detection; this is not yet merged.
Ok, gonna give the KOTD a try. Guys, great work, this is why I love open source :-) Thank you all. Ron
A complete fix for all Initio bridges and similarly broken bridges has now arrived in Linus' tree: http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=489708007785389941a89fa06aedc5ec53303c96
A new kernel update has been released (Version: 2.6.18-1.2200.fc5) based upon a new upstream kernel release. Please retest against this new kernel, as a large number of patches go into each upstream release, possibly including changes that may address this problem. This bug has been placed in NEEDINFO state. Due to the large volume of inactive bugs in bugzilla, if this bug is still in this state in two weeks time, it will be closed. Should this bug still be relevant after this period, the reporter can reopen the bug at any time. Any other users on the Cc: list of this bug can request that the bug be reopened by adding a comment to the bug. In the last few updates, some users upgrading from FC4->FC5 have reported that installing a kernel update has left their systems unbootable. If you have been affected by this problem please check you only have one version of device-mapper & lvm2 installed. See bug 207474 for further details. If this bug is a problem preventing you from installing the release this version is filed against, please see bug 169613. If this bug has been fixed, but you are now experiencing a different problem, please file a separate bug for the new problem. Thank you.
I tested it with the latest kernel from FC5 and it works fine now. Thank you!!!