Bug 442621 - [firewire] device offlined: OOPS i/o error, mark_buffer_dirty in fs/buffer.c
Status: CLOSED WONTFIX
Product: Fedora
Classification: Fedora
Component: kernel
Version: 9
Hardware: All
OS: Linux
Priority: low
Severity: low
Assigned To: Jarod Wilson
QA Contact: Fedora Extras Quality Assurance
Reported: 2008-04-15 16:27 EDT by Andrew Farris
Modified: 2014-10-01 01:40 EDT (History)
Last Closed: 2009-07-14 11:38:49 EDT


Attachments:
dmesg-oops-mark-buffer-dirty.txt (49.10 KB, text/plain)
2008-04-15 16:27 EDT, Andrew Farris
var-log-messages-oops-mark-buffer-dirty.txt (36.21 KB, text/plain)
2008-04-15 16:28 EDT, Andrew Farris

Description Andrew Farris 2008-04-15 16:27:06 EDT
Description of problem:
OOPS i/o error, mark_buffer_dirty in fs/buffer.c
This oops occurred after the system was up and running for 2 days, running
folding@home and seeding torrents (only 30kbps up).  The load is low for I/O,
and nothing should have been writing to or reading from this disk at the time. 
The FAH client and torrents are not stored on this device.

The affected filesystem is ext3, originally created by F6 or F7.  The attached
dmesg shows many I/O failures to the offline device before this oops.

Version-Release number of selected component (if applicable):
2.6.25-0.218.rc8.git7.fc9.i686

How reproducible:
Unknown

Steps to Reproduce:
No idea, the device offlined itself.


Additional info:

 12:57:17 up 2 days,  9:45,  3 users,  load average: 1.39, 1.40, 1.22

#-> df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/luks-sdb4
                       15G  5.3G  8.8G  38% /
/dev/mapper/luks-sdb2
                       20G   14G  5.4G  72% /home
/dev/sdb1             342M   54M  272M  17% /boot
tmpfs                 506M   24K  506M   1% /tmp
tmpfs                 506M   48K  506M   1% /dev/shm
/dev/sdd1              20G   12G  7.4G  61% /media/extarc
/dev/sdd3              76G   67G  8.7G  89% /media/archive
/dev/sdd2              20G   14G  5.8G  71% /media/blackhole
/dev/sda1              38G   27G   11G  71% /media/disk

#-> cat /etc/mtab
/dev/mapper/luks-sdb4 / ext4dev rw 0 0
proc /proc proc rw 0 0
sysfs /sys sysfs rw 0 0
devpts /dev/pts devpts rw,gid=5,mode=620 0 0
/dev/mapper/luks-sdb2 /home ext4dev rw 0 0
/dev/sdb1 /boot ext3 rw 0 0
tmpfs /tmp tmpfs rw 0 0
tmpfs /dev/shm tmpfs rw 0 0
/dev/sdd1 /media/extarc ext3 rw,noexec,nosuid,nodev 0 0
/dev/sdd3 /media/archive vfat rw,noexec,nosuid,nodev,shortname=lower,fmask=0013,dmask=0002,gid=555 0 0
/dev/sdd2 /media/blackhole fuseblk rw,noexec,nosuid,nodev,noatime,allow_other,default_permissions,blksize=1024 0 0
none /proc/sys/fs/binfmt_misc binfmt_misc rw 0 0
sunrpc /var/lib/nfs/rpc_pipefs rpc_pipefs rw 0 0
/dev/sda1 /media/disk fuseblk rw,nosuid,nodev,noatime,allow_other,blksize=4096 0 0


/var/log/messages:
Apr 15 12:44:10 localhost kernel: FAT: Directory bread(block 77245) failed
Apr 15 12:44:10 localhost kernel: sd 3:0:0:0: rejecting I/O to offline device
Apr 15 12:44:10 localhost kernel: EXT3-fs error (device sdd1): ext3_find_entry:
reading directory #2 offset 0
Apr 15 12:44:10 localhost kernel: sd 3:0:0:0: rejecting I/O to offline device
Apr 15 12:44:10 localhost kernel: Buffer I/O error on device sdd1, logical block 0
Apr 15 12:44:10 localhost kernel: lost page write due to I/O error on sdd1
Apr 15 12:44:10 localhost kernel: sd 3:0:0:0: rejecting I/O to offline device
Apr 15 12:44:10 localhost kernel: EXT3-fs error (device sdd1): ext3_find_entry:
reading directory #2 offset 0
Apr 15 12:44:10 localhost kernel: ------------[ cut here ]------------
Apr 15 12:44:10 localhost kernel: WARNING: at fs/buffer.c:1183
mark_buffer_dirty+0x23/0x6a() (Not tainted)
Apr 15 12:44:10 localhost kernel: Modules linked in: appletalk autofs4
smsc47m192 hwmon_vid hwmon sunrpc ipt_REJECT nf_conntrack_ipv4 iptable_filter
ip_tables ip6t_REJECT xt_tcpudp xt_limit nf_conntrack_ipv6 xt_state nf_conntrack
ip6table_filter ip6_tables x_tables ipv6 fuse vfat fat ext3 jbd dm_multipath
firewire_sbp2 snd_emu10k1_synth snd_emux_synth snd_seq_virmidi snd_seq_midi_emul
snd_emu10k1 snd_rawmidi snd_seq_dummy arc4 dcdbas ecb snd_intel8x0
snd_ac97_codec ac97_bus snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss
snd_mixer_oss floppy pcspkr snd_pcm serio_raw rt2500pci rt2x00pci rt2x00lib
snd_seq_device rfkill snd_timer usb_storage snd_page_alloc input_polldev
snd_util_mem mac80211 snd_hwdep snd 3c59x firewire_ohci tulip cfg80211
firewire_core mii soundcore crc_itu_t emu10k1_gp eeprom_93cx6 gameport button
iTCO_wdt iTCO_vendor_support i2c_i801 joydev i2c_core intel_rng sg sr_mod cdrom
ata_generic pata_acpi ata_piix libata sd_mod scsi_mod sha256_generic cbc
aes_i586 aes_generic dm_crypt crypto_blkcipher dm
Apr 15 12:44:10 localhost kernel: _snapshot dm_zero dm_mirror dm_mod ext4dev
jbd2 mbcache crc16 uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode]
Apr 15 12:44:10 localhost kernel: Pid: 3986, comm: gvfsd-trash Not tainted
2.6.25-0.218.rc8.git7.fc9.i686 #1
Apr 15 12:44:10 localhost kernel:  [warn_on_slowpath+71/115]
warn_on_slowpath+0x47/0x73
Apr 15 12:44:10 localhost kernel:  [vt_console_print+51/646] ?
vt_console_print+0x33/0x286
Apr 15 12:44:10 localhost kernel:  [default_wake_function+11/13] ?
default_wake_function+0xb/0xd
Apr 15 12:44:10 localhost kernel:  [__wake_up_common+53/91] ?
__wake_up_common+0x35/0x5b
Apr 15 12:44:10 localhost kernel:  [vt_console_print+0/646] ?
vt_console_print+0x0/0x286
Apr 15 12:44:10 localhost kernel:  [__call_console_drivers+86/99] ?
__call_console_drivers+0x56/0x63
Apr 15 12:44:10 localhost kernel:  [_call_console_drivers+87/91] ?
_call_console_drivers+0x57/0x5b
Apr 15 12:44:10 localhost kernel:  [release_console_sem+416/424] ?
release_console_sem+0x1a0/0x1a8
Apr 15 12:44:10 localhost kernel:  [mark_buffer_dirty+35/106]
mark_buffer_dirty+0x23/0x6a
Apr 15 12:44:10 localhost kernel:  [<f91e1c36>] ext3_commit_super+0x40/0x53 [ext3]
Apr 15 12:44:10 localhost kernel:  [<f91e3076>] ext3_handle_error+0x71/0x95 [ext3]
Apr 15 12:44:10 localhost kernel:  [<f91e3129>] ext3_error+0x39/0x43 [ext3]
Apr 15 12:44:10 localhost kernel:  [<f91dffdc>] ext3_find_entry+0x3a1/0x52b [ext3]
Apr 15 12:44:10 localhost kernel:  [inode_has_perm+91/101] ?
inode_has_perm+0x5b/0x65
Apr 15 12:44:10 localhost kernel:  [<f91e088d>] ext3_lookup+0x25/0xa2 [ext3]
Apr 15 12:44:10 localhost kernel:  [do_lookup+161/320] do_lookup+0xa1/0x140
Apr 15 12:44:10 localhost kernel:  [__link_path_walk+2204/3324]
__link_path_walk+0x89c/0xcfc
Apr 15 12:44:10 localhost kernel:  [rb_insert_color+86/192] ?
rb_insert_color+0x56/0xc0
Apr 15 12:44:10 localhost kernel:  [mntput_no_expire+22/105] ?
mntput_no_expire+0x16/0x69
Apr 15 12:44:10 localhost kernel:  [path_walk+76/155] path_walk+0x4c/0x9b
Apr 15 12:44:10 localhost kernel:  [do_path_lookup+391/464]
do_path_lookup+0x187/0x1d0
Apr 15 12:44:10 localhost kernel:  [__user_walk_fd+47/67] __user_walk_fd+0x2f/0x43
Apr 15 12:44:10 localhost kernel:  [vfs_lstat_fd+22/61] vfs_lstat_fd+0x16/0x3d
Apr 15 12:44:10 localhost kernel:  [autoremove_wake_function+0/51] ?
autoremove_wake_function+0x0/0x33
Apr 15 12:44:10 localhost kernel:  [selinux_file_permission+256/262] ?
selinux_file_permission+0x100/0x106
Apr 15 12:44:10 localhost kernel:  [vfs_lstat+17/19] vfs_lstat+0x11/0x13
Apr 15 12:44:10 localhost kernel:  [sys_lstat64+20/40] sys_lstat64+0x14/0x28
Apr 15 12:44:10 localhost kernel:  [audit_syscall_entry+249/291] ?
audit_syscall_entry+0xf9/0x123
Apr 15 12:44:10 localhost kernel:  [do_syscall_trace+294/365] ?
do_syscall_trace+0x126/0x16d
Apr 15 12:44:10 localhost kernel:  [syscall_call+7/11] syscall_call+0x7/0xb
Apr 15 12:44:10 localhost kernel:  =======================
Apr 15 12:44:10 localhost kernel: ---[ end trace c85abf95dc3347d0 ]---
Apr 15 12:44:10 localhost kernel: sd 3:0:0:0: rejecting I/O to offline device
Apr 15 12:44:10 localhost kernel: Buffer I/O error on device sdd1, logical block 0
Apr 15 12:44:10 localhost kernel: lost page write due to I/O error on sdd1
Apr 15 12:44:22 localhost kerneloops: Submitted 1 kernel oopses to
www.kerneloops.org
Comment 1 Andrew Farris 2008-04-15 16:27:06 EDT
Created attachment 302519 [details]
dmesg-oops-mark-buffer-dirty.txt
Comment 2 Andrew Farris 2008-04-15 16:28:18 EDT
Created attachment 302520 [details]
var-log-messages-oops-mark-buffer-dirty.txt

Shows the timestamping for the I/O failure events.
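
To see how the I/O failures cluster in time, the rejections can be counted per timestamp. This is only a rough sketch: it assumes the traditional syslog layout ("Mon DD HH:MM:SS host ...") seen in the excerpt above, and `offline_rejects_per_second` is a hypothetical helper name, not an existing tool.

```sh
# Count "rejecting I/O to offline device" messages per syslog timestamp,
# printed in the order the timestamps first appear in the log.
# Assumes fields 1-3 of each line are month, day and HH:MM:SS.
offline_rejects_per_second() {
    awk '/rejecting I\/O to offline device/ {
             t = $1 " " $2 " " $3
             if (!(t in n)) order[++k] = t   # remember first-seen order
             n[t]++
         }
         END { for (i = 1; i <= k; i++) print n[order[i]], order[i] }' "$1"
}
```

For example, `offline_rejects_per_second /var/log/messages` prints one "count timestamp" pair per line.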
Comment 3 Andrew Farris 2008-04-15 16:37:04 EDT
The disk is still working after journal recovery. I then ran fsck, which
modified the filesystem, but I didn't capture the verbose output for that.
Comment 4 Chuck Ebbert 2008-04-18 01:18:33 EDT
Your firewire hard drive went offline earlier -- that is the real error:

firewire_sbp2: fw1.0: sbp2_scsi_abort
firewire_sbp2: fw1.0: sbp2_scsi_abort
sd 3:0:0:0: Device offlined - not ready after error recovery
sd 3:0:0:0: [sdd] Result: hostbyte=DID_BUS_BUSY driverbyte=DRIVER_OK,SUGGEST_OK

Those are the only messages I could find.
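
For anyone repeating this search on their own logs, the following is a minimal sketch that pulls out the earliest lines pointing at the device dropping off the bus. The helper name `first_offline_events` is hypothetical, and the pattern list is simply the message fragments quoted in this bug, not an exhaustive set.

```sh
# Print the first N log lines (with line numbers) that hint at the device
# going offline. The patterns are the fragments quoted in this bug report.
first_offline_events() {
    logfile=$1
    count=${2:-5}   # default: show the first 5 matches
    grep -n -E 'sbp2_scsi_abort|Device offlined|DID_BUS_BUSY|rejecting I/O to offline device' "$logfile" \
        | head -n "$count"
}
```

Usage might look like `first_offline_events dmesg-oops-mark-buffer-dirty.txt 10`; the earliest matches are the ones worth reading first, since the later ext3 errors are a consequence of them.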
Comment 5 Jarod Wilson 2008-04-18 10:56:07 EDT
Huh. Yeah, not a whole lot to go on there, the bus appears to be completely
silent until the sbp2_scsi_aborts hit...

Judging from the GUID of fw0, you're hooking to the FireWire port on your Audigy
sound card, which has no known issues. I'm mildly concerned about the GUID of
the disk though. It's reporting a vendor of Xerox, which I'm assuming is
incorrect. That being the case, there may well be a firmware update available
for your device that is worth applying for that and other reasons.

Was there anything interesting environmentally that happened around the time of
the failure? Power fluctuation? Disk decided to suspend itself? Cosmic rays?
Comment 6 Andrew Farris 2008-04-19 23:06:37 EDT
You are correct, it is connected to the Audigy FW port, which has functioned
pretty well for several years now in both Linux and Windows.  This is the same
device that had prior issues though, see bug 428554 (was a powerup login
failure, firewire_sbp2: orb reply timed out).

The device is actually an I-Rocks external USB2/FW enclosure with a Maxtor 120 GB
IDE drive inside it.  I'm not aware of any firmware that can be uploaded to the
enclosure bits, though there may be.

I do not think there were any environmental issues involved, the machine is
running off an APC UPS (500VA) which didn't sound off any low power conditions. 
I felt no cosmic rays, but I may have been dozing off at the time. ;)
Comment 7 Stefan Richter 2008-04-26 05:52:45 EDT
These are of course two different bugs:
  - Disk going offline for no apparent reason.
  - ext3 throwing a bug after the partition became inaccessible.

Could you have a look at the chips on the IDE bridge board in the enclosure?  It
would be interesting to know what chips are used.

Can you repeat the 1st bug?  For example, try the following while the disk is
not mounted (to avoid data corruption and the ext3 bug):
# dd if=/dev/disk/by-id/ieee1394*0000 of=/dev/null

If this doesn't work, mount the disk and read files from it e.g. with
# find /path/to/the/filesystem -type f -exec cat {} \; > /dev/null

If you found a way to reliably reproduce the offlining, try
# echo 1 > /sys/module/firewire_sbp2/parameters/workarounds
Then plug the disk back in and check if you get the IO errors again.
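
The read test above could be wrapped so that a failed read is reported explicitly. This is a sketch only: `read_test` is a hypothetical helper, the block size is an arbitrary choice, and the device path has to be passed in by hand. It only reads, so it should be safe on an unmounted disk.

```sh
# Read a device (or any file) end to end, discarding the data, and report
# whether the read completed. A failure is the cue to look at dmesg for
# sbp2_scsi_abort or "Device offlined" messages.
read_test() {
    dev=$1
    if dd if="$dev" of=/dev/null bs=1M 2>/dev/null; then
        echo "read of $dev completed"
    else
        echo "read of $dev FAILED, check dmesg" >&2
        return 1
    fi
}
```

Running it once before and once after setting the workarounds parameter would show whether the parameter makes a difference.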
Comment 8 Stefan Richter 2008-04-26 10:44:52 EDT
> These are of course two different bugs:
>  - Disk going offline for no apparent reason.
>  - ext3 throwing a bug after the partition became inaccessible.

On the latter:
http://www.kerneloops.org/searchweek.php?search=mark_buffer_dirty
Comment 9 Jarod Wilson 2008-05-01 11:36:53 EDT
Also, if cracking open the case to look at the bridge chip is impractical,
attaching /sys/bus/firewire/devices/fwX/config_rom here might give enough info
to tell what's in there. (Where X is the fw device # matching the disk).
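
If attaching the binary file is awkward, a hex dump can be pasted instead. A minimal sketch, where `dump_config_rom` is a hypothetical helper and the path is whatever fwX matches the disk:

```sh
# Dump a FireWire config ROM byte by byte in hex. Printing single bytes
# (-t x1) sidesteps host endianness; -v keeps repeated rows from being
# collapsed into "*".
dump_config_rom() {
    od -A x -t x1 -v "$1"
}
```

For example: `dump_config_rom /sys/bus/firewire/devices/fwX/config_rom > config_rom.txt`, with X replaced by the right device number.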
Comment 10 Andrew Farris 2008-05-01 16:20:31 EDT
Stefan/Jarod, sorry for not getting back to this quickly. I'd be happy to open
it and get that info; I'm an EE major so it shouldn't be a problem, but I've
been under a heavy load this last week.  I will get to this.  The error hasn't
shown up again yet, and the system has been up for 12 days under load, so while
I do want to look into why, it's not the most pressing thing just now.
Comment 11 Bug Zapper 2008-05-14 05:30:16 EDT
Changing version to '9' as part of upcoming Fedora 9 GA.
More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Comment 12 Andrew Farris 2008-09-17 15:10:36 EDT
Stefan, I finally remembered this bug and decided to open up the drive.  I haven't had issues with this particular problem in a while, but I have not had the drive connected to the Audigy card; instead it's been used with my MacBook's FireWire, and seems to operate fine there with my rawhide install (so far, I rarely turn it on).

The USB2/Firewire to IDE bridge is an AGERE FW802B, the full code is:
AGERE FW802B 65518433 05021.

That desktop system with the Audigy has been packed away for a while since I moved out of my college town in June, but I do plan to revisit this in the future and try those debugging suggestions.
Comment 13 Stefan Richter 2008-09-17 15:42:24 EDT
Thanks for the info.  Agere FW802B is only a so-called PHY though (FireWire physical layer).  There is another chip on the bridge board which implements the IDE bridge, which from the FireWire point of view would be a so-called link layer controller, possibly combined with a USB-IDE bridge.

The behaviour of a FireWire disk is of course influenced by the PHY, by the link (with its SBP-2 firmware), and by the IDE or SATA drive.  But if there are device quirks, then they are most likely caused by the link and its firmware, because the link is more complex than the PHY and is developed and tested less rigorously than the drive, which is produced in much larger quantities.
Comment 14 Stefan Richter 2008-11-07 17:59:51 EST
> These are of course two different bugs:
>  - Disk going offline for no apparent reason.

If this is not reproducible, let's ignore it.

>  - ext3 throwing a bug after the partition became inaccessible.

It seems this warning still exists in upstream 2.6.27.5 and 2.6.28-rc3.
Comment 15 Bug Zapper 2009-06-09 20:12:50 EDT
This message is a reminder that Fedora 9 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 9.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '9'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 9's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 9 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Comment 16 Bug Zapper 2009-07-14 11:38:49 EDT
Fedora 9 changed to end-of-life (EOL) status on 2009-07-10. Fedora 9 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.
