From Bugzilla Helper:
User-Agent: Mozilla/4.78 [en] (X11; U; Linux 2.4.13-0.5 i686)

Description of problem:
the summary pretty much says it all, I am seeing the problem on the two machines described below, with any filesystem I've tried - ext3, reiser, XFS (using the sgi-modified kernel kernel-smp-2.4.9-13SGI_XFS_PR1). with the 2.4.7-10 kernel, I can't reproduce. excerpts of /var/log/messages in 'additional information'.

two intel stl2 dual PIII 1GB ram machines, one with a 3ware 7410, one a 7810, each flashed with the latest firmware (7.3.2), in a 3-drive raid0 configuration (one all maxtor drives, the other IBM, all 7200 rpm 40 GB).

bonnie++ version 1.02a

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. bonnie++ -s 2g:64k -x 10 -d <path to 3ware partition>
2.
3.

Actual Results:
during the first or second bonnie++ iteration, will start going south...

Additional info:
these are excerpted from the run with the XFS kernel/filesystem, however the 3w-xxxx and SCSI related messages are virtually identical with the normal 2.4.9-13 kernel and reiser or ext3.

details:

Nov 15 12:01:15 d3 kernel: 3ware Storage Controller device driver for Linux v1.02.00.008.
Nov 15 12:01:16 d3 kernel: scsi0 : Found a 3ware Storage Controller at 0x5490, IRQ: 21, P-chip: 1.3
Nov 15 12:01:16 d3 kernel: scsi0 : 3ware Storage Controller
Nov 15 12:01:16 d3 kernel: Vendor: 3ware Model: 3w-xxxx Rev: 1.0
Nov 15 12:01:16 d3 kernel: Type: Direct-Access ANSI SCSI revision: 00
Nov 15 12:01:37 d3 kernel: Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
Nov 15 12:01:37 d3 kernel: SCSI device sda: 241248385 512-byte hdwr sectors (123519 MB)
Nov 15 12:01:37 d3 kernel: sda: sda1
Nov 15 12:01:43 d3 kernel: XFS mounting filesystem sd(8,1)
<snip>
Nov 15 12:06:23 d3 kernel: 3w-xxxx: tw_check_bits(): Found unexpected bits (0x131070e2).
Nov 15 12:06:23 d3 kernel: 3w-xxxx: tw_post_command_packet(): Unexpected bits.
Nov 15 12:06:23 d3 kernel: 3w-xxxx: tw_check_bits(): Found unexpected bits (0x131060e2).
Nov 15 12:06:23 d3 kernel: 3w-xxxx: tw_post_command_packet(): Unexpected bits.
Nov 15 12:06:23 d3 kernel: 3w-xxxx: tw_check_bits(): Found unexpected bits (0x131060e2).
Nov 15 12:06:23 d3 kernel: 3w-xxxx: tw_post_command_packet(): Unexpected bits.
Nov 15 12:06:23 d3 kernel: 3w-xxxx: tw_check_bits(): Found unexpected bits (0x131060e2).
Nov 15 12:06:23 d3 kernel: 3w-xxxx: tw_post_command_packet(): Unexpected bits.
Nov 15 12:07:24 d3 kernel: 3w-xxxx: tw_scsi_eh_abort(): Abort failed for unknown Scsi_Cmnd 0xf6bfb400, resetting card 0.
Nov 15 12:07:24 d3 kernel: 3w-xxxx: tw_check_bits(): Found unexpected bits (0x13173002).
Nov 15 12:07:24 d3 kernel: 3w-xxxx: tw_aen_drain_queue(): Unexpected bits.
Nov 15 12:07:24 d3 kernel: 3w-xxxx: tw_reset_sequence(): No attention interrupt for card 0.
Nov 15 12:07:24 d3 kernel: 3w-xxxx: tw_check_bits(): Found unexpected bits (0x13173002).
Nov 15 12:07:24 d3 kernel: 3w-xxxx: tw_aen_drain_queue(): Unexpected bits.
Nov 15 12:07:24 d3 kernel: 3w-xxxx: tw_reset_sequence(): No attention interrupt for card 0.
Nov 15 12:07:24 d3 kernel: 3w-xxxx: tw_check_bits(): Found unexpected bits (0x13173002).
Nov 15 12:07:24 d3 kernel: 3w-xxxx: tw_aen_drain_queue(): Unexpected bits.
Nov 15 12:07:24 d3 kernel: 3w-xxxx: tw_reset_sequence(): No attention interrupt for card 0.
Nov 15 12:07:24 d3 kernel: 3w-xxxx: tw_reset_sequence(): Controller error or no attention interrupt: giving up for card 0.
Nov 15 12:07:24 d3 kernel: 3w-xxxx: tw_reset_device_extension(): Reset sequence failed for card 0.
Nov 15 12:07:24 d3 kernel: 3w-xxxx: tw_scsi_eh_abort(): Reset failed for card 0.
Nov 15 12:07:24 d3 kernel: 3w-xxxx: tw_check_bits(): Found unexpected bits (0x13133002).
Nov 15 12:07:24 d3 kernel: 3w-xxxx: tw_aen_read_queue(): Unexpected bits.
Nov 15 12:07:24 d3 kernel: 3w-xxxx: tw_interrupt(): Error reading aen queue.
Nov 15 12:07:24 d3 kernel: 3w-xxxx: tw_check_bits(): Found unexpected bits (0x13107002).
Nov 15 12:07:24 d3 kernel: 3w-xxxx: tw_interrupt(): Unexpected bits.
Nov 15 12:07:25 d3 kernel: 3w-xxxx: tw_scsi_eh_abort(): Abort failed for unknown Scsi_Cmnd 0xf6bfb600, resetting card 0.
<snip, much repetition >
Nov 15 12:10:30 d3 kernel: I/O error: dev 08:01, sector 4047088
Nov 15 12:10:30 d3 kernel: I/O error: dev 08:01, sector 4047096
Nov 15 12:10:30 d3 kernel: I/O error: dev 08:01, sector 4047216
Nov 15 12:10:30 d3 kernel: I/O error: dev 08:01, sector 4047344
Nov 15 12:10:30 d3 kernel: I/O error: dev 08:01, sector 4047472
Nov 15 12:10:30 d3 kernel: I/O error: dev 08:01, sector 4047600
Nov 15 12:10:30 d3 kernel: I/O error: dev 08:01, sector 4047728
Nov 15 12:10:30 d3 kernel: I/O error: dev 08:01, sector 4047856
Nov 15 12:10:30 d3 kernel: I/O error: dev 08:01, sector 4047984
Nov 15 12:10:30 d3 kernel: I/O error: dev 08:01, sector 4048112
Nov 15 12:10:30 d3 kernel: I/O error: dev 08:01, sector 4048240
Nov 15 12:10:30 d3 kernel: I/O error: dev 08:01, sector 4048368
Nov 15 12:10:30 d3 kernel: I/O error: dev 08:01, sector 4048496
Nov 15 12:10:30 d3 kernel: I/O error: dev 08:01, sector 4048624
Nov 15 12:10:30 d3 kernel: I/O error: dev 08:01, sector 4048752
Nov 15 12:10:30 d3 kernel: I/O error: dev 08:01, sector 4048880
Nov 15 12:10:30 d3 kernel: I/O error: dev 08:01, sector 4049008
Nov 15 12:10:30 d3 kernel: : dev 08:01, sector 4098032
Nov 15 12:10:30 d3 kernel: I/O error: dev 08:01, sector 4098160
Nov 15 12:10:30 d3 kernel: I/O error: dev 08:01, sector 4098288
Nov 15 12:10:30 d3 kernel: I/O error: dev 08:01, sector 4098416
Nov 15 12:10:30 d3 kernel: I/O error: dev 08:01, sector 4098544
Nov 15 12:10:30 d3 kernel: I/O error: dev 08:01, sector 4098672
Nov 15 12:10:30 d3 kernel: I/O error: dev 08:01, sector 4098800
Nov 15 12:10:30 d3 kernel: I/O error: dev 08:01, sector 4098928
Nov 15 12:10:30 d3 kernel: I/O error: dev 08:01, sector 4099056
Nov 15 12:10:30 d3 kernel: I/O error: dev 08:01, sector 4099184
Nov 15 12:10:30 d3 kernel: I/O error: dev 08:01, sector 4099312
Nov 15 12:10:30 d3 kernel: I/O error: dev 08:01, sector 4099440
Nov 15 12:10:30 d3 kernel: I/O error: dev 08:01, sector 4099568
Nov 15 12:10:30 d3 kernel: I/O error: dev 08:01, sector 4099696
Nov 15 12:10:30 d3 kernel: I/O error: dev 08:01, sector 4099824
Nov 15 12:10:30 d3 kernel: I/O error: dev 08:01, sector 4099952
Nov 15 12:10:30 d3 kernel: I/O error: dev 08:01, sector 4100080
Nov 15 12:10:30 d3 kernel: I/O error: dev 08:01, sector 4100208
Nov 15 12:10:30 d3 kernel: I/O error: dev 08:01, sector 4100336
Nov 15 12:10:30 d3 kernel: I/O error: dev 08:01, sector 4100464
Nov 15 12:10:30 d3 kernel: I/O error: dev 08:01, sector 4100592
Nov 15 12:10:30 d3 kernel: I/O error: dev 08:01, sector 4100720
Nov 15 12:10:30 d3 kernel: I/O error: dev 08:01, sector 4100848
Nov 15 12:10:30 d3 kernel: I/O error: dev 08:01, sector 4100976
Nov 15 12:10:30 d3 kernel: I/O error: dev 08:01, sector 4101104
Nov 15 12:10:30 d3 kernel: I/O error: dev 08:01, sector 4101232
Nov 15 12:10:30 d3 kernel: I/O error: dev 08:01, sector 4101360
Nov 15 12:10:30 d3 kernel: I/O error: dev 08:01, sector 4101488
Nov 15 12:10:30 d3 kernel: I/O error: dev 08:01, sector 4101616
Nov 15 12:10:30 d3 kernel: I/O error: dev 08:01, sector 4101744
Nov 15 12:10:30 d3 kernel: I/O error: dev 08:01, sector 4101872
Nov 15 12:10:30 d3 kernel: I/O error: dev 08:01, sector 4102000
Nov 15 12:10:30 d3 kernel: I/O error: dev 08:01, sector 4102128
Nov 15 12:10:30 d3 kernel: I/O error: dev 08:01, sector 4102256
Nov 15 12:10:30 d3 kernel: I/O error: dev 08:01, sector 4102384
Nov 15 12:10:30 d3 kernel: I/O error: dev 08:01, sector 4102512
Nov 15 12:10:30 d3 kernel: I/O error: dev 08:01, sector 4102640
<snip, much repetition >
Nov 15 12:10:31 d3 kernel: SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 6000000
Nov 15 12:10:31 d3 kernel: I/O error: dev 08:01, sector 4046960
Nov 15 12:10:31 d3 kernel: I/O error: dev 08:01, sector 4046968
Nov 15 12:10:31 d3 kernel: SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 6000000
Nov 15 12:10:31 d3 kernel: I/O error: dev 08:01, sector 4046832
Nov 15 12:10:31 d3 kernel: I/O error: dev 08:01, sector 4046840
Nov 15 12:10:31 d3 kernel: SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 6000000
Nov 15 12:10:31 d3 kernel: I/O error: dev 08:01, sector 4046064
Nov 15 12:10:31 d3 kernel: I/O error: dev 08:01, sector 4046072
Nov 15 12:10:31 d3 kernel: SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 6000000
Nov 15 12:10:31 d3 kernel: I/O error: dev 08:01, sector 4045936
Nov 15 12:10:31 d3 kernel: I/O error: dev 08:01, sector 4045944
Nov 15 12:10:31 d3 kernel: SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 6000000
Nov 15 12:10:31 d3 kernel: I/O error: dev 08:01, sector 284264
Nov 15 12:10:31 d3 kernel: I/O error: dev 08:01, sector 284272
Nov 15 12:10:31 d3 kernel: SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 6000000
Nov 15 12:10:31 d3 kernel: I/O error: dev 08:01, sector 284008
Nov 15 12:10:31 d3 kernel: I/O error: dev 08:01, sector 284016
Nov 15 12:10:31 d3 kernel: SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 6000000
Nov 15 12:10:31 d3 kernel: I/O error: dev 08:01, sector 284232
Nov 15 12:10:31 d3 kernel: I/O error: dev 08:01, sector 284240
Nov 15 12:10:31 d3 kernel: I/O error: dev 08:01, sector 284008
Nov 15 12:10:31 d3 kernel: SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 6000000
Nov 15 12:10:31 d3 kernel: I/O error: dev 08:01, sector 4046704
Nov 15 12:10:31 d3 kernel: I/O error: dev 08:01, sector 4046712
Nov 15 12:10:31 d3 kernel: SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 6000000
Nov 15 12:10:31 d3 kernel: I/O error: dev 08:01, sector 4046576
Nov 15 12:10:31 d3 kernel: I/O error: dev 08:01, sector 4046584
Nov 15 12:10:31 d3 kernel: SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 6000000
Nov 15 12:10:31 d3 kernel: I/O error: dev 08:01, sector 4046448
Nov 15 12:10:31 d3 kernel: I/O error: dev 08:01, sector 4046456
Nov 15 12:10:31 d3 kernel: SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 6000000
Nov 15 12:10:31 d3 kernel: I/O error: dev 08:01, sector 4046320
Nov 15 12:10:31 d3 kernel: I/O error: dev 08:01, sector 4046328
Nov 15 12:10:31 d3 kernel: SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 6000000
Nov 15 12:10:31 d3 kernel: I/O error: dev 08:01, sector 284152
Nov 15 12:10:31 d3 kernel: I/O error: dev 08:01, sector 284160
Nov 15 12:10:31 d3 kernel: I/O error: dev 08:01, sector 284024
Nov 15 12:10:31 d3 kernel: I/O error: dev 08:01, sector 117440928
Nov 15 12:10:31 d3 kernel: I/O error in filesystem ("sd(8,1)") meta-data dev 0x801 block 0x70001a0
Nov 15 12:10:31 d3 kernel: ("xlog_iodone") error 5 buf count 1536
Nov 15 12:10:31 d3 kernel: xfs_force_shutdown(sd(8,1),0x2) called from line 939 of file xfs_log.c. Return address = 0xc01bde3e
Nov 15 12:10:31 d3 kernel: Log I/O Error Detected. Shutting down filesystem: sd(8,1)
Nov 15 12:10:31 d3 kernel: Please umount the filesystem, and rectify the problem(s)
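
Because the excerpt is so repetitive, it can help to condense it before comparing runs on different kernels or filesystems. The following is only a rough sketch, not something used in the original report: it assumes the message formats stay exactly as they appear in the excerpt, and the function name summarize_3w_log is made up for illustration.

```python
import re
from collections import Counter

# Patterns follow the message formats seen in the excerpt (an assumption;
# other driver or kernel versions may word these messages differently).
BITS_RE = re.compile(r"tw_check_bits\(\): Found unexpected bits \((0x[0-9a-fA-F]+)\)")
IOERR_RE = re.compile(r"I/O error: dev ([^,\s]+), sector (\d+)")


def summarize_3w_log(text):
    """Hypothetical helper: condense a 3w-xxxx syslog excerpt.

    Returns a Counter of the status values reported by tw_check_bits()
    and, per device, the count and sector range of the I/O errors.
    """
    bits = Counter(BITS_RE.findall(text))
    sectors = {}
    for dev, sector in IOERR_RE.findall(text):
        sectors.setdefault(dev, []).append(int(sector))
    io_summary = {
        dev: {"count": len(found), "min": min(found), "max": max(found)}
        for dev, found in sectors.items()
    }
    return bits, io_summary
```

One way to use it would be to pass in the contents of /var/log/messages from each failing run and compare the resulting status values and sector ranges side by side.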
To get things clear: this error shows up in the Red Hat kernel as well, not just in the SGI kernel? How much memory do you have?
yes, I am testing with the *redhat* 2.4.9-13smp (rh7.2 update) and 2.4.7-10smp (rh7.2) kernels. I use these kernels when using just reiser or ext3. the XFS kernels are those same exact RH kernels, patched and tested with XFS. sorry to confuse with XFS, it was just the last set of error logs I had easy access to (really, aside from fs-specific messages, the logs look the same every time). each machine has 1 GB ram (as is mentioned above).
Ok, can you try adding "mem=800M" to the kernel line in your grub config (under /boot/grub) and see if it still fails then?
ok, with 800M, it has now passed three bonnie++ iterations successfully, no errors logged. I don't think it ever got this far before. (using 2.4.9-13 and reiser right now).
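
When testing with a memory limit like this, it may be worth confirming that the limit was actually in effect for the run. The sketch below is an illustration only, under stated assumptions: it expects the text of /proc/cmdline and /proc/meminfo as input, assumes the usual "MemTotal: N kB" line is present, and the helper names are hypothetical.

```python
import re


def mem_limit_from_cmdline(cmdline):
    """Hypothetical helper: return the mem= limit in MB, or None if absent.

    Accepts a plain byte count or a K/M/G suffix, e.g. "mem=800M".
    """
    match = re.search(r"(?:^|\s)mem=(\d+)([KMG]?)", cmdline, re.IGNORECASE)
    if match is None:
        return None
    value = int(match.group(1))
    suffix = match.group(2).upper()
    if suffix == "G":
        return value * 1024
    if suffix == "M":
        return value
    if suffix == "K":
        return value // 1024
    return value // (1024 * 1024)  # no suffix: value is in bytes


def mem_total_mb(meminfo):
    """Hypothetical helper: return MemTotal from /proc/meminfo text, in MB."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo, re.MULTILINE)
    if match is None:
        return None
    return int(match.group(1)) // 1024
```

For example, reading /proc/cmdline and /proc/meminfo on the test machine and passing their contents to these two functions should show a limit of 800 and a MemTotal somewhat below that, since the kernel reserves some memory for itself.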
ok, nearly 20 iterations run on the two machines now with 800 mb, and no problems.
It turns out that the adaptec 2400A IDE raid card I am testing suffers from the same problem - just harder to trigger. that is, with the same exact hardware and software configuration (except for the card), it fails occasionally, but not when using just 800 mb ram. so, this is a highmem problem? any prospects of a fix? :)
This could possibly be a hardware error. I saw errors similar to the first part of your log on 2.4.13smp with a supermicro P3TDE6 with dual PIIIs, serverworks chipset. Adam Radford at 3ware has said that the error is a "PCI reset" error. I would email the log to linux. Switching to an AMD tigerMP fixed it for me, but was not an ideal solution. The error also only happened for me when multiple 3ware cards were installed. I have never seen errors similar to the second part of your log, which could be because I didn't use XFS. I don't know if this helps or confuses the issue, since reducing the RAM to 800 MB seemed to fix it.