Description of problem:
We have a SCSI RAID (ERQ16+) connected through an LSI20320R SCSI adapter to a UNIWIDE server (UniServer_3326). The filesystem is ext3 and the partition size is 1TB. We copy 300GB to this filesystem with rsync. After the successful copy we want to delete all the data with rm -rf, but this fails and the filesystem is remounted read-only.

Error messages from /var/log/messages:

Aug 30 14:31:33 oss3 kernel: EXT3-fs error (device sdb1): ext3_free_blocks_sb: bit already cleared for block 16
Aug 30 14:31:33 oss3 kernel: Aborting journal on device sdb1.
Aug 30 14:31:33 oss3 kernel: EXT3-fs error (device sdb1) in ext3_reserve_inode_write: Journal has aborted
Aug 30 14:31:33 oss3 kernel: EXT3-fs error (device sdb1) in ext3_truncate: Journal has aborted
Aug 30 14:31:33 oss3 kernel: EXT3-fs error (device sdb1) in ext3_reserve_inode_write: Journal has aborted
Aug 30 14:31:33 oss3 kernel: EXT3-fs error (device sdb1) in ext3_orphan_del: Journal has aborted
Aug 30 14:31:33 oss3 kernel: EXT3-fs error (device sdb1) in ext3_reserve_inode_write: Journal has aborted
Aug 30 14:31:33 oss3 kernel: EXT3-fs error (device sdb1) in ext3_delete_inode: Journal has aborted
Aug 30 14:31:33 oss3 kernel: __journal_remove_journal_head: freeing b_committed_data
Aug 30 14:31:33 oss3 last message repeated 90 times
Aug 30 14:31:33 oss3 kernel: ext3_abort called.
Aug 30 14:31:33 oss3 kernel: EXT3-fs error (device sdb1): ext3_journal_start_sb: Detected aborted journal
Aug 30 14:31:33 oss3 kernel: Remounting filesystem read-only
Aug 30 14:31:33 oss3 kernel: __journal_remove_journal_head: freeing b_committed_data
Aug 30 14:31:33 oss3 last message repeated 86 times

After the "crash", e2fsck does not seem to help; I have to recreate the filesystem.

Version-Release number of selected component (if applicable):
Server: UNIWIDE 3326
CPUs: 2 x Dual Core AMD Opteron(tm) Processor 870
MEM: 2GB
HDD: 1 x ATLAS10K4_36SCA
SCSI: 2 x SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 08)
RAID: EasyRaid Q16+ with 16 x 250GB Hitachi SATA, configured with 2 raidsets, each raidset with 3 slices (900GB)
OS: RHEL4U2 kernel 2.6.9-22.0.2
    RHEL4U2 kernel 2.6.9.34

How reproducible:
Every time

Steps to Reproduce:
1. Connect the RAID to the server
2. Boot the server
3. mkfs.ext3
4. rsync -av /data /raid (300GB) (maybe earlier)
5. rm -rf /raid

Actual results:
The filesystem is remounted read-only.

Expected results:
The data is deleted.

Additional info:
We replaced the RAID, the SCSI HBA, the SCSI terminator, and the SCSI cable, and the problem still occurs. We also saw the same problem on another server (TYAN GT24).

Are the above messages the first errors you see? Are there any other error messages before these?

What is the output of e2fsck? You say it doesn't help; what do you mean by that? Does e2fsck itself fail, or something else?

If you have the hardware, is it possible to recreate this on a different type of storage subsystem (a SATA drive, a different type of RAID, a simpler geometry, ...)?

When it fails on the other server, is it the same I/O hardware (HBA, RAID, etc.)?

Thanks,
-Eric
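
One way to check whether the quoted messages really are the first errors is to scan the syslog for the earliest EXT3-fs error on each device. The following is only a minimal sketch: the function name `first_ext3_errors` is hypothetical, and the regular expression is derived solely from the two line shapes quoted in this report ("EXT3-fs error (device X): func: msg" and "EXT3-fs error (device X) in func: msg"), so other kernels may log differently.

```python
import re

# Pattern assumed from the log lines quoted in this report, e.g.
#   Aug 30 14:31:33 oss3 kernel: EXT3-fs error (device sdb1): ext3_free_blocks_sb: ...
#   Aug 30 14:31:33 oss3 kernel: EXT3-fs error (device sdb1) in ext3_truncate: ...
EXT3_ERROR = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"EXT3-fs error \(device (?P<device>[^)]+)\)(?::| in)\s+"
    r"(?P<func>\w+):\s*(?P<msg>.*)$"
)


def first_ext3_errors(lines):
    """Return {device: (timestamp, function, message)} for the first
    EXT3-fs error found per device, in the order the lines are given."""
    first = {}
    for line in lines:
        match = EXT3_ERROR.match(line.strip())
        if match and match.group("device") not in first:
            first[match.group("device")] = (
                match.group("stamp"),
                match.group("func"),
                match.group("msg"),
            )
    return first


# Sample input copied from the report above.
sample = [
    "Aug 30 14:31:33 oss3 kernel: EXT3-fs error (device sdb1): "
    "ext3_free_blocks_sb: bit already cleared for block 16",
    "Aug 30 14:31:33 oss3 kernel: Aborting journal on device sdb1.",
    "Aug 30 14:31:33 oss3 kernel: EXT3-fs error (device sdb1) in "
    "ext3_reserve_inode_write: Journal has aborted",
]
result = first_ext3_errors(sample)
```

To run it against a real log, pass an open file instead of the sample list, for example `first_ext3_errors(open("/var/log/messages"))`. Anything logged before the reported timestamp (for instance SCSI or driver messages) would still need to be read by hand, since this sketch only recognizes EXT3-fs error lines.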

Dear Eric,

Thanks for your answer. We have found that the error was in the hardware (a defective PCI slot).

Best regards,
Benedikt Schaefer

Closing; hardware problem.