204731 – external scsi filesystem remount read-only during remove data

Summary: external scsi filesystem remount read-only during remove data

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat Enterprise Linux 4
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	4.3
Hardware:	x86_64
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Eric Sandeen
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2006-08-31 10:38 UTC by Benedikt Schaefer
Modified:	2007-11-17 01:14 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2006-12-05 13:56:37 UTC
Target Upstream Version:
Embargoed:

Description Benedikt Schaefer 2006-08-31 10:38:59 UTC

Description of problem:
We have a SCSI RAID (ERQ16+) connected via an LSI20320R SCSI adapter to a
UNIWIDE server (UniServer_3326). The filesystem is ext3 and the partition size
is 1TB. We copy 300GB to this filesystem with rsync. After the successful copy
we want to delete all data with rm -rf, but this fails and the filesystem is
remounted read-only.

Error messages from /var/log/messages:
Aug 30 14:31:33 oss3 kernel: EXT3-fs error (device sdb1): ext3_free_blocks_sb: bit already cleared for block 16
Aug 30 14:31:33 oss3 kernel: Aborting journal on device sdb1.
Aug 30 14:31:33 oss3 kernel: EXT3-fs error (device sdb1) in ext3_reserve_inode_write: Journal has aborted
Aug 30 14:31:33 oss3 kernel: EXT3-fs error (device sdb1) in ext3_truncate: Journal has aborted
Aug 30 14:31:33 oss3 kernel: EXT3-fs error (device sdb1) in ext3_reserve_inode_write: Journal has aborted
Aug 30 14:31:33 oss3 kernel: EXT3-fs error (device sdb1) in ext3_orphan_del: Journal has aborted
Aug 30 14:31:33 oss3 kernel: EXT3-fs error (device sdb1) in ext3_reserve_inode_write: Journal has aborted
Aug 30 14:31:33 oss3 kernel: EXT3-fs error (device sdb1) in ext3_delete_inode: Journal has aborted
Aug 30 14:31:33 oss3 kernel: __journal_remove_journal_head: freeing b_committed_data
Aug 30 14:31:33 oss3 last message repeated 90 times
Aug 30 14:31:33 oss3 kernel: ext3_abort called.
Aug 30 14:31:33 oss3 kernel: EXT3-fs error (device sdb1): ext3_journal_start_sb: Detected aborted journal
Aug 30 14:31:33 oss3 kernel: Remounting filesystem read-only
Aug 30 14:31:33 oss3 kernel: __journal_remove_journal_head: freeing b_committed_data
Aug 30 14:31:33 oss3 last message repeated 86 times
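To check whether the "bit already cleared" message is really the first sign of trouble, it can help to collect every kernel message logged before the first EXT3-fs error for the device. The following is a minimal sketch, assuming the syslog line format shown in the log above and that the relevant lines have not been rotated out of the file; the function name is hypothetical.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Matches kernel lines such as
# "Aug 30 14:31:33 oss3 kernel: EXT3-fs error (device sdb1): ext3_free_blocks_sb: ..."
EXT3_ERROR = re.compile(r"kernel: EXT3-fs error \(device (?P<dev>[^)\s]+)\)")


def kernel_messages_before_first_error(
    lines: Iterable[str], device: str = "sdb1"
) -> Tuple[List[str], Optional[str]]:
    """Scan syslog lines in order (hypothetical helper).

    Returns (earlier_kernel_lines, first_error). first_error is the first
    "EXT3-fs error" line for `device`, or None if there is none; in that
    case earlier_kernel_lines holds every kernel line that was seen.
    """
    earlier: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = EXT3_ERROR.search(line)
        if match and match.group("dev") == device:
            return earlier, line
        # "last message repeated N times" lines are not kernel lines and are skipped.
        if " kernel: " in line:
            earlier.append(line)
    return earlier, None
```

Usage: open `/var/log/messages` (with `errors="replace"` in case of binary debris), pass the file object in, and look at the last few entries of the returned list for SCSI or driver errors that preceded the filesystem error.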

After the "crash", e2fsck does not seem to help; I have to recreate the filesystem.

Version-Release number of selected component (if applicable):
Server:
 UNIWIDE 3326
  CPUs: 2 x Dual Core AMD Opteron(tm) Processor 870
  MEM: 2GB
  HDD: 1 x ATLAS10K4_36SCA
  SCSI: 2 x SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 08)

RAID: EasyRaid Q16+ with 16 x 250GB Hitachi SATA drives
      Configured as 2 RAID sets, each with 3 slices (900GB)

OS: RHEL4U2 kernel 2.6.9-22.0.2
    RHEL4U2 kernel 2.6.9.34

How reproducible:
Every time

Steps to Reproduce:
1. connect raid to server
2. boot server
3. mkfs.ext3
4. rsync -av /data /raid (300GB; the problem may also occur with less data)
5. rm -rf /raid
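The reproduction steps above can be scripted so that each run is identical and the read-only remount is detected automatically. This is a sketch under stated assumptions: the device path `/dev/sdb1` (taken from the log), the explicit `mount` step (the steps imply it but do not list it), and deleting the copied tree `/raid/data` rather than `/raid` itself are all assumptions, and the helper names are hypothetical. Running it for real destroys all data on the device, so it defaults to a dry run.

```python
import os
import subprocess
from typing import List

# Hypothetical values: the report names sdb1, /data and /raid but not the
# exact device path or the mount step.
DEVICE = "/dev/sdb1"
MOUNTPOINT = "/raid"
SOURCE = "/data"


def reproduction_commands(device: str = DEVICE, mountpoint: str = MOUNTPOINT,
                          source: str = SOURCE) -> List[List[str]]:
    """Steps 3-5 as argument lists (the mount step is an assumption)."""
    copied = os.path.join(mountpoint, os.path.basename(source.rstrip("/")))
    return [
        ["mkfs.ext3", device],
        ["mount", device, mountpoint],
        ["rsync", "-av", source, mountpoint],  # creates <mountpoint>/data
        ["rm", "-rf", copied],                 # delete the copied tree
    ]


def is_mounted_readonly(mounts_text: str, mountpoint: str) -> bool:
    """True if the last /proc/mounts entry for mountpoint has the 'ro' option."""
    readonly = False
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mountpoint, fstype, options, dump, pass
        if len(fields) >= 4 and fields[1] == mountpoint:
            readonly = "ro" in fields[3].split(",")
    return readonly


def run_reproduction(dry_run: bool = True) -> bool:
    """Print (and, if dry_run is False, run) the steps. DESTROYS data on DEVICE."""
    for cmd in reproduction_commands():
        print("+", " ".join(cmd))
        if not dry_run:
            # rm is expected to fail once the journal aborts, so don't raise on it.
            subprocess.run(cmd, check=cmd[0] != "rm")
    if dry_run:
        return False
    with open("/proc/mounts") as f:
        return is_mounted_readonly(f.read(), MOUNTPOINT)
```

Checking `/proc/mounts` rather than relying on the exit status of `rm` distinguishes the remount described here from an ordinary deletion failure.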
  
Actual results:
The filesystem is remounted read-only.

Expected results:
The data is deleted.

Additional info:
We replaced the RAID, SCSI HBA, SCSI terminator, and SCSI cable, and the problem
still occurs. We also saw the same problem on another server (TYAN GT24).

Comment 1 Eric Sandeen 2006-12-04 23:11:16 UTC

Are the above messages the first errors you see?  Are there any other error
messages before this?

What is the output of e2fsck?  You say it doesn't help; what do you mean by
that?  Does e2fsck fail, or something else?

If you have the hardware, is it possible to recreate this on a different type of
storage subsystem?  (SATA drive, a different type of RAID, simpler geometry,
or...)  When it fails on the other server, is it the same IO hardware? (HBA,
RAID, etc.?)

Thanks,

-Eric
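Comment 1 also asks what e2fsck actually reports. The e2fsck exit status is a bitmask of the conditions listed in the e2fsck(8) man page, so decoding it gives a precise answer to "does e2fsck fail". A minimal sketch, assuming the filesystem is unmounted when checked; the helper names are hypothetical.

```python
import subprocess
from typing import List

# Exit-status bits documented in the e2fsck(8) man page; they are OR'd together.
E2FSCK_EXIT_BITS = {
    1: "file system errors corrected",
    2: "file system errors corrected, system should be rebooted",
    4: "file system errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "e2fsck canceled by user request",
    128: "shared library error",
}


def decode_e2fsck_status(status: int) -> List[str]:
    """Translate an e2fsck exit status into the conditions it reports."""
    if status == 0:
        return ["no errors"]
    return [text for bit, text in sorted(E2FSCK_EXIT_BITS.items()) if status & bit]


def check_without_changes(device: str) -> List[str]:
    """Forced check (-f) that opens the device read-only and answers 'no' (-n)."""
    result = subprocess.run(["e2fsck", "-f", "-n", device])
    return decode_e2fsck_status(result.returncode)
```

For example, `check_without_changes("/dev/sdb1")` after the failure would show whether errors remain uncorrected (bit 4) or e2fsck itself hit an operational error (bit 8), without modifying the filesystem.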

Comment 2 Benedikt Schaefer 2006-12-05 07:31:21 UTC

Dear Eric,

Thanks for your answer.
We found the cause in the hardware (a defective PCI slot).

best regards
Benedikt Schaefer

Comment 4 Eric Sandeen 2006-12-05 13:56:37 UTC

Closing; hardware problem.
