63740 – Problems with ieee1394/sbp2 (systematic partition table wipe-out)

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Linux
Classification:	Retired
Component:	kernel
Sub Component:
Version:	9
Hardware:	i386
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Arjan van de Ven
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2002-04-18 07:53 UTC by Alfredo Ferrari
Modified:	2008-08-01 16:22 UTC (History)
CC List:	0 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2004-09-30 15:39:31 UTC
Embargoed:

Description Alfredo Ferrari 2002-04-18 07:53:02 UTC

From Bugzilla Helper:
User-Agent: Mozilla/4.78 [en] (X11; U; Linux 2.4.9-31enterprise i686)

Description of problem:
This is a difficult bug. The machine is a DELL Latitude C800 with the following
hardware connected to the ieee1394 port:
- a LaCie 30 GB hard drive
- a Fujitsu DynaMO 1300FE MagnetoOptical drive (chained to the LaCie disk)

I successfully ran the LaCie disk for months under RH7.2. I am able to run the
MO disk alone, apparently with no problem (to be verified, I have not stressed
this configuration much). As soon as I put both in operation I got hard
machine crashes under RH7.2 (kernel-2.4.9-31), roughly 1 boot out of 2. The
crashes occurred when inserting ieee1394->ohci1394->sbp2 and made the machine
reboot.

I moved to skipjack2 (actually only the kernel, glibc, initscripts, hotplug, mount
and friends; I cannot afford to jeopardize the machine too much). The situation
improved: I am able to use the MO drive and the hard disk together even for
hours and hundreds of MBytes of transfers. However, almost every day at some
point the kernel stops recognizing the MO disks correctly (they use 2048-byte
blocks) and starts to complain that all operations on block 0 (the partition
table) result in short read/write. There is apparently no way to recover the
disk. However, mounting it using superblock 16384 works fine, and e2fsck'ing it
using block 16384 is ok, except for a systematic error message that block 0
cannot be restored.
I was thinking about disk hardware failures maybe triggered by the MO driver,
however:

- the same disk inserted into an identical drive (but SCSI, not Firewire) is ok,
except for the partition table, which is corrupted. e2fsck'ing from superblock
16384 restores it perfectly, provided I dd if=/dev/null the 1st block of the
disk beforehand (this operation does not work in the Firewire drive, where it
results in an I/O error, short read/write). At this point the disk works again
flawlessly in the original Firewire drive, until some voodoo triggers the
partition table corruption again. This occurs for both 640 MByte and 1300 MByte
disks, with both ext2 and ext3 partitions, and with both real partitions and
"superfloppy" format.
Needless to say, the drive works flawlessly under Windows ME in exactly the same
configuration. I have the feeling that the coexistence of the two drives
sometimes triggers a situation where the kernel "forgets" that the drive has a
2048-byte block size. The problem typically occurs:
a) during scsi bus rescans (but not always)
b) during ieee1394 resets (i.e. removing and inserting sbp2);
   sometimes I also get a kernel oops in this condition
c) when the hard disk is mounted
d) after several hours of inactivity
e) when the MO disk is unmounted

never during normal operations (read/write of files)
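
To tell whether block 0 has become unreadable after one of the events listed above, a check along the following lines could be run as root. This is only a minimal sketch, not something taken from this report: the device path /dev/sda and the function names are assumptions for illustration, and the 2048-byte block size is the one described above.

```python
import os

MBR_SIGNATURE = b"\x55\xaa"  # boot signature at byte offsets 510-511


def classify_block0(data, block_size=2048):
    """Classify the bytes returned by an attempt to read block 0."""
    if len(data) < block_size:
        return "short read"
    if data[:block_size] == bytes(block_size):
        return "all zeros"
    if data[510:512] == MBR_SIGNATURE:
        return "partition table signature present"
    return "no partition table signature"


def read_block0(device, block_size=2048):
    """Read up to block_size bytes from the start of a device or file.

    An outright I/O error surfaces as OSError; a short read simply
    returns fewer bytes than requested.
    """
    fd = os.open(device, os.O_RDONLY)
    try:
        return os.read(fd, block_size)
    finally:
        os.close(fd)


if __name__ == "__main__":
    device = "/dev/sda"  # hypothetical device node for the MO drive
    try:
        print(classify_block0(read_block0(device)))
    except OSError as exc:
        print("I/O error reading block 0:", exc)
```

A disk in "superfloppy" format has no partition table, so "no partition table signature" is not by itself a sign of corruption there.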

None of the above situations is 100% reproducible; however,
in 10 days, using that machine only in the evening, I got at least 10 hard hangs
and more than 20 spoiled partition tables.
I am happy to help carry out further tests if you have any good
suggestions.
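
For anyone trying to reproduce the recovery step: the superblock number 16384 used above matches the first backup superblock of an ext2/ext3 filesystem with 2048-byte blocks. The sketch below shows the arithmetic. It assumes the usual layout of 8 * block_size blocks per group (one bitmap block per group) and the sparse_super feature; the function names and the total block count are made up for illustration.

```python
import math


def is_power_of(n, base):
    """Return True if n == base**k for some integer k >= 0."""
    while n > 1 and n % base == 0:
        n //= base
    return n == 1


def backup_superblocks(block_size, total_blocks, sparse_super=True):
    """List the block numbers of the backup superblocks.

    Assumes blocks_per_group = 8 * block_size and that the first data
    block is 1 for 1024-byte blocks and 0 otherwise.
    """
    blocks_per_group = 8 * block_size
    first_data_block = 1 if block_size == 1024 else 0
    groups = math.ceil((total_blocks - first_data_block) / blocks_per_group)
    result = []
    for group in range(1, groups):
        # With sparse_super, backups live only in group 1 and in groups
        # whose number is a power of 3, 5 or 7.
        if sparse_super and not any(is_power_of(group, b) for b in (3, 5, 7)):
            continue
        result.append(first_data_block + group * blocks_per_group)
    return result


if __name__ == "__main__":
    # total_blocks is a hypothetical size, not that of the reporter's disks
    print(backup_superblocks(2048, 100000))  # -> [16384, 49152, 81920]
```

With e2fsck the backup is selected with -b 16384, and -B 2048 gives the block size explicitly.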

Version-Release number of selected component (if applicable):


How reproducible:
Sometimes

Steps to Reproduce:
1. connect both a Firewire hard disk and a Firewire MO drive
2. exercise them (sbp2 removal/insertion, scsi bus rescans, mounts and unmounts)
3. the MO drive partition table gets corrupted
	

Actual Results:  The MO partition table gets corrupted randomly

Expected Results:  no problem

Additional info:

Comment 1 Alfredo Ferrari 2002-05-28 15:37:50 UTC

... I got no feedback. I just want to confirm that the problem is still there
in RH7.3 with kernel-2.4.18-4, and it is even worse (it systematically wipes out
the .journal inode on my ext3 partitions).

Comment 2 Bugzilla owner 2004-09-30 15:39:31 UTC

Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/
