587664 – Partition locks after heavy writing


Summary: Partition locks after heavy writing

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	13
Hardware:	i686
OS:	Linux
Priority:	low
Severity:	high
Target Milestone:	---
Assignee:	Kernel Maintainer List
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2010-04-30 14:03 UTC by John J. McDonough
Modified:	2010-04-30 21:57 UTC
CC List:	8 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2010-04-30 21:48:36 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
dmesg immediately after error (36.28 KB, text/plain) 2010-04-30 18:57 UTC, John J. McDonough	no flags

Description John J. McDonough 2010-04-30 14:03:34 UTC

Description of problem:
When doing a task involving heavy writing, for example backing up, the task aborts with a "Read-only file system" message after a gigabyte or so has been written. The error occurs with tar, cp and rsync, on both ext3 and ext4 filesystems.

After the error occurs, the partition is no longer usable: it cannot be mounted, and gparted doesn't see the device. A reboot makes the partition mountable again, and fsck shows no errors. I have seen this on two partitions on two different physical drives.

Version-Release number of selected component (if applicable):
2.6.33.2-57

How reproducible:
Reproducible

Steps to Reproduce:
1. Perform a task involving a lot of writing, such as rsync.
  
Actual results:
rsync: writefd_unbuffered failed to write 4 bytes to socket [sender]: Broken pipe (32)
rsync: write failed on "/media/oslash/Cimbaoth/Cimbaoth/jjmcd-091022.tar.gz": Read-only file system (30)
rsync error: error in file IO (code 11) at receiver.c(302) [receiver=3.0.7]
rsync: connection unexpectedly closed (34 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(601) [sender=3.0.7]

Similar errors occur with cp and tar.


Expected results:
The transfer completes without error.

Additional info:

Comment 1 Eric Sandeen 2010-04-30 16:13:55 UTC

dmesg will contain the errors that caused the filesystem to go readonly; can you attach that please?

Given that ext4 & ext3 both fail, I'm wondering if it could be an IO error due to a hardware problem, although if you've seen it on 2 different devices...

Anyway, dmesg after the error should offer more clues.  Please just attach the whole thing, to avoid editing out anything relevant.

Thanks,
-Eric

Comment 2 John J. McDonough 2010-04-30 18:57:11 UTC

Created attachment 410571
dmesg immediately after error

Previously the failure would occur after less than 2G of transfer.  This time I got more like 5G before the failure.

Comment 3 Eric Sandeen 2010-04-30 19:14:04 UTC

I don't see any messages related to the filesystem going readonly in dmesg, or any other storage errors for that matter.

Which device is causing the problem?

Comment 4 John J. McDonough 2010-04-30 19:57:44 UTC

I initially saw the error on /dev/sdd1

Since I was concerned about that drive as a result of the messages, I started
backing things up to /dev/sdc2 with the same result.

Once I saw it on another partition, I began to get suspicious of F13.  The
drives had previously performed without issue on F10, and this error began
shortly after doing a clean install of F13.  /dev/sdd was formatted on F10. 
/dev/sdd may have been F10 or possibly earlier.  The OS itself is running from
/dev/sda FWIW.

Comment 5 John J. McDonough 2010-04-30 19:58:51 UTC

Sorry, /dev/sdC may have been F10 or earlier; sdd was definitely F10.

Comment 6 Eric Sandeen 2010-04-30 20:00:07 UTC

The thing is, if ext3 or ext4 -really- went readonly as rsync is saying, it would have been due to some error that would show up in the logs.  I'm stumped about why we don't see that.

You say the partitions aren't even visible after this happens?

> Cannot be mounted, gparted doesn't see device.

What happens when you try to mount it? What does the kernel say when you try?
Something else is going on here ...

Comment 7 John J. McDonough 2010-04-30 21:48:36 UTC

When you try a mount, it says:
mount: you must specify the filesystem type

This occurs with any partition on the device once the error has occurred.

I am about 99% convinced that it is hardware, and that its appearance right after I installed F13 is just a coincidence.

Since F10 had worked well before, I burned a Live F10 CD and tried the same copy from it, and it failed (the message was slightly different but seemed to be telling me the same thing).

I then did essentially the same copy on F13 to a partition on a drive on a different controller, and it went OK.  That is only one success, but since the lone success is a different controller, that makes the controller pretty suspicious.

I'm going to go ahead and close this as NOTABUG.  It will take me a bit to round up a new controller, but since most of the system is on another controller I can limp along for a while.

Thanks for the help and sorry about the false alarm.

Comment 8 Eric Sandeen 2010-04-30 21:57:30 UTC

Ok, thanks for the followup, and good luck with the hardware :)

-Eric
