Bug 587664 - Partition locks after heavy writing
Status: CLOSED NOTABUG
Product: Fedora
Classification: Fedora
Component: kernel
Version: 13
Hardware: i686 Linux
Priority: low
Severity: high
Assigned To: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Reported: 2010-04-30 10:03 EDT by John J. McDonough
Modified: 2010-04-30 17:57 EDT
Doc Type: Bug Fix
Last Closed: 2010-04-30 17:48:36 EDT

Attachments
dmesg immediately after error (36.28 KB, text/plain)
2010-04-30 14:57 EDT, John J. McDonough

Description John J. McDonough 2010-04-30 10:03:34 EDT
Description of problem:
When doing a task involving heavy writing, for example backing up, the task aborts with a "Read-only file system" message after a gigabyte or so has been written.  The error occurs with tar, cp and rsync, on both ext3 and ext4 filesystems.

After the error occurs the partition is no longer usable.  It cannot be mounted, and gparted doesn't see the device.  A reboot makes the partition mountable again, and fsck shows no errors.  I have seen this on two partitions on two different physical drives.

Version-Release number of selected component (if applicable):
2.6.33.2-57

How reproducible:
Reproducible

Steps to Reproduce:
1. Perform a task involving a lot of writing such as rsync
  
Actual results:
rsync: writefd_unbuffered failed to write 4 bytes to socket [sender]: Broken pipe (32)
rsync: write failed on "/media/oslash/Cimbaoth/Cimbaoth/jjmcd-091022.tar.gz": Read-only file system (30)
rsync error: error in file IO (code 11) at receiver.c(302) [receiver=3.0.7]
rsync: connection unexpectedly closed (34 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(601) [sender=3.0.7]

Similar errors occur with cp and tar.
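
The failure is not specific to rsync, so a reproduction does not need it.  As a rough sketch (not something that was run for this report), a small Python function can write a large file in chunks and report how far it got and which errno stopped it.  The function name, the chunk size and the zero-filled data are assumptions chosen for illustration.  The "(30)" in the rsync output above is the errno value, which on Linux is EROFS.

```python
import errno
import os


def write_until_failure(path, total_bytes, chunk_size=1024 * 1024):
    """Write total_bytes of zeros to path in chunks, syncing after each chunk.

    Returns (bytes_written, error_name).  error_name is None on success,
    otherwise the symbolic errno name, for example "EROFS" or "EIO".
    """
    chunk = b"\0" * chunk_size
    written = 0
    try:
        with open(path, "wb") as f:
            while written < total_bytes:
                n = min(chunk_size, total_bytes - written)
                f.write(chunk[:n])
                f.flush()
                # Push the data to the device rather than leaving it in the page cache.
                os.fsync(f.fileno())
                written += n
    except OSError as e:
        # Map the numeric errno (e.g. 30) to its name (e.g. "EROFS").
        return written, errno.errorcode.get(e.errno, str(e.errno))
    return written, None
```

For example, write_until_failure("/media/oslash/stress.bin", 8 * 1024**3) would attempt an 8 GiB write; the file name is hypothetical.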


Expected results:
The transfer completes without error.

Additional info:
Comment 1 Eric Sandeen 2010-04-30 12:13:55 EDT
dmesg will contain the errors that caused the filesystem to go readonly; can you attach that please?

Given that ext4 & ext3 both fail, I'm wondering if it could be an IO error due to a hardware problem, although if you've seen it on 2 different devices...

Anyway, dmesg after the error should offer more clues.  Please just attach the whole thing, to avoid editing out anything relevant.

Thanks,
-Eric
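
As a sketch of what to look for in that output, the following Python filter keeps only the lines that match a few strings which typically accompany storage trouble in kernel logs.  The pattern list is an illustrative assumption, not an exhaustive or authoritative set, and the function name is made up for this example.

```python
import re

# Illustrative patterns only (an assumption, not a complete list): strings
# that commonly appear in kernel logs when a disk or filesystem has trouble.
STORAGE_ERROR_PATTERNS = [
    r"I/O error",
    r"EXT[34]-fs error",
    r"Remounting filesystem read-only",
    r"rejecting I/O to offline device",
]

_COMBINED = re.compile("|".join("(?:%s)" % p for p in STORAGE_ERROR_PATTERNS))


def find_storage_errors(log_text):
    """Return the lines of log_text that match any of the patterns above."""
    return [line for line in log_text.splitlines() if _COMBINED.search(line)]
```

It can be run over the text of a saved dmesg file, for instance find_storage_errors(open("dmesg.txt").read()), where dmesg.txt is a hypothetical file name.  An empty result only means none of these particular strings appeared.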
Comment 2 John J. McDonough 2010-04-30 14:57:11 EDT
Created attachment 410571 [details]
dmesg immediately after error

Previously the failure would occur after less than 2G of transfer.  This time I got more like 5G before the failure.
Comment 3 Eric Sandeen 2010-04-30 15:14:04 EDT
I don't see any messages related to the filesystem going readonly in dmesg, or any other storage errors for that matter.

Which device is causing the problem?
Comment 4 John J. McDonough 2010-04-30 15:57:44 EDT
I initially saw the error on /dev/sdd1

Since I was concerned about that drive as a result of the messages, I started
backing things up to /dev/sdc2 with the same result.

Once I saw it on another partition, I began to get suspicious of F13.  The
drives had previously performed without issue on F10, and this error began
shortly after doing a clean install of F13.  /dev/sdd was formatted on F10. 
/dev/sdd may have been F10 or possibly earlier.  The OS itself is running from
/dev/sda FWIW.
Comment 5 John J. McDonough 2010-04-30 15:58:51 EDT
Sorry, /dev/sdc may have been F10 or earlier.  sdd was definitely F10.
Comment 6 Eric Sandeen 2010-04-30 16:00:07 EDT
The thing is, if ext3 or ext4 -really- went readonly as rsync is saying, it would have been due to some error that would show up in the logs.  I'm stumped about why we don't see that.

You say the partitions aren't even visible after this happens?

> Cannot be mounted, gparted doesn't see device.

What happens when you try to mount it?  What does the kernel say when you try?
Something else is going on here ...
Comment 7 John J. McDonough 2010-04-30 17:48:36 EDT
When you try a mount, it says:
mount: you must specify the filesystem type

This occurs with any partition on the device once the error has occurred.
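
One way to tell whether the kernel still lists the device at that point is to read /proc/partitions.  The following is a minimal Python sketch under that assumption; the function names are made up for illustration, and nothing in this report shows what /proc/partitions actually contained after the failure.

```python
def listed_block_devices(proc_partitions_text):
    """Return the set of device names found in /proc/partitions-style text."""
    names = set()
    for line in proc_partitions_text.splitlines():
        fields = line.split()
        # Data rows have four fields: major, minor, #blocks, name.
        # The header row also has four fields but does not start with a digit.
        if len(fields) == 4 and fields[0].isdigit():
            names.add(fields[3])
    return names


def device_is_visible(name, path="/proc/partitions"):
    """Check whether a device name such as "sdd1" is currently listed."""
    with open(path) as f:
        return name in listed_block_devices(f.read())
```

For example, device_is_visible("sdd1") returning False after the error would suggest the kernel no longer lists that partition, which points below the filesystem layer.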

I am about 99% convinced that it is hardware, and its occurrence when I installed F13 is just coincidence.

Since F10 had worked well before, I burned a Live F10 CD and tried the same copy from it, and it failed (the message was slightly different but seemed to be telling me the same thing).

I then did essentially the same copy on F13 to a partition on a drive on a different controller, and it went OK.  That is only one success, but since the lone success is a different controller, that makes the controller pretty suspicious.

I'm going to go ahead and close this as NOTABUG.  It will take me a bit to round up a new controller, but since most of the system is on another controller I can limp along for a while.

Thanks for the help and sorry about the false alarm.
Comment 8 Eric Sandeen 2010-04-30 17:57:30 EDT
Ok, thanks for the followup, and good luck with the hardware :)

-Eric
