904703 – cpio died on signal 11 when doing business on faulty SDHC card.

Summary: cpio died on signal 11 when doing business on faulty SDHC card.

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	18
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Kernel Maintainer List
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	911102

Reported:	2013-01-27 06:26 UTC by Endre "Hrebicek" Balint-Nagy
Modified:	2014-09-24 01:29 UTC
CC List:	9 users
Fixed In Version:
Clone Of:
Clones:	911102
Environment:
Last Closed:	2013-02-22 02:30:13 UTC
Type:	Bug
Embargoed:
Dependent Products:

Attachments
The abrt directory created from coredump. (59.96 KB, application/x-compressed-tar) 2013-01-27 06:26 UTC, Endre "Hrebicek" Balint-Nagy

Description Endre "Hrebicek" Balint-Nagy 2013-01-27 06:26:16 UTC

Created attachment 688302
The abrt directory created from coredump.

Description of problem:
When the system "unplugged" my faulty Kingston 4GB class 4 Mobility kit, cpio died as in BZ618526. (BTW, stupid abrt refused to submit this bug because of that closed F13 bug.)
A find command was running in parallel, causing the disconnect.
The "unplugging" happened this way:
[701422.408693] EXT4-fs (sdd2): previous I/O error to superblock detected
[701422.408709] EXT4-fs error (device sdd2): ext4_readdir:172: inode #11: comm find: path /media/piroot/lost+found: directory contains a hole at offset 0
[701422.546536] sdd: detected capacity change from 3904897024 to 0
[701545.623478] usb 6-2.4: USB disconnect, device number 34
[701547.333370] usb 6-2.4: new full-speed USB device number 35 using uhci_hcd
The device was lost after it reported zero size.
Unplugging and re-plugging brought the device back.
I suspect cpio dies when a read fails because the disk size has dropped to 0.

Version-Release number of selected component (if applicable):
cpio-2.11-12.fc18.x86_64

How reproducible:
Almost always.
Steps to Reproduce:
1. Repeat the find|cpio pipeline on a problematic disk-like device or on a dm-flakey device. (I will sketch a reproducer later.)
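A possible shape for such a reproducer is sketched below. It is only a dry run that prints the commands, not the reporter's actual procedure. The loop device /dev/loop0, the names flakey0 and /mnt/flakey, the 30 s up / 5 s down intervals and the cpio copy-out mode are all assumptions for illustration; the dm-flakey table layout (start, size, "flakey", device, offset, up interval, down interval) follows the kernel's device-mapper documentation. The commands need root, and the filesystem has to be populated with files before the last step is repeated.

```python
import shlex

SECTOR_BYTES = 512  # device-mapper tables count 512-byte sectors


def flakey_table(dev, size_bytes, up_s, down_s):
    """Build a dm-flakey table line: the device works for up_s seconds,
    then fails I/O for down_s seconds, and the cycle repeats."""
    sectors = size_bytes // SECTOR_BYTES
    return f"0 {sectors} flakey {dev} 0 {up_s} {down_s}"


def reproducer_commands(dev="/dev/loop0", size_bytes=64 * 1024 * 1024,
                        name="flakey0", mnt="/mnt/flakey"):
    """Return the shell commands of the sketch; nothing is executed here.
    All default names and sizes are hypothetical."""
    table = flakey_table(dev, size_bytes, up_s=30, down_s=5)
    return [
        f"dmsetup create {name} --table {shlex.quote(table)}",
        f"mkfs.ext4 -q /dev/mapper/{name}",
        f"mount /dev/mapper/{name} {mnt}",
        # cpio copy-out mode is an assumption; the report does not give the flags.
        f"cd {mnt} && find . -depth -print | cpio -o > /dev/null",
    ]


if __name__ == "__main__":
    for cmd in reproducer_commands():
        print(cmd)
```

Whether this triggers the same signal 11 as the faulty card is untested; dm-flakey returns I/O errors during the down interval, while the card reader reported a capacity change to 0.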
  
Actual results:
cpio is killed by signal 11

Expected results:
An I/O error is reported.

Additional info:

Comment 1 Endre "Hrebicek" Balint-Nagy 2013-01-27 08:57:30 UTC

After some pondering and playing with the dm-flakey module, I decided it is not cpio's fault if an mmap-ed page abruptly disappears (the most likely scenario for this problem). This is a kernel issue, and the possible error handling scenarios should be discussed with the kernel developers.
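
To illustrate why a vanishing mapped page cannot surface as an ordinary I/O error, here is a minimal sketch using an analogous case: a regular file that is truncated under an existing mapping. This is not the SDHC failure itself, and on Linux the truncation case is expected to deliver SIGBUS rather than the signal 11 seen in this report. The point is only that a read() style call gets a return value to inspect, while a mapped access has no return path and the process receives a signal instead. The child runs in a separate process so that the signal does not kill the caller.

```python
import signal
import subprocess
import sys

# Code run in a child process, because the mapped access is expected to kill it.
CHILD = r"""
import mmap, os, tempfile
fd, path = tempfile.mkstemp()
os.unlink(path)
os.ftruncate(fd, mmap.PAGESIZE)
mm = mmap.mmap(fd, mmap.PAGESIZE, prot=mmap.PROT_READ)  # file-backed mapping
os.ftruncate(fd, 0)                    # backing store vanishes under the mapping
print(os.pread(fd, 1, 0), flush=True)  # read path: empty result, no signal
mm[0]                                  # mapped path: fault that cannot be satisfied
"""


def run_child():
    """Run CHILD and return (returncode, stdout); a negative returncode
    means the child was killed by that signal number."""
    proc = subprocess.run([sys.executable, "-c", CHILD],
                          capture_output=True, text=True)
    return proc.returncode, proc.stdout.strip()


if __name__ == "__main__":
    code, out = run_child()
    print("pread returned:", out)
    if code < 0:
        print("child killed by", signal.Signals(-code).name)
    else:
        print("child exited with", code)
```

A program that wants an error code instead of a signal has to either avoid mmap for such media or install a signal handler around the mapped access.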

Endre/Ondras

Comment 2 Ondrej Vasik 2013-01-28 10:45:34 UTC

Moving to kernel based on reporter's comment #1

Comment 3 Endre "Hrebicek" Balint-Nagy 2013-01-30 00:28:21 UTC

Some more background. On my Acer Aspire, a 3.6.7-acer kernel (an official 3.6.7 kernel with a cut-down config for this machine) produced a different symptom:
sd 5:0:0:0: [sdb]
Add. Sense: Medium not present
sd 5:0:0:0: [sdb] CDB: 
Read(10): 28 00 00 01 e4 60 00 00 08 00
EXT3-fs error (device sdb2): ext3_get_inode_loc: unable to read inode block - inode=404, block=140
sd 5:0:0:0: [sdb] Device not ready
sd 5:0:0:0: [sdb]  
Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 5:0:0:0: [sdb]  
Sense Key : Not Ready [current] 
sd 5:0:0:0: [sdb]  
Add. Sense: Medium not present
In the end all 3 partitions remained mounted while the media disappeared.
That is somewhat more sensible behavior.
After clearing the 3rd partition and cleaning the /etc/shadow sitting in the second one, I can pass the faulty hardware to any developer located in Brno (Czech Republic) to study the phenomenon.
I think reporting "medium not present" as a capacity change to 0 fails to distinguish between semantically very different error conditions; I bet some corrections are necessary.

Endre/Ondraš.

P.S.
The offending kernel is 3.7.4-204.fc18.x86_64 and its cut-down version 3.7.4-pinky.

Comment 4 Endre "Hrebicek" Balint-Nagy 2013-01-31 12:50:42 UTC

I did the cleaning, so I am ready to hand over the faulty SDHC card if you promise not to open the package, by which I mean disassembling the SDHC card. (As you know, it is almost impossible to clean an SDHC card completely, so some private data is surely still present.)
Endre/Ondras.

Comment 5 Endre "Hrebicek" Balint-Nagy 2013-02-02 01:15:10 UTC

I am going home to Hungary; I will come back to Brno late in the evening on 6th February.
Cheers till then!

Endre/Ondraš

P.S. Gnome3 drives me mad. I hope MATE gets into shape by the 7th.

Comment 6 Endre "Hrebicek" Balint-Nagy 2013-02-22 02:30:13 UTC

Since using a better SDHC card reader is always a solution, it is better not to waste time on such simplistic hardware that handles the error poorly.
WORKAROUND: use a better SDHC card reader.
(I know closing a bug is normally not my role, but spending more human effort on this is unnecessary.)
