Bug 904703

Summary: cpio died on signal 11 when doing business on faulty SDHC card.
Product: [Fedora] Fedora
Component: kernel
Version: 18
Hardware: x86_64
OS: Linux
Status: CLOSED WONTFIX
Severity: unspecified
Priority: unspecified
Reporter: Endre "Hrebicek" Balint-Nagy <endre>
Assignee: Kernel Maintainer List <kernel-maint>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: benl, gansalmon, itamar, jonathan, kdudka, kernel-maint, madhu.chinakonda, ovasik, praiskup
Doc Type: Bug Fix
Type: Bug
Last Closed: 2013-02-22 02:30:13 UTC
Bug Blocks: 911102
Attachments:
The abrt directory created from coredump. (Flags: none)

Description Endre "Hrebicek" Balint-Nagy 2013-01-27 06:26:16 UTC
Created attachment 688302
The abrt directory created from coredump.

Description of problem:
When the system "unplugged" my faulty Kingston 4GB class 4 Mobility kit, cpio died as in BZ 618526. (BTW, stupid abrt was unwilling to submit the bug because of that closed F13 bug.)
A find command was running in parallel, causing the disconnect.
The "unplugging" happened this way:
[701422.408693] EXT4-fs (sdd2): previous I/O error to superblock detected
[701422.408709] EXT4-fs error (device sdd2): ext4_readdir:172: inode #11: comm find: path /media/piroot/lost+found: directory contains a hole at offset 0
[701422.546536] sdd: detected capacity change from 3904897024 to 0
[701545.623478] usb 6-2.4: USB disconnect, device number 34
[701547.333370] usb 6-2.4: new full-speed USB device number 35 using uhci_hcd
The device was lost after it reported zero size.
Unplugging and re-plugging brought the device back.
I suspect cpio dies when a read fails because the disk size has dropped to 0.

Version-Release number of selected component (if applicable):
cpio-2.11-12.fc18.x86_64

How reproducible:
Almost always.
Steps to Reproduce:
1. Repeat the find|cpio pipeline with a problematic disk-like device or a dm-flakey device. (I will sketch a reproducer later.)
  
Actual results:
cpio is killed by signal 11 (SIGSEGV).

Expected results:
An I/O error is reported.
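
To illustrate what "an I/O error is reported" could look like, here is a minimal sketch in Python (not cpio's actual C code). It assumes the data is fetched with read(2)-style calls, where a failing device shows up as an error return (typically EIO) that the program can print before exiting. The names copy_stream, read_chunk, write_chunk and describe are hypothetical and not taken from cpio.

```python
import errno
import os


def copy_stream(read_chunk, write_chunk, chunk_size=65536):
    """Copy until EOF or the first read error.

    read_chunk(n) returns up to n bytes (b"" at EOF) and may raise OSError.
    Returns (bytes_copied, error), where error is None on clean EOF.
    """
    copied = 0
    while True:
        try:
            data = read_chunk(chunk_size)
        except OSError as exc:
            # The failure arrives as an ordinary error, not as a signal.
            return copied, exc
        if not data:
            return copied, None
        write_chunk(data)
        copied += len(data)


def describe(error):
    """Format an OSError roughly the way a command-line tool might."""
    name = errno.errorcode.get(error.errno, "unknown")
    return f"read error: {name}: {os.strerror(error.errno)}"
```

With a real device one could pass the read method of a file opened with open(path, "rb", buffering=0) as read_chunk and sys.stdout.buffer.write as write_chunk, then print describe(error) and exit non-zero when error is not None.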

Additional info:

Comment 1 Endre "Hrebicek" Balint-Nagy 2013-01-27 08:57:30 UTC
After some pondering and playing with the dm-flakey module, I decided it is not cpio's fault if an mmap-ed page abruptly disappears (the most likely scenario behind the problem). This is a kernel issue, and the possible error handling scenarios should be discussed with the kernel developers.
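
The general mechanism can be sketched with an ordinary file instead of the faulty card. This is an analogy under stated assumptions, not a reproduction of the cpio crash: truncating the file to 0 bytes stands in for the device reporting zero capacity, and whether cpio 2.11 actually reads through mmap is only the guess made above. On Linux I would expect the child process to be killed by SIGBUS here rather than by signal 11, but the point is the same: the failure arrives as a signal, not as an error return the program can check. CHILD and run_demo are names made up for this sketch.

```python
import mmap
import os
import signal
import subprocess
import sys
import tempfile

# Child program: map one page of a file, shrink the file to 0 bytes,
# then touch the mapped page, which no longer has anything behind it.
CHILD = """
import mmap, os, sys
fd = os.open(sys.argv[1], os.O_RDWR)
mm = mmap.mmap(fd, mmap.PAGESIZE, access=mmap.ACCESS_READ)
os.ftruncate(fd, 0)   # stand-in for the device shrinking to size 0
print(mm[0])          # touching the page faults; no Python exception is raised
"""


def run_demo():
    """Run the child and return its exit status.

    A negative value means the child was killed by that signal number.
    """
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"x" * mmap.PAGESIZE)
        path = f.name
    try:
        result = subprocess.run(
            [sys.executable, "-c", CHILD, path],
            stdout=subprocess.DEVNULL,
        )
        return result.returncode
    finally:
        os.unlink(path)


if __name__ == "__main__":
    status = run_demo()
    if status < 0:
        print("child killed by", signal.Signals(-status).name)
    else:
        print("child exited with status", status)
```

The access runs in a child process so that the demonstration itself survives and can report how the child ended.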

Endre/Ondras

Comment 2 Ondrej Vasik 2013-01-28 10:45:34 UTC
Moving to kernel based on the reporter's comment #1.

Comment 3 Endre "Hrebicek" Balint-Nagy 2013-01-30 00:28:21 UTC
Some more background. On my Acer Aspire I run a 3.6.7-acer kernel, that is, an official 3.6.7 kernel with a cut-down config for this machine. It produced a different symptom:
sd 5:0:0:0: [sdb]
Add. Sense: Medium not present
sd 5:0:0:0: [sdb] CDB: 
Read(10): 28 00 00 01 e4 60 00 00 08 00
EXT3-fs error (device sdb2): ext3_get_inode_loc: unable to read inode block - inode=404, block=140
sd 5:0:0:0: [sdb] Device not ready
sd 5:0:0:0: [sdb]  
Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 5:0:0:0: [sdb]  
Sense Key : Not Ready [current] 
sd 5:0:0:0: [sdb]  
Add. Sense: Medium not present
In the end all 3 partitions remained mounted while the media disappeared.
That is somewhat more sensible behavior.
After clearing the 3rd partition and cleaning the /etc/shadow sitting in the second one, I can pass the faulty hardware to any developer located in Brno (Czech Republic) to study the phenomenon.
I think reporting "medium not present" as a size change to 0 fails to distinguish between semantically very different error conditions; I bet some corrections are necessary.

Endre/Ondraš.

P.S.
The offending kernels are 3.7.4-204.fc18.x86_64 and its cut-down version, 3.7.4-pinky.

Comment 4 Endre "Hrebicek" Balint-Nagy 2013-01-31 12:50:42 UTC
I did the cleaning, so I am ready to hand over the faulty SDHC card if you promise not to open the package, by which I mean disassembling the SDHC card. (As you know, it is almost impossible to clean an SDHC card completely, so some private data is surely still present.)
Endre/Ondras.

Comment 5 Endre "Hrebicek" Balint-Nagy 2013-02-02 01:15:10 UTC
I am going home to Hungary and will be back in Brno late in the evening of 6th February.
Cheers till then!

Endre/Ondraš

P.S. Gnome3 drives me mad. I hope MATE gets into shape by the 7th.

Comment 6 Endre "Hrebicek" Balint-Nagy 2013-02-22 02:30:13 UTC
Since using a better SDHC card reader is always a solution, it is better not to waste time on such simplistic hardware that handles the error poorly.
WORKAROUND: use a better SDHC card reader.
(I know it is normally not my role to close a bug, but spending more human capacity on this is unnecessary.)