Bug 1024002

Summary: SSD has been wiped after kernel update
Product: [Fedora] Fedora
Component: kernel
Version: 20
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Severity: urgent
Priority: unspecified
Reporter: Mykola Dvornik <mykola.dvornik>
Assignee: Kernel Maintainer List <kernel-maint>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: collura, gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda
Fixed In Version: kernel-3.12.5-302.fc20
Doc Type: Bug Fix
Type: Bug
Last Closed: 2013-12-18 11:02:57 UTC

Description Mykola Dvornik 2013-10-28 14:44:52 UTC
Description of problem:

SSD GPT got wiped after kernel update.

Version-Release number of selected component (if applicable):

Fedora 20 + rawhide kernel 3.12rc5 (NODEBUG)

How reproducible:

Not sure.

Steps to Reproduce:

1. Install Fedora 20 with stock kernel (3.11)
2. Install rawhide kernel from NODEBUG repo.
3. Reboot

Actual results:

SSD GPT got wiped.

Expected results:

No corruption.

Additional info:

UEFI, no Secure Boot, Crucial M500 SSD, XFS filesystem for everything except the EFI partition. The same kernel works fine on a conventional HDD with MBR and ext4 partitions.

Comment 1 Josh Boyer 2013-10-28 15:02:44 UTC
Can you define at what point it got wiped?  Did it happen before you rebooted into the new kernel, or after?

Comment 2 Mykola Dvornik 2013-10-28 15:28:26 UTC
I updated the kernel and continued working on the machine.

Then I rebooted the machine once and everything seemed to be OK.

After the next reboot I got a blank screen right after POST (the SSD had been wiped). So it probably happened while syncing during shutdown.

The XFS partitions were mounted with 'default,noatime,discard' options.
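
For anyone checking whether their own setup uses the same online discard option, here is a minimal sketch that scans text in /proc/mounts format for entries mounted with 'discard'. The function name mounts_with_discard is made up for this example, and it only matches the plain 'discard' option, not variants such as 'discard=...'.

```python
def mounts_with_discard(mounts_text):
    """Return (device, mount_point, fstype) tuples for entries whose
    mount options include the plain 'discard' flag.

    Expects text in /proc/mounts format:
    device mount_point fstype options dump pass
    """
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, fstype, options = fields[:4]
        # Options are comma-separated; compare whole tokens so that
        # e.g. 'nodiscard' is not mistaken for 'discard'.
        if "discard" in options.split(","):
            hits.append((device, mount_point, fstype))
    return hits
```

On a running system the input could be obtained with open("/proc/mounts").read(); an empty result means no filesystem is currently mounted with online discard.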

Comment 3 Mykola Dvornik 2013-11-09 12:31:51 UTC
John, 

I have reproduced the bug again! It was the same routine, but this time the disk layout was ext4 + MBR rather than XFS + GPT. So I would not expect the problem to be at the file system level, but rather in something SSD-specific, e.g. TRIM support. SandForce-based SSDs do not seem to be affected (I have an Intel 520 at work on the same kernel). So the bottom line is that Marvell-based SSDs get wiped by 3.12. At the moment I am limited to communicating from an iPod, so I would be grateful if you could bring this bug to the attention of the kernel developers and the Linux community.

Thanks in advance.

Mykola

Comment 4 Mykola Dvornik 2013-11-13 15:16:41 UTC
Just wondering, was the 'Queued TRIM' support ever tested?

Comment 5 Mykola Dvornik 2013-12-17 18:42:33 UTC
Josh,

Marc Carino committed a workaround for the affected SSDs. It has already been merged into libata/for-3.13-fixes:

https://git.kernel.org/cgit/linux/kernel/git/tj/libata.git/commit/?h=for-3.13-fixes&id=f78dea064c5f7de07de4912a6e5136dbc443d614 

I have the impression that you want to push 3.12 soon. Should that be held back until the workaround is backported to stable?

Comment 6 Josh Boyer 2013-12-17 18:50:05 UTC
We can add that patch in Fedora before it hits upstream.  We'll try to make sure the patch is included before 3.12 hits stable updates.

Comment 7 Josh Boyer 2013-12-17 18:53:06 UTC
Patch applied.

Comment 8 Fedora Update System 2013-12-18 04:49:15 UTC
kernel-3.12.5-302.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/FEDORA-2013-23445/kernel-3.12.5-302.fc20

Comment 9 Mykola Dvornik 2013-12-18 11:01:40 UTC
With this version I cannot reproduce the problem any more. I guess the issue should be reopened once the MU04 firmware is released for the Crucial/Micron M500: if MU04 fixes queued TRIM support, then Marc's patches should be reverted.
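
To illustrate the kind of model-plus-firmware check that would let such a workaround be narrowed once fixed firmware exists, here is a rough sketch. The rule list and the function name queued_trim_suspect are assumptions made up for this example and are not the patterns used in Marc's patch; the model and firmware strings are assumed to come from a tool such as `smartctl -i`.

```python
from fnmatch import fnmatchcase

# Illustrative rules only: (model glob, firmware glob or None for "any").
# These patterns are assumptions for this sketch, not the kernel's list.
EXAMPLE_RULES = [
    ("Crucial_CT*M500*", None),
    ("Micron_M500*", None),
]


def queued_trim_suspect(model, firmware, rules=EXAMPLE_RULES):
    """Return True if the drive identity matches any rule.

    A rule matches when the model glob matches and the firmware glob
    is None (any revision) or matches the firmware string.
    """
    model = model.strip()
    firmware = firmware.strip()
    for model_glob, fw_glob in rules:
        if not fnmatchcase(model, model_glob):
            continue
        if fw_glob is None or fnmatchcase(firmware, fw_glob):
            return True
    return False
```

If MU04 turned out to fix queued TRIM, a rule could be restricted to older revisions, for example ("Crucial_CT*M500*", "MU0[1-3]"), so that updated drives are no longer flagged.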

Comment 10 Fedora Update System 2013-12-21 02:23:11 UTC
kernel-3.12.5-302.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.