Bug 1271863

Summary: Direct-Access ATA SSD is intolerably slow on MacBookPro11,3
Product: [Fedora] Fedora Reporter: Alex Markley <alex>
Component: kernelAssignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED EOL QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 22CC: fs, gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda, mchehab, stephen
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-07-19 19:57:57 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Thunderbird Test iostat (30 second intervals)
none
Thunderbird Test iotop (30 second intervals)
none
bonnie++ benchmark test output (5 iterations)
none
bonnie++ benchmark test iostat (30 second interval)
none
Testing Kernel 4.2.3-200.fc22.x86_64 none

Description Alex Markley 2015-10-14 22:48:37 UTC
Created attachment 1083041 [details]
Thunderbird Test iostat (30 second intervals)

Description of problem:

The hardware is a macbookpro11,3. (Also known as: MacBook Pro (Retina, 15-inch, Mid 2014)) This unit has a 1TB SSD connected to the PCIe bus with a maximum theoretical throughput of 8gbps.

On Fedora 22, fully updated to the latest stable kernel, normal desktop operations are very very slow. For example, it takes over 15 seconds for Rhythmbox to start up. (In my previous experience, applications like Rhythmbox would start almost instantly.)

Some of my basic troubleshooting has pointed to very slow I/O on the SSD as the culprit.


Version-Release number of selected component (if applicable):

----------------
[alex@vertex ~]$ uname -a
Linux vertex.malexmedia.net 4.1.10-200.fc22.x86_64 #1 SMP Mon Oct 5 14:22:49 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
[alex@vertex ~]$
[alex@vertex ~]$ cat /etc/redhat-release
Fedora release 22 (Twenty Two)
[alex@vertex ~]$
----------------


How reproducible:

100% (every boot)


Steps to Reproduce:
1. Boot / Login
2. Perform action requiring I/O (including read-heavy from /)
3. Observe high IO Wait in iostat and top.


Actual results:

Magical and revolutionary Apple hardware fails to perform adequately.


Expected results:

Magical and revolutionary Apple hardware should sail like the wind.


Additional info:


Here are some relevant lines from my kernel's dmesg.
----------------
[root@vertex ~]# dmesg | egrep 'ata1|scsi|sda1'
[    0.500754] scsi host0: ahci
[    0.500801] ata1: SATA max UDMA/133 abar m8192@0xc1b00000 port 0xc1b00100 irq 16
[    0.807997] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    0.808303] ata1.00: unexpected _GTF length (8)
[    0.808453] ata1.00: ATA-8: APPLE SSD SM1024F, UXM6JA1Q, max UDMA/133
[    0.808456] ata1.00: 1954210120 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
[    0.808771] ata1.00: unexpected _GTF length (8)
[    0.808923] ata1.00: configured for UDMA/133
[    0.809078] scsi 0:0:0:0: Direct-Access     ATA      APPLE SSD SM1024 JA1Q PQ: 0 ANSI: 5
[    0.809347] sd 0:0:0:0: Attached scsi generic sg0 type 0
[    0.811715]  sda: sda1 sda2 sda3 sda4 sda5 sda6 sda7
[    1.293776] scsi host1: usb-storage 2-4:1.0
[    2.298048] scsi 1:0:0:0: Direct-Access     APPLE    SD Card Reader   3.00 PQ: 0 ANSI: 6
[    2.298319] sd 1:0:0:0: Attached scsi generic sg1 type 0
[root@vertex ~]#
----------------


Here's the lspci output inspecting the SSD attached to my system.
----------------
[root@vertex ~]# lspci -vv -s 05:00.0
05:00.0 SATA controller: Samsung Electronics Co Ltd Apple PCIe SSD (rev 01) (prog-if 01 [AHCI 1.0])
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 256 bytes
	Interrupt: pin A routed to IRQ 16
	Region 5: Memory at c1b00000 (32-bit, non-prefetchable) [size=8K]
	Expansion ROM at c1b10000 [disabled] [size=64K]
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [50] MSI: Enable- Count=1/2 Maskable+ 64bit+
		Address: 0000000000000000  Data: 0000
		Masking: 00000000  Pending: 00000000
	Capabilities: [70] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend-
		LnkCap:	Port #0, Speed 5GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <4us, L1 <64us
			ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR+, OBFF Not Supported
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled
		LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [d0] Vital Product Data
pcilib: sysfs_read_vpd: read failed: Connection timed out
		Not readable
	Capabilities: [100 v2] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
	Capabilities: [140 v1] Device Serial Number 00-00-00-00-00-00-00-00
	Capabilities: [150 v1] Power Budgeting <?>
	Capabilities: [160 v1] Latency Tolerance Reporting
		Max snoop latency: 71680ns
		Max no snoop latency: 71680ns
	Kernel driver in use: ahci

[root@vertex ~]#
----------------


This is a snippet of an iostat report showing 30 seconds of data accumulated during a desktop I/O test.

During the test, the system was booted and logged in to Gnome 3. The only running application was Thunderbird, and it was downloading my message archive in the background. (My network link was not saturated.)

You can see the latency is very high here. And what you can't tell from this bug report is that this I/O activity slows my entire desktop to a crawl. In order to have a usable desktop, I have to put Thunderbird in offline mode or kill it altogether! (Again, "usable" is a matter of some debate, as that simply brings us down to a baseline of merely tens of seconds to open basic system applications, rather than minutes.)

The rest of the data from the Thunderbird test is attached to this bugzilla.
----------------
10/14/2015 01:54:35 PM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.67    0.00    0.25   22.84    0.00   76.24

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
fedora--lvm-root     0.00     0.00    0.00    0.17     0.00     0.67     8.00     0.16  963.40    0.00  963.40 482.20   8.04
sda7              0.00     0.00    0.00    8.83     0.00   289.60    65.57     1.61  186.53    0.00  186.53  94.23  83.24
 ALL              0.00     0.00    0.00    9.00     0.00   290.27    64.50     1.77  200.91    0.00  200.91 101.42  30.43
----------------


Since my /home partition (sda7, ZFS) could have been skewing the results, and since Gnome3 isn't exactly lightweight, I also ran a couple of benchmarking tests with bonnie++ against the root partition (LVM, ext4) at the command line with NO graphical desktop running.

The results of those tests are also attached.


Please let me know what other information I can gather to troubleshoot this problem? Are there any kernel versions/patches I can test?

Thanks!
--Alex Markley

Comment 1 Alex Markley 2015-10-14 22:49:40 UTC
Created attachment 1083042 [details]
Thunderbird Test iotop (30 second intervals)

Comment 2 Alex Markley 2015-10-14 22:50:27 UTC
Created attachment 1083043 [details]
bonnie++ benchmark test output (5 iterations)

Comment 3 Alex Markley 2015-10-14 22:51:25 UTC
Created attachment 1083044 [details]
bonnie++ benchmark test iostat (30 second interval)

Comment 4 Alex Markley 2015-10-16 16:13:07 UTC
I can confirm this problem is still occurring with 4.2.3-200.fc22.x86_64 tested today. (See new attachment...)

Comment 5 Alex Markley 2015-10-16 16:15:46 UTC
Created attachment 1083759 [details]
Testing Kernel 4.2.3-200.fc22.x86_64

Showing high wait times and low throughput while launching Rhythmbox, Pidgin, and Thunderbird.

Comment 6 Frank Sweetser 2015-11-05 16:58:49 UTC
I'm having similar performance problem.  Dell Optiplex 9010 workstation with a pair of Samsung 840 Pro SSDs in a software raid 1 configuration.  If I'm running under any Fedora 4.2 kernel, my write performance starts off normal, but within five minutes or so of booting, drops down to no more than 5 - 10 MB/s.

If I revert to a 4.1 kernel, everything goes back to normal (> 250MB/s).

Comment 7 Fedora End Of Life 2016-07-19 19:57:57 UTC
Fedora 22 changed to end-of-life (EOL) status on 2016-07-19. Fedora 22 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.