Description of problem: Using NVMe SSD storage on AWS i3 instances produces I/O errors and file corruption under load. This appears closely related to https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1668129, which identifies CONFIG_XEN_BALLOON=y as the cause of the issue.

Version-Release number of selected component (if applicable):
4.11.5-200.fc25.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Spin up any i3 AWS instance.
2. Set up /dev/nvme0n1 with LUKS, LVM, and XFS/EXT4 and mount it (see the sketch at the end of this comment).
3. Put the disk under a large amount of I/O.

Actual results:
[ 2110.011229] EXT4-fs warning (device dm-1): ext4_end_bio:313: I/O error -5 writing to inode 12 (offset 2298478592 size 8388608 starting block 637696)
[ 2110.011259] Buffer I/O error on device dm-1, logical block 637696
[ 2110.017068] Buffer I/O error on device dm-1, logical block 637697
[ 2110.021814] Buffer I/O error on device dm-1, logical block 637698
[ 2110.026815] Buffer I/O error on device dm-1, logical block 637699
[ 2110.031674] Buffer I/O error on device dm-1, logical block 637700
[ 2110.036626] Buffer I/O error on device dm-1, logical block 637701
[ 2110.041988] Buffer I/O error on device dm-1, logical block 637702
[ 2110.046759] Buffer I/O error on device dm-1, logical block 637703
[ 2110.051321] Buffer I/O error on device dm-1, logical block 637704
[ 2110.056081] Buffer I/O error on device dm-1, logical block 637705
[ 2110.678357] blk_update_request: I/O error, dev nvme0n1, sector 5197504

Expected results:
Fast I/O without errors.

Additional info:
Setting memhp_default_state=offline in grub.conf, as detailed in the Ubuntu bug report, does not seem to correct the issue.
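For reference, a minimal sketch of the disk setup in step 2. The device name is from the report; the mapper name, VG/LV names, mount point, and the load generator in step 3 are illustrative assumptions, not the reporter's exact commands:

# Step 2 sketch; every name after /dev/nvme0n1 is an assumption.
cryptsetup luksFormat /dev/nvme0n1
cryptsetup luksOpen /dev/nvme0n1 nvme_crypt
pvcreate /dev/mapper/nvme_crypt
vgcreate vg_nvme /dev/mapper/nvme_crypt
lvcreate -l 100%FREE -n lv_data vg_nvme
mkfs.xfs /dev/vg_nvme/lv_data
mount /dev/vg_nvme/lv_data /mnt
# Step 3: any sustained writer will do, e.g. ~25G of zeros:
dd if=/dev/zero of=/mnt/load.img bs=1M count=25600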
I built a kernel (4.11.6-300.nvme.fc26.x86_64) with CONFIG_XEN_BALLOON disabled and ran the I/O tests on an i3.xlarge instance with Fedora 26 installed. It has the same drive setup as before (LUKS, LVM, and XFS), and I am no longer seeing the "blk_update_request: I/O error, dev nvme0n1" errors. The kernel is up on copr (https://copr.fedorainfracloud.org/coprs/jdoss/kernel/) for anyone else who encounters this issue on AWS i3 instances, while the kernel team figures out how to work around it without disabling CONFIG_XEN_BALLOON entirely by default.
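For anyone rebuilding their own kernel rather than using the copr, the change relative to the stock config is just flipping the option off before building (a sketch; where the .config lives depends on how you build Fedora kernels):

# In the kernel .config used for the build:
# CONFIG_XEN_BALLOON is not set

Options that depend on it (e.g. CONFIG_XEN_BALLOON_MEMORY_HOTPLUG) get dropped along with it.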
Any follow-up on this?
Just wondering if there is any update on this? We are going to start rolling out i3 instances and we would rather not have to roll a custom kernel to get the NVMe SSD storage working correctly.
From msw in the launchpad bug:

"Yes, ballooning has been a constant source of problems which is why it is disabled in Amazon Linux AMI. We do not currently support DMA to/from guest physical addresses outside of the E820 map for ENA networking or NVMe storage interfaces. This effectively means that ballooning needs to be disabled, or perhaps some changes would need to be made in the Xen swiotlb code to bounce data that resides in guest physical addresses that are outside of the E820 map."

Matt tends to be worth trusting on these issues. Given that AWS is the largest target for our xen guests, I would be more than happy to turn this off, but I would like to hear myoung's thoughts first as he is the maintainer of xen.
The problem is that disabling CONFIG_XEN_BALLOON stops ballooning on Dom0 as well as DomU, and that is something I personally find useful, so I imagine other people do as well. Ubuntu doesn't have this problem: they ship a separate AWS kernel, so they can disable CONFIG_XEN_BALLOON there while leaving it enabled in their standard kernel.
I think it would make sense to try to address the issue upstream, and I'd like to volunteer. Was it ever reported on the xen-devel mailing list? Are there any additional details on why GPAs outside of the E820 map get in the way of NVMe?
I am not sure about an upstream discussion; I was just reading the launchpad bug. Perhaps this can be worked around in userspace in the meantime?
It seems there was a discussion upstream: https://lists.xen.org/archives/html/xen-devel/2017-03/msg03020.html

The following kernel commit may be related: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=96edd61dcf44362d3ef0bed1a5361e0ac7886a63

It is present upstream since v4.13. Do we know if the issue is still reproducible?
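A quick way to check whether a given kernel still has ballooning built in, and whether balloon-added memory is being onlined, is via the config and sysfs (a sketch; the memory-state check assumes the balloon driver has actually hotplugged memory):

grep XEN_BALLOON /boot/config-$(uname -r)
# With the v4.13 change, newly added memory sections should come up
# "offline" instead of being onlined automatically:
grep . /sys/devices/system/memory/memory*/state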
This seems to be fixed with 4.13. I grabbed https://bodhi.fedoraproject.org/updates/FEDORA-2017-061a577fe5 and installed it on an AWS i3.xlarge with Fedora 26. I then re-set up /dev/nvme0n1 with LUKS, LVM, and XFS and mounted it.

I ran three tests:
1) wrote a bunch of zeros to a file with dd
2) used bonnie++ as detailed in the launchpad bug
3) used sysbench

It was pretty easy to generate the "Buffer I/O error on device dm-1" errors when I originally opened this bug. I can't reproduce them now with 4.13.0-1.fc27.x86_64. Just need 4.13 on FC26 :)

Maybe there are better ways to test I/O these days, but I didn't spend much time looking. The tests used are detailed below for the record.

[root@nvme0n1-testing elasticsearch]# dd if=/dev/zero of=teesting.img bs=4k iflag=fullblock,count_bytes count=25G

[root@nvme0n1-testing elasticsearch]# bonnie++ -d /var/lib/elasticsearch -r 1000 -u fedora
Using uid:1000, gid:1000.
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version  1.97       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
nvme0n1-testing  2G   563  99 193130  18 185598  13  2455  99 2926166  99 +++++ +++
Latency             21616us    3068ms     197ms    4303us      45us    4336us
Version  1.97       ------Sequential Create------ --------Random Create--------
nvme0n1-testing     -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 19415  94 +++++ +++ +++++ +++ 20135  96 +++++ +++ 31172  92
Latency               288us     106us     265us     530us      16us     979us
1.97,1.97,nvme0n1-testing,1,1504825975,2G,,563,99,193130,18,185598,13,2455,99,2926166,99,+++++,+++,16,,,,,19415,94,+++++,+++,+++++,+++,20135,96,+++++,+++,31172,92,21616us,3068ms,197ms,4303us,45us,4336us,288us,106us,265us,530us,16us,979us

[root@nvme0n1-testing elasticsearch]# sysbench --test=fileio --file-total-size=150G --file-test-mode=rndrw --max-time=300 --max-requests=0 run
sysbench 1.0.6 (using system LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time

Extra file open flags: 0
128 files, 1.1719GiB each
150GiB total file size
Block size 16KiB
Number of IO requests: 0
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random r/w test
Initializing worker threads...

Threads started!

File operations:
    reads/s:                      4174.28
    writes/s:                     2782.85
    fsyncs/s:                     8904.77

Throughput:
    read, MiB/s:                  65.22
    written, MiB/s:               43.48

General statistics:
    total time:                          300.0013s
    total number of events:              4758673

Latency (ms):
         min:                                  0.00
         avg:                                  0.06
         max:                                 25.70
         95th percentile:                      0.16
         sum:                             296169.92

Threads fairness:
    events (avg/stddev):           4758673.0000/0.00
    execution time (avg/stddev):   296.1699/0.00
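One note for anyone repeating the sysbench test above: the fileio mode works on pre-created files, so a prepare pass with the same size options has to run before the run step shown in the transcript, and cleanup removes the files afterwards. A sketch:

sysbench --test=fileio --file-total-size=150G prepare
# ... run step as shown above ...
sysbench --test=fileio --file-total-size=150G cleanup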
(In reply to Joe Doss from comment #9)
> This seems to be fixed with 4.13. I grabbed
> https://bodhi.fedoraproject.org/updates/FEDORA-2017-061a577fe5 and installed
> it on an AWS i3.xlarge with Fedora 26. I then re-set up /dev/nvme0n1 with
> LUKS, LVM, and XFS and mounted it.
>
> I ran three tests:
> 1) wrote a bunch of zeros to a file with dd
> 2) used bonnie++ as detailed in the launchpad bug
> 3) used sysbench
>
> It was pretty easy to generate the "Buffer I/O error on device dm-1" errors
> when I originally opened this bug. I can't reproduce them now with
> 4.13.0-1.fc27.x86_64. Just need 4.13 on FC26 :)

The patch looks like it applies cleanly to 4.12.11, so it should be possible to add it to the current F26 kernels.
Thanks for looking into this. I already did the builds for 4.12.11, but I will make sure it hits the next update.
kernel-4.12.12-300.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2017-e00b28acd5
kernel-4.12.12-200.fc25 has been submitted as an update to Fedora 25. https://bodhi.fedoraproject.org/updates/FEDORA-2017-025ef758b0
kernel-4.12.12-300.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-e00b28acd5
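For reference, one way to pull this build from updates-testing (per the QA wiki linked above; the --advisory flag assumes a dnf version that supports it):

sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2017-e00b28acd5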
kernel-4.12.12-200.fc25 has been pushed to the Fedora 25 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-025ef758b0
kernel-4.12.12-300.fc26 has been pushed to the Fedora 26 stable repository. If problems still persist, please make note of it in this bug report.
kernel-4.12.13-200.fc25 has been submitted as an update to Fedora 25. https://bodhi.fedoraproject.org/updates/FEDORA-2017-e07d7fb18e
kernel-4.12.13-200.fc25 has been pushed to the Fedora 25 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-e07d7fb18e
kernel-4.12.13-200.fc25 has been pushed to the Fedora 25 stable repository. If problems still persist, please make note of it in this bug report.