Description of problem: Using NVMe SSD storage on AWS i3 instances produces I/O errors and file corruption under load. This appears closely related to https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1668129, which identifies CONFIG_XEN_BALLOON=y as the cause of the issue.

Version-Release number of selected component (if applicable):
4.11.5-200.fc25.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Spin up any i3 AWS instance.
2. Set up /dev/nvme0n1 with LUKS, LVM, and XFS/EXT4 and mount it (see the sketch at the end of this comment).
3. Put the disk under a large amount of I/O.

Actual results:
[ 2110.011229] EXT4-fs warning (device dm-1): ext4_end_bio:313: I/O error -5 writing to inode 12 (offset 2298478592 size 8388608 starting block 637696)
[ 2110.011259] Buffer I/O error on device dm-1, logical block 637696
[ 2110.017068] Buffer I/O error on device dm-1, logical block 637697
[ 2110.021814] Buffer I/O error on device dm-1, logical block 637698
[ 2110.026815] Buffer I/O error on device dm-1, logical block 637699
[ 2110.031674] Buffer I/O error on device dm-1, logical block 637700
[ 2110.036626] Buffer I/O error on device dm-1, logical block 637701
[ 2110.041988] Buffer I/O error on device dm-1, logical block 637702
[ 2110.046759] Buffer I/O error on device dm-1, logical block 637703
[ 2110.051321] Buffer I/O error on device dm-1, logical block 637704
[ 2110.056081] Buffer I/O error on device dm-1, logical block 637705
[ 2110.678357] blk_update_request: I/O error, dev nvme0n1, sector 5197504

Expected results:
Fast I/O without errors.

Additional info:
Setting memhp_default_state=offline in grub.conf, as detailed in the Ubuntu bug report, does not seem to correct the issue.
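For reference, a minimal sketch of the disk setup in step 2. The device name is from the report; the mapper name, VG/LV names, mount point, and the load generator in step 3 are illustrative assumptions, not the reporter's exact commands:

# Step 2 sketch; every name after /dev/nvme0n1 is an assumption.
cryptsetup luksFormat /dev/nvme0n1
cryptsetup luksOpen /dev/nvme0n1 nvme_crypt
pvcreate /dev/mapper/nvme_crypt
vgcreate vg_nvme /dev/mapper/nvme_crypt
lvcreate -l 100%FREE -n lv_data vg_nvme
mkfs.xfs /dev/vg_nvme/lv_data
mount /dev/vg_nvme/lv_data /mnt
# Step 3: any sustained writer will do, e.g. ~25G of zeros:
dd if=/dev/zero of=/mnt/load.img bs=1M count=25600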
I built a kernel (4.11.6-300.nvme.fc26.x86_64) with CONFIG_XEN_BALLOON disabled and ran the I/O tests on an i3.xlarge instance with Fedora 26 installed. It has the same drive setup as before (LUKS, LVM, and XFS), and I am no longer seeing the "blk_update_request: I/O error, dev nvme0n1" errors. The kernel is up on copr (https://copr.fedorainfracloud.org/coprs/jdoss/kernel/) for anyone else who encounters this issue on AWS i3 instances, while the kernel team figures out how to work around it without disabling CONFIG_XEN_BALLOON entirely by default.
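For anyone rebuilding their own kernel rather than using the copr, the change relative to the stock config is just flipping the option off before building (a sketch; where the .config lives depends on how you build Fedora kernels):

# In the kernel .config used for the build:
# CONFIG_XEN_BALLOON is not set

Options that depend on it (e.g. CONFIG_XEN_BALLOON_MEMORY_HOTPLUG) get dropped along with it.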
Any follow-up on this?
Just wondering if there is any update on this? We are going to start rolling out i3 instances and we would rather not have to roll a custom kernel to get the NVMe SSD storage working correctly.
From msw in the launchpad bug:

"Yes, ballooning has been a constant source of problems which is why it is disabled in Amazon Linux AMI. We do not currently support DMA to/from guest physical addresses outside of the E820 map for ENA networking or NVMe storage interfaces. This effectively means that ballooning needs to be disabled, or perhaps some changes would need to be made in the Xen swiotlb code to bounce data that resides in guest physical addresses that are outside of the E820 map."

Matt tends to be worth trusting on these issues. Given that AWS is the largest target for our xen guests, I would be more than happy to turn this off, but I would like to hear myoung's thoughts first as he is the maintainer of xen.
The problem is that disabling CONFIG_XEN_BALLOON stops ballooning on Dom0 as well as DomU, and that is something I personally find useful, so I imagine other people do as well. Ubuntu doesn't have this problem: they ship a separate AWS kernel, so they can disable CONFIG_XEN_BALLOON there while leaving it enabled in their standard kernel.
I think it would make sense to try to address the issue upstream, and I'd like to volunteer. Was it ever reported on the xen-devel mailing list? Are there any additional details on why GPAs outside of the E820 map get in the way of NVMe?
I am not sure about an upstream discussion; I was just reading the launchpad bug. Perhaps this can be worked around in userspace in the meantime?
It seems there was a discussion upstream: https://lists.xen.org/archives/html/xen-devel/2017-03/msg03020.html

The following kernel commit may be related: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=96edd61dcf44362d3ef0bed1a5361e0ac7886a63

It is present upstream since v4.13. Do we know if the issue is still reproducible?
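A quick way to check whether a given kernel still has ballooning built in, and whether balloon-added memory is being onlined, is via the config and sysfs (a sketch; the memory-state check assumes the balloon driver has actually hotplugged memory):

grep XEN_BALLOON /boot/config-$(uname -r)
# With the v4.13 change, newly added memory sections should come up
# "offline" instead of being onlined automatically:
grep . /sys/devices/system/memory/memory*/state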
This seems to be fixed with 4.13. I grabbed https://bodhi.fedoraproject.org/updates/FEDORA-2017-061a577fe5 and installed it on an AWS i3.xlarge with Fedora 26. I then re-set up /dev/nvme0n1 with LUKS, LVM, and XFS and mounted it.

I ran three tests:
1) wrote a bunch of zeros to a file with dd
2) used bonnie++ as detailed in the launchpad bug
3) used sysbench

It was pretty easy to generate the "Buffer I/O error on device dm-1" errors when I originally opened this bug. I can't reproduce them now with 4.13.0-1.fc27.x86_64. Just need 4.13 on FC26 :)

Maybe there are better ways to test I/O these days, but I didn't spend much time looking. The tests used are detailed below for the record.

[root@nvme0n1-testing elasticsearch]# dd if=/dev/zero of=teesting.img bs=4k iflag=fullblock,count_bytes count=25G

[root@nvme0n1-testing elasticsearch]# bonnie++ -d /var/lib/elasticsearch -r 1000 -u fedora
Using uid:1000, gid:1000.
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version  1.97       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
nvme0n1-testing  2G   563  99 193130  18 185598  13  2455  99 2926166  99 +++++ +++
Latency             21616us    3068ms     197ms    4303us      45us    4336us
Version  1.97       ------Sequential Create------ --------Random Create--------
nvme0n1-testing     -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 19415  94 +++++ +++ +++++ +++ 20135  96 +++++ +++ 31172  92
Latency               288us     106us     265us     530us      16us     979us
1.97,1.97,nvme0n1-testing,1,1504825975,2G,,563,99,193130,18,185598,13,2455,99,2926166,99,+++++,+++,16,,,,,19415,94,+++++,+++,+++++,+++,20135,96,+++++,+++,31172,92,21616us,3068ms,197ms,4303us,45us,4336us,288us,106us,265us,530us,16us,979us

[root@nvme0n1-testing elasticsearch]# sysbench --test=fileio --file-total-size=150G --file-test-mode=rndrw --max-time=300 --max-requests=0 run
sysbench 1.0.6 (using system LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time

Extra file open flags: 0
128 files, 1.1719GiB each
150GiB total file size
Block size 16KiB
Number of IO requests: 0
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random r/w test
Initializing worker threads...

Threads started!

File operations:
    reads/s:                      4174.28
    writes/s:                     2782.85
    fsyncs/s:                     8904.77

Throughput:
    read, MiB/s:                  65.22
    written, MiB/s:               43.48

General statistics:
    total time:                          300.0013s
    total number of events:              4758673

Latency (ms):
         min:                                  0.00
         avg:                                  0.06
         max:                                 25.70
         95th percentile:                      0.16
         sum:                             296169.92

Threads fairness:
    events (avg/stddev):           4758673.0000/0.00
    execution time (avg/stddev):   296.1699/0.00
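One note for anyone repeating the sysbench test above: the fileio mode works on pre-created files, so a prepare pass with the same size options has to run before the run step shown in the transcript, and cleanup removes the files afterwards. A sketch:

sysbench --test=fileio --file-total-size=150G prepare
# ... run step as shown above ...
sysbench --test=fileio --file-total-size=150G cleanup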
(In reply to Joe Doss from comment #9)
> This seems to be fixed with 4.13. I grabbed
> https://bodhi.fedoraproject.org/updates/FEDORA-2017-061a577fe5 and installed
> it on an AWS i3.xlarge with Fedora 26. I then re-set up /dev/nvme0n1 with
> LUKS, LVM, and XFS and mounted it.
>
> I ran three tests:
> 1) wrote a bunch of zeros to a file with dd
> 2) used bonnie++ as detailed in the launchpad bug
> 3) used sysbench
>
> It was pretty easy to generate the "Buffer I/O error on device dm-1" errors
> when I originally opened this bug. I can't reproduce them now with
> 4.13.0-1.fc27.x86_64. Just need 4.13 on FC26 :)

The patch looks like it applies cleanly to 4.12.11, so it should be possible to add it to the current F26 kernels.
Thanks for looking into this. I already did the builds for 4.12.11, but I will make sure it hits the next update.
kernel-4.12.12-300.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2017-e00b28acd5
kernel-4.12.12-200.fc25 has been submitted as an update to Fedora 25. https://bodhi.fedoraproject.org/updates/FEDORA-2017-025ef758b0
kernel-4.12.12-300.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-e00b28acd5
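For reference, one way to pull this build from updates-testing (per the QA wiki linked above; the --advisory flag assumes a dnf version that supports it):

sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2017-e00b28acd5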
kernel-4.12.12-200.fc25 has been pushed to the Fedora 25 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-025ef758b0
kernel-4.12.12-300.fc26 has been pushed to the Fedora 26 stable repository. If problems still persist, please make note of it in this bug report.
kernel-4.12.13-200.fc25 has been submitted as an update to Fedora 25. https://bodhi.fedoraproject.org/updates/FEDORA-2017-e07d7fb18e
kernel-4.12.13-200.fc25 has been pushed to the Fedora 25 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-e07d7fb18e
kernel-4.12.13-200.fc25 has been pushed to the Fedora 25 stable repository. If problems still persist, please make note of it in this bug report.