Bug 1731474

Summary: blk-mq regression, write cache delaying every read for rotating media until cache is empty?
Product: [Fedora] Fedora
Reporter: Enrico Tagliavini <enrico.tagliavini>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED UPSTREAM
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: unspecified
Version: 30
CC: airlied, bskeggs, germano.massullo, hdegoede, ichavero, itamar, jarodwilson, jeremy, jglisse, john.j5live, jonathan, josef, kernel-maint, labbott, linville, mchehab, mjg59, steved
Hardware: x86_64   
OS: Unspecified   
Last Closed: 2019-07-21 12:17:33 UTC
Type: Bug
Attachments:
Description: dmesg output; Flags: none

Description Enrico Tagliavini 2019-07-19 13:38:31 UTC
1. Please describe the problem:
Since the introduction of blk-mq as the default queuing mechanism for SCSI devices, the performance of the SATA HDD in my Dell Inspiron 17 7577 laptop has dropped massively. dd reports a maximum write speed of 10-20 MB/s, and fdatasync() on files frequently takes ~1 second (looking at strace of normal desktop apps, such as Dolphin). The OS is installed on an NVMe disk, which works fine, but /home is on the HDD, and that makes the computer almost unusable. Desktop applications become unresponsive at times (in reality they are working fine; they are just writing to the disk very slowly). Normal work is severely slowed down. Running video games is out of the question.

Until recently I could switch back to the legacy (non-MQ) block layer and the CFQ I/O scheduler by appending scsi_mod.use_blk_mq=0 and dm_mod.use_blk_mq=0 to the kernel command line, but that is no longer possible: CONFIG_IOSCHED_CFQ is not set in the current Fedora kernels, so there is no fallback.

I imagine I'm not the only one affected, and it would be nice if a fallback were offered as it was in the past. At the moment this problem prevents me from using this laptop; I cannot run Fedora with a pleasant experience.


2. What is the Version-Release number of the kernel:
kernel-5.1.18-300.fc30 kernel-headers-5.1.18-300.fc30 


3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

Yes, it worked perfectly fine from Fedora 29 onwards, when the laptop was purchased. I'm not sure which kernel release switched the default to blk-mq, but that's OK; the problem is that there is no longer a fallback. I'm also not sure when CONFIG_IOSCHED_CFQ and co. were removed. The laptop had thermal issues in recent months and I stopped using it on a daily basis. I had already noticed the problem, but I hoped it would be fixed by the time I resolved the thermal issue.


4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:
Yes. I'll attach the block device configuration of the computer later. In short, there are two VGs, one for the NVMe disk and one for the HDD (the HDD one also has some space on the NVMe disk for an optional LVM cache; the issue happens with and without lvm-cache), with dm-crypt/LUKS encryption for the whole volume groups. The file system is XFS on all partitions.

With this setup it should be enough to write to the disk with dd to notice the very low write performance, about one order of magnitude lower than expected. strace-ing programs that do small I/O can show fdatasync() calls taking very long, up to a full second. I tried a few programs from the Plasma workspace that looked unresponsive.
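To put numbers on the fdatasync() stalls without strace, a small probe can time the call directly. This is a minimal Python sketch, not something from the report: the function name fdatasync_latencies, the 4 KiB chunk size and the iteration count are arbitrary choices, and os.fdatasync is only available on Linux and other Unix systems.

```python
import os
import tempfile
import time


def fdatasync_latencies(directory, iterations=20, chunk=4096):
    """Append a small chunk to a scratch file and time each os.fdatasync() call.

    Hypothetical helper: returns a list of per-call latencies in seconds.
    """
    latencies = []
    # Create the scratch file inside the target directory so the writes land
    # on the same file system (and disk) being investigated.
    fd, path = tempfile.mkstemp(dir=directory, prefix="fdatasync-probe-")
    try:
        payload = b"x" * chunk
        for _ in range(iterations):
            os.write(fd, payload)
            start = time.perf_counter()
            # Flushes the file's data plus only the metadata needed to read it back.
            os.fdatasync(fd)
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.unlink(path)
    return latencies
```

Pointing it at a directory under /home on the HDD, once while the disk is idle and once while a dd is writing in the background, should show whether individual calls stretch toward the one-second stalls described above.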


5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:
Not tried yet.


6. Are you running any modules that are not shipped directly with Fedora's kernel?:
The NVIDIA driver, but I booted without it and saw no change (the main GPU is the Intel IGP anyway). Nothing else.


7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

I'll attach the kernel logs later.

Comment 1 Enrico Tagliavini 2019-07-19 13:46:03 UTC
Hmm, I just noticed that CFQ was dropped upstream, and I think there is no alternative to blk-mq for the unfortunate people who have problems with it. Great :(

Comment 2 Enrico Tagliavini 2019-07-19 19:26:01 UTC
So, some more information. The problem seems to be concurrent I/O. If a heavy I/O operation is running, such as a dd writing to the HDD, the I/O performance of other applications on the same disk is catastrophically affected, to the point that they hang until the dd is stopped. I've tried playing with many of the options in /sys/block/sda/queue/, but nothing worked. I tried changing the scheduler, including bfq (which is supposed to share the bandwidth fairly), and tuning its settings, with no change. I tried changing wbt_lat_usec (which is -1 by default), and that also changed nothing. In htop I can see 8 crypt kworkers (kcryptd) in the D state, each using a fair amount of CPU (roughly 5%).
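When trying different schedulers it helps to confirm which one is actually active. The sysfs scheduler file lists every available scheduler and puts the active one in square brackets. The sketch below is a hypothetical helper, not part of the report; the default device name sda is an assumption, and the wbt_lat_usec attribute may be missing on kernels built without writeback throttling.

```python
from pathlib import Path


def active_scheduler(raw):
    """Return the active I/O scheduler from the contents of a sysfs 'scheduler' file.

    The kernel lists all available schedulers and brackets the active one,
    e.g. "mq-deadline kyber [bfq] none".
    """
    for name in raw.split():
        if name.startswith("[") and name.endswith("]"):
            return name[1:-1]
    return None  # no bracketed entry found


def queue_settings(device="sda"):
    """Read the scheduler and writeback-throttling latency for a block device.

    'device' is an assumption; adjust it to the disk under test.
    """
    queue = Path("/sys/block") / device / "queue"
    scheduler = active_scheduler((queue / "scheduler").read_text())
    wbt_file = queue / "wbt_lat_usec"
    # wbt_lat_usec may be absent if writeback throttling is not built in.
    wbt = int(wbt_file.read_text()) if wbt_file.exists() else None
    return {"scheduler": scheduler, "wbt_lat_usec": wbt}
```

Switching schedulers is done (as root) by writing a name such as bfq into the same scheduler file; re-reading it with active_scheduler() confirms the change took effect.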

Then I set /proc/sys/vm/dirty_bytes to 15 MB (the default dirty_ratio is 20, and this laptop has 32 GB of RAM). This made a difference: the system now responds during heavy background I/O, albeit very slowly (1-2 seconds of latency). Setting it to 1 MB makes the system even more responsive, at the expense of the dd throughput: 120 MB/s with the default settings, 96 MB/s with dirty_bytes at 15 MB and 78 MB/s with dirty_bytes at 1 MB. The kworker kthreads are now in the I state in htop and possibly use a bit more CPU than before.
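A quick back-of-the-envelope calculation shows how large the change is. This Python sketch treats "MB" and "GB" in the comment as MiB and GiB (an assumption), and uses total RAM as an upper bound: the kernel actually applies dirty_ratio to dirtyable memory (free plus reclaimable page cache), so the real default limit is somewhat lower. Also note that dirty_bytes and dirty_ratio are mutually exclusive; writing one makes the other read back as 0.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def ratio_threshold_bytes(memory_bytes, dirty_ratio_percent):
    """Approximate dirty-page threshold implied by vm.dirty_ratio.

    Uses total RAM as an upper bound; the kernel applies the ratio to
    dirtyable memory, which is smaller.
    """
    return memory_bytes * dirty_ratio_percent // 100


default_limit = ratio_threshold_bytes(32 * GIB, 20)  # upper bound for this laptop
tuned_limit = 15 * MIB                               # value written to dirty_bytes

# dd throughput reported in the comment, in MB/s
throughput = {"default": 120, "dirty_bytes=15MB": 96, "dirty_bytes=1MB": 78}

if __name__ == "__main__":
    print(f"default limit ~ {default_limit / GIB:.1f} GiB, "
          f"{default_limit / tuned_limit:.0f}x the 15 MiB setting")
    for setting, mbps in throughput.items():
        loss = 100 * (1 - mbps / throughput["default"])
        print(f"{setting}: {mbps} MB/s ({loss:.0f}% below default)")
```

With these numbers the default allows up to roughly 6.4 GiB of dirty data, about 437 times the 15 MiB setting, while the dd throughput drops by 20% (15 MiB) and 35% (1 MiB). The value to write to /proc/sys/vm/dirty_bytes is in bytes, i.e. 15728640 for 15 MiB.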

I don't think this is really the solution to the problem; it's very odd that the system behaves like this. But does it give anybody a clue about what's wrong here, and is there any other parameter I could experiment with to find a better one to tune?

Thank you for your help.

dmesg coming shortly.

Comment 3 Enrico Tagliavini 2019-07-19 19:26:35 UTC
Created attachment 1592137
dmesg output

Comment 4 Laura Abbott 2019-07-20 13:15:40 UTC
This all needs to be reported to the upstream maintainers. There isn't much the kernel maintainers can do about this right now.

Comment 5 Enrico Tagliavini 2019-07-21 12:17:33 UTC
Yeah, I was very surprised when I discovered this had actually been removed upstream. It was difficult to find information about it, and I thought Fedora had disabled it. Unfortunately that's not the case. I don't understand why the fallback option was removed.

I opened bug https://bugzilla.kernel.org/show_bug.cgi?id=204253

Unfortunately I'm not a multi-billion-dollar corporation; I'm just a dude with a useless (brand new) laptop that I can basically throw away now. To be honest, I have no hope that this will be fixed, or even looked at, upstream. If they cared, they would have kept the fallback options in the first place, since it's fairly obvious there will be regressions.

Anyway, thank you for looking at this Laura :)

Kind regards.