Bug 1955935

Summary: Fedora 34 - system slow when copying large file from HDD to SSD (root partition) [BTRFS with compression]
Product: [Fedora] Fedora Reporter: Shyam Jos <shyamjosepp>
Component: kernelAssignee: fedora-kernel-btrfs
Status: CLOSED NOTABUG QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high Docs Contact:
Priority: unspecified    
Version: 34CC: acaringi, adscvr, airlied, alciregi, bberg, bskeggs, bugzilla, davide, esandeen, hdegoede, igor.raits, jarodwilson, jeremy, jglisse, jonathan, josef, kernel-maint, lgoncalv, linville, masami256, mchehab, ngompa13, ptalbert, shyamjosepp, steved
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-05-05 17:16:03 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
journalctl -b -o short-monotonic --no-hostname
none
bug1955935-journal1.log
none
bug1955935shya-journal2.log
none
nautilus-pid-stack.log
none
dmesg log
none
fpaste --sysinfo --printonly > sysinfo.txt
none
fpaste --btrfsinfo --printonly > btrfsinfo.txt none

Description Shyam Jos 2021-05-01 14:38:51 UTC
Description of problem:
I just installed (fresh install/btrfs with FDE and compression) fedora 34 on my laptop's SSD drive (not NVMe) and when I was restoring my files (mostly video files)from HDD (SATA) to SSD (/home) system become totally unusable (freezing).


Version-Release number of selected component (if applicable):
Linux fedora 5.11.16-300.fc34.x86_64 
btrfs-progs v5.11.1 

Model name: Intel(R) Core(TM) i3-7020U CPU @ 2.30GHz
RAM: 12GB


How reproducible:
Happens every time
 
Steps to Reproduce:
1. Copy large (videos) files from HDD to SDD (/home partition)


Actual results:
System freezing 

Expected results:
No freeze

Additional info:
fstab
UUID=000 /                       btrfs   subvol=root,compress=zstd:1,x-systemd.device-timeout=0 0 0
UUID=000 /boot                   ext4    defaults        1 2
UUID=000         /boot/efi               vfat    umask=0077,shortname=winnt 0 2
UUID=000 /home                   btrfs   subvol=home,compress=zstd:1,x-systemd.device-timeout=0 0 0
#HDD 
LABEL=data /mnt/data auto nosuid,nodev,nofail,x-gvfs-show 0 0

Comment 1 Neal Gompa 2021-05-02 01:56:23 UTC
This is actually an issue in the kernel, re-assigning.

Comment 2 Chris Murphy 2021-05-02 18:27:43 UTC
Can you capture sysrq+t while the hang is happening? It might help to have sysrq+t typed out before reproducing the problem, so that the typing isn't subject to the hang.

Then 'journalctl -b -o short-monotonic --no-hostname > bug1955935-journal.log'  and attach that log file to the bug. Thanks.

Comment 3 Shyam Jos 2021-05-03 04:39:21 UTC
Created attachment 1778823 [details]
journalctl -b -o short-monotonic --no-hostname

Comment 4 Josef Bacik 2021-05-03 15:16:54 UTC
Hmm the sysrq+t doesn't show anything relevant.  Can you sysrq+w when the box is misbehaving?  You may have to ssh into the box from another machine and sysrq+w if it's too locked up.  Also try running top and see if there's anything suspicious eating a lot of cpu.  If you can get the PID of the copy process you can also cat /proc/<pid>/stack a few times to get a snapshot of what it's doing, as well as any pid that seems to be eating a lot of CPU.

Comment 5 Shyam Jos 2021-05-03 17:45:19 UTC
Created attachment 1779066 [details]
bug1955935-journal1.log

Comment 6 Shyam Jos 2021-05-03 17:46:03 UTC
Created attachment 1779067 [details]
bug1955935shya-journal2.log

Comment 7 Shyam Jos 2021-05-03 17:47:40 UTC
Created attachment 1779068 [details]
nautilus-pid-stack.log

Output of nautilus /proc/<pid>/stack

Comment 8 Shyam Jos 2021-05-03 18:00:01 UTC
Cpu utilization was normal nothing suspicious(nautilus utilization was < 10 %), Any application that requires disk/IO becomes painfully slow at that time.
Also, after installing kernel 5.11.17-300.fc34.x86_64 experienced the same issue.

Comment 9 Chris Murphy 2021-05-04 01:28:08 UTC
The sysrq+t journal has a bunch of missing lines:
[  937.960415] systemd-journald[723]: Missed 22 kernel messages

I though journald only rate limited services, not kernel messages. That's kinda annoying. Ok so new idea, boot with parameter log_buf_len=8M and now the sysrq+t can be retrieved with just dmesg > dmesg.log or whatever. It might be a red herring, because what I see with sysrq+w is actually i915 related, so I wonder if this is a GPU hang and not IO related.

When the hang happens, in addition to sysrq+t, could you also do
grep -R . /proc/pressure

And at any time:
fpaste --sysinfo --printonly > sysinfo.txt
fpaste --btrfsinfo --printonly > btrfsinfo.txt

Attach those files. Thanks!

Comment 10 Shyam Jos 2021-05-04 05:22:59 UTC
Created attachment 1779211 [details]
dmesg log

Comment 11 Shyam Jos 2021-05-04 05:23:22 UTC
grep -R . /proc/pressure && dmesg > dmesg.log1255
/proc/pressure/io:some avg10=61.36 avg60=45.06 avg300=37.58 total=505042612
/proc/pressure/io:full avg10=53.99 avg60=40.04 avg300=32.98 total=390619115
/proc/pressure/cpu:some avg10=2.45 avg60=2.84 avg300=3.42 total=176677631
/proc/pressure/memory:some avg10=4.68 avg60=2.33 avg300=1.74 total=34107161
/proc/pressure/memory:full avg10=3.48 avg60=1.52 avg300=0.99 total=20667551

Comment 12 Shyam Jos 2021-05-04 05:25:49 UTC
Created attachment 1779212 [details]
fpaste --sysinfo --printonly > sysinfo.txt

Comment 13 Shyam Jos 2021-05-04 05:26:34 UTC
Created attachment 1779213 [details]
fpaste --btrfsinfo --printonly > btrfsinfo.txt

Comment 14 Josef Bacik 2021-05-04 19:07:01 UTC
Man there's just no indication that anything is wrong, nothing is doing IO or stuck anywhere.  Can you try two things for me?  First do

echo "mq-deadline" > /sys/block/sda/queue/scheduler

and see if that helps.  If it doesn't install bcc-tools and run

/usr/share/bcc/tools/biolatency

while the copy is running.  Then ctrl+c once you are done, it'll print a nice graph of the BIO latencies.

Comment 15 Benjamin Berg 2021-05-05 14:17:37 UTC
So, the bug description says "freezing", but comment #8 says "Any application that requires disk/IO becomes painfully slow at that time."

This makes a big difference to us. If the mouse cursor is still updating, then we would not consider that as a freeze of the system. We "just" happen to not be scheduling IO in an effective manner and other applications are impacted (which might be caused by buffer bloat in the disk caches even).

Shyam, could you clarify what behaviour you are observing?

Comment 16 Shyam Jos 2021-05-05 17:16:03 UTC
@josef System is much responsive when mq-deadline scheduler is used 

@Benjamin Sorry for the misleading title, after changing the scheduler to mq-deadline, system seems much responsive when doing file copy, So I am closing this ticket for now

Thank you all for the support

Comment 17 Chris Murphy 2021-05-06 18:12:47 UTC
I drop in a file:
/etc/udev/rules.d/60-block-scheduler.rules
containing:
ACTION=="add", SUBSYSTEM=="block", \
  KERNEL=="mmcblk*[0-9]|sd*[!0-9]", \
  ENV{DEVTYPE}=="disk", \
  ATTR{queue/scheduler}="mq-deadline"