Bug 2242391
Summary: | Kernel worker thread on 100% CPU core utilisation and one btrfs file system completely unusable | |
---|---|---|---
Product: | [Fedora] Fedora | Reporter: | Joshua Noeske <fedora>
Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint>
Status: | CLOSED EOL | QA Contact: | Fedora Extras Quality Assurance <extras-qa>
Severity: | urgent | Docs Contact: |
Priority: | unspecified | |
Version: | 38 | CC: | acaringi, adscvr, airlied, alciregi, bskeggs, glandvador, hdegoede, hpa, jarod, josef, kernel-maint, lgoncalv, linville, masami256, mchehab, ptalbert, steved
Target Milestone: | --- | |
Target Release: | --- | |
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | ---
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2024-05-31 08:38:14 UTC | Type: | ---
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Attachments: | | |
Description
Joshua Noeske
2023-10-05 20:25:24 UTC
Created attachment 1992294
journalctl -k output of the boot with the problem.
I forgot to mention that unmounting the affected filesystem fails because the device is busy. Even after lazily unmounting the filesystem, the machine did not shut down correctly and I had to perform a hard reset. The CPU usage graph in Cockpit supports my theory regarding btrbk and the snapshot transfer. Both times, the error apparently occurred during the transfer of the snapshots to another drive.

This also happens on an F37 system with an EXT4 partition on an mdadm RAID 1 array (4 TB HDDs). The whole 6.5 kernel line (tested 6.5.4, 6.5.5 and 6.5.6, which are available on koji) displays this behaviour, while the previous 6.4 line (tested 6.4.4 and 6.4.15) doesn't.

What triggers this behaviour in my case is creating small files on the RAID array's partition, i.e.:
# for i in {0001..0200}; do echo "some text" > "file_${i}.txt"; done

After a few seconds the kworker/flush thread kicks in for a variable amount of time that depends on the number of files created. While the kworker/flush thread is at 100% CPU, deleting these files is more or less impossible. Removing the files once the kworker/flush thread has gone away is fast and doesn't trigger this behaviour. Writing one huge file (dd if=/dev/zero of=/raid/file) doesn't seem to trigger it either.

I also experienced the behaviour described in Comment #2, which led to a rebuild of the RAID array, yay.

On the same system, a small SSD (16 GB) holds the system on an EXT4 partition, without RAID. Writing small files to this SSD partition doesn't cause the kworker/flush thread to eat 100% CPU.

I am willing to test kernels as long as they work on F37 (for now) and I don't have to build them. Building Fedora kernels is not an option for me. The last time I tried, it took several hours only to fail after filling the remaining 16 GB of disk space on an i7 laptop (OK, not the latest generation, but still).

I can confirm that 6.4.15 does not show this behaviour. The error has not occurred with this kernel version yet.

Fedora Linux 38 entered end-of-life (EOL) status on 2024-05-21.

Fedora Linux 38 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora Linux please feel free to reopen this bug against that version. Note that the version field may be hidden. Click the "Show advanced fields" button if you do not see the version field.

If you are unable to reopen this bug, please file a new report against an active release.

Thank you for reporting this bug and we are sorry it could not be fixed.
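For anyone re-testing on a maintained release, the small-file trigger described in the comments above could be scripted roughly as follows. This is only a sketch, not the reporters' exact procedure: `TARGET_DIR` and `FILE_COUNT` are hypothetical variable names introduced here, and the script defaults to a throwaway temporary directory, so it only exercises the affected storage if `TARGET_DIR` is pointed at a directory on that filesystem.

```shell
#!/usr/bin/env bash
# Sketch of the small-file write pattern from the report.
# TARGET_DIR and FILE_COUNT are illustrative names, not from the original report.
set -euo pipefail

# Directory on the filesystem under test; defaults to a fresh temp dir (assumption).
TARGET_DIR="${TARGET_DIR:-$(mktemp -d)}"
FILE_COUNT="${FILE_COUNT:-200}"

mkdir -p "$TARGET_DIR"

# Create many small files, mirroring the loop quoted in the comments.
for ((i = 1; i <= FILE_COUNT; i++)); do
    printf -v name 'file_%04d.txt' "$i"
    echo "some text" > "$TARGET_DIR/$name"
done

# Request writeback of dirty data so flush activity starts promptly.
sync

# Count what was written so the run can be checked.
created=$(find "$TARGET_DIR" -maxdepth 1 -type f -name 'file_*.txt' | wc -l)
echo "created $created files in $TARGET_DIR"
```

Running it with, for example, `TARGET_DIR=/raid/test` (a hypothetical path) while watching `top` in a second terminal should show whether a kworker/flush thread becomes busy after the writes.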