Bug 2189320
| Summary: | cifsiod kernel has high cpu usage and server is freezing up after which only fix is to force reboot. | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | akarnafel |
| Component: | cifs-utils | Assignee: | Nobody <nobody> |
| Status: | CLOSED MIGRATED | QA Contact: | xiaoli feng <xifeng> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | CentOS Stream | CC: | brentw, bstinson, dmoraes1, dpulkowski, john.horne, jwboyer, lawrence.gorman, ryan.brothers, xzhou |
| Target Milestone: | rc | Keywords: | MigratedToJIRA |
| Target Release: | --- | Flags: | pm-rhel: mirror+ |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-09-23 11:39:33 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
I had the same behavior:

| PID | %CPU Used | Size KB | Res Set | Res Text | Res Data | Res Lib | Shared KB | Faults Min | Faults Maj | Command |
|---|---|---|---|---|---|---|---|---|---|---|
| 745 | 99.7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | kworker/5:2+cifsiod |
| 1115 | 0.5 | 456588 | 9452 | 32 | 27828 | 0 | 7860 | 0 | 0 | vmtoolsd |
| 5071 | 0.5 | 8332 | 5032 | 156 | 4680 | 0 | 2476 | 0 | 0 | nmon16m_x86_64_ |
| 1 | 0.0 | 169644 | 13408 | 44 | 19740 | 0 | 9748 | 0 | 0 | systemd |
| 2 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | kthreadd |

Created attachment 1959633 [details]
nmon - top

Additional information:
- Kernel version: 5.14.0-289.el9.x86_64
- RPM: cifs-utils-7.0-1.el9.x86_64
- OS version: CentOS Stream release 9
- VMware Tools: Running, version 12325 (Guest Managed)

I think this issue is the same as bz2177562. It is a regression introduced in kernel-5.14.0-276.el9.

I can't read bz2177562. Can you set the right permissions, or tell us what we need to do?

(In reply to Daniel de Morais Carneiro from comment #5)
> I can't read bz2177562. Can you set the right permissions, or tell us
> what we need to do?

bz2177562 is an ongoing fix; there is no solution yet. Let me see if I can set the permission. Thanks.

I also have this exact problem since a kernel upgrade (to 5.14.0-284.11.1.el9_2.x86_64). At the same time, cifs-utils was upgraded to 7.0-1.el9.x86_64. The kworker/cifsiod thread shows high CPU usage (~99%), followed by a CPU soft lockup. The process cannot be killed, so only a reboot sorts out the problem. Then, within about 30-60 minutes, it's back again. Rolling cifs-utils back to 6.14-1.el9.x86_64 did not fix the problem. Booting the previous kernel version (5.14.0-162.23.1.el9_1.x86_64) does seem to fix it.

Additionally, I can't view https://bugzilla.redhat.com/show_bug.cgi?id=2177562, so I am unable to see whether there has been any progress on the problem.

I can confirm the same issue. I tried reverting cifs-utils to 6.14-1.el9.x86_64, and that did not fix the problem either. As a temporary workaround, I will also try reverting to the previous kernel (5.14.0-162.23.1.el9_1.x86_64).

There is a CVE for that kernel: https://nvd.nist.gov/vuln/detail/CVE-2022-3028, https://bugzilla.redhat.com/show_bug.cgi?id=2122228

Could you try kernel-5.14.0-301.el9? I find that this bug (bz2177562) is gone in kernel-5.14.0-301.el9. Thanks.

I am running into this issue too after upgrading to 5.14.0-284.11.1.el9_2.x86_64. Will a fix for it be released soon?

> Could you try kernel-5.14.0-301.el9? I find that this bug (bz2177562) is
> gone in kernel-5.14.0-301.el9.
>
> Thanks.

kernel-5.14.0-301.el9 is not available in rhel-9-for-x86_64-baseos-rpms yet. The latest kernel available is 5.14.0-284.11.1.el9_2. Is there any timeline for when the fix will be released for RHEL 9?

I'm still monitoring, but kernel version 5.14.0-284.18.1.el9_2.x86_64 (just released) appears to have fixed the issue for me.

I also tested the new version, 5.14.0-284.18.1.el9_2.x86_64, and have been monitoring it for 4+ hours. I also think it fixed the issue: on the troublesome version, I would notice within 4 hours that a reboot was necessary. I will report back if any issue arises during the day.

Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.

This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there.

Due to differences in account names between systems, some fields were not replicated. Be sure to add yourself to the Jira issue's "Watchers" field to continue receiving updates, and add others to the "Need Info From" field to continue requesting information.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it, and begin with "RHEL-" followed by an integer. You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like:

"Bugzilla Bug" = 1234567

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues. You can also visit https://access.redhat.com/articles/7032570 for general account information.
Created attachment 1959632 [details]
image1

Description of problem:
This is a new VM. I tried kernel version 5.14.0-234 and it works just fine. Kernels 5.14.0-289 through 5.14.0-299 do not work.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Install keyutils and cifs-utils.
2. Mount a CIFS share from the server.
3. Wait 30-180 minutes; dmesg will start showing messages like "watchdog: BUG: soft lockup - CPU## stuck for xxxs! [kworker/0:3.....]".

Actual results:
dmesg shows messages like "watchdog: BUG: soft lockup - CPU## stuck for xxxs! [kworker/0:3.....]", and the server is very slow to do anything.

Expected results:
The server should not be slow.

Additional info:
I can access the files without any issues until the server becomes very slow.
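Step 3 of the reproduction relies on spotting soft-lockup warnings in dmesg. Below is a minimal Python sketch for scanning a saved kernel log for them, assuming the usual kernel message format `watchdog: BUG: soft lockup - CPU#N stuck for Ns! [comm:pid]` (the report itself only paraphrases this line). The names `find_soft_lockups`, `SoftLockup`, and `SAMPLE_LOG` are hypothetical, and the sample log line is made up for illustration; it is not taken from this report.

```python
import re
from dataclasses import dataclass
from typing import List

# Assumed kernel soft-lockup format:
#   "watchdog: BUG: soft lockup - CPU#<cpu> stuck for <secs>s! [<comm>:<pid>]"
# The task group stops at the last ':' before ']', so comm values that
# contain colons (e.g. "kworker/5:2") are kept intact.
SOFT_LOCKUP_RE = re.compile(
    r"watchdog: BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<task>[^\]]+):(?P<pid>\d+)\]"
)


@dataclass
class SoftLockup:
    cpu: int
    seconds: int
    task: str
    pid: int


def find_soft_lockups(log_text: str) -> List[SoftLockup]:
    """Return every soft-lockup warning found in a dmesg/journal text dump."""
    hits = []
    for line in log_text.splitlines():
        m = SOFT_LOCKUP_RE.search(line)
        if m:
            hits.append(
                SoftLockup(int(m["cpu"]), int(m["secs"]), m["task"], int(m["pid"]))
            )
    return hits


# Hypothetical sample: the timestamps and the 22s duration are invented.
SAMPLE_LOG = (
    "[ 1234.567890] watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [kworker/5:2:745]\n"
    "[ 1234.567900] (some other kernel message)\n"
)

for hit in find_soft_lockups(SAMPLE_LOG):
    print(f"CPU{hit.cpu}: {hit.task} (pid {hit.pid}) stuck for {hit.seconds}s")
```

In practice you would save the output of `dmesg` to a file and pass its contents to `find_soft_lockups`. The bracketed task is typically just the kworker name. Tying a hit to cifsiod still needs a process view such as the nmon output in the comments, which showed `kworker/5:2+cifsiod` at 99.7% CPU.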