This bug has been migrated to another issue tracking site. It has been closed here and may no longer be being monitored.

If you would like to get updates for this issue, or to participate in it, you may do so at Red Hat Issue Tracker.
Bug 2189320 - cifsiod kernel has high cpu usage and server is freezing up after which only fix is to force reboot.
Keywords:
Status: CLOSED MIGRATED
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: cifs-utils
Version: CentOS Stream
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Nobody
QA Contact: xiaoli feng
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-04-24 19:53 UTC by akarnafel
Modified: 2023-09-23 11:39 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-09-23 11:39:33 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
image1 (175.01 KB, image/png), 2023-04-24 19:53 UTC, akarnafel
nmon - top (39.60 KB, image/png), 2023-04-24 20:12 UTC, Daniel de Morais Carneiro


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker   RHEL-7946 0 None Migrated None 2023-09-23 11:39:28 UTC
Red Hat Issue Tracker RHELPLAN-155581 0 None None None 2023-04-24 19:54:04 UTC

Description akarnafel 2023-04-24 19:53:31 UTC
Created attachment 1959632 [details]
image1

Description of problem:
This is a new VM. I tried kernel version 5.14.0.234 and it works just fine. Kernels 5.14.0.289-299 do not work.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Install keyutils and cifs-utils.
2. Mount a CIFS share on the server.
3. Wait 30-180 minutes; dmesg then starts showing messages like "watchdog: BUG: soft lockup - CPU## stuck for xxxs! [kworker/0:3.....]".

Actual results:
dmesg shows messages like "watchdog: BUG: soft lockup - CPU## stuck for xxxs! [kworker/0:3.....]" and the server is very slow to do anything.

Expected results:
server should not be slow.

Additional info:
I can access files on the mount without any issues until the server becomes very slow.
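
The symptom described above can be checked for mechanically. The following is a minimal Python sketch, not part of the original report, that scans kernel log text for the watchdog's soft-lockup message. The function name `find_soft_lockups` and the sample line are hypothetical; the sample is illustrative only and was not captured from the reporter's system.

```python
import re

# Matches the kernel watchdog's soft-lockup message, for example:
#   watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [kworker/0:3:1234]
SOFT_LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<task>[^\]]+)\]"
)


def find_soft_lockups(log_text):
    """Return a (cpu, seconds, task) tuple for each soft-lockup line found."""
    return [
        (int(m.group("cpu")), int(m.group("secs")), m.group("task"))
        for m in SOFT_LOCKUP_RE.finditer(log_text)
    ]


# Hypothetical sample line, for illustration only.
SAMPLE_LOG = (
    "[ 5400.123456] watchdog: BUG: soft lockup - CPU#0 stuck for 26s! "
    "[kworker/0:3:1234]\n"
)
```

On a live system the input could be the captured output of `dmesg`, for instance via `subprocess.run(["dmesg"], capture_output=True, text=True).stdout`; reading the kernel log may require root privileges.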

Comment 1 Daniel de Morais Carneiro 2023-04-24 20:09:42 UTC
I had the same behavior

  PID  %CPU Used  Size KB  Res Set  Res Text  Res Data  Res Lib  Shared KB  Faults Min  Faults Maj  Command
  745       99.7        0        0         0         0        0          0           0           0  kworker/5:2+cifsiod
 1115        0.5   456588     9452        32     27828        0       7860           0           0  vmtoolsd
 5071        0.5     8332     5032       156      4680        0       2476           0           0  nmon16m_x86_64_
    1        0.0   169644    13408        44     19740        0       9748           0           0  systemd
    2        0.0        0        0         0         0        0          0           0           0  kthreadd
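
The nmon listing above shows one `cifsiod` kworker thread using nearly a full CPU. As a rough sketch of how that condition could be spotted from a script, the Python below filters process-listing text in a three-column "PID %CPU COMMAND" layout. That layout is an assumption (it is roughly what `ps -eo pid,pcpu,comm` prints, though thread-name truncation may vary), and `busy_cifsiod_workers` and the 90% threshold are hypothetical choices. The sample rows reuse values from the listing above.

```python
def busy_cifsiod_workers(ps_text, min_cpu=90.0):
    """Return (pid, cpu_percent, command) for cifsiod workers at or above min_cpu."""
    hits = []
    for line in ps_text.splitlines():
        fields = line.split(None, 2)  # PID, %CPU, then the rest as the command
        if len(fields) != 3 or not fields[0].isdigit():
            continue  # skip the header row and malformed lines
        pid, cpu, command = fields
        try:
            cpu_value = float(cpu)
        except ValueError:
            continue
        if "cifsiod" in command and cpu_value >= min_cpu:
            hits.append((int(pid), cpu_value, command.strip()))
    return hits


# Sample input in the assumed layout, using values from the nmon listing.
SAMPLE_PS = """\
  PID %CPU COMMAND
  745 99.7 kworker/5:2+cifsiod
 1115  0.5 vmtoolsd
    1  0.0 systemd
"""
```

With the sample input, only PID 745 is returned; the other processes sit far below the threshold.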

Comment 2 Daniel de Morais Carneiro 2023-04-24 20:12:59 UTC
Created attachment 1959633 [details]
nmon - top

Comment 3 Daniel de Morais Carneiro 2023-04-24 20:25:27 UTC
Additional Information:
Kernel Version : 5.14.0-289.el9.x86_64
RPM : cifs-utils-7.0-1.el9.x86_64
OS Version : CentOS Stream release 9
VMware Tools:	Running, version:12325 (Guest Managed)

Comment 4 xiaoli feng 2023-04-25 02:37:25 UTC
I think this issue is the same as bz2177562. It's a regression introduced in kernel-5.14.0-276.el9.

Comment 5 Daniel de Morais Carneiro 2023-04-25 13:14:09 UTC
I can't read bz2177562. Can you set the right permissions? Or tell us what we need to do?

Comment 6 xiaoli feng 2023-04-28 01:03:23 UTC
(In reply to Daniel de Morais Carneiro from comment #5)
> I can't read bz2177562. Can you set the right permissions? Or
> tell us what we need to do?

A fix for bz2177562 is in progress; there is no solution yet. Let me see if I can set the permissions.

Thanks.

Comment 7 lawrence.gorman 2023-05-11 18:10:37 UTC
I also have this exact problem since a kernel upgrade (to 5.14.0-284.11.1.el9_2.x86_64). At the same time cifs-utils was upgraded to 7.0-1.el9.x86_64.

High CPU usage (~99%) for kworker/cifsiod, followed by a CPU soft lockup. I am unable to kill the process, so only a reboot sorts out the problem. Then, within about 30-60 minutes, it's back again.

Rolling cifs-utils back to 6.14-1.el9.x86_64 did not fix the problem.

Booting to previous kernel version (5.14.0-162.23.1.el9_1.x86_64) does seem to fix it.

Comment 8 lawrence.gorman 2023-05-11 18:36:17 UTC
Additionally I can't view https://bugzilla.redhat.com/show_bug.cgi?id=2177562 so am unable to see if there's been any progress with the problem.

Comment 9 David Pulkowski 2023-05-11 18:39:58 UTC
I can confirm the same issue.
I tried reverting to cifs-utils 6.14-1.el9.x86_64 and that did not fix the problem either.
As a temporary workaround, I will also try reverting to the previous kernel (5.14.0-162.23.1.el9_1.x86_64).
There is a CVE for that kernel: https://nvd.nist.gov/vuln/detail/CVE-2022-3028
https://bugzilla.redhat.com/show_bug.cgi?id=2122228

Comment 10 xiaoli feng 2023-05-15 01:23:54 UTC
Could you try kernel-5.14.0-301.el9? I find that bz2177562 no longer reproduces on kernel-5.14.0-301.el9.

Thanks.

Comment 11 ryan.brothers 2023-05-15 14:01:15 UTC
I am running into this issue too after upgrading to 5.14.0-284.11.1.el9_2.x86_64.  Will a fix for it be released soon?

Comment 12 David Pulkowski 2023-05-15 17:11:46 UTC
> Could you try kernel-5.14.0-301.el9? I find this bug bz2177562 is
> gone on kernel-5.14.0-301.el9.
> 
> Thanks.

kernel-5.14.0-301.el9 is not yet available in the rhel-9-for-x86_64-baseos-rpms repository.
The latest kernel available is: 5.14.0-284.11.1.el9_2

Comment 13 ryan.brothers 2023-06-01 17:07:38 UTC
Is there any timeline of when the fix will be released for RHEL 9?

Comment 15 lawrence.gorman 2023-06-22 09:55:49 UTC
I'm still monitoring, but kernel version 5.14.0-284.18.1.el9_2.x86_64 (just released) appears to have fixed the issue for me.

Comment 16 David Pulkowski 2023-06-27 15:13:34 UTC
I also tested the new version:

5.14.0-284.18.1.el9_2.x86_64

I have been monitoring it for 4+ hours with the new kernel, and I also think this fixed the issue: on the troublesome version, a reboot became necessary within 4 hours. I will report back if any issue arises throughout the day.
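
The kernel versions mentioned across this thread can be summarized in a small lookup. The Python sketch below is only a reading of the comments here, not an authoritative list of affected builds: builds before 276 were reported working, 276 and later were reported affected, 301 and later were reported fixed, and in the 284.x (el9_2) series 284.11.1 was reported affected and 284.18.1 reported fixed. Treating untested builds in between as affected is an assumption, and the names `release_key` and `reported_status` are hypothetical.

```python
def release_key(release):
    """Split '5.14.0-284.18.1.el9_2.x86_64' into ((5, 14, 0), (284, 18, 1))."""
    version, _, rest = release.partition("-")
    build = []
    for part in rest.split("."):
        if not part.isdigit():
            break  # stop at the first non-numeric field, e.g. 'el9_2'
        build.append(int(part))
    return tuple(int(p) for p in version.split(".")), tuple(build)


def reported_status(release):
    """Classify a kernel release using only the reports in this thread."""
    version, build = release_key(release)
    if version != (5, 14, 0) or not build:
        return "unknown"
    if build[0] == 284 and len(build) > 1:
        # el9_2 update builds: 284.11.1 reported affected, 284.18.1 reported fixed.
        # Builds between those two were not reported on (assumed affected).
        return "reported fixed" if build >= (284, 18, 1) else "reported affected"
    if build[0] < 276:
        return "reported working"
    if build[0] >= 301:
        return "reported fixed"
    return "reported affected"  # 276-300; includes untested builds (assumption)
```

For the running kernel, the release string could be taken from `platform.release()` in the standard library and passed to `reported_status`.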

Comment 18 RHEL Program Management 2023-09-23 11:38:42 UTC
Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.

Comment 19 RHEL Program Management 2023-09-23 11:39:33 UTC
This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there.

Due to differences in account names between systems, some fields were not replicated.  Be sure to add yourself to Jira issue's "Watchers" field to continue receiving updates and add others to the "Need Info From" field to continue requesting information.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it, and begin with "RHEL-" followed by an integer.  You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like:

"Bugzilla Bug" = 1234567

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues. You can also visit https://access.redhat.com/articles/7032570 for general account information.

