Bug 2189320

Summary: cifsiod kernel worker has high CPU usage and the server freezes up, after which the only fix is a forced reboot.
Product: Red Hat Enterprise Linux 9 Reporter: akarnafel
Component: cifs-utils Assignee: Nobody <nobody>
Status: ASSIGNED --- QA Contact: xiaoli feng <xifeng>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: CentOS Stream CC: brentw, bstinson, dmoraes1, dpulkowski, john.horne, jwboyer, lawrence.gorman, ryan.brothers, xzhou
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description    Flags
image1         none
nmon - top     none

Description akarnafel 2023-04-24 19:53:31 UTC
Created attachment 1959632 [details]
image1

Description of problem:
This is a new VM. I tried kernel version 5.14.0.234 and it works fine. Kernel versions 5.14.0.289 through 299 do not work.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Install keyutils and cifs-utils.
2. Mount a CIFS share on the server.
3. Wait 30-180 minutes until dmesg starts showing something like "watchdog: BUG: soft lockup - CPU## stuck for xxxs! [kworker/0:3.....]".

Actual results:
dmesg shows messages like "watchdog: BUG: soft lockup - CPU## stuck for xxxs! [kworker/0:3.....]" and the server is very slow to do anything.

Expected results:
The server should not be slow.

Additional info:
I can access the file without any issues until the server becomes very slow.
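The kernel's wording for the watchdog message quoted above is "soft lockup". As a minimal sketch for checking captured dmesg output for that symptom, the helper below scans text for such lines. The function name `find_soft_lockups` and the sample line are illustrative assumptions, not taken from the attached logs.

```python
import re

# Matches the kernel watchdog message, for example (hypothetical sample):
#   watchdog: BUG: soft lockup - CPU#5 stuck for 26s! [kworker/5:2:745]
SOFT_LOCKUP_RE = re.compile(
    r"watchdog: BUG: soft lockup - CPU#(?P<cpu>\d+) "
    r"stuck for (?P<secs>\d+)s! \[(?P<task>[^\]]+)\]"
)


def find_soft_lockups(dmesg_text):
    """Return a list of (cpu, seconds, task) for each soft lockup line."""
    hits = []
    for line in dmesg_text.splitlines():
        match = SOFT_LOCKUP_RE.search(line)
        if match:
            hits.append(
                (int(match.group("cpu")), int(match.group("secs")), match.group("task"))
            )
    return hits


# Hypothetical input; in practice the text would come from saved dmesg output.
sample = "[ 5400.1] watchdog: BUG: soft lockup - CPU#5 stuck for 26s! [kworker/5:2:745]"
for cpu, secs, task in find_soft_lockups(sample):
    print(f"CPU {cpu} stuck for {secs}s in {task}")
```

Running it against periodically saved dmesg output would show roughly how long after mounting the first lockup appears.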

Comment 1 Daniel de Morais Carneiro 2023-04-24 20:09:42 UTC
I had the same behavior

 PID  %CPU Used  Size KB  Res Set  Res Text  Res Data  Res Lib  Shared KB  Faults Min  Faults Maj  Command
 745       99.7        0        0         0         0        0          0           0           0  kworker/5:2+cifsiod
1115        0.5   456588     9452        32     27828        0       7860           0           0  vmtoolsd
5071        0.5     8332     5032       156      4680        0       2476           0           0  nmon16m_x86_64_
   1        0.0   169644    13408        44     19740        0       9748           0           0  systemd
   2        0.0        0        0         0         0        0          0           0           0  kthreadd
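The nmon output above shows a kworker thread tagged cifsiod at nearly 100% CPU. A rough sketch for spotting the same condition from a simpler process listing follows. It assumes input lines of the form "PID %CPU COMMAND" (for example from `ps -eo pid,pcpu,comm`); the function name `busy_cifsiod_workers` and the 90% threshold are illustrative assumptions.

```python
def busy_cifsiod_workers(ps_text, threshold=90.0):
    """Return (pid, cpu_percent, command) for cifsiod workers at or above threshold.

    Expects whitespace-separated lines of the form: PID %CPU COMMAND.
    Header lines and malformed lines are skipped.
    """
    busy = []
    for line in ps_text.splitlines():
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue
        pid, cpu, command = parts
        try:
            pid_n, cpu_n = int(pid), float(cpu)
        except ValueError:
            continue  # header row or unparsable line
        if "cifsiod" in command and cpu_n >= threshold:
            busy.append((pid_n, cpu_n, command.strip()))
    return busy


# Sample built from the PID, %CPU and Command columns of the table above.
sample = """PID %CPU COMMAND
745 99.7 kworker/5:2+cifsiod
1115 0.5 vmtoolsd
1 0.0 systemd
"""
print(busy_cifsiod_workers(sample))
```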

Comment 2 Daniel de Morais Carneiro 2023-04-24 20:12:59 UTC
Created attachment 1959633 [details]
nmon - top

Comment 3 Daniel de Morais Carneiro 2023-04-24 20:25:27 UTC
Additional Information:
Kernel Version : 5.14.0-289.el9.x86_64
RPM : cifs-utils-7.0-1.el9.x86_64
OS Version : CentOS Stream release 9
VMware Tools:	Running, version:12325 (Guest Managed)

Comment 4 xiaoli feng 2023-04-25 02:37:25 UTC
I think this issue is the same as bz2177562. It's a regression introduced in kernel-5.14.0-276.el9.

Comment 5 Daniel de Morais Carneiro 2023-04-25 13:14:09 UTC
I can't access bz2177562. Can you grant the right permissions, or tell us what we need to do?

Comment 6 xiaoli feng 2023-04-28 01:03:23 UTC
(In reply to Daniel de Morais Carneiro from comment #5)
> I can't read about bz2177562. Can you do the right permissions? Or
> tell us what we need to do?

A fix for bz2177562 is in progress; there is no solution yet. Let me see if I can change the permissions.

Thanks.

Comment 7 lawrence.gorman 2023-05-11 18:10:37 UTC
I have had this exact problem since a kernel upgrade (to 5.14.0-284.11.1.el9_2.x86_64). At the same time, cifs-utils was upgraded to 7.0-1.el9.x86_64.

High CPU usage (~99%) for kworker/cifsiod is followed by a CPU soft lockup. The process cannot be killed, so only a reboot clears the problem. Then, within about 30-60 minutes, it's back again.

Rolling cifs-utils back to 6.14-1.el9.x86_64 did not fix the problem.

Booting to previous kernel version (5.14.0-162.23.1.el9_1.x86_64) does seem to fix it.

Comment 8 lawrence.gorman 2023-05-11 18:36:17 UTC
Additionally I can't view https://bugzilla.redhat.com/show_bug.cgi?id=2177562 so am unable to see if there's been any progress with the problem.

Comment 9 David Pulkowski 2023-05-11 18:39:58 UTC
I can confirm the same issue.
I tried reverting to cifs-utils 6.14-1.el9.x86_64 and that did not fix the problem either.
As a temporary workaround, I will also try reverting to the previous kernel (5.14.0-162.23.1.el9_1.x86_64).
There is a CVE for that kernel: https://nvd.nist.gov/vuln/detail/CVE-2022-3028
https://bugzilla.redhat.com/show_bug.cgi?id=2122228

Comment 10 xiaoli feng 2023-05-15 01:23:54 UTC
Could you try kernel-5.14.0-301.el9? I found that bz2177562 no longer reproduces on kernel-5.14.0-301.el9.

Thanks.

Comment 11 ryan.brothers 2023-05-15 14:01:15 UTC
I am running into this issue too after upgrading to 5.14.0-284.11.1.el9_2.x86_64.  Will a fix for it be released soon?

Comment 12 David Pulkowski 2023-05-15 17:11:46 UTC
> Could you try kernel-5.14.0-301.el9? I find this bug bz2177562 is
> gone on kernel-5.14.0-301.el9.
> 
> Thanks.

kernel-5.14.0-301.el9 is not yet available in the rhel-9-for-x86_64-baseos-rpms repository.
The latest kernel available is: 5.14.0-284.11.1.el9_2

Comment 13 ryan.brothers 2023-06-01 17:07:38 UTC
Is there any timeline of when the fix will be released for RHEL 9?

Comment 15 lawrence.gorman 2023-06-22 09:55:49 UTC
I'm still monitoring, but kernel version 5.14.0-284.18.1.el9_2.x86_64 (just released) appears to have fixed the issue for me.

Comment 16 David Pulkowski 2023-06-27 15:13:34 UTC
I also tested the new version:

5.14.0-284.18.1.el9_2.x86_64

I have been monitoring it for 4+ hours with the new kernel, and I also think this fixed the issue. On the troublesome version, a reboot would become necessary within 4 hours. I will report back if any issue arises throughout the day.
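The kernel builds mentioned across this thread can be summarized in a small lookup. The sketch below classifies a release string (as printed by `uname -r`) using only the reports above: the regression is said to start at build 276, build 301 and 284.18.1.el9_2 are reported fixed, and 284.11.1.el9_2 is reported affected. The function names and status labels are illustrative assumptions, and the result is not an authoritative statement of which builds carry the fix.

```python
import re


def parse_build(release):
    """Extract the numeric build fields from a 5.14.0 el9 release string.

    '5.14.0-284.11.1.el9_2.x86_64' gives (284, 11, 1); returns None if no match.
    """
    match = re.search(r"5\.14\.0-(\d+(?:\.\d+)*)\.el9", release)
    if not match:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def reported_status(release):
    """Classify a release according to the reports in this thread only."""
    build = parse_build(release)
    if build is None:
        return "unknown"
    if build[0] < 276:
        return "before-regression"  # e.g. 162.23.1 was reported working
    if build[0] == 284:
        # The 284.x.y updates: 284.11.1 reported affected, 284.18.1 reported fixed.
        if build >= (284, 18, 1):
            return "fixed"
        if build <= (284, 11, 1):
            return "affected"
        return "unknown"  # builds in between are not covered by any report
    if build[0] < 301:
        return "affected"
    return "fixed"  # 301 was reported as no longer reproducing the bug


for rel in ("5.14.0-162.23.1.el9_1.x86_64",
            "5.14.0-284.11.1.el9_2.x86_64",
            "5.14.0-284.18.1.el9_2.x86_64"):
    print(rel, "->", reported_status(rel))
```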