This bug has been migrated to another issue tracking site. It has been closed here and may no longer be monitored.

If you would like to get updates for this issue, or to participate in it, you may do so at the Red Hat Issue Tracker.

RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets there.

Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against the components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED".

If you cannot log in to RH Jira, please consult article #7032570. Failing that, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry; the e-mail creates a ServiceNow ticket with Red Hat.

Migrated Bugzilla bugs will be moved to status "CLOSED", resolution "MIGRATED", and have "MigratedToJIRA" set in "Keywords". The link to the successor Jira issue can be found under "Links", has a little "two-footprint" icon next to it, and points to the "RHEL project" in Red Hat Jira (issue links are of the form "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). The same link also appears in a blue banner at the top of the page informing you that the bug has been migrated.
Bug 2035496 - High CPU utilization by the sssd_kcm process despite RHBA-2021:1666 being deployed
Summary: High CPU utilization by the sssd_kcm process despite RHBA-2021:1666 being dep...
Keywords:
Status: CLOSED MIGRATED
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: sssd
Version: 8.5
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Assignee: Alejandro López
QA Contact: Jakub Vavra
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-12-24 09:25 UTC by Brian McCafferty
Modified: 2024-01-20 04:25 UTC
CC List: 14 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-09-18 23:53:38 UTC
Type: Bug
Target Upstream Version:
Embargoed:
pm-rhel: mirror+


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1876514 1 high CLOSED High CPU utilization by the sssd_kcm process 2024-12-20 19:21:02 UTC
Red Hat Issue Tracker   RHEL-4980 0 None Migrated None 2023-09-21 07:30:01 UTC
Red Hat Issue Tracker RHELPLAN-106571 0 None None None 2021-12-24 09:29:11 UTC
Red Hat Issue Tracker SSSD-4298 0 None None None 2022-02-03 08:27:39 UTC
Red Hat Issue Tracker SSSD-6501 0 None None None 2023-08-14 08:30:29 UTC

Description Brian McCafferty 2021-12-24 09:25:40 UTC
Description of problem:
High CPU utilization by the sssd_kcm process despite RHBA-2021:1666 being deployed

Version-Release number of selected component (if applicable):
RHEL 8.5 with RHBA-2021:1666 deployed

How reproducible:
The customer installed the errata for RHBZ#1876514 and, last month, the one for RHBZ#1693379. Two weeks after installing the latest errata, the issue still appears to be present.

Steps to Reproduce:
Further observations by the customer:
- The CPU spike of sssd_kcm stopped immediately after closing Firefox (which had been running for a couple of weeks); there might be a relationship there.
- The customer did not have to stop/restart sssd_kcm to have it return to normal CPU usage.
- The customer was able to get sssd_kcm running at 100% CPU again. Just running "klist" seems to do the trick: entries are listed very slowly while sssd_kcm is 100% busy.
- Once the listing has completed, sssd_kcm goes back to normal.
- After running "kdestroy -A" followed by "kinit", the behaviour returned to normal: "klist" displays the list quickly without any significant CPU load.
- There still seems to be some kind of ticket cache corruption that presents itself over time.

Entries in /var/log/sssd/sssd_kcm.log
~~~
(2021-12-20  5:28:47): [kcm] [kcm_cmd_request_done] (0x0040): KCM operation failed [105]: No buffer space available
(2021-12-20  5:28:47): [kcm] [kcm_reply_error] (0x0040): KCM operation returns failure [105]: No buffer space available
~~~

Actual results:
sssd_kcm consumes 100% of CPU

Expected results:
sssd_kcm does not consume 100% of CPU

Additional info:
Full history in case 02891467
This was supposed to be resolved by BZ 1876514 (RHBA-2021:1666); however, issues can still be seen.

Comment 1 Alexey Tikhonov 2021-12-24 13:03:27 UTC
(In reply to Brian McCafferty from comment #0)
> 
> - Customer was able to get sssd_kcm running at 100% CPU again.  Just "klist"
> seems to do the trick, listing entries very slowly, while sssd_kcm is 100%

Please provide `sssd_kcm.log` with debug_level=9 covering this moment.
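For reference, a minimal sketch of what that looks like in /etc/sssd/sssd.conf (assuming the default config location; sssd-kcm has to be restarted, e.g. `systemctl restart sssd-kcm`, to pick up the change):

```
# /etc/sssd/sssd.conf (illustrative excerpt)
[kcm]
debug_level = 9
```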

Comment 4 Alexey Tikhonov 2021-12-27 12:31:36 UTC
```
   *  (2021-12-16  8:03:28): [kcm] [kcm_cmd_send] (0x0400): KCM operation GET_CRED_LIST
...
   *  (2021-12-16  8:03:28): [kcm] [sss_sec_get] (0x2000): Searching for [(|(type=simple)(type=binary))] at [cn=2ce94712-4f77-4c58-939f-6fb4c6ca9b83-...,cn=ccache,cn=...,cn=persistent,cn=kcm] with scope=base
   *  (2021-12-16  8:03:28): [kcm] [secdb_get_cc] (0x2000): Fetched the ccache
   *  (2021-12-16  8:03:28): [kcm] [ccdb_secdb_getbyname_send] (0x2000): Got ccache by name
   *  (2021-12-16  8:03:28): [kcm] [kcm_cmd_done] (0x0040): op receive function failed [105]: No buffer space available
```


It seems the ccache size exceeds the maximum reply size:
```
#define KCM_REPLY_MAX 10*1024*1024
```
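For context, here is a minimal illustrative sketch (not the actual SSSD source; the helper name is made up) of how a fixed reply cap like this turns an oversized ccache into the "No buffer space available" (ENOBUFS, errno 105) failures seen in the log above:

```c
#include <errno.h>
#include <stddef.h>

/* Reply cap as quoted above: 10 MiB. */
#define KCM_REPLY_MAX (10 * 1024 * 1024)

/* Hypothetical helper: refuse to build a reply whose payload would not
 * fit into one protocol message. ENOBUFS is errno 105 on Linux, which
 * is exactly the error reported in sssd_kcm.log. */
static int check_reply_size(size_t payload_len)
{
    if (payload_len > KCM_REPLY_MAX) {
        return ENOBUFS;
    }
    return 0; /* payload fits, go ahead and serialize it */
}
```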

What is the size of /var/lib/sss/secrets/secrets.ldb (when the issue is observed)?


Could you please provide (as a private attachment) the output of
```
ldbsearch -H /var/lib/sss/secrets/secrets.ldb "(|(type=simple)(type=binary))" -b "cn=2ce94...,cn=ccache,cn=...,cn=persistent,cn=kcm"
```
where -b "..." corresponds to "Searching for ..." in the log before "No buffer space available"?
Plus the corresponding sssd_kcm.log.


The output of a plain `ldbsearch -H /var/lib/sss/secrets/secrets.ldb` would also work for me, but it might be huge.

Comment 11 Alexey Tikhonov 2022-02-01 13:46:34 UTC
Thank you for the logs.

I confirm this is an issue: when the entire ccache grows over the default KCM protocol message limit (10 MB), the 'GET_CRED_LIST' operation simply fails with 'ENOBUFS'.

A potential solution could be to check the size of the ccache before reading it and return 'unsupported' if the limit would be exceeded: this should avoid burning CPU on reading data that is never used and allow libkrb5 to fall back to 'GET_CRED_UUID_LIST' gracefully.
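A minimal sketch of that idea (illustrative only, not SSSD's implementation; the function name and the EOPNOTSUPP mapping are assumptions):

```c
#include <errno.h>
#include <stddef.h>

#define KCM_REPLY_MAX (10 * 1024 * 1024)

/* Hypothetical pre-check for GET_CRED_LIST: if the stored ccache is
 * already larger than the maximum reply, report "operation not
 * supported" up front instead of reading and serializing all the data
 * only to fail with ENOBUFS later. The client (libkrb5) can then fall
 * back to GET_CRED_UUID_LIST and fetch credentials one by one. */
static int get_cred_list_precheck(size_t ccache_size)
{
    if (ccache_size > KCM_REPLY_MAX) {
        return EOPNOTSUPP; /* lets the client fall back gracefully */
    }
    return 0; /* small enough, serve GET_CRED_LIST normally */
}
```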

Comment 12 Alexey Tikhonov 2022-02-01 14:12:25 UTC
Another thing is the *reason* why the ccache grows so huge.

Looking at the comments in the case (a lot of "config: refresh_time = ..." lines in the `klist -A -C` output) and based on Pavel's comment, most probably the reason is https://github.com/SSSD/sssd/issues/5775
This particular issue is fixed in sssd-2.6, i.e. in RHEL 8.6.
Most probably that fix will also help to avoid hitting the issue described in comment 11.

Comment 16 Pavel Březina 2022-02-03 09:50:03 UTC
Running kdestroy from time to time will help. Alternatively, they can switch to a different ccache type (e.g. keyring).
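For illustration, a hedged example of the second suggestion: pointing the default credential cache at the kernel keyring instead of KCM via krb5.conf (the exact file and stanza may differ per deployment):

```
# /etc/krb5.conf (illustrative excerpt)
[libdefaults]
    # store tickets in the kernel keyring instead of sssd-kcm
    default_ccache_name = KEYRING:persistent:%{uid}
```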

Comment 19 Persona non grata 2022-02-07 09:07:43 UTC
I can confirm this happens on sssd 2.5.2: sssd_kcm goes to 100% of one CPU and the secrets.ldb file is ~122 MB.
The only workaround I've found so far is stopping sssd-kcm.socket, deleting secrets.ldb, and restarting.

Here's the relevant part of sssd_kcm.log:

(2022-02-07 10:01:27): [kcm] [kcm_cmd_request_done] (0x0040): KCM operation failed [105]: No buffer space available
(2022-02-07 10:01:27): [kcm] [kcm_reply_error] (0x0040): KCM operation returns failure [105]: No buffer space available
********************** PREVIOUS MESSAGE WAS TRIGGERED BY THE FOLLOWING BACKTRACE:
   *  (2022-02-07 10:01:27): [kcm] [kcm_cmd_request_done] (0x0040): KCM operation failed [105]: No buffer space available
   *  (2022-02-07 10:01:27): [kcm] [kcm_reply_error] (0x0040): KCM operation returns failure [105]: No buffer space available

Comment 20 Gilbert Griffin 2022-03-23 14:57:39 UTC
I can confirm the issue on sssd 2.5.2-2. Debug output:

   *  (2022-03-23 14:54:04): [kcm] [kcm_op_queue_get] (0x1000): No existing queue for this ID
   *  (2022-03-23 14:54:04): [kcm] [kcm_op_queue_send] (0x1000): Queue was empty, running the request immediately
   *  (2022-03-23 14:54:04): [kcm] [kcm_op_get_default_ccache_send] (0x1000): Getting client's default ccache
   *  (2022-03-23 14:54:04): [kcm] [ccdb_secdb_get_default_send] (0x2000): Getting the default ccache
   *  (2022-03-23 14:54:04): [kcm] [sss_sec_map_path] (0x1000): Mapping prefix /kcm/
   *  (2022-03-23 14:54:04): [kcm] [kcm_map_url_to_path] (0x1000): User-specific KCM path is [/kcm/persistent/1000/default]
   *  (2022-03-23 14:54:04): [kcm] [local_db_dn] (0x2000): Local path for [persistent/1000/default] is [cn=default,cn=1000,cn=persistent,cn=kcm]
   *  (2022-03-23 14:54:04): [kcm] [sss_sec_new_req] (0x1000): Local DB path is persistent/1000/default
   *  (2022-03-23 14:54:04): [kcm] [secdb_dfl_url_req] (0x2000): Created request for URL /kcm/persistent/1000/default
   *  (2022-03-23 14:54:04): [kcm] [sss_sec_get] (0x0400): Retrieving a secret from [persistent/1000/default]
   *  (2022-03-23 14:54:04): [kcm] [sss_sec_get] (0x2000): Searching for [(|(type=simple)(type=binary))] at [cn=default,cn=1000,cn=persistent,cn=kcm] with scope=base
   *  (2022-03-23 14:54:04): [kcm] [local_decrypt] (0x2000): Decrypting with masterkey
   *  (2022-03-23 14:54:04): [kcm] [ccdb_secdb_get_default_send] (0x2000): Got the default ccache
   *  (2022-03-23 14:54:04): [kcm] [ccdb_secdb_name_by_uuid_send] (0x2000): Translating UUID to name
   *  (2022-03-23 14:54:04): [kcm] [sss_sec_map_path] (0x1000): Mapping prefix /kcm/
   *  (2022-03-23 14:54:04): [kcm] [kcm_map_url_to_path] (0x1000): User-specific KCM path is [/kcm/persistent/1000/ccache/]
   *  (2022-03-23 14:54:04): [kcm] [local_db_dn] (0x2000): Local path for [persistent/1000/ccache/] is [cn=ccache,cn=1000,cn=persistent,cn=kcm]
   *  (2022-03-23 14:54:04): [kcm] [sss_sec_new_req] (0x1000): Local DB path is persistent/1000/ccache/
   *  (2022-03-23 14:54:04): [kcm] [secdb_container_url_req] (0x2000): Created request for URL /kcm/persistent/1000/ccache/
   *  (2022-03-23 14:54:04): [kcm] [sss_sec_list] (0x0400): Listing keys at [persistent/1000/ccache/]
   *  (2022-03-23 14:54:04): [kcm] [sss_sec_list] (0x2000): Searching for [(|(type=simple)(type=binary))] at [cn=ccache,cn=1000,cn=persistent,cn=kcm] with scope=subtree
   *  (2022-03-23 14:54:04): [kcm] [local_dn_to_path] (0x2000): Secrets path for [cn=592860fe-e861-4f91-8537-e026138f66eb-1000,cn=ccache,cn=1000,cn=persistent,cn=kcm] is [592860fe-e861-4f91-8537-e026138f66eb-1000]
   *  (2022-03-23 14:54:04): [kcm] [sss_sec_list] (0x1000): Returning 1 secrets
   *  (2022-03-23 14:54:04): [kcm] [key_by_uuid] (0x2000): Found key 592860fe-e861-4f91-8537-e026138f66eb-1000
   *  (2022-03-23 14:54:04): [kcm] [ccdb_secdb_name_by_uuid_send] (0x2000): Got ccache by UUID
   *  (2022-03-23 14:54:04): [kcm] [kcm_op_get_default_ccache_reply_step] (0x2000): The default ccache is 1000
   *  (2022-03-23 14:54:04): [kcm] [kcm_cmd_done] (0x0400): KCM operation GET_DEFAULT_CACHE returned [0]: Success
   *  (2022-03-23 14:54:04): [kcm] [kcm_send_reply] (0x2000): Sending a reply
   *  (2022-03-23 14:54:04): [kcm] [kcm_output_construct] (0x1000): Sending a reply with 9 bytes of payload
   *  (2022-03-23 14:54:04): [kcm] [queue_removal_cb] (0x0200): Removed queue for 1000
   *  (2022-03-23 14:54:04): [kcm] [kcm_send] (0x2000): All data sent!
   *  (2022-03-23 14:54:04): [kcm] [kcm_input_parse] (0x1000): Received message with length 9
   *  (2022-03-23 14:54:04): [kcm] [kcm_get_opt] (0x2000): The client requested operation 8
   *  (2022-03-23 14:54:04): [kcm] [kcm_cmd_send] (0x0400): KCM operation GET_PRINCIPAL
   *  (2022-03-23 14:54:04): [kcm] [kcm_cmd_send] (0x1000): 5 bytes on KCM input
   *  (2022-03-23 14:54:04): [kcm] [kcm_op_queue_send] (0x0200): Adding request by 1000 to the wait queue
   *  (2022-03-23 14:54:04): [kcm] [kcm_op_queue_get] (0x1000): No existing queue for this ID

Comment 21 sreekanth 2022-04-07 18:43:38 UTC
Hello everyone,

IHAC (case: 03157166) who faces the same issue and sees sssd_kcm taking up one CPU core almost all the time.
This is causing issues on their host, which has only 2 vCPUs.

I have collected the debug logs, but the logs don't seem to contain anything other than the following.

(2022-04-07  8:40:03): [kcm] [sec_get] (0x0040): Cannot retrieve the secret [2]: No such file or directory
(2022-04-07  8:40:25): [kcm] [sec_get] (0x0040): Cannot retrieve the secret [2]: No such file or directory
(2022-04-07  8:43:10): [kcm] [sec_get] (0x0040): Cannot retrieve the secret [2]: No such file or directory
(2022-04-07  8:43:33): [kcm] [sec_get] (0x0040): Cannot retrieve the secret [2]: No such file or directory
(2022-04-07  8:45:03): [kcm] [sec_get] (0x0040): Cannot retrieve the secret [2]: No such file or directory
(2022-04-07  8:45:25): [kcm] [sec_get] (0x0040): Cannot retrieve the secret [2]: No such file or directory


Any suggestions or next steps for working on the issue?

Regards
Sreekanth

Comment 22 Alexey Tikhonov 2022-04-08 09:24:46 UTC
(In reply to sreekanth from comment #21)
> Hello everyone,
> 
> IHAC who faces the same issue (case: 03157166), sees sssd_kcm taking up 1
> CPU core almost all the time.
> This is causing issues with his host where he has only 2 vCPUs.
> 
> I have collected the debug logs, but the logs don't seem to contain anything
> other than the following.
> 
> (2022-04-07  8:40:03): [kcm] [sec_get] (0x0040): Cannot retrieve the secret
> [2]: No such file or directory
> (2022-04-07  8:40:25): [kcm] [sec_get] (0x0040): Cannot retrieve the secret
> [2]: No such file or directory
> (2022-04-07  8:43:10): [kcm] [sec_get] (0x0040): Cannot retrieve the secret
> [2]: No such file or directory
> (2022-04-07  8:43:33): [kcm] [sec_get] (0x0040): Cannot retrieve the secret
> [2]: No such file or directory
> (2022-04-07  8:45:03): [kcm] [sec_get] (0x0040): Cannot retrieve the secret
> [2]: No such file or directory
> (2022-04-07  8:45:25): [kcm] [sec_get] (0x0040): Cannot retrieve the secret
> [2]: No such file or directory
> 
> 
> Any suggestions or steps to work on about the issue?


What is the SSSD / RHEL version?

Please enable 'debug_level = 9' for [kcm] and provide sssd_kcm.log

Comment 46 rhel_beast 2023-05-16 02:59:14 UTC
RHEL 9.1 using sssd-kcm 2.7.3. I was able to reproduce it consistently by starting a TigerVNC session via a desktop manager (.vnc/xstartup). I have tried GNOME, MATE, and Xfce4. As soon as you kill the vncserver, the CPU drops back down, but as soon as you start the vncserver using one of these desktop managers, it goes to 100% and stays there.

Comment 47 Alexey Tikhonov 2023-05-30 09:24:19 UTC
Hi,

(In reply to rhel_beast from comment #46)
> rhel 9.1 using sssd-kcm 2.7.3 ... I was able to reproduce it consistently by
> starting a tiger-vnc using a Desktop Manager (.vnc/xstartup).  I have tried
> gnome, mate, and xfce4... as soon as you kill the vncserver... the cpu drops
> back down... but as soon as you start the vncserver using one of these
> desktop managers... it goes to 100% and stays there.

I wonder if you hit the sssd_kcm issue described in comment 11 or https://gitlab.gnome.org/GNOME/gnome-online-accounts/-/issues/79

Is there "No buffer space available" in /var/log/sssd/sssd_kcm.log?

What is the size of /var/lib/sss/secrets/secrets.ldb when the issue occurs?

Comment 49 RHEL Program Management 2023-09-18 23:48:36 UTC
Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.

Comment 50 RHEL Program Management 2023-09-18 23:53:38 UTC
This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there.

Due to differences in account names between systems, some fields were not replicated.  Be sure to add yourself to Jira issue's "Watchers" field to continue receiving updates and add others to the "Need Info From" field to continue requesting information.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it, and begin with "RHEL-" followed by an integer.  You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like:

"Bugzilla Bug" = 1234567

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues@redhat.com. You can also visit https://access.redhat.com/articles/7032570 for general account information.

Comment 51 Red Hat Bugzilla 2024-01-20 04:25:15 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

