Bug 979046 - sssd_be goes to 99% CPU and causes significant login delays when client is under load
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: sssd
Version: 6.5
Hardware: All Linux
Priority: medium Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Jakub Hrozek
QA Contact: Kaushik Banerjee
Depends On: 979045
Blocks: 979047
Reported: 2013-06-27 09:35 EDT by Dmitri Pal
Modified: 2013-11-21 17:20 EST

See Also:
Fixed In Version: sssd-1.9.2-98.el6
Doc Type: Bug Fix
Doc Text:
Cause: During HBAC evaluation, the IPA provider attempted to store the original value of the member attribute in the cache. These values were then processed by the memberof plugin, which required a lot of processing time in environments with very large host groups.
Consequence: The sssd_be process went to 99% CPU for a while and users experienced significant login delays.
Fix: The member attribute is no longer stored.
Result: HBAC evaluation is much faster.
Story Points: ---
Clone Of: 979045
: 979047 (view as bug list)
Environment:
Last Closed: 2013-11-21 17:20:28 EST

External Trackers
Tracker                            ID      Priority  Status  Summary  Last Updated
Red Hat Knowledge Base (Solution)  433743  None      None    None     Never

Description Dmitri Pal 2013-06-27 09:35:25 EDT
+++ This bug was initially created as a clone of Bug #979045 +++

This bug is created as a clone of upstream ticket:
https://fedorahosted.org/sssd/ticket/1806

I have a system with a reproducible problem with sssd when under load.

The sssd.log shows recurring messages stating: A service PING timed out on [domain.com]. Attempt [0]

Followed by: Killing service [expertcity.com], not responding to pings!

After sssd is restarted, the sssd_be process spikes to 99% CPU, and logging in to the machine over SSH can take 30-60 seconds. Subsequent logins seem fine until whichever cache is affected needs to be renewed, which reproduces the long delay.

The system is a VM with 2 cores assigned. A load anywhere from 4 to 12 is enough to reproduce the issue.
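
To gauge how often the monitor messages quoted above occur, the log can be scanned with a short script. This is only a sketch: the two patterns are copied from the messages in this report, and the function name summarize is made up for illustration.

```python
import re
from collections import Counter

# Patterns copied from the two messages quoted in this report.
PING_TIMEOUT = re.compile(
    r"A service PING timed out on \[(?P<svc>[^\]]+)\]\. Attempt \[(?P<attempt>\d+)\]"
)
KILLED = re.compile(
    r"Killing service \[(?P<svc>[^\]]+)\], not responding to pings!"
)


def summarize(lines):
    """Count PING timeouts and service kills per service name."""
    timeouts = Counter()
    kills = Counter()
    for line in lines:
        match = PING_TIMEOUT.search(line)
        if match:
            timeouts[match.group("svc")] += 1
            continue
        match = KILLED.search(line)
        if match:
            kills[match.group("svc")] += 1
    return timeouts, kills
```

The function accepts any iterable of lines, so it can be fed an open file object for sssd.log (typically found under /var/log/sssd/, though that location is an assumption about the installation).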
Comment 1 Jakub Hrozek 2013-06-27 10:51:41 EDT
Steps to reproduce:
https://bugzilla.redhat.com/show_bug.cgi?id=979045#c2
Comment 2 Jakub Hrozek 2013-06-27 10:52:16 EDT
Fixed upstream.
Comment 7 Namita Soman 2013-09-30 15:03:42 EDT
Tested using ipa-server-3.0.0-36.el6.x86_64, sssd-1.9.2-127.el6.x86_64, ipa-client-3.0.0-36.el6.x86_64

Added a host group - hostgroup1
Added 2000 hosts
Added these hosts to the hostgroup
Installed ipa-client on a host, and added that host to the same hostgroup
Added an HBAC rule allowing user one to access hosts in the hostgroup (hostgroup1) and allowing access to a service (sshd)
Disabled the HBAC rule allow_all
Ran kdestroy
ssh'd as user one from the master server to the host where the RHEL 6.5 client is installed

There were no CPU spikes or messages in sssd.log
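
The server-side setup in this comment could be scripted roughly as follows. This is a sketch, not the commands actually used for verification: the host names, the example.com domain, the rule name, and the client FQDN are all hypothetical, and the script only builds the ipa command lines as strings so they can be reviewed before being run on the IPA master with an admin ticket.

```python
import shlex


def repro_commands(hostgroup="hostgroup1", n_hosts=2000,
                   domain="example.com",       # hypothetical domain
                   rule="allow_one_sshd",      # hypothetical rule name
                   user="one", client=None):
    """Build ipa CLI command lines for a large-hostgroup HBAC test setup."""
    cmds = [["ipa", "hostgroup-add", hostgroup, "--desc=large test hostgroup"]]

    # Create the dummy hosts and put each one into the hostgroup.
    # --force skips the DNS check, since these hosts do not really exist.
    for i in range(1, n_hosts + 1):
        fqdn = f"host{i}.{domain}"
        cmds.append(["ipa", "host-add", fqdn, "--force"])
        cmds.append(["ipa", "hostgroup-add-member", hostgroup, f"--hosts={fqdn}"])

    # The enrolled client must be a member of the same hostgroup.
    if client:
        cmds.append(["ipa", "hostgroup-add-member", hostgroup, f"--hosts={client}"])

    # HBAC rule: the user may reach hosts in the hostgroup via sshd only.
    cmds.append(["ipa", "hbacrule-add", rule])
    cmds.append(["ipa", "hbacrule-add-user", rule, f"--users={user}"])
    cmds.append(["ipa", "hbacrule-add-host", rule, f"--hostgroups={hostgroup}"])
    cmds.append(["ipa", "hbacrule-add-service", rule, "--hbacsvcs=sshd"])

    # Without this, the default allow_all rule would mask the new rule.
    cmds.append(["ipa", "hbacrule-disable", "allow_all"])

    return [" ".join(shlex.quote(arg) for arg in cmd) for cmd in cmds]
```

Printing the returned list one entry per line yields a shell script for the master; the kdestroy and ssh steps are then run by hand as described in this comment.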
Comment 8 errata-xmlrpc 2013-11-21 17:20:28 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1680.html
