Bug 658444

Summary: sssd generates corrupt database when started by puppetd from kickstart %post
Product: [Fedora] Fedora Reporter: Daniel Piddock <dgp-bz>
Component: sssd Assignee: Stephen Gallagher <sgallagh>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: low    
Version: 14 CC: jhrozek, sbose, sgallagh, ssorce
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version: sssd-1.5.0-1.fc14 Doc Type: Bug Fix
Last Closed: 2011-01-10 21:30:19 UTC Type: ---
Attachments:
Description                      Flags
sssd.log debug_level=9           none
sssd.conf slightly sanitized     none
sssd.log command line restart    none
kickstart                        none

Description Daniel Piddock 2010-11-30 11:54:24 UTC
Description of problem:
We use puppet to configure workstations. The %post section of our kickstart calls puppetd --test, which pulls in sssd and starts the service. However, the service then fails.

Version-Release number of selected component (if applicable):
1.4.1-1.fc14 i686 and x86_64

How reproducible:
Every kickstart install. I have not been able to successfully reproduce the problem outside kickstart.

Steps to Reproduce:
1. Puppet config:
package { ['sssd', 'sssd-client']:
   ensure => latest,
}
file { '/etc/sssd/sssd.conf':
   # readable and writable by root only
   mode => '0600',
   # the file type takes "owner", not "user"
   owner => root,
   group => root,
   source => 'puppet:///files/sssd.conf',
   require => Package['sssd'],
}
service { 'sssd':
   hasstatus => true,
   # a change to sssd.conf triggers a restart of the service
   subscribe => File['/etc/sssd/sssd.conf'],
   enable => true,
   ensure => running,
   hasrestart => true,
}
2. Fedora 14 kickstart %post section that calls puppetd --test --waitforcert 10
3. sssd has failed to start properly
  
Actual results:
sssd crashed. Running "sssd -d3 -i" gives:
($date) [sssd] [check_file] (1): lstat for [/var/run/nscd/socket] failed: [2][No such file or directory].
($date) [sssd] [confdb_get_domain_internal] (1): No enumeration for [default]!
($date) [sssd] [server_setup] (3): CONFDB: /var/lib/sss/db/config.ldb
($date) [sssd] [sysdb_domain_init_internal] (0): Failed to initialize DB (68, [Entry @ATTRIBUTES already exists]) for domain default!

Expected results:
sssd starts cleanly and keeps running.

Additional info:
Deleting the contents of /var/lib/sss/db/ and starting sssd manually works fine.
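
The workaround above can be sketched as a small shell function. This is only a sketch: SSSD_DB_DIR, SSSD_INIT and reset_sssd_cache are names made up for illustration (they are not sssd settings), and their defaults are simply the paths mentioned in this report. Note that clearing the directory discards anything sssd had cached.

```shell
#!/bin/sh
# Sketch of the workaround: stop sssd, clear its cache databases, start again.
# SSSD_DB_DIR and SSSD_INIT are illustrative variables, not sssd settings;
# their defaults are the paths mentioned in this report.
SSSD_DB_DIR="${SSSD_DB_DIR:-/var/lib/sss/db}"
SSSD_INIT="${SSSD_INIT:-/etc/init.d/sssd}"

reset_sssd_cache() {
    # Stop the daemon first so nothing is writing to the files we remove.
    "$SSSD_INIT" stop || true
    # Remove the contents of the directory, not the directory itself.
    rm -f "$SSSD_DB_DIR"/*
    "$SSSD_INIT" start
}
```

Run as root, the call would simply be: reset_sssd_cache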

Comment 1 Stephen Gallagher 2010-11-30 12:04:00 UTC
Please include your (sanitized) sssd.conf in this bug.

Also, to give us the best chance of tracking this down, please have your puppetized sssd.conf include

debug_level=9

in both the [sssd] and [domain/default] sections. Then run your kickstart. SSSD should output some verbose information into /var/log/sssd/sssd.log and /var/log/sssd/sssd_default.log
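
As a sketch, the relevant part of sssd.conf would then look roughly like this. Only the two debug_level lines come from the request above; the placeholder comments stand for whatever settings the file already contains.

```ini
[sssd]
debug_level = 9
# ... existing [sssd] settings unchanged ...

[domain/default]
debug_level = 9
# ... existing [domain/default] settings unchanged ...
```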

Please attach that output as well.

Comment 2 Daniel Piddock 2010-11-30 13:44:20 UTC
Created attachment 463730 [details]
sssd.log debug_level=9

Comment 3 Daniel Piddock 2010-11-30 13:46:30 UTC
Created attachment 463733 [details]
sssd.conf slightly sanitized

Comment 4 Daniel Piddock 2010-11-30 13:47:10 UTC
No /var/log/sssd/sssd_default.log was created. Only sssd.log was in that directory.

Comment 5 Stephen Gallagher 2010-11-30 15:41:31 UTC
I'm inclined to say that this must be something unique to puppet. I just created a kickstart that drops an sssd.conf in place and then starts the service.

A VM generated from this kickstart started up the SSSD with no problems.

Comment 6 Daniel Piddock 2010-11-30 16:11:49 UTC
Created attachment 463766 [details]
sssd.log command line restart

I'm beginning to wonder if it's a race condition. Puppet starts the service and then immediately restarts it: the resource says the service should be running, so Puppet starts it, and because the subscribed file has just changed, Puppet also queues a refresh (restart).

I'm not sure what puppet is doing between the start and restart calls, as attempting to replicate it from the command line does not behave in exactly the same way:
rm -f /var/lib/sss/db/* && /etc/init.d/sssd start; /etc/init.d/sssd restart

The service is started, then killed, but fails to start up again even though three OKs were printed. Another start request works successfully. Comparing the logs, the command-line restart call arrives slightly sooner than the one generated by puppet.
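
To see how wide the window is, the start/restart sequence above could be repeated with a configurable pause in between. The following is a rough sketch, not something that was run for this report: probe_restart_race, SSSD_DB_DIR, SSSD_INIT and SETTLE are illustrative names, and it assumes the init script supports "status" (the Puppet resource above sets hasstatus => true).

```shell
#!/bin/sh
# Sketch: repeat the start/restart sequence with a pause of "$1" seconds
# between the two calls and report whether sssd is still running afterwards.
# All names here are illustrative; the default paths are the ones from this report.
SSSD_DB_DIR="${SSSD_DB_DIR:-/var/lib/sss/db}"
SSSD_INIT="${SSSD_INIT:-/etc/init.d/sssd}"
SETTLE="${SETTLE:-2}"   # seconds to wait before checking the status

probe_restart_race() {
    delay="$1"
    "$SSSD_INIT" stop >/dev/null 2>&1 || true
    rm -f "$SSSD_DB_DIR"/*          # force first-time cache initialization
    "$SSSD_INIT" start >/dev/null 2>&1
    sleep "$delay"
    "$SSSD_INIT" restart >/dev/null 2>&1
    sleep "$SETTLE"                 # give the daemon time to fail, if it will
    if "$SSSD_INIT" status >/dev/null 2>&1; then
        echo "delay=$delay: running"
    else
        echo "delay=$delay: not running"
    fi
}
```

For example, as root: for d in 0 1 2 5; do probe_restart_race "$d"; done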

Comment 7 Stephen Gallagher 2010-11-30 16:20:57 UTC
Would you mind including your sanitized kickstart file? I'm mostly interested in whether you're performing any actions using the sss_* or ldb* tools, but I'd like to see everything you're doing in %post.

Also, if puppet really is TOO fast in doing start then restart, it could very well be the cause of the problem. We have a certain minimal amount of initialization we do on the first install, and if that's interrupted it could explain this behavior.

Comment 8 Stephen Gallagher 2010-11-30 16:55:06 UTC
Okay, I think we found the problem. It was a race condition, as you suspected. There was a limited window during the first-time startup where it was possible for a restart request to interrupt the cache initialization and leave it in an unusable state.

I have submitted a patch to the upstream development list here:
https://fedorahosted.org/pipermail/sssd-devel/2010-November/005179.html
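
The linked patch is not reproduced here. Purely as a generic illustration of the failure mode (this is NOT how sssd or the patch works), one common way to keep an interrupted first-time initialization from leaving a half-written file behind is to build the file under a temporary name and rename it into place. init_atomically is a made-up name.

```shell
#!/bin/sh
# Generic illustration, NOT the sssd patch: write the new file under a
# staging name and rename it into place only once it is complete, so an
# interruption never leaves a partially initialized file at the final path.
init_atomically() {
    target="$1"
    staging="$target.tmp.$$"
    # The initial contents are read from standard input.
    if cat > "$staging"; then
        # A rename within one filesystem replaces the target in a single step.
        mv -f "$staging" "$target"
    else
        rm -f "$staging"
        return 1
    fi
}
```

If the process is killed before the rename, the final path is simply absent, so the next start can redo the initialization from scratch; a stale staging file may remain and would need to be cleaned up separately.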

Comment 9 Daniel Piddock 2010-11-30 17:00:18 UTC
Created attachment 463778 [details]
kickstart

Attaching a sanitized kickstart. %post isn't very exciting.

Comment 10 Fedora Update System 2010-12-23 18:45:30 UTC
sssd-1.5.0-1.fc14 has been submitted as an update for Fedora 14.
https://admin.fedoraproject.org/updates/sssd-1.5.0-1.fc14

Comment 11 Fedora Update System 2010-12-25 00:22:37 UTC
sssd-1.5.0-1.fc14 has been pushed to the Fedora 14 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
 su -c 'yum --enablerepo=updates-testing update sssd'
You can provide feedback for this update here: https://admin.fedoraproject.org/updates/sssd-1.5.0-1.fc14

Comment 12 Fedora Update System 2011-01-10 21:29:59 UTC
sssd-1.5.0-1.fc14 has been pushed to the Fedora 14 stable repository.  If problems still persist, please make note of it in this bug report.