Bug 975001

Summary: Cannot authenticate using AD credentials
Product: [Fedora] Fedora Reporter: David Woodhouse <dwmw2>
Component: sssdAssignee: Jakub Hrozek <jhrozek>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 19CC: jhrozek, pbrezina, sbose, sgallagh, ssorce
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: sssd-1.10.1-1.fc19 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-07-09 15:31:22 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description David Woodhouse 2013-06-17 10:58:14 UTC
I made a fresh install of Fedora 19 in a VM, to test Active Directory integration.

I was asked by gnome-initial-setup for my domain, username and password, but received the error 'Not authorized to perform this action.' This appears to be a typically unhelpful GNOME-style "haha, I'm not going to tell you what went wrong" error message, bearing little relation to the more useful messages I get if I do things manually...

If I run realmd in a VT and try using g-i-s, I see it complain:
(realmd:2170): DEBUG: rejecting access to method 'Join' on interface 'org.freedesktop.realmd.KerberosMembership' at /org/freedesktop/realmd/Sssd/get_corp_intel_com_1

If I run 'realm -v join ger.corp.intel.com' manually, I get a complaint that the computer's host name is not set correctly (it's localhost.localdomain).

If I set the hostname to something arbitrary, I get an expected failure since I am not authorized to add new machines in AD:
realm: Couldn't join realm: Insufficient permissions to join the domain ger.corp.intel.com

If I enter a *correct* hostname (which g-i-s needed to ask me for), it appears to work. I see it running 'net ads join', 'net ads keytab create', enabling and starting sssd, and running authconfig --update --enablesssd --enablesssdauth --enablemkhomedir.

Now I can run 'realm list' and I see my domain, with login-formats: %U.intel.com and login-policy: allow-realm-logins

I referred to the test case at https://fedoraproject.org/wiki/QA:Testcase_realmd_join_sssd

However, getent passwd 'GER\dwoodhou' or 'dwoodhou' or 'GER.CORP.INTEL.COM\dwoodhou' or 'dwoodhou.intel.com' all fail to return any results. Logging in with any of the above also fails, of course.

The test wiki also suggests running 'klist -k' but the krb5-workstation package doesn't seem to be installed. A bunch of *other* packages got automatically installed but not that. Installing it doesn't seem to help, except that I can now confirm that I do see sane output from it.

Comment 1 David Woodhouse 2013-06-17 11:03:46 UTC
/var/log/sssd/sssd_ger.corp.intel.com.log was helpfully completely empty until I restarted sssd manually with -d9, and now it's slightly more enlightening:

(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sbus_dispatch] (0x4000): dbus conn: 7F3FF10CE150
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sbus_dispatch] (0x4000): Dispatching.
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sbus_message_handler] (0x4000): Received SBUS method [getAccountInfo]
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [be_get_account_info] (0x0100): Got request for [4097][1][name=dwoodhou]
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_id_op_connect_step] (0x4000): reusing cached connection
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_get_users_next_base] (0x0400): Searching for users with base [DC=ger,DC=corp,DC=intel,DC=com]
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_get_generic_ext_step] (0x0400): calling ldap_search_ext with [(&(sAMAccountName=dwoodhou)(objectclass=user))][DC=ger,DC=corp,DC=intel,DC=com].
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [objectClass]
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [sAMAccountName]
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [unixUserPassword]
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [uidNumber]
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [gidNumber]
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [gecos]
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [unixHomeDirectory]
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [loginShell]
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [userPrincipalName]
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [name]
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [memberOf]
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [objectGUID]
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [objectSID]
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [primaryGroupID]
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [whenChanged]
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [uSNChanged]
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [accountExpires]
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [userAccountControl]
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_get_generic_ext_step] (0x2000): ldap_search_ext called, msgid = 11
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_process_result] (0x2000): Trace: sh[0x7f3ff10dc190], connected[1], ops[0x7f3ff11013d0], ldap[0x7f3ff10cdf30]
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_process_message] (0x4000): Message type: [LDAP_RES_SEARCH_ENTRY]
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_parse_entry] (0x4000): OriginalDN: [CN=Woodhouse\, David,OU=Workers,DC=ger,DC=corp,DC=intel,DC=com].
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_parse_range] (0x2000): No sub-attributes for [objectClass]
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_parse_range] (0x2000): No sub-attributes for [whenChanged]
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_parse_range] (0x2000): No sub-attributes for [memberOf]
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_parse_range] (0x2000): No sub-attributes for [uSNChanged]
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_parse_range] (0x2000): No sub-attributes for [name]
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_parse_range] (0x2000): No sub-attributes for [objectGUID]
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_parse_range] (0x2000): No sub-attributes for [userAccountControl]
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_parse_range] (0x2000): No sub-attributes for [primaryGroupID]
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_parse_range] (0x2000): No sub-attributes for [objectSid]
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_parse_range] (0x2000): No sub-attributes for [accountExpires]
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_parse_range] (0x2000): No sub-attributes for [sAMAccountName]
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_parse_range] (0x2000): No sub-attributes for [userPrincipalName]
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_parse_range] (0x2000): No sub-attributes for [uidNumber]
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_parse_range] (0x2000): No sub-attributes for [gidNumber]
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_parse_range] (0x2000): No sub-attributes for [gecos]
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_parse_range] (0x2000): No sub-attributes for [loginShell]
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_parse_range] (0x2000): No sub-attributes for [unixHomeDirectory]
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_process_result] (0x2000): Trace: sh[0x7f3ff10dc190], connected[1], ops[0x7f3ff11013d0], ldap[0x7f3ff10cdf30]
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_process_message] (0x4000): Message type: [LDAP_RES_SEARCH_REFERENCE]
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_process_result] (0x2000): Trace: sh[0x7f3ff10dc190], connected[1], ops[0x7f3ff11013d0], ldap[0x7f3ff10cdf30]
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_process_message] (0x4000): Message type: [LDAP_RES_SEARCH_RESULT]
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_get_generic_ext_done] (0x0400): Search result: Success(0), no errmsg set
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_get_users_process] (0x0400): Search for users, returned 1 results.
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [ldb] (0x4000): start ldb transaction (nesting: 0)
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_save_user] (0x4000): Save user
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_save_user] (0x1000): Mapping user [dwoodhou] objectSID [S-1-5-21-2052111302-1275210071-1644491937-279532] to unix ID
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_idmap_sid_to_unix] (0x0080): Could not convert objectSID [S-1-5-21-2052111302-1275210071-1644491937-279532] to a UNIX ID
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_save_user] (0x0040): Failed to save user [dwoodhou]
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_save_users] (0x0040): Failed to store user 0. Ignoring.
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [ldb] (0x4000): commit ldb transaction (nesting: 0)
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_get_users_process] (0x4000): Saving 1 Users - Done
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_id_op_done] (0x4000): releasing operation connection
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [acctinfo_callback] (0x0100): Request processed. Returned 0,0,Success
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_process_result] (0x2000): Trace: sh[0x7f3ff10dc190], connected[1], ops[(nil)], ldap[0x7f3ff10cdf30]
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_process_result] (0x2000): Trace: ldap_result found nothing!
(Mon Jun 17 12:02:20 2013) [sssd[be[ger.corp.intel.com]]] [sbus_dispatch] (0x4000): dbus conn: 7F3FF10BAD00
(Mon Jun 17 12:02:20 2013) [sssd[be[ger.corp.intel.com]]] [sbus_dispatch] (0x4000): Dispatching.
(Mon Jun 17 12:02:20 2013) [sssd[be[ger.corp.intel.com]]] [sbus_message_handler] (0x4000): Received SBUS method [ping]
(Mon Jun 17 12:02:30 2013) [sssd[be[ger.corp.intel.com]]] [sbus_dispatch] (0x4000): dbus conn: 7F3FF10BAD00
(Mon Jun 17 12:02:30 2013) [sssd[be[ger.corp.intel.com]]] [sbus_dispatch] (0x4000): Dispatching.
(Mon Jun 17 12:02:30 2013) [sssd[be[ger.corp.intel.com]]] [sbus_message_handler] (0x4000): Received SBUS method [ping]

Comment 2 Jakub Hrozek 2013-06-17 11:21:27 UTC
Thanks for the bug report. It seems that SSSD authenticated against AD correctly, looked up the user and then failed to convert his SID to ID.

Sumit, does the SID look like something SSSD should have trouble converting?

------------
(Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_save_user] (0x1000): Mapping user [dwoodhou] objectSID [S-1-5-21-2052111302-1275210071-1644491937-279532] to unix ID
------------

Comment 3 Jakub Hrozek 2013-06-17 11:22:38 UTC
(In reply to David Woodhouse from comment #0)
> I made a fresh install of Fedora 19 in a VM, to test Active Directory
> integration.
> 
> I was asked by gnome-initial-setup for my domain, username and password, but
> received the error 'Not authorized to perform this action.' This appears to
> be a typically unhelpful GNOME-style "haha, I'm not going to tell you what
> went wrong" error message, bearing little relation to the more useful
> messages I get if I do things manually...
> 
> If I run realmd in a VT and try using g-i-s, I see it complain:
> (realmd:2170): DEBUG: rejecting access to method 'Join' on interface
> 'org.freedesktop.realmd.KerberosMembership' at
> /org/freedesktop/realmd/Sssd/get_corp_intel_com_1
> 
> If I run 'realm -v join ger.corp.intel.com' manually, I get a complaint that
> the computer's host name is not set correctly (it's localhost.localdomain).
> 
> If I set the hostname to something arbitrary, I get an expected failure
> since I am not authorized to add new machines in AD:
> realm: Couldn't join realm: Insufficient permissions to join the domain
> ger.corp.intel.com
> 

The above are g-i-s problem. I'd suggest opening a separate bugzilla report for these, I know quite little about g-i-s, sorry.

Comment 4 David Woodhouse 2013-06-17 11:27:23 UTC
(In reply to Jakub Hrozek from comment #2)
> Thanks for the bug report. It seems that SSSD authenticated against AD
> correctly, looked up the user and then failed to convert his SID to ID.
> 
> Sumit, does the SID look like something SSSD should have trouble converting?
> 
> ------------
> (Mon Jun 17 12:02:14 2013) [sssd[be[ger.corp.intel.com]]] [sdap_save_user]
> (0x1000): Mapping user [dwoodhou] objectSID
> [S-1-5-21-2052111302-1275210071-1644491937-279532] to unix ID
> ------------

I tried with some other users, most of which failed similarly. But one worked:
S-1-5-21-2052111302-1275210071-1644491937-198384

Comment 5 Sumit Bose 2013-06-17 12:13:58 UTC
By default SSSD use a range of 200k POSIX ID to map the AD RID into. In your environments RIDs larger than 200k are used, e.g. 279532.

Please try to set ldap_idmap_range_size to a value suitable for your environment, e.g. 400000, delete the cache in /var/lib/sssd/db and restart sssd.

We have plans to automatically allocate a new range if RIDs become too large, but it is not implemented yet.

Comment 6 David Woodhouse 2013-06-17 12:32:05 UTC
Thanks. That gets me a little further. It now admits that I exist, but I cannot authenticate. The log suggests that it's trying to do DNS requests which fail. But tcpdump shows me that it isn't sending any DNS requests.

(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [sbus_message_handler] (0x4000): Received SBUS method [pamHandler]
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [be_pam_handler] (0x0100): Got request with the following data
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [pam_print_data] (0x0100): command: PAM_AUTHENTICATE
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [pam_print_data] (0x0100): domain: ger.corp.intel.com
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [pam_print_data] (0x0100): user: dwoodhou
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [pam_print_data] (0x0100): service: login
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [pam_print_data] (0x0100): tty: tty4
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [pam_print_data] (0x0100): ruser: 
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [pam_print_data] (0x0100): rhost: 
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [pam_print_data] (0x0100): authtok type: 1
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [pam_print_data] (0x0100): newauthtok type: 0
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [pam_print_data] (0x0100): priv: 1
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [pam_print_data] (0x0100): cli_pid: 6792
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [krb5_pam_handler] (0x1000): Wait queue of user [dwoodhou] is empty, running request immediately.
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [ldb] (0x4000): Added timed event "ltdb_callback": 0x7ffbccc05d90

(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [ldb] (0x4000): Added timed event "ltdb_timeout": 0x7ffbccc41ce0

(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [ldb] (0x4000): Running timer event 0x7ffbccc05d90 "ltdb_callback"

(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [ldb] (0x4000): Destroying timer event 0x7ffbccc41ce0 "ltdb_timeout"

(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [ldb] (0x4000): Ending timer event 0x7ffbccc05d90 "ltdb_callback"

(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [krb5_auth_send] (0x0100): No ccache file for user [dwoodhou] found.
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [krb5_auth_send] (0x4000): Ccache_file is [not set] and is not active and TGT is not valid.
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'AD'
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [fo_resolve_service_activate_timeout] (0x2000): Resolve timeout set to 5 seconds
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [resolve_srv_send] (0x0200): The status of SRV lookup is not resolved
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [be_resolve_server_process] (0x0080): Couldn't resolve server (SRV lookup meta-server), resolver returned (5)
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [be_resolve_server_process] (0x1000): Trying with the next one!
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'AD'
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [fo_resolve_service_activate_timeout] (0x2000): Resolve timeout set to 5 seconds
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [resolve_srv_send] (0x0200): The status of SRV lookup is not resolved
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [be_resolve_server_process] (0x0080): Couldn't resolve server (SRV lookup meta-server), resolver returned (5)
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [be_resolve_server_process] (0x1000): Trying with the next one!
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'AD'
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [fo_resolve_service_activate_timeout] (0x2000): Resolve timeout set to 5 seconds
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [resolve_srv_send] (0x0200): The status of SRV lookup is not resolved
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [be_resolve_server_process] (0x0080): Couldn't resolve server (SRV lookup meta-server), resolver returned (5)
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [be_resolve_server_process] (0x1000): Trying with the next one!
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'AD'
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [fo_resolve_service_activate_timeout] (0x2000): Resolve timeout set to 5 seconds
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [resolve_srv_send] (0x0200): The status of SRV lookup is not resolved
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [be_resolve_server_process] (0x0080): Couldn't resolve server (SRV lookup meta-server), resolver returned (5)
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [be_resolve_server_process] (0x1000): Trying with the next one!
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'AD'
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [fo_resolve_service_activate_timeout] (0x2000): Resolve timeout set to 5 seconds
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [resolve_srv_send] (0x0200): The status of SRV lookup is not resolved
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [be_resolve_server_process] (0x0080): Couldn't resolve server (SRV lookup meta-server), resolver returned (5)
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [be_resolve_server_process] (0x1000): Trying with the next one!
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'AD'
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [fo_resolve_service_activate_timeout] (0x2000): Resolve timeout set to 5 seconds
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [resolve_srv_send] (0x0200): The status of SRV lookup is not resolved
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [be_resolve_server_process] (0x0080): Couldn't resolve server (SRV lookup meta-server), resolver returned (5)
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [be_resolve_server_process] (0x1000): Trying with the next one!
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'AD'
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [fo_resolve_service_activate_timeout] (0x2000): Resolve timeout set to 5 seconds
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [resolve_srv_send] (0x0200): The status of SRV lookup is not resolved
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [be_resolve_server_process] (0x0080): Couldn't resolve server (SRV lookup meta-server), resolver returned (5)
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [be_resolve_server_process] (0x1000): Trying with the next one!
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'AD'
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [fo_resolve_service_activate_timeout] (0x2000): Resolve timeout set to 5 seconds
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [resolve_srv_send] (0x0200): The status of SRV lookup is not resolved
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [be_resolve_server_process] (0x0080): Couldn't resolve server (SRV lookup meta-server), resolver returned (5)
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [be_resolve_server_process] (0x1000): Trying with the next one!
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'AD'
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [fo_resolve_service_activate_timeout] (0x2000): Resolve timeout set to 5 seconds
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [resolve_srv_send] (0x0200): The status of SRV lookup is not resolved
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [be_resolve_server_process] (0x0080): Couldn't resolve server (SRV lookup meta-server), resolver returned (5)
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [be_resolve_server_process] (0x1000): Trying with the next one!
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'AD'
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [fo_resolve_service_activate_timeout] (0x2000): Resolve timeout set to 5 seconds
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [resolve_srv_send] (0x0200): The status of SRV lookup is not resolved
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [be_resolve_server_process] (0x0080): Couldn't resolve server (SRV lookup meta-server), resolver returned (5)
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [be_resolve_server_process] (0x0040): Failed to find a server after 10 attempts
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [be_resolve_server_done] (0x1000): Server resolution failed: 5
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [be_mark_offline] (0x2000): Going offline!
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [be_run_offline_cb] (0x0080): Going offline. Running callbacks.
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [krb5_auth_prepare_ccache_file] (0x4000): Recreating  ccache file.
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [krb5_auth_resolve_done] (0x4000): Preparing for offline operation.
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [child_handler_setup] (0x2000): (Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [become_user] (0x0200): Trying to become user [217979532][217700513].
Setting up signal handler up for pid [7201]
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [child_handler_setup] (0x2000): Signal handler set up for pid [7201]
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [write_pipe_handler] (0x0400): All data has been sent!
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [child_sig_handler] (0x1000): Waiting for child [7201].
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [child_sig_handler] (0x0100): child [7201] finished successfully.
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [sss_child_handler] (0x2000): waitpid failed [10]: No child processes
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [read_pipe_handler] (0x0400): EOF received, client finished
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [parse_krb5_child_response] (0x1000): child response [0][3][42].
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [cc_dir_cache_for_princ] (0x0400): No principal for david.woodhouse in DIR:/run/user/217979532/krb5cc
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [krb5_auth_done] (0x0020): No ccache for david.woodhouse in DIR:/run/user/217979532/krb5cc?
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [check_wait_queue] (0x1000): Wait queue for user [dwoodhou] is empty.
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [be_pam_handler_callback] (0x0100): Backend returned: (0, 4, <NULL>) [Success]
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [be_pam_handler_callback] (0x0100): Sending result [4][ger.corp.intel.com]
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [be_pam_handler_callback] (0x0100): Sent result [4][ger.corp.intel.com]
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [remove_krb5_info_files] (0x0200): Could not remove [/var/lib/sss/pubconf/kdcinfo.GER.CORP.INTEL.COM], [2][No such file or directory]
(Mon Jun 17 13:28:14 2013) [sssd[be[ger.corp.intel.com]]] [remove_krb5_info_files] (0x0200): Could not remove [/var/lib/sss/pubconf/kpasswdinfo.GER.CORP.INTEL.COM], [2][No such file or directory]

Comment 7 Jakub Hrozek 2013-06-17 12:42:20 UTC
The logs says the SSSD is attempting to do SRV discovery, but failing. Is the AD server set as the DNS server in resolv.conf? How does the domain section of the sssd.conf look like?

Comment 8 David Woodhouse 2013-06-17 12:50:58 UTC
(In reply to Jakub Hrozek from comment #7)
> The logs says the SSSD is attempting to do SRV discovery, but failing.

Indeed, As I said, it's failing even to send the *request* to the DNS server.

> Is the AD server set as the DNS server in resolv.conf? 

No, this is a VM, and the DNS server is actually dnsmasq on the *host*.

But DNS does work correctly as expected:

[root@dwoodhou-mobl3 sssd]# host -t srv _kerberos._tcp.ger.corp.intel.com
;; Truncated, retrying in TCP mode.
_kerberos._tcp.ger.corp.intel.com has SRV record 15 100 88 drsger201.ger.corp.intel.com.
_kerberos._tcp.ger.corp.intel.com has SRV record 15 100 88 jrsger201.ger.corp.intel.com.
_kerberos._tcp.ger.corp.intel.com has SRV record 15 100 88 ulsger201.ger.corp.intel.com.
_kerberos._tcp.ger.corp.intel.com has SRV record 20 100 88 irsger203.ger.corp.intel.com.
_kerberos._tcp.ger.corp.intel.com has SRV record 15 100 88 dgsger201.ger.corp.intel.com.
_kerberos._tcp.ger.corp.intel.com has SRV record 20 100 88 bgsger201.ger.corp.intel.com.
_kerberos._tcp.ger.corp.intel.com has SRV record 20 100 88 musger202.ger.corp.intel.com.
_kerberos._tcp.ger.corp.intel.com has SRV record 100 100 88 idpjsger201.ger.corp.intel.com.
_kerberos._tcp.ger.corp.intel.com has SRV record 20 100 88 hasger202.ger.corp.intel.com.
_kerberos._tcp.ger.corp.intel.com has SRV record 20 100 88 irsger202.ger.corp.intel.com.
_kerberos._tcp.ger.corp.intel.com has SRV record 105 100 88 f28sger202.ger.corp.intel.com.
_kerberos._tcp.ger.corp.intel.com has SRV record 10 100 88 fmsger201.ger.corp.intel.com.
_kerberos._tcp.ger.corp.intel.com has SRV record 20 100 88 hasger201.ger.corp.intel.com.
_kerberos._tcp.ger.corp.intel.com has SRV record 100 100 88 f14sger201.ger.corp.intel.com.
_kerberos._tcp.ger.corp.intel.com has SRV record 15 100 88 lzsger201.ger.corp.intel.com.
_kerberos._tcp.ger.corp.intel.com has SRV record 20 100 88 pgsger201.ger.corp.intel.com.
_kerberos._tcp.ger.corp.intel.com has SRV record 100 100 88 idpjsger202.ger.corp.intel.com.
_kerberos._tcp.ger.corp.intel.com has SRV record 10 100 88 rrsger201.ger.corp.intel.com.
_kerberos._tcp.ger.corp.intel.com has SRV record 100 100 88 f28sger201.ger.corp.intel.com.
_kerberos._tcp.ger.corp.intel.com has SRV record 15 100 88 gksger201.ger.corp.intel.com.
_kerberos._tcp.ger.corp.intel.com has SRV record 15 100 88 lcsger201.ger.corp.intel.com.
_kerberos._tcp.ger.corp.intel.com has SRV record 20 100 88 musger201.ger.corp.intel.com.
_kerberos._tcp.ger.corp.intel.com has SRV record 105 100 88 f24sger202.ger.corp.intel.com.
_kerberos._tcp.ger.corp.intel.com has SRV record 20 100 88 hasger203.ger.corp.intel.com.
_kerberos._tcp.ger.corp.intel.com has SRV record 100 100 88 f24sger201.ger.corp.intel.com.
_kerberos._tcp.ger.corp.intel.com has SRV record 15 100 88 cesger201.ger.corp.intel.com.
_kerberos._tcp.ger.corp.intel.com has SRV record 20 100 88 irsger201.ger.corp.intel.com.
_kerberos._tcp.ger.corp.intel.com has SRV record 10 100 88 orsger201.ger.corp.intel.com.
_kerberos._tcp.ger.corp.intel.com has SRV record 15 100 88 ncsger201.ger.corp.intel.com.
_kerberos._tcp.ger.corp.intel.com has SRV record 15 100 88 dksger201.ger.corp.intel.com.



> How does the domain section of the sssd.conf look like?

[domain/ger.corp.intel.com]
ldap_idmap_range_size=500000
ad_domain = ger.corp.intel.com
krb5_realm = GER.CORP.INTEL.COM
realmd_tags = manages-system
cache_credentials = True
id_provider = ad
krb5_store_password_if_offline = True
ldap_id_mapping = True
use_fully_qualified_names = False
fallback_homedir = /home/%d/%u
access_provider = ad

Comment 9 Jakub Hrozek 2013-06-17 12:58:38 UTC
Can you check if ad_enable_dns_sites=False makes any difference in your case?

Also you can remove the krb5_realm stanza from the config file, the AD provider automatically uses the uppercase version of "ad_domain" as the "krb5_realm".

Comment 10 David Woodhouse 2013-06-17 13:17:26 UTC
After I restart sssd, although I hadn't made any changes since adding the ldap_idmap_range_size setting, this seems to work for local 'su dwoodhou', although ssh fails:

[dwmw2@i7 ~]$ SSH_AUTH_SOCK= ssh dwoodhou.122.27 -p 2222 whoami
dwoodhou.122.27's password: 
X11 forwarding request failed
debug3: PAM session not opened, exiting


and on the server side:

debug3: ssh_selinux_setup_variables: setting execution context
debug1: PAM: establishing credentials
debug3: PAM: opening session
PAM: pam_open_session(): Cannot make/remove an entry for the specified session
User child is on pid 9669
debug1: PAM: establishing credentials
debug3: ssh_selinux_setup_variables: setting execution context
debug1: permanently_set_uid: 217979532/217700513
ssh_selinux_copy_context: setcon failed with Permission denied

Comment 11 David Woodhouse 2013-06-17 13:26:36 UTC
There also seems to be an issue with local caching. Running 'id dwoodhou' takes up to 7 minutes the first time it's run after sssd is restarted.

The presumably translates to taking 7 minutes to log in to the system after a reboot, which is doing to be somewhat suboptimal.

Comment 12 Jakub Hrozek 2013-06-17 13:30:32 UTC
(In reply to David Woodhouse from comment #10)
> After I restart sssd, although I hadn't made any changes since adding the
> ldap_idmap_range_size setting, this seems to work for local 'su dwoodhou',
> although ssh fails:
> 
> [dwmw2@i7 ~]$ SSH_AUTH_SOCK= ssh dwoodhou.122.27 -p 2222 whoami
> dwoodhou.122.27's password: 
> X11 forwarding request failed
> debug3: PAM session not opened, exiting
> 
> 

That's quite strange. Sumit suggested on IRC that c-ares (the library we use to perform DNS resolution) might have problems switching to TCP. I must say we haven't tested replies as large as yours.

> and on the server side:
> 
> debug3: ssh_selinux_setup_variables: setting execution context
> debug1: PAM: establishing credentials
> debug3: PAM: opening session
> PAM: pam_open_session(): Cannot make/remove an entry for the specified
> session
> User child is on pid 9669
> debug1: PAM: establishing credentials
> debug3: ssh_selinux_setup_variables: setting execution context
> debug1: permanently_set_uid: 217979532/217700513
> ssh_selinux_copy_context: setcon failed with Permission denied


I suspect that is not actually related to SSSD and would at least temporarily go away with "setenforce 0" ? (A bug report to make that functionality work with SELinux would certainly still be needed).

Comment 13 Jakub Hrozek 2013-06-17 13:35:30 UTC
(In reply to David Woodhouse from comment #11)
> There also seems to be an issue with local caching. Running 'id dwoodhou'
> takes up to 7 minutes the first time it's run after sssd is restarted.
> 

Is dwoodhou a member of any very large group? Still, 7 minutes is extremely long, there must be some bug. Can you also raise debug_level in the [nss] section and attach both /var/log/sssd/sssd_nss.log and /var/log/sssd/sssd_$domain.log ?

(side-note: it appears we might have broken large group resolution with one of the latest patches. We are tracking the problem right now, the logs should have more info.)

> The presumably translates to taking 7 minutes to log in to the system after
> a reboot, which is doing to be somewhat suboptimal.

Not necessarily, "id" also runs "getgrgid $group" for every group the user is a member of. "id -G dwoodhou" would give you a better estimate on how log initgroups would take.

Comment 14 David Woodhouse 2013-06-17 15:13:18 UTC
Yeah, I'm guessing the 'domain users' group is fairly large.

Strangely, removing the leading '-' from the 'session optional pam_systemd.so' line in /etc/pam.d/password-auth and system-auth makes ssh and gdm authentication work. That doesn't make much sense to me.

Logs sent by email.

Comment 15 Jakub Hrozek 2013-06-17 17:54:21 UTC
(In reply to David Woodhouse from comment #14)
> Yeah, I'm guessing the 'domain users' group is fairly large.
> 
> Strangely, removing the leading '-' from the 'session optional
> pam_systemd.so' line in /etc/pam.d/password-auth and system-auth makes ssh
> and gdm authentication work. That doesn't make much sense to me.
>

I suspect that this might be caused by pam_systemd.so tinkering with /run/users/$UID where the ccache is stored. But it's just a hunch.
 
> Logs sent by email.

Thank you, I am travelling currently so I won't have the time to inspect them until tomorrow. I'll forward the logs to other team members in the meantime.

Comment 16 David Woodhouse 2013-06-18 09:15:46 UTC
Note that I didn't remove the pam_systemd.so line; I only removed the leading - sign from it.

pam.conf(5) says:
       If the type value from the list above is prepended with a - character the PAM library will not log to the system log if it is not
       possible to load the module because it is missing in the system. This can be useful especially for modules which are not always
       installed on the system and are not required for correct authentication and authorization of the login session.

I have absolutely no idea why this might change the behaviour when the module *is* present. Perhaps a PAM bug?

Comment 17 Jakub Hrozek 2013-06-18 09:46:50 UTC
David, can you try if setting:

dyndns_update = False

in the [domain] section improves performance for you? 

I see quite a few attempts to update the DNS server with the client's address in the logs, which fail and cause reconnection attempts. The typical setup for the AD provider would be to point the machine directly to the AD server even for DNS. Maybe this is something we should point out more explicitly in the man page.

Comment 18 Jakub Hrozek 2013-06-18 09:47:53 UTC
(In reply to David Woodhouse from comment #16)
> Note that I didn't remove the pam_systemd.so line; I only removed the
> leading - sign from it.
> 
> pam.conf(5) says:
>        If the type value from the list above is prepended with a - character
> the PAM library will not log to the system log if it is not
>        possible to load the module because it is missing in the system. This
> can be useful especially for modules which are not always
>        installed on the system and are not required for correct
> authentication and authorization of the login session.
> 
> I have absolutely no idea why this might change the behaviour when the
> module *is* present. Perhaps a PAM bug?

Yes, sounds like a bug to me, at least with my limited PAM expertise.

Comment 19 David Woodhouse 2013-06-18 11:23:14 UTC
Added dyndns_update=False and today it isn't allowing me to authenticate at all.

Logs sent in email again.

Comment 20 David Woodhouse 2013-06-18 11:28:54 UTC
I cannot reliably reproduce the working state but *sometimes* it does attempt to let me log in. I managed a GNOME login at least as far as its stupid "something has gone wrong!" screen that doesn't tell me *what* went wrong. In fact, I got that same screen yesterday but pressing the 'log out' button resulted in a working GNOME session so I'd forgotten. I don't know if that's related to SSSD authentication or not. Perhaps just because of the time it takes?

Watching with tcpdump while it was *trying* to start my GNOME session, I noticed it starting *logs* of separate LDAP sessions, authenticating each one with Kerberos (with a new exchange to a Kerberos server instead of an existing ticket). So I suspect that could be made a lot faster...

Comment 21 Stephen Gallagher 2013-06-18 11:39:12 UTC
(In reply to David Woodhouse from comment #20)
> Watching with tcpdump while it was *trying* to start my GNOME session, I
> noticed it starting *logs* of separate LDAP sessions, authenticating each
> one with Kerberos (with a new exchange to a Kerberos server instead of an
> existing ticket). So I suspect that could be made a lot faster...

SSSD only maintains a single LDAP session with the same ticket. If you're seeing multiple attempts, it most likely means one of two things:

1) The LDAP provider is crashing/restarting and therefore needs to reconnect. This is the most likely suspect to me.

2) Some part of the communication is failing and not properly setting the SSSD into "offline" mode for two minutes, so subsequent requests are attempting to reconnect to the LDAP server rapidly.


Can you confirm that you have 'ldap_referrals = False' in your sssd.conf [domain/ger.corp.intel.com] section?

Comment 22 David Woodhouse 2013-06-18 11:57:39 UTC
I don't have that, no. Should I try adding it and see what happens? Should 'realm join' have included it?

[sssd]
domains = ger.corp.intel.com
config_file_version = 2
services = nss, pam
debug_level=0x1310
debug_timestamps=True

[nss]
default_shell = /bin/bash
debug_level=0x1310
debug_timestamps=True

[domain/ger.corp.intel.com]
dyndns_update = False
debug_level=0x1310
debug_timestamps=True
ldap_idmap_range_size=500000
ad_domain = ger.corp.intel.com
krb5_realm = GER.CORP.INTEL.COM
realmd_tags = manages-system
cache_credentials = True
id_provider = ad
krb5_store_password_if_offline = True
ldap_id_mapping = True
use_fully_qualified_names = False
fallback_homedir = /home/%d/%u
access_provider = ad

Comment 23 Jakub Hrozek 2013-06-18 12:23:07 UTC
(In reply to David Woodhouse from comment #22)
> I don't have that, no. Should I try adding it and see what happens? Should
> 'realm join' have included it?

The AD provider defaults to ignoring referrals already, so you're good here.

Comment 24 Stephen Gallagher 2013-06-18 12:35:17 UTC
(In reply to Jakub Hrozek from comment #23)
> (In reply to David Woodhouse from comment #22)
> > I don't have that, no. Should I try adding it and see what happens? Should
> > 'realm join' have included it?
> 
> The AD provider defaults to ignoring referrals already, so you're good here.

Sorry, I missed that you were using the AD provider.

Can you check /var/log/sssd/sssd.log and see if there are notes in there about SSSD restarting the providers? That would tell us if a crash is happening.

Comment 25 David Woodhouse 2013-06-18 12:36:56 UTC
(In reply to Stephen Gallagher from comment #21)
> 1) The LDAP provider is crashing/restarting and therefore needs to
> reconnect. This is the most likely suspect to me.

Hm, now that I have a GNOME session and try 'ssh dwoodhou@localhost' from it, I do get an abrt notification. It says that it reported an error in sssd-ipa but it strangely doesn't tell me where to find the information.

There are some crashes in /var/tmp/abrt but those are from yesterday. One of them looks like this:

Core was generated by `/usr/libexec/sssd/sssd_pac --debug-to-files'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007fa818be7646 in pac_lookup_sids_done ()
#1  0x00007fa818be3707 in pac_lookup_sids_next_done ()
#2  0x00007fa817cf03f4 in tevent_common_loop_immediate (ev=ev@entry=0x7fa819fb16f0) at ../tevent_immediate.c:135
#3  0x00007fa817cf49c7 in epoll_event_loop_once (ev=0x7fa819fb16f0, location=<optimized out>) at ../tevent_epoll.c:912
#4  0x00007fa817cf30b7 in std_event_loop_once (ev=0x7fa819fb16f0, location=0x7fa817f59e77 "src/util/server.c:602")
    at ../tevent_standard.c:112
#5  0x00007fa817cefbcd in _tevent_loop_once (ev=ev@entry=0x7fa819fb16f0, location=location@entry=0x7fa817f59e77 "src/util/server.c:602")
    at ../tevent.c:530
#6  0x00007fa817cefd6b in tevent_common_loop_wait (ev=0x7fa819fb16f0, location=0x7fa817f59e77 "src/util/server.c:602") at ../tevent.c:634
#7  0x00007fa817cf3057 in std_event_loop_wait (ev=0x7fa819fb16f0, location=0x7fa817f59e77 "src/util/server.c:602")
    at ../tevent_standard.c:138
#8  0x00007fa817f42853 in server_loop () from /usr/lib64/sssd/libsss_util.so
#9  0x00007fa818be2242 in main ()

And another like this:

Program terminated with signal 11, Segmentation fault.
#0  _tevent_req_data (req=0x0) at ../tevent_req.c:274
274		return req->data;
Missing separate debuginfos, use: debuginfo-install sssd-common-1.10.0-11.fc19.beta2.x86_64
(gdb) bt
#0  _tevent_req_data (req=0x0) at ../tevent_req.c:274
#1  0x00007f3f17cfbe7b in nsupdate_child_handler ()
#2  0x00007f3f13d2d738 in child_invoke_callback () from /usr/lib64/sssd/libsss_child.so
#3  0x00007f3f172263f4 in tevent_common_loop_immediate (ev=ev@entry=0x7f3f18823550) at ../tevent_immediate.c:135
#4  0x00007f3f1722a9c7 in epoll_event_loop_once (ev=0x7f3f18823550, location=<optimized out>) at ../tevent_epoll.c:912
#5  0x00007f3f172290b7 in std_event_loop_once (ev=0x7f3f18823550, location=0x7f3f1748fe77 "src/util/server.c:602")
    at ../tevent_standard.c:112
#6  0x00007f3f17225bcd in _tevent_loop_once (ev=ev@entry=0x7f3f18823550, location=location@entry=0x7f3f1748fe77 "src/util/server.c:602")
    at ../tevent.c:530
#7  0x00007f3f17225d6b in tevent_common_loop_wait (ev=0x7f3f18823550, location=0x7f3f1748fe77 "src/util/server.c:602") at ../tevent.c:634
#8  0x00007f3f17229057 in std_event_loop_wait (ev=0x7f3f18823550, location=0x7f3f1748fe77 "src/util/server.c:602")
    at ../tevent_standard.c:138
#9  0x00007f3f17478853 in server_loop () from /usr/lib64/sssd/libsss_util.so
#10 0x00007f3f17ce31cb in main ()


> Can you confirm that you have 'ldap_referrals = False' in your sssd.conf
> [domain/ger.corp.intel.com] section?


Further perusing of the logs suggests that this defaults to false even when not explicitly set (as it wasn't).

I'm guessing this spurious offline status explains why I don't seem to have a valid Kerberos TGT. However, there is no /usr/bin/ntlm_auth so even automatic *NTLM* authentication isn't working (as it would have been if I'd been using winbind).

I was assuming that SSSD is the right way to do things in Fedora 19, and that we shouldn't be using samba-winbind? Am I trying this too soon, before it's really ready for active use?

Comment 26 Jakub Hrozek 2013-06-18 12:38:40 UTC
I suspect that the problem you are having logging in would be caused by the large list of servers returned with the SRV discovery and by SSSD not handling them well.

Can you (just for the test), add an "ad_server" directive pointing to a host name of one of your AD servers?

Comment 27 Jakub Hrozek 2013-06-18 12:41:22 UTC
> I was assuming that SSSD is the right way to do things in Fedora 19, and
> that we shouldn't be using samba-winbind? Am I trying this too soon, before
> it's really ready for active use?

No, I'm sorry about the trouble you're having. Both crashes you're seeing are in new functionality added in Fedora 19. Also typically we tested the AD provider with DNS pointing to the AD server, so we didn't see the dyndns crash.

Can you get us core files so we can fix these?

Comment 28 David Woodhouse 2013-06-18 12:49:33 UTC
Is abrt not uploading them somewhere you can see them? It *told* me it had; the lack of a URL at which I could see my own bug report was, I assumed, just the normal GNOME "sod you, we don't want to give you any information at all" mentality.

I can send the core dumps manually (they're lying around in /var/tmp/abrt) but only if you promise to promote an attitude of violence towards whoever's responsible for abrt being broken... :)

Comment 29 Jakub Hrozek 2013-06-18 12:52:12 UTC
(In reply to David Woodhouse from comment #28)
> Is abrt not uploading them somewhere you can see them? It *told* me it had;
> the lack of a URL at which I could see my own bug report was, I assumed,
> just the normal GNOME "sod you, we don't want to give you any information at
> all" mentality.
> 
> I can send the core dumps manually (they're lying around in /var/tmp/abrt)
> but only if you promise to promote an attitude of violence towards whoever's
> responsible for abrt being broken... :)

Yes, please :)

Comment 30 Stephen Gallagher 2013-06-18 12:56:48 UTC
(In reply to David Woodhouse from comment #28)
> Is abrt not uploading them somewhere you can see them? It *told* me it had;
> the lack of a URL at which I could see my own bug report was, I assumed,
> just the normal GNOME "sod you, we don't want to give you any information at
> all" mentality.
> 


ABRT isn't written by the GNOME folks, and if it told you it submitted a bug without providing you a URL, that's a bug. I've filed literally dozens of bugs through ABRT, and they all have a link viewable in the ABRT GUI (usually listed as just "Bugzilla" in the right pane on Fedora 19).

If you don't see "Reported yes" and the Bugzilla link when you click on a bug in the ABRT GUI, then please file a BZ directly against ABRT. They'll want to know about that.

Comment 31 Jakub Hrozek 2013-06-18 13:09:32 UTC
According to the logs send via e-mail (obfuscated version) this is the main problem:

------------------
[sssd[be[DOMNAME]]] [sdap_get_generic_ext_step] (0x0400): calling ldap_search_ext with [(&(objectSID=S-1-5-21-1111111111-1111111111-1111111111-123456)(objectclass=user))][DC=foo,DC=bar,DC=baz].
[sssd[be[DOMNAME]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [objectClass]
[sssd[be[DOMNAME]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [sAMAccountName]
[sssd[be[DOMNAME]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [unixUserPassword]
[sssd[be[DOMNAME]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [uidNumber]
[sssd[be[DOMNAME]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [gidNumber]
[sssd[be[DOMNAME]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [gecos]
[sssd[be[DOMNAME]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [unixHomeDirectory]
[sssd[be[DOMNAME]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [loginShell]
[sssd[be[DOMNAME]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [userPrincipalName]
[sssd[be[DOMNAME]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [name]
[sssd[be[DOMNAME]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [memberOf]
[sssd[be[DOMNAME]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [objectGUID]
[sssd[be[DOMNAME]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [objectSID]
[sssd[be[DOMNAME]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [primaryGroupID]
[sssd[be[DOMNAME]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [whenChanged]
[sssd[be[DOMNAME]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [uSNChanged]
[sssd[be[DOMNAME]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [accountExpires]
[sssd[be[DOMNAME]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [userAccountControl]
[sssd[be[DOMNAME]]] [sdap_get_generic_ext_step] (0x2000): ldap_search_ext called, msgid = 16
[sssd[be[DOMNAME]]] [sdap_process_result] (0x2000): Trace: sh[0x7ffa8fec14e0], connected[1], ops[0x7ffa8fea3ce0], ldap[0x7ffa8fe42480]
[sssd[be[DOMNAME]]] [sdap_process_message] (0x4000): Message type: [LDAP_RES_SEARCH_RESULT]
[sssd[be[DOMNAME]]] [sdap_get_generic_ext_done] (0x0400): Search result: Referral(10), 0000202B: RefErr: DSID-03100742, data 0, 1 access points
        ref 1: 'foo.bar.baz'

(Tue Jun 18 12:21:45 2013) [sssd[be[DOMNAME]]] [sdap_get_generic_ext_done] (0x0040): Unexpected result from ldap: Referral(10), 0000202B: RefErr: DSID-03100742, data 0, 1 access points
        ref 1: 'foo.bar.baz'
------------------

At first glance, this looks to me like we attempted an LDAP search where we should have searched using Global Catalog. I'll try to reproduce locally.

Comment 32 David Woodhouse 2013-06-18 13:23:43 UTC
Note that the repeated ldap searches aren't purely caused by sssd_pac dying and respawning. There are *lots* of these LDAP reconnections, *then* sssd_pac dies and I am finally logged in.

Program received signal SIGSEGV, Segmentation fault.
0x00007f938cc72646 in pac_lookup_sids_done (req=0x0) at src/responder/pac/pacsrv_cmd.c:309
309	            id = ldb_msg_find_attr_as_uint64(msg->msgs[0],
(gdb) p msg
$1 = (struct ldb_result *) 0x7f938e2d1e10
(gdb) p msg->msgs
$2 = (struct ldb_message **) 0x0
(gdb) bt
#0  0x00007f938cc72646 in pac_lookup_sids_done (req=0x0) at src/responder/pac/pacsrv_cmd.c:309
#1  0x00007f938cc6e707 in pac_lookup_sids_next_done (subreq=0x0) at src/responder/pac/pacsrv_cmd.c:1070
#2  0x00007f938cc827fc in sss_dp_internal_get_done (pending=<optimized out>, ptr=<optimized out>) at src/responder/common/responder_dp.c:779
#3  0x00007f938b4f8512 in complete_pending_call_and_unlock (connection=connection@entry=0x7f938e2b8000, pending=0x7f938e2b6890, 
    message=message@entry=0x7f938e2bada0) at dbus-connection.c:2314
#4  0x00007f938b4fb741 in dbus_connection_dispatch (connection=connection@entry=0x7f938e2b8000) at dbus-connection.c:4580
#5  0x00007f938bfc4811 in sbus_dispatch (ev=0x7f938e2ac7f0, te=<optimized out>, tv=..., data=<optimized out>)
    at src/sbus/sssd_dbus_connection.c:104
#6  0x00007f938bd7e9df in tevent_common_loop_timer_delay (ev=ev@entry=0x7f938e2ac7f0) at ../tevent_timed.c:341
#7  0x00007f938bd7f9d3 in epoll_event_loop_once (ev=0x7f938e2ac7f0, location=<optimized out>) at ../tevent_epoll.c:916
#8  0x00007f938bd7e0b7 in std_event_loop_once (ev=0x7f938e2ac7f0, location=0x7f938bfe4e77 "src/util/server.c:602")
    at ../tevent_standard.c:112
#9  0x00007f938bd7abcd in _tevent_loop_once (ev=ev@entry=0x7f938e2ac7f0, location=location@entry=0x7f938bfe4e77 "src/util/server.c:602")
    at ../tevent.c:530
#10 0x00007f938bd7ad6b in tevent_common_loop_wait (ev=0x7f938e2ac7f0, location=0x7f938bfe4e77 "src/util/server.c:602") at ../tevent.c:634
#11 0x00007f938bd7e057 in std_event_loop_wait (ev=0x7f938e2ac7f0, location=0x7f938bfe4e77 "src/util/server.c:602")
    at ../tevent_standard.c:138
#12 0x00007f938bfcd853 in server_loop (main_ctx=0x7f938e2adc40) at src/util/server.c:602
#13 0x00007f938cc6d242 in main (argc=4, argv=<optimized out>) at src/responder/pac/pacsrv.c:274

Comment 33 Sumit Bose 2013-06-18 13:35:21 UTC
(In reply to David Woodhouse from comment #32)
> Note that the repeated ldap searches aren't purely caused by sssd_pac dying
> and respawning. There are *lots* of these LDAP reconnections, *then*
> sssd_pac dies and I am finally logged in.
> 
> Program received signal SIGSEGV, Segmentation fault.
> 0x00007f938cc72646 in pac_lookup_sids_done (req=0x0) at
> src/responder/pac/pacsrv_cmd.c:309
> 309	            id = ldb_msg_find_attr_as_uint64(msg->msgs[0],
> (gdb) p msg
> $1 = (struct ldb_result *) 0x7f938e2d1e10
> (gdb) p msg->msgs
> $2 = (struct ldb_message **) 0x0
> (gdb) bt

Thank you for this information. I'll prepare a fix.

Comment 34 David Woodhouse 2013-06-18 14:01:45 UTC
FWIW it said 'reported: yes' without a bugzilla link. I could still press the 'report' button though. The retrace server failed, probably for the same reason that 'debuginfo-install sssd-ipa' failed; I manually fetched the debuginfo package from koji for myself, and it managed to create the backtrace on my local system (as had I, above). And then it asked me for my bugzilla login details.
Now I suspect it *would* report the bug in bugzilla, but I'm not going to bother since it's already here.

Comment 35 Simo Sorce 2013-06-18 14:21:35 UTC
Daid, can you also post your pam configuration and nsswitch.conf files ?

Comment 36 David Woodhouse 2013-06-18 14:26:05 UTC
Those are exactly as a clean install followed by 'realm join' created them, except for the weirdness with having to remove the leading '-' from the systemd session configuration as discussed in comments 14-16.

# egrep -v '^#|^$'  /etc/nsswitch.conf 
passwd:     files sss
shadow:     files sss
group:      files sss
hosts:      files dns
bootparams: nisplus [NOTFOUND=return] files
ethers:     files
netmasks:   files
networks:   files
protocols:  files
rpc:        files
services:   files sss
netgroup:   files sss
publickey:  nisplus
automount:  files sss
aliases:    files nisplus
# cat /etc/pam.d/system-auth
#%PAM-1.0
# This file is auto-generated.
# User changes will be destroyed the next time authconfig is run.
auth        required      pam_env.so
auth        sufficient    pam_fprintd.so
auth        sufficient    pam_unix.so nullok try_first_pass
auth        requisite     pam_succeed_if.so uid >= 1000 quiet_success
auth        sufficient    pam_sss.so use_first_pass
auth        required      pam_deny.so

account     required      pam_unix.so
account     sufficient    pam_localuser.so
account     sufficient    pam_succeed_if.so uid < 1000 quiet
account     [default=bad success=ok user_unknown=ignore] pam_sss.so
account     required      pam_permit.so

password    requisite     pam_pwquality.so try_first_pass retry=3 authtok_type=
password    sufficient    pam_unix.so sha512 shadow nullok try_first_pass use_authtok
password    sufficient    pam_sss.so use_authtok
password    required      pam_deny.so

session     optional      pam_keyinit.so revoke
session     required      pam_limits.so
session     optional      pam_systemd.so
session     optional      pam_oddjob_mkhomedir.so
session     [success=1 default=ignore] pam_succeed_if.so service in crond quiet use_uid
session     required      pam_unix.so
session     optional      pam_sss.so
# cat /etc/pam.d/password-auth
#%PAM-1.0
# This file is auto-generated.
# User changes will be destroyed the next time authconfig is run.
auth        required      pam_env.so
auth        sufficient    pam_unix.so nullok try_first_pass
auth        requisite     pam_succeed_if.so uid >= 1000 quiet_success
auth        sufficient    pam_sss.so use_first_pass
auth        required      pam_deny.so

account     required      pam_unix.so
account     sufficient    pam_localuser.so
account     sufficient    pam_succeed_if.so uid < 1000 quiet
account     [default=bad success=ok user_unknown=ignore] pam_sss.so
account     required      pam_permit.so

password    requisite     pam_pwquality.so try_first_pass retry=3 authtok_type=
password    sufficient    pam_unix.so sha512 shadow nullok try_first_pass use_authtok
password    sufficient    pam_sss.so use_authtok
password    required      pam_deny.so

session     optional      pam_keyinit.so revoke
session     required      pam_limits.so
session     optional      pam_systemd.so
session     optional      pam_oddjob_mkhomedir.so
session     [success=1 default=ignore] pam_succeed_if.so service in crond quiet use_uid
session     required      pam_unix.so
session     optional      pam_sss.so

Comment 37 Jakub Hrozek 2013-06-19 18:33:27 UTC
Upstream ticket:
https://fedorahosted.org/sssd/ticket/1989

Comment 38 Jakub Hrozek 2013-06-19 18:33:41 UTC
Upstream ticket:
https://fedorahosted.org/sssd/ticket/1992

Comment 39 David Woodhouse 2013-06-21 10:20:36 UTC
(In reply to Jakub Hrozek from comment #9)
> Can you check if ad_enable_dns_sites=False makes any difference in your case?

(Fri Jun 21 11:18:36 2013) [sssd[be[ger.corp.intel.com]]] [dp_copy_options] (0x0020): Failed to retrieve value for option (ad_enable_dns_sites)
(Fri Jun 21 11:18:36 2013) [sssd[be[ger.corp.intel.com]]] [sssm_ad_access_init] (0x0010): Could not initialize access provider options: [Invalid argument]

If I try 'ad_enable_dns_sites = 0' I get this instead:
(Fri Jun 21 11:19:56 2013) [sssd[be[ger.corp.intel.com]]] [confdb_get_bool] (0x0040): Value is not a boolean!
(Fri Jun 21 11:19:56 2013) [sssd[be[ger.corp.intel.com]]] [confdb_get_bool] (0x0020): Failed to read [ad_enable_dns_sites] from [config/domain/ger
.corp.intel.com], error [22] (Invalid argument)
(Fri Jun 21 11:19:56 2013) [sssd[be[ger.corp.intel.com]]] [dp_get_options] (0x0020): Failed to retrieve value for option (ad_enable_dns_sites)
(Fri Jun 21 11:19:56 2013) [sssd[be[ger.corp.intel.com]]] [common_ad_init] (0x0010): Could not parse common options: [Invalid argument]

Comment 40 Jakub Hrozek 2013-06-21 11:05:28 UTC
(In reply to David Woodhouse from comment #39)
> (In reply to Jakub Hrozek from comment #9)
> > Can you check if ad_enable_dns_sites=False makes any difference in your case?
> 
> (Fri Jun 21 11:18:36 2013) [sssd[be[ger.corp.intel.com]]] [dp_copy_options]
> (0x0020): Failed to retrieve value for option (ad_enable_dns_sites)
> (Fri Jun 21 11:18:36 2013) [sssd[be[ger.corp.intel.com]]]
> [sssm_ad_access_init] (0x0010): Could not initialize access provider
> options: [Invalid argument]
> 

Fixed in master. I found this problem independently when working on another fature. I will be releasing another tarball with a couple of fixes later today. I'm not sure if I can get the fix for the referral issue in, that would probably be a scratch build for you to test first.

Comment 41 Fedora Update System 2013-06-28 09:13:43 UTC
sssd-1.10.0-13.fc19 has been submitted as an update for Fedora 19.
https://admin.fedoraproject.org/updates/sssd-1.10.0-13.fc19

Comment 42 Fedora Update System 2013-06-29 15:23:59 UTC
Package sssd-1.10.0-13.fc19:
* should fix your issue,
* was pushed to the Fedora 19 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing sssd-1.10.0-13.fc19'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-11899/sssd-1.10.0-13.fc19
then log in and leave karma (feedback).

Comment 43 Jakub Hrozek 2013-07-09 15:31:22 UTC
Fixed in sssd-1.10.0-16 which is in stable updates already.

Comment 44 Fedora Update System 2013-07-18 19:46:59 UTC
sssd-1.10.1-1.fc19 has been submitted as an update for Fedora 19.
https://admin.fedoraproject.org/updates/sssd-1.10.1-1.fc19

Comment 45 Fedora Update System 2013-07-30 17:51:43 UTC
sssd-1.10.1-1.fc19 has been pushed to the Fedora 19 stable repository.  If problems still persist, please make note of it in this bug report.