Bug 1104901
| Summary: | RHEL7 ipactl start fails to start samba | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Scott Poore <spoore> |
| Component: | ipa | Assignee: | Martin Kosek <mkosek> |
| Status: | CLOSED WORKSFORME | QA Contact: | Namita Soman <nsoman> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 7.0 | CC: | abokovoy, dpal, lkrispen, pspacek, rcritten, spoore, tbordaz |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2014-09-10 15:05:52 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
The problems seem to start from this place: Jun 4 16:00:58 ipaqavmb ns-slapd: GSSAPI Error: Unspecified GSS failure. Minor code may provide more information (Cannot contact any KDC for realm 'SPOORE06041229.TEST') Could this be a problem with DNS, for example named not fully running at the time samba starts? Also CCing Alexander. In the logs above named is unable to talk to LDAP and fails to start so it is no wonder everything else is not working too. Jun 4 16:00:40 ipaqavmb named[7074]: LDAP query timed out. Try to adjust "timeout" parameter Jun 4 16:00:40 ipaqavmb named[7074]: update_record (psearch) failed, dn 'idnsname=_ldap._tcp,idnsname=spoore06041229.test,cn=dns,dc=spoore06041229,dc=test' change type 0x0. Records can be outdated, run `rndc reload`: timed out Jun 4 16:00:50 ipaqavmb named[7074]: LDAP query timed out. Try to adjust "timeout" parameter Yes, I'd look at the 389-ds logs to see what was going on immediately after start-up. Would it help to increase startup timeout to make sure this is not caused by a slow VM and slow service start? # echo 'startup_timeout=600' >> /etc/ipa/default.conf # ipactl start Testing now. It'll take a little time to know how it's going. Ok, that didn't seem to help.
Works here:
:: [ 19:11:44 ] :: Enabling IPA on REPLICA1
:: [ PASS ] :: Running 'ssh ipaqavmh.spoore07021651.test 'echo startup_timeout=600 >> /etc/ipa/default.conf'' (Expected 0, got 0)
Starting Directory Service
debugging enabled, suppressing output.
Starting krb5kdc Service
Starting kadmin Service
Starting named Service
Starting ipa_memcached Service
Starting httpd Service
Starting pki-tomcatd Service
Starting smb Service
Starting winbind Service
Starting ipa-otpd Service
ipa: INFO: The ipactl command was successful
:: [ PASS ] :: Running 'ssh ipaqavmh.spoore07021651.test 'ipactl start'' (Expected 0, got 0)
But, later not here:
:: [ 19:22:01 ] :: Enabling IPA on REPLICA1
Job for smb.service failed. See 'systemctl status smb.service' and 'journalctl -xn' for details.
Failed to start smb Service
Shutting down
Starting Directory Service
debugging enabled, suppressing output.
Starting krb5kdc Service
Starting kadmin Service
Starting named Service
Starting ipa_memcached Service
Starting httpd Service
Starting pki-tomcatd Service
Starting smb Service
Aborting ipactl
:: [ FAIL ] :: Running 'ssh ipaqavmh.spoore07021651.test 'ipactl start'' (Expected 0, got 1)
smb.service - Samba SMB Daemon
Loaded: loaded (/usr/lib/systemd/system/smb.service; disabled)
Active: failed (Result: exit-code) since Wed 2014-07-02 19:22:46 EDT; 6s ago
Process: 21584 ExecStart=/usr/sbin/smbd $SMBDOPTIONS (code=exited, status=1/FAILURE)
Main PID: 21584 (code=exited, status=1/FAILURE)
Status: "Starting process..."
CGroup: /system.slice/smb.service
Jul 02 19:22:46 ipaqavmh.spoore07021651.test smbd[21584]: failed to bind to server ldapi://%2fvar%2frun%2fslapd-SPOORE07021651-TEST.socket with dn="[Anonymous bind]" Error: Invalid credentials
Jul 02 19:22:46 ipaqavmh.spoore07021651.test smbd[21584]: SASL(-13): authentication failure: GSSAPI server is not expecting data at this stage
Jul 02 19:22:46 ipaqavmh.spoore07021651.test smbd[21584]: [2014/07/02 19:22:46.222544, 0] ipa_sam.c:4440(pdb_init_ipasam)
Jul 02 19:22:46 ipaqavmh.spoore07021651.test smbd[21584]: Failed to get base DN.
Jul 02 19:22:46 ipaqavmh.spoore07021651.test smbd[21584]: [2014/07/02 19:22:46.224673, 0] ../source3/passdb/pdb_interface.c:178(make_pdb_method_name)
Jul 02 19:22:46 ipaqavmh.spoore07021651.test smbd[21584]: pdb backend ipasam:ldapi://%2fvar%2frun%2fslapd-SPOORE07021651-TEST.socket did not correctly init (error was NT_STATUS_UNSUCCESSFUL)
Jul 02 19:22:46 ipaqavmh.spoore07021651.test systemd[1]: smb.service: main process exited, code=exited, status=1/FAILURE
Jul 02 19:22:46 ipaqavmh.spoore07021651.test systemd[1]: Failed to start Samba SMB Daemon.
Jul 02 19:22:46 ipaqavmh.spoore07021651.test systemd[1]: Unit smb.service entered failed state.
Jul 02 19:22:49 ipaqavmh.spoore07021651.test systemd[1]: Stopped Samba SMB Daemon.
:: [ FAIL ] :: Running 'ssh ipaqavmh.spoore07021651.test 'systemctl status smb.service'' (Expected 0, got 3)
:: [ 19:22:52 ] :: Running Sleep 100000 to debug
I'll attach the logs here in case more info helps.
I checked the logs, but I could not find anything sticking out besides what was already mentioned. Alexander, maybe you'd have better luck? I am puzzled with following samba error message Jul 2 19:22:46 ipaqavmh smbd[21584]: failed to bind to server ldapi://%2fvar%2frun%2fslapd- SPOORE07021651-TEST.socket with dn="[Anonymous bind]" Error: Invalid credentials Given it is anonymous bind. I would expect DNS issue of some sort rather than Invalid credentials. It is not an anonymous bind, ignore what the message says because it simply has no credentials in the debug code to tell (as we do GSSAPI here). The next line explains why credentials are "invalid": Jul 02 19:22:46 ipaqavmh.spoore07021651.test smbd[21584]: SASL(-13): authentication failure: GSSAPI server is not expecting data at this stage So GSSAPI server (389-ds in this case) is not yet ready for performing operations. This is sign on a pretty loaded ns-slapd which is only starting and logging own operations -- it cannot yet accept incoming connection until all plugins are initialized. Ah, ok - thanks for explanation. Should we then enhance ipa-sam to add try-wait-repeat loop to pdb_init_ipasam in this case? Because now it tries just once and then crashes. Alternatively, we would need to update 389-ds-base Service start waiting procedure to not wait just until the port is available but also until it is fully initialized (how to detect?). CCing Ludwig and Thierry for help here. I think we should enhance 389-ds-base part. In Samba case ipa-sam doesn't handle retries; it is done centrally in Samba and we get 10-15 retries before we decide to retire. what do you mean by 389 base part, ds has to do its task before listening on connections, so the clients need to wait until it is available. You could poll by tryng base searches until ds is available (In reply to Ludwig from comment #17) > what do you mean by 389 base part We need some way to be able to tell "now, 389-ds-base instance is up and all other service can start and do their own searches on the DS". Right now, we just check that port 389 is available and when it is, we fire up other services. For PKI for example, we do a more extended check and do a call to /ca/admin/ca/getStatus to check if the PKI is really up or just ports are opened. Should we do something similar for 389? We need something solid and stable here. Are you sure if your analysis is correct ? The error is at: Jul 02 19:22:46 ipaqavmh.spoore07021651.test smbd[21584]: SASL(-13): authentication failure: GSSAPI server is not expecting data at this stage but if you look into the access logs ds is successfully processing operations at this time For Jul,4 the 19:00:58 the server is also processing ops, but in the error log I find; [04/Jun/2014:16:00:58 -0400] slapd_ldap_sasl_interactive_bind - Error: could not perform interactive bind for id [] mech [GSSAPI]: LDAP error -2 (Local error) (SASL(-1): generic failure: GSSAPI Error: Unspecifie d GSS failure. Minor code may provide more information (Cannot contact any KDC for realm 'SPOORE06041229.TEST')) errno 115 (Operation now in progress) [04/Jun/2014:16:00:58 -0400] slapi_ldap_bind - Error: could not perform interactive bind for id [] authentication mechanism [GSSAPI]: error -2 (Local error) [04/Jun/2014:16:00:58 -0400] NSMMReplicationPlugin - agmt="cn=meToipaqa64vmj.spoore06041229.test" (ipaqa64vmj:389): Replication bind with GSSAPI auth failed: LDAP error -2 (Local error) (SASL(-1): generic failur e: GSSAPI Error: Unspecified GSS failure. Minor code may provide more information (Cannot contact any KDC for realm 'SPOORE06041229.TEST')) for the bind and for a repl conn it says canot contact any KDC We seem to be running in circles. May this be a named/bind-dyndb-ldap issue after all (CCing Petr Spacek to advise)? In /var/log/messages, we see:
Jul 2 19:22:07 ipaqavmh named[21175]: zone spoore07021651.test/IN: loaded serial 1404342711
Jul 2 19:22:07 ipaqavmh named[21175]: zone localhost/IN: loaded serial 0
Jul 2 19:22:07 ipaqavmh named[21175]: zone 1.0.0.127.in-addr.arpa/IN: loaded serial 0
Jul 2 19:22:07 ipaqavmh named[21175]: zone localhost.localdomain/IN: loaded serial 0
Jul 2 19:22:08 ipaqavmh named[21175]: all zones loaded
Jul 2 19:22:08 ipaqavmh named[21175]: running
...
Jul 2 19:22:18 ipaqavmh named[21175]: LDAP error: Timed out: while modifying(replace) entry 'idnsname=96.16.10.in-addr.arpa.,cn=dns,dc=spoore07021651,dc=test'
Jul 2 19:22:18 ipaqavmh named[21175]: SOA serial number incrementation failed in zone 'idnsname=96.16. 10.in-addr.arpa.,cn=dns,dc=spoore07021651,dc=test': failure
Jul 2 19:22:18 ipaqavmh named[21175]: update_zone (psearch) failed for 'idnsname=96.16.10.in-addr.arpa.,cn=dns,dc=spoore07021651,dc=test'. Zones can be outdated, run `rndc reload`: failure
...
Jul 2 19:22:28 ipaqavmh named[21175]: update_record (psearch) failed, dn 'idnsname=_ldap._tcp, idnsname=spoore07021651.test,cn=dns,dc=spoore07021651,dc=test' change type 0x0. Records can be outdated, run `rndc reload`: timed out
Jul 2 19:22:38 ipaqavmh named[21175]: LDAP query timed out. Try to adjust "timeout" parameter
Jul 2 19:22:38 ipaqavmh named[21175]: update_record (psearch) failed, dn 'idnsname=_kerberos._tcp, idnsname=spoore07021651.test,cn=dns,dc=spoore07021651,dc=test' change type 0x0. Records can be outdated, run `rndc reload`: timed out
Jul 2 19:22:46 ipaqavmh ns-slapd: GSSAPI Error: Unspecified GSS failure. Minor code may provide more information (Cannot contact any KDC for realm 'SPOORE07021651.TEST')
In krb5kdc.log, I see just this being interesting:
Jul 02 19:22:46 ipaqavmh.spoore07021651.test krb5kdc[21154](info): closing down fd 14
Jul 02 19:22:46 ipaqavmh.spoore07021651.test krb5kdc[21154](info): DISPATCH: repeated (retransmitted?) request from 10.16.98.193, resending previous response
Jul 02 19:22:46 ipaqavmh.spoore07021651.test krb5kdc[21154](info): DISPATCH: repeated (retransmitted?) request from 10.16.98.193, resending previous response
Jul 02 19:22:46 ipaqavmh.spoore07021651.test krb5kdc[21153](info): TGS_REQ (6 etypes {18 17 16 23 25
I don't think that this is bind-dyndb-ldap problem, it simply reports failure during write to LDAP: Jul 2 19:22:18 ipaqavmh named[21175]: LDAP error: Timed out: while modifying(replace) entry 'idnsname=96.16.10.in-addr.arpa.,cn=dns,dc=spoore07021651,dc=test' Scott, do you have reliable reproducer? I would jump to the machine and investigate it in-situ to make sure that bind-dyndb-ldap behaves correctly. I is a bit strange, the processing of the MOD is extremely delayed: [02/Jul/2014:19:22:07 -0400] conn=9 op=13 RESULT err=0 tag=101 nentries=1 etime=0 [02/Jul/2014:19:22:07 -0400] conn=9 op=14 SRCH base="idnsname=96.16.10.in-addr.arpa.,cn=dns,dc=spoore07021651,dc=test" scope=0 filter="(objectClass=idnsRecord)" attrs=ALL [02/Jul/2014:19:22:07 -0400] conn=9 op=14 RESULT err=0 tag=101 nentries=1 etime=0 [02/Jul/2014:19:22:45 -0400] conn=9 op=15 MOD dn="idnsname=96.16.10.in-addr.arpa.,cn=dns,dc=spoore07021651,dc=test" [02/Jul/2014:19:22:45 -0400] conn=9 op=15 RESULT err=0 tag=103 nentries=0 etime=38 csn=53b49446000000030000 but it also looks like there is nothing going on in DS in the time: [02/Jul/2014:19:22:07 -0400] conn=9 op=14 RESULT err=0 tag=101 nentries=1 etime=0 [02/Jul/2014:19:22:22 -0400] conn=10 fd=70 slot=70 SSL connection from 10.16.98.193 to 10.16.98.193 ... [02/Jul/2014:19:22:23 -0400] conn=10 op=15 RESULT err=0 tag=101 nentries=0 etime=0 [02/Jul/2014:19:22:24 -0400] conn=19 fd=79 slot=79 connection from local to /var/run/slapd-SPOORE07021651-TEST.socket [02/Jul/2014:19:22:24 -0400] conn=19 op=0 BIND dn="" method=sasl version=3 mech=GSSAPI [02/Jul/2014:19:22:45 -0400] conn=9 op=15 MOD dn="idnsname=96.16.10.in-addr.arpa.,cn=dns,dc=spoore07021651,dc=test" a few new conns at 19:22:22-24 and that's it, like the process was stale Any update? I have been unable to reproduce this one with the tests that originally produced it. So, I'm closing this one as worksforme. If it does occur again, we can revisit and try to get the logs then. Thanks for your help. |
Description of problem: I'm seeing an occasional fail of ipactl to start smbd on IPA replica in some trust testing. It's not consistent. Version-Release number of selected component (if applicable): ipa-server-3.3.3-28.el7.x86_64 samba-4.1.1-31.el7.x86_64 How reproducible: Unknown. Steps to Reproduce: 1. Setup IPA Master with trust to AD 2012 server 2. Setup IPA Replica 3. on Replica: ipactl stop 4. perform some ipa commands to setup AD uesrs in IPA groups 5. delete ipa entries 6. on Replica: ipactl start Actual results: :: [ 16:00:13 ] :: Enabling IPA on REPLICA1 Job for smb.service failed. See 'systemctl status smb.service' and 'journalctl -xn' for details. Failed to start smb Service Shutting down Starting Directory Service debugging enabled, suppressing output. Starting krb5kdc Service Starting kadmin Service Starting named Service Starting ipa_memcached Service Starting httpd Service Starting pki-tomcatd Service Starting smb Service Aborting ipactl :: [ FAIL ] :: Running 'ssh ipaqavmb.spoore06041229.test 'ipactl start'' (Expected 0, got 1) I'm able to successfully stop/start after a failure so there is a simple workaround it seems. Expected results: Clean IPA start on replica. Additional info: From /var/log/messages for ipactl start: Jun 4 16:00:32 ipaqavmb systemd: Starting Samba SMB Daemon... Jun 4 16:00:32 ipaqavmb smbd[7451]: [2014/06/04 16:00:32.641627, 0] ../source3/smbd/server.c:1278(main) Jun 4 16:00:32 ipaqavmb smbd[7451]: standard input is not a socket, assuming -D option Jun 4 16:00:32 ipaqavmb systemd: smb.service: Supervising process 7452 which is not our child. We'll most likely not notice when it exits. Jun 4 16:00:40 ipaqavmb named[7074]: LDAP query timed out. Try to adjust "timeout" parameter Jun 4 16:00:40 ipaqavmb named[7074]: update_record (psearch) failed, dn 'idnsname=_ldap._tcp,idnsname=spoore06041229.test,cn=dns,dc=spoore06041229,dc=test' change type 0x0. Records can be outdated, run `rndc reload`: timed out Jun 4 16:00:50 ipaqavmb named[7074]: LDAP query timed out. Try to adjust "timeout" parameter Jun 4 16:00:53 ipaqavmb logger: 2014-06-04 16:00:53 /usr/bin/rhts-test-runner.sh 84757 7920 hearbeat... Jun 4 16:00:58 ipaqavmb ns-slapd: GSSAPI Error: Unspecified GSS failure. Minor code may provide more information (Cannot contact any KDC for realm 'SPOORE06041229.TEST') Jun 4 16:00:58 ipaqavmb named[7074]: update_record (psearch) failed, dn 'idnsname=ad12srv1,idnsname=adtest.qe,cn=dns,dc=spoore06041229,dc=test' change type 0x0. Records can be outdated, run `rndc reload`: not found Jun 4 16:00:58 ipaqavmb named[7074]: zone spoore06041229.test/IN: sending notifies (serial 1401912020) Jun 4 16:00:58 ipaqavmb smbd[7452]: [2014/06/04 16:00:58.341752, 0] ipa_sam.c:4284(bind_callback) Jun 4 16:00:58 ipaqavmb smbd[7452]: bind_callback: cannot perform interactive SASL bind with GSSAPI. LDAP security error is 49 Jun 4 16:00:58 ipaqavmb smbd[7452]: [2014/06/04 16:00:58.342230, 0] ../source3/lib/smbldap.c:998(smbldap_connect_system) Jun 4 16:00:58 ipaqavmb smbd[7452]: failed to bind to server ldapi://%2fvar%2frun%2fslapd-SPOORE06041229-TEST.socket with dn="[Anonymous bind]" Error: Invalid credentials Jun 4 16:00:58 ipaqavmb smbd[7452]: SASL(-13): authentication failure: GSSAPI server is not expecting data at this stage Jun 4 16:00:58 ipaqavmb smbd[7452]: [2014/06/04 16:00:58.343192, 0] ipa_sam.c:4440(pdb_init_ipasam) Jun 4 16:00:58 ipaqavmb smbd[7452]: Failed to get base DN. Jun 4 16:00:58 ipaqavmb smbd[7452]: [2014/06/04 16:00:58.343318, 0] ../source3/passdb/pdb_interface.c:178(make_pdb_method_name) Jun 4 16:00:58 ipaqavmb smbd[7452]: pdb backend ipasam:ldapi://%2fvar%2frun%2fslapd-SPOORE06041229-TEST.socket did not correctly init (error was NT_STATUS_UNSUCCESSFUL) Jun 4 16:00:58 ipaqavmb systemd: smb.service: main process exited, code=exited, status=1/FAILURE Jun 4 16:00:58 ipaqavmb systemd: Failed to start Samba SMB Daemon. Jun 4 16:00:58 ipaqavmb systemd: Unit smb.service entered failed state. From /var/log/samba/lob.smbd: [2014/06/04 16:00:32, 0] ../source3/smbd/server.c:1198(main) smbd version 4.1.1 started. Copyright Andrew Tridgell and the Samba Team 1992-2013 [2014/06/04 16:00:32.641627, 0] ../source3/smbd/server.c:1278(main) standard input is not a socket, assuming -D option [2014/06/04 16:00:58.341752, 0] ipa_sam.c:4284(bind_callback) bind_callback: cannot perform interactive SASL bind with GSSAPI. LDAP security error is 49 [2014/06/04 16:00:58.342230, 0] ../source3/lib/smbldap.c:998(smbldap_connect_system) failed to bind to server ldapi://%2fvar%2frun%2fslapd-SPOORE06041229-TEST.socket with dn="[Anonymous bind]" Error: Invalid credentials SASL(-13): authentication failure: GSSAPI server is not expecting data at this stage [2014/06/04 16:00:58.342933, 1] ../source3/lib/smbldap.c:1206(get_cached_ldap_connect) Connection to LDAP server failed for the 1 try! [2014/06/04 16:00:58.343090, 1] ipa_sam.c:3646(ipasam_get_base_dn) Failed to get base DN from RootDSE: Timed out [2014/06/04 16:00:58.343192, 0] ipa_sam.c:4440(pdb_init_ipasam) Failed to get base DN. [2014/06/04 16:00:58.343318, 0] ../source3/passdb/pdb_interface.c:178(make_pdb_method_name) pdb backend ipasam:ldapi://%2fvar%2frun%2fslapd-SPOORE06041229-TEST.socket did not correctly init (error was NT_STATUS_UNSUCCESSFUL)