RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 1104901 - RHEL7 ipactl start fails to start samba
Summary: RHEL7 ipactl start fails to start samba
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: ipa
Version: 7.0
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: rc
: ---
Assignee: Martin Kosek
QA Contact: Namita Soman
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2014-06-04 23:56 UTC by Scott Poore
Modified: 2014-09-10 15:05 UTC (History)
7 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-09-10 15:05:52 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description Scott Poore 2014-06-04 23:56:30 UTC
Description of problem:

I'm seeing an occasional fail of ipactl to start smbd on IPA replica in some trust testing.  It's not consistent.  

Version-Release number of selected component (if applicable):
ipa-server-3.3.3-28.el7.x86_64
samba-4.1.1-31.el7.x86_64


How reproducible:
Unknown.

Steps to Reproduce:
1.  Setup IPA Master with trust to AD 2012 server
2.  Setup IPA Replica
3.  on Replica: ipactl stop
4.  perform some ipa commands to setup AD uesrs in IPA groups
5.  delete ipa entries
6.  on Replica: ipactl start


Actual results:

:: [ 16:00:13 ] ::  Enabling IPA on REPLICA1
Job for smb.service failed. See 'systemctl status smb.service' and 'journalctl -xn' for details.
Failed to start smb Service
Shutting down
Starting Directory Service
    debugging enabled, suppressing output.
Starting krb5kdc Service
Starting kadmin Service
Starting named Service
Starting ipa_memcached Service
Starting httpd Service
Starting pki-tomcatd Service
Starting smb Service
Aborting ipactl
:: [   FAIL   ] :: Running 'ssh ipaqavmb.spoore06041229.test 'ipactl start'' (Expected 0, got 1)

I'm able to successfully stop/start after a failure so there is a simple workaround it seems.

Expected results:

Clean IPA start on replica.  

Additional info:

From /var/log/messages for ipactl start:

Jun  4 16:00:32 ipaqavmb systemd: Starting Samba SMB Daemon...
Jun  4 16:00:32 ipaqavmb smbd[7451]: [2014/06/04 16:00:32.641627,  0] ../source3/smbd/server.c:1278(main)
Jun  4 16:00:32 ipaqavmb smbd[7451]: standard input is not a socket, assuming -D option
Jun  4 16:00:32 ipaqavmb systemd: smb.service: Supervising process 7452 which is not our child. We'll most likely not notice when it exits.
Jun  4 16:00:40 ipaqavmb named[7074]: LDAP query timed out. Try to adjust "timeout" parameter
Jun  4 16:00:40 ipaqavmb named[7074]: update_record (psearch) failed, dn 'idnsname=_ldap._tcp,idnsname=spoore06041229.test,cn=dns,dc=spoore06041229,dc=test' change type 0x0. Records can be outdated, run `rndc reload`: timed out
Jun  4 16:00:50 ipaqavmb named[7074]: LDAP query timed out. Try to adjust "timeout" parameter
Jun  4 16:00:53 ipaqavmb logger: 2014-06-04 16:00:53 /usr/bin/rhts-test-runner.sh 84757 7920 hearbeat...
Jun  4 16:00:58 ipaqavmb ns-slapd: GSSAPI Error: Unspecified GSS failure.  Minor code may provide more information (Cannot contact any KDC for realm 'SPOORE06041229.TEST')
Jun  4 16:00:58 ipaqavmb named[7074]: update_record (psearch) failed, dn 'idnsname=ad12srv1,idnsname=adtest.qe,cn=dns,dc=spoore06041229,dc=test' change type 0x0. Records can be outdated, run `rndc reload`: not found
Jun  4 16:00:58 ipaqavmb named[7074]: zone spoore06041229.test/IN: sending notifies (serial 1401912020)
Jun  4 16:00:58 ipaqavmb smbd[7452]: [2014/06/04 16:00:58.341752,  0] ipa_sam.c:4284(bind_callback)
Jun  4 16:00:58 ipaqavmb smbd[7452]: bind_callback: cannot perform interactive SASL bind with GSSAPI. LDAP security error is 49
Jun  4 16:00:58 ipaqavmb smbd[7452]: [2014/06/04 16:00:58.342230,  0] ../source3/lib/smbldap.c:998(smbldap_connect_system)
Jun  4 16:00:58 ipaqavmb smbd[7452]: failed to bind to server ldapi://%2fvar%2frun%2fslapd-SPOORE06041229-TEST.socket with dn="[Anonymous bind]" Error: Invalid credentials
Jun  4 16:00:58 ipaqavmb smbd[7452]: SASL(-13): authentication failure: GSSAPI server is not expecting data at this stage
Jun  4 16:00:58 ipaqavmb smbd[7452]: [2014/06/04 16:00:58.343192,  0] ipa_sam.c:4440(pdb_init_ipasam)
Jun  4 16:00:58 ipaqavmb smbd[7452]: Failed to get base DN.
Jun  4 16:00:58 ipaqavmb smbd[7452]: [2014/06/04 16:00:58.343318,  0] ../source3/passdb/pdb_interface.c:178(make_pdb_method_name)
Jun  4 16:00:58 ipaqavmb smbd[7452]: pdb backend ipasam:ldapi://%2fvar%2frun%2fslapd-SPOORE06041229-TEST.socket did not correctly init (error was NT_STATUS_UNSUCCESSFUL)
Jun  4 16:00:58 ipaqavmb systemd: smb.service: main process exited, code=exited, status=1/FAILURE
Jun  4 16:00:58 ipaqavmb systemd: Failed to start Samba SMB Daemon.
Jun  4 16:00:58 ipaqavmb systemd: Unit smb.service entered failed state.

From /var/log/samba/lob.smbd:

[2014/06/04 16:00:32,  0] ../source3/smbd/server.c:1198(main)
  smbd version 4.1.1 started.
  Copyright Andrew Tridgell and the Samba Team 1992-2013
[2014/06/04 16:00:32.641627,  0] ../source3/smbd/server.c:1278(main)
  standard input is not a socket, assuming -D option
[2014/06/04 16:00:58.341752,  0] ipa_sam.c:4284(bind_callback)
  bind_callback: cannot perform interactive SASL bind with GSSAPI. LDAP security error is 49
[2014/06/04 16:00:58.342230,  0] ../source3/lib/smbldap.c:998(smbldap_connect_system)
  failed to bind to server ldapi://%2fvar%2frun%2fslapd-SPOORE06041229-TEST.socket with dn="[Anonymous bind]" Error: Invalid credentials
        SASL(-13): authentication failure: GSSAPI server is not expecting data at this stage
[2014/06/04 16:00:58.342933,  1] ../source3/lib/smbldap.c:1206(get_cached_ldap_connect)
  Connection to LDAP server failed for the 1 try!
[2014/06/04 16:00:58.343090,  1] ipa_sam.c:3646(ipasam_get_base_dn)
  Failed to get base DN from RootDSE: Timed out
[2014/06/04 16:00:58.343192,  0] ipa_sam.c:4440(pdb_init_ipasam)
  Failed to get base DN.
[2014/06/04 16:00:58.343318,  0] ../source3/passdb/pdb_interface.c:178(make_pdb_method_name)
  pdb backend ipasam:ldapi://%2fvar%2frun%2fslapd-SPOORE06041229-TEST.socket did not correctly init (error was NT_STATUS_UNSUCCESSFUL)

Comment 2 Martin Kosek 2014-06-05 07:21:24 UTC
The problems seem to start from this place:

Jun  4 16:00:58 ipaqavmb ns-slapd: GSSAPI Error: Unspecified GSS failure.  Minor 
code may provide more information (Cannot contact any KDC for realm 'SPOORE06041229.TEST')

Could this be a problem with DNS, for example named not fully running at the time samba starts? Also CCing Alexander.

Comment 3 Alexander Bokovoy 2014-06-05 07:45:33 UTC
In the logs above named is unable to talk to LDAP and fails to start so it is no wonder everything else is not working too.
Jun  4 16:00:40 ipaqavmb named[7074]: LDAP query timed out. Try to adjust "timeout" parameter
Jun  4 16:00:40 ipaqavmb named[7074]: update_record (psearch) failed, dn 'idnsname=_ldap._tcp,idnsname=spoore06041229.test,cn=dns,dc=spoore06041229,dc=test' change type 0x0. Records can be outdated, run `rndc reload`: timed out
Jun  4 16:00:50 ipaqavmb named[7074]: LDAP query timed out. Try to adjust "timeout" parameter

Comment 4 Rob Crittenden 2014-06-05 12:35:28 UTC
Yes, I'd look at the 389-ds logs to see what was going on immediately after start-up.

Comment 9 Martin Kosek 2014-07-02 10:32:00 UTC
Would it help to increase startup timeout to make sure this is not caused by a slow VM and slow service start?

# echo 'startup_timeout=600' >> /etc/ipa/default.conf
# ipactl start

Comment 10 Scott Poore 2014-07-02 15:01:05 UTC
Testing now.  It'll take a little time to know how it's going.

Comment 11 Scott Poore 2014-07-03 18:16:23 UTC
Ok, that didn't seem to help.  

Works here:
:: [ 19:11:44 ] ::  Enabling IPA on REPLICA1
:: [   PASS   ] :: Running 'ssh ipaqavmh.spoore07021651.test 'echo startup_timeout=600 >> /etc/ipa/default.conf'' (Expected 0, got 0)
Starting Directory Service
    debugging enabled, suppressing output.
Starting krb5kdc Service
Starting kadmin Service
Starting named Service
Starting ipa_memcached Service
Starting httpd Service
Starting pki-tomcatd Service
Starting smb Service
Starting winbind Service
Starting ipa-otpd Service
ipa: INFO: The ipactl command was successful
:: [   PASS   ] :: Running 'ssh ipaqavmh.spoore07021651.test 'ipactl start'' (Expected 0, got 0)

But, later not here:

:: [ 19:22:01 ] ::  Enabling IPA on REPLICA1
Job for smb.service failed. See 'systemctl status smb.service' and 'journalctl -xn' for details.
Failed to start smb Service
Shutting down
Starting Directory Service
    debugging enabled, suppressing output.
Starting krb5kdc Service
Starting kadmin Service
Starting named Service
Starting ipa_memcached Service
Starting httpd Service
Starting pki-tomcatd Service
Starting smb Service
Aborting ipactl
:: [   FAIL   ] :: Running 'ssh ipaqavmh.spoore07021651.test 'ipactl start'' (Expected 0, got 1)
smb.service - Samba SMB Daemon
   Loaded: loaded (/usr/lib/systemd/system/smb.service; disabled)
   Active: failed (Result: exit-code) since Wed 2014-07-02 19:22:46 EDT; 6s ago
  Process: 21584 ExecStart=/usr/sbin/smbd $SMBDOPTIONS (code=exited, status=1/FAILURE)
 Main PID: 21584 (code=exited, status=1/FAILURE)
   Status: "Starting process..."
   CGroup: /system.slice/smb.service

Jul 02 19:22:46 ipaqavmh.spoore07021651.test smbd[21584]: failed to bind to server ldapi://%2fvar%2frun%2fslapd-SPOORE07021651-TEST.socket with dn="[Anonymous bind]" Error: Invalid credentials
Jul 02 19:22:46 ipaqavmh.spoore07021651.test smbd[21584]: SASL(-13): authentication failure: GSSAPI server is not expecting data at this stage
Jul 02 19:22:46 ipaqavmh.spoore07021651.test smbd[21584]: [2014/07/02 19:22:46.222544,  0] ipa_sam.c:4440(pdb_init_ipasam)
Jul 02 19:22:46 ipaqavmh.spoore07021651.test smbd[21584]: Failed to get base DN.
Jul 02 19:22:46 ipaqavmh.spoore07021651.test smbd[21584]: [2014/07/02 19:22:46.224673,  0] ../source3/passdb/pdb_interface.c:178(make_pdb_method_name)
Jul 02 19:22:46 ipaqavmh.spoore07021651.test smbd[21584]: pdb backend ipasam:ldapi://%2fvar%2frun%2fslapd-SPOORE07021651-TEST.socket did not correctly init (error was NT_STATUS_UNSUCCESSFUL)
Jul 02 19:22:46 ipaqavmh.spoore07021651.test systemd[1]: smb.service: main process exited, code=exited, status=1/FAILURE
Jul 02 19:22:46 ipaqavmh.spoore07021651.test systemd[1]: Failed to start Samba SMB Daemon.
Jul 02 19:22:46 ipaqavmh.spoore07021651.test systemd[1]: Unit smb.service entered failed state.
Jul 02 19:22:49 ipaqavmh.spoore07021651.test systemd[1]: Stopped Samba SMB Daemon.
:: [   FAIL   ] :: Running 'ssh ipaqavmh.spoore07021651.test 'systemctl status smb.service'' (Expected 0, got 3)
:: [ 19:22:52 ] ::  Running Sleep 100000 to debug

I'll attach the logs here in case more info helps.

Comment 13 Martin Kosek 2014-07-18 12:09:59 UTC
I checked the logs, but I could not find anything sticking out besides what was already mentioned. Alexander, maybe you'd have better luck?

I am puzzled with following samba error message

Jul  2 19:22:46 ipaqavmh smbd[21584]: failed to bind to server ldapi://%2fvar%2frun%2fslapd-             SPOORE07021651-TEST.socket with dn="[Anonymous bind]" Error: Invalid credentials

Given it is anonymous bind. I would expect DNS issue of some sort rather than Invalid credentials.

Comment 14 Alexander Bokovoy 2014-07-18 12:35:03 UTC
It is not an anonymous bind, ignore what the message says because it simply has no credentials in the debug code to tell (as we do GSSAPI here). 

The next line explains why credentials are "invalid":

Jul 02 19:22:46 ipaqavmh.spoore07021651.test smbd[21584]: SASL(-13): authentication failure: GSSAPI server is not expecting data at this stage

So GSSAPI server (389-ds in this case) is not yet ready for performing operations. This is sign on a pretty loaded ns-slapd which is only starting and logging own operations -- it cannot yet accept incoming connection until all plugins are initialized.

Comment 15 Martin Kosek 2014-07-18 12:54:28 UTC
Ah, ok - thanks for explanation. Should we then enhance ipa-sam to add try-wait-repeat loop to pdb_init_ipasam in this case? Because now it tries just once and then crashes.

Alternatively, we would need to update 389-ds-base Service start waiting procedure to not wait just until the port is available but also until it is fully initialized (how to detect?). CCing Ludwig and Thierry for help here.

Comment 16 Alexander Bokovoy 2014-07-18 12:59:15 UTC
I think we should enhance 389-ds-base part. In Samba case ipa-sam doesn't handle retries; it is done centrally in Samba and we get 10-15 retries before we decide to retire.

Comment 17 Ludwig 2014-07-18 13:02:23 UTC
what do you mean by 389 base part, ds has to do its task before listening on connections, so the clients need to wait until it is available. You could poll by tryng base searches until ds is available

Comment 18 Martin Kosek 2014-07-18 13:42:13 UTC
(In reply to Ludwig from comment #17)
> what do you mean by 389 base part

We need some way to be able to tell "now, 389-ds-base instance is up and all other service can start and do their own searches on the DS".

Right now, we just check that port 389 is available and when it is, we fire up other services. For PKI for example, we do a more extended check and do a call to /ca/admin/ca/getStatus to check if the PKI is really up or just ports are opened.

Should we do something similar for 389? We need something solid and stable here.

Comment 19 Ludwig 2014-07-18 13:48:13 UTC
Are you sure if your analysis is correct ? The error is at: 

Jul 02 19:22:46 ipaqavmh.spoore07021651.test smbd[21584]: SASL(-13): authentication failure: GSSAPI server is not expecting data at this stage

but if you look into the access logs ds is successfully  processing operations at this time

Comment 20 Ludwig 2014-07-18 13:58:46 UTC
For Jul,4 the 19:00:58 the server is also processing ops, but in the error log I find;
[04/Jun/2014:16:00:58 -0400] slapd_ldap_sasl_interactive_bind - Error: could not perform interactive bind for id [] mech [GSSAPI]: LDAP error -2 (Local error) (SASL(-1): generic failure: GSSAPI Error: Unspecifie
d GSS failure.  Minor code may provide more information (Cannot contact any KDC for realm 'SPOORE06041229.TEST')) errno 115 (Operation now in progress)
[04/Jun/2014:16:00:58 -0400] slapi_ldap_bind - Error: could not perform interactive bind for id [] authentication mechanism [GSSAPI]: error -2 (Local error)
[04/Jun/2014:16:00:58 -0400] NSMMReplicationPlugin - agmt="cn=meToipaqa64vmj.spoore06041229.test" (ipaqa64vmj:389): Replication bind with GSSAPI auth failed: LDAP error -2 (Local error) (SASL(-1): generic failur
e: GSSAPI Error: Unspecified GSS failure.  Minor code may provide more information (Cannot contact any KDC for realm 'SPOORE06041229.TEST'))

for the bind and for a repl conn it says canot contact any KDC

Comment 21 Martin Kosek 2014-07-24 11:51:43 UTC
We seem to be running in circles. May this be a named/bind-dyndb-ldap issue after all (CCing Petr Spacek to advise)? In /var/log/messages, we see:

Jul  2 19:22:07 ipaqavmh named[21175]: zone spoore07021651.test/IN: loaded serial 1404342711
Jul  2 19:22:07 ipaqavmh named[21175]: zone localhost/IN: loaded serial 0
Jul  2 19:22:07 ipaqavmh named[21175]: zone 1.0.0.127.in-addr.arpa/IN: loaded serial 0
Jul  2 19:22:07 ipaqavmh named[21175]: zone localhost.localdomain/IN: loaded serial 0
Jul  2 19:22:08 ipaqavmh named[21175]: all zones loaded
Jul  2 19:22:08 ipaqavmh named[21175]: running
...
Jul  2 19:22:18 ipaqavmh named[21175]: LDAP error: Timed out: while modifying(replace) entry             'idnsname=96.16.10.in-addr.arpa.,cn=dns,dc=spoore07021651,dc=test'
Jul  2 19:22:18 ipaqavmh named[21175]: SOA serial number incrementation failed in zone 'idnsname=96.16.  10.in-addr.arpa.,cn=dns,dc=spoore07021651,dc=test': failure
Jul  2 19:22:18 ipaqavmh named[21175]: update_zone (psearch) failed for 'idnsname=96.16.10.in-addr.arpa.,cn=dns,dc=spoore07021651,dc=test'. Zones can be outdated, run `rndc reload`: failure
...
Jul  2 19:22:28 ipaqavmh named[21175]: update_record (psearch) failed, dn 'idnsname=_ldap._tcp,          idnsname=spoore07021651.test,cn=dns,dc=spoore07021651,dc=test' change type 0x0. Records can be outdated, run `rndc reload`: timed out
Jul  2 19:22:38 ipaqavmh named[21175]: LDAP query timed out. Try to adjust "timeout" parameter
Jul  2 19:22:38 ipaqavmh named[21175]: update_record (psearch) failed, dn 'idnsname=_kerberos._tcp,      idnsname=spoore07021651.test,cn=dns,dc=spoore07021651,dc=test' change type 0x0. Records can be outdated, run `rndc reload`: timed out
Jul  2 19:22:46 ipaqavmh ns-slapd: GSSAPI Error: Unspecified GSS failure.  Minor code may provide more   information (Cannot contact any KDC for realm 'SPOORE07021651.TEST')

In krb5kdc.log, I see just this being interesting:
Jul 02 19:22:46 ipaqavmh.spoore07021651.test krb5kdc[21154](info): closing down fd 14
Jul 02 19:22:46 ipaqavmh.spoore07021651.test krb5kdc[21154](info): DISPATCH: repeated (retransmitted?)   request from 10.16.98.193, resending previous response
Jul 02 19:22:46 ipaqavmh.spoore07021651.test krb5kdc[21154](info): DISPATCH: repeated (retransmitted?)   request from 10.16.98.193, resending previous response
Jul 02 19:22:46 ipaqavmh.spoore07021651.test krb5kdc[21153](info): TGS_REQ (6 etypes {18 17 16 23 25

Comment 22 Petr Spacek 2014-07-24 12:12:37 UTC
I don't think that this is bind-dyndb-ldap problem, it simply reports failure during write to LDAP:
Jul  2 19:22:18 ipaqavmh named[21175]: LDAP error: Timed out: while modifying(replace) entry             'idnsname=96.16.10.in-addr.arpa.,cn=dns,dc=spoore07021651,dc=test'

Scott, do you have reliable reproducer? I would jump to the machine and investigate it in-situ to make sure that bind-dyndb-ldap behaves correctly.

Comment 23 Ludwig 2014-07-24 12:57:17 UTC
I is a bit strange, the processing of the MOD is extremely delayed:

[02/Jul/2014:19:22:07 -0400] conn=9 op=13 RESULT err=0 tag=101 nentries=1 etime=0
[02/Jul/2014:19:22:07 -0400] conn=9 op=14 SRCH base="idnsname=96.16.10.in-addr.arpa.,cn=dns,dc=spoore07021651,dc=test" scope=0 filter="(objectClass=idnsRecord)" attrs=ALL
[02/Jul/2014:19:22:07 -0400] conn=9 op=14 RESULT err=0 tag=101 nentries=1 etime=0
[02/Jul/2014:19:22:45 -0400] conn=9 op=15 MOD dn="idnsname=96.16.10.in-addr.arpa.,cn=dns,dc=spoore07021651,dc=test"
[02/Jul/2014:19:22:45 -0400] conn=9 op=15 RESULT err=0 tag=103 nentries=0 etime=38 csn=53b49446000000030000

but it also looks like there is nothing going on in DS in the time:

[02/Jul/2014:19:22:07 -0400] conn=9 op=14 RESULT err=0 tag=101 nentries=1 etime=0
[02/Jul/2014:19:22:22 -0400] conn=10 fd=70 slot=70 SSL connection from 10.16.98.193 to 10.16.98.193
...
[02/Jul/2014:19:22:23 -0400] conn=10 op=15 RESULT err=0 tag=101 nentries=0 etime=0
[02/Jul/2014:19:22:24 -0400] conn=19 fd=79 slot=79 connection from local to /var/run/slapd-SPOORE07021651-TEST.socket
[02/Jul/2014:19:22:24 -0400] conn=19 op=0 BIND dn="" method=sasl version=3 mech=GSSAPI
[02/Jul/2014:19:22:45 -0400] conn=9 op=15 MOD dn="idnsname=96.16.10.in-addr.arpa.,cn=dns,dc=spoore07021651,dc=test"

a few new conns at 19:22:22-24 and that's it, like the process was stale

Comment 24 Dmitri Pal 2014-09-01 14:08:13 UTC
Any update?

Comment 25 Scott Poore 2014-09-10 15:05:52 UTC
I have been unable to reproduce this one with the tests that originally produced it.

So, I'm closing this one as worksforme.  If it does occur again, we can revisit and try to get the logs then.

Thanks for your help.


Note You need to log in before you can comment on or make changes to this bug.