Bug 1859426 - When encountered KDC policy reject, samba-winbind should not flood the network by retrying intensively
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: samba
Version: 7.7
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Isaac Boukris
QA Contact: sssd-qe
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-07-22 01:46 UTC by Ding-Yi Chen
Modified: 2023-10-06 21:12 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-31 05:39:16 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Samba Project 14447 0 None None None 2020-07-22 13:05:32 UTC

Description Ding-Yi Chen 2020-07-22 01:46:04 UTC
Description of problem:

One of our customers encountered huge network traffic after upgrading samba-winbind from 4.9.1-6 to 4.9.1-10.

The root cause is:

The trusted domain has "selective authentication" enabled,
i.e. if the user and computer accounts are not allowed to authenticate, they are rejected with KRB5KDC_ERR_POLICY,
which samba-winbind reports as "KDC policy rejects request".

While the old version (4.9.1-6) did not contact the trusted domains at all,
the newer versions (>= 4.9.1-10), although they cannot authenticate,
keep retrying intensively (0.1 ~ 1 sec per request), thus flooding the network.

The expected behavior is that, when "KDC policy rejects request" is encountered at startup, the situation is very unlikely to change within a short period, so there is no need to waste CPU and network bandwidth on retries.

The "winbind: ignore domains = <trusted domain>" setting is not desirable for the customer: there are thousands of instances in the domain, and the operators do not know which instances need the trusted domains.

Also, the trusted domain is unwilling to allow all of the instances to authenticate. Thus we should reduce the scanning frequency to avoid bombarding the network.


Version-Release number of selected component (if applicable):
From samba-winbind-4.9.1-10.el7.x86_64
To   samba-winbind-4.10.4-11.el7_8.x86_64

How reproducible:
Always


Steps to Reproduce:
1. Set up a two-way trust between the HOME domain and the TRUSTED domain
2. Assume samba1.home.example.com is joined to HOME
3. Set up selective authentication in the TRUSTED domain to block samba1.home.example.com. See [1] for instructions, [2,3] for more info.
4. Restart smb and winbind

Actual results:

Retried within 1 sec:
~~~~~~~~
[2020/07/20 14:17:24.482116,  1] ../../source3/winbindd/winbindd_cm.c:1306(cm_prepare_connection)
  Failed to prepare SMB connection to ad.trusted.example.com: NT_STATUS_LOGON_FAILURE
[2020/07/20 14:17:24.574381,  0] ../../source3/librpc/crypto/gse.c:543(gse_get_client_auth_token)
  gse_get_client_auth_token: gss_init_sec_context failed with [Unspecified GSS failure.  Minor code may provide more information: KDC policy rejects request](2529638924)
[2020/07/20 14:17:24.574518,  1] ../../auth/gensec/spnego.c:596(gensec_spnego_client_negTokenInit_step)
  gensec_spnego_client_negTokenInit_step: gse_krb5: creating NEG_TOKEN_INIT for cifs/ad.trusted.example.com failed (next[(null)]): NT_STATUS_LOGON_FAILURE
[2020/07/20 14:17:24.574854,  1] ../../source3/winbindd/winbindd_cm.c:1166(cm_prepare_connection)
  authenticated session setup to ad.trusted.example.com using SAMBA1$@HOME.EXAMPLE.COM failed with NT_STATUS_LOGON_FAILURE
[2020/07/20 14:17:24.575087,  1] ../../source3/winbindd/winbindd_cm.c:1306(cm_prepare_connection)
  Failed to prepare SMB connection to ad.trusted.example.com: NT_STATUS_LOGON_FAILURE
[2020/07/20 14:17:34.792648,  0] ../../source3/librpc/crypto/gse.c:543(gse_get_client_auth_token)
  gse_get_client_auth_token: gss_init_sec_context failed with [Unspecified GSS failure.  Minor code may provide more information: KDC policy rejects request](2529638924)
[2020/07/20 14:17:34.793231,  1] ../../auth/gensec/spnego.c:596(gensec_spnego_client_negTokenInit_step)
  gensec_spnego_client_negTokenInit_step: gse_krb5: creating NEG_TOKEN_INIT for cifs/ad.trusted.example.com failed (next[(null)]): NT_STATUS_LOGON_FAILURE
[2020/07/20 14:17:34.793775,  1] ../../source3/winbindd/winbindd_cm.c:1166(cm_prepare_connection)
  authenticated session setup to ad.trusted.example.com using SAMBA1$@HOME.EXAMPLE.COM failed with NT_STATUS_LOGON_FAILURE
[2020/07/20 14:17:34.794065,  1] ../../source3/winbindd/winbindd_cm.c:1306(cm_prepare_connection)
  Failed to prepare SMB connection to ad.trusted.example.com: NT_STATUS_LOGON_FAILURE
[2020/07/20 14:17:34.877074,  0] ../../source3/librpc/crypto/gse.c:543(gse_get_client_auth_token)
  gse_get_client_auth_token: gss_init_sec_context failed with [Unspecified GSS failure.  Minor code may provide more information: KDC policy rejects request](2529638924)
[2020/07/20 14:17:34.877247,  1] ../../auth/gensec/spnego.c:596(gensec_spnego_client_negTokenInit_step)
  gensec_spnego_client_negTokenInit_step: gse_krb5: creating NEG_TOKEN_INIT for cifs/ad.trusted.example.com failed (next[(null)]): NT_STATUS_LOGON_FAILURE
[2020/07/20 14:17:34.877572,  1] ../../source3/winbindd/winbindd_cm.c:1166(cm_prepare_connection)
  authenticated session setup to ad.trusted.example.com using SAMBA1$@HOME.EXAMPLE.COM failed with NT_STATUS_LOGON_FAILURE
[2020/07/20 14:17:34.877797,  1] ../../source3/winbindd/winbindd_cm.c:1306(cm_prepare_connection)
  Failed to prepare SMB connection to ad.trusted.example.com: NT_STATUS_LOGON_FAILURE
[2020/07/20 14:17:34.988599,  0] ../../source3/librpc/crypto/gse.c:543(gse_get_client_auth_token)
  gse_get_client_auth_token: gss_init_sec_context failed with [Unspecified GSS failure.  Minor code may provide more information: KDC policy rejects request](2529638924)
[2020/07/20 14:17:34.988752,  1] ../../auth/gensec/spnego.c:596(gensec_spnego_client_negTokenInit_step)
  gensec_spnego_client_negTokenInit_step: gse_krb5: creating NEG_TOKEN_INIT for cifs/ad.trusted.example.com failed (next[(null)]): NT_STATUS_LOGON_FAILURE
[2020/07/20 14:17:34.989100,  1] ../../source3/winbindd/winbindd_cm.c:1166(cm_prepare_connection)
  authenticated session setup to ad.trusted.example.com using SAMBA1$@HOME.EXAMPLE.COM failed with NT_STATUS_LOGON_FAILURE
[2020/07/20 14:17:34.989317,  1] ../../source3/winbindd/winbindd_cm.c:1306(cm_prepare_connection)
  Failed to prepare SMB connection to ad.trusted.example.com: NT_STATUS_LOGON_FAILURE
~~~~~~~~

Expected results:

Retry after 1 hour


Additional info:

A configuration item for setting scan interval in smb.conf, such as:

winbind rescan trusted domains interval


References:
1. https://www.microsoftpressstore.com/articles/article.aspx?p=2199426&seqNum=3
2. https://docs.microsoft.com/en-us/previous-versions/windows/it-pro/windows-server-2003/cc755321(v=ws.10)?redirectedfrom=MSDN
3. https://support.microsoft.com/en-ae/help/2959395/a-tgs-request-for-the-krbtgt-account-fails-with-kdc-err-policy-and-an

Comment 2 Isaac Boukris 2020-07-22 11:57:10 UTC
> The trusted domain has "selective authentication",

That helps, looking into it.

Comment 3 Stefan Metzmacher 2020-07-22 14:13:40 UTC
We already have "winbind scan trusted domains = no", would that have fixed the problem too?
It would be good to know of any reason why we still need the scanning at all.

But for sure we should fix the regression.

Comment 4 Isaac Boukris 2020-07-22 15:17:02 UTC
(In reply to Stefan Metzmacher from comment #3)
> We already have "winbind scan trusted domains = no", would that have fixed
> the problem too?

I didn't know the scanning wasn't really needed.

@ding-yi: sounds like a good idea to suggest to the customer, at least in the meantime until we fix it back to skip forests with selective-authentication (as we used to).

Comment 5 Stefan Metzmacher 2020-07-22 15:27:29 UTC
I'm not sure if it's needed in the customer's configuration, but if it turns out it's not needed, I'd keep it off forever.

The wbinfo -m --verbose, wbinfo -u, and wbinfo -g outputs differ, but that should not impact any real user interaction; see also
https://www.samba.org/samba/history/samba-4.8.0.html

In 4.8 times some idmap backends and pam_winbind with krb5 required the scanning, but I'm not sure if it's still required today.

Comment 6 Ding-Yi Chen 2020-07-23 01:18:01 UTC
@Isaac the customer mentioned that "winbind scan trusted domains = no" does not work for them.

Will ask him about the symptoms and logs.

Comment 7 Isaac Boukris 2020-07-23 08:03:08 UTC
(In reply to Ding-Yi Chen from comment #6)
> @Issac customer mentioned that "winbind scan trusted domains = no" does not
> work for them.
> 
> Will ask him about the symptom and log.

I have a question: in your tests, did you enable selective-authentication only on the trusted forest side or also on the local one?
And can you also test the old samba version with selective-authentication configured only on the trusted forest side, not the local one?

Thanks

Comment 8 Isaac Boukris 2020-07-23 11:29:47 UTC
(In reply to Ding-Yi Chen from comment #0)


> Expected results:
> 
> Retry after 1 hour
> 
> 
> Additional info:
> 
> A configuration item for setting scan interval in smb.conf, such as:
> 
> winbind rescan trusted domains interval

On a closer look, we already have such a configuration option, "winbind reconnect delay", which you can set to 3600. From the man page:

This parameter specifies the number of seconds the winbindd(8) daemon will wait between attempts to contact a Domain controller for a domain that is determined to be down or not contactable.
Default: winbind reconnect delay = 30


Let me know if this works out for the customer, as I'm not sure we'll change it back to ignore trusts with selective-authentication, as it isn't necessarily correct.
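
To confirm which value is in effect before and after the change, the [global] section of smb.conf can be inspected. The following is a minimal Python sketch and not part of Samba: the function name effective_global_param and the sample configuration are hypothetical, the fallback of 30 is the default quoted from the man page above, and the parser ignores include files and line continuations. On a real system, testparm remains the authoritative check.

```python
DEFAULT_RECONNECT_DELAY = 30  # default quoted from the man page excerpt above


def effective_global_param(conf_text, name, default):
    """Return the last value of `name` set in [global], or `default`.

    Hypothetical helper for illustration only: it does not handle
    include files or backslash line continuations.
    """
    # Parameter names are compared case-insensitively with whitespace
    # collapsed, so "Winbind  Reconnect Delay" matches as well.
    wanted = " ".join(name.lower().split())
    section = None
    value = default
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip().lower()
            continue
        if section == "global" and "=" in line:
            key, _, val = line.partition("=")
            if " ".join(key.lower().split()) == wanted:
                value = val.strip()
    return value


# Hypothetical smb.conf content with the suggested setting applied.
sample = """
[global]
    workgroup = HOME
    winbind reconnect delay = 3600
"""

delay = int(effective_global_param(sample, "winbind reconnect delay",
                                   DEFAULT_RECONNECT_DELAY))
print("winbind reconnect delay =", delay)
```

If the parameter is absent from [global], the helper returns the default passed in, which mirrors the documented behaviour of an unset option.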

Comment 9 Ding-Yi Chen 2020-07-24 00:19:39 UTC
@Isaac



> I have a question, in your tests did you enable selective-authentication only on the trusted forest side or also on the local one?
> And can you also test the old samba version with selective-authentication configured only on the trusted forest side, not the local one.

Selective authentication is enabled on the trusted forest only.
The local one does not have it.

The old version did not scan the trusted domain. tcpdump showed no traffic to the trusted domain while samba-winbind was starting.


> In a closer look we already have such configuration option "winbind reconnect delay" which you can set to 3600
> ...
> Default: winbind reconnect delay = 30

From the previous log, it seems to make contact again in less than one second (from 14:17:24.482116 to 14:17:24.575087).
Do you think something else might also be involved?

Log:
~~~
[2020/07/20 14:17:24.482116,  1] ../../source3/winbindd/winbindd_cm.c:1306(cm_prepare_connection)
  Failed to prepare SMB connection to ad.trusted.example.com: NT_STATUS_LOGON_FAILURE
[2020/07/20 14:17:24.574381,  0] ../../source3/librpc/crypto/gse.c:543(gse_get_client_auth_token)
  gse_get_client_auth_token: gss_init_sec_context failed with [Unspecified GSS failure.  Minor code may provide more information: KDC policy rejects request](2529638924)
[2020/07/20 14:17:24.574518,  1] ../../auth/gensec/spnego.c:596(gensec_spnego_client_negTokenInit_step)
  gensec_spnego_client_negTokenInit_step: gse_krb5: creating NEG_TOKEN_INIT for cifs/ad.trusted.example.com failed (next[(null)]): NT_STATUS_LOGON_FAILURE
[2020/07/20 14:17:24.574854,  1] ../../source3/winbindd/winbindd_cm.c:1166(cm_prepare_connection)
  authenticated session setup to ad.trusted.example.com using SAMBA1$@HOME.EXAMPLE.COM failed with NT_STATUS_LOGON_FAILURE

[2020/07/20 14:17:24.575087,  1] ../../source3/winbindd/winbindd_cm.c:1306(cm_prepare_connection)
  Failed to prepare SMB connection to ad.trusted.example.com: NT_STATUS_LOGON_FAILURE
~~~


That said, I will ask the customer to apply the option.

Comment 10 Isaac Boukris 2020-07-24 05:34:34 UTC
(In reply to Ding-Yi Chen from comment #9)
> @Issac
> 
> 
> 
> > I have a question, in your tests did you enable selective-authentication only on the trusted forest side or also on the local one?
> > And can you also test the old samba version with selective-authentication configured only on the trusted forest side, not the local one.
> 
> The selective-authentication is on trusted forest.
> Local one does not have that.
> 
> The old version did not scan the trust domain. No traffic in tcpdump to the
> trust domain when the samba-winbind starting.

This doesn't match my lab testing or the code: when only the trusted forest has selective-auth and the local one doesn't, we'd retry the same way every 30 seconds even in the older version.

> > In a closer look we already have such configuration option "winbind reconnect delay" which you can set to 3600
> > ...
> > Default: winbind reconnect delay = 30
> 
> From previous log, it seems to contact less than one second (from
> 14:17:24.482116 ~ 14:17:24.575087)
> Do you think something else might also involved?
> 
> Log:
> ~~~
> [2020/07/20 14:17:24.482116,  1]
> ../../source3/winbindd/winbindd_cm.c:1306(cm_prepare_connection)
>   Failed to prepare SMB connection to ad.trusted.example.com:
> NT_STATUS_LOGON_FAILURE
> [2020/07/20 14:17:24.574381,  0]
> ../../source3/librpc/crypto/gse.c:543(gse_get_client_auth_token)
>   gse_get_client_auth_token: gss_init_sec_context failed with [Unspecified
> GSS failure.  Minor code may provide more information: KDC policy rejects
> request](2529638924)
> [2020/07/20 14:17:24.574518,  1]
> ../../auth/gensec/spnego.c:596(gensec_spnego_client_negTokenInit_step)
>   gensec_spnego_client_negTokenInit_step: gse_krb5: creating NEG_TOKEN_INIT
> for cifs/ad.trusted.example.com failed (next[(null)]):
> NT_STATUS_LOGON_FAILURE
> [2020/07/20 14:17:24.574854,  1]
> ../../source3/winbindd/winbindd_cm.c:1166(cm_prepare_connection)
>   authenticated session setup to ad.trusted.example.com using
> SAMBA1$@HOME.EXAMPLE.COM failed with NT_STATUS_LOGON_FAILURE
> 
> [2020/07/20 14:17:24.575087,  1]
> ../../source3/winbindd/winbindd_cm.c:1306(cm_prepare_connection)
>   Failed to prepare SMB connection to ad.trusted.example.com:
> NT_STATUS_LOGON_FAILURE
> ~~~
> 
> 
> That said, I will ask cu to apply the option.

Those are a couple of requests within a single retry; we only retry every 30 seconds (based on this option).
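
One way to separate requests that belong to the same retry from the interval between retries is to group the log timestamps into bursts. The following is a minimal Python sketch under stated assumptions: the 1-second threshold BURST_GAP_SECONDS and the function name burst_starts are hypothetical choices for illustration, and the sample is a subset of the entry headers from the log in this report.

```python
import re
from datetime import datetime

# Header of a winbind log entry, e.g.
# "[2020/07/20 14:17:24.482116,  1] ../../source3/..."
HEADER = re.compile(r"^\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{6}),\s*\d+\]")

# Assumption: entries less than this many seconds apart belong to one retry.
BURST_GAP_SECONDS = 1.0


def burst_starts(log_text, gap=BURST_GAP_SECONDS):
    """Return the timestamp of the first entry of each burst."""
    starts = []
    previous = None
    for line in log_text.splitlines():
        match = HEADER.match(line)
        if not match:
            continue  # message bodies and blank lines carry no timestamp
        stamp = datetime.strptime(match.group(1), "%Y/%m/%d %H:%M:%S.%f")
        if previous is None or (stamp - previous).total_seconds() > gap:
            starts.append(stamp)
        previous = stamp
    return starts


# Entry headers taken from the log in this report (message bodies mostly omitted).
sample = """\
[2020/07/20 14:17:24.482116,  1] ../../source3/winbindd/winbindd_cm.c:1306(cm_prepare_connection)
  Failed to prepare SMB connection to ad.trusted.example.com: NT_STATUS_LOGON_FAILURE
[2020/07/20 14:17:24.574381,  0] ../../source3/librpc/crypto/gse.c:543(gse_get_client_auth_token)
[2020/07/20 14:17:24.575087,  1] ../../source3/winbindd/winbindd_cm.c:1306(cm_prepare_connection)
[2020/07/20 14:17:34.792648,  0] ../../source3/librpc/crypto/gse.c:543(gse_get_client_auth_token)
[2020/07/20 14:17:34.989317,  1] ../../source3/winbindd/winbindd_cm.c:1306(cm_prepare_connection)
"""

starts = burst_starts(sample)
gaps = [(b - a).total_seconds() for a, b in zip(starts, starts[1:])]
print(len(starts), "bursts; seconds between burst starts:", gaps)
```

Running the same function over a full log.wb-* file, read as text, gives the observed spacing between retries, which can then be compared with the configured "winbind reconnect delay".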

Comment 11 Ding-Yi Chen 2020-07-27 07:17:25 UTC
While the customer has not yet provided the log with "winbind scan trusted domains = no",

he did mention that "winbind reconnect delay = 3600" worked for him,
as the error messages now only appear on an hourly basis.

Comment 12 Ding-Yi Chen 2020-07-28 01:54:55 UTC
The customer would like to know whether there is a way to specify delays per domain, such as:

winbind reconnect * : delay = 30
winbind reconnect SELECTIVE_TRUSTED : delay = 3600

Comment 13 Isaac Boukris 2020-07-28 05:58:13 UTC
(In reply to Ding-Yi Chen from comment #12)
> Customer would like to know whether there are the ways to specify delays for
> certain domain, such as:
> 
> winbind reconnect * : delay = 30
> winbind reconnect SELECTIVE_TRUSTED : delay = 3600

No.

I'm closing this as not a bug, as we aren't going to filter out selective-auth, as noted.

Comment 14 Ding-Yi Chen 2020-07-31 04:46:04 UTC
Hi,

Just wondering, is it possible to have an additional option that addresses rejected domains?

For example, "winbind reject retry = 7200" would mean: if a domain gets the KDC policy reject, winbind retries after 2 hours.


Having this would simplify configuration for mass deployments: the system admins would not need to worry about error messages flooding the system,
and individual users could apply for permission without changing the configuration files.

Comment 15 Isaac Boukris 2020-07-31 05:39:16 UTC
(In reply to Ding-Yi Chen from comment #14)
> Hi,
> 
> Just wondering, is that possible that we can have an additional option that
> address reject domains.
> 
> For example, "winbind reject retry = 7200" means: if the domains get the KDC
> policy rejects, it will retry after 2 hours.

No, there is no such option.

