Bug 1328108 - Protocol error with FreeIPA on CentOS 6
Summary: Protocol error with FreeIPA on CentOS 6
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: sssd
Version: 23
Hardware: Unspecified
OS: Unspecified
unspecified
urgent
Target Milestone: ---
Assignee: Sumit Bose
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 1329648
TreeView+ depends on / blocked
 
Reported: 2016-04-18 13:21 UTC by Tim Niemueller
Modified: 2020-05-02 18:21 UTC (History)
9 users (show)

Fixed In Version: sssd-1.13.4-2.fc23 sssd-1.13.4-2.fc22 sssd-1.13.4-2.fc24
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1329648 (view as bug list)
Environment:
Last Closed: 2016-05-04 13:06:41 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
sssd domain log debug level 9 (14.56 KB, application/x-xz)
2016-04-18 14:49 UTC, Tim Niemueller
no flags Details
Log with log_level 10 (1.37 MB, text/plain)
2016-05-04 12:12 UTC, Tim Niemueller
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Github SSSD sssd issues 4034 0 None closed Protocol error with FreeIPA on CentOS 6 2020-05-02 18:21:29 UTC

Description Tim Niemueller 2016-04-18 13:21:35 UTC
Description of problem:
We use FreeIPA on CentOS 6 for user authentication. On Fedora 23, the sssd_be consumes lots of memory and writes error messages to the log at a rapid rate (gigabytes per day in debug_level 4).

Version-Release number of selected component (if applicable):
sssd-1.13.3-6.fc23.x86_64
Also tested: sssd-1.13.4-1.fc23.x86_64

How reproducible:
Always

Additional info:
The server runs ipa-server-3.0.0-47.el6.centos.2.x86_64.

On the client, the memory consumption of sssd_be constantly increases. The log contains lots of messages of the following form (can't actually tell the order since they repeat over and over, debug_level set to 4):
(Mon Apr 18 14:49:30 2016) [sssd[be[---]]] [sdap_get_generic_op_finished] (0x0040): Unexpected result from ldap: Protocol error(2), A dereference attribute must have DN syntax
(Mon Apr 18 14:49:30 2016) [sssd[be[---]]] [generic_ext_search_handler] (0x0040): sdap_get_generic_ext_recv failed [5]: Input/output error
(Mon Apr 18 14:49:30 2016) [sssd[be[---]]] [sdap_deref_search_done] (0x0040): dereference processing failed [5]: Input/output error

On the server, this causes a very high load in ns-slapd for the domain in question.

We are currently in the process of upgrading from F20. There, everything was working fine. So it seems that sssd in the meantime introduced something that triggers this behavior. Either this is a general incompatibility that current exists between the latest sssd and the CentOS 6 FreeIPA, or maybe some FreeIPA upgrade had an issue? (I remember

The workstations with the new version seem to work fine, aside from becoming unresponsive once sssd_be fills up the memory. Users can authenticate.

If there is anything I can provide to help analyze and fix this issue please let me know. Currently this is a roadblock for further upgrading our machines. We would like to resolve this without upgrading the FreeIPA server (after all, we chose an enterprise Linux for long-term stability).

Comment 1 Lukas Slebodnik 2016-04-18 13:35:06 UTC
(In reply to Tim Niemueller from comment #0)
> Description of problem:
> We use FreeIPA on CentOS 6 for user authentication. On Fedora 23, the
> sssd_be consumes lots of memory and writes error messages to the log at a
> rapid rate (gigabytes per day in debug_level 4).
> 
> Version-Release number of selected component (if applicable):
> sssd-1.13.3-6.fc23.x86_64
> Also tested: sssd-1.13.4-1.fc23.x86_64
> 
> How reproducible:
> Always
> 
> Additional info:
> The server runs ipa-server-3.0.0-47.el6.centos.2.x86_64.
> 
Do you have installed the latest 389-ds-base on IPA server?
1.2.11.15-72.el6_7?

How many replicas do you have in topology?

> On the client, the memory consumption of sssd_be constantly increases. The
> log contains lots of messages of the following form (can't actually tell the
> order since they repeat over and over, debug_level set to 4):
> (Mon Apr 18 14:49:30 2016) [sssd[be[---]]] [sdap_get_generic_op_finished]
> (0x0040): Unexpected result from ldap: Protocol error(2), A dereference
> attribute must have DN syntax
> (Mon Apr 18 14:49:30 2016) [sssd[be[---]]] [generic_ext_search_handler]
> (0x0040): sdap_get_generic_ext_recv failed [5]: Input/output error
> (Mon Apr 18 14:49:30 2016) [sssd[be[---]]] [sdap_deref_search_done]
> (0x0040): dereference processing failed [5]: Input/output error
> 
> On the server, this causes a very high load in ns-slapd for the domain in
> question.
> 
Could you test with sssd-1.13.4 (it is in updates testing ATM)?

> We are currently in the process of upgrading from F20. There, everything was
> working fine. So it seems that sssd in the meantime introduced something
> that triggers this behavior. Either this is a general incompatibility that
> current exists between the latest sssd and the CentOS 6 FreeIPA, or maybe
> some FreeIPA upgrade had an issue? (I remember
> 
> The workstations with the new version seem to work fine, aside from becoming
> unresponsive once sssd_be fills up the memory. Users can authenticate.
> 
> If there is anything I can provide to help analyze and fix this issue please
> let me know. Currently this is a roadblock for further upgrading our
> machines. We would like to resolve this without upgrading the FreeIPA server
> (after all, we chose an enterprise Linux for long-term stability).
It is a valid use-case and sssd should even with IPA on el6.

Please attach domain log files with higher debug_level (7 or even better with 9)

Comment 2 Tim Niemueller 2016-04-18 14:49:15 UTC
Created attachment 1148229 [details]
sssd domain log debug level 9

Comment 3 Tim Niemueller 2016-04-18 14:49:28 UTC
We're running 389-ds-base-1.2.11.15-72.el6_7.x86_64.

There is a single replica in the topology (it's a rather small network with about a dozen clients).

I have tested the version from updates-testing (or rather from Koji, the update was still pending). The result is the same.

I have attached the domain log with debug_level=9. I have only cut the first 5000 lines. It generated more than 1000000 lines in a mere minute... It is everything until the error appears. If you need more let me know.

Comment 4 Lukas Slebodnik 2016-04-18 15:21:26 UTC
(In reply to Tim Niemueller from comment #3)
> We're running 389-ds-base-1.2.11.15-72.el6_7.x86_64.
> 
> There is a single replica in the topology (it's a rather small network with
> about a dozen clients).
> 
> I have tested the version from updates-testing (or rather from Koji, the
> update was still pending). The result is the same.
> 
yes, you are right.
It hasn't been pushed to testing yet.
https://bodhi.fedoraproject.org/updates/sssd-1.13.4-1.fc23

> I have attached the domain log with debug_level=9. I have only cut the first
> 5000 lines. It generated more than 1000000 lines in a mere minute... It is
> everything until the error appears. If you need more let me know.
I will try to find something in attached log.

Meanwhile you might try to disable dereference lookups completely by setting the
value of ldap_deref_threshold to 0. It's just a workaround.

Comment 5 Tim Niemueller 2016-04-18 15:25:17 UTC
Thanks for your effort!

Adding ldap_deref_threshold = 0 to the domain section of sssd.conf changed nothing, the problem still persists.

Comment 6 Lukas Slebodnik 2016-04-18 16:00:03 UTC
I had a discussion with fellow developer and this bug could be caused
by upstream ticket https://fedorahosted.org/sssd/ticket/2960.
It was backported to fedora in sssd-1.13.3-6.fc23.x86_64

Could you test with sssd-1.13.3-5.fc23.x86_64 or older (1.13.3-1)?
I assume it should not be a problem for you because you already tested 1.13.4 from koji.

Comment 7 Sumit Bose 2016-04-18 16:54:14 UTC
I think I was able to reproduce the issues and the test build in http://koji.fedoraproject.org/koji/taskinfo?taskID=13699811 resolved it for me.

Please let me know if this fixes the issue for you as well. For your convenience you might want to try

curl http://koji.fedoraproject.org/koji/taskinfo?taskID=13699811 | grep -o '"https://.*.rpm"' | xargs -n 1 curl -L -O

which should download all packages in a single run.

Comment 8 Tim Niemueller 2016-04-18 17:54:22 UTC
Indeed, this seems to fix the problem. sssd_be is now back to 14640 RES memory. In the log I now see:

(Mon Apr 18 19:44:38 2016) [sssd[be[DOM]]] [sdap_get_generic_op_finished] (0x0040): Unexp
ected result from ldap: Protocol error(2), A dereference attribute must have DN syntax
(Mon Apr 18 19:44:38 2016) [sssd[be[DOM]]] [generic_ext_search_handler] (0x0040): sdap_ge
t_generic_ext_recv failed [5]: Input/output error
(Mon Apr 18 19:44:38 2016) [sssd[be[DOM]]] [sdap_deref_search_done] (0x0040): dereference
 processing failed [5]: Input/output error

and

(Mon Apr 18 19:44:39 2016) [sssd[be[DOM]]] [sysdb_get_real_name] (0x0040): Cannot find us
er [gdm] in cache
(Mon Apr 18 19:44:39 2016) [sssd[be[DOM]]] [ipa_id_get_account_info_orig_done] (0x0080): 
Object not found, ending request
(Mon Apr 18 19:44:39 2016) [sssd[be[DOM]]] [acctinfo_callback] (0x0100): Request processe
d. Returned 3,0,Account info lookup failed
(Mon Apr 18 19:44:48 2016) [sssd[be[DOM]]] [ipa_host_info_done] (0x0020): Server does not
 support deref
(Mon Apr 18 19:44:48 2016) [sssd[be[DOM]]] [ipa_sudo_refresh_host_done] (0x0040): Unable 
to retrieve host information [5]: Input/output error
(Mon Apr 18 19:44:48 2016) [sssd[be[DOM]]] [be_ptask_done] (0x0040): Task [SUDO Full Refr
esh]: failed with [5]: Input/output error

I guess this is the expected output as some feature is missing on our older FreeIPA install? Would you mind giving a hint what the actual problem was?
Can you roll out the fix soonish? We have postponed our upgrade until this hits updates.

Thank you very much, especially for this speedy reaction and quick fixing. This is highly appreciated!

Comment 9 Lukas Slebodnik 2016-04-18 19:21:16 UTC
(In reply to Tim Niemueller from comment #8)
> Thank you very much, especially for this speedy reaction and quick fixing.
> This is highly appreciated!
As soon as patch will be accepted in upstream.

Thank you very much for report and testing.

Comment 10 Lukas Slebodnik 2016-04-19 11:26:06 UTC
Upstream ticket:
https://fedorahosted.org/sssd/ticket/2993

Comment 11 Tim Niemueller 2016-04-19 13:49:01 UTC
We got a new problem. Now we cannot login to the machine at all it seems. I only did "id tim" to see if user info could be retrieved to claim success. In /var/log/secure I see:

Apr 19 15:45:31 HOST sshd[1686]: pam_sss(sshd:account): Access denied for user tim: 4 (System error)
Apr 19 15:45:31 HOST sshd[1686]: fatal: Access denied for user tim by PAM account configuration [preauth]

In the sssd log is:
(Tue Apr 19 15:48:20 2016) [sssd[be[DOMAIN]]] [acctinfo_callback] (0x0100): Request processed. Returned 3,0,Account info lookup failed

Could this have been caused by the fix?

Comment 12 Sumit Bose 2016-04-19 13:59:19 UTC
Thank you for testing.

I think the login issue you are seeing is not related. Please attach full SSSD logs file with debug_level=10.

Besides that I wonder if you can test the build from http://koji.fedoraproject.org/koji/taskinfo?taskID=13715139 as well. It contains a different fix for the issue. Yesterday's build just contain a band-aid which made sure that some data about the IPA domain are only read once.

    In the morning I took a closer look why the related function was called repeatedly. It turned out that it started with a reordering of the different type of data SSSD read from the IPA server to find out about trusted domains. If you are interested the commit id is 5a5f1e1053415efaa99bb4d5bc7ce7ac0a95b757. In the commit message the old and new order is show:

        Previsouly the order was:
            - ranges
            - subdomains
            - if on client, views
            - master
        
        Now the order is:
            - ranges
            - master
            - subdomains
            - if on client, views

    As you can see originally the view name was read before the master domain record. The view name lookup is the dereference search you see failing in the logs because IPA on RHEL6 does not support views. In the case of this error SSSD jumps to the master domain record lookup.

    With the new ordering the master domain is lookup up earlier and the view name is looked up last. But unfortunately we forgot to update the error handling in the view name lookup and on error SSSD was still jumping to the master record lookup. This is not an issue as long as the master domain record is available because in this case the data from this object are already stored in SSSD and the lookup is skipped. But this record is only needed when creating trust to AD and is created by running ipa-adtrust-install on the server. In your case where the record does not exist SSSD can of course not find the data from this record and try to look the data up again and after that tries to look up the view name again which finally causes the loop you are seeing.

    In this new test build I replaced the band-aid with a patch which just finishes the request if the view name lookup fails.

    This issue was not discovered earlier because another more recent patch (b25d33b0a775e2337014a334699156ac56b08f9b) was needed to trigger the issue on RHEL6.

Comment 13 Lukas Slebodnik 2016-04-19 14:52:18 UTC
(In reply to Sumit Bose from comment #12)
> Thank you for testing.
> 
> I think the login issue you are seeing is not related. Please attach full
> SSSD logs file with debug_level=10.
> 
But maybe it would be good to track it in new Bugzilla ticket.

Comment 14 Lukas Slebodnik 2016-04-21 09:24:14 UTC
Do you have any success with testing of scratch build from comment 12?
Could you also provide log files for problem with
"pam_sss(sshd:account): Access denied for user tim: 4 (System error)"
?

Comment 15 Jakub Hrozek 2016-04-22 17:02:47 UTC
Additional fix:
    master: 57d8b4b9254442a568838cb60ea16068965f2df0
    sssd-1-13: e5fbaf42f87725e42a40e542c06d5f4cdf4c52c2

Comment 16 Fedora Update System 2016-04-22 19:51:07 UTC
sssd-1.13.4-2.fc24 has been submitted as an update to Fedora 24. https://bodhi.fedoraproject.org/updates/FEDORA-2016-a70b9da866

Comment 17 Fedora Update System 2016-04-22 19:52:24 UTC
sssd-1.13.4-2.fc23 has been submitted as an update to Fedora 23. https://bodhi.fedoraproject.org/updates/FEDORA-2016-f1498ed136

Comment 18 Fedora Update System 2016-04-22 19:53:03 UTC
sssd-1.13.4-2.fc22 has been submitted as an update to Fedora 22. https://bodhi.fedoraproject.org/updates/FEDORA-2016-48891618d0

Comment 19 Tim Niemueller 2016-04-23 22:54:02 UTC
Hi. I'm currently on vacation and will reply the week following the next.

Comment 20 Fedora Update System 2016-04-24 01:24:32 UTC
sssd-1.13.4-2.fc24 has been pushed to the Fedora 24 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-a70b9da866

Comment 21 Fedora Update System 2016-04-24 06:22:55 UTC
sssd-1.13.4-2.fc23 has been pushed to the Fedora 23 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-f1498ed136

Comment 22 Fedora Update System 2016-04-24 06:25:58 UTC
sssd-1.13.4-2.fc22 has been pushed to the Fedora 22 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-48891618d0

Comment 23 Fedora Update System 2016-04-25 23:52:19 UTC
sssd-1.13.4-2.fc23 has been pushed to the Fedora 23 stable repository. If problems still persist, please make note of it in this bug report.

Comment 24 Fedora Update System 2016-05-03 09:50:02 UTC
sssd-1.13.4-2.fc22 has been pushed to the Fedora 22 stable repository. If problems still persist, please make note of it in this bug report.

Comment 25 Tim Niemueller 2016-05-04 12:12:02 UTC
Created attachment 1153821 [details]
Log with log_level 10

This log file shows a failed login attempt with log_level=10. Searching for "tim " yields: "tim is a direct member of 8 LDAP groups". So this in principle seems to be working. But then:
"Server does not support deref"
"communication error on cached connection, moving to next server"
Since there is only one server I guess that's where it fails. This indeed seems related to the original bug, doesn't it?

Thanks for your continued help and effort to fix this (back from vacation btw).

Comment 26 Sumit Bose 2016-05-04 12:37:09 UTC
The login issue might be a side effect of setting 'ldap_deref_threshold = 0' because the HBAC rules cannot be processed with this setting. Please remove the option an try again.

Comment 27 Tim Niemueller 2016-05-04 12:46:10 UTC
Yep, that did it. Thanks a lot!

Comment 28 Fedora Update System 2016-05-07 11:58:35 UTC
sssd-1.13.4-2.fc24 has been pushed to the Fedora 24 stable repository. If problems still persist, please make note of it in this bug report.


Note You need to log in before you can comment on or make changes to this bug.