RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 1329648 - Protocol error with IPA on RHEL-6
Summary: Protocol error with IPA on RHEL-6
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: sssd
Version: 6.8
Hardware: Unspecified
OS: Unspecified
unspecified
urgent
Target Milestone: rc
: ---
Assignee: SSSD Maintainers
QA Contact: Steeve Goveas
URL:
Whiteboard:
Depends On: 1328108
Blocks: 1329647
TreeView+ depends on / blocked
 
Reported: 2016-04-22 13:07 UTC by Jakub Hrozek
Modified: 2020-05-04 10:55 UTC (History)
14 users (show)

Fixed In Version: sssd-1.13.3-39.el6
Doc Type: Bug Fix
Doc Text:
Clone Of: 1328108
Environment:
Last Closed: 2017-03-21 09:56:00 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github SSSD sssd issues 4034 0 None None None 2020-05-04 10:55:03 UTC
Red Hat Product Errata RHBA-2017:0632 0 normal SHIPPED_LIVE sssd bug fix and enhancement update 2017-03-21 12:30:13 UTC

Description Jakub Hrozek 2016-04-22 13:07:31 UTC
+++ This bug was initially created as a clone of Bug #1328108 +++

Description of problem:
We use FreeIPA on CentOS 6 for user authentication. On Fedora 23, the sssd_be consumes lots of memory and writes error messages to the log at a rapid rate (gigabytes per day in debug_level 4).

Version-Release number of selected component (if applicable):
sssd-1.13.3-6.fc23.x86_64
Also tested: sssd-1.13.4-1.fc23.x86_64

How reproducible:
Always

Additional info:
The server runs ipa-server-3.0.0-47.el6.centos.2.x86_64.

On the client, the memory consumption of sssd_be constantly increases. The log contains lots of messages of the following form (can't actually tell the order since they repeat over and over, debug_level set to 4):
(Mon Apr 18 14:49:30 2016) [sssd[be[---]]] [sdap_get_generic_op_finished] (0x0040): Unexpected result from ldap: Protocol error(2), A dereference attribute must have DN syntax
(Mon Apr 18 14:49:30 2016) [sssd[be[---]]] [generic_ext_search_handler] (0x0040): sdap_get_generic_ext_recv failed [5]: Input/output error
(Mon Apr 18 14:49:30 2016) [sssd[be[---]]] [sdap_deref_search_done] (0x0040): dereference processing failed [5]: Input/output error

On the server, this causes a very high load in ns-slapd for the domain in question.

We are currently in the process of upgrading from F20. There, everything was working fine. So it seems that sssd in the meantime introduced something that triggers this behavior. Either this is a general incompatibility that current exists between the latest sssd and the CentOS 6 FreeIPA, or maybe some FreeIPA upgrade had an issue? (I remember

The workstations with the new version seem to work fine, aside from becoming unresponsive once sssd_be fills up the memory. Users can authenticate.

If there is anything I can provide to help analyze and fix this issue please let me know. Currently this is a roadblock for further upgrading our machines. We would like to resolve this without upgrading the FreeIPA server (after all, we chose an enterprise Linux for long-term stability).

--- Additional comment from Lukas Slebodnik on 2016-04-18 09:35:06 EDT ---

(In reply to Tim Niemueller from comment #0)
> Description of problem:
> We use FreeIPA on CentOS 6 for user authentication. On Fedora 23, the
> sssd_be consumes lots of memory and writes error messages to the log at a
> rapid rate (gigabytes per day in debug_level 4).
> 
> Version-Release number of selected component (if applicable):
> sssd-1.13.3-6.fc23.x86_64
> Also tested: sssd-1.13.4-1.fc23.x86_64
> 
> How reproducible:
> Always
> 
> Additional info:
> The server runs ipa-server-3.0.0-47.el6.centos.2.x86_64.
> 
Do you have installed the latest 389-ds-base on IPA server?
1.2.11.15-72.el6_7?

How many replicas do you have in topology?

> On the client, the memory consumption of sssd_be constantly increases. The
> log contains lots of messages of the following form (can't actually tell the
> order since they repeat over and over, debug_level set to 4):
> (Mon Apr 18 14:49:30 2016) [sssd[be[---]]] [sdap_get_generic_op_finished]
> (0x0040): Unexpected result from ldap: Protocol error(2), A dereference
> attribute must have DN syntax
> (Mon Apr 18 14:49:30 2016) [sssd[be[---]]] [generic_ext_search_handler]
> (0x0040): sdap_get_generic_ext_recv failed [5]: Input/output error
> (Mon Apr 18 14:49:30 2016) [sssd[be[---]]] [sdap_deref_search_done]
> (0x0040): dereference processing failed [5]: Input/output error
> 
> On the server, this causes a very high load in ns-slapd for the domain in
> question.
> 
Could you test with sssd-1.13.4 (it is in updates testing ATM)?

> We are currently in the process of upgrading from F20. There, everything was
> working fine. So it seems that sssd in the meantime introduced something
> that triggers this behavior. Either this is a general incompatibility that
> current exists between the latest sssd and the CentOS 6 FreeIPA, or maybe
> some FreeIPA upgrade had an issue? (I remember
> 
> The workstations with the new version seem to work fine, aside from becoming
> unresponsive once sssd_be fills up the memory. Users can authenticate.
> 
> If there is anything I can provide to help analyze and fix this issue please
> let me know. Currently this is a roadblock for further upgrading our
> machines. We would like to resolve this without upgrading the FreeIPA server
> (after all, we chose an enterprise Linux for long-term stability).
It is a valid use-case and sssd should even with IPA on el6.

Please attach domain log files with higher debug_level (7 or even better with 9)

--- Additional comment from Tim Niemueller on 2016-04-18 10:49 EDT ---



--- Additional comment from Tim Niemueller on 2016-04-18 10:49:28 EDT ---

We're running 389-ds-base-1.2.11.15-72.el6_7.x86_64.

There is a single replica in the topology (it's a rather small network with about a dozen clients).

I have tested the version from updates-testing (or rather from Koji, the update was still pending). The result is the same.

I have attached the domain log with debug_level=9. I have only cut the first 5000 lines. It generated more than 1000000 lines in a mere minute... It is everything until the error appears. If you need more let me know.

--- Additional comment from Lukas Slebodnik on 2016-04-18 11:21:26 EDT ---

(In reply to Tim Niemueller from comment #3)
> We're running 389-ds-base-1.2.11.15-72.el6_7.x86_64.
> 
> There is a single replica in the topology (it's a rather small network with
> about a dozen clients).
> 
> I have tested the version from updates-testing (or rather from Koji, the
> update was still pending). The result is the same.
> 
yes, you are right.
It hasn't been pushed to testing yet.
https://bodhi.fedoraproject.org/updates/sssd-1.13.4-1.fc23

> I have attached the domain log with debug_level=9. I have only cut the first
> 5000 lines. It generated more than 1000000 lines in a mere minute... It is
> everything until the error appears. If you need more let me know.
I will try to find something in attached log.

Meanwhile you might try to disable dereference lookups completely by setting the
value of ldap_deref_threshold to 0. It's just a workaround.

--- Additional comment from Tim Niemueller on 2016-04-18 11:25:17 EDT ---

Thanks for your effort!

Adding ldap_deref_threshold = 0 to the domain section of sssd.conf changed nothing, the problem still persists.

--- Additional comment from Lukas Slebodnik on 2016-04-18 12:00:03 EDT ---

I had a discussion with fellow developer and this bug could be caused
by upstream ticket https://fedorahosted.org/sssd/ticket/2960.
It was backported to fedora in sssd-1.13.3-6.fc23.x86_64

Could you test with sssd-1.13.3-5.fc23.x86_64 or older (1.13.3-1)?
I assume it should not be a problem for you because you already tested 1.13.4 from koji.

--- Additional comment from Sumit Bose on 2016-04-18 12:54:14 EDT ---

I think I was able to reproduce the issues and the test build in http://koji.fedoraproject.org/koji/taskinfo?taskID=13699811 resolved it for me.

Please let me know if this fixes the issue for you as well. For your convenience you might want to try

curl http://koji.fedoraproject.org/koji/taskinfo?taskID=13699811 | grep -o '"https://.*.rpm"' | xargs -n 1 curl -L -O

which should download all packages in a single run.

--- Additional comment from Tim Niemueller on 2016-04-18 13:54:22 EDT ---

Indeed, this seems to fix the problem. sssd_be is now back to 14640 RES memory. In the log I now see:

(Mon Apr 18 19:44:38 2016) [sssd[be[DOM]]] [sdap_get_generic_op_finished] (0x0040): Unexp
ected result from ldap: Protocol error(2), A dereference attribute must have DN syntax
(Mon Apr 18 19:44:38 2016) [sssd[be[DOM]]] [generic_ext_search_handler] (0x0040): sdap_ge
t_generic_ext_recv failed [5]: Input/output error
(Mon Apr 18 19:44:38 2016) [sssd[be[DOM]]] [sdap_deref_search_done] (0x0040): dereference
 processing failed [5]: Input/output error

and

(Mon Apr 18 19:44:39 2016) [sssd[be[DOM]]] [sysdb_get_real_name] (0x0040): Cannot find us
er [gdm] in cache
(Mon Apr 18 19:44:39 2016) [sssd[be[DOM]]] [ipa_id_get_account_info_orig_done] (0x0080): 
Object not found, ending request
(Mon Apr 18 19:44:39 2016) [sssd[be[DOM]]] [acctinfo_callback] (0x0100): Request processe
d. Returned 3,0,Account info lookup failed
(Mon Apr 18 19:44:48 2016) [sssd[be[DOM]]] [ipa_host_info_done] (0x0020): Server does not
 support deref
(Mon Apr 18 19:44:48 2016) [sssd[be[DOM]]] [ipa_sudo_refresh_host_done] (0x0040): Unable 
to retrieve host information [5]: Input/output error
(Mon Apr 18 19:44:48 2016) [sssd[be[DOM]]] [be_ptask_done] (0x0040): Task [SUDO Full Refr
esh]: failed with [5]: Input/output error

I guess this is the expected output as some feature is missing on our older FreeIPA install? Would you mind giving a hint what the actual problem was?
Can you roll out the fix soonish? We have postponed our upgrade until this hits updates.

Thank you very much, especially for this speedy reaction and quick fixing. This is highly appreciated!

--- Additional comment from Lukas Slebodnik on 2016-04-18 15:21:16 EDT ---

(In reply to Tim Niemueller from comment #8)
> Thank you very much, especially for this speedy reaction and quick fixing.
> This is highly appreciated!
As soon as patch will be accepted in upstream.

Thank you very much for report and testing.

--- Additional comment from Lukas Slebodnik on 2016-04-19 07:26:06 EDT ---

Upstream ticket:
https://fedorahosted.org/sssd/ticket/2993

--- Additional comment from Tim Niemueller on 2016-04-19 09:49:01 EDT ---

We got a new problem. Now we cannot login to the machine at all it seems. I only did "id tim" to see if user info could be retrieved to claim success. In /var/log/secure I see:

Apr 19 15:45:31 HOST sshd[1686]: pam_sss(sshd:account): Access denied for user tim: 4 (System error)
Apr 19 15:45:31 HOST sshd[1686]: fatal: Access denied for user tim by PAM account configuration [preauth]

In the sssd log is:
(Tue Apr 19 15:48:20 2016) [sssd[be[DOMAIN]]] [acctinfo_callback] (0x0100): Request processed. Returned 3,0,Account info lookup failed

Could this have been caused by the fix?

--- Additional comment from Sumit Bose on 2016-04-19 09:59:19 EDT ---

Thank you for testing.

I think the login issue you are seeing is not related. Please attach full SSSD logs file with debug_level=10.

Besides that I wonder if you can test the build from http://koji.fedoraproject.org/koji/taskinfo?taskID=13715139 as well. It contains a different fix for the issue. Yesterday's build just contain a band-aid which made sure that some data about the IPA domain are only read once.

    In the morning I took a closer look why the related function was called repeatedly. It turned out that it started with a reordering of the different type of data SSSD read from the IPA server to find out about trusted domains. If you are interested the commit id is 5a5f1e1053415efaa99bb4d5bc7ce7ac0a95b757. In the commit message the old and new order is show:

        Previsouly the order was:
            - ranges
            - subdomains
            - if on client, views
            - master
        
        Now the order is:
            - ranges
            - master
            - subdomains
            - if on client, views

    As you can see originally the view name was read before the master domain record. The view name lookup is the dereference search you see failing in the logs because IPA on RHEL6 does not support views. In the case of this error SSSD jumps to the master domain record lookup.

    With the new ordering the master domain is lookup up earlier and the view name is looked up last. But unfortunately we forgot to update the error handling in the view name lookup and on error SSSD was still jumping to the master record lookup. This is not an issue as long as the master domain record is available because in this case the data from this object are already stored in SSSD and the lookup is skipped. But this record is only needed when creating trust to AD and is created by running ipa-adtrust-install on the server. In your case where the record does not exist SSSD can of course not find the data from this record and try to look the data up again and after that tries to look up the view name again which finally causes the loop you are seeing.

    In this new test build I replaced the band-aid with a patch which just finishes the request if the view name lookup fails.

    This issue was not discovered earlier because another more recent patch (b25d33b0a775e2337014a334699156ac56b08f9b) was needed to trigger the issue on RHEL6.

--- Additional comment from Lukas Slebodnik on 2016-04-19 10:52:18 EDT ---

(In reply to Sumit Bose from comment #12)
> Thank you for testing.
> 
> I think the login issue you are seeing is not related. Please attach full
> SSSD logs file with debug_level=10.
> 
But maybe it would be good to track it in new Bugzilla ticket.

--- Additional comment from Lukas Slebodnik on 2016-04-21 05:24:14 EDT ---

Do you have any success with testing of scratch build from comment 12?
Could you also provide log files for problem with
"pam_sss(sshd:account): Access denied for user tim: 4 (System error)"
?

Comment 1 Jakub Hrozek 2016-04-22 13:10:10 UTC
Please note that if this bug is fixed, bug #1329647 must be fixed as well.

Comment 3 Jakub Hrozek 2016-04-22 17:03:49 UTC
Additional fix:
    master: 57d8b4b9254442a568838cb60ea16068965f2df0
    sssd-1-13: e5fbaf42f87725e42a40e542c06d5f4cdf4c52c2

Comment 5 Jakub Hrozek 2016-09-29 08:15:35 UTC
To reproduce, run basic tests (authentication, looking up identity data) with a sssd-1-13 branch against an IPA 3.x (so, RHEL-6) server.

Comment 13 Sudhir Menon 2016-11-29 10:01:09 UTC
Verified on RHEL6.9 Server and client using the below rpms.

ipa-server-3.0.0-51.el6.x86_64
sssd-1.13.3-46.el6.x86_64
ipa-client-3.0.0-51.el6.x86_64

Marking the bug verified as per logs seen in comment 11.

[root@master sssd]# /etc/init.d/sssd status
sssd (pid  30492) is running...

(Mon Nov 28 18:38:57 2016) [sssd[be[test-relm.test]]] [dp_get_options] (0x0400): Option ipa_views_search_base has no value 
(Mon Nov 28 18:38:57 2016) [sssd[be[test-relm.test]]] [ipa_get_id_options] (0x0100): Option ipa_views_search_base set to cn=views,cn=accounts,dc=test-relm,dc=test
(Mon Nov 28 18:38:57 2016) [sssd[be[test-relm.test]]] [common_parse_search_base] (0x0100): Search base added: [IPA_VIEWS][cn=views,cn=accounts,dc=test-relm,dc=test][SUBTREE][]
(Mon Nov 28 18:38:57 2016) [sssd[be[test-relm.test]]] [sdap_get_map] (0x0400): Option ipa_view_class has value nsContainer
(Mon Nov 28 18:38:57 2016) [sssd[be[test-relm.test]]] [sdap_get_map] (0x0400): Option ipa_view_name has value cn
(Mon Nov 28 18:38:57 2016) [sssd[be[test-relm.test]]] [sssm_ipa_id_init] (0x0020): Cannot find view name in the cache. Will do online lookup later.
(Mon Nov 28 18:38:57 2016) [sssd[be[test-relm.test]]] [dp_copy_options_ex] (0x0400): Option ipa_views_search_base has value cn=views,cn=accounts,dc=test-relm,dc=test
(Mon Nov 28 18:38:57 2016) [sssd[be[test-relm.test]]] [dp_copy_options_ex] (0x0400): Option ipa_views_search_base has value cn=views,cn=accounts,dc=test-relm,dc=test
(Mon Nov 28 18:38:58 2016) [sssd[be[test-relm.test]]] [ipa_get_view_name_done] (0x0400): get_view_name request failed, looks like server does not support views.
(Mon Nov 28 18:38:58 2016) [sssd[be[test-relm.test]]] [ipa_get_view_name_done] (0x0400): get_view_name request failed, looks like server does not support views.
(Mon Nov 28 22:38:58 2016) [sssd[be[test-relm.test]]] [ipa_get_view_name_done] (0x0400): get_view_name request failed, looks like server does not support views.
(Tue Nov 29 02:38:58 2016) [sssd[be[test-relm.test]]] [ipa_get_view_name_done] (0x0400): get_view_name request failed, looks like server does not support views.
(Tue Nov 29 06:38:58 2016) [sssd[be[test-relm.test]]] [ipa_get_view_name_done] (0x0400): get_view_name request failed, looks like server does not support views.
(Tue Nov 29 10:38:58 2016) [sssd[be[test-relm.test]]] [ipa_get_view_name_done] (0x0400): get_view_name request failed, looks like server does not support views.

Comment 15 errata-xmlrpc 2017-03-21 09:56:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2017-0632.html


Note You need to log in before you can comment on or make changes to this bug.