Description of problem: We use FreeIPA on CentOS 6 for user authentication. On Fedora 23, the sssd_be consumes lots of memory and writes error messages to the log at a rapid rate (gigabytes per day in debug_level 4). Version-Release number of selected component (if applicable): sssd-1.13.3-6.fc23.x86_64 Also tested: sssd-1.13.4-1.fc23.x86_64 How reproducible: Always Additional info: The server runs ipa-server-3.0.0-47.el6.centos.2.x86_64. On the client, the memory consumption of sssd_be constantly increases. The log contains lots of messages of the following form (can't actually tell the order since they repeat over and over, debug_level set to 4): (Mon Apr 18 14:49:30 2016) [sssd[be[---]]] [sdap_get_generic_op_finished] (0x0040): Unexpected result from ldap: Protocol error(2), A dereference attribute must have DN syntax (Mon Apr 18 14:49:30 2016) [sssd[be[---]]] [generic_ext_search_handler] (0x0040): sdap_get_generic_ext_recv failed [5]: Input/output error (Mon Apr 18 14:49:30 2016) [sssd[be[---]]] [sdap_deref_search_done] (0x0040): dereference processing failed [5]: Input/output error On the server, this causes a very high load in ns-slapd for the domain in question. We are currently in the process of upgrading from F20. There, everything was working fine. So it seems that sssd in the meantime introduced something that triggers this behavior. Either this is a general incompatibility that current exists between the latest sssd and the CentOS 6 FreeIPA, or maybe some FreeIPA upgrade had an issue? (I remember The workstations with the new version seem to work fine, aside from becoming unresponsive once sssd_be fills up the memory. Users can authenticate. If there is anything I can provide to help analyze and fix this issue please let me know. Currently this is a roadblock for further upgrading our machines. We would like to resolve this without upgrading the FreeIPA server (after all, we chose an enterprise Linux for long-term stability).
(In reply to Tim Niemueller from comment #0) > Description of problem: > We use FreeIPA on CentOS 6 for user authentication. On Fedora 23, the > sssd_be consumes lots of memory and writes error messages to the log at a > rapid rate (gigabytes per day in debug_level 4). > > Version-Release number of selected component (if applicable): > sssd-1.13.3-6.fc23.x86_64 > Also tested: sssd-1.13.4-1.fc23.x86_64 > > How reproducible: > Always > > Additional info: > The server runs ipa-server-3.0.0-47.el6.centos.2.x86_64. > Do you have installed the latest 389-ds-base on IPA server? 1.2.11.15-72.el6_7? How many replicas do you have in topology? > On the client, the memory consumption of sssd_be constantly increases. The > log contains lots of messages of the following form (can't actually tell the > order since they repeat over and over, debug_level set to 4): > (Mon Apr 18 14:49:30 2016) [sssd[be[---]]] [sdap_get_generic_op_finished] > (0x0040): Unexpected result from ldap: Protocol error(2), A dereference > attribute must have DN syntax > (Mon Apr 18 14:49:30 2016) [sssd[be[---]]] [generic_ext_search_handler] > (0x0040): sdap_get_generic_ext_recv failed [5]: Input/output error > (Mon Apr 18 14:49:30 2016) [sssd[be[---]]] [sdap_deref_search_done] > (0x0040): dereference processing failed [5]: Input/output error > > On the server, this causes a very high load in ns-slapd for the domain in > question. > Could you test with sssd-1.13.4 (it is in updates testing ATM)? > We are currently in the process of upgrading from F20. There, everything was > working fine. So it seems that sssd in the meantime introduced something > that triggers this behavior. Either this is a general incompatibility that > current exists between the latest sssd and the CentOS 6 FreeIPA, or maybe > some FreeIPA upgrade had an issue? (I remember > > The workstations with the new version seem to work fine, aside from becoming > unresponsive once sssd_be fills up the memory. Users can authenticate. > > If there is anything I can provide to help analyze and fix this issue please > let me know. Currently this is a roadblock for further upgrading our > machines. We would like to resolve this without upgrading the FreeIPA server > (after all, we chose an enterprise Linux for long-term stability). It is a valid use-case and sssd should even with IPA on el6. Please attach domain log files with higher debug_level (7 or even better with 9)
Created attachment 1148229 [details] sssd domain log debug level 9
We're running 389-ds-base-1.2.11.15-72.el6_7.x86_64. There is a single replica in the topology (it's a rather small network with about a dozen clients). I have tested the version from updates-testing (or rather from Koji, the update was still pending). The result is the same. I have attached the domain log with debug_level=9. I have only cut the first 5000 lines. It generated more than 1000000 lines in a mere minute... It is everything until the error appears. If you need more let me know.
(In reply to Tim Niemueller from comment #3) > We're running 389-ds-base-1.2.11.15-72.el6_7.x86_64. > > There is a single replica in the topology (it's a rather small network with > about a dozen clients). > > I have tested the version from updates-testing (or rather from Koji, the > update was still pending). The result is the same. > yes, you are right. It hasn't been pushed to testing yet. https://bodhi.fedoraproject.org/updates/sssd-1.13.4-1.fc23 > I have attached the domain log with debug_level=9. I have only cut the first > 5000 lines. It generated more than 1000000 lines in a mere minute... It is > everything until the error appears. If you need more let me know. I will try to find something in attached log. Meanwhile you might try to disable dereference lookups completely by setting the value of ldap_deref_threshold to 0. It's just a workaround.
Thanks for your effort! Adding ldap_deref_threshold = 0 to the domain section of sssd.conf changed nothing, the problem still persists.
I had a discussion with fellow developer and this bug could be caused by upstream ticket https://fedorahosted.org/sssd/ticket/2960. It was backported to fedora in sssd-1.13.3-6.fc23.x86_64 Could you test with sssd-1.13.3-5.fc23.x86_64 or older (1.13.3-1)? I assume it should not be a problem for you because you already tested 1.13.4 from koji.
I think I was able to reproduce the issues and the test build in http://koji.fedoraproject.org/koji/taskinfo?taskID=13699811 resolved it for me. Please let me know if this fixes the issue for you as well. For your convenience you might want to try curl http://koji.fedoraproject.org/koji/taskinfo?taskID=13699811 | grep -o '"https://.*.rpm"' | xargs -n 1 curl -L -O which should download all packages in a single run.
Indeed, this seems to fix the problem. sssd_be is now back to 14640 RES memory. In the log I now see: (Mon Apr 18 19:44:38 2016) [sssd[be[DOM]]] [sdap_get_generic_op_finished] (0x0040): Unexp ected result from ldap: Protocol error(2), A dereference attribute must have DN syntax (Mon Apr 18 19:44:38 2016) [sssd[be[DOM]]] [generic_ext_search_handler] (0x0040): sdap_ge t_generic_ext_recv failed [5]: Input/output error (Mon Apr 18 19:44:38 2016) [sssd[be[DOM]]] [sdap_deref_search_done] (0x0040): dereference processing failed [5]: Input/output error and (Mon Apr 18 19:44:39 2016) [sssd[be[DOM]]] [sysdb_get_real_name] (0x0040): Cannot find us er [gdm] in cache (Mon Apr 18 19:44:39 2016) [sssd[be[DOM]]] [ipa_id_get_account_info_orig_done] (0x0080): Object not found, ending request (Mon Apr 18 19:44:39 2016) [sssd[be[DOM]]] [acctinfo_callback] (0x0100): Request processe d. Returned 3,0,Account info lookup failed (Mon Apr 18 19:44:48 2016) [sssd[be[DOM]]] [ipa_host_info_done] (0x0020): Server does not support deref (Mon Apr 18 19:44:48 2016) [sssd[be[DOM]]] [ipa_sudo_refresh_host_done] (0x0040): Unable to retrieve host information [5]: Input/output error (Mon Apr 18 19:44:48 2016) [sssd[be[DOM]]] [be_ptask_done] (0x0040): Task [SUDO Full Refr esh]: failed with [5]: Input/output error I guess this is the expected output as some feature is missing on our older FreeIPA install? Would you mind giving a hint what the actual problem was? Can you roll out the fix soonish? We have postponed our upgrade until this hits updates. Thank you very much, especially for this speedy reaction and quick fixing. This is highly appreciated!
(In reply to Tim Niemueller from comment #8) > Thank you very much, especially for this speedy reaction and quick fixing. > This is highly appreciated! As soon as patch will be accepted in upstream. Thank you very much for report and testing.
Upstream ticket: https://fedorahosted.org/sssd/ticket/2993
We got a new problem. Now we cannot login to the machine at all it seems. I only did "id tim" to see if user info could be retrieved to claim success. In /var/log/secure I see: Apr 19 15:45:31 HOST sshd[1686]: pam_sss(sshd:account): Access denied for user tim: 4 (System error) Apr 19 15:45:31 HOST sshd[1686]: fatal: Access denied for user tim by PAM account configuration [preauth] In the sssd log is: (Tue Apr 19 15:48:20 2016) [sssd[be[DOMAIN]]] [acctinfo_callback] (0x0100): Request processed. Returned 3,0,Account info lookup failed Could this have been caused by the fix?
Thank you for testing. I think the login issue you are seeing is not related. Please attach full SSSD logs file with debug_level=10. Besides that I wonder if you can test the build from http://koji.fedoraproject.org/koji/taskinfo?taskID=13715139 as well. It contains a different fix for the issue. Yesterday's build just contain a band-aid which made sure that some data about the IPA domain are only read once. In the morning I took a closer look why the related function was called repeatedly. It turned out that it started with a reordering of the different type of data SSSD read from the IPA server to find out about trusted domains. If you are interested the commit id is 5a5f1e1053415efaa99bb4d5bc7ce7ac0a95b757. In the commit message the old and new order is show: Previsouly the order was: - ranges - subdomains - if on client, views - master Now the order is: - ranges - master - subdomains - if on client, views As you can see originally the view name was read before the master domain record. The view name lookup is the dereference search you see failing in the logs because IPA on RHEL6 does not support views. In the case of this error SSSD jumps to the master domain record lookup. With the new ordering the master domain is lookup up earlier and the view name is looked up last. But unfortunately we forgot to update the error handling in the view name lookup and on error SSSD was still jumping to the master record lookup. This is not an issue as long as the master domain record is available because in this case the data from this object are already stored in SSSD and the lookup is skipped. But this record is only needed when creating trust to AD and is created by running ipa-adtrust-install on the server. In your case where the record does not exist SSSD can of course not find the data from this record and try to look the data up again and after that tries to look up the view name again which finally causes the loop you are seeing. In this new test build I replaced the band-aid with a patch which just finishes the request if the view name lookup fails. This issue was not discovered earlier because another more recent patch (b25d33b0a775e2337014a334699156ac56b08f9b) was needed to trigger the issue on RHEL6.
(In reply to Sumit Bose from comment #12) > Thank you for testing. > > I think the login issue you are seeing is not related. Please attach full > SSSD logs file with debug_level=10. > But maybe it would be good to track it in new Bugzilla ticket.
Do you have any success with testing of scratch build from comment 12? Could you also provide log files for problem with "pam_sss(sshd:account): Access denied for user tim: 4 (System error)" ?
Additional fix: master: 57d8b4b9254442a568838cb60ea16068965f2df0 sssd-1-13: e5fbaf42f87725e42a40e542c06d5f4cdf4c52c2
sssd-1.13.4-2.fc24 has been submitted as an update to Fedora 24. https://bodhi.fedoraproject.org/updates/FEDORA-2016-a70b9da866
sssd-1.13.4-2.fc23 has been submitted as an update to Fedora 23. https://bodhi.fedoraproject.org/updates/FEDORA-2016-f1498ed136
sssd-1.13.4-2.fc22 has been submitted as an update to Fedora 22. https://bodhi.fedoraproject.org/updates/FEDORA-2016-48891618d0
Hi. I'm currently on vacation and will reply the week following the next.
sssd-1.13.4-2.fc24 has been pushed to the Fedora 24 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-a70b9da866
sssd-1.13.4-2.fc23 has been pushed to the Fedora 23 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-f1498ed136
sssd-1.13.4-2.fc22 has been pushed to the Fedora 22 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-48891618d0
sssd-1.13.4-2.fc23 has been pushed to the Fedora 23 stable repository. If problems still persist, please make note of it in this bug report.
sssd-1.13.4-2.fc22 has been pushed to the Fedora 22 stable repository. If problems still persist, please make note of it in this bug report.
Created attachment 1153821 [details] Log with log_level 10 This log file shows a failed login attempt with log_level=10. Searching for "tim " yields: "tim is a direct member of 8 LDAP groups". So this in principle seems to be working. But then: "Server does not support deref" "communication error on cached connection, moving to next server" Since there is only one server I guess that's where it fails. This indeed seems related to the original bug, doesn't it? Thanks for your continued help and effort to fix this (back from vacation btw).
The login issue might be a side effect of setting 'ldap_deref_threshold = 0' because the HBAC rules cannot be processed with this setting. Please remove the option an try again.
Yep, that did it. Thanks a lot!
sssd-1.13.4-2.fc24 has been pushed to the Fedora 24 stable repository. If problems still persist, please make note of it in this bug report.