This bug has been migrated to another issue tracking site. It has been closed here and may no longer be monitored.

If you would like to get updates for this issue, or to participate in it, you may do so at the Red Hat Issue Tracker.
Bug 2125607 - winbind leaks memory for each NTLM auth request [rhel-7.9.z]
Summary: winbind leaks memory for each NTLM auth request [rhel-7.9.z]
Keywords:
Status: CLOSED MIGRATED
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: samba
Version: 7.9
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Andreas Schneider
QA Contact: Denis Karpelevich
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-09-09 12:49 UTC by Anton Bobrov
Modified: 2023-09-05 13:03 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-09-05 13:03:51 UTC
Target Upstream Version:
Embargoed:
abobrov: needinfo+


Attachments
valgrind --tool=memcheck --leak-check=full --show-leak-kinds=all --num-callers=50 (4.67 MB, application/gzip)
2022-09-09 12:49 UTC, Anton Bobrov
no flags


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Gitlab | samba-team samba merge_requests 2717 | 0 | None | opened | s3:auth: Flush the GETPWSID in memory cache for NTLM auth | 2022-09-13 12:13:43 UTC
Red Hat Issue Tracker | RHEL-2220 | 0 | None | Migrated | None | 2023-09-05 13:03:46 UTC
Red Hat Issue Tracker | RHELPLAN-133655 | 0 | None | None | None | 2022-09-09 13:01:54 UTC
Red Hat Issue Tracker | SSSD-4990 | 0 | None | None | None | 2022-09-13 12:48:04 UTC
Samba Project | 15169 | 0 | None | None | None | 2022-09-16 13:46:49 UTC

Description Anton Bobrov 2022-09-09 12:49:48 UTC
Created attachment 1910673
valgrind --tool=memcheck --leak-check=full --show-leak-kinds=all --num-callers=50

Description of problem:

The customer has reported very slow winbind memory growth that accumulates over a long period of time and, as a result, requires periodic service restarts, which is inconvenient and unpredictable.

The collected valgrind leak report is attached. It looks like it's leaking indirectly via LDAP handles and the various small allocations associated with them in libldap and its underlying dependencies.

There are quite a few things in the valgrind leak report, but it appears that new LDAP handles are being continuously created, like so:

[ ..... ]
==23510==    by 0x13571D40: ldap_int_open_connection (in /usr/lib64/libldap-2.4.so.2.10.7)
==23510==    by 0x135850CC: ldap_new_connection (in /usr/lib64/libldap-2.4.so.2.10.7)
==23510==    by 0x135711DE: ldap_open_defconn (in /usr/lib64/libldap-2.4.so.2.10.7)
==23510==    by 0x135863D7: ldap_send_initial_request (in /usr/lib64/libldap-2.4.so.2.10.7)
==23510==    by 0x1357B418: ldap_sasl_bind (in /usr/lib64/libldap-2.4.so.2.10.7)
==23510==    by 0x1357B848: ldap_sasl_bind_s (in /usr/lib64/libldap-2.4.so.2.10.7)
==23510==    by 0x1357C0E4: ldap_simple_bind_s (in /usr/lib64/libldap-2.4.so.2.10.7)
==23510==    by 0xA8D574F: ??? (in /usr/lib64/libsmbldap.so.2)
==23510==    by 0xA8D66A4: ??? (in /usr/lib64/libsmbldap.so.2)

==23510==    by 0xA8D6D4A: smbldap_search (in /usr/lib64/libsmbldap.so.2)
==23510==    by 0xA8D6D96: smbldap_search_suffix (in /usr/lib64/libsmbldap.so.2)
==23510==    by 0x2043FAC6: smbldap_search_domain_info (in /usr/lib64/samba/libsmbldaphelper-samba4.so)
==23510==    by 0x20223949: pdb_ldapsam_init_common (in /usr/lib64/samba/pdb/ldapsam.so)
==23510==    by 0x6A278A8: make_pdb_method_name (in /usr/lib64/libsamba-passdb.so.0.27.2)
==23510==    by 0x6A27BA3: ??? (in /usr/lib64/libsamba-passdb.so.0.27.2)
==23510==    by 0x6A29CB8: initialize_password_db (in /usr/lib64/libsamba-passdb.so.0.27.2)
==23510==    by 0x12EE8B: main (in /usr/sbin/winbindd)

In the winbind code it appears that the original intent was to cache the LDAP handle and its associated connection, and to free it only on LDAP_SERVER_DOWN or some other reconnect condition. However, it looks like (I'm not familiar with the related code at all) a new handle is created every time via the

pdb_ldapsam_init_common()/pdb_init_ldapsam_common() path

and nothing ever calls the libldap ldap_unbind() API (which is the libldap way to discard an LDAP handle and free all resources associated with it) unless there is an error condition, e.g. a connection problem.
    

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 3 Andreas Schneider 2022-09-12 16:21:35 UTC
Samba uses talloc [1], a hierarchical, reference-counted memory pool system with destructors.


pdb_ldapsam_init_common()
  -> pdb_init_ldapsam_common()
     -> smbldap_init()
        -> talloc_set_destructor(*smbldap_state, smbldap_state_destructor);

The memory context of smbldap is the `pdb_method` pointer passed to pdb_ldapsam_init_common(). So when TALLOC_FREE(pdb_method) is called, smbldap_state_destructor will be called, which will do the ldap_unbind().

The smbldap state is stored in the private_data field of the pdb method, so it will be reused as long as the pdb_method exists. It only exists once, as you register the module only once. I do not see a memory leak in Samba here.
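
The ownership argument above can be modeled in a few lines. This is only a toy Python sketch of a parent/child memory context with a destructor, not talloc itself and not Samba code: the `Ctx` class and the `events` list are hypothetical, and the destructor merely records the point where ldap_unbind() would run.

```python
from typing import Callable, List, Optional


class Ctx:
    """Toy stand-in for a hierarchical memory context (hypothetical, not talloc)."""

    def __init__(self, name: str, parent: Optional["Ctx"] = None) -> None:
        self.name = name
        self.parent = parent
        self.children: List["Ctx"] = []
        self.destructor: Optional[Callable[["Ctx"], None]] = None
        self.freed = False
        if parent is not None:
            parent.children.append(self)

    def set_destructor(self, fn: Callable[["Ctx"], None]) -> None:
        self.destructor = fn

    def free(self) -> None:
        """Run this context's destructor, then free every child it owns."""
        if self.freed:
            return
        if self.destructor is not None:
            self.destructor(self)
        for child in list(self.children):  # copy: children detach themselves
            child.free()
        self.children.clear()
        if self.parent is not None and self in self.parent.children:
            self.parent.children.remove(self)
        self.freed = True


events: List[str] = []


def smbldap_state_destructor(state: Ctx) -> None:
    # In Samba this is where ldap_unbind() would happen; here we only log it.
    events.append(f"ldap_unbind for {state.name}")


# The module is registered once, so there is one pdb_method and one state.
pdb_method = Ctx("pdb_method")
smbldap_state = Ctx("smbldap_state", parent=pdb_method)
smbldap_state.set_destructor(smbldap_state_destructor)

# Freeing the parent is enough: the child's destructor runs as a side effect.
pdb_method.free()
```

In this model the LDAP handle is released exactly once, when its owning context goes away, which is why a single long-lived pdb_method would not by itself explain steady growth.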


[1] https://talloc.samba.org/talloc/doc/html/index.html



Looking at the valgrind log, the pdb memcache could be a problem.

Comment 4 Anton Bobrov 2022-09-13 13:08:28 UTC
OK, like I said, I have no idea about that code; it just looks like it has accumulated some baggage via the LDAP handle there. If it is leaking elsewhere, it would make sense to change this bug's summary line once you confirm the real root cause.

Comment 5 Andreas Schneider 2022-10-24 15:06:35 UTC
Could you ask the customer if he is willing to test a package with a possible fix?

Comment 27 Andreas Schneider 2023-04-13 12:44:00 UTC
That means we have additional memory leaks.

To find memory leaks we would need to run AddressSanitizer with the memory leak detector turned on. However, there are several small leaks which prevent winbind from even starting up in that mode. I would need to address them first.

This will be a bigger task. What the customer can do is run the test binaries with valgrind; maybe it will catch something. However, this is normally not fun, as valgrind slows things down a lot.

Comment 28 Andreas Schneider 2023-04-14 13:45:21 UTC
Well, it would be nice if we could find out what is causing it.

Is it when:

* Users authenticate with NTLM
* Users authenticate with Kerberos
* We query user information

The hotfix fixes a memory leak; however, there might be more.

Comment 30 Andreas Schneider 2023-04-18 11:54:03 UTC
The logs would be too big to digest. The question is which workload increases the memory usage. If we knew that, we would know in which area of the code to look. It is hard to find this leak because we do not have a clean shutdown with all memory freed. That is a goal for one of the next Samba releases. However, we are short on manpower, so removing all memory leaks will take some time.

I will have another look at the valgrind logs to see if I can spot anything suspicious.

Could you ask the customer whether he knows which workload makes the memory leak grow faster?
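
One way to narrow down the workload, sketched here in Python under stated assumptions: sample the resident set size of a winbindd process before and after a batch of one kind of request and compare the growth per request. The helper names are hypothetical; the only external fact relied on is that Linux exposes a `VmRSS:` line (in kB) in /proc/<pid>/status.

```python
import re
from pathlib import Path
from typing import Optional

# Matches a line such as "VmRSS:\t   20480 kB" in /proc/<pid>/status.
_VMRSS = re.compile(r"^VmRSS:\s+(\d+)\s+kB$", re.MULTILINE)


def parse_vm_rss_kib(status_text: str) -> Optional[int]:
    """Return the VmRSS value in KiB, or None if the field is absent."""
    match = _VMRSS.search(status_text)
    return int(match.group(1)) if match else None


def read_rss_kib(pid: int) -> Optional[int]:
    """Read the current RSS of a running process (Linux only)."""
    return parse_vm_rss_kib(Path(f"/proc/{pid}/status").read_text())


def growth_per_request(before_kib: int, after_kib: int, requests: int) -> float:
    """Average RSS growth in KiB per request over one test batch."""
    if requests <= 0:
        raise ValueError("requests must be positive")
    return (after_kib - before_kib) / requests
```

Running one batch per workload (NTLM authentication only, Kerberos only, user lookups only) and comparing the resulting figures would indicate which code path to inspect first. RSS is a coarse signal, so small differences between batches should not be over-interpreted.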

Comment 35 RHEL Program Management 2023-09-05 12:11:48 UTC
Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.

Comment 36 RHEL Program Management 2023-09-05 13:03:51 UTC
This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there.

Due to differences in account names between systems, some fields were not replicated. Be sure to add yourself to the Jira issue's "Watchers" field to continue receiving updates, and add others to the "Need Info From" field to continue requesting information.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it, and begin with "RHEL-" followed by an integer.  You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like:

"Bugzilla Bug" = 1234567

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues.

