This bug has been migrated to another issue tracking site. It has been closed here and may no longer be being monitored.

If you would like to get updates for this issue, or to participate in it, you may do so at the Red Hat Issue Tracker.
RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets there.

Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against the components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED".

If you cannot log in to RH Jira, please consult article #7032570. Failing that, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The e-mail creates a ServiceNow ticket with Red Hat.

Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of the form "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). The same link will be available in a blue banner at the top of the page informing you that the bug has been migrated.
Bug 2196664 - sssd_be segfaults
Summary: sssd_be segfaults
Keywords:
Status: CLOSED MIGRATED
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: cyrus-sasl
Version: CentOS Stream
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Assignee: Simo Sorce
QA Contact: BaseOS QE Security Team
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-05-09 19:05 UTC by Stephen Roylance
Modified: 2023-09-17 19:18 UTC
CC: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-08-28 19:27:58 UTC
Type: Bug
Target Upstream Version:
Embargoed:
pm-rhel: mirror+


Attachments
Bundle with cyrus-sasl test rpms (1.03 MB, application/octet-stream)
2023-06-02 14:47 UTC, Simo Sorce
no flags


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker CRYPTO-10587 0 None None None 2023-05-09 21:04:59 UTC
Red Hat Issue Tracker   RHEL-1739 0 None Migrated None 2023-08-28 19:27:48 UTC
Red Hat Issue Tracker RHELPLAN-156818 0 None None None 2023-05-09 19:07:46 UTC

Description Stephen Roylance 2023-05-09 19:05:36 UTC
Description of problem:
sssd_be sometimes segfaults under load

Version-Release number of selected component (if applicable):
cyrus-sasl-2.1.27-6.el8_5.x86_64

How reproducible:
On an NVIDIA DGX100 joined to an IPA realm, while running an MPI all_reduce performance test.

Additional info:

#0  sasl_gss_encode (context=0x0, invec=<optimized out>, numiov=<optimized out>, output=0x562cff3bc538, outputlen=0x7ffd58ffb594, privacy=1) at gssapi.c:370                                                      
#1  0x00007f2f6de215ee in _sasl_encodev (conn=conn@entry=0x562cff412780, invec=invec@entry=0x7ffd58ffb560, numiov=numiov@entry=1, p_num_packets=p_num_packets@entry=0x7ffd58ffb4fc,                               
    output=output@entry=0x562cff3bc538, outputlen=outputlen@entry=0x7ffd58ffb594) at common.c:359
#2  0x00007f2f6de23623 in sasl_encodev (conn=conn@entry=0x562cff412780, invec=invec@entry=0x7ffd58ffb560, numiov=numiov@entry=1, output=output@entry=0x562cff3bc538, outputlen=outputlen@entry=0x7ffd58ffb594)
    at common.c:582
#3  0x00007f2f6de23750 in sasl_encode (conn=0x562cff412780, input=<optimized out>, inputlen=<optimized out>, output=output@entry=0x562cff3bc538, outputlen=outputlen@entry=0x7ffd58ffb594) at common.c:304
#4  0x00007f2f6e4730ca in sb_sasl_cyrus_encode (p=0x562cff3bc4b0, buf=<optimized out>, len=<optimized out>, dst=0x562cff3bc520) at cyrus.c:157
#5  0x00007f2f6e476350 in sb_sasl_generic_write (sbiod=0x562cff3b8880, buf=0x562cff419ff0, len=<optimized out>) at sasl.c:783
#6  0x00007f2f6e25585c in sb_debug_write (sbiod=0x562cff3a3050, buf=0x562cff419ff0, len=286) at sockbuf.c:854
#7  0x00007f2f6e25585c in sb_debug_write (sbiod=0x562cff3c2900, buf=0x562cff419ff0, len=286) at sockbuf.c:854
#8  0x00007f2f6e256f85 in ber_int_sb_write (sb=sb@entry=0x562cff2ef480, buf=0x562cff419ff0, len=len@entry=286) at sockbuf.c:445
#9  0x00007f2f6e253223 in ber_flush2 (sb=0x562cff2ef480, ber=0x562cff3720f0, freeit=freeit@entry=0) at io.c:246
#10 0x00007f2f6e481775 in ldap_int_flush_request (ld=ld@entry=0x562cff3d81a0, lr=lr@entry=0x562cff2ef2a0) at request.c:186
#11 0x00007f2f6e4819a7 in ldap_send_server_request (ld=ld@entry=0x562cff3d81a0, ber=ber@entry=0x562cff3720f0, msgid=msgid@entry=13, parentreq=parentreq@entry=0x0, srvlist=srvlist@entry=0x0, 
    lc=<optimized out>, lc@entry=0x0, bind=0x0, m_noconn=0, m_res=0) at request.c:408

Based on the conditions, I suspect this may be resolved with the upstream commit https://github.com/cyrusimap/cyrus-sasl/commit/df037bd4e20f7508fc36a9292d75e94c04dc8daa

Comment 1 Simo Sorce 2023-05-09 20:11:15 UTC
You opened this bug against the RHEL 9 product, but the RPM you mention is a RHEL 8 RPM. Did you file against the wrong product, or did you copy the wrong RPM version?

Comment 2 Stephen Roylance 2023-05-09 20:33:04 UTC
Sorry, the crash happened on 8. Our next update cycle will be on 9, though, so a fix in 8 won't help us in particular.


if the full backtrace is helpful, this is it with the domain name redacted:
#0  sasl_gss_encode (context=0x0, invec=<optimized out>, numiov=<optimized out>, output=0x562cff3bc538, outputlen=0x7ffd58ffb594, privacy=1) at gssapi.c:370                                                      
#1  0x00007f2f6de215ee in _sasl_encodev (conn=conn@entry=0x562cff412780, invec=invec@entry=0x7ffd58ffb560, numiov=numiov@entry=1, p_num_packets=p_num_packets@entry=0x7ffd58ffb4fc,                               
    output=output@entry=0x562cff3bc538, outputlen=outputlen@entry=0x7ffd58ffb594) at common.c:359
#2  0x00007f2f6de23623 in sasl_encodev (conn=conn@entry=0x562cff412780, invec=invec@entry=0x7ffd58ffb560, numiov=numiov@entry=1, output=output@entry=0x562cff3bc538, outputlen=outputlen@entry=0x7ffd58ffb594)
    at common.c:582
#3  0x00007f2f6de23750 in sasl_encode (conn=0x562cff412780, input=<optimized out>, inputlen=<optimized out>, output=output@entry=0x562cff3bc538, outputlen=outputlen@entry=0x7ffd58ffb594) at common.c:304
#4  0x00007f2f6e4730ca in sb_sasl_cyrus_encode (p=0x562cff3bc4b0, buf=<optimized out>, len=<optimized out>, dst=0x562cff3bc520) at cyrus.c:157
#5  0x00007f2f6e476350 in sb_sasl_generic_write (sbiod=0x562cff3b8880, buf=0x562cff419ff0, len=<optimized out>) at sasl.c:783
#6  0x00007f2f6e25585c in sb_debug_write (sbiod=0x562cff3a3050, buf=0x562cff419ff0, len=286) at sockbuf.c:854
#7  0x00007f2f6e25585c in sb_debug_write (sbiod=0x562cff3c2900, buf=0x562cff419ff0, len=286) at sockbuf.c:854
#8  0x00007f2f6e256f85 in ber_int_sb_write (sb=sb@entry=0x562cff2ef480, buf=0x562cff419ff0, len=len@entry=286) at sockbuf.c:445
#9  0x00007f2f6e253223 in ber_flush2 (sb=0x562cff2ef480, ber=0x562cff3720f0, freeit=freeit@entry=0) at io.c:246
#10 0x00007f2f6e481775 in ldap_int_flush_request (ld=ld@entry=0x562cff3d81a0, lr=lr@entry=0x562cff2ef2a0) at request.c:186
#11 0x00007f2f6e4819a7 in ldap_send_server_request (ld=ld@entry=0x562cff3d81a0, ber=ber@entry=0x562cff3720f0, msgid=msgid@entry=13, parentreq=parentreq@entry=0x0, srvlist=srvlist@entry=0x0, 
    lc=<optimized out>, lc@entry=0x0, bind=0x0, m_noconn=0, m_res=0) at request.c:408
#12 0x00007f2f6e481e26 in ldap_send_initial_request (ld=ld@entry=0x562cff3d81a0, msgtype=msgtype@entry=99, dn=dn@entry=0x562cff3c2f60 "cn=certmap,dc=XXX,dc=facebook,dc=com", ber=0x562cff3720f0, msgid=13)
    at request.c:169
#13 0x00007f2f6e470d32 in ldap_pvt_search (ld=0x562cff3d81a0, base=0x562cff3c2f60 "cn=certmap,dc=XXX,dc=facebook,dc=com", scope=2, 
    filter=0x7f2f6a8afb10 "(|(&(objectClass=ipaCertMapRule)(ipaEnabledFlag=TRUE))(objectClass=ipaCertMapConfigObject))", attrs=0x7ffd58ffbd10, attrsonly=0, sctrls=0x562cff3d7990, cctrls=0x0, timeout=0x0, 
    sizelimit=0, deref=-1, msgidp=0x7ffd58ffba64) at search.c:128
#14 0x00007f2f6e470e14 in ldap_search_ext (ld=<optimized out>, base=<optimized out>, scope=<optimized out>, filter=<optimized out>, attrs=<optimized out>, attrsonly=<optimized out>, sctrls=0x562cff3d7990, 
    cctrls=0x0, timeout=0x0, sizelimit=0, msgidp=0x7ffd58ffba64) at search.c:69
#15 0x00007f2f6a1760d9 in sdap_get_generic_ext_step (req=req@entry=0x562cff3d76d0) at src/providers/ldap/sdap_async.c:1629
#16 0x00007f2f6a1765e9 in sdap_get_generic_ext_send (memctx=<optimized out>, ev=ev@entry=0x562cff2da460, opts=opts@entry=0x562cff2eab30, sh=sh@entry=0x562cff3b4dc0, 
    search_base=search_base@entry=0x562cff3c2f60 "cn=certmap,dc=XXX,dc=facebook,dc=com", scope=scope@entry=2, 
    filter=0x7f2f6a8afb10 "(|(&(objectClass=ipaCertMapRule)(ipaEnabledFlag=TRUE))(objectClass=ipaCertMapConfigObject))", attrs=0x7ffd58ffbd10, serverctrls=0x0, clientctrls=0x0, sizelimit=0, timeout=0, 
    parse_cb=0x7f2f6a173ae0 <sdap_get_and_parse_generic_parse_entry>, cb_data=0x562cff3dd390, flags=0) at src/providers/ldap/sdap_async.c:1567
#17 0x00007f2f6a177270 in sdap_get_and_parse_generic_send (memctx=memctx@entry=0x562cff3fa7a0, ev=ev@entry=0x562cff2da460, opts=opts@entry=0x562cff2eab30, sh=sh@entry=0x562cff3b4dc0, 
    search_base=search_base@entry=0x562cff3c2f60 "cn=certmap,dc=XXX,dc=facebook,dc=com", scope=scope@entry=2, 
    filter=0x7f2f6a8afb10 "(|(&(objectClass=ipaCertMapRule)(ipaEnabledFlag=TRUE))(objectClass=ipaCertMapConfigObject))", attrs=0x7ffd58ffbd10, map=0x0, map_num_attrs=0, attrsonly=0, serverctrls=0x0, 
    clientctrls=0x0, sizelimit=0, timeout=0, allow_paging=false) at src/providers/ldap/sdap_async.c:2020
#18 0x00007f2f6a177512 in sdap_get_generic_send (memctx=0x562cff3fa7a0, ev=0x562cff2da460, opts=0x562cff2eab30, sh=0x562cff3b4dc0, search_base=0x562cff3c2f60 "cn=certmap,dc=XXX,dc=facebook,dc=com", scope=2, 
    filter=0x7f2f6a8afb10 "(|(&(objectClass=ipaCertMapRule)(ipaEnabledFlag=TRUE))(objectClass=ipaCertMapConfigObject))", attrs=0x7ffd58ffbd10, map=0x0, map_num_attrs=0, timeout=0, allow_paging=false)
    at src/providers/ldap/sdap_async.c:2121
#19 0x00007f2f6a871e52 in ipa_subdomains_refresh_ranges_done () from /usr/lib64/sssd/libsss_ipa.so
#20 0x00007f2f717b1ec2 in _tevent_req_error (req=<optimized out>, error=<optimized out>, location=<optimized out>) at ../../tevent_req.c:211
#21 0x00007f2f6a870969 in ipa_subdomains_ranges_done () from /usr/lib64/sssd/libsss_ipa.so
#22 0x00007f2f717b1ec2 in _tevent_req_error (req=req@entry=0x562cff3dfff0, error=error@entry=5, location=location@entry=0x7f2f6a1d7b20 "src/providers/ldap/sdap_ops.c:192") at ../../tevent_req.c:211
#23 0x00007f2f6a1a2a52 in sdap_search_bases_ex_done (subreq=0x0) at src/providers/ldap/sdap_ops.c:192
#24 0x00007f2f717b1ec2 in _tevent_req_error (req=<optimized out>, error=<optimized out>, location=<optimized out>) at ../../tevent_req.c:211
#25 0x00007f2f717b1ec2 in _tevent_req_error (req=req@entry=0x562cff3dd1d0, error=error@entry=5, location=location@entry=0x7f2f6a1beef0 "src/providers/ldap/sdap_async.c:1948") at ../../tevent_req.c:211
#26 0x00007f2f6a1738fe in generic_ext_search_handler (subreq=0x0, opts=<optimized out>) at src/providers/ldap/sdap_async.c:1948
#27 0x00007f2f717b1ec2 in _tevent_req_error (req=req@entry=0x562cff3d76d0, error=error@entry=5, location=location@entry=0x7f2f6a1bfdf0 "src/providers/ldap/sdap_async.c:1739") at ../../tevent_req.c:211
#28 0x00007f2f6a176b62 in sdap_get_generic_op_finished (op=<optimized out>, reply=0x0, error=5, pvt=<optimized out>) at src/providers/ldap/sdap_async.c:1739
#29 0x00007f2f6a174bff in sdap_handle_release (sh=0x562cff3b4dc0) at src/providers/ldap/sdap_async.c:143
#30 sdap_process_result (ev=<optimized out>, pvt=<optimized out>) at src/providers/ldap/sdap_async.c:245
#31 0x00007f2f717b0f97 in tevent_common_invoke_fd_handler (fde=fde@entry=0x562cff3b3f20, flags=<optimized out>, removed=removed@entry=0x0) at ../../tevent_fd.c:142
#32 0x00007f2f717b77af in epoll_event_loop (tvalp=0x7ffd58ffbfe0, epoll_ev=0x562cff2da740) at ../../tevent_epoll.c:736
#33 epoll_event_loop_once (ev=<optimized out>, location=<optimized out>) at ../../tevent_epoll.c:937
#34 0x00007f2f717b579b in std_event_loop_once (ev=0x562cff2da460, location=0x7f2f7461fff4 "src/util/server.c:744") at ../../tevent_standard.c:110
#35 0x00007f2f717b0365 in _tevent_loop_once (ev=ev@entry=0x562cff2da460, location=location@entry=0x7f2f7461fff4 "src/util/server.c:744") at ../../tevent.c:790
#36 0x00007f2f717b060b in tevent_common_loop_wait (ev=0x562cff2da460, location=0x7f2f7461fff4 "src/util/server.c:744") at ../../tevent.c:913
#37 0x00007f2f717b572b in std_event_loop_wait (ev=0x562cff2da460, location=0x7f2f7461fff4 "src/util/server.c:744") at ../../tevent_standard.c:141
#38 0x00007f2f745fda37 in server_loop (main_ctx=0x562cff2da7d0) at src/util/server.c:744
#39 0x0000562cfe3b0955 in main (argc=8, argv=<optimized out>) at src/providers/data_provider_be.c:802

Comment 4 Simo Sorce 2023-05-09 21:05:32 UTC
Do you know if there is a way to reproduce this crash on demand, or is this happening at random?

Comment 5 Stephen Roylance 2023-05-10 15:27:29 UTC
I can't trigger it on demand.  It happens consistently, a few times a day, on our DGX nodes in production, and I can reliably see it happen by running all_reduce_perf from https://github.com/NVIDIA/nccl-tests for long enough on similar nodes in our test environment.

Comment 6 Simo Sorce 2023-05-31 14:46:07 UTC
Would you be able to use a test build with the patch and provide feedback on whether you see a drop in occurrences?

Comment 7 Stephen Roylance 2023-05-31 14:57:04 UTC
(In reply to Simo Sorce from comment #6)
> Would you be able to use a test build with the patch and provide feedback on
> whether you see a drop in occurrences?

Yes, happy to. It will take at least a few weeks to get everything lined back up and to get dedicated time on the test nodes.

Comment 8 Simo Sorce 2023-06-02 14:47:10 UTC
Created attachment 1968598 [details]
Bundle with cyrus-sasl test rpms

Comment 9 Simo Sorce 2023-06-02 14:48:33 UTC
I attached to the bug a set of test packages to try.
If they resolve the issue, I can schedule work to include the fix in a future RHEL update.

Comment 10 Simo Sorce 2023-07-07 13:14:15 UTC
Stephen,
any news on this?

Comment 11 Stephen Roylance 2023-07-07 16:34:16 UTC
(In reply to Simo Sorce from comment #10)
> Stephen,
> any news on this?

Sorry for the delay; I lost the test nodes to another project and am waiting for them to be rebuilt so I can use them.

Comment 12 RHEL Program Management 2023-08-28 19:27:32 UTC
Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.

Comment 13 RHEL Program Management 2023-08-28 19:27:58 UTC
This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it, and begin with "RHEL-" followed by an integer.  You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like:

"Bugzilla Bug" = 1234567

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues.

