Bug 1791300 - sporadic sssd_be crash on s390x
Summary: sporadic sssd_be crash on s390x
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: sssd
Version: 8.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: 8.2
Assignee: Pavel Březina
QA Contact: Steeve Goveas
URL:
Whiteboard: sync-to-jira
Duplicates: 1895794
Depends On: 1881992
Blocks:
Reported: 2020-01-15 13:30 UTC by Jan Stancek
Modified: 2021-05-18 15:04 UTC (History)
CC List: 11 users

Fixed In Version: sssd-2.4.0-1.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-18 15:03:54 UTC
Type: Bug
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | SSSD sssd issues 5298 | 0 | None | closed | [abrt] sssd-common: dp_client_register(): sssd_be killed by SIGSEGV | 2021-02-17 10:57:32 UTC

Description Jan Stancek 2020-01-15 13:30:06 UTC
Description of problem:
While testing the kernel with LTP, we are seeing sssd_be sporadically crash on s390x:

Core was generated by `/usr/libexec/sssd/sssd_be --domain implicit_files --uid 0 --gid 0 --logger=file'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000002aa08d969c2 in dp_client_register ()
(gdb) bt -a
No symbol "a" in current context.
(gdb) bt
#0  0x000002aa08d969c2 in dp_client_register ()
#1  0x000003ffb508cdaa in _sbus_sss_invoke_in_s_out__step () from /usr/lib64/sssd/libsss_iface.so
#2  0x000003ffb4f0cee8 in tevent_common_invoke_timer_handler (te=te@entry=0x2aa1586e020, current_time=..., removed=removed@entry=0x0) at ../../tevent_timed.c:370
#3  0x000003ffb4f0d0a6 in tevent_common_loop_timer_delay (ev=0x2aa15820860) at ../../tevent_timed.c:442
#4  0x000003ffb4f0e71c in epoll_event_loop (tvalp=0x3ffe3e7e9b0, epoll_ev=0x2aa1582a920) at ../../tevent_epoll.c:667
#5  epoll_event_loop_once (ev=<optimized out>, location=<optimized out>) at ../../tevent_epoll.c:937
#6  0x000003ffb4f0c314 in std_event_loop_once (ev=0x2aa15820860, location=0x3ffb5d6f238 "src/util/server.c:718") at ../../tevent_standard.c:110
#7  0x000003ffb4f06cb4 in _tevent_loop_once (ev=ev@entry=0x2aa15820860, location=location@entry=0x3ffb5d6f238 "src/util/server.c:718") at ../../tevent.c:772
#8  0x000003ffb4f06f66 in tevent_common_loop_wait (ev=0x2aa15820860, location=0x3ffb5d6f238 "src/util/server.c:718") at ../../tevent.c:893
#9  0x000003ffb4f0c294 in std_event_loop_wait (ev=0x2aa15820860, location=0x3ffb5d6f238 "src/util/server.c:718") at ../../tevent_standard.c:141
#10 0x000003ffb5d4d8ba in server_loop () from /usr/lib64/sssd/libsss_util.so
#11 0x000002aa08d89588 in main ()


Version-Release number of selected component (if applicable):
sssd-2.2.3-6.el8.s390x

How reproducible:
sporadically, about 3 times in the last ~6 weeks

Steps to Reproduce:
Unknown. The test (LTP) that was running when this occurred has no known interaction with sssd. It does, however, collect all core files, which is what makes the crash visible.

Actual results:
sssd_be crashes sporadically

Expected results:
no crashes

Additional info:

Comment 5 Alexey Tikhonov 2020-01-16 10:56:32 UTC
#0  0x000002aa08d969c2 in dp_client_register (mem_ctx=<optimized out>, sbus_req=<optimized out>, provider=0x2aa1583a700, 
    name=0x2aa1589dc60 "nss") at src/providers/data_provider/dp_client.c:107
#1  0x000003ffb508cdaa in _sbus_sss_invoke_in_s_out__step (ev=<optimized out>, te=<optimized out>, 
    tv=<error reading variable: value has been optimized out>, private_data=<optimized out>) at src/sss_iface/sbus_sss_invokers.c:682
#2  0x000003ffb4f0cee8 in tevent_common_invoke_timer_handler (te=te@entry=0x2aa1586e020, current_time=..., removed=removed@entry=0x0)
    at ../../tevent_timed.c:370


(gdb) frame 0
#0  0x000002aa08d969c2 in dp_client_register (mem_ctx=<optimized out>, sbus_req=<optimized out>, provider=0x2aa1583a700, 
    name=0x2aa1589dc60 "nss") at src/providers/data_provider/dp_client.c:107
107	    dp_cli->name = talloc_strdup(dp_cli, name);

(gdb) p name
$2 = 0x2aa1589dc60 "nss"
(gdb) p dp_cli
$3 = (struct dp_client *) 0x0

```
    cli_conn = sbus_server_find_connection(dp_sbus_server(provider),
                                           sbus_req->sender->name);
    if (cli_conn == NULL) {
        DEBUG(SSSDBG_CRIT_FAILURE, "Unknown client: %s\n",
              sbus_req->sender->name);
        return ENOENT;
    }

    dp_cli = sbus_connection_get_data(cli_conn, struct dp_client);

    dp_cli->name = talloc_strdup(dp_cli, name);
```

(gdb) p cli_conn
$5 = <optimized out>

So `cli_conn != NULL` but either `cli_conn->data == NULL` or typeof `cli_conn->data` != `struct dp_client`


(gdb) p *provider
$9 = {uid = 0, gid = 0, be_ctx = 0x2aa15839060, ev = 0x2aa15820860, sbus_server = 0x2aa15852520, sbus_conn = 0x2aa158551f0, clients = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, terminating = false, requests = {index = 0, num_active = 0, active = 0x0}, modules = 0x2aa1586aff0, targets = 0x2aa1586b0b0}

(gdb) p *provider->sbus_server 
$11 = {ev = 0x2aa15820860, server = 0x2aa1583dcf0, symlink = 0x2aa1583d7f0 "/var/lib/sss/pipes/private/sbus-dp_implicit_files", watch_ctx = 0x2aa15838070, router = 0x2aa15851e80, data_slot = 0, last_activity = 0x0, names = 0x2aa15838d30, match_rules = 0x2aa15852610, max_connections = 1000, uid = 0, gid = 0, on_connection = 0x2aa1583dae0, disconnecting = false, name = {major = 1, minor = 2}}


(gdb) p sizeof(struct talloc_chunk)
$15 = 88

(gdb) p *((struct talloc_chunk *)((char *)provider->sbus_server - 96))
$17 = {flags = 3404304376, next = 0x0, prev = 0x2aa15855190, parent = 0x0, child = 0x2aa15861560, refs = 0x0, destructor = 0x3ffb5029d20 <sbus_server_destructor>, name = 0x3ffb5034f08 "struct sbus_server", size = 112, limit = 0x0, pool = 0x2aa15852280}

(gdb) p *((struct talloc_chunk *)((char *)provider->sbus_server->names - 96))
$18 = {flags = 3404304368, next = 0x2aa1583da80, prev = 0x2aa158525b0, parent = 0x0, child = 0x2aa158980e0, refs = 0x0, destructor = 0x0, name = 0x3ffb5d6e436 "src/util/util.c:374", size = 168, limit = 0x0, pool = 0x0}

 => pointers are valid (not freed)
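
A note on the `- 96` used above, given that `sizeof(struct talloc_chunk)` printed 88: as far as I know, talloc rounds its chunk header size up to a 16-byte boundary (the `TC_HDR_SIZE` macro in talloc.c), so the header starts 96 bytes before the user pointer on this build. A minimal sketch of that arithmetic follows; `align16` and `header_from_ptr` are hypothetical names for illustration, not talloc API.

```c
#include <assert.h>
#include <stddef.h>

/* Round a size up to the next multiple of 16 (hypothetical helper). */
static size_t align16(size_t size)
{
    return (size + 15u) & ~(size_t)15u;
}

/*
 * Given a user pointer and the raw (unpadded) header size, return the
 * address where the header would start, assuming the header is padded
 * to 16 bytes and placed directly in front of the payload.
 */
static void *header_from_ptr(void *ptr, size_t raw_header_size)
{
    assert(ptr != NULL);
    return (char *)ptr - align16(raw_header_size);
}
```

With a raw size of 88 this gives an offset of 96, which matches the cast used in the gdb expressions.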


(gdb) frame 1
(gdb) p state
$20 = <optimized out>
(gdb) p *((struct _sbus_sss_invoke_in_s_out__state*)req->data)->sbus_req->sender
$26 = {name = 0x2aa1587b3c0 "sssd.nss", uid = 0}


cli_conn = sbus_server_find_connection = sss_ptr_hash_lookup(provider->sbus_server->names, sbus_req->sender->name == "sssd.nss")


(gdb) p *provider->sbus_server->names
$14 = {p = 0, maxp = 4, entry_count = 4, bucket_count = 4, segment_count = 1, min_load_factor = 1, max_load_factor = 5, directory_size = 4, 
  directory_size_shift = 2, segment_size = 4, segment_size_shift = 2, delete_callback = 0x3ffb5d59ef8 <sss_ptr_hash_delete_cb>, 
  delete_pvt = 0x2aa1583e210, halloc = 0x3ffb5d490b0 <hash_talloc>, hfree = 0x3ffb5d490a0 <hash_talloc_free>, halloc_pvt = 0x2aa1583d200, 
  directory = 0x2aa1583f610, statistics = {hash_accesses = 14, hash_collisions = 2, table_expansions = 0, table_contractions = 0}}


(gdb) p *provider->sbus_server->names->directory[0][0]
$45 = {entry = {key = {type = HASH_KEY_STRING, {str = 0x2aa15869aa0 "sssd.domain_implicit_5ffiles", c_str = 0x2aa15869aa0 "sssd.domain_implicit_5ffiles", ul = 2929528838816}}, value = {type = HASH_VALUE_PTR, {ptr = 0x2aa15869880, ..}}}, next = 0x2aa15869720}

  --  key of this entry - "sssd.domain_implicit_5ffiles" - looks strange
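
The key may not be corrupted after all: it reads like the domain name `implicit_files` with its `_` written as `_5f` (0x5f is ASCII `_`), which would be consistent with the domain part of the bus name being escaped so that only alphanumerics remain. That is an assumption on my side; I have not checked the exact escaping rules sbus applies. A sketch of such a scheme, where `escape_name_part` is a hypothetical helper and not the real sbus function:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Copy `in` to `out`, replacing every non-alphanumeric byte with '_'
 * followed by two lowercase hex digits. Returns 0 on success, -1 if
 * `out` is too small. Hypothetical helper for illustration only.
 */
static int escape_name_part(const char *in, char *out, size_t out_size)
{
    static const char hex[] = "0123456789abcdef";
    size_t pos = 0;

    assert(in != NULL && out != NULL);

    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char)*in;

        if (isalnum(c)) {
            if (pos + 1 >= out_size) {
                return -1;      /* no room for the char plus the final NUL */
            }
            out[pos++] = (char)c;
        } else {
            if (pos + 3 >= out_size) {
                return -1;      /* no room for "_XX" plus the final NUL */
            }
            out[pos++] = '_';
            out[pos++] = hex[c >> 4];
            out[pos++] = hex[c & 0x0f];
        }
    }

    if (pos >= out_size) {
        return -1;              /* covers out_size == 0 */
    }
    out[pos] = '\0';
    return 0;
}
```

Under this scheme `implicit_files` becomes `implicit_5ffiles`, which is the suffix seen in the key above.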


(gdb) p *provider->sbus_server->names->directory[0][0]->next
$57 = {entry = {key = {type = HASH_KEY_STRING, {str = 0x2aa1586a5a0 ":1.2", c_str = 0x2aa1586a5a0 ":1.2", ul = 2929528841632}}, value = { type = HASH_VALUE_PTR, {ptr = 0x2aa15866f60, ...}}}, next = 0x0}

(gdb) p *provider->sbus_server->names->directory[0][1]
$46 = {entry = {key = {type = HASH_KEY_STRING, {str = 0x2aa15867760 ":1.1", c_str = 0x2aa15867760 ":1.1", ul = 2929528829792}}, value = { type = HASH_VALUE_PTR, {ptr = 0x2aa15867560, ...}}}, next = 0x0}

(gdb) p *provider->sbus_server->names->directory[0][3]
$47 = {entry = {key = {type = HASH_KEY_STRING, {str = 0x2aa158764a0 "sssd.nss", c_str = 0x2aa158764a0 "sssd.nss", ul = 2929528890528}}, value = { type = HASH_VALUE_PTR, {ptr = 0x2aa15898140, ...}}}, next = 0x0}
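
For reference, the manual walk over `directory[0][n]` and the `next` pointers amounts to the scan below. This is a sketch with mock types (`mock_entry`, `mock_find_entry` are made-up names); the real table hashes the key to pick a bucket instead of scanning all of them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mock of one hash table entry: string key, opaque value, chain link. */
struct mock_entry {
    const char *key;
    void *value;
    struct mock_entry *next;    /* collision chain within one bucket */
};

/* Scan every bucket and its chain for `key`; return the entry or NULL. */
static struct mock_entry *mock_find_entry(struct mock_entry **buckets,
                                          size_t bucket_count,
                                          const char *key)
{
    assert(buckets != NULL && key != NULL);

    for (size_t i = 0; i < bucket_count; i++) {
        for (struct mock_entry *e = buckets[i]; e != NULL; e = e->next) {
            if (strcmp(e->key, key) == 0) {
                return e;
            }
        }
    }
    return NULL;
}
```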


(gdb) p *((struct sss_ptr_hash_value *)provider->sbus_server->names->directory[0][3]->entry->value->ptr)
$55 = {spy = 0x2aa15872a60, ptr = 0x2aa158615c0}

(gdb) p *((struct talloc_chunk *)((char *)((struct sss_ptr_hash_value *)provider->sbus_server->names->directory[0][3]->entry->value->ptr)->ptr - 96))
$60 = {flags = 3404304368, next = 0x2aa15859e40, prev = 0x0, parent = 0x2aa158524c0, child = 0x2aa158e7eb0, refs = 0x0, destructor = 0x3ffb50147c8 <sbus_connection_destructor>, name = 0x3ffb502d0ca "struct sbus_connection", size = 128, limit = 0x0, pool = 0x0}

 => pointer is valid (not freed) and has proper type `struct sbus_connection`


(gdb) p *((struct sbus_connection *)((struct sss_ptr_hash_value *)provider->sbus_server->names->directory[0][3]->entry->value->ptr)->ptr)
$63 = {ev = 0x2aa15820860, connection = 0x2aa15862930, type = SBUS_CONNECTION_CLIENT, address = 0x0, wellknown_name = 0x2aa1586f2f0 "sssd.nss", unique_name = 0x2aa1586a4c0 ":1.2", disconnecting = false, access = 0x2aa158654a0, destructor = 0x2aa15865520, requests = 0x2aa15863240, reconnect = 0x2aa15863710, router = 0x2aa158637a0, watch = 0x2aa158614f0, data = 0x0, senders = 0x2aa15861070, last_activity = 0x0}


 => dp_client *dp_cli == sbus_connection *cli_conn->data == NULL
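
To make the failure mode concrete: the connection is already present in the names table, but no `dp_client` has been attached to it yet. Below is a self-contained sketch with mock types showing where a guard would turn the NULL dereference into an error return. None of these are the real SSSD structures, the `EAGAIN` return code is an arbitrary choice, and this is not meant as the actual fix.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Mock stand-ins for the real types, reduced to what the example needs. */
struct mock_dp_client {
    const char *name;
};

struct mock_connection {
    const char *wellknown_name;
    void *data;                 /* NULL until client data is attached */
};

/* Record the client name on a connection; returns 0 or an errno value. */
static int mock_client_register(struct mock_connection *conn,
                                const char *name)
{
    struct mock_dp_client *cli;

    assert(name != NULL);

    if (conn == NULL) {
        return ENOENT;          /* unknown client, as in the existing check */
    }

    cli = conn->data;
    if (cli == NULL) {
        /* State seen in the core: connection found, data not set yet. */
        return EAGAIN;
    }

    cli->name = name;
    return 0;
}
```

A guard like this only hides the symptom; why `data` is still unset when the registration request arrives is the part that needs explaining.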

Comment 6 Pavel Březina 2020-01-16 11:09:20 UTC
I think this is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1775766#c14

Comment 7 Alexey Tikhonov 2020-01-16 13:23:05 UTC
Other relevant tickets:
 * bz 1768670
 * bz 1770467
 * bz 1684824

Comment 9 Oksana Voshchana 2020-07-14 10:37:57 UTC
Hi
In RHEL-8.3 on s390x, we hit a problem with sssd_be during VM provisioning.
sssd_be crashed in the VM with this error in the dmesg log:
User process fault: interruption code 003b ilc:3 in sssd_be[2aa21b80000+37000]
Failing address: 0000000000000000 TEID: 0000000000000400
Fault in primary space mode while using user ASCE.
AS:00000003e4e001c7 R3:0000000000000024 

Can you provide a status update for this bug?
Thanks

Comment 11 Pavel Březina 2020-09-03 09:10:39 UTC
Upstream PR:
https://github.com/SSSD/sssd/pull/5299

Comment 12 Alexey Tikhonov 2020-09-17 14:10:54 UTC
Pushed PR: https://github.com/SSSD/sssd/pull/5299

* `master`
    * 4a84f8e18ea5604ac7e69849dee492718fd96296 - dp: fix potential race condition in provider's sbus server

Comment 14 Alexey Tikhonov 2020-10-01 14:18:23 UTC
Pushed PR: https://github.com/SSSD/sssd/pull/5344

* `master`
    * 7fbcaa8feeb968711ff52f51705c45062fd81394 - be: remove accidental sleep

Comment 16 Alexey Tikhonov 2020-11-09 12:58:08 UTC
*** Bug 1895794 has been marked as a duplicate of this bug. ***

Comment 22 errata-xmlrpc 2021-05-18 15:03:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (sssd bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1666

