Description of problem:
Today I applied updates with `dnf upgrade`; quite a few packages were pulled in (meaning it had been a while since I last upgraded, at least a week). After the upgrade I locked my desktop session and was unable to unlock it. A clean reboot fixed my inability to log in, but I saw that sssd had dumped core.

Version-Release number of selected component (if applicable):
sssd-2.0.0-5.fc29.x86_64

How reproducible:
Did not try.

Steps to Reproduce:
0. Have sssd set up on the machine and use it (Red Hat corporate auth).
1. yum upgrade
2. Lock the graphical desktop session.
3. Attempt to unlock it.

Actual results:
The system refused to unlock.

Expected results:
The system unlocks just fine.

Additional info:
Apologies for the messy bug report; I'm on-site at a customer and filing this on the side so it's not lost. The sssd logs, sosreport, and the core will be attached as private entries.
gdb tells me the following:

Core was generated by `/usr/libexec/sssd/sssd_be --domain default --uid 0 --gid 0 --logger=files'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000560b1625e1a1 in dp_client_register (mem_ctx=<optimized out>, sbus_req=<optimized out>,
    provider=0x560b1762e230, name=0x560b176a0530 "autofs")
    at src/providers/data_provider/dp_client.c:107
107         dp_cli->name = talloc_strdup(dp_cli, name);
(gdb) list
102             return ENOENT;
103         }
104
105         dp_cli = sbus_connection_get_data(cli_conn, struct dp_client);
106
107         dp_cli->name = talloc_strdup(dp_cli, name);
108         if (dp_cli->name == NULL) {
109             talloc_free(dp_cli);
110             return ENOMEM;
111         }
(gdb) p dp_cli
$1 = (struct dp_client *) 0x0

So cli_conn is not NULL, but cli_conn->data is NULL. I'm not sure whether this is an expected state and just a NULL check is missing, or whether this is unexpected and more investigation is needed into how we got into this state. Pavel knows the SBus code best, so I set Needinfo for him.

In the update to sssd-2.0.0-5.fc29.x86_64 only the SBus timeout was changed, IIRC from 25 s (the D-Bus default) to 120 s. I wonder whether there is some dependent timeout which has to be increased as well?
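For illustration only, here is a minimal sketch of what the "just a NULL check" option could look like around lines 105-107, based solely on the listing above; the DEBUG message and the EINVAL return value are my assumptions, not a proposed upstream patch:

    dp_cli = sbus_connection_get_data(cli_conn, struct dp_client);
    if (dp_cli == NULL) {
        /* No dp_client context was ever attached to this connection;
         * fail the request instead of dereferencing a NULL pointer. */
        DEBUG(SSSDBG_CRIT_FAILURE,
              "Connection for %s has no client data attached\n", name);
        return EINVAL;
    }

    dp_cli->name = talloc_strdup(dp_cli, name);
    if (dp_cli->name == NULL) {
        talloc_free(dp_cli);
        return ENOMEM;
    }

Whether silently tolerating such a connection is correct, or whether reaching this state already indicates a bug elsewhere, is exactly the open question above.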
No, it is not expected. I thought this was a race condition in the initialization code, where we set the on-connection function after the server has already been created (dp_client_init creates the dp_cli):

static void dp_init_done(struct tevent_req *subreq)
{
    struct dp_init_state *state;
    struct tevent_req *req;
    errno_t ret;

    req = tevent_req_callback_data(subreq, struct tevent_req);
    state = tevent_req_data(req, struct dp_init_state);

    ret = sbus_server_create_and_connect_recv(state->provider, subreq,
                                              &state->provider->sbus_server,
                                              &state->provider->sbus_conn);
    talloc_zfree(subreq);
    if (ret != EOK) {
        tevent_req_error(req, ret);
        return;
    }

    sbus_server_set_on_connection(state->provider->sbus_server,
                                  dp_client_init, state->provider);

However, the responders are started way past this point, in sss_monitor_service_init, which is called after dp_init_done (in dp_initialized). I even tried this with setting some delay before dp_init_done is called, but it only proved that the responders will not start that soon. Unfortunately the sssd logs are empty, so they do not tell us anything.
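For anyone retracing the "setting some delay" experiment: a generic way to defer a step with tevent (the event library sssd uses) is a timer event. The standalone sketch below only illustrates that mechanism with made-up names and a made-up 5-second delay; it is not the actual change that was tested:

    #include <stdio.h>
    #include <talloc.h>
    #include <tevent.h>

    static void delayed_cb(struct tevent_context *ev,
                           struct tevent_timer *te,
                           struct timeval now,
                           void *private_data)
    {
        const char *msg = private_data;
        printf("deferred step runs now: %s\n", msg);
    }

    int main(void)
    {
        TALLOC_CTX *mem_ctx = talloc_new(NULL);
        struct tevent_context *ev = tevent_context_init(mem_ctx);
        char *msg = talloc_strdup(mem_ctx, "install on-connection hook");

        /* Fire the callback 5 seconds from now instead of immediately,
         * widening any window in which a client could connect first. */
        tevent_add_timer(ev, mem_ctx, tevent_timeval_current_ofs(5, 0),
                         delayed_cb, msg);

        tevent_loop_once(ev);   /* blocks until the timer fires */
        talloc_free(mem_ctx);
        return 0;
    }

Even with the on-connection hook artificially deferred like this, the responders did not connect early enough to hit the window, which is why the race theory does not hold up.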
I suppose this was a one-time event and it is not reproducible, right? I'm afraid we can't do much without logs (ideally at debug level 0x3ff0).
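In case it helps if this ever reoccurs: such logs are usually captured by setting debug_level in /etc/sssd/sssd.conf and restarting sssd. A sketch, assuming the domain from the backtrace is named "default" (adjust the section names to the local setup):

    [sssd]
    debug_level = 0x3ff0

    [domain/default]
    debug_level = 0x3ff0

followed by `systemctl restart sssd`; the logs then end up under /var/log/sssd/.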
Yes, I have not seen the problem since the initial occurrence. Since there was nothing useful in the logs I provided and I do not have a reproducer for you, I'll close this now.