Bug 1330766

Summary: [abrt] realmd: g_cancellable_is_cancelled(): realmd killed by SIGSEGV
Product: [Fedora] Fedora
Reporter: Adam Williamson <awilliam>
Component: realmd
Assignee: Sumit Bose <sbose>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 24
CC: jhrozek, kparal, robatino, sbose, stefw, tflink, viorel.tabara
Target Milestone: ---
Keywords: CommonBugs, Reopened
Target Release: ---
Hardware: x86_64
OS: Unspecified
URL: https://retrace.fedoraproject.org/faf/reports/bthash/fe9a603cf3430f9e821c1dd8c8dc70558923bcb3
Whiteboard: https://fedoraproject.org/wiki/Common_F24_bugs#realmd-first-time abrt_hash:6a4960dd011c9c13125c1a057633f052cb78501d;VARIANT_ID=server; AcceptedBlocker
Fixed In Version: realmd-0.16.2-4.fc24 realmd-0.16.2-5.fc24
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-06-04 01:28:07 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1230435
Attachments:
File: backtrace (flags: none)
File: cgroup (flags: none)
File: core_backtrace (flags: none)
File: dso_list (flags: none)
File: environ (flags: none)
File: exploitable (flags: none)
File: limits (flags: none)
File: maps (flags: none)
File: mountinfo (flags: none)
File: namespaces (flags: none)
File: open_fds (flags: none)
File: proc_pid_status (flags: none)
File: var_log_messages (flags: none)
Patch with a workaround which fixes the issue for me (flags: none)

Description Adam Williamson 2016-04-26 22:22:06 UTC
Description of problem:
Happened when following https://fedoraproject.org/wiki/QA:Testcase_realmd_join_sssd on Fedora-24-20160424.n.0 (after install from Server DVD). I tried with and without updates-testing enabled; this report is from the run without it. The enrol process proceeded for a while - through package install, I think - then crashed like this.

Version-Release number of selected component:
realmd-0.16.2-3.fc24

Additional info:
reporter:       libreport-2.7.0
backtrace_rating: 4
cmdline:        /usr/lib/realmd/realmd
crash_function: g_cancellable_is_cancelled
executable:     /usr/lib/realmd/realmd
global_pid:     1163
kernel:         4.5.2-301.fc24.x86_64
pkg_fingerprint: 73BD E983 81B4 6521
pkg_vendor:     Fedora Project
runlevel:       N 3
type:           CCpp
uid:            0

Truncated backtrace:
Thread no. 1 (4 frames)
 #0 g_cancellable_is_cancelled at gcancellable.c:295
 #1 g_task_had_error at gtask.c:1799
 #2 on_transaction_signal at service/realm-packages.c:135
 #3 emit_signal_instance_in_idle_cb at gdbusconnection.c:3701

Comment 1 Adam Williamson 2016-04-26 22:22:09 UTC
Created attachment 1151123
File: backtrace

Comment 2 Adam Williamson 2016-04-26 22:22:11 UTC
Created attachment 1151124
File: cgroup

Comment 3 Adam Williamson 2016-04-26 22:22:12 UTC
Created attachment 1151125
File: core_backtrace

Comment 4 Adam Williamson 2016-04-26 22:22:13 UTC
Created attachment 1151126
File: dso_list

Comment 5 Adam Williamson 2016-04-26 22:22:14 UTC
Created attachment 1151127
File: environ

Comment 6 Adam Williamson 2016-04-26 22:22:15 UTC
Created attachment 1151128
File: exploitable

Comment 7 Adam Williamson 2016-04-26 22:22:16 UTC
Created attachment 1151129
File: limits

Comment 8 Adam Williamson 2016-04-26 22:22:18 UTC
Created attachment 1151130
File: maps

Comment 9 Adam Williamson 2016-04-26 22:22:19 UTC
Created attachment 1151131
File: mountinfo

Comment 10 Adam Williamson 2016-04-26 22:22:20 UTC
Created attachment 1151132
File: namespaces

Comment 11 Adam Williamson 2016-04-26 22:22:21 UTC
Created attachment 1151133
File: open_fds

Comment 12 Adam Williamson 2016-04-26 22:22:22 UTC
Created attachment 1151134
File: proc_pid_status

Comment 13 Adam Williamson 2016-04-26 22:22:23 UTC
Created attachment 1151135
File: var_log_messages

Comment 14 Adam Williamson 2016-04-26 22:25:32 UTC
If I subsequently try the enrolment again, it works. I'm guessing this is a problem when package installation is necessary: something times out while the package install process is running.

There is actually a message along those lines on the console when the enrolment attempt fails, but it's now scrolled out of the buffer so I can't read it any more :( Something dbus-y, I think.

Comment 15 Adam Williamson 2016-04-26 22:27:59 UTC
I'm not gonna propose this as a Beta blocker, I think, because the packages will often be available (e.g. after a Server install), and the bug is easy to 'work around' by just trying again. But it may be a Final blocker as a very conditional violation of "It must be possible to join the system to a FreeIPA or Active Directory domain at install time and post-install" - https://fedoraproject.org/wiki/Fedora_24_Alpha_Release_Criteria#remote-authentication

Comment 16 Adam Williamson 2016-04-27 00:11:47 UTC
Much the same occurs when enrolling through Cockpit. The error shows up in Cockpit's UI:

"Message recipient disconnected from message bus without replying"

Again, a subsequent enrolment attempt works.

Comment 17 Sumit Bose 2016-04-29 08:53:47 UTC
I can reproduce this, and it looks like cancellable, or at least cancellable->priv, is uninitialized.

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f76058a7029 in g_cancellable_is_cancelled (cancellable=0x56248d0ef450) at gcancellable.c:295
295       return cancellable != NULL && cancellable->priv->cancelled;
[Current thread is 1 (Thread 0x7f7605f868c0 (LWP 39866))]
Missing separate debuginfos, use: dnf debuginfo-install realmd-0.16.2-3.fc24.x86_64
(gdb) p *cancellable
$1 = {parent_instance = {g_type_instance = {g_class = 0x0}, ref_count = 0, qdata = 0x0}, priv = 0x3}
(gdb) 

By inspecting the code I found this potential issue:

diff --git a/service/realm-packages.c b/service/realm-packages.c
index 321921a..905c754 100644
--- a/service/realm-packages.c
+++ b/service/realm-packages.c
@@ -312,7 +312,7 @@ package_transaction_create (const gchar *method,
                                g_variant_new ("()"),
                                G_VARIANT_TYPE ("(o)"),
                                G_DBUS_CALL_FLAGS_NONE,
-                               CALL_TIMEOUT, cancellable,
+                               CALL_TIMEOUT, g_task_get_cancellable(task),
                                on_create_transaction, g_object_ref (task));
 }
 
but it does not seem to fix the issue.

Stef, do you have another idea?

Comment 18 Stef Walter 2016-04-29 10:11:16 UTC
Does this upstream patch solve the issue?

commit ef0797e5ed116a98cc074a6d4e1d1d6b6e6384db
Author: Stef Walter <stefw>
Date:   Mon Sep 7 12:53:02 2015 +0200

    service: Fix issue where diagnostics about package install hidden
    
    Due to the recent refactoring the diagnostics about package
    installation were hidden (even when --verbose).
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1258745

diff --git a/service/realm-packages.c b/service/realm-packages.c
index 9da852c..321921a 100644
--- a/service/realm-packages.c
+++ b/service/realm-packages.c
@@ -615,6 +615,7 @@ realm_packages_install_async (const gchar **package_sets,
        task = g_task_new (NULL, NULL, callback, user_data);
        install = g_new0 (InstallClosure, 1);
        install->automatic = realm_options_automatic_install ();
+       install->invocation = invocation ? g_object_ref (invocation) : NULL;
        install->connection = g_object_ref (connection);
        g_task_set_task_data (task, install, install_closure_free);

Comment 19 Stef Walter 2016-04-29 10:12:35 UTC
Hmmm, no that's already in the Fedora build.

Comment 20 Tim Flink 2016-05-02 22:12:30 UTC
Discussed during the 2016-05-02 blocker review meeting [1].

Accepted as a blocker for f24 final due to violation of the following F24 release criterion [2]:

It must be possible to log in to the default Cockpit instance and use it to ... Enrol the system to a FreeIPA or Active Directory domain.


[1] https://meetbot.fedoraproject.org/fedora-blocker-review/2016-05-02/f24-blocker-review.2016-05-02-16.02.html
[2] http://fedoraproject.org/wiki/Fedora_24_Beta_Release_Criteria#Cockpit_management_interface

Comment 21 Kamil Páral 2016-05-09 17:15:36 UTC
Hello realmd developers, is there any progress on this? We're trying to evaluate the status of F24 blocker bugs. Thanks.

Comment 22 Sumit Bose 2016-05-13 14:17:09 UTC
Created attachment 1157147
Patch with a workaround which fixes the issue for me

Hi,

The attached patch fixes the issue for me. It does not address the root cause, though: it only removes some g_object_unref() calls so that the cancellable is not freed too early. Since realmd is a short-lived process by default, it might still be acceptable as a fix.

You can find a scratch build at http://koji.fedoraproject.org/koji/taskinfo?taskID=14043682 .

Stef, what do you think about this workaround?

Comment 23 Sumit Bose 2016-05-13 15:44:10 UTC
Stef,

Maybe the patch is even the right fix. It looks like realm_invocation_get_cancellable() does not call g_object_ref() on the cancellable, so if I understand correctly, calling g_object_unref() on it is not needed.
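
A minimal sketch of the ownership rule described above, written in plain C without GLib. Every type and function name in it is a hypothetical stand-in (it is not realmd or GLib code, and it is not the attached patch); it only assumes that the getter returns a borrowed pointer, as the comment suggests for realm_invocation_get_cancellable().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted object such as GCancellable. */
typedef struct {
    int ref_count;
    bool cancelled;
} Cancellable;

/* Hypothetical owner: holds exactly one reference to its cancellable. */
typedef struct {
    Cancellable *cancellable;
} Invocation;

Cancellable *cancellable_new(void)
{
    Cancellable *c = calloc(1, sizeof *c);
    if (c == NULL)
        abort();
    c->ref_count = 1;   /* the creator owns the first reference */
    return c;
}

Cancellable *cancellable_ref(Cancellable *c)
{
    c->ref_count++;
    return c;
}

/* Drops one reference, frees the object at zero, returns the remaining count. */
int cancellable_unref(Cancellable *c)
{
    int remaining = --c->ref_count;
    if (remaining == 0)
        free(c);
    return remaining;
}

/* Borrowed getter: takes no new reference, so callers must not unref the result. */
Cancellable *invocation_get_cancellable(const Invocation *inv)
{
    return inv->cancellable;
}

/* Correct: use the borrowed pointer and leave the count alone. */
int borrow_only(const Invocation *inv)
{
    Cancellable *c = invocation_get_cancellable(inv);
    (void) c->cancelled;
    return c->ref_count;
}

/* Also correct: take a reference of our own, then release that same reference. */
int ref_then_unref(const Invocation *inv)
{
    Cancellable *c = cancellable_ref(invocation_get_cancellable(inv));
    (void) c->cancelled;
    return cancellable_unref(c);
}

/* Buggy: releases a reference that belongs to the owner. */
int unref_borrowed(Invocation *inv)
{
    int remaining = cancellable_unref(invocation_get_cancellable(inv));
    if (remaining == 0)
        inv->cancellable = NULL;   /* sketch only: real code is left with a dangling pointer */
    return remaining;
}
```

Starting from a single owner reference, borrow_only() and ref_then_unref() leave the count at 1, while unref_borrowed() drops it to 0 and frees the object although the owner still expects it to be alive. A later check of the form `cancellable != NULL && cancellable->priv->cancelled` on such a pointer passes the NULL test and then reads freed memory, which would be consistent with the backtrace in comment 17.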

Comment 24 Stef Walter 2016-05-18 12:46:36 UTC
Sumit, nice patch. Thanks for catching that. Pushed the commit upstream.

Will you include this patch in a revision build, or should I do a point release upstream?

Comment 25 Fedora Admin XMLRPC Client 2016-05-18 13:44:40 UTC
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 26 Fedora Update System 2016-05-18 14:35:07 UTC
realmd-0.16.2-4.fc24 has been submitted as an update to Fedora 24. https://bodhi.fedoraproject.org/updates/FEDORA-2016-80786ccbb3

Comment 27 Viorel Tabara 2016-05-21 00:42:48 UTC
Test passed.

Note:

With a regular user I was getting:

   Joining realm failed: No permission to join this host to the IPA domain.
 
Assuming that has to do with my IPA server setup, I used a user with admin
privileges.

I'll update Bodhi with the test output.

Comment 28 Fedora Update System 2016-05-21 01:34:12 UTC
realmd-0.16.2-4.fc24 has been pushed to the Fedora 24 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-80786ccbb3

Comment 29 Adam Williamson 2016-05-21 14:46:42 UTC
viorel: the actual joining process on the guest requires root privileges. The test case suggests realmd should use PolicyKit to elevate privs when run as a regular user, but it doesn't seem to. I've filed a bug on that: https://bugzilla.redhat.com/show_bug.cgi?id=1330764

Comment 30 Fedora Update System 2016-05-21 20:27:05 UTC
realmd-0.16.2-4.fc24 has been pushed to the Fedora 24 stable repository. If problems still persist, please make note of it in this bug report.

Comment 31 Kamil Páral 2016-05-23 13:19:00 UTC
Adam, do you want to re-test the update, or are we going to believe the issue is gone?

Comment 32 Adam Williamson 2016-05-23 14:37:28 UTC
I'll check it at some point, but I'm fine with the bug being closed for now. From talking to Viorel I think his verification was valid.

Comment 33 Viorel Tabara 2016-05-23 14:43:25 UTC
I can confirm that when running the join command in a terminal as a regular 
user while logged in to a desktop session (Gnome for this test) the PK auth 
dialog pops up as noted in 
https://bugzilla.redhat.com/show_bug.cgi?id=1330764#c1.

Comment 34 Adam Williamson 2016-06-02 23:12:51 UTC
Sorry, but I still hit this. I've been working on automating this test (Cockpit FreeIPA enrolment) for the last couple of days and finally got it to the point where the enrolment should succeed, only it doesn't. I get this exact same traceback, but the test is definitely using realmd-0.16.2-4.fc24 .

The openQA test follows the wiki test case very closely. There are two openQA jobs. One sets up a FreeIPA server then sits and waits for the other. The other boots a stock Fedora Server install, installs and runs Firefox, opens up Cockpit, and tries to enrol into the domain.

https://openqa.stg.fedoraproject.org/tests/21821 is my latest attempt (that's the client job); you can find the abrt crash directory attached in the Logs & Assets tab - https://openqa.stg.fedoraproject.org/tests/21821/file/realmd_join_cockpit_postinstall-spoolabrt.tar.gz - and confirm for yourself that it's the same crash.

Comment 35 Viorel Tabara 2016-06-03 01:42:19 UTC
(In reply to Adam Williamson from comment #34)
> The other boots a stock Fedora Server install, installs and runs Firefox,
> opens up Cockpit, and tries to enrol into the domain.

Adam, I only tested with the client running Fedora Workstation and from the
command line. Is that something you can easily do? Otherwise I'll set up the
environment and retest with the same variables as yours.

Comment 36 Adam Williamson 2016-06-03 04:16:22 UTC
Not really, no. It should not make any difference, though. Note my test did work *one* time, so I think it may be that the bug doesn't happen 100% of the time (though it does seem to happen most of the time).

Comment 37 Adam Williamson 2016-06-03 05:36:45 UTC
Furthermore it seems like if I re-try the cockpit enrolment after it fails, I get another error, instead of it working:

No such interface 'org.freedesktop.realmd.KerberosMembership' on object at path /org/freedesktop/realmd/Sssd/...'

(after Sssd/ the message flows off the side of the screen and I can't read it; it does not wrap. All the boxes in the 'Join a Domain' dialog are widened off the edge of the screen too).

See https://openqa.stg.fedoraproject.org/tests/21851 , particularly the video https://openqa.stg.fedoraproject.org/tests/21851/file/video.ogv - the video is too highly time-compressed for you to quite see it, but that test was wired such that it would hit the 'Join' button, wait five minutes, then if it didn't recognize success but *did* see the "Message recipient disconnected from message bus without replying" error, it would click the 'Join' button again. When it does that, the new error message appears. You can find logs in the "Logs & Assets" tab, but there's nothing very useful for the second error I don't think.

Comment 38 Fedora Update System 2016-06-03 09:09:13 UTC
realmd-0.16.2-5.fc24 has been submitted as an update to Fedora 24. https://bodhi.fedoraproject.org/updates/FEDORA-2016-7516793e87

Comment 39 Sumit Bose 2016-06-03 09:11:45 UTC
I'm sorry, I forgot to apply the patch in the first build. This time I checked the build logs of realmd-0.16.2-5.fc24 to make sure the patch is applied.

Comment 40 Adam Williamson 2016-06-03 18:17:10 UTC
OK, looks like the fix works with -5.

Comment 41 Fedora Update System 2016-06-04 01:28:03 UTC
realmd-0.16.2-5.fc24 has been pushed to the Fedora 24 stable repository. If problems still persist, please make note of it in this bug report.