Bug 2249218 - gpg waits indefinitely in start_new_service()/assuan_socket_connect()
Summary: gpg waits indefinitely in start_new_service()/assuan_socket_connect()
Keywords:
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: gnupg2
Version: 39
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Jakub Jelen
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2023-11-11 12:27 UTC by Peter Simonyi
Modified: 2024-01-04 21:25 UTC (History)
11 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Embargoed:


Attachments (Terms of Use)
Backtrace while hung (33.50 KB, text/plain)
2023-11-13 22:07 UTC, Peter Simonyi
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker FC-1036 0 None None None 2023-11-14 10:58:03 UTC

Description Peter Simonyi 2023-11-11 12:27:00 UTC
Description of problem:
When I start a new message (or do anything else to open the mail composer), Evolution stops responding and one CPU core sits near 100% usage.

Version-Release number of selected component (if applicable):
$ dnf list installed evolution* gnupg*
Installed Packages
evolution.x86_64                                3.50.1-1.fc39           @updates
evolution-data-server.x86_64                    3.50.1-1.fc39           @updates
evolution-data-server-langpacks.noarch          3.50.1-1.fc39           @updates
evolution-langpacks.noarch                      3.50.1-1.fc39           @updates
gnupg2.x86_64                                   2.4.3-3.fc39            @updates
gnupg2-smime.x86_64                             2.4.3-3.fc39            @updates


How reproducible:
Always

Steps to Reproduce:
1. Open Evolution.  It's responsive; you can open and click on stuff.
2. Start a new mail message by clicking the New Mail button in the headerbar.
(Or open the mail composer another way; I've tried: click reply with a message selected in the main window; click reply in a message window; right-click on a message list item and pick Edit as new message; double-click a message in the Drafts folder; run xdg-email)

Actual results:
The composer window does not open.  Evolution stops responding and eventually Gnome Shell pops up the offer to kill it or wait.  One CPU core sits near 100% for the 'evolution' process.

Expected results:
A composer window should open and let you use it.

Additional info:
strace shows the 'evolution' process is doing a whole lot of polling:

poll([{fd=97, events=POLLIN}, {fd=99, events=POLLIN}, {fd=101, events=POLLIN}, {fd=94, events=POLLOUT}, {fd=-1}, {fd=-1}], 6, 30000) = 1 ([{fd=94, revents=POLLOUT}])

When trying to open the composer, Evolution does create a couple of additional bwrap'ed Webkit processes for the composer, but it also runs `gpg2`.  Killing the gpg2 process fixes the hang; Evolution opens the composer window and continues -- as far as I can tell so far, nothing else goes wrong.

Comment 1 Peter Simonyi 2023-11-11 13:17:50 UTC
Oh, I removed my ~/.gnupg folder and everything works normally again.

My current .gnupg (which got recreated shortly after I removed the problematic one) contains 3 files (common.conf, pubring.kbx, trustdb.gpg).

The problematic one contains common.conf and trustdb.gpg, but also:
- a whole bunch of files with names like .#lk0x000055b42110eb50.[hostname].4693 (with my hostname there)
- gnupg_spawn_keyboxd_sentinel.lock
- a subfolder public-keys.d which in turn contains:
  - pubring.db
  - pubring.db.lock
  - many more files of the form .#lk0x0000555d50829e70.[hostname].5078 (again with my hostname there, but some say "fedora" there, which was the default hostname when I installed Fedora 39 but I changed before running Evolution)

I will try to make time later today to reproduce it in a VM fresh installation.

Comment 2 Milan Crha 2023-11-13 08:26:19 UTC
Thanks for a bug report. I'm wondering what would the composer call the gpg for. It does that on send, when PGP signing or encryption is enabled, but otherwise it should have much need to call it, maybe apart of trying to get your public key, to add it into the Autocrypt header. Still, it's a single call, thus it might be gnupg itself going odd.

Could you install debuginfo packages for glib2, evolution-data-server and evolution and when the composer will be in this stuck state, grab a backtrace of the process, to see what it is doing, please? You can install the packages with:

   sudo debuginfo-install glib2 evolution-data-server evolution

You can get the backtrace with a command like this:

   $ gdb --batch --ex "t a a bt" --pid=`pidof evolution` &>bt.txt

Please check the bt.txt for any private information, like passwords, email addresses, server addresses,... I usually search for "pass" at least (quotes for clarity only), before sharing it anywhere.

Could you check whether there's any gpg process running, please? There probably isn't, but just in case. Thanks in advance.

Comment 3 Peter Simonyi 2023-11-13 22:06:10 UTC
Good news!  I can reliably reproduce the bug in a fresh VM.

Steps to reproduce:
1. Install Fedora Workstation 39 from the live image.
2. sudo dnf install evolution seahorse
3. Search in the overview.
4. Open Gnome Settings' Sharing pane and change the hostname.
5. Reboot.
6. Search in the overview.
7. Open Evolution.
8. In the Evolution wizard, create a new account (I used a dummy address, local Maildir, and Sendmail).
9. Click the new message button in the headerbar.
10. Evolution is frozen.

In the "search in the overview" steps you can search for anything, including just using search to do the next step of opening Settings (step 3+4) or Evolution (step 6+7).  I'm pretty sure the search step is needed only to make Seahorse's search provider run, which uses GPG.

Yes, there is a GPG process still running starting after step 6.  Looking at ps -ef | grep gpg, after step 6 there is this:

gpg --batch --status-fd 16 --logger-fd 20 --no-tty --charset=utf8 --enable-progress-filter --exit-on-status-write-error --display=:0 --with-colons --list-secret-keys --

More of them seem to accumulate every time you use the overview search.  So there's probably a bug in Seahorse or GPG.
Then after step 9, there is additionally this running:

gpg2 --verbose --no-secmem-warning --no-greeting --no-tty --batch --yes --status-fd=88 --export --export-options export-minimal no-export-attributes <me>

(with the dummy email address me I made up in step 8).

I am about to attach the backtrace you requested.  (Since I'd made it with a dummy email account I left it verbatim in the backtrace :)

Comment 4 Peter Simonyi 2023-11-13 22:07:18 UTC
Created attachment 1999192 [details]
Backtrace while hung

gdb --batch --ex "t a a bt" --pid=`pidof evolution` &>bt.txt

Comment 5 Milan Crha 2023-11-14 07:00:27 UTC
Thanks for the update and all the testing. I see Evolution is waiting for a response from the gpg process, it's asking for a public key for your test account, to add it into the message for the Autocrypt feature.

Those `--list-secret-key` processes are suspicious, they should end quickly as well. Maybe there's a problem with gpg itself. I'll check it here and let you know when I know more. Thank you again for your help on this.

Comment 6 Milan Crha 2023-11-14 10:56:28 UTC
I've installed a Fedora 39 machine, which does not exhibit the problem, but when I install a new virtual machine from Fedora-Workstation-Live-x86_64-39-1.5.iso and update it to the latest bits (which includes gnupg2-2.4.3-4.fc39.x86_64), then even

    gpg --list-keys

is stuck and waiting for nothing. Running `gdb gpg --ex "r --list-keys"` shows the below backtrace when it's waiting.

(gdb) bt
#0  0x00007ffff7b897dd in recvmsg () from /lib64/libc.so.6
#1  0x00007ffff7c8ab57 in __assuan_recvmsg (ctx=ctx@entry=0x55555567c280, fd=fd@entry=4, msg=msg@entry=0x7fffffffcda0, flags=flags@entry=0)
    at /usr/src/debug/libassuan-2.5.6-2.fc39.x86_64/src/system-posix.c:133
#2  0x000055555556f6e1 in _assuan_npth_recvmsg (ctx=0x55555567c280, fd=4, msg=0x7fffffffcda0, flags=0)
    at /usr/src/debug/gnupg2-2.4.3-4.fc39.x86_64/g10/gpg.c:1085
#3  0x00007ffff7c88c39 in _assuan_recvmsg (flags=0, msg=0x7fffffffcda0, fd=<optimized out>, ctx=0x55555567c280)
    at /usr/src/debug/libassuan-2.5.6-2.fc39.x86_64/src/system.c:258
#4  uds_reader (ctx=0x55555567c280, buf=<optimized out>, buflen=<optimized out>) at /usr/src/debug/libassuan-2.5.6-2.fc39.x86_64/src/assuan-uds.c:113
#5  0x00007ffff7c88ea0 in readline (ctx=ctx@entry=0x55555567c280, buf=buf@entry=0x55555567c3d0 "", buflen=buflen@entry=1002, 
    r_nread=r_nread@entry=0x7fffffffce9c, r_eof=r_eof@entry=0x55555567c3cc) at /usr/src/debug/libassuan-2.5.6-2.fc39.x86_64/src/assuan-buffer.c:79
#6  0x00007ffff7c89c3b in _assuan_read_line (ctx=ctx@entry=0x55555567c280) at /usr/src/debug/libassuan-2.5.6-2.fc39.x86_64/src/assuan-buffer.c:151
#7  0x00007ffff7c8c568 in assuan_client_read_response (ctx=ctx@entry=0x55555567c280, line_r=line_r@entry=0x7fffffffcfb0, 
    linelen_r=linelen_r@entry=0x7fffffffcfac) at /usr/src/debug/libassuan-2.5.6-2.fc39.x86_64/src/client.c:87
#8  0x00007ffff7c8c921 in _assuan_read_from_server (ctx=ctx@entry=0x55555567c280, response=response@entry=0x7fffffffd004, off=off@entry=0x7fffffffd000, 
    convey_comments=convey_comments@entry=0) at /usr/src/debug/libassuan-2.5.6-2.fc39.x86_64/src/client.c:209
#9  0x00007ffff7c8caec in _assuan_connect_finalize (ctx=ctx@entry=0x55555567c280, fd=fd@entry=4, flags=flags@entry=1)
    at /usr/src/debug/libassuan-2.5.6-2.fc39.x86_64/src/assuan-socket-connect.c:125
#10 0x00007ffff7c91824 in assuan_socket_connect (ctx=0x55555567c280, name=<optimized out>, name@entry=0x55555567d150 "/run/user/1000/gnupg/S.keyboxd", 
    server_pid=server_pid@entry=0, flags=flags@entry=1) at /usr/src/debug/libassuan-2.5.6-2.fc39.x86_64/src/assuan-socket-connect.c:343
#11 0x0000555555621859 in start_new_service.constprop.0 (r_ctx=r_ctx@entry=0x7fffffffd290, module_name_id=module_name_id@entry=13, program_name=0x0, 
    opt_lc_ctype=opt_lc_ctype@entry=0x0, opt_lc_messages=opt_lc_messages@entry=0x0, session_env=session_env@entry=0x0, autostart=1, verbose=0, debug=0, 
    status_cb_arg=0x55555567b3c0, status_cb=0x0, errsource=GPG_ERR_SOURCE_GPG) at ../common/asshelp.c:447
#12 0x000055555558330c in start_new_keyboxd (errsource=GPG_ERR_SOURCE_GPG, status_cb=0x0, status_cb_arg=0x55555567b3c0, debug=<optimized out>, 
    verbose=<optimized out>, autostart=<optimized out>, keyboxd_program=<optimized out>, r_ctx=0x7fffffffd290) at ../common/asshelp.c:629
#13 create_new_context (r_ctx=0x55555567d998, ctrl=0x55555567b3c0) at /usr/src/debug/gnupg2-2.4.3-4.fc39.x86_64/g10/call-keyboxd.c:143
#14 open_context (r_kbl=0x55555567d530, ctrl=0x55555567b3c0) at /usr/src/debug/gnupg2-2.4.3-4.fc39.x86_64/g10/call-keyboxd.c:219
#15 keydb_new (ctrl=0x55555567b3c0) at /usr/src/debug/gnupg2-2.4.3-4.fc39.x86_64/g10/call-keyboxd.c:277
#16 0x00005555555b5170 in list_all (ctrl=ctrl@entry=0x55555567b3c0, secret=secret@entry=0, mark_secret=0)
    at /usr/src/debug/gnupg2-2.4.3-4.fc39.x86_64/g10/keylist.c:753
#17 0x00005555555b5851 in public_key_list (ctrl=0x55555567b3c0, list=0x0, locate_mode=0, no_local=0)
    at /usr/src/debug/gnupg2-2.4.3-4.fc39.x86_64/g10/keylist.c:201
#18 0x000055555556b9f3 in main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/gnupg2-2.4.3-4.fc39.x86_64/g10/gpg.c:4733



When I move away ~/.gnupg/ directory and run `gpg --list-keys`, I get this:

   $ gpg --list-keys
   gpg: directory '/home/user/.gnupg' created
   gpg: /home/user/.gnupg/trustdb.gpg: trustdb created
   gpg: DBG: kbx_client_data_cmd: finished command with error: Not initialized
   gpg: keydb_search_first failed: Not initialized

and it finishes immediately. Re-running the command shows:

   $ gpg --list-keys
   gpg: DBG: kbx_client_data_cmd: finished command with error: Not initialized
   gpg: keydb_search_first failed: Not initialized

from which I understand gnupg is not able to initialize its own directory from scratch for some reason. Interestingly, removing the directory on the machine gpg worked makes the directory recration work fine. Both are up-to-date Fedora 39 workstations using the same gnupg2 packages.

I also tried to move ~/.gnupg from the good machine to the bad machine and it worked fine (by an accident both machines had been renamed to `f39`, if that matters - I noticed the hostname is stored in some gpg files). Since then, even when I return back the "bad" directory, the `gpg --list-keys` is happy, even when I remove the directory and let gpg recreate it, then it begins to work. This is truly weird.



In any case, as you mentioned it at the beginning, there's a problem with gnupg, thus I move this there for further investigation. I do not think my above test with the removal of the .gnupg/ directory made during install/user-creation is related here, it's only a side problem.

Comment 7 Jakub Jelen 2023-11-15 10:18:49 UTC
I just installed fresh Fedora 39 VM (with updated packages including gnupg2-2.4.3-4.fc39.x86_64) and running just gpg --list-keys work ok

  [root@fedora39 ~]# gpg --list-keys
  gpg: directory '/root/.gnupg' created
  gpg: /root/.gnupg/trustdb.gpg: trustdb created

Downgrading or upgrading gnupg (only versions -2 and -4 are in repositories now) looks working ok.

Could you make sure that on fresh installation/after reboot with the -4 version it still hangs for you?

The version -3 was bad as I was too fast fixing things, but if you just dnf update, there might still be some daemon services running that gnupg wants to talk to and fails.

Comment 8 Milan Crha 2023-11-15 12:27:32 UTC
> if you just dnf update, there might still be some daemon services running that gnupg wants to talk to and fails.

Do you mean after `dnf udpate` the things should just work or you mean to restart the machine after the `dnf update`, to make things work properly?

Comment 9 Jakub Jelen 2023-11-15 15:06:44 UTC
Likely something starts the gnupg tools/daemons in the background so dnf update might not restart them so rebooting should be safest option.

Comment 10 Milan Crha 2023-11-16 15:12:54 UTC
Okay, I gave this another go. Steps I've done, which are close to Peter's steps from the comment #3:
a) install Fedora-Workstation-Live-x86_64-39-1.5.iso into a new virtual machine
b) boot the installed machine
    - it has gnupg2-2.4.3-2.fc39.x86_64
    - there is running: gpg-agent --homedir /var/lib/fwupd/gnupg --use-standard-socket --daemon
    - I do not know what started it, this is the first run after finishing initial setup wizard
c) search in the Activities for "terminal" and run it
d) execute in the terminal: sudo dnf install evolution seahorse
e) close the terminal
f) open Settings->Sharing and change the host name from "fedora" to, say, "f39"
g) restart the machine, disable install of pending updates
h) login and search for "evolution" in Activities and run it
i) create a test account, then press Ctrl+N (in the Mail window)
j) the evolution is frozen; once asked by GNOME Shell "Force Quit" it
k) search for terminal and open it
l) run: sudo dnf update gnupg2 ; you get gnupg2-2.4.3-4.fc39.x86_64
m) reboot the machine
n) login, then search for terminal and run it
o) execute: gpg --list-keys
   * it's still frozen, the restart did not help
p) break it (Ctrl+C), then run: sudo dnf update ; many packages are updated
q) once the update is finished run `reboot`
r) login, search for terminal, then run `gpg --list-keys`
   * it's frozen; verified the installed version is gnupg2-2.4.3-4.fc39.x86_64 also with gnupg2-smime of the same version
s) reboot the machine, boot it, in gdb press Ctrl+Alt+F4 and login to the text console, not GUI
t) run: gpg --list-keys
   * there's a message: "gpg: removing stale lock file (created by 2902)
   * the gpg is stuck
   * Ctrl+C it
u) run: sudo hostnamectl set-hostname fedora
   * it's the original name, as set by the Live image setup
   * run: reboot
v) login to the GUI, search for "terminal" and run there: gpg --list-keys
   * there are repeating messages: gpg: waiting for lock (held by 1786) ...
   * `ps ax | grep 1786` doesn't find a hit, there is no such process running

Thus, it looks like this is about some stall lock file, from previously named machine, which blocks the process when the machine is renamed. Why install of seahorse and evolution is needed here I've no idea, they just help to reproduce it.

Comment 11 Peter Simonyi 2023-11-18 11:48:24 UTC
Evolution is not required to trigger the GPG bug, but it makes it obviously visible.  Seahorse is only needed because it uses GPGME in its search provider, which is why triggering search in the Overview is required in the Comment #3 steps.  You can verify this by disabling its search provider (without triggering a search first!) and see that Evolution doesn't freeze.

Distilling Milan's work in comment 10, here are minimal steps to reproduce the GPG hang without other software:

1. Install Fedora Workstation 39 from the live image and open a terminal (I don't think it matters how).
2. (Optionally, sudo dnf upgrade gnupg2)
3. gpg --list-keys
   This reports that it created ~/.gnupg and returns immediately.
4. Change the hostname (either from Gnome Settings or with sudo hostnamectl set-hostname whatever)
5. reboot
6. gpg --list-keys

After the reboot, all attempts to run gpg --list-keys will hang.

This happens both with the original -2 version and with upgrading to -4 in step 2.

Comment 12 Ankur Sinha (FranciscoD) 2023-11-22 15:51:22 UTC
Not sure if this is the same issue, but two reports on Ask Fedora look similar where gpg hangs:

https://discussion.fedoraproject.org/t/gpg-blocking-forever-cant-git-commit-as-a-result/96605/7
https://discussion.fedoraproject.org/t/weird-libreoffice-freeze-due-to-gnupg/96360

There, the issue was the new "keyboxd" bit that gpg uses by default in v2.4.1+, and the "fix" was to disable it in ~/.gnupg/common.conf.

Comment 13 Jakub Jelen 2023-11-23 09:50:26 UTC
Thank you for the investigation and adding another data points. So this is the issue with the new 2.4.1+ series of GnuPG.

Interesting. So the trigger is the hostname change? Following the comment #11, without the reboot, I do not see the same issue yet, regardless of killing the keyboxd.

Comment 14 Milan Crha 2023-11-23 11:16:57 UTC
It turned out to be very tricky to reproduce. For example I tried to install only seahorse, but it was not enough. Installing both seahorse and evolution did trigger the problem (using the fresh machine installed from the live DVD).

Comment 15 soconfused 2023-11-23 17:57:23 UTC
(In reply to Jakub Jelen from comment #13)
> Interesting. So the trigger is the hostname change? Following the comment
> #11, without the reboot, I do not see the same issue yet, regardless of
> killing the keyboxd.

I was the one posting on Ask Fedora about the Libreoffice GnuPG related freeze. Just wanted to mention that there was actually a hostname change in my case too - I changed the machine name which changed the hostname.

Comment 16 Jakub Jelen 2023-11-24 16:06:41 UTC
Finally got to reproduce this on Fedora 39 and debug this a bit. 

Debug logs from gpg do not show much information, just that it starts the keyboxd (and after it is killed we continue):

tester@whatever:~$ gpg --list-keys --debug-all 2>&1 | tee /tmp/gpg-debug
gpg: reading options from '[cmdline]'
gpg: reading options from '/home/tester/.gnupg/common.conf'
gpg: enabled debug flags: packet mpi crypto filter iobuf memory cache memstat trust hashing ipc clock lookup extprog
gpg: enabled compatibility flags:
gpg: DBG: [no clock] start
gpg: using pgp trust model
gpg: DBG: [no clock] keydb_new
gpg: no running keyboxd - starting '/usr/libexec/keyboxd'
gpg: removing stale lockfile (created by 3458)
gpg: waiting for the keyboxd to come up ... (5s)
gpg: DBG: chan_4 <- [error: Connection reset by peer]

In the meantime, I see the keyboxd process running:

[tester@whatever ~]$ ps auxf | grep keybox
tester      2989  0.0  0.0 225884  1548 ?        Ss   16:37   0:00  \_ keyboxd --homedir /home/tester/.gnupg --daemon

After killing it and trying to start it manually, I can see that what the keyboxd is waiting for (the last message repeats indefinitely):

[tester@whatever ~]$ /usr/libexec/keyboxd --debug-all --homedir /home/tester/.gnupg --daemon
keyboxd[3548]: reading options from '[cmdline]'
keyboxd[3548]: reading options from '/home/tester/.gnupg/common.conf'
keyboxd[3548]: enabled debug flags: mpi crypto memory cache memstat hashing ipc clock lookup
keyboxd[3548]: listening on socket '/run/user/1000/gnupg/S.keyboxd'
keyboxd[3549]: waiting for lock (held by 2886) ...

Oh ... there is a lock file hold by a process that no longer exists! This one even with the old hostname in the filename, from yesterday:

tester@whatever:~1$ ls -al .gnupg/public-keys.d/ | grep 2886
-rw-r--r--. 2 tester tester    18 Nov 23 17:40 .#lk0x000056307f85fe60.fedora.2886

In the first output, we see that the gpg removes some stale locks, but probably ignores the ones from different hostnames. This is probably reasonable countermeasure when the home is shared among multiple machines though. But given that this is causing a dead lock and the locks are stale anyway, I think there needs to be some reasonable way out.

I am not sure if I have a knowledge to debug this further so I will open an usptream issue

Comment 17 Jakub Jelen 2023-11-24 16:11:33 UTC
https://dev.gnupg.org/T6838

Comment 18 Michael Vorburger.ch 2024-01-04 21:24:50 UTC
+1 this just affected me too; I've found this issue via search, seems crazy (to me) that a hostname change would break gnupg; blogged about it onhttps://github.com/vorburger/vorburger.ch-Notes/blob/develop/security/gpg-hangs-on-hostname-change.md.


Note You need to log in before you can comment on or make changes to this bug.