Bug 2051971 - Default debuginfo download makes valgrind appear stuck when no net, and is a privacy problem
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: valgrind
Version: 35
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Mark Wielaard
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-02-08 13:01 UTC by Erkki Ruohtula
Modified: 2022-12-13 16:36 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2022-12-13 16:36:29 UTC
Type: Bug
Embargoed:



Description Erkki Ruohtula 2022-02-08 13:01:29 UTC
Description of problem:
This problem may not be in Valgrind per se, but in how it is deployed in Fedora 35 (possibly appeared earlier, but I never used Fedora 34). A bad combination of well-meaning changes.

After updating to Fedora 35, I found that running "valgrind" on a program I had just compiled got stuck, but not always. I finally had the sense to try the -v option, which revealed that Valgrind was trying to download debug symbols from outside, but at that time the machine had no net. More precisely, it was on an intranet, and the session did not have any environment variables set for HTTP proxies.

fedora:.../$ valgrind -v ./test
==33596== Memcheck, a memory error detector
==33596== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==33596== Using Valgrind-3.18.1-42b08ed5bd-20211015 and LibVEX; rerun with -h for copyright info
==33596== Command: ./test
==33596== 
--33596-- Valgrind options:
--33596--    -v
--33596-- Contents of /proc/version:
--33596--   Linux version 5.14.10-300.fc35.x86_64 (mockbuild.fedoraproject.org) (gcc (GCC) 11.2.1 20210728 (Red Hat 11.2.1-1), GNU ld version 2.37-10.fc35) #1 SMP Thu Oct 7 20:48:44 UTC 2021
--33596-- 
--33596-- Arch and hwcaps: AMD64, LittleEndian, amd64-cx16-rdtscp-sse3-ssse3-avx-f16c-rdrand
--33596-- Page sizes: currently 4096, max supported 4096
--33596-- Valgrind library directory: /usr/libexec/valgrind
--33596-- Reading syms from /home/ruohtula/tmp/test
==33596== Downloading debug info for /home/ruohtula/tmp/test...

Reading the fine manual reveals that Valgrind is at this point trying to reach a server given by the variable DEBUGINFOD_URLS, which Fedora had set to https://debuginfod.fedoraproject.org/, but as noted this was unreachable.

The "test" program above had no symbols in the main program (not compiled with -g). Compiling with -g gets past this, but then gets stuck later when it tries to download symbols for libc.

This behaviour has two problems: The net is not always reachable, and if not the program looks like it is stuck. Figuring this out took some time.

The second problem is that even if there is connectivity, I may not want to reach out to the Internet every time I want to test a program with Valgrind. It may in fact reveal information that I would like to keep private (the program name may give away information).

The workaround is of course removing the variable, but I think having it in the environment by default, and Valgrind automatically using it is very problematic. Valgrind is normally used for programs being developed on the user's machine, so their symbols are almost never on any server, let alone a public server. The feature can be useful, but it should be required to be enabled explicitly in the case of Valgrind.

Version-Release number of selected component (if applicable):
Fedora 35

Valgrind-3.18.1-42b08ed5bd-20211015 

How reproducible:

Run Valgrind on any binary while not having network connectivity.

Actual results:
Valgrind appears to be stuck.

Expected results:
Program runs, with possible diagnostics from Valgrind.

Additional info:
-

Comment 1 Mark Wielaard 2022-02-08 13:17:58 UTC
(In reply to Erkki Ruohtula from comment #0)
> Reading the fine manual reveals Valgrind is at this point trying to reach a
> server given by variable DEBUGINFOD_URLS, which Fedora had set to
> https://debuginfod.fedoraproject.org/
>  but as noted this was unreachable.
> 
> The "test" program above had no symbols in the main program (not compiled
> with -g). Compiling with -g gets past this, but then gets stuck later when
> it tries to download symbols for libc.
> 
> This behaviour has two problems: The net is not always reachable, and if not
> the program looks like it is stuck. Figuring this out took some time.

Sorry about that. As you say this is not really specific to valgrind, but caused
by Fedora 35 enabling DEBUGINFOD_URLS by default.
valgrind simply spawns debuginfod-find to get the debuginfo files.
In theory that should simply detect no network is available and/or simply time out quickly
(and add a negative caching note so the same query isn't tried again).

Could you try debugging your program with gdb and see if gdb handles this better/differently?

You could also try running:
$ debuginfod-find --verbose debuginfo /home/ruohtula/tmp/test
to see what the debuginfod-find program that valgrind uses is actually doing.

> The second problem is that even if there is connectivity, I may not want to
> reach out to the Internet every time I want to test a program with Valgrind.
> It may in fact reveal information that I would like to keep private (the
> program name may give away information).

The only information "revealed" is the build-ids of the loadable ELF objects.
If unknown to the debuginfod.fedoraproject.org server they don't provide any information themselves
(they are globally unique hashes).
Some more background is in the privacy section of https://fedoraproject.org/wiki/Debuginfod#Security
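
To make concrete what leaves the machine, here is a minimal sketch that assembles the query URL for one build-id. The URL layout matches the "url 0" line in the debuginfod-find --verbose output shown in comment 2, and the example build-id is taken from that same output. The function name debuginfod_url is made up for illustration and is not part of any tool.

```shell
# Hypothetical helper: print the debuginfo query URL for a build-id.
# $1 = server base URL (as found in DEBUGINFOD_URLS), $2 = build-id hash.
debuginfod_url() {
    base=${1%/}        # drop one trailing slash, if present
    buildid=$2
    printf '%s/buildid/%s/debuginfo\n' "$base" "$buildid"
}

# Server and build-id as they appear in this report.
debuginfod_url "https://debuginfod.fedoraproject.org/" \
    a371d42d7de3a2984df0cdbbfd421e7407c2e5df

# The build-id of a local binary can usually be read with
#   readelf -n ./test | grep 'Build ID'
# (assumes binutils is installed; not run here).
```

Only the hash appears in the request path. The program name and its local path are not part of the URL.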

Comment 2 Erkki Ruohtula 2022-02-08 19:12:36 UTC
Hi, thanks for quick response!

> In theory that should simply detect no network is available

A network was available in my case (intranet), you just cannot reach an outside URL
without going through a proxy, which was not defined. GDB did not get stuck.
debuginfod-find seems to keep trying in vain:

fedora:.../$ debuginfod-find --verbose debuginfo test
debuginfod_find_debuginfo a371d42d7de3a2984df0cdbbfd421e7407c2e5df
server urls "https://debuginfod.fedoraproject.org/ "
checking build-id
checking cache dir /home/ruohtula/.cache/debuginfod_client
using timeout 90
init server 0 https://debuginfod.fedoraproject.org/buildid
url 0 https://debuginfod.fedoraproject.org/buildid/a371d42d7de3a2984df0cdbbfd421e7407c2e5df/debuginfo
query 1 urls in parallel
Progress 1 / 0
Progress 22 / 0
Progress 24 / 0
Progress 25 / 0
Progress 26 / 0
Progress 27 / 0
Progress 28 / 0
Progress 29 / 0
Progress 30 / 0
Progress 31 / 0
Progress 32 / 0
Progress 33 / 0
Progress 34 / 0
Progress 35 / 0
Progress 36 / 0
Progress 37 / 0
Progress 38 / 0
Progress 39 / 0
Progress 40 / 0
Progress 41 / 0
Progress 42 / 0
Progress 43 / 0
Progress 44 / 0
Progress 45 / 0
Progress 46 / 0
Progress 47 / 0
Progress 48 / 0
Progress 49 / 0
^C

Good to know there is only a hash being passed, so fewer privacy problems (aside from
signaling that a certain IP searched for debug symbols).
But I really did not like Valgrind (which I have used about as long as it has existed)
suddenly doing internet accesses behind my back. Even if there is no
connectivity issue, the access makes the first run pause noticeably (After that the data is in cache).
Guess I will have to start undefining DEBUGINFOD_URLS for it.

Comment 3 Mark Wielaard 2022-02-09 11:35:40 UTC
(In reply to Erkki Ruohtula from comment #2)
> > In theory that should simply detect no network is available
> 
> A network was available in my case (intranet), you just cannot reach an
> outside URL
> without going through a proxy, which was not defined. GDB did not get stuck.
> debuginfod-find seems to keep trying in vain:

That is interesting. The underlying code should be the same. The only real difference is that valgrind tries to get the debuginfo upfront while gdb only tries to fetch it on first use. Does gdb provide any feedback? Could you show a gdb debugging session where it tries to connect to debuginfod.fedoraproject.org but gives up?

I wonder if gdb uses some different timeout heuristics.

> Even if there is no
> connectivity issue, the access makes the first run pause noticeably (After
> that the data is in cache).

That is interesting. So it does eventually work as intended without any pauses?
How long is the noticeable pause?

Maybe valgrind needs to be a little bit more verbose if we are waiting for a timeout to make clear what is going on.

Comment 4 Erkki Ruohtula 2022-02-09 16:47:27 UTC
GDB asks before loading. But I am mystified why it did not ask yesterday.
Caching? Or my shell environment had the magic var disabled?
Anyway, this is from a clean slate:

fedora:.../tmp 18:29:05$gdb a.out
GNU gdb (GDB) Fedora 11.1-6.fc35
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from a.out...

This GDB supports auto-downloading debuginfo from the following URLs:
https://debuginfod.fedoraproject.org/ 
Enable debuginfod for this session? (y or [n]) y
Debuginfod has been enabled.
To make this setting permanent, add 'set debuginfod enabled on' to .gdbinit.

// GDB also got stuck...

^CCancelling download of separate debug info for /home/ruohtula/tmp/a.out...
(No debugging symbols found in a.out)
(gdb) 

> How long is the noticeable pause?

This is Valgrind on a freshly compiled a.out with no debug info.
Prompt contains the time, and the program also prints the time:

fedora:.../tmp 18:38:31$valgrind a.out
==38921== Memcheck, a memory error detector
==38921== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==38921== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==38921== Command: a.out
==38921== 
Time now: Wed Feb  9 18:38:43 2022
==38921== 
==38921== HEAP SUMMARY:
==38921==     in use at exit: 0 bytes in 0 blocks
==38921==   total heap usage: 9 allocs, 9 frees, 6,844 bytes allocated
==38921== 
==38921== All heap blocks were freed -- no leaks are possible
==38921== 
==38921== For lists of detected and suppressed errors, rerun with: -s
==38921== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
fedora:.../tmp 18:38:43$

So, about a 12-second delay. This test was run over a home ADSL connection (good enough for watching Netflix in HD).
The second time, it runs in about a second.

Comment 5 Aaron Merey 2022-02-09 17:51:48 UTC
I tried to reproduce this while connected to my home network and unable to connect to external URLs.

I did not see a difference in the time GDB and valgrind spent trying to query debuginfod.fedoraproject.org. FWIW firefox also stalled in a similar fashion.

(In reply to Erkki Ruohtula from comment #4)
> GDB asks before loading. But I am mystified why it did not ask yesterday-
> Caching? Or my shell environment had the magic var disabled?

Yes, if $DEBUGINFOD_URLS is unset then GDB won't attempt to use debuginfod.

Valgrind is typically used while debugging programs being developed locally but it's still useful to have shared library debuginfo fetched automatically to improve stack traces.

Maybe the solution here is for valgrind to display debuginfod-related messages by default instead of requiring -v. $DEBUGINFOD_TIMEOUT also provides a way to limit the time spent on each query.
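
As a rough sketch of the $DEBUGINFOD_TIMEOUT suggestion, the helper below runs a single command with a shorter timeout than the 90 seconds reported by debuginfod-find in comment 2. The function name with_debuginfod_timeout and the value 5 are assumptions chosen for illustration.

```shell
# Hypothetical helper: run a command with DEBUGINFOD_TIMEOUT set (in seconds)
# for that command only. The caller's environment is left untouched.
with_debuginfod_timeout() {
    timeout_secs=$1
    shift
    env DEBUGINFOD_TIMEOUT="$timeout_secs" "$@"
}

# Example (not run here, requires valgrind):
#   with_debuginfod_timeout 5 valgrind ./a.out
```

Exporting DEBUGINFOD_TIMEOUT in a shell profile would apply the same limit to every debuginfod client in the session, not only to valgrind.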

Comment 6 Erkki Ruohtula 2022-02-10 09:13:44 UTC
Yes, I agree having Valgrind display information about debug info fetching would help. It would have told me immediately why it seems to be stuck, so I could work around it.
Maybe it could also have an option to disable the fetching (like GDB has)?

$ gdb a.out
....
This GDB supports auto-downloading debuginfo from the following URLs:
https://debuginfod.fedoraproject.org/ 
Enable debuginfod for this session? (y or [n]) y
Debuginfod has been enabled.
To make this setting permanent, add 'set debuginfod enabled on' to .gdbinit.
^CCancelling download of separate debug info for /home/ruohtula/tmp/a.out...
(No debugging symbols found in a.out)
(gdb) Quit
$
$ echo set debuginfod enabled off > ~/.gdbinit
$ gdb a.out
....
Reading symbols from a.out...
(No debugging symbols found in a.out)
(gdb) 


One might want Valgrind to not try to use debuginfod even if other tools in the session use it.
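
Until such an option exists, one possible workaround is to drop DEBUGINFOD_URLS for individual commands only. This is a minimal sketch. The helper name without_debuginfod is made up, and it relies on the behaviour described in comment 5 for GDB: with the variable unset, no debuginfod lookup is attempted.

```shell
# Hypothetical helper: run one command with DEBUGINFOD_URLS removed from its
# environment. The subshell keeps the variable intact for the rest of the session.
without_debuginfod() {
    (
        unset DEBUGINFOD_URLS
        "$@"
    )
}

# Example (not run here, requires valgrind):
#   without_debuginfod valgrind ./a.out
```

Other tools started from the same shell, such as gdb, still see DEBUGINFOD_URLS and can keep using debuginfod.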

Comment 7 Ben Cotton 2022-11-29 17:51:02 UTC
This message is a reminder that Fedora Linux 35 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora Linux 35 on 2022-12-13.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
'version' of '35'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, change the 'version' 
to a later Fedora Linux version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora Linux 35 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora Linux, you are encouraged to change the 'version' to a later version
prior to this bug being closed.

Comment 8 Ben Cotton 2022-12-13 16:36:29 UTC
Fedora Linux 35 entered end-of-life (EOL) status on 2022-12-13.

Fedora Linux 35 is no longer maintained, which means that it
will not receive any further security or bug fix updates. As a result we
are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora Linux
please feel free to reopen this bug against that version. Note that the version
field may be hidden. Click the "Show advanced fields" button if you do not see
the version field.

If you are unable to reopen this bug, please file a new report against an
active release.

Thank you for reporting this bug and we are sorry it could not be fixed.

