Bug 1566639

Summary: "debuginfo reader: ensure_valid failed" on libglvnd-glx-debuginfo
Product: [Fedora] Fedora
Component: valgrind
Version: rawhide
Status: CLOSED ERRATA
Severity: unspecified
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Reporter: Adam Jackson <ajax>
Assignee: Mark Wielaard <mjw>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: dodji, ignatenko, jakub, mjw, packaging-team-maint, pmatilai, pmoravco, vmukhame
Fixed In Version: valgrind-3.13.0-18.fc28
Last Closed: 2018-04-17 00:18:41 UTC
Type: Bug

Description Adam Jackson 2018-04-12 16:28:38 UTC
With F28, trying to valgrind the X server explodes when processing the debuginfo for one of its loaded libraries:

desoxy:~/git/xserver% valgrind /usr/bin/Xvfb :10
==7018== Memcheck, a memory error detector
==7018== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==7018== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==7018== Command: ./build/hw/vfb/Xvfb :10
==7018== Valgrind: debuginfo reader: ensure_valid failed:
==7018== Valgrind:   during call to ML_(img_get)
==7018== Valgrind:   request for range [460632, +12) exceeds
==7018== Valgrind:   valid image size of 333064 for image:
==7018== Valgrind:   "/usr/lib/debug/.build-id/3e/30f2307639da3a66b4c72c310049c659461253.debug"
==7018== Valgrind: debuginfo reader: Possibly corrupted debuginfo file.
==7018== Valgrind: I can't recover.  Giving up.  Sorry.
desoxy:~/git/xserver% rpm -qf /usr/lib/debug/.build-id/3e/30f2307639da3a66b4c72c310049c659461253.debug
desoxy:~/git/xserver% rpm -q valgrind

Filing this as a valgrind bug as I don't think we're doing anything
special in the libglvnd build that would emit broken dwarf.

Comment 1 Adam Jackson 2018-04-12 17:58:24 UTC
Moving this to rpm. Rebuilding libglvnd with %global debug_package %{nil} produces debuggable libraries, so this has to be a problem in find-debuginfo.sh or something it calls.

Comment 2 Mark Wielaard 2018-04-12 19:05:52 UTC
Replicated. But not yet investigated.

This is the valgrind backtrace (run under gdb) when ensure_valid_failed is hit:

#0  ensure_valid_failed (offset=460632, size=12, 
    caller=caller@entry=0x582221e0 "ML_(img_get)", img=<optimized out>, 
    img=<optimized out>) at m_debuginfo/image.c:1052
#1  0x00000000580d876e in ensure_valid (caller=0x582221e0 "ML_(img_get)", 
    size=12, offset=460632, img=0x1002976b30) at m_debuginfo/image.c:1076
#2  vgModuleLocal_img_get (dst=dst@entry=0x1002eace74, 
    img=img@entry=0x1002976b30, offset=offset@entry=460632, size=size@entry=12)
    at m_debuginfo/image.c:1085
#3  0x0000000058001522 in find_buildid (img=img@entry=0x1002976b30, 
    rel_ok=rel_ok@entry=0 '\000', search_shdrs=search_shdrs@entry=1 '\001')
    at m_debuginfo/readelf.c:1150
#4  0x00000000580017c6 in open_debug_file (
    name=name@entry=0x1002a4e1e0 "/usr/lib/debug/.build-id/3e/30f2307639da3a66b4c72c310049c659461253.debug", 
    buildid=buildid@entry=0x10028847b0 "3e30f2307639da3a66b4c72c310049c659461253", crc=crc@entry=0, rel_ok=rel_ok@entry=0 '\000', 
    serverAddr=serverAddr@entry=0x0) at m_debuginfo/readelf.c:1252
#5  0x000000005800192a in find_debug_file (di=di@entry=0x1002b8b960, 
    objpath=0x1002c8bec0 "/usr/lib64/libGL.so.1.7.0", 
    buildid=buildid@entry=0x10028847b0 "3e30f2307639da3a66b4c72c310049c659461253", 
    debugname=debugname@entry=0x1002a4e180 "libGL.so.1.7.0-1.0.1-0.1.20180226gitb029c24.fc28.x86_64.debug", crc=crc@entry=1951608855, 
    rel_ok=rel_ok@entry=0 '\000') at m_debuginfo/readelf.c:1308

Comment 3 Mark Wielaard 2018-04-12 19:13:39 UTC
This does look like a valgrind issue. The backtrace in comment 2 shows the failure comes from "find_buildid". Looking at the source, it tries to get the build-id first through the phdrs and, if that fails, it should fall back to getting it through the shdrs. But the phdrs in a .debug file aren't reliable, which is why reading the PT_NOTE fails. That really shouldn't trigger the "cannot recover" path. It should fall back to finding the build-id through the shdrs.
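
The fallback described above can be sketched roughly as follows. This is a toy model, not valgrind's code: every Toy*/toy_* name is hypothetical, and the "headers" are reduced to plain (offset, size) ranges so the example stays self-contained. It only illustrates the idea that a note range pointing past the end of the image should be rejected and the next source tried, rather than aborting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* All Toy* names are hypothetical; none of this is valgrind code. */
typedef struct { uint64_t offset, size; } ToyRange;

typedef struct {
    uint64_t        image_size;    /* bytes actually present in the file */
    const ToyRange *phdr_notes;    /* note ranges claimed by the phdrs   */
    size_t          n_phdr_notes;
    const ToyRange *shdr_notes;    /* note ranges claimed by the shdrs   */
    size_t          n_shdr_notes;
} ToyDebugFile;

typedef enum { TOY_NOT_FOUND, TOY_FROM_PHDRS, TOY_FROM_SHDRS } ToySource;

/* Overflow-safe test that [offset, offset + size) lies inside the image. */
bool toy_range_ok(uint64_t image_size, ToyRange r)
{
    return r.offset <= image_size && r.size <= image_size - r.offset;
}

/* Copies the first in-bounds range of `table` to `out`; false if none fits. */
static bool toy_first_valid(uint64_t image_size, const ToyRange *table,
                            size_t n, ToyRange *out)
{
    for (size_t i = 0; i < n; i++) {
        if (toy_range_ok(image_size, table[i])) {
            *out = table[i];
            return true;
        }
    }
    return false;
}

/* Try the phdr-derived ranges first, then fall back to the shdr-derived
   ones. An out-of-bounds range is skipped instead of being treated as a
   fatal error. */
ToySource toy_find_note(const ToyDebugFile *f, ToyRange *out)
{
    assert(f != NULL && out != NULL);
    if (toy_first_valid(f->image_size, f->phdr_notes, f->n_phdr_notes, out))
        return TOY_FROM_PHDRS;
    if (toy_first_valid(f->image_size, f->shdr_notes, f->n_shdr_notes, out))
        return TOY_FROM_SHDRS;
    return TOY_NOT_FOUND;
}
```

With the numbers from the report (a request for 12 bytes at offset 460632 in an image of 333064 bytes), toy_range_ok rejects the phdr-derived range, so the lookup moves on to the shdr-derived ranges.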

Comment 4 Mark Wielaard 2018-04-12 19:30:18 UTC
This seems to resolve the issue:

diff --git a/coregrind/m_debuginfo/readelf.c b/coregrind/m_debuginfo/readelf.c
index 70c28e629..8bd3e049c 100644
--- a/coregrind/m_debuginfo/readelf.c
+++ b/coregrind/m_debuginfo/readelf.c
@@ -1137,7 +1137,11 @@ HChar* find_buildid(DiImage* img, Bool rel_ok, Bool search_shdrs)
       ElfXX_Ehdr ehdr;
       ML_(img_get)(&ehdr, img, 0, sizeof(ehdr));
-      for (i = 0; i < ehdr.e_phnum; i++) {
+      /* Skip the phdrs when we have to search the shdrs. In separate
+         .debug files the phdrs might not be valid (they are a copy of
+         the main ELF file) and might trigger assertions when getting
+        image notes based on them. */
+      for (i = 0; !search_shdrs && i < ehdr.e_phnum; i++) {
          ElfXX_Phdr phdr;
          ML_(img_get)(&phdr, img,
                       ehdr.e_phoff + i * ehdr.e_phentsize, sizeof(phdr));

I'll report upstream and build new fedora valgrind packages.
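
Note that the patch does not add a fallback as such: when search_shdrs is set, the extra loop condition means the phdrs are never looked at. A minimal sketch of just that loop guard, assuming a hypothetical helper toy_phdrs_visited (not a valgrind function) that counts how many program headers the loop would visit:

```c
#include <stdbool.h>

/* Hypothetical helper, not part of valgrind: counts the iterations of a
   loop that uses the same guard as the patched phdr loop. */
int toy_phdrs_visited(bool search_shdrs, int e_phnum)
{
    int visited = 0;
    for (int i = 0; !search_shdrs && i < e_phnum; i++)
        visited++;   /* stands in for reading and checking one phdr */
    return visited;
}
```

With search_shdrs true the function returns 0 for any e_phnum, so a .debug file's possibly stale phdrs are never used to read notes. With search_shdrs false the loop visits every phdr as before.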

Comment 5 Adam Jackson 2018-04-12 20:02:00 UTC
Oh very cool. Thanks for figuring it out so quickly!

Comment 6 Fedora Update System 2018-04-12 21:14:28 UTC
valgrind-3.13.0-18.fc28 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2018-6e2b5f0c1e

Comment 7 Fedora Update System 2018-04-15 02:22:37 UTC
valgrind-3.13.0-18.fc28 has been pushed to the Fedora 28 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-6e2b5f0c1e

Comment 8 Fedora Update System 2018-04-17 00:18:41 UTC
valgrind-3.13.0-18.fc28 has been pushed to the Fedora 28 stable repository. If problems still persist, please make note of it in this bug report.