Bug 507682

Summary: (RFE) Refine error description
Product: [Fedora] Fedora
Reporter: Malte Nuhn <nuhn>
Component: elfutils
Assignee: Roland McGrath <roland>
Status: CLOSED WORKSFORME
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: low
Version: rawhide
CC: drepper, fche, jlebon, mjw, roland
Target Milestone: ---
Keywords: FutureFeature
Target Release: ---
Hardware: All
OS: Linux
Doc Type: Enhancement
Last Closed: 2023-03-08 18:40:09 UTC
Attachments:
Description Flags
patch to improve error if compression is not supported none

Description Malte Nuhn 2009-06-23 18:33:05 UTC
While using SystemTap, I built and installed some debuginfo from an SRPM. Unfortunately, I installed only the self-built debuginfo and not the newly built library itself. This must have led to an inconsistent pair of "libxxx.so" and "libxxx.so.debug".

When using SystemTap, I got the message:

"WARNING: cannot find module /lib64/libc-2.5.so debuginfo: No DWARF information found'"

Frank Ch. Eigler told me that (at least one part of) the above warning is produced by elfutils, so here's what I guess might be improved in elfutils:

The misleading part of the above warning is that the DWARF information existed but did not match the object file. It might save future SystemTap users a lot of time if the warning were refined as follows:

1) No debuginfo found - Keep the message as it is
2) Debuginfo found but some error occurs (doesn't match, ...)

"WARNING: cannot find module /lib64/libc-2.5.so debuginfo: No _matching_ DWARF information found'"

I am using Systemtap "SystemTap translator/driver (version 0.9.8/0.141 non-git sources)" meaning elfutils 0.141 is affected.
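
The two cases proposed above could be told apart along the following lines. This is only a minimal sketch and not elfutils code: the enum `debuginfo_status` and the functions `classify_debuginfo` and `debuginfo_status_message` are hypothetical names, and the caller is assumed to already know whether a candidate file was found and whether it validated.

```c
#include <stdbool.h>

/* Hypothetical outcome of a debuginfo lookup (not an elfutils type).  */
enum debuginfo_status
{
  DEBUGINFO_OK,         /* a candidate was found and it matches */
  DEBUGINFO_MISSING,    /* case 1: no candidate file was found */
  DEBUGINFO_MISMATCHED  /* case 2: a candidate exists but does not match */
};

/* Map the two facts the caller knows onto one of the outcomes above.  */
static enum debuginfo_status
classify_debuginfo (bool found, bool matches)
{
  if (!found)
    return DEBUGINFO_MISSING;
  return matches ? DEBUGINFO_OK : DEBUGINFO_MISMATCHED;
}

/* Text for the warning; the wording follows the proposal in this report.  */
static const char *
debuginfo_status_message (enum debuginfo_status status)
{
  switch (status)
    {
    case DEBUGINFO_MISSING:
      return "No DWARF information found";
    case DEBUGINFO_MISMATCHED:
      return "No matching DWARF information found";
    default:
      return "";
    }
}
```

With this split, a stale "libxxx.so.debug" would yield the second message, while a system with no debuginfo installed would keep the existing one.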

Comment 1 Malte Nuhn 2009-06-25 15:17:37 UTC
I built a patch for that and tested it. When using systemtap, the user can now differentiate between "non matching" debuginfo and "non existing" debuginfo.

I'm quite sure that the core developers won't like this patch as it is, but perhaps this can be reused and improved by one of you guys.



diff --git a/libdwfl/dwfl_module_getdwarf.c b/libdwfl/dwfl_module_getdwarf.c
index 652383b..6c88362 100644
--- a/libdwfl/dwfl_module_getdwarf.c
+++ b/libdwfl/dwfl_module_getdwarf.c
@@ -218,6 +218,17 @@ find_debuginfo (Dwfl_Module *mod)
 							   debuglink_file,
 							   debuglink_crc,
 							   &mod->debug.name);
+ 
+  if (mod->debug.fd<0 && errno == 1) {
+	/*
+	  we didn't find any _working_ debuginfo, but there
+	  was at least one file that couldn't been validated.
+	  this might sometimes be an interesting information
+	  for end-users.
+	*/
+	  return DWFL_E_VALIDATION_FAILED;
+  }
+
   return open_elf (mod, &mod->debug);
 }
 
diff --git a/libdwfl/find-debuginfo.c b/libdwfl/find-debuginfo.c
index a01293e..9eab772 100644
--- a/libdwfl/find-debuginfo.c
+++ b/libdwfl/find-debuginfo.c
@@ -122,6 +122,7 @@ find_debuginfo_in_path (Dwfl_Module *mod, const char *file_name,
 			const char *debuglink_file, GElf_Word debuglink_crc,
 			char **debuginfo_file_name)
 {
+  bool validation_failed = 0;
   bool cancheck = debuglink_crc != (GElf_Word) 0;
 
   const char *file_basename = file_name == NULL ? NULL : basename (file_name);
@@ -208,12 +209,30 @@ find_debuginfo_in_path (Dwfl_Module *mod, const char *file_name,
 	  *debuginfo_file_name = fname;
 	  return fd;
 	}
+      validation_failed = 1;
       free (fname);
       close (fd);
     }
 
-  /* No dice.  */
+
+  /* we didn't find any working debuginfo */
   errno = 0;
+
+  if (validation_failed) {
+        /* 
+	   we didn't find any _working_ debuginfo, but there
+	   was at least one file that couldn't been validated.
+	   this might sometimes be an interesting information
+	   for end-users. 
+	    
+           This case (fd==-1 and errno==1) will be taken care
+           of in dwfl_module_getdwarf.c:find_debuginfo (Dwfl_Module *mod)
+	   and will produce the DWFL_E_VALIDATION_FAILED error.
+	*/
+
+        errno = 1;
+  }
+
   return -1;
 }
 
diff --git a/libdwfl/libdwflP.h b/libdwfl/libdwflP.h
index 6ba5c96..914f500 100644
--- a/libdwfl/libdwflP.h
+++ b/libdwfl/libdwflP.h
@@ -81,6 +81,7 @@
   DWFL_ERROR (BADSTROFF, N_("offset out of range"))			      \
   DWFL_ERROR (RELUNDEF, N_("relocation refers to undefined symbol"))	      \
   DWFL_ERROR (CB, N_("Callback returned failure"))			      \
+  DWFL_ERROR (VALIDATION_FAILED, N_("Validation of DWARF information failed"))\
   DWFL_ERROR (NO_DWARF, N_("No DWARF information found"))		      \
   DWFL_ERROR (NO_SYMTAB, N_("No symbol table found"))			      \
   DWFL_ERROR (NO_PHDR, N_("No ELF program headers"))			      \
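
The last hunk adds a single entry to a macro list, which suggests the error codes and their messages are generated from one table (the "X-macro" technique). A minimal, self-contained sketch of that technique follows. The names `DEMO_ERRORS`, `DEMO_ERROR`, `demo_error` and `demo_errmsg` are hypothetical and this is not the real libdwflP.h definition; only the two message strings are taken from the patch.

```c
#include <stddef.h>

/* One list of (name, message) pairs.  Hypothetical, modeled on the
   DWFL_ERROR entries shown in the patch.  */
#define DEMO_ERRORS \
  DEMO_ERROR (NOERROR, "no error") \
  DEMO_ERROR (VALIDATION_FAILED, "Validation of DWARF information failed") \
  DEMO_ERROR (NO_DWARF, "No DWARF information found")

/* First expansion: one enumerator per entry.  */
#define DEMO_ERROR(name, text) DEMO_E_##name,
enum demo_error { DEMO_ERRORS DEMO_E_NUM };
#undef DEMO_ERROR

/* Second expansion: the message table, in the same order.  */
#define DEMO_ERROR(name, text) text,
static const char *const demo_messages[] = { DEMO_ERRORS };
#undef DEMO_ERROR

/* Look up the message for a code, with a fallback for unknown codes.  */
static const char *
demo_errmsg (int code)
{
  if (code < 0 || code >= DEMO_E_NUM)
    return "unknown error";
  return demo_messages[code];
}
```

Because both expansions come from the same list, adding a line such as the VALIDATION_FAILED entry keeps the enum and the message table in step without further edits.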

Comment 2 Mark Wielaard 2015-01-21 00:11:36 UTC
*** Bug 1184245 has been marked as a duplicate of this bug. ***

Comment 3 Frank Ch. Eigler 2015-01-21 00:23:08 UTC
It would be helpful to audit all the cases where the summary
NO_DWARF is currently emitted and, if at all possible, to use
detailed sub-errors instead or in addition.  The error message
should help users identify and correct the exact problem, not
just report a mystifying "something is wrong with [dwarf]".

Comment 4 Jonathan Lebon 2015-02-13 16:42:11 UTC
Created attachment 991438 [details]
patch to improve error if compression is not supported

Here's a small potential patch to address the concerns in BZ1184245.

Without patch:
(Assume the executable temp is built with compressed debug sections.)

$ stap -e 'probe process("temp").statement("main:5") { exit() }'
WARNING: cannot find module /home/yyz/jlebon/codebase/systemtap/systemtap/temp debuginfo: No DWARF information found [man warning::debuginfo]
semantic error: while resolving probe point: identifier 'process' at <input>:1:7
        source: probe process("temp").statement("main:5") { exit() }
                      ^

semantic error: no match

Pass 2: analysis failed.  [man error::pass2]
$ 

With patch:

$ stap -e 'probe process("temp").statement("main:5") { exit() }'
WARNING: cannot find module /home/yyz/jlebon/codebase/systemtap/systemtap/temp debuginfo: compressed DWARF unsupported [man warning::debuginfo]
semantic error: while resolving probe point: identifier 'process' at <input>:1:7
        source: probe process("temp").statement("main:5") { exit() }
                      ^

semantic error: no match

Pass 2: analysis failed.  [man error::pass2]
$ 

Also now looking at other potential errors to improve (including the original post for this BZ).

Feedback welcome!

Comment 5 Mark Wielaard 2015-03-11 14:47:55 UTC
(In reply to Jonathan Lebon from comment #4)
> Created attachment 991438 [details]
> patch to improve error if compression is not supported

Thanks. It might be good to move this discussion to the upstream mailing list:
elfutils-devel.org.
https://fedorahosted.org/mailman/listinfo/elfutils-devel
https://git.fedorahosted.org/cgit/elfutils.git/plain/CONTRIBUTING

> Here's a small potential patch to address the concerns in BZ1184245.
> [...]
> Feedback welcome!

I do think we should warn harder when someone tries to configure/elfutils without zlib support. But if they do, then your patch looks like a good idea. I would just remove the // useless here stuff. And we should probably also error in the same way when zlib support is there, but the section failed to decompress.

Comment 6 Frank Ch. Eigler 2015-03-11 15:52:09 UTC
(With the case of the *-uncompression support, would you consider letting elfutils recognize all those extensions, but reject their use at run time with an appropriate error?  That's instead of compiling out even the ".xz" / ".zdebug" / ... suffix/prefix recognition tables.)

Comment 7 Mark Wielaard 2015-03-11 16:16:46 UTC
(In reply to Frank Ch. Eigler from comment #6)
> (With the case of the *-uncompression support, would you consider letting
> elfutils recognize all those extensions, but reject their use at run time
> with an appropriate error?  That's instead of compiling out even the ".xz" /
> ".zdebug" / ... suffix/prefix recognition tables.)

Note that I think what you are suggesting is something different from what the patch in comment #4 is handling. That patch is dealing with .zdebug sections, which are sections that contain data that happens to be zlib compressed as handled by libdw.

I think you are talking about the support for opening ELF images that are compressed through various schemes as handled in libdwfl. But that support doesn't depend on suffix or prefix recognition tables. When libdwfl tries to read any ELF image, it will just try all compiled-in decompression formats until it gets a valid ELF image.
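
The "try every format until a valid ELF image comes out" behaviour described above can be sketched as follows. This is an illustration under stated assumptions and not libdwfl code: the `decoder_fn` signature, `open_image`, `failing_decoder` and `toy_decoder` are hypothetical, and `toy_decoder` merely strips a made-up "ZZ" tag in place of a real decompressor. The only real-world fact relied on is the 4-byte ELF magic.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical decoder signature: returns true and fills OUT on success.  */
typedef bool (*decoder_fn) (const unsigned char *in, size_t in_len,
                            unsigned char *out, size_t out_cap,
                            size_t *out_len);

/* An ELF file starts with the bytes 0x7f 'E' 'L' 'F'.  */
static bool
looks_like_elf (const unsigned char *buf, size_t len)
{
  return len >= 4 && memcmp (buf, "\177ELF", 4) == 0;
}

/* Models a format that does not apply to the input.  */
static bool
failing_decoder (const unsigned char *in, size_t in_len,
                 unsigned char *out, size_t out_cap, size_t *out_len)
{
  (void) in; (void) in_len; (void) out; (void) out_cap; (void) out_len;
  return false;
}

/* Toy stand-in for a real decompressor: accepts input that starts with
   the made-up tag "ZZ" and returns the bytes after it.  */
static bool
toy_decoder (const unsigned char *in, size_t in_len,
             unsigned char *out, size_t out_cap, size_t *out_len)
{
  if (in_len < 2 || in[0] != 'Z' || in[1] != 'Z' || in_len - 2 > out_cap)
    return false;
  memcpy (out, in + 2, in_len - 2);
  *out_len = in_len - 2;
  return true;
}

/* Use the input as is if it is already ELF; otherwise try each decoder
   in turn and accept the first result that looks like ELF.  */
static bool
open_image (const unsigned char *in, size_t in_len,
            const decoder_fn *decoders, size_t ndecoders,
            unsigned char *out, size_t out_cap, size_t *out_len)
{
  if (looks_like_elf (in, in_len))
    {
      if (in_len > out_cap)
        return false;
      memcpy (out, in, in_len);
      *out_len = in_len;
      return true;
    }
  for (size_t i = 0; i < ndecoders; i++)
    if (decoders[i] (in, in_len, out, out_cap, out_len)
        && looks_like_elf (out, *out_len))
      return true;
  return false;
}
```

Note that nothing here looks at the file name, which matches the point that this path does not depend on suffix or prefix tables.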

Or are you talking about dwfl_linux_kernel_report_offline/dwfl_linux_kernel_find_elf which deal with finding compressed kernel module files on disk? That does indeed use a suffix based scheme (see check_suffix in libdwfl/linux-kernel-modules.c). Is your suggestion that the callback should return any of those even though it knows the kernel module file as is cannot be handled? I think that would be OK, the caller would then try to use the file and get an error. I haven't checked which error though (probably just BAD_ELF).
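
The idea of recognizing a compressed module by suffix even when it cannot be decompressed might look roughly like this. It is a hedged sketch only: `suffix_table`, `module_kind` and `classify_module_file` are hypothetical names, the suffix list and the `supported` flags are assumptions made for illustration, and this is not the actual check_suffix code in libdwfl/linux-kernel-modules.c.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum module_kind
{
  NOT_A_MODULE,       /* suffix not recognized at all */
  MODULE_USABLE,      /* recognized and can be read */
  MODULE_UNSUPPORTED  /* recognized, but its compression is not built in */
};

struct suffix_entry
{
  const char *suffix;
  bool supported;     /* assumed to reflect build-time configuration */
};

/* Every suffix stays in the table; only the flag varies per build.  */
static const struct suffix_entry suffix_table[] =
{
  { ".ko", true },
  { ".ko.gz", true },
  { ".ko.xz", false },  /* pretend xz support was not compiled in */
};

static bool
has_suffix (const char *name, const char *suffix)
{
  size_t nlen = strlen (name);
  size_t slen = strlen (suffix);
  return nlen >= slen && strcmp (name + nlen - slen, suffix) == 0;
}

static enum module_kind
classify_module_file (const char *name)
{
  size_t n = sizeof suffix_table / sizeof suffix_table[0];
  for (size_t i = 0; i < n; i++)
    if (has_suffix (name, suffix_table[i].suffix))
      return suffix_table[i].supported ? MODULE_USABLE : MODULE_UNSUPPORTED;
  return NOT_A_MODULE;
}
```

A caller could then report a specific error for the `MODULE_UNSUPPORTED` case instead of silently skipping the file.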

Comment 8 Jonathan Lebon 2015-05-11 19:43:01 UTC
I am working on a patch series to address the issues brought up here.

The patches address one issue mentioned in BZ1184245: when elfutils fails to decompress a section (either zlib failed or it's not built in), SystemTap will now give the following message:

WARNING: cannot find module /home/jlebon/code/systemtap/systemtap/temp debuginfo: cannot decompress DWARF [man warning::debuginfo]

Additionally, if elfutils fails to open any debug files, it will now report to SystemTap all the paths it tried along with the reason why it failed. In the common case (if you simply do not have debuginfo installed for an executable), SystemTap will report something like this:

WARNING: cannot find module /usr/bin/find debuginfo: No DWARF information found (tried to open /usr/lib/debug/.build-id/73/bf718a5b9fae72c2c8c8fb4cb270cc371f9605.debug (errno 2), /var/cache/abrt-di/usr/lib/debug/.build-id/73/bf718a5b9fae72c2c8c8fb4cb270cc371f9605.debug (errno 2), /usr/bin/find.debug (errno 2), /usr/bin/.debug/find.debug (errno 2), /usr/lib/debug/usr/bin/find.debug (errno 2), /var/cache/abrt-di/usr/lib/debug/usr/bin/find.debug (errno 2)) [man warning::debuginfo]

Finally, the original issue mentioned in this BZ is also covered. If elfutils finds a debuginfo file which fails to validate (the build ID/CRC don't match), it will be accordingly reported as "mismatched" for that file:

WARNING: cannot find module /home/jlebon/code/systemtap/systemtap/temp debuginfo: No DWARF information found (tried to open /usr/lib/debug/.build-id/77/84998137100332dab20ff0bb1859bfa8d033d9.debug (errno 2), /var/cache/abrt-di/usr/lib/debug/.build-id/77/84998137100332dab20ff0bb1859bfa8d033d9.debug (errno 2), /home/jlebon/code/systemtap/systemtap/temp.debug (mismatched), /home/jlebon/code/systemtap/systemtap/.debug/temp.debug (errno 2), /usr/lib/debug/home/jlebon/code/systemtap/systemtap/temp.debug (errno 2), /var/cache/abrt-di/usr/lib/debug/home/jlebon/code/systemtap/systemtap/temp.debug (errno 2)) [man warning::debuginfo]
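
The parenthesized list in these warnings could be assembled along the following lines. This is a minimal sketch, not the code from the patch series: `struct attempt` and `format_attempts` are hypothetical names, and only the output layout ("tried to open", "errno N", "mismatched") is copied from the messages quoted above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* One path that was tried, and why it was rejected (hypothetical type).  */
struct attempt
{
  const char *path;
  int err;          /* errno value from the failed open, if any */
  bool mismatched;  /* file opened, but build ID/CRC did not match */
};

/* Write "(tried to open PATH (REASON), ...)" into BUF.  Returns false if
   the text does not fit in CAP bytes, in which case BUF is incomplete.  */
static bool
format_attempts (char *buf, size_t cap, const struct attempt *tries, size_t n)
{
  size_t used = 0;
  for (size_t i = 0; i < n; i++)
    {
      const char *lead = i == 0 ? "(tried to open " : ", ";
      int w;
      if (tries[i].mismatched)
        w = snprintf (buf + used, cap - used, "%s%s (mismatched)",
                      lead, tries[i].path);
      else
        w = snprintf (buf + used, cap - used, "%s%s (errno %d)",
                      lead, tries[i].path, tries[i].err);
      if (w < 0 || (size_t) w >= cap - used)
        return false;  /* output would be truncated */
      used += (size_t) w;
    }
  /* Close the list, or produce an empty string when nothing was tried.  */
  int w = snprintf (buf + used, cap - used, "%s", n == 0 ? "" : ")");
  return w >= 0 && (size_t) w < cap - used;
}
```

Keeping the reason per path is what lets a single warning show both "errno 2" entries and a "mismatched" entry side by side.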

Comment 9 Mark Wielaard 2015-06-08 12:15:53 UTC
This will be partially solved in the upcoming elfutils 0.162 (better reporting of compression issues). Other improvements will be deferred until after 0.162. See also the following upstream discussion:
https://lists.fedorahosted.org/pipermail/elfutils-devel/2015-May/thread.html#4841

Comment 10 Frank Ch. Eigler 2023-03-08 18:40:09 UTC
(closing this old bug; messages and mechanisms have improved long since)