Bug 1889763 - Regression with binutils-2.35.1-5.fc34.x86_64 / LTO w/ libvirt causes error "missing __open_missing_mode" symbol
Summary: Regression with binutils-2.35.1-5.fc34.x86_64 / LTO w/ libvirt causes error ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: binutils
Version: rawhide
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Nick Clifton
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 1892753
TreeView+ depends on / blocked
 
Reported: 2020-10-20 14:22 UTC by Daniel Berrangé
Modified: 2020-11-02 01:11 UTC (History)
15 users (show)

Fixed In Version: binutils-2.35.1-10.fc34 binutils-2.35-14.fc33
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1892753 (view as bug list)
Environment:
Last Closed: 2020-11-02 01:11:07 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
Proposed patch (1.15 KB, patch)
2020-10-23 12:56 UTC, Nick Clifton
no flags Details | Diff
virfdstream.i (1.30 MB, text/plain)
2020-10-23 18:33 UTC, Michal Privoznik
no flags Details
rhbz-1889763-zziplib.c (275 bytes, text/plain)
2020-10-27 11:01 UTC, Kamil Dudka
no flags Details
rhbz-1889763-zziplib.c (274 bytes, text/plain)
2020-10-27 11:03 UTC, Kamil Dudka
no flags Details

Description Daniel Berrangé 2020-10-20 14:22:53 UTC
Description of problem:
Since binutils-2.35.1-5.fc34.x86_64 libvirt build has started failing with missing symbols:

https://kojipkgs.fedoraproject.org//work/tasks/494/53840494/build.log

/usr/bin/ld: tests/libtest_qemu_driver.so: undefined reference to `__open_missing_mode'


AFAICT, libvirt passes "mode" in all cases where we pass O_CREAT for flags.

Disabling LTO removes the error, as does downgrading to binutils-2.35.1-4.fc34.x86_64 

Unfortunately I can't determine a simple standalone reproducer yet.

Version-Release number of selected component (if applicable):
binutils-2.35.1-5.fc34.x86_64 

How reproducible:
Always

Steps to Reproduce:
1. Scratch build of current libvirt source against rawhide.
2.
3.

Actual results:
/usr/bin/ld: tests/libtest_qemu_driver.so: undefined reference to `__open_missing_mode'

Expected results:
Compiles succesfully :-)

Additional info:

Comment 1 Nick Clifton 2020-10-22 10:33:53 UTC
The same problem occurs when building the nodejs package.
I do not know the cause yet, but I am investigating.

Comment 2 Nick Clifton 2020-10-23 12:56:48 UTC
Created attachment 1723757 [details]
Proposed patch

Fix calls to open() that need a third argument.

Comment 3 Nick Clifton 2020-10-23 12:57:08 UTC
Hi Daniel,

  Despite the very unhelpful linker error message.  I think that this problems is actually due to a bug in the qemu sources that form part of libvirt.  Specifically in 

at line 6237 there is:

  if ((ctxt->writefd = open(ctxt->path, O_WRONLY | O_CREAT | O_APPEND, S_IRUSR | S_IWUSR)) < 0) {

This call to open() is missing a mode argument for the O_CREAT flag.  I believe that it is this the root cause of the problem.
(Actually there appears to be a similar omission on line 6405 of that file as well...)

What I think is happening is that without LTO, the erroneous code is being eliminated by the compiler, and so no error has been seen before.  Now that LTO is enabled by default (in rawhide) the erroneous code is being retained so that it is available for the final LTO compilation.  Before the 2.35.1-5.fc34 binutils release however a bug in the linker's handling of lto-plugin symbols meant that the code was never pulled into the lto-compilation-then-link process, and so the issue was not seen.  (The linker bug meant that sometimes libraries would not be pulled into a link, if an earlier library appeared to provide all of the needed symbols).

With 2.35.1-5.fc34 the linker bug was fixed and now the lto-compilation process sees the code.  I theorize however that definition of the open() function (in /usr/include/bits/fcntl2.h) had somehow lost error function attribute that is supposed to be on the __open_missing_mode symbol, (presumably because of an LTO bug) and so it now compiles a call to the non-existant __open_missing_mode function...

Anyway, I am going to upload a potential patch for libvirt.  Although my lack of familiarity with the package means that I do not know  how to apply it.  Hopefully you will have more success with testing it.

Cheers
  Nick

Comment 4 Florian Weimer 2020-10-23 13:06:32 UTC
Sorry, this patch is not correct. Both calls already contain a mode argument.

I think this is a GCC bug. If there is a reference left to __open_missing_mode, the compilation *must* fail:

'error ("MESSAGE")'
'warning ("MESSAGE")'
     If the 'error' or 'warning' attribute is used on a function
     declaration and a call to such a function is not eliminated through
     dead code elimination or other optimizations, an error or warning
     (respectively) that includes MESSAGE is diagnosed.  This is useful
     for compile-time checking, especially together with
     '__builtin_constant_p' and inline functions where checking the
     inline function arguments is not possible through 'extern char
     [(condition) ? 1 : -1];' tricks.

Comment 5 Michal Privoznik 2020-10-23 13:13:12 UTC
(In reply to Nick Clifton from comment #3)
> Hi Daniel,
> 
>   Despite the very unhelpful linker error message.  I think that this
> problems is actually due to a bug in the qemu sources that form part of
> libvirt.  Specifically in 
> 
> at line 6237 there is:
> 
>   if ((ctxt->writefd = open(ctxt->path, O_WRONLY | O_CREAT | O_APPEND,
> S_IRUSR | S_IWUSR)) < 0) {
> 
> This call to open() is missing a mode argument for the O_CREAT flag.  I
> believe that it is this the root cause of the problem.

Am I misunderstanding something? I can see three arguments passed and the last one (mode) is "S_IRUSR | S_IWUSR" which is a verbose way of writing 0600.

Comment 6 Jakub Jelinek 2020-10-23 13:34:32 UTC
Do any functions calling open use something like optimize (0) attribute or similar ways to prevent optimizations?
Because the glibc header only enables these open compile time checks if using _FORTIFY_SOURCE{,=2} and if -O1 or higher; furthermore,
the function is extern inline __attribute__((always_inline)), so it should be inlined already by the early inliner (unless -fno-early-inlining ?),
and __va_arg_pack_len () < 1 should fold into 1 < 1 when the mode is specified.  Thus, unless optimizations are massively disabled, I don't see how that
__open_missing_mode could survive until IPA (and thus it shouldn't be even LTO related).
So, if the source has optimize (0) etc., avoid calling open/open64/etc. in those functions or drop the attribute or call them from helper functions
that are optimized and may not be inlined.
Otherwise, it would be interesting to see -fdump-tree-all dumps from a TU which results in the __open_missing_mode calls in the IL.
You can just run nm on the *.o file and see which one it is.

Comment 7 Michal Privoznik 2020-10-23 15:50:41 UTC
By trial and error I've converged to a minimal set of CFLAGS="-O2 -flto=auto" which gets this error:

cc  -o tests/cputest src/libvirt_probes.o tests/cputest.p/cputest.c.o -Wl,--as-needed -Wl,--no-undefined -Wl,-export-dynamic -pie -Wl,--whole-archive -Wl,--start-group tests/libtest_utils.a tests/libtest_utils_qemu.a -Wl,--no-whole-archive -O2 -flto=auto src/libvirt.so.0.6009.0 tests/libtest_utils_qemu_monitor.a tests/libtest_qemu_driver.so -Wl,--no-copy-dt-needed-entries -Wl,-export-dynamic -ldl /usr/lib64/libglib-2.0.so /usr/lib64/libgobject-2.0.so /usr/lib64/libgio-2.0.so /usr/lib64/libgnutls.so /usr/lib64/libnl-3.so /usr/lib64/libnl-route-3.so /usr/lib64/libxml2.so /usr/lib64/libsasl2.so -lselinux /usr/lib64/libtirpc.so /usr/lib64/libyajl.so -Wl,-export-dynamic -lselinux -Wl,-export-dynamic -lselinux -Wl,-export-dynamic -lselinux -Wl,--end-group '-Wl,-rpath,$ORIGIN/../src:$ORIGIN/' -Wl,-rpath-link,/home/zippy/tmp/libvirt.git/_build/src -Wl,-rpath-link,/home/zippy/tmp/libvirt.git/_build/tests
/usr/bin/ld: tests/libtest_qemu_driver.so: undefined reference to `__open_missing_mode'
collect2: error: ld returned 1 exit status

But if I disable optimizations CFLAGS="-O0 -flto=auto" then I can't reproduce and the build succeeds. Hope this helps.

Comment 8 Jakub Jelinek 2020-10-23 15:56:42 UTC
So, can you please
nm tests/cputest.p/cputest.c.o tests/libtest_utils.a tests/libtest_utils_qemu.a src/libvirt.so.0.6009.0 tests/libtest_utils_qemu_monitor.a | less
and find there which *.o file has __open_missing_mode references?

Comment 9 Michal Privoznik 2020-10-23 17:38:25 UTC
That did no yield any, but using a brute search I've found the following:

$ pwd
/home/zippy/tmp/libvirt.git/_build

$ for i in $(find . -name "*.o"); do if test -n "$(nm $i | grep __open_missing_mode)" ; then echo $i; fi; done
./src/util/libvirt_util.a.p/virfdstream.c.o
./src/qemu/libvirt_driver_qemu_impl.a.p/qemu_domain.c.o


In virfdstream.c there are only two open():

https://gitlab.com/libvirt/libvirt/-/blob/master/src/util/virfdstream.c#L1267

    if (oflags & O_CREAT)
        fd = open(path, oflags, mode);
    else
        fd = open(path, oflags);


but in qemu_domain.c there is one open that looks very suspicious:

https://gitlab.com/libvirt/libvirt/-/blob/master/src/qemu/qemu_domain.c#L6254

    if ((ctxt->readfd = open(ctxt->path, O_RDONLY, S_IRUSR | S_IWUSR)) < 0) {

Nevertheless, compiler did not complain, and in fact, when I remove the mode, it doesn't fix the problem. I've inspected other calls of open() in both files and found nothing suspicious.

Comment 10 Jakub Jelinek 2020-10-23 18:07:14 UTC
So, can you please recompile virfdstream.c with the options that were used to compile it with additional -save-temps and attach here virfdstream.i it creates + paste the options?
Thanks.

Comment 11 Jakub Jelinek 2020-10-23 18:11:18 UTC
Note
    if (oflags & O_CREAT)
         fd = open(path, oflags, mode);
     else
         fd = open(path, oflags);
doesn't make any sense unless mode would be e.g. uninitialized if oflags doesn't have O_CREAT set (but that is not the case).
Just do
    fd = open(path, oflags, mode);
Anyway, I'd really like to understand what's going on.

Comment 12 Michal Privoznik 2020-10-23 18:33:32 UTC
Created attachment 1723813 [details]
virfdstream.i

Comment 13 Michal Privoznik 2020-10-23 18:37:10 UTC
(In reply to Jakub Jelinek from comment #11)
>

This got me experimenting, and if I provide mode for all open() (in those two files virfdstream.c and qemu_domain.c) - even those that open only for reading then the problem is gone. But this is not that surpising, is it?

Comment 14 Jakub Jelinek 2020-10-23 19:12:39 UTC
You certainly don't need to provide a mode to all open calls.  All I wanted to say is that the above pass mode only if O_CREAT is in the flags and not otherwise is a complete optimization anti-pattern, it will make the code slower unless oflags is optimized into a constant.
Anyway, in the tree dumps I see that before IPA in the one function which isn't inline yet we have basically:
oflags |= O_NOCCTY;
if (oflags & O_CREAT)
  fd = __builtin_constant_p (oflags) ? __open_alias (path, oflags, mode) : __open_alias (path, oflags, mode);
else if (__builtin_constant_p (oflags))
  {
    if ((oflags & __O_TMPFILE) == __O_TMPFILE)
      {
        __open_missing_mode ();
        fd = __open_2 (path, oflags);
      }
    else
      fd = __open_alias (path, oflags);
  }
else
  fd = __open_2 (path, oflags);
and that function is pretty large, called from 4 places where two of them pass in their parameter | O_CREAT.
The reason the O_CREAT check inside is optimized away is because of the O_CREAT check from the user code.
Now, without -flto both of the __builtin_constant_p (oflags) calls are folded into 0 during vrp1 and the __open_missing_mode () call is optimized away because of that during cfg cleanup at the end of vrp.
So something else must be going on during LTO, perhaps some callers pass constant to oflags (otherwise I don't see how it would fold to non-zero or not be optimized away), but for some reason the oflags & __O_TMPFILE check isn't optimized to 0 or something.
That said, unless some caller calls these functions with O_TMPFILE, then the error would be completely valid and it would be libvirt bug.

Thus, what oflags pass callers of virFDStreamOpenFile or virFDStreamOpenBlockDevice to those functions (or their callers etc.)?

Comment 15 Michal Privoznik 2020-10-23 19:37:26 UTC
(In reply to Jakub Jelinek from comment #14)
> You certainly don't need to provide a mode to all open calls.  All I wanted
> to say is that the above pass mode only if O_CREAT is in the flags and not
> otherwise is a complete optimization anti-pattern, it will make the code
> slower unless oflags is optimized into a constant.

I know that I don't need to provide mode. I'm just experimenting and seeing what sticks on the wall.

> Anyway, in the tree dumps I see that before IPA in the one function which
> isn't inline yet we have basically:
> oflags |= O_NOCCTY;
> if (oflags & O_CREAT)
>   fd = __builtin_constant_p (oflags) ? __open_alias (path, oflags, mode) :
> __open_alias (path, oflags, mode);
> else if (__builtin_constant_p (oflags))
>   {
>     if ((oflags & __O_TMPFILE) == __O_TMPFILE)
>       {
>         __open_missing_mode ();
>         fd = __open_2 (path, oflags);
>       }
>     else
>       fd = __open_alias (path, oflags);
>   }
> else
>   fd = __open_2 (path, oflags);
> and that function is pretty large, called from 4 places where two of them
> pass in their parameter | O_CREAT.
> The reason the O_CREAT check inside is optimized away is because of the
> O_CREAT check from the user code.
> Now, without -flto both of the __builtin_constant_p (oflags) calls are
> folded into 0 during vrp1 and the __open_missing_mode () call is optimized
> away because of that during cfg cleanup at the end of vrp.
> So something else must be going on during LTO, perhaps some callers pass
> constant to oflags (otherwise I don't see how it would fold to non-zero or
> not be optimized away), but for some reason the oflags & __O_TMPFILE check
> isn't optimized to 0 or something.
> That said, unless some caller calls these functions with O_TMPFILE, then the
> error would be completely valid and it would be libvirt bug.
> 
> Thus, what oflags pass callers of virFDStreamOpenFile or
> virFDStreamOpenBlockDevice to those functions (or their callers etc.)?

Nothing passes O_TMPFILE, the only occurrence within the whole code base of that flag is in one comment.
And callers of virFDStreamOpenFile() and virFDStreamOpenBlockDevice() pass constant oflags:

src/lxc/lxc_driver.c:2781:    if (virFDStreamOpenFile(st, chr->source->data.file.path,
src/lxc/lxc_driver.c-2782-                            0, 0, O_RDWR) < 0)

src/qemu/qemu_driver.c:3485:    if (virFDStreamOpenFile(st, tmp, 0, 0, O_RDONLY) < 0) 

src/test/test_driver.c:7749:    if (virFDStreamOpenFile(st, PKGDATADIR "/test-screenshot.png", 0, 0, O_RDONLY) < 0)

src/vbox/vbox_common.c:7423:                if (virFDStreamOpenFile(st, tmp, 0, 0, O_RDONLY) < 0) {

tests/fdstreamtest.c:82:    if (virFDStreamOpenFile(st, file,
tests/fdstreamtest.c-83-                            PATTERN_LEN / 2, PATTERN_LEN * 9,
tests/fdstreamtest.c-84-                            O_RDONLY) < 0)

src/storage/storage_util.c:2439:    return virFDStreamOpenBlockDevice(stream, target_path,
src/storage/storage_util.c-2440-                                      offset, len, sparse, O_WRONLY);

src/storage/storage_util.c:2470:    return virFDStreamOpenBlockDevice(stream, target_path,
src/storage/storage_util.c-2471-                                      offset, len, sparse, O_RDONLY);

Comment 16 Boris Fiuczynski (IBM) 2020-10-27 10:55:59 UTC
I ran into this problem with binutils-2.35-11.fc33.s390x on Fedora 33.
Is this is also happening on x86?

Comment 17 Kamil Dudka 2020-10-27 11:01:14 UTC
Created attachment 1724523 [details]
rhbz-1889763-zziplib.c

Legacy code of zziplib, which takes an address of &open, also hit this issue.  A minimal example is attached.

Comment 18 Kamil Dudka 2020-10-27 11:03:49 UTC
Created attachment 1724524 [details]
rhbz-1889763-zziplib.c

Comment 19 Kamil Dudka 2020-10-27 11:05:39 UTC
$ curl -JO 'https://bugzilla.redhat.com/attachment.cgi?id=1724524'
$ sh rhbz-1889763-zziplib.c
+ gcc rhbz-1889763-zziplib.c -Wp,-D_FORTIFY_SOURCE=2 -Wl,--export-dynamic -O2 -flto=auto -shared -o rhbz-1889763-zziplib.c.so
+ readelf -s rhbz-1889763-zziplib.c.so
+ grep __open
     3: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND __open_too_many_args
     5: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND __open_missing_mode
    42: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __open_2@@GLIBC_2.7
+ exit 0

Comment 20 Jakub Jelinek 2020-10-27 11:25:36 UTC
I can't reproduce using your above minimal example, one gets there open and __open_2 references which is ok, no __open_missing_mode.
And, while I can reproduce it with libvirt, it doesn't seem to look like a GCC bug, but more likely binutils (or perhaps the linker plugin).
Because, if I look at the assembly coming out of lto1 which is what all the *.o files with LTO bytecode are supposed to be replaced with,
there is no __open_missing_mode at all.  Sure, in the *.o files when using gcc-nm or when nm can find the linker plugin __open_missing_mode
is seen, but not without it.

Comment 21 Jakub Jelinek 2020-10-27 11:33:39 UTC
Strange, now I can using the #c18 testcase, but those __open_too_many_args and __open_missing_mode strangely only show up in .dynsym and not in .symtab, that is the first time I've seen symbols only in the former and not in the latter.
So it really seems like a binutils bug.

Comment 22 Jakub Jelinek 2020-10-27 12:27:07 UTC
Another example where binutils 2.35.1 behaves differently from 2.31:
#include <unistd.h>
int foo (int x) { if (__builtin_constant_p (x)) return getpid (); return 0; }
gcc -shared -fpic -O2 -flto=auto -Wl,-E -o test.so test.c
With binutils 2.35.1 undefined getpid symbol shows up in both .dynsym and .symtab, with 2.31 it doesn't.
And, if one looks at objdump -dr test.so , getpid isn't called there at all, and it doesn't show up
in test*ltrans*.s either (when -save-temps is added).

Comment 23 Nick Clifton 2020-10-29 09:21:01 UTC
I *think* that I may have found the problem.  Please could you try binutils-2.35.1-9.fc34 and see if this resolves the issue.

Comment 24 Michal Privoznik 2020-10-29 10:16:28 UTC
Yep, binutils-2.35.1-9.fc34 fixes the problem for me.

Comment 25 Kamil Dudka 2020-10-29 10:55:23 UTC
(In reply to Nick Clifton from comment #23)
> I *think* that I may have found the problem.  Please could you try
> binutils-2.35.1-9.fc34 and see if this resolves the issue.

It fixes the zziplib build issue as well.  Thanks for the fix!

Comment 26 Nick Clifton 2020-10-29 12:51:34 UTC
Ah no, sorry - the patch does not work.  It basically reverts the original change that triggered this bug, but it does not solve the bug that that change was trying to fix.

H.J.Lu has posted an alternative patch, so I am trying that out now.

Comment 27 Nick Clifton 2020-10-29 20:12:33 UTC
Right - H.J.'s patch works, so I have brought it in to rawhide.  Please try binutils-2.35.1-10.fc34.

Comment 28 Daniel Berrangé 2020-10-30 09:11:24 UTC
We have a user reporting the same libvirt error message about _open_missing_mode when on F33 too now. Does the fix need to be applied to F33 too ?

Comment 29 Kamil Dudka 2020-10-30 09:20:35 UTC
I confirm that both attachment #1724524 [details] and the build of zziplib-0.13.69-9.fc34 work as expected with binutils-2.35.1-10.fc34.

Comment 30 Nick Clifton 2020-10-30 10:11:25 UTC
(In reply to Daniel Berrangé from comment #28)
> We have a user reporting the same libvirt error message about
> _open_missing_mode when on F33 too now. Does the fix need to be applied to
> F33 too ?

Yes it does.  Working on it now.

Comment 31 Fedora Update System 2020-10-30 13:23:20 UTC
FEDORA-2020-6f4c3e05ea has been submitted as an update to Fedora 33. https://bodhi.fedoraproject.org/updates/FEDORA-2020-6f4c3e05ea

Comment 32 Fedora Update System 2020-10-31 02:55:06 UTC
FEDORA-2020-6f4c3e05ea has been pushed to the Fedora 33 testing repository.
In short time you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-6f4c3e05ea`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-6f4c3e05ea

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 33 Fedora Update System 2020-11-02 01:11:07 UTC
FEDORA-2020-6f4c3e05ea has been pushed to the Fedora 33 stable repository.
If problem still persists, please make note of it in this bug report.


Note You need to log in before you can comment on or make changes to this bug.