Bug 1477354
Summary: | ld.bfd: binutils 2.29 dropped global visibility of implicit symbols pertaining the (orphaned) sections (__{start,stop}_S) on some architectures (not ppc64*) | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Jan Pokorný [poki] <jpokorny> |
Component: | binutils | Assignee: | Nick Clifton <nickc> |
Status: | CLOSED NOTABUG | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | rawhide | CC: | fweimer, jakub, nickc |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2017-08-04 07:18:03 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Jan Pokorný [poki]
2017-08-01 21:03:36 UTC
re [comment 0]: > In libqb unit test that now fails, the problem may be similar: > the library's section "wins" over binary's proper one within the > in-library-call. Indeed, a probable root cause in libqb is that qb_log_init (in-library-call) refers to the pointers for section boundaries that are expected to refer to the context of "main compilation unit" (possibly out-of-library), but with ld.bfd @ 2.29, they refer to the library context uncondtionally: https://github.com/ClusterLabs/libqb/blob/v1.0.2/lib/log.c#L873 Trying to set various visibility attributes at the extern declaration for the section boundary symbols was fruitless: https://github.com/ClusterLabs/libqb/blob/v1.0.2/include/qb/qblog.h#L268 Is there any apparent solution (barring a complete rethinking) for libqb to stay compatible with unpatched binutils 2.29? Any idea on a shortest test possible (rather than being fixed on ld.bfd --version) for a configure script to detect misbehaving linker in this sense? The example in [comment 0] feels still too large. Can this indeed be considered a bug in ld.bfd? Hi Jan, Relying upon the linker's orphan section placement algorithm to remain constant is a very bad idea IMHO. As an alternative, why not tell the linker exactly what you want ? For example with the small test case you reported, if you change the second gcc command to this: gcc -Wl,foo.t -o test -L. -ldummy test.c then I think that the test will work. The foo.t file looks like this: # cat foo.t SECTIONS { __foo : { PROVIDE (__start___foo = .); *(__foo); } } This just tells the linker to accumulate all __foo sections together and to define a symbol called __start___foo at the beginning of the section. Since the file is included on the command line as an input file, rather than being the argument of a -T or --script option, the linker will use the contents to *augment* the built-in linker script, rather than override it. This solution also has the advantage of being backwards compatible, so it should work the 2.28 linker as well. Does this solve the problem for you ? Cheers Nick Nick, thanks for the response.
Yes, it's entirely possible that previously libqb was relying on some
behaviour that was working just ouf of pure luck (though it lasted for
at least 6 recent years).
And sadly, nope, the proposed update to the build recipe didn't help,
furthermore I am getting this complaint:
> /usr/bin/ld: warning: foo.t contains output sections; did you forget -T?
Easy way out would be to extend the interface of the library function:
=== dummylib.c ===
#include <stdio.h>
#define JOIN_PASS(a, b) a##b
#define JOIN(a, b) JOIN_PASS(a, b)
#define STRINGIFY_PASS(a) #a
#define STRINGIFY(a) STRINGIFY_PASS(a)
#define SECTION_NAME __foo
#define SECTION_START JOIN(__start_, SECTION_NAME)
#define SECTION_STR STRINGIFY(SECTION_NAME)
extern int SECTION_START[];
static void dummy(const int *start) {
{ static int __attribute__((section(SECTION_STR))) z = 13; }
{ static int __attribute__((section(SECTION_STR))) z = 17; }
printf("dummy finished %d %d\n",
*((int *) start), *((int *) start + 1));
}
int do_something(int n, const int *start) { dummy(start); return n*n; }
=== test.c ===
#include <stdio.h>
#define JOIN_PASS(a, b) a##b
#define JOIN(a, b) JOIN_PASS(a, b)
#define STRINGIFY_PASS(a) #a
#define STRINGIFY(a) STRINGIFY_PASS(a)
#define SECTION_NAME __foo
#define SECTION_START JOIN(__start_, SECTION_NAME)
#define SECTION_STR STRINGIFY(SECTION_NAME)
extern int SECTION_START[];
int do_something(int n, const int *start);
static void test(void) {
{ static int __attribute__((section(SECTION_STR))) z = 42; }
{ static int __attribute__((section(SECTION_STR))) z = 10; }
printf("test finished %d %d\n",
*((int *) SECTION_START), *((int *) SECTION_START + 1));
}
int main(char argc, char *argv[]) {
int ret = do_something(argc, SECTION_START);
test();
printf("got it? %d %d\n",
*((int *) SECTION_START), *((int *) SECTION_START + 1));
return ret;
}
===
But what if we don't want to change do_something interface at any cost?
Can we somehow arrange for programmatically (dlopen + dlsym?) accessing
SECTION_START of test.c from within do_something (within dummylib.c)?
(In reply to Jan Pokorný from comment #3) Hi Jan, > And sadly, nope, the proposed update to the build recipe didn't help, What happened - did the behaviour change at all ? > furthermore I am getting this complaint: > > > /usr/bin/ld: warning: foo.t contains output sections; did you forget -T? Yes, I was hoping that that would not matter to you. (You can ignore the message, it does not effect the output). I also note that when I tried your original test case I received this warning from the linker: ld: warning: type and size of dynamic symbol `__start___foo' are not defined Do you get this too ? > But what if we don't want to change do_something interface at any cost? > Can we somehow arrange for programmatically (dlopen + dlsym?) accessing > SECTION_START of test.c from within do_something (within dummylib.c)? I must admit that I still do not really understand why libqb is doing this. What is wrong with allowing the compiler and linker to do their thing and arrange for access to data in the normal way ? One idea - untested - is that in the library, instead of referring to __start___foo, you refer to the address of the first declared symbol ? (Hoping that the linker does not rearrange the order of the symbols, which of course it may well do). Cheers Nick re [comment 4] > What happened - did the behaviour change at all ? Yep, mere combination of files/commands from [comment 0] and [comment 2] did not change the behaviour at all. >> furthermore I am getting this complaint: >> >> > /usr/bin/ld: warning: foo.t contains output sections; did you forget -T? > > Yes, I was hoping that that would not matter to you. (You can ignore the > message, it does not effect the output). It doesn't matter to me in any other way than, frankly, I don't understand low-level linker magic (and barely some user-facing options), so whatever is complained about can have consequences I cannot predict :) > I also note that when I tried > your original test case I received this warning from the linker: > > ld: warning: type and size of dynamic symbol `__start___foo' are not defined > > Do you get this too ? No, haven't seen this with binutils-2.29-5.fc27.x86_64, gcc-7.1.1-6.fc27.x86_64. > I must admit that I still do not really understand why libqb is > doing this. What is wrong with allowing the compiler and linker to > do their thing and arrange for access to data in the normal way ? Original author is no longer active (or with RH, for that matter). As noted above, I am only coincidentally getting to explore this rabbit hole. I guess there could be two reasons why qb_log_init is not using the pattern I sketched in [comment 3]: - a goal to maintain ABI/API compatibility from times there wasn't this compile-time offloading of run-time work - unified approach (single header to do the right job) used for both intra-library and actual user logging, and it was working well with ld.bfd prior to 2.29 (rhetorical question: is there only a single project affected by this change? I would doubt that) > One idea - untested - is that in the library, instead of referring to > __start___foo, you refer to the address of the first declared symbol ? > (Hoping that the linker does not rearrange the order of the symbols, > which of course it may well do). I am not sure I follow, but I tried to workaround the section aliasing issue by adding one more section that's expected to be present only in binary, not in the library: --- dummylib.c.orig 2017-08-02 18:28:53.252179427 +0200 +++ dummylib.c 2017-08-02 19:07:35.888675684 +0200 @@ -7,10 +7,18 @@ #define SECTION_START JOIN(__start_, SECTION_NAME) #define SECTION_STR STRINGIFY(SECTION_NAME) extern int SECTION_START[]; +#define CROSSUNIT 1 +#if CROSSUNIT +extern int *__start___bar[]; +#endif static void dummy(void) { { static int __attribute__((section(SECTION_STR))) z = 13; } { static int __attribute__((section(SECTION_STR))) z = 17; } printf("dummy finished %d %d\n", +#if CROSSUNIT + **((int **) __start___bar), *(*((int **) __start___bar) + 1)); +#else *((int *) SECTION_START), *((int *) SECTION_START + 1)); +#endif } int do_something(int n) { dummy(); return n*n; } --- test.c.orig 2017-08-02 18:36:24.460664287 +0200 +++ test.c 2017-08-02 19:06:31.324606291 +0200 @@ -7,14 +7,17 @@ #define SECTION_START JOIN(__start_, SECTION_NAME) #define SECTION_STR STRINGIFY(SECTION_NAME) extern int SECTION_START[]; +extern int *__start___bar[]; int do_something(int n); static void test(void) { - { static int __attribute__((section(SECTION_STR))) z = 42; } + { static int __attribute__((section(SECTION_STR))) z = 42; + static int __attribute__((section("__bar"))) *y = &z; } { static int __attribute__((section(SECTION_STR))) z = 10; } printf("test finished %d %d\n", *((int *) SECTION_START), *((int *) SECTION_START + 1)); } int main(char argc, char *argv[]) { + printf("**__bar_start == %d\n", **((int **) __start___bar)); int ret = do_something(argc); test(); printf("got it? %d %d\n", With that "define CROSSUNIT 1", I am getting this linker error: > /usr/bin/ld: test: hidden symbol `__start___bar' in test is referenced by DSO > /usr/bin/ld: final link failed: Bad value > collect2: error: ld returned 1 exit status When I "define CROSSUNIT 0", the binary is created, with output: > **__bar_start == 42 > dummy finished 13 17 > test finished 42 10 > got it? 42 10 (**__bar_start is per expectation there) Furthermore, "section start equals address of the first item being put here" seems pretty fragile, because I can imagine more complex cases where it is hard to say which item will be "first" from the buildchain perspective. Ok, I think I have a sketch of possible solution, using sort of out-of-band-signaling (extraneous exported symbols). === dummylib.c === #include <stdio.h> #define JOIN_PASS(a, b) a##b #define JOIN(a, b) JOIN_PASS(a, b) #define STRINGIFY_PASS(a) #a #define STRINGIFY(a) STRINGIFY_PASS(a) #define SECTION_NAME __foo #define SECTION_START JOIN(__start_, SECTION_NAME) #define SECTION_STOP JOIN(__stop_, SECTION_NAME) #define SECTION_STR STRINGIFY(SECTION_NAME) extern int SECTION_START[]; extern int SECTION_STOP[]; extern int *main_section_first; extern int *main_section_last; int *main_section_first __attribute__((weak)) = NULL; int *main_section_last __attribute__((weak)) = NULL; static void dummy(void) { { static int __attribute__((section(SECTION_STR))) z = 13; } { static int __attribute__((section(SECTION_STR))) z = 17; } { static int __attribute__((section(SECTION_STR))) z = 23; } if (main_section_first == NULL || main_section_last == NULL) { main_section_first = SECTION_START; main_section_last = SECTION_STOP; } for (int *i = main_section_first; i < main_section_last; i++) { printf("dummy: %d\n", *i); } } int do_something(int n) { dummy(); return n*n; } === test.c === #include <stdio.h> #define JOIN_PASS(a, b) a##b #define JOIN(a, b) JOIN_PASS(a, b) #define STRINGIFY_PASS(a) #a #define STRINGIFY(a) STRINGIFY_PASS(a) #define SECTION_NAME __foo #define SECTION_START JOIN(__start_, SECTION_NAME) #define SECTION_STOP JOIN(__stop_, SECTION_NAME) #define SECTION_STR STRINGIFY(SECTION_NAME) extern int SECTION_START[]; extern int SECTION_STOP[]; int *main_section_first = SECTION_START; #if 1 int *main_section_last = SECTION_STOP; #endif int do_something(int n); static void test(void) { { static int __attribute__((section(SECTION_STR))) z = 42; } { static int __attribute__((section(SECTION_STR))) z = 10; } { static int __attribute__((section(SECTION_STR))) z = 1; } printf("test finished %d %d\n", *((int *) SECTION_START), *((int *) SECTION_START + 1)); } int main(char argc, char *argv[]) { printf("*main_section_first == %d\n", *((int *) main_section_first)); int ret = do_something(argc); test(); printf("got it? %d %d\n", *((int *) SECTION_START), *((int *) SECTION_START + 1)); return ret; } === When "#if 1": > *main_section_first == 42 > dummy: 42 > dummy: 10 > dummy: 1 > test finished 42 10 > got it? 42 10 and when changed to "#if 0": > *main_section_first == 42 > dummy: 13 > dummy: 17 > dummy: 23 > test finished 42 10 > got it? 42 10 That could achieve basic forward/backward compatibility (indeed, the reported case, unpatched libqb vs. ld.bfd @ 2.29 won't work, perhaps there could be some additional one-off self-test for libqb users compiled against newer libqb when the older one is actually in use to prevent mentioned no-logs issue?). Plus some library-specific tweaking. More importantly, such a change would also bring a compatibility with ld.gold -> portability++. That being said, I'd take any possibly further discussion regarding the solution on the libqb side out of this bug (except if you have anything immediately to add, Nick, anyway, thanks for the assistance so far), I'll only post final commit(s) link when that's ready. It's up to consideration what the risk of other SW being broken by this unexpected change is in the wild. Probably only time will show. After further contemplation, here's the actually working solution across both ld.bfd {2.28,2.29}, but not with ld.gold, which still might be solvable when -Wl,<file> is turned into proper "-T <file>": === dummylib.c === #include <stdio.h> #define JOIN_PASS(a, b) a##b #define JOIN(a, b) JOIN_PASS(a, b) #define STRINGIFY_PASS(a) #a #define STRINGIFY(a) STRINGIFY_PASS(a) #define SECTION_NAME __foo #define SECTION_START JOIN(__start_, SECTION_NAME) #define SECTION_STOP JOIN(__stop_, SECTION_NAME) #define SECTION_STR STRINGIFY(SECTION_NAME) extern int SECTION_START[]; extern int SECTION_STOP[]; // following only needed to force SECTION_{START,STOP} be global with ld 2.29 int __attribute__((weak)) SECTION_START[] = { 0 }; int __attribute__((weak)) SECTION_STOP[] = { 0 }; static void dummy(void) { { static int __attribute__((section(SECTION_STR))) z = 13; } { static int __attribute__((section(SECTION_STR))) z = 17; } printf("dummy finished %d %d\n", *((int *) SECTION_START), *((int *) SECTION_STOP - 1)); } int do_something(int n) { dummy(); return n*n; } === test.c === #include <stdio.h> #define JOIN_PASS(a, b) a##b #define JOIN(a, b) JOIN_PASS(a, b) #define STRINGIFY_PASS(a) #a #define STRINGIFY(a) STRINGIFY_PASS(a) #define SECTION_NAME __foo #define SECTION_START JOIN(__start_, SECTION_NAME) #define SECTION_STOP JOIN(__stop_, SECTION_NAME) #define SECTION_STR STRINGIFY(SECTION_NAME) extern int SECTION_START[]; extern int SECTION_STOP[]; int do_something(int n); static void test(void) { { static int __attribute__((section(SECTION_STR))) z = 42; } { static int __attribute__((section(SECTION_STR))) z = 10; } printf("test finished %d %d\n", *((int *) SECTION_START), *((int *) SECTION_STOP - 1)); } int main(char argc, char *argv[]) { int ret = do_something(argc); test(); printf("got it? %d %d\n", *((int *) SECTION_START), *((int *) SECTION_STOP - 1)); return ret; } === foo.t === SECTIONS { __foo : { __start___foo = .; *(__foo); __stop___foo = .; } } === # gcc -Wl,foo.t -shared -fPIC dummylib.c -o libdummy.so # gcc -Wl,foo.t -o test -L. -ldummy test.c # LD_LIBRARY_PATH=. ./test > dummy finished 42 10 > test finished 42 10 > got it? 42 10 Node the following readelf outputs, similar for both 2.28 and 2.29, were very similar out of the box with files and recipe in [comment 0] with 2.28: # readelf -s libdummy.so | grep foo > 7: 0000000000601030 4 OBJECT GLOBAL DEFAULT 25 __start___foo > 9: 0000000000601038 4 OBJECT GLOBAL DEFAULT 25 __stop___foo > 48: 0000000000601038 4 OBJECT GLOBAL DEFAULT 25 __stop___foo > 52: 0000000000601030 4 OBJECT GLOBAL DEFAULT 25 __start___foo # readelf -s test | grep foo > 6: 0000000000a01030 4 OBJECT GLOBAL DEFAULT 26 __start___foo > 8: 0000000000a01038 4 OBJECT GLOBAL DEFAULT 26 __stop___foo > 50: 0000000000a01038 4 OBJECT GLOBAL DEFAULT 26 __stop___foo > 56: 0000000000a01030 4 OBJECT GLOBAL DEFAULT 26 __start___foo whereas those original files + recipe with 2.29 resulted in: # readelf -s test | grep foo > 42: 0000000000601034 0 NOTYPE LOCAL DEFAULT 24 __stop___foo > 43: 000000000060102c 0 NOTYPE LOCAL DEFAULT 24 __start___foo # readelf -s libdummy.so | grep foo > 41: 0000000000201028 0 NOTYPE LOCAL DEFAULT 23 __stop___foo > 42: 0000000000201020 0 NOTYPE LOCAL DEFAULT 23 __start___foo which could, arguably, have been a regression. Now, what bothers me, libqb library users might forget this "-Wl,foo.t" equivalent easily, even if it will be provided through "pkg-config --libs (or rather --cflags?) libqb". Partial solution would be that one-off run-time self-test ([comment 7]). I guess there's no "when you are linking me to a program, apply this script snippet automatically" facility carried within the shared library directly. (In reply to Jan Pokorný from comment #8) Hi Jan, I am glad that you have found a solution. > Now, what bothers me, libqb library users might forget this "-Wl,foo.t" > equivalent easily, even if it will be provided through > "pkg-config --libs (or rather --cflags?) libqb". Partial solution would > be that one-off run-time self-test ([comment 7]). I guess there's no > "when you are linking me to a program, apply this script snippet > automatically" facility carried within the shared library directly. Actually there may be ... have a look at, for example /usr/lib64/libncurses.so which rather than being a shared library is actually a linker script fragment that uses the INPUT directive to include both the real, versioned, libncurses.so.6 library and another library. So you could rename foo.t to libgb.so, and add INPUT(libqb.so.6) at the start then everything should work automatically without the users having to change their command lines at all. (Well in theory anyway. I have not actually tested this idea. Plus of course you would have to use the correct name of the real libfo.so.xxx shared library). Cheers Nick Please excuse the typos - I was typing too fast. For "libgb.so" I of course meant "libqb.so", and "libfo.so.xxx" was meant to be "libqb.so.xxx". Cheers Nick Nick, thanks, that looks pretty cool! Will try it for sure. In the meantime, preliminary patch solves the issue in libqb, looking at x86_64 scratch build: https://koji.fedoraproject.org/koji/taskinfo?taskID=21019015 My custom WIP branch: https://github.com/jnpkrn/libqb/commits/workaround-ld-2.29 To conclude, the main problem is the *changed visibility* of the boundaries denoting (start, stop) symbols for orphaned sections. Might be still worth considering to restore original behaviour. Hi Jan, (In reply to Jan Pokorný from comment #11) > To conclude, the main problem is the *changed visibility* of the > boundaries denoting (start, stop) symbols for orphaned sections. > Might be still worth considering to restore original behaviour. Thanks. I will keep that in mind. If other developers encounter similar problems then I will consider creating a patch for the linker. If it is just the libqb project however, I will probably decide to leave things as they are. :-} Cheers Nick Please note, though, this behaviour of the linker is moreover inconsistent amongst platforms (not just between versions), which may be bad on its own merit: https://koji.fedoraproject.org/koji/taskinfo?taskID=20995093 Hmm, apparently I wasn't the only to notice the change in behaviour, and the proper upstream was more open to do some reconciling (as opposed to CLOSED NOTABUG): https://sourceware.org/bugzilla/show_bug.cgi?id=21964 Anyway, while I fixed the issue in libqb for 2.29, there're new issues with 2.29.1, will share the results of my investigation. re [comment 14]: > Anyway, while I fixed the issue in libqb for 2.29, there're new > issues with 2.29.1, will share the results of my investigation. Turned to be a matter of partially reconciled change making the configure check consider no workaround is needed, while in fact it remains to be the case. So I just hardened this check to hit a boundary condition that hasn't changed between 2.29 and 2.29.1 (but has between 2.28 and 2.29+). re [comment 15]: For how I extended the check, see the 2.29.1 mentioning assertion line at reproducer from related [bug 1500898 comment 0]. Just noting down that I was pointed out the topic keep re-emerging in upstream: https://sourceware.org/ml/binutils/2018-02/msg00038.html Also related code keeps (at least in outlook) shuffling: https://sourceware.org/ml/binutils/2018-01/msg00347.html Luckily, binutils 2.30 doesn't seem to break anything based on libqb self-checks. On the bright side, libqb's (workaround) mechanism keeps working with the current master as of now, 407aa07cee - 2.30.0.20180207. (https://github.com/ClusterLabs/libqb/pull/266#issuecomment-363737275) |