All started by unexpected libqb build failure in recent mass rebuild in rawhide and x86_64 architecture, which should not have been affected by the commonly observed failure elsewhere in that round: https://koji.fedoraproject.org/koji/getfile?taskID=20791411&volume=DEFAULT&name=build.log&offset=-1824 > PASS: array.test > PASS: map.test > PASS: rb.test > FAIL: log.test > PASS: blackbox-segfault.sh > PASS: loop.test > PASS: ipc.test > PASS: resources.test Log test failed. After some investigation, I realized this is due to some linker change between 2.28 an 2.29, and even identified commits that likely introduced the change leading to the failed tests, as reverting them restores the expected outcome[1]: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commit;h=7dba9362c172f1073487536eb137feb2da30b0ff https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commit;h=b27685f2016c510d03ac9a64f7b04ce8efcf95c4 https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commit;h=cbd0eecf261c2447781f8c89b0d955ee66fae7e9 Indeed, libqb is using orphaned section heavily when it detects the compiler supports __attribute__((section("foo"))) construct. My understanding is that it's so as to have the static data stored in compile-time, with an alternative approach of extracting them in run-time, which is less efficient, indeed. Sadly, I was unable to come up with a better minimal case that would also fit the libqb pattern, but at least it shows something has changed: === dummylib.c === #include <stdio.h> #define JOIN_PASS(a, b) a##b #define JOIN(a, b) JOIN_PASS(a, b) #define STRINGIFY_PASS(a) #a #define STRINGIFY(a) STRINGIFY_PASS(a) #define SECTION_NAME __foo #define SECTION_START JOIN(__start_, SECTION_NAME) #define SECTION_STR STRINGIFY(SECTION_NAME) extern int SECTION_START[]; static void dummy(void) { { static int __attribute__((section(SECTION_STR))) z = 13; } { static int __attribute__((section(SECTION_STR))) z = 17; } printf("dummy finished %d %d\n", *((int *) SECTION_START), *((int *) SECTION_START + 1)); } int do_something(int n) { dummy(); return n*n; } === test.c === #include <stdio.h> #define JOIN_PASS(a, b) a##b #define JOIN(a, b) JOIN_PASS(a, b) #define STRINGIFY_PASS(a) #a #define STRINGIFY(a) STRINGIFY_PASS(a) #define SECTION_NAME __foo #define SECTION_START JOIN(__start_, SECTION_NAME) #define SECTION_STR STRINGIFY(SECTION_NAME) extern int SECTION_START[]; int do_something(int n); static void test(void) { { static int __attribute__((section(SECTION_STR))) z = 42; } { static int __attribute__((section(SECTION_STR))) z = 10; } printf("test finished %d %d\n", *((int *) SECTION_START), *((int *) SECTION_START + 1)); } int main(char argc, char *argv[]) { int ret = do_something(argc); test(); printf("got it? %d %d\n", *((int *) SECTION_START), *((int *) SECTION_START + 1)); return ret; } === # gcc -shared -fPIC dummylib.c -o libdummy.so # gcc -o test -L. -ldummy test.c # LD_LIBRARY_PATH=. ./test ld.bfd @ 2.28: > dummy finished 42 10 > test finished 42 10 > got it? 42 10 ld.bfd @ 2.29: > dummy finished 13 17 > test finished 42 10 > got it? 42 10 In libqb unit test that now fails, the problem may be similar: the library's section "wins" over binary's proper one within the in-library-call. As an aside, ld.gold was likely never applied with libqb, as that won't work regardless of binutils version; the output with the above minimal example: > ./test: Symbol `__start___foo' causes overflow in R_X86_64_PC32 relocation > ./test: Symbol `__start___foo' causes overflow in R_X86_64_PC32 relocation > ./test: Symbol `__start___foo' causes overflow in R_X86_64_PC32 relocation > ./test: Symbol `__start___foo' causes overflow in R_X86_64_PC32 relocation On the other hand, the established arrangement within libqb served well for the past 6+ years, so I'd dare to call the change in ld.bfd behaviour a regression. What's worse, any inflicted loss of logging output happens silently, making people realize there's a problem when they would need some clues as to what happens the most :-/ [1] http://oss.clusterlabs.org/pipermail/developers/2017-July/000504.html
re [comment 0]: > In libqb unit test that now fails, the problem may be similar: > the library's section "wins" over binary's proper one within the > in-library-call. Indeed, a probable root cause in libqb is that qb_log_init (in-library-call) refers to the pointers for section boundaries that are expected to refer to the context of "main compilation unit" (possibly out-of-library), but with ld.bfd @ 2.29, they refer to the library context uncondtionally: https://github.com/ClusterLabs/libqb/blob/v1.0.2/lib/log.c#L873 Trying to set various visibility attributes at the extern declaration for the section boundary symbols was fruitless: https://github.com/ClusterLabs/libqb/blob/v1.0.2/include/qb/qblog.h#L268 Is there any apparent solution (barring a complete rethinking) for libqb to stay compatible with unpatched binutils 2.29? Any idea on a shortest test possible (rather than being fixed on ld.bfd --version) for a configure script to detect misbehaving linker in this sense? The example in [comment 0] feels still too large. Can this indeed be considered a bug in ld.bfd?
Hi Jan, Relying upon the linker's orphan section placement algorithm to remain constant is a very bad idea IMHO. As an alternative, why not tell the linker exactly what you want ? For example with the small test case you reported, if you change the second gcc command to this: gcc -Wl,foo.t -o test -L. -ldummy test.c then I think that the test will work. The foo.t file looks like this: # cat foo.t SECTIONS { __foo : { PROVIDE (__start___foo = .); *(__foo); } } This just tells the linker to accumulate all __foo sections together and to define a symbol called __start___foo at the beginning of the section. Since the file is included on the command line as an input file, rather than being the argument of a -T or --script option, the linker will use the contents to *augment* the built-in linker script, rather than override it. This solution also has the advantage of being backwards compatible, so it should work the 2.28 linker as well. Does this solve the problem for you ? Cheers Nick
Nick, thanks for the response. Yes, it's entirely possible that previously libqb was relying on some behaviour that was working just ouf of pure luck (though it lasted for at least 6 recent years). And sadly, nope, the proposed update to the build recipe didn't help, furthermore I am getting this complaint: > /usr/bin/ld: warning: foo.t contains output sections; did you forget -T? Easy way out would be to extend the interface of the library function: === dummylib.c === #include <stdio.h> #define JOIN_PASS(a, b) a##b #define JOIN(a, b) JOIN_PASS(a, b) #define STRINGIFY_PASS(a) #a #define STRINGIFY(a) STRINGIFY_PASS(a) #define SECTION_NAME __foo #define SECTION_START JOIN(__start_, SECTION_NAME) #define SECTION_STR STRINGIFY(SECTION_NAME) extern int SECTION_START[]; static void dummy(const int *start) { { static int __attribute__((section(SECTION_STR))) z = 13; } { static int __attribute__((section(SECTION_STR))) z = 17; } printf("dummy finished %d %d\n", *((int *) start), *((int *) start + 1)); } int do_something(int n, const int *start) { dummy(start); return n*n; } === test.c === #include <stdio.h> #define JOIN_PASS(a, b) a##b #define JOIN(a, b) JOIN_PASS(a, b) #define STRINGIFY_PASS(a) #a #define STRINGIFY(a) STRINGIFY_PASS(a) #define SECTION_NAME __foo #define SECTION_START JOIN(__start_, SECTION_NAME) #define SECTION_STR STRINGIFY(SECTION_NAME) extern int SECTION_START[]; int do_something(int n, const int *start); static void test(void) { { static int __attribute__((section(SECTION_STR))) z = 42; } { static int __attribute__((section(SECTION_STR))) z = 10; } printf("test finished %d %d\n", *((int *) SECTION_START), *((int *) SECTION_START + 1)); } int main(char argc, char *argv[]) { int ret = do_something(argc, SECTION_START); test(); printf("got it? %d %d\n", *((int *) SECTION_START), *((int *) SECTION_START + 1)); return ret; } === But what if we don't want to change do_something interface at any cost? Can we somehow arrange for programmatically (dlopen + dlsym?) accessing SECTION_START of test.c from within do_something (within dummylib.c)?
(In reply to Jan Pokorný from comment #3) Hi Jan, > And sadly, nope, the proposed update to the build recipe didn't help, What happened - did the behaviour change at all ? > furthermore I am getting this complaint: > > > /usr/bin/ld: warning: foo.t contains output sections; did you forget -T? Yes, I was hoping that that would not matter to you. (You can ignore the message, it does not effect the output). I also note that when I tried your original test case I received this warning from the linker: ld: warning: type and size of dynamic symbol `__start___foo' are not defined Do you get this too ? > But what if we don't want to change do_something interface at any cost? > Can we somehow arrange for programmatically (dlopen + dlsym?) accessing > SECTION_START of test.c from within do_something (within dummylib.c)? I must admit that I still do not really understand why libqb is doing this. What is wrong with allowing the compiler and linker to do their thing and arrange for access to data in the normal way ? One idea - untested - is that in the library, instead of referring to __start___foo, you refer to the address of the first declared symbol ? (Hoping that the linker does not rearrange the order of the symbols, which of course it may well do). Cheers Nick
re [comment 4] > What happened - did the behaviour change at all ? Yep, mere combination of files/commands from [comment 0] and [comment 2] did not change the behaviour at all. >> furthermore I am getting this complaint: >> >> > /usr/bin/ld: warning: foo.t contains output sections; did you forget -T? > > Yes, I was hoping that that would not matter to you. (You can ignore the > message, it does not effect the output). It doesn't matter to me in any other way than, frankly, I don't understand low-level linker magic (and barely some user-facing options), so whatever is complained about can have consequences I cannot predict :) > I also note that when I tried > your original test case I received this warning from the linker: > > ld: warning: type and size of dynamic symbol `__start___foo' are not defined > > Do you get this too ? No, haven't seen this with binutils-2.29-5.fc27.x86_64, gcc-7.1.1-6.fc27.x86_64. > I must admit that I still do not really understand why libqb is > doing this. What is wrong with allowing the compiler and linker to > do their thing and arrange for access to data in the normal way ? Original author is no longer active (or with RH, for that matter). As noted above, I am only coincidentally getting to explore this rabbit hole. I guess there could be two reasons why qb_log_init is not using the pattern I sketched in [comment 3]: - a goal to maintain ABI/API compatibility from times there wasn't this compile-time offloading of run-time work - unified approach (single header to do the right job) used for both intra-library and actual user logging, and it was working well with ld.bfd prior to 2.29 (rhetorical question: is there only a single project affected by this change? I would doubt that) > One idea - untested - is that in the library, instead of referring to > __start___foo, you refer to the address of the first declared symbol ? > (Hoping that the linker does not rearrange the order of the symbols, > which of course it may well do). I am not sure I follow, but I tried to workaround the section aliasing issue by adding one more section that's expected to be present only in binary, not in the library: --- dummylib.c.orig 2017-08-02 18:28:53.252179427 +0200 +++ dummylib.c 2017-08-02 19:07:35.888675684 +0200 @@ -7,10 +7,18 @@ #define SECTION_START JOIN(__start_, SECTION_NAME) #define SECTION_STR STRINGIFY(SECTION_NAME) extern int SECTION_START[]; +#define CROSSUNIT 1 +#if CROSSUNIT +extern int *__start___bar[]; +#endif static void dummy(void) { { static int __attribute__((section(SECTION_STR))) z = 13; } { static int __attribute__((section(SECTION_STR))) z = 17; } printf("dummy finished %d %d\n", +#if CROSSUNIT + **((int **) __start___bar), *(*((int **) __start___bar) + 1)); +#else *((int *) SECTION_START), *((int *) SECTION_START + 1)); +#endif } int do_something(int n) { dummy(); return n*n; } --- test.c.orig 2017-08-02 18:36:24.460664287 +0200 +++ test.c 2017-08-02 19:06:31.324606291 +0200 @@ -7,14 +7,17 @@ #define SECTION_START JOIN(__start_, SECTION_NAME) #define SECTION_STR STRINGIFY(SECTION_NAME) extern int SECTION_START[]; +extern int *__start___bar[]; int do_something(int n); static void test(void) { - { static int __attribute__((section(SECTION_STR))) z = 42; } + { static int __attribute__((section(SECTION_STR))) z = 42; + static int __attribute__((section("__bar"))) *y = &z; } { static int __attribute__((section(SECTION_STR))) z = 10; } printf("test finished %d %d\n", *((int *) SECTION_START), *((int *) SECTION_START + 1)); } int main(char argc, char *argv[]) { + printf("**__bar_start == %d\n", **((int **) __start___bar)); int ret = do_something(argc); test(); printf("got it? %d %d\n", With that "define CROSSUNIT 1", I am getting this linker error: > /usr/bin/ld: test: hidden symbol `__start___bar' in test is referenced by DSO > /usr/bin/ld: final link failed: Bad value > collect2: error: ld returned 1 exit status When I "define CROSSUNIT 0", the binary is created, with output: > **__bar_start == 42 > dummy finished 13 17 > test finished 42 10 > got it? 42 10 (**__bar_start is per expectation there)
Furthermore, "section start equals address of the first item being put here" seems pretty fragile, because I can imagine more complex cases where it is hard to say which item will be "first" from the buildchain perspective.
Ok, I think I have a sketch of possible solution, using sort of out-of-band-signaling (extraneous exported symbols). === dummylib.c === #include <stdio.h> #define JOIN_PASS(a, b) a##b #define JOIN(a, b) JOIN_PASS(a, b) #define STRINGIFY_PASS(a) #a #define STRINGIFY(a) STRINGIFY_PASS(a) #define SECTION_NAME __foo #define SECTION_START JOIN(__start_, SECTION_NAME) #define SECTION_STOP JOIN(__stop_, SECTION_NAME) #define SECTION_STR STRINGIFY(SECTION_NAME) extern int SECTION_START[]; extern int SECTION_STOP[]; extern int *main_section_first; extern int *main_section_last; int *main_section_first __attribute__((weak)) = NULL; int *main_section_last __attribute__((weak)) = NULL; static void dummy(void) { { static int __attribute__((section(SECTION_STR))) z = 13; } { static int __attribute__((section(SECTION_STR))) z = 17; } { static int __attribute__((section(SECTION_STR))) z = 23; } if (main_section_first == NULL || main_section_last == NULL) { main_section_first = SECTION_START; main_section_last = SECTION_STOP; } for (int *i = main_section_first; i < main_section_last; i++) { printf("dummy: %d\n", *i); } } int do_something(int n) { dummy(); return n*n; } === test.c === #include <stdio.h> #define JOIN_PASS(a, b) a##b #define JOIN(a, b) JOIN_PASS(a, b) #define STRINGIFY_PASS(a) #a #define STRINGIFY(a) STRINGIFY_PASS(a) #define SECTION_NAME __foo #define SECTION_START JOIN(__start_, SECTION_NAME) #define SECTION_STOP JOIN(__stop_, SECTION_NAME) #define SECTION_STR STRINGIFY(SECTION_NAME) extern int SECTION_START[]; extern int SECTION_STOP[]; int *main_section_first = SECTION_START; #if 1 int *main_section_last = SECTION_STOP; #endif int do_something(int n); static void test(void) { { static int __attribute__((section(SECTION_STR))) z = 42; } { static int __attribute__((section(SECTION_STR))) z = 10; } { static int __attribute__((section(SECTION_STR))) z = 1; } printf("test finished %d %d\n", *((int *) SECTION_START), *((int *) SECTION_START + 1)); } int main(char argc, char *argv[]) { printf("*main_section_first == %d\n", *((int *) main_section_first)); int ret = do_something(argc); test(); printf("got it? %d %d\n", *((int *) SECTION_START), *((int *) SECTION_START + 1)); return ret; } === When "#if 1": > *main_section_first == 42 > dummy: 42 > dummy: 10 > dummy: 1 > test finished 42 10 > got it? 42 10 and when changed to "#if 0": > *main_section_first == 42 > dummy: 13 > dummy: 17 > dummy: 23 > test finished 42 10 > got it? 42 10 That could achieve basic forward/backward compatibility (indeed, the reported case, unpatched libqb vs. ld.bfd @ 2.29 won't work, perhaps there could be some additional one-off self-test for libqb users compiled against newer libqb when the older one is actually in use to prevent mentioned no-logs issue?). Plus some library-specific tweaking. More importantly, such a change would also bring a compatibility with ld.gold -> portability++. That being said, I'd take any possibly further discussion regarding the solution on the libqb side out of this bug (except if you have anything immediately to add, Nick, anyway, thanks for the assistance so far), I'll only post final commit(s) link when that's ready. It's up to consideration what the risk of other SW being broken by this unexpected change is in the wild. Probably only time will show.
After further contemplation, here's the actually working solution across both ld.bfd {2.28,2.29}, but not with ld.gold, which still might be solvable when -Wl,<file> is turned into proper "-T <file>": === dummylib.c === #include <stdio.h> #define JOIN_PASS(a, b) a##b #define JOIN(a, b) JOIN_PASS(a, b) #define STRINGIFY_PASS(a) #a #define STRINGIFY(a) STRINGIFY_PASS(a) #define SECTION_NAME __foo #define SECTION_START JOIN(__start_, SECTION_NAME) #define SECTION_STOP JOIN(__stop_, SECTION_NAME) #define SECTION_STR STRINGIFY(SECTION_NAME) extern int SECTION_START[]; extern int SECTION_STOP[]; // following only needed to force SECTION_{START,STOP} be global with ld 2.29 int __attribute__((weak)) SECTION_START[] = { 0 }; int __attribute__((weak)) SECTION_STOP[] = { 0 }; static void dummy(void) { { static int __attribute__((section(SECTION_STR))) z = 13; } { static int __attribute__((section(SECTION_STR))) z = 17; } printf("dummy finished %d %d\n", *((int *) SECTION_START), *((int *) SECTION_STOP - 1)); } int do_something(int n) { dummy(); return n*n; } === test.c === #include <stdio.h> #define JOIN_PASS(a, b) a##b #define JOIN(a, b) JOIN_PASS(a, b) #define STRINGIFY_PASS(a) #a #define STRINGIFY(a) STRINGIFY_PASS(a) #define SECTION_NAME __foo #define SECTION_START JOIN(__start_, SECTION_NAME) #define SECTION_STOP JOIN(__stop_, SECTION_NAME) #define SECTION_STR STRINGIFY(SECTION_NAME) extern int SECTION_START[]; extern int SECTION_STOP[]; int do_something(int n); static void test(void) { { static int __attribute__((section(SECTION_STR))) z = 42; } { static int __attribute__((section(SECTION_STR))) z = 10; } printf("test finished %d %d\n", *((int *) SECTION_START), *((int *) SECTION_STOP - 1)); } int main(char argc, char *argv[]) { int ret = do_something(argc); test(); printf("got it? %d %d\n", *((int *) SECTION_START), *((int *) SECTION_STOP - 1)); return ret; } === foo.t === SECTIONS { __foo : { __start___foo = .; *(__foo); __stop___foo = .; } } === # gcc -Wl,foo.t -shared -fPIC dummylib.c -o libdummy.so # gcc -Wl,foo.t -o test -L. -ldummy test.c # LD_LIBRARY_PATH=. ./test > dummy finished 42 10 > test finished 42 10 > got it? 42 10 Node the following readelf outputs, similar for both 2.28 and 2.29, were very similar out of the box with files and recipe in [comment 0] with 2.28: # readelf -s libdummy.so | grep foo > 7: 0000000000601030 4 OBJECT GLOBAL DEFAULT 25 __start___foo > 9: 0000000000601038 4 OBJECT GLOBAL DEFAULT 25 __stop___foo > 48: 0000000000601038 4 OBJECT GLOBAL DEFAULT 25 __stop___foo > 52: 0000000000601030 4 OBJECT GLOBAL DEFAULT 25 __start___foo # readelf -s test | grep foo > 6: 0000000000a01030 4 OBJECT GLOBAL DEFAULT 26 __start___foo > 8: 0000000000a01038 4 OBJECT GLOBAL DEFAULT 26 __stop___foo > 50: 0000000000a01038 4 OBJECT GLOBAL DEFAULT 26 __stop___foo > 56: 0000000000a01030 4 OBJECT GLOBAL DEFAULT 26 __start___foo whereas those original files + recipe with 2.29 resulted in: # readelf -s test | grep foo > 42: 0000000000601034 0 NOTYPE LOCAL DEFAULT 24 __stop___foo > 43: 000000000060102c 0 NOTYPE LOCAL DEFAULT 24 __start___foo # readelf -s libdummy.so | grep foo > 41: 0000000000201028 0 NOTYPE LOCAL DEFAULT 23 __stop___foo > 42: 0000000000201020 0 NOTYPE LOCAL DEFAULT 23 __start___foo which could, arguably, have been a regression. Now, what bothers me, libqb library users might forget this "-Wl,foo.t" equivalent easily, even if it will be provided through "pkg-config --libs (or rather --cflags?) libqb". Partial solution would be that one-off run-time self-test ([comment 7]). I guess there's no "when you are linking me to a program, apply this script snippet automatically" facility carried within the shared library directly.
(In reply to Jan Pokorný from comment #8) Hi Jan, I am glad that you have found a solution. > Now, what bothers me, libqb library users might forget this "-Wl,foo.t" > equivalent easily, even if it will be provided through > "pkg-config --libs (or rather --cflags?) libqb". Partial solution would > be that one-off run-time self-test ([comment 7]). I guess there's no > "when you are linking me to a program, apply this script snippet > automatically" facility carried within the shared library directly. Actually there may be ... have a look at, for example /usr/lib64/libncurses.so which rather than being a shared library is actually a linker script fragment that uses the INPUT directive to include both the real, versioned, libncurses.so.6 library and another library. So you could rename foo.t to libgb.so, and add INPUT(libqb.so.6) at the start then everything should work automatically without the users having to change their command lines at all. (Well in theory anyway. I have not actually tested this idea. Plus of course you would have to use the correct name of the real libfo.so.xxx shared library). Cheers Nick
Please excuse the typos - I was typing too fast. For "libgb.so" I of course meant "libqb.so", and "libfo.so.xxx" was meant to be "libqb.so.xxx". Cheers Nick
Nick, thanks, that looks pretty cool! Will try it for sure. In the meantime, preliminary patch solves the issue in libqb, looking at x86_64 scratch build: https://koji.fedoraproject.org/koji/taskinfo?taskID=21019015 My custom WIP branch: https://github.com/jnpkrn/libqb/commits/workaround-ld-2.29 To conclude, the main problem is the *changed visibility* of the boundaries denoting (start, stop) symbols for orphaned sections. Might be still worth considering to restore original behaviour.
Hi Jan, (In reply to Jan Pokorný from comment #11) > To conclude, the main problem is the *changed visibility* of the > boundaries denoting (start, stop) symbols for orphaned sections. > Might be still worth considering to restore original behaviour. Thanks. I will keep that in mind. If other developers encounter similar problems then I will consider creating a patch for the linker. If it is just the libqb project however, I will probably decide to leave things as they are. :-} Cheers Nick
Please note, though, this behaviour of the linker is moreover inconsistent amongst platforms (not just between versions), which may be bad on its own merit: https://koji.fedoraproject.org/koji/taskinfo?taskID=20995093
Hmm, apparently I wasn't the only to notice the change in behaviour, and the proper upstream was more open to do some reconciling (as opposed to CLOSED NOTABUG): https://sourceware.org/bugzilla/show_bug.cgi?id=21964 Anyway, while I fixed the issue in libqb for 2.29, there're new issues with 2.29.1, will share the results of my investigation.
re [comment 14]: > Anyway, while I fixed the issue in libqb for 2.29, there're new > issues with 2.29.1, will share the results of my investigation. Turned to be a matter of partially reconciled change making the configure check consider no workaround is needed, while in fact it remains to be the case. So I just hardened this check to hit a boundary condition that hasn't changed between 2.29 and 2.29.1 (but has between 2.28 and 2.29+).
re [comment 15]: For how I extended the check, see the 2.29.1 mentioning assertion line at reproducer from related [bug 1500898 comment 0].
Just noting down that I was pointed out the topic keep re-emerging in upstream: https://sourceware.org/ml/binutils/2018-02/msg00038.html Also related code keeps (at least in outlook) shuffling: https://sourceware.org/ml/binutils/2018-01/msg00347.html Luckily, binutils 2.30 doesn't seem to break anything based on libqb self-checks.
On the bright side, libqb's (workaround) mechanism keeps working with the current master as of now, 407aa07cee - 2.30.0.20180207. (https://github.com/ClusterLabs/libqb/pull/266#issuecomment-363737275)