Description of problem: Referring to the koschei monitoring of the HepMC3 package build in Fedora: https://koschei.fedoraproject.org/package/HepMC3?collection=f34 https://koschei.fedoraproject.org/package/HepMC3?collection=f33 The package has started failing to build on Fedora 33 and 34. For both F33 and F34 the first failing build list glibc as a package that was updated since the previous (successful) attempt. From 2.33-5.fc34 to 2.33-8.fc34 and from 2.32-4.fc33 to 2.32-6.fc33 respectively. None of the other listed updated dependencies looks relevant. The line of code that triggers the errors is line 107 in HepMC3-3.2.3/src/WriterAscii.cc: ``` m_cursor += sprintf(m_cursor, "U %s %s\n", Units::name(evt.momentum_unit()).c_str(), Units::name(evt.length_unit()).c_str()); ``` I do not see any obviously wrong with this line. I made some test and replaced the above line with: ``` std::cerr << Units::name(evt.momentum_unit()).c_str() << std::endl; std::cerr << Units::name(evt.length_unit()).c_str() << std::endl; m_cursor += sprintf(m_cursor, "U %s %s\n", "GEV", "MM"); ``` which print outs the two strings (thereby accessing the same memory) and replacing the strings with hardcoded values in the sprintf call. This does not trigger any errors from valgrind. Just to test that the use of temporaries in the call is not the cause I also tried: ``` std::string mu = Units::name(evt.momentum_unit()); std::string lu = Units::name(evt.length_unit()); m_cursor += sprintf(m_cursor, "U %s %s\n", mu.c_str(), lu.c_str()); ``` This triggers the same errors from valgrind as the original line. I can't tell what is the cause of the issue here. Is it a problem in the updated glibc or its headers, is it a problem with g++ that was not seen before the glibc update or is it a false positive from valgrind or is there something wrong with the code I do not see. Version-Release number of selected component (if applicable): F33: gcc-c++-10.3.1-1.fc33.aarch64 glibc-devel-2.32-6.fc33.aarch64 valgrind-1:3.16.1-18.fc33.aarch64 F34: gcc-c++-11.1.1-1.fc34.aarch64 glibc-devel-2.33-8.fc34.aarch64 valgrind-1:3.17.0-1.fc34.aarch64 F32 (not affected): gcc-c++-10.3.1-1.fc32.aarch64 glibc-devel-2.31-6.fc32.aarch64 valgrind-1:3.16.1-18.fc32.aarch64 F35/rawhide (also not affected): gcc-c++-11.1.1-2.fc35.aarch64 glibc-devel-2.33.9000-2.fc35.aarch64 valgrind-1:3.17.0-3.fc35.aarch64 How reproducible: Always on Fedora 33 and Fedora 34. Steps to Reproduce: 1. Build HepMC3 package on Fedora 33 or 34 for aarch64 Actual results: Failure Expected results: Success Additional info: I tried to download old glibc packages from koji: glibc-2.33-5.fc34.aarch64.rpm glibc-common-2.33-5.fc34.aarch64.rpm glibc-devel-2.33-5.fc34.aarch64.rpm glibc-langpack-en-2.33-5.fc34.aarch64.rpm Installing these in a mock chroot on aarch64-test01.fedorainfracloud.org the HemMC3 package is still buildable. Please reassing to a different component if you think another component is more appropriate.
It seems that valgrind had suppressions that relied on glibc debugging information: ==20030== Conditional jump or move depends on uninitialised value(s) ==20030== at 0x4C195EC: ??? (in /usr/lib64/libc-2.33.so) ==20030== by 0x4BED35F: ??? (in /usr/lib64/libc-2.33.so) ==20030== by 0x4BF753B: ??? (in /usr/lib64/libc-2.33.so) ==20030== by 0x4C7506B: __sprintf_chk (in /usr/lib64/libc-2.33.so) ==20030== by 0x48CF4F7: HepMC3::WriterAscii::write_event(HepMC3::GenEvent const&) (stdio2.h:38) ==20030== by 0x10A50B: main (testIO6.cc:27) After removal of that debugging information, these suppressions no longer work. Perhaps there is a fix for this without bringing back all the debugging information.
I've just tried to rebuild valgrind-3.17.0-3.fc35.src.rpm for aarch64 on f33 and f34 and see -- Running tests in gdbserver_tests ----------------------------------- hginfo: valgrind --tool=helgrind --vgdb=yes --vgdb-error=0 --vgdb-prefix=./vgdb-prefix-hginfo -q ./../helgrind/tests/hg01_all_ok (progB: ./gdb --quiet -l 60 --nx ../helgrind/tests/hg01_all_ok) *** hginfo failed (stderr) *** *** hginfo failed (stdoutB) *** *** hginfo failed (stderrB) *** hgtls: valgrind --tool=helgrind --vgdb=yes --vgdb-error=0 --vgdb-prefix=./vgdb-prefix-hgtls -q ./../none/tests/tls (progB: ./gdb --quiet -l 60 --nx ../none/tests/tls) *** hgtls failed (stdoutB) *** mcblocklistsearch: valgrind --tool=memcheck --vgdb=yes --vgdb-error=0 --vgdb-prefix=./vgdb-prefix-mcblocklistsearch -q ./../memcheck/tests/leak-tree (progB: ./gdb --quiet -l 60 --nx 1>&2 ../memcheck/tests/leak-tree) mcbreak: valgrind --tool=memcheck --vgdb=yes --vgdb-error=0 --vgdb-prefix=./vgdb-prefix-mcbreak ./t (progB: ./gdb --quiet -l 60 --nx ./t) sh: line 1: 14371 Aborted (core dumped) ./gdb --quiet -l 60 --nx ./t < mcbreak.stdinB.gdb > mcbreak.stdoutB.out 2> mcbreak.stderrB.out Best regards, Andrii
Is this an arm64 only issue? Or are you seeing this on other architectures too?
I've tried for arm64 only. The f33/f34 builds failed, but f32 and rawhide were ok. Just restarted the builds for other architectures.
The rebuild fails on multiple architectures. It succeeds only for all *86 and for aarch64@rawhide/f32.
glibc-2.33.9000-16.fc35 may improve this situation: https://koji.fedoraproject.org/koji/buildinfo?buildID=1772153 It is available in the f35-build-side-42221 side tag. I may not have guessed the relevant symbols correctly, but we can extend the symbol set easily enough. If rawhide is confirmed fixed by this, I will backport to Fedora 34 and 33.
The HepMC3 build in koschei for f35 (rawhide) failed with glibc-2.33.9000-18.fc35.aarch64 https://koschei.fedoraproject.org/package/HepMC3?collection=f35
(In reply to Mattias Ellert from comment #7) > The HepMC3 build in koschei for f35 (rawhide) failed with > glibc-2.33.9000-18.fc35.aarch64 > https://koschei.fedoraproject.org/package/HepMC3?collection=f35 The valgrind errors suggest that the required all symbols are present, so maybe the issue wasn't the missing symbols after all. Hmm. Unfortunately I do not have the resources to analyze this further at this time, sorry.
I had to revert the addition of the __str* symbols due to bug 1973026. I spent some time figuring out why __strlen_mte is apparently selected: ==2874915== Conditional jump or move depends on uninitialised value(s) ==2874915== at 0x4C29AEC: __strlen_mte (in /usr/lib64/libc.so.6) ==2874915== by 0x4BF008F: __vfprintf_internal (in /usr/lib64/libc.so.6) ==2874915== by 0x4BFA0FB: ??? (in /usr/lib64/libc.so.6) ==2874915== by 0x4C84E9B: __sprintf_chk (in /usr/lib64/libc.so.6) ==2874915== by 0x48CF037: HepMC3::WriterAscii::write_event(HepMC3::GenEvent const&) (stdio2.h:38) ==2874915== by 0x10D71F: main (testBoost.cc:132) This may be a binutils bug. It looks like the IFUNC is bypassed and the weak definition of __strlen_mte is used to bind strlen locally. For example: Dump of assembler code for function __strdup: 0x0000000000093e90 <+0>: paciasp 0x0000000000093e94 <+4>: stp x29, x30, [sp, #-32]! 0x0000000000093e98 <+8>: mov x29, sp 0x0000000000093e9c <+12>: stp x19, x20, [sp, #16] 0x0000000000093ea0 <+16>: mov x20, x0 0x0000000000093ea4 <+20>: bl 0x9dac0 <__strlen_mte> 0x0000000000093ea8 <+24>: add x19, x0, #0x1 0x0000000000093eac <+28>: mov x0, x19 0x0000000000093eb0 <+32>: bl 0x27e50 <malloc@plt> 0x0000000000093eb4 <+36>: cbz x0, 0x93ed0 <__strdup+64> 0x0000000000093eb8 <+40>: mov x2, x19 0x0000000000093ebc <+44>: mov x1, x20 0x0000000000093ec0 <+48>: ldp x19, x20, [sp, #16] 0x0000000000093ec4 <+52>: ldp x29, x30, [sp], #32 0x0000000000093ec8 <+56>: autiasp 0x0000000000093ecc <+60>: b 0x9b780 <__memcpy_generic> 0x0000000000093ed0 <+64>: ldp x19, x20, [sp, #16] 0x0000000000093ed4 <+68>: ldp x29, x30, [sp], #32 0x0000000000093ed8 <+72>: autiasp 0x0000000000093edc <+76>: ret End of assembler dump. I really don't want to bring back the full symbol table for libc.so.6 because it is so large. Maybe __strlen_asimd can be emulated better by valgrind, but it is significantly more complex.
__strlen_mte is used deliberately: /* It doesn't make sense to send libc-internal strlen calls through a PLT. */ .globl __GI_strlen; __GI_strlen = __strlen_mte So no binutils bug.
These are the intercepted symbols from valgrind shared/vg_replace_strmem.c: bcmp bcopy __GI_memchr __GI_memcmp __GI_memcpy __GI_memmove __GI_mempcpy __GI___rawmemchr __GI_stpcpy __GI_strcasecmp __GI___strcasecmp_l __GI_strcasecmp_l __GI_strcat __GI_strchr __GI_strcmp __GI_strcpy __GI_strcspn __GI_strlen __GI_strncasecmp __GI___strncasecmp_l __GI_strncasecmp_l __GI_strncmp __GI_strncpy __GI_strnlen __GI_strrchr __GI_wcsnlen __GI_wmemchr index memchr memcmp __memcmp_sse2 __memcmp_sse4_1 memcpy __memcpy_avx_unaligned_erms __memcpy_chk __memcpy_sse2 memcpyZAGLIBCZu2Zd2Zd5 memcpyZAZAGLIBCZu2Zd14 memcpyZDVARIANTZDsse3x memcpyZDVARIANTZDsse42 memcpyZPZa memmove __memmove_chk memmoveZDVARIANTZDsse3x memmoveZDVARIANTZDsse42 memmoveZPZa mempcpy memrchr memset memsetZPZa putenv rawmemchr rindex setenv stpcpy __stpcpy_chk __stpcpy_sse2 __stpcpy_sse2_unaligned stpncpy strcasecmp strcasecmp_l strcasestr strcat strchr strchrnul __strchr_sse2 __strchr_sse2_no_bsf strcmp __strcmp_sse2 __strcmp_sse42 strcpy __strcpy_chk strcspn strlcat strlcpy strlen __strlen_sse2 __strlen_sse2_no_bsf __strlen_sse42 strncasecmp strncasecmp_l strncat strncmp __strncmp_sse2 __strncmp_sse42 strncpy __strncpy_sse2 __strncpy_sse2_unaligned strnlen strpbrk strrchr __strrchr_sse2 __strrchr_sse2_no_bsf __strrchr_sse42 strspn strstr __strstr_sse2 __strstr_sse42 unsetenv wcschr wcscmp wcscpy wcslen wcsncmp wcsnlen wcsrchr wmemchr Note that some of these might be ancient, non-existing or unneeded these days. The intercepts are there for functions that are so optimized that valgrind/memcheck cannot proof them correct (at the time).
Thanks. I will not include the public symbols, only the __ variants, assuming that valgrind can lift them from .dynsym.
HepMC3 scratch build in f35-build-side-42819 with glibc-2.33.9000-21.fc35: https://koji.fedoraproject.org/koji/taskinfo?taskID=70305267 Succeeds on all architectures, including aarch64 that used to fail.
Thank you all!
Another HepMC3 scratch build against glibc-2.33.9000-22.fc35: https://koji.fedoraproject.org/koji/taskinfo?taskID=70343510 I'm going to treat this as a glibc bug (and not valgrind improperly depending on glibc internals). I plan to backport the changes to Fedora 34 and 33, which also received the glibc debuginfo updates.
FEDORA-2021-483200b24b has been submitted as an update to Fedora 34. https://bodhi.fedoraproject.org/updates/FEDORA-2021-483200b24b
FEDORA-2021-483200b24b has been pushed to the Fedora 34 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-483200b24b` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-483200b24b See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
FEDORA-2021-483200b24b has been pushed to the Fedora 34 stable repository. If problem still persists, please make note of it in this bug report.
HepMC3 build in koschei for Fedora 34 now OK: https://koschei.fedoraproject.org/package/HepMC3?collection=f34 Only Fedora 33 is still broken.
FEDORA-2021-f29b4643c7 has been submitted as an update to Fedora 33. https://bodhi.fedoraproject.org/updates/FEDORA-2021-f29b4643c7
FEDORA-2021-f29b4643c7 has been pushed to the Fedora 33 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-f29b4643c7` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-f29b4643c7 See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
If there are needed more symbols they need to be added to .dynsym, not to an incomplete .symtab: Bug 1975895 Comment 2
(In reply to Jan Kratochvil from comment #22) > If there are needed more symbols they need to be added to .dynsym, not to an > incomplete .symtab: Bug 1975895 Comment 2 Many of these .symtab symbols are hidden, so they cannot be added to .dynsym. Ideally, tools would be fixed not to assume that .symtab is a superset of .dynsym.
(In reply to Florian Weimer from comment #23) > Many of these .symtab symbols are hidden, so they cannot be added to > .dynsym. Ideally, tools would be fixed not to assume that .symtab is a > superset of .dynsym. Why the STB_LOCAL symbols cannot be in .dynsym? Maybe some tool does not support that and it could be fixed? Isn't it another option instead of the .symtab+.dynsym merging? Wouldn't be the STB_LOCAL in .dynsym more compatible across consumers? Or I suspect it would break existing ld.so (I have not checked)?
FEDORA-2021-f29b4643c7 has been pushed to the Fedora 33 stable repository. If problem still persists, please make note of it in this bug report.
Reproducer (build with -O2): #define _FORTIFY_SOURCE 2 #include <string> #include <stdio.h> void __attribute__((weak)) test(const char *a, const char *b) { std::string mu = a; std::string lu = b; char buf[256]; sprintf(buf, "U %s %s", mu.c_str(), lu.c_str()); puts(buf); } int main() { test("GEV", "MM"); }
(In reply to Florian Weimer from comment #26) > Reproducer (build with -O2): > > #define _FORTIFY_SOURCE 2 > #include <string> > #include <stdio.h> > > void __attribute__((weak)) > test(const char *a, const char *b) > { > std::string mu = a; > std::string lu = b; > char buf[256]; > sprintf(buf, "U %s %s", mu.c_str(), lu.c_str()); > puts(buf); > } > > int > main() > { > test("GEV", "MM"); > } I cannot reproduce this on either fedora 34 or rawhide on x86_64. In which situations does this cause valgrind memcheck errors? P.S. Note that this bug is already closed.
(In reply to Mark Wielaard from comment #27) > (In reply to Florian Weimer from comment #26) > > Reproducer (build with -O2): > > > > #define _FORTIFY_SOURCE 2 > > #include <string> > > #include <stdio.h> > > > > void __attribute__((weak)) > > test(const char *a, const char *b) > > { > > std::string mu = a; > > std::string lu = b; > > char buf[256]; > > sprintf(buf, "U %s %s", mu.c_str(), lu.c_str()); > > puts(buf); > > } > > > > int > > main() > > { > > test("GEV", "MM"); > > } > > I cannot reproduce this on either fedora 34 or rawhide on x86_64. > In which situations does this cause valgrind memcheck errors? The bug only occurred on aarch64. I made some changes to .symtab in Fedora 34, so I wanted to re-run the reproducer. I verified that it fails with glibc-2.33-12.fc34 (if I recall correctly; not all the relevant builds are still in Koji, but that one is still there). The reproducer passes on aarch64 on current rawhide and Fedora 34, and Fedora 34 with my upcoming changes in glibc-2.33-19.fc34 (essentially backporting the .symtab construction to Fedora 34). So there is nothing to do here really. I just kept rewriting that reproducer from scratch over and over again, so I wanted to have something that I can cut-and-paste easily. Sorry for any confusion caused.