Bug 2216662 - Misaligned symbol for s390-ccw image during qemu-kvm build
Summary: Misaligned symbol for s390-ccw image during qemu-kvm build
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: qemu
Version: 39
Hardware: s390x
OS: Linux
high
high
Target Milestone: ---
Assignee: Camilla Conte
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2023-06-22 06:58 UTC by Miroslav Rezanina
Modified: 2023-09-19 17:12 UTC (History)
19 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 2220866 (view as bug list)
Environment:
Last Closed: 2023-09-19 17:12:16 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
IBM Linux Technology Center 202826 0 None None None 2023-06-22 10:53:44 UTC

Description Miroslav Rezanina 2023-06-22 06:58:51 UTC
When trying to build qemu-kvm package for ELN, build for s390x is failing with following error:

/usr/bin/ld: warning: start.o: missing .note.GNU-stack section implies executable stack
/usr/bin/ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker
/usr/bin/ld: start.o(.text+0x8): misaligned symbol `__bss_start' (0xc1e5) for relocation R_390_PC32DBL
clang-16: error: linker command failed with exit code 1 (use -v to see invocation)
make[1]: *** [Makefile:61: s390-ccw.elf] Error 1

This problem is happening with rawhide version (binutils-2.40-9.fc39, clang-16.0.5-3.fc39). When using F38 version (binutils-2.39-9.fc39, clang-16.0.5-1.fc38), build is successful. In addition, build is successful using gcc too (both rawhide and f38).

Reproducible: Always

Steps to Reproduce:
1. Prepare srpm from CentOS 9 stream qemu-kvm (https://gitlab.com/redhat/centos-stream/rpms/qemu-kvm)

2. Try to build this srpm
Actual Results:  
Build is failing on s930-ccw rom image build.

Expected Results:  
Build pass

Comment 1 Thomas Huth 2023-06-22 09:45:39 UTC
Instructions for reproducing the problem manually:

 dnf install gcc pixman-devel glib2-devel meson ninja-build git \
             libaio-devel libcap-ng-devel libiscsi-devel capstone-devel \
             libseccomp-devel nettle-devel libattr-devel libjpeg-devel \
             libpng-devel libuuid-devel pulseaudio-libs-devel curl-devel \
             libssh-devel systemtap-sdt-devel
 rpm -qa | grep -E "clang|binutils"
 # clang-resource-filesystem-16.0.5-3.fc39.s390x
 # clang-libs-16.0.5-3.fc39.s390x
 # binutils-gold-2.40-9.fc39.s390x
 # binutils-2.40-9.fc39.s390x
 # clang-16.0.5-3.fc39.s390x
 git clone https://gitlab.com/qemu-project/qemu.git
 cd qemu
 ./configure --target-list=s390x-softmmu --cc=clang
 make pc-bios/s390-ccw/all

Results in:

pc-bios/s390-ccw: Linking s390-ccw.elf
/usr/bin/ld: warning: start.o: missing .note.GNU-stack section implies executable stack
/usr/bin/ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker
/usr/bin/ld: start.o(.text+0x8): misaligned symbol `__bss_start' (0xc1e5) for relocation R_390_PC32DBL
clang-16: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [Makefile:63: s390-ccw.elf] Error 1
make[1]: *** [Makefile:181: pc-bios/s390-ccw/all] Error 2
make[1]: Leaving directory '/root/qemu/build'
make: *** [GNUmakefile:11: pc-bios/s390-ccw/all] Error 2

Comment 2 Thomas Huth 2023-06-22 10:05:26 UTC
FWIW, the problem goes away when I downgrade to binutils-2.39-9.eln125 ... but occurs again as soon as I upgrade to binutils-2.40-1.eln126 ... so it seems like binutils 2.40 introduced this problem.

Comment 3 Thomas Huth 2023-06-22 10:30:12 UTC
Kernel folks seem to face a similar issue with binutils 2.40 + Clang : https://github.com/ClangBuiltLinux/linux/issues/1747#issuecomment-1601675956

Comment 4 Nick Clifton 2023-06-22 12:43:55 UTC
Hi Thomas,

  I don't suppose you are able to create a small self-contained test case that reproduces this problem ?  Preferably something assembler-only so I do not have to rely upon specific versions of the compiler.

  The reason for the message is that the 2.40 linker is detecting and reporting the misaligned access to the __bss_start symbol, whereas the 2.39 linker does not.  The underlying problem of the misaligned access has probably been around for a lot longer.  The message was introduced with commit 906f69cf65da :

  https://sourceware.org/pipermail/binutils-cvs/2022-October/060150.html

  The cause of the misalignment, I assume, is that the built-in linker script does not enforce 2-byte alignment for the .bss section, *and* that the .data section is ending on an odd byte, thus making the .bss section start on an odd byte.
(As an aside you may find that adding a single byte of initialised data to the qemu sources might fix this problem, at least for now).

  There is a small possibility that this might be down to the s390-ccw.elf sources themselves.  If all of the uninitialised data only needs 1-byte alignment, then the linker is not really doing anything wrong in starting the .bss section on an odd byte boundary.  The bug then would using an instruction like LARL to attempt to load a symbol that is not guaranteed to be on an even byte boundary.

  If changing the qemu sources is not an option, then fixing the linker script should work.  Currently it is possible for individual targets to define extra symbols to be provided at the same time and place as the __bss_start symbol.  So the s390x port of the linker could define a __bss_start_aligned symbol for example, which would guarantee its alignment.  (Of course the code in start.o that references __bss_start would need to be changed to reference __bss_start_aligned).

Cheers
  Nick

PS.  I think that the kernel issues you mentioned are actually due to compiler problems rather than the linker script.  As in the compilers are generating misaligned code.

Comment 5 IBM Bug Proxy 2023-06-22 12:45:11 UTC
------- Comment From cborntra.com 2023-06-22 08:41 EDT-------
bss_start is clearly not 2 byte aligned:
misaligned symbol `__bss_start' (0xc1e5)

and we do use that in
pc-bios/s390-ccw/start.S:       /* clear bss */
pc-bios/s390-ccw/start.S:       larl %r2, __bss_start

Now I do not understand why llvm does not align .bss, we do not have a linker script as far as I can tell

https://gitlab.com/qemu-project/qemu/-/blob/master/pc-bios/s390-ccw/Makefile

Comment 6 Nick Clifton 2023-06-22 12:53:01 UTC
(In reply to IBM Bug Proxy from comment #5)
> ------- Comment From cborntra.com 2023-06-22 08:41 EDT-------
> Now I do not understand why llvm does not align .bss, we do not have a
> linker script as far as I can tell

If no linker script is provided on the command line, then the linker will use its own built-in script.  You can actually see this script if you run "ld --verbose".

The point is that if none of the uninitialised data needs 2-byte alignment (or higher) then the linker is free to start the .bss section on an unaligned boundary.  In this case using LARL to access the start of the .bss section is wrong.  The other possibility is that the .bss section *is* 2-byte aligned, but the __bss_start symbol is not being defined at the start of the section, but rather a byte before the start.  This would be bad, and indicates that the built-in linker script is not defining the __bss_start symbol properly.

Comment 7 Thomas Huth 2023-06-22 19:09:04 UTC
Thanks for the hint with "ld --verbose"! To my surprise, it indeed seems like the BSS is not marked as aligned here:

SECTIONS
{
  [...]
  .data           :
  {
    *(.data .data.* .gnu.linkonce.d.*)
    SORT(CONSTRUCTORS)
  }
  .data1          : { *(.data1) }
  _edata = .; PROVIDE (edata = .);
  . = .;
  __bss_start = .;
  .bss            :
  {
   *(.dynbss)
   *(.bss .bss.* .gnu.linkonce.b.*)
   *(COMMON)
   /* Align here to ensure that the .bss section occupies space up to
      _end.  Align after .bss to ensure correct alignment even if the
      .bss section disappears because there are no input sections.
      FIXME: Why do we need it? When there is no .bss section, we do not
      pad the .data section.  */
   . = ALIGN(. != 0 ? 64 / 8 : 1);
  }
  . = ALIGN(64 / 8);
  . = SEGMENT_START("ldata-segment", .);
  . = ALIGN(64 / 8);
  _end = .; PROVIDE (end = .);
  . = DATA_SEGMENT_END (.);
  [...]
}

So this seems to be a bug in the startup code of the s390-ccw bios, indeed. I think we have to rework the code there to avoid the LARL with the odd address.

Comment 8 IBM Bug Proxy 2023-06-22 19:50:19 UTC
------- Comment From Andreas.Krebbel.com 2023-06-22 15:44 EDT-------
The alignment of .bss will be applied by ld based on the biggest alignment of the objects inside. The weird thing is that the __bss_start symbol is defined outside the section. That way the alignment applied by ld does not move that symbol with it.

[12] .data             PROGBITS        000000000000c000 00b000 0001e5 00  WA  0   0  8
[13] .bss              NOBITS          000000000000e000 00c000 034000 00  WA  0   0 8192

142: 000000000000c1e5     0 NOTYPE  GLOBAL DEFAULT   13 __bss_start

Although .bss ends up at 0xe000 __bss_start still points at .data + sizeof(.data)

The most obvious solution would be to move __bss_start into the .bss definition in the default linker script. I've checked that the symbol will be present even if no .bss is among the input sections. In that case ld appears to create an empty .bss just for the symbol. I don't know if there are any other implications with that, so I'll ask on the mailing list.

This only appears to get triggered when building with clang. GAS apparently pads the .data section always to multiple of 8 bytes.

I've verified that with the following change qemu builds fine.

...
...

Comment 9 Thomas Staudt 2023-06-23 06:43:37 UTC
The changes were not mirrored correctly.
Adding them to the Red Hat bug.

...
.data1          : { *(.data1) }
_edata = .; PROVIDE (edata = .);
. = .;
.bss            :
{
__bss_start = .;
*(.dynbss)
*(.bss .bss.* .gnu.linkonce.b.*)
*(COMMON)
...

Comment 10 Thomas Huth 2023-06-23 07:49:24 UTC
Thanks for trying to tackle this on the toolchain side, Andreas! ... I thought it might be possible to work-around it in the startup code of the bios, too, but I only hit other problems while trying to do this (see https://bugzilla.redhat.com/show_bug.cgi?id=2216906 for example), so a fix in the toolchain would be really appreciated!

Comment 11 Nick Clifton 2023-06-23 09:45:33 UTC
(In reply to IBM Bug Proxy from comment #8)
> ------- Comment From Andreas.Krebbel.com 2023-06-22 15:44 EDT-------
Hi Andreas,

> The weird thing is that the __bss_start symbol is defined outside the section. 

Right.  This is done because on some targets there are other bss sections and
these appear before the .bss section.  (Sections like .sbbs and .bss.plt).  So
the __bss_start symbol is defined at the beginning of the region of memory where
the bss sections will be placed, but outside of any specific single section.

Note - I know that the s390x target does not use the .sbss or .bss.plt sections,
but the internal linker script is built from a template that attempts to support
all ELF based targets, so it tends to be quite generic.


> The most obvious solution would be to move __bss_start into the .bss
> definition in the default linker script. 

This would work.  It would mean creating an internal linker script specific
to the s390x architecture, rather than using the generic template.  The only
problem with this is that if enhancements are made to the generic template
in the future they will not be copied over to the s390x script unless someone
notices them.

The alternative is to modify the generic template and add a new symbol defined
inside the .bss section.  It would have to have a different name from __bss_start,
which would mean that qemu's start.S file would have to be updated as well.  But
it would also mean less change to the linker.

Any preferences ?

Ideally this problem should be filed upstream with the GNU Binutils project,
since it is not specific to Fedora/s390x.  If it was filed with a proposed
patch attached that would make it even nicer....

Comment 12 Thomas Huth 2023-06-23 10:17:08 UTC
(In reply to Nick Clifton from comment #11)
> (In reply to IBM Bug Proxy from comment #8)
> > ------- Comment From Andreas.Krebbel.com 2023-06-22 15:44 EDT-------
...
> > The most obvious solution would be to move __bss_start into the .bss
> > definition in the default linker script. 
> 
> This would work.  It would mean creating an internal linker script specific
> to the s390x architecture, rather than using the generic template.  The only
> problem with this is that if enhancements are made to the generic template
> in the future they will not be copied over to the s390x script unless someone
> notices them.

That sounds not really desirable, I think.

> The alternative is to modify the generic template and add a new symbol
> defined
> inside the .bss section.  It would have to have a different name from
> __bss_start,
> which would mean that qemu's start.S file would have to be updated as well. 
> But
> it would also mean less change to the linker.

That's ugly, too, since we would only be able to compile with newer versions of the linker in that case (without doing ugly things like compile-time detection of the correct symbol name).

Wouldn't it be feasible to simply put a ". = ALIGN(2);" in front of the definition of the __bss_start symbol ?

Comment 13 Nick Clifton 2023-06-23 12:51:51 UTC
(In reply to Thomas Huth from comment #12)
 
> Wouldn't it be feasible to simply put a ". = ALIGN(2);" in front of the
> definition of the __bss_start symbol ?

Yes, but ... 
  1. It would still not guarantee that the __bss_start symbol was set the
     the start address of the .bss section.  If the .bss section contains
     an object that needs 8-byte alignment for example, then the .bss
     section would gain 8-byte alignment, and so _bss_start might still
     be incorrect.

  2. Some people are bound to complain about the unnecessary loss
     of a potential byte of memory space.

  3. If we are going to align the symbol, then the alignment should be 
     suitable for the architecture.  So for the s390x 2-byte alignment 
     might be enough, but many architectures prefer 4-byte alignment
     for optimal memory access.

As an alternative, could the start.S script use the .bss *section* symbol
instead of __bss_start ?  This should always have the start address of 
the .bss section, assuming that it has not been stripped from the binary.

Comment 14 Thomas Huth 2023-06-23 13:30:59 UTC
(In reply to Nick Clifton from comment #13)
> As an alternative, could the start.S script use the .bss *section* symbol
> instead of __bss_start ?  This should always have the start address of 
> the .bss section, assuming that it has not been stripped from the binary.

You mean simply replacing "__bss_start" by ".bss" there? ... that does not seem to work, I'm getting this error message in that case:

<unknown>:0: error: Undefined section reference: .bss

Comment 15 Nick Clifton 2023-06-23 13:40:55 UTC
(In reply to Thomas Huth from comment #14)
> (In reply to Nick Clifton from comment #13)
> > As an alternative, could the start.S script use the .bss *section* symbol
> > instead of __bss_start ?  This should always have the start address of 
> > the .bss section, assuming that it has not been stripped from the binary.
> 
> You mean simply replacing "__bss_start" by ".bss" there? 

Yes.

> ... that does not
> seem to work, I'm getting this error message in that case:
> 
> <unknown>:0: error: Undefined section reference: .bss

That is a very strange error.  It seems to imply that the .bss section does not exist.
Hmm - this is from Clang yes ?  (Or rather clang's built-in assembler, right ?).
Possibly you need to add a reference to the .bss section in the start.S file ?

Comment 16 Thomas Huth 2023-06-23 16:45:55 UTC
Yes, this is from Clang with its built-in assembler. And yes! It seems to work if I add a dummy reference to the .bss section in the file:

diff --git a/pc-bios/s390-ccw/start.S b/pc-bios/s390-ccw/start.S
--- a/pc-bios/s390-ccw/start.S
+++ b/pc-bios/s390-ccw/start.S
@@ -10,13 +10,16 @@
  * directory.
  */
 
+.bss
+.text
+
         .globl _start
 _start:
 
        larl   %r15, stack + 0x8000     /* Set up stack */
 
        /* clear bss */
-       larl %r2, __bss_start
+       larl %r2, .bss
        larl %r3, _end
        slgr %r3, %r2           /* get sizeof bss */
        ltgr    %r3,%r3         /* bss empty? */

I'll give it a closer test next week, but if there is no regression, then this could maybe be a good workaround for the problem. Thank you very much!

Comment 17 IBM Bug Proxy 2023-06-23 17:10:36 UTC
------- Comment From Ulrich.Weigand.com 2023-06-23 13:06 EDT-------
>Yes, this is from Clang with its built-in assembler.

Ah, I've noticed that Clang will completely omit the .bss section from its output if it doesn't need any .bss contents.  GAS on the other hand always emits a .bss section, even if it is empty.

Comment 18 Thomas Huth 2023-06-27 13:14:52 UTC
Suggested patches here:

https://lore.kernel.org/qemu-devel/20230627074703.99608-1-thuth@redhat.com/

Comment 19 Nick Clifton 2023-06-27 13:50:52 UTC
Note - a thread in the upstream GNU Binutils mailing list has been started about this problem:

  https://sourceware.org/pipermail/binutils/2023-June/127983.html

And Andreas has proposed a patch to fix the problem:

  https://sourceware.org/pipermail/binutils/2023-June/127997.html

Assuming that it is approved I will backport it to rawhide.

Comment 20 Thomas Huth 2023-06-28 07:43:34 UTC
(In reply to Nick Clifton from comment #19)
...
> And Andreas has proposed a patch to fix the problem:
> 
>   https://sourceware.org/pipermail/binutils/2023-June/127997.html
> 
> Assuming that it is approved I will backport it to rawhide.

Ok, then I'll wait a little bit with the QEMU series ... it would be nicer to have this fixed in the linker script, indeed.

Comment 21 IBM Bug Proxy 2023-06-28 08:01:05 UTC
------- Comment From cborntra.com 2023-06-28 03:58 EDT-------
(In reply to comment #22)
> Ok, then I'll wait a little bit with the QEMU series ... it would be nicer
> to have this fixed in the linker script, indeed.

We should not rely on newest compilers for upstream qemu, so maybe do both?

Comment 22 Thomas Huth 2023-06-28 08:21:19 UTC
(In reply to IBM Bug Proxy from comment #21)
...
> We should not rely on newest compilers for upstream qemu, so maybe do both?

Hmm, maybe ... Question: Are we really sure that there will never be another segment like .sbss in front of .bss on s390x ? If so, I guess we could do both fixes indeed. But if there is a small chance that this might change one day, we should maybe better keep __bss_start in QEMU?

Comment 23 IBM Bug Proxy 2023-06-28 09:41:22 UTC
------- Comment From Andreas.Krebbel.com 2023-06-28 05:33 EDT-------
(In reply to comment #24)
> (In reply to IBM Bug Proxy from comment #21)
> ...
> > We should not rely on newest compilers for upstream qemu, so maybe do both?
> Hmm, maybe ... Question: Are we really sure that there will never be another
> segment like .sbss in front of .bss on s390x ? If so, I guess we could do
> both fixes indeed. But if there is a small chance that this might change one
> day, we should maybe better keep __bss_start in QEMU?

You could do what GCC currently does for misaligned symbols. Put it into the literal pool and load it from there. For the qemu case this might look like this (untested):

index 6072906df4..6b4c3adb85 100644
@@ -15,8 +15,10 @@ _start:
-       /* clear bss */
+       /* clear bss. Load __bss_start from literal pool to deal with
+       /* it being potentially unaligned */
+       larl    %r2, litpool
+       lg      %r2, 0(%r2)
@@ -42,6 +44,8 @@ done:
memsetxc:
xc      0(1,%r1),0(%r1)
+litpool:
+       .quad   __bss_start
/*
* void disabled_wait(void)

Comment 24 Thomas Huth 2023-06-30 10:09:22 UTC
(In reply to IBM Bug Proxy from comment #23)
> ------- Comment From Andreas.Krebbel.com 2023-06-28 05:33 EDT-------
...
> You could do what GCC currently does for misaligned symbols. Put it into the
> literal pool and load it from there.

Thanks, that worked well, too, and should be future-proof. Patch has now been merged here:

https://gitlab.com/qemu-project/qemu/-/commit/7cd50cbe4ca3e2860b31b06ec92c17c54bd82d48

I'm re-assigning the BZ to Miroslav for handling the integration in Fedora rawhide.

Comment 25 IBM Bug Proxy 2023-07-05 12:40:45 UTC
------- Comment From Andreas.Krebbel.com 2023-07-05 08:37 EDT-------
I've just committed a patch to binutils to make sure all symbols introduced by the linker scripts are aligned according to our ABI.
https://sourceware.org/pipermail/binutils/2023-July/128267.html

Comment 26 Miroslav Rezanina 2023-07-11 08:49:42 UTC
Re-assigning to Camila as she's going to handle fedora.

Comment 27 Fedora Release Engineering 2023-08-16 08:07:50 UTC
This bug appears to have been reported against 'rawhide' during the Fedora Linux 39 development cycle.
Changing version to 39.

Comment 28 Cole Robinson 2023-09-19 17:12:16 UTC
Patch mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=2216662#c24  is in qemu 8.1.0 which is in f39 and rawhide, so I don't think there's anything left to do here.


Note You need to log in before you can comment on or make changes to this bug.