Bug 1318788 - Compiling u-boot cpu/armv7/cache_v7.c with gcc6 leads to a non-booting system
Summary: Compiling u-boot cpu/armv7/cache_v7.c with gcc6 leads to a non-booting system
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: uboot-tools
Version: 24
Hardware: arm
OS: Linux
unspecified
unspecified
Target Milestone: ---
Assignee: Peter Robinson
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2016-03-17 19:58 UTC by Paul Whalen
Modified: 2016-04-13 21:34 UTC (History)
12 users (show)

Fixed In Version: uboot-tools-2016.03-5.fc24
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-04-13 21:34:00 UTC
Type: Bug


Attachments (Terms of Use)
boot log (43.67 KB, text/plain)
2016-03-17 19:58 UTC, Paul Whalen
no flags Details
pre-processed cache_v7.c (from building with -save-temps) (112.87 KB, text/plain)
2016-03-21 15:50 UTC, Hans de Goede
no flags Details
cache_v7.s from building with -Os (results in broken boot) (81.54 KB, text/plain)
2016-03-21 15:52 UTC, Hans de Goede
no flags Details
cache_v7.s from building with -O1 (results in working boot) (80.41 KB, text/plain)
2016-03-21 15:53 UTC, Hans de Goede
no flags Details
cache_v7.s (-Os, r222025 aka gcc 5 branch point) (54.78 KB, text/plain)
2016-03-22 10:38 UTC, Jakub Jelinek
no flags Details
cache_v7.s (-Os, r234285, i.e. 5 days old trunk) (81.33 KB, text/plain)
2016-03-22 10:39 UTC, Jakub Jelinek
no flags Details
r224703 r228141 r231340 (16.36 KB, application/octet-stream)
2016-03-22 16:26 UTC, Jakub Jelinek
no flags Details
r228940 r229740 r230140 r230540 (15.80 KB, application/octet-stream)
2016-03-22 19:42 UTC, Jakub Jelinek
no flags Details

Description Paul Whalen 2016-03-17 19:58:07 UTC
Created attachment 1137502 [details]
boot log

Description of problem:
uboot-tools-2016.03-1.fc24 frequently fails to boot on allwinner devices

Version-Release number of selected component (if applicable):
uboot-tools-2016.03-1.fc24 with 4.5 kernels

How reproducible:
frequently

Steps to Reproduce:
1. Attempt to boot f24 alpha 1.5 using uboot-tools-2016.03-1.fc24


Actual results:

Starting kernel ...

data abort
pc : [<7efa82fa>]          lr : [<7ef60b54>]
reloc pc : [<4a0492fa>]    lr : [<4a001b54>]
sp : 7af35c78  ip : 00000007     fp : 7ef60604
r10: 00000000  r9 : 7af3eee0     r8 : 7af35d8c
r7 : 42000000  r6 : 00000000     r5 : 7efb4cc4  r4 : 7efa2289
r3 : fffffffe  r2 : 00000047     r1 : 00007fc0  r0 : 00007fc0
Flags: nzCv  IRQs off  FIQs off  Mode SVC_32
Resetting CPU ...


Additional info:
Full log attached.

Comment 1 Hans de Goede 2016-03-21 12:05:15 UTC
Hi,

Ok, I can reproduce this, I did not see this before because I had:

exclude=dosbox cross-gcc-common gcc-arm-linux-gnu

In my dnf.conf to pin gcc-arm-linux-gnu to 5.3.1 to workaround the problems with building armv7 kernels with gcc6 and I always build my own u-boot.

With that removed from dnf.conf and cross-gcc updated I pretty much immediately hit the problem.

One thing which stands out during booting is:

"EHCI failed to shut down host controller."

Which I see your boot.log has too, I will investigate this further.

Regards,

Hans

Comment 2 Hans de Goede 2016-03-21 14:14:21 UTC
OK, I've found the usb bug and I've a fix ready, but it is an unrelated bug...

Comment 3 Hans de Goede 2016-03-21 15:44:33 UTC
Ok, so as suspected this is a problem when building with gcc6, by mixing and matching .o files build with gcc5 resp. gcc6 I've managed to pinpoint the issue to u-boot's:

arch/arm/cpu/armv7/cache_v7.c

When this is build with gcc-6 with -O2 or -Os the resulting u-boot will not boot the kernel, with the problem described earlier in this bug report.

When build with -O1 all is well. I will attached pre-processed sources as well as asm for both -O1 and -Os (the u-boot default). Hopefully one of the gcc devs can help us figure out if this is an u-boot or gcc bug.

Comment 4 Hans de Goede 2016-03-21 15:50:15 UTC
Created attachment 1138643 [details]
pre-processed cache_v7.c (from building with -save-temps)

Comment 5 Hans de Goede 2016-03-21 15:52:38 UTC
Created attachment 1138645 [details]
cache_v7.s from building with -Os (results in broken boot)

Comment 6 Hans de Goede 2016-03-21 15:53:11 UTC
Created attachment 1138646 [details]
cache_v7.s from building with -O1 (results in working boot)

Comment 7 Hans de Goede 2016-03-21 15:55:26 UTC
I've also tried adding __attribute__((optimize (0))) to the various functions in cache_v7.c to pinpoint this further, but if added to functions with inline asm, this too breaks things. I've added it to all the functions which do not use inline asm (about half of them) and there it makes no change.

I'm not sure how _attribute__((optimize (x))) works, can I add _attribute__((optimize (1))) to compile only certain functions with -01 settings ?  In that case I should be able to pinpoint this further.

Note that both -O2 and -Os result in a non working build.

Comment 8 Hans de Goede 2016-03-21 16:04:30 UTC
p.s.

1) Note that the -O1 build really is a "-Os ... -O1" build, assuming that the later -O1 flag overrides the earlier -Os completely.

2) I've tried adding -fno-strict-aliasing, that does not help

Comment 9 Dan Horák 2016-03-21 16:07:28 UTC
FWIW I have summarized various tricks to debug gcc issues in http://sharkcz.livejournal.com/13754.html

Comment 10 Dan Horák 2016-03-21 16:08:43 UTC
- per docs the last -O wins
- yes, __attribute__() can be set for functions

Comment 11 Hans de Goede 2016-03-21 17:01:25 UTC
Dan, thanks for the hint, I've narrowed this down to the following 2 functions:

diff --git a/arch/arm/cpu/armv7/cache_v7.c b/arch/arm/cpu/armv7/cache_v7.c
index 94ff488..88a0c70 100644
--- a/arch/arm/cpu/armv7/cache_v7.c
+++ b/arch/arm/cpu/armv7/cache_v7.c
@@ -116,7 +116,7 @@ static void v7_clean_inval_dcache_level_setway(u32 level, u32 num_sets,
 	DSB;
 }
 
-static void v7_maint_dcache_level_setway(u32 level, u32 operation)
+__attribute__((optimize(("O1")))) static void v7_maint_dcache_level_setway(u32 level, u32 operation)
 {
 	u32 ccsidr;
 	u32 num_sets, num_ways, log2_line_len, log2_num_ways;
@@ -151,7 +151,7 @@ static void v7_maint_dcache_level_setway(u32 level, u32 operation)
 	}
 }
 
-static void v7_maint_dcache_all(u32 operation)
+__attribute__((optimize(("O1")))) static void v7_maint_dcache_all(u32 operation)
 {
 	u32 level, cache_type, level_start_bit = 0;
 	u32 clidr = get_clidr();

With this patch things work again. If I remove the _attribute__((optimize(("O1")))) from either one things break.

What these 2 have in common is that they call:

get_ccsidr();

resp.

get_clidr();

So I guess that something may be going wrong with those:

static u32 get_ccsidr(void)
{
        u32 ccsidr;

        /* Read current CP15 Cache Size ID Register */
        asm volatile ("mrc p15, 1, %0, c0, c0, 0" : "=r" (ccsidr));
        return ccsidr;
}

static u32 get_clidr(void)
{
        u32 clidr;

        /* Read current CP15 Cache Level ID Register */
        asm volatile ("mrc p15,1,%0,c0,c0,1" : "=r" (clidr));
        return clidr;
}

Comment 12 Jakub Jelinek 2016-03-22 10:33:47 UTC
The reason for the non-inlined calls to get_ccsidr and get_clidr if you stick optimize(1) on the callers is that gcc doesn't inline between different optimization levels.

CCing Nick; Nick, can you spot some error in the -Os assembly?

Or alternatively we could try to bisect to a gcc commit where this stopped working; I have some revisions of arm cross (configured without assembler though, so not sure if it is usable for this purpose), but don't have access to something that could reproduce this nor experience with that.
So, I could try to upload asm from say r222025 I have around (which is gcc 5.x branch point) and r234285 (as example of contemporary gcc 6) and if the former would work for you and the latter would not, we could iterate on narrowing it down.

Comment 13 Jakub Jelinek 2016-03-22 10:38:26 UTC
Created attachment 1138971 [details]
cache_v7.s (-Os, r222025 aka gcc 5 branch point)

Comment 14 Jakub Jelinek 2016-03-22 10:39:51 UTC
Created attachment 1138972 [details]
cache_v7.s (-Os, r234285, i.e. 5 days old trunk)

If you have spare cycles, can you try these two and report if the first one works and second one doesn't?

Comment 15 Hans de Goede 2016-03-22 16:18:57 UTC
Hi,

(In reply to Jakub Jelinek from comment #14)
> If you have spare cycles, can you try these two and report if the first one
> works and second one doesn't?

I can confirm that the first one works and the second does not work.

Regards,

Hans

Comment 16 Jakub Jelinek 2016-03-22 16:26:48 UTC
Created attachment 1139121 [details]
r224703 r228141 r231340

Can you try these 3?

Comment 17 Hans de Goede 2016-03-22 18:59:18 UTC
cache_v7.s.r224703 -> works
cache_v7.s.r228141 -> works
cache_v7.s.r231340 -> does not work

Regards,

Hans

Comment 18 Jakub Jelinek 2016-03-22 19:42:41 UTC
Created attachment 1139198 [details]
r228940 r229740 r230140 r230540

Another set.
Though r228940 and r229740 look the same except for debug info,
don't have arm gas around to check if r230140 and r230540 aren't the same too (just so that you save some time testing them).

Comment 19 Jakub Jelinek 2016-03-23 12:04:10 UTC
Ok, grabbed cross-binutils for arm and 228940 and r229740 are indeed identical disassembly, and r230140, r230540 and r231340 too.  So the question is just if r228940 works or not.

Comment 20 Nick Clifton 2016-03-23 13:21:42 UTC
(In reply to Jakub Jelinek from comment #12)
  
> CCing Nick; Nick, can you spot some error in the -Os assembly?

Not easily no. :-(  I did look, but the two disassemblies are quite different and without a simulator framework to run them through, I was unable to deduce any potential bugs.  Sorry.

Cheers
  Nick

Comment 21 Jakub Jelinek 2016-03-23 13:25:39 UTC
I've bisected the change in between r229740 and r230140, there is just one in http://gcc.gnu.org/r229825 .  But if already r228940 doesn't work, there are several possibilities, so I can't move on until I know the result of the testing.

Comment 22 Hans de Goede 2016-03-23 20:50:10 UTC
(In reply to Jakub Jelinek from comment #19)
> Ok, grabbed cross-binutils for arm and 228940 and r229740 are indeed
> identical disassembly, and r230140, r230540 and r231340 too.  So the
> question is just if r228940 works or not.

cache_v7.s.r228940 ->works

Comment 23 Hans de Goede 2016-03-23 22:47:17 UTC
It seems that gcc6 and u-boot's cache_v7.c really do not play well together. I've been debugging another arm u-boot issue today and eventually tried downgrading to gcc-5.3 and that fixed things.

So it seems that even with -O1 gcc is still miscompiling cache_v7.c, but in a way which is harder to detect / leads to harder to reproduce problems. The bisect results for this issue are the same.

Comment 24 Jakub Jelinek 2016-03-24 12:30:14 UTC
If you want some quick workaround, try adding __attribute__((noinline, noclone))
to some of the functions that are inlined into the problematic one, either to
v7_maint_dcache_level_setway or to v7_inval_dcache_level_setway and v7_clean_inval_dcache_level_setway.  If it fails even with that, and doesn't fail if compiled with the same attributes say with gcc 5.x, then we could narrow it further to one of those routines.
With r229824 vs. r229825 I still see differences in the assembly even when all 3 are marked __attribute__((noinline, noclone)).
In particular in both v7_inval_dcache_level_setway and v7_clean_inval_dcache_level_setway.
That said, this is some cache invalidation, somebody familiar with ARM hw needs to look at it and say what exactly are the rules for the invalidation, if you can e.g. between different invalidation fetch extra icache or dcache lines, ...
If the __attribute__((noinline, noclone)) attribute on all 3 (or subset of them) works with gcc 5.x and doesn't with gcc 6.x, it would be still big help in narrowing the issue down.  You can also add optimize (0) attribute selectively then to see where it matters.

Comment 25 Hans de Goede 2016-04-04 18:37:03 UTC
Ok, I've tried using noinline, I started with adding this to the 2 trouble some functions while still using gcc5 and things broke (in the same way as with gcc6) at that point already.

So it seems that this cache-flushing code just is terribly fragile and really should be written in asm, and that is actually what the kernel is doing *and* u-boot already contains a copy of the kernel's version.

I've tested that switching to the asm version fixes things. I've submitted a patch with this change to u-boot upstream: https://patchwork.ozlabs.org/patch/605967/

Now lets see if upstream iw happy with this approach, assuming they are this bug can be moved (back to) u-boot.

Comment 26 Hans de Goede 2016-04-05 08:32:30 UTC
Jakub,

u-boot upstream would like to know if the toolchain team is happy saying:
"that code is just too fragile, it's probably relying on undefined
behavior, investigation concluded" ?

For their exact words see:
http://lists.denx.de/pipermail/u-boot/2016-April/250503.html

Also see my comment 25 for why I came to the conclusion that just switching to asm would be best.

Thanks & Regards,

Hans

Comment 27 Jakub Jelinek 2016-04-05 08:39:56 UTC
That would need to answer somebody familiar with the ARM cache flushing instructions.  All I can say is that I haven't found any obvious errors on the toolchain side when compiling the code.  Ask somebody from Linaro or ARM?

Comment 28 Hans de Goede 2016-04-06 15:19:49 UTC
Hi,

Tom Rini (upstream u-boot maintainer) has managed to talk to some ARM people about this, and
quoting him:

"So I was able to talk with a few people and this is the right approach.
The cache routines we're doing here in C must not in fact be done in C.
That things work today with some compilers is not by design.  This is at
least a minimally correct thing to do and a more correct thing to do
would be to leverage more of the code from the kernel for cache
functions (and not just for v7)."

Where "this is the right approach" refers to replacing the C-code with asm being the right approach, so this does not count as a gcc bug. Moving to u-boot

Fedora U-boot maintainers: I've submitted a fix for this issue upstream here:

https://patchwork.ozlabs.org/patch/605967/

I've also done a follow-up patch which applied the same patch to the invalidate (vs the flush) function:

https://patchwork.ozlabs.org/patch/605968/

In my testing only the first patch is really necessary. Note that the workaround to use -O1 when building cache_v7.c is NOT sufficient, I can still reproduce the problem in some cases with that workaround (the workaround makes it harder to reproduce, but does not really fix it).

Regards,

Hans

Comment 29 Peter Robinson 2016-04-06 15:46:22 UTC
Hans: thanks for your hard work on this. Much appreciated. Will pull the v2 patch when I see it hit upstream list.

Comment 30 Fedora Update System 2016-04-09 16:57:59 UTC
uboot-tools-2016.03-5.fc24 has been submitted as an update to Fedora 24. https://bodhi.fedoraproject.org/updates/FEDORA-2016-2d091b6a28

Comment 31 Fedora Update System 2016-04-09 21:22:09 UTC
uboot-tools-2016.03-5.fc24 has been pushed to the Fedora 24 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-2d091b6a28

Comment 32 Fedora Update System 2016-04-13 21:33:54 UTC
uboot-tools-2016.03-5.fc24 has been pushed to the Fedora 24 stable repository. If problems still persist, please make note of it in this bug report.


Note You need to log in before you can comment on or make changes to this bug.