Created attachment 1137502 [details]
Description of problem:
uboot-tools-2016.03-1.fc24 frequently fails to boot on allwinner devices
Version-Release number of selected component (if applicable):
uboot-tools-2016.03-1.fc24 with 4.5 kernels
Steps to Reproduce:
1. Attempt to boot f24 alpha 1.5 using uboot-tools-2016.03-1.fc24
Starting kernel ...
pc : [<7efa82fa>] lr : [<7ef60b54>]
reloc pc : [<4a0492fa>] lr : [<4a001b54>]
sp : 7af35c78 ip : 00000007 fp : 7ef60604
r10: 00000000 r9 : 7af3eee0 r8 : 7af35d8c
r7 : 42000000 r6 : 00000000 r5 : 7efb4cc4 r4 : 7efa2289
r3 : fffffffe r2 : 00000047 r1 : 00007fc0 r0 : 00007fc0
Flags: nzCv IRQs off FIQs off Mode SVC_32
Resetting CPU ...
Full log attached.
Ok, I can reproduce this, I did not see this before because I had:
exclude=dosbox cross-gcc-common gcc-arm-linux-gnu
In my dnf.conf to pin gcc-arm-linux-gnu to 5.3.1 to workaround the problems with building armv7 kernels with gcc6 and I always build my own u-boot.
With that removed from dnf.conf and cross-gcc updated I pretty much immediately hit the problem.
One thing which stands out during booting is:
"EHCI failed to shut down host controller."
Which I see your boot.log has too, I will investigate this further.
OK, I've found the usb bug and I've a fix ready, but it is an unrelated bug...
Ok, so as suspected this is a problem when building with gcc6, by mixing and matching .o files build with gcc5 resp. gcc6 I've managed to pinpoint the issue to u-boot's:
When this is build with gcc-6 with -O2 or -Os the resulting u-boot will not boot the kernel, with the problem described earlier in this bug report.
When build with -O1 all is well. I will attached pre-processed sources as well as asm for both -O1 and -Os (the u-boot default). Hopefully one of the gcc devs can help us figure out if this is an u-boot or gcc bug.
Created attachment 1138643 [details]
pre-processed cache_v7.c (from building with -save-temps)
Created attachment 1138645 [details]
cache_v7.s from building with -Os (results in broken boot)
Created attachment 1138646 [details]
cache_v7.s from building with -O1 (results in working boot)
I've also tried adding __attribute__((optimize (0))) to the various functions in cache_v7.c to pinpoint this further, but if added to functions with inline asm, this too breaks things. I've added it to all the functions which do not use inline asm (about half of them) and there it makes no change.
I'm not sure how _attribute__((optimize (x))) works, can I add _attribute__((optimize (1))) to compile only certain functions with -01 settings ? In that case I should be able to pinpoint this further.
Note that both -O2 and -Os result in a non working build.
1) Note that the -O1 build really is a "-Os ... -O1" build, assuming that the later -O1 flag overrides the earlier -Os completely.
2) I've tried adding -fno-strict-aliasing, that does not help
FWIW I have summarized various tricks to debug gcc issues in http://sharkcz.livejournal.com/13754.html
- per docs the last -O wins
- yes, __attribute__() can be set for functions
Dan, thanks for the hint, I've narrowed this down to the following 2 functions:
diff --git a/arch/arm/cpu/armv7/cache_v7.c b/arch/arm/cpu/armv7/cache_v7.c
index 94ff488..88a0c70 100644
@@ -116,7 +116,7 @@ static void v7_clean_inval_dcache_level_setway(u32 level, u32 num_sets,
-static void v7_maint_dcache_level_setway(u32 level, u32 operation)
+__attribute__((optimize(("O1")))) static void v7_maint_dcache_level_setway(u32 level, u32 operation)
u32 num_sets, num_ways, log2_line_len, log2_num_ways;
@@ -151,7 +151,7 @@ static void v7_maint_dcache_level_setway(u32 level, u32 operation)
-static void v7_maint_dcache_all(u32 operation)
+__attribute__((optimize(("O1")))) static void v7_maint_dcache_all(u32 operation)
u32 level, cache_type, level_start_bit = 0;
u32 clidr = get_clidr();
With this patch things work again. If I remove the _attribute__((optimize(("O1")))) from either one things break.
What these 2 have in common is that they call:
So I guess that something may be going wrong with those:
static u32 get_ccsidr(void)
/* Read current CP15 Cache Size ID Register */
asm volatile ("mrc p15, 1, %0, c0, c0, 0" : "=r" (ccsidr));
static u32 get_clidr(void)
/* Read current CP15 Cache Level ID Register */
asm volatile ("mrc p15,1,%0,c0,c0,1" : "=r" (clidr));
The reason for the non-inlined calls to get_ccsidr and get_clidr if you stick optimize(1) on the callers is that gcc doesn't inline between different optimization levels.
CCing Nick; Nick, can you spot some error in the -Os assembly?
Or alternatively we could try to bisect to a gcc commit where this stopped working; I have some revisions of arm cross (configured without assembler though, so not sure if it is usable for this purpose), but don't have access to something that could reproduce this nor experience with that.
So, I could try to upload asm from say r222025 I have around (which is gcc 5.x branch point) and r234285 (as example of contemporary gcc 6) and if the former would work for you and the latter would not, we could iterate on narrowing it down.
Created attachment 1138971 [details]
cache_v7.s (-Os, r222025 aka gcc 5 branch point)
Created attachment 1138972 [details]
cache_v7.s (-Os, r234285, i.e. 5 days old trunk)
If you have spare cycles, can you try these two and report if the first one works and second one doesn't?
(In reply to Jakub Jelinek from comment #14)
> If you have spare cycles, can you try these two and report if the first one
> works and second one doesn't?
I can confirm that the first one works and the second does not work.
Created attachment 1139121 [details]
r224703 r228141 r231340
Can you try these 3?
cache_v7.s.r224703 -> works
cache_v7.s.r228141 -> works
cache_v7.s.r231340 -> does not work
Created attachment 1139198 [details]
r228940 r229740 r230140 r230540
Though r228940 and r229740 look the same except for debug info,
don't have arm gas around to check if r230140 and r230540 aren't the same too (just so that you save some time testing them).
Ok, grabbed cross-binutils for arm and 228940 and r229740 are indeed identical disassembly, and r230140, r230540 and r231340 too. So the question is just if r228940 works or not.
(In reply to Jakub Jelinek from comment #12)
> CCing Nick; Nick, can you spot some error in the -Os assembly?
Not easily no. :-( I did look, but the two disassemblies are quite different and without a simulator framework to run them through, I was unable to deduce any potential bugs. Sorry.
I've bisected the change in between r229740 and r230140, there is just one in http://gcc.gnu.org/r229825 . But if already r228940 doesn't work, there are several possibilities, so I can't move on until I know the result of the testing.
(In reply to Jakub Jelinek from comment #19)
> Ok, grabbed cross-binutils for arm and 228940 and r229740 are indeed
> identical disassembly, and r230140, r230540 and r231340 too. So the
> question is just if r228940 works or not.
It seems that gcc6 and u-boot's cache_v7.c really do not play well together. I've been debugging another arm u-boot issue today and eventually tried downgrading to gcc-5.3 and that fixed things.
So it seems that even with -O1 gcc is still miscompiling cache_v7.c, but in a way which is harder to detect / leads to harder to reproduce problems. The bisect results for this issue are the same.
If you want some quick workaround, try adding __attribute__((noinline, noclone))
to some of the functions that are inlined into the problematic one, either to
v7_maint_dcache_level_setway or to v7_inval_dcache_level_setway and v7_clean_inval_dcache_level_setway. If it fails even with that, and doesn't fail if compiled with the same attributes say with gcc 5.x, then we could narrow it further to one of those routines.
With r229824 vs. r229825 I still see differences in the assembly even when all 3 are marked __attribute__((noinline, noclone)).
In particular in both v7_inval_dcache_level_setway and v7_clean_inval_dcache_level_setway.
That said, this is some cache invalidation, somebody familiar with ARM hw needs to look at it and say what exactly are the rules for the invalidation, if you can e.g. between different invalidation fetch extra icache or dcache lines, ...
If the __attribute__((noinline, noclone)) attribute on all 3 (or subset of them) works with gcc 5.x and doesn't with gcc 6.x, it would be still big help in narrowing the issue down. You can also add optimize (0) attribute selectively then to see where it matters.
Ok, I've tried using noinline, I started with adding this to the 2 trouble some functions while still using gcc5 and things broke (in the same way as with gcc6) at that point already.
So it seems that this cache-flushing code just is terribly fragile and really should be written in asm, and that is actually what the kernel is doing *and* u-boot already contains a copy of the kernel's version.
I've tested that switching to the asm version fixes things. I've submitted a patch with this change to u-boot upstream: https://patchwork.ozlabs.org/patch/605967/
Now lets see if upstream iw happy with this approach, assuming they are this bug can be moved (back to) u-boot.
u-boot upstream would like to know if the toolchain team is happy saying:
"that code is just too fragile, it's probably relying on undefined
behavior, investigation concluded" ?
For their exact words see:
Also see my comment 25 for why I came to the conclusion that just switching to asm would be best.
Thanks & Regards,
That would need to answer somebody familiar with the ARM cache flushing instructions. All I can say is that I haven't found any obvious errors on the toolchain side when compiling the code. Ask somebody from Linaro or ARM?
Tom Rini (upstream u-boot maintainer) has managed to talk to some ARM people about this, and
"So I was able to talk with a few people and this is the right approach.
The cache routines we're doing here in C must not in fact be done in C.
That things work today with some compilers is not by design. This is at
least a minimally correct thing to do and a more correct thing to do
would be to leverage more of the code from the kernel for cache
functions (and not just for v7)."
Where "this is the right approach" refers to replacing the C-code with asm being the right approach, so this does not count as a gcc bug. Moving to u-boot
Fedora U-boot maintainers: I've submitted a fix for this issue upstream here:
I've also done a follow-up patch which applied the same patch to the invalidate (vs the flush) function:
In my testing only the first patch is really necessary. Note that the workaround to use -O1 when building cache_v7.c is NOT sufficient, I can still reproduce the problem in some cases with that workaround (the workaround makes it harder to reproduce, but does not really fix it).
Hans: thanks for your hard work on this. Much appreciated. Will pull the v2 patch when I see it hit upstream list.
uboot-tools-2016.03-5.fc24 has been submitted as an update to Fedora 24. https://bodhi.fedoraproject.org/updates/FEDORA-2016-2d091b6a28
uboot-tools-2016.03-5.fc24 has been pushed to the Fedora 24 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-2d091b6a28
uboot-tools-2016.03-5.fc24 has been pushed to the Fedora 24 stable repository. If problems still persist, please make note of it in this bug report.