Bug 171672
Summary: x86_64 -Os kernel hangs after rc.sysinit overwrites dmesg
Product: [Fedora] Fedora
Component: kernel
Version: rawhide
Hardware: x86_64
OS: Linux
Status: CLOSED RAWHIDE
Severity: high
Priority: medium
Reporter: Alexandre Oliva <oliva>
Assignee: Dave Jones <davej>
QA Contact: Brian Brock <bbrock>
CC: pfrields, umar, vonbrand, wtogami
Doc Type: Bug Fix
Last Closed: 2005-12-28 05:16:20 UTC

Description
Alexandre Oliva
2005-10-25 03:59:06 UTC
Created attachment 120339 [details]
Reduced picture of the soft lockup oops
This is barely readable, but I hope it's enough. There's the tail of a SysRq-T
output before the lock up oops, so you can see that this time it got as far as
checking for new hardware. The end of the oops is the same as that I got
several other times, before I managed to stop the system from switching from
80x60 to 80x25 on boot.

Sounds like this could be a dupe of 171615 and 171632.

Could you try the kernel at http://people.redhat.com/davej/kernels/Fedora/devel/ please?

Looks like the same problem, indeed. I'll try 1626 when my rawhide update completes, but from the two other bug reports, I won't hold my breath. My box is a UP notebook, and the only oddity I can think of is the use of external disks on both USB and Firewire, with root on LVM on raid 1, with one of the raid 1 members on one of the external disks, and some additional raid 1 (additional swap included) between the two external disks. A minor oddity, eh? :-)

Created attachment 120373 [details]
lspci output

If 1.1622 is working then this is rc5-git2. git2 has very few patches. We already turned off the powernow patch and I built without the hugetlb patch. What is left in there that is relevant to the architecture are some drm, dccp, tcp, and posix timers patches. Also, there are a few Fedora patches (autofs-lookup and serial-of). Any educated guesses? I can build the kernel and try.

I did build git5 with no success, so if it is posix timers they have not got it right yet.

For the record... The only difference between 2.6.13-1.1622_FC5 and 2.6.13-1.1623_FC5 was that the latter had CONFIG_CC_OPTIMIZE_FOR_SIZE=y. The -git patches were not being applied because the %patch2 command was commented out. This unfortunately makes it both easy and difficult to fix the problem :-/

Yes, indeed. I rebuilt 1.1629 with git7 actually applied (this has optimize set to N) and the system booted, but after working for two minutes it froze. So, there is something still wrong in the git patch. I am now building again with only the posix-thread patches applied from the git tree to see if they are the ones causing the problem.

Ok. Adding all of the posix/thread related patches from git up to this time builds and runs fine. I have been running for an hour without problems.

This should be fixed in 1629.

*** Bug 171632 has been marked as a duplicate of this bug. ***

It is fixed, indeed, as in, the problem no longer occurs. Until someone decides to turn -Os on again.

I know I've seen this very same failure before, so I figured I'd track it down. So I built the entire kernel with -Os and it failed. Then I rebuilt only arch/x86_64/lib/bitops.o with -O2 and it would work fine. Then I compared the code of this file, compiled with -Os and -O2, and the only significant difference was that with -O2 find_first_zero_bit() would be inlined into find_next_zero_bit().

So I rename find_first_zero_bit to __find_first_zero_bit, make it always_inline, create a new find_first_zero_bit that just calls the always_inline function, and get find_next_zero_bit to call the always_inline function. At that point, the code in both object files is equivalent, so it should all work, right? Well, it still doesn't, and I'm totally confused as to why.

(As for how to get the kernel to not recompile everything when I change from -O2 to -Os or vice-versa, I commented out the addition of -O2 and -Os in the top-level Makefile, created `compile.Os' and `compile.O2' scripts that run CC with the corresponding option appended to the command line, then set up a soft-link to point to one or the other, and run `make bzImage CC=/that/soft/link'. :-)

May I suggest that we keep this bug open such that we can eventually switch to an -Os kernel on amd64?

Is this perhaps a gcc problem, just triggered by the kernel? Just wondering...

Created attachment 120544 [details]
Patch that enables a kernel optimized for size to work
Nope, just the usual bug in asm statements that different compiler
optimizations often expose. The patch file contains a long explanation of the
bug and the various minor changes I made while fixing it.
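
The patch itself is not reproduced in this thread, so what follows is only a generic illustration of the class of bug mentioned here, not the actual fix. It is a minimal sketch, assuming GCC-style extended asm on x86_64: an asm statement that scans memory with `repe; scasq` has to tell the compiler which registers it modifies ("+D", "+c"), that it changes the flags ("cc"), and that it reads memory the operands do not mention ("memory"). Leaving any of these out can appear to work at one optimization level and break at another. The function name first_nonzero_word is hypothetical and does not come from the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not kernel code): return the index of the first
 * nonzero 64-bit word in p[0..nwords), or nwords if every word is zero.
 */
size_t first_nonzero_word(const uint64_t *p, size_t nwords)
{
    assert(p != NULL || nwords == 0);
    if (nwords == 0)
        return 0;   /* with a zero count, repe would execute nothing */

#if defined(__GNUC__) && defined(__x86_64__)
    const uint64_t *cur = p;
    size_t left = nwords;

    __asm__ volatile("repe; scasq"
                     : "+D"(cur),       /* rdi is advanced by the scan */
                       "+c"(left)       /* rcx is decremented by the scan */
                     : "a"((uint64_t)0) /* value each word is compared with */
                     : "memory",        /* asm reads memory behind the pointer */
                       "cc");           /* asm changes the flags */

    /* The scan stops just past the last word it compared. */
    size_t last = (size_t)(cur - p) - 1;
    return p[last] != 0 ? last : nwords;
#else
    /* Portable fallback so the sketch builds on other targets. */
    for (size_t i = 0; i < nwords; i++)
        if (p[i] != 0)
            return i;
    return nwords;
#endif
}
```

Checking the last scanned word in C, rather than reading the flags after the asm, keeps the example small. It is a design choice of this sketch, not something taken from the patch.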
Fixed upstream, and in rawhide for a while.
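
For reference, the inlining experiment described earlier in this thread (rename find_first_zero_bit to __find_first_zero_bit, mark it always_inline, and have both find_first_zero_bit and find_next_zero_bit call it) can be sketched roughly as below. This is an assumption-laden illustration: the function names come from the comment above, but the bodies are plain portable C written for this sketch, whereas the real arch/x86_64/lib/bitops.c used inline asm, and the BITS_PER_LONG macro and bit_is_set helper here are local stand-ins.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG ((unsigned long)(sizeof(unsigned long) * CHAR_BIT))

/* Local helper for this sketch: test one bit in a bitmap. */
static inline int bit_is_set(const unsigned long *addr, unsigned long nr)
{
    return (int)((addr[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1UL);
}

/* Forced inline at every optimization level, including -Os. */
static inline __attribute__((always_inline)) unsigned long
__find_first_zero_bit(const unsigned long *addr, unsigned long size)
{
    unsigned long i;

    for (i = 0; i < size; i++)
        if (!bit_is_set(addr, i))
            return i;
    return size;
}

/* Out-of-line entry point that only forwards to the inline version. */
unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size)
{
    return __find_first_zero_bit(addr, size);
}

unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
                                 unsigned long offset)
{
    assert(addr != 0);

    /* Finish the partially consumed word bit by bit. */
    while (offset < size && offset % BITS_PER_LONG != 0) {
        if (!bit_is_set(addr, offset))
            return offset;
        offset++;
    }
    if (offset >= size)
        return size;

    /* offset is word-aligned now; call the inline version directly. */
    return offset + __find_first_zero_bit(addr + offset / BITS_PER_LONG,
                                          size - offset);
}
```

With this layout the compiler emits the scanning loop inside find_next_zero_bit whether it is optimizing for size or speed, which is the equivalence between the -Os and -O2 object code that the experiment was after.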