Red Hat Bugzilla – Bug 338551
kernels with debugging off do not boot on assorted x86_64 machines
Last modified: 2015-01-04 17:29:58 EST
Description of problem:
Up to now all rawhide kernels, up to and including 2.6.23-6.fc8,
booted without any problems on my SK8V x86_64 machine. This is
no longer true with 2.6.23.1-23.fc8. It appears to be
affected by the problem already reported in bug 249174 for fc6 and f7
kernels. The line "agpgart: Detected AGP bridge 0" shows up on
the screen and that is it.
The crucial difference is that a hack described in
of dropping down amounts of available memory does not seem to work
anymore. I tried different values, and also other possible options
together and separately, without any effect. The only way
to boot this kernel is to add "agp=off", but then
the machine is nearly unusable with a graphical desktop.
The other difference from bug 249174 is that no matter what
I try I cannot provoke any of the panics and backtraces
that were showing up there. The kernel just sits there with
"Detected AGP bridge 0".
What got broken between 2.6.23 and 2.6.23.1?
Version-Release number of selected component (if applicable):
I checked kernels available at koji. The first rawhide kernel after
2.6.23-6.fc8, i.e. kernel-2.6.23.1-11.fc8, does not boot anymore
(unless the "agp=off" hammer is used) and kernel-2.6.23.1-26.fc8 does not
change anything in this picture. So from the changelog the likely candidates
appear to be either 2.6.23.1 or "Disable debug".
(In reply to comment #1)
> I checked kernels available at koji. The first rawhide kernel after
> 2.6.23-6.fc8, i.e. kernel-2.6.23.1-11.fc8, does not boot anymore
> (unless the "agp=off" hammer is used) and kernel-2.6.23.1-26.fc8 does not
> change anything in this picture. So from the changelog the likely candidates
> appear to be either 2.6.23.1 or "Disable debug".
Hmm, can you try kernel-debug-<currentversion>? That has the same config as the
rawhide kernels before "disable debug" was implemented.
This is probably a dupe of bug 336281.
> This is probably a dupe of bug 336281.
Yes, indeed. It looks that way. Also bug 249174 is related as
The only nasty thing is that so far rawhide was booting and
now it has stopped. Running with 'agp=off' and 'Option "BusType" "PCI"',
as proposed in bug 336281, indeed seems to provide a workaround,
but when you are trying to install in the first place you may run
into some, ahem, difficulties.
Created attachment 234381 [details]
dmesg from a successful boot with a debug kernel
> can you try kernel-debug-<currentversion>?
This does indeed boot without any extra options. For what it is worth,
here is dmesg from a boot with the 2.6.23.1-26.fc8debug kernel.
If you boot the kernel-debug with the option slub_debug=- (a single minus sign),
does the bug come back?
2.6.23.1-26.fc8debug with 'slub_debug=-' boots without problems.
I also do not see any essential differences in dmesg. The biggest
difference is:
@@ -227,9 +225,7 @@
NET: Registered protocol family 1
NET: Registered protocol family 17
powernow-k8: Power state transitions not supported
- Magic number: 11:413:294
- hash matches device hpet
- hash matches device tty27
+ Magic number: 11:43:604
Freeing unused kernel memory: 708k freed
Write protecting the kernel read-only data: 1088k
ACPI: PCI Interrupt 0000:00:10.4[C] -> GSI 21 (level, low) -> IRQ 21
which is way past the sore spot.
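For anyone repeating this comparison, here is a minimal sketch of diffing two saved dmesg captures while skipping lines that change on every boot. It is not the method used above, only an illustration: the function name `essential_diff`, the labels passed to the diff, and the choice of "Magic number" and "hash matches device" as noise patterns are all assumptions made for this example.

```python
import difflib
import re

# Assumed noise: lines that differ between boots without saying anything
# about the hang. Extend or shrink this pattern to taste.
NOISE = re.compile(r"Magic number:|hash matches device")


def essential_diff(old_text, new_text):
    """Return unified-diff lines between two dmesg captures, noise removed."""
    old = [line for line in old_text.splitlines() if not NOISE.search(line)]
    new = [line for line in new_text.splitlines() if not NOISE.search(line)]
    # lineterm="" because splitlines() already dropped the newlines.
    return list(
        difflib.unified_diff(old, new, "dmesg.debug", "dmesg.slub_off", lineterm="")
    )
```

Calling `essential_diff(open(a).read(), open(b).read())` on two captures returns an empty list when the only differences are in the filtered lines; anything else comes back as ordinary unified-diff lines.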
A stupid question. Does kernel debugging affect how __devinit
is handled? I ask because when backtraces like those recorded in
bug 249174 and bug 336281 do happen, they seem to consistently point
fingers at 'int agp_add_bridge(struct agp_bridge_data *bridge)',
called at the very bottom of 'static int __devinit agp_amd64_probe()',
with the 'bridge' structure on the stack in this function, and every time memory
is clearly corrupted. So maybe something gets dropped prematurely
when debugging is off?
We do have a workaround for this:
1) Boot the installer with agp=off
2) Install and use the -debug kernel to get AGP working
So this does not have to be a blocker...
The original subject was explicit about the SK8V, as this was observed
on that particular board. This really looks the same as
bug 249174 and bug 336281, with reports from a significantly wider
range of boards, so that subject became misleading.
The workaround proposed in bug 336281 is plausibly a better idea than the
one from comment #9 (although it assumes that it is possible to configure
a suitable video driver to use PCI instead of the default AGP).
This is surely a duplicate of bug #249174.
Marking as dup. Feel free to reopen if this understanding is incorrect.
*** This bug has been marked as a duplicate of 249174 ***