Bug 338551 - kernels with debugging off do not boot on assorted x86_64 machines
Summary: kernels with debugging off do not boot on assorted x86_64 machines
Keywords:
Status: CLOSED DUPLICATE of bug 249174
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: x86_64
OS: Linux
Priority: low
Severity: high
Target Milestone: ---
Assignee: Dave Jones
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2007-10-18 18:41 UTC by Michal Jaegermann
Modified: 2015-01-04 22:29 UTC
CC List: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-10-29 01:22:33 UTC
Type: ---
Embargoed:


Attachments
dmesg from a successful boot with a debug kernel (19.40 KB, text/plain)
2007-10-22 19:44 UTC, Michal Jaegermann

Description Michal Jaegermann 2007-10-18 18:41:25 UTC
Description of problem:

Up to now all rawhide kernels, up to and including 2.6.23-6.fc8, were
booting without any problems on my SK8V x86_64 machine.  This is
no longer true with 2.6.23.1-23.fc8.  It appears to be affected
by the same problem already reported in bug 249174 for fc6 and f7
kernels.  The line "agpgart: Detected AGP bridge 0" shows up on
the screen and that is it.

The crucial difference is that the hack described in
https://bugzilla.redhat.com/show_bug.cgi?id=249174#c49
of reducing the amount of available memory no longer seems to work.
I tried different values, and also other possible options, both
together and separately, without any effect.  The only way to boot
this kernel is to add "agp=off", but then the machine is nearly
unusable with a graphical desktop.

The other difference from bug 249174 is that no matter what I try
I cannot provoke any of the panics and backtraces which were
showing up there.  The kernel just sits there with
"Detected AGP bridge 0".

What got broken between 2.6.23 and 2.6.23.1?

Version-Release number of selected component (if applicable):
kernel-2.6.23.1-23.fc8

How reproducible:
always

Comment 1 Michal Jaegermann 2007-10-19 01:49:37 UTC
I checked kernels available at koji.  The first rawhide kernel after
2.6.23-6.fc8, i.e. kernel-2.6.23.1-11.fc8, does not boot anymore
(unless "agp=off" hammer is used) and kernel-2.6.23.1-26.fc8 does not
change anything in this picture.  So from changelog likely candidates
appear to be either 2.6.23.1 or "Disable debug".

Comment 2 Chuck Ebbert 2007-10-22 17:29:56 UTC
(In reply to comment #1)
> I checked kernels available at koji.  The first rawhide kernel after
> 2.6.23-6.fc8, i.e. kernel-2.6.23.1-11.fc8, does not boot anymore
> (unless "agp=off" hammer is used) and kernel-2.6.23.1-26.fc8 does not
> change anything in this picture.  So from changelog likely candidates
> appear to be either 2.6.23.1 or "Disable debug".

Hmm, can you try kernel-debug-<currentversion>? That has the same config as the
rawhide kernels before "disable debug" was implemented.



Comment 3 Dave Jones 2007-10-22 18:35:26 UTC
This is probably a dupe of bug 336281.


Comment 4 Michal Jaegermann 2007-10-22 19:42:08 UTC
> This is probably a dupe of bug 336281.
Yes, indeed.  It looks that way.  Also bug 249174 is related as
noted before.

The only nasty thing is that rawhide was booting up to now and
now it has stopped.  Running with 'agp=off' and 'Option "BusType" "PCI"',
as proposed in bug 336281, indeed seems to provide a workaround
but when you are trying to install in the first place you may run
into some, ahem, difficulties.

Comment 5 Michal Jaegermann 2007-10-22 19:44:27 UTC
Created attachment 234381
dmesg from a successful boot with a debug kernel

> can you try kernel-debug-<currentversion>?

This does indeed boot without any extra options.  For what it's worth,
here is dmesg from a boot with the 2.6.23.1-26.fc8debug kernel.

Comment 6 Dave Jones 2007-10-22 21:15:37 UTC
if you boot the kernel-debug with the option slub_debug=-  (a single minus sign.) 
does the bug come back ?


Comment 7 Michal Jaegermann 2007-10-22 22:47:26 UTC
2.6.23.1-26.fc8debug with 'slub_debug=-' boots without problems.
I also do not see any essential differences in dmesg.  The biggest
one is likely:
@@ -227,9 +225,7 @@
 NET: Registered protocol family 1
 NET: Registered protocol family 17
 powernow-k8: Power state transitions not supported
-  Magic number: 11:413:294
-  hash matches device hpet
-  hash matches device tty27
+  Magic number: 11:43:604
 Freeing unused kernel memory: 708k freed
 Write protecting the kernel read-only data: 1088k
 ACPI: PCI Interrupt 0000:00:10.4[C] -> GSI 21 (level, low) -> IRQ 21

which is way past the sore spot.

Comment 8 Michal Jaegermann 2007-10-24 22:22:03 UTC
A stupid question.  Does kernel debugging affect how __devinit
is handled?  I ask because when backtraces like those recorded in
bug 249174 and bug 336281 do happen, they seem to consistently point
the finger at 'int agp_add_bridge(struct agp_bridge_data *bridge)'
called at the very bottom of 'static int __devinit agp_amd64_probe()',
with the 'bridge' structure on the stack in that function, and every
time memory is clearly corrupted.  So maybe something gets dropped
prematurely when debugging is off?

Comment 9 Chuck Ebbert 2007-10-26 17:23:59 UTC
We do have a workaround for this:

  1) Boot the installer with agp=off
  2) Install and use the -debug kernel to get AGP working

So this does not have to be a blocker...


Comment 10 Michal Jaegermann 2007-10-27 18:06:03 UTC
The original subject was explicit about SK8V as this was observed
on that particular board.  This really looks the same as
bug 249174 and bug 336281, with reports from a significantly wider
range of boards, so that subject became misleading.

A workaround proposed in bug 336281 is plausibly a better idea than the
one from comment #9 (although it assumes that it is possible to configure
a suitable video driver to use PCI instead of the default AGP).

Comment 11 Martin Ebourne 2007-10-28 18:45:49 UTC
This is surely a duplicate of bug #249174.

Comment 12 Rahul Sundaram 2007-10-29 01:22:33 UTC
Marking as dup. Feel free to reopen if this understanding is incorrect. 

*** This bug has been marked as a duplicate of 249174 ***

