Bug 164371

Summary: kernel-smp causes hard lock with xorg-dri on Xeon cpu when HyperThreading enabled
Product: [Fedora] Fedora
Component: kernel
Version: 4
Hardware: i686
OS: Linux
Status: CLOSED INSUFFICIENT_DATA
Severity: high
Priority: medium
Reporter: Sinan H <haliyo>
Assignee: Dave Jones <davej>
QA Contact: Brian Brock <bbrock>
CC: mike, pfrields, wtogami
Target Milestone: ---
Target Release: ---
Doc Type: Bug Fix
Last Closed: 2006-05-05 01:23:50 UTC
Attachments:
attached xorg.conf, with dri commented out. (Flags: none)

Description Sinan H 2005-07-27 13:12:25 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.10) Gecko/20050719 Fedora/1.7.10-1.3.1

Description of problem:
I realized somehow that my Xeon (Dell Precision 650, single cpu) has HyperThreading, so I decided to try the smp kernel. The system boots fine, but after launching xorg (xorg-x11-6.8.2-1.FC3.13) it freezes, not always immediately but always in less than a minute.

I tried passing acpi=off at boot; in this case the system works fine but detects only one cpu, according to /proc/cpuinfo.

Booting into init 3 works fine with acpi=on (2 cpus detected), until startx or init 5.
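
For anyone comparing the acpi=on and acpi=off cases, here is a minimal sketch of how the cpu count could be read out of /proc/cpuinfo. The function name `summarize_cpuinfo` and the sample text are hypothetical and not taken from the affected machine. Note that the `ht` flag only says the cpu reports HyperThreading capability; it does not by itself prove that a second logical cpu was brought up.

```python
def summarize_cpuinfo(text):
    """Count logical processors and report whether the 'ht' flag is listed.

    `text` is the content of /proc/cpuinfo (x86 layout assumed).
    """
    logical = 0
    flags = set()
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator line between processor blocks
        key = key.strip()
        if key == "processor":
            logical += 1
        elif key == "flags":
            flags.update(value.split())
    return {"logical_cpus": logical, "ht_flag": "ht" in flags}


# Made-up two-processor sample, only to show the expected input shape.
SAMPLE = (
    "processor\t: 0\n"
    "flags\t\t: fpu vme ht\n"
    "\n"
    "processor\t: 1\n"
    "flags\t\t: fpu vme ht\n"
)

print(summarize_cpuinfo(SAMPLE))
```

On a live system the same function could be fed `open("/proc/cpuinfo").read()`; a result of one logical cpu with the `ht` flag present would match the acpi=off behaviour described above.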

xorg.conf is quite ordinary, with no composite extension. The board is an ATI Radeon 7000, with the stock radeon driver.

HT is enabled in bios.

I haven't tried any older kernel-smp

kernel-UP (2.6.12-1.1372) works just fine.

Version-Release number of selected component (if applicable):
kernel-smp-2.6.12-1.1372_FC3

How reproducible:
Always

Steps to Reproduce:
1. Boot into kernel-2.6.12-1.1372smp, init 5
2. Boot is OK until startx
3. Hardlock after launching KDM (or gdm, or xdm); rarely I get time to log in to KDE or Gnome, but it hardlocks anyway.
  

Actual Results:  system complete hardlock (no disk activity, no ssh)

Expected Results:  no hardlock !

Additional info:

Should I file another bug for the single cpu detected with acpi=off on a Xeon with HT enabled?

Comment 1 Sinan H 2005-07-27 13:33:51 UTC
Commenting out Load "dri" in xorg.conf avoids the hardlock. So would it be an xorg
radeon driver bug?
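
The workaround above can be sketched as a small text transformation, assuming the usual layout where `Load "dri"` sits inside the `Module` section. The helper name `disable_dri` and the sample section are hypothetical; the sample is not the content of the xorg.conf attached to this bug.

```python
import re

# Matches an active (uncommented) Load "dri" line. xorg.conf keywords are
# case-insensitive, hence re.IGNORECASE.
_DRI_LOAD = re.compile(r'^([ \t]*)(Load[ \t]+"dri")', re.IGNORECASE)


def disable_dri(conf_text):
    """Return conf_text with every active Load "dri" line commented out."""
    lines = conf_text.splitlines(keepends=True)
    return "".join(_DRI_LOAD.sub(r"\1#\2", line) for line in lines)


# Illustrative Module section only; NOT the attached xorg.conf.
SAMPLE = (
    'Section "Module"\n'
    '    Load  "glx"\n'
    '    Load  "dri"\n'
    'EndSection\n'
)

print(disable_dri(SAMPLE), end="")
```

Lines that are already commented out are left alone, so running it twice gives the same result. Removing the `#` again re-enables DRI for retesting.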

Comment 2 Sinan H 2005-07-27 13:38:17 UTC
Created attachment 117189 [details]
attached xorg.conf, with dri commented out.

#lspci -v

01:00.0 VGA compatible controller: ATI Technologies Inc Radeon RV100 QY [Radeon
7000/VE] (prog-if 00 [VGA])
	Subsystem: ATI Technologies Inc: Unknown device 1b8a
	Flags: bus master, stepping, 66Mhz, medium devsel, latency 64, IRQ 11
	Memory at f0000000 (32-bit, prefetchable) [size=128M]
	I/O ports at ec00 [size=256]
	Memory at ff8f0000 (32-bit, non-prefetchable) [size=64K]
	Expansion ROM at c1000000 [disabled] [size=128K]
	Capabilities: [58] AGP version 2.0
	Capabilities: [50] Power Management version 2

Comment 3 Sinan H 2005-07-27 13:50:27 UTC
bug 162702 looks similar, except that the reporter is having the issue on FC4, not
on FC3 as I am.

Tried older kernel-smps, same behaviour.

Comment 4 Dan Carpenter 2005-07-27 20:45:52 UTC
>>  Should I file an other bug for single cpu detected if acpi=off on Xeon HT
enabled ?

My feeling is that this is more likely a BIOS bug than a kernel bug.

What graphics driver are you using?



Comment 5 Sinan H 2005-07-28 12:22:57 UTC
> 
> What graphics driver are you using?
> 

The board is an ATI radeon 7000, with stock radeon driver from xorg, no fglrx
(board not supported).



Comment 6 Dave Jones 2005-07-28 20:49:29 UTC
Can you ssh into the box when it's hung?

Something else to try:
boot with vga=791, and then after X has started, press ctrl-alt-f1 and see if it locks
up. If it does, whilst you're on the console, you may get a panic
message/backtrace.


Comment 7 Sinan H 2005-07-29 10:19:16 UTC
(In reply to comment #6)
> can you ssh into the box when its hung ?
No. It's all dead, really. No ssh, no ftp, no http ...
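
A rough sketch of how the "no ssh, no ftp, no http" check could be scripted from a second machine. The names `port_answers` and `probe` are made up for illustration, and the port list simply mirrors the three services mentioned above. A completed TCP connect only shows that the remote network stack still answers, so all three ports failing on a box that normally runs these services is consistent with a hard lock, not proof of one.

```python
import socket


def port_answers(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def probe(host, ports=(22, 21, 80), timeout=3.0):
    """Check the ssh, ftp and http ports and return {port: reachable}."""
    return {port: port_answers(host, port, timeout) for port in ports}
```

Calling `probe()` with the address of the hung box from another machine returns a dictionary such as `{22: False, 21: False, 80: False}` when nothing answers.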

> 
> something else to try..
> boot with vga=791 and then after X has started, ctrl-alt-f1 and see if it locks
> up . If it does, whilst you're on the console, you may get a panic
> message/backtrace.

Tried that: No hardlock until I switch back to X by ctrl-alt-f7. At this point,
it hangs again. 

I also upgraded to the latest available BIOS for this box, with no success;
with acpi=off still only a single cpu is detected.



Comment 8 Dave Jones 2005-07-29 19:37:25 UTC
any possibility you can hook it up to another box with a serial cable, and see
if anything comes out of the serial console ?


Comment 9 Mike Hutton 2005-08-20 16:14:48 UTC
I'm having a nearly identical problem with the 1372smp kernel on my dual 
processor Dell 1800 with the ATI Radeon 7000. 

The system will hard lock shortly after launching the X subsystem. It will last 
longer if I don't actually do anything (just leave a desktop up), but even if I 
just open a terminal window, it will crash shortly thereafter regardless of 
activity. The OS is definitely crashed, as the 1800's "deadman" light indicates 
a fault and it won't talk to any of the "soft" controls on the server (power, 
cd eject, etc.)

Interestingly, after booting the 1372smp kernel and experiencing the crash, the 
problem will then happen consistently even when booting from the 667smp kernel 
from the FC3 distro. I doubt it's my server hardware, because if I do a clean 
install from the FC3, the system will run indefinitely without a hitch.

Could this just be a bug in the ATI driver?

Comment 10 Dave Jones 2006-01-16 22:21:44 UTC
This is a mass-update to all currently open Fedora Core 3 kernel bugs.

Fedora Core 3 support has transitioned to the Fedora Legacy project.
Due to the limited resources of this project, typically only
updates for new security issues are released.

As this bug isn't security related, it has been migrated to a
Fedora Core 4 bug.  Please upgrade to this newer release, and
test if this bug is still present there.

This bug has been placed in NEEDINFO_REPORTER state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

Thank you.


Comment 11 Dave Jones 2006-02-03 06:41:31 UTC
This is a mass-update to all currently open kernel bugs.

A new kernel update has been released (Version: 2.6.15-1.1830_FC4)
based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.

This bug has been placed in NEEDINFO_REPORTER state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.

Thank you.


Comment 12 John Thacker 2006-05-05 01:23:50 UTC
Closing per previous comment.