Bug 39233

Summary: (ATI Mach64 - Xpert@play98)XFree86-4.0.3-5 crashes randomly on SMP system
Product: [Retired] Red Hat Linux
Component: XFree86
Version: 7.1
Hardware: i386
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: medium
Reporter: Need Real Name <peter>
Assignee: Mike A. Harris <mharris>
QA Contact: David Lawrence <dkl>
CC: alan, greg, lee.bryson, mharris, myrvoll, redhatnet
Doc Type: Bug Fix
Last Closed: 2001-10-23 12:32:45 UTC
Attachments:
Xfree86.0.log
XF86Config
messages
cpuinfo
devices
meminfo
modules
XF86Config-4
XFree86.0.log (latest)
messages (latest)
XF86Config-4 (after rebooting having removed dri )
XFree86.0.log (after rebooting the os, having removed dri)
messages (after rebooting, having removed dri, and added append="noapic" to lilo.conf. Please check for any issues here.)
messages
XFree86.0.log (following X rebooted itself, to go with the messages)
ps.txt (from ps -aux output to text file)
core (found this core file in my home directory from the 12th - may help)
messages (2.4.3-7smp X rebooted)
XFree86.0.log (k2.4.3-7smp X rebooted)
XFree86 log-file from latest crash
The XFree86 v4xx config file used during latest crash
The config file used during the latest crash

Description Need Real Name 2001-05-05 21:01:30 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.76 [en] (X11; U; Linux 2.4.2-2smp i686)

Description of problem:
XFree86-4.0.3-5 reboots itself randomly, or with use of Netscape.

How reproducible:
Sometimes

Steps to Reproduce:
1. XFree86-4.0.3-5 
2.  Use ATI Xpert@Play98 card
3. Mach64
4. Do some work, use Netscape or Messenger.
	

Actual Results:  X goes black, and comes back.

Expected Results:  X Should Not Reboot itself in middle of my work, such as
trying to post this, which it did!!!!  5 times in a week or two. Randomly.
Usually when I am in middle of work.

Additional info:

See my attachments.

Comment 1 Need Real Name 2001-05-05 21:03:40 UTC
Created attachment 17470 [details]
Xfree86.0.log

Comment 2 Need Real Name 2001-05-05 21:07:04 UTC
Created attachment 17471 [details]
XF86Config

Comment 3 Need Real Name 2001-05-05 21:12:26 UTC
I did not have this problem with X under Redhat Linux 6.2.  (k2.2.16)

Comment 4 Need Real Name 2001-05-05 21:21:21 UTC
Created attachment 17472 [details]
messages

Comment 5 Need Real Name 2001-05-05 21:28:10 UTC
Created attachment 17473 [details]
cpuinfo

Comment 6 Need Real Name 2001-05-05 21:28:40 UTC
Created attachment 17474 [details]
devices

Comment 7 Need Real Name 2001-05-05 21:29:14 UTC
Created attachment 17475 [details]
meminfo

Comment 8 Need Real Name 2001-05-05 21:29:58 UTC
Created attachment 17476 [details]
modules

Comment 9 Mike A. Harris 2001-05-06 06:20:11 UTC
Those APIC errors in your "messages" log make me suspect buggy hardware.

Are you overclocking?

Also, your config file is for XFree86 3.x, not 4.x.  Please attach the correct
file.

Comment 10 Need Real Name 2001-05-07 11:42:02 UTC
No, I am not overclocking. (Like I said, under RHLinux 6.2 I did NOT have this
problem, only under RHLinux 7.1, which makes me believe it is Xwindows problem,
or X with my Xpert@Play card, or something else affecting X; I am concerned about
the APIC errors in the messages file. Did not see those under RH6.2. ) Please
specify the location and filename for the X config file. As you can see below,
the symbolic link XF86Config under the directory /usr/X11R6/lib/X11 belongs to
the XFree86-4.0.3-5 package, and points back to the config file
/etc/X11/XF86Config.  I am in fact running only XFree86 version 4.0.3-5.   See
below. (BTW, I installed a new hard drive in order to install RHLinux 7.1, I did
not upgrade from 6.2. Hence I am not running XFree86 v3 !   The new HD is on my
ATA/66 bus which is one difference -and is newly supported under 7.1- the old HD
was on ATA/33. This is on a BP6 motherboard with dual Celeron500, as you know if
billed for Gentus Linux, and worked perfectly for me under RH6.2 smp.) 

[root@boaz X11]# pwd
/usr/X11R6/lib/X11
[root@boaz X11]# rpm -qf XF86Config
XFree86-4.0.3-5
[root@boaz X11]# ls -l XF86Config
lrwxrwxrwx    1 root     root      30 Mar 28 04:20 XF86Config -> ../../../../etc/X11/XF86Config
[root@boaz X11]# rpm -q XFree86
XFree86-4.0.3-5




Comment 11 Mike A. Harris 2001-05-07 13:37:59 UTC
I am trying to help you, however to help you, you need to help me with the
information I request.  Without that information I cannot help you at all.
I asked if you are overclocking because it is a VERY important datapoint and
I have no way of knowing without asking.

As I told you once already "XF86Config" is the config file for XFree86 3.3.6,
and the file you provided is XF86Config, which is the config file for 3.3.6,
which is useless to me because you are using XFree86 4, and the config file
for XFree86 4.x is XF86Config-4.  If you do NOT have an XF86Config-4 file, then
that is likely the problem right there, because if XFree86 4 cannot find its
config file (XF86Config-4, or /etc/X11/XF86Config-4 more specifically), it
_WILL_ fall back to using XF86Config - regardless of whether or not the file
is an actual 4.0.x config file.  If it is a 3.3.6 config file (which
the one you attached *is*), then it will explode.
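
The lookup order described above can be sketched roughly as follows. This only illustrates the behaviour as stated in this comment, not XFree86's actual code: the function name pick_config and the two-entry candidate list are assumptions made for the example, and the real server may search other locations as well.

```python
# Hypothetical sketch of the config selection described in this comment.
# Assumption: only these two paths are considered, in this order.
CANDIDATES = ["/etc/X11/XF86Config-4", "/etc/X11/XF86Config"]


def pick_config(existing_files):
    """Return the first candidate path found in existing_files, else None."""
    for path in CANDIDATES:
        if path in existing_files:
            return path
    return None
```

With only /etc/X11/XF86Config present, pick_config returns that file, which is the case where a 3.3.6-format file ends up being read by the 4.x server.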

Some other data points:  The distribution comes with *BOTH* XFree86 4.0.3
*and* 3.3.6, so that cards unsupported by 4.x that are supported by 3.x still
work.  As such, both versions cannot coexist with the same config file as
the config file formats are different, and so 3.3.6 uses XF86Config and
4.0.3 uses XF86Config-4.  This is also not a Red Hatism either, it is standard
stock XFree86 behavior.  The symlink in /usr/... is a backward compatibility
symlink only, however I'm not sure how useful it really is so I might actually
just remove it in the future.

I'm betting that either you do have an /etc/X11/XF86Config-4, which I'll need
a file attachment of, or if not, the solution is to try:

Xconfigurator --preferxf4

and if the problems persist:

Xconfigurator --preferxf4 --nodri

and if still there are problems:

Xconfigurator --preferxf3

The latter enables usage of the 3.3.6 server, which you said worked in 6.2,
so it likely works in 7.1 also if the 4.x driver does not work.

For the APIC errors, the way I understand it is that 2.2.x kernels do not
detect the buggy APIC, and 2.4.x kernels do, so if the error is new to you
the problem (whatever it is) is likely not new, it is just reported now.

I hope this clears up things for you, and hopefully will get you up and running
ok.  If not, please supply the /etc/X11/XF86Config-4 file spit out by
Xconfigurator in the process above so I can see what might be causing you
trouble.  Also, please attach a new X log file from /var/log/XFree86.0.log
to match the new config file, as it contains very important info as well,
which likely will be different from your initial log.

Thanks.


Comment 12 Need Real Name 2001-05-07 16:08:57 UTC
Mike, I know you are helping me, and I appreciate it. I answered your
overclocking question directly "not overclocking".  I did not know what config
file the new XFree86 v4 used, so I previously supplied the wrong one, but now
that you have told me XF86Config-4, I will upload it for you as I saw it is
there, along with the latest Xfree86.0.log, later when I get to my home office. 

Sounds reasonable, what you said about APIC, and the 2.2.x vs. 2.4.x kernels and
detecting/logging, but 2.2.x did not show signs of X rebooting.   BTW  I
remember reading something on being able to disable APIC so as not to incur APIC
errors, see websites below. What do you think about disabling APIC ? (Think this
started with 2.3.99 kernels.) If you feel at this point it could be an APIC
issue, should I open an APIC bugzilla ????

http://www.telematik.informatik.uni-karlsruhe.de/forschung/apic/
A big release with many updates: 
  Added the APIC disabling code. 

http://nlug.org/smp/
7) Had to add append="noapic" to my lilo configuration for this system to boot
without a
     kernel panic.

http://www.uwsg.indiana.edu/hypermail/linux/kernel/0101.3/1176.html
 After an extensive testing I concluded the infamous APIC lock-up happens 
when a level-triggered interrupt gets masked in an I/O APIC when it's in 
the send pending state (bit 12 of the respective interrupt redirection 
entry is set).


Comment 13 Need Real Name 2001-05-07 22:52:28 UTC
Created attachment 17607 [details]
XF86Config-4

Comment 14 Need Real Name 2001-05-07 22:54:56 UTC
Created attachment 17608 [details]
XFree86.0.log (latest)

Comment 15 Need Real Name 2001-05-07 23:00:13 UTC
Created attachment 17609 [details]
messages (latest)

Comment 16 Need Real Name 2001-05-08 21:11:22 UTC
(I'm NOT being impatient, just giving some feedback I found.)
I received some word in the Linux community that "XFree86 itself appears to be
unstable on 2.4/SMP.  APIC errors don't make things better, obviously."    Do
you know of other reports discussing issues with XFree86 on k2.4/SMP?   I use an
SMP motherboard, so perhaps this has also some bearing? (Also, do you want me to
shut off APIC?   append="noapic"  to lilo.conf )

Comment 17 Need Real Name 2001-05-08 22:04:13 UTC
Another Linux Community report on Xfree86 with SMP on k2.4 issue:

http://www.uwsg.indiana.edu/hypermail/linux/kernel/0102.1/0940.html
> This is a long-standing problem with 2.3 and 2.4 SMP kernels. I 
> believe it is a kernel bug and isn't the XFree86 project's problem. 
> The problem does not exist on 2.2 SMP kernels nor on 2.3/4 UP kernels. 
> The symptoms are random segfaults in perfectly fine XFree86 code. 
I had an XFree86 setup which was perfectly stable in RH6.2, and had been 
for some months. Upon upgrading to RH7 - with glibc-2.2 and new 
screensavers, it started falling over almost every night. 

So is it really my BP6 hardware, or is it a problem with Xfree86 on SMP system
under k2.4???
I am seeing the latter as being a likely cause. Again, this is just feedback
which seems very relevant.

Comment 18 Need Real Name 2001-05-09 13:41:28 UTC
Mike, yet another Linux community person told me this:
"Kernel 2.4.3 solved my X 4.0 crashing problems." 
Does Redhat have k2.4.3 in rpm yet? ACTUALLY do you
have k2.4.4 in RPM, because I need to install VMWare
4.0.4-1118, and VMWare says it does not support k2.4.3
due to a bug in that kernel, I will need k2.4.4.  Please
advise. (I know this seems like straying a bit from bug 39233, 
but if k2.4.3 and 2.4.4 fix the X crashing under SMP, then
that should be a fix , yes?) 



Comment 19 Need Real Name 2001-05-10 14:57:46 UTC
Mike, I've uploaded my XF86Config-4, the latest XFree86.0.log, and even the
newest messages file.  Please advise what you find with regard to these.

I also added commentary on Linux community people using RHL7.1 w/SMP and having
X issues, but one person said kernel 2.4.3, and perhaps 2.4.4 will fix the X
issues under 7.1 with SMP.  I would prefer to use k2.4.4 and XFree86 v4.  Please
let me know if 2.4.4 is available from Redhat, or when it might be?

Comment 20 Mike A. Harris 2001-05-12 15:57:37 UTC
Before anything, comment out the "Option dri" line to disable DRI.
DRI is not supported on your video card, and could cause problems if
left enabled.
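
For anyone who would rather script this edit than do it by hand, here is a minimal sketch. It assumes the DRI line looks like Load "dri" or Option "dri"; the exact spelling in a given XF86Config-4 may differ, and the helper name comment_out_dri is made up for this example. It works on the file's text and leaves already-commented lines alone.

```python
import re

# Assumption: the line to disable looks like  Load "dri"  or  Option "dri".
DRI_LINE = re.compile(r'^\s*(Load|Option)\s+"?dri"?\s*$', re.IGNORECASE)


def comment_out_dri(config_text):
    """Prefix '#' to every uncommented line that loads or enables dri."""
    out = []
    for line in config_text.splitlines():
        if DRI_LINE.match(line):
            out.append("#" + line)
        else:
            out.append(line)
    # Note: a trailing newline in the input is not preserved.
    return "\n".join(out)
```

Running it twice gives the same result as running it once, so it is safe to apply to a file that has already been edited. Keep a backup of the original config before writing anything back.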

Does this fix the problem?

We have a 2.4.3 based kernel being tested right now.  I do not know
if it will solve your problems or not.  Our kernel includes many bugfixes
over and above the stock Linus kernel tarballs, and so fixes that are
in 2.4.4 that are critical have likely been backported to our 2.4.3
kernel.  I've CC'd Alan.

Alan, have any SMP related changes been incorporated into our kernel that
affect XFree86 stability?



Comment 21 Mike A. Harris 2001-05-12 16:02:53 UTC
Also, try reconfiguring with:

Xconfigurator --preferxf4 --nodri


Comment 22 Alan Cox 2001-05-12 16:36:58 UTC
'X reboots' I assume means the Xserver crashed not the machine..

To answer the bits I can

-	No we have not added any SMP fixes for the DRI code. Im not aware of any
	bugs there

-	I have seen the occasional XFree86 4 report and crash but I have no 
	reason to believe the kernel is involved

-	The BP6 APIC stuff is a mess. 2.2 merely doesnt log the errors except as a
	count in /proc. They can cause coherency problems but I really dont
	think they are involved here. Its possible but it doesnt feel right.



Comment 23 Alan Cox 2001-05-12 16:37:27 UTC
Could this be an out of memory case tripped by something ?


Comment 24 Need Real Name 2001-05-12 19:56:10 UTC
Ok, Mike/Alan.  I remarked out "#"  the Load "dri"  in XF86Config-4.   I've also
added  append="noapic" to my lilo.conf, and ran lilo.  Finally I rebooted.  I'm
attaching my new XF86Config-4, XFree86.0.log, and messages file for any further
review. Please tell me if the noapic statement will harm me or do any good, or
if I lose functionality?

Comment 25 Need Real Name 2001-05-12 20:01:54 UTC
Created attachment 18206 [details]
XF86Config-4 (after rebooting having removed dri )

Comment 26 Need Real Name 2001-05-12 20:03:33 UTC
Created attachment 18207 [details]
XFree86.0.log (after rebooting the os,  having removed dri)

Comment 27 Need Real Name 2001-05-12 20:10:28 UTC
Created attachment 18208 [details]
messages (after rebooting, having removed dri, and added append="noapic" to lilo.conf. Please check for any issues here.)

Comment 28 Need Real Name 2001-05-15 12:08:33 UTC
Alan/Mike/REDHAT,
Came back to my home office this AM, and found my RHLinux7.1 monitor BLACKED
OUT. Could not even ping the box from another computer. Rebooted, ran fsck, etc.
back up.  Started reading mail in Messenger, and Xwindows rebooted after about
10 minutes. I'll try to provide the usual logs. I REALLY hope you guys consider
what others have said that the newer 2.4.3+ kernel FIXES SMP/Xwindows issues!  I
cannot stress enough that k2.2.16smp + Xwindows was FAR MORE STABLE!!! There has
got to be something amiss with k2.4.2-2 that Redhat did not incorporate from the
newer kernels.  Please review this again, and keep in mind what I was told by
another Linux user--->"Kernel 2.4.3 solved my X 4.0 crashing problems."

Comment 29 Need Real Name 2001-05-15 12:14:42 UTC
Created attachment 18370 [details]
messages

Comment 30 Need Real Name 2001-05-15 12:15:52 UTC
Created attachment 18371 [details]
XFree86.0.log (following X rebooted itself, to go with the messages)

Comment 31 Need Real Name 2001-05-15 12:17:01 UTC
Created attachment 18372 [details]
ps.txt (from ps -aux  output to text file)

Comment 32 Mike A. Harris 2001-05-15 12:20:10 UTC
We will be releasing a new kernel errata in the future.  We do not rush
out kernel errata (or any other errata) solely to fix one bug like this.
Our kernel errata needs to be well tested by internal quality assurance
procedures, and beta tested.  We are aware of the kernel issue, and our
kernel when released should contain the fix to the problem.  There is
really nothing more to do other than wait until our official kernel is
released, or try to build your own kernel.  We do not support homemade
kernels however, so your best bet is likely to wait until our official
kernel update is ready.  Feel free to try the rawhide test kernel and see
if it solves your problem.  The feedback you give could help speed up the
release of the kernel.

Comment 33 Mike A. Harris 2001-05-15 12:22:19 UTC
Reassigning to the kernel component because it is a kernel issue.

Comment 34 Arjan van de Ven 2001-05-15 12:28:51 UTC
How is this a kernel bug ?

Comment 35 Alan Cox 2001-05-15 12:34:38 UTC
I don't think it is a kernel bug, but Im prepared to keep an open mind. Lets see
if the kernel errata fixes it when its done and if not continue to pursue it as
an X bug. 

Right now we have too many unknowns and too little information in the bug report
that actually gives concrete data we can work on.


Comment 36 Need Real Name 2001-05-15 12:48:07 UTC
Created attachment 18390 [details]
core (found this core file in my home directory from the 12th - may help)

Comment 37 Mike A. Harris 2001-05-15 16:38:54 UTC
Arjan, from the data given, it seemed to me to point to the kernel, however I
could be wrong.  I've received email of similar problems, and seen postings
on XFree86 mailing lists that 2.4.4 fixes the problem.  I do not know
conclusively where problem is however, or even exactly what the problem is.
Alan says it best I think, that we should just wait for the errata kernel to
come out, and if the problem goes away, we can consider it solved.  If not,
we can then try to dig deeper.  I should have instead said "I think it
_might_ be a kernel issue", bad wording on my part indeed.

Comment 38 Need Real Name 2001-05-28 13:49:46 UTC
Hi Mike and all,

Any news on how close Redhat is to putting out a kernel update?  (We suspect the
current v2.4.2-2 with the current XFree86 4.0.3-5 has X-reboot 
issues.)  Yes, my X rebooted again.

Also, I noted something interesting in messages which seems to be related to the
X reboot.  Message:  "gnome-name-server[9661]: input condition 
is: 0x11, exiting".     Is the gnome-name-server involved here somehow????

--snip from messages--
May 26 19:53:01 boaz kernel: APIC error on CPU0: 04(02)
May 26 20:13:05 boaz kernel: APIC error on CPU0: 02(02)
May 26 20:35:22 boaz gnome-name-server[9661]: input condition is: 0x11, exiting
May 26 20:35:25 boaz gdm(pam_unix)[9558]: session closed for user phb
May 26 20:35:40 boaz gdm(pam_unix)[22699]: session opened for user phb by (uid=0)
May 26 20:35:40 boaz gdm[22699]: gdm_slave_session_start: phb on :0
May 26 20:35:42 boaz gnome-name-server[22798]: starting
May 26 20:35:42 boaz gnome-name-server[22798]: name server starting
May 26 20:36:15 boaz su(pam_unix)[22857]: session opened for user root by phb(uid=500)
----
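
To see how often the APIC errors in a messages file occur, a small counter like the one below may help. It is a sketch that assumes every relevant line has the same shape as the ones in the snip above ("kernel: APIC error on CPU0: 04(02)"); the function name count_apic_errors is made up for this example.

```python
import re
from collections import Counter

# Matches lines such as: "... kernel: APIC error on CPU0: 04(02)"
APIC_RE = re.compile(
    r"kernel: APIC error on (CPU\d+): ([0-9a-fA-F]+)\(([0-9a-fA-F]+)\)"
)


def count_apic_errors(lines):
    """Return a Counter mapping CPU name to the number of APIC error lines."""
    counts = Counter()
    for line in lines:
        match = APIC_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

It can be fed an open file directly, for example count_apic_errors(open("/var/log/messages")). A count on its own says nothing about whether the errors are related to the X restarts.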

Comment 39 Arjan van de Ven 2001-05-28 14:21:54 UTC
peter: does the 2.4.3-5 kernel fix this ?
If not, it is NOT a kernel issue.

Comment 40 Need Real Name 2001-05-28 17:47:30 UTC
OK. I upgraded to the Redhat Rawhide kernel 2.4.3-7smp. We shall see how things
go between X and the k.  

/proc/version
Linux version 2.4.3-7smp (root.redhat.com) (gcc version 2.96
20000731 (Red Hat Linux 7.1 2.96-85)) #1 SMP Mon May 21 16:57:54 EDT 2001

top
CPU0 states:  0.2% user,  0.0% system,  0.0% nice, 99.3% idle
CPU1 states:  0.2% user,  0.5% system,  0.0% nice, 98.3% idle
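
When comparing runs across kernels, it can help to record which kernel was actually booted. The sketch below pulls the release string out of /proc/version text like the line quoted above. Treating an "smp" suffix or an " SMP " tag as the marker of an SMP build is an assumption based on that one sample, and parse_proc_version is a made-up helper name.

```python
import re


def parse_proc_version(text):
    """Return (release, is_smp) from /proc/version text, or None if no match."""
    match = re.search(r"Linux version (\S+)", text)
    if match is None:
        return None
    release = match.group(1)
    # Assumption: an SMP build carries an "smp" suffix or an " SMP " tag.
    is_smp = release.endswith("smp") or " SMP " in text
    return release, is_smp
```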



Comment 41 Need Real Name 2001-06-03 17:14:53 UTC
Stardate 010603.0112 Supplemental:  No XFree86 reboots thus far. Been using this
RHLinux 7.1 k2.4.3-7SMP all weekend, and several times throughout the week.  (As
expected, I still see APIC errors, perhaps not as many? Any way to force the
kernel to stop logging APIC? Anyway, no X reboots yet in 6 days. If this
continues another 2 weeks, I'd say it's fixed. )

Comment 42 Need Real Name 2001-06-03 17:25:53 UTC
PS:  Back on 5-28-2001, I did the following updates to my stock RHLInux 7.1...

I had upgraded the following kernel and related packages (per kernel upgrade
instructions):
initscripts-5.86-1.i386.rpm  kernel-doc-2.4.3-7.i386.rpm     
kernel-smp-2.4.3-7.i686.rpm
kernel-2.4.3-7.i686.rpm      kernel-headers-2.4.3-7.i386.rpm 
kernel-source-2.4.3-7.i386.rpm

As well as the following errata updates via the Redhat Network:
arts-2.1.2-1.i386.rpm                 netscape-communicator-4.77-1.i386.rpm
gftp-2.0.8-1.i386.rpm                 netscape-navigator-4.77-1.i386.rpm
kdelibs-2.1.2-1.i386.rpm              rhn_register-1.3.2-1.noarch.rpm
kdelibs-devel-2.1.2-1.i386.rpm        rhn_register-gnome-1.3.2-1.noarch.rpm
kdelibs-sound-2.1.2-1.i386.rpm        samba-2.0.8-1.7.1.i386.rpm
kdelibs-sound-devel-2.1.2-1.i386.rpm  samba-client-2.0.8-1.7.1.i386.rpm
losetup-2.11b-3.i386.rpm              samba-common-2.0.8-1.7.1.i386.rpm
mgetty-sendfax-1.1.25-5.i386.rpm      samba-swat-2.0.8-1.7.1.i386.rpm
minicom-1.83.1-8.i386.rpm             up2date-2.5.4-1.i386.rpm
mount-2.11b-3.i386.rpm                up2date-gnome-2.5.4-1.i386.rpm
mouseconfig-4.22-1.i386.rpm           Xconfigurator-4.9.29-1.i386.rpm
netscape-common-4.77-1.i386.rpm

(I did not, however re-configure X with the new Xconfigurator, did not need to
thus far.... However the newer samba fixed another bug I posted  35915 -
stability doing samba networking with Windows Me and 95.)

Comment 43 Need Real Name 2001-06-03 18:44:01 UTC
Supplemental: I spoke too soon?  XFree86 4.0.3-5 just rebooted itself again!!!  

PLEASE TAKE NOTE: This cannot be coincidental,  3rd/4th  time I noticed this ->
I WAS SCROLLING ON THE NETSCAPE UP/DOWN BAR when X REBOOTED ITSELF!!!  No
kidding.

Also please see my messages file:     GNOME-NAME-SERVER input condition...
Exiting... Kernel HW bug ??? Restoring Chip condition??? (HOW COME REDHAT 6.2
DID NOT BOMB ON ME? LIKE THIS??? SAME HARDWARE.)  Is the BP6 a VIA686a
motherboard??? Did not see this HW BUG message with k2.4.2-2... did you guys add
that to k2.4.3-7 ???

Jun  3 14:14:36 boaz kernel: APIC error on CPU0: 02(02)
Jun  3 14:16:04 boaz kernel: APIC error on CPU0: 02(02)
Jun  3 14:19:42 boaz gnome-name-server[1133]: input condition is: 0x11, exiting
Jun  3 14:19:42 boaz su(pam_unix)[23496]: session closed for user root
Jun  3 14:19:43 boaz su(pam_unix)[24598]: session closed for user root
Jun  3 14:19:45 boaz gdm(pam_unix)[1022]: session closed for user phb
Jun  3 14:19:47 boaz kernel: probable hardware bug: clock timer configuration lost - probably a VIA686a motherboard.
Jun  3 14:19:47 boaz kernel: probable hardware bug: restoring chip configuration.
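
One way to line up the X restarts against the APIC errors in a messages file is sketched below. It treats each "gnome-name-server ... exiting" line as a marker of an X session drop, which is only an assumption taken from the observation in this comment, and it reports the seconds elapsed since the most recent APIC error. Syslog lines carry no year, so 2001 is assumed; the function names are made up for this example.

```python
from datetime import datetime


def parse_stamp(line, year=2001):
    """Parse the 15-character syslog timestamp at the start of a line."""
    # Assumption: the year is not in the log, so it is supplied by the caller.
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def session_drops(lines):
    """Return (timestamp, seconds since last APIC error or None) per drop."""
    last_apic = None
    drops = []
    for line in lines:
        if "APIC error" in line:
            last_apic = parse_stamp(line)
        elif "gnome-name-server" in line and "exiting" in line:
            when = parse_stamp(line)
            gap = (when - last_apic).total_seconds() if last_apic else None
            drops.append((when, gap))
    return drops
```

A short gap would only be a hint worth following up; it does not show that an APIC error caused the restart.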


WHAT NEXT, INSTALL THE NEWER RAWHIDE XFree86 ??? OR GO BACK TO OLDER XFree86 ???
Since RH6.2 was stable on my same hw, there has got to be something that can
bring stability back to RH7.1  on my same hw !  Please advise.

Comment 44 Need Real Name 2001-06-03 18:45:51 UTC
Created attachment 20192 [details]
messages (2.4.3-7smp X rebooted)

Comment 45 Need Real Name 2001-06-03 18:47:15 UTC
Created attachment 20193 [details]
XFree86.0.log (k2.4.3-7smp X rebooted)

Comment 46 Need Real Name 2001-06-03 21:20:43 UTC
Question, I just noticed that the following packages are installed (note
versions)

XFree86-FBDev-3.3.6-35
XFree86-Mach64-3.3.6-35

Even though my XFree86-4.0.3-5 is installed and in-use.  There does not seem to
be a v4 for Mach64 installed....

Any advice on this ??? Is this right ???


Comment 47 Need Real Name 2001-06-09 13:24:50 UTC
Hi all. I noted you have not answered my latest two postings with the additional files posted. Giving up on me?  Anyway, just as well, my 60GB M_x__r 
hard drive gave up the ghost after only two months or so. Getting it RMA'd - will be several days.  Then I will try a completely new Redhat 7.1 scratch 
install. I hope you will still answer my questions in time for my re-install. Specifically the "what to do next" following X rebooting again with k2.4.3-7smp,
specifically since RH6.2 was stable on my motherboard, there's got to be something to stabilize X v4 and k2.4.x.  Also what's up with Mach64 version 
(3.x.x) not matching the XFree86 version (4.x.x) -  is that right? 

Note: With k2.4.3-7smp this is what I see in messages when X reboots (full file is already posted). Is a BP6 motherboard a VIA686a ???
~~~~~~~~~~~~~~~~~~~~~
Jun  3 14:19:42 boaz gnome-name-server[1133]: input condition is: 0x11, exiting
Jun  3 14:19:42 boaz su(pam_unix)[23496]: session closed for user root
Jun  3 14:19:47 boaz kernel: probable hardware bug: clock timer configuration lost - probably a VIA686a motherboard.
~~~~~~~~~~~~~~~~~~~~~

Comment 48 Alan Cox 2001-06-09 13:41:23 UTC
Unfortunately the data provided so far doesn't really help identify the problem.
Its more a case of "suspect hardware but no proof either way and no idea what is
up"



Comment 49 Mike A. Harris 2001-06-09 15:49:16 UTC
Peter, you have provided both the information requested more or less, as
well as ample other information.  I thank you for that, however a lot
of the information - as Alan has pointed out - is not useful.  The
info you have provided definitely indicates that your hardware is broken
in some respects (the APIC errors, and VIA kernel messages).  These
messages from the kernel are not kernel bugs to be upset about, they are
*hardware* flaws to get upset with your motherboard manufacturer and VIA
about.  The kernel is just informing you the hardware is broken.

Our kernel will be done when it is done.  If you want to know when it is
released, the best thing you can do, is download and try all rawhide kernels,
and subscribe to the redhat-watch-list, redhat-announce-list, and possibly
even sign up for Red Hat Network.  Asking in here when it will be released
just wastes both of our time - if we knew when it would be released, then it
would be released already.  We NEVER preannounce release dates of ANYTHING
to anyone *ever*.  Mostly because we do not SET release dates - things occur
when they are ready, and quality testing is complete.  Quite often during
that quality testing, some bug is found and the process starts over until it
is complete.

Personally, I believe the problem is entirely related to your hardware, as
I have seen nothing yet to believe it is an XFree86 bug, and none of the
data points to a kernel bug either as Alan has said.  Updated software is
not going to fix hardware bugs - if that is the case of course.  You've
been able to run it for days on end, and then all of a sudden boom.  That
*strongly* smells of hardware problems, possibly bad RAM, possibly an
overheating CPU, possibly bad power or low power.

So, since we really do not have any idea whatsoever what the problem you
are experiencing is, there is absolutely no way whatsoever for me/Alan or
anyone to do absolutely anything about it for you.  I would love to fix
the problem if I could, but I can't draw water from a rock...  ;o(

If Red Hat Linux 6.2 works for you, I recommend going back to it as a
workaround.  Another possibility that may help you is by talking directly
with the XFree86 people about the problem.  The xpert mailing
list is the best way to access their expertise.

Other than what I've said already, all I can suggest is trying to rule out
other possibilities.  I recommend downloading and running memtest86 to
test your memory.  Also, if your board has lmsensors support, please use
it to detect any possible overheating problems, or such.  I would enable
voltage checks on the power supply also.

You should also check your video RAM out and/or try a different video
card.  Try swapping hardware with other hardware one at a time, and see if
you can narrow it down.

Other than these possibilities I am completely out of suggestions as to 
what the problem might be.  If I had the machine sitting in front of me, I
might be able to determine more, but of course I do not.  SMP works for me
very well, both in Linux in general, and in X, and with multiple ATI and
Matrox video hardware.

One thing that *could* help me, is if you can get an *exact* reproducible
test case, and document it step by step, ie:  1) do this, 2) do this also,
3) do that, 4) boom, the lockup occurs.  ie: non-intermittent.  If you can
do this, it may point to other problems, but right now I consider this
a hardware bug.

I will leave the bug report open for you for a while so I can monitor any
updates you can provide.  If you provide data and do not hear back, it is
more than likely because the data did not provide any new useful information
to try and track anything down.  Again, xpert will be much
more of a helpful realtime forum for tracking this down for you.

Sorry we can't be of much more help than this for now.



Comment 50 Need Real Name 2001-06-10 14:19:00 UTC
Mike, your 6-9-2001 response appears to address old topics, and completely ignores my newest postings.
Please only see my postings 6-3-2001 and newer - which by the way - are answers to some things you
guys asked me to do - I did them - and posted follow-ups for you - like the 2.4.3-7smp rawhide upgrade.
But you appear to treat all my postings like just so much gibberish.....
You also did not address some specific questions I posted. Like Mach64 v3 with XFree86 v4 is that right?

Responses to your 6-9-2001 post. Again, I am in the 6-3-2001 and newer mindset here. (P = paragraph).
P1: Thank you, but we've known about the APIC issue since day 1.  I asked is my BP6 a VIA686a motherboard?
      Because if BP6 is NOT a VIA, then the kernel message is bogus!  But I see no reply on this question.
P2: I have in fact done all you said, not "more or less", rather all. If you read my postings you will see that!  
P2: I have already upgraded to Redhat's RAWHIDE k2.4.3-7 and said so.   I thereafter posted that X still reboots.
P2: I have already signed up with the Redhat Network, and upgraded.  See my postings! 
P2: I have not asked again about any kernel upgrades, so stop saying I did :) Great googely moogely!  

P5&6: I've noted the new suggestions:  memtest86, lmsensors, xpert site., (also trying video cards.)
Only thing I can say is if it were bad RAM, hot CPU issue, power, etc. why is it only X that reboots
while Linux kernel keeps on running just fine and dandy? HW issues usually lockup 
the entire system, not just selectively reboot only X.  And since RH6.2 worked fine, once again, 
HW issues would not be the logical culprit - don't ignore these obvious clues!   There was a messages
entry on "clock timer" loss, and you did not speak on that what-so-ever, it sounds important.

P9: It is illogical for you to conclude that only non-intermittent issues can be kernel or X bugs!
Just because I cannot reproduce a step-method for forcing this X-reboot, does not make it
NOT a bug. Others in the Linux world have complained about their X rebooting on SMP 
systems. I can't take your word for it, and ignore all the others. You yourself even said there
were others!  See YOUR OWN POSTING:
"------- Additional comments from mharris 2001-05-15 12:38:54 -------
 Arjan, from the data given, it seemed to me to point to the kernel, however I
 could be wrong.  I've received email of similar problems, and seen postings
 on XFree86 mailing lists that 2.4.4 fixes the problem.  "

Maybe I need kernel 2.4.4 to fix this problem. (I only have 2.4.3-7 right now.)
I think this still should not be ruled out. I read that some major changes were
done between 2.4.3 and 2.4.4, or 2.4.4 and 2.4.5, due to some issues.

SEE:  
http://www.kernel.org/pub/linux/kernel/v2.4/ChangeLog-2.4.3
http://www.kernel.org/pub/linux/kernel/v2.4/ChangeLog-2.4.4
http://www.kernel.org/pub/linux/kernel/v2.4/ChangeLog-2.4.5
http://www.kernel.org/pub/linux/kernel/testing/patch-2.4.6.log

Should I run kernel 2.4.3-7 and install XFree86 v3 instead of v4?








Comment 51 Arjan van de Ven 2001-06-10 14:28:01 UTC
This is not a kernel issue..... It's XFree bug, and some people suggest
that it goes away when using glibc 2.1


Comment 52 Alan Cox 2001-06-10 14:46:25 UTC
Lets try and answer the bits you said I missed

1.	A 3.x server with 4.x libraries is fine. Its also necessary for some
	cards that do not yet have a 4.x driver

2.	The VIA warning is indicative of possible bios/hw problems but not proof
	and the fixup it does is safe anyway

3.	X and the display action - especially scrolling have the most impact on
	the PCI bus load and possibly on power although I cant see it being
	power related

4.	If you want to throw a standard 2.4.5 kernel on your box and test it
	then go ahead. If it works thats great, if it doesnt work well its 
	info



Comment 53 Alan Cox 2001-06-10 14:48:08 UTC
Arjan - would it be worth trying the .i386 not .i686 glibc in case the X server
code is corrupting segment registers ?


Comment 54 Need Real Name 2001-06-17 19:19:05 UTC
Note: I have my new RMA'd hard drive installed, and RHLinux 7.1 fully installed
from scratch and configured again. Also upgraded rawhide kernel 2.4.5-0.2.9smp,
and updated errata rpm's too. I'll run for awhile and see how stable this is
with X.  In under 2 days since fully installed, none to report.

Comment 55 Need Real Name 2001-06-23 15:16:58 UTC
More news - with k2.4.5-0.2.9smp - I've experienced a couple of Xwindows reboots
and something I did not get before, a couple of solid lockup incidents (system
comes to a sudden crawl then locks), both involving use of Netscape. 

I note that there is a rawhide XFree86 4.1.0-x  ... would it help for me to
upgrade to it? IF SO WHICH RPM's do I need to upgrade?

Also, I've now rebooted to the UP kernel 2.4.5-0.2.9 to see if single processor
kernel makes any stability difference, and I am researching upgrading the BP6
BIOS.



Comment 56 Need Real Name 2001-06-24 01:08:36 UTC
Would it help/benefit any to recompile the kernel on my hardware - if so is
there a website outlining recompiling??? (Please respond to this and to my
previous posted questions from June 23). 

Also, I FLASHED my BP6 motherboard with a newer BIOS bp6ru128.bin dated
1-4-2001. After working the remainder of the day in SMP kernel, X rebooted a few
minutes ago.

Comment 57 Havoc Pennington 2001-06-25 18:14:40 UTC
*** Bug 36422 has been marked as a duplicate of this bug. ***

Comment 58 Need Real Name 2001-06-26 21:04:15 UTC
Note to all - running in UP uniprocessor (kernel 2.4.5-0.2.9) seems to be
stabilizing the system - no X reboots in several days, and I've been using it
extensively this week.   No APIC errors in messages.

PS: Are ANY of you going to respond to my last two postings from 6-23-2001 ???
:) Please ?





Comment 59 Need Real Name 2001-06-26 21:12:49 UTC
Note - I just realized since I did the full re-install, I forgot to go back and
remark out the dri load statement in /etc/X11/XF86Config-4, so I am doing so
now.

From previous suggestion from  mharris 2001-05-07 09:37:59
"and if the problems persist:
 Xconfigurator --preferxf4 --nodri  " 

Would you please tell me if you think upgrading to the newest XFree86 v4.1.x
would help???
Reply to my 6-23-2001 postings ok?

Comment 60 Mike A. Harris 2001-06-26 22:06:32 UTC
XFree86 4.1.0 will cause you more problems than it solves right now,
and recompiling your kernel is not recommended.

Absolutely nobody can do anything about this problem because nobody has
the foggiest clue what the problem is.  It could be kernel related, X related,
SMP related, hardware related, or some combination thereof.  It is not easily
reproducible, and without any debugging info (backtrace/ltrace/strace) it
is impossible to do anything about.  I have a new SMP box here that I just
put a Mach64 card in and will use this card for a few weeks.  It is Tyan HEsl
motherboard though so it might not show up due to different hardware.  Hard
to say.

Recommendations:
1) Try your hardest to see if you can find something that increases the
   frequency of this problem.  If you can build up a list of items that
   tend to cause this problem to happen earlier rather than later, it will
   help narrow it down.
2) Build your own static XFree86 server, with the video driver built in, and
   debug it with gdb/strace/ltrace et al. logging to a file or whatnot hoping
   to capture the crash.  Due to the randomness of the problem, you'd have to
   do this a couple of times to see if it hangs in the same spot.  I'm guessing
   it won't.

If this is an XFree86 bug - I *NEED* a backtrace - preferably from more than
one core dump.  Without that I can do nothing.


Comment 61 Need Real Name 2001-06-26 22:54:15 UTC
Thank you Mike.  I may be able to help with these latest requests, but I am not
necessarily that technical (hope Redhat is).  But here are some additional
thoughts:

(1)  This bug is happening on more than one type of hardware and video card (See
also bug 36422) and other internet-Linux-community complaints. So it is probably
not hardware related. 
(2)  Per my experience, this bug appears to go away when running UP kernel. So,
it is not XFree86 4.x by itself.
(3)  Per my experience, the version of the 2.4.x  SMP kernel did not fix it.  
(4)  Bug 36422 says that using the XFree86 3.x.x stabilized use with SMP kernel.
(Conclusion) This leaves XFree86 v4.x interactions with SMP kernel or vice versa
as the culprit. 

What about Alan's comments from alan 2001-06-10 10:48:08 about GLIBC?
"in case the X server code is corrupting segment registers ?"  If GLIBC is the
glue between X and SMPkernel, then it is worth trying. 

Is the XFree86 Project interested in helping on this one???




Comment 62 Need Real Name 2001-07-08 14:01:30 UTC
Mike, please review this and advise me on this? I just tried to run
"Xconfigurator --preferxf3" and the first screen gives me these (autodetected?)
results:

PCI Entry: ATI | 3D Rage Pro 215GP  
Xserver:  XF86_Mach64
XFree4 driver:  (default)  
          [OK]
(Bombs out of Xconfigurator at this point with this message:)
Server doesn't exist, can't continue
tried to use ../../usr/X11R6/bin/XF86_Mach64

Now technically, I have an ATI Xpert@Play98 3D card, which I suppose could have
the same chipset as the ATI | 3D Rage Pro 215GP.   Also, I could have sworn that
upon initial install of RHLinux 7.1 it detected the Mach64, and I saw this as
one of the XFree86 RPM's in the Custom Install RPM List!!!   But all I see in
/usr/X11R6/bin is XF86_FBDev (see list of XFree RPM's installed on my system
below).   Also   rpm -q XFree86-Mach64   shows "package XFree86-Mach64 is not
installed".

What does this mean (1) as far as using --preferxf3 <do I need to force install
Mach64???> and (2) SMP/XFree reboot bug with FBDev, not Mach64 ????


~~~~~~~~~~~~~~~~~~LIST OF MY X RPM's~~~~~~~~~~~~~~~~~~~~~~~~~
gdm-2.0beta2
    The GNOME Display Manager.
xtt-fonts-0.19990222
    Free Japanese TrueType fonts (mincho & gothic)
XFree86-ISO8859-9-2.1.2
    Turkish language fonts and modmaps for X.
XFree86-4.0.3
    The basic fonts, programs and docs for an X workstation.
XFree86-tools-4.0.3
    Various tools for XFree86
vnc-server-3.3.3r2
    A VNC server.
XFree86-ISO8859-7-100dpi-fonts-1.0
    ISO 8859-7 fonts in 100 dpi resolution for the X Window System.
gqview-0.8.1
    An image viewer.
XFree86-twm-4.0.3
    A simple window manager
glms-1.03
    A GNOME hardware monitoring applet.
ttfonts-1.0
    Some TrueType fonts
XFree86-ISO8859-2-100dpi-fonts-4.0.3
    A set of 100 dpi Central European language fonts for X.
XFree86-ISO8859-7-75dpi-fonts-1.0
    ISO 8859-7 fonts in 75 dpi resolution for the X Window System.
XFree86-KOI8-R-100dpi-fonts-1.0
    KOI8-R fonts in 100 dpi resolution for the X Window System.
XFree86-100dpi-fonts-4.0.3
    X Window System 100dpi fonts.
xinitrc-3.6
    The default startup script for the X Window System.
XFree86-ISO8859-2-Type1-fonts-4.0.3
    A set of Type1 Central European language fonts for X.
XFree86-ISO8859-7-Type1-fonts-1.0
    Type 1 scalable Greek (ISO 8859-7) fonts
urw-fonts-2.0
    Free versions of the 35 standard PostScript fonts.
XFree86-75dpi-fonts-4.0.3
    A set of 75 dpi resolution fonts for the X Window System.
XFree86-ISO8859-7-1.0
    Greek language fonts for the X Window System.
XFree86-xf86cfg-4.0.3
    XFree86 configurator
rxvt-2.7.5
    A color VT102 terminal emulator for the X Window System.
XFree86-xdm-4.0.3
    X Display Manager
gnome-kerberos-0.2.2
    Kerberos 5 tools for GNOME.
XFree86-ISO8859-9-100dpi-fonts-2.1.2
    100 dpi Turkish (ISO8859-9) fonts for X.

*** X Hardware Support ***

Xconfigurator-4.9.29
    The Red Hat Linux configuration tool for the X Window System.
XFree86-Xnest-4.0.3
    A nested XFree86 server.
XFree86-Xvfb-4.0.3
    A virtual framebuffer X Windows System server for XFree86.
XFree86-V4L-4.0.3
    Video for Linux (V4L) support for XFree86
XFree86-FBDev-3.3.6
    The X server for the generic frame buffer device on some machines.

Comment 63 Alan Cox 2001-07-08 14:09:31 UTC
Rage cards are MACH64 and a bit (well MACH64 and a lot actually) so it is trying
to use the right server. If you install the Mach64 Xserver RPM then the
Xconfigurator should do the desired job
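
For anyone hitting the same "Server doesn't exist" message, a minimal sketch of checking which 3.x server binaries are actually present before running Xconfigurator. The helper name missing_servers is made up for illustration, and /usr/X11R6/bin is assumed as the server directory only because that is the path shown in the error above.

```python
import os


def missing_servers(server_names, bindir="/usr/X11R6/bin"):
    """Return the server binaries from server_names that are not
    present (or not executable) in bindir."""
    return [
        name
        for name in server_names
        if not os.access(os.path.join(bindir, name), os.X_OK)
    ]
```

Calling missing_servers(["XF86_Mach64", "XF86_FBDev"]) on the reporter's system would presumably list XF86_Mach64, which matches the rpm -q result saying XFree86-Mach64 is not installed.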


Comment 64 Mike A. Harris 2001-07-08 22:51:11 UTC
Yes, as Alan says, install the Mach64 package.  I think I might put all of the
XFree86 3.3.6 servers into one package in the future to prevent this sort of
problem, and also to prevent the recent upgrade problem where every server
package duplicates 1Mb+ of pex, xie...


Comment 65 Need Real Name 2001-07-28 19:28:44 UTC
Hello Mike. I updated all the errata packages available for 7.1 then upgraded to
kernel 2.4.6-2smp. I see you have k2.4.3 official package in the errata - is
this highly recommended over 2.4.6-2  ie. anything relating toward this bug?  
Also my Linux does not seem to reboot X anymore, but I do have 2-3 incidents of
total hard lockups, with some CORE files appearing under /home/myid and
/home/myid/.gnome-desktop. Would you be interested in analyzing them? Or is that
not relevant??? I leave it to you, please do let me know: (1) should I use 2.4.3
rather than 2.4.6-2 and (2) do you want the cores - I can upload them to this
bugzilla. They're 25-35mb each. Maybe will help to close this bug?

Comment 66 Mike A. Harris 2001-07-29 00:31:39 UTC
The errata kernel is the latest officially supported kernel, so that
one should be used.  I have no idea whether there is anything in it
or in any of our rawhide kernels that is related to this bug because
as we've said before, we do not know what the bug is, it could be X,
could be the kernel, could be glibc related, or it could bad memory
or something else.  If you feel it is worth it to try a 2.4.6 kernel,
go ahead and try it.  I don't know if it will fix the problem because
I don't know what the problem is.

So in answer to your questions:
1) You can use whatever kernel you like, the errata kernel is a stable
supported kernel, rawhide is unstable, might or might not work.  I am not
a kernel guy, I have no idea.

2) No, I do not want 35Mb core files attached to bugzilla. ;o)
You can do a backtrace on each of them however by doing:

gdb --core core
Then doing "bt", and then cut and pasting all output from gdb.  If it is
more than 20 lines or so of output from gdb, then attach it as a file
instead.  Note when you run gdb on the corefile as above, it will
tell you which application generated the core ie:

[mharris@asdf mharris]$ gdb --core core
GNU gdb 5.0rh-5 Red Hat Linux 7.1
Copyright 2001 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux".
Core was generated by `kdeinit: kdesktop'.
                       ^^^^^^^^^^^^^^^^^
Program terminated with signal 11, Segmentation fault.
#0  0x4053499b in ?? ()

(gdb)bt
#0  0x4053499b in ?? ()
#1  0x404cfe05 in ?? ()
#2  0x40563f99 in ?? ()
#3  0x410cfe62 in ?? ()
#4  0x0804a49b in ?? ()
#5  0x0804add1 in ?? ()
#6  0x0804b2a6 in ?? ()
#7  0x0804bff1 in ?? ()
#8  0x40c640be in ?? ()
(gdb)
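
With several core files to go through, the steps above can be scripted. What follows is only a sketch under stated assumptions: the function names core_origin and backtrace are hypothetical, and it assumes the installed gdb accepts --batch together with a command file passed via -x, which avoids relying on newer command-line switches.

```python
import os
import re
import subprocess
import tempfile

# gdb reports the crashing program as: Core was generated by `prog args'.
GENERATED_BY = re.compile(r"Core was generated by `(.*?)'\.", re.DOTALL)


def core_origin(gdb_output):
    """Return the command line gdb says produced the core, or None
    when gdb did not recognise the file as a core dump."""
    match = GENERATED_BY.search(gdb_output)
    if match is None:
        return None
    # Collapse whitespace in case the command was wrapped across lines.
    return " ".join(match.group(1).split())


def backtrace(core_path, gdb="gdb"):
    """Run gdb non-interactively on one core file and return its output.

    Assumption: gdb supports --batch and -x <command file>.
    """
    fd, cmd_file = tempfile.mkstemp(suffix=".gdb")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write("bt\n")  # the same "bt" typed by hand above
        result = subprocess.run(
            [gdb, "--batch", "-x", cmd_file, "--core", core_path],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
        return result.stdout
    finally:
        os.unlink(cmd_file)
```

Something like core_origin(backtrace("core")) then tells you which application dumped each core, so only the X server ones need to be pasted here.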

Also, by the way, the previous core file that you attached:

[mharris@asdf mharris]$ gdb --core core
GNU gdb 5.0rh-5 Red Hat Linux 7.1
Copyright 2001 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux".
"/home/mharris/core" is not a core dump: File format not recognized
                     ^^^^^^^^^^^^^^^^^^^

Also, to give you more perspective on this problem, even if I had your
machine on my desk in front of me, it would be extremely difficult
to try to track this problem down using debugging tools, etc. since you
cannot reproduce it at will.  Random spurious lockups are very difficult
to chase down.  A large percentage of which end up being hardware flaws.

Some more suggestions:  Do you or can you get access to some other
video hardware?  If you can borrow some other non-ATI hardware, that
could help narrow things down a slight bit.

Comment 67 Need Real Name 2001-07-29 13:31:37 UTC
Thank you Mike...and here's a few bits of info for clarification and settling
some things: 
BUG POSTS RELATED:
(1) As of a week or so prior to my last post of 7-28-2001 I reconfigured with
Xconfigurator --preferxf3 with the Mach64 installed, and that appears to have
stabalized my SMP+X. No more X reboots since. (I realize this is a work-around,
not a fix.)
(2) Prior to the above, running the XFree86v4 with UP kernel also stabilized X so
it did not reboot.
(3)  I am still having some "out of the blue" lockups where it happened with no
one using the computer and not running anything other than GNOME and Netscape
sitting there - I just sit down to use the computer after a few days and it's
totally locked up - so I also suspect hardware related to SMP usage. Also my
screensaver is a simple black screen, so no fancy graphical cpu needs there.
UNRELATED:
(4) One of my cores - now seems unrelated to lockup - was from Corel Photopaint
- probably occurred upon exiting the app.
(5) Two core.xxxx files (x being some numbers) appeared in  my /home/mydir
directory but gdb tells me they are not recognizable, so I can only assume
certain cores get generated incorrectly:
"This GDB was configured as "i386-redhat-linux"..."/core": not in executable
format: File format not recognized"   ...   

If I continue having these system lockups (BTW I have to UNPLUG the POWER CORD
when this happens since the power button also is useless when it happens) I may
install the 2.4.3 supported kernel for trial. 


Comment 68 Mike A. Harris 2001-07-30 06:57:53 UTC
Ok, if #1 and #2 continue to work, please keep using them as much as you can.
If it never locks, then we may be able to prove that it is software
problem which is a big step IMHO.  Try using X3 for a week or two if
that is possible.  How long did you run #2?
#3 I assume you mean X4+SMP kernel right?
#4 Sounds like multithreaded cores.  The X server isn't
threaded so it can't be X cores.  An X core should indeed show up
correctly in gdb.  Enable Option NoTrapSignals in XF86Config-4
(as per manpage) if you want to nail some X cores.  If we can get to that
point, I have some new ideas.  Also, I am going to get ATI to send me
an Xpert98 same model if they've got one.
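
To double-check that the option really took effect after editing, here is a rough sketch that scans the config text for an uncommented NoTrapSignals entry. It assumes the option lives in the ServerFlags section, as the XF86Config manpage describes; the function name no_trap_signals_enabled and the simplified parsing are illustrative only, not how the X server itself reads the file.

```python
import re

# A ServerFlags section looks roughly like:
#   Section "ServerFlags"
#       Option "NoTrapSignals"
#   EndSection
SECTION = re.compile(
    r'^\s*Section\s+"ServerFlags"(.*?)^\s*EndSection',
    re.IGNORECASE | re.DOTALL | re.MULTILINE,
)
OPTION = re.compile(
    r'^\s*Option\s+"NoTrapSignals"(?:[ \t]+"([^"]*)")?',
    re.IGNORECASE | re.MULTILINE,
)
TRUE_VALUES = ("", "1", "on", "true", "yes")


def no_trap_signals_enabled(config_text):
    """True if any ServerFlags section has an uncommented NoTrapSignals
    option that is either bare or set to a true-looking value."""
    for section in SECTION.finditer(config_text):
        option = OPTION.search(section.group(1))
        if option is None:
            continue
        value = option.group(1)
        if value is None or value.lower() in TRUE_VALUES:
            return True
    return False
```

Feeding it the contents of /etc/X11/XF86Config-4 should return True once the option is in place; lines starting with # are ignored because the pattern requires Option to be the first word on the line.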

Comment 69 Need Real Name 2001-07-31 00:22:08 UTC
I shall continue running #1 (X v3) forever if you like - it works. All my Corel
sw works too.
I ran #2 for about a month. Stable.
#3 was with SMP k2.4.x.  X3 or 4.  But it's total lockup, not X rebooting. Seems
a different problem. Maybe hw. Happens even when no one is using the computer. 
#4 I'll check out the manpage. 

Thought - could this be a problem with "xfs" font server intermittently failing
to respond, and X blowing up???  I say this because when I later added
Fontastic, and accidentally killed it, well of course it stopped responding on
port 7102 and X rebooted. So if the normal XFS occasionally fails intermittently
(with X4), that could do it too??? Just a strange idea. 








Comment 70 Mike A. Harris 2001-07-31 06:51:46 UTC
I've updated the summary to closer reflect the problem.  Also, I have
an ATI Xpert@play98 PCI on its way to me from ATI to look deeper into
this problem.

Ok, for your last comments:  For the #3 response, just to 100% clarify,
you are saying that using an SMP kernel with 3.3.6 *or* 4.x results in
a crashed box randomly?

So correct me here if wrong (trying to summarize so no need to read whole
bug report each time):
3.3.6 + UP kernel == stable
4.0.3-5 + UP kernel == crash
3.3.6 + SMP kernel == stable
4.0.3-5 + SMP kernel == crash

Kernel version doesn't seem to make any difference, SMP is unstable for you.
UP is stable.  This could still be either kernel or X related, but we're
narrowing things down at least a minuscule amount.  ;o)  I am going to
torture myself to using the xpert@play98 PCI card on my dual 1Ghz box.
My hope is that I will have the same problem and can try to track it down,
hopefully without losing work.  ;o)  Even then it'll be an upgrade from
my current torture of a Cirrus Logic 5446.  ;o)  Then it is back to the
Radeon for me.  ;o)

Question:  What version of gdb do you have installed:  rpm -q gdb
           (Trying to determine why cores are showing up invalid)


Comment 71 Need Real Name 2001-07-31 21:02:12 UTC
Peter's Responses:

Correct. Hard lockups on occasion with 3.3.6 and 4.x.... But with 3.3.6, no more
X Reboots which is the original bug.

With regard to the above, the original bug is X-REBOOTING.  The HARD LOCKUP is
something which I noted happening somewhat more often recently (even when not
using the computer) but happened all along occasionally.

To be clear, and giving it some thought, here is a new chart:

3.3.6 + UP kernel     ==  (Have not tested this scenario as of yet. Will do so....)
4.0.3-5 + UP kernel   ==  No X Reboots in under 1 month (July).  Hard locks < 1/week.
3.3.6 + SMP kernel    ==  No X Reboots to date.   2 hard locks since ~7/18/01.
4.0.3-5 + SMP kernel  ==  X Reboots 2-4 per week.  Hard locks >= 1/week.
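
The chart above can be read as a small lookup table. The sketch below encodes only what is reported in this comment (the untested combination is left as None) and asks which tested combinations show each symptom; the names observations and configs_with are illustrative.

```python
# Observations as reported in this comment; None means "not tested yet".
observations = {
    ("3.3.6", "UP"): None,
    ("4.0.3-5", "UP"): {"x_reboots": False, "hard_locks": True},
    ("3.3.6", "SMP"): {"x_reboots": False, "hard_locks": True},
    ("4.0.3-5", "SMP"): {"x_reboots": True, "hard_locks": True},
}


def configs_with(symptom, table):
    """Return the tested (X version, kernel mode) pairs showing a symptom."""
    return sorted(
        config
        for config, seen in table.items()
        if seen is not None and seen[symptom]
    )
```

Here configs_with("x_reboots", observations) gives only ("4.0.3-5", "SMP"), while hard locks show up in every tested combination, which is why the two symptoms look like separate problems.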

I can tell you my GDB version when I get home. But I don't recall any errata
updates for it, so it may be the original 7.1 distribution default.

BTW I also installed the kernel 2.4.3-12 up&smp  so I can test that too if
needed. But still running kernel 2.4.6-2.


Comment 72 Need Real Name 2001-08-03 00:23:01 UTC
rpm -q gdb   yields   gdb-5.0rh-5

Comment 73 Need Real Name 2001-08-03 18:09:50 UTC
Bug Info Update from Peter:

(a) Upgraded to Roswell rawhide kernel 2.4.6-3.1 (I was running 2.4.6-2).  I do
have 2.4.3-12 installed for testing if needed.
(b) I'm now running in 2.4.6-3.1 in UP mode to test for any hard locks or
otherwise, to fill the need from previous post.
(c) FYI while running 2.4.6-2smp I experienced one sudden, no warnings hard
lockup today.

(d) Hey RedHat, this "Roswell" business is heavily overrated. It was in fact
only a crashed classified high-altitude balloon for listening to USSR nuclear
tests.  The US gov't doesn't have to try hard at all to cover up anything -
people unwittingly work up fanciful tales to do the job quite nicely.  Sure, it
was a UFO - yah, whatever you say Mugsey.  And people blame the US gov't for
supposed cover-ups? Take the red pill and see how deep the rabbit hole of our
gullibility goes (ie. Uninformed Fallible Obliviousness).   Sorry, I could not
resist.... ;-) 




Comment 74 Mike A. Harris 2001-08-03 18:43:16 UTC
I now have an xpert@play98 PCI and will commence testing within the week.

Comment 75 Mike A. Harris 2001-08-08 15:18:15 UTC
/me switches to using xpert@play in main SMP workstation
Wow.. what a downgrade from the Radeon 64 AGP...  Talk about
punishing one's self...  ;O)

Comment 76 Need Real Name 2001-08-08 23:37:18 UTC
Hey, my name isn't  Gate$ so cheap or midrange it is...  What are you playing
Doom?  ;-) 

UPDATE: running kernel 2.4.6-3.1 UP for a few days, no hard lockups yet, and no
X reboots either, though I don't expect any X reboots with UP, esp. under Xfree
v3.x.

NOTICE:  Since using the 2.4.6 kernels, ie. 2.4.6-2 and 2.4.6-3.1, UP, I've
noted some sluggishness in running X v3, following certain activity, such as
after I ran a full backup to tape, or a full scp copy to other servers,  after
all done, found it slow switching between X screens, and typing to xterms,
repainting GUIs, etc. Think I have to reboot to make it normal again.   I did
note one of the logs /var/log/pacct was 1.7MB but all other logs under 250kb.

Comment 77 Need Real Name 2001-08-09 01:15:49 UTC
Mike, I found another core in my home directory, copied to my coredumps
directory, and ran gdb on it, and below are the results:

-rw-------    1 phb      phb        380928 Aug  6 01:02 core-08-05-01

[root@boaz coredumps]# gdb --core core-08-05-01
GNU gdb 5.0rh-5 Red Hat Linux 7.1
Copyright 2001 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux".
Core was generated by `gnome-smproxy --sm-config-prefix /.gnome-smproxy-YKH8dN/
--sm-client-id 117f000'.
Program terminated with signal 11, Segmentation fault.
#0  0x0804a949 in ?? ()
(gdb) q

(gdb) bt
#0  0x0804a949 in ?? ()
#1  0x0804a984 in ?? ()
#2  0x0804acbf in ?? ()
#3  0x0804bd70 in ?? ()
#4  0x404ce177 in ?? ()


Comment 78 Need Real Name 2001-08-09 01:23:03 UTC
If I remember correctly, back when I was running XFree v4, and each time X
rebooted, I would see this error in messages about gnome-smproxy and I am seeing
this in my messages file, but of course X is not rebooting under XFree v3:

[root@boaz log]# grep gnome messages
Aug  5 08:22:09 boaz gnome-name-server[1125]: input condition is: 0x11, exiting
Aug  5 08:22:25 boaz gnome-name-server[13542]: starting
Aug  5 08:22:25 boaz gnome-name-server[13542]: name server starting
Aug  8 13:31:14 boaz gnome-name-server[13542]: input condition is: 0x11, exiting
Aug  8 13:31:33 boaz gnome-name-server[18531]: starting
Aug  8 13:31:33 boaz gnome-name-server[18531]: name server starting
[root@boaz log]# grep gnome messages.1
Jul 29 15:47:39 boaz gnome-name-server[1148]: input condition is: 0x11, exiting
Jul 29 15:50:32 boaz gnome-name-server[1161]: starting
Jul 29 15:50:32 boaz gnome-name-server[1161]: name server starting
Aug  3 10:20:39 boaz gnome-name-server[1223]: starting
Aug  3 10:20:39 boaz gnome-name-server[1223]: name server starting
Aug  3 10:25:09 boaz gnome-name-server[1223]: input condition is: 0x11, exiting
Aug  3 10:27:44 boaz gnome-name-server[1131]: starting
Aug  3 10:27:44 boaz gnome-name-server[1131]: name server starting
Aug  3 10:37:12 boaz gnome-name-server[1131]: input condition is: 0x11, exiting
Aug  3 10:39:45 boaz gnome-name-server[1125]: starting
Aug  3 10:39:45 boaz gnome-name-server[1125]: name server starting
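
To line these exits up against the dates of X reboots or lockups, a small tally per day can help. This is a sketch that assumes the syslog line layout shown in the grep output above; the names EXIT_LINE and exits_per_day are made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as:
# Aug  5 08:22:09 boaz gnome-name-server[1125]: input condition is: 0x11, exiting
EXIT_LINE = re.compile(
    r"^(\w{3})\s+(\d{1,2}) \d{2}:\d{2}:\d{2} \S+ "
    r"gnome-name-server\[\d+\]: input condition is: 0x11, exiting"
)


def exits_per_day(lines):
    """Count gnome-name-server exit messages per calendar day."""
    counts = Counter()
    for line in lines:
        match = EXIT_LINE.match(line)
        if match:
            counts["%s %s" % (match.group(1), match.group(2))] += 1
    return counts
```

Passing it the lines of /var/log/messages (for example via open(...).readlines()) gives a per-day count that can be compared with the times the session restarted.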


Comment 79 Need Real Name 2001-08-11 20:19:13 UTC
Mike, still no solid lockups, and no X reboots either, under RHLinux 7.1 with
k2.4.6-3 UP with XFree86 v3.x.

But I am still noticing tremendous sluggishness in the X windows repainting -
which used to only happen after coming back from a screensaver, but then was
fine all around. Now this slowness happens pretty much all the time, esp.
noticed when switching between the 4 virtual screens. What could be happening
here???

Comment 80 Tor Andre Myrvoll 2001-08-16 12:41:12 UTC
Created attachment 28096 [details]
XFree86 log-file from latest crash

Comment 81 Tor Andre Myrvoll 2001-08-16 12:43:51 UTC
Created attachment 28097 [details]
The XFree86 v4xx config file used during latest crash

Comment 82 Tor Andre Myrvoll 2001-08-16 12:44:08 UTC
I am experiencing the exact same problem (X crashing randomly) with my
roswell install, whereas my previous install (6.2) didn't have any such problems.

I've attached XFree86.0.log and XF86Config(-4)

Comment 83 Tor Andre Myrvoll 2001-08-16 12:46:04 UTC
Created attachment 28098 [details]
The config file used during the latest crash

Comment 84 Tor Andre Myrvoll 2001-08-16 13:42:16 UTC
I am experiencing the exact same problem (X crashing randomly) with my
roswell install, whereas my previous install (6.2) didn't have any such problems.

I've attached XFree86.0.log and XF86Config(-4)

Comment 85 Tor Andre Myrvoll 2001-08-16 13:45:32 UTC
Additional note:

I upgraded my kernel to 2.4.7-0.8smp (rawhide RPM) before the last crash. I have
had random crashes before this however. Also, I forgot to mention that my system
is dual-processor.

Comment 86 Need Real Name 2001-08-18 12:08:09 UTC
Thus far one solid lockup under kernel 2.4.6-3.1UP with Xfree86 v3. So from this
and past tests it appears the solid locks occur regardless of versions of
kernel, kernel mode (up/smp), and X.  I am assuming the solids are my hw?
However to-date, since using Xfree86 v3, zero X reboots.

As of 8-17-01 I've upgraded to rawhide 2.4.7-2 and testing UP and SMP with
Xfree86 v3. This kernel does appear to have corrected the X
re-painting+sluggishness problem.

One additional core dump - from Netscape - says BUS ERROR - any clues on this?
[phb@boaz phb]$ gdb --core core
GNU gdb 5.0rh-5 Red Hat Linux 7.1
Copyright 2001 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux".
Core was generated by `/usr/lib/netscape/netscape-communicator
-irix-session-management /usr/share/doc'.
Program terminated with signal 7, Bus error.
#0  0x40249801 in ?? ()
(gdb) bt
#0  0x40249801 in ?? ()
#1  0x40249648 in ?? ()
#2  0x0877f763 in ?? ()
#3  0x087c8f89 in ?? ()
#4  0x087b3b1d in ?? ()
#5  0x08934d06 in ?? ()
#6  0x087b4b4d in ?? ()
#7  0x087b4beb in ?? ()
#8  0x0893f53c in ?? ()
#9  0x0893f57b in ?? ()
#10 0x0893f5e4 in ?? ()
(gdb) 

Mike, how's your Xpert@Play98 testing going with up/smp/Xfree etc??? I am
beginning to sense that when Roswell hits the boxed set, that this bug 39233 is
going to be closed, which would be a shame since the X-rebooting issue has not
been discovered, except to say it involves only Xfree86 v4 with SMP kernels.

Comment 87 Mike A. Harris 2001-08-18 21:33:14 UTC
In general, bus errors like this indicate faulty RAM.  That isn't 100%
conclusive, but just a generalization.

This bug will not be closed prematurely if not found/fixed before
final.  It will remain open until I've done an adequate amount of
testing to reach a conclusion.  I had the xpert2000 in for 3 days,
no crashes, but had to swap it out for other work.  Will put it back
in soon and give it a real beating.

Kernel VM changes would account for better performance you're seeing,
and possibly the repainting also if it was related to performance.
2.4.7-foo is a MUCH better kernel than 2.4.6 indeed.  The 2.4 kernel
is starting to mature nicely.

Comment 88 Need Real Name 2001-08-26 14:41:38 UTC
Found this website speaking on netscape bus errors:
http://members.ping.at/theofilu/netscape.html

As many of you Linux users know, Netscape does not run well with the latest
libraries. But there is a possible work around to this problem. With the next
few lines I will
describe what to do. Look at this as a kind of mini HOWTO. The solution on the
next few lines is valid for the versions 3.x through 4.04 of Netscape Navigator
and
Netscape Communicator.

The library who makes the trouble is called libc.so.x.x.x. In the newer library
the memory management functions have changed. Now this functions check whether
the
freed memory was allocated prior or not. If not, a bus error occurs. Netscape
has three types of errors:

       It tries to free never allocated memory 
       It tries to free already freed memory again 
       Handling the pixmaps (libXpm.so) of Motif 1.2 (used by Netscape) is not
sane 

Now you ask yourself: Is Netscape usable on Linux?
Of course! Just do the following to give Netscape the modified library and let
all other programs use the normal library:

What do you think, Mike???

Comment 89 Mike A. Harris 2001-08-26 22:37:07 UTC
I think Netscape bus errors are a completely different problem having
nothing to do with this bug report.  Please file a separate report
for that if you like.  This bug report is already quite lengthy, we
shouldn't fork out from that into other bugs.

Comment 90 Need Real Name 2001-09-05 02:26:37 UTC
I have two Dell machines with integrated ATI Rage Pro video which both lock up
(no pings returned) while starting a Gnome session using RH 7.1 with all updates
applied except the new kernel.  I used a straight install (not an upgrade) of
all packages (everything).  As suggested above, I cured the problem with
"Xconfigurator --preferxf3".  The XFree86*3.3.6*rpm updates were needed.  The
kernel is stock RH 7.1, version 2.4.2-2.  So far so good. -- John,
dunlap.edu

Comment 91 redhatnet 2001-09-08 08:18:45 UTC
I have a similar problem, more reproducible: ATI Mach64 - (Graphics Xpression) 
XFree86-4.0.3-5 under RedHat 7.1 on single-processor AMD K6-III. While not SMP, 
this bug report 39233 seems closest to my symptoms:
 
X aborts with sig 11 when the mouse is used, within a minute or so, especially 
with copy/paste.  
 
Same hardware is totally stable under RedHat 6.1 XFree86 3.3.6. Have been 
planning to test 3.3.6 under RedHat 7.1, and am willing to send full details if 
this might be helpful?

Comment 92 Need Real Name 2001-09-13 23:51:58 UTC
This bug may be the same as bugs 18449 and 46911.  No solutions there, either.
We are experiencing random X-server deaths on a dozen different RedHat 7.1 
boxes, with six different hardware configurations.  The *only* similarity
between all of these boxes is that they are all SMP... different MB, different
procs, different video cards, different memory.  Every SMP box we have has
this problem.  The Rawhide kernel *seems* to make the problem less frequent,
but does not solve it.

Comment 93 Need Real Name 2001-10-09 21:48:32 UTC
For what it's worth, I have the exact same problem, with a Number 9 Revolution
IV card. Also SMP, 512mb, SCSI, up2date on all erratas, and Ximian gnome installed.

Comment 94 Need Real Name 2001-10-09 22:00:34 UTC
Installing the following rpms from the current rawhide release
*appears* to fix the problem.  No crashes after 1.5 weeks so 
far.  I assume the 2.4.9 kernel is the source of the fix.
This is based on the fact that running the linux-up kernel 
that comes with RedHat 7.1 also fixed the problem.  It was 
only the smp kernel that comes with RedHat 7.1 that caused 
problems.


bash-2.05-8.i386.rpm
e2fsprogs-1.23-3.i386.rpm
e2fsprogs-devel-1.23-3.i386.rpm
filesystem-2.1.6-2.noarch.rpm
kernel-smp-2.4.9-0.5.i686.rpm
mkinitrd-3.2.6-1.i386.rpm
setup-2.5.7-1.noarch.rpm
tux-2.1.0-2.i386.rpm


Comment 95 Need Real Name 2001-10-12 17:02:48 UTC
I installed the same packages as Greg, and I have had no problems since.

Just an additional data point...

Comment 96 Need Real Name 2001-10-23 12:32:41 UTC
I installed the latest rawhide k 2.4.9-0.18 SMP and have been up and running for
several days with no problems.  I also installed VMWare v3.0.0 Beta and Windose
Xtra-Problems and left it running overnight with no lockups or issues.  I am
still running XFree86 v3.x because my Corel apps are all up and running that way
- but there has been a marked decrease in hard-lockups with the 2.4.9 kernel. I
believe I do have some hw issues with the bp6 dual-celeron500 motherboard, but
with this newer kernel, it is so far, acting more like k 2.2.16 SMP which was
nice and stable. :-)  Thanks all.
PS: I also updated all the RPM's that Greg mentioned.  I'm pretty satisfied that
this issue has been fixed, though I have not verified the XFree86 v4.x.

Comment 97 Mike A. Harris 2001-11-01 12:00:21 UTC
You've indicated the problem is gone now so I am closing the bug.  I've
just run 4.1.0 for a few days on an SMP system with our latest kernel
erratum (2.4.9), and no problems.