Bug 39233
Description
Need Real Name
2001-05-05 21:01:30 UTC
Created attachment 17470 [details]
Xfree86.0.log
Created attachment 17471 [details]
XF86Config
I did not have this problem with X under Redhat Linux 6.2. (k2.2.16)
Created attachment 17472 [details]
messages
Created attachment 17473 [details]
cpuinfo
Created attachment 17474 [details]
devices
Created attachment 17475 [details]
meminfo
Created attachment 17476 [details]
modules
Those APIC errors in your "messages" log make me suspect buggy hardware. Are you overclocking? Also, your config file is for XFree86 3.x, not 4.x. Please attach the correct file.

No, I am not overclocking. (Like I said, under RHLinux 6.2 I did NOT have this problem, only under RHLinux 7.1, which makes me believe it is an Xwindows problem, or X with my Xpert@Play card, or something else affecting X. I am concerned about the APIC errors in the messages file. Did not see those under RH6.2.)

Please specify the location and filename for the X config file. As you can see below, the symbolic link XF86Config under the directory /usr/X11R6/lib/X11 belongs to the XFree86-4.0.3-5 package, and points back to the config file /etc/X11/XF86Config. I am in fact running only XFree86 version 4.0.3-5. See below.

(BTW, I installed a new hard drive in order to install RHLinux 7.1, I did not upgrade from 6.2. Hence I am not running XFree86 v3! The new HD is on my ATA/66 bus, which is one difference and is newly supported under 7.1; the old HD was on ATA/33. This is on a BP6 motherboard with dual Celeron500, as you know if billed for Gentus Linux, and worked perfectly for me under RH6.2 smp.)

[root@boaz X11]# pwd
/usr/X11R6/lib/X11
[root@boaz X11]# rpm -qf XF86Config
XFree86-4.0.3-5
[root@boaz X11]# ls -l XF86Config
lrwxrwxrwx 1 root root 30 Mar 28 04:20 XF86Config -> ../../../../etc/X11/XF86Config
[root@boaz X11]# rpm -q XFree86
XFree86-4.0.3-5

I am trying to help you, however to help you, you need to help me with the information I request. Without that information I cannot help you at all. I asked if you are overclocking because it is a VERY important datapoint and I have no way of knowing without asking.

As I told you once already, "XF86Config" is the config file for XFree86 3.3.6, and the file you provided is XF86Config, which is the config file for 3.3.6, which is useless to me because you are using XFree86 4, and the config file for XFree86 4.x is XF86Config-4.
If you do NOT have an XF86Config-4 file, then that is likely the problem right there, because if XFree86 4 cannot find its config file (XF86Config-4, or /etc/X11/XF86Config-4 more specifically), it _WILL_ fall back to using XF86Config, regardless of whether the file is an actual 4.0.x config file or not. If it is a 3.3.6 config file (which the one you attached *is*), then it will explode.

Some other data points: The distribution comes with *BOTH* XFree86 4.0.3 *and* 3.3.6, so that cards unsupported by 4.x that are supported by 3.x still work. As such, both versions cannot coexist with the same config file as the config file formats are different, and so 3.3.6 uses XF86Config and 4.0.3 uses XF86Config-4. This is not a Red Hatism either, it is standard stock XFree86 behavior. The symlink in /usr/... is a backward compatibility symlink only, however I'm not sure how useful it really is so I might actually just remove it in the future.

I'm betting that either you do have an /etc/X11/XF86Config-4, which I'll need a file attachment of, or if not, the solution is to try:

    Xconfigurator --preferxf4

and if the problems persist:

    Xconfigurator --preferxf4 --nodri

and if still there are problems:

    Xconfigurator --preferxf3

The latter enables usage of the 3.3.6 server, which you said worked in 6.2, so it likely works in 7.1 also if the 4.x driver does not work.

For the APIC errors, the way I understand it is that 2.2.x kernels do not detect the buggy APIC, and 2.4.x kernels do, so if the error is new to you the problem (whatever it is) is likely not new, it is just reported now.

I hope this clears up things for you, and hopefully will get you up and running ok. If not, please supply the /etc/X11/XF86Config-4 file spit out by Xconfigurator in the process above so I can see what might be causing you trouble.
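
The fallback described above can be checked by hand. The following is a minimal sketch, assuming only the two-step lookup stated in this comment (XF86Config-4 first, then XF86Config, in one directory); the real server may consult other locations as well. The function name pick_x_config and its directory argument are hypothetical, introduced here for illustration.

```shell
#!/bin/sh
# Report which config file an XFree86 4.x server would pick up,
# following only the two-step fallback described in the comment above.
# pick_x_config is a hypothetical helper, not part of XFree86 or Red Hat.
pick_x_config() {
    dir="${1:-/etc/X11}"
    if [ -f "$dir/XF86Config-4" ]; then
        # Preferred: the 4.x-format file.
        echo "$dir/XF86Config-4"
    elif [ -f "$dir/XF86Config" ]; then
        # Fallback: this file may be in the 3.3.6 format.
        echo "$dir/XF86Config (fallback; may be a 3.3.6-format file)"
    else
        echo "no config file found in $dir"
        return 1
    fi
}

pick_x_config /etc/X11 || true
```

If the second message appears on a machine running the 4.x server, the reconfiguration commands listed above would be the next thing to try.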
Also, please attach a new X log file from /var/log/XFree86.0.log to match the new config file, as it contains very important info as well, which likely will be different from your initial log. Thanks.

Mike, I know you are helping me, and I appreciate it. I answered your overclocking question directly: "not overclocking". I did not know what config file the new XFree86 v4 used, so I previously supplied the wrong one, but now that you have told me XF86Config-4, I will upload it for you as I saw it is there, along with the latest XFree86.0.log, later when I get to my home office.

Sounds reasonable, what you said about APIC, and the 2.2.x vs. 2.4.x kernels and detecting/logging, but 2.2.x did not show signs of X rebooting. BTW I remember reading something on being able to disable APIC so as not to incur APIC errors, see websites below. What do you think about disabling APIC? (Think this started with 2.3.99 kernels.) If you feel at this point it could be an APIC issue, should I open an APIC bugzilla?

http://www.telematik.informatik.uni-karlsruhe.de/forschung/apic/
    A big release with many updates: Added the APIC disabling code.

http://nlug.org/smp/
    7) Had to add append="noapic" to my lilo configuration for this system to boot without a kernel panic.

http://www.uwsg.indiana.edu/hypermail/linux/kernel/0101.3/1176.html
    After an extensive testing I concluded the infamous APIC lock-up happens when a level-triggered interrupt gets masked in an I/O APIC when it's in the send pending state (bit 12 of the respective interrupt redirection entry is set).

Created attachment 17607 [details]
XF86Config-4
Created attachment 17608 [details]
XFree86.0.log (latest)
Created attachment 17609 [details]
messages (latest)
(I'm NOT being impatient, just giving some feedback I found.) I received some word in the Linux community that "XFree86 itself appears to be unstable on 2.4/SMP. APIC errors don't make things better, obviously." Do you know of other reports discussing issues with XFree86 on k2.4/SMP? I use an SMP motherboard, so perhaps this also has some bearing? (Also, do you want me to shut off APIC? append="noapic" to lilo.conf)

Another Linux Community report on the XFree86 with SMP on k2.4 issue:

http://www.uwsg.indiana.edu/hypermail/linux/kernel/0102.1/0940.html

> This is a long-standing problem with 2.3 and 2.4 SMP kernels. I
> believe it is a kernel bug and isn't the XFree86 project's problem.
> The problem does not exist on 2.2 SMP kernels nor on 2.3/4 UP kernels.
> The symptoms are random segfaults in perfectly fine XFree86 code.

I had an XFree86 setup which was perfectly stable in RH6.2, and had been for some months. Upon upgrading to RH7 - with glibc-2.2 and new screensavers, it started falling over almost every night.

So is it really my BP6 hardware, or is it a problem with XFree86 on an SMP system under k2.4??? I am seeing the latter as being a likely cause. Again, this is just feedback which seems very relevant.

Mike, yet another Linux community person told me this: "Kernel 2.4.3 solved my X 4.0 crashing problems." Does Redhat have k2.4.3 in rpm yet? ACTUALLY do you have k2.4.4 in RPM, because I need to install VMWare 4.0.4-1118, and VMWare says it does not support k2.4.3 due to a bug in that kernel, so I will need k2.4.4. Please advise. (I know this seems like straying a bit from bug 39233, but if k2.4.3 and 2.4.4 fix the X crashing under SMP, then that should be a fix, yes?)

Mike, I've uploaded my XF86Config-4, the latest XFree86.0.log, and even the newest messages file. Please advise what you find with regard to these.
I also added commentary on Linux community people using RHL7.1 w/SMP and having X issues, but one person said kernel 2.4.3, and perhaps 2.4.4, will fix the X issues under 7.1 with SMP. I would prefer to use k2.4.4 and XFree86 v4. Please let me know if 2.4.4 is available from Redhat, or when it might be?

Before anything, comment out the "Option dri" line to disable DRI. DRI is not supported on your video card, and could cause problems if left enabled. Does this fix the problem?

We have a 2.4.3 based kernel being tested right now. I do not know if it will solve your problems or not. Our kernel includes many bugfixes over and above the stock Linus kernel tarballs, and so fixes that are in 2.4.4 that are critical have likely been backported to our 2.4.3 kernel. I've CC'd Alan.

Alan, have any SMP related changes been incorporated into our kernel that affect XFree86 stability?

Also, try reconfiguring with:

    Xconfigurator --preferxf4 --nodri

'X reboots' I assume means the Xserver crashed, not the machine. To answer the bits I can:

- No, we have not added any SMP fixes for the DRI code. I'm not aware of any bugs there.
- I have seen the occasional XFree86 4 report and crash but I have no reason to believe the kernel is involved.
- The BP6 APIC stuff is a mess. 2.2 merely doesn't log the errors except as a count in /proc. They can cause coherency problems but I really don't think they are involved here. It's possible but it doesn't feel right.

Could this be an out of memory case tripped by something?

Ok, Mike/Alan. I remarked out "#" the Load "dri" in XF86Config-4. I've also added append="noapic" to my lilo.conf, and ran lilo. Finally I rebooted. I'm attaching my new XF86Config-4, XFree86.0.log, and messages file for any further review. Please tell me if the noapic statement will harm me or do any good, or if I lose functionality?

Created attachment 18206 [details]
XF86Config-4 (after rebooting having removed dri )
Created attachment 18207 [details]
XFree86.0.log (after rebooting the os, having removed dri)
Created attachment 18208 [details]
messages (after rebooting, having removed dri, and added append="noapic" to lilo.conf. Please check for any issues here.)
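
For readers repeating the manual edit described above, here is a minimal sketch of commenting out the Load "dri" line with sed. It assumes the line is spelled exactly Load "dri" (the config keywords may be written in other letter cases, which this pattern does not handle), and the helper name disable_dri is hypothetical. It prints the result to stdout and does not modify the file. Running Xconfigurator --preferxf4 --nodri, as suggested earlier in this report, is the route the maintainer recommended.

```shell
#!/bin/sh
# Print an XF86Config-4 with any active Load "dri" line commented out.
# disable_dri is a hypothetical helper; the input file is left untouched.
disable_dri() {
    # Prefix a matching line with '#', keeping its original indentation.
    sed 's/^\([[:space:]]*Load[[:space:]][[:space:]]*"dri"\)/#\1/' "$1"
}
```

A cautious way to use it is to write the output to a new file, compare it with the original using diff, and only then copy it into place.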
Alan/Mike/REDHAT,

Came back to my home office this AM, and found my RHLinux7.1 monitor BLACKED OUT. Could not even ping the box from another computer. Rebooted, ran fsck, etc. Back up. Started reading mail in Messenger, and Xwindows rebooted after about 10 minutes. I'll try to provide the usual logs.

I REALLY hope you guys consider what others have said, that the newer 2.4.3+ kernel FIXES SMP/Xwindows issues! I cannot stress enough that k2.2.16smp + Xwindows was FAR MORE STABLE!!! There has got to be something amiss with k2.4.2-2 that Redhat did not incorporate from the newer kernels. Please review this again, and keep in mind what I was told by another Linux user: "Kernel 2.4.3 solved my X 4.0 crashing problems."

Created attachment 18370 [details]
messages
Created attachment 18371 [details]
XFree86.0.log (following X rebooted itself, to go with the messages)
Created attachment 18372 [details]
ps.txt (from ps -aux output to text file)
We will be releasing a new kernel errata in the future. We do not rush out kernel errata (or any other errata) solely to fix one bug like this. Our kernel errata needs to be well tested by internal quality assurance procedures, and beta tested. We are aware of the kernel issue, and our kernel when released should contain the fix to the problem. There is really nothing more to do other than wait until our official kernel is released, or try to build your own kernel. We do not support homemade kernels however, so your best bet is likely to wait until our official kernel update is ready. Feel free to try the rawhide test kernel and see if it solves your problem. The feedback you give could help speed up the release of the kernel.

Reassigning to the kernel component because it is a kernel issue.

How is this a kernel bug?

I don't think it is a kernel bug, but I'm prepared to keep an open mind. Let's see if the kernel errata fixes it when it's done and if not continue to pursue it as an X bug. Right now we have too many unknowns and too little information in the bug report that actually gives concrete data we can work on.

Created attachment 18390 [details]
core (found this core file in my home directory from the 12th - may help)
Arjan, from the data given, it seemed to me to point to the kernel, however I could be wrong. I've received email of similar problems, and seen postings on XFree86 mailing lists that 2.4.4 fixes the problem. I do not know conclusively where the problem is however, or even exactly what the problem is. Alan says it best I think, that we should just wait for the errata kernel to come out, and if the problem goes away, we can consider it solved. If not, we can then try to dig deeper. I should have instead said "I think it _might_ be a kernel issue", bad wording on my part indeed.

Hi Mike and all,

Any news on how close Redhat is to putting out a kernel update? (We suspect the current v2.4.2-2 with the current XFree86 4.0.3-5 has X-reboot issues.) Yes, my X rebooted again. Also, I noted something interesting in messages which seems to be related to the X reboot. Message: "gnome-name-server[9661]: input condition is: 0x11, exiting". Is the gnome-name-server involved here somehow?

--snip from messages--
May 26 19:53:01 boaz kernel: APIC error on CPU0: 04(02)
May 26 20:13:05 boaz kernel: APIC error on CPU0: 02(02)
May 26 20:35:22 boaz gnome-name-server[9661]: input condition is: 0x11, exiting
May 26 20:35:25 boaz gdm(pam_unix)[9558]: session closed for user phb
May 26 20:35:40 boaz gdm(pam_unix)[22699]: session opened for user phb by (uid=0)
May 26 20:35:40 boaz gdm[22699]: gdm_slave_session_start: phb on :0
May 26 20:35:42 boaz gnome-name-server[22798]: starting
May 26 20:35:42 boaz gnome-name-server[22798]: name server starting
May 26 20:36:15 boaz su(pam_unix)[22857]: session opened for user root by phb(uid=500)
----

peter: does the 2.4.3-5 kernel fix this? If not, it is NOT a kernel issue.

OK. I upgraded to the Redhat Rawhide kernel 2.4.3-7smp. We shall see how things go between X and the k.
/proc/version
Linux version 2.4.3-7smp (root.redhat.com) (gcc version 2.96 20000731 (Red Hat Linux 7.1 2.96-85)) #1 SMP Mon May 21 16:57:54 EDT 2001

top
CPU0 states: 0.2% user, 0.0% system, 0.0% nice, 99.3% idle
CPU1 states: 0.2% user, 0.5% system, 0.0% nice, 98.3% idle

Stardate 010603.0112 Supplemental: No XFree86 reboots thus far. Been using this RHLinux 7.1 k2.4.3-7SMP all weekend, and several times throughout the week. (As expected, I still see APIC errors, perhaps not as many? Any way to force the kernel to stop logging APIC? Anyway, no X reboots yet in 6 days. If this continues another 2 weeks, I'd say it's fixed.)

PS: Back on 5-28-2001, I did the following updates to my stock RHLinux 7.1. I upgraded the following kernel and related packages (per kernel upgrade instructions):

initscripts-5.86-1.i386.rpm
kernel-doc-2.4.3-7.i386.rpm
kernel-smp-2.4.3-7.i686.rpm
kernel-2.4.3-7.i686.rpm
kernel-headers-2.4.3-7.i386.rpm
kernel-source-2.4.3-7.i386.rpm

As well as the following errata updates via the Redhat Network:

arts-2.1.2-1.i386.rpm
netscape-communicator-4.77-1.i386.rpm
gftp-2.0.8-1.i386.rpm
netscape-navigator-4.77-1.i386.rpm
kdelibs-2.1.2-1.i386.rpm
rhn_register-1.3.2-1.noarch.rpm
kdelibs-devel-2.1.2-1.i386.rpm
rhn_register-gnome-1.3.2-1.noarch.rpm
kdelibs-sound-2.1.2-1.i386.rpm
samba-2.0.8-1.7.1.i386.rpm
kdelibs-sound-devel-2.1.2-1.i386.rpm
samba-client-2.0.8-1.7.1.i386.rpm
losetup-2.11b-3.i386.rpm
samba-common-2.0.8-1.7.1.i386.rpm
mgetty-sendfax-1.1.25-5.i386.rpm
samba-swat-2.0.8-1.7.1.i386.rpm
minicom-1.83.1-8.i386.rpm
up2date-2.5.4-1.i386.rpm
mount-2.11b-3.i386.rpm
up2date-gnome-2.5.4-1.i386.rpm
mouseconfig-4.22-1.i386.rpm
Xconfigurator-4.9.29-1.i386.rpm
netscape-common-4.77-1.i386.rpm

(I did not, however, re-configure X with the new Xconfigurator; did not need to thus far. However, the newer samba fixed another bug I posted, 35915 - stability doing samba networking with Windows Me and 95.)

Supplemental: I spoke too soon?
XFree86 4.0.3-5 just rebooted itself again!!!

PLEASE TAKE NOTE: This cannot be coincidental, 3rd/4th time I noticed this -> I WAS SCROLLING ON THE NETSCAPE UP/DOWN BAR when X REBOOTED ITSELF!!! No kidding.

Also please see my messages file: GNOME-NAME-SERVER input condition... Exiting... Kernel HW bug??? Restoring Chip condition??? (HOW COME REDHAT 6.2 DID NOT BOMB ON ME LIKE THIS??? SAME HARDWARE.) Is the BP6 a VIA686a motherboard??? Did not see this HW BUG message with k2.4.2-2... did you guys add that to k2.4.3-7???

Jun 3 14:14:36 boaz kernel: APIC error on CPU0: 02(02)
Jun 3 14:16:04 boaz kernel: APIC error on CPU0: 02(02)
Jun 3 14:19:42 boaz gnome-name-server[1133]: input condition is: 0x11, exiting
Jun 3 14:19:42 boaz su(pam_unix)[23496]: session closed for user root
Jun 3 14:19:43 boaz su(pam_unix)[24598]: session closed for user root
Jun 3 14:19:45 boaz gdm(pam_unix)[1022]: session closed for user phb
Jun 3 14:19:47 boaz kernel: probable hardware bug: clock timer configuration lost - probably a VIA686a motherboard.
Jun 3 14:19:47 boaz kernel: probable hardware bug: restoring chip configuration.

WHAT NEXT, INSTALL THE NEWER RAWHIDE XFree86??? OR GO BACK TO OLDER XFree86??? Since RH6.2 was stable on my same hw, there has got to be something that can bring stability back to RH7.1 on my same hw! Please advise.

Created attachment 20192 [details]
messages (2.4.3-7smp X rebooted)
Created attachment 20193 [details]
XFree86.0.log (k2.4.3-7smp X rebooted)
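
The log lines quoted in this report can be tallied rather than read by eye. The following is a minimal sketch, assuming the message texts stay exactly as quoted in the excerpts above; the helper name summarize_x_restarts and its default path are assumptions for illustration. It only counts matching lines and makes no claim about which of them, if any, causes the X restarts.

```shell
#!/bin/sh
# Count the three kinds of messages quoted in this report.
# summarize_x_restarts is a hypothetical helper; the patterns are copied
# from the log excerpts above and may not match other kernel versions.
summarize_x_restarts() {
    log="${1:-/var/log/messages}"
    echo "APIC errors: $(grep -c 'kernel: APIC error on CPU' "$log")"
    echo "gnome-name-server exits: $(grep -c 'gnome-name-server\[[0-9]*\]: input condition is: 0x11, exiting' "$log")"
    echo "VIA timer warnings: $(grep -c 'clock timer configuration lost' "$log")"
}
```

Comparing the timestamps of the gnome-name-server exit lines against the APIC and timer lines, as the reporter did by hand, would be the natural next step.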
Question: I just noticed that the following packages are installed (note versions):

XFree86-FBDev-3.3.6-35
XFree86-Mach64-3.3.6-35

Even though my XFree86-4.0.3-5 is installed and in use. There does not seem to be a v4 for Mach64 installed. Any advice on this? Is this right?

Hi all. I noted you have not answered my latest two postings with the additional files posted. Giving up on me? Anyway, just as well, my 60GB M_x__r hard drive gave up the ghost after only two months or so. Getting it RMA'd - will be several days. Then I will try a completely new Redhat 7.1 scratch install. I hope you will still answer my questions in time for my re-install. Specifically: what to do next following X rebooting again with k2.4.3-7smp? Since RH6.2 was stable on my motherboard, there's got to be something to stabilize X v4 and k2.4.x. Also, what's up with the Mach64 version (3.x.x) not matching the XFree86 version (4.x.x) - is that right?

Note: With k2.4.3-7smp this is what I see in messages when X reboots (full file is already posted). Is a BP6 motherboard a VIA686a???

~~~~~~~~~~~~~~~~~~~~~
Jun 3 14:19:42 boaz gnome-name-server[1133]: input condition is: 0x11, exiting
Jun 3 14:19:42 boaz su(pam_unix)[23496]: session closed for user root
Jun 3 14:19:47 boaz kernel: probable hardware bug: clock timer configuration lost - probably a VIA686a motherboard.
~~~~~~~~~~~~~~~~~~~~~

Unfortunately the data provided so far doesn't really help identify the problem. It's more a case of "suspect hardware but no proof either way and no idea what is up".

Peter, you have provided both the information requested more or less, as well as ample other information. I thank you for that, however a lot of the information - as Alan has pointed out - is not useful. The info you have provided definitely indicates that your hardware is broken in some respects (the APIC errors, and VIA kernel messages).
These messages from the kernel are not kernel bugs to be upset about, they are *hardware* flaws to get upset with your motherboard manufacturer and VIA about. The kernel is just informing you the hardware is broken. Our kernel will be done when it is done. If you want to know when it is released, the best thing you can do, is download and try all rawhide kernels, and subscribe to the redhat-watch-list, redhat-announce-list, and possibly even sign up for Red Hat Network. Asking in here when it will be released just wastes both of our time - if we knew when it would be released, then it would be released already. We NEVER preannounce release dates of ANYTHING to anyone *ever*. Mostly because we do not SET release dates - things occur when they are ready, and quality testing is complete. Quite often during that quality testing, some bug is found and the process starts over until it is complete. Personally, I believe the problem is entirely related to your hardware, as I have seen nothing yet to believe it is an XFree86 bug, and none of the data points to a kernel bug either as Alan has said. Updated software is not going to fix hardware bugs - if that is the case of course. You've been able to run it for days on end, and then all of a sudden boom. That *strongly* smells of hardware problems, possibly bad RAM, possibly an overheating CPU, possibly bad power or low power. So, since we really do not have any idea whatsoever what the problem you are experiencing is, there is absolutely no way whatsoever for me/Alan or anyone to do absolutely anything about it for you. I would love to fix the problem if I could, but I can't draw water from a rock... ;o( If Red Hat Linux 6.2 works for you, I recommend going back to it as a workaround. Another possibility that may help you is by talking directly with the XFree86 people about the problem. The xpert mailing list is the best way to access their expertise. 
Other than what I've said already, all I can suggest is trying to rule out other possibilities. I recommend downloading and running memtest86 to test your memory. Also, if your board has lmsensors support, please use it to detect any possible overheating problems, or such. I would enable voltage checks on the power supply also. You should also check your video RAM out and/or try a different video card. Try swapping hardware with other hardware one at a time, and see if you can narrow it down. Other than these possibilities I am completely out of suggestions as to what the problem might be. If I had the machine sitting in front of me, I might be able to determine more, but of course I do not. SMP works for me very well, both in Linux in general, and in X, and with multiple ATI and Matrox video hardware. One thing that *could* help me, is if you can get an *exact* reproduceable test case, and document it step by step, ie: 1) do this, 2) do this also, 3) do that, 4) boom, the lockup occurs. ie: non-intermittent. If you can do this, it may point to other problems, but right now I consider this a hardware bug. I will leave the bug report open for you for a while so I can monitor any updates you can provide. If you provide data and do not hear back, it is more than likely because the data did not provide any new useful information to try and track anything down. Again, xpert will be much more of a helpful realtime forum for tracking this down for you. Sorry we can't be of much more help than this for now. Mike, your 6-9-2001 response appears to address old topics, and completely ignores my newest postings. Please only see my postings 6-3-2001 and newer - which by the way - are answers to some things you guys asked me to do - I did them - and posted follow-ups for you - like the 2.4.3-7smp rawhide upgrade. But you appear to treat all my postings like just so much jibberish..... You also did not address some specific questions I posted. 
Like Mach64 v3 with XFree86 v4 is that right? Responses to your 6-9-2001 post. Again, I am in the 6-3-2001 and newer mindset here. (P = paragraph). P1: Thank you, but we've known about the APIC issue since day 1. I asked is my BP6 a VIA686a motherbrd? Because if BP6 is NOT a VIA, then the kernel message is bogus! But I see no reply on this question. P2: I have in fact done all you said, not "more or less", rather all. If you read my postings you will see that! P2: I have already upgraded to Redhat's RAWHIDE k2.4.3-7 and said so. I thereafter posting that X still reboots. P2: I have already signed up with the Redhat Network, and upgraded. See my postings! P2: I have not asked again about any kernel upgrades, so stop saying I did :) Great googely moogely! P5&6: I've noted the new suggestions: memtest86, lmsensors, xpert site., (also trying video cards.) Only thing I can say is if it were bad RAM, hot CPU issue, power, etc. why is it only X that reboots while Linux kernel keeps on running just fine and dany? HW issues usually lockup the entire system, not just selectively reboot only X. And since RH6.2 worked fine, once again, HW issues would not be the logical culprit - don't ignore these obvious clues! There was a messages entry on "clock timer" loss, and you did not speak on that what-so-ever, it sounds important. P9: It is illogical for you to conclude that only non-intermittant issues can be kernel or X bugs! Just because I cannot reproduce a step-method for forcing this X-reboot, does not make it NOT a bug. Others in the Linux world have complained about their X rebooting on SMP systems. I can't take your word for it, and ignore all the others. You yourself even said there were others! See YOUR OWN POSTING: "------- Additional comments from mharris 2001-05-15 12:38:54 ------- Arjan, from the data given, it seemed to me to point to the kernel, however I could be wrong. 
I've received email of similar problems, and seen postings on XFree86 mailing lists that 2.4.4 fixes the problem. "

Maybe I need kernel 2.4.4 to fix this problem. (I only have 2.4.3-7 right now.) I think this still should not be ruled out. I read that some major changes were done between 2.4.3 and 2.4.4, or 2.4.4 and 2.4.5, due to some issues. SEE:

http://www.kernel.org/pub/linux/kernel/v2.4/ChangeLog-2.4.3
http://www.kernel.org/pub/linux/kernel/v2.4/ChangeLog-2.4.4
http://www.kernel.org/pub/linux/kernel/v2.4/ChangeLog-2.4.5
http://www.kernel.org/pub/linux/kernel/testing/patch-2.4.6.log

Should I run kernel 2.4.3-7 and install XFree86 v3 instead of v4?

This is not a kernel issue. It's an XFree bug, and some people suggest that it goes away when using glibc 2.1.

Let's try and answer the bits you said I missed:

1. A 3.x server with 4.x libraries is fine. It's also necessary for some cards that do not yet have a 4.x driver.
2. The VIA warning is indicative of possible bios/hw problems but not proof, and the fixup it does is safe anyway.
3. X and the display action - especially scrolling - have the most impact on the PCI bus load and possibly on power, although I can't see it being power related.
4. If you want to throw a standard 2.4.5 kernel on your box and test it then go ahead. If it works that's great, if it doesn't work, well, it's info.

Arjan - would it be worth trying the .i386 not .i686 glibc in case the X server code is corrupting segment registers?

Note: I have my new RMA'd hard drive installed, and RHLinux 7.1 fully installed from scratch and configured again. Also upgraded to rawhide kernel 2.4.5-0.2.9smp, and updated errata rpm's too. I'll run for a while and see how stable this is with X. In under 2 days since fully installed, none to report.

More news - with k2.4.5-0.2.9smp - I've experienced a couple of Xwindows reboots and something I did not get before, a couple of solid lockup incidents (system comes to a sudden crawl then locks), both involving use of Netscape.
I note that there is a rawhide XFree86 4.1.0-x ... would it help for me to upgrade to it? IF SO WHICH RPM's do I need to upgrade? Also, I've now rebooted to the UP kernel 2.4.5-0.2.9 to see if the single processor kernel makes any stability difference, and I am researching upgrading the BP6 BIOS. Would it help/benefit any to recompile the kernel on my hardware - if so is there a website outlining recompiling? (Please respond to this and to my previous posted questions from June 23.)

Also, I FLASHED my BP6 motherboard with a newer BIOS, bp6ru128.bin, dated 1-4-2001. After working the remainder of the day in the SMP kernel, X rebooted a few minutes ago.

*** Bug 36422 has been marked as a duplicate of this bug. ***

Note to all - running in UP uniprocessor (kernel 2.4.5-0.2.9) seems to be stabilizing the system - no X reboots in several days, and I've been using it extensively this week. No APIC errors in messages. PS: Are ANY of you going to respond to my last two postings from 6-23-2001??? :) Please?

Note - I just realized since I did the full re-install, I forgot to go back and remark out the dri load statement in /etc/X11/XF86Config-4, so I am doing so now. From the previous suggestion from mharris 2001-05-07 09:37:59: "and if the problems persist: Xconfigurator --preferxf4 --nodri". Would you please tell me if you think upgrading to the newest XFree86 v4.1.x would help? Reply to my 6-23-2001 postings ok?

XFree86 4.1.0 will cause you more problems than it solves right now, and recompiling your kernel is not recommended. Absolutely nobody can do anything about this problem because nobody has the foggiest clue what the problem is. It could be kernel related, X related, SMP related, hardware related, or some combination thereof. It is not easily reproducible, and without any debugging info (backtrace/ltrace/strace) it is impossible to do anything about. I have a new SMP box here that I just put a Mach64 card in and will use this card for a few weeks.
It is a Tyan HEsl motherboard though, so it might not show up due to different hardware. Hard to say.

Recommendations:

1) Try your hardest to see if you can find something that increases the frequency of this problem. If you can build up a list of items that tend to cause this problem to happen earlier rather than later, it will help narrow it down.
2) Build your own static XFree86 server, with the video driver built in, and debug it with gdb/strace/ltrace et al. logging to a file or whatnot, hoping to capture the crash. Due to the randomness of the problem, you'd have to do this a couple of times to see if it hangs in the same spot. I'm guessing it won't.

If this is an XFree86 bug - I *NEED* a backtrace - preferably from more than one core dump. Without that I can do nothing.

Thank you Mike. I may be able to help with these latest requests, but I am not necessarily that technical (hope Redhat is). But here are some additional thoughts:

(1) This bug is happening on more than one type of hardware and video card (see also bug 36422 and other internet-Linux-community complaints). So it is probably not hardware related.
(2) Per my experience, this bug appears to go away when running the UP kernel. So, it is not XFree86 4.x by itself.
(3) Per my experience, the version of the 2.4.x SMP kernel did not fix it.
(4) Bug 36422 says that using XFree86 3.x.x stabilized use with the SMP kernel.
(Conclusion) This leaves XFree86 v4.x interactions with the SMP kernel, or vice versa, as the culprit.

What about Alan's comments from alan 2001-06-10 10:48:08 about GLIBC? "in case the X server code is corrupting segment registers ?" If GLIBC is the glue between X and the SMP kernel, then it is worth trying. Is the XFree86 Project interested in helping on this one? Mike, please review this and advise me on this?

I just tried to run "Xconfigurator --preferxf3" and the first screen gives me these (autodetected?)
results: PCI Entry: ATI | 3D Rage Pro 215GP Xserver: XF86_Mach64 XFree4 driver: (default) [OK] (Bombs out of Xconfigurator at this point with this message:) Server doesn't exist, can't continue tried to use ../../usr/X11R6/bin/XF86_Mach64 Now technically, I have a ATI Xpert@Play98 3D card, which could have the same chipset I suppose as the ATI | 3D Rage Pro 215GP. Also, I could have swarn that upon initial install of RHLinux 7.1 it detected the Mach64, and I saw this as one of the XFree86 RPM's in the Custom Install RPM List!!! But in all I see in /usr/X11R6/bin is XF86_FBDev (see list of XFree RPM 's installed on my system below). Also rpm -q XFree86-Mach64 shows "package XFree86-Mach64 is not installed". What does this mean (1) far as using --preferx3 <do I need to force install Mach64???> and (2) SMP/Xfree reboot bug with FBDev, not Mach64 ???? ~~~~~~~~~~~~~~~~~~LIST OF MY X RPM's~~~~~~~~~~~~~~~~~~~~~~~~~ gdm-2.0beta2 The GNOME Display Manager. xtt-fonts-0.19990222 Free Japanese TrueType fonts (mincho & gothic) XFree86-ISO8859-9-2.1.2 Turkish language fonts and modmaps for X. XFree86-4.0.3 The basic fonts, programs and docs for an X workstation. XFree86-tools-4.0.3 Various tools for XFree86 vnc-server-3.3.3r2 A VNC server. XFree86-ISO8859-7-100dpi-fonts-1.0 ISO 8859-7 fonts in 100 dpi resolution for the X Window System. gqview-0.8.1 An image viewer. XFree86-twm-4.0.3 A simple window manager glms-1.03 A GNOME hardware monitoring applet. ttfonts-1.0 Some TrueType fonts XFree86-ISO8859-2-100dpi-fonts-4.0.3 A set of 100 dpi Central European language fonts for X. XFree86-ISO8859-7-75dpi-fonts-1.0 ISO 8859-7 fonts in 75 dpi resolution for the X Window System. XFree86-KOI8-R-100dpi-fonts-1.0 KOI8-R fonts in 100 dpi resolution for the X Window System. XFree86-100dpi-fonts-4.0.3 X Window System 100dpi fonts. xinitrc-3.6 The default startup script for the X Window System. XFree86-ISO8859-2-Type1-fonts-4.0.3 A set of Type1 Central European language fonts for X. 
XFree86-ISO8859-7-Type1-fonts-1.0 Type 1 scalable Greek (ISO 8859-7) fonts
urw-fonts-2.0 Free versions of the 35 standard PostScript fonts.
XFree86-75dpi-fonts-4.0.3 A set of 75 dpi resolution fonts for the X Window System.
XFree86-ISO8859-7-1.0 Greek language fonts for the X Window System.
XFree86-xf86cfg-4.0.3 XFree86 configurator
rxvt-2.7.5 A color VT102 terminal emulator for the X Window System.
XFree86-xdm-4.0.3 X Display Manager
gnome-kerberos-0.2.2 Kerberos 5 tools for GNOME.
XFree86-ISO8859-9-100dpi-fonts-2.1.2 100 dpi Turkish (ISO8859-9) fonts for X.
*** X Hardware Support ***
Xconfigurator-4.9.29 The Red Hat Linux configuration tool for the X Window System.
XFree86-Xnest-4.0.3 A nested XFree86 server.
XFree86-Xvfb-4.0.3 A virtual framebuffer X Windows System server for XFree86.
XFree86-V4L-4.0.3 Video for Linux (V4L) support for XFree86
XFree86-FBDev-3.3.6 The X server for the generic frame buffer device on some machines.

Rage cards are MACH64 and a bit (well, MACH64 and a lot, actually), so it is trying to use the right server. If you install the Mach64 X server RPM then Xconfigurator should do the desired job.

Yes, as Alan says, install the Mach64 package. I think I might put all of the XFree86 3.3.6 servers into one package in the future to prevent this sort of problem, and also to prevent the recent upgrade problem where every server package duplicates 1Mb+ of pex, xie...

Hello Mike. I updated all the errata packages available for 7.1, then upgraded to kernel 2.4.6-2smp. I see you have the official k2.4.3 package in the errata - is this highly recommended over 2.4.6-2, i.e. does it contain anything relating to this bug? Also, my Linux does not seem to reboot X anymore, but I have had 2-3 incidents of total hard lockups, with some CORE files appearing under /home/myid and /home/myid/.gnome-desktop. Would you be interested in analyzing them? Or is that not relevant???
I leave it to you, please do let me know: (1) should I use 2.4.3 rather than 2.4.6-2 and (2) do you want the cores - I can upload them to this bugzilla. They're 25-35mb each. Maybe that will help to close this bug?

The errata kernel is the latest officially supported kernel, so that one should be used. I have no idea whether there is anything in it or in any of our rawhide kernels that is related to this bug because, as we've said before, we do not know what the bug is: it could be X, could be the kernel, could be glibc related, or it could be bad memory or something else. If you feel it is worth it to try a 2.4.6 kernel, go ahead and try it. I don't know if it will fix the problem because I don't know what the problem is. So in answer to your questions:
1) You can use whatever kernel you like; the errata kernel is a stable supported kernel, rawhide is unstable and might or might not work. I am not a kernel guy, I have no idea.
2) No, I do not want 35Mb core files attached to bugzilla. ;o) You can do a backtrace on each of them, however, by doing:
gdb --core core
Then doing "bt", and then cutting and pasting all output from gdb. If it is more than 20 lines or so of output from gdb, then attach it as a file instead.

Note that when you run gdb on the core file as above, it will tell you which application generated the core, i.e.:
[mharris@asdf mharris]$ gdb --core core
GNU gdb 5.0rh-5 Red Hat Linux 7.1
Copyright 2001 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux".
Core was generated by `kdeinit: kdesktop'.
^^^^^^^^^^^^^^^^^
Program terminated with signal 11, Segmentation fault.
#0 0x4053499b in ?? ()
(gdb) bt
#0 0x4053499b in ?? ()
#1 0x404cfe05 in ?? ()
#2 0x40563f99 in ?? ()
#3 0x410cfe62 in ?? ()
#4 0x0804a49b in ?? ()
#5 0x0804add1 in ?? ()
#6 0x0804b2a6 in ?? ()
#7 0x0804bff1 in ?? ()
#8 0x40c640be in ?? ()
(gdb)

Also, by the way, the previous core file that you attached:
[mharris@asdf mharris]$ gdb --core core
GNU gdb 5.0rh-5 Red Hat Linux 7.1
Copyright 2001 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux".
"/home/mharris/core" is not a core dump: File format not recognized
^^^^^^^^^^^^^^^^^^^

Also, to give you more perspective on this problem: even if I had your machine on my desk in front of me, it would be extremely difficult to track this problem down using debugging tools, etc., since you cannot reproduce it at will. Random spurious lockups are very difficult to chase down, and a large percentage of them end up being hardware flaws.

Some more suggestions: do you have, or can you get access to, some other video hardware? If you can borrow some other non-ATI hardware, that could help narrow things down a slight bit.

Thank you Mike... and here are a few bits of info for clarification and settling some things:
BUG POSTS RELATED:
(1) As of a week or so prior to my last post of 7-28-2001 I reconfigured with Xconfigurator --preferxf3 with the Mach64 installed, and that appears to have stabilized my SMP+X. No more X reboots since. (I realize this is a work-around, not a fix.)
(2) Prior to the above, running XFree86 v4 with a UP kernel also stabilized X so it did not reboot.
(3) I am still having some "out of the blue" lockups, which happen with no one using the computer and nothing running other than GNOME and Netscape sitting there - I just sit down to use the computer after a few days and it's totally locked up - so I also suspect hardware related to SMP usage. Also, my screensaver is a simple black screen, so no fancy graphical cpu needs there.
UNRELATED:
(4) One of my cores - which now seems unrelated to the lockups - was from Corel Photopaint, and probably occurred upon exiting the app.
(5) Two core.xxxx files (x being some numbers) appeared in my /home/mydir directory, but gdb tells me they are not recognizable, so I can only assume certain cores get generated incorrectly:
"This GDB was configured as "i386-redhat-linux"..."/core": not in executable format: File format not recognized"
...
If I continue having these system lockups (BTW I have to UNPLUG the POWER CORD when this happens, since the power button is also useless then) I may install the supported 2.4.3 kernel for a trial.

Ok, if #1 and #2 continue to work, please keep using them as much as you can. If it never locks, then we may be able to prove that it is a software problem, which is a big step IMHO. Try using X3 for a week or two if that is possible. How long did you run #2?
#3 I assume you mean X4+SMP kernel, right?
#4 Sounds like multithreaded cores. The X server isn't threaded so they can't be X cores. An X core should indeed show up correctly in gdb. Enable Option NoTrapSignals in XF86Config-4 (as per the manpage) if you want to nail some X cores. If we can get to that point, I have some new ideas.
Also, I am going to get ATI to send me an Xpert98 of the same model if they've got one.

I shall continue running #1 (X v3) forever if you like - it works. All my Corel sw works too.
I ran #2 for about a month. Stable.
#3 was with SMP k2.4.x, X3 or 4. But it's a total lockup, not X rebooting. Seems a different problem. Maybe hw. Happens even when no one is using the computer.
#4 I'll check out the manpage.
Thought - could this be a problem with the "xfs" font server intermittently failing to respond, and X blowing up? I say this because when I later added Fontastic and accidentally killed it, it of course stopped responding on port 7102 and X rebooted. So if the normal xfs occasionally fails (with X4), that could do it too? Just a strange idea.

I've updated the summary to more closely reflect the problem. Also, I have an ATI Xpert@play98 PCI on its way to me from ATI to look deeper into this problem.

Ok, for your last comments: for the #3 response, just to 100% clarify, you are saying that using an SMP kernel with 3.3.6 *or* 4.x results in a randomly crashed box? So correct me here if wrong (trying to summarize so there's no need to read the whole bug report each time):
3.3.6 + UP kernel == stable
4.0.3-5 + UP kernel == crash
3.3.6 + SMP kernel == stable
4.0.3-5 + SMP kernel == crash
Kernel version doesn't seem to make any difference; SMP is unstable for you, UP is stable. This could still be either kernel or X related, but we're narrowing things down at least a minuscule amount. ;o)

I am going to torture myself by using the xpert@play98 PCI card on my dual 1GHz box. My hope is that I will have the same problem and can try to track it down, hopefully without losing work. ;o) Even then it'll be an upgrade from my current torture of a Cirrus Logic 5446. ;o) Then it is back to the Radeon for me. ;o)

Question: what version of gdb do you have installed?
rpm -q gdb
(Trying to determine why cores are showing up invalid.)

Peter's Responses: Correct. Hard lockups on occasion with 3.3.6 and 4.x.... But with 3.3.6, no more X reboots, which is the original bug.
With regard to the above, the original bug is X-REBOOTING. The HARD LOCKUP is something which I noted happening somewhat more often recently (even when not using the computer), but it happened occasionally all along.
To be clear, and having given it some thought, here is a new chart:
3.3.6 + UP kernel == (Have not tested this scenario as of yet. Will do so....)
4.0.3-5 + UP kernel == No X Reboots in under 1 month (July). Hard locks < 1/week.
3.3.6 + SMP kernel == No X Reboots to date. 2 hard locks since ~7/18/01.
4.0.3-5 + SMP kernel == X Reboots 2-4 per week. Hard locks >= 1/week.
I can tell you my GDB version when I get home. But I don't recall any errata updates for it, so it may be the original 7.1 distribution default. BTW I also installed the kernel 2.4.3-12 up & smp so I can test that too if needed. But I am still running kernel 2.4.6-2.

rpm -q gdb yields gdb-5.0rh-5

Bug Info Update from Peter:
(a) Upgraded to Roswell rawhide kernel 2.4.6-3.1 (I was running 2.4.6-2). I do have 2.4.3-12 installed for testing if needed.
(b) I'm now running 2.4.6-3.1 in UP mode to test for any hard locks or otherwise, to fill the gap from my previous post.
(c) FYI, while running 2.4.6-2smp I experienced one sudden, no-warning hard lockup today.
(d) Hey RedHat, this "Roswell" business is heavily overrated. It was in fact only a crashed classified high-altitude balloon for listening to USSR nuclear tests. The US gov't doesn't have to try hard at all to cover up anything - people unwittingly work up fanciful tales to do the job quite nicely. Sure, it was a UFO - yah, whatever you say Mugsey. And people blame the US gov't for supposed cover-ups? Take the red pill and see how deep the rabbit hole of our gullibility goes (i.e. Uninformed Fallible Obliviousness). Sorry, I could not resist.... ;-)

I now have an xpert@play98 PCI and will commence testing within the week.

/me switches to using xpert@play in main SMP workstation
Wow.. what a downgrade from the Radeon 64 AGP... Talk about punishing one's self... ;O)

Hey, my name isn't Gate$ so cheap or midrange it is... What are you playing, Doom?
;-)

UPDATE: running kernel 2.4.6-3.1 UP for a few days, no hard lockups yet, and no X reboots either, though I don't expect any X reboots with UP, esp. under XFree86 v3.x.
NOTICE: Since using the 2.4.6 kernels, i.e. 2.4.6-2 and 2.4.6-3.1, UP, I've noted some sluggishness in running X v3 following certain activity. For example, after I ran a full backup to tape, or a full scp copy to other servers, I found it slow switching between X screens, typing into xterms, repainting GUIs, etc. I think I have to reboot to make it normal again. I did note that one of the logs, /var/log/pacct, was 1.7MB, but all other logs were under 250kb.

Mike, I found another core in my home directory, copied it to my coredumps directory, and ran gdb on it. Below are the results:
-rw------- 1 phb phb 380928 Aug 6 01:02 core-08-05-01
[root@boaz coredumps]# gdb --core core-08-05-01
GNU gdb 5.0rh-5 Red Hat Linux 7.1
Copyright 2001 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux".
Core was generated by `gnome-smproxy --sm-config-prefix /.gnome-smproxy-YKH8dN/ --sm-client-id 117f000'.
Program terminated with signal 11, Segmentation fault.
#0 0x0804a949 in ?? ()
(gdb) q
(gdb) bt
#0 0x0804a949 in ?? ()
#1 0x0804a984 in ?? ()
#2 0x0804acbf in ?? ()
#3 0x0804bd70 in ?? ()
#4 0x404ce177 in ?? ()

If I remember correctly, back when I was running XFree86 v4, each time X rebooted I would see this error in messages about gnome-smproxy, and I am seeing it in my messages file now, but of course X is not rebooting under XFree86 v3:
[root@boaz log]# grep gnome messages
Aug 5 08:22:09 boaz gnome-name-server[1125]: input condition is: 0x11, exiting
Aug 5 08:22:25 boaz gnome-name-server[13542]: starting
Aug 5 08:22:25 boaz gnome-name-server[13542]: name server starting
Aug 8 13:31:14 boaz gnome-name-server[13542]: input condition is: 0x11, exiting
Aug 8 13:31:33 boaz gnome-name-server[18531]: starting
Aug 8 13:31:33 boaz gnome-name-server[18531]: name server starting
[root@boaz log]# grep gnome messages.1
Jul 29 15:47:39 boaz gnome-name-server[1148]: input condition is: 0x11, exiting
Jul 29 15:50:32 boaz gnome-name-server[1161]: starting
Jul 29 15:50:32 boaz gnome-name-server[1161]: name server starting
Aug 3 10:20:39 boaz gnome-name-server[1223]: starting
Aug 3 10:20:39 boaz gnome-name-server[1223]: name server starting
Aug 3 10:25:09 boaz gnome-name-server[1223]: input condition is: 0x11, exiting
Aug 3 10:27:44 boaz gnome-name-server[1131]: starting
Aug 3 10:27:44 boaz gnome-name-server[1131]: name server starting
Aug 3 10:37:12 boaz gnome-name-server[1131]: input condition is: 0x11, exiting
Aug 3 10:39:45 boaz gnome-name-server[1125]: starting
Aug 3 10:39:45 boaz gnome-name-server[1125]: name server starting

Mike, still no solid lockups, and no X reboots either, under RHLinux 7.1 with k2.4.6-3 UP and XFree86 v3.x. But I am still noticing tremendous sluggishness in X window repainting. This used to happen only after coming back from a screensaver, and everything was fine afterwards. Now the slowness happens pretty much all the time, and is especially noticeable when switching between the 4 virtual screens. What could be happening here???
Created attachment 28096 [details]
XFree86 log-file from latest crash
Created attachment 28097 [details]
The XFree86 v4xx config file used during latest crash
I am experiencing the exact same problem (X crashing randomly) with my roswell install, whereas my previous install (6.2) didn't have any such problems. I've attached XFree86.0.log and XF86Config(-4).
Created attachment 28098 [details]
The config file used during the latest crash
Additional note: I upgraded my kernel to 2.4.7-0.8smp (rawhide RPM) before the last crash. I have had random crashes before this, however. Also, I forgot to mention that my system is dual-processor.

Thus far one solid lockup under kernel 2.4.6-3.1 UP with XFree86 v3. So from this and past tests it appears the solid locks occur regardless of kernel version, kernel mode (up/smp), and X version. I am assuming the solid locks are my hw? However, since switching to XFree86 v3, zero X reboots to date.

As of 8-17-01 I've upgraded to rawhide 2.4.7-2 and am testing UP and SMP with XFree86 v3. This kernel does appear to have corrected the X repainting and sluggishness problem.
One additional core dump - from Netscape - says BUS ERROR - any clues on this?
[phb@boaz phb]$ gdb --core core
GNU gdb 5.0rh-5 Red Hat Linux 7.1
Copyright 2001 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux".
Core was generated by `/usr/lib/netscape/netscape-communicator -irix-session-management /usr/share/doc'.
Program terminated with signal 7, Bus error.
#0 0x40249801 in ?? ()
(gdb) bt
#0 0x40249801 in ?? ()
#1 0x40249648 in ?? ()
#2 0x0877f763 in ?? ()
#3 0x087c8f89 in ?? ()
#4 0x087b3b1d in ?? ()
#5 0x08934d06 in ?? ()
#6 0x087b4b4d in ?? ()
#7 0x087b4beb in ?? ()
#8 0x0893f53c in ?? ()
#9 0x0893f57b in ?? ()
#10 0x0893f5e4 in ?? ()
(gdb)
Mike, how's your Xpert@Play98 testing going with up/smp/XFree86 etc???
I am beginning to sense that when Roswell hits the boxed set, this bug 39233 is going to be closed, which would be a shame since the cause of the X-rebooting issue has not been discovered, except to say it involves only XFree86 v4 with SMP kernels.

In general, bus errors like this indicate faulty RAM. That isn't 100% conclusive, just a generalization. This bug will not be closed prematurely if not found/fixed before final. It will remain open until I've done an adequate amount of testing to reach a conclusion. I had the xpert2000 in for 3 days, no crashes, but had to swap it out for other work. Will put it back in soon and give it a real beating. Kernel VM changes would account for the better performance you're seeing, and possibly the repainting also if it was related to performance. 2.4.7-foo is a MUCH better kernel than 2.4.6 indeed. The 2.4 kernel is starting to mature nicely.

Found this website speaking on netscape bus errors: http://members.ping.at/theofilu/netscape.html

As many of you Linux users know, Netscape does not run well with the latest libraries. But there is a possible work around to this problem. With the next few lines I will describe what to do. Look at this as a kind of mini HOWTO. The solution on the next few lines is valid for the versions 3.x through 4.04 of Netscape Navigator and Netscape Communicator. The library who makes the trouble is called libc.so.x.x.x. In the newer library the memory management functions have changed. Now this functions check whether the freed memory was allocated prior or not. If not, a bus error occurs. Netscape has three types of errors:
It tries to free never allocated memory
It tries to free already freed memory again
Handling the pixmaps (libXpm.so) of Motif 1.2 (used by Netscape) is not sane
Now you ask yourself: Is Netscape usable on Linux? Of course! Just do the following to give Netscape the modified library and let all other programs use the normal library:

What do you think, Mike???
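The repeated "gdb --core core" then "bt" sessions pasted in this report can be scripted when there are several core files to go through. Below is a minimal Python sketch, not anything used by the participants. The function names are hypothetical, and it assumes a gdb that accepts --batch, -ex and --core (current gdb does; the gdb 5.0rh-5 shown in this report may predate -ex, in which case a command file passed with -x would be needed). The parsing only relies on the "Core was generated by", "Program terminated with signal" and "File format not recognized" lines as they appear in the pasted output above.

```python
import re
import subprocess

# Patterns taken from the gdb output quoted in this report.
GENERATED_BY = re.compile(r"Core was generated by `(.*)'\.")
TERMINATED = re.compile(r"Program terminated with signal (\S+), (.+?)\.")
FRAME = re.compile(r"^#(\d+)\s+(.*)$", re.MULTILINE)


def summarize_gdb_output(text):
    """Extract program, signal and stack frames from gdb's textual output."""
    generated = GENERATED_BY.search(text)
    terminated = TERMINATED.search(text)
    # gdb prints frame #0 once on load and again in the backtrace,
    # so key by frame number to drop the repeat.
    frames = {int(num): rest.strip() for num, rest in FRAME.findall(text)}
    return {
        "recognized": "File format not recognized" not in text,
        "program": generated.group(1) if generated else None,
        "signal": terminated.group(1) if terminated else None,
        "reason": terminated.group(2) if terminated else None,
        "frames": [frames[num] for num in sorted(frames)],
    }


def backtrace(core_path, gdb="gdb"):
    """Run gdb non-interactively on one core file (hypothetical helper)."""
    result = subprocess.run(
        [gdb, "--batch", "-ex", "bt", "--core", core_path],
        capture_output=True,
        text=True,
        check=False,
    )
    return summarize_gdb_output(result.stdout + result.stderr)
```

Calling backtrace() on each core file and comparing the "program" and "frames" fields would show whether the cores come from the X server at all, or from unrelated applications such as the gnome-smproxy and Netscape cores above.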
I think Netscape bus errors are a completely different problem having nothing to do with this bug report. Please file a separate report for that if you like. This bug report is already quite lengthy; we shouldn't fork out from it into other bugs.

I have two Dell machines with integrated ATI Rage Pro video which both lock up (no pings returned) while starting a Gnome session using RH 7.1 with all updates applied except the new kernel. I used a straight install (not an upgrade) of all packages (everything). As suggested above, I cured the problem with "Xconfigurator --preferxf3". The XFree86*3.3.6*rpm updates were needed. The kernel is stock RH 7.1, version 2.4.2-2. So far so good. -- John, dunlap.edu

I have a similar problem, more reproducible: ATI Mach64 (Graphics Xpression), XFree86-4.0.3-5 under RedHat 7.1 on a single-processor AMD K6-III. While not SMP, this bug report 39233 seems closest to my symptoms: X aborts with sig 11 when the mouse is used, within a minute or so, especially with copy/paste. The same hardware is totally stable under RedHat 6.1 with XFree86 3.3.6. I have been planning to test 3.3.6 under RedHat 7.1, and am willing to send full details if this might be helpful.

This bug may be the same as bugs 18449 and 46911. No solutions there, either.

We are experiencing random X-server deaths on a dozen different RedHat 7.1 boxes, with six different hardware configurations. The *only* similarity between all of these boxes is that they are all SMP... different MB, different procs, different video cards, different memory. Every SMP box we have has this problem. The Rawhide kernel *seems* to make the problem less frequent, but does not solve it.

For what it's worth, I have the exact same problem, with a Number 9 Revolution IV card. Also SMP, 512mb, SCSI, up2date on all errata, and Ximian gnome installed.

Installing the following rpms from the current rawhide release *appears* to fix the problem. No crashes after 1.5 weeks so far.
I assume the 2.4.9 kernel is the source of the fix. This is based on the fact that running the linux-up kernel that comes with RedHat 7.1 also fixed the problem. It was only the smp kernel that comes with RedHat 7.1 that caused problems.
bash-2.05-8.i386.rpm
e2fsprogs-1.23-3.i386.rpm
e2fsprogs-devel-1.23-3.i386.rpm
filesystem-2.1.6-2.noarch.rpm
kernel-smp-2.4.9-0.5.i686.rpm
mkinitrd-3.2.6-1.i386.rpm
setup-2.5.7-1.noarch.rpm
tux-2.1.0-2.i386.rpm

I installed the same packages as Greg, and I have had no problems since. Just an additional data point...

I installed the latest rawhide k 2.4.9-0.18 SMP and have been up and running for several days with no problems. I also installed VMWare v3.0.0 Beta and Windose Xtra-Problems and left it running overnight with no lockups or issues. I am still running XFree86 v3.x because my Corel apps are all up and running that way - but there has been a marked decrease in hard lockups with the 2.4.9 kernel. I believe I do have some hw issues with the bp6 dual-celeron500 motherboard, but with this newer kernel it is so far acting more like k 2.2.16 SMP, which was nice and stable. :-) Thanks all.
PS: I also updated all the RPMs that Greg mentioned. I'm pretty satisfied that this issue has been fixed, though I have not verified it with XFree86 v4.x.

You've indicated the problem is gone now, so I am closing the bug. I've just run 4.1.0 for a few days on an SMP system with our latest kernel erratum (2.4.9), and no problems.
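For anyone chasing similar restarts later, the gnome-name-server "input condition is: 0x11, exiting" entries that were grepped out of /var/log/messages earlier in this report can be tallied per day and compared against the dates X was seen to restart. The following is only a rough Python sketch under stated assumptions: the function name is hypothetical, the line format is assumed to match the syslog lines quoted in this report, and whether these entries really track X restarts is the reporter's recollection, not something established here.

```python
import re
from collections import Counter

# Matches lines such as:
# Aug 3 10:25:09 boaz gnome-name-server[1223]: input condition is: 0x11, exiting
EXIT_LINE = re.compile(
    r"^(?P<month>[A-Z][a-z]{2})\s+(?P<day>\d{1,2})\s+"
    r"(?P<time>\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"gnome-name-server\[(?P<pid>\d+)\]:\s+"
    r"input condition is: 0x11, exiting$"
)


def name_server_exits_per_day(lines):
    """Count gnome-name-server exit messages per (month, day)."""
    counts = Counter()
    for line in lines:
        match = EXIT_LINE.match(line.strip())
        if match:
            counts[(match.group("month"), int(match.group("day")))] += 1
    return counts
```

Feeding it the lines of messages and messages.1 (for example via open("/var/log/messages")) gives one count per date; days with several exits would be the first ones to check against the X server log.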