Red Hat Bugzilla – Bug 428813
Kernels after -115 cause X to malfunction
Last modified: 2008-02-11 19:03:10 EST
Description of problem:
Rawhide kernels after -115 cause X to display like the following upon startup:
Version-Release number of selected component (if applicable):
-118, -133, -136, -150 all failed
Always on this particular machine
Steps to Reproduce:
1. Install kernel
3. Watch it become pretty like zebra
Can actually login
I recall that the DRM stuff in the kernel has gotten an update so I wondered if
that was causing a problem. Kyle McMartin and I were discussing various changes
and he pointed out that the only thing that changed between -115 and -118 was
the addition of linux-2.6-drm-add-i915-radeon-mdt.patch.
So last night I built a kernel out of CVS with that patch commented out and
things work fine:
Linux vader.jdub.homelinux.org 2.6.24-0.153.rc7.git5.fc9 #1 SMP Mon Jan 14
20:49:39 EST 2008 i686 i686 i386 GNU/Linux
Seems that patch causes badness. How/why, I have no idea.
I should add that this is a z60m Thinkpad with an ATI Radeon X600 Mobility chipset.
-155 still has this problem.
lsmod output from good kernel:
[jwboyer@vader ~]$ sudo /sbin/lsmod
Module Size Used by
michael_mic 6144 6
arc4 5760 6
ecb 6528 6
blkcipher 9220 1 ecb
ieee80211_crypt_tkip 12672 3
radeon 119328 0
drm 118516 1 radeon
rfcomm 34208 0
l2cap 23056 9 rfcomm
autofs4 21148 2
sunrpc 155460 3
ipv6 235972 22
cpufreq_ondemand 10524 1
acpi_cpufreq 12044 1
dm_mirror 22440 0
dm_multipath 19240 0
dm_mod 50988 2 dm_mirror,dm_multipath
uinput 10896 0
mmc_block 14220 0
nsc_ircc 18224 0
thinkpad_acpi 47508 0
hwmon 6276 1 thinkpad_acpi
irda 104044 1 nsc_ircc
crc_ccitt 5760 1 irda
rtc_cmos 10784 0
ipw2200 135848 0
pcspkr 6400 0
ieee80211 30800 1 ipw2200
sdhci 18060 0
mmc_core 44828 2 mmc_block,sdhci
firewire_ohci 18952 0
firewire_core 37704 1 firewire_ohci
ieee80211_crypt 8320 2 ieee80211_crypt_tkip,ieee80211
i2c_i801 12176 0
crc_itu_t 5760 1 firewire_core
iTCO_wdt 14036 0
i2c_core 21392 1 i2c_i801
iTCO_vendor_support 6916 1 iTCO_wdt
hci_usb 16820 2
bluetooth 49444 7 rfcomm,l2cap,hci_usb
snd_hda_intel 276588 4
battery 14860 0
snd_seq_dummy 6532 0
ac 8324 0
snd_seq_oss 30480 0
snd_seq_midi_event 9736 1 snd_seq_oss
snd_seq 46296 5 snd_seq_dummy,snd_seq_oss,snd_seq_midi_event
video 19992 0
output 6656 1 video
snd_seq_device 10132 3 snd_seq_dummy,snd_seq_oss,snd_seq
snd_pcm_oss 37536 0
snd_mixer_oss 16776 2 snd_pcm_oss
snd_pcm 64940 2 snd_hda_intel,snd_pcm_oss
snd_timer 20892 2 snd_seq,snd_pcm
button 10256 0
snd_page_alloc 11272 2 snd_hda_intel,snd_pcm
snd_hwdep 10380 1 snd_hda_intel
tg3 103716 0
snd 45380 15
soundcore 9568 2 snd
sr_mod 17572 0
sg 33176 0
cdrom 33440 1 sr_mod
ahci 25860 0
ata_piix 17284 4
ata_generic 8836 0
pata_acpi 8704 0
libata 129200 4 ahci,ata_piix,ata_generic,pata_acpi
sd_mod 27136 5
scsi_mod 126508 4 sr_mod,sg,libata,sd_mod
ext3 112688 3
jbd 43604 1 ext3
mbcache 10240 1 ext3
uhci_hcd 23576 0
ohci_hcd 22932 0
ehci_hcd 32148 0
I'll have to get the lsmod output from a bad kernel once dhclient stops
segfaulting in rawhide.
One thing I did notice is that the good kernel doesn't seem to load the radeon
or drm modules before X is started, whereas they are in the bad kernel.
Another thing to point out is that using a kernel that works, the X log shows
that it can't open any of the drm devices and DRI is disabled. Looking back at
the X output from a kernel that fails, DRI is found and used.
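A quick way to compare the two logs on this point is to scan each one for the driver's direct rendering status line. The following is only a minimal sketch: the marker strings ("Direct rendering enabled" / "Direct rendering disabled") are an assumption about what the radeon driver prints and may differ between driver versions, and `dri_status` is a hypothetical helper name, not part of any X tooling.

```python
import re
import sys

# Assumed marker strings; the exact wording depends on the driver version.
ENABLED = re.compile(r"Direct rendering enabled", re.IGNORECASE)
DISABLED = re.compile(r"Direct rendering disabled", re.IGNORECASE)


def dri_status(log_text):
    """Return 'enabled', 'disabled', or 'unknown' for an X log.

    The last matching line wins, since a driver may report more than once.
    """
    status = "unknown"
    for line in log_text.splitlines():
        if ENABLED.search(line):
            status = "enabled"
        elif DISABLED.search(line):
            status = "disabled"
    return status


if __name__ == "__main__":
    # Usage: pass one or more log files, e.g. a good and a bad Xorg.0.log.
    for path in sys.argv[1:]:
        with open(path, errors="replace") as fh:
            print(path, dri_status(fh.read()))
```

Run against the log from a working boot and the log from a failing boot, this should make the "DRI disabled on good, enabled on bad" difference visible at a glance, provided the assumed marker strings match the actual log wording.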
Created attachment 291994 [details]
Xorg.0.log from "good" boot
The X log from a good boot where things work and I don't get dirty zebra
Created attachment 291995 [details]
Xorg.0.log from "bad" boot
X log from the dirty zebra kind of boot
I added an initscript that is the last thing to run in runlevel 5 before X is
started. It just runs lsmod and dumps it to a file. I'll attach the output
of that from good and bad boots.
Created attachment 292031 [details]
lsmod output right before X starts from a bad boot
Created attachment 292032 [details]
lsmod output right before X starts from a good boot
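For anyone wanting to compare the two lsmod dumps without eyeballing them, here is a rough sketch that reports which modules appear in only one of them. It assumes the usual three- or four-column lsmod layout (module, size, use count, optional comma-separated users); `parse_lsmod` and `compare` are hypothetical helper names, and the file names on the command line are whatever the dumps were saved as.

```python
import sys


def parse_lsmod(text):
    """Map module name -> (size, use count, [users]) from lsmod output."""
    modules = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the "Module Size Used by" header, shell prompts,
        # and anything else whose size/use-count columns are not numeric.
        if len(fields) < 3:
            continue
        if not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        users = [u for u in fields[3].split(",") if u] if len(fields) > 3 else []
        modules[fields[0]] = (int(fields[1]), int(fields[2]), users)
    return modules


def compare(good_text, bad_text):
    """Return (modules only in good, modules only in bad), both sorted."""
    good = parse_lsmod(good_text)
    bad = parse_lsmod(bad_text)
    return sorted(set(good) - set(bad)), sorted(set(bad) - set(good))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: script.py <good lsmod dump> <bad lsmod dump>
    with open(sys.argv[1]) as g, open(sys.argv[2]) as b:
        only_good, only_bad = compare(g.read(), b.read())
    print("only in good boot:", ", ".join(only_good) or "(none)")
    print("only in bad boot:", ", ".join(only_bad) or "(none)")
```

If radeon and drm show up only in the bad-boot dump, that would match the earlier observation that the good kernel does not load them before X starts.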
The lsmod output in comment #3 was from the -153 kernel _after_ X was running.
To clarify, all of the output from the "good" boots are on the -153 kernel I
built without linux-2.6-drm-add-i915-radeon-mdt.patch included.
For shits n giggles, I did the following on the -153 kernel:
rmmod radeon drm
X started and hung in the same way. So it seems the only thing "good" about the
good kernel is that it has some kind of timing window that prevents the drm
devices from working and therefore DRI becomes disabled.
I'm talking to myself... I'm so alone...
Experiencing the same thing on a desktop machine, with X1300 card. Happened as
well on the .13 kernels in F8, as well as rawhide.
Mike, can you try 0.155 or higher? Dave Airlie included some fixes which should
resolve issues on an X1300. Possibly the same bug Josh is hitting is still biting
on top of the r500 issue though.
Neither .155 nor .157 worked, same thing. I know the fix you're talking about for
r500, and the kernel with that fix in F8 didn't work either.
Thinking about this more, I'm not sure -115 worked correctly either. It may
just be getting by on the same race that my -153 kernel is.
Actually, I'm pretty sure that's the case because if I log out of gnome I get
the hung X thing as well. The reason I didn't think of that earlier is that gdm
is known to be in rough shape, so I was just blaming that.
I think we need to look back further for what's causing this. I'll try and get
some time to try kernels older than -115 to see which one actually does work.
I have this problem too. I am stuck with the 107 kernel (which works fine;
suspend to RAM etc. works).
I did a diff between "good" and "bad" Xorg.0.log and the result is similar to
diff between comment #5 and #6.
What's strange - as Josh noted in comment #4 - is that the "bad" kernel
seems to understand more of the hardware than the "good" kernel.
E.g. DRI seems to be enabled in the "bad" kernel and disabled in the "good" kernel.
Maybe some other piece of code gets confused when things are working in a lower
ajax reverted the patch which was causing radeon and intel to use
MODULE_DEVICE_TABLE. it's in build 0.164, please try it and confirm things are
at least working again for you. cheers, kyle
(In reply to comment #18)
> ajax reverted the patch which was causing radeon and intel to use
> MODULE_DEVICE_TABLE. it's in build 0.164, please try it and confirm things are
> at least working again for you. cheers, kyle
Erm, that's what my -153 kernel did. I think it will allow people to log in,
but I don't think DRI will be used, so it's more of a workaround than a fix.
I'll try it when I get home tonight.
Still fails with kernel-2.6.24-0.164.rc8.git4.fc9 and libdrm-2.4.0-0.3.fc9
Created attachment 292632 [details]
Xorg.0.log from failed 0.164 boot
Worked around this by moving /usr/lib/xorg/extensions/libdri.so to libdri_broken.so.
yup, that works. Thanks Josh.
*** Bug 428308 has been marked as a duplicate of this bug. ***
*** Bug 428043 has been marked as a duplicate of this bug. ***
I was able to boot kernel 184.108.40.206-26.fc9 on the z60m with DRI today
Wahooo! compiz starts with kernel-220.127.116.11-28.fc9.i686 !!!!!
Seem to get some screen "flash" every 40 seconds or so, but "wobbly windows"
40-42 second flashing stopped when I restarted X.