Bug 428813 - Kernels after -115 cause X to malfunction
Status: CLOSED RAWHIDE
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: i686 Linux
Priority: low, Severity: low
Assigned To: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Duplicates: 428043 428308

Reported: 2008-01-15 07:56 EST by Josh Boyer
Modified: 2008-02-11 19:03 EST (History)
Doc Type: Bug Fix
Last Closed: 2008-02-11 19:03:10 EST

Attachments
Xorg.0.log from "good" boot (37.52 KB, text/plain)
2008-01-17 08:40 EST, Josh Boyer
Xorg.0.log from "bad" boot (35.92 KB, text/plain)
2008-01-17 08:41 EST, Josh Boyer
lsmod output right before X starts from a bad boot (2.82 KB, text/plain)
2008-01-17 11:49 EST, Josh Boyer
lsmod output right before X starts from a good boot (2.75 KB, text/plain)
2008-01-17 11:49 EST, Josh Boyer
Xorg.0.log from failed 0.164 boot (36.36 KB, text/plain)
2008-01-23 07:58 EST, Josh Boyer

Description Josh Boyer 2008-01-15 07:56:53 EST
Description of problem:

Rawhide kernels after -115 cause X to display like the following upon startup:

http://jwboyer.fedorapeople.org/01-14-08_1613.jpg

Version-Release number of selected component (if applicable):

-118, -133, -136, -150 all failed

How reproducible:

Always on this particular machine

Steps to Reproduce:
1.  Install kernel
2.  Reboot
3.  Watch it become pretty like zebra
  
Actual results:

Ew

Expected results:

Can actually login

Additional info:

I recall that the DRM stuff in the kernel has gotten an update so I wondered if
that was causing a problem.  Kyle McMartin and I were discussing various changes
and he pointed out that the only thing that changed between -115 and -118 was
the addition of:

linux-2.6-drm-add-i915-radeon-mdt.patch

So last night I built a kernel out of CVS with that patch commented out and
things work fine:

Linux vader.jdub.homelinux.org 2.6.24-0.153.rc7.git5.fc9 #1 SMP Mon Jan 14 20:49:39 EST 2008 i686 i686 i386 GNU/Linux

Seems that patch causes badness.  How/why, I have no idea.
Comment 1 Josh Boyer 2008-01-15 12:10:09 EST
I should add that this is a z60m Thinkpad with an ATI Radeon X600 Mobility chipset.

http://www.smolts.org/show?UUID=63cc2d27-4185-4cc2-bae6-87042b715610
Comment 2 Josh Boyer 2008-01-16 19:29:12 EST
-155 still has this problem.
Comment 3 Josh Boyer 2008-01-17 07:58:40 EST
lsmod output from good kernel:

[jwboyer@vader ~]$ sudo /sbin/lsmod
Module                  Size  Used by
michael_mic             6144  6 
arc4                    5760  6 
ecb                     6528  6 
blkcipher               9220  1 ecb
ieee80211_crypt_tkip    12672  3 
radeon                119328  0 
drm                   118516  1 radeon
rfcomm                 34208  0 
l2cap                  23056  9 rfcomm
autofs4                21148  2 
sunrpc                155460  3 
ipv6                  235972  22 
cpufreq_ondemand       10524  1 
acpi_cpufreq           12044  1 
dm_mirror              22440  0 
dm_multipath           19240  0 
dm_mod                 50988  2 dm_mirror,dm_multipath
uinput                 10896  0 
mmc_block              14220  0 
nsc_ircc               18224  0 
thinkpad_acpi          47508  0 
hwmon                   6276  1 thinkpad_acpi
irda                  104044  1 nsc_ircc
crc_ccitt               5760  1 irda
rtc_cmos               10784  0 
ipw2200               135848  0 
pcspkr                  6400  0 
ieee80211              30800  1 ipw2200
sdhci                  18060  0 
mmc_core               44828  2 mmc_block,sdhci
firewire_ohci          18952  0 
firewire_core          37704  1 firewire_ohci
ieee80211_crypt         8320  2 ieee80211_crypt_tkip,ieee80211
i2c_i801               12176  0 
crc_itu_t               5760  1 firewire_core
iTCO_wdt               14036  0 
i2c_core               21392  1 i2c_i801
iTCO_vendor_support     6916  1 iTCO_wdt
hci_usb                16820  2 
bluetooth              49444  7 rfcomm,l2cap,hci_usb
snd_hda_intel         276588  4 
battery                14860  0 
snd_seq_dummy           6532  0 
ac                      8324  0 
snd_seq_oss            30480  0 
snd_seq_midi_event      9736  1 snd_seq_oss
snd_seq                46296  5 snd_seq_dummy,snd_seq_oss,snd_seq_midi_event
video                  19992  0 
output                  6656  1 video
snd_seq_device         10132  3 snd_seq_dummy,snd_seq_oss,snd_seq
snd_pcm_oss            37536  0 
snd_mixer_oss          16776  2 snd_pcm_oss
snd_pcm                64940  2 snd_hda_intel,snd_pcm_oss
snd_timer              20892  2 snd_seq,snd_pcm
button                 10256  0 
snd_page_alloc         11272  2 snd_hda_intel,snd_pcm
snd_hwdep              10380  1 snd_hda_intel
tg3                   103716  0 
snd                    45380  15 snd_hda_intel,snd_seq_oss,snd_seq,snd_seq_device,snd_pcm_oss,snd_mixer_oss,snd_pcm,snd_timer,snd_hwdep
soundcore               9568  2 snd
sr_mod                 17572  0 
sg                     33176  0 
cdrom                  33440  1 sr_mod
ahci                   25860  0 
ata_piix               17284  4 
ata_generic             8836  0 
pata_acpi               8704  0 
libata                129200  4 ahci,ata_piix,ata_generic,pata_acpi
sd_mod                 27136  5 
scsi_mod              126508  4 sr_mod,sg,libata,sd_mod
ext3                  112688  3 
jbd                    43604  1 ext3
mbcache                10240  1 ext3
uhci_hcd               23576  0 
ohci_hcd               22932  0 
ehci_hcd               32148  0 
Comment 4 Josh Boyer 2008-01-17 08:38:37 EST
I'll have to get the lsmod output from a bad kernel once dhclient stops
segfaulting in rawhide.

One thing I did notice is that the good kernel doesn't seem to load the radeon
or drm modules before X is started, whereas the bad kernel does.

Another thing to point out is that with a kernel that works, the X log shows
that it can't open any of the drm devices and DRI is disabled.  Looking back at
the X output from a kernel that fails, DRI is found and used.
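
A rough sketch of how the DRI check described above could be automated for a saved Xorg.0.log. The marker text is an assumption: the radeon driver is believed to log "Direct rendering enabled" or "Direct rendering disabled", so check that against your own log before trusting the result. The function name `dri_status` is made up for this example.

```python
import re
import sys

# ASSUMPTION: the driver reports its verdict with the words
# "Direct rendering enabled" or "Direct rendering disabled".
_DRI_LINE = re.compile(r"direct rendering (enabled|disabled)", re.IGNORECASE)


def dri_status(log_text):
    """Return 'enabled', 'disabled', or None if no verdict line is found."""
    status = None
    for line in log_text.splitlines():
        match = _DRI_LINE.search(line)
        if match:
            # Keep the last verdict in case the log contains more than one.
            status = match.group(1).lower()
    return status


if __name__ == "__main__":
    # Usage: python dri_status.py good/Xorg.0.log bad/Xorg.0.log
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            print(path, dri_status(handle.read()))
```

If the behaviour in this bug holds, the "good" log should report disabled and the "bad" log enabled.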
Comment 5 Josh Boyer 2008-01-17 08:40:50 EST
Created attachment 291994 [details]
Xorg.0.log from "good" boot

The X log from a good boot where things work and I don't get dirty zebra
stripes.
Comment 6 Josh Boyer 2008-01-17 08:41:27 EST
Created attachment 291995 [details]
Xorg.0.log from "bad" boot

X log from the dirty zebra kind of boot
Comment 7 Josh Boyer 2008-01-17 11:48:02 EST
I added an initscript that is the last thing to run in runlevel 5 before X is
started.  It just runs lsmod and dumps the output to a file.  I'll attach the
output from good and bad boots.
Comment 8 Josh Boyer 2008-01-17 11:49:10 EST
Created attachment 292031 [details]
lsmod output right before X starts from a bad boot
Comment 9 Josh Boyer 2008-01-17 11:49:34 EST
Created attachment 292032 [details]
lsmod output right before X starts from a good boot
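
For anyone comparing the two lsmod dumps, here is a minimal sketch that lists which modules appear in only one of them. It assumes the usual three-column lsmod layout (name, size, use count, optional comma-separated users). The names `parse_lsmod` and `compare_lsmod` are made up for this example.

```python
def parse_lsmod(text):
    """Map module name -> (size, use_count, [users]) from lsmod output."""
    modules = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the "Module Size Used by" header and any wrapped continuation
        # lines that do not look like "name size count [users]".
        if len(fields) < 3 or not fields[1].isdigit() or not fields[2].isdigit():
            continue
        users = fields[3].split(",") if len(fields) > 3 else []
        modules[fields[0]] = (int(fields[1]), int(fields[2]), users)
    return modules


def compare_lsmod(good_text, bad_text):
    """Return the module names that are present in only one of the dumps."""
    good = parse_lsmod(good_text)
    bad = parse_lsmod(bad_text)
    return {
        "only_in_good": sorted(set(good) - set(bad)),
        "only_in_bad": sorted(set(bad) - set(good)),
    }
```

Feed it the contents of the two attachments. If the observation in comment #4 holds, radeon and drm should show up under "only_in_bad".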
Comment 10 Josh Boyer 2008-01-17 11:57:45 EST
The lsmod output in comment #3 was from the -153 kernel _after_ X was running.

To clarify, all of the output from the "good" boots is from the -153 kernel I
built without linux-2.6-drm-add-i915-radeon-mdt.patch included.
Comment 11 Josh Boyer 2008-01-17 13:46:39 EST
For shits n giggles, I did the following on the -153 kernel:

init 3
rmmod radeon drm
modprobe radeon
init 5

X started and hung in the same way.  So it seems the only thing "good" about the
good kernel is that it has some kind of timing window that prevents the drm
devices from working and therefore DRI becomes disabled.
Comment 12 Josh Boyer 2008-01-17 15:11:11 EST
I'm talking to myself...  I'm so alone...
Comment 13 Mike Chambers 2008-01-17 23:18:21 EST
I'm experiencing the same thing on a desktop machine with an X1300 card.  It
happened on the .13 kernels in F8 as well as on rawhide.
Comment 14 Kyle McMartin 2008-01-17 23:27:31 EST
Mike, can you try 0.155 or higher? Dave Airlie included some fixes which should
resolve issues on an X1300. Possibly the same bug that Josh is hitting is still
biting on top of the r500 issue, though.

cheers, Kyle
Comment 15 Mike Chambers 2008-01-17 23:42:36 EST
Neither .155 nor .157 worked, same thing.  I know the fix you're talking about
for r500, and the kernel with that fix in F8 didn't work either.
Comment 16 Josh Boyer 2008-01-19 10:40:51 EST
Thinking about this more, I'm not sure -115 worked correctly either.  It may
just be getting by on the same race that my -153 kernel is.

Actually, I'm pretty sure that's the case because if I log out of gnome I get
the hung X thing as well.  The reason I didn't think of that earlier is that gdm
is known to be in rough shape, so I was just blaming that.

I think we need to look back further for what's causing this.  I'll try and get
some time to try kernels older than -115 to see which one actually does work.
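
The search for the first broken build could be organised as a bisection rather than trying builds one by one. This is only a sketch: it assumes that once a build is bad, every later build is bad too, which the race described above may well violate. The `is_bad` callback is a stand-in for a manual boot test (including logging out of gnome), and the function name is made up for this example.

```python
def first_bad_build(builds, is_bad):
    """Return the earliest bad build, or None if every build is good.

    builds: build numbers sorted oldest to newest.
    is_bad: callable that reports whether a given build misbehaves.
    ASSUMPTION: good builds come first, then only bad builds.
    """
    low, high = 0, len(builds)
    while low < high:
        middle = (low + high) // 2
        if is_bad(builds[middle]):
            high = middle      # first bad build is at middle or earlier
        else:
            low = middle + 1   # first bad build is after middle
    return builds[low] if low < len(builds) else None
```

Each call to `is_bad` costs one install and reboot, so with N candidate builds this needs roughly log2(N) boots instead of N.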
Comment 17 Terje Røsten 2008-01-22 06:58:12 EST
I too have this problem; I am stuck with the 107 kernel (which works fine,
including suspend to RAM).

I diffed my "good" and "bad" Xorg.0.log files, and the result is similar to
the diff between the logs attached in comment #5 and comment #6.

What's strange, as Josh noted in comment #4, is that the "bad" kernel
seems to understand more of the hardware than the "good" kernel.
E.g. DRI seems to be enabled with the "bad" kernel and disabled with the "good"
kernel.

Maybe some other piece of code gets confused when things are working in a lower
layer?
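
For reference, a good/bad log comparison like the one mentioned in this comment can be reproduced with a small script. This is a sketch only: the file labels in the diff header are placeholders, and the timestamp stripping is there because newer X servers prefix log lines with something like "[  12.345]", which older logs may not have.

```python
import difflib
import re

# Strip a leading "[  12.345] " timestamp, if present, so that timing noise
# does not show up as a difference between the two logs.
_TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s?")


def normalize(log_text):
    """Split a log into lines with timestamps and trailing spaces removed."""
    return [_TIMESTAMP.sub("", line.rstrip()) for line in log_text.splitlines()]


def diff_logs(good_text, bad_text):
    """Return the unified diff between two Xorg logs as a list of lines."""
    return list(
        difflib.unified_diff(
            normalize(good_text),
            normalize(bad_text),
            fromfile="good/Xorg.0.log",  # placeholder label
            tofile="bad/Xorg.0.log",     # placeholder label
            lineterm="",
        )
    )
```

Printing the returned lines with `"\n".join(...)` gives output in the same format as `diff -u`.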
Comment 18 Kyle McMartin 2008-01-22 09:31:42 EST
ajax reverted the patch which was causing radeon and intel to use
MODULE_DEVICE_TABLE. it's in build 0.164, please try it and confirm things are
at least working again for you. cheers, kyle
Comment 19 Josh Boyer 2008-01-22 11:59:58 EST
(In reply to comment #18)
> ajax reverted the patch which was causing radeon and intel to use
> MODULE_DEVICE_TABLE. it's in build 0.164, please try it and confirm things are
> at least working again for you. cheers, kyle

Erm, that's what my -153 kernel did.  I think it will allow people to log in,
but I don't think DRI will be used, so it's more of a workaround than a fix. 
I'll try it when I get home tonight.
Comment 20 Josh Boyer 2008-01-23 07:48:26 EST
Still fails with kernel-2.6.24-0.164.rc8.git4.fc9 and libdrm-2.4.0-0.3.fc9
Comment 21 Josh Boyer 2008-01-23 07:58:15 EST
Created attachment 292632 [details]
Xorg.0.log from failed 0.164 boot
Comment 22 Josh Boyer 2008-01-23 08:01:21 EST
Worked around this by moving /usr/lib/xorg/extensions/libdri.so to libdri_broken.so.

suck.
Comment 23 Matěj Cepl 2008-01-24 10:53:35 EST
yup, that works. Thanks Josh.
Comment 24 Matěj Cepl 2008-01-24 19:29:12 EST
*** Bug 428308 has been marked as a duplicate of this bug. ***
Comment 25 Matěj Cepl 2008-01-25 08:46:12 EST
*** Bug 428043 has been marked as a duplicate of this bug. ***
Comment 26 Josh Boyer 2008-02-11 16:57:18 EST
I was able to boot kernel 2.6.24.1-26.fc9 on the z60m with DRI today
Comment 27 Tom London 2008-02-11 18:00:29 EST
Wahooo!  compiz starts with kernel-2.6.24.1-28.fc9.i686 !!!!!

Seem to get some screen "flash" every 40 seconds or so, but "wobbly windows"
work again!
Comment 28 Tom London 2008-02-11 18:33:24 EST
40-42 second flashing stopped when I restarted X.
