Bug 634239

Summary: black bars representing system messages always overlay current tty1 Xorg display
Product: [Fedora] Fedora
Reporter: Jon Masters <jcm>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED CURRENTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Docs Contact:
Priority: low
Version: 19
CC: anton, atswartz, dougsland, gansalmon, itamar, jcm, jforbes, jonathan, kernel-maint, madhu.chinakonda, xgl-maint
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-04-05 16:38:34 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description      Flags
dmesg.txt        none
working config   none

Description Jon Masters 2010-09-15 15:42:40 UTC
Description of problem:

The display is periodically corrupted by what appears to be something attempting to write to TTY1. The output shows up as entirely black blocks, so I can't see the actual text. I can produce the same effect while X is running by simply running "echo 'random crap' >/dev/tty1" in a loop, for example. I ran lsof and looked through /proc, and Xorg was the *only* thing with /dev/tty1 open. I then temporarily moved the /dev/tty1 device node to confirm it wasn't a script, and waited until the same thing happened. Xorg has /dev/tty1 open twice, once on fd1 and once on fd6.
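
For anyone wanting to repeat the lsof check by hand, here is a minimal Python sketch (not part of the original report) that walks /proc/<pid>/fd and lists which processes hold a given path open. The function name pids_with_path_open and its proc_root parameter are made up for illustration. It only sees userspace file descriptors, and only those it has permission to read, so it would need to run as root.

```python
import os


def pids_with_path_open(target, proc_root="/proc"):
    """Return {pid: [fd, ...]} for every process holding `target` open."""
    target = os.path.realpath(target)
    holders = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "sys"
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission
        for fd in fds:
            try:
                link = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor was closed while we were looking
            if link == target:
                holders.setdefault(int(entry), []).append(int(fd))
    return holders
```

Calling pids_with_path_open("/dev/tty1") returns a dict keyed by pid. Going by the lsof result above, on this system it should list a single Xorg pid with descriptors 1 and 6.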

Version-Release number of selected component (if applicable):

xorg-x11-server-Xorg-1.9.0-9.fc15.x86_64

How reproducible:

100%

Steps to Reproduce:
1. Log in to the system graphically through gdm
2. gnome-session or something starts this early on during login
3. Happens randomly afterward (some kind of debug text?)
  
Actual results:

Random crap on the display requires a repaint or moving windows to cause it to go away. Gets in the way of using the desktop.

Expected results:

No random crap on the display.

Additional info:

This system was installed without modeset, since it didn't work with the F14 Alpha install media. Then, after install of Rawhide through that media, it does correctly boot with KMS. This is an ATI Mobility FireGL V5200 graphics card:

01:00.0 VGA compatible controller: ATI Technologies Inc M56GL [Mobility FireGL V5200] (prog-if 00 [VGA controller])
        Subsystem: Lenovo ThinkPad T60p
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 46
        Region 0: Memory at d0000000 (32-bit, prefetchable) [size=256M]
        Region 1: I/O ports at 2000 [size=256]
        Region 2: Memory at ee100000 (32-bit, non-prefetchable) [size=64K]
        [virtual] Expansion ROM at ee120000 [disabled] [size=128K]
        Capabilities: [50] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [58] Express (v1) Legacy Endpoint, MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE- FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x16, ASPM L0s L1, Latency L0 <64ns, L1 <1us
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: 00000000fee0300c  Data: 41a9
        Kernel driver in use: radeon
        Kernel modules: radeon

Comment 1 Jon Masters 2010-09-15 18:24:32 UTC
Logging out and in again results in Xorg running on tty7, which then shows less corruption. But that is another annoying bug: the second time you log in after booting (this is not FUS; there is no second user, just a logout followed by a login), you get Xorg running on tty7 without it being told explicitly to run on tty1. So you will need to reproduce this by booting rawhide from scratch and observing what happens.

Comment 2 Jon Masters 2010-09-15 18:37:57 UTC
The second issue of X randomly moving to tty7 on the second login (after logging out from the first login), and being started without being told to run on a specific tty is here (and you'll likely re-assign that one to gdm):

https://bugzilla.redhat.com/show_bug.cgi?id=634299

Nevertheless, the corruption of the display by random characters on tty1 still occurs and is the concern of this bug.

Jon.

Comment 3 Jon Masters 2010-09-16 04:24:46 UTC
I think this is kernel (logging). The reason is that I had checked that only Xorg had the tty1 device node open and had moved the device, but neglected to consider that kernel logging might be screwed up. With DRM debugging turned on, I now see the predictable once-per-5-seconds kernel thread, which kicks off to probe the external VGA connector, causing the screen corruption. So either the kernel is outputting this crap to tty1 regardless, or rsyslogd is failing.

The kernel is: 2.6.36-0.21.rc4.git1.fc15.x86_64
The rsyslogd is: rsyslog-4.6.3-2.fc15.x86_64

Jon.

Comment 4 Jon Masters 2010-09-16 04:27:23 UTC
Here, you can see the gdm login prompt is nicely corrupted by the flowing "text":

http://www.flickr.com/photos/jonmasters/4994556587/

Jon.

Comment 5 Jon Masters 2010-09-16 05:02:46 UTC
http://www.youtube.com/watch?v=L4ORbAvbm9E

Comment 6 Jon Masters 2010-09-16 05:07:43 UTC
http://www.youtube.com/watch?v=8JvtEjfuNIE

Comment 7 Jon Masters 2010-09-16 05:10:31 UTC
The two videos walk you through the problems.

Comment 8 Jon Masters 2010-09-17 00:13:47 UTC
I explicitly set console=tty9 and I still see two lines of black bars on the Xorg display, repeating once every 5 seconds, which is consistent with the two lines output by drm.debug=0x04 debugging. So, I think the kernel is outputting crap on tty1 no matter what, and rsyslogd is running, and it has klog open. Hmmm.

Anyone?
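
One way to double-check the settings described in the comment above is to read back what the kernel actually received. The following Python sketch is an illustration only, not something run in this bug: it parses the text of /proc/cmdline for console= entries and other parameters such as drm.debug, and the text of /proc/sys/kernel/printk for the four printk log levels. Both /proc files are standard kernel interfaces; the function names parse_cmdline and parse_printk are hypothetical, and the parser deliberately ignores quoted arguments containing spaces.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into (console list, other parameters)."""
    consoles, params = [], {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "console" and sep:
            consoles.append(value)  # console= may appear more than once
        else:
            # bare flags such as "quiet" are stored as True
            params[key] = value if sep else True
    return consoles, params


def parse_printk(text):
    """Name the four integers found in /proc/sys/kernel/printk."""
    names = (
        "console_loglevel",
        "default_message_loglevel",
        "minimum_console_loglevel",
        "default_console_loglevel",
    )
    return dict(zip(names, (int(value) for value in text.split())))
```

Typical use would be parse_cmdline(open("/proc/cmdline").read()) and parse_printk(open("/proc/sys/kernel/printk").read()). Kernel messages are printed on the console only when their level is numerically lower than console_loglevel, so that value is worth comparing against the level of the DRM debug output.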

Comment 9 atswartz 2010-09-18 04:58:01 UTC
I was getting this same behavior in addition to: Wrong permissions on /dev/dri
https://bugzilla.redhat.com/show_bug.cgi?id=626559
with two different ati cards 3870 & 2600.  I fixed it by compiling my own kernels.

Comment 10 Jon Masters 2010-09-18 07:02:51 UTC
Confirmed that the problem is kernel oops data being displayed, but replaced by black bars. Only once have I been able to boot this laptop such that the actual oops data displayed correctly; I am attaching a screenshot taken with my custom 2.6.36-rc1 kernel build running. There is a series of nasty oopses related to this, for which I am attaching the dmesg logs.

So this bug is still that we're overlaying the X session with the oops data. That's actually a good thing now we have KMS (especially if we could do it Solaris-style with color) but it's not working. Typically, the user sees only weird black bars overwriting the display that disappear when windows are moved over them. I suppose it's obvious now, but it wasn't obvious at first.

Jon.

Comment 12 Jon Masters 2010-09-18 07:14:43 UTC
Created attachment 448168 [details]
dmesg.txt

Comment 13 Chuck Ebbert 2010-09-20 15:06:11 UTC
(In reply to comment #9)
> I was getting this same behavior in addition to: Wrong permissions on /dev/dri
> https://bugzilla.redhat.com/show_bug.cgi?id=626559
> with two different ati cards 3870 & 2600.  I fixed it by compiling my own
> kernels.

With different kernel config options that you didn't specify.

Comment 14 atswartz 2010-09-20 22:05:13 UTC
Created attachment 448560 [details]
working config

I adapted a config that I use on another distro and there are many changes, so I am not sure that this will be of much use.

Comment 15 Jon Masters 2010-09-21 05:07:32 UTC
Yea. I'd love to know. There's also a long-standing RCU check failure in lockdep on boot, but that seems unrelated. Worst case, I'll bisect this. But I'm going to try Linus' latest RC first, in case it has gone away in the latest GPU updates. If it has, the bisect will hopefully be smaller going the other way, from working to broken. We'll see. I'll keep you informed, Chuck.

btw, this kernel is also horribly unstable in general. The box falls over after an hour or two and requires a hard reset. Various different oopses, panics, lockup warnings, you name it. I'll perhaps also try a config without lockdep and debugging options enabled. If it continues, I'll need to let Linus know.

Jon.

Comment 16 Jon Masters 2010-09-21 05:08:02 UTC
Ah, the config was attached but I hadn't refreshed this BZ. I'll look at it tomorrow.

Comment 17 Jon Masters 2010-09-21 05:10:20 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=626026 is the longstanding one that always hits this box with these kernels too.

Comment 18 Chuck Ebbert 2010-09-21 12:02:57 UTC
(In reply to comment #15)
> btw, this kernel is also horribly unstable in general. The box falls over after
> an hour or two and requires a hard reset. Various different oopses, panics,
> lockup warnings, you name it. I'll perhaps also try a config without lockdep
> and debugging options enabled. If it continues, I'll need to let Linus know.

It's been totally stable for me.

Comment 19 atswartz 2010-09-21 14:03:00 UTC
(In reply to comment #13)
> (In reply to comment #9)
> > I was getting this same behavior in addition to: Wrong permissions on /dev/dri
> > https://bugzilla.redhat.com/show_bug.cgi?id=626559
> > with two different ati cards 3870 & 2600.  I fixed it by compiling my own
> > kernels.
> 
> With different kernel config options that you didn't specify.

Sorry about the delay, although there was a config for 2.6.36-0.18 on the linked bug report the whole time.

Comment 20 Chuck Ebbert 2010-09-21 17:13:10 UTC
When I run the rawhide kernel on F13 I don't see any of those video artifacts.

Comment 21 Jon Masters 2010-09-21 17:55:54 UTC
Still happening with latest Linus RC. I was wrong about the instability - that was when it was booting an older kernel. I think it's just logging a lot of oops/other crap. I'll show you if you're in today as I'm headed in now. Are you around later this pm?

Jon.

Comment 22 Jon Masters 2010-09-21 20:43:48 UTC
I can see from running the latest RC that the oopses are gone. However, /var/log/messages is updated in time with the remaining corruption, so it's clearly a problem with the logging setup. Even for kernel messages, this would happen if rsyslogd were for some reason still outputting on tty1 or its stdout.

Attaching a screenshot immediately after the following landed in the log:

Sep 20 16:39:16 tonnant kernel: gnome-volume-co[1723]: segfault at 7fff4a574ff8 ip 0000003032a0faa4 sp 00007fff4a575000 error 6 in libgobject-2.0.so.0.2515.0[3032a00000+4e000]
Sep 20 16:39:17 tonnant abrt[2107]: saved core dump of pid 1723 (/usr/bin/gnome-volume-control-applet) to /var/spool/abrt/ccpp-1285015156-1723.new/coredump (30703616 bytes)
Sep 20 16:39:17 tonnant abrtd: Directory 'ccpp-1285015156-1723' creation detected
Sep 20 16:39:19 tonnant abrtd: New crash /var/spool/abrt/ccpp-1285015156-1723, processing
Sep 20 16:39:19 tonnant abrtd: Registered Action plugin 'RunApp'
Sep 20 16:39:19 tonnant abrtd: RunApp('/var/spool/abrt/ccpp-1285015156-1723','test x"`cat component`" = x"xorg-x11-server-Xorg" && cp /var/log/Xorg.0.log .')
Sep 20 16:39:57 tonnant kernel: kworker/u:0 used greatest stack depth: 2976 bytes left

Jon.
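
To line up log entries like the ones quoted above against the moments the corruption appears, something along these lines could pull the timestamp and source out of each /var/log/messages line. This is a rough Python sketch, not a tool used in this bug. The names LINE_RE, parse_line and kernel_messages are invented for illustration, the regular expression only covers the traditional syslog layout seen in the excerpt, and because that layout carries no year the default of 2010 is an assumption taken from the comment dates.

```python
import re
from datetime import datetime

# "Sep 20 16:39:16 tonnant kernel: ..." or "... abrt[2107]: ..."
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d\d:\d\d:\d\d) (?P<host>\S+) "
    r"(?P<tag>[^\s:\[]+)(?:\[(?P<pid>\d+)\])?: (?P<msg>.*)$"
)


def parse_line(line, year=2010):
    """Return (timestamp, tag, message), or None if the line does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    when = datetime.strptime(
        f"{year} {match['stamp']}", "%Y %b %d %H:%M:%S"
    )
    return when, match["tag"], match["msg"]


def kernel_messages(lines, year=2010):
    """Yield (timestamp, message) for the lines logged by the kernel."""
    for line in lines:
        parsed = parse_line(line, year)
        if parsed is not None and parsed[1] == "kernel":
            yield parsed[0], parsed[2]
```

Feeding the excerpt above through kernel_messages would keep only the segfault line and the kworker stack depth line, which are the two entries that came from the kernel rather than from abrt or abrtd.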

Comment 24 atswartz 2010-09-22 20:31:38 UTC
(In reply to comment #20)
> When I run the rawhide kernel on F13 I don't see any of those video artifacts.

When I run the rawhide kernel (2.6.36-0.24) on F14, I do see the artifacts.  Yet none of the standard F14 kernels produce the artifacts.

Comment 25 atswartz 2010-09-22 21:18:29 UTC
(In reply to comment #15)

> I'll perhaps also try a config without lockdep
> and debugging options enabled. If it continues, I'll need to let Linus know.
> 
> Jon.
You are on to something here.  This difference may be enough.
< # CONFIG_LOCKUP_DETECTOR is not set
< # CONFIG_HARDLOCKUP_DETECTOR is not set
> CONFIG_LOCKUP_DETECTOR=y
> CONFIG_HARDLOCKUP_DETECTOR=y
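
A plain diff of two kernel configs is noisy when the files are ordered differently, so a comparison by option name may be easier to read. Below is a small Python sketch of that idea. It is an illustration only, not how the diff above was produced, and the function names parse_config and diff_configs are made up. It treats "# CONFIG_X is not set" and a missing CONFIG_X as the same thing, which is a simplification.

```python
def parse_config(text):
    """Map option name to value; options marked 'is not set' map to None."""
    suffix = " is not set"
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            options[line[2:-len(suffix)]] = None
        elif line.startswith("CONFIG_") and "=" in line:
            name, _, value = line.partition("=")
            options[name] = value
    return options


def diff_configs(first_text, second_text):
    """Return {option: (first value, second value)} for differing options."""
    first, second = parse_config(first_text), parse_config(second_text)
    return {
        name: (first.get(name), second.get(name))
        for name in sorted(first.keys() | second.keys())
        if first.get(name) != second.get(name)
    }
```

Run on the contents of two config files, diff_configs reports only the options whose values differ, for example CONFIG_LOCKUP_DETECTOR and CONFIG_HARDLOCKUP_DETECTOR in the case above.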

Comment 26 atswartz 2010-09-22 23:08:09 UTC
(In reply to comment #25)
> You are on to something here.  This difference may be enough.
< # CONFIG_LOCKUP_DETECTOR is not set
< # CONFIG_HARDLOCKUP_DETECTOR is not set
> CONFIG_LOCKUP_DETECTOR=y
> CONFIG_HARDLOCKUP_DETECTOR=y

No luck.  Needs more changes.  I went back to the config that was working.

Comment 27 Jon Masters 2010-09-24 17:53:38 UTC
We'll do some poking then I think.

Comment 28 atswartz 2010-10-17 00:07:54 UTC
This particular problem has been fixed for me in kernel-2.6.36-0.39.rc8.git0.fc15.x86_64.

Comment 29 Fedora End Of Life 2013-04-03 18:43:09 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle.
Changing version to '19'.

(As we did not run this process for some time, it could also affect pre-Fedora 19 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.)

More information and the reason for this action are here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19