Bug 426597

Summary: intermittent failure on gdm login
Product: [Fedora] Fedora Reporter: Tom London <selinux>
Component: xorg-x11-drv-i810Assignee: Adam Jackson <ajax>
Status: CLOSED RAWHIDE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: low    
Version: rawhideCC: xgl-maint
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2008-03-05 15:05:51 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Xorg.0.log when system failed at login
none
Xorg.0.log when system succeded at login
none
/var/log/messages from stalled session
none
/var/log/messages from 'good' (non-stalled) session none

Description Tom London 2007-12-22 20:33:46 UTC
Description of problem:
Intermittently, after providing gdm login credentials, the screen blanks, the
cursor is 'resized' (from 800x600 to 1280x1024) and then ...... nothing.

On one such run, I managed to 'Ctl-Alt-F1' and captured /var/log/Xorg.0.log and
compared it to the very next attempt (I rebooted).

The 'failing' attempt shows this in Xorg.0.log:

[root@localhost ~]# diff Xorg.0.log /var/log | more
15c15
< (==) Log file: "/var/log/Xorg.0.log", Time: Sat Dec 22 10:05:41 2007
---
> (==) Log file: "/var/log/Xorg.0.log", Time: Sat Dec 22 10:09:31 2007
1629,1821c1629,1903
< (II) AIGLX: Suspending AIGLX clients for VT switch
< (II) intel(0): Comparing regs from server start up to After LeaveVT
< (II) intel(0): DumpRegsBegin
< (II) intel(0):    VCLK_DIVISOR_VGA0: 0x00031108 (n = 3, m1 = 17, m2 = 8)
< (II) intel(0):    VCLK_DIVISOR_VGA1: 0x00031406 (n = 3, m1 = 20, m2 = 6)
< (II) intel(0):        VCLK_POST_DIV: 0x00020002 (vga0 p1 = 4, p2 = 2, vga1 p1 
= 2, p2 = 2)
< (II) intel(0):            DPLL_TEST: 0x00010001 ()
< (II) intel(0):              D_STATE: 0x00000000
< (II) intel(0):        DSPCLK_GATE_D: 0x00001000
< (II) intel(0):       RENCLK_GATE_D1: 0x00000000
< (II) intel(0):       RENCLK_GATE_D2: 0x00000000
< (II) intel(0):                SDVOB: 0x00480000 (disabled, pipe A, stall disab
led, not detected)
<<<<< SNIP >>>>>

Am attaching complete logs (failing and working)

Version-Release number of selected component (if applicable):
xorg-x11-drv-i810-2.2.0-2.fc9
xorg-x11-server-Xorg-1.4.99.1-0.13.fc9

How reproducible:
intermittent

Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 Tom London 2007-12-22 20:33:46 UTC
Created attachment 290298 [details]
Xorg.0.log when system failed at login

Comment 2 Tom London 2007-12-22 20:34:30 UTC
Created attachment 290299 [details]
Xorg.0.log when system succeded at login

Comment 3 Tom London 2007-12-23 17:16:17 UTC
Happened again this morning.

This time, I tried 'Ctl-Alt-BS' from the 'black screen with only the cursor
showing'. It put me back on console (e.g., same as if I had entered 'Ctl-Alt-F1').

Not sure what info to collect.  I did do 'ps algx' in this state, I insert below.

I then tried 'telinit 3' (appeared to work), followed by 'telinit 5' (it product
a completely blank monitor, nothing, not even cursor). I had to do a hard reset.

Here is output from 'ps algx' after 'Ctl-Alt-BS' and before 'telinit 3':
[root@localhost ~]# cat ps-stall.1
  PID TTY      STAT   TIME COMMAND
    1 ?        Ss     0:01 init [5]                                           
    2 ?        S<     0:00 [kthreadd]
    3 ?        S<     0:00 [migration/0]
    4 ?        S<     0:00 [ksoftirqd/0]
    5 ?        S<     0:00 [watchdog/0]
    6 ?        S<     0:00 [migration/1]
    7 ?        S<     0:00 [ksoftirqd/1]
    8 ?        S<     0:00 [watchdog/1]
    9 ?        S<     0:00 [events/0]
   10 ?        R<     0:00 [events/1]
   11 ?        S<     0:00 [khelper]
   67 ?        S<     0:00 [kblockd/0]
   68 ?        S<     0:00 [kblockd/1]
   71 ?        S<     0:00 [kacpid]
   72 ?        S<     0:00 [kacpi_notify]
  167 ?        S<     0:00 [cqueue/0]
  168 ?        S<     0:00 [cqueue/1]
  170 ?        S<     0:00 [ksuspend_usbd]
  175 ?        S<     0:00 [khubd]
  178 ?        S<     0:00 [kseriod]
  220 ?        S      0:00 [pdflush]
  221 ?        S      0:00 [pdflush]
  222 ?        S<     0:00 [kswapd0]
  304 ?        S<     0:00 [aio/0]
  305 ?        S<     0:00 [aio/1]
  460 ?        S<     0:00 [pccardd]
  474 ?        S<     0:00 [kpsmoused]
  534 ?        S<     0:00 [ata/0]
  535 ?        S<     0:00 [ata/1]
  536 ?        S<     0:00 [ata_aux]
  540 ?        S<     0:00 [scsi_eh_0]
  541 ?        S<     0:00 [scsi_eh_1]
  542 ?        S<     0:00 [scsi_eh_2]
  543 ?        S<     0:00 [scsi_eh_3]
  556 ?        S<     0:00 [ksnapd]
  567 ?        S<     0:00 [kjournald]
  601 ?        S<     0:00 [kauditd]
  638 ?        S<s    0:00 /sbin/udevd -d
 1200 ?        S<     0:00 [iwl3945/0]
 1204 ?        S<     0:00 [iwl3945/1]
 1325 ?        S<     0:00 [kmmcd]
 1395 ?        S<     0:00 [iwl3945]
 1471 ?        S<     0:00 [scsi_eh_4]
 1472 ?        S<     0:00 [scsi_eh_5]
 1711 ?        S<     0:00 [kmpathd/0]
 1712 ?        S<     0:00 [kmpathd/1]
 1791 ?        S<     0:00 [kjournald]
 1804 ?        Ss     0:00 /sbin/mount.ntfs-3g /dev/sda1 /mnt/windows -o rw
 1928 ?        S<     0:00 [kondemand/0]
 1929 ?        S<     0:00 [kondemand/1]
 2180 ?        Ss     0:00 /usr/sbin/restorecond
 2189 ?        S<sl   0:00 auditd
 2191 ?        S<sl   0:00 /sbin/audispd
 2216 ?        Ss     0:00 mcstransd
 2230 ?        Sl     0:00 rsyslogd -m 0
 2234 ?        Ss     0:00 rklogd -x
 2245 ?        Ssl    0:00 dbus-daemon --system
 2258 ?        Ssl    0:00 /usr/bin/python -E /usr/sbin/setroubleshootd
 2268 ?        Ss     0:00 /usr/sbin/acpid
 2277 ?        Ss     0:00 cupsd
 2288 ?        Ss     0:00 /usr/sbin/sshd
 2303 ?        Ss     0:00 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
 2312 ?        S      0:00 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
 2313 ?        Ssl    0:00 console-kit-daemon
 2384 ?        Ss     0:00 crond
 2392 ?        Ss     0:00 /usr/sbin/atd
 2401 ?        SLs    0:00 /usr/sbin/audio-entropyd
 2411 ?        Ss     0:00 avahi-daemon: running [linux.local]
 2412 ?        Ss     0:00 avahi-daemon: chroot helper
 2422 ?        Ss     0:00 hald
 2423 ?        S      0:00 hald-runner
 2437 ?        S      0:00 hald-addon-input: Listening on /dev/input/event4
/dev/input/event3 /dev/input/event11 /dev/input/event10 /dev/input/event1
/dev/input/event7 /dev/input/event8 /dev/input/event9
 2453 ?        S      0:00 /usr/libexec/hald-addon-cpufreq
 2454 ?        S      0:00 hald-addon-acpi: listening on acpid socket
/var/run/acpid.socket
 2476 ?        S      0:00 hald-addon-storage: polling /dev/sr0 (every 2 sec)
 2486 ?        Ss     0:00 NetworkManager
--pid-file=/var/run/NetworkManager/NetworkManager.pid
 2501 ?        Ss     0:00 NetworkManagerDispatcher
--pid-file=/var/run/NetworkManager/NetworkManagerDispatcher.pid
 2510 ?        SNs    0:00 anacron -s
 2529 ?        S      0:00 /usr/sbin/smartd -q never
 2533 ?        Ss     0:00 login -- root     
 2534 tty2     Ss+    0:00 /sbin/mingetty tty2
 2535 tty3     Ss+    0:00 /sbin/mingetty tty3
 2536 tty4     Ss+    0:00 /sbin/mingetty tty4
 2537 tty5     Ss+    0:00 /sbin/mingetty tty5
 2538 tty6     Ss+    0:00 /sbin/mingetty tty6
 2539 ?        Ss     0:00 /usr/sbin/gdm-binary -nodaemon
 2599 ?        S      0:00 /usr/libexec/gdm-simple-slave --display-id
/org/gnome/DisplayManager/Display1
 2600 ?        S      0:00 /usr/libexec/gdm-simple-slave --display-id
/org/gnome/DisplayManager/Display2
 2607 ?        S      0:00 /usr/sbin/wpa_supplicant -c
/etc/wpa_supplicant/wpa_supplicant.conf -u -f
 2675 ?        S      0:00 /usr/libexec/gdm-session-worker
 2727 ?        Sl     0:00 /usr/bin/gnome-keyring-daemon -d --login
 2728 ?        Ss     0:00 gnome-session
 2792 ?        Ss     0:00 /usr/bin/ssh-agent /bin/sh -c exec -l /bin/bash -c
"/usr/bin/dbus-launch --exit-with-session gnome-session"
 2825 ?        S      0:00 /usr/libexec/gconfd-2 7
 2884 tty1     Ss     0:00 -bash
 2972 tty1     R+     0:00 ps agx
[root@localhost ~]# 



Comment 4 Tom London 2007-12-24 17:23:09 UTC
Happened again this morning.

I've captured /var/log/messages (with GDM debug output) from the stalled session
as well as from the reboot immediately after (that did not stall).

Attached below.

Comment 5 Tom London 2007-12-24 17:23:49 UTC
Created attachment 290353 [details]
/var/log/messages from stalled session

Comment 6 Tom London 2007-12-24 17:24:43 UTC
Created attachment 290354 [details]
/var/log/messages from 'good' (non-stalled) session

Comment 7 Tom London 2007-12-24 23:20:41 UTC
Digging into the /var/log/messages files, the behavior appears similar until
roughly here:

messages-stalled:
Dec 24 09:08:51 localhost gdm-simple-slave[2597]: DEBUG: Writing wtmp session
record to /var/log/wtmp
Dec 24 09:08:51 localhost gdm-simple-slave[2597]: DEBUG: Adding new utmp record
Dec 24 09:08:51 localhost gdm-simple-slave[2597]: DEBUG: GdmSimpleSlave: session
started
Dec 24 09:08:55 localhost gconfd (tbl-2823): starting (version 2.20.1), pid 2823
user 'tbl'
Dec 24 09:08:55 localhost gconfd (tbl-2823): Resolved address
"xml:readonly:/etc/gconf/gconf.xml.mandatory" to a read-only configuration
source at position 0
Dec 24 09:08:55 localhost gconfd (tbl-2823): Resolved address
"xml:readwrite:/home/tbl/.gconf" to a writable configuration source at position 1
Dec 24 09:08:55 localhost gconfd (tbl-2823): Resolved address
"xml:readonly:/etc/gconf/gconf.xml.system" to a read-only configuration source
at position 2
Dec 24 09:08:55 localhost gconfd (tbl-2823): Resolved address
"xml:readonly:/etc/gconf/gconf.xml.defaults" to a read-only configuration source
at position 3
<<<< STALLS HERE>>>

While in messages-good:
Dec 24 09:16:16 localhost gdm-simple-slave[2601]: DEBUG: Writing wtmp session
record to /var/log/wtmp
Dec 24 09:16:16 localhost gdm-simple-slave[2601]: DEBUG: Adding new utmp record
Dec 24 09:16:16 localhost gdm-simple-slave[2601]: DEBUG: GdmSimpleSlave: session
started
Dec 24 09:16:19 localhost gconfd (tbl-2827): starting (version 2.20.1), pid 2827
user 'tbl'

Dec 24 09:16:19 localhost gconfd (tbl-2827): Resolved address
"xml:readonly:/etc/gconf/gconf.xml.mandatory" to a read-only configuration
source at position 0
Dec 24 09:16:19 localhost gconfd (tbl-2827): Resolved address
"xml:readwrite:/home/tbl/.gconf" to a writable configuration source at position 1
Dec 24 09:16:19 localhost gconfd (tbl-2827): Resolved address
"xml:readonly:/etc/gconf/gconf.xml.system" to a read-only configuration source
at position 2
Dec 24 09:16:19 localhost gconfd (tbl-2827): Resolved address
"xml:readonly:/etc/gconf/gconf.xml.defaults" to a read-only configuration source
at position 3
Dec 24 09:16:24 localhost kernel: wlan0: no IPv6 routers present
Dec 24 09:16:25 localhost pulseaudio[2839]: pid.c: Stale PID file, overwriting.
Dec 24 09:16:25 localhost pulseaudio[2839]: main.c: setrlimit(RLIMIT_NICE, (31,
31)) failed: Operation not permitted
Dec 24 09:16:25 localhost pulseaudio[2839]: alsa-util.c: Cannot find fallback
mixer control "Mic".
Dec 24 09:16:27 localhost pulseaudio[2839]: module.c: Failed to load  module
"module-rtp-recv" (argument: ""): initialization failed.
Dec 24 09:16:27 localhost pulseaudio[2839]: module-gconf.c: pa_module_load() failed
Dec 24 09:16:38 localhost kernel: CPU0 attaching NULL sched-domain.
Dec 24 09:16:38 localhost kernel: CPU1 attaching NULL sched-domain.
Dec 24 09:16:38 localhost kernel: CPU0 attaching sched-domain:
<<< LOTS MORE STUFF >>>


Comment 8 Tom London 2007-12-26 16:39:01 UTC
I found newer BIOS for the X60 and installed (changelog mentioned something
about support for external monitor).

I noticed the following 'diff' of Xorg.0.log files between the before and the
after (when the system 'came up' successfully).

[root@localhost ~]# diff Xorg*ing newest | more
15c15
< (==) Log file: "/var/log/Xorg.0.log", Time: Sat Dec 22 10:09:31 2007
---
> (==) Log file: "/var/log/Xorg.0.log", Time: Wed Dec 26 08:03:54 2007
44c44
< (--) PCI: (0@0:2:1) unknown vendor (0x8086) Mobile 945GM/GMS, 943/940GML Expre
ss Integrated Graphics Controller rev 3, Mem @ 0xee180000/0
---
> (--) PCI: (0@0:2:1) unknown vendor (0x8086) Mobile 945GM/GMS/GME, 943/940GML E
xpress Integrated Graphics Controller rev 3, Mem @ 0xee180000/0
343c343
< (II) intel(0):                 CR0f: 0x70
---
> (II) intel(0):                 CR0f: 0x20
976c976
< (II) intel(0):                 CR0f: 0x70
---
> (II) intel(0):                 CR0f: 0x20
1903d1902
< (II) 3rd Button detected: disabling emulate3Button
[root@localhost ~]# 

So, the 'device ID' is slightly different, and some values in the register dump
are different.

The only other change is that the iwl3945 kill switch is now set to disable WiFi.


Any of this significant?

Comment 9 Tom London 2008-01-01 21:19:42 UTC
I continue to get hit by this about 1 in 4 or 5 boots.

Is there any additional information/logs I can provide? 


Comment 10 Tom London 2008-01-07 15:01:08 UTC
Just got hit by this 5 consecutive reboots.

Can I provide any additional info?

Am I misdirecting this BZ?



Comment 11 Tom London 2008-03-05 15:05:51 UTC
This looks stale to me.

Haven't seen this in quite a while.

Closing.