72295 – gdm kills network connectivity on X restart

Summary: gdm kills network connectivity on X restart

Keywords:
Status:	CLOSED RAWHIDE
Alias:	None
Product:	Red Hat Linux
Classification:	Retired
Component:	gdm
Sub Component:
Version:	8.0
Hardware:	i386
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	---
Assignee:	Owen Taylor
QA Contact:	Mike McLean
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	67218 79579

Reported:	2002-08-22 19:47 UTC by Mark Cooke
Modified:	2005-10-31 22:00 UTC
CC List:	2 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2003-02-04 21:29:14 UTC
Embargoed:

Description Mark Cooke 2002-08-22 19:47:03 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20020809

Description of problem:
If you press Ctrl+Alt+Backspace at the gdm login screen or whilst still logged in,
gdm restarts but all network connectivity is lost, so you
have to run '/etc/init.d/network restart' to get it back.

This is probably a gdm issue (but I'm not sure), as I also had this using gdm on Gnome2
on Valhalla, and it also happens on 2 other PCs, both with completely
different hardware and network cards.


Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Press Ctrl+Alt+Backspace whilst logged in or whilst gdm is up


Actual Results:  kills all network connectivity: unable to ping or gain access
to any other PC on the network (connectivity to the NIS server, NFS shares and
so forth is lost)

Expected Results:  restart X cleanly without killing network

Additional info:

Taken from /var/log/messages

-------------------- cut ----------------------

Aug 22 20:05:25 stimpy gdm[868]: (child 915) gdm_slave_xioerror_handler: Fatal X error - Restarting :0
Aug 22 20:05:28 stimpy netfs: Unmounting NFS filesystems:  succeeded
Aug 22 20:05:30 stimpy network: Shutting down interface eth0:  succeeded
Aug 22 20:05:30 stimpy network: Shutting down loopback interface:  succeeded
Aug 22 20:05:31 stimpy /etc/hotplug/net.agent: NET unregister event not supported
Aug 22 20:05:31 stimpy apmd[596]: User Suspend
Aug 22 20:05:31 stimpy kernel: apm: suspend was vetoed.
Aug 22 20:05:32 stimpy gdm[6205]: gdm_slave_xioerror_handler: Fatal X error - Restarting :0

-------------------- cut ----------------------

Comment 1 George Lebl 2002-08-23 01:01:22 UTC

Damnit.  This looks like gdm is killing something quite random or some such.  I
need to check what's happening there.  There could be a race in the xioerror
handler stuff.

Comment 2 Havoc Pennington 2002-08-23 01:26:23 UTC

As a datapoint, does not happen for me on my test machine.

Comment 3 George Lebl 2002-08-23 06:42:35 UTC

Does not happen here either, though I may have fixed two minor problems that
could perhaps avoid some races.  (I couldn't find any races, but the fact that
gdm_slave_xioerror_handler got called twice, and once within a signal, since the
message has been proxied through the protocol, tells me that something went
horribly wrong somewhere.)  Can you (the reporter) turn on debugging
(debug/Enable=true in the config file) and then give me what it dumps into the
syslog?
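
For readers unfamiliar with the "debug/Enable=true" notation, here is a minimal sketch of how such a setting might look and be read. It assumes the notation means a key named Enable in a [debug] section of an INI-style file. The thread does not show the file itself, so the sample text and the helper name debug_enabled are illustrative only, not gdm's own parser.

```python
import configparser

# Assumed layout: "debug/Enable=true" read as section [debug], key Enable.
SAMPLE_CONFIG = """
[debug]
Enable=true
"""


def debug_enabled(config_text):
    """Return True if the [debug] section sets Enable to a true value."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # fallback=False covers a missing section or a missing key.
    return parser.getboolean("debug", "Enable", fallback=False)


if __name__ == "__main__":
    print(debug_enabled(SAMPLE_CONFIG))
```

With debugging on, gdm logs the function-level messages of the kind shown in the later syslog excerpt.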

Comment 4 George Lebl 2002-08-23 07:47:00 UTC

Damnit, I keep finding flaws and subtle races, though none that could explain
the above, I don't think.  A very fun race can happen at times, though, forcing
gdm to do kill (0, whatever), which would explain the above if not run with
setsid, and this happens currently when run with -nodaemon.  I can avoid these
races, and I've now fixed all such kills in the code by pushing a block on
SIGCHLD.  I'm testing currently and will commit the cleanup to CVS soon.
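
The kind of race described above can be sketched as follows. This is an illustration under assumptions, not gdm's code: it assumes the race is a SIGCHLD handler resetting a stored child pid to 0 between the "is there a child?" check and the kill call, which would turn the call into kill(0, ...) and signal the whole process group. The names tracked_pid, signal_child_safely and demo are hypothetical.

```python
import os
import signal
import time

tracked_pid = 0  # hypothetical stand-in for a stored child pid; 0 means "no child"


def _on_sigchld(signum, frame):
    """Reap the tracked child and clear its pid, as a child handler might."""
    global tracked_pid
    if tracked_pid > 0:
        try:
            reaped, _ = os.waitpid(tracked_pid, os.WNOHANG)
        except ChildProcessError:
            reaped = tracked_pid
        if reaped == tracked_pid:
            tracked_pid = 0


def signal_child_safely(sig=signal.SIGTERM):
    """Check and kill with SIGCHLD blocked, so the pid cannot become 0 in between."""
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGCHLD})
    try:
        if tracked_pid > 0:
            os.kill(tracked_pid, sig)  # never reached with pid 0
            return True
        return False
    finally:
        # Restoring the mask lets any pending SIGCHLD be delivered now.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)


def demo():
    """Kill a child once, wait for the handler to clear the pid, then try again."""
    global tracked_pid
    previous = signal.signal(signal.SIGCHLD, _on_sigchld)
    try:
        pid = os.fork()
        if pid == 0:
            signal.signal(signal.SIGTERM, signal.SIG_DFL)
            time.sleep(30)
            os._exit(0)
        tracked_pid = pid
        first = signal_child_safely()
        deadline = time.monotonic() + 5
        while tracked_pid != 0 and time.monotonic() < deadline:
            time.sleep(0.01)
        second = signal_child_safely()  # pid is 0 now, so nothing is signalled
        return first, second
    finally:
        if previous is not None:
            signal.signal(signal.SIGCHLD, previous)


if __name__ == "__main__":
    print(demo())
```

The second call returns without signalling anything, which is the behaviour the guard is meant to ensure once the child is gone.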

Comment 5 George Lebl 2002-08-23 07:54:29 UTC

If Red Hat wants a quick fix which doesn't fix the race but should minimize its
effects, they should add setsid in gdm.c, in the branch of the if statement before
we call gdm_daemonize.  Then gdm will run in its own session and a kill(0,...)
will just kill gdm itself rather than random other stuff that may be in the
process group.
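
A small sketch of why setsid matters here, assuming only standard POSIX behaviour: a forked child stays in its parent's process group unless it calls setsid, and kill(0, ...) signals every process in the caller's group. The helper name child_pgid is hypothetical; it just reports which group a forked child ends up in.

```python
import os


def child_pgid(new_session):
    """Fork a child, optionally call setsid() in it, return (child_pid, child_pgid)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(read_fd)
            if new_session:
                os.setsid()  # new session, new process group led by this child
            os.write(write_fd, str(os.getpgrp()).encode())
        finally:
            os._exit(0)
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        pgid = int(reader.read())
    os.waitpid(pid, 0)
    return pid, pgid


if __name__ == "__main__":
    print("parent group:", os.getpgrp())
    print("child without setsid:", child_pgid(False))
    print("child with setsid:", child_pgid(True))
```

Without setsid the child reports the parent's group, so a stray kill(0, ...) from it would also reach whatever else shares that group; with setsid the child's group contains only itself.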

Comment 6 Mark Cooke 2002-08-23 08:33:54 UTC

No problem, will enable debugging and report back the output.

Comment 7 Mark Cooke 2002-08-23 18:40:35 UTC

Debugging information for gdm, taken from /var/log/messages.
This was captured straight after pressing Ctrl+Alt+Backspace.
-----------------------------------------------------------

Aug 23 19:31:35 stimpy gdm[871]: (child 976) gdm_slave_child_handler
Aug 23 19:31:35 stimpy gdm[871]: (child 976) gdm_slave_child_handler: 1170 died
Aug 23 19:31:35 stimpy gdm[871]: (child 976) gdm_slave_child_handler: 1170 returned 1
Aug 23 19:31:35 stimpy gdm[871]: (child 976) gdm_slave_xioerror_handler: I/O error for display :0
Aug 23 19:31:35 stimpy gdm[871]: (child 976) gdm_slave_xioerror_handler: Fatal X error - Restarting :0
Aug 23 19:31:35 stimpy gdm[871]: (child 976) gdm_server_stop: Server for :0 going down!
Aug 23 19:31:35 stimpy gdm[871]: (child 976) gdm_server_stop: Killing server pid 977
Aug 23 19:31:35 stimpy gdm[871]: mainloop_sig_callback: Got signal 17
Aug 23 19:31:35 stimpy gdm[871]: gdm_cleanup_children: child 976 returned 2
Aug 23 19:31:35 stimpy gdm[871]: gdm_child_action: Slave process returned 2
Aug 23 19:31:35 stimpy gdm[871]: gdm_display_manage: Managing :0
Aug 23 19:31:35 stimpy gdm[871]: Resetting counts for loop of death detection, 90 seconds elapsed.
Aug 23 19:31:35 stimpy gdm[871]: gdm_display_manage: Forked slave: 1866
Aug 23 19:31:35 stimpy gdm[871]: (child 976) gdm_server_stop: Server pid 977 dead
Aug 23 19:31:35 stimpy gdm[871]: main: Exited main loop
Aug 23 19:31:35 stimpy gdm[1866]: gdm_slave_start: Starting slave process for :0
Aug 23 19:31:35 stimpy gdm[1866]: gdm_slave_start: Loop Thingie
Aug 23 19:31:35 stimpy gdm[1866]: Sending VT_NUM == -1 for slave 1866
Aug 23 19:31:35 stimpy gdm[1866]: Sending VT_NUM 1866 -1
Aug 23 19:31:35 stimpy gdm[871]: Handling message: 'VT_NUM 1866 -1'
Aug 23 19:31:35 stimpy gdm[871]: Got VT_NUM == -1
Aug 23 19:31:35 stimpy gdm[871]: (child 1866) gdm_slave_usr2_handler: :0 got USR2 signal
Aug 23 19:31:35 stimpy gdm[1866]: gdm_server_start: :0
Aug 23 19:31:35 stimpy gdm[1866]: gdm_auth_secure_display: Setting up access for :0
Aug 23 19:31:35 stimpy gdm[1866]: gdm_auth_secure_display: Setting up socket access
Aug 23 19:31:35 stimpy gdm[1866]: gdm_auth_secure_display: Setting up network access
Aug 23 19:31:35 stimpy gdm[1866]: gdm_auth_secure_display: Setting up access for :0 - 5 entries
Aug 23 19:31:35 stimpy gdm[1866]: Sending COOKIE == <secret> for slave 1866
Aug 23 19:31:35 stimpy gdm[1866]: Sending COOKIE 1866 0c6841d7f4e1baf5b7d164dc234d6e8d
Aug 23 19:31:35 stimpy gdm[871]: Handling message: 'COOKIE 1866 0c...'
Aug 23 19:31:35 stimpy gdm[871]: Got COOKIE == <secret>
Aug 23 19:31:35 stimpy gdm[871]: (child 1866) gdm_slave_usr2_handler: :0 got USR2 signal
Aug 23 19:31:35 stimpy gdm[1866]: gdm_server_spawn: Forked server on pid 1867
Aug 23 19:31:35 stimpy gdm[1867]: gdm_server_spawn: '/usr/X11R6/bin/X :0 -auth /var/gdm/:0.Xauth'
Aug 23 19:31:35 stimpy gdm[871]: (child 1866) gdm_server_usr1_handler: Got SIGUSR1, server running
Aug 23 19:31:35 stimpy gdm[1866]: gdm_server_start: Before mainloop waiting for server
Aug 23 19:31:35 stimpy gdm[1866]: gdm_server_start: After mainloop waiting for server
Aug 23 19:31:35 stimpy gdm[1866]: gdm_server_start: Completed :0!
Aug 23 19:31:35 stimpy gdm[1866]: Sending XPID == 1867 for slave 1866
Aug 23 19:31:35 stimpy gdm[1866]: Sending XPID 1866 1867
Aug 23 19:31:35 stimpy gdm[871]: Handling message: 'XPID 1866 1867'
Aug 23 19:31:35 stimpy gdm[871]: Got XPID == 1867
Aug 23 19:31:35 stimpy gdm[1866]: gdm_slave_run: Opening display :0
Aug 23 19:31:35 stimpy gdm[871]: (child 1866) gdm_slave_usr2_handler: :0 got USR2 signal
Aug 23 19:31:36 stimpy netfs: Unmounting NFS filesystems:  succeeded
Aug 23 19:31:38 stimpy network: Shutting down interface eth0:  succeeded
Aug 23 19:31:38 stimpy network: Shutting down loopback interface:  succeeded
Aug 23 19:31:38 stimpy /etc/hotplug/net.agent: NET unregister event not supported
Aug 23 19:31:39 stimpy apmd[599]: User Suspend
Aug 23 19:31:39 stimpy kernel: apm: suspend was vetoed.
Aug 23 19:31:40 stimpy gdm[1866]: Sending START_NEXT_LOCAL
Aug 23 19:31:40 stimpy gdm[871]: Handling message: 'START_NEXT_LOCAL'
Aug 23 19:31:40 stimpy gdm[1866]: gdm_slave_greeter: Running greeter on :0
Aug 23 19:31:40 stimpy gdm[1866]: gdm_slave_greeter: Greeter on pid 2060
Aug 23 19:31:40 stimpy gdm[1866]: Sending GREETPID == 2060 for slave 1866
Aug 23 19:31:40 stimpy gdm[1866]: Sending GREETPID 1866 2060
Aug 23 19:31:40 stimpy gdm[871]: (child 1866) gdm_slave_child_handler
Aug 23 19:31:40 stimpy gdm[871]: Handling message: 'GREETPID 1866 2060'
Aug 23 19:31:40 stimpy gdm[871]: Got GREETPID == 2060
Aug 23 19:31:40 stimpy gdm[871]: (child 1866) gdm_slave_usr2_handler: :0 got USR2 signal
Aug 23 19:31:44 stimpy gdm[1866]: gdm_slave_wait_for_login: In loop

Comment 8 George Lebl 2002-08-23 20:48:34 UTC

Wow, I've been staring at the code and fixing problems but I can't see the
precise problem.  I think it may just be one of the races I've recently fixed in
CVS.  Can you try out of CVS?  If not, can you try a tarball?  I'll make a .9
version soon (maybe today or tomorrow).

For Red Hat: the setsid is wrong where I put it before.  If in -nodaemon it should
not do setsid (I'm a dumbass).  But the slave should do it when it forks.  Note
that the slave fork is in display.c and not slave.c.  This should also remove
some minor races when you change init levels in case you start gdm from init.
The main daemon doesn't really try to kill so many things so I doubt the error
was there.  If you do a setsid in the slave, at worst the slave kills itself, so
I'd recommend this one-liner fix in case Red Hat doesn't want to update to .9 now.
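
The "at worst the slave kills itself" claim can be sketched like this. It is an illustration of the fork-then-setsid pattern using standard POSIX calls, not the actual change made in display.c, and the function name run_isolated_slave is hypothetical. The forked child starts its own session and then deliberately issues kill(0, SIGTERM); only the child should die.

```python
import os
import signal


def run_isolated_slave():
    """Fork a child that calls setsid() and then kill(0, SIGTERM) on itself.

    Returns True if the child was terminated by SIGTERM while the parent survived.
    """
    pid = os.fork()
    if pid == 0:
        try:
            signal.signal(signal.SIGTERM, signal.SIG_DFL)
            os.setsid()  # after fork the child is not a group leader, so this succeeds
            # Safety check: only signal the group if it really is our own.
            if os.getpgrp() == os.getpid():
                os.kill(0, signal.SIGTERM)  # reaches only the new group
        finally:
            os._exit(2)  # only reached if the signal did not terminate us
    _, status = os.waitpid(pid, 0)
    return os.WIFSIGNALED(status) and os.WTERMSIG(status) == signal.SIGTERM


if __name__ == "__main__":
    print("child contained its own kill(0):", run_isolated_slave())
```

The parent reaching the final line at all shows that the signal stayed inside the child's session.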

Comment 9 Havoc Pennington 2002-08-25 17:52:19 UTC

I put George's fix in 2.4.0.7-5.

segfault: as we haven't reproduced the problem, we would appreciate you checking
whether this fixes it when the new gdm shows up in rawhide. Please close/reopen
the bug accordingly.

Comment 10 Mark Cooke 2002-08-25 18:19:23 UTC

Will check out rawhide, but I can reproduce it on at least 6 test machines now.

In case this affects anything, they are all using NFS mounts (/pub/....) and NIS
for passwds (for users).
Will report back when 2.4.0.7-5 becomes available in rawhide.

Comment 11 Havoc Pennington 2002-08-25 18:51:58 UTC

Hmm, let's speed this up in fact. ;-)
http://people.redhat.com/~hp/gdm-2.4.0.7-5.i386.rpm
(assuming x86)

Comment 12 Mark Cooke 2002-08-25 19:34:13 UTC

Yup.. (got the rpm from Havoc's directory) that fixed the network problem,
except now, for some reason, it switches off the monitor at times (not
every time), and you then have to press Return to get gdm back.
This is livable, but it would be nice not to have to do this.

Well done everyone :-)

Comment 13 Jay Turner 2002-09-03 20:05:38 UTC

Closing out as resolved.

Comment 14 Mark Cooke 2002-10-13 04:52:48 UTC

This is still happening on the official 8.0 release. I can reproduce it on 4
machines, and other members have reported the same still happening.

The rpm that I downloaded from the link Havoc mentioned above fixed it in Null,
but it's come back in Psyche.

Comment 15 Havoc Pennington 2002-10-13 05:52:36 UTC

Maybe it was never really fixed - the final version still seems to have the
patch that we thought fixed this.

Comment 16 Havoc Pennington 2002-12-20 01:13:11 UTC

gdm 2.4.0.12, to appear in rawhide shortly, has George's full upstream fix.

Maybe check what the status is in the next beta to come out.

Comment 17 Owen Taylor 2003-01-16 21:19:45 UTC

Taking gdm target/blockers.

Comment 18 Owen Taylor 2003-01-20 23:28:17 UTC

Could people test with 2.4.1.0/2.4.1.1 from Raw Hide and see if it 
still occurs?

Comment 19 Mark Cooke 2003-01-26 20:43:06 UTC

After recompiling several rawhide packages, this now seems to be fixed, on my
system anyway.

Mark

Comment 20 Owen Taylor 2003-02-04 21:29:14 UTC

Thanks for the testing.
