Bug 190603

Summary: metacity when used purely with a remote X display, freezes regularly
Product: Red Hat Enterprise Linux 4
Component: kernel
Version: 4.0
Hardware: i386
OS: Linux
Status: CLOSED WONTFIX
Severity: medium
Priority: medium
Reporter: Yves Perrenoud <yves-redhat>
Assignee: Red Hat Kernel Manager <kernel-mgr>
QA Contact: Red Hat Kernel QE team <kernel-qe>
CC: david.barnwell, pablo.iranzo
Doc Type: Bug Fix
Last Closed: 2012-06-20 16:04:16 UTC

Description Yves Perrenoud 2006-05-03 20:38:30 UTC
Description of problem:

I've got a setup where 4 users use the default gnome desktop from X terminals (based on ThinStations and PXES), so everything is remote X. We're talking XDMCP requests to gdm, which then serves up the login prompt and so on, so everything is remote, including the window manager. Obviously, they're all using the same system.

These users mostly stick to using Thunderbird, Firefox and OpenOffice. Anyway,
what's happening is that metacity freezes/hangs quite regularly (it all depends,
some users get it twice per day, some once every other day... no specific
pattern I can discern). By freeze, I mean a literal freeze: it's as if the process had received a SIGSTOP. Now to fix this, I simply kill the frozen
metacity, at which point gnome-session (I believe) restarts metacity and
everything returns to normal.

I'm actually using a little tool called "emap" that I've configured to kill
metacity on a specific key sequence so that the users can unfreeze themselves
when this happens by simply hitting the key combo.
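
The emap configuration itself isn't shown here; as an illustration, the command such a key binding needs to run boils down to something like the line below, assuming each user triggers it under their own account so that only their own window manager is killed:

  # kill the current user's frozen metacity; gnome-session respawns it
  pkill -u "$USER" metacity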

This has been occurring for months now. Given my workaround, I've almost
forgotten about it and thus never got round to filing a bug, but having to
educate a new user in the last few days about the problem reminded me it was
time to file this.

I don't personally use this setup as my desktop; I use Fedora Core, not RHEL4. However, I did run FC3 for quite some time, which used exactly the same version of metacity as RHEL4, and with X not actually going over the network in my case, metacity never had a problem. So this seems to be something directly related to metacity being remotely displayed.

The system this is occurring on is fully up to date, and the network between the X terminals and the server is a perfectly healthy 100Mb/s Ethernet (all systems are on the same subnet, as a matter of fact). The X terminals are made up of all sorts of different hardware, and it's occurring on all of them.

Version-Release number of selected component (if applicable):

metacity-2.8.6-2.8

How reproducible:

It occurs daily, but other than using the system for several hours, there's
nothing I've been able to pinpoint that specifically triggers the problem. When
this happens, it seems users are often in Thunderbird, but that may just be
because they spend most of their time using Thunderbird.

Steps to Reproduce:
1. Use system purely over a remote X setup for several hours
  
Actual results:

Complete and unrecoverable (without killing the process) freeze of metacity.

Expected results:

No freezing.

Additional info:

Comment 1 Søren Sandmann Pedersen 2006-05-04 16:16:07 UTC
If you can log in to the computer with the frozen metacity, could you try and
attach gdb to it and get a backtrace?
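
A minimal sketch of one way to do that, assuming gdb is installed on the server (with several users logged in, pick the right PID from ps rather than pidof):

  # attach to the frozen window manager
  gdb --pid=$(pidof metacity)
  # then, at the gdb prompt:
  (gdb) bt        # dump the backtrace
  (gdb) detach    # let the process continue
  (gdb) quit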

Comment 2 Yves Perrenoud 2006-05-17 22:25:44 UTC
Here's the backtrace from a frozen metacity process:

#0  0x005117a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x005e7c6d in poll () from /lib/tls/libc.so.6
#2  0x007bcf5f in g_main_context_acquire () from /usr/lib/libglib-2.0.so.0
#3  0x007bd264 in g_main_loop_run () from /usr/lib/libglib-2.0.so.0
#4  0x0806fb7a in main ()


Comment 3 David Barnwell 2006-12-05 16:04:51 UTC
We've had exactly the same problem since switching from RHEL3 to RHEL4. Our
users log in from Windows PCs using Hummingbird Exceed, and mainly use terminal
sessions, nedit and emacs. 

A user's windows may freeze a couple of times a day. I have not been able to
pinpoint the cause; the debug option in /etc/X11/gdm/gdm.conf showed nothing
relevant.

What the user sees is this: windows lose their borders and cannot be moved. If
multiple desktops are in use, then windows from the other desktops suddenly
appear. However, the terminal and edit sessions themselves still operate and you
can create new windows from the command line. You can simulate the effect of
this bug (for a second or two) by killing metacity on a working system.

Version used: metacity-2.8.6-2.8

Comment 4 Yves Perrenoud 2006-12-05 21:43:46 UTC
I've found the source of the problem from my perspective, and it doesn't seem to actually be a metacity problem. What I've pinpointed it to is a kernel iptables problem.

I have a classic iptables setup that allows all outgoing traffic unfiltered and allows return traffic through the usual "RELATED, ESTABLISHED" state entry. metacity, as well as other X processes, establishes connections from server to client, so the return traffic is allowed in through the above-mentioned state entry. When metacity freezes (and at this point I'm assuming other X processes freeze as well, but the users simply kill and restart those apps and hence have never complained... though this is pure speculation), it's because even though the metacity TCP session is still established (as reported by netstat), its iptables state entry (as reported by /proc/net/ip_conntrack) is no longer present. Somehow iptables is removing the state entry even though the TCP session is still established (and hence no FIN or RST packet was received).
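
A rough sketch of that check, assuming an X terminal at the illustrative address 192.168.2.10 and display :0 (TCP port 6000):

  # the X connection still shows up as ESTABLISHED at the TCP level...
  netstat -tn | grep '192.168.2.10:6000'
  # ...but its conntrack entry has silently disappeared
  grep '192.168.2.10' /proc/net/ip_conntrack | grep 'dport=6000'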

The fact that return traffic is denied for metacity is what causes the app to freeze, and if you look at the backtrace I provided above, it's sitting in poll, confirming that it's waiting for traffic that will of course never come.

What I've done to work around this, and to confirm that my troubleshooting is correct, is to add the following iptables rule (making sure it shows up after the state-matching rule, so it only hits when the state rule doesn't):

iptables -A INPUT -p tcp -s 192.168.2.0/23 --sport 6000 --dport 32768: ! --syn -j ACCEPT

It seems that all X processes use ports higher than 32768, so I used this to
narrow it down a little in order to minimize the potential security exposure
from having to add this rule.
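
A quick way to double-check that the fallback rule really sits below the state rule, and to watch how often it matches (a sketch, assuming the stock INPUT chain):

  # list INPUT with rule numbers to verify the ordering
  iptables -L INPUT -n --line-numbers
  # the per-rule packet/byte counters show how much return X traffic
  # only gets in through the fallback rule
  iptables -L INPUT -v -n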

Anyway, looking at the amount of traffic hitting this rule over time, I'm seeing about 1.5% of all return X traffic being allowed through this rule rather than the state-matching one. Since I applied this workaround, the metacity freezing problem does indeed seem to be gone (it's always difficult to get good information from the end users, but none of them remember it freezing lately). I've just set something up to track metacity restarts, so in a few weeks I should be able to confirm that this is indeed the case (though I'm fairly confident of it already).
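
The tracking mechanism isn't described; one hypothetical way to do it would be a cron entry that samples the metacity PIDs, where any change between samples indicates a restart:

  # hypothetical /etc/cron.d entry and log path
  */5 * * * * root echo "$(date) $(pidof metacity)" >> /var/log/metacity-restarts.log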

The iptables rule I added should of course never hit (unless someone is using
something like nmap to generate bogus packets, which definitely isn't what's
happening here). The fact that it's hitting 1 million packets per week (1.5% of
all X traffic) is a bit concerning as far as the reliability of iptables in the
RHEL4 kernel goes....

Comment 5 David Barnwell 2006-12-13 09:50:34 UTC
I added a similar iptables rule seven days ago and we have not seen this problem
since. Many thanks for the suggestion.

Comment 6 Owen Taylor 2009-10-19 19:15:47 UTC
Reassigning to kernel as should have been done a while ago, since the diagnosis looks like an iptables problem. (Sorry this didn't get taken care of at the time.)

Comment 7 Pablo Iranzo Gómez 2010-11-16 10:02:39 UTC
We're experiencing something similar, but AFAIK our problem is related to an iptables restart.

By default, iptables-config instructs the iptables init script to unload its modules and then reload them.

If the connections were opened and allowed via the RELATED,ESTABLISHED rule, unloading ip_conntrack leaves them all untracked; they are then considered NEW and get rejected with "icmp-host-prohibited" by the default firewall rules...

Setting IPTABLES_MODULES_UNLOAD=no in /etc/sysconfig/iptables-config should stop the modules from being unloaded, and probably stop this behaviour.

Can anyone confirm whether their system shows something like "kernel: ip_conntrack version 2.4 (8192 buckets, 65536 max) - 228 bytes per conntrack" in /var/log/messages shortly before metacity freezes?
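
For reference, a sketch of the suggested setting and log check (stock RHEL paths assumed):

  # in /etc/sysconfig/iptables-config: keep conntrack modules loaded
  # across "service iptables restart"
  IPTABLES_MODULES_UNLOAD=no

  # look for conntrack being re-initialised shortly before a freeze
  grep ip_conntrack /var/log/messages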

Regards
Pablo

PS: this happens on EL 5.5 too.

Comment 8 Yves Perrenoud 2010-11-19 00:14:30 UTC
I don't use /etc/init.d/iptables to manage iptables; I do it by hand, if you will, from /etc/rc.d/rc.local (calling iptables directly for every rule), and hence I've disabled the "iptables" service. As a result, I'm pretty confident that the iptables-related modules aren't being reloaded on me at any point, so that's not the root cause of the problem, just another scenario that can lead to TCP sessions being established without being tracked by "ip_conntrack".

I'm no longer running EL4; I'm on 5.5 as well, and 3% of my X traffic is still not being tracked correctly by "ip_conntrack". Hence, the iptables rule I provided in my previous comment is still the necessary workaround to make my environment usable. I'm curious to see whether the problem persists when I upgrade to EL6 in the near future.

I'm wondering whether this bug should be reassigned to RHEL 5.5. Though I suspect the bug is still applicable to EL4, I can only confirm that it's applicable to 5.5 at this point.

Comment 9 Jiri Pallich 2012-06-20 16:04:16 UTC
Thank you for submitting this issue for consideration in Red Hat Enterprise Linux. The release you requested us to review is now End of Life.
Please see https://access.redhat.com/support/policy/updates/errata/

If you would like Red Hat to reconsider your feature request for an active release, please re-open the request via the appropriate support channels and provide additional supporting details about the importance of this issue.