Bug 190603
| Summary: | metacity when used purely with a remote X display, freezes regularly | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 4 | Reporter: | Yves Perrenoud <yves-redhat> |
| Component: | kernel | Assignee: | Red Hat Kernel Manager <kernel-mgr> |
| Status: | CLOSED WONTFIX | QA Contact: | Red Hat Kernel QE team <kernel-qe> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 4.0 | CC: | david.barnwell, pablo.iranzo |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | i386 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2012-06-20 16:04:16 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Yves Perrenoud
2006-05-03 20:38:30 UTC
If you can log in to the computer with the frozen metacity, could you try to attach gdb to it and get a backtrace?

Here's the backtrace from a frozen metacity process:

#0 0x005117a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x005e7c6d in poll () from /lib/tls/libc.so.6
#2 0x007bcf5f in g_main_context_acquire () from /usr/lib/libglib-2.0.so.0
#3 0x007bd264 in g_main_loop_run () from /usr/lib/libglib-2.0.so.0
#4 0x0806fb7a in main ()

We've had exactly the same problem since switching from RHEL3 to RHEL4. Our users log in from Windows PCs using Hummingbird Exceed, and mainly use terminal sessions, nedit and emacs. A user's windows may freeze a couple of times a day. I have not been able to pinpoint the cause; the debug option in /etc/X11/gdm/gdm.conf showed nothing relevant. What the user sees is this: windows lose their borders and cannot be moved. If multiple desktops are in use, windows from the other desktops suddenly appear. However, the terminal and edit sessions themselves still operate, and you can create new windows from the command line. You can simulate the effect of this bug (for a second or two) by killing metacity on a working system. Version used: metacity-2.8.6-2.8

I've found the source of the problem from my perspective, and it doesn't actually seem to be a metacity problem. I've pinpointed it to a kernel iptables problem. I have a classic iptables setup that allows all outgoing traffic unfiltered and allows return traffic through the usual "RELATED,ESTABLISHED" state entry. metacity and other X processes establish connections from server to client, so the return traffic is allowed in through that state entry.

When metacity freezes (and at this point I'm assuming other X processes freeze as well, but the users simply kill and restart those apps and hence have never complained... though this is pure speculation), it's because, although the metacity TCP session is still established (as reported by netstat), its iptables state entry (as reported by /proc/net/ip_conntrack) is no longer present. Somehow iptables is removing the state entry even though the TCP session is still established (and hence no FIN or RST packet was received). Because return traffic is denied for metacity, the app freezes, and if you look at the traceback I provided above, it's sitting in poll, which confirms that it's waiting for traffic that will of course never come.

To work around this and confirm that my troubleshooting is correct, I added the following iptables rule (making sure it comes after the state matching rule, so it only hits when the state rule doesn't):

iptables -A INPUT -p tcp -s 192.168.2.0/23 --sport 6000 --dport 32768: ! --syn -j ACCEPT

It seems that all X processes use ports 32768 and higher, so I used this to narrow the rule down a little and minimize the potential security exposure from having to add it. Looking at the amount of traffic hitting this rule over time, I'm seeing about 1.5% of all return X traffic being allowed through this rule rather than the state matching one. Since I applied this workaround, the metacity freezing problem does seem to be gone (it's always difficult to get good information from end users, but none of them remember it freezing lately). I've just set something up to track metacity restarts, so in a few weeks I should be able to confirm this (though I'm already fairly confident). The iptables rule I added should of course never hit (unless someone is using something like nmap to generate bogus packets, which definitely isn't what's happening here).
The fact that it's hitting 1 million packets per week (1.5% of all X traffic) is a bit concerning as far as the reliability of iptables in the RHEL4 kernel goes.

I added a similar iptables rule seven days ago and we have not seen this problem since. Many thanks for the suggestion.

Reassigning to kernel, as should have been done a while ago, since the diagnosis looks like an iptables problem. (Sorry this didn't get taken care of at the time.)

We're experiencing something similar, but as far as I know our problem is related to restarting iptables. By default, iptables-config instructs the iptables service to unload its modules and then reload them. If connections were opened and allowed via RELATED,ESTABLISHED, unloading ip_conntrack leaves all of them unrecorded; they are then treated as new and rejected with "icmp-host-prohibited" by the default firewall rules. Setting "IPTABLES_MODULES_UNLOAD=no" in /etc/sysconfig/iptables-config should stop the modules from being unloaded, and probably stop this behaviour. Can anyone confirm whether their system shows something like "kernel: ip_conntrack version 2.4 (8192 buckets, 65536 max) - 228 bytes per conntrack" in /var/log/messages shortly before metacity disappears?

Regards, Pablo

P.S.: happens on EL 5.5 too

I don't use /etc/init.d/iptables to manage iptables; I do it by hand from /etc/rc.d/rc.local (calling iptables directly for every rule), and hence I've disabled the "iptables" service. As a result, I'm pretty confident that the iptables-related modules aren't being reloaded on me at any point, so that's not the root cause of the problem, just another scenario that can lead to TCP sessions being established without being tracked by "ip_conntrack". I'm no longer running EL4; I'm on 5.5 as well, and 3% of my X traffic is still not being tracked correctly by "ip_conntrack". Hence, the iptables rule I provided in my previous comment is still the necessary workaround to make my environment usable.
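The percentages quoted in these comments (1.5%, later 3%) compare packets accepted by the fallback rule against packets accepted by the state rule. How those figures were actually obtained is not stated here; the sketch below is one hypothetical way to compute such a share from the per-rule packet counters printed by `iptables -L INPUT -v -n -x`. The marker strings and function names are assumptions, and because the state rule also counts non-X return traffic, the result is only a rough lower bound on the share of X traffic.

```python
def parse_rule_counters(listing):
    """Return [(packets, rule_text)] for each rule row of
    `iptables -L <chain> -v -n -x` style output (assumed layout)."""
    rules = []
    for line in listing.splitlines():
        fields = line.split()
        # Rule rows begin with two integers (pkts, bytes); the chain line
        # and the column header line do not.
        if len(fields) >= 3 and fields[0].isdigit() and fields[1].isdigit():
            rules.append((int(fields[0]), " ".join(fields[2:])))
    return rules


def fallback_share(listing,
                   state_marker="state RELATED,ESTABLISHED",
                   fallback_marker="spt:6000"):
    """Fraction of packets accepted by the fallback rule, out of all packets
    accepted by either the state rule or the fallback rule."""
    state_pkts = 0
    fallback_pkts = 0
    for packets, text in parse_rule_counters(listing):
        if fallback_marker in text:
            fallback_pkts += packets
        elif state_marker in text:
            state_pkts += packets
    total = state_pkts + fallback_pkts
    return fallback_pkts / total if total else 0.0
```

A steadily non-zero share on a network with no port scanning would point to the same symptom: established sessions whose return packets no longer match a conntrack entry.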
I'm curious to see whether the problem persists when I upgrade to EL6 in the near future.

I'm wondering if this bug should be assigned to RHEL 5.5? Though I suspect the bug is still applicable to EL4, I can only confirm that it's applicable to 5.5 at this point.

Thank you for submitting this issue for consideration in Red Hat Enterprise Linux. The release for which you requested us to review is now End of Life. Please see https://access.redhat.com/support/policy/updates/errata/

If you would like Red Hat to re-consider your feature request for an active release, please re-open the request via appropriate support channels and provide additional supporting details about the importance of this issue.