Bug 203122
| Summary: | Applications listening on a port stop accepting connections on a XenU kernel | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Russell McOrmond <russell> |
| Component: | xen | Assignee: | Herbert Xu <herbert.xu> |
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Brian Brock <bbrock> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 5 | CC: | bstein, katzj |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2007-03-16 14:57:58 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Russell McOrmond
2006-08-18 14:49:28 UTC
I want to note that I have seen this problem too. I've set Apache to restart every 4 hours so that there isn't too much downtime, but I am finding this very frustrating. It only seems to happen on one virtual server that I have noticed (in my install). It is odd because Apache is running (just not listening), and it doesn't seem to affect any other ports (I can still ssh in). Port 80 just isn't responding.

Thanks for the report. I'm not aware of any existing bugs that can produce a behaviour like this, so this could be something new. What kernel version are you using in dom0? When this problem occurs, I would like to see the output of `ss -an` (or `netstat -ant` if you don't have ss). Please also attach strace to the daemon process, do a tcpdump on the vifX.0 interface in dom0 as well as on eth0 in domU, and then attempt a connection to it.

In my case I am running 2.6.17-1.2174_FC5xen0 (and 2.6.17-1.2174_FC5xenU for the domUs). I've also run previous versions with similar results. This is an extremely intermittent problem, but I will do as you suggest when it next happens. Note: 'ss' seems to be part of iproute, so it is already installed on my Xen0 and XenUs.

Please let me know if this still happens with 2.6.18 (2189) in FC5 testing. If it does, please provide the debugging output I requested previously. Thanks.

Once the new 'xen' package is available and tested, I'm going to roll out the latest kernel to various machines. My gut feeling is that this specific problem only applied to older kernels, but it has been hard to verify due to entirely different problems with newer kernels.

The xen package is now available in testing.

A quick note: I am still monitoring this. While I upgraded another server to the latest kernel last week, I only upgraded my mail server earlier today. This afternoon I saw another one of those odd situations where I needed to restart the mail server. I didn't do any of the suggested debugging; I was concentrating on figuring out why email wasn't flowing. Only after I restarted and mail was flowing did I realize that this would have been an opportunity for testing.

While doing the `ss -an` suggested above is easy, I don't see how I'll be able to diagnose anything with tcpdump. This is an extremely busy mail server (mail.flora.ca, the primary mail server for a number of domains), which may be why whatever this "race condition" is shows up at all. Any attempt to attach tcpdump will just flood me with data that I won't be able to do much with. I also don't understand the suggestion of strace, which I believed had to be used to launch the command in the first place. Is there a way to attach and trace a specific process ID once the process is identified? This bug is too intermittent to just run 'strace' on and expect any useful results.

You can get tcpdump to write its results to a file for analysis later; just call it with `-w <filename>`. As for stracing a running process, you can use `-p <pid>` to attach to it. Thanks.

Closing due to insufficient data. Please reopen if you are still able to reproduce and can capture the requested information.
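The capture steps discussed in this thread can be sketched as a small script. This is only an illustrative sketch, not commands from the bug itself: the interface name `vif1.0`, the `DAEMON_PID` variable, and the `/tmp` output paths are assumptions to be adjusted for the actual dom0/domU setup.

```shell
#!/bin/sh
# Sketch of the data capture requested in this bug. The interface names,
# the daemon PID, and the output paths below are assumptions; adjust
# them for your own dom0/domU environment.

# 1. Record socket state while the port is stuck (netstat as fallback
#    for systems without ss from iproute):
ss -an > /tmp/sockets.txt 2>&1 || netstat -ant > /tmp/sockets.txt 2>&1

# 2. Attach strace to the already-running daemon with -p and log to a
#    file (-o), so an intermittent event can be reviewed after the fact:
#    strace -f -p "$DAEMON_PID" -o /tmp/daemon.strace &

# 3. Write tcpdump output to a file with -w instead of the terminal,
#    filtering on port 80 so a busy server does not flood the capture:
#    tcpdump -i vif1.0 -w /tmp/dom0.pcap port 80 &   # run in dom0
#    tcpdump -i eth0  -w /tmp/domU.pcap port 80 &    # run in domU

echo "socket state saved to /tmp/sockets.txt"
```

The strace and tcpdump lines are left commented out because they need a real PID, root privileges, and the right interfaces; writing captures to files with `-o` and `-w` is what makes an intermittent failure like this one diagnosable after the fact.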