Bug 206435
| Summary: | Application stuck in recv() when cluster member crashes and IP address relocates | | |
| --- | --- | --- | --- |
| Product: | Red Hat Enterprise Linux 4 | Reporter: | Eric Z. Ayers <eric.ayers> |
| Component: | kernel | Assignee: | Neil Horman <nhorman> |
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Brian Brock <bbrock> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 4.4 | CC: | davem, jbaron, jesse.marlin, tgraf |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2009-04-17 10:49:26 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Eric Z. Ayers
2006-09-14 12:45:17 UTC
I suspect this needs to be assigned to the kernel, maybe. It's certainly nothing to do with cman. I thought maybe some extra step might need to be taken when relocating an IP address, but this is probably an issue in the kernel.

Our kernel version:

$ uname -a
Linux blade01-1 2.6.9-42.0.2.ELsmp #1 SMP Thu Aug 17 17:57:31 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux

We have been running our product on Digital/Compaq/HP TruCluster types of clusters, HP Service Guard, and Sun clusters using SunCluster and Veritas, without this issue. We have also been running Linux 2.2 and Linux 2.4 servers that communicate with a clustered server for many years, and I've never noticed this issue. So my hunch is that there must be something different about Linux 2.6, or Linux 2.6 on x86_64, that is causing this situation.

Patrick asked me if we had set SO_KEEPALIVE with setsockopt(). The answer is no. We used to use keepalive for these types of connections (pre-1999), but it was apparently detrimental to server restarts (at least, that is what my comment from sometime in 2000 says).

This problem is a showstopper for us. We don't have it in production anywhere, but we won't be able to use clustering with our application on Linux until we can find out why it behaves differently from HP, OSF, and Sun in clusters. And we only rarely install our application (telecom industry) without clustering.

I think it would be best to provide tcpdumps from the hanging machine (binary format, please) of a hanging and a non-hanging instance. The comparison of the two dumps will help us track down where the problem is occurring.

I assume you want the TCP dumps while this is happening. The situation is that we have 3 nodes in a cluster: A, B, and C. A is the 'master' node running a TCP server task, and A, B, and C are running code that executes TCP clients.
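As an aside on the SO_KEEPALIVE question raised above: on Linux 2.6 the keepalive probe timings can be tuned per socket rather than relying on the system-wide defaults (over two hours of idle time), which is the usual way to keep a client from blocking in recv() forever when the peer silently disappears. A minimal sketch, assuming nothing about the application in the report (the function name and timing values are purely illustrative):

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Enable TCP keepalive on a socket so a blocked recv() eventually
 * fails when the peer vanishes.  With these parameters a dead peer is
 * detected after roughly idle_s + intvl_s * cnt seconds.  Plain
 * SO_KEEPALIVE is portable; TCP_KEEPIDLE, TCP_KEEPINTVL and
 * TCP_KEEPCNT are Linux-specific overrides of the system defaults. */
int enable_keepalive(int fd, int idle_s, int intvl_s, int cnt)
{
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof idle_s) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_s, sizeof intvl_s) < 0)
        return -1;
    return setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof cnt);
}
```

Whether this would help in the failover case described here depends on what answers the probes: a relocated address that responds to a probe with a RST would break the hung connection with ECONNRESET, while an unanswered probe sequence would time the connection out.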
A shuts down due to a crash and panic (but I assume it could be for any reason), and then B and C are left with hanging processes, even while the IP address that A was serving migrates to node B or C. I'm guessing you want me to run tcpdump on B and C and then shut off A. How long should I keep dumping? The processes will hang indefinitely. Note that the client processes on A, B, and C are transient worker tasks, and there will be many megabytes of data if I run this test. I'll try to limit it as much as possible and get you dumps ASAP. Also note that we plan to update to RHEL 5 Beta sometime soon to test GFS issues.

Are you using LVS or some similar cluster package to cluster these nodes? If so, you could probably just get a tcpdump from the client, and that would be sufficient. You only need to dump until you determine that the connection is hung. I understand that they will be large, but if you can use a capture filter that selects only the traffic to and from the client node, that should help a little, and I'll manage whatever size they wind up being from there.

Although, now that I think about it, I'm a little curious about your cluster setup. How do you expect TCP sockets to migrate between nodes in a cluster? Reading your problem a little more closely (the cluster angle didn't occur to me earlier), I don't see how socket state for TCP can be migrated between nodes. To make that work, the server cluster is going to have to go through either a reset cycle on the connection, or a graceful FIN/ACK shutdown and connection re-establishment with the client, which means your server and client code will have to be prepared for that.

Well, the dumps will tell us more about what exactly is happening during the failover, and then we will know more about exactly what is going wrong.

(In reply to comment #6)
> Although, now that I think about it, I'm a little curious about your cluster
> setup.
> How do you expect TCP sockets to migrate between nodes in a cluster?
> Reading your problem a little more closely (the cluster angle didn't occur to me
> earlier), I don't see how socket state for tcp can be migrated between nodes.
> To make that work, the server cluster is going to have to go through either a reset
> cycle on the connection, or a graceful fin/ack shutdown and connection
> re-establishment with the client, which means your server and client code will
> have to be prepared for that.

Eric is out of the office for the rest of the week. From what I understand, we don't want to migrate the state of the socket. What's happening right now is that the sockets are hung after the service is migrated to another node, so the processes never exit. Other Unixes return EOF or some other error in this case. We were thinking that the socket should no longer be valid anyway, since the endpoint has moved to another node (same IP, different node).

That's good if you don't want to migrate the state of the socket, since you can't do it anyway (that's the reset cycle, or the FIN/ACK and re-establish cycle, I referred to previously). As for what a socket read will return (or whether it will return at all), that depends on exactly what packets are exchanged between the client and the server during and after the failover. The fact that you don't expect the socket to migrate is good, though. And as I think about it more, while a trace from the client will be really helpful, additional traces from cluster nodes A, B, and C (using a mirrored port for traffic to A if need be, to capture after the fencing) would also be good. Also, if you could describe the cluster setup a little more, I would appreciate it (are you using LVS, or Cluster Suite, or another utility to do your clustering?), and knowing what the network topology between the client and the cluster nodes looks like would help me when I read the traces. Thanks!

Sorry. We are using RH Cluster Suite.
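On the client side, being "prepared for that" amounts to handling both outcomes of a failover on the old connection: recv() returning 0 for a graceful FIN, and an error such as ECONNRESET if the relocated address answers with a RST. A minimal sketch of such a read loop, assuming nothing about the application in the report (the function name is illustrative):

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Drain a connected stream socket until the peer closes or the
 * connection dies.  Returns the total bytes read on clean EOF
 * (recv() returned 0, i.e. a graceful FIN), or -1 on a connection
 * error such as ECONNRESET -- the two outcomes a client must expect
 * when the server end goes through a failover. */
ssize_t drain_socket(int fd)
{
    char buf[4096];
    ssize_t total = 0;

    for (;;) {
        ssize_t n = recv(fd, buf, sizeof buf, 0);
        if (n > 0) {
            total += n;          /* normal data */
        } else if (n == 0) {
            return total;        /* peer sent FIN: clean shutdown */
        } else if (errno == EINTR) {
            continue;            /* interrupted by a signal, retry */
        } else {
            return -1;           /* e.g. ECONNRESET after a RST */
        }
    }
}
```

The hang described in this bug corresponds to neither branch ever being taken: no FIN, no RST, and no data, so recv() blocks indefinitely.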
The traces will have to wait for Eric to get back next week, since he had a bunch of stuff set up before, when we were working with Wendy. Internode communication is on a LAN connected to a switch. There are 4 blades and a single console machine. There were some issues with our software causing a kernel panic on one of the nodes, which in turn caused the service to migrate. Clients on other nodes were connected to the service, and these are the processes that are actually hanging.

ping, any update?

We have been waiting on a stable version of GFS before getting back to testing the Red Hat cluster. We upgraded to RHEL 5, and all the old bugs we had were back, so we can't even run our application anymore.

ok, let me know when you get back to it then.

ping?

closing due to inactivity.