Bug 458712
Summary: | RH 5.2 - SCTP Messages out of order | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | William Reich <reich> | ||||||
Component: | kernel | Assignee: | Neil Horman <nhorman> | ||||||
Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Red Hat Kernel QE team <kernel-qe> | ||||||
Severity: | high | Docs Contact: | |||||||
Priority: | medium | ||||||||
Version: | 5.2 | CC: | nhorman | ||||||
Target Milestone: | rc | ||||||||
Target Release: | --- | ||||||||
Hardware: | i386 | ||||||||
OS: | Linux | ||||||||
Whiteboard: | |||||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||||
Doc Text: | Story Points: | --- | |||||||
Clone Of: | Environment: | ||||||||
Last Closed: | 2009-06-15 13:13:37 UTC | Type: | --- | ||||||
Regression: | --- | Mount Type: | --- | ||||||
Documentation: | --- | CRM: | |||||||
Verified Versions: | Category: | --- | |||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||
Embargoed: | |||||||||
Description
William Reich
2008-08-11 19:36:04 UTC
Hello, could you attach the whole source code for a simple server and client? I'd like to see how you initialize the multi-homed connection. Thanks.

I'm not sure that you can do load balancing the way you do. The documentation states, among other things, that a primary address/path must be chosen and that the other addresses/paths will only be used if the primary fails or is suspect.

This style of load balancing works fine on Solaris 10 after Solaris 10 patch 137111-02 is applied to the machine. The Solaris 10 case number was "Sun Case 65871261", which became patch 137111-02.

In reply to #2: the user is allowed to override the default/primary network selection as long as no network failures exist. If a network failure exists, then I agree that SCTP will ignore the user-specified network.

Created attachment 314318 [details]
code that performs BIND operation
initialization of multi-home connection - same code for listen or connect
According to the Sun patch, it seems that this is kernel stuff, so I'm reassigning the bug to kernel.

First question, just to make sure we don't miss anything obvious: what are the values in gsctp_DefaultSendOptions, and what does sctp_set_sinfo_options look like? I'd like to be certain that we aren't inadvertently setting SCTP_UNORDERED in the sinfo_flags field. That would certainly explain unordered delivery, as would some munging of the stream id.

Notice that gsctp_DefaultSendOptions is not actually used. The "options" pointer gets a new value of "&send_request->options", which is data from the incoming message. This header is initialized to all zeros except for the peer_index (which will have values rotating from 0 through 3 and over again). A message from my application to SCTP never contains SCTP_UNORDERED in the sinfo_flags field. (Requested details below.)

++++++++++++++++++
typedef struct sctp_header_s {
    uint32_t op;
    uint32_t error;
    uint32_t wait_tag;
} sctp_header_t;

typedef struct sctp_send_options_s {
    uint8_t  version;
    uint16_t stream;
    uint16_t flags;
    uint32_t protocol;
    uint32_t context;
} sctp_send_options_t;

typedef struct {
    sctp_header_t       header;
    sctp_send_options_t options;
    int32_t             peer_index;
} sctp_send_t;
++++++++++++++++++++++++++

The application sets the fields as:

    . . .
    cntl = ( sctp_send_t * ) cntl_mp->b_rptr;
    cntl->header.op        = SCTP_SENDTO_OP;
    cntl->options.version  = 0;
    cntl->options.stream   = 0;
    cntl->options.flags    = 0;    /* <==== */
    cntl->options.protocol = 0;
    cntl->options.context  = 0;
    cntl->peer_index = mr->peer_index++ % mr->peer_addr.ip_addr_count;
    . . .
++++++++++++++++++++++++++
and then:

inline static int
sctp_set_sinfo_options(sctp_link_t *plink, struct sctp_sndrcvinfo *sinfo,
                       const sctp_send_options_t *options)
{
    int prev_no_delay, err;

    memset(sinfo, 0, sizeof(struct sctp_sndrcvinfo));
    sinfo->sinfo_ppid    = htonl(options->protocol);
    sinfo->sinfo_stream  = options->stream;
    sinfo->sinfo_context = options->context;

    /*
     * Map ULCM SCTP send options to LKSCTP send options.
     */
    if ( options->flags & SCTP_UNORDERED ) {
        sinfo->sinfo_flags |= MSG_UNORDERED;
    }

    /*
     * Calling setsockopt() for each data message could
     * consume too much CPU time, therefore this is
     * only done when a new option is different from
     * the previous.
     *
     * WARNING:
     * SCTP_NODELAY, Section 7.1.5 SCTP Sockets API IETF draft,
     * globally disables message bundling, whereas ULCM SCTP
     * does it on a per message basis. This approximation
     * is good enough to disable message bundling.
     */
    prev_no_delay = plink->no_delay;
    plink->no_delay = (options->flags & SCTP_NOBUNDLE) ? 1 : 0;
    if ( prev_no_delay != plink->no_delay ) {
        if ( setsockopt(plink->native_sctp_fd, IPPROTO_SCTP, SCTP_NODELAY,
                        (void *)&plink->no_delay, sizeof(plink->no_delay)) == -1 ) {
            err = errno;
            ULCM_WARN("[id=%d] setsockopt(SCTP_NODELAY) failed, errno = %d",
                      plink->id, err );
            errno = err;
            return -1;
        }
    }
    return 0;
} /* end sctp_set_sinfo_options */

Ok, do you have tcpdumps that show this ordering problem? That would help me diagnose this problem. Thanks!

I do not have tcpdumps. All I have is my application complaining about getting messages out of order. I have the same problem if the number of networks is 4 or 2.

Please take the tcpdump captures as we discussed via email and attach them here, if you would. Thanks!

Bummer... My first attempts at running tcpdump are not fruitful. Each time I run tcpdump (one on the sender and one on the receiver), the problem does not occur. When I do not use tcpdump, the problem appears. I am using this command on each machine:

/usr/sbin/tcpdump -i any -s 0 -w <file>
I'll try varying the number of networks to see if I can get tcpdump to capture a failure... (My first attempt was with 2 networks in the association.)

Ok, got lucky... data files coming...

Created attachment 314625 [details]
tcpdump output files ( gunzip, then tar xvf ... )
This gzip'd tar file contains the
output of tcpdump on 2 machines
The time of the failure was between 8:30:36 and 8:30:46.
The failure was reported by the receiver.
Additional information for attachment 314625 [details] in comment 15. The /etc/hosts file looks like this:

172.25.2.202   alderaan alderaan.ulticom.com
172.25.2.200   endor endor.ulticom.com
10.2.202.1     alderaan alderaan-a
10.2.202.129   alderaan alderaan-b
10.2.201.1     alderaan alderaan-c
10.2.201.129   alderaan alderaan-d
10.2.202.3     endor endor-a
10.2.202.131   endor endor-b
10.2.201.3     endor endor-c
10.2.201.131   endor endor-d
++++++++++

The networks used are "-c", "-d", "-a", and "-b" on each machine. The sender is alderaan; the receiver is endor. Based on my understanding, alderaan will think that its primary network is "-c", and endor will think that its primary network is "-d". Traffic was flowing on all 4 networks (-c, -d, -a, -b) at the time of the error. endor (recv1 file) is the receiver, and endor reported the error. The 172 addresses were used for xterm connections only.

I've checked the receive dump and noted that we captured all SCTP chunks in order (according to TSN value), so whatever the problem is, it's not happening on the wire or at the sender. How is your application determining that frames are not in the proper order? Does it check the stream id and stream sequence number?

In answer to comment 17: I have two user-space applications (one on each machine). The first (sender) is sending a series of messages to the second (receiver). The applications do not even know that they are on different machines, and they have no idea that SCTP is being used. Within each message is a sequence count. When the receiver gets a message out of order, it complains. So, the sender sends 30 messages numbered 1..30, and the receiver expects to get messages 1, 2, 3, ..., 30. However, in the test corresponding to the tcpdump data, message 26 arrived at the receiver before message 25. The application that is complaining has no knowledge of stream id and stream sequence number.
Then I would augment the sending and receiving applications to check the stream id and stream sequence numbers. Alternatively, I would try to reproduce the problem with the sctp_darn utility or some other piece of code that I can use as a reproducer here.

I checked the tcpdumps again, and both the TSN values and the stream sequence numbers are all ordinal and in order at the receiver. Either the application is processing the received frames out of order, or something in the stack is re-ordering them (although I think you would be the first to see that). I would try to run the same SCTP connection using sctp_darn. You can generate input data using seq to simulate the payload sequence numbers you describe above. Alternatively, you can change the recvmsg call in your receiver to sctp_recvmsg and interrogate the sctp_sndrcvinfo structure to validate whether the messages are being read out of the kernel out of order.

I do not see any options in sctp_darn to load-share over all the networks, which is the key point in the test. Is there any source code around for sctp_darn so that a modification could be made in this area?
+++++++
The testing that I am doing mirrors the test that I performed on Solaris 10 SPARC (comments #3 and #4). So, I feel confident that the application is not part of the trouble.

"Is there any source code around for sctp_darn?" Yes, sctp_darn is part of the lksctp-tools package; you can download the sources here: tp://ftp.redhat.com/pub/redhat/linux
Alternatively, you can just write some throwaway code based on your application that simply sends and receives some sequential data using your existing load-balancing code. That way, if you can reproduce the problem, you can send me the code so I can attempt a reproduction here.

"I feel confident that the application is not part of the trouble." I understand why you feel that way, yet the evidence that we have in this bug at the moment contradicts that assertion.
There is obviously still room for discrepancy, in that the stack might be re-ordering frames (although I really think it's unlikely). That's what this test is meant to verify, however. We have a tcpdump showing a stream of SCTP data being received in sequence order. If we can show that those same sequence numbers in the SCTP chunk headers arrive at the receiving application in a different order, we can conclude that something in the stack re-ordered them.

I'll try to hack sctp_darn first...

Ping, any progress here?

I have nothing new to report. I am currently in the middle of a major delivery cycle. Since this issue is not related to that delivery, I will not have time to perform any investigation on this issue until after the middle of May 2009.

Ok, let us know when you get to it. Thanks.

Ping: as per your comment #24, I just wanted to check in to see whether you managed to do any further investigation regarding this bug.

Ok, it's been an extra month here, so I'm closing due to inactivity. Please re-open if/when you get time to test.

New ticket 517504 has been created to reopen this issue. Detailed programs that can reproduce the problem are attached to that ticket.