Bug 453084
| Summary: | Unable to set heartbeat interval for a one-to-one SCTP socket | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 4 | Reporter: | Bergit <bergit.jenitha> |
| Component: | lksctp-tools | Assignee: | Neil Horman <nhorman> |
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | |
| Severity: | medium | Docs Contact: | |
| Priority: | low | | |
| Version: | 4.0 | | |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | i386 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2009-03-31 11:04:39 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Description
Bergit
2008-06-27 07:10:26 UTC
Created attachment 310428
patch to set hb_interval for all transports on a 1:1 socket
Haven't tested it yet, but this should solve your problem. I'm building a test
kernel for you now and will post when it's available.
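The attachment title suggests that, before this patch, a request naming no particular association or peer address did not reach the transports of the existing association on a one-to-one socket. Neil later contrasts this with a request carrying a nonzero association id and an address other than INADDR_ANY, which triggers a transport lookup. A minimal sketch of how the "wildcard" case can be recognized, assuming it means association id 0 plus INADDR_ANY (or in6addr_any for IPv6); `is_wildcard_paddr_request` is a hypothetical helper, not a kernel or lksctp-tools function.

```c
#include <netinet/in.h>
#include <stdint.h>
#include <sys/socket.h>

/* Hypothetical helper: returns nonzero when a SCTP_PEER_ADDR_PARAMS request
 * names no association (id 0) and a wildcard address, i.e. it is meant to
 * apply broadly rather than to one looked-up peer transport. */
int is_wildcard_paddr_request(int32_t assoc_id, const struct sockaddr_storage *ss)
{
    if (assoc_id != 0)
        return 0;
    if (ss->ss_family == AF_INET) {
        const struct sockaddr_in *sin = (const struct sockaddr_in *)ss;
        return sin->sin_addr.s_addr == htonl(INADDR_ANY);
    }
    if (ss->ss_family == AF_INET6) {
        const struct sockaddr_in6 *sin6 = (const struct sockaddr_in6 *)ss;
        return IN6_IS_ADDR_UNSPECIFIED(&sin6->sin6_addr);
    }
    /* ASSUMPTION: other families (including an unset AF_UNSPEC) are not
     * treated as wildcards. */
    return 0;
}
```

Under this assumption, the reporter's request (zeroed structure with `ss_family = AF_INET`) is a wildcard request, so whether it updates the live association depends on how the kernel handles that case, which is what the patch targets.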
Test kernel available here: http://people.redhat.com/nhorman/rpms/kernel-smp-2.6.9-75.EL.bz543084.i686.rpm

Sorry about the name; I reversed some digits in the bz number when I built it. I think it should fix your problem, though. Let me know and I'll get it posted for review internally. Thanks!

(In reply to comment #2)
> Test kernel available here:
> http://people.redhat.com/nhorman/rpms/kernel-smp-2.6.9-75.EL.bz543084.i686.rpm
> Sorry about the name, I reversed some digits in the bz num when I built it. I
> think it should fix your problem though. Let me know and I'll get it posted for
> review internally..
>
> Thanks!
>

Hi Neil,

Thanks for the patch. I installed the patched kernel:

$ rpm -q kernel-smp
kernel-smp-2.6.9-22.EL
kernel-smp-2.6.9-75.EL.bz543084

I tried again after installing it, but I still face the same problems mentioned in my previous mails:
- I am still unable to change the heartbeat interval.
- Getting SCTP_PEER_ADDR_PARAMS using "sctp_opt_info" still fails with the error "Invalid argument".

Regards,
Bergit

Well, there are only two reasons you'll get EINVAL back when requesting this socket option. Either the size of the data passed down is too small or too large (specifically, not equal to sizeof(struct sctp_paddrparams)), or the sctp_paddrparams structure is sent down with a nonzero association id and an IP address set to something other than INADDR_ANY, and one or both of them are actually invalid, which leads to a failed transport lookup.

Could you please send me:
1) output of uname -a from your system
2) output of rpm -qli lksctp-tools
3) code that recreates the problem (I think you sent this to the upstream list, but I'd like to have it recorded here as well)
4) strace output of the code in (3) running on your system

I'm starting to suspect that the sctp library you are using is hitting the overly specific size check in the socket option code, but the above info will help me be certain. Thanks!
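Neil's first EINVAL condition is an exact length comparison, so a user-space header whose struct sctp_paddrparams differs in size from the kernel's is enough to trigger it. The sketch below illustrates this under stated assumptions: `paddrparams_v102` copies the lksctp-tools 1.0.2 definition quoted later in this report, `paddrparams_v108` is an assumed layout for the newer header (the extra fields and their order follow later Linux headers, not this report), and `paddrparams_len_check` is a hypothetical simplification of the kernel's check, not kernel code.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>

/* Layout of struct sctp_paddrparams in lksctp-tools 1.0.2, as quoted in this report. */
struct paddrparams_v102 {
    int32_t spp_assoc_id;                 /* sctp_assoc_t */
    struct sockaddr_storage spp_address;
    uint32_t spp_hbinterval;              /* milliseconds */
    uint16_t spp_pathmaxrxt;
};

/* ASSUMPTION: newer layout with spp_flags and related fields appended. */
struct paddrparams_v108 {
    int32_t spp_assoc_id;
    struct sockaddr_storage spp_address;
    uint32_t spp_hbinterval;
    uint16_t spp_pathmaxrxt;
    uint32_t spp_pathmtu;
    uint32_t spp_sackdelay;
    uint32_t spp_flags;
};

/* Hypothetical model of the exact-size check: any optlen other than the
 * size the kernel was built with is rejected with -EINVAL. */
int paddrparams_len_check(size_t optlen, size_t kernel_size)
{
    return optlen == kernel_size ? 0 : -EINVAL;
}
```

With gcc on i386 these layouts come to 140 and 152 bytes; 152 is the length in the strace output later in this report. If the kernel expects the older layout, `paddrparams_len_check(sizeof(struct paddrparams_v108), sizeof(struct paddrparams_v102))` returns -EINVAL, which would be consistent with the sctp_opt_info failure on the 1.0.8 machine and its absence on the 1.0.2 machine.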
Please find my comments inline.

(In reply to comment #4)
> Well, theres only two reasons that you'll get EINVAL back from requesting this
> socket option, either the size of the data getting passed down is too small, or
> too large (specifically not equall to sizeof(struct sctp_paddrparams)), or the
> sctp_paddrparams structure is being sent down with a non zero association id and
> a and an IP address that is set to something other than INADDR_ANY (and one or
> both of them are actually invalid, which leads to a bad transport lookup).
>
> Could you please send me:
>
> 1) output of uname -a from your system

# uname -a
Linux localhost.localdomain 2.6.9-75.EL.bz543084smp #1 SMP Fri Jun 27 09:07:38 EDT 2008 i686 i686 i386 GNU/Linux

> 2) output of rpm -qli lksctp-tools

# rpm -qli lksctp-tools
Name        : lksctp-tools                   Relocations: (not relocatable)
Version     : 1.0.8                          Vendor: (none)
Release     : 1                              Build Date: Sat 02 Feb 2008 12:43:14 AM IST
Install Date: Fri 29 Feb 2008 07:39:05 PM IST  Build Host: galen.zko.hp.com
Group       : System Environment/Libraries   Source RPM: lksctp-tools-1.0.8-1.src.rpm
Size        : 200499                         License: LGPL
Signature   : (none)
URL         : http://lksctp.sourceforge.net
Summary     : User-space access to Linux Kernel SCTP
Description :
This is the lksctp-tools package for Linux Kernel SCTP Reference Implementation.
This package is intended to supplement the Linux Kernel SCTP Reference Implementation now available in the Linux kernel source tree in versions 2.5.36 and following. For more information on LKSCTP see the package documentation README file, section titled "LKSCTP - Linux Kernel SCTP."
This package contains the base run-time library & command-line tools.
/usr/bin/checksctp
/usr/bin/sctp_darn
/usr/bin/sctp_test
/usr/bin/withsctp
/usr/lib/libsctp.so.1
/usr/lib/libsctp.so.1.0.8
/usr/lib/lksctp-tools/libwithsctp.a
/usr/lib/lksctp-tools/libwithsctp.la
/usr/lib/lksctp-tools/libwithsctp.so
/usr/lib/lksctp-tools/libwithsctp.so.1
/usr/lib/lksctp-tools/libwithsctp.so.1.0.8
/usr/share/doc/lksctp-tools-1.0.8
/usr/share/doc/lksctp-tools-1.0.8/AUTHORS
/usr/share/doc/lksctp-tools-1.0.8/COPYING
/usr/share/doc/lksctp-tools-1.0.8/COPYING.lib
/usr/share/doc/lksctp-tools-1.0.8/ChangeLog

> 3) code that recreates the problem (I think you sent this to the upstream list,
> but I'd like to have it recorded here as well)

The server accepts connections from clients and sets the heartbeat interval on each accepted association.

int servSockFd;
int clilen;
struct sockaddr_in cli_addr;
int hb[]={5000,93000};
struct sctp_paddrparams paddr;
socklen_t len = sizeof(paddr);

listen(servSockFd,numClients_g);
bzero((char*)&cli_addr,sizeof(cli_addr));
clilen = sizeof(cli_addr);
for(count =0; count <numClients_g; count++)
{
    sockFd_g[count]= accept(servSockFd,(struct sockaddr *)&cli_addr,&clilen);
    if(sockFd_g[count] <0)
    {
        close(servSockFd);
        perror("Accept error:");
        exit(-1);
    }
    memset(&paddr,0,sizeof(paddr));
    paddr.spp_address.ss_family = AF_INET;
    paddr.spp_flags = SPP_HB_ENABLE;
    paddr.spp_hbinterval = hb[count]; /* in milliseconds */
    if(setsockopt(sockFd_g[count],SOL_SOCKET,SCTP_PEER_ADDR_PARAMS,&paddr,sizeof(paddr)) < 0)
    {
        perror ("setsockopt for SCTP_PEER_ADDR_PARAMS");
    }
    else
    {
        printf("\nHeartbeat interval is successfully changed to %d milliseconds",paddr.spp_hbinterval);
    }
    /* Commented this part out as it's failing with Invalid argument error
    struct sctp_paddrparams params;
    len = sizeof(struct sctp_paddrparams);
    memset(&params,0,sizeof(params));
    params.spp_address.ss_family = AF_INET;
    if(sctp_opt_info(sockFd_g[count],0,SCTP_PEER_ADDR_PARAMS,&params,&len) < 0)
    {
        printf("\nerrno : %d\n",errno);
        perror ("sockFd: getsockopt for SCTP_PEER_ADDR_PARAMS");
        exit(-1);
    }
    else
    {
        printf("\nAssoc id : %d\tHB Interval : %d",params.spp_assoc_id,params.spp_hbinterval);
    }
    */
}

> 4) strace output of the code in (3) running on your system

Output of strace -p <pid> -o <file>:

accept(3, {sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.2.83.200")}, [16]) = 4
setsockopt(4, SOL_SOCKET, SO_KEEPALIVE, "\0\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 152) = 0
write(1, "\n", 1) = 1
getsockopt(4, 0x84 /* SOL_??? */, 0, "\0\0\0\0\270\v\0\0`\352\0\0\350\3\0\0", [16]) = 0
write(1, "Heartbeat interval is successful"..., 64) = 64
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({5, 0}, {5, 0}) = 0
write(1, "Assoc id : 0\tRTO MAX : 60000\n", 29) = 29
fstat64(0, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 1), ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7fa1000
write(1, "Done", 4) = 4
read(0, "\n", 1024) = 1
close(4) = 0
close(3) = 0
exit_group(0) = ?

Hope this is what you asked for.

>
> I'm starting to suspect that the sctp library you are using is is hitting the
> overly specific size check in the socket option code, but the above info will
> help me be certain. Thanks!

When I executed the code, I forgot to remove the portion that calls "sctp_get_info"
with SCTP_RTOINFO. Since that is irrelevant to this discussion, I ran it again after
removing it, and here is the strace output.
accept(3, {sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.2.83.200")}, [16]) = 4
setsockopt(4, SOL_SOCKET, SO_KEEPALIVE,
"\0\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 152) = 0
write(1, "\n", 1) = 1
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({5, 0}, {5, 0}) = 0
write(1, "Heartbeat interval is successful"..., 64) = 64
fstat64(0, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 1), ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f87000
write(1, "Done", 4) = 4
read(0, "\n", 1024) = 1
close(4) = 0
close(3) = 0
exit_group(0) = ?
In the setsockopt line I see "SO_KEEPALIVE", where I actually pass "SCTP_PEER_ADDR_PARAMS"
in the code.
I wonder if this could be the problem (maybe a mismatch between the kernel and the
sctp libraries?).
Thanks,
Bergit
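The SO_KEEPALIVE in that trace is consistent with the option level being wrong rather than with a kernel/library mismatch: on x86 Linux, SO_KEEPALIVE and SCTP_PEER_ADDR_PARAMS share the numeric value 9, so a call at level SOL_SOCKET with that number is decoded as SO_KEEPALIVE, which reads the leading int of the buffer and returns success without touching SCTP. SCTP options are normally passed at level IPPROTO_SCTP (132, which strace prints as the 0x84 in the getsockopt line of the earlier trace). A minimal sketch, assuming the value 9 instead of taking it from <netinet/sctp.h> so that it builds without the lksctp headers; `PEER_ADDR_PARAMS_OPT`, `decoded_as_keepalive` and `set_peer_addr_params` are hypothetical names.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* ASSUMPTION: Linux value of SCTP_PEER_ADDR_PARAMS. Real code should take
 * it from <netinet/sctp.h>; it is hard-coded here to build without lksctp-tools. */
#define PEER_ADDR_PARAMS_OPT 9

#ifndef IPPROTO_SCTP
#define IPPROTO_SCTP 132 /* printed by strace as 0x84 */
#endif

/* Returns nonzero if the kernel would read (level, optname) as SO_KEEPALIVE. */
int decoded_as_keepalive(int level, int optname)
{
    return level == SOL_SOCKET && optname == SO_KEEPALIVE;
}

/* Hypothetical helper: pass the option at the SCTP protocol level instead of
 * SOL_SOCKET. params/len are the caller's struct sctp_paddrparams and its size. */
int set_peer_addr_params(int fd, const void *params, socklen_t len)
{
    return setsockopt(fd, IPPROTO_SCTP, PEER_ADDR_PARAMS_OPT, params, len);
}
```

In the reporter's loop this corresponds to replacing SOL_SOCKET with IPPROTO_SCTP in the setsockopt call; whether that change alone fixes the heartbeat interval on the RHEL 4 kernel was not tested in this report.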
Hey, I just noticed something. Your lksctp-tools package isn't from Red Hat. It's lksctp-tools-1.0.8-1, and it looks like it was built by HP. The latest lksctp-tools for RHEL-4 is 1.0.2-6.4E.1. Normally it wouldn't matter much, but sctp in RHEL-4 had a lot of fairly fragile interfaces between user space and kernel space, which make for some fairly restrictive version constraints. Can you uninstall lksctp-tools-1.0.8 on your system and retest after installing version 1.0.2 from RHN? Thanks!
Hi,
I tried it on another machine that has the same kernel but
lksctp-tools-1.0.2-6.4E.1.
There, struct sctp_paddrparams is defined as follows:
struct sctp_paddrparams {
sctp_assoc_t spp_assoc_id;
struct sockaddr_storage spp_address;
__u32 spp_hbinterval;
__u16 spp_pathmaxrxt;
};
It does not have the flags field and a few other parameters. I guess it's not an issue.
And these are the results:
- Here, too, the HB interval does not get modified.
- But I did not get the "Invalid argument" error from "sctp_opt_info" with
option SCTP_PEER_ADDR_PARAMS.
- setsockopt for setting the heartbeat interval did not return any failure.
But when I called "sctp_opt_info" to get SCTP_PEER_ADDR_PARAMS afterwards, I could
see that hb_interval was still the default value of 30000.
Please look at the code snippet and sample output below.
Code snippet:
listen(servSockFd,numClients_g);
bzero((char*)&cli_addr,sizeof(cli_addr));
clilen = sizeof(cli_addr);
for(count =0; count <numClients_g; count++)
{
    sockFd_g[count]= accept(servSockFd,(struct sockaddr *)&cli_addr,&clilen);
    if(sockFd_g[count] <0)
    {
        close(servSockFd);
        perror("Accept error:");
        exit(-1);
    }
    printf("\nClient connection established..");
    memset(&paddr,0,sizeof(paddr));
    paddr.spp_address.ss_family = AF_INET;
    paddr.spp_hbinterval = hb[count]; /* in milliseconds */
    if(setsockopt(sockFd_g[count],SOL_SOCKET,SCTP_PEER_ADDR_PARAMS,&paddr,sizeof(paddr)) < 0)
    {
        perror ("setsockopt for SCTP_PEER_ADDR_PARAMS");
    }
    else
    {
        printf("\nHeartbeat interval is successfully changed to %d milliseconds",paddr.spp_hbinterval);
    }
    struct sctp_paddrparams params;
    len = sizeof(struct sctp_paddrparams);
    memset(&params,0,sizeof(params));
    params.spp_address.ss_family = AF_INET;
    if(sctp_opt_info(sockFd_g[count],0,SCTP_PEER_ADDR_PARAMS,&params,&len) < 0)
    {
        printf("\nerrno : %d\n",errno);
        perror ("sockFd: getsockopt for SCTP_PEER_ADDR_PARAMS");
        exit(-1);
    }
    else
    {
        printf("\nAssoc id : %d\tHB Interval : %d",params.spp_assoc_id,params.spp_hbinterval);
    }
}
Sample Output :
Client connection established..
Heartbeat interval is successfully changed to 5000 milliseconds
Assoc id : 0 HB Interval : 30000 --- //obtained using sctp_opt_info
On this machine I also tried the patch you provided, but it did not help.
I can't understand what's going wrong.
Thanks,
Bergit
Created attachment 311312
debug patch
OK, then let's do it the old-fashioned way. Can you build this patch into a
kernel, run it, and send me the message log? Thanks!
I have never done this before. Can you please let me know how to do it? Thanks.

I'll just build you a test kernel here. I'll update the bz when I have it posted.

http://people.redhat.com/nhorman/rpms/kernel-smp-2.6.9-75.EL.bz543084dbg.i686.rpm
The debug kernel is available there.

Thanks, Neil. Will this patch work on an x86_64 machine? I want to use a different machine whose kernel info is as follows:

$ uname -a
Linux sahyadri 2.6.9-22.ELsmp #1 SMP Mon Sep 19 18:00:54 EDT 2005 x86_64 x86_64 x86_64 GNU/Linux

It has the same OS as the one I was using earlier, but it's an x86_64. Can the patch be tested on this?

http://people.redhat.com/nhorman/rpms/kernel-smp-2.6.9-75.EL.bz543084dbg.x86_64.rpm
Here's the x86_64 version.

Ping, any update here?

Hi Neil,
Sorry for the delay. The first patch did not work, and I could not try this x86_64 version because I got pulled into something else and could not work on this. Thanks!

OK, well, I'll close this for now. Please feel free to reopen it if you get time to work on it. Thanks!