Bug 453084 - Unable to set heartbeat interval for a one-to-one SCTP socket
Status: CLOSED INSUFFICIENT_DATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: lksctp-tools
Version: 4.0
Hardware: i386 Linux
Priority: low, Severity: medium
Target Milestone: rc
Assigned To: Neil Horman
Reported: 2008-06-27 03:10 EDT by Bergit
Modified: 2009-03-31 07:04 EDT
Doc Type: Bug Fix
Last Closed: 2009-03-31 07:04:39 EDT

Attachments:
patch to set hb_interval for all transports on a 1:1 socket (1.33 KB, patch), 2008-06-27 08:01 EDT, Neil Horman
debug patch (2.58 KB, patch), 2008-07-08 16:23 EDT, Neil Horman

Description Bergit 2008-06-27 03:10:26 EDT
Description of problem:
Unable to set heartbeat interval for a one-to-one SCTP socket.
When using "setsockopt" with the option "SCTP_PEER_ADDR_PARAMS" to update the
heartbeat interval, the 2.6.9 kernel expects the parameter "assoc_id" in the
structure "sctp_paddrparams" to be set. But this should not be needed for a
one-to-one socket.

Version-Release number of selected component (if applicable):
Linux kernel : 2.6.9-22.ELsmp
lksctp : lksctp-tools-1.0.8-1

# rpm -q redhat-release
redhat-release-4WS-3


Comment 1 Neil Horman 2008-06-27 08:01:42 EDT
Created attachment 310428
patch to set hb_interval for all transports on a 1:1 socket

I haven't tested it yet, but this should solve your problem.  I'm building a test
kernel for you now.  Will post when it's available.
Comment 2 Neil Horman 2008-06-27 11:18:57 EDT
Test kernel available here:
http://people.redhat.com/nhorman/rpms/kernel-smp-2.6.9-75.EL.bz543084.i686.rpm
Sorry about the name, I reversed some digits in the bz num when I built it.  I
think it should fix your problem though.  Let me know and I'll get it posted for
review internally.

Thanks!
Comment 3 Bergit 2008-07-02 04:50:52 EDT
(In reply to comment #2)

Hi Neil,

Thanks for the patch.
I installed the patched test kernel.
    $ rpm -q kernel-smp
    kernel-smp-2.6.9-22.EL
    kernel-smp-2.6.9-75.EL.bz543084

I retested after installing it.

But I still face the same problems as mentioned in my previous mails.
 - I am still unable to change the heartbeat interval.
 - Getting SCTP_PEER_ADDR_PARAMS using "sctp_opt_info" still fails with the error
"Invalid argument".

Regards,
Bergit



Comment 4 Neil Horman 2008-07-02 07:12:28 EDT
Well, there are only two reasons that you'll get EINVAL back from requesting this
socket option: either the size of the data getting passed down is too small or
too large (specifically, not equal to sizeof(struct sctp_paddrparams)), or the
sctp_paddrparams structure is being sent down with a non-zero association id and
an IP address that is set to something other than INADDR_ANY (and one or
both of them are actually invalid, which leads to a bad transport lookup).

Could you please send me:

1) output of uname -a from your system
2) output of rpm -qli lksctp-tools
3) code that recreates the problem (I think you sent this to the upstream list,
but I'd like to have it recorded here as well)
4) strace output of the code in (3) running on your system  

I'm starting to suspect that the sctp library you are using is hitting the
overly specific size check in the socket option code, but the above info will
help me be certain.  Thanks!
Comment 5 Bergit 2008-07-02 08:03:24 EDT
Please find my comments inline.

(In reply to comment #4)
> Well, there are only two reasons that you'll get EINVAL back from requesting this
> socket option: either the size of the data getting passed down is too small or
> too large (specifically, not equal to sizeof(struct sctp_paddrparams)), or the
> sctp_paddrparams structure is being sent down with a non-zero association id and
> an IP address that is set to something other than INADDR_ANY (and one or
> both of them are actually invalid, which leads to a bad transport lookup).
> 
> Could you please send me:
> 
> 1) output of uname -a from your system

# uname -a
Linux localhost.localdomain 2.6.9-75.EL.bz543084smp #1 SMP Fri Jun 27 09:07:38 EDT 2008 i686 i686 i386 GNU/Linux

> 2) output of rpm -qli lksctp-tools

# rpm -qli lksctp-tools
Name        : lksctp-tools                 Relocations: (not relocatable)
Version     : 1.0.8                             Vendor: (none)
Release     : 1                             Build Date: Sat 02 Feb 2008 12:43:14 AM IST
Install Date: Fri 29 Feb 2008 07:39:05 PM IST      Build Host: galen.zko.hp.com
Group       : System Environment/Libraries   Source RPM: lksctp-tools-1.0.8-1.src.rpm
Size        : 200499                           License: LGPL
Signature   : (none)
URL         : http://lksctp.sourceforge.net
Summary     : User-space access to Linux Kernel SCTP
Description :
This is the lksctp-tools package for Linux Kernel SCTP Reference
Implementation.

This package is intended to supplement the Linux Kernel SCTP Reference
Implementation now available in the Linux kernel source tree in
versions 2.5.36 and following.  For more information on LKSCTP see the
package documentation README file, section titled "LKSCTP - Linux
Kernel SCTP."

This package contains the base run-time library & command-line tools.
/usr/bin/checksctp
/usr/bin/sctp_darn
/usr/bin/sctp_test
/usr/bin/withsctp
/usr/lib/libsctp.so.1
/usr/lib/libsctp.so.1.0.8
/usr/lib/lksctp-tools/libwithsctp.a
/usr/lib/lksctp-tools/libwithsctp.la
/usr/lib/lksctp-tools/libwithsctp.so
/usr/lib/lksctp-tools/libwithsctp.so.1
/usr/lib/lksctp-tools/libwithsctp.so.1.0.8
/usr/share/doc/lksctp-tools-1.0.8
/usr/share/doc/lksctp-tools-1.0.8/AUTHORS
/usr/share/doc/lksctp-tools-1.0.8/COPYING
/usr/share/doc/lksctp-tools-1.0.8/COPYING.lib
/usr/share/doc/lksctp-tools-1.0.8/ChangeLog

> 3) code that recreates the problem (I think you sent this to the upstream list,
> but I'd like to have it recorded here as well)

The server accepts connections from clients and sets the heartbeat interval on
each accepted association.

    int servSockFd;
    int clilen;
    struct sockaddr_in cli_addr;

    int hb[]={5000,93000};
    struct sctp_paddrparams paddr;
    socklen_t len = sizeof(paddr);

    listen(servSockFd,numClients_g);

    bzero((char*)&cli_addr,sizeof(cli_addr));
    clilen = sizeof(cli_addr);

    for(count =0; count <numClients_g; count++)
    {
        sockFd_g[count]= accept(servSockFd,(struct sockaddr *)&cli_addr,&clilen);

        if(sockFd_g[count] <0)
        {
            close(servSockFd);
            perror("Accept error:");
            exit(-1);
        }

        memset(&paddr,0,sizeof(paddr));
        paddr.spp_address.ss_family = AF_INET;
        paddr.spp_flags = SPP_HB_ENABLE;
        paddr.spp_hbinterval = hb[count]; /* in milliseconds */

         
        if(setsockopt(sockFd_g[count],SOL_SOCKET,SCTP_PEER_ADDR_PARAMS,&paddr,sizeof(paddr)) < 0)
        {
            perror ("setsockopt for SCTP_PEER_ADDR_PARAMS");
        }
        else
        {
            printf("\nHeartbeat interval is successfully changed to %d milliseconds",paddr.spp_hbinterval);
        }

/*      Commented this part as its failing with Invalid argument error
        struct sctp_paddrparams params;
        len = sizeof(struct sctp_paddrparams);

        memset(&params,0,sizeof(params));
        params.spp_address.ss_family = AF_INET;

        if(sctp_opt_info(sockFd_g[count],0,SCTP_PEER_ADDR_PARAMS,&params,&len) < 0)
        {
            printf("\nerrno : %d\n",errno);
            perror ("sockFd: getsockopt for SCTP_PEER_ADDR_PARAMS");
            exit(-1);
        }
        else
        {
            printf("\nAssoc id : %d\tHB Interval : %d",params.spp_assoc_id,params.spp_hbinterval);
        }
*/
    }


> 4) strace output of the code in (3) running on your system  

output of strace -p <pid> -o <file> :

accept(3, {sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.2.83.200")}, [16]) = 4
setsockopt(4, SOL_SOCKET, SO_KEEPALIVE, "\0\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 152) = 0
write(1, "\n", 1)                       = 1
getsockopt(4, 0x84 /* SOL_??? */, 0, "\0\0\0\0\270\v\0\0`\352\0\0\350\3\0\0", [16]) = 0
write(1, "Heartbeat interval is successful"..., 64) = 64
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({5, 0}, {5, 0})               = 0
write(1, "Assoc id : 0\tRTO MAX : 60000\n", 29) = 29
fstat64(0, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 1), ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7fa1000
write(1, "Done", 4)                     = 4
read(0, "\n", 1024)                     = 1
close(4)                                = 0
close(3)                                = 0
exit_group(0)                           = ?

I hope this is what you asked for.

> 
> I'm starting to suspect that the sctp library you are using is hitting the
> overly specific size check in the socket option code, but the above info will
> help me be certain.  Thanks!

Comment 6 Bergit 2008-07-02 08:29:44 EDT
When I ran the code, I forgot to remove the portion that calls "sctp_get_info"
with SCTP_RTOINFO. Since this is irrelevant to this discussion, I ran it again
after removing that portion; here is the strace output.

accept(3, {sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.2.83.200")}, [16]) = 4
setsockopt(4, SOL_SOCKET, SO_KEEPALIVE, "\0\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 152) = 0
write(1, "\n", 1)                       = 1
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({5, 0}, {5, 0})               = 0
write(1, "Heartbeat interval is successful"..., 64) = 64
fstat64(0, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 1), ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f87000
write(1, "Done", 4)                     = 4
read(0, "\n", 1024)                     = 1
close(4)                                = 0
close(3)                                = 0
exit_group(0)                           = ?


In the setsockopt call strace shows "SO_KEEPALIVE", where I actually pass
"SCTP_PEER_ADDR_PARAMS" in the code.
I wonder if this could be the problem (maybe a mismatch between the kernel and
the sctp libraries?).

Thanks,
Bergit
Comment 7 Neil Horman 2008-07-07 13:46:34 EDT
Hey, I just noticed something.  Your lksctp-tools package isn't from Red Hat.
It's lksctp-tools-1.0.8-1 and it looks like it was built by HP.  The latest
lksctp-tools for RHEL-4 is 1.0.2-6.4E.1.  Normally it wouldn't matter much, but
sctp in RHEL-4 had a lot of fairly fragile interfaces between userspace and
kernel space, which makes for some fairly restrictive version constraints.  Can
you uninstall lksctp-tools-1.0.8 on your system, and retest after installing
version 1.0.2 from RHN?  Thanks!
Comment 8 Bergit 2008-07-08 07:17:11 EDT
Hi,

I tried it on another machine which has the same kernel but
lksctp-tools-1.0.2-6.4E.1.

Here the struct sctp_paddrparams is defined as follows.

struct sctp_paddrparams {
        sctp_assoc_t            spp_assoc_id;
        struct sockaddr_storage spp_address;
        __u32                   spp_hbinterval;
        __u16                   spp_pathmaxrxt;
};

It does not have the flags field and a few other parameters. I guess that's not an issue.

And these are the results.
    - Here too, the HB interval does not get modified.
    - But I did not get the "Invalid argument" error for "sctp_opt_info" with
option SCTP_PEER_ADDR_PARAMS.
    - setsockopt for setting the heartbeat interval did not return any failure.
But when I called "sctp_opt_info" to get SCTP_PEER_ADDR_PARAMS after this, I could
see that the hb_interval was still the default value of 30000.

Please look at the code snippet and sample output below.

Code snippet:
    listen(servSockFd,numClients_g);

    bzero((char*)&cli_addr,sizeof(cli_addr));
    clilen = sizeof(cli_addr);
    for(count =0; count <numClients_g; count++)
    {
        sockFd_g[count]= accept(servSockFd,(struct sockaddr *)&cli_addr,&clilen);

        if(sockFd_g[count] <0)
        {
            close(servSockFd);
            perror("Accept error:");
            exit(-1);
        }

        printf("\nClient connection established..");

        memset(&paddr,0,sizeof(paddr)); 
        paddr.spp_address.ss_family = AF_INET;
        paddr.spp_hbinterval = hb[count]; /* in milliseconds */
            
        if(setsockopt(sockFd_g[count],SOL_SOCKET,SCTP_PEER_ADDR_PARAMS,&paddr,sizeof(paddr)) < 0)
        {
            perror ("setsockopt for SCTP_PEER_ADDR_PARAMS");
        }
        else
        {
            printf("\nHeartbeat interval is successfully changed to %d milliseconds",paddr.spp_hbinterval);
        }

        struct sctp_paddrparams params; 
        len = sizeof(struct sctp_paddrparams);

        memset(&params,0,sizeof(params));
        params.spp_address.ss_family = AF_INET;

        if(sctp_opt_info(sockFd_g[count],0,SCTP_PEER_ADDR_PARAMS,&params,&len) < 0)
        {
            printf("\nerrno : %d\n",errno);
            perror ("sockFd: getsockopt for SCTP_PEER_ADDR_PARAMS");
            exit(-1);
        }
        else
        {
            printf("\nAssoc id : %d\tHB Interval : %d",params.spp_assoc_id,params.spp_hbinterval);
        }
    }

Sample Output :

Client connection established..
Heartbeat interval is successfully changed to 5000 milliseconds
Assoc id : 0    HB Interval : 30000       --- //obtained using sctp_opt_info


On this machine I also tried the patch that you provided, but it did not help.
I can't understand what's going wrong.

Thanks,
Bergit

Comment 9 Neil Horman 2008-07-08 16:23:57 EDT
Created attachment 311312
debug patch

Ok, then let's do it the old-fashioned way.  Can you build this patch
into a kernel, run it, and send me the message log?  Thanks!
Comment 10 Bergit 2008-07-14 00:29:22 EDT
I have never done this before.
Can you please let me know how to do it?
Thanks.
Comment 11 Neil Horman 2008-07-14 07:06:12 EDT
I'll just build you a test kernel here.  I'll update the bz when I have it posted.
Comment 12 Neil Horman 2008-07-14 11:40:05 EDT
http://people.redhat.com/nhorman/rpms/kernel-smp-2.6.9-75.EL.bz543084dbg.i686.rpm
debug kernel is available there.
Comment 13 Bergit 2008-07-16 03:00:18 EDT
Thanks Neil..
Will this patch work on an x86_64 machine? I just want to use a different machine
whose kernel info is as follows.

$ uname -a
Linux sahyadri 2.6.9-22.ELsmp #1 SMP Mon Sep 19 18:00:54 EDT 2005 x86_64 x86_64 x86_64 GNU/Linux

It has the same OS as the one I was using earlier, but it's an x86_64.
Can the patch be tested on this machine?
Comment 14 Neil Horman 2008-07-16 06:27:43 EDT
http://people.redhat.com/nhorman/rpms/kernel-smp-2.6.9-75.EL.bz543084dbg.x86_64.rpm
Here's the x86_64 version.
Comment 15 Neil Horman 2009-03-23 07:21:47 EDT
ping, any update here?
Comment 16 Bergit 2009-03-31 02:34:11 EDT
Hi Neil,

Sorry for the delay.
The first patch did not work, and I could not try this x86_64 version because I got pulled into something else and could not work on this.

Thanks!
Comment 17 Neil Horman 2009-03-31 07:04:39 EDT
Ok, well, I'll close this for now.  Please feel free to reopen it if you get time to work on it.  Thanks!
