Bug 524129 - LVS master and backup director - Synchronised connections on backup director have unsuitable timeout value
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.4
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Neil Horman
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Duplicates: 452914
Depends On:
Blocks: 492942 528645 533192
 
Reported: 2009-09-18 01:03 UTC by Michael Kearey
Modified: 2023-09-14 01:18 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-03-30 07:23:00 UTC
Target Upstream Version:
Embargoed:


Attachments
Patch to add sync conn timeout sysctl (2.53 KB, patch)
2009-09-18 01:08 UTC, Michael Kearey
backport of whats upstream (1.17 KB, patch)
2009-10-02 18:19 UTC, Neil Horman
corrected patch (621 bytes, patch)
2009-10-02 19:45 UTC, Neil Horman
corrected patch (1.88 KB, patch)
2009-10-02 20:29 UTC, Neil Horman
one more corrected patch (1.78 KB, patch)
2009-10-03 00:58 UTC, Neil Horman


Links
System: Red Hat Product Errata
ID: RHSA-2010:0178
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Enterprise Linux 5.5 kernel security and bug fix update
Last Updated: 2010-03-29 12:18:21 UTC

Description Michael Kearey 2009-09-18 01:03:51 UTC
Description of problem:
When using LVS in a master and backup configuration, with active connections on the master propagated to the backup director, the timeout value for synchronised connections on the backup is hard coded to 180 seconds.

For most applications of LVS the short timeout is fine, i.e. short-lived TCP sessions with rapid creation and teardown. But for long-lived TCP sessions with frequent idle periods the timeout is useless: it makes synchronising to the backup director pointless.

Version-Release number of selected component (if applicable):


How reproducible:
100%

Steps to Reproduce:
1. Configure a LVS master and backup pair
2. Set the master to send sync connections to the backup, i.e. set syncdaemon = 1 in the LVS config file, or run 'ipvsadm --start-daemon master' on the master and 'ipvsadm --start-daemon backup' on the backup director

3. Establish TCP connections via the master director 
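
The steps above can be sketched as a small Python helper. It only builds the ipvsadm command lines and prints them, it does not run anything. The '--start-daemon master' and '--start-daemon backup' invocations come from step 2; the function names and the use of 'ipvsadm -L -c -n' to inspect the backup's connection table afterwards are illustrative assumptions, not part of the original report.

```python
def sync_daemon_cmd(role):
    """Build the ipvsadm command that starts the sync daemon for a role."""
    if role not in ("master", "backup"):
        raise ValueError("role must be 'master' or 'backup'")
    return ["ipvsadm", "--start-daemon", role]


def list_connections_cmd():
    """Build the command that lists the IPVS connection table numerically."""
    return ["ipvsadm", "-L", "-c", "-n"]


if __name__ == "__main__":
    for role in ("master", "backup"):
        print(f"# on the {role} director")
        print(" ".join(sync_daemon_cmd(role)))
    # Assumed follow-up to step 3: look at what the backup has received.
    print("# on the backup director, after establishing connections")
    print(" ".join(list_connections_cmd()))
```

Running the printed commands requires root on each director. Depending on the setup, the sync daemon may also need options such as the multicast interface, which are left out here.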

Actual results:
The backup director will have a connection table of sync'd connections with a hard-coded timeout of 180 seconds. If a connection is inactive for 180 seconds on the master, the backup will delete the connection from its table.

Expected results:
The timeout value should be adjustable so that it suits the kind of connections the director is handling.

Additional info:

Current upstream LVS uses an expanded sync protocol that can sync connections to the backup and send messages to remove sync'd connections from the backup when the connection becomes inactive on the master. The LVS components in RHEL5 do not have these features; they rely completely on a hard-coded timeout for sync'd connections.

If the directors are handling long-lived sessions with periods of apparent inactivity, the master can be configured to use an increased timeout for its connection table to keep the connections in its table. But on a backup director, the sync'd connection is timed out by default and removed after three minutes. The timers of sync'd connections are reset on the backup if the master detects activity, but the 3 minute timeout is too short for the 'activity threshold' on the master.

A patch to the kernel LVS components is attached. It adds a sysctl entry with a default value of 180 seconds, which allows the timeout for sync'd connections to be adjusted.
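
To illustrate the reasoning above, here is a deliberately simplified Python model of the backup's behaviour. It assumes each sync message from the master restarts (or recreates) the backup's timer for that connection, and that the entry is dropped once the timer runs out. The 180 second value comes from this report; the function name, the example times and the 900 second alternative are made up for illustration and are not taken from the kernel code.

```python
HARD_CODED_BACKUP_TIMEOUT = 180  # seconds, as described in this report


def backup_keeps_connection(sync_times, failover_time,
                            backup_timeout=HARD_CODED_BACKUP_TIMEOUT):
    """Return True if the backup still holds the entry at failover_time.

    sync_times: times (in seconds) at which the master sent a sync
    message for this connection. In this model each message restarts
    the backup's timer for the entry.
    """
    seen = [t for t in sync_times if t <= failover_time]
    if not seen:
        # The backup never heard about this connection.
        return False
    # The entry survives only if the most recent sync message is
    # younger than the backup's timeout.
    return failover_time - max(seen) < backup_timeout


if __name__ == "__main__":
    # Hypothetical session: activity at t=0 and t=10, then idle.
    syncs = [0, 10]
    # Master fails over at t=600 while the session is still open.
    print(backup_keeps_connection(syncs, 600))
    print(backup_keeps_connection(syncs, 600, backup_timeout=900))
```

With the hard-coded value the idle session is already gone from the backup at failover time, while a timeout matched to the master's setting would have kept it.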

Comment 1 Michael Kearey 2009-09-18 01:08:57 UTC
Created attachment 361581 [details]
Patch to add sync conn timeout sysctl

Comment 4 Neil Horman 2009-10-02 18:19:07 UTC
Created attachment 363510 [details]
backport of whats upstream

Not sure why you would use a new sysctl to do this.  From my read, upstream sets up a table of sysctls to do this in a more fine-grained manner.  This is a backport of what's upstream right now.  I'll submit a brew build soon for it.  Can you please try this with the customer and see how it works out?  Thanks!

Comment 5 Neil Horman 2009-10-02 19:45:02 UTC
Created attachment 363518 [details]
corrected patch

Sorry, corrected patch.  The sysctl values in proc should actually not be there, as they're deprecated (or were never actually used).  Drawing timeouts from the timeout_table enables you to use the ipvsadm utility with the --set option to set the timeouts you're interested in.  I'll post a build url shortly. Please test and confirm that it does what you need it to do.  Thanks!
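
For reference, a minimal Python sketch of how the --set option mentioned above could be driven. 'ipvsadm --set' takes three values in seconds (TCP, TCP after FIN, UDP), and a value of 0 keeps the current setting. The function name and the 7200 second example are assumptions for illustration; the helper only builds the command line and does not run it.

```python
def set_timeouts_cmd(tcp=0, tcpfin=0, udp=0):
    """Build 'ipvsadm --set tcp tcpfin udp'; 0 leaves a value unchanged."""
    for name, value in (("tcp", tcp), ("tcpfin", tcpfin), ("udp", udp)):
        if not isinstance(value, int) or value < 0:
            raise ValueError(f"{name} timeout must be a non-negative integer")
    return ["ipvsadm", "--set", str(tcp), str(tcpfin), str(udp)]


if __name__ == "__main__":
    # Hypothetical value: a 2 hour TCP timeout for long-lived idle
    # sessions, leaving the TCP FIN and UDP timeouts as they are.
    print(" ".join(set_timeouts_cmd(tcp=7200)))  # ipvsadm --set 7200 0 0
```

With the patched kernel the command would presumably be run on the backup director, and on the master as well if its timeout should match.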

Comment 7 Neil Horman 2009-10-02 20:29:32 UTC
Created attachment 363526 [details]
corrected patch

Grr. This is what I get for trying to build in brew straight away.  Some more errors corrected.  If this build fails, I'll debug it locally over the weekend.

Comment 8 Neil Horman 2009-10-02 20:32:02 UTC
new build:
http://brewweb.devel.redhat.com/brew/taskinfo?taskID=2014258

Comment 9 Neil Horman 2009-10-03 00:58:47 UTC
Created attachment 363540 [details]
one more corrected patch

oi, I've built it locally, and this patch will build in brew.

Comment 10 Neil Horman 2009-10-03 01:00:01 UTC
 http://brewweb.devel.redhat.com/brew/taskinfo?taskID=2014455
New build

Comment 15 Cong Wang 2009-10-21 02:23:05 UTC
*** Bug 452914 has been marked as a duplicate of this bug. ***

Comment 16 Neil Horman 2009-10-21 14:25:57 UTC
I'm putting this back in assigned state, as the ABI changes I hoped to avoid are central to the function of this patch. I'm building a new kernel for the customer to test here:
http://brewweb.devel.redhat.com/brew/taskinfo?taskID=2038869

Please confirm that this still works for the customer, and I'll get it posted asap.

Comment 19 Don Zickus 2009-10-21 19:13:19 UTC
in kernel-2.6.18-170.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so.  However feel free
to provide a comment indicating that this fix has been verified.

Comment 20 Don Zickus 2009-10-21 19:21:27 UTC
Moving back to ASSIGNED to pick up the other change Neil wants to add to the bz.

Comment 23 Don Zickus 2009-10-28 20:17:38 UTC
in kernel-2.6.18-171.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so.  However feel free
to provide a comment indicating that this fix has been verified.

Comment 25 Chris Ward 2010-02-11 10:06:49 UTC
~~ Attention Customers and Partners - RHEL 5.5 Beta is now available on RHN ~~

RHEL 5.5 Beta has been released! There should be a fix present in this 
release that addresses your request. Please test and report back results 
here, by March 3rd 2010 (2010-03-03) or sooner.

Upon successful verification of this request, post your results and update 
the Verified field in Bugzilla with the appropriate value.

If you encounter any issues while testing, please describe them and set 
this bug into NEED_INFO. If you encounter new defects or have additional 
patch(es) to request for inclusion, please clone this bug per each request
and escalate through your support representative.

Comment 27 errata-xmlrpc 2010-03-30 07:23:00 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html

Comment 29 Red Hat Bugzilla 2023-09-14 01:18:10 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

