Bug 1636498 - Add support for configuring Octavia LB timeouts in OSP 14
Summary: Add support for configuring Octavia LB timeouts in OSP 14
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 14.0 (Rocky)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z4
Target Release: 14.0 (Rocky)
Assignee: Kamil Sambor
QA Contact: Jon Uriarte
URL:
Whiteboard:
Depends On:
Blocks: 1636496 1668915
Reported: 2018-10-05 14:27 UTC by Jon Uriarte
Modified: 2019-10-09 15:36 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Octavia default timeouts for backend member and frontend client can be set by params exposed in template:
* `OctaviaTimeoutClientData`: Frontend client inactivity timeout
* `OctaviaTimeoutMemberConnect`: Backend member connection timeout
* `OctaviaTimeoutMemberData`: Backend member inactivity timeout
* `OctaviaTimeoutTcpInspect`: Time to wait for TCP packets for content inspection
The value for all of these options is expected to be in milliseconds.
Clone Of:
: 1668915 (view as bug list)
Environment:
Last Closed: 2019-10-09 15:36:58 UTC
Target Upstream Version:
Embargoed:
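
For illustration, the parameters named in the Doc Text field could be collected into a Heat environment file. The following is a minimal sketch, not the shipped implementation: the helper name `render_environment` is hypothetical, and it assumes the four parameters are set under the standard `parameter_defaults` section, with all values in milliseconds.

```python
# Hypothetical helper: render a TripleO/Heat environment file that sets the
# Octavia timeout parameters named in the Doc Text. All values are milliseconds.

PARAMS = (
    "OctaviaTimeoutClientData",
    "OctaviaTimeoutMemberConnect",
    "OctaviaTimeoutMemberData",
    "OctaviaTimeoutTcpInspect",
)


def render_environment(timeouts_ms):
    """Return YAML text with a parameter_defaults section for the given timeouts."""
    unknown = set(timeouts_ms) - set(PARAMS)
    if unknown:
        raise ValueError("unknown parameter(s): %s" % ", ".join(sorted(unknown)))
    lines = ["parameter_defaults:"]
    for name in PARAMS:
        if name in timeouts_ms:
            value = int(timeouts_ms[name])
            if value < 0:
                raise ValueError("%s must not be negative" % name)
            lines.append("  %s: %d" % (name, value))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # 1200000 ms is the value used for both data timeouts during verification.
    print(render_environment({
        "OctaviaTimeoutClientData": 1200000,
        "OctaviaTimeoutMemberData": 1200000,
    }), end="")
```

The resulting text would be saved to a file and passed to `openstack overcloud deploy` with `-e`; the file name and location are left to the deployer.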



Description Jon Uriarte 2018-10-05 14:27:16 UTC
Description of problem:

For OpenShift on OpenStack deployments we would like to increase the Octavia LB default timeouts.
There is a bug in OpenShift [1] whose fix will solve this issue, but it will not be backported to OCP 3.10 and OCP 3.11,
the versions we currently support, so we need to handle it from the OSP side.

The procedure we are following manually to change the timeouts is:

1.- Log in to the overcloud controller
2.- Copy the files from https://github.com/openstack/octavia/tree/stable/queens/octavia/common/jinja/haproxy/templates 
      to /var/lib/config-data/puppet-generated/octavia/
3.- Modify the base.j2 file to increase the default timeout_client and timeout_server values (for instance from 50000 to 500000,
      i.e., from 50 seconds to 500 seconds): https://github.com/openstack/octavia/blob/stable/queens/octavia/common/jinja/haproxy/templates/base.j2#L42-L43
4.- Edit file /var/lib/config-data/puppet-generated/octavia/etc/octavia/octavia.conf to point to the haproxy template just copied (note it is mounted in a different
      directory inside the container):
      [haproxy_amphora]
      haproxy_template = /var/lib/kolla/config_files/src/haproxy.cfg.j2
5.- Restart octavia-worker container
6.- Trigger openshift-ansible provisioning playbooks
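
The edit in step 3 of the procedure above can be sketched as a small text rewrite. This is only a sketch: the function name `raise_timeouts` is hypothetical, and it assumes the copied template contains literal HAProxy directives of the form `timeout client <ms>` and `timeout server <ms>` rather than Jinja variables.

```python
import re

# Matches HAProxy "timeout client <ms>" and "timeout server <ms>" directives.
# [ \t] is used instead of \s so a match never spans more than one line.
_TIMEOUT_RE = re.compile(
    r"^([ \t]*timeout[ \t]+(?:client|server)[ \t]+)\d+[ \t]*$",
    re.MULTILINE,
)


def raise_timeouts(template_text, new_ms):
    """Return template_text with the client and server timeouts set to new_ms."""
    return _TIMEOUT_RE.sub(lambda m: m.group(1) + str(int(new_ms)), template_text)


if __name__ == "__main__":
    # Sample text only; it is not a copy of the real base.j2.
    sample = (
        "defaults\n"
        "    timeout connect 5000\n"
        "    timeout client 50000\n"
        "    timeout server 50000\n"
    )
    print(raise_timeouts(sample, 500000), end="")
```

Other directives, such as `timeout connect`, are left untouched.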

Adding support in TripleO/Director would make it possible to configure the timeouts with the required values, and the changes would persist across upgrades/updates.

Version-Release number of selected component (if applicable): OSP 14

How reproducible: new requirement

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1618685

Comment 13 Mikey Ariel 2019-02-20 12:44:26 UTC
If this bug requires doc text for errata release, please set the 'Doc Type' and provide draft text according to the template in the 'Doc Text' field. The documentation team will review, edit, and approve the text.

If this bug does not require doc text, please set the 'requires_doc_text' flag to -.

Comment 14 Jon Uriarte 2019-02-28 09:17:03 UTC
Verified in OSP14 2019-02-22.2 puddle.

[stack@undercloud-0 ~]$ cat /etc/yum.repos.d/latest-installed 
14   -p 2019-02-22.2

[stack@undercloud-0 ~]$ rpm -qa | grep openstack-tripleo-heat-templates
openstack-tripleo-heat-templates-9.2.1-0.20190119154860.fe11ade.el7ost.noarch

[stack@undercloud-0 ~]$ rpm -qa | grep puppet-octavia
puppet-octavia-13.3.1-0.20181013113434.e19b590.el7ost.noarch

- octavia_health_manager container:
openstack-octavia-health-manager-3.0.2-0.20181219195054.ec4c88e.el7ost.noarch
openstack-octavia-common-3.0.2-0.20181219195054.ec4c88e.el7ost.noarch

- octavia_api container:
openstack-octavia-api-3.0.2-0.20181219195054.ec4c88e.el7ost.noarch
openstack-octavia-common-3.0.2-0.20181219195054.ec4c88e.el7ost.noarch

- octavia_housekeeping container:
openstack-octavia-housekeeping-3.0.2-0.20181219195054.ec4c88e.el7ost.noarch
openstack-octavia-common-3.0.2-0.20181219195054.ec4c88e.el7ost.noarch

- octavia_worker container:
openstack-octavia-worker-3.0.2-0.20181219195054.ec4c88e.el7ost.noarch
openstack-octavia-common-3.0.2-0.20181219195054.ec4c88e.el7ost.noarch



Verification steps:

1. Deploy OSP 14 with octavia in a hybrid environment
  · Set in customized Jenkins job OVERCLOUD_DEPLOY_OVERRIDE_OPTIONS:

     --config-heat OctaviaTimeoutClientData=1200000 \
     --config-heat OctaviaTimeoutMemberData=1200000

2. Check that the new timeout values are reflected in the configuration (on the controller):
    /var/lib/config-data/octavia/etc/puppet/hieradata/service_configs.json:    "octavia::worker::timeout_client_data": 1200000,
    /var/lib/config-data/octavia/etc/puppet/hieradata/service_configs.json:    "octavia::worker::timeout_member_connect": 5000,
    /var/lib/config-data/octavia/etc/puppet/hieradata/service_configs.json:    "octavia::worker::timeout_member_data": 1200000,
    /var/lib/config-data/octavia/etc/puppet/hieradata/service_configs.json:    "octavia::worker::timeout_tcp_inspect": 0,

3. Deploy OCP 3.11 with Kuryr and check the cluster is ready and all the pods are running:

[openshift@master-0 ~]$ oc get nodes
NAME                                 STATUS    ROLES     AGE       VERSION
app-node-0.openshift.example.com     Ready     compute   12h       v1.11.0+d4cacc0
app-node-1.openshift.example.com     Ready     compute   12h       v1.11.0+d4cacc0
infra-node-0.openshift.example.com   Ready     infra     12h       v1.11.0+d4cacc0
master-0.openshift.example.com       Ready     master    12h       v1.11.0+d4cacc0

[openshift@master-0 ~]$ oc get pods --all-namespaces
NAMESPACE                    NAME                                                READY     STATUS    RESTARTS   AGE
default                      docker-registry-1-4p562                             1/1       Running   0          12h
default                      kuryr-pod-1259287626                                1/1       Running   0          12h
default                      registry-console-1-2j7fm                            1/1       Running   0          12h
default                      router-1-4zkl7                                      1/1       Running   0          12h
kube-system                  master-api-master-0.openshift.example.com           1/1       Running   0          12h
kube-system                  master-controllers-master-0.openshift.example.com   1/1       Running   0          12h
kube-system                  master-etcd-master-0.openshift.example.com          1/1       Running   1          12h
kuryr-namespace-1317222661   kuryr-pod-1964655624                                1/1       Running   0          11h
kuryr-namespace-820599586    kuryr-pod-362347688                                 1/1       Running   0          11h
openshift-console            console-6975575759-fr8fm                            1/1       Running   0          12h
openshift-infra              kuryr-cni-ds-74snc                                  2/2       Running   0          12h
openshift-infra              kuryr-cni-ds-glpxx                                  2/2       Running   0          12h
openshift-infra              kuryr-cni-ds-krg6p                                  2/2       Running   0          12h
openshift-infra              kuryr-cni-ds-q9srf                                  2/2       Running   0          12h
openshift-infra              kuryr-controller-6c6d965c54-fvprg                   1/1       Running   0          11h
openshift-monitoring         alertmanager-main-0                                 3/3       Running   0          12h
openshift-monitoring         alertmanager-main-1                                 3/3       Running   0          12h
openshift-monitoring         alertmanager-main-2                                 3/3       Running   0          12h
openshift-monitoring         cluster-monitoring-operator-75c6b544dd-5gsqp        1/1       Running   0          12h
openshift-monitoring         grafana-c7d5bc87c-849lb                             2/2       Running   0          12h
openshift-monitoring         kube-state-metrics-6c64799586-kdkdm                 3/3       Running   0          12h
openshift-monitoring         node-exporter-2rz7h                                 2/2       Running   0          12h
openshift-monitoring         node-exporter-9jrh5                                 2/2       Running   0          12h
openshift-monitoring         node-exporter-ff9sz                                 2/2       Running   0          12h
openshift-monitoring         node-exporter-vzcvh                                 2/2       Running   0          12h
openshift-monitoring         prometheus-k8s-0                                    4/4       Running   1          12h
openshift-monitoring         prometheus-k8s-1                                    4/4       Running   1          12h
openshift-monitoring         prometheus-operator-5b47ff445b-59lrd                1/1       Running   0          12h
openshift-node               sync-bpnjw                                          1/1       Running   0          12h
openshift-node               sync-fgffz                                          1/1       Running   0          12h
openshift-node               sync-mv8qw                                          1/1       Running   0          12h
openshift-node               sync-vc72l                                          1/1       Running   0          12h
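
The hieradata check in step 2 above could be automated roughly as follows. This is a sketch under assumptions: the function names are hypothetical, and it assumes the `octavia::worker::timeout_*` keys sit at the top level of service_configs.json, as the grep output suggests.

```python
import json

# Expected values taken from the verification output in step 2 (milliseconds).
EXPECTED = {
    "octavia::worker::timeout_client_data": 1200000,
    "octavia::worker::timeout_member_connect": 5000,
    "octavia::worker::timeout_member_data": 1200000,
    "octavia::worker::timeout_tcp_inspect": 0,
}


def find_mismatches(hieradata, expected=EXPECTED):
    """Return {key: (expected, actual)} for keys that differ or are missing."""
    return {
        key: (want, hieradata.get(key))
        for key, want in expected.items()
        if hieradata.get(key) != want
    }


def check_file(path):
    """Load a service_configs.json file and report mismatching timeout keys."""
    with open(path) as handle:
        return find_mismatches(json.load(handle))
```

An empty result from `check_file("/var/lib/config-data/octavia/etc/puppet/hieradata/service_configs.json")` would mean all four keys carry the expected values.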

Comment 15 Jon Uriarte 2019-02-28 11:06:43 UTC
Correction to my previous comment:

The new timeout values should be reflected in /var/lib/config-data/octavia/etc/octavia/octavia.conf as well,
but they are not.

With the correct behaviour, the file would contain:
    /var/lib/config-data/octavia/etc/octavia/octavia.conf:timeout_client_data=1200000
    /var/lib/config-data/octavia/etc/octavia/octavia.conf:timeout_member_connect=5000
    /var/lib/config-data/octavia/etc/octavia/octavia.conf:timeout_member_data=1200000
    /var/lib/config-data/octavia/etc/octavia/octavia.conf:timeout_tcp_inspect=0

Moving the BZ back to assigned.
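
A check for the expected octavia.conf values listed above might look like the sketch below. The function names are hypothetical. Because this report does not state which section holds the options, the sketch scans every section of the file instead of assuming one.

```python
import configparser

# Expected option values from this comment, compared as strings.
EXPECTED = {
    "timeout_client_data": "1200000",
    "timeout_member_connect": "5000",
    "timeout_member_data": "1200000",
    "timeout_tcp_inspect": "0",
}


def read_timeouts(conf_text):
    """Return {option: value} for the expected options found in any section."""
    # interpolation=None keeps '%' literal; strict=False tolerates duplicates.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    found = {}
    for section in parser.sections():
        for option in EXPECTED:
            if parser.has_option(section, option):
                found[option] = parser.get(section, option).strip()
    return found


def missing_or_wrong(conf_text):
    """Return the sorted option names that are absent or have another value."""
    found = read_timeouts(conf_text)
    return sorted(k for k, v in EXPECTED.items() if found.get(k) != v)
```

Reading /var/lib/config-data/octavia/etc/octavia/octavia.conf and passing its text to `missing_or_wrong` would return an empty list once the values are propagated.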

Comment 19 Carlos Goncalves 2019-10-09 15:36:58 UTC
Closing as WONTFIX. OSP 14, a product with 1 year of support, is fast approaching EOL, and the team does not have the capacity to fix this in time for the next and last OSP 14 z-stream. There are no customer cases attached, and no one has expressed urgency to have this BZ resolved.

It is worth noting, though, that the fix is available in OSP 13 as well as in OSP 15 and on.

