Description of problem:
For OpenShift on OpenStack deployments we would like to increase the Octavia LB default timeouts. There is an OpenShift bug [1] whose fix will solve this issue, but that fix will not be backported to OCP 3.10 and OCP 3.11, the versions we currently support, so we need to handle it from the OSP side.

The procedure we currently follow manually to change the timeouts is:

1. Log in to an overcloud controller.
2. Copy the files from https://github.com/openstack/octavia/tree/stable/queens/octavia/common/jinja/haproxy/templates into /var/lib/config-data/puppet-generated/octavia/.
3. Modify the base.j2 file to increase the default timeout_client and timeout_server values (for instance from 50000 to 500000, i.e. from 50 seconds to 500 seconds): https://github.com/openstack/octavia/blob/stable/queens/octavia/common/jinja/haproxy/templates/base.j2#L42-L43
4. Edit /var/lib/config-data/puppet-generated/octavia/etc/octavia/octavia.conf to point to the haproxy template just copied (note it is mounted in a different directory inside the container):
   [haproxy_amphora]
   haproxy_template = /var/lib/kolla/config_files/src/haproxy.cfg.j2
5. Restart the octavia-worker container.
6. Trigger the openshift-ansible provisioning playbooks.

Adding support in TripleO/Director would allow configuring the timeouts with the required values, and the changes would persist across upgrades/updates (see the sketch below).

Version-Release number of selected component (if applicable):
OSP 13 2018-09-28.1 puddle

How reproducible:
New requirement

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1618685
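A minimal sketch of what the requested TripleO/Director support could look like from the deployer's side. The parameter names OctaviaTimeoutClientData and OctaviaTimeoutMemberData are taken from the verification comment further down; the environment file name and the values are illustrative only (milliseconds):

# Hypothetical environment file; parameter names as exercised in the
# verification comment below, values illustrative (milliseconds).
cat > octavia-timeouts.yaml <<'EOF'
parameter_defaults:
  OctaviaTimeoutClientData: 1200000
  OctaviaTimeoutMemberData: 1200000
EOF

# Then include it in the overcloud deployment, e.g.:
# openstack overcloud deploy --templates ... -e octavia-timeouts.yaml

Because the values would be applied through TripleO, they would be re-applied on stack updates and upgrades instead of being lost, as happens with the manual container edits above.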
According to our records, this should be resolved by puppet-octavia-12.4.0-7.el7ost. This build is available now.
(In reply to Lon Hohberger from comment #9)
> According to our records, this should be resolved by
> puppet-octavia-12.4.0-7.el7ost. This build is available now.

(overcloud) [stack@undercloud-0 ~]$ rpm -qa | grep tripl | grep templ
openstack-tripleo-heat-templates-8.0.7-21.el7ost.noarch
(overcloud) [stack@undercloud-0 ~]$ cat /etc/yum.repos.d/latest-installed
13 -p 2019-01-22.1

openstack-tripleo-heat-templates-8.0.7-21.el7ost.noarch != openstack-tripleo-heat-templates-8.0.8-0.20181105200942.4d61f75.el7ost

The openstack-tripleo-heat-templates package installed on the QA environment is not yet the version carrying the fix, so this is not in the correct state to verify for now.
No need for NEEDINFO on Jon, the reporter. It is a release delivery matter at this point.
Waiting for [1] to be fixed, as it blocks the OpenShift installation.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1684077
Verified in OSP 13 2019-02-25.2 puddle.

[stack@undercloud-0 ~]$ cat /etc/yum.repos.d/latest-installed
13 -p 2019-02-25.2
[stack@undercloud-0 ~]$ rpm -qa | grep openstack-tripleo-heat-templates
openstack-tripleo-heat-templates-8.2.0-4.el7ost.noarch
[stack@undercloud-0 ~]$ rpm -qa | grep puppet-octavia
puppet-octavia-12.4.0-8.el7ost.noarch

- octavia_health_manager container:
  openstack-octavia-common-2.0.3-2.el7ost.noarch
  openstack-octavia-health-manager-2.0.3-2.el7ost.noarch
- octavia_api container:
  openstack-octavia-common-2.0.3-2.el7ost.noarch
  openstack-octavia-api-2.0.3-2.el7ost.noarch
- octavia_housekeeping container:
  openstack-octavia-common-2.0.3-2.el7ost.noarch
  openstack-octavia-housekeeping-2.0.3-2.el7ost.noarch
- octavia_worker container:
  openstack-octavia-common-2.0.3-2.el7ost.noarch
  openstack-octavia-worker-2.0.3-2.el7ost.noarch

Verification steps:

1. Deploy OSP 13 with Octavia in a hybrid environment.
   Set in the customized Jenkins job OVERCLOUD_DEPLOY_OVERRIDE_OPTIONS:
   --config-heat OctaviaTimeoutClientData=1200000 \
   --config-heat OctaviaTimeoutMemberData=1200000

2. Check the new timeout values are reflected in the configuration on the controller (for example with a grep like the sketch below):
/var/lib/config-data/puppet-generated/octavia/etc/octavia/octavia.conf:timeout_client_data=1200000
/var/lib/config-data/puppet-generated/octavia/etc/octavia/octavia.conf:timeout_member_connect=5000
/var/lib/config-data/puppet-generated/octavia/etc/octavia/octavia.conf:timeout_member_data=1200000
/var/lib/config-data/puppet-generated/octavia/etc/octavia/octavia.conf:timeout_tcp_inspect=0
/var/lib/config-data/clustercheck/etc/puppet/hieradata/service_configs.json:    "octavia::controller::timeout_client_data": 1200000,
/var/lib/config-data/clustercheck/etc/puppet/hieradata/service_configs.json:    "octavia::controller::timeout_member_connect": 5000,
/var/lib/config-data/clustercheck/etc/puppet/hieradata/service_configs.json:    "octavia::controller::timeout_member_data": 1200000,
/var/lib/config-data/clustercheck/etc/puppet/hieradata/service_configs.json:    "octavia::controller::timeout_tcp_inspect": 0,
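For reference, a sketch of commands that could produce the checks above on the controller (assuming the file paths shown in the output; grep -H prints the matching file name, which matches the format above):

sudo grep -H 'timeout_' /var/lib/config-data/puppet-generated/octavia/etc/octavia/octavia.conf
sudo grep -H 'octavia::controller::timeout' /var/lib/config-data/clustercheck/etc/puppet/hieradata/service_configs.json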
3. Deploy OCP 3.11 with Kuryr and check the cluster is ready and all the pods are running.
   Note: A workaround for [1] has been applied: "yum install openshift-ansible --enablerepo=rhelosp-rhel-7.6-server-opt"

[openshift@master-0 ~]$ oc version
oc v3.11.90
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://console.openshift.example.com:8443
openshift v3.11.90
kubernetes v1.11.0+d4cacc0

[openshift@master-0 ~]$ oc get nodes
NAME                                 STATUS    ROLES     AGE       VERSION
app-node-0.openshift.example.com     Ready     compute   2h        v1.11.0+d4cacc0
app-node-1.openshift.example.com     Ready     compute   2h        v1.11.0+d4cacc0
infra-node-0.openshift.example.com   Ready     infra     2h        v1.11.0+d4cacc0
master-0.openshift.example.com       Ready     master    2h        v1.11.0+d4cacc0

[openshift@master-0 ~]$ oc get pods --all-namespaces
NAMESPACE                   NAME                                                 READY   STATUS    RESTARTS   AGE
default                     docker-registry-1-p6b2d                              1/1     Running   0          2h
default                     kuryr-pod-1958724533                                 1/1     Running   0          1h
default                     registry-console-1-c7f79                             1/1     Running   0          2h
default                     router-1-vs7ld                                       1/1     Running   0          2h
kube-system                 master-api-master-0.openshift.example.com            1/1     Running   0          2h
kube-system                 master-controllers-master-0.openshift.example.com    1/1     Running   0          2h
kube-system                 master-etcd-master-0.openshift.example.com           1/1     Running   0          2h
kuryr-namespace-699852674   kuryr-pod-436421213                                  1/1     Running   0          1h
kuryr-namespace-920831344   kuryr-pod-826298564                                  1/1     Running   0          1h
openshift-console           console-5d5b6bd95d-m487h                             1/1     Running   0          2h
openshift-infra             kuryr-cni-ds-6w4hx                                   2/2     Running   0          1h
openshift-infra             kuryr-cni-ds-pdfx5                                   2/2     Running   0          1h
openshift-infra             kuryr-cni-ds-pkzxj                                   2/2     Running   0          1h
openshift-infra             kuryr-cni-ds-xpg25                                   2/2     Running   0          1h
openshift-infra             kuryr-controller-6c6d965c54-k68fn                    1/1     Running   0          1h
openshift-monitoring        alertmanager-main-0                                  3/3     Running   0          2h
openshift-monitoring        alertmanager-main-1                                  3/3     Running   0          2h
openshift-monitoring        alertmanager-main-2                                  3/3     Running   0          2h
openshift-monitoring        cluster-monitoring-operator-75c6b544dd-4swg4         1/1     Running   0          2h
openshift-monitoring        grafana-c7d5bc87c-qw92g                              2/2     Running   0          2h
openshift-monitoring        kube-state-metrics-6c64799586-gmjv5                  3/3     Running   0          2h
openshift-monitoring        node-exporter-57hdh                                  2/2     Running   0          2h
openshift-monitoring        node-exporter-mptl8                                  2/2     Running   0          2h
openshift-monitoring        node-exporter-qdqq5                                  2/2     Running   0          2h
openshift-monitoring        node-exporter-rqm49                                  2/2     Running   0          2h
openshift-monitoring        prometheus-k8s-0                                     4/4     Running   1          2h
openshift-monitoring        prometheus-k8s-1                                     4/4     Running   1          2h
openshift-monitoring        prometheus-operator-5b47ff445b-w5v4c                 1/1     Running   0          2h
openshift-node              sync-fv2tv                                           1/1     Running   0          2h
openshift-node              sync-gklcl                                           1/1     Running   0          2h
openshift-node              sync-gr62d                                           1/1     Running   0          2h
openshift-node              sync-k6nn2                                           1/1     Running   0          2h

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1684077
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0448