Bug 1636498

Summary:	Add support for configuring Octavia LB timeouts in OSP 14
Product:	Red Hat OpenStack	Reporter:	Jon Uriarte <juriarte>
Component:	openstack-tripleo-heat-templates	Assignee:	Kamil Sambor <ksambor>
Status:	CLOSED WONTFIX	QA Contact:	Jon Uriarte <juriarte>
Severity:	high	Docs Contact:
Priority:	high
Version:	14.0 (Rocky)	CC:	astafeye, cgoncalves, gcheresh, itbrown, ksambor, mburns, rheslop, slinaber
Target Milestone:	z4	Keywords:	Triaged, ZStream
Target Release:	14.0 (Rocky)
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:	Octavia default timeouts for backend members and frontend clients can be set with parameters exposed in the template:
* `OctaviaTimeoutClientData`: Frontend client inactivity timeout
* `OctaviaTimeoutMemberConnect`: Backend member connection timeout
* `OctaviaTimeoutMemberData`: Backend member inactivity timeout
* `OctaviaTimeoutTcpInspect`: Time to wait for TCP packets for content inspection
The value for all of these options is expected to be in milliseconds.
Story Points:	---
Clone Of:
Clones:	1668915 (view as bug list)		Environment:
Last Closed:	2019-10-09 15:36:58 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1636496, 1668915

Description Jon Uriarte 2018-10-05 14:27:16 UTC

Description of problem:

For OpenShift on OpenStack deployments we would like to increase the Octavia LB default timeouts.
There is a bug in OpenShift [1] whose fix will solve this issue, but it will not be backported to OCP 3.10 and OCP 3.11,
the versions we currently support, so we need to work around the problem from the OSP side.

The procedure we are following manually to change the timeouts is:

1.- Log in to the overcloud controller
2.- Copy the files from https://github.com/openstack/octavia/tree/stable/queens/octavia/common/jinja/haproxy/templates
      to /var/lib/config-data/puppet-generated/octavia/
3.- Modify the base.j2 file to increase the default timeout_client and timeout_server values (for instance from 50000 to 500000,
      i.e., from 50 seconds to 500 seconds): https://github.com/openstack/octavia/blob/stable/queens/octavia/common/jinja/haproxy/templates/base.j2#L42-L43
4.- Edit the file /var/lib/config-data/puppet-generated/octavia/etc/octavia/octavia.conf to point to the haproxy template just copied (note that it is mounted in a different
      directory inside the container):
      [haproxy_amphora]
      haproxy_template = /var/lib/kolla/config_files/src/haproxy.cfg.j2
5.- Restart the octavia-worker container
6.- Trigger the openshift-ansible provisioning playbooks
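
As an illustration of step 3 only, the value change could be scripted instead of edited by hand. The following is a minimal sketch, assuming the template contains lines of the form `timeout client <milliseconds>` and `timeout server <milliseconds>`; the sample text is made up for the example and is not a copy of base.j2, and the function name `raise_timeouts` is hypothetical.

```python
import re

# Assumed line format: "timeout client <ms>" / "timeout server <ms>".
# Lines that do not match (for example "timeout connect ...") are left alone.
TIMEOUT_LINE = re.compile(
    r"^([ \t]*timeout[ \t]+(?:client|server)[ \t]+)\d+[ \t]*$",
    re.MULTILINE,
)


def raise_timeouts(template_text: str, new_ms: int) -> str:
    """Return template_text with client/server timeouts set to new_ms."""
    return TIMEOUT_LINE.sub(lambda m: f"{m.group(1)}{new_ms}", template_text)


# Illustrative sample only, not the real template contents.
SAMPLE = (
    "defaults\n"
    "    timeout connect 5000\n"
    "    timeout client 50000\n"
    "    timeout server 50000\n"
)

if __name__ == "__main__":
    # 500000 ms = 500 seconds, the example value used in step 3.
    print(raise_timeouts(SAMPLE, 500000))
```

The remaining steps (copying the files, editing octavia.conf, restarting the container) still have to be done on the controller as described above.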

Adding support in TripleO/Director would make it possible to configure the timeouts with the required values, and the changes would persist across upgrades and updates.

Version-Release number of selected component (if applicable): OSP 14

How reproducible: new requirement

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1618685

Comment 13 Mikey Ariel 2019-02-20 12:44:26 UTC

If this bug requires doc text for errata release, please set the 'Doc Type' and provide draft text according to the template in the 'Doc Text' field. The documentation team will review, edit, and approve the text.

If this bug does not require doc text, please set the 'requires_doc_text' flag to -.

Comment 14 Jon Uriarte 2019-02-28 09:17:03 UTC

Verified in OSP14 2019-02-22.2 puddle.

[stack@undercloud-0 ~]$ cat /etc/yum.repos.d/latest-installed 
14   -p 2019-02-22.2

[stack@undercloud-0 ~]$ rpm -qa | grep openstack-tripleo-heat-templates
openstack-tripleo-heat-templates-9.2.1-0.20190119154860.fe11ade.el7ost.noarch

[stack@undercloud-0 ~]$ rpm -qa | grep puppet-octavia
puppet-octavia-13.3.1-0.20181013113434.e19b590.el7ost.noarch

- octavia_health_manager container:
openstack-octavia-health-manager-3.0.2-0.20181219195054.ec4c88e.el7ost.noarch
openstack-octavia-common-3.0.2-0.20181219195054.ec4c88e.el7ost.noarch

- octavia_api container:
openstack-octavia-api-3.0.2-0.20181219195054.ec4c88e.el7ost.noarch
openstack-octavia-common-3.0.2-0.20181219195054.ec4c88e.el7ost.noarch

- octavia_housekeeping container:
openstack-octavia-housekeeping-3.0.2-0.20181219195054.ec4c88e.el7ost.noarch
openstack-octavia-common-3.0.2-0.20181219195054.ec4c88e.el7ost.noarch

- octavia_worker container:
openstack-octavia-worker-3.0.2-0.20181219195054.ec4c88e.el7ost.noarch
openstack-octavia-common-3.0.2-0.20181219195054.ec4c88e.el7ost.noarch



Verification steps:

1. Deploy OSP 14 with Octavia in a hybrid environment
  · In the customized Jenkins job, set OVERCLOUD_DEPLOY_OVERRIDE_OPTIONS:

     --config-heat OctaviaTimeoutClientData=1200000 \
     --config-heat OctaviaTimeoutMemberData=1200000

2. Check that the new timeout values are reflected in the configuration (on the controller):
    /var/lib/config-data/octavia/etc/puppet/hieradata/service_configs.json:    "octavia::worker::timeout_client_data": 1200000,
    /var/lib/config-data/octavia/etc/puppet/hieradata/service_configs.json:    "octavia::worker::timeout_member_connect": 5000,
    /var/lib/config-data/octavia/etc/puppet/hieradata/service_configs.json:    "octavia::worker::timeout_member_data": 1200000,
    /var/lib/config-data/octavia/etc/puppet/hieradata/service_configs.json:    "octavia::worker::timeout_tcp_inspect": 0,

3. Deploy OCP 3.11 with Kuryr and check that the cluster is ready and all the pods are running:

[openshift@master-0 ~]$ oc get nodes
NAME                                 STATUS    ROLES     AGE       VERSION
app-node-0.openshift.example.com     Ready     compute   12h       v1.11.0+d4cacc0
app-node-1.openshift.example.com     Ready     compute   12h       v1.11.0+d4cacc0
infra-node-0.openshift.example.com   Ready     infra     12h       v1.11.0+d4cacc0
master-0.openshift.example.com       Ready     master    12h       v1.11.0+d4cacc0

[openshift@master-0 ~]$ oc get pods --all-namespaces
NAMESPACE                    NAME                                                READY     STATUS    RESTARTS   AGE
default                      docker-registry-1-4p562                             1/1       Running   0          12h
default                      kuryr-pod-1259287626                                1/1       Running   0          12h
default                      registry-console-1-2j7fm                            1/1       Running   0          12h
default                      router-1-4zkl7                                      1/1       Running   0          12h
kube-system                  master-api-master-0.openshift.example.com           1/1       Running   0          12h
kube-system                  master-controllers-master-0.openshift.example.com   1/1       Running   0          12h
kube-system                  master-etcd-master-0.openshift.example.com          1/1       Running   1          12h
kuryr-namespace-1317222661   kuryr-pod-1964655624                                1/1       Running   0          11h
kuryr-namespace-820599586    kuryr-pod-362347688                                 1/1       Running   0          11h
openshift-console            console-6975575759-fr8fm                            1/1       Running   0          12h
openshift-infra              kuryr-cni-ds-74snc                                  2/2       Running   0          12h
openshift-infra              kuryr-cni-ds-glpxx                                  2/2       Running   0          12h
openshift-infra              kuryr-cni-ds-krg6p                                  2/2       Running   0          12h
openshift-infra              kuryr-cni-ds-q9srf                                  2/2       Running   0          12h
openshift-infra              kuryr-controller-6c6d965c54-fvprg                   1/1       Running   0          11h
openshift-monitoring         alertmanager-main-0                                 3/3       Running   0          12h
openshift-monitoring         alertmanager-main-1                                 3/3       Running   0          12h
openshift-monitoring         alertmanager-main-2                                 3/3       Running   0          12h
openshift-monitoring         cluster-monitoring-operator-75c6b544dd-5gsqp        1/1       Running   0          12h
openshift-monitoring         grafana-c7d5bc87c-849lb                             2/2       Running   0          12h
openshift-monitoring         kube-state-metrics-6c64799586-kdkdm                 3/3       Running   0          12h
openshift-monitoring         node-exporter-2rz7h                                 2/2       Running   0          12h
openshift-monitoring         node-exporter-9jrh5                                 2/2       Running   0          12h
openshift-monitoring         node-exporter-ff9sz                                 2/2       Running   0          12h
openshift-monitoring         node-exporter-vzcvh                                 2/2       Running   0          12h
openshift-monitoring         prometheus-k8s-0                                    4/4       Running   1          12h
openshift-monitoring         prometheus-k8s-1                                    4/4       Running   1          12h
openshift-monitoring         prometheus-operator-5b47ff445b-59lrd                1/1       Running   0          12h
openshift-node               sync-bpnjw                                          1/1       Running   0          12h
openshift-node               sync-fgffz                                          1/1       Running   0          12h
openshift-node               sync-mv8qw                                          1/1       Running   0          12h
openshift-node               sync-vc72l                                          1/1       Running   0          12h
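
The check in step 2 could also be scripted rather than done with grep. This is a minimal sketch, assuming service_configs.json is a JSON object with the hiera keys at the top level; the function names `find_mismatches` and `check_file` are hypothetical, and the expected values are the ones reported in step 2.

```python
import json

# Keys and values as reported in step 2 (all values in milliseconds).
EXPECTED = {
    "octavia::worker::timeout_client_data": 1200000,
    "octavia::worker::timeout_member_connect": 5000,
    "octavia::worker::timeout_member_data": 1200000,
    "octavia::worker::timeout_tcp_inspect": 0,
}


def find_mismatches(hieradata: dict, expected: dict) -> dict:
    """Return {key: (expected, actual)} for keys that differ or are missing."""
    return {
        key: (want, hieradata.get(key))
        for key, want in expected.items()
        if hieradata.get(key) != want
    }


def check_file(path: str, expected: dict = EXPECTED) -> dict:
    """Load a hieradata JSON file and compare it against expected."""
    with open(path) as handle:
        return find_mismatches(json.load(handle), expected)
```

Calling `check_file("/var/lib/config-data/octavia/etc/puppet/hieradata/service_configs.json")` on the controller would return an empty dict when all four values match.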

Comment 15 Jon Uriarte 2019-02-28 11:06:43 UTC

Correction to my previous comment:

The new timeout values should also be reflected in /var/lib/config-data/octavia/etc/octavia/octavia.conf,
but they are not.

With the correct behaviour, the file would contain:
    /var/lib/config-data/octavia/etc/octavia/octavia.conf:timeout_client_data=1200000
    /var/lib/config-data/octavia/etc/octavia/octavia.conf:timeout_member_connect=5000
    /var/lib/config-data/octavia/etc/octavia/octavia.conf:timeout_member_data=1200000
    /var/lib/config-data/octavia/etc/octavia/octavia.conf:timeout_tcp_inspect=0

Moving the BZ back to assigned.
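
The missing octavia.conf entries described above could be detected with a short script. This is a rough sketch, assuming the file parses as INI with section headers; it scans every section rather than assuming where the options live, and the function names `read_timeouts` and `missing_timeouts` are hypothetical.

```python
import configparser

# Option names taken from the expected octavia.conf lines listed above.
TIMEOUT_OPTIONS = (
    "timeout_client_data",
    "timeout_member_connect",
    "timeout_member_data",
    "timeout_tcp_inspect",
)


def read_timeouts(conf_text: str) -> dict:
    """Return {option: value_ms} for each timeout option found in any section."""
    # strict=False tolerates duplicate keys; interpolation=None keeps '%' literal.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    found = {}
    for section in parser.sections():
        for option in TIMEOUT_OPTIONS:
            if parser.has_option(section, option):
                found[option] = parser.getint(section, option)
    return found


def missing_timeouts(conf_text: str) -> list:
    """Return the timeout options that are absent from the configuration."""
    found = read_timeouts(conf_text)
    return [option for option in TIMEOUT_OPTIONS if option not in found]
```

In the situation reported in this comment, `missing_timeouts` applied to the contents of octavia.conf would list the options that the deployment failed to write.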

Comment 19 Carlos Goncalves 2019-10-09 15:36:58 UTC

Closing as WONTFIX. OSP 14, a 1-year support product, is fast approaching EOL and the team does not have the capacity to fix this in time for the next and last OSP 14 zstream. There are no customer cases attached, nor has anyone expressed urgency to have this BZ resolved.

It is worth noting, though, that the fix is available in OSP 13 as well as in OSP 15 and later.