Description of problem:
For OpenShift on OpenStack deployments we would like to increase the Octavia LB default timeouts. There is an OpenShift bug [1] whose fix will solve this issue, but that fix will not be backported to OCP 3.10 and OCP 3.11, the versions we currently support, so we need to handle it from the OSP side.

The procedure we currently follow manually to change the timeouts is:

1. Log in to an overcloud controller.
2. Copy the files from https://github.com/openstack/octavia/tree/stable/queens/octavia/common/jinja/haproxy/templates into /var/lib/config-data/puppet-generated/octavia/.
3. Modify the base.j2 file to increase the default timeout_client and timeout_server values (for instance from 50000 to 500000, i.e. from 50 seconds to 500 seconds): https://github.com/openstack/octavia/blob/stable/queens/octavia/common/jinja/haproxy/templates/base.j2#L42-L43
4. Edit /var/lib/config-data/puppet-generated/octavia/etc/octavia/octavia.conf to point to the haproxy template just copied (note it is mounted in a different directory inside the container):
   [haproxy_amphora]
   haproxy_template = /var/lib/kolla/config_files/src/haproxy.cfg.j2
5. Restart the octavia-worker container.
6. Trigger the openshift-ansible provisioning playbooks.

Adding support in TripleO/Director would allow configuring the timeouts with the required values, and the changes would persist across upgrades/updates (see the sketch below).

Version-Release number of selected component (if applicable):
OSP 13 2018-09-28.1 puddle

How reproducible:
New requirement

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1618685
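A minimal sketch of what the requested TripleO/Director support could look like from the deployer's side. The parameter names OctaviaTimeoutClientData and OctaviaTimeoutMemberData are taken from the verification comment further down; the environment file name and the values are illustrative only (milliseconds):

# Hypothetical environment file; parameter names as exercised in the
# verification comment below, values illustrative (milliseconds).
cat > octavia-timeouts.yaml <<'EOF'
parameter_defaults:
  OctaviaTimeoutClientData: 1200000
  OctaviaTimeoutMemberData: 1200000
EOF

# Then include it in the overcloud deployment, e.g.:
# openstack overcloud deploy --templates ... -e octavia-timeouts.yaml

Because the values would be applied through TripleO, they would be re-applied on stack updates and upgrades instead of being lost, as happens with the manual container edits above.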
According to our records, this should be resolved by puppet-octavia-12.4.0-7.el7ost. This build is available now.
(In reply to Lon Hohberger from comment #9)
> According to our records, this should be resolved by
> puppet-octavia-12.4.0-7.el7ost. This build is available now.

(overcloud) [stack@undercloud-0 ~]$ rpm -qa | grep tripl | grep templ
openstack-tripleo-heat-templates-8.0.7-21.el7ost.noarch
(overcloud) [stack@undercloud-0 ~]$ cat /etc/yum.repos.d/latest-installed
13 -p 2019-01-22.1

openstack-tripleo-heat-templates-8.0.7-21.el7ost.noarch != openstack-tripleo-heat-templates-8.0.8-0.20181105200942.4d61f75.el7ost

The openstack-tripleo-heat-templates package installed on the QA environment is not yet the version carrying the fix, so this is not in the correct state to verify for now.
No need for NEEDINFO on Jon, the reporter. It is a release delivery matter at this point.
Waiting for [1] to be fixed, as it blocks the OpenShift installation.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1684077
Verified in OSP 13 2019-02-25.2 puddle.

[stack@undercloud-0 ~]$ cat /etc/yum.repos.d/latest-installed
13 -p 2019-02-25.2
[stack@undercloud-0 ~]$ rpm -qa | grep openstack-tripleo-heat-templates
openstack-tripleo-heat-templates-8.2.0-4.el7ost.noarch
[stack@undercloud-0 ~]$ rpm -qa | grep puppet-octavia
puppet-octavia-12.4.0-8.el7ost.noarch

- octavia_health_manager container:
  openstack-octavia-common-2.0.3-2.el7ost.noarch
  openstack-octavia-health-manager-2.0.3-2.el7ost.noarch
- octavia_api container:
  openstack-octavia-common-2.0.3-2.el7ost.noarch
  openstack-octavia-api-2.0.3-2.el7ost.noarch
- octavia_housekeeping container:
  openstack-octavia-common-2.0.3-2.el7ost.noarch
  openstack-octavia-housekeeping-2.0.3-2.el7ost.noarch
- octavia_worker container:
  openstack-octavia-common-2.0.3-2.el7ost.noarch
  openstack-octavia-worker-2.0.3-2.el7ost.noarch

Verification steps:

1. Deploy OSP 13 with Octavia in a hybrid environment.
   Set in the customized Jenkins job OVERCLOUD_DEPLOY_OVERRIDE_OPTIONS:
   --config-heat OctaviaTimeoutClientData=1200000 \
   --config-heat OctaviaTimeoutMemberData=1200000

2. Check the new timeout values are reflected in the configuration on the controller (for example with a grep like the sketch below):
/var/lib/config-data/puppet-generated/octavia/etc/octavia/octavia.conf:timeout_client_data=1200000
/var/lib/config-data/puppet-generated/octavia/etc/octavia/octavia.conf:timeout_member_connect=5000
/var/lib/config-data/puppet-generated/octavia/etc/octavia/octavia.conf:timeout_member_data=1200000
/var/lib/config-data/puppet-generated/octavia/etc/octavia/octavia.conf:timeout_tcp_inspect=0
/var/lib/config-data/clustercheck/etc/puppet/hieradata/service_configs.json:    "octavia::controller::timeout_client_data": 1200000,
/var/lib/config-data/clustercheck/etc/puppet/hieradata/service_configs.json:    "octavia::controller::timeout_member_connect": 5000,
/var/lib/config-data/clustercheck/etc/puppet/hieradata/service_configs.json:    "octavia::controller::timeout_member_data": 1200000,
/var/lib/config-data/clustercheck/etc/puppet/hieradata/service_configs.json:    "octavia::controller::timeout_tcp_inspect": 0,
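For reference, a sketch of commands that could produce the checks above on the controller (assuming the file paths shown in the output; grep -H prints the matching file name, which matches the format above):

sudo grep -H 'timeout_' /var/lib/config-data/puppet-generated/octavia/etc/octavia/octavia.conf
sudo grep -H 'octavia::controller::timeout' /var/lib/config-data/clustercheck/etc/puppet/hieradata/service_configs.json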
3. Deploy OCP 3.11 with Kuryr and check the cluster is ready and all the pods are running.
   Note: A workaround for [1] has been applied: "yum install openshift-ansible --enablerepo=rhelosp-rhel-7.6-server-opt"

[openshift@master-0 ~]$ oc version
oc v3.11.90
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://console.openshift.example.com:8443
openshift v3.11.90
kubernetes v1.11.0+d4cacc0

[openshift@master-0 ~]$ oc get nodes
NAME                                 STATUS    ROLES     AGE       VERSION
app-node-0.openshift.example.com     Ready     compute   2h        v1.11.0+d4cacc0
app-node-1.openshift.example.com     Ready     compute   2h        v1.11.0+d4cacc0
infra-node-0.openshift.example.com   Ready     infra     2h        v1.11.0+d4cacc0
master-0.openshift.example.com       Ready     master    2h        v1.11.0+d4cacc0

[openshift@master-0 ~]$ oc get pods --all-namespaces
NAMESPACE                   NAME                                                 READY   STATUS    RESTARTS   AGE
default                     docker-registry-1-p6b2d                              1/1     Running   0          2h
default                     kuryr-pod-1958724533                                 1/1     Running   0          1h
default                     registry-console-1-c7f79                             1/1     Running   0          2h
default                     router-1-vs7ld                                       1/1     Running   0          2h
kube-system                 master-api-master-0.openshift.example.com            1/1     Running   0          2h
kube-system                 master-controllers-master-0.openshift.example.com    1/1     Running   0          2h
kube-system                 master-etcd-master-0.openshift.example.com           1/1     Running   0          2h
kuryr-namespace-699852674   kuryr-pod-436421213                                  1/1     Running   0          1h
kuryr-namespace-920831344   kuryr-pod-826298564                                  1/1     Running   0          1h
openshift-console           console-5d5b6bd95d-m487h                             1/1     Running   0          2h
openshift-infra             kuryr-cni-ds-6w4hx                                   2/2     Running   0          1h
openshift-infra             kuryr-cni-ds-pdfx5                                   2/2     Running   0          1h
openshift-infra             kuryr-cni-ds-pkzxj                                   2/2     Running   0          1h
openshift-infra             kuryr-cni-ds-xpg25                                   2/2     Running   0          1h
openshift-infra             kuryr-controller-6c6d965c54-k68fn                    1/1     Running   0          1h
openshift-monitoring        alertmanager-main-0                                  3/3     Running   0          2h
openshift-monitoring        alertmanager-main-1                                  3/3     Running   0          2h
openshift-monitoring        alertmanager-main-2                                  3/3     Running   0          2h
openshift-monitoring        cluster-monitoring-operator-75c6b544dd-4swg4         1/1     Running   0          2h
openshift-monitoring        grafana-c7d5bc87c-qw92g                              2/2     Running   0          2h
openshift-monitoring        kube-state-metrics-6c64799586-gmjv5                  3/3     Running   0          2h
openshift-monitoring        node-exporter-57hdh                                  2/2     Running   0          2h
openshift-monitoring        node-exporter-mptl8                                  2/2     Running   0          2h
openshift-monitoring        node-exporter-qdqq5                                  2/2     Running   0          2h
openshift-monitoring        node-exporter-rqm49                                  2/2     Running   0          2h
openshift-monitoring        prometheus-k8s-0                                     4/4     Running   1          2h
openshift-monitoring        prometheus-k8s-1                                     4/4     Running   1          2h
openshift-monitoring        prometheus-operator-5b47ff445b-w5v4c                 1/1     Running   0          2h
openshift-node              sync-fv2tv                                           1/1     Running   0          2h
openshift-node              sync-gklcl                                           1/1     Running   0          2h
openshift-node              sync-gr62d                                           1/1     Running   0          2h
openshift-node              sync-k6nn2                                           1/1     Running   0          2h

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1684077
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0448