Bug 1623989 - Openshift-on-OpenStack document required Octavia LB timeout values for OCP 3.10 and OCP 3.11 on OSP 13
Summary: Openshift-on-OpenStack document required Octavia LB timeout values for OCP 3.10 and OCP 3.11 on OSP 13
Keywords:
Status: CLOSED DUPLICATE of bug 1685481
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.10.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.10.z
Assignee: Luis Tomas Bolivar
QA Contact: Jon Uriarte
URL:
Whiteboard:
Depends On: 1636496 1669078 1685481
Blocks:
 
Reported: 2018-08-30 15:34 UTC by Jon Uriarte
Modified: 2019-03-12 10:27 UTC (History)
6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-03-12 10:27:47 UTC
Target Upstream Version:
Embargoed:



Description Jon Uriarte 2018-08-30 15:34:34 UTC
Description of problem:

The Openshift-on-Openstack installer playbooks finish successfully, with no errors, but some pods in the default namespace remain in Error status.
This currently happens in almost 100% of deployments. Which pod fails is somewhat random, but there is always at least one pod that ends up in Error status.
In this case the docker-registry could not be deployed:

$ oc get pods -o wide                                                                                                                                                                        
NAME                       READY     STATUS              RESTARTS   AGE
docker-registry-1-deploy   1/1       Running             0          10m
docker-registry-1-rg9jj    0/1       ContainerCreating   0          2m
registry-console-1-mdh8z   1/1       Running             0          21s
router-1-v5dv4             1/1       Running             0          2m

$ oc logs docker-registry-1-rg9jj
time="2018-08-30T14:52:15Z" level=warning msg="Ignoring unrecognized environment variable REGISTRY_CONSOLE_PORT" 
time="2018-08-30T14:52:15Z" level=warning msg="Ignoring unrecognized environment variable REGISTRY_CONSOLE_PORT_9000_TCP" 
time="2018-08-30T14:52:15Z" level=warning msg="Ignoring unrecognized environment variable REGISTRY_CONSOLE_PORT_9000_TCP_ADDR" 
time="2018-08-30T14:52:15Z" level=warning msg="Ignoring unrecognized environment variable REGISTRY_CONSOLE_PORT_9000_TCP_PORT" 
time="2018-08-30T14:52:15Z" level=warning msg="Ignoring unrecognized environment variable REGISTRY_CONSOLE_PORT_9000_TCP_PROTO" 
time="2018-08-30T14:52:15Z" level=warning msg="Ignoring unrecognized environment variable REGISTRY_CONSOLE_SERVICE_HOST" 
time="2018-08-30T14:52:15Z" level=warning msg="Ignoring unrecognized environment variable REGISTRY_CONSOLE_SERVICE_PORT" 
time="2018-08-30T14:52:15Z" level=warning msg="Ignoring unrecognized environment variable REGISTRY_CONSOLE_SERVICE_PORT_REGISTRY_CONSOLE" 
time="2018-08-30T14:52:15.919779213Z" level=info msg="start registry" distribution_version=v2.6.2+unknown go.version=go1.9.4 instance.id=1ba98ace-b855-4205-927f-bf36899de2b7 openshift_version=v3.10.34 
time="2018-08-30T14:52:15.920140907Z" level=info msg="quota enforcement disabled" go.version=go1.9.4 instance.id=1ba98ace-b855-4205-927f-bf36899de2b7 
time="2018-08-30T14:52:15.921843358Z" level=info msg="redis not configured" go.version=go1.9.4 instance.id=1ba98ace-b855-4205-927f-bf36899de2b7 
time="2018-08-30T14:52:15.941752053Z" level=info msg="using inmemory blob descriptor cache" go.version=go1.9.4 instance.id=1ba98ace-b855-4205-927f-bf36899de2b7 
time="2018-08-30T14:52:15.941783908Z" level=debug msg="configured \"openshift\" access controller" go.version=go1.9.4 instance.id=1ba98ace-b855-4205-927f-bf36899de2b7 
time="2018-08-30T14:52:15.941813082Z" level=debug msg="configured token endpoint at \"/openshift/token\"" go.version=go1.9.4 instance.id=1ba98ace-b855-4205-927f-bf36899de2b7 
time="2018-08-30T14:52:15.942707687Z" level=info msg="listening on :5000, tls" go.version=go1.9.4 instance.id=1ba98ace-b855-4205-927f-bf36899de2b7 
time="2018-08-30T14:52:15.943766611Z" level=info msg="Starting upload purge in 40m0s" go.version=go1.9.4 instance.id=1ba98ace-b855-4205-927f-bf36899de2b7 

There seems to be some error during the rollout of the deploymentconfig that makes it fail, and so the deploy pod ends up in Error status.
Because the docker-registry-1-rg9jj pod is deleted, no more logs could be retrieved; the logs above were captured during the brief window in which the docker-registry-1-rg9jj pod was running.
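
A possible way to capture information from such short-lived pods (a sketch, assuming standard oc behavior; the pod names below are the ones from this report):

  $ oc logs -f docker-registry-1-rg9jj      # stream logs while the short-lived pod still exists
  $ oc logs docker-registry-1-deploy        # deployer pod log usually records why the rollout failed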

[openshift@master-0 ~]$ oc get pods
NAME                       READY     STATUS    RESTARTS   AGE
docker-registry-1-deploy   0/1       Error     0          30m
registry-console-1-mdh8z   1/1       Running   0          19m
router-1-v5dv4             1/1       Running   0          21m

The workaround to recover from this situation is to re-trigger creation of the affected pod(s):
1- Edit the deploymentconfig of each pod in Error status:
  $ oc get dc
  NAME               REVISION   DESIRED   CURRENT   TRIGGERED BY
  docker-registry    1          1         1         config       <---
  registry-console   1          1         1         config
  router             1          1         1         config

  $ oc edit dc docker-registry
   Change (for example) the name inside the spec section:
        name: set it to registry-1

2- Check a new pod is deployed:
  $ oc get pods
  NAME                       READY     STATUS    RESTARTS   AGE
  docker-registry-1-deploy   0/1       Error     0          43m
  docker-registry-2-8cp2m    1/1       Running   0          1m
  registry-console-1-mdh8z   1/1       Running   0          32m
  router-1-v5dv4             1/1       Running   0          34m

3- Delete the replicationcontroller of the pod(s) in Error:
  $ oc get rc
  NAME                 DESIRED   CURRENT   READY     AGE
  docker-registry-1    0         0         0         45m
  docker-registry-2    1         1         1         4m
  registry-console-1   1         1         1         45m
  router-1             1         1         1         45m
  
  $ oc delete rc docker-registry-1
  replicationcontroller "docker-registry-1" deleted

4- Check all the pods are running
  $ oc get pods
  NAME                       READY     STATUS    RESTARTS   AGE
  docker-registry-2-8cp2m    1/1       Running   0          4m
  registry-console-1-mdh8z   1/1       Running   0          35m
  router-1-v5dv4             1/1       Running   0          38m

5- Repeat these steps for every pod that is in Error status (a scripted alternative using oc rollout is sketched below)
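
A scripted alternative to steps 1-4, assuming the oc rollout subcommands available in OCP 3.x (a sketch, not the procedure used above):

  $ oc rollout retry dc/docker-registry     # retry the failed deployment of the dc
  # or force a completely new revision instead of editing the dc by hand:
  $ oc rollout latest dc/docker-registry
  # then remove the replication controller left over from the failed revision:
  $ oc delete rc docker-registry-1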

Version-Release number of the following components:
rpm -q openshift-ansible

openshift-ansible-3.10.34-1.git.0.48df172None.noarch

rpm -q ansible
ansible-2.4.6.0-1.el7ae.noarch

ansible --version
ansible 2.4.6.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/home/cloud-user/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, Feb 20 2018, 09:19:12) [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)]


How reproducible: Almost 100% of the time

Steps to Reproduce:
1. Deploy OpenStack with Octavia (OSP13)
2. Deploy a DNS server and the Ansible host in the overcloud
3. Download the OCP rpms and configure:
   - OpenStack (inventory/group_vars/all.yml)
   - OpenShift (inventory/group_vars/OSEv3.yml)
4. Run Openshift-on-Openstack playbooks:
- ansible-playbook --user openshift -i /usr/share/ansible/openshift-ansible/playbooks/openstack/inventory.py -i inventory /usr/share/ansible/openshift-ansible/playbooks/openstack/openshift-cluster/prerequisites.yml
- ansible-playbook --user openshift -i /usr/share/ansible/openshift-ansible/playbooks/openstack/inventory.py -i inventory /usr/share/ansible/openshift-ansible/playbooks/openstack/openshift-cluster/provision.yml
- ansible-playbook --user openshift -i /usr/share/ansible/openshift-ansible/playbooks/openstack/inventory.py -i inventory red-hat-ca.yml
- ansible-playbook --user openshift -i /usr/share/ansible/openshift-ansible/playbooks/openstack/inventory.py -i inventory repos.yml
- ansible-playbook --user openshift -i /usr/share/ansible/openshift-ansible/playbooks/openstack/inventory.py -i inventory /usr/share/ansible/openshift-ansible/playbooks/openstack/openshift-cluster/install.yml

5. Check the pods are in Running status:
  $ oc get pods --all-namespaces

Actual results:
Some pods in the default namespace are in Error
[openshift@master-0 ~]$ oc get pods -o wide
NAME                       READY     STATUS    RESTARTS   AGE       IP              NODE
docker-registry-1-deploy   0/1       Error     0          14m       10.11.0.5       infra-node-0.openshift.example.com
registry-console-1-mdh8z   1/1       Running   0          4m        10.11.0.23      master-0.openshift.example.com
router-1-v5dv4             1/1       Running   0          6m        192.168.99.13   infra-node-0.openshift.example.com

Expected results:
All the pods in Running status


Additional info:

$ oc describe pod docker-registry-1-deploy
Name:         docker-registry-1-deploy
Namespace:    default
Node:         infra-node-0.openshift.example.com/192.168.99.13
Start Time:   Thu, 30 Aug 2018 10:51:29 -0400
Labels:       openshift.io/deployer-pod-for.name=docker-registry-1
Annotations:  openshift.io/deployment-config.name=docker-registry
              openshift.io/deployment.name=docker-registry-1
              openshift.io/scc=restricted
              openstack.org/kuryr-vif={"versioned_object.data": {"active": true, "address": "fa:16:3e:18:d9:a8", "has_traffic_filtering": false, "id": "b5ecb7ba-1cdd-46cf-9f0b-388e5a74ef50", "network": {"versioned_...
Status:       Failed
IP:           10.11.0.5
Containers:
  deployment:
    Container ID:   docker://63a980461b4445798046c45a725e557a641d5aba13318b43581b46cccf2a0958
    Image:          registry.reg-aws.openshift.com:443/openshift3/ose-deployer:v3.10.34
    Image ID:       docker-pullable://registry.reg-aws.openshift.com:443/openshift3/ose-deployer@sha256:208bd20062c144da5115d30aca966cf30619912855e709b663bafdfad8b5357a
    Port:           <none>
    Host Port:      <none>
    State:          Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Thu, 30 Aug 2018 10:51:31 -0400
      Finished:     Thu, 30 Aug 2018 10:52:23 -0400
    Ready:          False
    Restart Count:  0
    Environment:
      OPENSHIFT_DEPLOYMENT_NAME:       docker-registry-1
      OPENSHIFT_DEPLOYMENT_NAMESPACE:  default
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from deployer-token-pmhh9 (ro)
Conditions:
  Type           Status
  Initialized    True 
  Ready          False 
  PodScheduled   True 
Volumes:
  deployer-token-pmhh9:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  deployer-token-pmhh9
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  node-role.kubernetes.io/infra=true
Tolerations:     <none>
Events:
  Type     Reason            Age                 From                                         Message
  ----     ------            ----                ----                                         -------
  Warning  FailedScheduling  31m (x26 over 37m)  default-scheduler                            0/4 nodes are available: 4 node(s) were not ready.
  Normal   Pulled            29m                 kubelet, infra-node-0.openshift.example.com  Container image "registry.reg-aws.openshift.com:443/openshift3/ose-deployer:v3.10.34" already present on machine
  Normal   Created           29m                 kubelet, infra-node-0.openshift.example.com  Created container
  Normal   Started           29m                 kubelet, infra-node-0.openshift.example.com  Started container

Comment 1 Jon Uriarte 2018-08-31 10:31:06 UTC
Adding logs from infra-node-0.openshift.example.com where the pod docker-registry-1-rg9jj creation failed:

Aug 30 10:51:32 infra-node-0 atomic-openshift-node: I0830 10:51:32.049784   28245 kubelet.go:1869] SyncLoop (ADD, "api"): "docker-registry-1-rg9jj_default(2f05e71c-ac64-11e8-8a03-fa163ee43a34)"
Aug 30 10:51:32 infra-node-0 atomic-openshift-node: I0830 10:51:32.194738   28245 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "registry-certificates" (UniqueName: "kubernetes.io/secret/2f05e71c-ac64-11e8-8a03-fa163ee43a34-registry-certificates") pod "docker-registry-1-rg9jj" (UID: "2f05e71c-ac64-11e8-8a03-fa163ee43a34")
Aug 30 10:51:32 infra-node-0 atomic-openshift-node: I0830 10:51:32.194835   28245 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "registry-storage" (UniqueName: "kubernetes.io/empty-dir/2f05e71c-ac64-11e8-8a03-fa163ee43a34-registry-storage") pod "docker-registry-1-rg9jj" (UID: "2f05e71c-ac64-11e8-8a03-fa163ee43a34")
Aug 30 10:51:32 infra-node-0 atomic-openshift-node: I0830 10:51:32.194879   28245 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "registry-token-5rdwt" (UniqueName: "kubernetes.io/secret/2f05e71c-ac64-11e8-8a03-fa163ee43a34-registry-token-5rdwt") pod "docker-registry-1-rg9jj" (UID: "2f05e71c-ac64-11e8-8a03-fa163ee43a34")
Aug 30 10:51:32 infra-node-0 atomic-openshift-node: I0830 10:51:32.295272   28245 reconciler.go:252] operationExecutor.MountVolume started for volume "registry-certificates" (UniqueName: "kubernetes.io/secret/2f05e71c-ac64-11e8-8a03-fa163ee43a34-registry-certificates") pod "docker-registry-1-rg9jj" (UID: "2f05e71c-ac64-11e8-8a03-fa163ee43a34")
Aug 30 10:51:32 infra-node-0 atomic-openshift-node: I0830 10:51:32.295407   28245 reconciler.go:252] operationExecutor.MountVolume started for volume "registry-storage" (UniqueName: "kubernetes.io/empty-dir/2f05e71c-ac64-11e8-8a03-fa163ee43a34-registry-storage") pod "docker-registry-1-rg9jj" (UID: "2f05e71c-ac64-11e8-8a03-fa163ee43a34")
Aug 30 10:51:32 infra-node-0 atomic-openshift-node: I0830 10:51:32.295440   28245 reconciler.go:252] operationExecutor.MountVolume started for volume "registry-token-5rdwt" (UniqueName: "kubernetes.io/secret/2f05e71c-ac64-11e8-8a03-fa163ee43a34-registry-token-5rdwt") pod "docker-registry-1-rg9jj" (UID: "2f05e71c-ac64-11e8-8a03-fa163ee43a34")
Aug 30 10:51:32 infra-node-0 atomic-openshift-node: I0830 10:51:32.296654   28245 operation_generator.go:555] MountVolume.SetUp succeeded for volume "registry-storage" (UniqueName: "kubernetes.io/empty-dir/2f05e71c-ac64-11e8-8a03-fa163ee43a34-registry-storage") pod "docker-registry-1-rg9jj" (UID: "2f05e71c-ac64-11e8-8a03-fa163ee43a34")
Aug 30 10:51:32 infra-node-0 atomic-openshift-node: I0830 10:51:32.316504   28245 operation_generator.go:555] MountVolume.SetUp succeeded for volume "registry-certificates" (UniqueName: "kubernetes.io/secret/2f05e71c-ac64-11e8-8a03-fa163ee43a34-registry-certificates") pod "docker-registry-1-rg9jj" (UID: "2f05e71c-ac64-11e8-8a03-fa163ee43a34")
Aug 30 10:51:32 infra-node-0 atomic-openshift-node: I0830 10:51:32.329506   28245 operation_generator.go:555] MountVolume.SetUp succeeded for volume "registry-token-5rdwt" (UniqueName: "kubernetes.io/secret/2f05e71c-ac64-11e8-8a03-fa163ee43a34-registry-token-5rdwt") pod "docker-registry-1-rg9jj" (UID: "2f05e71c-ac64-11e8-8a03-fa163ee43a34")
Aug 30 10:51:32 infra-node-0 atomic-openshift-node: I0830 10:51:32.384930   28245 kuberuntime_manager.go:385] No sandbox for pod "docker-registry-1-rg9jj_default(2f05e71c-ac64-11e8-8a03-fa163ee43a34)" can be found. Need to start a new one
Aug 30 10:51:32 infra-node-0 atomic-openshift-node: + docker exec --env 'CNI_ARGS=IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=docker-registry-1-rg9jj;K8S_POD_INFRA_CONTAINER_ID=31a90463a12df537f53f1d2a7f1b3e0aa186055c3140b838ecfc2028cfa55870' --env CNI_COMMAND=ADD --env CNI_IFNAME=eth0 --env CNI_NETNS=/proc/29328/ns/net --env CNI_CONTAINERID=31a90463a12df537f53f1d2a7f1b3e0aa186055c3140b838ecfc2028cfa55870 --env CNI_PATH=/opt/kuryr-cni/bin:/opt/cni/bin -i 0a99cd97702f363f1246ff43866f77345296f9e4c282bb2047616812f51f0d53 kuryr-cni --config-file /etc/kuryr/kuryr.conf
Aug 30 10:51:33 infra-node-0 atomic-openshift-node: I0830 10:51:33.405982   28245 kubelet.go:1876] SyncLoop (UPDATE, "api"): "docker-registry-1-rg9jj_default(2f05e71c-ac64-11e8-8a03-fa163ee43a34)"
Aug 30 10:51:33 infra-node-0 atomic-openshift-node: I0830 10:51:33.514494   28245 kubelet.go:1914] SyncLoop (PLEG): "docker-registry-1-rg9jj_default(2f05e71c-ac64-11e8-8a03-fa163ee43a34)", event: &pleg.PodLifecycleEvent{ID:"2f05e71c-ac64-11e8-8a03-fa163ee43a34", Type:"ContainerStarted", Data:"31a90463a12df537f53f1d2a7f1b3e0aa186055c3140b838ecfc2028cfa55870"}
Aug 30 10:51:33 infra-node-0 atomic-openshift-node: {'CNI_IFNAME': 'eth0', 'CNI_NETNS': '/proc/29328/ns/net', 'CNI_PATH': '/opt/kuryr-cni/bin:/opt/cni/bin', 'CNI_ARGS': 'IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=docker-registry-1-rg9jj;K8S_POD_INFRA_CONTAINER_ID=31a90463a12df537f53f1d2a7f1b3e0aa186055c3140b838ecfc2028cfa55870', 'CNI_DAEMON': 'True', 'CNI_CONFIG_DIR_PATH': '/etc/cni/net.d', 'CNI_COMMAND': 'ADD', 'CNI_CONTAINERID': '31a90463a12df537f53f1d2a7f1b3e0aa186055c3140b838ecfc2028cfa55870', 'CNI_BIN_DIR_PATH': '/opt/cni/bin', 'config_kuryr': {u'debug': True, u'cniVersion': u'0.3.0', u'type': u'kuryr-cni', u'kuryr_conf': u'/etc/kuryr/kuryr.conf', u'name': u'kuryr'}} _make_request /usr/lib/python2.7/site-packages/kuryr_kubernetes/cni/api.py:169
Aug 30 10:54:09 infra-node-0 atomic-openshift-node: I0830 10:54:09.807448   28245 kubelet.go:1914] SyncLoop (PLEG): "docker-registry-1-rg9jj_default(2f05e71c-ac64-11e8-8a03-fa163ee43a34)", event: &pleg.PodLifecycleEvent{ID:"2f05e71c-ac64-11e8-8a03-fa163ee43a34", Type:"ContainerStarted", Data:"934b8227048c27591bc326362ddbc70a257ee9517fa1691b459a550faa9d4a3a"}
Aug 30 10:54:11 infra-node-0 atomic-openshift-node: I0830 10:54:11.345338   28245 kubelet.go:1885] SyncLoop (DELETE, "api"): "docker-registry-1-rg9jj_default(2f05e71c-ac64-11e8-8a03-fa163ee43a34)"
Aug 30 10:54:12 infra-node-0 atomic-openshift-node: + docker exec --env 'CNI_ARGS=IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=docker-registry-1-rg9jj;K8S_POD_INFRA_CONTAINER_ID=31a90463a12df537f53f1d2a7f1b3e0aa186055c3140b838ecfc2028cfa55870' --env CNI_COMMAND=DEL --env CNI_IFNAME=eth0 --env CNI_NETNS=/proc/29328/ns/net --env CNI_CONTAINERID=31a90463a12df537f53f1d2a7f1b3e0aa186055c3140b838ecfc2028cfa55870 --env CNI_PATH=/opt/kuryr-cni/bin:/opt/cni/bin -i 0a99cd97702f363f1246ff43866f77345296f9e4c282bb2047616812f51f0d53 kuryr-cni --config-file /etc/kuryr/kuryr.conf
Aug 30 10:54:13 infra-node-0 atomic-openshift-node: {'CNI_IFNAME': 'eth0', 'CNI_NETNS': '/proc/29328/ns/net', 'CNI_PATH': '/opt/kuryr-cni/bin:/opt/cni/bin', 'CNI_ARGS': 'IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=docker-registry-1-rg9jj;K8S_POD_INFRA_CONTAINER_ID=31a90463a12df537f53f1d2a7f1b3e0aa186055c3140b838ecfc2028cfa55870', 'CNI_DAEMON': 'True', 'CNI_CONFIG_DIR_PATH': '/etc/cni/net.d', 'CNI_COMMAND': 'DEL', 'CNI_CONTAINERID': '31a90463a12df537f53f1d2a7f1b3e0aa186055c3140b838ecfc2028cfa55870', 'CNI_BIN_DIR_PATH': '/opt/cni/bin', 'config_kuryr': {u'debug': True, u'cniVersion': u'0.3.0', u'type': u'kuryr-cni', u'kuryr_conf': u'/etc/kuryr/kuryr.conf', u'name': u'kuryr'}} _make_request /usr/lib/python2.7/site-packages/kuryr_kubernetes/cni/api.py:169
Aug 30 10:54:13 infra-node-0 atomic-openshift-node: I0830 10:54:13.994712   28245 kubelet.go:1914] SyncLoop (PLEG): "docker-registry-1-rg9jj_default(2f05e71c-ac64-11e8-8a03-fa163ee43a34)", event: &pleg.PodLifecycleEvent{ID:"2f05e71c-ac64-11e8-8a03-fa163ee43a34", Type:"ContainerDied", Data:"934b8227048c27591bc326362ddbc70a257ee9517fa1691b459a550faa9d4a3a"}
Aug 30 10:54:13 infra-node-0 atomic-openshift-node: I0830 10:54:13.994823   28245 kubelet.go:1914] SyncLoop (PLEG): "docker-registry-1-rg9jj_default(2f05e71c-ac64-11e8-8a03-fa163ee43a34)", event: &pleg.PodLifecycleEvent{ID:"2f05e71c-ac64-11e8-8a03-fa163ee43a34", Type:"ContainerDied", Data:"31a90463a12df537f53f1d2a7f1b3e0aa186055c3140b838ecfc2028cfa55870"}
Aug 30 10:54:15 infra-node-0 atomic-openshift-node: I0830 10:54:15.024339   28245 kubelet.go:1885] SyncLoop (DELETE, "api"): "docker-registry-1-rg9jj_default(2f05e71c-ac64-11e8-8a03-fa163ee43a34)"
Aug 30 10:54:15 infra-node-0 atomic-openshift-node: I0830 10:54:15.034589   28245 kubelet.go:1879] SyncLoop (REMOVE, "api"): "docker-registry-1-rg9jj_default(2f05e71c-ac64-11e8-8a03-fa163ee43a34)"
Aug 30 10:54:15 infra-node-0 atomic-openshift-node: I0830 10:54:15.034634   28245 kubelet.go:2081] Failed to delete pod "docker-registry-1-rg9jj_default(2f05e71c-ac64-11e8-8a03-fa163ee43a34)", err: pod not found

Comment 3 Michał Dulko 2018-09-03 13:52:57 UTC
This seems to be a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1618685.

Comment 4 Luis Tomas Bolivar 2018-09-04 08:32:19 UTC
Jon and I confirmed that it is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1618685 as Michal suggested. 

The workaround for this is to raise the Octavia K8s-API LB timeout so that the connection is not closed while waiting for the pods to become ready.

The workaround steps to increase the Octavia LB timeouts on OSP 13 are:
1.- Log in to an overcloud controller.
2.- Copy the files from https://github.com/openstack/octavia/tree/stable/queens/octavia/common/jinja/haproxy/templates into /var/lib/config-data/puppet-generated/octavia/
3.- Modify the base.j2 file to increase the default timeouts (for instance from 50000 to 500000, i.e. from 50 seconds to 500 seconds): https://github.com/openstack/octavia/blob/stable/queens/octavia/common/jinja/haproxy/templates/base.j2#L42-L43
4.- Restart the octavia-worker container.
5.- Re-run the openshift-ansible provisioning playbooks.

*** This bug has been marked as a duplicate of bug 1618685 ***

Comment 5 Luis Tomas Bolivar 2018-09-04 08:44:16 UTC
Oops, sorry, another step is missing after step 3: octavia.conf must also be changed to point to the template:
3.1.- Edit the file /var/lib/config-data/puppet-generated/octavia/etc/octavia/octavia.conf so that it points to the haproxy template just copied (note that it is mounted in a different directory inside the container):
[haproxy_amphora]
haproxy_template = /var/lib/kolla/config_files/src/haproxy.cfg.j2
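
Putting comments 4 and 5 together, a rough shell sketch of the manual workaround on an OSP 13 controller could look like the following; the sed expression, the exact template file names, and the octavia_worker container name are assumptions here and may need to be adapted to the actual environment:

  # 2. Copy the stable/queens haproxy templates next to the generated octavia config
  #    (this host path is bind-mounted into the octavia containers):
  $ sudo cp base.j2 haproxy.cfg.j2 macros.j2 /var/lib/config-data/puppet-generated/octavia/

  # 3. Raise the default client/server timeouts in base.j2, e.g. 50000 ms -> 500000 ms
  #    (assumes the queens template still hard-codes these values):
  $ sudo sed -i -e 's/timeout client 50000/timeout client 500000/' \
                -e 's/timeout server 50000/timeout server 500000/' \
                /var/lib/config-data/puppet-generated/octavia/base.j2

  # 3.1. Point octavia.conf at the copied template (path as seen inside the container):
  #        [haproxy_amphora]
  #        haproxy_template = /var/lib/kolla/config_files/src/haproxy.cfg.j2
  $ sudo vi /var/lib/config-data/puppet-generated/octavia/etc/octavia/octavia.conf

  # 4. Restart the worker so new amphorae are rendered with the modified template:
  $ sudo docker restart octavia_worker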

Comment 6 rlopez 2018-09-26 15:14:44 UTC
Regarding these changes: since they are not made via director, they would be lost on an update or upgrade of the OSP environment. What parameters need to be set in an overcloud update so that they persist?

Comment 7 Jon Uriarte 2018-10-04 09:18:09 UTC
Re-opening this BZ. Because the fix for [1] will not be backported to OCP 3.10 and OCP 3.11, the Octavia default timeouts for load balancers need to be increased.

The change must be made via director in order to persist across updates and upgrades.

The documentation for the OCP 3.10 and OCP 3.11 OpenStack playbooks needs to be updated accordingly to reflect these changes.


[1] https://bugzilla.redhat.com/show_bug.cgi?id=1618685

Comment 8 Jon Uriarte 2018-10-05 14:31:12 UTC
Filed a BZ in OSP 13 [1] requesting support for changing Octavia LB timeouts in TripleO.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1636496
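
For reference, the persistent, director-driven version of this change would be a heat environment file passed to the overcloud deploy command. The parameter names and the 20-minute value below are assumptions about what the TripleO support requested in [1] would look like, not something confirmed in this bug:

  $ cat octavia-timeouts.yaml
  parameter_defaults:
    OctaviaTimeoutClientData: 1200000   # ms
    OctaviaTimeoutMemberData: 1200000   # ms

  $ openstack overcloud deploy --templates ... -e octavia-timeouts.yaml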

Comment 9 Luis Tomas Bolivar 2019-03-12 10:27:47 UTC
Closing the bug, as the documentation update is already being handled in https://bugzilla.redhat.com/show_bug.cgi?id=1685481.

*** This bug has been marked as a duplicate of bug 1685481 ***

