Created attachment 1237223 [details]
journal logs of master node

Description of problem:
Router not working after OSE upgrade from 3.3 to 3.4.

Steps to Reproduce:
1. subscription-manager repos --disable="rhel-7-server-ose-3.3-rpms"
2. yum clean all
3. wget http://file.rdu.redhat.com/tdawson/repo/aos-unsigned-latest.repo -P /etc/yum.repos.d/
4. yum update atomic-openshift-utils

[root@dhcp47-115 ~]# rpm -qa | grep openshift
atomic-openshift-clients-3.3.1.7-1.git.0.0988966.el7.x86_64
atomic-openshift-node-3.3.1.7-1.git.0.0988966.el7.x86_64
openshift-ansible-callback-plugins-3.4.41-1.git.0.449ee52.el7.noarch
openshift-ansible-playbooks-3.4.41-1.git.0.449ee52.el7.noarch
atomic-openshift-3.3.1.7-1.git.0.0988966.el7.x86_64
openshift-ansible-docs-3.4.41-1.git.0.449ee52.el7.noarch
openshift-ansible-lookup-plugins-3.4.41-1.git.0.449ee52.el7.noarch
openshift-ansible-roles-3.4.41-1.git.0.449ee52.el7.noarch
atomic-openshift-utils-3.4.41-1.git.0.449ee52.el7.noarch
atomic-openshift-master-3.3.1.7-1.git.0.0988966.el7.x86_64
tuned-profiles-atomic-openshift-node-3.3.1.7-1.git.0.0988966.el7.x86_64
atomic-openshift-sdn-ovs-3.3.1.7-1.git.0.0988966.el7.x86_64
openshift-ansible-3.4.41-1.git.0.449ee52.el7.noarch
openshift-ansible-filter-plugins-3.4.41-1.git.0.449ee52.el7.noarch

5. yum install atomic-openshift-excluder atomic-openshift-docker-excluder

[root@dhcp47-115 ~]# rpm -qa | grep openshift
atomic-openshift-clients-3.3.1.7-1.git.0.0988966.el7.x86_64
atomic-openshift-node-3.3.1.7-1.git.0.0988966.el7.x86_64
openshift-ansible-callback-plugins-3.4.41-1.git.0.449ee52.el7.noarch
openshift-ansible-playbooks-3.4.41-1.git.0.449ee52.el7.noarch
atomic-openshift-docker-excluder-3.4.0.38-1.git.0.8561cba.el7.noarch
atomic-openshift-3.3.1.7-1.git.0.0988966.el7.x86_64
openshift-ansible-docs-3.4.41-1.git.0.449ee52.el7.noarch
openshift-ansible-lookup-plugins-3.4.41-1.git.0.449ee52.el7.noarch
openshift-ansible-roles-3.4.41-1.git.0.449ee52.el7.noarch
atomic-openshift-utils-3.4.41-1.git.0.449ee52.el7.noarch
atomic-openshift-master-3.3.1.7-1.git.0.0988966.el7.x86_64
tuned-profiles-atomic-openshift-node-3.3.1.7-1.git.0.0988966.el7.x86_64
atomic-openshift-sdn-ovs-3.3.1.7-1.git.0.0988966.el7.x86_64
openshift-ansible-3.4.41-1.git.0.449ee52.el7.noarch
openshift-ansible-filter-plugins-3.4.41-1.git.0.449ee52.el7.noarch
atomic-openshift-excluder-3.4.0.38-1.git.0.8561cba.el7.noarch

[root@dhcp47-115 ~]# rpm -qa | grep excluder
atomic-openshift-docker-excluder-3.4.0.38-1.git.0.8561cba.el7.noarch
atomic-openshift-excluder-3.4.0.38-1.git.0.8561cba.el7.noarch

6. atomic-openshift-excluder unexclude
7. atomic-openshift-installer upgrade

[root@dhcp47-115 ~]# atomic-openshift-installer -u -c openshift-installer.cfg.yml upgrade

This tool will help you upgrade your existing OpenShift installation.

Currently running:
openshift-enterprise 3.3

(1) Update to latest 3.3
(2) Upgrade to next release: 3.4

Choose an option from above: 2

OpenShift will be upgraded from openshift-enterprise 3.3 to latest openshift-enterprise 3.4 on the following hosts:

* dhcp47-115.lab.eng.blr.redhat.com
* dhcp46-130.lab.eng.blr.redhat.com
* dhcp46-92.lab.eng.blr.redhat.com
* dhcp46-121.lab.eng.blr.redhat.com

Play 1/61 (Verify Ansible version is greater than or equal to 2.1.0.0) .
Play 2/61 (localhost) ..
Play 3/61 (l_oo_all_hosts) .
Play 4/61 (Populate config host groups) ................
Play 5/61 (Set oo_options) .......
Play 6/61 (Ensure that all non-node hosts are accessible) .
Play 7/61 (Initialize host facts) ...........
Play 8/61 (l_oo_all_hosts) ..
Play 9/61 (Filter list of nodes to be upgraded if necessary) .....
Play 10/61 (Update repos and initialize facts on all hosts) ..............
Play 11/61 (Set openshift_no_proxy_internal_hostnames) ..
Play 12/61 (Verify upgrade can proceed on first master) .....
Play 13/61 (l_oo_all_hosts) ..
Play 14/61 (Determine openshift_version to configure on first master) ..........................................................................................
Play 15/61 (Set openshift_version for all hosts) ..........................................................................................
Play 16/61 (Verify master processes) ............
Play 17/61 (Verify upgrade targets) ....
 [WARNING]: Consider using yum module rather than running yum
.....
Play 18/61 (Verify docker upgrade targets) ...
 [WARNING]: Consider using yum, dnf or zypper module rather than running rpm
..........
Play 19/61 (Flag pre-upgrade checks complete for hosts without errors) ..
Play 20/61 (Cleanup unused Docker images) ......
Play 21/61 (Evaluate additional groups for upgrade) ..
Play 22/61 (Set master embedded_etcd fact) .........
Play 23/61 (Populate config host groups) ................
Play 24/61 (Evaluate additional groups for etcd) ...
Play 25/61 (Backup etcd) ...................
Play 26/61 (Gate on etcd backup) ....
Play 27/61 (Backup etcd) ...................
Play 28/61 (Gate on etcd backup) ....
Play 29/61 (Upgrade master packages) ..........
Play 30/61 (Determine if service signer cert must be created) ..
Play 31/61 (Create local temp directory for syncing certs) .
Play 32/61 (Create service signer certificate) .....
Play 33/61 (Deploy service signer certificate) ..
Play 34/61 (Delete local temp directory) .
Play 35/61 (Set OpenShift master facts) ...............
Play 36/61 (Upgrade master config and systemd units) .....................................................
Play 37/61 (Set master update status to complete) ..
Play 38/61 (Gate on master update) ....
Play 39/61 (Populate config host groups) ................
Play 40/61 (Validate configuration for rolling restart) ..........
Play 41/61 (Create temp file on localhost) .
Play 42/61 (Check if temp file exists on any masters) ..
Play 43/61 (Cleanup temp file on localhost) .
Play 44/61 (Warn if restarting the system where ansible is running) ...
Play 45/61 (Restart masters) .......
Play 46/61 (Reconcile Cluster Roles and Cluster Role Bindings and Security Context Constraints) ..................................................................................................................................................................................................................................................................................
Play 47/61 (Gate on reconcile) ....
Play 48/61 (Upgrade default router and default registry) ..........................................................................................................................................................................................................................................................................................................
Play 49/61 (Check for warnings) ...
Play 50/61 (Evacuate and upgrade nodes) .........................................................................................
Play 51/61 (Evacuate and upgrade nodes) .........................................................................................
Play 52/61 (Evacuate and upgrade nodes) .........................................................................................
Play 53/61 (Evacuate and upgrade nodes) .........................................................................................

dhcp46-121.lab.eng.blr.redhat.com : ok=112  changed=14  unreachable=0  failed=0
dhcp46-130.lab.eng.blr.redhat.com : ok=112  changed=14  unreachable=0  failed=0
dhcp46-92.lab.eng.blr.redhat.com  : ok=112  changed=14  unreachable=0  failed=0
dhcp47-115.lab.eng.blr.redhat.com : ok=356  changed=56  unreachable=0  failed=0
localhost                         : ok=38   changed=3   unreachable=0  failed=0

Installation Complete:

Note: Play count is an estimate and some were skipped because your install does not require them

Upgrade completed! Rebooting all hosts is recommended.

[root@dhcp47-115 ~]# rpm -qa | grep openshift
openshift-ansible-callback-plugins-3.4.41-1.git.0.449ee52.el7.noarch
openshift-ansible-playbooks-3.4.41-1.git.0.449ee52.el7.noarch
atomic-openshift-node-3.4.0.38-1.git.0.8561cba.el7.x86_64
atomic-openshift-docker-excluder-3.4.0.38-1.git.0.8561cba.el7.noarch
tuned-profiles-atomic-openshift-node-3.4.0.38-1.git.0.8561cba.el7.x86_64
atomic-openshift-3.4.0.38-1.git.0.8561cba.el7.x86_64
openshift-ansible-docs-3.4.41-1.git.0.449ee52.el7.noarch
openshift-ansible-lookup-plugins-3.4.41-1.git.0.449ee52.el7.noarch
openshift-ansible-roles-3.4.41-1.git.0.449ee52.el7.noarch
atomic-openshift-utils-3.4.41-1.git.0.449ee52.el7.noarch
atomic-openshift-sdn-ovs-3.4.0.38-1.git.0.8561cba.el7.x86_64
openshift-ansible-3.4.41-1.git.0.449ee52.el7.noarch
openshift-ansible-filter-plugins-3.4.41-1.git.0.449ee52.el7.noarch
atomic-openshift-excluder-3.4.0.38-1.git.0.8561cba.el7.noarch
atomic-openshift-clients-3.4.0.38-1.git.0.8561cba.el7.x86_64
atomic-openshift-master-3.4.0.38-1.git.0.8561cba.el7.x86_64

[root@dhcp47-115 ~]# oadm version
oadm v3.4.0.38
kubernetes v1.4.0+776c994

Server https://dhcp47-115.lab.eng.blr.redhat.com:8443
openshift v3.4.0.38
kubernetes v1.4.0+776c994

9. Update /etc/sysconfig/docker to add the registry needed to pull the OSE 3.4 images.
10. Reboot the nodes one after the other.
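For reference, step 9 amounts to adding the registry that hosts the v3.4 images to docker's search path; a minimal sketch, assuming the default Red Hat registry (the actual registry host used in this setup is not recorded here):

# /etc/sysconfig/docker -- the registry host below is an example, not the one actually used
ADD_REGISTRY='--add-registry registry.access.redhat.com'

# restart docker so the new registry takes effect
systemctl restart docker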
[root@dhcp47-115 ~]# oc get pods
NAME                                                     READY     STATUS    RESTARTS   AGE
aplo-router-1-ocneo                                      1/1       Running   1          1h
glusterfs-dc-dhcp46-121.lab.eng.blr.redhat.com-1-tbutq   1/1       Running   1          1h
glusterfs-dc-dhcp46-130.lab.eng.blr.redhat.com-1-z87rg   1/1       Running   1          1h
glusterfs-dc-dhcp46-92.lab.eng.blr.redhat.com-1-zhkov    1/1       Running   1          1h
heketi-1-2b2sh                                           1/1       Running   4          1h
mongodb-1-7gp1u                                          1/1       Running   5          1h

[root@dhcp47-115 ~]# oc get dc
NAME                                             REVISION   DESIRED   CURRENT   TRIGGERED BY
aplo-router                                      2          1         0         config
glusterfs-dc-dhcp46-121.lab.eng.blr.redhat.com   1          1         1         config
glusterfs-dc-dhcp46-130.lab.eng.blr.redhat.com   1          1         1         config
glusterfs-dc-dhcp46-92.lab.eng.blr.redhat.com    1          1         1         config
heketi                                           1          1         1         config
jenkins                                          2          1         0         config,image(jenkins:latest)
mongodb                                          1          1         1         config,image(mongodb:3.2)

[root@dhcp47-115 ~]# oc describe dc aplo-router
Name:            aplo-router
Namespace:       aplo
Created:         23 hours ago
Labels:          router=aplo-router
Annotations:     <none>
Latest Version:  2
Selector:        router=aplo-router
Replicas:        1
Triggers:        Config
Strategy:        Rolling
Template:
  Labels:           router=aplo-router
  Service Account:  router
  Containers:
   router:
    Image:      openshift3/ose-haproxy-router:v3.4.0.38
    Ports:      80/TCP, 443/TCP, 1936/TCP
    Requests:
      cpu:      100m
      memory:   256Mi
    Liveness:   http-get http://localhost:1936/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Readiness:  http-get http://localhost:1936/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Volume Mounts:
      /etc/pki/tls/private from server-certificate (ro)
    Environment Variables:
      DEFAULT_CERTIFICATE_DIR:             /etc/pki/tls/private
      ROUTER_EXTERNAL_HOST_HOSTNAME:
      ROUTER_EXTERNAL_HOST_HTTPS_VSERVER:
      ROUTER_EXTERNAL_HOST_HTTP_VSERVER:
      ROUTER_EXTERNAL_HOST_INSECURE:       false
      ROUTER_EXTERNAL_HOST_PARTITION_PATH:
      ROUTER_EXTERNAL_HOST_PASSWORD:
      ROUTER_EXTERNAL_HOST_PRIVKEY:        /etc/secret-volume/router.pem
      ROUTER_EXTERNAL_HOST_USERNAME:
      ROUTER_SERVICE_HTTPS_PORT:           443
      ROUTER_SERVICE_HTTP_PORT:            80
      ROUTER_SERVICE_NAME:                 aplo-router
      ROUTER_SERVICE_NAMESPACE:            aplo
      ROUTER_SUBDOMAIN:
      STATS_PASSWORD:                      I2gclLv7ZU
      STATS_PORT:                          1936
      STATS_USERNAME:                      admin
  Volumes:
   server-certificate:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  aplo-router-certs

Deployment #2 (latest):
    Created:   about an hour ago
    Status:    Failed
    Replicas:  0 current / 0 desired
Deployment #1:
    Name:         aplo-router-1
    Created:      23 hours ago
    Status:       Complete
    Replicas:     1 current / 1 desired
    Selector:     deployment=aplo-router-1,deploymentconfig=aplo-router,router=aplo-router
    Labels:       openshift.io/deployment-config.name=aplo-router,router=aplo-router
    Pods Status:  1 Running / 0 Waiting / 0 Succeeded / 0 Failed

Events:
  FirstSeen  LastSeen  Count  From                            SubobjectPath  Type     Reason             Message
  ---------  --------  -----  ----                            -------------  --------  ------            -------
  1h         1h        1      {deploymentconfig-controller }                 Normal   DeploymentCreated  Created new replication controller "aplo-router-2" for version 2
  1h         1h        1      {deployments-controller }                      Warning  Failed             aplo-router-2: Deployer pod "aplo-router-2-deploy" has gone missing

[root@dhcp47-115 ~]# oc describe pod aplo-router-1-ocneo
Name:             aplo-router-1-ocneo
Namespace:        aplo
Security Policy:  privileged
Node:             dhcp46-130.lab.eng.blr.redhat.com/10.70.46.130
Start Time:       Wed, 04 Jan 2017 19:27:21 +0530
Labels:           deployment=aplo-router-1
                  deploymentconfig=aplo-router
                  router=aplo-router
Status:           Running
IP:               10.70.46.130
Controllers:      ReplicationController/aplo-router-1
Containers:
  router:
    Container ID:  docker://595674241fa18240dd8e58ef0631bb2e73558879f4bf4f7aa1f8ca396f76ea69
    Image:         openshift3/ose-haproxy-router:v3.3.1.7
    Image ID:      docker-pullable://registry.access.redhat.com/openshift3/ose-haproxy-router@sha256:f2f75cfd2b828c3143ca8022e26593a7491ca040dab6d6472472ed040d1c1b83
    Ports:         80/TCP, 443/TCP, 1936/TCP
    Requests:
      cpu:         100m
      memory:      256Mi
    State:         Running
      Started:     Wed, 04 Jan 2017 20:26:59 +0530
    Last State:    Terminated
      Reason:      Error
      Exit Code:   1
      Started:     Wed, 04 Jan 2017 20:17:57 +0530
      Finished:    Wed, 04 Jan 2017 20:26:13 +0530
    Ready:         True
    Restart Count: 1
    Liveness:      http-get http://localhost:1936/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Readiness:     http-get http://localhost:1936/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Volume Mounts:
      /etc/pki/tls/private from server-certificate (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from router-token-94h2m (ro)
    Environment Variables:
      DEFAULT_CERTIFICATE_DIR:             /etc/pki/tls/private
      ROUTER_EXTERNAL_HOST_HOSTNAME:
      ROUTER_EXTERNAL_HOST_HTTPS_VSERVER:
      ROUTER_EXTERNAL_HOST_HTTP_VSERVER:
      ROUTER_EXTERNAL_HOST_INSECURE:       false
      ROUTER_EXTERNAL_HOST_PARTITION_PATH:
      ROUTER_EXTERNAL_HOST_PASSWORD:
      ROUTER_EXTERNAL_HOST_PRIVKEY:        /etc/secret-volume/router.pem
      ROUTER_EXTERNAL_HOST_USERNAME:
      ROUTER_SERVICE_HTTPS_PORT:           443
      ROUTER_SERVICE_HTTP_PORT:            80
      ROUTER_SERVICE_NAME:                 aplo-router
      ROUTER_SERVICE_NAMESPACE:            aplo
      ROUTER_SUBDOMAIN:
      STATS_PASSWORD:                      I2gclLv7ZU
      STATS_PORT:                          1936
      STATS_USERNAME:                      admin
Conditions:
  Type          Status
  Initialized   True
  Ready         True
  PodScheduled  True
Volumes:
  server-certificate:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  aplo-router-certs
  router-token-94h2m:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  router-token-94h2m
QoS Class:       Burstable
Tolerations:     <none>
Events:
  FirstSeen  LastSeen  Count  From                                         SubobjectPath  Type     Reason      Message
  ---------  --------  -----  ----                                         -------------  -------- ------      -------
  1h         1h        1      {default-scheduler }                                        Normal   Scheduled   Successfully assigned aplo-router-1-ocneo to dhcp46-130.lab.eng.blr.redhat.com
  51m        51m       1      {kubelet dhcp46-130.lab.eng.blr.redhat.com}                 Warning  FailedSync  Error syncing pod, skipping: failed to "StartContainer" for "POD" with ImageInspectError: "Failed to inspect image \"openshift3/ose-pod:v3.4.0.38\": Cannot connect to the Docker daemon. Is the docker daemon running on this host?"
  1h         35m       14     {kubelet dhcp46-130.lab.eng.blr.redhat.com}                 Warning  FailedSync  Error syncing pod, skipping: failed to "StartContainer" for "POD" with ErrImagePull: "unauthorized: authentication required"
  1h         34m       199    {kubelet dhcp46-130.lab.eng.blr.redhat.com}                 Warning  FailedSync  Error syncing pod, skipping: failed to "StartContainer" for "POD" with ImagePullBackOff: "Back-off pulling image \"openshift3/ose-pod:v3.4.0.38\""
  34m        34m       1      {kubelet dhcp46-130.lab.eng.blr.redhat.com}  spec.containers{router}  Normal   Pulling  pulling image "openshift3/ose-haproxy-router:v3.3.1.7"
  34m        34m       1      {kubelet dhcp46-130.lab.eng.blr.redhat.com}  spec.containers{router}  Normal   Pulled   Successfully pulled image "openshift3/ose-haproxy-router:v3.3.1.7"
  33m        33m       1      {kubelet dhcp46-130.lab.eng.blr.redhat.com}  spec.containers{router}  Normal   Created  Created container with docker id 1b6c21a5e5a1; Security:[seccomp=unconfined]
  33m        33m       1      {kubelet dhcp46-130.lab.eng.blr.redhat.com}  spec.containers{router}  Normal   Started  Started container with docker id 1b6c21a5e5a1
  29m        29m       4      {kubelet dhcp46-130.lab.eng.blr.redhat.com}                 Warning  FailedMount  MountVolume.SetUp failed for volume "kubernetes.io/secret/b61bb2ff-d285-11e6-894c-005056b329c4-router-token-94h2m" (spec.Name: "router-token-94h2m") pod "b61bb2ff-d285-11e6-894c-005056b329c4" (UID: "b61bb2ff-d285-11e6-894c-005056b329c4") with: Get https://dhcp47-115.lab.eng.blr.redhat.com:8443/api/v1/namespaces/aplo/secrets/router-token-94h2m: dial tcp 10.70.47.115:8443: getsockopt: connection refused
  29m        29m       4      {kubelet dhcp46-130.lab.eng.blr.redhat.com}                 Warning  FailedMount  MountVolume.SetUp failed for volume "kubernetes.io/secret/b61bb2ff-d285-11e6-894c-005056b329c4-server-certificate" (spec.Name: "server-certificate") pod "b61bb2ff-d285-11e6-894c-005056b329c4" (UID: "b61bb2ff-d285-11e6-894c-005056b329c4") with: Get https://dhcp47-115.lab.eng.blr.redhat.com:8443/api/v1/namespaces/aplo/secrets/aplo-router-certs: dial tcp 10.70.47.115:8443: getsockopt: connection refused
  24m        24m       1      {kubelet dhcp46-130.lab.eng.blr.redhat.com}  spec.containers{router}  Normal   Pulled   Container image "openshift3/ose-haproxy-router:v3.3.1.7" already present on machine
  24m        24m       1      {kubelet dhcp46-130.lab.eng.blr.redhat.com}  spec.containers{router}  Normal   Created  Created container with docker id 595674241fa1; Security:[seccomp=unconfined]
  24m        24m       1      {kubelet dhcp46-130.lab.eng.blr.redhat.com}  spec.containers{router}  Normal   Started  Started container with docker id 595674241fa1

As the output above shows, the router is not working as expected. The running router pod (from deployment #1) is still using the ose-haproxy-router:v3.3.1.7 image instead of v3.4.0.38, and deployment #2, which references the v3.4.0.38 image, failed because its deployer pod went missing. I have attached the journal logs of the master node.
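Since deployment #2 failed with a missing deployer pod, one plausible way to get the router onto the v3.4 image once all masters and nodes are back up (a guess on my part, not necessarily what comment 2 describes) would be to retry or re-trigger the deployment:

# retry the failed deployment #2 of the router DC
oc deploy aplo-router --retry -n aplo

# or start a fresh deployment from the current config
oc deploy aplo-router --latest -n aplo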
It worked with the steps mentioned in comment 2.