Bug 1472740

Summary: Application pods getting deleted / going into error state after upgrade process
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Tejas Chaphekar <tchaphek>
Component: heketi Assignee: Ramakrishna Reddy Yekulla <rreddy>
Status: CLOSED NOTABUG QA Contact: Tejas Chaphekar <tchaphek>
Severity: high Docs Contact:
Priority: unspecified    
Version: cns-3.6 CC: aos-bugs, hchiramm, jokerman, mliyazud, mmccomas, pprakash, rhs-bugs, sdodson, storage-qa-internal, tchaphek
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1473618 (view as bug list) Environment:
Last Closed: 2017-07-27 09:09:44 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1445448, 1473618    

Description Tejas Chaphekar 2017-07-19 10:09:17 UTC
Description of problem:

Version-Release number of the following components:
rpm -q openshift-ansible
openshift-ansible-3.6.153-1.git.0.5a6bf7d.el7.noarch

rpm -q ansible
ansible-2.2.3.0-1.el7.noarch

ansible --version

ansible 2.2.3.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = Default w/o overrides

How reproducible:

Once

Steps to Reproduce:
1. Upgrade the hosts from RHEL 7.3 to RHEL 7.4.
2. Disable the OCP 3.5 repo and add the OCP 3.6 repo.
3. Run the following commands:

yum update atomic-openshift-utils

yum install atomic-openshift-excluder atomic-openshift-docker-excluder

Actual results:

[root@dhcp47-190 ~]# oc get pods
NAME                             READY     STATUS    RESTARTS   AGE
glusterfs-0dn97                  1/1       Running   2          19d
glusterfs-9lph2                  1/1       Running   1          19d
glusterfs-s8jxz                  1/1       Running   1          19d
heketi-1-0mkvn                   1/1       Running   1          19d
nginx1                           1/1       Running   1          19d
nginx2                           1/1       Running   0          7m
nginx3                           1/1       Running   0          4m
nginx4                           1/1       Running   0          4m
nginx5                           1/1       Running   0          1m
storage-project-router-1-z8kzq   1/1       Running   3          19d

[root@dhcp47-190 ~]# atomic-openshift-installer upgrade

        This tool will help you upgrade your existing OpenShift installation.
        Currently running: openshift-enterprise 3.5

(1) Update to latest 3.5
(2) Upgrade to next release: 3.6

Choose an option from above: 2
OpenShift will be upgraded from openshift-enterprise 3.5 to latest openshift-enterprise 3.6 on the following hosts:

  * dhcp47-190.lab.eng.blr.redhat.com
  * dhcp47-48.lab.eng.blr.redhat.com
  * dhcp46-235.lab.eng.blr.redhat.com
  * dhcp46-212.lab.eng.blr.redhat.com

Do you want to proceed? [y/N]: y

Play 1/62 (Create initial host groups for localhost)
.
Play 2/62 (Populate config host groups)
.....................
Play 3/62 (Set oo_option facts)
........
Play 4/62 (Ensure that all non-node hosts are accessible)
.
Play 5/62 (Initialize host facts)
........................
Play 6/62 (Ensure firewall is not switched during upgrade)
...
Play 7/62 (Configure the upgrade target for the common upgrade tasks)
..
Play 8/62 (Filter list of nodes to be upgraded if necessary)
.....
Play 9/62 (Update repos and initialize facts on all hosts)
...............
Play 10/62 (Set openshift_no_proxy_internal_hostnames)
..
Play 11/62 (Verify upgrade can proceed on first master)
.....
Play 12/62 (Disable excluders)
.........................................
Play 13/62 (Disable excluders)
.........................................
Play 14/62 (Verify compatible yum/subscription-manager combination)
..
Play 15/62 (Determine openshift_version to configure on first master)
..................................................................................................................................
Play 16/62 (Set openshift_version for all hosts)
..................................................................................................................................
Play 17/62 (Verify master processes)
..........................
Play 18/62 (Validate configuration for rolling restart)
........................
Play 19/62 (Create temp file on localhost)
.
Play 20/62 (Check if temp file exists on any masters)
..
Play 21/62 (Cleanup temp file on localhost)
.
Play 22/62 (Warn if restarting the system where ansible is running)
...
Play 23/62 (Verify upgrade targets)
........
Play 24/62 (Verify docker upgrade targets)
.............
Play 25/62 (Verify 3.6 specific upgrade checks)
..
Play 26/62 (Flag pre-upgrade checks complete for hosts without errors)
..
Play 27/62 (Cleanup unused Docker images)
......
Play 28/62 (Pre master upgrade - Upgrade all storage)
..
Play 29/62 (Set master embedded_etcd fact)
.......................
Play 30/62 (Backup etcd)
.................................................
Play 31/62 (Gate on etcd backup)
....
Play 32/62 (Backup etcd)
.................................................
Play 33/62 (Gate on etcd backup)
....
Play 34/62 (Determine if service signer cert must be created)
..
Play 35/62 (Create local temp directory for syncing certs)
.
Play 36/62 (Create service signer certificate)
.....
Play 37/62 (Deploy service signer certificate)
..
Play 38/62 (Delete local temp directory)
.
Play 39/62 (Set OpenShift master facts)
.................................
Play 40/62 (Upgrade master)
....................................................................................
Play 41/62 (Post master upgrade - Upgrade clusterpolicies storage)
..
Play 42/62 (Gate on master update)
....
Play 43/62 (Reconcile Cluster Roles and Cluster Role Bindings and Security Context Constraints)
....................................................................................................................................................................................................................................................................................................................................................................................................................................
Play 44/62 (Gate on reconcile)
....
Play 45/62 (Drain and upgrade master nodes)
.........................................................................................................................................................................................................................................................................................................................................................................................................
Play 46/62 (Drain and upgrade nodes)
.........................................................................................................................................................................................................................................................................................................................................................................................................................
Play 47/62 (Drain and upgrade nodes)
.........................................................................................................................................................................................................................................................................................................................................................................................................................
Play 48/62 (Drain and upgrade nodes)
.........................................................................................................................................................................................................................................................................................................................................................................................................................
Play 49/62 (Upgrade default router and default registry)
...........................................................................................................................................................................................................................................................................................................................................................................................................
Play 50/62 (Check for warnings)
...
Play 51/62 (Re-enable excluder if it was previously enabled)
................
dhcp46-212.lab.eng.blr.redhat.com : ok=201  changed=32   unreachable=0    failed=0   
dhcp46-235.lab.eng.blr.redhat.com : ok=201  changed=32   unreachable=0    failed=0   
dhcp47-190.lab.eng.blr.redhat.com : ok=505  changed=70   unreachable=0    failed=0   
dhcp47-48.lab.eng.blr.redhat.com : ok=201  changed=32   unreachable=0    failed=0   
localhost                  : ok=23   changed=0    unreachable=0    failed=0   


Installation Complete: Note: Play count is only an estimate, some plays may have been skipped or dynamically added

Upgrade completed! Rebooting all hosts is recommended.

[root@dhcp47-190 ~]# oc version
oc v3.6.153
kubernetes v1.6.1+5115d708d7
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://10.70.47.190:8443
openshift v3.6.153
kubernetes v1.6.1+5115d708d7
Please include the entire output from the last TASK line through the end of output if an error is generated

[root@dhcp47-197 ~]# oc get pods
NAME                              READY     STATUS              RESTARTS   AGE
glusterfs-0dn97                   0/1       Error               3          19d
glusterfs-9lph2                   0/1       Error               2          19d
glusterfs-s8jxz                   0/1       Error               2          19d
heketi-1-8hq3w                    0/1       ContainerCreating   0          24m
storage-project-router-1-171cx    0/1       ContainerCreating   0          24m
storage-project-router-2-deploy   0/1       ContainerCreating   0          16m
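
To pick the unhealthy pods out of a listing like the one above, a small filter can help. This is a minimal sketch, assuming the default `oc get pods` column layout seen in this report (NAME, READY, STATUS, RESTARTS, AGE); `not_running` is a hypothetical helper name, not part of `oc`.

```shell
# not_running: hypothetical helper. Reads `oc get pods` output on stdin and
# prints the name and status of every pod whose STATUS is not "Running".
# Assumes the default column layout: NAME READY STATUS RESTARTS AGE.
not_running() {
    awk 'NR > 1 && $3 != "Running" { print $1, $3 }'
}

# Demo on two rows copied from the listing above.
printf '%s\n' \
    'NAME READY STATUS RESTARTS AGE' \
    'glusterfs-0dn97 0/1 Error 3 19d' \
    'heketi-1-8hq3w 0/1 ContainerCreating 0 24m' | not_running
```

On a live cluster the same filter could be fed with `oc get pods | not_running`.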




Expected results:

The application pods should not be deleted, and all other pods should remain in the Running state.
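
One way to check this expectation is to save the `oc get pods` output before and after the upgrade and compare the pod names. The following is only a sketch under that assumption; `missing_pods` and the temporary file names are hypothetical, and the sample rows are copied from the listings in this report.

```shell
# missing_pods: hypothetical helper, not part of `oc`.
#   $1: file holding `oc get pods` output saved before the upgrade
#   $2: file holding `oc get pods` output saved after the upgrade
# Prints every pod name that appears in $1 but not in $2.
missing_pods() {
    awk '
        FNR == 1 { next }                            # skip each header row
        FILENAME == ARGV[1] { after[$1] = 1; next }  # first file read: "after"
        !($1 in after) { print $1 }                  # second file read: "before"
    ' "$2" "$1"
}

# Demo with a few rows taken from the two listings in this report.
before=$(mktemp)
after=$(mktemp)
printf '%s\n' 'NAME READY STATUS RESTARTS AGE' \
    'glusterfs-0dn97 1/1 Running 2 19d' \
    'heketi-1-0mkvn 1/1 Running 1 19d' \
    'nginx1 1/1 Running 1 19d' > "$before"
printf '%s\n' 'NAME READY STATUS RESTARTS AGE' \
    'glusterfs-0dn97 0/1 Error 3 19d' \
    'heketi-1-8hq3w 0/1 ContainerCreating 0 24m' > "$after"
missing_pods "$before" "$after"
rm -f "$before" "$after"
```

Because the comparison is by name, a pod that was recreated under a new suffix (such as the heketi pod here) is reported alongside pods that are really gone (such as nginx1), so the result needs a manual look.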

Additional info:
Please attach logs from ansible-playbook with the -vvv flag

Comment 3 Tejas Chaphekar 2017-07-27 09:46:32 UTC
The error was resolved by adding openshift_docker_additional_registries: x.x.com to installer.yml and setting the following environment variables:

OO_INSTALL_INSECURE_REGISTRIES=registry.ops.openshift.com OO_INSTALL_ADDITIONAL_REGISTRIES=registry.ops.openshift.com
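
For reference, the environment-variable part of this workaround could be applied as in the sketch below. It assumes the variables have to be exported in the same shell session that later runs the installer; the registry value is the one quoted in this comment, and `REGISTRY` is just a local convenience variable introduced here.

```shell
# Registry host taken from the comment above; substitute your own.
REGISTRY="registry.ops.openshift.com"

# Export the variables so that child processes started from this shell
# (for example the installer) inherit them.
export OO_INSTALL_INSECURE_REGISTRIES="$REGISTRY"
export OO_INSTALL_ADDITIONAL_REGISTRIES="$REGISTRY"

# Show what a child process would see.
env | grep '^OO_INSTALL_' | sort

# Then rerun the upgrade from this same session:
#   atomic-openshift-installer upgrade
```

The installer call is left as a comment because it is interactive and changes the cluster; run it by hand once the variables look right.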