Description of problem:
IMAGE_VERSION wasn't added for the services atomic-openshift-master-api and atomic-openshift-master-controllers. After a restart they still use the latest image.

Version-Release number of selected component (if applicable):
atomic-openshift-utils-3.0.68

How reproducible:
always

Steps to Reproduce:
1. Set up a containerized HA env.
2. Upgrade to ose3.2.
3. After the upgrade, check the image versions.

Actual results:
[root@ha2-master1 ~]# cat /etc/sysconfig/atomic-openshift-master |grep IMAGE_VERSION
IMAGE_VERSION=v3.2.0.8
[root@ha2-master1 ~]# cat /etc/sysconfig/atomic-openshift-master-api |grep IMAGE_VERSION
[root@ha2-master1 ~]# cat /etc/sysconfig/atomic-openshift-master-controllers |grep IMAGE_VERSION
[root@ha2-master1 ~]# docker ps
CONTAINER ID        IMAGE                                   COMMAND                  CREATED             STATUS                  PORTS   NAMES
e24696b83ce1        openshift3/ose                          "/usr/bin/openshift s"   6 seconds ago       Up Less than a second           atomic-openshift-master-api
a918b8c60a36        openshift3/ose                          "/usr/bin/openshift s"   7 minutes ago       Up 7 minutes                    atomic-openshift-master-controllers
6a85de187f94        openshift3/node:v3.2.0.8                "/usr/local/bin/origi"   11 minutes ago      Up 11 minutes                   atomic-openshift-node
c82d5c11d5e6        openshift3/openvswitch:v3.2.0.8         "/usr/local/bin/ovs-r"   12 minutes ago      Up 12 minutes                   openvswitch
563d4f4d4cd9        openshift3/ose:v3.2.0.8                 "/usr/bin/openshift s"   12 minutes ago      Up 12 minutes                   atomic-openshift-master
9c66cc3d945f        registry.access.redhat.com/rhel7/etcd   "/usr/bin/etcd"          12 minutes ago      Up 12 minutes                   etcd_container

Expected results:

Additional info:
https://github.com/openshift/openshift-ansible/pull/1690
Brenton, this is blocked by bug https://bugzilla.redhat.com/show_bug.cgi?id=1323057#c1
After adding the OSE 3.2 repos, I could continue the upgrade, and I hit this error: https://bugzilla.redhat.com/show_bug.cgi?id=1322788. There is still no IMAGE_VERSION in /etc/sysconfig/atomic-openshift-master*, which may be the root cause of bug 1322788.
Hi Anping, I'll retest this one today. Seriously, you are catching some really good bugs.
https://github.com/openshift/openshift-ansible/pull/1695
The pull request has not been merged. Moving this back to ASSIGNED.
The upgrade still failed; IMAGE_VERSION is set to just "v":

ls /etc/sysconfig/atomic-openshift-* openvswitch |xargs grep IMAGE_VERSION
/etc/sysconfig/atomic-openshift-master:IMAGE_VERSION=v
/etc/sysconfig/atomic-openshift-master-api:IMAGE_VERSION=v
/etc/sysconfig/atomic-openshift-master-controllers:IMAGE_VERSION=v
/etc/sysconfig/atomic-openshift-node:IMAGE_VERSION=v
openvswitch:IMAGE_VERSION=v
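For anyone re-checking this after an upgrade, the manual grep can be wrapped in a small helper that flags both failure modes seen in this bug: IMAGE_VERSION missing entirely, and IMAGE_VERSION set to a bare "v". This is only a sketch; check_image_versions is a hypothetical helper name and is not part of openshift-ansible.

```shell
# Sketch only: check_image_versions is a hypothetical helper, not part of
# openshift-ansible. It reports, for each file given, whether IMAGE_VERSION
# is missing, set to an empty value or a bare "v", or set to a real version.
check_image_versions() {
    local rc=0
    local file value
    for file in "$@"; do
        if [ ! -f "$file" ]; then
            echo "SKIP    $file (no such file)"
            continue
        fi
        if ! grep -q '^IMAGE_VERSION=' "$file"; then
            echo "MISSING $file"
            rc=1
            continue
        fi
        # Use the last IMAGE_VERSION assignment in the file.
        value=$(sed -n 's/^IMAGE_VERSION=//p' "$file" | tail -n 1)
        if [ -z "$value" ] || [ "$value" = "v" ]; then
            echo "EMPTY   $file (IMAGE_VERSION=$value)"
            rc=1
        else
            echo "OK      $file ($value)"
        fi
    done
    # Non-zero when at least one file had a missing or empty version.
    return "$rc"
}
```

It could be called as `check_image_versions /etc/sysconfig/atomic-openshift-*`; a non-zero exit status means at least one file needs attention.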
Created attachment 1144531
The IMAGE_VERSION=v logs
Note: in comment 9, the root cause of the failure wasn't IMAGE_VERSION. But I observed that the playbook couldn't find the current image version; the version was set to "v" at that point.
Could you provide the inventory file you used?
Looking in the logs I can see that docker-1.8.2-8 is being downgraded to docker-1.8.2-7. This should never happen. I suspect it's due to the missing ".stdout". I'm working on a PR to fix this: https://github.com/brenton/openshift-ansible/blob/docker1/roles/docker/tasks/main.yml#L13

Previously the downgrade was happening in many cases where it shouldn't. I think that's what broke your upgrade. If something goes wrong with docker I could easily see the IMAGE_VERSION being set wrong, since we rely on docker for that to work.

One thing I'm noticing in docker 1.8.2 is that you cannot use --add-registry for a registry in which you are not logged in. This was breaking my dev environment because I tend to have the QE registry always enabled. I would simply log out before running the 3.1 install to ensure I pulled GA images. I think with docker 1.8.2 we have to make sure all the registries passed to --add-registry have the images we're intending to use.
If the atomic-openshift container isn't running, openshift_container_versions.sh sets curr_version to "". In this case, after the docker downgrade the docker service was restarted, and the atomic-openshift* containers were restarted too. Because these use the latest images, it takes time to download the images and start the atomic-openshift* containers, or the containers may fail to restart, which causes openshift_container_versions.sh to fail to get the correct curr_version. After downgrading docker to 1.8.2-7.el7 (docker was then already at the lowest version, so no further downgrade occurred), ansible could get the correct curr_version. Could we run openshift_container_versions.sh immediately after "Ensure Node/MASTER is running"?
The new scripts work well, so moving to VERIFIED.
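To illustrate the ordering being asked for, here is a rough sketch of waiting until a container is actually running before any version detection is attempted. It does not reflect how openshift_container_versions.sh or the playbook work internally; wait_for_container and its timeout and interval defaults are hypothetical. It relies only on `docker inspect --format '{{.State.Running}}'`.

```shell
# Sketch only: wait_for_container is a hypothetical helper.
# Usage: wait_for_container NAME [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# Polls docker until the named container reports State.Running = true,
# or gives up once TIMEOUT_SECONDS have elapsed.
wait_for_container() {
    local name="$1"
    local timeout="${2:-300}"
    local interval="${3:-5}"
    local waited=0
    local running
    while :; do
        # docker inspect fails if the container does not exist yet,
        # for example while the image is still being pulled.
        running=$(docker inspect --format '{{.State.Running}}' "$name" 2>/dev/null || true)
        if [ "$running" = "true" ]; then
            return 0
        fi
        if [ "$waited" -ge "$timeout" ]; then
            echo "timed out waiting for container $name" >&2
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
}
```

For example, `wait_for_container atomic-openshift-master-api 600 10` would poll every 10 seconds for up to 10 minutes; the version script would only be run once this returns 0, so an image pull that is still in progress does not produce an empty curr_version.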
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2016:1064