This is the same SELinux error I referred to in https://bugzilla.redhat.com/show_bug.cgi?id=1535931 when running with CRI-O. I assume it works if you don't use CRI-O? If so, can we mark this as a duplicate?
Also, it failed at "Backup source database", which is before tasks that bring up the new pod, switch the service over to the new pod, verify it's up, restore the database, and then finally delete the old pod. From the above apb output I cannot see how the old pod was deleted.
Can you send me the full broker logs, describe the asb pod, describe the failed apb pod, and send me the config for the cluster where this failed, please? The failed apb log is not sufficient information for me to reproduce this. Did some other task delete the deployment while this APB was attempting an update? I am unable to reproduce this with the upstream APB going from 9.5 prod to 9.6 dev (or any other combination I've tried). And again, as you can see from the apb log below, the old dc is not removed until about 11 tasks past the point where your log shows it failed.

registry:
  - type: "dockerhub"
    name: "dh"
    url: "docker.io"
    org: "ansibleplaybookbundle"
    tag: "latest"
    white_list:
      - ".*-apb$"

oc logs -f -n dh-postgresql-apb-upda-r8h6f apb-0c87ffec-ef4c-4d62-9d61-ab7ebcf717c6
+ [[ update --extra-vars {"_apb_plan_id":"dev","_apb_service_class_id":"1dda1477cace09730bd8ed7a6505607e","_apb_service_instance_id":"e3f15205-9436-4d5a-aa5d-3b5e03be737b","cluster":"openshift","namespace":"test","postgresql_database":"admin","postgresql_password":"changeme","postgresql_user":"admin","postgresql_version":"9.6"} == *\s\2\i\/\a\s\s\e\m\b\l\e* ]]
+ ACTION=update
+ shift
+ playbooks=/opt/apb/actions
+ CREDS=/var/tmp/bind-creds
+ TEST_RESULT=/var/tmp/test-result
+ whoami
+ '[' -w /etc/passwd ']'
++ id -u
+ echo 'apb:x:1000130000:0:apb user:/opt/apb:/sbin/nologin'
+ set +x
+ [[ -e /opt/apb/actions/update.yaml ]]
+ ANSIBLE_ROLES_PATH=/etc/ansible/roles:/opt/ansible/roles
+ ansible-playbook /opt/apb/actions/update.yaml --extra-vars '{"_apb_plan_id":"dev","_apb_service_class_id":"1dda1477cace09730bd8ed7a6505607e","_apb_service_instance_id":"e3f15205-9436-4d5a-aa5d-3b5e03be737b","cluster":"openshift","namespace":"test","postgresql_database":"admin","postgresql_password":"changeme","postgresql_user":"admin","postgresql_version":"9.6"}'
[WARNING]: Could not match supplied host pattern, ignoring: all
[WARNING]: provided hosts list is empty, only localhost is available
[WARNING]: While constructing a mapping from /opt/apb/actions/update.yaml, line 1, column 3, found a duplicate dict key (vars). Using last defined value only.

PLAY [Deploy rhscl-postgresql-apb to "openshift"] ******************************

TASK [ansible.kubernetes-modules : Install latest openshift client] ************
skipping: [localhost]

TASK [ansibleplaybookbundle.asb-modules : debug] *******************************
skipping: [localhost]

TASK [rhscl-postgresql-apb : Find pod we need to update] ***********************
changed: [localhost]

TASK [rhscl-postgresql-apb : Find dc we will clean up] *************************
changed: [localhost]

TASK [rhscl-postgresql-apb : Find deployment we will clean up] *****************
skipping: [localhost]

TASK [rhscl-postgresql-apb : Create source pgpass] *****************************
changed: [localhost]

TASK [rhscl-postgresql-apb : Set permissions on source pgpass] *****************
changed: [localhost]

TASK [rhscl-postgresql-apb : Backup source database] ***************************
changed: [localhost]

TASK [rhscl-postgresql-apb : Copy over db backup] ******************************
changed: [localhost]

TASK [rhscl-postgresql-apb : set service state to present] *********************
changed: [localhost]

TASK [rhscl-postgresql-apb : include_tasks] ************************************
included: /opt/ansible/roles/rhscl-postgresql-apb/tasks/dev.yml for localhost

TASK [rhscl-postgresql-apb : set development deployment config state to present] ***
skipping: [localhost]

TASK [rhscl-postgresql-apb : set development deployment config state to present] ***
changed: [localhost]

TASK [rhscl-postgresql-apb : include_tasks] ************************************
skipping: [localhost]

TASK [rhscl-postgresql-apb : Wait for postgres to come up] *********************
ok: [localhost]

TASK [rhscl-postgresql-apb : Find pod we need to restore] **********************
changed: [localhost]

TASK [rhscl-postgresql-apb : Copy over db backup] ******************************
changed: [localhost]

TASK [rhscl-postgresql-apb : Create destination pgpass] ************************
changed: [localhost]

TASK [rhscl-postgresql-apb : Set permissions on destination pgpass] ************
changed: [localhost]

TASK [rhscl-postgresql-apb : Restore database] *********************************
changed: [localhost]

TASK [rhscl-postgresql-apb : Remove deployment config] *************************
changed: [localhost]

TASK [rhscl-postgresql-apb : Remove deployment] ********************************
skipping: [localhost]

TASK [rhscl-postgresql-apb : ensure production volume is absent] ***************
ok: [localhost] => (item=9.4)
changed: [localhost] => (item=9.5)
ok: [localhost] => (item=9.6)

TASK [rhscl-postgresql-apb : encode bind credentials] **************************
changed: [localhost]

PLAY RECAP *********************************************************************
localhost : ok=18 changed=16 unreachable=0 failed=0

+ EXIT_CODE=0
+ set +ex
+ '[' -f /var/tmp/test-result ']'
+ exit 0
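For anyone who wants to check the task ordering from a saved apb log rather than by eye, here is a minimal sketch. It assumes the log uses the standard `TASK [role : name]` header lines seen above; the helper names `task_names` and `tasks_between` are made up for illustration, and `EXCERPT` only lists the task headers copied from the successful run in this comment.

```python
import re

# Matches "TASK [role : name]" headers and captures the task name.
# The "role : " prefix is optional so plain "TASK [name]" also works.
TASK_RE = re.compile(r"TASK \[(?:[^\]:]+ : )?([^\]]+)\]")

def task_names(log_text):
    """Return task names in the order they appear in the log."""
    return TASK_RE.findall(log_text)

def tasks_between(log_text, first, second):
    """Count task headers from `first` up to the next `second` after it."""
    names = task_names(log_text)
    start = names.index(first)
    end = names.index(second, start + 1)
    return end - start

# Task headers from the successful run, starting at the step that failed
# in the reported log and ending at the step that removes the old dc.
EXCERPT = "\n".join([
    "TASK [rhscl-postgresql-apb : Backup source database]",
    "TASK [rhscl-postgresql-apb : Copy over db backup]",
    "TASK [rhscl-postgresql-apb : set service state to present]",
    "TASK [rhscl-postgresql-apb : include_tasks]",
    "TASK [rhscl-postgresql-apb : set development deployment config state to present]",
    "TASK [rhscl-postgresql-apb : set development deployment config state to present]",
    "TASK [rhscl-postgresql-apb : include_tasks]",
    "TASK [rhscl-postgresql-apb : Wait for postgres to come up]",
    "TASK [rhscl-postgresql-apb : Find pod we need to restore]",
    "TASK [rhscl-postgresql-apb : Copy over db backup]",
    "TASK [rhscl-postgresql-apb : Create destination pgpass]",
    "TASK [rhscl-postgresql-apb : Set permissions on destination pgpass]",
    "TASK [rhscl-postgresql-apb : Restore database]",
    "TASK [rhscl-postgresql-apb : Remove deployment config]",
])

if __name__ == "__main__":
    print(tasks_between(EXCERPT, "Backup source database",
                        "Remove deployment config"))
```

The count includes skipped tasks and the `include_tasks` steps, so it can come out a little higher than a count of tasks that actually ran.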
Created attachment 1393542: asb pod log
I saw this fail on your machine. The task that was stuck was:

kubectl cp -n test /tmp/db.dump postgresql-9.5-dev-1-b5222:tmp/db.dump

I ran it manually and it seemed to work fine. Now I can't reproduce it.

On the upstream images we have:
origin-clients-3.7.0-1.0.7ed6862.x86_64

Downstream we have:
atomic-openshift-clients-3.9.0-0.34.0.git.0.d72c7b9.el7.x86_64

I am inclined to think that we may need to update the clients package when we are able, though I don't know of any 3.9 RPMs upstream at this time. The upstream images not consistently working with the downstream clusters may be better suited as an upstream issue against the apb-base repo. Once there is an upstream repo we can work with, we can update the clients package there.
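Since the copy hangs only intermittently, it may help to run it under a timeout so a stuck copy is reported instead of blocking. This is only a sketch: `run_with_timeout` is a hypothetical helper, the timeout value is arbitrary, and `COPY_CMD` is simply the command quoted above (namespace, pod name, and paths are from that one run).

```python
import subprocess
import sys

# The copy command from the stuck task; values are specific to that run.
COPY_CMD = ["kubectl", "cp", "-n", "test", "/tmp/db.dump",
            "postgresql-9.5-dev-1-b5222:tmp/db.dump"]

def run_with_timeout(cmd, timeout_s):
    """Run cmd; return its exit code, or None if it exceeded timeout_s seconds."""
    try:
        return subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left hanging.
        print("timed out after %ss: %s" % (timeout_s, " ".join(cmd)),
              file=sys.stderr)
        return None

if __name__ == "__main__":
    # 120 seconds is an arbitrary choice for illustration.
    print(run_with_timeout(COPY_CMD, 120))
```

Replacing `"kubectl"` with `"oc"` in `COPY_CMD` would give a like-for-like comparison of the two clients on the same cluster.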
Bug 1535931 seems to be at odds with my comments above. I see postgres is also stuck using the downstream image. In that environment, where postgres is still getting stuck, I tried updating mariadb and mysql. The difference is that postgres was switched to 'kubectl cp' when it was updated to work with a Kubernetes cluster, while those two are still using 'oc cp'. They worked OK, so I am going to switch postgres back to 'oc cp' and see if the issue goes away.
I have pushed a new image to upstream postgresql-apb:latest and pushed openshift-enterprise-postgresql-apb-v3.9.0-0.41.0.2 to QE on the errata. Can you re-test with these and see if you can reproduce the issue, please?
Jason, I used ASB 1.1.10 with the v3.9 downstream image postgresql-apb-v3.9.0-0.41.0.2. PostgreSQL, MariaDB, and MySQL can all update version or plan successfully the first time, so I am marking this bug as verified. After the first update succeeds, updating the same instance a second time fails; I opened bug 1544186 to track that.