Bug 1542410 - Update plan or version of Postgresql or MariaDB APB which is in prod plan failed
Summary: Update plan or version of Postgresql or MariaDB APB which is in prod plan failed
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Service Broker
Version: 3.9.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.9.0
Assignee: Jason Montleon
QA Contact: Zihan Tang
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-02-06 10:14 UTC by Zihan Tang
Modified: 2018-06-18 18:28 UTC (History)
5 users (show)

Fixed In Version: postgresql-apb-v3.9.0-0.41.0.2
Doc Type: No Doc Update
Doc Text:
undefined
Clone Of:
Environment:
Last Closed: 2018-06-18 14:36:17 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
asb pod log (1.64 MB, text/plain)
2018-02-09 05:54 UTC, Zihan Tang

Comment 2 Jason Montleon 2018-02-07 20:34:12 UTC
This is the same selinux error I referred to in https://bugzilla.redhat.com/show_bug.cgi?id=1535931 when running using crio.

I assume it works if you don't use crio? If so, can we mark this as a duplicate?

Comment 3 Jason Montleon 2018-02-07 20:42:33 UTC
Also, it failed at "Backup source database", which comes before the tasks that bring up the new pod, switch the service over to it, verify it's up, restore the database, and then finally delete the old pod. From the above apb output I cannot see how the old pod could have been deleted.

Comment 6 Jason Montleon 2018-02-08 14:03:49 UTC
Can you please send me the full broker logs, the output of describing the asb pod and the failed apb pod, and the config for the cluster where this failed? The failed apb log alone is not sufficient information for me to reproduce this.
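The diagnostics requested above could be gathered with something along these lines. This is only a sketch: the broker namespace, the `asb` resource name, and the label selector are assumptions, and the apb pod name is just an example taken from the log later in this bug; substitute the values from your own cluster.

```shell
# Sketch of collecting the requested diagnostics. Namespace, resource
# names, and label selector below are assumptions -- adjust for your cluster.
collect_asb_diagnostics() {
  local broker_ns=${1:-openshift-ansible-service-broker}  # assumed broker namespace
  local apb_ns=$2   # namespace the failed apb pod ran in, e.g. "test"
  local apb_pod=$3  # name of the failed apb pod

  oc logs -n "$broker_ns" deploymentconfig/asb   # full broker logs
  oc describe pod -n "$broker_ns" -l app=asb     # describe the asb pod
  oc describe pod -n "$apb_ns" "$apb_pod"        # describe the failed apb pod
}

# Example invocation (pod name is illustrative):
#   collect_asb_diagnostics openshift-ansible-service-broker test \
#       apb-0c87ffec-ef4c-4d62-9d61-ab7ebcf717c6
```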

Did some other task delete the deployment while this APB was attempting an update? I am unable to reproduce this with the upstream APB going from 9.5 prod to 9.6 dev (or any other combination I've tried). And again, as you can see from the apb log below, the old dc is not removed until about 11 tasks past the point where your log shows it failed.

      registry:
        - type: "dockerhub"
          name: "dh"
          url: "docker.io"
          org: "ansibleplaybookbundle"
          tag: "latest"
          white_list:
            - ".*-apb$"


oc logs -f -n dh-postgresql-apb-upda-r8h6f        apb-0c87ffec-ef4c-4d62-9d61-ab7ebcf717c6
+ [[ update --extra-vars {"_apb_plan_id":"dev","_apb_service_class_id":"1dda1477cace09730bd8ed7a6505607e","_apb_service_instance_id":"e3f15205-9436-4d5a-aa5d-3b5e03be737b","cluster":"openshift","namespace":"test","postgresql_database":"admin","postgresql_password":"changeme","postgresql_user":"admin","postgresql_version":"9.6"} == *\s\2\i\/\a\s\s\e\m\b\l\e* ]]
+ ACTION=update
+ shift
+ playbooks=/opt/apb/actions
+ CREDS=/var/tmp/bind-creds
+ TEST_RESULT=/var/tmp/test-result
+ whoami
+ '[' -w /etc/passwd ']'
++ id -u
+ echo 'apb:x:1000130000:0:apb user:/opt/apb:/sbin/nologin'
+ set +x
+ [[ -e /opt/apb/actions/update.yaml ]]
+ ANSIBLE_ROLES_PATH=/etc/ansible/roles:/opt/ansible/roles
+ ansible-playbook /opt/apb/actions/update.yaml --extra-vars '{"_apb_plan_id":"dev","_apb_service_class_id":"1dda1477cace09730bd8ed7a6505607e","_apb_service_instance_id":"e3f15205-9436-4d5a-aa5d-3b5e03be737b","cluster":"openshift","namespace":"test","postgresql_database":"admin","postgresql_password":"changeme","postgresql_user":"admin","postgresql_version":"9.6"}'
 [WARNING]: Could not match supplied host pattern, ignoring: all
 [WARNING]: provided hosts list is empty, only localhost is available
 [WARNING]: While constructing a mapping from /opt/apb/actions/update.yaml,
line 1, column 3, found a duplicate dict key (vars). Using last defined value
only.
PLAY [Deploy rhscl-postgresql-apb to "openshift"] ******************************
TASK [ansible.kubernetes-modules : Install latest openshift client] ************
skipping: [localhost]
TASK [ansibleplaybookbundle.asb-modules : debug] *******************************
skipping: [localhost]
TASK [rhscl-postgresql-apb : Find pod we need to update] ***********************
changed: [localhost]
TASK [rhscl-postgresql-apb : Find dc we will clean up] *************************
changed: [localhost]
TASK [rhscl-postgresql-apb : Find deployment we will clean up] *****************
skipping: [localhost]
TASK [rhscl-postgresql-apb : Create source pgpass] *****************************
changed: [localhost]
TASK [rhscl-postgresql-apb : Set permissions on source pgpass] *****************
changed: [localhost]
TASK [rhscl-postgresql-apb : Backup source database] ***************************
changed: [localhost]
TASK [rhscl-postgresql-apb : Copy over db backup] ******************************
changed: [localhost]
TASK [rhscl-postgresql-apb : set service state to present] *********************
changed: [localhost]
TASK [rhscl-postgresql-apb : include_tasks] ************************************
included: /opt/ansible/roles/rhscl-postgresql-apb/tasks/dev.yml for localhost
TASK [rhscl-postgresql-apb : set development deployment config state to present] ***
skipping: [localhost]
TASK [rhscl-postgresql-apb : set development deployment config state to present] ***
changed: [localhost]
TASK [rhscl-postgresql-apb : include_tasks] ************************************
skipping: [localhost]
TASK [rhscl-postgresql-apb : Wait for postgres to come up] *********************
ok: [localhost]
TASK [rhscl-postgresql-apb : Find pod we need to restore] **********************
changed: [localhost]
TASK [rhscl-postgresql-apb : Copy over db backup] ******************************
changed: [localhost]
TASK [rhscl-postgresql-apb : Create destination pgpass] ************************
changed: [localhost]
TASK [rhscl-postgresql-apb : Set permissions on destination pgpass] ************
changed: [localhost]
TASK [rhscl-postgresql-apb : Restore database] *********************************
changed: [localhost]
TASK [rhscl-postgresql-apb : Remove deployment config] *************************
changed: [localhost]
TASK [rhscl-postgresql-apb : Remove deployment] ********************************
skipping: [localhost]
TASK [rhscl-postgresql-apb : ensure production volume is absent] ***************
ok: [localhost] => (item=9.4)
changed: [localhost] => (item=9.5)
ok: [localhost] => (item=9.6)
TASK [rhscl-postgresql-apb : encode bind credentials] **************************
changed: [localhost]
PLAY RECAP *********************************************************************
localhost                  : ok=18   changed=16   unreachable=0    failed=0   
+ EXIT_CODE=0
+ set +ex
+ '[' -f /var/tmp/test-result ']'
+ exit 0

Comment 8 Zihan Tang 2018-02-09 05:54:29 UTC
Created attachment 1393542 [details]
asb pod log

Comment 10 Jason Montleon 2018-02-09 14:19:59 UTC
I saw this fail on your machine. The task that was stuck was:
kubectl cp -n test /tmp/db.dump postgresql-9.5-dev-1-b5222:tmp/db.dump

I ran it manually and it seemed to work fine. Now I can't get it to reproduce.

On the upstream images we have:
origin-clients-3.7.0-1.0.7ed6862.x86_64

Downstream we have:
atomic-openshift-clients-3.9.0-0.34.0.git.0.d72c7b9.el7.x86_64

I am inclined to think that we may need to update the clients package when we are able, though I don't know of any 3.9 RPMs upstream at this time.

The upstream images not working consistently with downstream clusters may be better suited as an upstream issue against the apb-base repo. Once there is an upstream repo we can work with, we can update the clients package there.

Comment 11 Jason Montleon 2018-02-09 15:00:09 UTC
Bug 1535931 seems to be at odds with my comments above. I see postgres is also stuck using the downstream image.

In that environment, where postgres is still getting stuck, I tried updating mariadb and mysql. The difference is that postgres was updated to use 'kubectl cp' when it was made to work with a kubernetes cluster, while those are still using 'oc cp'.

They worked OK, so I am going to switch postgres back to using oc and see if the issue goes away.
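The switch described above amounts to changing the database copy step back from kubectl to the oc client. A minimal sketch of the before/after, wrapped in a helper function for illustration (the function name is hypothetical, and the namespace and pod name are examples in the style of the log above, not the actual rhscl-postgresql-apb source):

```shell
# Sketch of the fix: the APB's copy step switches from kubectl back to oc.
copy_db_backup() {
  local ns=$1 pod=$2
  # Before (the command that was getting stuck on downstream clusters):
  #   kubectl cp -n "$ns" /tmp/db.dump "$pod":/tmp/db.dump
  # After:
  oc cp -n "$ns" /tmp/db.dump "$pod":/tmp/db.dump
}

# Example invocation (pod name is illustrative):
#   copy_db_backup test postgresql-9.6-dev-1-abcde
```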

Comment 12 Jason Montleon 2018-02-09 16:54:45 UTC
I have pushed a new image to upstream postgresql-apb:latest and pushed openshift-enterprise-postgresql-apb-v3.9.0-0.41.0.2 to QE on the errata.

Can you please re-test with these and see if you can still reproduce the issue?

Comment 13 Zihan Tang 2018-02-11 07:14:39 UTC
Jason,
I used ASB 1.1.10 with the v3.9 downstream image,
postgresql-apb-v3.9.0-0.41.0.2.

PostgreSQL, MariaDB, and MySQL can all update their version or plan successfully the first time, so I am marking this bug as verified.

After the first update succeeds, updating the same instance a second time fails; I opened bug 1544186 to track that.

