Bug 1542410 - Update plan or version of Postgresql or MariaDB APB which is in prod plan failed
Summary: Update plan or version of Postgresql or MariaDB APB which is in prod plan failed
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Service Broker
Version: 3.9.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.9.0
Assignee: Jason Montleon
QA Contact: Zihan Tang
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-02-06 10:14 UTC by Zihan Tang
Modified: 2018-06-18 18:28 UTC (History)
5 users (show)

Fixed In Version: postgresql-apb-v3.9.0-0.41.0.2
Doc Type: No Doc Update
Doc Text:
undefined
Clone Of:
Environment:
Last Closed: 2018-06-18 14:36:17 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
asb pod log (1.64 MB, text/plain)
2018-02-09 05:54 UTC, Zihan Tang

Comment 2 Jason Montleon 2018-02-07 20:34:12 UTC
This is the same selinux error I referred to in https://bugzilla.redhat.com/show_bug.cgi?id=1535931 when running using crio.

I assume it works if you don't use crio? If so, can we mark this as a duplicate?

Comment 3 Jason Montleon 2018-02-07 20:42:33 UTC
Also, it failed at "Backup source database", which comes before the tasks that bring up the new pod, switch the service over to it, verify it's up, restore the database, and then finally delete the old pod. From the above apb output I cannot see how the old pod could have been deleted.

Comment 6 Jason Montleon 2018-02-08 14:03:49 UTC
Can you please send me the full broker logs, the output of describing the asb pod and the failed apb pod, and the config for the cluster where this failed? The failed apb log alone is not sufficient information for me to reproduce this.
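The diagnostics requested above could be gathered with something along these lines. This is only a sketch: the broker namespace, the `asb` resource name, and the label selector are assumptions, and the apb pod name is just an example taken from the log later in this bug; substitute the values from your own cluster.

```shell
# Sketch of collecting the requested diagnostics. Namespace, resource
# names, and label selector below are assumptions -- adjust for your cluster.
collect_asb_diagnostics() {
  local broker_ns=${1:-openshift-ansible-service-broker}  # assumed broker namespace
  local apb_ns=$2   # namespace the failed apb pod ran in, e.g. "test"
  local apb_pod=$3  # name of the failed apb pod

  oc logs -n "$broker_ns" deploymentconfig/asb   # full broker logs
  oc describe pod -n "$broker_ns" -l app=asb     # describe the asb pod
  oc describe pod -n "$apb_ns" "$apb_pod"        # describe the failed apb pod
}

# Example invocation (pod name is illustrative):
#   collect_asb_diagnostics openshift-ansible-service-broker test \
#       apb-0c87ffec-ef4c-4d62-9d61-ab7ebcf717c6
```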

Did some other task delete the deployment while this APB was attempting an update? I am unable to reproduce this with the upstream APB going from 9.5 prod to 9.6 dev (or any other combination I've tried). And again, as you can see from the apb log below, the old dc is not removed until about 11 tasks past the point where your log shows it failed.

      registry:
        - type: "dockerhub"
          name: "dh"
          url: "docker.io"
          org: "ansibleplaybookbundle"
          tag: "latest"
          white_list:
            - ".*-apb$"


oc logs -f -n dh-postgresql-apb-upda-r8h6f        apb-0c87ffec-ef4c-4d62-9d61-ab7ebcf717c6
+ [[ update --extra-vars {"_apb_plan_id":"dev","_apb_service_class_id":"1dda1477cace09730bd8ed7a6505607e","_apb_service_instance_id":"e3f15205-9436-4d5a-aa5d-3b5e03be737b","cluster":"openshift","namespace":"test","postgresql_database":"admin","postgresql_password":"changeme","postgresql_user":"admin","postgresql_version":"9.6"} == *\s\2\i\/\a\s\s\e\m\b\l\e* ]]
+ ACTION=update
+ shift
+ playbooks=/opt/apb/actions
+ CREDS=/var/tmp/bind-creds
+ TEST_RESULT=/var/tmp/test-result
+ whoami
+ '[' -w /etc/passwd ']'
++ id -u
+ echo 'apb:x:1000130000:0:apb user:/opt/apb:/sbin/nologin'
+ set +x
+ [[ -e /opt/apb/actions/update.yaml ]]
+ ANSIBLE_ROLES_PATH=/etc/ansible/roles:/opt/ansible/roles
+ ansible-playbook /opt/apb/actions/update.yaml --extra-vars '{"_apb_plan_id":"dev","_apb_service_class_id":"1dda1477cace09730bd8ed7a6505607e","_apb_service_instance_id":"e3f15205-9436-4d5a-aa5d-3b5e03be737b","cluster":"openshift","namespace":"test","postgresql_database":"admin","postgresql_password":"changeme","postgresql_user":"admin","postgresql_version":"9.6"}'
 [WARNING]: Could not match supplied host pattern, ignoring: all
 [WARNING]: provided hosts list is empty, only localhost is available
 [WARNING]: While constructing a mapping from /opt/apb/actions/update.yaml,
line 1, column 3, found a duplicate dict key (vars). Using last defined value
only.
PLAY [Deploy rhscl-postgresql-apb to "openshift"] ******************************
TASK [ansible.kubernetes-modules : Install latest openshift client] ************
skipping: [localhost]
TASK [ansibleplaybookbundle.asb-modules : debug] *******************************
skipping: [localhost]
TASK [rhscl-postgresql-apb : Find pod we need to update] ***********************
changed: [localhost]
TASK [rhscl-postgresql-apb : Find dc we will clean up] *************************
changed: [localhost]
TASK [rhscl-postgresql-apb : Find deployment we will clean up] *****************
skipping: [localhost]
TASK [rhscl-postgresql-apb : Create source pgpass] *****************************
changed: [localhost]
TASK [rhscl-postgresql-apb : Set permissions on source pgpass] *****************
changed: [localhost]
TASK [rhscl-postgresql-apb : Backup source database] ***************************
changed: [localhost]
TASK [rhscl-postgresql-apb : Copy over db backup] ******************************
changed: [localhost]
TASK [rhscl-postgresql-apb : set service state to present] *********************
changed: [localhost]
TASK [rhscl-postgresql-apb : include_tasks] ************************************
included: /opt/ansible/roles/rhscl-postgresql-apb/tasks/dev.yml for localhost
TASK [rhscl-postgresql-apb : set development deployment config state to present] ***
skipping: [localhost]
TASK [rhscl-postgresql-apb : set development deployment config state to present] ***
changed: [localhost]
TASK [rhscl-postgresql-apb : include_tasks] ************************************
skipping: [localhost]
TASK [rhscl-postgresql-apb : Wait for postgres to come up] *********************
ok: [localhost]
TASK [rhscl-postgresql-apb : Find pod we need to restore] **********************
changed: [localhost]
TASK [rhscl-postgresql-apb : Copy over db backup] ******************************
changed: [localhost]
TASK [rhscl-postgresql-apb : Create destination pgpass] ************************
changed: [localhost]
TASK [rhscl-postgresql-apb : Set permissions on destination pgpass] ************
changed: [localhost]
TASK [rhscl-postgresql-apb : Restore database] *********************************
changed: [localhost]
TASK [rhscl-postgresql-apb : Remove deployment config] *************************
changed: [localhost]
TASK [rhscl-postgresql-apb : Remove deployment] ********************************
skipping: [localhost]
TASK [rhscl-postgresql-apb : ensure production volume is absent] ***************
ok: [localhost] => (item=9.4)
changed: [localhost] => (item=9.5)
ok: [localhost] => (item=9.6)
TASK [rhscl-postgresql-apb : encode bind credentials] **************************
changed: [localhost]
PLAY RECAP *********************************************************************
localhost                  : ok=18   changed=16   unreachable=0    failed=0   
+ EXIT_CODE=0
+ set +ex
+ '[' -f /var/tmp/test-result ']'
+ exit 0

Comment 8 Zihan Tang 2018-02-09 05:54:29 UTC
Created attachment 1393542 [details]
asb pod log

Comment 10 Jason Montleon 2018-02-09 14:19:59 UTC
I saw this fail on your machine. The task that was stuck was:
kubectl cp -n test /tmp/db.dump postgresql-9.5-dev-1-b5222:tmp/db.dump

I ran it manually and it seemed to work fine. Now I can't get it to reproduce.

On the upstream images we have:
origin-clients-3.7.0-1.0.7ed6862.x86_64

Downstream we have:
atomic-openshift-clients-3.9.0-0.34.0.git.0.d72c7b9.el7.x86_64

I am inclined to think that we may need to update the clients package when we are able, though I don't know of any 3.9 RPMs upstream at this time.

The upstream images not working consistently with downstream clusters may be better suited as an upstream issue against the apb-base repo. Once there is an upstream repo we can work with, we can update the clients package there.

Comment 11 Jason Montleon 2018-02-09 15:00:09 UTC
Bug 1535931 seems to be at odds with my comments above. I see postgres is also stuck using the downstream image.

In that environment, where postgres is still getting stuck, I tried updating mariadb and mysql. The difference is that postgres was updated to use 'kubectl cp' when it was made to work with a kubernetes cluster, while those are still using 'oc cp'.

They worked OK, so I am going to switch postgres back to using oc and see if the issue goes away.
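The switch described above amounts to changing the database copy step back from kubectl to the oc client. A minimal sketch of the before/after, wrapped in a helper function for illustration (the function name is hypothetical, and the namespace and pod name are examples in the style of the log above, not the actual rhscl-postgresql-apb source):

```shell
# Sketch of the fix: the APB's copy step switches from kubectl back to oc.
copy_db_backup() {
  local ns=$1 pod=$2
  # Before (the command that was getting stuck on downstream clusters):
  #   kubectl cp -n "$ns" /tmp/db.dump "$pod":/tmp/db.dump
  # After:
  oc cp -n "$ns" /tmp/db.dump "$pod":/tmp/db.dump
}

# Example invocation (pod name is illustrative):
#   copy_db_backup test postgresql-9.6-dev-1-abcde
```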

Comment 12 Jason Montleon 2018-02-09 16:54:45 UTC
I have pushed a new image to upstream postgresql-apb:latest and pushed openshift-enterprise-postgresql-apb-v3.9.0-0.41.0.2 to QE on the errata.

Can you please re-test with these and see if you can still reproduce the issue?

Comment 13 Zihan Tang 2018-02-11 07:14:39 UTC
Jason,
I used ASB 1.1.10 with the v3.9 downstream image,
postgresql-apb-v3.9.0-0.41.0.2.

PostgreSQL, MariaDB, and MySQL can all update their version or plan successfully the first time, so I am marking this bug as verified.

After the first update succeeds, updating the same instance a second time fails; I opened bug 1544186 to track that.

