Description of problem:
I provisioned a PostgreSQL APB with the dev plan, then created a database and a table and wrote data to it.
1) After updating the plan from dev to prod, the data is lost.
2) After updating the database version, the data is lost.

Version-Release number of selected component (if applicable):
openshift v3.9.0-0.20.0
kubernetes v1.9.1+a0ce1bc657
etcd 3.2.8
brew.....ose-ansible-service-broker v3.9
brew.....ose-service-catalog v3.9

How reproducible:
Always

Steps to Reproduce:
1. Provision a PostgreSQL APB with the development plan
2. Check the ServiceInstance and pod
   # oc edit serviceinstance rh-postgresql-apb-ck24x
   # oc get pod
3. Write data into the database
   CREATE DATABASE testdb;
   \c testdb;
   CREATE TABLE COMPANY(
      ID INT PRIMARY KEY NOT NULL,
      NAME TEXT NOT NULL,
      AGE INT NOT NULL,
      ADDRESS CHAR(50),
      SALARY REAL,
      JOIN_DATE DATE
   );
   INSERT INTO COMPANY (ID,NAME,AGE,ADDRESS,SALARY,JOIN_DATE) VALUES (1, 'Paul', 32, 'California', 20000.00 ,'2001-07-13');
4. Update the plan from dev to prod
   # oc edit serviceinstance rh-postgresql-apb-ck24x
5. Check the ServiceInstance and pod again
6. Check data in the database

Actual results:
2.
[root@host-xxx ~]# oc get pod
NAME                            READY     STATUS    RESTARTS   AGE
po/postgresql-9.4-dev-1-fz2jc   1/1       Running   0          14m

3.
postgres=# \l
                                 List of databases
   Name    |  Owner   | Encoding |  Collate   |   Ctype    |   Access privileges
-----------+----------+----------+------------+------------+-----------------------
 admin     | admin    | UTF8     | en_US.utf8 | en_US.utf8 |
 postgres  | postgres | UTF8     | en_US.utf8 | en_US.utf8 |
 template0 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres
 testdb    | postgres | UTF8     | en_US.utf8 | en_US.utf8 |
(5 rows)

testdb=# \dt
          List of relations
 Schema |  Name   | Type  |  Owner
--------+---------+-------+----------
 public | company | table | postgres
(1 row)

testdb=# SELECT * FROM COMPANY;
 id | name | age |                      address                       | salary | join_date
----+------+-----+----------------------------------------------------+--------+------------
  1 | Paul |  32 | California                                         |  20000 | 2001-07-13
(1 row)

5.
[root@host-xxx ~]# oc get pod
NAME                          READY     STATUS    RESTARTS   AGE
postgresql-9.4-prod-1-qjpk7   1/1       Running   0          2m

6.
[root@host-xxx ~]# oc rsh postgresql-9.4-prod-1-qjpk7
sh-4.2$ psql
psql (9.4.14)
Type "help" for help.

postgres=# \l
                                 List of databases
   Name    |  Owner   | Encoding |  Collate   |   Ctype    |   Access privileges
-----------+----------+----------+------------+------------+-----------------------
 admin     | admin    | UTF8     | en_US.utf8 | en_US.utf8 |
 postgres  | postgres | UTF8     | en_US.utf8 | en_US.utf8 |
 template0 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres
(4 rows)

Expected results:
6. Should be the same as in step 3

Additional info:
I set these parameters when provisioning the APB:
{"postgresql_database":"admin","postgresql_user":"admin","postgresql_version":"9.5","postgresql_password":"admin"}

I wrote data to the database/table admin/admin, then updated the plan and parameters, and the data was retained. A newly created database was not retained when the user is a superuser. I'm not sure whether that is outside the scope of the feature.

After updating:
sh-4.2$ psql -h 127.0.0.1 admin admin
psql (9.5.9)
Type "help" for help.

admin=> \l
                                 List of databases
   Name    |  Owner   | Encoding |  Collate   |   Ctype    |   Access privileges
-----------+----------+----------+------------+------------+-----------------------
 admin     | admin    | UTF8     | en_US.utf8 | en_US.utf8 |
 postgres  | postgres | UTF8     | en_US.utf8 | en_US.utf8 |
 template0 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres
(4 rows)

admin=> \dt
         List of relations
 Schema |  Name   | Type  | Owner
--------+---------+-------+-------
 public | company | table | admin
(1 row)

admin=> SELECT * FROM COMPANY;
 id | name | age |                      address                       | salary | join_date
----+------+-----+----------------------------------------------------+--------+------------
  1 | Paul |  32 | California                                         |  20000 | 2001-07-13
(1 row)

admin=> INSERT INTO COMPANY (ID,NAME,AGE,ADDRESS,SALARY,JOIN_DATE) VALUES (2, 'Tom', 42, 'California', 10000.00 ,'2001-09-11');
INSERT 0 1
admin=> SELECT * FROM COMPANY;
 id | name | age |                      address                       | salary | join_date
----+------+-----+----------------------------------------------------+--------+------------
  1 | Paul |  32 | California                                         |  20000 | 2001-07-13
  2 | Tom  |  42 | California                                         |  10000 | 2001-09-11
(2 rows)
Why not use the database created at the time you ran the APB? We might be able to dump all databases instead of a specific one. I'd have to investigate.
Comment 1 is the database created when I ran the APB. It worked as expected.
This should save everything for postgresql. Still investigating for MariaDB and MySQL. https://github.com/ansibleplaybookbundle/postgresql-apb/pull/30
https://github.com/ansibleplaybookbundle/mysql-apb/pull/21 https://github.com/ansibleplaybookbundle/mariadb-apb/pull/21
Is the pod actually running? Are you able to determine whether it is stuck trying to run some process when it stops at "TASK [rhscl-postgresql-apb : Find deployment we will clean up] *****************"? As I mentioned, your APB logs are cut off early and apparently without any errors, which doesn't make much sense to me. I used openshift v3.9.0-0.34.0 yesterday and was unable to reproduce this. If you have an environment where this is happening that I can log into, I'd be happy to take a look and try to figure out what's going on.
The task: "TASK [rhscl-postgresql-apb : Find deployment we will clean up] *****************" will only run when the task: TASK [rhscl-postgresql-apb : Find dc we will clean up] ************************* is skipped. Are you able to consistently reproduce the playbook hanging?
The next task would exec to the postgres pod and dump the database. Is the postgres pod running?
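For readers following along, here is a minimal sketch of the two backup shapes discussed in this thread: dumping one named database versus dumping everything. It assumes the backup task runs a command of the form `kubectl exec ... /bin/bash -c "pg_dumpall -f /tmp/db.dump"`; the helper name `build_backup_command` and the namespace and pod values in the example call are placeholders for illustration, not part of the APB.

```python
import shlex


def build_backup_command(namespace, pod, database=None, dump_path="/tmp/db.dump"):
    """Return the argv for a kubectl exec that dumps PostgreSQL data in a pod.

    Hypothetical helper for illustration only. With database=None the whole
    cluster is dumped with pg_dumpall, which also covers databases created
    after provisioning; otherwise only the named database is dumped.
    """
    if database is None:
        inner = f"pg_dumpall -f {shlex.quote(dump_path)}"
    else:
        inner = f"pg_dump -f {shlex.quote(dump_path)} {shlex.quote(database)}"
    return ["kubectl", "exec", "-n", namespace, pod, "--", "/bin/bash", "-c", inner]


# Placeholder namespace and pod name, chosen only to show the shape of the call.
print(shlex.join(build_backup_command("my-project", "postgresql-dev-pod")))
print(shlex.join(build_backup_command("my-project", "postgresql-dev-pod", database="admin")))
```

The single-database form would explain the original symptom: a database such as testdb, created after provisioning, is never named in the dump and so does not survive the update.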
I set up a 3.9 multinode environment with one master and four nodes and multitenant SDN networking to see if a more complex environment would tease the issue out, but I am still unable to reproduce it after running several updates in multiple projects, switching plans and versions in different directions.
Can you please provide oc describe output for the broker pod, the postgres pod, and the APB that is getting stuck, full logs for the stuck APB and the broker, and the inventory file used to set up the cluster? If I can access the environment, I can try to diagnose the issue directly.
I wonder if what you are seeing is an selinux issue with cri-o. After deploying postgres I see constant errors in audit.log:

type=AVC msg=audit(1517846970.413:1596): avc: denied { write } for pid=25075 comm="pg_ctl" name=".s.PGSQL.5432" dev="dm-0" ino=103097197 scontext=system_u:system_r:svirt_lxc_net_t:s0:c5,c11 tcontext=system_u:object_r:container_share_t:s0 tclass=sock_file

My update pod did not hang, but I did get an error at the same point:

...
TASK [rhscl-postgresql-apb : Find deployment we will clean up] *****************
skipping: [localhost]

TASK [rhscl-postgresql-apb : Backup source database] ***************************
fatal: [localhost]: FAILED! => {"changed": true, "cmd": "kubectl exec -n test postgresql-9.4-dev-1-dkv89 -- /bin/bash -c \"pg_dumpall -f /tmp/db.dump\"", "delta": "0:00:00.547038", "end": "2018-02-05 16:10:21.036604", "msg": "non-zero return code", "rc": 1, "start": "2018-02-05 16:10:20.489566", "stderr": "pg_dumpall: could not connect to database \"template1\": could not connect to server: Permission denied\n\tIs the server running locally and accepting\n\tconnections on Unix domain socket \"/var/run/postgresql/.s.PGSQL.5432\"?\n\ncommand terminated with exit code 1", "stderr_lines": ["pg_dumpall: could not connect to database \"template1\": could not connect to server: Permission denied", "\tIs the server running locally and accepting", "\tconnections on Unix domain socket \"/var/run/postgresql/.s.PGSQL.5432\"?", "", "command terminated with exit code 1"], "stdout": "", "stdout_lines": []}
	to retry, use: --limit @/opt/apb/actions/update.retry
...

Can you try disabling selinux and see if it works? If it does, I'd be inclined to think this is a cri-o selinux bug, as it's preventing the rhscl postgresql pod from performing necessary tasks.
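To make the denial above easier to read, here is a small sketch that splits an AVC record into its fields. The function name `parse_avc` is made up for this example, and the regular expressions only cover the layout of the record quoted in this comment, not every audit.log format.

```python
import re

# The AVC record quoted in this comment, split across string literals for width.
AVC_LINE = (
    'type=AVC msg=audit(1517846970.413:1596): avc: denied { write } for '
    'pid=25075 comm="pg_ctl" name=".s.PGSQL.5432" dev="dm-0" ino=103097197 '
    'scontext=system_u:system_r:svirt_lxc_net_t:s0:c5,c11 '
    'tcontext=system_u:object_r:container_share_t:s0 tclass=sock_file'
)


def parse_avc(line):
    """Return a dict of the key=value fields plus the denied permissions.

    Hypothetical helper; returns None if the line is not an AVC denial.
    """
    perms = re.search(r"avc:\s+denied\s+\{([^}]*)\}", line)
    if perms is None:
        return None
    # Values are either double-quoted strings or runs of non-space characters.
    fields = {key: value.strip('"')
              for key, value in re.findall(r'(\w+)=("[^"]*"|\S+)', line)}
    fields["permissions"] = perms.group(1).split()
    return fields


record = parse_avc(AVC_LINE)
print(record["comm"], record["permissions"], record["tclass"])
print("source type:", record["scontext"].split(":")[2])
print("target type:", record["tcontext"].split(":")[2])
```

Read this way, the record says that pg_ctl, running as svirt_lxc_net_t, was denied write access to a sock_file labeled container_share_t, which matches the "Permission denied" on the Unix domain socket in the pg_dumpall error.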
This looks like a socket stored on a COW file system? Is this a socket listening in the /run/ directory?
Correct, in this case there was no persistent storage of any kind attached. With selinux disabled I have:
$ ls /var/run/postgresql/.s.PGSQL.5432 /run/postgresql/.s.PGSQL.5432 -l
srwxrwxrwx. 1 1000120000 root 0 Feb  6 13:28 /run/postgresql/.s.PGSQL.5432
srwxrwxrwx. 1 1000120000 root 0 Feb  6 13:28 /var/run/postgresql/.s.PGSQL.5432
Can you verify whether this works without crio, and with crio when selinux is disabled? If it's a crio / selinux issue preventing the application container from writing the .sock file, dump files, etc., I'd recommend opening a new bug against the correct component, as there's not much I'll be able to do about that.
Jason, I tried to disable selinux with "setenforce 0". The selinux status afterwards:
[root@host-172-16-120-8 ~]# sestatus
SELinux status:                 enabled
SELinuxfs mount:                /sys/fs/selinux
SELinux root directory:         /etc/selinux
Loaded policy name:             targeted
Current mode:                   permissive
Mode from config file:          enforcing
Policy MLS status:              enabled
Policy deny_unknown status:     allowed
Max kernel policy version:      31

Using the latest downstream image in the cri-o env, updating the plan from dev to prod with a DB created still failed with the same log. The sandbox pod stays blocked in the Running status.

Now using project "rh-postgresql-apb-upda-v74l4" on server "https://172.16.120.8:8443".
[root@host-172-16-120-8 ~]# oc get pod
NAME                                       READY     STATUS    RESTARTS   AGE
apb-8e1c6679-6db3-4f83-ab7e-de6d420c55fd   1/1       Running   0          15m
[root@host-172-16-120-8 ~]# oc logs -f apb-8e1c6679-6db3-4f83-ab7e-de6d420c55fd
+ [[ update --extra-vars {"_apb_plan_id":"prod","_apb_service_class_id":"d5915e05b253df421efe6e41fb6a66ba","_apb_service_instance_id":"df138cda-d3b6-4465-8c93-d3b96bbc9996","cluster":"openshift","namespace":"post-5","postgresql_database":"admin","postgresql_password":"dddd","postgresql_user":"admin","postgresql_version":"9.6"} == *\s\2\i\/\a\s\s\e\m\b\l\e* ]]
+ ACTION=update
+ shift
+ playbooks=/opt/apb/actions
+ CREDS=/var/tmp/bind-creds
+ TEST_RESULT=/var/tmp/test-result
+ whoami
+ '[' -w /etc/passwd ']'
++ id -u
+ echo 'apb:x:1000260000:0:apb user:/opt/apb:/sbin/nologin'
+ set +x
+ [[ -e /opt/apb/actions/update.yaml ]]
+ ANSIBLE_ROLES_PATH=/etc/ansible/roles:/opt/ansible/roles
+ ansible-playbook /opt/apb/actions/update.yaml --extra-vars '{"_apb_plan_id":"prod","_apb_service_class_id":"d5915e05b253df421efe6e41fb6a66ba","_apb_service_instance_id":"df138cda-d3b6-4465-8c93-d3b96bbc9996","cluster":"openshift","namespace":"post-5","postgresql_database":"admin","postgresql_password":"dddd","postgresql_user":"admin","postgresql_version":"9.6"}'
[WARNING]: While constructing a mapping from /opt/apb/actions/update.yaml, line 1, column 3, found a duplicate dict key (vars). Using last defined value only.

PLAY [Deploy rhscl-postgresql-apb to "openshift"] ******************************

TASK [ansible.kubernetes-modules : Install latest openshift client] ************
skipping: [localhost]

TASK [ansibleplaybookbundle.asb-modules : debug] *******************************
skipping: [localhost]

TASK [rhscl-postgresql-apb : Find pod we need to update] ***********************
changed: [localhost]

TASK [rhscl-postgresql-apb : Find dc we will clean up] *************************
changed: [localhost]

TASK [rhscl-postgresql-apb : Find deployment we will clean up] *****************
skipping: [localhost]

TASK [rhscl-postgresql-apb : Backup source database] ***************************

The asb log:
[2018-02-09T06:59:36.381Z] [DEBUG] - ServiceInstance Parameters: [map[_apb_service_class_id:d5915e05b253df421efe6e41fb6a66ba _apb_service_instance_id:df138cda-d3b6-4465-8c93-d3b96bbc9996 postgresql_database:admin postgresql_password:dddd postgresql_user:admin postgresql_version:9.6 _apb_plan_id:prod]]
[2018-02-09T06:59:36.381Z] [INFO] - ASYNC update in progress
[2018-02-09T06:59:36.381Z] [NOTICE] - ============================================================
[2018-02-09T06:59:36.381Z] [NOTICE] - UPDATING
[2018-02-09T06:59:36.381Z] [NOTICE] - ============================================================
[2018-02-09T06:59:36.381Z] [NOTICE] - Spec.ID: d5915e05b253df421efe6e41fb6a66ba
[2018-02-09T06:59:36.381Z] [NOTICE] - Spec.Name: rh-postgresql-apb
[2018-02-09T06:59:36.381Z] [NOTICE] - Spec.Image: registry.access.stage.redhat.com/openshift3/postgresql-apb:v3.9
[2018-02-09T06:59:36.381Z] [NOTICE] - Spec.Description: SCL PostgreSQL apb implementation
[2018-02-09T06:59:36.381Z] [NOTICE] - ============================================================
[2018-02-09T06:59:36.381Z] [INFO] - Checking if namespace post-5 exists.
[2018-02-09T06:59:36.383Z] [DEBUG] - ExecutingApb:
[2018-02-09T06:59:36.383Z] [DEBUG] - name:[ rh-postgresql-apb ]
[2018-02-09T06:59:36.383Z] [DEBUG] - image:[ registry.access.stage.redhat.com/openshift3/postgresql-apb:v3.9 ]
[2018-02-09T06:59:36.383Z] [DEBUG] - action:[ update ]
[2018-02-09T06:59:36.383Z] [DEBUG] - pullPolicy:[ IfNotPresent ]
[2018-02-09T06:59:36.383Z] [DEBUG] - role:[ edit ]
[2018-02-09T06:59:36.383Z] [DEBUG] - No proxy env vars found to be configured.
10.129.0.4 - - [09/Feb/2018:06:59:36 +0000] "PATCH /ansible-service-broker/v2/service_instances/df138cda-d3b6-4465-8c93-d3b96bbc9996?accepts_incomplete=true HTTP/1.1" 202 58
[2018-02-09T06:59:36.466Z] [DEBUG] - Trying to create apb sandbox: [ apb-8e1c6679-6db3-4f83-ab7e-de6d420c55fd ], with edit permissions in namespace rh-postgresql-apb-upda-v74l4
[2018-02-09T06:59:36.466Z] [NOTICE] - Creating RoleBinding apb-8e1c6679-6db3-4f83-ab7e-de6d420c55fd
[2018-02-09T06:59:36.585Z] [DEBUG] - service_id: d5915e05b253df421efe6e41fb6a66ba
[2018-02-09T06:59:36.585Z] [DEBUG] - plan_id: 4acaf1511a92890cd8910b1d8473be97
[2018-02-09T06:59:36.585Z] [DEBUG] - operation: f8953e65-6969-4eca-84ea-54d775be4813
[2018-02-09T06:59:36.586Z] [DEBUG] - state: in progress
10.129.0.4 - - [09/Feb/2018:06:59:36 +0000] "GET /ansible-service-broker/v2/service_instances/df138cda-d3b6-4465-8c93-d3b96bbc9996/last_operation?operation=f8953e65-6969-4eca-84ea-54d775be4813&plan_id=4acaf1511a92890cd8910b1d8473be97&service_id=d5915e05b253df421efe6e41fb6a66ba HTTP/1.1" 200 29
[2018-02-09T06:59:36.631Z] [NOTICE] - Creating RoleBinding apb-8e1c6679-6db3-4f83-ab7e-de6d420c55fd
[2018-02-09T06:59:36.651Z] [DEBUG] - service_id: d5915e05b253df421efe6e41fb6a66ba
[2018-02-09T06:59:36.651Z] [DEBUG] - plan_id: 4acaf1511a92890cd8910b1d8473be97
[2018-02-09T06:59:36.651Z] [DEBUG] - operation: f8953e65-6969-4eca-84ea-54d775be4813
[2018-02-09T06:59:36.652Z] [DEBUG] - state: in progress

The update in the other env also failed using the latest downstream image, but the status is different. Steps:
1. Provision PostgreSQL with the dev plan
2. Create a DB
3. Update the plan to prod
The status is the same as in bug 1542410, comment 7.
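One detail in the sestatus output above is worth spelling out: "setenforce 0" switches SELinux to permissive mode, where denials are logged but not enforced, and it does not disable SELinux or change the mode in the config file. The sketch below parses sestatus-style output to make that distinction explicit. The helper names are invented for this example, and the sample text is an abridged copy of the output in this comment.

```python
# Abridged copy of the sestatus output from this comment.
SESTATUS_SAMPLE = """\
SELinux status:                 enabled
Loaded policy name:             targeted
Current mode:                   permissive
Mode from config file:          enforcing
"""


def parse_sestatus(text):
    """Parse `sestatus` output into a dict keyed by lower-cased field name."""
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # ignore lines without a "key: value" shape
            result[key.strip().lower()] = value.strip()
    return result


def is_enforcing(status):
    """True only when SELinux is enabled and currently enforcing."""
    return (status.get("selinux status") == "enabled"
            and status.get("current mode") == "enforcing")


status = parse_sestatus(SESTATUS_SAMPLE)
print("enforcing now:", is_enforcing(status))
print("mode after reboot (from config):", status["mode from config file"])
```

So in this environment denials should no longer block pg_ctl, yet SELinux labeling is still active, which may matter when comparing against a host where SELinux is fully disabled.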
I have pushed a new image to upstream postgresql-apb:latest and pushed openshift-enterprise-postgresql-apb-v3.9.0-0.41.0.2 to QE on the errata. Can you please re-test with these and see if you can still reproduce the issue?
Jason, I re-tested with asb v1.1.10, the downstream v3.9 image, and postgresql-apb-v3.9.0-0.41.0.2.

In the cri-o env, after creating a DB and performing an update, postgresql and MariaDB stay pending at 'TASK [rhscl-postgresql-apb : Backup source database]'.

In the other env, I tested mysql and postgresql; the update also failed if I created a DB or a table.
Scenarios:
1. mysql 5.7 prod -> create table -> update to dev failed; the update sandbox failed with the error in the log below.
2. postgresql -> create DB -> update failed; the sandbox is deleted.

[root@host-172-16-120-76 ~]# oc get pod
NAME                          READY     STATUS    RESTARTS   AGE
postgresql-9.6-dev-1-4t2gq    1/1       Running   0          1h
postgresql-9.6-prod-1-qwhfz   1/1       Running   0          2h

[root@host-172-16-120-76 ~]# oc get pod
NAME                                       READY     STATUS    RESTARTS   AGE
apb-2b61ad77-a129-4018-8720-beb0790c1574   0/1       Error     0          1h
[root@host-172-16-120-76 ~]# oc logs -f apb-2b61ad77-a129-4018-8720-beb0790c1574
+ [[ update --extra-vars {"_apb_plan_id":"dev","_apb_service_class_id":"73ead67495322cc462794387fa9884f5","_apb_service_instance_id":"aac1936a-7f4b-4a81-a880-c07ea016a382","cluster":"openshift","mysql_database":"devel","mysql_password":"dddd","mysql_user":"devel","mysql_version":"5.7","namespace":"mysql-t","service_name":"mysql"} == *\s\2\i\/\a\s\s\e\m\b\l\e* ]]
+ ACTION=update
+ shift
+ playbooks=/opt/apb/actions
+ CREDS=/var/tmp/bind-creds
+ TEST_RESULT=/var/tmp/test-result
+ whoami
+ '[' -w /etc/passwd ']'
++ id -u
+ echo 'apb:x:1000390000:0:apb user:/opt/apb:/sbin/nologin'
+ set +x
+ [[ -e /opt/apb/actions/update.yaml ]]
+ [[ -e /opt/apb/actions/update.yml ]]
+ ANSIBLE_ROLES_PATH=/etc/ansible/roles:/opt/ansible/roles
+ ansible-playbook /opt/apb/actions/update.yml --extra-vars '{"_apb_plan_id":"dev","_apb_service_class_id":"73ead67495322cc462794387fa9884f5","_apb_service_instance_id":"aac1936a-7f4b-4a81-a880-c07ea016a382","cluster":"openshift","mysql_database":"devel","mysql_password":"dddd","mysql_user":"devel","mysql_version":"5.7","namespace":"mysql-t","service_name":"mysql"}'

PLAY [mysql-apb playbook to provision the application] *************************

TASK [ansible.kubernetes-modules : Install latest openshift client] ************
skipping: [localhost]

TASK [ansibleplaybookbundle.asb-modules : debug] *******************************
skipping: [localhost]

TASK [rhscl-mysql-apb-openshift : Find pod we need to update] ******************
changed: [localhost]

TASK [rhscl-mysql-apb-openshift : Find dc we will clean up] ********************
changed: [localhost]

TASK [rhscl-mysql-apb-openshift : Backup source database] **********************
changed: [localhost]

TASK [rhscl-mysql-apb-openshift : Copy over db backup] *************************
changed: [localhost]

TASK [rhscl-mysql-apb-openshift : Set mysql service state to present] **********
changed: [localhost]

TASK [rhscl-mysql-apb-openshift : include_tasks] *******************************
included: /opt/ansible/roles/rhscl-mysql-apb-openshift/tasks/dev.yml for localhost

TASK [rhscl-mysql-apb-openshift : set MySQL deployment with ephemeral storage to present] ***
changed: [localhost]

TASK [rhscl-mysql-apb-openshift : include_tasks] *******************************
skipping: [localhost]

TASK [rhscl-mysql-apb-openshift : Wait for mysql to come up] *******************
fatal: [localhost]: FAILED! => {"changed": false, "elapsed": 300, "msg": "Timeout when waiting for 172.30.15.21:3306"}

PLAY RECAP *********************************************************************
localhost : ok=7 changed=6 unreachable=0 failed=1

[WARNING]: Could not create retry file '/opt/apb/actions/update.retry'. [Errno 13] Permission denied: u'/opt/apb/actions/update.retry'
+ EXIT_CODE=2
+ set +ex
+ '[' -f /var/tmp/test-result ']'
+ exit 2
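The failing step in the log above, "Wait for mysql to come up", times out after 300 seconds waiting for 172.30.15.21:3306. As a rough illustration of what such a wait checks, and a way to probe the service by hand, here is a sketch that polls a TCP port until it accepts a connection. The function name `wait_for_port` and its defaults are assumptions for this example; it is not the APB's code.

```python
import socket
import time


def wait_for_port(host, port, timeout=300.0, interval=1.0):
    """Poll host:port until a TCP connection succeeds or `timeout` seconds pass.

    Hypothetical helper; returns True on the first successful connection and
    False once the deadline has passed without one.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect only proves the port is open, not that
            # the database behind it is ready to serve queries.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("172.30.15.21", 3306, timeout=10)` run from inside the cluster would show whether the service IP is reachable at all, which helps separate a slow database start from a service with no ready pod behind it.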
Has host-8-245-30.host.centralci.eng.rdu2.redhat.com been reprovisioned? I went to log in to look and the libra.pem key does not get me in and the web interface isn't running on 8443.
I created a test2 project in your new environment. There I was able to successfully update a mariadb apb from 10.1 dev to 10.2 dev and then again to 10.2 prod. I was also able to successfully go from postgresql 9.4 dev to 9.6 prod, and I did mysql 5.6 prod to 5.7 dev successfully. I think this is VERIFIED; if you want to try for yourself, you can look at the ansible-service-broker log to confirm that I ran the updates.

[2018-02-13T08:30:00.747Z] [INFO] - Listening for update messages
[2018-02-13T14:42:25.773Z] [INFO] - ASYNC update in progress
[2018-02-13T14:42:26.121Z] [INFO] - Update requested for instance 2967f467-7f78-4d17-81cc-031577a81467, but job is already in progress
[2018-02-13T14:48:10.174Z] [INFO] - ASYNC update in progress
[2018-02-13T14:52:42.823Z] [INFO] - ASYNC update in progress
[2018-02-13T15:00:56.809Z] [INFO] - ASYNC update in progress

Finally, I spotted a separate issue, which I think is a regression and for which a separate bug should be filed. Your storage is RWO. If you try to do a rolling update of mediawiki, it can't start the new pod because the old pod continues to use the storage. We should switch to a Recreate strategy.
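As a sketch of the strategy change suggested above: with ReadWriteOnce storage, a Recreate strategy stops the old pod before starting the new one, so the volume is released first. The snippet builds a merge patch that sets `spec.strategy.type` to `Recreate` on a DeploymentConfig. The function name and the dc name "mediawiki" are assumptions for illustration; the real object name in the project may differ.

```python
import json
import shlex


def recreate_strategy_patch():
    """Return a JSON merge patch that switches a DeploymentConfig to Recreate."""
    return json.dumps({"spec": {"strategy": {"type": "Recreate"}}})


# "mediawiki" is a placeholder DeploymentConfig name, not taken from the cluster.
command = ["oc", "patch", "dc/mediawiki", "-p", recreate_strategy_patch()]
print(shlex.join(command))
```

Any rolling-update parameters left on the object would be worth reviewing after such a patch; the longer-term fix would be setting the strategy in the APB's deployment template.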
Jason, I have verified in both the crio and non-crio envs. ASB version: 1.1.10.

In the non-crio env, I think the bug is fixed. Test scenarios:
1. postgresql apb:
   9.6 dev -> table created with 'admin' user -> 9.5 prod: data preserved;
   9.5 dev -> db created with 'root' user -> 9.6 prod: data preserved;
2. mariaDB apb:
   10.2 prod -> table created with 'admin' user -> 10.1 dev: data preserved;
   10.0 dev -> db created with 'root' user -> 10.2 prod: data preserved;
3. mysql apb:
   5.6 prod -> db created with 'root' user -> 5.7 dev: data preserved;

But in the crio env with selinux disabled, if I create a db in postgresql, the update still waits at "TASK [rhscl-postgresql-apb : Backup source database] ***************************".
It's better to open another bug to track that. So please change the status to ON_QA and I'll mark it as verified.
Verified based on comment 31
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:0489