Bug 1447588 - Install OCP with native glusterfs failed when set "openshift_storage_glusterfs_namespace=<other namespace>"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.6.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Jose A. Rivera
QA Contact: Wenkai Shi
URL:
Whiteboard:
Depends On:
Blocks: 1433735
Reported: 2017-05-03 09:19 UTC by Wenkai Shi
Modified: 2017-08-16 19:51 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-08-10 05:21:25 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHEA-2017:1716 | 0 | normal | SHIPPED_LIVE | Red Hat OpenShift Container Platform 3.6 RPM Release Advisory | 2017-08-10 09:02:50 UTC

Description Wenkai Shi 2017-05-03 09:19:44 UTC
Description of problem:
Installing OCP with native GlusterFS fails when "openshift_storage_glusterfs_namespace=<other namespace>" is set.

Version-Release number of selected component (if applicable):
openshift-ansible-3.6.51-1.git.0.18eb563.el7

How reproducible:
100%

Steps to Reproduce:
1. Install OCP with native GlusterFS, with "openshift_storage_glusterfs_namespace=glusterfs" set.

Actual results:
# ansible-playbook -i hosts -v /usr/share/ansible/openshift-ansible/playbooks/byo/config.yml
...
TASK [openshift_storage_glusterfs : Load heketi topology] **********************
Wednesday 03 May 2017  08:13:12 +0000 (0:00:03.145)       0:25:11.776 ********* 
fatal: [master.example.com]: FAILED! => {
    "changed": true, 
    "cmd": [
        "heketi-cli", 
        "-s", 
        "http://10.128.2.2:8080", 
        "--user", 
        "admin", 
        "--secret", 
        "", 
        "topology", 
        "load", 
        "--json=/tmp/openshift-glusterfs-ansible-rRMhiq/topology.json", 
        "2>&1"
    ], 
    "delta": "0:00:01.834207", 
    "end": "2017-05-03 04:13:13.879441", 
    "failed": true, 
    "failed_when_result": true, 
    "rc": 0, 
    "start": "2017-05-03 04:13:12.045234", 
    "warnings": []
}

STDOUT:

Creating cluster ... ID: cfc044eddb224dd217889fc92efca600
	Creating node glusterfsnode1.example.com ... ID: fb3f241ef776df50f4267820ca3831b0
		Adding device /dev/vsda ... Unable to add device: Failed to get list of pods
	Creating node glusterfsnode2.example.com ... Unable to create node: Failed to get list of pods
	Creating node glusterfsnode3.example.com ... Unable to create node: Failed to get list of pods
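
Note that the task result above shows "rc": 0 together with "failed_when_result": true, so the failure is evidently decided from the command output rather than from the exit code. The role's actual failure condition is not shown in this report. The following is only a minimal Python sketch of that kind of check, assuming a failure is any output line containing "Unable to"; the function name `topology_load_errors` is hypothetical.

```python
from typing import List


def topology_load_errors(stdout: str) -> List[str]:
    """Return the output lines of `heketi-cli topology load` that report a failure.

    Assumption (not taken from the role): a failing step is any line
    that contains the text "Unable to".
    """
    return [line.strip() for line in stdout.splitlines() if "Unable to" in line]


# Sample output copied from the STDOUT section of this report.
sample = """\
Creating cluster ... ID: cfc044eddb224dd217889fc92efca600
\tCreating node glusterfsnode1.example.com ... ID: fb3f241ef776df50f4267820ca3831b0
\t\tAdding device /dev/vsda ... Unable to add device: Failed to get list of pods
\tCreating node glusterfsnode2.example.com ... Unable to create node: Failed to get list of pods
\tCreating node glusterfsnode3.example.com ... Unable to create node: Failed to get list of pods
"""

errors = topology_load_errors(sample)
# The command exited with rc 0, so the output is the only failure signal.
failed = bool(errors)
```

With the sample above, every node or device step that could not list pods is collected in `errors`, which is why the task can be marked failed even though heketi-cli itself returned 0.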

Expected results:
Installation succeeds.

Additional info:
# oc get po -n glusterfs
NAME                    READY     STATUS    RESTARTS   AGE
deploy-heketi-1-w0j51   1/1       Running   0          47m
glusterfs-5g22n         1/1       Running   0          50m
glusterfs-g4nxr         1/1       Running   0          50m
glusterfs-k3d54         1/1       Running   0          50m

# heketi-cli -s http://10.128.2.2:8080 --user admin topology load --json=/tmp/openshift-glusterfs-ansible-rRMhiq/topology.json
	Found node glusterfsnode1.example.com on cluster cfc044eddb224dd217889fc92efca600
		Adding device /dev/vsda ... Unable to add device: Failed to get list of pods
	Creating node glusterfsnode2.example.com ... Unable to create node: Failed to get list of pods
	Creating node glusterfsnode3.example.com ... Unable to create node: Failed to get list of pods

Comment 1 Jose A. Rivera 2017-05-19 16:59:18 UTC
This should be fixed by the following PR:

https://github.com/openshift/openshift-ansible/pull/4245

Comment 3 Wenkai Shi 2017-06-21 06:42:06 UTC
Checked with version openshift-ansible-3.6.121-1.git.0.ed0b72c.el7; installation still fails:

# ansible-playbook -i hosts -v /usr/share/ansible/openshift-ansible/playbooks/byo/config.yml
...
TASK [openshift_storage_glusterfs : Verify heketi service] *********************
Wednesday 21 June 2017  06:07:51 +0000 (0:00:00.043)       0:14:28.856 ******** 
fatal: [master.example.com]: FAILED! => {
    "changed": false, 
    "cmd": [
        "oc", 
        "rsh", 
        "deploy-heketi-storage-1-zq9m8", 
        "heketi-cli", 
        "-s", 
        "http://localhost:8080", 
        "--user", 
        "admin", 
        "--secret", 
        "r8SizUA1YQJs0lyWRplZEXl1eNy8lLnP4a67Kqq/OuA=", 
        "cluster", 
        "list"
    ], 
    "delta": "0:00:00.196119", 
    "end": "2017-06-21 02:07:51.045884", 
    "failed": true, 
    "rc": 1, 
    "start": "2017-06-21 02:07:50.849765", 
    "warnings": []
}

STDERR:

Error from server (NotFound): pods "deploy-heketi-storage-1-zq9m8" not found
...
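
The failing command above passes no namespace to `oc rsh`, so `oc` looks the pod up in the caller's current project. One plausible reading, which this report does not confirm, is that the heketi pod lives in the namespace configured by `openshift_storage_glusterfs_namespace`, which would explain the NotFound error. The following Python sketch builds the same command with an explicit `-n` flag; the helper name `heketi_cluster_list_cmd` and the placeholder secret are hypothetical, and this is not the change made in the linked PRs.

```python
from typing import List, Optional


def heketi_cluster_list_cmd(
    pod: str, secret: str, namespace: Optional[str] = None
) -> List[str]:
    """Build the `oc rsh ... heketi-cli cluster list` argument list.

    When `namespace` is given, `-n <namespace>` is added so that `oc`
    resolves the pod there instead of in the current project.
    """
    cmd = ["oc", "rsh"]
    if namespace:
        cmd += ["-n", namespace]
    cmd += [
        pod,
        "heketi-cli",
        "-s", "http://localhost:8080",
        "--user", "admin",
        "--secret", secret,
        "cluster", "list",
    ]
    return cmd


# Placeholder secret; the pod name is the one from the error above.
cmd = heketi_cluster_list_cmd(
    "deploy-heketi-storage-1-zq9m8", "<admin secret>", namespace="glusterfs"
)
```

Running the resulting command by hand against the affected cluster would show whether the pod is reachable once the namespace is given explicitly.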

Comment 4 Jose A. Rivera 2017-06-21 20:49:22 UTC
Even newer PR that should hopefully actually fix this BZ:

https://github.com/openshift/openshift-ansible/pull/4534

Comment 6 Wenkai Shi 2017-06-23 05:11:55 UTC
Checked with version openshift-ansible-3.6.122-1.git.0.62fcd88.el7; installation still fails:

# ansible-playbook -i hosts -v /usr/share/ansible/openshift-ansible/playbooks/byo/config.yml
...
TASK [openshift_storage_glusterfs : Verify heketi service] *********************
Friday 23 June 2017  05:07:35 +0000 (0:00:00.076)       0:14:48.712 *********** 

fatal: [host-8-175-81.host.centralci.eng.rdu2.redhat.com]: FAILED! => {
    "changed": false, 
    "cmd": [
        "oc", 
        "rsh", 
        "deploy-heketi-storage-1-25jv8", 
        "heketi-cli", 
        "-s", 
        "http://localhost:8080", 
        "--user", 
        "admin", 
        "--secret", 
        "h7i7qOayOEI1IXflM0sx4UeccTiYx0HMTiJhP4o0Tyc=", 
        "cluster", 
        "list"
    ], 
    "delta": "0:00:00.201332", 
    "end": "2017-06-23 01:07:34.857759", 
    "failed": true, 
    "rc": 1, 
    "start": "2017-06-23 01:07:34.656427", 
    "warnings": []
}

STDERR:

Error from server (NotFound): pods "deploy-heketi-storage-1-25jv8" not found
...

Comment 7 Jose A. Rivera 2017-06-26 17:43:36 UTC
I don't know why this was moved to ON_QA; the PR hasn't merged yet.

Comment 8 Jose A. Rivera 2017-06-27 15:01:22 UTC
PR merged.

Comment 9 Wenkai Shi 2017-06-28 09:41:33 UTC
Verified with version openshift-ansible-3.6.126.0-1.git.0.f9c47bf.el7; installation succeeds.
Great job!

# oc get po
NAME                       READY     STATUS    RESTARTS   AGE
docker-registry-1-12z27    1/1       Running   0          5m
registry-console-1-rdvsq   1/1       Running   0          4m
router-1-q3pwh             1/1       Running   0          6m

# oc get po -n glusterfs
NAME                      READY     STATUS    RESTARTS   AGE
glusterfs-storage-5hkh1   1/1       Running   0          10m
glusterfs-storage-xccjh   1/1       Running   0          10m
glusterfs-storage-xfvz4   1/1       Running   0          10m
heketi-storage-1-q9zr7    1/1       Running   0          7m
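
The check above can also be scripted. The following is a minimal Python sketch that parses `oc get po` table output and lists any pod that is not fully ready and Running. The function name `not_ready_pods` is hypothetical, and the parser assumes the default column layout shown in this comment (NAME, READY, STATUS, ...).

```python
from typing import List


def not_ready_pods(oc_get_po_output: str) -> List[str]:
    """Return names of pods that are not Running with all containers ready.

    Assumes the default `oc get po` columns: NAME, READY (as "n/m"), STATUS.
    """
    bad = []
    lines = oc_get_po_output.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 3:
            continue  # ignore blank or malformed rows
        name, ready, status = fields[0], fields[1], fields[2]
        ready_count, _, total = ready.partition("/")
        if status != "Running" or ready_count != total:
            bad.append(name)
    return bad


# Output copied from `oc get po -n glusterfs` in this comment.
sample = """\
NAME                      READY     STATUS    RESTARTS   AGE
glusterfs-storage-5hkh1   1/1       Running   0          10m
glusterfs-storage-xccjh   1/1       Running   0          10m
glusterfs-storage-xfvz4   1/1       Running   0          10m
heketi-storage-1-q9zr7    1/1       Running   0          7m
"""

pending = not_ready_pods(sample)
```

For the output in this comment, `pending` is empty, matching the verified result.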

Comment 11 errata-xmlrpc 2017-08-10 05:21:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1716

