Description of problem:
On a deployment of OCP 3.11.43, while deploying OCS 3.10 in converged mode, the deployment fails at the task "Wait for deploy-heketi pod".

Ansible playbook run:
# ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/openshift-glusterfs/config.yml -vvv

On describing the deploy-heketi pod, the error seen is:

kubelet, dhcp46-26.lab.eng.blr.redhat.com  Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "64b45294eea6bc10e657bcd9dfe4e21b84f62ea4ac2c7c0ab495f05daf936620" network for pod "deploy-heketi-storage-1-deploy": NetworkPlugin cni failed to set up pod "deploy-heketi-storage-1-deploy_glusterfs" network: failed to send CNI request: Post http://dummy/: dial unix /var/run/openshift-sdn/cni-server.sock: connect: connection refused, failed to clean up sandbox container "64b45294eea6bc10e657bcd9dfe4e21b84f62ea4ac2c7c0ab495f05daf936620" network for pod "deploy-heketi-storage-1-deploy": NetworkPlugin cni failed to teardown pod "deploy-heketi-storage-1-deploy_glusterfs" network: failed to send CNI request: Post http://dummy/: dial unix /var/run/openshift-sdn/cni-server.sock: connect: connection refused]

Version-Release number of selected component (if applicable):
OCP 3.11.43 and OCS 3.10

How reproducible:
Twice

Steps to Reproduce:
1. Deploy OCP 3.11.43 and run the OCS deployment.
2. The deployment fails at the task "Wait for deploy-heketi pod".

Actual results:
Fails at the "Wait for deploy-heketi pod" task.

Expected results:
The OCS deployment should pass.

Additional info:
############ Snippet from ansible deployment of OCS 3.10 in converged mode ############

<dhcp46-113.lab.eng.blr.redhat.com> ESTABLISH SSH CONNECTION FOR USER: root
<dhcp46-113.lab.eng.blr.redhat.com> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=root -o ConnectTimeout=10 -o ControlPath=/root/.ansible/cp/f7e374e19b dhcp46-113.lab.eng.blr.redhat.com '/bin/sh -c '"'"'echo ~root && sleep 0'"'"''
<dhcp46-113.lab.eng.blr.redhat.com> (0, '/root\n', '')
<dhcp46-113.lab.eng.blr.redhat.com> ESTABLISH SSH CONNECTION FOR USER: root
<dhcp46-113.lab.eng.blr.redhat.com> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=root -o ConnectTimeout=10 -o ControlPath=/root/.ansible/cp/f7e374e19b dhcp46-113.lab.eng.blr.redhat.com '/bin/sh -c '"'"'( umask 77 && mkdir -p "` echo /root/.ansible/tmp/ansible-tmp-1544602634.29-231169423874182 `" && echo ansible-tmp-1544602634.29-231169423874182="` echo /root/.ansible/tmp/ansible-tmp-1544602634.29-231169423874182 `" ) && sleep 0'"'"''
<dhcp46-113.lab.eng.blr.redhat.com> (0, 'ansible-tmp-1544602634.29-231169423874182=/root/.ansible/tmp/ansible-tmp-1544602634.29-231169423874182\n', '')
Using module file /usr/share/ansible/openshift-ansible/roles/lib_openshift/library/oc_obj.py
<dhcp46-113.lab.eng.blr.redhat.com> PUT /root/.ansible/tmp/ansible-local-50261RC_izL/tmpZbVhKn TO /root/.ansible/tmp/ansible-tmp-1544602634.29-231169423874182/oc_obj.py
<dhcp46-113.lab.eng.blr.redhat.com> SSH: EXEC sftp -b - -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=root -o ConnectTimeout=10 -o ControlPath=/root/.ansible/cp/f7e374e19b '[dhcp46-113.lab.eng.blr.redhat.com]'
<dhcp46-113.lab.eng.blr.redhat.com> (0, 'sftp> put /root/.ansible/tmp/ansible-local-50261RC_izL/tmpZbVhKn /root/.ansible/tmp/ansible-tmp-1544602634.29-231169423874182/oc_obj.py\n', '')
<dhcp46-113.lab.eng.blr.redhat.com> ESTABLISH SSH CONNECTION FOR USER: root
<dhcp46-113.lab.eng.blr.redhat.com> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=root -o ConnectTimeout=10 -o ControlPath=/root/.ansible/cp/f7e374e19b dhcp46-113.lab.eng.blr.redhat.com '/bin/sh -c '"'"'chmod u+x /root/.ansible/tmp/ansible-tmp-1544602634.29-231169423874182/ /root/.ansible/tmp/ansible-tmp-1544602634.29-231169423874182/oc_obj.py && sleep 0'"'"''
<dhcp46-113.lab.eng.blr.redhat.com> (0, '', '')
<dhcp46-113.lab.eng.blr.redhat.com> ESTABLISH SSH CONNECTION FOR USER: root
<dhcp46-113.lab.eng.blr.redhat.com> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=root -o ConnectTimeout=10 -o ControlPath=/root/.ansible/cp/f7e374e19b -tt dhcp46-113.lab.eng.blr.redhat.com '/bin/sh -c '"'"'/usr/bin/python /root/.ansible/tmp/ansible-tmp-1544602634.29-231169423874182/oc_obj.py && sleep 0'"'"''
<dhcp46-113.lab.eng.blr.redhat.com> (0, '\r\n{"invocation": {"module_args": {"files": null, "kind": "pod", "force": false, "name": null, "field_selector": null, "all_namespaces": null, "namespace": "glusterfs", "delete_after": false, "kubeconfig": "/etc/origin/master/admin.kubeconfig", "content": null, "state": "list", "debug": false, "selector": "glusterfs=deploy-heketi-storage-pod"}}, "state": "list", "changed": false, "results": {"returncode": 0, "cmd": "/usr/bin/oc get pod --selector=glusterfs=deploy-heketi-storage-pod -o json -n glusterfs", "results": [{"items": [], "kind": "List", "apiVersion": "v1", "metadata": {"selfLink": "", "resourceVersion": ""}}]}}\r\n', 'Shared connection to dhcp46-113.lab.eng.blr.redhat.com closed.\r\n')
<dhcp46-113.lab.eng.blr.redhat.com> ESTABLISH SSH CONNECTION FOR USER: root
<dhcp46-113.lab.eng.blr.redhat.com> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=root -o ConnectTimeout=10 -o ControlPath=/root/.ansible/cp/f7e374e19b dhcp46-113.lab.eng.blr.redhat.com '/bin/sh -c '"'"'rm -f -r /root/.ansible/tmp/ansible-tmp-1544602634.29-231169423874182/ > /dev/null 2>&1 && sleep 0'"'"''
<dhcp46-113.lab.eng.blr.redhat.com> (0, '', '')

FAILED - RETRYING: Wait for deploy-heketi pod (42 retries left). Result was:
{
    "attempts": 139,
    "changed": false,
    "invocation": {
        "module_args": {
            "all_namespaces": null,
            "content": null,
            "debug": false,
            "delete_after": false,
            "field_selector": null,
            "files": null,
            "force": false,
            "kind": "pod",
            "kubeconfig": "/etc/origin/master/admin.kubeconfig",
            "name": null,
            "namespace": "glusterfs",
            "selector": "glusterfs=deploy-heketi-storage-pod",
            "state": "list"
        }
    },
    "results": {
        "cmd": "/usr/bin/oc get pod --selector=glusterfs=deploy-heketi-storage-pod -o json -n glusterfs",
        "results": [
            {
                "apiVersion": "v1",
                "items": [],
                "kind": "List",
                "metadata": {
                    "resourceVersion": "",
                    "selfLink": ""
                }
            }
        ],
        "returncode": 0
    },
    "retries": 181,
    "state": "list"
}

######### End of ansible run snippet #########

oc describe of the deploy-heketi pod:

# oc describe pod deploy-heketi-storage-1-deploy
Name:               deploy-heketi-storage-1-deploy
Namespace:          glusterfs
Priority:           0
PriorityClassName:  <none>
Node:               dhcp46-26.lab.eng.blr.redhat.com/10.70.46.26
Start Time:         Wed, 12 Dec 2018 13:22:53 +0530
Labels:             openshift.io/deployer-pod-for.name=deploy-heketi-storage-1
Annotations:
                    openshift.io/deployment-config.name=deploy-heketi-storage
                    openshift.io/deployment.name=deploy-heketi-storage-1
                    openshift.io/scc=restricted
Status:             Pending
IP:
Containers:
  deployment:
    Container ID:
    Image:          registry.access.redhat.com/openshift3/ose-deployer:v3.11
    Image ID:
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:
      OPENSHIFT_DEPLOYMENT_NAME:       deploy-heketi-storage-1
      OPENSHIFT_DEPLOYMENT_NAMESPACE:  glusterfs
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from deployer-token-xq8v6 (ro)
Conditions:
  Type             Status
  Initialized      True
  Ready            False
  ContainersReady  False
  PodScheduled     True
Volumes:
  deployer-token-xq8v6:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  deployer-token-xq8v6
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     <none>
Events:
  Type     Reason                  Age                From                                       Message
  ----     ------                  ---                ----                                       -------
  Normal   Scheduled               20m                default-scheduler                          Successfully assigned glusterfs/deploy-heketi-storage-1-deploy to dhcp46-26.lab.eng.blr.redhat.com
  Warning  FailedCreatePodSandBox  20m                kubelet, dhcp46-26.lab.eng.blr.redhat.com  Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "64b45294eea6bc10e657bcd9dfe4e21b84f62ea4ac2c7c0ab495f05daf936620" network for pod "deploy-heketi-storage-1-deploy": NetworkPlugin cni failed to set up pod "deploy-heketi-storage-1-deploy_glusterfs" network: failed to send CNI request: Post http://dummy/: dial unix /var/run/openshift-sdn/cni-server.sock: connect: connection refused, failed to clean up sandbox container "64b45294eea6bc10e657bcd9dfe4e21b84f62ea4ac2c7c0ab495f05daf936620" network for pod "deploy-heketi-storage-1-deploy": NetworkPlugin cni failed to teardown pod "deploy-heketi-storage-1-deploy_glusterfs" network: failed to send CNI request: Post http://dummy/: dial unix /var/run/openshift-sdn/cni-server.sock: connect: connection refused]
  Normal   SandboxChanged          0s (x58 over 20m)  kubelet, dhcp46-26.lab.eng.blr.redhat.com  Pod sandbox changed, it will be killed and re-created.

############# End #############
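The "connection refused" on the CNI server socket above means the CNI plugin on the node could not reach the openshift-sdn pod's local server at all. A minimal node-side check, as a sketch (the socket path is the OCP 3.11 default; the SDN_SOCK override and the plain -e existence test instead of a strict socket-type test are simplifications for illustration):

```shell
# Check whether the SDN CNI server socket exists on this node. Every pod
# sandbox setup on the node posts its CNI request to this unix socket, so
# if it is absent the kubelet fails exactly as in the events above.
check_sdn_sock() {
  sock="${SDN_SOCK:-/var/run/openshift-sdn/cni-server.sock}"
  if [ -e "$sock" ]; then
    echo "socket present: $sock"
  else
    echo "socket missing: $sock (check the SDN pod / node service on this host)"
  fi
}

check_sdn_sock
```

If the socket is missing, the next step would be to look at the SDN pod scheduled on that node and the node service logs on the host.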
I'm seeing the exact same issue with OCP 3.11.59 and OCS 3.11.0. In my case it is reproducible 100% of the time. (Tested using the ovs-networkpolicy plugin, the ovs-multitenant plugin, and the Calico SDN.)

#### From "oc describe pod deploy-heketi-storage-1-deploy":

...<snip>...
  PodScheduled     True
Volumes:
  deployer-token-cxrkf:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  deployer-token-cxrkf
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     <none>
Events:
  Type     Reason                  Age                From                     Message
  ----     ------                  ---                ----                     -------
  Normal   Scheduled               8m                 default-scheduler        Successfully assigned openshift-storage/deploy-heketi-storage-1-deploy to ocp.shift.zone
  Warning  FailedCreatePodSandBox  8m                 kubelet, ocp.shift.zone  Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "d78170e16282e24dce3167e5bef44355ed5abdb559c72276179485f6c54f494f" network for pod "deploy-heketi-storage-1-deploy": NetworkPlugin cni failed to set up pod "deploy-heketi-storage-1-deploy_openshift-storage" network: context deadline exceeded, failed to clean up sandbox container "d78170e16282e24dce3167e5bef44355ed5abdb559c72276179485f6c54f494f" network for pod "deploy-heketi-storage-1-deploy": NetworkPlugin cni failed to teardown pod "deploy-heketi-storage-1-deploy_openshift-storage" network: context deadline exceeded]
  Normal   SandboxChanged          11s (x23 over 8m)  kubelet, ocp.shift.zone  Pod sandbox changed, it will be killed and re-created.
#### Sample RUN 1 #### From "oc get events" at project level (installer deploying Heketi to an infrastructure node):

0m 10m 1 deploy-heketi-storage.157a61202a3aebde DeploymentConfig Normal DeploymentCreated deploymentconfig-controller Created new replication controller "deploy-heketi-storage-1" for version 1
10m 10m 1 deploy-heketi-storage-1-deploy.157a61202d393214 Pod Normal Scheduled default-scheduler Successfully assigned openshift-storage/deploy-heketi-storage-1-deploy to ocp-inf2.shift.zone
10m 10m 1 deploy-heketi-storage-1-deploy.157a61251b464d7f Pod Warning FailedCreatePodSandBox kubelet, ocp-inf2.shift.zone Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "c381b14226dfc3408b898154a840a95eb1aed4eda5cd28523b03d3038955ab40" network for pod "deploy-heketi-storage-1-deploy": NetworkPlugin cni failed to set up pod "deploy-heketi-storage-1-deploy_openshift-storage" network: context deadline exceeded, failed to clean up sandbox container "c381b14226dfc3408b898154a840a95eb1aed4eda5cd28523b03d3038955ab40" network for pod "deploy-heketi-storage-1-deploy": NetworkPlugin cni failed to teardown pod "deploy-heketi-storage-1-deploy_openshift-storage" network: context deadline exceeded]
1s 10m 28 deploy-heketi-storage-1-deploy.157a612550469fcd Pod Normal SandboxChanged kubelet, ocp-inf2.shift.zone Pod sandbox changed, it will be killed and re-created.
#### SAMPLE RUN 2 #### From "oc get events" at project level (installer deploying Heketi to a master node):

14m 14m 1 deploy-heketi-storage.157a695943f0da1f DeploymentConfig Normal DeploymentCreated deploymentconfig-controller Created new replication controller "deploy-heketi-storage-1" for version 1
14m 14m 1 deploy-heketi-storage-1-deploy.157a69594a295e88 Pod Normal Scheduled default-scheduler Successfully assigned openshift-storage/deploy-heketi-storage-1-deploy to ocp.shift.zone
14m 14m 1 deploy-heketi-storage-1-deploy.157a695e39b5c646 Pod Warning FailedCreatePodSandBox kubelet, ocp.shift.zone Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "d78170e16282e24dce3167e5bef44355ed5abdb559c72276179485f6c54f494f" network for pod "deploy-heketi-storage-1-deploy": NetworkPlugin cni failed to set up pod "deploy-heketi-storage-1-deploy_openshift-storage" network: context deadline exceeded, failed to clean up sandbox container "d78170e16282e24dce3167e5bef44355ed5abdb559c72276179485f6c54f494f" network for pod "deploy-heketi-storage-1-deploy": NetworkPlugin cni failed to teardown pod "deploy-heketi-storage-1-deploy_openshift-storage" network: context deadline exceeded]
3m 14m 28 deploy-heketi-storage-1-deploy.157a695e611867e2 Pod Normal SandboxChanged kubelet, ocp.shift.zone Pod sandbox changed, it will be killed and re-created.

#### Additional Information
- The installation successfully deploys the GlusterFS DaemonSet.
- I also tried this, but it made no difference: https://access.redhat.com/solutions/3785221
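Worth noting that the failure signature here differs from the original report: there the CNI request failed with "connection refused" on /var/run/openshift-sdn/cni-server.sock (the SDN's CNI server was not serving at all), while here it is "context deadline exceeded" (the network plugin is reachable but setup never completes). A throwaway triage sketch of that distinction (the function name and labels are made up for illustration):

```shell
# Classify a FailedCreatePodSandBox message into the two failure modes
# seen in this bug. Pure string matching, for triage only.
classify_sandbox_error() {
  case "$1" in
    *"cni-server.sock: connect: connection refused"*)
      echo "sdn-down" ;;      # node's SDN CNI server not running/serving
    *"context deadline exceeded"*)
      echo "sdn-timeout" ;;   # plugin reachable but setup never completes
    *)
      echo "unknown" ;;
  esac
}

classify_sandbox_error "dial unix /var/run/openshift-sdn/cni-server.sock: connect: connection refused"  # prints "sdn-down"
```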
Note: Update to fix this issue when using OCP 3.11.59 and OCS 3.11.0 (answering my own report).

I took two actions that fixed the problem, so I'm documenting them here in case it helps someone else:

1) I found that uninstalling OCP leaves behind some iptables rules, and with each re-installation more of these leftover rules accumulate (up to several pages). So now, after each uninstall, I manually reset the iptables configuration to the default basic rules (allow 22, related, established, etc.). I have not had this OCS installation problem since doing this.

2) When using Calico, I also found some configurations are left behind under /etc/cni. I'm removing those after the uninstall as well.
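For reference, a minimal baseline in iptables-restore format matching the rules described above. This is an assumption of what "default basic rules" means here, not the exact rules used; adapt it to your own policy before loading it with `iptables-restore < file` (and clean stale CNI state under /etc/cni/net.d separately, as in step 2):

```
*filter
:INPUT DROP [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
# loopback and already-established traffic
-A INPUT -i lo -j ACCEPT
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
# ssh
-A INPUT -p tcp --dport 22 -j ACCEPT
COMMIT
```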
Hi Ashmitha, did the hints from William in comment #4 help you?
Hi Niels, I still hit this issue during 3.11 testing, and what William mentioned has helped. This is definitely not what is expected to happen. Any reason why this would happen during fresh installs?
Was OCP installed on freshly provisioned VMs, or were the machines being reused? In any case, is this still an issue? If not, please close this BZ.
(In reply to Jose A. Rivera from comment #7)
> Was OCP installed on freshly provisioned VMs, or were the machines being
> reused?
>
> In any case, is this still an issue? If not, please close this BZ.

I've seen this issue on both freshly provisioned VMs and VMs which were being reused after cleanup. And this is still an issue.
Hmm... odd. Is this happening all the time, or only some of the time? What are the exact workaround steps you apply when this occurs in the case of a freshly provisioned VM? Is the following sequence of events correct?

1. Provision VM
2. Attempt OCP install, install fails
3. Apply workarounds
4. OCP install succeeds

What happens if you apply the workarounds prior to installing OCP?
If using the latest OCP 3.11.98 and OCS 3.11.2, I've found that with certain setups (resource constraints or slow disks) setting the following variable also helps, as it gives the GlusterFS cluster enough time to sync among its members:

openshift_storage_glusterfs_timeout=900

Another thing to keep in mind: if doing a re-install, make sure there is nothing left behind under "/etc/glusterfs", as failed configurations may interfere with new ones.
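In inventory terms this is a one-line addition under [OSEv3:vars]; a sketch (900 seconds is the value from this comment, tune it to your hardware):

```
[OSEv3:vars]
# Allow the "Wait for ... pod" tasks up to 15 minutes for the
# deploy-heketi/heketi pods to come up on slow or constrained setups.
openshift_storage_glusterfs_timeout=900
```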
I can confirm that this issue is 100% reproducible in my case, even with freshly provisioned, clean VMs.
I've been trying to overcome this issue in my deployment for around a week. I ran the deployment scripts (prerequisites.yml, deploy_cluster.yml, uninstall.yml) again and again; still the same issue.

"name": "heketi", "ready": false, "restartCount": 0, "state": {"waiting": {"reason": "ContainerCreating"}}}], "hostIP": "192.168.1.212", "phase": "Pending", "qosClass": "BestEffort", "startTime": "2020-01-20T17:37:40Z"}}], "kind": "List", "metadata": {"resourceVersion": "", "selfLink": ""}}], "returncode": 0}, "state": "list"}

PLAY RECAP ****************************************************************************************************************************************************************************************
localhost                 : ok=12   changed=0   unreachable=0  failed=0  skipped=4    rescued=0  ignored=0
os-infra.mydomain.com     : ok=144  changed=36  unreachable=0  failed=0  skipped=163  rescued=0  ignored=0
os-master.mydomain.com    : ok=471  changed=193 unreachable=0  failed=1  skipped=589  rescued=0  ignored=0
os-node.mydomain.com      : ok=129  changed=36  unreachable=0  failed=0  skipped=159  rescued=0  ignored=0
os-storage.mydomain.com   : ok=129  changed=36  unreachable=0  failed=0  skipped=159  rescued=0  ignored=0

INSTALLER STATUS **********************************************************************************************************************************************************************************
Initialization             : Complete (0:00:26)
Health Check               : Complete (0:00:07)
Node Bootstrap Preparation : Complete (0:03:14)
etcd Install               : Complete (0:00:41)
Master Install             : Complete (0:04:26)
Master Additional Install  : Complete (0:00:40)
Node Join                  : Complete (0:00:43)
GlusterFS Install          : In Progress (0:13:46)
        This phase can be restarted by running: playbooks/openshift-glusterfs/new_install.yml

Failure summary:
  Hosts:   os-master.mydomain.com
  Play:    Configure GlusterFS
  Task:    Wait for heketi pod
  Message: Failed without returning a message.

Can someone please advise what I should do to be able to deploy successfully?
I'm using a standalone VMware ESXi hypervisor and an RPM install of Origin.

ansible 2.9.2
Origin 3.11
CentOS 7 as the OS for the nodes

[root@os-master ~]# docker version
Client:
 Version:         1.13.1
 API version:     1.26
 Package version: docker-1.13.1-103.git7f2769b.el7.centos.x86_64
 Go version:      go1.10.3
 Git commit:      7f2769b/1.13.1
 Built:           Sun Sep 15 14:06:47 2019
 OS/Arch:         linux/amd64

Server:
 Version:         1.13.1
 API version:     1.26 (minimum version 1.12)
 Package version: docker-1.13.1-103.git7f2769b.el7.centos.x86_64
 Go version:      go1.10.3
 Git commit:      7f2769b/1.13.1
 Built:           Sun Sep 15 14:06:47 2019
 OS/Arch:         linux/amd64
 Experimental:    false

Here is my inventory:

[OSEv3:children]
masters
etcd
nodes
glusterfs
glusterfs_registry

[OSEv3:vars]
ansible_ssh_user=root
openshift_deployment_type=origin
openshift_release="3.11"
openshift_image_tag="v3.11"
openshift_master_default_subdomain=apps.mydomain.com
openshift_docker_selinux_enabled=True
openshift_check_min_host_memory_gb=16
openshift_check_min_host_disk_gb=50
openshift_disable_check=docker_image_availability
openshift_master_dynamic_provisioning_enabled=true
openshift_registry_selector="role=infra"
openshift_hosted_registry_storage_kind=glusterfs
openshift_metrics_install_metrics=true
openshift_metrics_cassandra_storage_type=pv
openshift_metrics_hawkular_nodeselector={"node-role.kubernetes.io/infra": "true"}
openshift_metrics_cassandra_nodeselector={"node-role.kubernetes.io/infra": "true"}
openshift_metrics_heapster_nodeselector={"node-role.kubernetes.io/infra": "true"}
openshift_metrics_storage_volume_size=20Gi
openshift_metrics_cassandra_pvc_storage_class_name="glusterfs-registry-block"
openshift_logging_install_logging=true
openshift_logging_es_pvc_dynamic=true
openshift_logging_storage_kind=dynamic
openshift_logging_kibana_nodeselector={"node-role.kubernetes.io/infra": "true"}
openshift_logging_curator_nodeselector={"node-role.kubernetes.io/infra": "true"}
openshift_logging_es_nodeselector={"node-role.kubernetes.io/infra": "true"}
openshift_logging_es_pvc_size=20Gi
openshift_logging_es_pvc_storage_class_name="glusterfs-registry-block"
openshift_storage_glusterfs_registry_namespace=infra-storage
openshift_storage_glusterfs_registry_storageclass=false
openshift_storage_glusterfs_registry_storageclass_default=false
openshift_storage_glusterfs_registry_block_deploy=true
openshift_storage_glusterfs_registry_block_host_vol_create=true
openshift_storage_glusterfs_registry_block_host_vol_size=100
openshift_storage_glusterfs_registry_block_storageclass=true
openshift_storage_glusterfs_registry_block_storageclass_default=false

[masters]
os-master.mydomain.com

[etcd]
os-master.mydomain.com

[nodes]
os-master.mydomain.com openshift_node_group_name="node-config-master"
os-infra.mydomain.com openshift_node_group_name="node-config-infra"
os-storage.mydomain.com openshift_node_group_name="node-config-compute"
os-node.mydomain.com openshift_node_group_name="node-config-compute"

[glusterfs_registry]
os-infra.mydomain.com glusterfs_ip='192.168.1.213' glusterfs_devices='["/dev/sdb"]'
os-node.mydomain.com glusterfs_ip='192.168.1.214' glusterfs_devices='["/dev/sdb"]'
os-storage.mydomain.com glusterfs_ip='192.168.1.215' glusterfs_devices='["/dev/sdb"]'

[glusterfs]
os-infra.mydomain.com glusterfs_ip='192.168.1.213' glusterfs_devices='["/dev/sdb"]'
os-node.mydomain.com glusterfs_ip='192.168.1.214' glusterfs_devices='["/dev/sdb"]'
os-storage.mydomain.com glusterfs_ip='192.168.1.215' glusterfs_devices='["/dev/sdb"]'

Many thanks in advance.
Unfortunately the workaround (openshift_storage_glusterfs_timeout=900) doesn’t work in my case