Description of problem:
I had a working installation of CNS 3.6 on OCP 3.6 and had successfully migrated registry storage to glusterfs storage, and metrics and logging to glusterblock storage. Due to maintenance on the underlying infrastructure, all OCP nodes needed to be shut down for a couple of hours. When powering the cluster back on, the glusterblock volumes fail to mount in the cassandra and elasticsearch pods, while the registry pod can mount its glusterfs volume again without problems.

# oc get pods --all-namespaces | egrep "es-data|cassandra"
logging           logging-es-data-master-2yyubnfn-1-z90sp   0/1   ContainerCreating   0   3d
logging           logging-es-data-master-80izip4w-1-bbqhr   0/1   ContainerCreating   0   3d
logging           logging-es-data-master-is16luj3-1-th2gb   0/1   ContainerCreating   0   3d
openshift-infra   hawkular-cassandra-1-5ctm1                0/1   ContainerCreating   0   3d

Similar error messages are seen in each of these pods:

# oc describe pod logging-es-data-master-is16luj3-1-th2gb -n logging
...
Events:
  FirstSeen  LastSeen  Count  From                        SubObjectPath  Type     Reason       Message
  ---------  --------  -----  ----                        -------------  -------- ------       -------
  3d         23m       2281   kubelet, cns01.example.com                 Warning  FailedMount  MountVolume.SetUp failed for volume "kubernetes.io/iscsi/9dd4d291-0286-11e8-9b78-001a4a160453-pvc-002a3213-01b2-11e8-92a8-001a4a160352" (spec.Name: "pvc-002a3213-01b2-11e8-92a8-001a4a160352") pod "9dd4d291-0286-11e8-9b78-001a4a160453" (UID: "9dd4d291-0286-11e8-9b78-001a4a160453") with: failed to get any path for iscsi disk, last err seen: Could not attach disk: Timeout after 10s
  3d         11m       2529   kubelet, cns01.example.com                 Warning  FailedSync   Error syncing pod
  3d         2m        2531   kubelet, cns01.example.com                 Warning  FailedMount  Unable to mount volumes for pod "logging-es-data-master-is16luj3-1-th2gb_logging(9dd4d291-0286-11e8-9b78-001a4a160453)": timeout expired waiting for volumes to attach/mount for pod "logging"/"logging-es-data-master-is16luj3-1-th2gb". list of unattached/unmounted volumes=[elasticsearch-storage]

Version-Release number of selected component (if applicable):
OpenShift Container Platform 3.6
Container Native Storage 3.6

How reproducible:
1 out of 1 try for me

Steps to Reproduce:
1. Install OCP 3.6 cluster according to documentation:
   https://docs.openshift.com/container-platform/3.6/install_config/install/advanced_install.html
2. Deploy CNS 3.6 according to documentation:
   https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.3/html/container-native_storage_for_openshift_container_platform/chap-documentation-install_upgrade_matrix_red_hat_gluster_storage_container_native_with_openshift_platform-introduction_containerized_rhgs#idm140179699791088
3. Configure block storage according to documentation:
   https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.3/html-single/container-native_storage_for_openshift_container_platform/#Block_Storage
4. Migrate Registry to CNS backed glusterfs volume according to documentation:
   https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.3/html-single/container-native_storage_for_openshift_container_platform/#chap-Documentation-Red_Hat_Gluster_Storage_Container_Native_with_OpenShift_Platform-Updating_Registry
5. Migrate Metrics and Logging to CNS backed glusterblock volumes according to documentation:
   https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.3/html-single/container-native_storage_for_openshift_container_platform/#Logging_Metrics
6. Reboot each node in the cluster.

Actual results:
The registry pod successfully mounts the glusterfs volume again. The cassandra and elasticsearch pods fail to mount the glusterblock volumes with the following error:

# oc describe pod hawkular-cassandra-1-5ctm1 -n openshift-infra
...
  3d         4m        1895   kubelet, cns01.example.com                 Warning  FailedMount  MountVolume.SetUp failed for volume "kubernetes.io/iscsi/19bb1a09-029c-11e8-aecf-001a4a160352-pvc-3684bb5a-01bc-11e8-a7a9-001a4a160352" (spec.Name: "pvc-3684bb5a-01bc-11e8-a7a9-001a4a160352") pod "19bb1a09-029c-11e8-aecf-001a4a160352" (UID: "19bb1a09-029c-11e8-aecf-001a4a160352") with: failed to get any path for iscsi disk, last err seen: Could not attach disk: Timeout after 10s

In /var/log/messages on the CNS node, the following errors appear:

Jan 29 19:52:38 cns01.example.com journal: E0129 19:52:38.145119 14821 iscsi_util.go:272] iscsi: failed to get any path for iscsi disk, last err seen:
Jan 29 19:52:38 cns01.example.com atomic-openshift-node: E0129 19:52:38.145618 14821 nestedpendingoperations.go:262] Operation for "\"kubernetes.io/iscsi/9dd4d291-0286-11e8-9b78-001a4a160453-pvc-002a3213-01b2-11e8-92a8-001a4a160352\" (\"9dd4d291-0286-11e8-9b78-001a4a160453\")" failed. No retries permitted until 2018-01-29 19:54:38.145189855 +0100 CET (durationBeforeRetry 2m0s).
Error: MountVolume.SetUp failed for volume "kubernetes.io/iscsi/9dd4d291-0286-11e8-9b78-001a4a160453-pvc-002a3213-01b2-11e8-92a8-001a4a160352" (spec.Name: "pvc-002a3213-01b2-11e8-92a8-001a4a160352") pod "9dd4d291-0286-11e8-9b78-001a4a160453" (UID: "9dd4d291-0286-11e8-9b78-001a4a160453") with: failed to get any path for iscsi disk, last err seen:
Jan 29 19:52:38 cns01.example.com atomic-openshift-node: Could not attach disk: Timeout after 10s
Jan 29 19:52:38 cns01.example.com journal: Could not attach disk: Timeout after 10s
Jan 29 19:52:38 cns01.example.com journal: E0129 19:52:38.145129 14821 disk_manager.go:50] failed to attach disk
Jan 29 19:52:38 cns01.example.com journal: E0129 19:52:38.145132 14821 iscsi.go:247] iscsi: failed to setup
Jan 29 19:52:38 cns01.example.com journal: E0129 19:52:38.145618 14821 nestedpendingoperations.go:262] Operation for "\"kubernetes.io/iscsi/9dd4d291-0286-11e8-9b78-001a4a160453-pvc-002a3213-01b2-11e8-92a8-001a4a160352\" (\"9dd4d291-0286-11e8-9b78-001a4a160453\")" failed. No retries permitted until 2018-01-29 19:54:38.145189855 +0100 CET (durationBeforeRetry 2m0s).
Error: MountVolume.SetUp failed for volume "kubernetes.io/iscsi/9dd4d291-0286-11e8-9b78-001a4a160453-pvc-002a3213-01b2-11e8-92a8-001a4a160352" (spec.Name: "pvc-002a3213-01b2-11e8-92a8-001a4a160352") pod "9dd4d291-0286-11e8-9b78-001a4a160453" (UID: "9dd4d291-0286-11e8-9b78-001a4a160453") with: failed to get any path for iscsi disk, last err seen:
Jan 29 19:52:38 cns01.example.com journal: Could not attach disk: Timeout after 10s

Expected results:
The glusterblock volumes should successfully mount in the cassandra and elasticsearch pods.
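For anyone triaging the same symptom, the checks in the description can be condensed into two small filters. This is only a sketch: the function names stuck_pods and failed_iscsi_pvcs are hypothetical, the column positions assume the default `oc get pods --all-namespaces` layout shown in the description, and `grep -o` assumes GNU or BSD grep.

```shell
#!/bin/sh
# Triage helpers (illustrative sketch; both function names are hypothetical).

# stuck_pods: read `oc get pods --all-namespaces` output on stdin and print
# namespace/name for every pod whose STATUS column (4th field) is
# ContainerCreating. The header line is skipped because it does not match.
stuck_pods() {
    awk '$4 == "ContainerCreating" { print $1 "/" $2 }'
}

# failed_iscsi_pvcs: read node log lines on stdin and print each distinct
# volume name (the spec.Name value) that appears on an iSCSI attach
# failure line, one per line, sorted.
failed_iscsi_pvcs() {
    grep 'failed to get any path for iscsi disk' |
        grep -o 'spec\.Name: "pvc-[0-9a-f-]*"' |
        sed 's/^spec\.Name: "//; s/"$//' |
        sort -u
}
```

Typical use would be `oc get pods --all-namespaces | stuck_pods` on a master and `failed_iscsi_pvcs < /var/log/messages` on the affected node. Both functions only read stdin, so they can be tried against saved output first.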
I can add that the environment above is running on RHHI v1.1.
PR is upstream: https://github.com/openshift/openshift-ansible/pull/7198
Moving this BZ to OCP as this is a bug with the openshift-ansible installer.
Fix is in openshift-ansible-3.9.0-0.46.0
Will verify this once BZ #1547229 is fixed.
Verified with version openshift-ansible-3.9.1-1.git.0.9862628.el7; the code is merged. After installation, the target mount has been added for gluster block:

# oc export ds glusterfs-registry
...
        - mountPath: /etc/target
          name: glusterfs-target
...
      - hostPath:
          path: /etc/target
          type: ""
        name: glusterfs-target
...
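The same verification can be scripted. The sketch below assumes the DaemonSet definition is piped in as YAML, as in the `oc export` output above; the function name has_target_mount is hypothetical, and the check is a plain text match rather than a real YAML parse, so quoted values or unusual formatting would not be detected.

```shell
#!/bin/sh
# has_target_mount (hypothetical helper): read a DaemonSet definition in YAML
# on stdin and succeed only if it contains both a container volumeMount at
# /etc/target and a hostPath volume whose path is /etc/target.
# This is a crude line match, not a YAML parser.
has_target_mount() {
    spec=$(cat)
    printf '%s\n' "$spec" |
        grep -Eq 'mountPath:[[:space:]]*/etc/target[[:space:]]*$' &&
    printf '%s\n' "$spec" |
        grep -Eq '^[[:space:]]*path:[[:space:]]*/etc/target[[:space:]]*$'
}
```

For example, `oc export ds glusterfs-registry | has_target_mount && echo "target mount present"`. The command returns non-zero when either line is missing, which is what one would expect on a cluster deployed before the fix.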
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:0489
Hello Prasanna,

What can we do for this customer, who has reopened the case?

I'm reopening this because I see that the BZ is now closed:
https://bugzilla.redhat.com/show_bug.cgi?id=1540080
With errata:
https://access.redhat.com/errata/RHBA-2018:0489
The errata is only for OCP 3.9, but we need this for 3.6. We can't upgrade our cluster because RHMAP, which we are running on top of OCP, is only supported on 3.6.

Thanks and Regards
Oonkwee Lim
Enterprise Cloud Support
https://github.com/openshift/openshift-ansible/pull/7767
This has been merged, so once a new release is done this should work for 3.6.