Description of problem: cns-deployment tool fails to deploy CNS as the gluster pods were not up and it eventually aborted the setup. $$$$$$$$$$$$ # oc get pods NAME READY STATUS RESTARTS AGE glusterfs-b15tq 0/1 Running 0 5m glusterfs-bixbl 0/1 Running 1 5m glusterfs-f9dsr 0/1 Running 0 5m storage-project-router-1-z42i4 1/1 Running 0 10m $$$$$$$$$$$$ ########################### # cns-deploy topology.json --deploy-gluster --cli oc --templates_dir=/usr/share/heketi/templates --namespace storage-project --yes --log-file=/var/log/cns-deploy/1-latest-cns-deploy.log --verbose Using OpenShift CLI. NAME STATUS AGE storage-project Active 7m Using namespace "storage-project". template "deploy-heketi" created serviceaccount "heketi-service-account" created template "heketi" created template "glusterfs" created Marking 'dhcp47-1.lab.eng.blr.redhat.com' as a GlusterFS node. node "dhcp47-1.lab.eng.blr.redhat.com" labeled Marking 'dhcp46-50.lab.eng.blr.redhat.com' as a GlusterFS node. node "dhcp46-50.lab.eng.blr.redhat.com" labeled Marking 'dhcp47-136.lab.eng.blr.redhat.com' as a GlusterFS node. node "dhcp47-136.lab.eng.blr.redhat.com" labeled Deploying GlusterFS pods. daemonset "glusterfs" created Waiting for GlusterFS pods to start ... Checking status of pods matching 'glusterfs-node=pod': glusterfs-b15tq 0/1 ContainerCreating 0 3s glusterfs-bixbl 0/1 ContainerCreating 0 3s glusterfs-f9dsr 0/1 ContainerCreating 0 3s Checking status of pods matching 'glusterfs-node=pod': glusterfs-b15tq 0/1 ContainerCreating 0 5s glusterfs-bixbl 0/1 ContainerCreating 0 5s glusterfs-f9dsr 0/1 ContainerCreating 0 5s Checking status of pods matching 'glusterfs-node=pod': glusterfs-b15tq 0/1 ContainerCreating 0 7s glusterfs-bixbl 0/1 ContainerCreating 0 7s glusterfs-f9dsr 0/1 ContainerCreating 0 7s Checking status of pods matching 'glusterfs-node=pod': glusterfs-b15tq 0/1 ContainerCreating 0 9s glusterfs-bixbl 0/1 ContainerCreating 0 9s glusterfs-f9dsr 0/1 ContainerCreating 0 9s Checking status of pods matching 'glusterfs-node=pod': glusterfs-b15tq 0/1 ContainerCreating 0 12s glusterfs-bixbl 0/1 ContainerCreating 0 12s glusterfs-f9dsr 0/1 ContainerCreating 0 12s Checking status of pods matching 'glusterfs-node=pod': glusterfs-b15tq 0/1 ContainerCreating 0 14s glusterfs-bixbl 0/1 ContainerCreating 0 14s glusterfs-f9dsr 0/1 ContainerCreating 0 14s Checking status of pods matching 'glusterfs-node=pod': glusterfs-b15tq 0/1 ContainerCreating 0 16s glusterfs-bixbl 0/1 ContainerCreating 0 16s glusterfs-f9dsr 0/1 ContainerCreating 0 16s Checking status of pods matching 'glusterfs-node=pod': glusterfs-b15tq 0/1 ContainerCreating 0 18s glusterfs-bixbl 0/1 ContainerCreating 0 18s glusterfs-f9dsr 0/1 ContainerCreating 0 18s Checking status of pods matching 'glusterfs-node=pod': glusterfs-b15tq 0/1 ContainerCreating 0 21s glusterfs-bixbl 0/1 ContainerCreating 0 21s glusterfs-f9dsr 0/1 ContainerCreating 0 21s Checking status of pods matching 'glusterfs-node=pod': glusterfs-b15tq 0/1 ContainerCreating 0 23s glusterfs-bixbl 0/1 ContainerCreating 0 23s glusterfs-f9dsr 0/1 ContainerCreating 0 23s Checking status of pods matching 'glusterfs-node=pod': glusterfs-b15tq 0/1 ContainerCreating 0 25s glusterfs-bixbl 0/1 ContainerCreating 0 25s glusterfs-f9dsr 0/1 ContainerCreating 0 25s Checking status of pods matching 'glusterfs-node=pod': glusterfs-b15tq 0/1 ContainerCreating 0 27s glusterfs-bixbl 0/1 ContainerCreating 0 27s glusterfs-f9dsr 0/1 ContainerCreating 0 27s Checking status of pods matching 'glusterfs-node=pod': glusterfs-b15tq 0/1 
ContainerCreating 0 30s glusterfs-bixbl 0/1 ContainerCreating 0 30s glusterfs-f9dsr 0/1 ContainerCreating 0 30s Checking status of pods matching 'glusterfs-node=pod': glusterfs-b15tq 0/1 ContainerCreating 0 32s glusterfs-bixbl 0/1 ContainerCreating 0 32s glusterfs-f9dsr 0/1 ContainerCreating 0 32s Checking status of pods matching 'glusterfs-node=pod': glusterfs-b15tq 0/1 ContainerCreating 0 34s glusterfs-bixbl 0/1 ContainerCreating 0 34s glusterfs-f9dsr 0/1 ContainerCreating 0 34s Checking status of pods matching 'glusterfs-node=pod': glusterfs-b15tq 0/1 ContainerCreating 0 36s glusterfs-bixbl 0/1 ContainerCreating 0 36s glusterfs-f9dsr 0/1 ContainerCreating 0 36s Checking status of pods matching 'glusterfs-node=pod': glusterfs-b15tq 0/1 ContainerCreating 0 38s glusterfs-bixbl 0/1 ContainerCreating 0 38s glusterfs-f9dsr 0/1 ContainerCreating 0 38s Checking status of pods matching 'glusterfs-node=pod': glusterfs-b15tq 0/1 ContainerCreating 0 41s glusterfs-bixbl 0/1 ContainerCreating 0 41s glusterfs-f9dsr 0/1 ContainerCreating 0 41s Checking status of pods matching 'glusterfs-node=pod': glusterfs-b15tq 0/1 ContainerCreating 0 43s glusterfs-bixbl 0/1 ContainerCreating 0 43s glusterfs-f9dsr 0/1 ContainerCreating 0 43s Checking status of pods matching 'glusterfs-node=pod': glusterfs-b15tq 0/1 ContainerCreating 0 45s glusterfs-bixbl 0/1 ContainerCreating 0 45s glusterfs-f9dsr 0/1 ContainerCreating 0 45s Checking status of pods matching 'glusterfs-node=pod': glusterfs-b15tq 0/1 ContainerCreating 0 47s glusterfs-bixbl 0/1 ContainerCreating 0 47s glusterfs-f9dsr 0/1 ContainerCreating 0 47s Checking status of pods matching 'glusterfs-node=pod': glusterfs-b15tq 0/1 ContainerCreating 0 50s glusterfs-bixbl 0/1 ContainerCreating 0 50s glusterfs-f9dsr 0/1 ContainerCreating 0 50s Checking status of pods matching 'glusterfs-node=pod': glusterfs-b15tq 0/1 ContainerCreating 0 52s glusterfs-bixbl 0/1 ContainerCreating 0 52s glusterfs-f9dsr 0/1 ContainerCreating 0 52s Checking status of pods matching 'glusterfs-node=pod': glusterfs-b15tq 0/1 ContainerCreating 0 54s glusterfs-bixbl 0/1 ContainerCreating 0 54s glusterfs-f9dsr 0/1 ContainerCreating 0 54s Checking status of pods matching 'glusterfs-node=pod': glusterfs-b15tq 0/1 ContainerCreating 0 56s glusterfs-bixbl 0/1 ContainerCreating 0 56s glusterfs-f9dsr 0/1 ContainerCreating 0 56s Checking status of pods matching 'glusterfs-node=pod': glusterfs-b15tq 0/1 ContainerCreating 0 59s glusterfs-bixbl 0/1 ContainerCreating 0 59s glusterfs-f9dsr 0/1 ContainerCreating 0 59s Checking status of pods matching 'glusterfs-node=pod': glusterfs-b15tq 0/1 ContainerCreating 0 1m glusterfs-bixbl 0/1 ContainerCreating 0 1m glusterfs-f9dsr 0/1 ContainerCreating 0 1m ..Checking status of pods matching 'glusterfs-node=pod': glusterfs-b15tq 0/1 ContainerCreating 0 1m glusterfs-bixbl 0/1 Running 0 1m glusterfs-f9dsr 0/1 ContainerCreating 0 1m .......................Checking status of pods matching 'glusterfs-node=pod': glusterfs-b15tq 0/1 ContainerCreating 0 2m glusterfs-bixbl 0/1 Running 0 2m glusterfs-f9dsr 0/1 ContainerCreating 0 2m .........................Checking status of pods matching 'glusterfs-node=pod': glusterfs-b15tq 0/1 Running 0 3m glusterfs-bixbl 0/1 Running 0 3m glusterfs-f9dsr 0/1 ContainerCreating 0 3m .......Checking status of pods matching 'glusterfs-node=pod': glusterfs-b15tq 0/1 Running 0 3m glusterfs-bixbl 0/1 Running 0 3m glusterfs-f9dsr 0/1 Running 0 3m ............Checking status of pods matching 'glusterfs-node=pod': glusterfs-b15tq 0/1 Running 0 3m 
glusterfs-bixbl 0/1 Running 1 3m
glusterfs-f9dsr 0/1 Running 0 3m
.....Checking status of pods matching 'glusterfs-node=pod':
glusterfs-b15tq 0/1 Running 0 4m
glusterfs-bixbl 0/1 Running 1 4m
glusterfs-f9dsr 0/1 Running 0 4m
..........................Checking status of pods matching 'glusterfs-node=pod':
glusterfs-b15tq 0/1 Running 0 5m
glusterfs-bixbl 0/1 Running 1 5m
glusterfs-f9dsr 0/1 Running 0 5m
...............Checking status of pods matching 'glusterfs-node=pod':
glusterfs-b15tq 0/1 Running 1 5m
glusterfs-bixbl 0/1 Running 1 5m
glusterfs-f9dsr 0/1 Running 0 5m
Timed out waiting for pods matching 'glusterfs-node=pod'.
No resources found
Error from server: deploymentconfig "heketi" not found
Error from server: services "heketi" not found
Error from server: routes "heketi" not found
Error from server: services "heketi-storage-endpoints" not found
serviceaccount "heketi-service-account" deleted
template "deploy-heketi" deleted
template "heketi" deleted
Removing label from 'dhcp47-1.lab.eng.blr.redhat.com' as a GlusterFS node.
node "dhcp47-1.lab.eng.blr.redhat.com" labeled
Removing label from 'dhcp46-50.lab.eng.blr.redhat.com' as a GlusterFS node.
node "dhcp46-50.lab.eng.blr.redhat.com" labeled
Removing label from 'dhcp47-136.lab.eng.blr.redhat.com' as a GlusterFS node.
node "dhcp47-136.lab.eng.blr.redhat.com" labeled
daemonset "glusterfs" deleted
template "glusterfs" deleted
############################

Version-Release number of selected component (if applicable):
heketi-client-3.1.0-14.el7rhgs.x86_64
cns-deploy-3.1.0-14.el7rhgs.x86_64
etcd-3.0.15-1.el7.x86_64
docker-common-1.12.5-12.el7.x86_64
docker-rhel-push-plugin-1.12.5-12.el7.x86_64
docker-1.12.5-12.el7.x86_64
docker-client-1.12.5-12.el7.x86_64
openshift-ansible-filter-plugins-3.4.44-1.git.0.efa61c6.el7.noarch
openshift-ansible-playbooks-3.4.44-1.git.0.efa61c6.el7.noarch
atomic-openshift-node-3.4.0.39-1.git.0.5f32f06.el7.x86_64
openshift-ansible-3.4.44-1.git.0.efa61c6.el7.noarch
openshift-ansible-lookup-plugins-3.4.44-1.git.0.efa61c6.el7.noarch
openshift-ansible-roles-3.4.44-1.git.0.efa61c6.el7.noarch
atomic-openshift-utils-3.4.44-1.git.0.efa61c6.el7.noarch
atomic-openshift-clients-3.4.0.39-1.git.0.5f32f06.el7.x86_64
openshift-ansible-docs-3.4.44-1.git.0.efa61c6.el7.noarch
openshift-ansible-callback-plugins-3.4.44-1.git.0.efa61c6.el7.noarch
atomic-openshift-3.4.0.39-1.git.0.5f32f06.el7.x86_64
atomic-openshift-master-3.4.0.39-1.git.0.5f32f06.el7.x86_64
tuned-profiles-atomic-openshift-node-3.4.0.39-1.git.0.5f32f06.el7.x86_64
atomic-openshift-sdn-ovs-3.4.0.39-1.git.0.5f32f06.el7.x86_64

How reproducible:
Seen twice in 2 different setups while testing the latest RC build + RHEL 7.3.2 updates from Stage

Steps to Reproduce:
1. Set up OCP 3.4 with the latest build from http://download.lab.bos.redhat.com/rcm-guest/puddles/RHAOS/AtomicOpenShift/3.4/latest/x86_64/os
2. Create and set up a router
3. Execute # cns-deploy topology.json --deploy-gluster --cli oc --templates_dir=/usr/share/heketi/templates --namespace storage-project --yes --log-file=/var/log/1-latest-cns-deploy.log --verbose

Actual results:
CNS deployment failed because glusterd failed to start in the pods.

Expected results:
CNS deployment should be successful.
Additional info: ######################## # oc describe pod glusterfs-f9dsr Name: glusterfs-f9dsr Namespace: storage-project Security Policy: privileged Node: dhcp46-50.lab.eng.blr.redhat.com/10.70.46.50 Start Time: Wed, 11 Jan 2017 22:01:23 +0530 Labels: glusterfs-node=pod Status: Running IP: 10.70.46.50 Controllers: DaemonSet/glusterfs Containers: glusterfs: Container ID: docker://480bbb4912f6cd1e5073e32e5e9c553a0837643e867c5927ec87297730e93a0e Image: rhgs3/rhgs-server-rhel7:3.1.3-17 Image ID: docker-pullable://brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhgs3/rhgs-server-rhel7@sha256:781ad1e4a9d229fa881702818b4e5b84022b9c0480fa210e5f6782a4605309a5 Port: State: Running Started: Wed, 11 Jan 2017 22:04:38 +0530 Ready: False Restart Count: 0 Liveness: exec [/bin/bash -c systemctl status glusterd.service] delay=100s timeout=3s period=10s #success=1 #failure=3 Readiness: exec [/bin/bash -c systemctl status glusterd.service] delay=100s timeout=3s period=10s #success=1 #failure=3 Volume Mounts: /dev from glusterfs-dev (rw) /etc/glusterfs from glusterfs-etc (rw) /run from glusterfs-run (rw) /run/lvm from glusterfs-lvm (rw) /sys/fs/cgroup from glusterfs-cgroup (ro) /var/lib/glusterd from glusterfs-config (rw) /var/lib/heketi from glusterfs-heketi (rw) /var/lib/misc/glusterfsd from glusterfs-misc (rw) /var/log/glusterfs from glusterfs-logs (rw) /var/run/secrets/kubernetes.io/serviceaccount from default-token-ibzaf (ro) Environment Variables: <none> Conditions: Type Status Initialized True Ready False PodScheduled True Volumes: glusterfs-heketi: Type: HostPath (bare host directory volume) Path: /var/lib/heketi glusterfs-run: Type: EmptyDir (a temporary directory that shares a pod's lifetime) Medium: glusterfs-lvm: Type: HostPath (bare host directory volume) Path: /run/lvm glusterfs-etc: Type: HostPath (bare host directory volume) Path: /etc/glusterfs glusterfs-logs: Type: HostPath (bare host directory volume) Path: /var/log/glusterfs glusterfs-config: Type: HostPath (bare host directory volume) Path: /var/lib/glusterd glusterfs-dev: Type: HostPath (bare host directory volume) Path: /dev glusterfs-misc: Type: HostPath (bare host directory volume) Path: /var/lib/misc/glusterfsd glusterfs-cgroup: Type: HostPath (bare host directory volume) Path: /sys/fs/cgroup default-token-ibzaf: Type: Secret (a volume populated by a Secret) SecretName: default-token-ibzaf QoS Class: BestEffort Tolerations: <none> Events: FirstSeen LastSeen Count From SubobjectPath Type Reason Message --------- -------- ----- ---- ------------- -------- ------ ------- 5m 5m 1 {kubelet dhcp46-50.lab.eng.blr.redhat.com} spec.containers{glusterfs} Normal Pulling pulling image "rhgs3/rhgs-server-rhel7:3.1.3-17" 2m 2m 1 {kubelet dhcp46-50.lab.eng.blr.redhat.com} spec.containers{glusterfs} Normal Pulled Successfully pulled image "rhgs3/rhgs-server-rhel7:3.1.3-17" 1m 1m 1 {kubelet dhcp46-50.lab.eng.blr.redhat.com} spec.containers{glusterfs} Normal Created Created container with docker id 480bbb4912f6; Security:[seccomp=unconfined] 1m 1m 1 {kubelet dhcp46-50.lab.eng.blr.redhat.com} spec.containers{glusterfs} Normal Started Started container with docker id 480bbb4912f6 11s 11s 1 {kubelet dhcp46-50.lab.eng.blr.redhat.com} spec.containers{glusterfs} Warning Unhealthy Readiness probe failed: ● glusterd.service - GlusterFS, a clustered file-system server Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled) Active: failed (Result: exit-code) since Wed 2017-01-11 16:34:39 UTC; 1min 43s ago Process: 
113 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=255) Jan 11 16:34:39 dhcp46-50.lab.eng.blr.redhat.com systemd[1]: Starting GlusterFS, a clustered file-system server... Jan 11 16:34:39 dhcp46-50.lab.eng.blr.redhat.com glusterd[113]: ERROR: failed to create logfile "/var/log/glusterfs/etc-glusterfs-glusterd.vol.log" (No such file or directory) Jan 11 16:34:39 dhcp46-50.lab.eng.blr.redhat.com glusterd[113]: ERROR: failed to open logfile /var/log/glusterfs/etc-glusterfs-glusterd.vol.log Jan 11 16:34:39 dhcp46-50.lab.eng.blr.redhat.com systemd[1]: glusterd.service: control process exited, code=exited status=255 Jan 11 16:34:39 dhcp46-50.lab.eng.blr.redhat.com systemd[1]: Failed to start GlusterFS, a clustered file-system server. Jan 11 16:34:39 dhcp46-50.lab.eng.blr.redhat.com systemd[1]: Unit glusterd.service entered failed state. Jan 11 16:34:39 dhcp46-50.lab.eng.blr.redhat.com systemd[1]: glusterd.service failed. 11s 11s 1 {kubelet dhcp46-50.lab.eng.blr.redhat.com} spec.containers{glusterfs} Warning Unhealthy Liveness probe failed: ● glusterd.service - GlusterFS, a clustered file-system server Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled) Active: failed (Result: exit-code) since Wed 2017-01-11 16:34:39 UTC; 1min 43s ago Process: 113 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=255) Jan 11 16:34:39 dhcp46-50.lab.eng.blr.redhat.com systemd[1]: Starting GlusterFS, a clustered file-system server... Jan 11 16:34:39 dhcp46-50.lab.eng.blr.redhat.com glusterd[113]: ERROR: failed to create logfile "/var/log/glusterfs/etc-glusterfs-glusterd.vol.log" (No such file or directory) Jan 11 16:34:39 dhcp46-50.lab.eng.blr.redhat.com glusterd[113]: ERROR: failed to open logfile /var/log/glusterfs/etc-glusterfs-glusterd.vol.log Jan 11 16:34:39 dhcp46-50.lab.eng.blr.redhat.com systemd[1]: glusterd.service: control process exited, code=exited status=255 Jan 11 16:34:39 dhcp46-50.lab.eng.blr.redhat.com systemd[1]: Failed to start GlusterFS, a clustered file-system server. Jan 11 16:34:39 dhcp46-50.lab.eng.blr.redhat.com systemd[1]: Unit glusterd.service entered failed state. Jan 11 16:34:39 dhcp46-50.lab.eng.blr.redhat.com systemd[1]: glusterd.service failed. 1s 1s 1 {kubelet dhcp46-50.lab.eng.blr.redhat.com} spec.containers{glusterfs} Warning Unhealthy Liveness probe failed: ● glusterd.service - GlusterFS, a clustered file-system server Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled) Active: failed (Result: exit-code) since Wed 2017-01-11 16:34:39 UTC; 1min 53s ago Process: 113 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=255) Jan 11 16:34:39 dhcp46-50.lab.eng.blr.redhat.com systemd[1]: Starting GlusterFS, a clustered file-system server... Jan 11 16:34:39 dhcp46-50.lab.eng.blr.redhat.com glusterd[113]: ERROR: failed to create logfile "/var/log/glusterfs/etc-glusterfs-glusterd.vol.log" (No such file or directory) Jan 11 16:34:39 dhcp46-50.lab.eng.blr.redhat.com glusterd[113]: ERROR: failed to open logfile /var/log/glusterfs/etc-glusterfs-glusterd.vol.log Jan 11 16:34:39 dhcp46-50.lab.eng.blr.redhat.com systemd[1]: glusterd.service: control process exited, code=exited status=255 Jan 11 16:34:39 dhcp46-50.lab.eng.blr.redhat.com systemd[1]: Failed to start GlusterFS, a clustered file-system server. 
Jan 11 16:34:39 dhcp46-50.lab.eng.blr.redhat.com systemd[1]: Unit glusterd.service entered failed state. Jan 11 16:34:39 dhcp46-50.lab.eng.blr.redhat.com systemd[1]: glusterd.service failed. 1s 1s 1 {kubelet dhcp46-50.lab.eng.blr.redhat.com} spec.containers{glusterfs} Warning Unhealthy Readiness probe failed: ● glusterd.service - GlusterFS, a clustered file-system server Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled) Active: failed (Result: exit-code) since Wed 2017-01-11 16:34:39 UTC; 1min 53s ago Process: 113 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=255) Jan 11 16:34:39 dhcp46-50.lab.eng.blr.redhat.com systemd[1]: Starting GlusterFS, a clustered file-system server... Jan 11 16:34:39 dhcp46-50.lab.eng.blr.redhat.com glusterd[113]: ERROR: failed to create logfile "/var/log/glusterfs/etc-glusterfs-glusterd.vol.log" (No such file or directory) Jan 11 16:34:39 dhcp46-50.lab.eng.blr.redhat.com glusterd[113]: ERROR: failed to open logfile /var/log/glusterfs/etc-glusterfs-glusterd.vol.log Jan 11 16:34:39 dhcp46-50.lab.eng.blr.redhat.com systemd[1]: glusterd.service: control process exited, code=exited status=255 Jan 11 16:34:39 dhcp46-50.lab.eng.blr.redhat.com systemd[1]: Failed to start GlusterFS, a clustered file-system server. Jan 11 16:34:39 dhcp46-50.lab.eng.blr.redhat.com systemd[1]: Unit glusterd.service entered failed state. Jan 11 16:34:39 dhcp46-50.lab.eng.blr.redhat.com systemd[1]: glusterd.service failed. ########################
This looks like an SELinux issue, but I am not absolutely certain. Could you check whether it works if SELinux is disabled on the hosts?
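In case it helps with that test, a minimal sketch of the suggested check (standard RHEL commands, run as root on each OCP node that hosts a glusterfs pod; this is not output from the actual reproduction):

# Switch SELinux to permissive for the current boot and confirm the mode.
setenforce 0
getenforce

# While cns-deploy runs, watch for AVC denials. If glusterd still fails with
# no denials recorded, SELinux is probably not the cause.
ausearch -m avc -ts recent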
(In reply to Humble Chirammal from comment #3)
> This looks like an SELinux issue, but I am not absolutely certain. Could you
> check whether it works if SELinux is disabled on the hosts?

I've set SELinux to permissive mode on all the nodes and retested. However, I noticed that the issue still remains.

################################
[root@dhcp47-1 audit]# getenforce
Permissive
[root@dhcp47-1 ~]# sestatus
SELinux status:                 enabled
SELinuxfs mount:                /sys/fs/selinux
SELinux root directory:         /etc/selinux
Loaded policy name:             targeted
Current mode:                   permissive
Mode from config file:          enforcing
Policy MLS status:              enabled
Policy deny_unknown status:     allowed
Max kernel policy version:      28

[root@dhcp46-50 audit]# getenforce
Permissive
[root@dhcp46-50 ~]# sestatus
SELinux status:                 enabled
SELinuxfs mount:                /sys/fs/selinux
SELinux root directory:         /etc/selinux
Loaded policy name:             targeted
Current mode:                   permissive
Mode from config file:          enforcing
Policy MLS status:              enabled
Policy deny_unknown status:     allowed
Max kernel policy version:      28

[root@dhcp47-136 audit]# getenforce
Permissive
[root@dhcp47-136 ~]# sestatus
SELinux status:                 enabled
SELinuxfs mount:                /sys/fs/selinux
SELinux root directory:         /etc/selinux
Loaded policy name:             targeted
Current mode:                   permissive
Mode from config file:          enforcing
Policy MLS status:              enabled
Policy deny_unknown status:     allowed
Max kernel policy version:      28

# cns-deploy topology.json --deploy-gluster --cli oc --templates_dir=/usr/share/heketi/templates --namespace storage-project --yes --log-file=/var/log/cns-deploy/5-latest-cns-deploy.log --verbose
Using OpenShift CLI.
NAME              STATUS    AGE
storage-project   Active    14h
Using namespace "storage-project".
template "deploy-heketi" created
serviceaccount "heketi-service-account" created
template "heketi" created
template "glusterfs" created
Marking 'dhcp47-1.lab.eng.blr.redhat.com' as a GlusterFS node.
node "dhcp47-1.lab.eng.blr.redhat.com" labeled
Marking 'dhcp46-50.lab.eng.blr.redhat.com' as a GlusterFS node.
node "dhcp46-50.lab.eng.blr.redhat.com" labeled
Marking 'dhcp47-136.lab.eng.blr.redhat.com' as a GlusterFS node.
node "dhcp47-136.lab.eng.blr.redhat.com" labeled
Deploying GlusterFS pods.
daemonset "glusterfs" created
Waiting for GlusterFS pods to start ...
Checking status of pods matching 'glusterfs-node=pod': glusterfs-8u6ce 0/1 ContainerCreating 0 3s glusterfs-r58z0 0/1 ContainerCreating 0 3s glusterfs-vm5o6 0/1 ContainerCreating 0 3s Checking status of pods matching 'glusterfs-node=pod': glusterfs-8u6ce 0/1 ContainerCreating 0 5s glusterfs-r58z0 0/1 ContainerCreating 0 5s glusterfs-vm5o6 0/1 ContainerCreating 0 5s Checking status of pods matching 'glusterfs-node=pod': glusterfs-8u6ce 0/1 ContainerCreating 0 7s glusterfs-r58z0 0/1 ContainerCreating 0 7s glusterfs-vm5o6 0/1 ContainerCreating 0 7s Checking status of pods matching 'glusterfs-node=pod': glusterfs-8u6ce 0/1 Running 0 9s glusterfs-r58z0 0/1 Running 0 9s glusterfs-vm5o6 0/1 Running 0 9s Checking status of pods matching 'glusterfs-node=pod': glusterfs-8u6ce 0/1 Running 0 12s glusterfs-r58z0 0/1 Running 0 12s glusterfs-vm5o6 0/1 Running 0 12s Checking status of pods matching 'glusterfs-node=pod': glusterfs-8u6ce 0/1 Running 0 14s glusterfs-r58z0 0/1 Running 0 14s glusterfs-vm5o6 0/1 Running 0 14s Checking status of pods matching 'glusterfs-node=pod': glusterfs-8u6ce 0/1 Running 0 16s glusterfs-r58z0 0/1 Running 0 16s glusterfs-vm5o6 0/1 Running 0 16s Checking status of pods matching 'glusterfs-node=pod': glusterfs-8u6ce 0/1 Running 0 18s glusterfs-r58z0 0/1 Running 0 18s glusterfs-vm5o6 0/1 Running 0 18s Checking status of pods matching 'glusterfs-node=pod': glusterfs-8u6ce 0/1 Running 0 21s glusterfs-r58z0 0/1 Running 0 21s glusterfs-vm5o6 0/1 Running 0 21s Checking status of pods matching 'glusterfs-node=pod': glusterfs-8u6ce 0/1 Running 0 23s glusterfs-r58z0 0/1 Running 0 23s glusterfs-vm5o6 0/1 Running 0 23s Checking status of pods matching 'glusterfs-node=pod': glusterfs-8u6ce 0/1 Running 0 25s glusterfs-r58z0 0/1 Running 0 25s glusterfs-vm5o6 0/1 Running 0 25s Checking status of pods matching 'glusterfs-node=pod': glusterfs-8u6ce 0/1 Running 0 27s glusterfs-r58z0 0/1 Running 0 27s glusterfs-vm5o6 0/1 Running 0 27s Checking status of pods matching 'glusterfs-node=pod': glusterfs-8u6ce 0/1 Running 0 29s glusterfs-r58z0 0/1 Running 0 29s glusterfs-vm5o6 0/1 Running 0 29s Checking status of pods matching 'glusterfs-node=pod': glusterfs-8u6ce 0/1 Running 0 32s glusterfs-r58z0 0/1 Running 0 32s glusterfs-vm5o6 0/1 Running 0 32s Checking status of pods matching 'glusterfs-node=pod': glusterfs-8u6ce 0/1 Running 0 34s glusterfs-r58z0 0/1 Running 0 34s glusterfs-vm5o6 0/1 Running 0 34s Checking status of pods matching 'glusterfs-node=pod': glusterfs-8u6ce 0/1 Running 0 36s glusterfs-r58z0 0/1 Running 0 36s glusterfs-vm5o6 0/1 Running 0 36s Checking status of pods matching 'glusterfs-node=pod': glusterfs-8u6ce 0/1 Running 0 39s glusterfs-r58z0 0/1 Running 0 39s glusterfs-vm5o6 0/1 Running 0 39s Checking status of pods matching 'glusterfs-node=pod': glusterfs-8u6ce 0/1 Running 0 41s glusterfs-r58z0 0/1 Running 0 41s glusterfs-vm5o6 0/1 Running 0 41s Checking status of pods matching 'glusterfs-node=pod': glusterfs-8u6ce 0/1 Running 0 43s glusterfs-r58z0 0/1 Running 0 43s glusterfs-vm5o6 0/1 Running 0 43s Checking status of pods matching 'glusterfs-node=pod': glusterfs-8u6ce 0/1 Running 0 45s glusterfs-r58z0 0/1 Running 0 45s glusterfs-vm5o6 0/1 Running 0 45s Checking status of pods matching 'glusterfs-node=pod': glusterfs-8u6ce 0/1 Running 0 47s glusterfs-r58z0 0/1 Running 0 47s glusterfs-vm5o6 0/1 Running 0 47s Checking status of pods matching 'glusterfs-node=pod': glusterfs-8u6ce 0/1 Running 0 50s glusterfs-r58z0 0/1 Running 0 50s glusterfs-vm5o6 0/1 Running 0 50s Checking status of 
pods matching 'glusterfs-node=pod': glusterfs-8u6ce 0/1 Running 0 52s glusterfs-r58z0 0/1 Running 0 52s glusterfs-vm5o6 0/1 Running 0 52s Checking status of pods matching 'glusterfs-node=pod': glusterfs-8u6ce 0/1 Running 0 54s glusterfs-r58z0 0/1 Running 0 54s glusterfs-vm5o6 0/1 Running 0 54s Checking status of pods matching 'glusterfs-node=pod': glusterfs-8u6ce 0/1 Running 0 56s glusterfs-r58z0 0/1 Running 0 56s glusterfs-vm5o6 0/1 Running 0 56s Checking status of pods matching 'glusterfs-node=pod': glusterfs-8u6ce 0/1 Running 0 59s glusterfs-r58z0 0/1 Running 0 59s glusterfs-vm5o6 0/1 Running 0 59s Checking status of pods matching 'glusterfs-node=pod': glusterfs-8u6ce 0/1 Running 0 1m glusterfs-r58z0 0/1 Running 0 1m glusterfs-vm5o6 0/1 Running 0 1m ..........................Checking status of pods matching 'glusterfs-node=pod': glusterfs-8u6ce 0/1 Running 0 2m glusterfs-r58z0 0/1 Running 0 2m glusterfs-vm5o6 0/1 Running 0 2m ..................Checking status of pods matching 'glusterfs-node=pod': glusterfs-8u6ce 0/1 Running 0 2m glusterfs-r58z0 0/1 Running 1 2m glusterfs-vm5o6 0/1 Running 0 2m Checking status of pods matching 'glusterfs-node=pod': glusterfs-8u6ce 0/1 Running 1 2m glusterfs-r58z0 0/1 Running 1 2m glusterfs-vm5o6 0/1 Running 1 2m ......Checking status of pods matching 'glusterfs-node=pod': glusterfs-8u6ce 0/1 Running 1 3m glusterfs-r58z0 0/1 Running 1 3m glusterfs-vm5o6 0/1 Running 1 3m ..........................Checking status of pods matching 'glusterfs-node=pod': glusterfs-8u6ce 0/1 Running 1 4m glusterfs-r58z0 0/1 Running 1 4m glusterfs-vm5o6 0/1 Running 1 4m .........................Checking status of pods matching 'glusterfs-node=pod': glusterfs-8u6ce 0/1 Running 1 5m glusterfs-r58z0 0/1 Running 1 5m glusterfs-vm5o6 0/1 Running 1 5m .Checking status of pods matching 'glusterfs-node=pod': glusterfs-8u6ce 0/1 Running 1 5m glusterfs-r58z0 0/1 Running 2 5m glusterfs-vm5o6 0/1 Running 2 5m Checking status of pods matching 'glusterfs-node=pod': glusterfs-8u6ce 0/1 Running 2 5m glusterfs-r58z0 0/1 Running 2 5m glusterfs-vm5o6 0/1 Running 2 5m .............Timed out waiting for pods matching 'glusterfs-node=pod'. No resources found Error from server: deploymentconfig "heketi" not found Error from server: services "heketi" not found Error from server: routes "heketi" not found Error from server: services "heketi-storage-endpoints" not found serviceaccount "heketi-service-account" deleted template "deploy-heketi" deleted template "heketi" deleted Removing label from 'dhcp47-1.lab.eng.blr.redhat.com' as a GlusterFS node. node "dhcp47-1.lab.eng.blr.redhat.com" labeled Removing label from 'dhcp46-50.lab.eng.blr.redhat.com' as a GlusterFS node. node "dhcp46-50.lab.eng.blr.redhat.com" labeled Removing label from 'dhcp47-136.lab.eng.blr.redhat.com' as a GlusterFS node. node "dhcp47-136.lab.eng.blr.redhat.com" labeled daemonset "glusterfs" deleted template "glusterfs" deleted ################################ I'll update the audit logs and other useful logs soon for further debugging.
[root@dhcp46-209 ~]# oc get pods
NAME                             READY     STATUS    RESTARTS   AGE
glusterfs-8u6ce                  0/1       Running   0          19s
glusterfs-r58z0                  0/1       Running   0          19s
glusterfs-vm5o6                  0/1       Running   0          19s
storage-project-router-1-z42i4   1/1       Running   1          14h

The errors seen inside the gluster containers are shown below:

****************************************
[root@dhcp46-209 ~]# oc rsh glusterfs-8u6ce
sh-4.2# systemctl status gluster-setup
● gluster-setup.service - Configuring GlusterFS in container
   Loaded: loaded (/etc/systemd/system/gluster-setup.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Thu 2017-01-12 06:27:20 UTC; 18s ago
  Process: 102 ExecStart=/usr/sbin/gluster-setup.sh (code=exited, status=1/FAILURE)
 Main PID: 102 (code=exited, status=1/FAILURE)

Jan 12 06:27:20 dhcp47-1.lab.eng.blr.redhat.com systemd[1]: Starting Configuring GlusterFS in container...
Jan 12 06:27:20 dhcp47-1.lab.eng.blr.redhat.com gluster-setup.sh[102]: ls: cannot access /var/log/glusterfs: No such file or directory
Jan 12 06:27:20 dhcp47-1.lab.eng.blr.redhat.com gluster-setup.sh[102]: cp: cannot stat '/var/log/glusterfs_bkp/*': No such file or directory
Jan 12 06:27:20 dhcp47-1.lab.eng.blr.redhat.com gluster-setup.sh[102]: Failed to copy /var/log/glusterfs
Jan 12 06:27:20 dhcp47-1.lab.eng.blr.redhat.com systemd[1]: gluster-setup.service: main process exited, code=exited, status=1/FAILURE
Jan 12 06:27:20 dhcp47-1.lab.eng.blr.redhat.com systemd[1]: Failed to start Configuring GlusterFS in container.
Jan 12 06:27:20 dhcp47-1.lab.eng.blr.redhat.com systemd[1]: Unit gluster-setup.service entered failed state.
Jan 12 06:27:20 dhcp47-1.lab.eng.blr.redhat.com systemd[1]: gluster-setup.service failed.

sh-4.2# systemctl status glusterd
● glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Thu 2017-01-12 06:27:20 UTC; 40s ago
  Process: 116 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=255)

Jan 12 06:27:20 dhcp47-1.lab.eng.blr.redhat.com systemd[1]: Starting GlusterFS, a clustered file-system server...
Jan 12 06:27:20 dhcp47-1.lab.eng.blr.redhat.com glusterd[116]: ERROR: failed to create logfile "/var/log/glusterfs/etc-glusterfs-glusterd.vol.log" (No such file ...rectory)
Jan 12 06:27:20 dhcp47-1.lab.eng.blr.redhat.com glusterd[116]: ERROR: failed to open logfile /var/log/glusterfs/etc-glusterfs-glusterd.vol.log
Jan 12 06:27:20 dhcp47-1.lab.eng.blr.redhat.com systemd[1]: glusterd.service: control process exited, code=exited status=255
Jan 12 06:27:20 dhcp47-1.lab.eng.blr.redhat.com systemd[1]: Failed to start GlusterFS, a clustered file-system server.
Jan 12 06:27:20 dhcp47-1.lab.eng.blr.redhat.com systemd[1]: Unit glusterd.service entered failed state.
Jan 12 06:27:20 dhcp47-1.lab.eng.blr.redhat.com systemd[1]: glusterd.service failed.
Hint: Some lines were ellipsized, use -l to show in full.

sh-4.2# systemctl status glusterd -l
● glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Thu 2017-01-12 06:27:20 UTC; 43s ago
  Process: 116 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=255)

Jan 12 06:27:20 dhcp47-1.lab.eng.blr.redhat.com systemd[1]: Starting GlusterFS, a clustered file-system server...
Jan 12 06:27:20 dhcp47-1.lab.eng.blr.redhat.com glusterd[116]: ERROR: failed to create logfile "/var/log/glusterfs/etc-glusterfs-glusterd.vol.log" (No such file or directory)
Jan 12 06:27:20 dhcp47-1.lab.eng.blr.redhat.com glusterd[116]: ERROR: failed to open logfile /var/log/glusterfs/etc-glusterfs-glusterd.vol.log
Jan 12 06:27:20 dhcp47-1.lab.eng.blr.redhat.com systemd[1]: glusterd.service: control process exited, code=exited status=255
Jan 12 06:27:20 dhcp47-1.lab.eng.blr.redhat.com systemd[1]: Failed to start GlusterFS, a clustered file-system server.
Jan 12 06:27:20 dhcp47-1.lab.eng.blr.redhat.com systemd[1]: Unit glusterd.service entered failed state.
Jan 12 06:27:20 dhcp47-1.lab.eng.blr.redhat.com systemd[1]: glusterd.service failed.

sh-4.2# systemctl status rpcbind
● rpcbind.service - RPC bind service
   Loaded: loaded (/usr/lib/systemd/system/rpcbind.service; indirect; vendor preset: enabled)
   Active: inactive (dead)
****************************************
# iptables -L Chain INPUT (policy ACCEPT) target prot opt source destination KUBE-NODEPORT-NON-LOCAL all -- anywhere anywhere /* Ensure that non-local NodePort traffic can flow */ KUBE-FIREWALL all -- anywhere anywhere ACCEPT all -- anywhere anywhere /* traffic from docker */ ACCEPT all -- anywhere anywhere /* traffic from SDN */ ACCEPT udp -- anywhere anywhere multiport dports 4789 /* 001 vxlan incoming */ ACCEPT all -- anywhere anywhere state RELATED,ESTABLISHED ACCEPT icmp -- anywhere anywhere ACCEPT all -- anywhere anywhere ACCEPT tcp -- anywhere anywhere state NEW tcp dpt:ssh OS_FIREWALL_ALLOW all -- anywhere anywhere REJECT all -- anywhere anywhere reject-with icmp-host-prohibited Chain FORWARD (policy ACCEPT) target prot opt source destination ACCEPT all -- 10.128.0.0/14 anywhere ACCEPT all -- anywhere 10.128.0.0/14 DOCKER-ISOLATION all -- anywhere anywhere DOCKER all -- anywhere anywhere ACCEPT all -- anywhere anywhere ctstate RELATED,ESTABLISHED ACCEPT all -- anywhere anywhere ACCEPT all -- anywhere anywhere REJECT all -- anywhere anywhere reject-with icmp-host-prohibited Chain OUTPUT (policy ACCEPT) target prot opt source destination KUBE-SERVICES all -- anywhere anywhere /* kubernetes service portals */ KUBE-FIREWALL all -- anywhere anywhere Chain DOCKER (1 references) target prot opt source destination Chain DOCKER-ISOLATION (1 references) target prot opt source destination RETURN all -- anywhere anywhere Chain KUBE-FIREWALL (2 references) target prot opt source destination DROP all -- anywhere anywhere /* kubernetes firewall for dropping marked packets */ mark match 0x8000/0x8000 Chain KUBE-NODEPORT-NON-LOCAL (1 references) target prot opt source destination Chain KUBE-SERVICES (1 references) target prot opt source destination Chain OS_FIREWALL_ALLOW (1 references) target prot opt source destination ACCEPT tcp -- anywhere anywhere state NEW tcp dpt:10250 ACCEPT tcp -- anywhere anywhere state NEW tcp dpt:http ACCEPT tcp -- anywhere anywhere state NEW tcp dpt:https ACCEPT tcp -- anywhere anywhere state NEW tcp dpt:10255 ACCEPT udp -- anywhere anywhere state NEW udp dpt:10255 ACCEPT udp -- anywhere anywhere state NEW udp dpt:4789 ACCEPT tcp -- anywhere anywhere state NEW tcp dpt:24007 ACCEPT tcp -- anywhere anywhere state NEW tcp dpt:24008 ACCEPT tcp -- anywhere anywhere state NEW tcp dpt:EtherNet/IP-1 ACCEPT tcp -- anywhere anywhere state NEW multiport dports 49152:49664
As requested while debugging the issue, I'm providing the following additional details as well from the gluster pods: ************************************* [root@dhcp46-209 ~]# oc get pods NAME READY STATUS RESTARTS AGE glusterfs-3vcx8 0/1 Running 0 12s glusterfs-btc16 0/1 Running 0 12s glusterfs-nalx7 0/1 Running 0 12s storage-project-router-1-z42i4 1/1 Running 1 15h [root@dhcp46-209 ~]# oc rsh glusterfs-3vcx8 sh-4.2# df -Th Filesystem Type Size Used Avail Use% Mounted on /dev/dm-10 xfs 10G 292M 9.7G 3% / devtmpfs devtmpfs 24G 0 24G 0% /dev shm tmpfs 64M 0 64M 0% /dev/shm /dev/sdb1 xfs 40G 477M 40G 2% /run /dev/mapper/rhel_dhcp47--183-root xfs 50G 1.9G 49G 4% /etc/glusterfs tmpfs tmpfs 24G 26M 24G 1% /run/lvm tmpfs tmpfs 24G 0 24G 0% /sys/fs/cgroup tmpfs tmpfs 24G 16K 24G 1% /run/secrets/kubernetes.io/serviceaccount tmpfs tmpfs 4.0E 0 4.0E 0% /tmp tmpfs tmpfs 4.0E 4.0K 4.0E 1% /var/log sh-4.2# mount /dev/mapper/docker-8:17-33554517-e9ea3ec5b1b81940f6ec9a54390ce536b84a91273264b2f7d1ef90d69666e76f on / type xfs (rw,relatime,seclabel,nouuid,attr2,inode64,sunit=1024,swidth=1024,noquota) proc on /proc type proc (rw,nosuid,nodev,noexec,relatime) sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime,seclabel) devtmpfs on /dev type devtmpfs (rw,nosuid,seclabel,size=24637396k,nr_inodes=6159349,mode=755) tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,seclabel) devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,seclabel,gid=5,mode=620,ptmxmode=000) mqueue on /dev/mqueue type mqueue (rw,relatime,seclabel) hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,seclabel) /dev/sdb1 on /run type xfs (rw,relatime,seclabel,attr2,inode64,noquota) /dev/sdb1 on /etc/resolv.conf type xfs (rw,relatime,seclabel,attr2,inode64,noquota) /dev/mapper/rhel_dhcp47--183-root on /etc/glusterfs type xfs (rw,relatime,seclabel,attr2,inode64,noquota) /dev/sdb1 on /dev/termination-log type xfs (rw,relatime,seclabel,attr2,inode64,noquota) shm on /dev/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,context="system_u:object_r:svirt_sandbox_file_t:s0:c474,c506",size=65536k) /dev/sdb1 on /etc/hosts type xfs (rw,relatime,seclabel,attr2,inode64,noquota) /dev/sdb1 on /etc/hostname type xfs (rw,relatime,seclabel,attr2,inode64,noquota) tmpfs on /run/lvm type tmpfs (rw,nosuid,nodev,seclabel,mode=755) /dev/sdb1 on /run/secrets type xfs (rw,relatime,seclabel,attr2,inode64,noquota) /dev/sdb1 on /var/lib/heketi type xfs (rw,relatime,seclabel,attr2,inode64,noquota) tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,seclabel,mode=755) cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd) cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory) cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer) cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_prio,net_cls) cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpuacct,cpu) cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio) cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb) cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids) cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset) cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event) cgroup on /sys/fs/cgroup/devices type cgroup 
(rw,nosuid,nodev,noexec,relatime,devices) /dev/sdb1 on /var/lib/glusterd type xfs (rw,relatime,seclabel,attr2,inode64,noquota) /dev/sdb1 on /var/log/glusterfs type xfs (rw,relatime,seclabel,attr2,inode64,noquota) /dev/sdb1 on /var/lib/misc/glusterfsd type xfs (rw,relatime,seclabel,attr2,inode64,noquota) tmpfs on /run/secrets/kubernetes.io/serviceaccount type tmpfs (ro,relatime,rootcontext=system_u:object_r:svirt_sandbox_file_t:s0,seclabel) tmpfs on /tmp type tmpfs (rw,nosuid,nodev,relatime,seclabel,size=4503599627370496k) tmpfs on /var/log type tmpfs (rw,nosuid,nodev,relatime,seclabel,size=4503599627370496k) /dev/sdb1 on /var/log/journal/3bd7ff1f60c92b19bb03ab05ec9aae08 type xfs (rw,relatime,seclabel,attr2,inode64,noquota) systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=22,pgrp=1,timeout=300,minproto=5,maxproto=5,direct) configfs on /sys/kernel/config type configfs (rw,relatime) sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw,relatime) debugfs on /sys/kernel/debug type debugfs (rw,relatime) nfsd on /proc/fs/nfsd type nfsd (rw,relatime) sh-4.2# cd /etc/glusterfs sh-4.2# ls gluster-rsyslog-5.8.conf gluster-rsyslog-7.2.conf glusterd.vol glusterfs-georep-logrotate glusterfs-logrotate group-virt.example logger.conf.example sh-4.2# ls -al total 36 drwxr-xr-x. 2 root root 204 Jan 12 06:27 . drwxr-xr-x. 64 root root 4096 Jan 12 08:24 .. -rw-r--r--. 1 root root 1822 Jan 12 06:27 gluster-rsyslog-5.8.conf -rw-r--r--. 1 root root 2564 Jan 12 06:27 gluster-rsyslog-7.2.conf -rw-r--r--. 1 root root 358 Jan 12 06:27 glusterd.vol -rw-r--r--. 1 root root 1001 Jan 12 06:27 glusterfs-georep-logrotate -rw-r--r--. 1 root root 513 Jan 12 06:27 glusterfs-logrotate -rw-r--r--. 1 root root 220 Jan 12 06:27 group-virt.example -rw-r--r--. 1 root root 338 Jan 12 06:27 logger.conf.example sh-4.2# sh-4.2# sh-4.2# cd /var/lib/glusterd sh-4.2# ls -al total 0 drwxr-xr-x. 2 root root 6 Jan 12 06:27 . drwxr-xr-x. 20 root root 265 Jan 12 08:24 .. *************************************
(In reply to Prasanth from comment #7)
> As requested while debugging the issue, I'm providing the following
> additional details as well from the gluster pods:
>
> *************************************
> [root@dhcp46-209 ~]# oc get pods
> NAME                             READY     STATUS    RESTARTS   AGE
> glusterfs-3vcx8                  0/1       Running   0          12s
> glusterfs-btc16                  0/1       Running   0          12s
> glusterfs-nalx7                  0/1       Running   0          12s
> storage-project-router-1-z42i4   1/1       Running   1          15h
>
> [root@dhcp46-209 ~]# oc rsh glusterfs-3vcx8
> sh-4.2# df -Th
> Filesystem                        Type      Size  Used Avail Use% Mounted on
> /dev/dm-10                        xfs        10G  292M  9.7G   3% /
> devtmpfs                          devtmpfs   24G     0   24G   0% /dev
> shm                               tmpfs      64M     0   64M   0% /dev/shm
> /dev/sdb1                         xfs        40G  477M   40G   2% /run
> /dev/mapper/rhel_dhcp47--183-root xfs        50G  1.9G   49G   4% /etc/glusterfs
> tmpfs                             tmpfs      24G   26M   24G   1% /run/lvm
> tmpfs                             tmpfs      24G     0   24G   0% /sys/fs/cgroup
> tmpfs                             tmpfs      24G   16K   24G   1% /run/secrets/kubernetes.io/serviceaccount
> tmpfs                             tmpfs     4.0E     0  4.0E   0% /tmp
> tmpfs                             tmpfs     4.0E  4.0K  4.0E   1% /var/log
>

Thanks Prasanth for recording the state. Yes, as we debugged, the bind mount is missing. We will update the details here soon.
Hi,

Thanks for the setup, Prasanth; it saved a lot of time in replicating the issue.

The bug was in the Docker build 12 (docker-1.12.5-12.el7.x86_64).

RCA: rtalur, Humble and I saw that bind mounts from the host into the container were failing due to an issue in docker. Because of this, even subscription-manager entitlements were not working as expected. To be blunt, anything bind mounted into a container under the /var directory was failing, and the log files were wiped out by docker's default mount of '/var/log'. Since glusterd could not find /var/log/glusterfs, which is the parent directory for all of its log files, glusterd failed with "no such file or directory".

We also verified that it is not related to systemd, as we hit the issue even with the heketi container when we tried bind mounting a file under /var/lib/something. We were able to hit this issue even without privileged mode.

Before filing a bug against docker, Prasanth told us about the build which was added to the errata today: docker-1.12.5-14.el7.x86_64. I would request retesting with that build.

--Ashiq
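For anyone who wants to double-check a docker build for this regression, a hypothetical reproducer along these lines should show whether bind mounts under /var survive inside a container; the image name "rhel7" and the /var/lib/bindtest path are illustrative placeholders, not what we actually ran:

# Create a marker file on the host and bind mount its directory into a container.
mkdir -p /var/lib/bindtest && touch /var/lib/bindtest/marker
docker run --rm -v /var/lib/bindtest:/var/lib/bindtest rhel7 ls /var/lib/bindtest
# On an affected docker build the directory is expected to show up empty/shadowed;
# on a good build the 'marker' file should be listed.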
Hi,

With the docker-1.12.5-14.el7.x86_64 build I no longer see the bind mount issue on /var in general. However, I still see the '/var/log' issue: after the container startup completes, /var/log gets mounted from a tmpfs, which shadows the container's /var/log, so /var/log/glusterfs is no longer present.

--Ashiq
The mounts inside the container were overshadowed by actions taken by oci-systemd-hook. That is, the /var mounts did actually happen, but the oci hook later shadowed the entire /var path, which caused the glusterd service to fail. We have opened https://bugzilla.redhat.com/show_bug.cgi?id=1412728 and are tracking the issue in that BZ.
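A quick way to confirm this kind of shadowing from inside a running gluster pod is to check what is actually mounted on /var/log once systemd is up; this is a generic check with a placeholder pod name, not output captured from the affected setup:

# Replace <glusterfs-pod> with a real pod name from 'oc get pods'.
oc rsh <glusterfs-pod>
# Inside the pod: if /var/log is backed by a tmpfs instead of the expected mount,
# the hook has shadowed it and /var/log/glusterfs will be missing.
findmnt /var/log
ls -ld /var/log/glusterfs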
This bug would be fixed with oci-systemd-hook-0.1.4-9.git671c428.el7.x86_64.rpm which is available @ https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=12348885 . Can we please try this package and try to reproduce this issue ?
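For reference, retesting with the candidate build on each OCP node would look roughly like this; treat it as a sketch (it assumes the rpm has already been downloaded from the Brew task above), not the official update procedure:

# Install the candidate hook build and restart docker so new containers use it.
yum localinstall -y oci-systemd-hook-0.1.4-9.git671c428.el7.x86_64.rpm
rpm -q oci-systemd-hook
systemctl restart docker
# Then re-run cns-deploy and check that the glusterfs pods reach 1/1 Ready.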
(In reply to Humble Chirammal from comment #17) > This bug would be fixed with > oci-systemd-hook-0.1.4-9.git671c428.el7.x86_64.rpm which is available @ > https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=12348885 . > > Can we please try this package and try to reproduce this issue ? Thanks Humble. That's a good news! Sure, I'll take this build now and will try to reproduce this issue again in a clean setup. I'll provide my updates here once it's done.
(In reply to Prasanth from comment #18) > (In reply to Humble Chirammal from comment #17) > > This bug would be fixed with > > oci-systemd-hook-0.1.4-9.git671c428.el7.x86_64.rpm which is available @ > > https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=12348885 . > > > > Can we please try this package and try to reproduce this issue ? > > Thanks Humble. That's a good news! > > Sure, I'll take this build now and will try to reproduce this issue again in > a clean setup. I'll provide my updates here once it's done. I've quickly tried to deploy CNS after updating to this build (oci-systemd-hook-0.1.4-9.git671c428.el7) in my existing setup and looks like the deployment had gone through successfully. So looks like the reported issue is indeed fixed in this build.
(In reply to Prasanth from comment #19) > > I've quickly tried to deploy CNS after updating to this build > (oci-systemd-hook-0.1.4-9.git671c428.el7) in my existing setup and looks > like the deployment had gone through successfully. So looks like the > reported issue is indeed fixed in this build. Thanks Prasanth for quick verification!
(In reply to Prasanth from comment #18) > (In reply to Humble Chirammal from comment #17) > > This bug would be fixed with > > oci-systemd-hook-0.1.4-9.git671c428.el7.x86_64.rpm which is available @ > > https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=12348885 . > > > > Can we please try this package and try to reproduce this issue ? > > Thanks Humble. That's a good news! > > Sure, I'll take this build now and will try to reproduce this issue again in > a clean setup. I'll provide my updates here once it's done. Tried again in a new clean setup and the cns-deploy was successful in deploying the CNS cluster, which was failing earlier. See the output below: ############ [root@dhcp47-183 ~]# date Fri Jan 13 19:57:46 IST 2017 [root@dhcp47-183 ~]# rpm -qa |grep oci oci-register-machine-0-1.11.gitdd0daef.el7.x86_64 oci-systemd-hook-0.1.4-9.git671c428.el7.x86_64 [root@dhcp47-183 ~]# rpm -qa |grep docker docker-selinux-1.10.3-57.el7.x86_64 docker-client-1.12.5-14.el7.x86_64 docker-1.12.5-14.el7.x86_64 cockpit-docker-126-1.el7.x86_64 docker-common-1.12.5-14.el7.x86_64 docker-rhel-push-plugin-1.12.5-14.el7.x86_64 [root@dhcp47-183 ~]# rpm -qa |grep atomic-openshift atomic-openshift-clients-3.4.0.39-1.git.0.5f32f06.el7.x86_64 atomic-openshift-utils-3.4.44-1.git.0.efa61c6.el7.noarch atomic-openshift-3.4.0.39-1.git.0.5f32f06.el7.x86_64 atomic-openshift-master-3.4.0.39-1.git.0.5f32f06.el7.x86_64 tuned-profiles-atomic-openshift-node-3.4.0.39-1.git.0.5f32f06.el7.x86_64 atomic-openshift-node-3.4.0.39-1.git.0.5f32f06.el7.x86_64 atomic-openshift-sdn-ovs-3.4.0.39-1.git.0.5f32f06.el7.x86_64 [root@dhcp47-183 ~]# rpm -qa |grep selinux selinux-policy-targeted-3.13.1-102.el7_3.13.noarch docker-selinux-1.10.3-57.el7.x86_64 libselinux-utils-2.5-6.el7.x86_64 libselinux-python-2.5-6.el7.x86_64 selinux-policy-3.13.1-102.el7_3.13.noarch container-selinux-1.12.5-14.el7.x86_64 libselinux-2.5-6.el7.x86_64 # oc get all NAME REVISION DESIRED CURRENT TRIGGERED BY dc/heketi 1 1 1 config dc/storage-project-router 1 1 1 config NAME DESIRED CURRENT READY AGE rc/heketi-1 1 1 1 32m rc/storage-project-router-1 1 1 1 48m NAME HOST/PORT PATH SERVICES PORT TERMINATION routes/heketi heketi-storage-project.cloudapps.mystorage.com ... 1 more heketi <all> NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE svc/heketi 172.30.92.133 <none> 8080/TCP 32m svc/heketi-storage-endpoints 172.30.139.214 <none> 1/TCP 33m svc/storage-project-router 172.30.100.247 <none> 80/TCP,443/TCP,1936/TCP 48m NAME READY STATUS RESTARTS AGE po/glusterfs-6dxca 1/1 Running 0 46m po/glusterfs-fempf 1/1 Running 0 46m po/glusterfs-moge8 1/1 Running 0 46m po/heketi-1-6xmt0 1/1 Running 0 32m po/storage-project-router-1-ctgj9 1/1 Running 0 48m [root@dhcp47-183 ~]# heketi-cli volume list Id:02248b8b4e4b5723765494eeb4984ac0 Cluster:e0fc406952249928d0972bd9ffb9811b Name:heketidbstorage [root@dhcp47-183 ~]# heketi-cli volume info 02248b8b4e4b5723765494eeb4984ac0 Name: heketidbstorage Size: 2 Volume Id: 02248b8b4e4b5723765494eeb4984ac0 Cluster Id: e0fc406952249928d0972bd9ffb9811b Mount: 10.70.46.178:heketidbstorage Mount Options: backup-volfile-servers=10.70.46.198,10.70.47.82 Durability Type: replicate Distributed+Replica: 3 #################
Based on Comment 21 and Comment 22, marking this BZ as Verified. PS: Will start with the rest of the dynamic provisioning testing and validation once the Stage is back online.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHEA-2017-0148.html