Description of problem:
Rebooting a node of a three-node CNS cluster failed to respawn the gluster container hosted on that node. The gluster trusted storage pool had 91 volumes, including the heketidbstorage volume.

Time of reboot: Wed Nov 16 15:33:25 IST 2016
Memory of each of the worker nodes: 48GB

# oc get pods
NAME                                                     READY     STATUS             RESTARTS   AGE
glusterfs-dc-dhcp46-118.lab.eng.blr.redhat.com-1-2mv70   1/1       Running            2          20h
glusterfs-dc-dhcp46-119.lab.eng.blr.redhat.com-1-kpjch   1/1       Running            5          6d
glusterfs-dc-dhcp46-123.lab.eng.blr.redhat.com-1-ylm2b   0/1       CrashLoopBackOff   12         6d
heketi-1-cvuq4                                           1/1       Running            4          2h
storage-project-router-1-4f2sv                           1/1       Running            5          7d

# oc get nodes
NAME                                STATUS                     AGE
dhcp46-118.lab.eng.blr.redhat.com   Ready                      7d
dhcp46-119.lab.eng.blr.redhat.com   Ready                      7d
dhcp46-123.lab.eng.blr.redhat.com   Ready                      7d
dhcp46-146.lab.eng.blr.redhat.com   Ready,SchedulingDisabled   7d

[root@dhcp46-146 ~]# oc describe pods/glusterfs-dc-dhcp46-123.lab.eng.blr.redhat.com-1-ylm2b
Name:           glusterfs-dc-dhcp46-123.lab.eng.blr.redhat.com-1-ylm2b
Namespace:      storage-project
Security Policy: privileged
Node:           dhcp46-123.lab.eng.blr.redhat.com/10.70.46.123
Start Time:     Wed, 09 Nov 2016 16:13:25 +0530
Labels:         deployment=glusterfs-dc-dhcp46-123.lab.eng.blr.redhat.com-1
                deploymentconfig=glusterfs-dc-dhcp46-123.lab.eng.blr.redhat.com
                glusterfs=pod
                glusterfs-node=dhcp46-123.lab.eng.blr.redhat.com
                name=glusterfs
Status:         Running
IP:             10.70.46.123
Controllers:    ReplicationController/glusterfs-dc-dhcp46-123.lab.eng.blr.redhat.com-1
Containers:
  glusterfs:
    Container ID:   docker://62e10140d90f356c5e6754cef2182e4197a219ab808f494b61ce206115b1d124
    Image:          rhgs3/rhgs-server-rhel7
    Image ID:       docker://sha256:d440f833317de8c3cb96c36d49c72ff390b355df021af6eeb31a9079d5ee9d4d
    Port:
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Wed, 16 Nov 2016 15:57:57 +0530
      Finished:     Wed, 16 Nov 2016 15:59:37 +0530
    Ready:          False
    Restart Count:  12
    Liveness:       exec [/bin/bash -c systemctl status glusterd.service] delay=60s timeout=3s period=10s #success=1 #failure=3
    Readiness:      exec [/bin/bash -c systemctl status glusterd.service] delay=60s timeout=3s period=10s #success=1 #failure=3
    Volume Mounts:
      /dev from glusterfs-dev (rw)
      /etc/glusterfs from glusterfs-etc (rw)
      /run from glusterfs-run (rw)
      /run/lvm from glusterfs-lvm (rw)
      /sys/fs/cgroup from glusterfs-cgroup (rw)
      /var/lib/glusterd from glusterfs-config (rw)
      /var/lib/heketi from glusterfs-heketi (rw)
      /var/log/glusterfs from glusterfs-logs (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-7xdw5 (ro)
    Environment Variables:  <none>
Conditions:
  Type          Status
  Initialized   True
  Ready         False
  PodScheduled  True
Volumes:
  glusterfs-heketi:
    Type:       HostPath (bare host directory volume)
    Path:       /var/lib/heketi
  glusterfs-run:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  glusterfs-lvm:
    Type:       HostPath (bare host directory volume)
    Path:       /run/lvm
  glusterfs-etc:
    Type:       HostPath (bare host directory volume)
    Path:       /etc/glusterfs
  glusterfs-logs:
    Type:       HostPath (bare host directory volume)
    Path:       /var/log/glusterfs
  glusterfs-config:
    Type:       HostPath (bare host directory volume)
    Path:       /var/lib/glusterd
  glusterfs-dev:
    Type:       HostPath (bare host directory volume)
    Path:       /dev
  glusterfs-cgroup:
    Type:       HostPath (bare host directory volume)
    Path:       /sys/fs/cgroup
  default-token-7xdw5:
    Type:       Secret (a volume populated by a Secret)
    SecretName: default-token-7xdw5
QoS Class:      BestEffort
Tolerations:    <none>
Events:
  FirstSeen   LastSeen   Count   From   SubobjectPath   Type   Reason   Message
  ---------   --------   -----   ----   -------------   --------   ------   -------
  27m   27m   1   {kubelet dhcp46-123.lab.eng.blr.redhat.com}      Warning   FailedSync   Error syncing pod, skipping: Error response from daemon: devmapper: Unknown device ea5d28b75dfea4e8172b026c7e040e4cc55e210236b163fb52023f697adfbbf1
  27m   27m   1   {kubelet dhcp46-123.lab.eng.blr.redhat.com}   spec.containers{glusterfs}   Normal   Created   Created container with docker id 53d5c073dd4a; Security:[seccomp=unconfined]
  27m   27m   1   {kubelet dhcp46-123.lab.eng.blr.redhat.com}   spec.containers{glusterfs}   Normal   Started   Started container with docker id 53d5c073dd4a
  25m   25m   1   {kubelet dhcp46-123.lab.eng.blr.redhat.com}   spec.containers{glusterfs}   Warning   Unhealthy   Liveness probe failed: ● glusterd.service - GlusterFS, a clustered file-system server
           Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
           Active: activating (start) since Wed 2016-11-16 05:06:16 EST; 2s ago
          Control: 972 (glusterd)
           CGroup: /system.slice/docker-53d5c073dd4ad1194b9a8b8051fe1a499a28c06ca39dcc18af06239ba1a2107d.scope/system.slice/glusterd.service
                   ├─ 972 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
                   ├─ 973 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
                   └─1115 /usr/bin/python /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py -c /var/lib/glusterd/geo-replication/gsyncd_template.conf --config-set-rx gluster-params aux-gfid-mount acl .
        Nov 16 05:06:16 dhcp46-123.lab.eng.blr.redhat.com systemd[1]: Starting GlusterFS, a clustered file-system server...
  25m   25m   1   {kubelet dhcp46-123.lab.eng.blr.redhat.com}   spec.containers{glusterfs}   Warning   Unhealthy   Readiness probe failed: ● glusterd.service - GlusterFS, a clustered file-system server
           Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
           Active: activating (start) since Wed 2016-11-16 05:06:16 EST; 2s ago
          Control: 972 (glusterd)
           CGroup: /system.slice/docker-53d5c073dd4ad1194b9a8b8051fe1a499a28c06ca39dcc18af06239ba1a2107d.scope/system.slice/glusterd.service
                   ├─ 972 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
                   ├─ 973 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
                   └─1091 /usr/bin/python /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py -c /var/lib/glusterd/geo-replication/gsyncd_template.conf --config-set-rx gluster-command-dir /usr/sbin/ .
        Nov 16 05:06:16 dhcp46-123.lab.eng.blr.redhat.com systemd[1]: Starting GlusterFS, a clustered file-system server...
  25m   25m   1   {kubelet dhcp46-123.lab.eng.blr.redhat.com}   spec.containers{glusterfs}   Normal   Killing   Killing container with docker id 53d5c073dd4a: pod "glusterfs-dc-dhcp46-123.lab.eng.blr.redhat.com-1-ylm2b_storage-project(57ac768f-a669-11e6-a52d-005056b380ec)" container "glusterfs" is unhealthy, it will be killed and re-created.
  25m   25m   1   {kubelet dhcp46-123.lab.eng.blr.redhat.com}   spec.containers{glusterfs}   Normal   Started   Started container with docker id 2ad9588b980f
  25m   25m   1   {kubelet dhcp46-123.lab.eng.blr.redhat.com}   spec.containers{glusterfs}   Normal   Created   Created container with docker id 2ad9588b980f; Security:[seccomp=unconfined]
  23m   23m   1   {kubelet dhcp46-123.lab.eng.blr.redhat.com}   spec.containers{glusterfs}   Normal   Killing   Killing container with docker id 2ad9588b980f: pod "glusterfs-dc-dhcp46-123.lab.eng.blr.redhat.com-1-ylm2b_storage-project(57ac768f-a669-11e6-a52d-005056b380ec)" container "glusterfs" is unhealthy, it will be killed and re-created.
  23m   23m   1   {kubelet dhcp46-123.lab.eng.blr.redhat.com}   spec.containers{glusterfs}   Normal   Created   Created container with docker id 4afd8abda915; Security:[seccomp=unconfined]
  23m   23m   1   {kubelet dhcp46-123.lab.eng.blr.redhat.com}   spec.containers{glusterfs}   Normal   Started   Started container with docker id 4afd8abda915
  21m   21m   1   {kubelet dhcp46-123.lab.eng.blr.redhat.com}   spec.containers{glusterfs}   Normal   Killing   Killing container with docker id 4afd8abda915: pod "glusterfs-dc-dhcp46-123.lab.eng.blr.redhat.com-1-ylm2b_storage-project(57ac768f-a669-11e6-a52d-005056b380ec)" container "glusterfs" is unhealthy, it will be killed and re-created.
  21m   21m   1   {kubelet dhcp46-123.lab.eng.blr.redhat.com}   spec.containers{glusterfs}   Normal   Started   Started container with docker id e023f5f4a1fd
  21m   21m   1   {kubelet dhcp46-123.lab.eng.blr.redhat.com}   spec.containers{glusterfs}   Normal   Created   Created container with docker id e023f5f4a1fd; Security:[seccomp=unconfined]
  19m   19m   1   {kubelet dhcp46-123.lab.eng.blr.redhat.com}   spec.containers{glusterfs}   Normal   Killing   Killing container with docker id e023f5f4a1fd: pod "glusterfs-dc-dhcp46-123.lab.eng.blr.redhat.com-1-ylm2b_storage-project(57ac768f-a669-11e6-a52d-005056b380ec)" container "glusterfs" is unhealthy, it will be killed and re-created.
  19m   19m   1   {kubelet dhcp46-123.lab.eng.blr.redhat.com}   spec.containers{glusterfs}   Normal   Created   Created container with docker id 63da263bdf30; Security:[seccomp=unconfined]
  19m   19m   1   {kubelet dhcp46-123.lab.eng.blr.redhat.com}   spec.containers{glusterfs}   Normal   Started   Started container with docker id 63da263bdf30
  17m   17m   1   {kubelet dhcp46-123.lab.eng.blr.redhat.com}   spec.containers{glusterfs}   Normal   Killing   Killing container with docker id 63da263bdf30: pod "glusterfs-dc-dhcp46-123.lab.eng.blr.redhat.com-1-ylm2b_storage-project(57ac768f-a669-11e6-a52d-005056b380ec)" container "glusterfs" is unhealthy, it will be killed and re-created.
  17m   17m   1   {kubelet dhcp46-123.lab.eng.blr.redhat.com}   spec.containers{glusterfs}   Normal   Started   Started container with docker id 02aa8e2074de
  17m   17m   1   {kubelet dhcp46-123.lab.eng.blr.redhat.com}   spec.containers{glusterfs}   Normal   Created   Created container with docker id 02aa8e2074de; Security:[seccomp=unconfined]
  16m   16m   1   {kubelet dhcp46-123.lab.eng.blr.redhat.com}   spec.containers{glusterfs}   Warning   Unhealthy   Readiness probe failed: ● glusterd.service - GlusterFS, a clustered file-system server
           Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
           Active: activating (start) since Wed 2016-11-16 05:15:36 EST; 3s ago
          Control: 951 (glusterd)
           CGroup: /system.slice/docker-02aa8e2074deee45556b88579326c42b6f0a82cef0309607ef8ccafb74f02c3f.scope/system.slice/glusterd.service
                   ├─951 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
                   └─952 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
        Nov 16 05:15:36 dhcp46-123.lab.eng.blr.redhat.com systemd[1]: Starting GlusterFS, a clustered file-system server...
  16m   16m   1   {kubelet dhcp46-123.lab.eng.blr.redhat.com}   spec.containers{glusterfs}   Normal   Killing   Killing container with docker id 02aa8e2074de: pod "glusterfs-dc-dhcp46-123.lab.eng.blr.redhat.com-1-ylm2b_storage-project(57ac768f-a669-11e6-a52d-005056b380ec)" container "glusterfs" is unhealthy, it will be killed and re-created.
  16m   13m   14   {kubelet dhcp46-123.lab.eng.blr.redhat.com}      Warning   FailedSync   Error syncing pod, skipping: failed to "StartContainer" for "glusterfs" with CrashLoopBackOff: "Back-off 2m40s restarting failed container=glusterfs pod=glusterfs-dc-dhcp46-123.lab.eng.blr.redhat.com-1-ylm2b_storage-project(57ac768f-a669-11e6-a52d-005056b380ec)"
  13m   13m   1   {kubelet dhcp46-123.lab.eng.blr.redhat.com}   spec.containers{glusterfs}   Normal   Started   Started container with docker id 6664b7da06d5
  13m   13m   1   {kubelet dhcp46-123.lab.eng.blr.redhat.com}   spec.containers{glusterfs}   Normal   Created   Created container with docker id 6664b7da06d5; Security:[seccomp=unconfined]
  12m   12m   1   {kubelet dhcp46-123.lab.eng.blr.redhat.com}   spec.containers{glusterfs}   Warning   Unhealthy   Readiness probe failed: ● glusterd.service - GlusterFS, a clustered file-system server
           Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
           Active: activating (start) since Wed 2016-11-16 05:20:06 EST; 3s ago
          Control: 912 (glusterd)
           CGroup: /system.slice/docker-6664b7da06d5e70a5b17e3988fef0518c955aefe29c2b72a89b81dd9e6009312.scope/system.slice/glusterd.service
                   ├─912 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
                   └─913 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
        Nov 16 05:20:06 dhcp46-123.lab.eng.blr.redhat.com systemd[1]: Starting GlusterFS, a clustered file-system server...
  12m   12m   1   {kubelet dhcp46-123.lab.eng.blr.redhat.com}   spec.containers{glusterfs}   Warning   Unhealthy   Liveness probe failed: ● glusterd.service - GlusterFS, a clustered file-system server
           Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
           Active: activating (start) since Wed 2016-11-16 05:20:06 EST; 3s ago
          Control: 912 (glusterd)
           CGroup: /system.slice/docker-6664b7da06d5e70a5b17e3988fef0518c955aefe29c2b72a89b81dd9e6009312.scope/system.slice/glusterd.service
                   ├─912 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
                   └─913 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
        Nov 16 05:20:06 dhcp46-123.lab.eng.blr.redhat.com systemd[1]: Starting GlusterFS, a clustered file-system server...
  11m   11m   1   {kubelet dhcp46-123.lab.eng.blr.redhat.com}   spec.containers{glusterfs}   Normal   Killing   Killing container with docker id 6664b7da06d5: pod "glusterfs-dc-dhcp46-123.lab.eng.blr.redhat.com-1-ylm2b_storage-project(57ac768f-a669-11e6-a52d-005056b380ec)" container "glusterfs" is unhealthy, it will be killed and re-created.
  6m   6m   1   {kubelet dhcp46-123.lab.eng.blr.redhat.com}   spec.containers{glusterfs}   Normal   Started   Started container with docker id 77ce5a592c24
  6m   6m   1   {kubelet dhcp46-123.lab.eng.blr.redhat.com}   spec.containers{glusterfs}   Normal   Created   Created container with docker id 77ce5a592c24; Security:[seccomp=unconfined]
  27m   4m   9   {kubelet dhcp46-123.lab.eng.blr.redhat.com}   spec.containers{glusterfs}   Normal   Pulling   pulling image "rhgs3/rhgs-server-rhel7"
  4m   4m   1   {kubelet dhcp46-123.lab.eng.blr.redhat.com}   spec.containers{glusterfs}   Normal   Killing   Killing container with docker id 77ce5a592c24: pod "glusterfs-dc-dhcp46-123.lab.eng.blr.redhat.com-1-ylm2b_storage-project(57ac768f-a669-11e6-a52d-005056b380ec)" container "glusterfs" is unhealthy, it will be killed and re-created.
  27m   4m   9   {kubelet dhcp46-123.lab.eng.blr.redhat.com}   spec.containers{glusterfs}   Normal   Pulled   Successfully pulled image "rhgs3/rhgs-server-rhel7"
  4m   4m   1   {kubelet dhcp46-123.lab.eng.blr.redhat.com}   spec.containers{glusterfs}   Normal   Created   Created container with docker id 62e10140d90f; Security:[seccomp=unconfined]
  4m   4m   1   {kubelet dhcp46-123.lab.eng.blr.redhat.com}   spec.containers{glusterfs}   Normal   Started   Started container with docker id 62e10140d90f
  26m   3m   9   {kubelet dhcp46-123.lab.eng.blr.redhat.com}   spec.containers{glusterfs}   Warning   Unhealthy   Liveness probe failed: ● glusterd.service - GlusterFS, a clustered file-system server
           Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
           Active: inactive (dead)
  26m   3m   9   {kubelet dhcp46-123.lab.eng.blr.redhat.com}   spec.containers{glusterfs}   Warning   Unhealthy   Readiness probe failed: ● glusterd.service - GlusterFS, a clustered file-system server
           Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
           Active: inactive (dead)
  2m   2m   1   {kubelet dhcp46-123.lab.eng.blr.redhat.com}   spec.containers{glusterfs}   Normal   Killing   Killing container with docker id 62e10140d90f: pod "glusterfs-dc-dhcp46-123.lab.eng.blr.redhat.com-1-ylm2b_storage-project(57ac768f-a669-11e6-a52d-005056b380ec)" container "glusterfs" is unhealthy, it will be killed and re-created.
  11m   11s   37   {kubelet dhcp46-123.lab.eng.blr.redhat.com}      Warning   FailedSync   Error syncing pod, skipping: failed to "StartContainer" for "glusterfs" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=glusterfs pod=glusterfs-dc-dhcp46-123.lab.eng.blr.redhat.com-1-ylm2b_storage-project(57ac768f-a669-11e6-a52d-005056b380ec)"
  16m   11s   51   {kubelet dhcp46-123.lab.eng.blr.redhat.com}   spec.containers{glusterfs}   Warning   BackOff   Back-off restarting failed docker container

Version-Release number of selected component (if applicable):
# openshift version
openshift v3.4.0.23+24b1a58
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0

How reproducible:
Reproducible depending on the number of volumes in the system. The more volumes there are, the higher the chance of reproducing it.

Steps to Reproduce:
1. Create a 3-node CNS cluster.
2. Create around 90 volumes via 90 PVC requests.
3. Reboot the three nodes one by one: reboot a node, wait for it to come up, check that the gluster and other containers are spawned successfully, then proceed with the next node's reboot.

When this test was run, node 'dhcp46-119.lab.eng.blr.redhat.com' came up without any issues. However, the gluster container on node 'dhcp46-123.lab.eng.blr.redhat.com' failed to spawn successfully.

Actual results:
The gluster container on node 'dhcp46-123.lab.eng.blr.redhat.com' failed to spawn successfully.

Expected results:
A node reboot should respawn all containers hosted on that node successfully.

Additional info:
I suspect the liveness probe initial delay of 60 seconds is the reason the gluster pod fails to be respun. With more volumes, the time taken to start the brick processes increases, and eventually glusterd does not finish starting within 60 seconds, so the container is killed and the whole cycle starts over again. I'll leave it to dev to either confirm this theory or find another reason. Logs shall be attached shortly.
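To help check this theory, the probe configuration in effect and glusterd's actual startup progress on the affected node can be compared directly. A minimal diagnostic sketch (object names are taken from the output above; the grep context width is arbitrary):

# oc get dc/glusterfs-dc-dhcp46-123.lab.eng.blr.redhat.com -o yaml | grep -A 8 -E 'livenessProbe|readinessProbe'
(shows the delay/period/threshold values the kubelet is enforcing)

# oc exec glusterfs-dc-dhcp46-123.lab.eng.blr.redhat.com-1-ylm2b -- systemctl status glusterd.service
(if this still reports 'activating (start)' well past the 60s initial delay, the kill/re-create loop in the events above is the expected outcome)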
Created attachment 1221104 [details] oc_describe_glusterfs-dc-dhcp46-123.output
Created attachment 1221105 [details] heketi_volumelist.output
Created attachment 1221106 [details] oc_get_pvc.output
Created attachment 1221107 [details] oc_get_events.output
Karthick, thanks for the detailed bug report. :)

I am yet to check the logs, however I have a couple of questions.

IIUC, all three nodes have the same number of brick processes and the same hardware configuration, and only one of the nodes is failing. Is that correct?

Secondly, can you confirm this issue does not pop up when you reduce the number of bricks in your setup? Say 70-80 volumes?

The pointer about the probe timeout is valid; it 'kind of' worked for the last release's volume scaling test. But we could definitely rethink it.
(In reply to Humble Chirammal from comment #7)
> Karthick, thanks for the detailed bug report. :)
> 
> I am yet to check the logs, however I have a couple of questions.
> 
> IIUC, all three nodes have the same number of brick processes and the same
> hardware configuration, and only one of the nodes is failing. Is that correct?

That's right.

> Secondly, can you confirm this issue does not pop up when you reduce the
> number of bricks in your setup? Say 70-80 volumes?

Unfortunately, trying that won't be possible. We don't have the bandwidth :) It takes a good amount of time to set up and run this test, and when the container crashes it again takes time to get the setup back into a working state. But I've seen this issue with 137 volumes and with 175 volumes, and it was hit both times with a 100% hit rate. Hope this information helps.

> The pointer about the probe timeout is valid; it 'kind of' worked for the
> last release's volume scaling test. But we could definitely rethink it.
(In reply to krishnaram Karthick from comment #10)
> (In reply to krishnaram Karthick from comment #9)
> > Given that there is a possibility of this issue being seen in the previous
> > release of CNS, it would be nice if we came up with a workaround to get out
> > of this issue to help any existing customers with scaled volumes.
> 
> Found a workaround:
> 
> 1) Delete the deployment config and the pod which is in the failed state
> 
> # oc delete dc/glusterfs-dc-dhcp46-123.lab.eng.blr.redhat.com
> # oc delete pods/glusterfs-dc-dhcp46-123.lab.eng.blr.redhat.com
> 
> 2) Delete the glusterfs template
> 
> # oc get templates
> NAME        DESCRIPTION                                PARAMETERS    OBJECTS
> glusterfs   GlusterFS container deployment template    1 (1 blank)   1
> heketi      Heketi service deployment template         8 (7 blank)   3
> 
> # oc delete templates/glusterfs
> 
> 3) Edit the /usr/share/heketi/templates/glusterfs-template.json file to
> increase the liveness probe initial delay from 60 seconds to a higher value
> (I used 90 seconds)
> 
> 4) Create the glusterfs template again
> 
> # oc create -f /usr/share/heketi/templates/glusterfs-template.json
> 
> 5) Deploy the RHGS container
> 
> # oc process glusterfs -v GLUSTERFS_NODE=dhcp46-123.lab.eng.blr.redhat.com | oc create -f -
> 
> 6) Wait for the glusterfs pod to be up and running
> 
> oc get pods
> NAME                                                     READY     STATUS    RESTARTS   AGE
> glusterfs-dc-dhcp46-118.lab.eng.blr.redhat.com-1-2mv70   1/1       Running   2          1d
> glusterfs-dc-dhcp46-119.lab.eng.blr.redhat.com-1-kpjch   1/1       Running   5          7d
> glusterfs-dc-dhcp46-123.lab.eng.blr.redhat.com-1-t5swh   1/1       Running   1          12m
> heketi-1-cvuq4                                           1/1       Running   4          19h
> storage-project-router-1-4f2sv                           1/1       Running   5          7d
> 
> I'd once again wait for dev to approve these steps before sharing them.

Karthick, when you said it is difficult to set up, I was planning to ask you to edit the template to ~100s and deploy it again. We are increasing the template timeout to 100s; that is what we planned yesterday (https://github.com/heketi/heketi/pull/576) for the next heketi build. We know the last scale test with ~70 volumes went through without issues at 60s; as the number of volumes grows, it is good to increase it further. The only downside of increasing this value is that, if there are not many volumes, the probe may be delayed unnecessarily. However, considering we are a storage container, that looks fine to me.
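For an existing setup where deleting and re-creating the template and deployment config is too disruptive, an alternative could be to patch the probe delay on the live deployment config. This is an untested sketch, not the documented procedure; the 100-second value simply mirrors the heketi change linked above, and the container name "glusterfs" matches the template shown elsewhere in this bug:

# oc patch dc/glusterfs-dc-dhcp46-123.lab.eng.blr.redhat.com -p '{"spec":{"template":{"spec":{"containers":[{"name":"glusterfs","livenessProbe":{"initialDelaySeconds":100},"readinessProbe":{"initialDelaySeconds":100}}]}}}}'

If the deployment config carries the usual ConfigChange trigger, the patch should roll out a new deployment on its own; otherwise a manual redeploy of the DC would still be needed.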
@Karthick, the timeout is now 100s, as discussed here. The new value is available in the latest heketi build mentioned here:

New build with heketi-3.1.0.5 and rhgs-volmanager-docker:3.1.3-19 is available.
Brew task link: https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=528359
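As a quick sanity check after moving to the new build, the image actually running in the gluster pods can be listed (a sketch; the label and namespace are the ones used in this setup, and the jsonpath expression is illustrative):

# oc get pods -n storage-project -l glusterfs=pod -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\n"}{end}'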
The timeout has been increased to 100 seconds:

    containers:
    - image: rhgs3/rhgs-server-rhel7:3.1.3-16
      imagePullPolicy: Always
      livenessProbe:
        exec:
          command:
          - /bin/bash
          - -c
          - systemctl status glusterd.service
        failureThreshold: 3
        initialDelaySeconds: 100
        periodSeconds: 10
        successThreshold: 1
        timeoutSeconds: 3
      name: glusterfs
      readinessProbe:
        exec:
          command:
          - /bin/bash
          - -c
          - systemctl status glusterd.service
        failureThreshold: 3
        initialDelaySeconds: 100
        periodSeconds: 10
        successThreshold: 1
        timeoutSeconds: 3
      resources: {}
      securityContext:
        capabilities: {}
        privileged: true

And the issue reported in this bug is no longer seen in the scale tests.
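The values that actually took effect can also be read back from a running pod, where oc describe renders them on the Liveness/Readiness lines (a quick-check sketch; the pod name is illustrative for this setup):

# oc describe pod glusterfs-dc-dhcp46-123.lab.eng.blr.redhat.com-1-t5swh | grep -E 'Liveness|Readiness'
(both probes should now show delay=100s timeout=3s period=10s #success=1 #failure=3)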
@Humble, How are we handling this change for upgrades from 3.3? Are we documenting the steps to change the value from 60 to 100 if this change is not handled automatically?
(In reply to krishnaram Karthick from comment #17)
> @Humble, How are we handling this change for upgrades from 3.3? Are we
> documenting the steps to change the value from 60 to 100 if this change is
> not handled automatically?

The new settings will be available in the new template we ship with 3.4, so it is automatically taken care of by the upgrade.
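For anyone who wants to double-check after upgrading, the shipped template can be inspected for the new delay (a sketch; the file path is the one used in the workaround above, and the field name assumes the template uses the standard Kubernetes probe spec):

# grep -n initialDelaySeconds /usr/share/heketi/templates/glusterfs-template.json
# oc get template glusterfs -o yaml | grep -n initialDelaySeconds

Both should report 100 after the upgrade.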
Thanks for the update, Humble. I'll move the bug to verified; no doc text is required based on comment #18.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHEA-2017-0148.html