Description of problem: At the moment gluster-blockd is started only if glusterd is online (as a prerequisite), but that does not guarantee the volume is fully ready/functional for gluster-blockd to start consuming it. When the bricks are down, starting gluster-blockd will result in failures while loading the targets, which is a severe issue: it results in a failed path/portal for the block device. We need a mechanism in gluster-blockd to make sure that the bricks come online before gluster-blockd starts. Maybe we can delay the gluster-blockd bring-up until some timeout, giving the bricks time to come up by then.
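The proposed delay amounts to a poll-with-timeout. A minimal sketch of the idea, assuming a hypothetical brick-health check command (this is not the actual gluster-block code):

```shell
#!/bin/sh
# Sketch only: poll a health-check command until it succeeds or a timeout expires.
# "$@" stands in for a real "are all bricks online?" test (hypothetical here).
wait_with_timeout() {
    timeout="$1"; shift
    waited=0
    until "$@" > /dev/null 2>&1; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1    # timed out; bricks still not online
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

Usage would be something like `wait_with_timeout 120 check_bricks_online`, where `check_bricks_online` is whatever brick test the script actually implements.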
https://github.com/gluster/gluster-block/pull/98
Hi Pranith, I was working on this bug, so I deleted one gluster pod out of 3 pods to verify that gluster-blockd comes up after glusterd and all the bricks. After some time, when the pod came up, everything (glusterd and all brick processes) was up except gluster-blockd on that gluster pod. gluster-blockd is not coming up, with this error:

sh-4.2# systemctl status gluster-blockd -l
● gluster-blockd.service - Gluster block storage utility
   Loaded: loaded (/usr/lib/systemd/system/gluster-blockd.service; enabled; vendor preset: disabled)
   Active: inactive (dead)

Jul 31 13:14:20 dhcp46-217.lab.eng.blr.redhat.com systemd[1]: Dependency failed for Gluster block storage utility.
Jul 31 13:14:20 dhcp46-217.lab.eng.blr.redhat.com systemd[1]: Job gluster-blockd.service/start failed with result 'dependency'.

sh-4.2# systemctl status glusterd -l
● glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2018-07-31 13:01:20 UTC; 58min ago
  Process: 27534 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 27537 (glusterd)
   CGroup: /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod4ccc22aa_94c1_11e8_8bae_005056a5f2d4.slice/docker-028937c3bdc69b3af7bd5a67bfa26254edd155a755d6aa386eff7aad98874365.scope/system.slice/glusterd.service
           ├─ 9723 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/07a4ac5d617c977cec7f2014901b113b.socket --xlator-option *replicate*.node-uuid=4c78819d-dc92-4898-b0db-fb3f6c17aeab
           ├─10085 /usr/sbin/glusterfsd -s 10.70.46.217 --volfile-id heketidbstorage.10.70.46.217.var-lib-heketi-mounts-vg_2dbaf4459b1591a754aa69f4b9c1ae41-brick_0d127fc5efe4a251b5534a2df36734b0-brick -p /var/run/gluster/vols/heketidbstorage/10.70.46.217-var-lib-heketi-mounts-vg_2dbaf4459b1591a754aa69f4b9c1ae41-brick_0d127fc5efe4a251b5534a2df36734b0-brick.pid -S /var/run/gluster/f4a121376ae22851db2fbbc5f84a18d7.socket --brick-name /var/lib/heketi/mounts/vg_2dbaf4459b1591a754aa69f4b9c1ae41/brick_0d127fc5efe4a251b5534a2df36734b0/brick -l /var/log/glusterfs/bricks/var-lib-heketi-mounts-vg_2dbaf4459b1591a754aa69f4b9c1ae41-brick_0d127fc5efe4a251b5534a2df36734b0-brick.log --xlator-option *-posix.glusterd-uuid=4c78819d-dc92-4898-b0db-fb3f6c17aeab --brick-port 49152 --xlator-option heketidbstorage-server.listen-port=49152
           ├─10289 /usr/sbin/glusterfsd -s 10.70.46.217 --volfile-id vol110.10.70.46.217.var-lib-heketi-mounts-vg_2dbaf4459b1591a754aa69f4b9c1ae41-brick_46538c9223275aaff1e0680461f04eab-brick -p /var/run/gluster/vols/vol110/10.70.46.217-var-lib-heketi-mounts-vg_2dbaf4459b1591a754aa69f4b9c1ae41-brick_46538c9223275aaff1e0680461f04eab-brick.pid -S /var/run/gluster/efa7e06610767d27a6aa0e8f32c8f4af.socket --brick-name /var/lib/heketi/mounts/vg_2dbaf4459b1591a754aa69f4b9c1ae41/brick_46538c9223275aaff1e0680461f04eab/brick -l /var/log/glusterfs/bricks/var-lib-heketi-mounts-vg_2dbaf4459b1591a754aa69f4b9c1ae41-brick_46538c9223275aaff1e0680461f04eab-brick.log --xlator-option *-posix.glusterd-uuid=4c78819d-dc92-4898-b0db-fb3f6c17aeab --brick-port 49153 --xlator-option vol110-server.listen-port=49153
           └─27537 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO

Jul 31 13:00:54 dhcp46-217.lab.eng.blr.redhat.com systemd[1]: Starting GlusterFS, a clustered file-system server...
Jul 31 13:01:20 dhcp46-217.lab.eng.blr.redhat.com systemd[1]: Started GlusterFS, a clustered file-system server.

Brick process counts (10289 is a brick PID from the pod where gluster-blockd is not coming up):

sh-4.2# gluster v status | grep 10289 | wc -l
292
sh-4.2# gluster v status | grep 4835 | wc -l
292
sh-4.2# gluster v status | grep 4799 | wc -l
292

sosreports -> http://rhsqe-repo.lab.eng.blr.redhat.com/cns/bugs/BZ-1598322/

Pranith, can you please debug whether this is caused by the same bug, or do I need to raise a new one?
Further analysis shows that the tcmu-runner service failed to start upon gluster pod restart:

sh-4.2# ps -aux | grep Ds
root       1516  0.0  0.0 584520 19656 ?  Ds   Jul31  0:00 /usr/bin/tcmu-runner --tcmu-log-dir /var/log/glusterfs/gluster-block
root      27770  0.0  0.0   9088   660 pts/4  S+  05:56  0:00 grep Ds
sh-4.2#
sh-4.2# systemctl status tcmu-runner -l
● tcmu-runner.service - LIO Userspace-passthrough daemon
   Loaded: loaded (/usr/lib/systemd/system/tcmu-runner.service; static; vendor preset: disabled)
   Active: failed (Result: timeout) since Tue 2018-07-31 13:14:20 UTC; 16h ago
  Process: 1501 ExecStartPre=/usr/libexec/gluster-block/wait-for-bricks.sh 120 (code=exited, status=1/FAILURE)
 Main PID: 1516
   CGroup: /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod4ccc22aa_94c1_11e8_8bae_005056a5f2d4.slice/docker-028937c3bdc69b3af7bd5a67bfa26254edd155a755d6aa386eff7aad98874365.scope/system.slice/tcmu-runner.service
           └─1516 /usr/bin/tcmu-runner --tcmu-log-dir /var/log/glusterfs/gluster-block

Jul 31 13:01:40 dhcp46-217.lab.eng.blr.redhat.com tcmu-runner[1516]: 2018-07-31 13:01:40.044 1516 [ERROR] add_device:516 : could not open /dev/uio1
Jul 31 13:01:40 dhcp46-217.lab.eng.blr.redhat.com tcmu-runner[1516]: add_device:516 : could not open /dev/uio1
Jul 31 13:01:40 dhcp46-217.lab.eng.blr.redhat.com tcmu-runner[1516]: 2018-07-31 13:01:40.044 1516 [ERROR] add_device:516 : could not open /dev/uio10
Jul 31 13:01:40 dhcp46-217.lab.eng.blr.redhat.com tcmu-runner[1516]: add_device:516 : could not open /dev/uio10
Jul 31 13:11:20 dhcp46-217.lab.eng.blr.redhat.com systemd[1]: tcmu-runner.service start operation timed out. Terminating.
Jul 31 13:12:50 dhcp46-217.lab.eng.blr.redhat.com systemd[1]: tcmu-runner.service stop-final-sigterm timed out. Killing.
Jul 31 13:14:20 dhcp46-217.lab.eng.blr.redhat.com systemd[1]: tcmu-runner.service still around after final SIGKILL. Entering failed mode.
Jul 31 13:14:20 dhcp46-217.lab.eng.blr.redhat.com systemd[1]: Failed to start LIO Userspace-passthrough daemon.
Jul 31 13:14:20 dhcp46-217.lab.eng.blr.redhat.com systemd[1]: Unit tcmu-runner.service entered failed state.
Jul 31 13:14:20 dhcp46-217.lab.eng.blr.redhat.com systemd[1]: tcmu-runner.service failed.

sh-4.2#
sh-4.2# systemctl status gluster-blockd -l
● gluster-blockd.service - Gluster block storage utility
   Loaded: loaded (/usr/lib/systemd/system/gluster-blockd.service; enabled; vendor preset: disabled)
   Active: inactive (dead)

Jul 31 13:14:20 dhcp46-217.lab.eng.blr.redhat.com systemd[1]: Dependency failed for Gluster block storage utility.
Jul 31 13:14:20 dhcp46-217.lab.eng.blr.redhat.com systemd[1]: Job gluster-blockd.service/start failed with result 'dependency'.

sh-4.2#
sh-4.2# systemctl status gluster-block-target -l
● gluster-block-target.service - Restore LIO kernel target configuration
   Loaded: loaded (/usr/lib/systemd/system/gluster-block-target.service; disabled; vendor preset: disabled)
   Active: inactive (dead)

Jul 31 13:14:20 dhcp46-217.lab.eng.blr.redhat.com systemd[1]: Dependency failed for Restore LIO kernel target configuration.
Jul 31 13:14:20 dhcp46-217.lab.eng.blr.redhat.com systemd[1]: Job gluster-block-target.service/start failed with result 'dependency'.
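For context, the ordering is enforced through systemd: the wait script runs as ExecStartPre of tcmu-runner (visible in the Process line of the status output above), and gluster-blockd depends, directly or indirectly, on tcmu-runner, which is why its start failed with result 'dependency'. A sketch of the relevant unit fragment, reconstructed from the status output (the shipped unit file may differ):

```ini
# /usr/lib/systemd/system/tcmu-runner.service (fragment, reconstructed sketch)
[Service]
# Block start-up until the bricks are up, or give up after 120 seconds;
# a nonzero exit here fails the whole unit, cascading to gluster-blockd.
ExecStartPre=/usr/libexec/gluster-block/wait-for-bricks.sh 120
ExecStart=/usr/bin/tcmu-runner --tcmu-log-dir /var/log/glusterfs/gluster-block
```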
All packages from the gluster pod:

sh-4.2# uname -r
3.10.0-862.11.2.el7.x86_64
sh-4.2#
sh-4.2# rpm -qa | grep gluster
glusterfs-client-xlators-3.8.4-54.15.el7rhgs.x86_64
glusterfs-fuse-3.8.4-54.15.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-54.15.el7rhgs.x86_64
glusterfs-libs-3.8.4-54.15.el7rhgs.x86_64
glusterfs-3.8.4-54.15.el7rhgs.x86_64
glusterfs-api-3.8.4-54.15.el7rhgs.x86_64
glusterfs-cli-3.8.4-54.15.el7rhgs.x86_64
glusterfs-server-3.8.4-54.15.el7rhgs.x86_64
gluster-block-0.2.1-23.el7rhgs.x86_64
sh-4.2#
sh-4.2# rpm -qa | grep tcmu
libtcmu-1.2.0-23.el7rhgs.x86_64
tcmu-runner-1.2.0-23.el7rhgs.x86_64
sh-4.2#
(In reply to Nitin Goyal from comment #11)
> All packages from gluster pod
>
> sh-4.2# uname -r
> 3.10.0-862.11.2.el7.x86_64
> sh-4.2#
> sh-4.2# rpm -qa | grep gluster
> glusterfs-client-xlators-3.8.4-54.15.el7rhgs.x86_64
> glusterfs-fuse-3.8.4-54.15.el7rhgs.x86_64
> glusterfs-geo-replication-3.8.4-54.15.el7rhgs.x86_64
> glusterfs-libs-3.8.4-54.15.el7rhgs.x86_64
> glusterfs-3.8.4-54.15.el7rhgs.x86_64
> glusterfs-api-3.8.4-54.15.el7rhgs.x86_64
> glusterfs-cli-3.8.4-54.15.el7rhgs.x86_64
> glusterfs-server-3.8.4-54.15.el7rhgs.x86_64
> gluster-block-0.2.1-23.el7rhgs.x86_64
> sh-4.2#
> sh-4.2# rpm -qa | grep tcmu
> libtcmu-1.2.0-23.el7rhgs.x86_64
> tcmu-runner-1.2.0-23.el7rhgs.x86_64
> sh-4.2#

If you execute "/usr/libexec/gluster-block/wait-for-bricks.sh 120", what is the behavior? The script should exit within 2 minutes. Is that happening?
> If you execute "/usr/libexec/gluster-block/wait-for-bricks.sh 120" what is
> the behavior? The script should exit within 2 minutes. Is that happening?

sh-4.2# time ./usr/libexec/gluster-block/wait-for-bricks.sh 120

real    0m0.008s
user    0m0.003s
sys     0m0.006s
sh-4.2# time /usr/libexec/gluster-block/wait-for-bricks.sh 120

real    0m0.010s
user    0m0.005s
sys     0m0.005s

I have the setup; please let me know if you need it.
(In reply to Nitin Goyal from comment #13)
> > If you execute "/usr/libexec/gluster-block/wait-for-bricks.sh 120" what is
> > the behavior? The script should exit within 2 minutes. Is that happening?
>
> sh-4.2# time ./usr/libexec/gluster-block/wait-for-bricks.sh 120
>
> real 0m0.008s
> user 0m0.003s
> sys 0m0.006s
> sh-4.2# time /usr/libexec/gluster-block/wait-for-bricks.sh 120
>
> real 0m0.010s
> user 0m0.005s
> sys 0m0.005s
>
> I have setup please let me know if u need setup.

Based on this, it doesn't look like the issue we were trying to solve as part of this bug.
QE cannot verify this as of now because we are blocked on another bug, 1610787. Once we get the fix for that, we will verify this one.
We need the following change to make it work inside a container:

sh-4.2# diff wait-for-bricks1.sh /usr/libexec/gluster-block/wait-for-bricks.sh
119c119
< if ! systemctl is-active --quiet glusterd.service > /dev/null 2>&1
---
> if ! pidof glusterd > /dev/null 2>&1

Worked as expected after this change:

[2018-08-03 12:55:24] WARNING: Timeout Expired, bricks of volumes:"vol11 (3/3), vol12 (3/3), vol13 (3/3), vol14 (3/3), vol15 (3/3), vol16 (3/3), vol17 (3/3), vol18 (3/3), vol19 (3/3), vol20 (3/3)" are yet to come online
(In reply to Pranith Kumar K from comment #27)
> We need the following changes to make it work inside container:
> sh-4.2# diff wait-for-bricks1.sh /usr/libexec/gluster-block/wait-for-bricks.sh
> 119c119
> < if ! systemctl is-active --quiet glusterd.service > /dev/null 2>&1
> ---
> > if ! pidof glusterd > /dev/null 2>&1

Just an intermediate opinion, as I'm not sure if the above suggested command is portable across ...

How about using 'ps -aux | grep -w glusterd' or something ps-oriented?

Thanks!

> Worked as expected after this change:
> [2018-08-03 12:55:24] WARNING: Timeout Expired, bricks of volumes:"vol11
> (3/3), vol12 (3/3), vol13 (3/3), vol14 (3/3), vol15 (3/3), vol16 (3/3),
> vol17 (3/3), vol18 (3/3), vol19 (3/3), vol20 (3/3)" are yet to come online
(In reply to Prasanna Kumar Kalever from comment #28)
> (In reply to Pranith Kumar K from comment #27)
> > We need the following changes to make it work inside container:
> > sh-4.2# diff wait-for-bricks1.sh /usr/libexec/gluster-block/wait-for-bricks.sh
> > 119c119
> > < if ! systemctl is-active --quiet glusterd.service > /dev/null 2>&1
> > ---
> > > if ! pidof glusterd > /dev/null 2>&1
>
> Just an intermediate opinion, as I'm not sure if the above suggested command
> is portable across ...
>
> How about using 'ps -aux | grep -w glusterd' or something ps command
> oriented ?
>
> Thanks!
>
> > Worked as expected after this change:
> > [2018-08-03 12:55:24] WARNING: Timeout Expired, bricks of volumes:"vol11
> > (3/3), vol12 (3/3), vol13 (3/3), vol14 (3/3), vol15 (3/3), vol16 (3/3),
> > vol17 (3/3), vol18 (3/3), vol19 (3/3), vol20 (3/3)" are yet to come online

It matches quite a few processes, including its own:

sh-4.2# ps -aux | grep -w glusterd
root        609  0.0  0.0 983172 21668 ?  Ssl  06:37  0:03 /usr/sbin/glusterfsd -s 10.70.47.165 --volfile-id heketidbstorage.10.70.47.165.var-lib-heketi-mounts-vg_7ce5bebe83af0d394eb711c2249ba339-brick_de0182db142b42a6eca1f7c18a328d27-brick -p /var/run/gluster/vols/heketidbstorage/10.70.47.165-var-lib-heketi-mounts-vg_7ce5bebe83af0d394eb711c2249ba339-brick_de0182db142b42a6eca1f7c18a328d27-brick.pid -S /var/run/gluster/11063b292f8e975e04045d4a4db135b0.socket --brick-name /var/lib/heketi/mounts/vg_7ce5bebe83af0d394eb711c2249ba339/brick_de0182db142b42a6eca1f7c18a328d27/brick -l /var/log/glusterfs/bricks/var-lib-heketi-mounts-vg_7ce5bebe83af0d394eb711c2249ba339-brick_de0182db142b42a6eca1f7c18a328d27-brick.log --xlator-option *-posix.glusterd-uuid=f62a850a-9b26-4f5f-9870-7a9d25f2a04d --brick-port 49152 --xlator-option heketidbstorage-server.listen-port=49152
root       1516  0.0  0.0   9088   664 pts/2  S+   13:09  0:00 grep -w glusterd
root      11151  0.2  0.0 504116 20284 ?  Ssl  10:43  0:25 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO

(I don't think these comments need to be private)
Based on comment 27, marking this as FailedQA.
(In reply to Pranith Kumar K from comment #29)
> (In reply to Prasanna Kumar Kalever from comment #28)
> > (In reply to Pranith Kumar K from comment #27)
> > > We need the following changes to make it work inside container:
> > > sh-4.2# diff wait-for-bricks1.sh /usr/libexec/gluster-block/wait-for-bricks.sh
> > > 119c119
> > > < if ! systemctl is-active --quiet glusterd.service > /dev/null 2>&1
> > > ---
> > > > if ! pidof glusterd > /dev/null 2>&1
> >
> > Just an intermediate opinion, as I'm not sure if the above suggested command
> > is portable across ...
> >
> > How about using 'ps -aux | grep -w glusterd' or something ps command
> > oriented ?
> >
> > Thanks!
> >
> > > Worked as expected after this change:
> > > [2018-08-03 12:55:24] WARNING: Timeout Expired, bricks of volumes:"vol11
> > > (3/3), vol12 (3/3), vol13 (3/3), vol14 (3/3), vol15 (3/3), vol16 (3/3),
> > > vol17 (3/3), vol18 (3/3), vol19 (3/3), vol20 (3/3)" are yet to come online
>
> It matches quite a few processes including its own.
>
> sh-4.2# ps -aux | grep -w glusterd
> root 609 0.0 0.0 983172 21668 ? Ssl 06:37 0:03
> /usr/sbin/glusterfsd -s 10.70.47.165 --volfile-id
> heketidbstorage.10.70.47.165.var-lib-heketi-mounts-
> vg_7ce5bebe83af0d394eb711c2249ba339-brick_de0182db142b42a6eca1f7c18a328d27-
> brick -p
> /var/run/gluster/vols/heketidbstorage/10.70.47.165-var-lib-heketi-mounts-
> vg_7ce5bebe83af0d394eb711c2249ba339-brick_de0182db142b42a6eca1f7c18a328d27-
> brick.pid -S /var/run/gluster/11063b292f8e975e04045d4a4db135b0.socket
> --brick-name
> /var/lib/heketi/mounts/vg_7ce5bebe83af0d394eb711c2249ba339/
> brick_de0182db142b42a6eca1f7c18a328d27/brick -l
> /var/log/glusterfs/bricks/var-lib-heketi-mounts-
> vg_7ce5bebe83af0d394eb711c2249ba339-brick_de0182db142b42a6eca1f7c18a328d27-
> brick.log --xlator-option
> *-posix.glusterd-uuid=f62a850a-9b26-4f5f-9870-7a9d25f2a04d --brick-port
> 49152 --xlator-option heketidbstorage-server.listen-port=49152
> root 1516 0.0 0.0 9088 664 pts/2 S+ 13:09 0:00 grep -w
> glusterd
> root 11151 0.2 0.0 504116 20284 ? Ssl 10:43 0:25
> /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
>
> (I don't think these comments need to be private)

We can do one thing: check for the presence of the pidof and systemctl CLIs before using them to figure out whether glusterd is running. At least one of the two should exist for the script to be useful; otherwise, bail out saying the script needs one of these two. Thoughts?
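A sketch of that fallback idea (the function name is made up, and the order of preference is an assumption; inside the container pidof is the check known to work, per comment 27, so it is tried first):

```shell
#!/bin/sh
# Sketch of the proposed detection fallback; not the actual merged patch.
# Prefer pidof (known to work inside the container), fall back to
# systemctl, and bail out if neither CLI is available.
glusterd_is_running() {
    if command -v pidof > /dev/null 2>&1; then
        pidof glusterd > /dev/null 2>&1
    elif command -v systemctl > /dev/null 2>&1; then
        systemctl is-active --quiet glusterd.service
    else
        echo "wait-for-bricks: need either pidof or systemctl" >&2
        return 2
    fi
}
```

Unlike 'ps -aux | grep -w glusterd', this avoids matching the grep process itself or unrelated command lines (such as glusterfsd bricks) that merely contain the word.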
https://github.com/gluster/gluster-block/pull/109
*** Bug 1613073 has been marked as a duplicate of this bug. ***
I verified this bug on the below rpms:

sh-4.2# rpm -qa | grep gluster
glusterfs-client-xlators-3.12.2-18.el7rhgs.x86_64
glusterfs-cli-3.12.2-18.el7rhgs.x86_64
glusterfs-fuse-3.12.2-18.el7rhgs.x86_64
glusterfs-geo-replication-3.12.2-18.el7rhgs.x86_64
gluster-block-0.2.1-25.el7rhgs.x86_64
glusterfs-libs-3.12.2-18.el7rhgs.x86_64
glusterfs-3.12.2-18.el7rhgs.x86_64
glusterfs-api-3.12.2-18.el7rhgs.x86_64
python2-gluster-3.12.2-18.el7rhgs.x86_64
glusterfs-server-3.12.2-18.el7rhgs.x86_64
sh-4.2# rpm -qa | grep tcmu-runner
tcmu-runner-1.2.0-24.el7rhgs.x86_64

When bricks were down on the same node where I stopped tcmu-runner, gluster-blockd took exactly 2 minutes to come up:

sh-4.2# systemctl stop tcmu-runner; systemctl is-active tcmu-runner; systemctl is-active gluster-blockd; echo `date`; systemctl start gluster-blockd; echo `date`
inactive
inactive
Tue Aug 28 19:16:57 UTC 2018
Tue Aug 28 19:18:59 UTC 2018

When bricks were down on another node, it again took about 2 minutes (since the script monitors all brick processes):

sh-4.2# systemctl stop tcmu-runner; systemctl is-active tcmu-runner; systemctl is-active gluster-blockd; echo `date`; systemctl start gluster-blockd; echo `date`
inactive
inactive
Tue Aug 28 19:28:03 UTC 2018
Tue Aug 28 19:30:06 UTC 2018

When bricks were already up on all nodes, it took just 1 second to come up:

sh-4.2# systemctl stop tcmu-runner; systemctl is-active tcmu-runner; systemctl is-active gluster-blockd; echo `date`; systemctl start gluster-blockd; echo `date`
inactive
inactive
Tue Aug 28 19:22:56 UTC 2018
Tue Aug 28 19:22:57 UTC 2018

Hence marking this as Verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:2691