Description of problem: In a node reboot case (for example), if glusterd has not come up before the gluster-blockd service, we see a lot of failures due to the absence of storage. More generally, this can happen in any case where gluster-blockd is brought up before glusterd.
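For reference, a minimal way to hit the same condition without a reboot, using the unit names from this report (the exact error text may differ):

# stop glusterd first, then try to bring gluster-blockd up on its own
systemctl stop glusterd
systemctl restart gluster-blockd   # fails, since the glusterd-side dependency chain is not up
systemctl status gluster-blockd    # unit stays inactive/failed with a dependency error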
Scenario 1:
* Stop glusterd - gluster-blockd goes into inactive state
* Start/restart gluster-blockd - fails with a dependency error (as expected)

Scenario 2:
* Stop gluster-block-target - gluster-blockd again goes into inactive state
* Stop glusterd
* Start/restart gluster-blockd - fails with a dependency error; tcmu-runner, gluster-block-target and glusterd all remain down
* Start glusterd - tcmu-runner, gluster-block-target and gluster-blockd all remain down
* Start gluster-blockd - all the mentioned services come up successfully

Scenario 3:
* Stop glusterd - gluster-blockd goes into inactive state
* Start/restart gluster-blockd - fails with a dependency error (as expected)
* Start glusterd - glusterd comes up; gluster-blockd continues to remain down
* Start gluster-blockd - gluster-blockd comes up

I have tested the above scenarios and multiple permutations of the services. All of them confirm the ordering: gluster-blockd depends on tcmu-runner, tcmu-runner depends on gluster-block-target, which in turn depends on glusterd. (A sketch of that chain follows below.)

One last question before I move this bug to verified, regarding scenario 3, step 3: if glusterd is brought up, are we expecting gluster-blockd to come up automatically? In other words, after the start/restart in step 2, should the gluster-blockd service check for glusterd at regular intervals, so that it can get itself back online as soon as it sees glusterd up? It presently doesn't.

Prasanna/Atin, please ignore the question in comment 7. Keeping the need_info on this bug for the query mentioned above.
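Purely for illustration, the chain verified above maps to unit directives roughly as sketched here. The gluster-blockd fragment matches the unit shown later in this bug; the tcmu-runner and gluster-block-target fragments are only an approximation of the dependencies observed in testing, and the actual shipped units may use different directives:

# gluster-blockd.service (fragment)
[Unit]
BindsTo=tcmu-runner.service rpcbind.service
After=tcmu-runner.service rpcbind.service

# tcmu-runner.service (illustrative fragment)
[Unit]
Requires=gluster-block-target.service
After=gluster-block-target.service

# gluster-block-target.service (illustrative fragment)
[Unit]
Requires=glusterd.service
After=glusterd.service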
Sweta, is it expected that gluster-blockd is started when glusterd is brought back? If there is such a requirement, then we should explore the Wants= or PartOf= options in the systemd units.
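For context, a minimal sketch of what those options could look like as admin drop-ins; the file names and exact directives here are illustrative, not a proposed patch:

# /etc/systemd/system/glusterd.service.d/50-gluster-blockd.conf
# Starting glusterd also pulls gluster-blockd in (start propagation)
# and orders glusterd before it.
[Unit]
Wants=gluster-blockd.service
Before=gluster-blockd.service

# /etc/systemd/system/gluster-blockd.service.d/50-partof.conf
# Stopping or restarting glusterd propagates to gluster-blockd.
# PartOf= does not propagate plain starts, hence the Wants= above.
[Unit]
PartOf=glusterd.service

Run "systemctl daemon-reload" after adding the drop-ins for them to take effect.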
Just to confirm that we understand it right: add "WantedBy=glusterd.service" to the [Install] section of gluster-blockd.service (WantedBy= is an [Install] directive, so systemd would ignore it in [Unit]). The modified unit looks like:

# cat /usr/lib/systemd/system/gluster-blockd.service
[Unit]
Description=Gluster block storage utility
BindsTo=tcmu-runner.service rpcbind.service
After=tcmu-runner.service rpcbind.service

[Service]
Type=simple
Environment="GB_GLFS_LRU_COUNT=5"
Environment="GB_LOG_LEVEL=INFO"
EnvironmentFile=-/etc/sysconfig/gluster-blockd
ExecStart=/usr/sbin/gluster-blockd --glfs-lru-count $GB_GLFS_LRU_COUNT --log-level $GB_LOG_LEVEL $GB_EXTRA_ARGS
KillMode=process

[Install]
WantedBy=multi-user.target
WantedBy=glusterd.service

That should give you what you are asking for, but we might need a justification for why we would need this.
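A rough sketch of how such a change could be verified once the modified unit is installed; these are standard systemctl invocations and the output details will vary:

systemctl daemon-reload
systemctl enable gluster-blockd
# with WantedBy=glusterd.service in [Install], enabling the unit also creates
# a symlink under glusterd.service.wants/
systemctl list-dependencies glusterd | grep gluster-blockd
systemctl restart glusterd         # gluster-blockd should be pulled back in along with glusterd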
My opinion - it is nice to have. At /this/ stage of the release? We can live with it for now, unless it becomes a bigger problem in CNS.

Karthick/Humble, thoughts? If glusterd goes down, does the entire pod go down? If yes, then we might not hit this scenario at all. If no, then please guide/reply to comment 9.

I will be moving this bug to verified if this is acceptable in the CNS environment. A new bug can be raised (if needed) for the new change.
(In reply to Sweta Anandpara from comment #11)
> My opinion - it is nice to have. At /this/ stage of the release? We can live
> with it for now, unless it becomes a bigger problem in CNS.
>
> Karthick/Humble, thoughts? If glusterd goes down, does the entire pod go
> down? If yes, then we might not hit this scenario at all. If no, then please
> guide/reply to comment 9.

Yes, if glusterd is down, the pod is restarted.

> I will be moving this bug to verified if this is acceptable in the CNS
> environment. A new bug can be raised (if needed) for the new change.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:2773