Description of problem:
After a node reboot, "service glusterd status" reports the service as failed. However, "ps -ef | grep glusterd" shows the glusterd process running, and all glusterd commands work normally.

Note: snapshots were scheduled using the snapshot scheduler. There were 214 snapshots present in the system, out of which 100 were activated.

Version-Release number of selected component (if applicable):

How reproducible:
2/2

Steps to Reproduce:
1. Create a 3x2 distributed-replicate volume
2. Enable shared storage
3. Schedule snapshots using the snapshot scheduler
4. Reboot one of the server nodes

Actual results:
"service glusterd status" reports the service as failed even though glusterd is running.

Expected results:
"service glusterd status" should show the service as started after the node reboot.

Additional info:

[root@rhs-client47 ~]# pgrep glusterd
5369
==================================================
[root@rhs-client47 ~]# service glusterd status
Redirecting to /bin/systemctl status glusterd.service
● glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
   Active: failed (Result: timeout) since Fri 2017-01-27 14:58:13 IST; 8min ago
  Process: 5359 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=killed, signal=TERM)
   CGroup: /system.slice/glusterd.service
=================================================
[root@rhs-client47 ~]# ps -ef | grep glusterd
root 5369 1 5 14:56 ? 00:00:34 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
root 12718 1 0 15:00 ?
00:00:00 /usr/sbin/glusterfsd -s 10.70.36.71 --volfile-id gluster_shared_storage.10.70.36.71.var-lib-glusterd-ss_brick -p /var/lib/glusterd/vols/gluster_shared_storage/run/10.70.36.71-var-lib-glusterd-ss_brick.pid -S /var/run/gluster/b4b9de5f50f03a63767ac2ff35837e9b.socket --brick-name /var/lib/glusterd/ss_brick -l /var/log/glusterfs/bricks/var-lib-glusterd-ss_brick.log --xlator-option *-posix.glusterd-uuid=d2106cc7-a569-420c-b8f1-67d0af8d9088 --brick-port 49152 --xlator-option gluster_shared_storage-server.listen-port=49152 root 12724 1 0 15:00 ? 00:00:00 /usr/sbin/glusterfsd -s 10.70.36.71 --volfile-id vol1.10.70.36.71.rhs-brick3-b6 -p /var/lib/glusterd/vols/vol1/run/10.70.36.71-rhs-brick3-b6.pid -S /var/run/gluster/d400b1c7ba59bb493bea0849a4a1f228.socket --brick-name /rhs/brick3/b6 -l /var/log/glusterfs/bricks/rhs-brick3-b6.log --xlator-option *-posix.glusterd-uuid=d2106cc7-a569-420c-b8f1-67d0af8d9088 --brick-port 49154 --xlator-option vol1-server.listen-port=49154 root 12733 1 0 15:00 ? 00:00:00 /usr/sbin/glusterfsd -s 10.70.36.71 --volfile-id /snaps/snap0/f13392d210834b358190ed819871fbe9.10.70.36.71.run-gluster-snaps-f13392d210834b358190ed819871fbe9-brick6-b6 -p /var/lib/glusterd/snaps/snap0/f13392d210834b358190ed819871fbe9/run/10.70.36.71-run-gluster-snaps-f13392d210834b358190ed819871fbe9-brick6-b6.pid -S /var/run/gluster/8e4a712cc3999ec51996a70d35afb7f5.socket --brick-name /run/gluster/snaps/f13392d210834b358190ed819871fbe9/brick6/b6 -l /var/log/glusterfs/bricks/run-gluster-snaps-f13392d210834b358190ed819871fbe9-brick6-b6.log --xlator-option *-posix.glusterd-uuid=d2106cc7-a569-420c-b8f1-67d0af8d9088 --brick-port 49156 --xlator-option f13392d210834b358190ed819871fbe9-server.listen-port=49156 root 12739 1 0 15:00 ? 
00:00:01 /usr/sbin/glusterfsd -s 10.70.36.71 --volfile-id /snaps/snap0/f13392d210834b358190ed819871fbe9.10.70.36.71.run-gluster-snaps-f13392d210834b358190ed819871fbe9-brick2-b2 -p /var/lib/glusterd/snaps/snap0/f13392d210834b358190ed819871fbe9/run/10.70.36.71-run-gluster-snaps-f13392d210834b358190ed819871fbe9-brick2-b2.pid -S /var/run/gluster/b64939264020b4f34d07a6529cb2eea3.socket --brick-name /run/gluster/snaps/f13392d210834b358190ed819871fbe9/brick2/b2 -l /var/log/glusterfs/bricks/run-gluster-snaps-f13392d210834b358190ed819871fbe9-brick2-b2.log --xlator-option *-posix.glusterd-uuid=d2106cc7-a569-420c-b8f1-67d0af8d9088 --brick-port 49155 --xlator-option f13392d210834b358190ed819871fbe9-server.listen-port=49155 root 12745 1 0 15:00 ? 00:00:01 /usr/sbin/glusterfsd -s 10.70.36.71 --volfile-id /snaps/snap1/5f6b89e7b8e1413a98863d3cc046dbbe.10.70.36.71.run-gluster-snaps-5f6b89e7b8e1413a98863d3cc046dbbe-brick2-b2 -p /var/lib/glusterd/snaps/snap1/5f6b89e7b8e1413a98863d3cc046dbbe/run/10.70.36.71-run-gluster-snaps-5f6b89e7b8e1413a98863d3cc046dbbe-brick2-b2.pid -S /var/run/gluster/d27c49377a2d1fede1c5937937e585ba.socket --brick-name /run/gluster/snaps/5f6b89e7b8e1413a98863d3cc046dbbe/brick2/b2 -l /var/log/glusterfs/bricks/run-gluster-snaps-5f6b89e7b8e1413a98863d3cc046dbbe-brick2-b2.log --xlator-option *-posix.glusterd-uuid=d2106cc7-a569-420c-b8f1-67d0af8d9088 --brick-port 49157 --xlator-option 5f6b89e7b8e1413a98863d3cc046dbbe-server.listen-port=49157 root 12751 1 0 15:00 ? 00:00:00 /usr/sbin/glusterfsd -s 10.70.36.71 --volfile-id vol1.10.70.36.71.rhs-brick2-b2 -p /var/lib/glusterd/vols/vol1/run/10.70.36.71-rhs-brick2-b2.pid -S /var/run/gluster/3450cfa89579f17c22f82ab0b253c1c0.socket --brick-name /rhs/brick2/b2 -l /var/log/glusterfs/bricks/rhs-brick2-b2.log --xlator-option *-posix.glusterd-uuid=d2106cc7-a569-420c-b8f1-67d0af8d9088 --brick-port 49153 --xlator-option vol1-server.listen-port=49153 ===================================================
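The reproduction steps above can be sketched with the gluster CLI. The volume name, peer hostnames, brick paths, and cron schedule below are illustrative assumptions, not taken from the reporter's environment:

```shell
# Illustrative sketch of the setup (volume name, hostnames, brick paths,
# and the cron schedule are assumptions):
gluster volume create vol1 replica 2 \
  server1:/rhs/brick1/b1 server2:/rhs/brick1/b2 \
  server1:/rhs/brick2/b3 server2:/rhs/brick2/b4 \
  server1:/rhs/brick3/b5 server2:/rhs/brick3/b6
gluster volume start vol1

# Enable the shared storage volume required by the snapshot scheduler
gluster volume set all cluster.enable-shared-storage enable

# Initialise the scheduler and add a recurring snapshot job
snap_scheduler.py init
snap_scheduler.py add "snap_job" "*/30 * * * *" "vol1"
```

With enough activated snapshots accumulated this way, rebooting one server node reproduces the status mismatch shown above.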
Every time a node reboots, glusterd has to start a large number of bricks (because of the number of activated snapshots) one after another, which delays glusterd's startup. "systemctl status glusterd" therefore shows glusterd as failed even though the process is running: systemd's start timeout expires before glusterd fully comes up and responds back to systemd. We won't be able to address this directly as it's a design limitation. With brick multiplexing coming in, this should improve; we should revisit this test once the brick multiplexing feature is in upstream and evaluate whether anything more has to be done. For now, exploring systemd's start timeout may give us a workaround to live with this problem.
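As a sketch of the systemd-side workaround mentioned above, a drop-in file could raise the unit's start timeout so glusterd has time to bring up all the brick processes. The 300-second value is an assumption and would need to be tuned to the number of bricks and snapshots:

```shell
# Hypothetical drop-in raising glusterd's start timeout (300s is an
# assumed value; tune it to the number of bricks/activated snapshots):
mkdir -p /etc/systemd/system/glusterd.service.d
cat > /etc/systemd/system/glusterd.service.d/timeout.conf <<'EOF'
[Service]
TimeoutStartSec=300
EOF
systemctl daemon-reload
systemctl restart glusterd
```

A drop-in is preferable to editing /usr/lib/systemd/system/glusterd.service directly, since package updates overwrite the shipped unit file but leave /etc drop-ins intact.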
(In reply to Atin Mukherjee from comment #3)

One caveat here: snapshot bricks are not multiplexed, so even with brick multiplexing enabled, the startup delay cannot be eliminated completely.