1417097 – glusterd: shared storage volume didn't get mount after node reboot.

Summary: glusterd: shared storage volume didn't get mount after node reboot.

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	glusterd
Sub Component:
Version:	rhgs-3.2
Hardware:	x86_64
OS:	Linux
Priority:	medium
Severity:	urgent
Target Milestone:	---
Target Release:	---
Assignee:	Sunny Kumar
QA Contact:	Bala Konda Reddy M
Docs Contact:
URL:
Whiteboard:	shared-storage
Depends On:
Blocks:

Reported:	2017-01-27 08:49 UTC by Anil Shah
Modified:	2019-10-21 07:39 UTC
CC List:	8 users
Fixed In Version:
Doc Type:	Known Issue
Doc Text:	glusterd takes time to initialize if the setup is slow. As a result, by the time the /etc/fstab entries are mounted, glusterd on the node is not ready to serve the mount request, and the mount fails. Because of this, shared storage may not be mounted after a node reboots. Workaround: If shared storage is not mounted after the node reboots, check that glusterd is up and mount the shared storage volume manually.
Clone Of:
Environment:
Last Closed:	2018-10-30 16:25:16 UTC
Embargoed:
Dependent Products:

Description Anil Shah 2017-01-27 08:49:24 UTC

Description of problem:

After a node reboot, the shared storage brick was not mounted.

Note: Snapshots were scheduled using the scheduler.
There were 214 snapshots present in the system, of which 100 were activated.


Version-Release number of selected component (if applicable):

glusterfs-3.8.4-12.el7rhgs.x86_64

How reproducible:

2/2

Steps to Reproduce:
1. Created a 3*2 distributed-replicated volume
2. Enabled shared storage
3. Scheduled snapshots using the scheduler
4. Restarted one of the server nodes

Actual results:

After the reboot, the shared storage brick was not mounted.


Expected results:

The shared storage brick should be mounted after a node reboot.
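
One way to verify the expected result after a reboot is to look for the shared storage mount point in /proc/mounts. The following is a minimal sketch, not part of the original report. The host name, the volume name gluster_shared_storage, and the path /run/gluster/shared_storage in the sample are assumptions for illustration and may differ on a given setup.

```python
def is_mounted(mount_point, mounts_text):
    """Return True if mount_point is a mount target in /proc/mounts-style text."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 2 and fields[1] == mount_point:
            return True
    return False


# Illustrative sample only: host, volume name and mount path are assumptions.
SAMPLE_MOUNTS = (
    "/dev/sda1 / xfs rw,relatime 0 0\n"
    "node1:/gluster_shared_storage /run/gluster/shared_storage "
    "fuse.glusterfs rw,relatime 0 0\n"
)

print(is_mounted("/run/gluster/shared_storage", SAMPLE_MOUNTS))
print(is_mounted("/mnt/not-mounted", SAMPLE_MOUNTS))
```

On a real node, the second argument would be the contents of /proc/mounts, for example open("/proc/mounts").read().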

Additional info:

Comment 3 Atin Mukherjee 2017-01-27 10:53:06 UTC

Although this issue is consistently reproducible on one particular node of this setup, I have some details about it. Shared storage didn't mount automatically because glusterd had not come up by the time the mount request was sent. To understand why glusterd takes ~4-5 minutes to initialize every time on this particular node, here is what I did:

1. Restarted glusterd and tailed /var/log/glusterd.log

After a few seconds, the tail output paused after dumping the following:

[2017-01-27 10:38:01.591263] D [MSGID: 0] [glusterd-locks.c:446:glusterd_multiple_mgmt_v3_unlock] 0-management: Returning 0
[2017-01-27 10:38:01.591309] D [MSGID: 0] [glusterd-mgmt-handler.c:789:glusterd_mgmt_v3_unlock_send_resp] 0-management: Responded to mgmt_v3 unlock, ret: 0

and then resumed logging with:

[2017-01-27 10:41:48.171773] D [logging.c:1829:gf_log_flush_timeout_cbk] 0-logging-infra: Log timer timed out. About to flush outstanding messages if present

So it seems that logging was stuck for about 3 mins 47 seconds before the log timer timed out, which suggests to me that something is wrong in the underlying file system.
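
The size of the pause can be checked directly from the log timestamps. The sketch below is illustrative and not from the original comment: the helper names parse_ts and find_gaps and the 60-second threshold are assumptions. It parses the bracketed timestamp at the start of each line and reports any gap between consecutive lines that is at least the threshold.

```python
import re
from datetime import datetime, timedelta

# Matches the leading "[YYYY-MM-DD HH:MM:SS.ffffff]" stamp of a glusterd log line.
TS_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6})\]")


def parse_ts(line):
    """Return the timestamp at the start of a log line, or None if absent."""
    m = TS_RE.match(line)
    if m is None:
        return None
    return datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")


def find_gaps(lines, threshold=timedelta(seconds=60)):
    """Return (previous, current, delta) for consecutive lines at least threshold apart."""
    gaps = []
    prev = None
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue  # skip lines without a timestamp
        if prev is not None and ts - prev >= threshold:
            gaps.append((prev, ts, ts - prev))
        prev = ts
    return gaps


# The three log lines quoted in this comment.
LOG_LINES = [
    "[2017-01-27 10:38:01.591263] D [MSGID: 0] [glusterd-locks.c:446:glusterd_multiple_mgmt_v3_unlock] 0-management: Returning 0",
    "[2017-01-27 10:38:01.591309] D [MSGID: 0] [glusterd-mgmt-handler.c:789:glusterd_mgmt_v3_unlock_send_resp] 0-management: Responded to mgmt_v3 unlock, ret: 0",
    "[2017-01-27 10:41:48.171773] D [logging.c:1829:gf_log_flush_timeout_cbk] 0-logging-infra: Log timer timed out. About to flush outstanding messages if present",
]

for start, end, delta in find_gaps(LOG_LINES):
    print(f"gap of {delta} between {start} and {end}")
```

For the lines quoted here, the single gap found is 226.580464 seconds, which is just under 3 minutes 47 seconds.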

I have a couple of questions for you here:

1. Was any XFS-related issue observed on the same node?
2. Is this issue seen on a different setup?

If the answer to 1 is yes and the answer to 2 is no, then I am inclined to close this issue as not a bug.

Comment 4 Atin Mukherjee 2017-01-27 11:51:21 UTC

To add to the above, when the glusterd process was attached to gdb during the interval of 3 mins 40 secs mentioned in the above comment, I didn't see any evidence of threads getting stuck or processing any events.

Comment 6 Avra Sengupta 2017-01-30 09:00:35 UTC

The problem with doing so isn't in the implementation, but in the user behaviour. A user unmounting shared storage is outside glusterd's purview and scope. In such a situation we should not be remounting shared storage, because the user has explicitly unmounted it.

To implement such a move would mean adding unnecessary complexity to glusterd, and confusion for the user.

Comment 7 Atin Mukherjee 2017-01-30 09:11:21 UTC

(In reply to Avra Sengupta from comment #6)
> The problem with doing so isn't in the implementation, but the user
> behaviour. A user unmounting shared storage, is outside glusterd's purview
> and scope. In such a situation we should not be remounting shared storage,
> because the user has explicitly unmounted it.
> 
> To implement such a move would mean adding unnecessary complexity to
> glusterd, and confusion for the user.

I didn't mean that GlusterD has to remount the shared storage. What I am looking for is a dependency chain where the mount attempt will *only* be made once GlusterD has finished its initialization and an active pid is available.
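
The ordering described here can be approximated from outside glusterd with a small wait-then-mount helper. This is a hedged sketch, not an implemented fix. The function name mount_when_ready, the retry count, and the delay are assumptions, and `systemctl is-active` is only a proxy for "glusterd has finished its initialization". The helper never unmounts or remounts anything that is already mounted.

```python
import subprocess
import time


def mount_when_ready(mount_point, run=subprocess.call, sleep=time.sleep,
                     retries=30, delay=10):
    """Mount mount_point from /etc/fstab only once glusterd reports active.

    run and sleep are injectable so the logic can be exercised without
    touching the system. Returns True if the path ends up mounted.
    """
    for _ in range(retries):
        # Nothing to do if the path is already a mount point.
        if run(["mountpoint", "-q", mount_point]) == 0:
            return True
        # Attempt the mount only after glusterd is reported active.
        if run(["systemctl", "is-active", "--quiet", "glusterd"]) == 0:
            if run(["mount", mount_point]) == 0:
                return True
        sleep(delay)
    return False


# Example call (path is an assumption; not executed here):
# mount_when_ready("/run/gluster/shared_storage")
```

Running `mount` with only the mount point relies on a matching /etc/fstab entry, which is how the report describes shared storage being mounted at boot.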

Comment 10 Bhavana 2017-03-13 16:15:24 UTC

Updated the doc text slightly for the release notes.
