Summary: | glusterd: shared storage volume didn't get mounted after node reboot. | ||
---|---|---|---|
Product: | Red Hat Gluster Storage | Reporter: | Anil Shah <ashah> |
Component: | glusterd | Assignee: | Sunny Kumar <sunkumar> |
Status: | CLOSED WONTFIX | QA Contact: | Bala Konda Reddy M <bmekala> |
Severity: | urgent | Docs Contact: | |
Priority: | medium | ||
Version: | rhgs-3.2 | CC: | amukherj, asriram, atumball, bmohanra, rcyriac, rhs-bugs, storage-qa-internal, vbellur |
Target Milestone: | --- | Keywords: | ZStream |
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | shared-storage | ||
Fixed In Version: | | Doc Type: | Known Issue
Doc Text: |
If the setup is slow, glusterd takes time to initialize. As a result, by the time the /etc/fstab entries are processed, glusterd on the node is not yet ready to serve the mount request and the mount fails. Due to this, shared storage may not get mounted after a node reboots.
Workaround: If shared storage is not mounted after a node reboots, check whether glusterd is up and then mount the shared storage volume manually.
|
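For reference, a minimal sketch of the workaround above, assuming the default shared-storage volume name (gluster_shared_storage) and mount point (/var/run/gluster/shared_storage); adjust the host and paths to match the actual setup:

```sh
# Check that glusterd is running on the node before retrying the mount.
systemctl status glusterd

# If glusterd is up, mount the shared storage volume manually.
# Volume name and mount point below assume the shared-storage defaults.
mount -t glusterfs localhost:/gluster_shared_storage /var/run/gluster/shared_storage
```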
Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2018-10-30 16:25:16 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: |
Description
Anil Shah
2017-01-27 08:49:24 UTC
Although this issue is consistently reproducible on a particular node of this setup, I have some details around it. Shared storage didn't mount automatically because glusterd didn't come up by the time the mount request was sent. To understand why glusterd takes ~4-5 minutes to initialize every time on this particular node, here is what I did:

1. Restarted glusterd and tailed /var/log/glusterd.log.

After a few seconds, the tail output paused after dumping the following:

    [2017-01-27 10:38:01.591263] D [MSGID: 0] [glusterd-locks.c:446:glusterd_multiple_mgmt_v3_unlock] 0-management: Returning 0
    [2017-01-27 10:38:01.591309] D [MSGID: 0] [glusterd-mgmt-handler.c:789:glusterd_mgmt_v3_unlock_send_resp] 0-management: Responded to mgmt_v3 unlock, ret: 0

and then began logging again with:

    [2017-01-27 10:41:48.171773] D [logging.c:1829:gf_log_flush_timeout_cbk] 0-logging-infra: Log timer timed out. About to flush outstanding messages if present

So it seems that logging was stuck for 3 minutes 47 seconds before the log timer timed out, which gives me an indication that there is something wrong in the underlying file system. I have a couple of questions for you here:

1. Was any xfs-related issue observed on the same node?
2. Is this issue seen on a different setup?

If the answer to 1 is yes and to 2 is no, then I am inclined to close this issue as not a bug.

To add to the above: when the glusterd process was taken into gdb during the 3 mins 40 secs interval mentioned in the above comment, I didn't see any evidence of threads getting stuck or processing any events.

The problem with doing so isn't in the implementation, but in the user behaviour. A user unmounting shared storage is outside glusterd's purview and scope. In such a situation we should not be remounting shared storage, because the user has explicitly unmounted it. To implement such a move would mean adding unnecessary complexity to glusterd, and confusion for the user.

(In reply to Avra Sengupta from comment #6)
> The problem with doing so isn't in the implementation, but in the user
> behaviour. A user unmounting shared storage is outside glusterd's purview
> and scope. In such a situation we should not be remounting shared storage,
> because the user has explicitly unmounted it.
>
> To implement such a move would mean adding unnecessary complexity to
> glusterd, and confusion for the user.

I didn't mean that GlusterD has to remount the shared storage. What I am looking for is a dependency chain where the mount attempt will *only* be made once GlusterD has finished its initialization and an active pid is available.

Updated the doc text slightly for the release notes.
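One way such a dependency chain could be expressed outside of glusterd itself, sketched here only as an illustration and not as anything this bug implements, is to order the shared-storage mount after the glusterd service through systemd mount options in /etc/fstab (the volume name and mount point again assume the shared-storage defaults):

```sh
# /etc/fstab (sketch): delay the shared-storage mount until glusterd.service
# has started; this orders the mount after the service but does not by itself
# guarantee that glusterd has finished initializing its volumes.
localhost:/gluster_shared_storage  /var/run/gluster/shared_storage  glusterfs  defaults,_netdev,x-systemd.requires=glusterd.service  0 0
```

Note that x-systemd.requires= only orders the mount after the unit has started; on a node where glusterd takes several minutes to initialize, the mount can still fail, which is why the manual remount remains the documented workaround.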