Bug 1623433 - Brick fails to come online after shutting down and restarting a node
Summary: Brick fails to come online after shutting down and restarting a node
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: rhgs-server-container
Version: cns-3.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: OCS 3.11
Assignee: Saravanakumar
QA Contact: Rachael
URL: https://github.com/gluster/gluster-co...
Whiteboard:
Depends On:
Blocks: 1568868 1589277 1610903 1629575
 
Reported: 2018-08-29 10:55 UTC by Rachael
Modified: 2019-02-13 09:26 UTC
CC: 15 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, bricks were not properly mounted in the gluster pod because two lvmetad processes (one in the container and one on the host) competed for the same devices. This prevented certain logical volumes from being detected or made available, which resulted in failed brick mounts. With this fix, the container no longer relies on the lvmetad service and ensures it is not started in the container, so a single metadata daemon process runs and manages the devices and logical volumes.
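The fix described above corresponds to disabling lvmetad in the container's LVM configuration. A minimal sketch, assuming the container ships its own /etc/lvm/lvm.conf (the exact path and surrounding settings in the shipped image are assumptions):

```ini
# /etc/lvm/lvm.conf inside the container (sketch).
# With use_lvmetad = 0, LVM commands scan devices directly instead of
# querying the lvmetad daemon, so no lvmetad process needs to run in
# the container and it cannot race with the host's daemon.
global {
    use_lvmetad = 0
}
```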
Clone Of:
: 1623465 (view as bug list)
Environment:
Last Closed: 2018-10-24 05:57:39 UTC
Target Upstream Version:




Links
System ID Priority Status Summary Last Updated
Github gluster gluster-containers pull 104 None None None 2018-10-02 09:42:47 UTC
Red Hat Bugzilla 1536511 None CLOSED Gluster pod with 850 volumes fails to come up after node reboot 2019-10-09 16:30:42 UTC
Red Hat Bugzilla 1589277 None None None 2019-10-09 16:30:42 UTC
Red Hat Bugzilla 1623438 None VERIFIED [Tracking-RHGS-BZ#1623874] IO errors on block device post rebooting one brick node 2019-10-09 16:30:43 UTC
Red Hat Bugzilla 1627104 None CLOSED cant deploy gluster with crio because LVM commands fail 2019-10-09 16:30:42 UTC
Red Hat Bugzilla 1656724 None NEW Heketi brick is in N/A state post upgrading the OCS setup with rhgs-server and volmanager image 2019-10-09 16:30:43 UTC
Red Hat Product Errata RHBA-2018:2990 None None None 2018-10-24 05:59:07 UTC


Comment 6 Raghavendra Talur 2018-08-29 13:41:02 UTC
I think I found one problem with the mount script. Please look at the output of 

sh-4.2# cat /var/log/glusterfs/container/mountfstab 
mount: special device /dev/mapper/vg_8377c3ad7380bc5110664247eff77dc6-brick_2de9b07bd586eeae37a378737b137c96 does not exist
mount command exited with code 32


Now, I am able to mount the same device using the command:

mount -a --fstab /var/lib/heketi/fstab


We need to debug this further. At a minimum, we need to ensure that the pod does not start at all if /etc/systemd/system/gluster-setup.service fails.
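The last point could be expressed with systemd unit ordering inside the container image; a sketch, where the drop-in path and the glusterd unit name are assumptions about the image layout:

```ini
# Hypothetical drop-in, e.g.
# /etc/systemd/system/glusterd.service.d/10-require-setup.conf
# Requires= plus After= makes glusterd refuse to start when
# gluster-setup.service exits non-zero, so the pod fails visibly
# instead of coming up with unmounted bricks.
[Unit]
Requires=gluster-setup.service
After=gluster-setup.service
```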

Comment 14 Humble Chirammal 2018-08-30 14:26:20 UTC
The upstream fix is PR https://github.com/gluster/gluster-containers/pull/103

Comment 16 Humble Chirammal 2018-08-31 11:51:15 UTC
Interestingly, the issue is always seen for bricks that are part of a BHV (block-hosting volume).

Comment 31 Humble Chirammal 2018-09-05 08:15:22 UTC
Saravana, can you please confirm or update the workaround mentioned in comment 29?

Comment 36 Humble Chirammal 2018-09-07 08:18:34 UTC
Workaround (thanks to Atin++ and Saravana++):


Mount the brick(s):
# mount -a --fstab /var/lib/heketi/fstab

Start the corresponding volume:
# gluster volume start <volume name> force
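The two workaround steps could be wrapped in a small helper for nodes with several affected volumes; a sketch, where the function name recover_bricks and the volume-list argument are illustrative, not part of the product:

```shell
#!/bin/bash
# Sketch of the workaround above: remount all heketi-managed bricks,
# then force-start each affected volume. Intended to run inside the
# gluster pod; pass the volumes whose bricks are shown offline in
# `gluster volume status`.
recover_bricks() {
    # Mount every brick listed in heketi's copy of the fstab.
    mount -a --fstab /var/lib/heketi/fstab || return 1
    # Force-start each volume so glusterd respawns the brick processes.
    local vol
    for vol in "$@"; do
        gluster volume start "$vol" force
    done
}
```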

Comment 43 Michael Adam 2018-09-20 20:48:29 UTC
The explanation is very plausible; let's treat the image from
https://bugzilla.redhat.com/show_bug.cgi?id=1536511#c21
as a proposed patch for this problem.

Comment 44 Michael Adam 2018-09-20 20:54:01 UTC
proposing for 3.11.0

Comment 49 Anjana KD 2018-10-12 09:46:54 UTC
Updated the doc text; kindly review it for technical accuracy.

Comment 50 Niels de Vos 2018-10-12 11:05:09 UTC
Looks good to me, thanks! I just changed the last words to "logical volumes", as there are multiple of them, not only one.

Comment 51 Anjana KD 2018-10-12 13:13:09 UTC
Niels, thank you for the update.

Comment 53 errata-xmlrpc 2018-10-24 05:57:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2990

