Bug 1623433 - Brick fails to come online after shutting down and restarting a node
Summary: Brick fails to come online after shutting down and restarting a node
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: rhgs-server-container
Version: cns-3.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: OCS 3.11
Assignee: Saravanakumar
QA Contact: Rachael
URL: https://github.com/gluster/gluster-co...
Whiteboard:
Depends On:
Blocks: 1568868 1589277 1610903 1629575
 
Reported: 2018-08-29 10:55 UTC by Rachael
Modified: 2019-02-13 09:26 UTC
CC: 15 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, bricks were not properly mounted in the gluster pod because two lvmetad processes (one in the container and one on the host) competed for the same devices. This prevented certain logical volumes from being detected or made available, which resulted in failed brick mounts. With this fix, the container no longer relies on the lvmetad service and ensures it is not started in the container, so a single metadata daemon process runs and manages the devices and logical volumes.
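The fix described above corresponds to disabling lvmetad in the container's LVM configuration. A minimal sketch, assuming the container ships its own /etc/lvm/lvm.conf (the exact path and surrounding settings in the shipped image are assumptions):

```ini
# /etc/lvm/lvm.conf inside the container (sketch).
# With use_lvmetad = 0, LVM commands scan devices directly instead of
# querying the lvmetad daemon, so no lvmetad process needs to run in
# the container and it cannot race with the host's daemon.
global {
    use_lvmetad = 0
}
```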
Clone Of:
: 1623465 (view as bug list)
Environment:
Last Closed: 2018-10-24 05:57:39 UTC
Target Upstream Version:




Links
System ID Priority Status Summary Last Updated
Github gluster gluster-containers pull 104 None None None 2018-10-02 09:42:47 UTC
Red Hat Bugzilla 1536511 None CLOSED Gluster pod with 850 volumes fails to come up after node reboot 2019-10-09 16:30:42 UTC
Red Hat Bugzilla 1589277 None None None 2019-10-09 16:30:42 UTC
Red Hat Bugzilla 1623438 None VERIFIED [Tracking-RHGS-BZ#1623874] IO errors on block device post rebooting one brick node 2019-10-09 16:30:43 UTC
Red Hat Bugzilla 1627104 None CLOSED cant deploy gluster with crio because LVM commands fail 2019-10-09 16:30:42 UTC
Red Hat Bugzilla 1656724 None NEW Heketi brick is in N/A state post upgrading the OCS setup with rhgs-server and volmanager image 2019-10-09 16:30:43 UTC
Red Hat Product Errata RHBA-2018:2990 None None None 2018-10-24 05:59:07 UTC


Comment 6 Raghavendra Talur 2018-08-29 13:41:02 UTC
I think I found one problem with the mount script. Please look at the output of 

sh-4.2# cat /var/log/glusterfs/container/mountfstab 
mount: special device /dev/mapper/vg_8377c3ad7380bc5110664247eff77dc6-brick_2de9b07bd586eeae37a378737b137c96 does not exist
mount command exited with code 32


Now, I am able to mount the same device using the command:

mount -a --fstab /var/lib/heketi/fstab


We need to debug this further. At a minimum, we need to ensure that the pod does not start at all if /etc/systemd/system/gluster-setup.service fails.
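The last point could be expressed with systemd unit ordering inside the container image; a sketch, where the drop-in path and the glusterd unit name are assumptions about the image layout:

```ini
# Hypothetical drop-in, e.g.
# /etc/systemd/system/glusterd.service.d/10-require-setup.conf
# Requires= plus After= makes glusterd refuse to start when
# gluster-setup.service exits non-zero, so the pod fails visibly
# instead of coming up with unmounted bricks.
[Unit]
Requires=gluster-setup.service
After=gluster-setup.service
```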

Comment 14 Humble Chirammal 2018-08-30 14:26:20 UTC
The upstream fix is PR https://github.com/gluster/gluster-containers/pull/103

Comment 16 Humble Chirammal 2018-08-31 11:51:15 UTC
Interestingly, the issue is always seen for bricks that are part of a BHV (block-hosting volume).

Comment 31 Humble Chirammal 2018-09-05 08:15:22 UTC
Saravana, can you please confirm or update the workaround mentioned in comment 29?

Comment 36 Humble Chirammal 2018-09-07 08:18:34 UTC
Workaround (thanks to Atin++ and Saravana++):


Mount the brick(s):
# mount -a --fstab /var/lib/heketi/fstab

Start the corresponding volume:
# gluster volume start <volume name> force
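The two workaround steps could be wrapped in a small helper for nodes with several affected volumes; a sketch, where the function name recover_bricks and the volume-list argument are illustrative, not part of the product:

```shell
#!/bin/bash
# Sketch of the workaround above: remount all heketi-managed bricks,
# then force-start each affected volume. Intended to run inside the
# gluster pod; pass the volumes whose bricks are shown offline in
# `gluster volume status`.
recover_bricks() {
    # Mount every brick listed in heketi's copy of the fstab.
    mount -a --fstab /var/lib/heketi/fstab || return 1
    # Force-start each volume so glusterd respawns the brick processes.
    local vol
    for vol in "$@"; do
        gluster volume start "$vol" force
    done
}
```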

Comment 43 Michael Adam 2018-09-20 20:48:29 UTC
The explanation is very plausible; let's treat the image from
https://bugzilla.redhat.com/show_bug.cgi?id=1536511#c21
as a proposed patch for this problem.

Comment 44 Michael Adam 2018-09-20 20:54:01 UTC
proposing for 3.11.0

Comment 49 Anjana KD 2018-10-12 09:46:54 UTC
Updated the doc text; kindly review it for technical accuracy.

Comment 50 Niels de Vos 2018-10-12 11:05:09 UTC
Looks good to me, thanks! I just changed the last words to "logical volumes", as there are multiple of them, not only one.

Comment 51 Anjana KD 2018-10-12 13:13:09 UTC
Niels, thank you for the update.

Comment 53 errata-xmlrpc 2018-10-24 05:57:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2990

