Bug 1794260 - Missing brick directory while deploying RHHI-V
Summary: Missing brick directory while deploying RHHI-V
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: gluster-ansible
Version: rhgs-3.5
Hardware: x86_64
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: RHGS 3.5.z Batch Update 1
Assignee: Gobinda Das
QA Contact: SATHEESARAN
URL:
Whiteboard:
Depends On:
Blocks: 1787001
 
Reported: 2020-01-23 05:56 UTC by Gobinda Das
Modified: 2020-02-05 09:37 UTC
CC: 8 users

Fixed In Version: gluster-ansible-infra-1.0.4-5
Doc Type: Bug Fix
Doc Text:
During RHHI-V re-deployment, systemd silently unmounted some of the brick filesystems while still returning success, so the brick was created directly on the root filesystem at /gluster_bricks/engine/engine. A later reload of systemd mounted the XFS filesystem back, at which point /gluster_bricks/engine became an empty mount point and the already running glusterfsd (brick) process was killed. With this update, the systemd daemon is reloaded before the bricks are mounted, ensuring that the bricks are available.
Clone Of: 1787001
Environment:
Last Closed: 2020-01-30 06:45:36 UTC
Embargoed:




Links
Github gluster/gluster-ansible-infra pull 77 (closed): Reload systemctl daemon before mounting disks - last updated 2020-07-13 07:23:21 UTC
Red Hat Product Errata RHBA-2020:0289 - last updated 2020-01-30 06:45:42 UTC

Description Gobinda Das 2020-01-23 05:56:39 UTC
Description of problem:
During hosted engine deployment with non-4K devices, the brick is killed and the directory '/gluster_bricks/engine/engine/' is missing.

Version-Release number of selected component (if applicable):
RHGS 3.5.1 (6.0-25)
RHVH-4.3.8


Steps to Reproduce:
Step 1: Complete the gluster deployment with the engine brick on a 4K device and the other brick on a VDO 4K device.
Step 2: Complete the hosted engine deployment.

Actual results:
The directory /gluster_bricks/engine/engine/ is missing and the brick is not online.

Expected results:

The directory "/gluster_bricks/engine/engine/" should be present, and no error should be shown on the terminal.

--- Additional comment from RHEL Program Management on 2019-12-30 09:15:15 UTC ---

This bug is automatically being proposed for the RHHI-V 1.7 release of the Red Hat Hyperconverged Infrastructure for Virtualization product, by setting the release flag 'rhiv-1.7' to '?'.


This issue happens because of a RHEL 7 systemd bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1494014

So this issue primarily happens in the redeployment scenario.
If someone unmounts /gluster_bricks/engine and, in the attempt to redeploy,
tries to mount the engine brick again at the same location - /gluster_bricks/engine -
then systemd silently mounts and unmounts the brick, with rc=0.

This is evident from:
<snip>
Jan 22 08:19:24 rhsqa-grafton10-nic2 systemd: Unit gluster_bricks-engine.mount is bound to inactive unit dev-disk-by\x2duuid-41e87589\x2d905e\x2d4fea\x2daef2\x2dd4157129add3.device. Stopping, too.
Jan 22 08:19:24 rhsqa-grafton10-nic2 systemd: Stopped Migrate local SELinux policy changes from the old store structure to the new structure.
Jan 22 08:19:24 rhsqa-grafton10-nic2 systemd: Stopped target Local File Systems.
Jan 22 08:19:24 rhsqa-grafton10-nic2 systemd: Unmounting /gluster_bricks/engine...
Jan 22 08:19:24 rhsqa-grafton10-nic2 kernel: XFS (dm-13): Unmounting Filesystem
Jan 22 08:19:24 rhsqa-grafton10-nic2 systemd: Unmounted /gluster_bricks/engine.
Jan 22 08:19:30 rhsqa-grafton10-nic2 python: ansible-mount Invoked with src=UUID=7702205b-2c11-49b9-8a0e-fb3bbe6ed5db dump=None boot=True fstab=None passno=None fstype=xfs state=mounted path=/gluster_bricks/test backup=False opts=inode64,noatime,nodiratime
Jan 22 08:19:30 rhsqa-grafton10-nic2 kernel: XFS (dm-24): Mounting V5 Filesystem
Jan 22 08:19:30 rhsqa-grafton10-nic2 kernel: XFS (dm-24): Ending clean mount
Jan 22 08:19:30 rhsqa-grafton10-nic2 systemd: Unit gluster_bricks-test.mount is bound to inactive unit dev-disk-by\x2duuid-17f47f26\x2d61d8\x2d40f3\x2db47e\x2dd56512794a7c.device. Stopping, too.
</snip>

In this case, remember that /gluster_bricks/engine is not actually mounted,
but Ansible is unaware of this. When the gluster volume is created, the brick is created
directly on the root filesystem at /gluster_bricks/engine/engine.
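For context, the brick mount step corresponds to an Ansible mount task along the lines of the sketch below. This is reconstructed from the ansible-mount invocation in the log above (the UUID and path are the ones logged for /gluster_bricks/test), not copied from the gluster-ansible-infra role itself:

<snip>
# Sketch of the brick mount step, reconstructed from the ansible-mount log entry
# above. On an affected RHEL 7 host, systemd can silently undo this mount even
# though the task itself reports success (rc=0).
- name: Mount the gluster brick filesystem
  mount:
    path: /gluster_bricks/test
    src: UUID=7702205b-2c11-49b9-8a0e-fb3bbe6ed5db
    fstype: xfs
    opts: inode64,noatime,nodiratime
    state: mounted
</snip>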

Now, when the deployment reaches the hosted engine (HE) deployment, some step there calls
'systemctl daemon-reload', which causes the mount point to appear out of nowhere and get
mounted over the existing contents. This is evident from:

<snip>
Jan 22 08:22:26 rhsqa-grafton10-nic2 systemd: gluster_bricks-test.mount: Directory /gluster_bricks/test to mount over is not empty, mounting anyway.
Jan 22 08:22:26 rhsqa-grafton10-nic2 systemd: Mounting /gluster_bricks/test...
Jan 22 08:22:26 rhsqa-grafton10-nic2 systemd: gluster_bricks-engine.mount: Directory /gluster_bricks/engine to mount over is not empty, mounting anyway.
Jan 22 08:22:26 rhsqa-grafton10-nic2 systemd: Mounting /gluster_bricks/engine...
Jan 22 08:22:26 rhsqa-grafton10-nic2 kernel: XFS (dm-24): Mounting V5 Filesystem
Jan 22 08:22:26 rhsqa-grafton10-nic2 kernel: XFS (dm-13): Mounting V5 Filesystem
Jan 22 08:22:26 rhsqa-grafton10-nic2 kernel: XFS (dm-24): Ending clean mount
Jan 22 08:22:26 rhsqa-grafton10-nic2 kernel: XFS (dm-13): Ending clean mount
Jan 22 08:22:26 rhsqa-grafton10-nic2 systemd: Mounted /gluster_bricks/engine.
Jan 22 08:22:26 rhsqa-grafton10-nic2 systemd: Mounted /gluster_bricks/test.
</snip>

So /gluster_bricks/engine now becomes the mount point and it is empty. This is how the 'engine' directory under /gluster_bricks/engine disappears.
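On a node in this state, the condition can be confirmed with tasks like the following illustrative sketch (not part of the deployment; the paths are the ones from this report):

<snip>
# Illustrative check: confirm the brick path is an active mount point and that
# the brick directory still exists underneath it.
- name: Check whether /gluster_bricks/engine is an active mount point
  command: findmnt /gluster_bricks/engine
  register: engine_mount
  changed_when: false
  failed_when: false

- name: Check whether the brick directory still exists
  stat:
    path: /gluster_bricks/engine/engine
  register: engine_brick_dir

- name: Report the condition described in this bug
  debug:
    msg: "Brick path is mounted but /gluster_bricks/engine/engine is missing"
  when: engine_mount.rc == 0 and not engine_brick_dir.stat.exists
</snip>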


The workaround from that bug is to run 'systemctl daemon-reload' before mounting the bricks.
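In Ansible terms, which is how gluster-ansible-infra applies it, the ordering looks roughly like the sketch below. This is a minimal illustration of the approach, not the literal change from the linked pull request, and the UUID is a placeholder:

<snip>
# Minimal sketch of the fix ordering: reload the systemd manager first so that
# stale .mount units bound to inactive device units are regenerated, then mount.
- name: Reload systemd manager configuration
  systemd:
    daemon_reload: yes

- name: Mount the gluster brick filesystem
  mount:
    path: /gluster_bricks/engine
    src: UUID=<brick-filesystem-uuid>   # placeholder for the per-host UUID
    fstype: xfs
    opts: inode64,noatime,nodiratime
    state: mounted
</snip>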


Treating this bug as a BLOCKER, as it may affect customers when redeploying.

Comment 3 SATHEESARAN 2020-01-28 07:28:06 UTC
Verified with RHVH 4.3.8 + gluster-ansible-infra-1.0.4-5.el7rhgs

1. Created the gluster deployment as part of RHHI-V deployment.
2. After the gluster deployment completed, cleaned up the gluster deployment using the same inventory file.
3. Re-created the gluster deployment.
4. Made sure all the bricks were mounted.

Repeated steps 1-4 many times and made sure that all the bricks (XFS) are mounted and available after gluster deployment/configuration.
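A post-deployment check of this kind can be expressed along the following lines (an illustrative sketch; the brick paths are the ones used in this report):

<snip>
# Illustrative post-deployment check: fail if any expected brick path is not an
# active mount point after (re)deployment.
- name: Verify that every gluster brick filesystem is mounted
  command: findmnt --mountpoint {{ item }}
  changed_when: false
  loop:
    - /gluster_bricks/engine
    - /gluster_bricks/test
</snip>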

Comment 4 Anjana KD 2020-01-29 17:33:52 UTC
Have updated the doc text, kindly verify the updated doc text field.

Comment 6 errata-xmlrpc 2020-01-30 06:45:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0289

Comment 7 SATHEESARAN 2020-02-05 09:37:53 UTC
(In reply to Anjana KD from comment #4)
> Have updated the doc text, kindly verify the updated doc text field.

yes, that looks good

