Bug 1605056 - [RHHi] Mount hung and not accessible
Summary: [RHHi] Mount hung and not accessible
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: sharding
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: Krutika Dhananjay
QA Contact: bugs@gluster.org
URL:
Whiteboard:
Depends On: 1603118
Blocks: 1641440
 
Reported: 2018-07-20 05:15 UTC by Krutika Dhananjay
Modified: 2019-03-25 16:30 UTC
CC List: 7 users

Fixed In Version: glusterfs-6.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1603118
Clones: 1641440
Environment:
Last Closed: 2018-10-23 15:14:52 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Krutika Dhananjay 2018-07-20 05:15:49 UTC
+++ This bug was initially created as a clone of Bug #1603118 +++

Description of problem:

One of the hosts in the ovirt-gluster hyperconverged cluster is in a non-operational state. The following error is seen in the vdsm logs:

2018-07-18 18:02:08,353+0530 WARN  (itmap/1) [storage.scanDomains] Could not collect metadata file for domain path /rhev/data-center/mnt/glusterSD/rhsdev-grafton2.lab.eng.blr.redhat.com:_vmstore (fileSD:845)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/fileSD.py", line 834, in collectMetaFiles
    metaFiles = oop.getProcessPool(client_name).glob.glob(mdPattern)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/outOfProcess.py", line 107, in glob
    return self._iop.glob(pattern)
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 560, in glob
    return self._sendCommand("glob", {"pattern": pattern}, self.timeout)
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 451, in _sendCommand
    raise OSError(errcode, errstr)
OSError: [Errno 11] Resource temporarily unavailable

There are no errors in the vmstore mount logs; however, the mount is hung and /rhev/data-center/mnt/glusterSD/rhsdev-grafton2.lab.eng.blr.redhat.com:_vmstore cannot be accessed.
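A quick way to confirm the hang from the affected host (rough sketch; the path is the one from the log above and the timeout value is arbitrary):

# A hung fuse mount blocks even a bare stat of the mountpoint; wrapping it in
# timeout makes the hang visible instead of leaving the shell stuck. If even
# the timeout never returns, the stat is stuck in the kernel, which also
# points to a hung mount.
timeout 10 stat /rhev/data-center/mnt/glusterSD/rhsdev-grafton2.lab.eng.blr.redhat.com:_vmstore \
    || echo "mount appears hung (or stat failed)"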

Version-Release number of selected component (if applicable):
glusterfs-3.8.4-54.12

How reproducible:
This was seen on a running environment.

Steps to Reproduce:
NA

--- Additional comment from Red Hat Bugzilla Rules Engine on 2018-07-19 04:33:28 EDT ---

This bug is automatically being proposed for the release of Red Hat Gluster Storage 3 under active development and open for bug fixes, by setting the release flag 'rhgs-3.4.0' to '?'.

If this bug should be proposed for a different release, please manually change the proposed release flag.

--- Additional comment from Krutika Dhananjay on 2018-07-19 12:07:20 EDT ---

Found the RCA.

For this bug to be hit, the LRU list in shard needs to become full. Its size is 16K. That means at least 16K shards should have been accessed from a glusterfs/RHHI mount.

(How do you know when you've hit that mark and can stop generating more data when you're testing? Take the statedump of the mount. Under section "[features/shard.$VOLNAME-shard]" you should see the following line "inode-count=16384")
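A rough sketch of that check, assuming $VOLNAME is set to your volume name, that it appears on the fuse client's command line, and the default statedump directory /var/run/gluster (adjust for your setup):

# Find the fuse client pid for the mount and ask it to write a statedump.
MNT_PID=$(pgrep -f "glusterfs.*$VOLNAME" | head -n 1)
kill -USR1 "$MNT_PID"     # SIGUSR1 makes a gluster process dump its state
sleep 2
# The dump lands in /var/run/gluster by default; check the shard section.
grep -A 5 'features/shard' /var/run/gluster/glusterdump."$MNT_PID".dump.* | grep inode-count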

Next, a vm should have been migrated from the host where it was created/used for a while to another host.

Next, delete the vm from the new host.

Access this vm from the old host. You should get "No such file or directory".
(I don't know if RHV accessed this file from the first host triggering destruction ("forget") of the base inode of this vm. But it is also quite likely in this case that fuse forced the destruction of the inode upon seeing high memory pressure, and at some point this now invalid pointer is accessed leading to strange behavior - like a crash or a hang).

Now perform some more io on one of the other vms that the first host might be managing. At some point, your client will either crash or hang.

======================================================
Here is a simpler way to hit the bug on your non-RHHI (even single node) setup:

1. Create a replica 3 volume and start it.
2. Enable shard on it. And set shard-block-size to 4MB.
3. Create 2 fuse mounts - $M1 and $M2.
4. From $M1, create a 65GB size file (use dd maybe).
(Why 64GB? To hit the lru limit, you need 16K shards. That's 16K * 4MB = 64GB. This is where setting shard-block size to 4MB helps for the purpose of this test. With default 64MB size, more time and space will be needed to recreate the issue since now a 1TB image will need to be created to hit the bug).

5. Read that file entirely from $M2. (use cat maybe).
6. Delete the file from $M1.
7. Stat the file from $M2. (Should fail with "No such file or directory").
8. Now start dd on a second file from $M2.

Mount process associated with $M2 must crash soon.
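The same steps as a rough script, in case that helps (single-node sketch; the volume name, brick paths and mountpoints are made up for illustration):

# 1-3. Replica 3 volume with sharding at 4MB, started and fuse-mounted twice.
V=shardtest; H=$(hostname)
mkdir -p /bricks/b{1,2,3} /mnt/m1 /mnt/m2
gluster volume create $V replica 3 $H:/bricks/b1 $H:/bricks/b2 $H:/bricks/b3 force
gluster volume set $V features.shard enable
gluster volume set $V features.shard-block-size 4MB
gluster volume start $V
mount -t glusterfs $H:/$V /mnt/m1
mount -t glusterfs $H:/$V /mnt/m2

# 4. Create a 64GB file from $M1 (16K shards at 4MB each).
dd if=/dev/zero of=/mnt/m1/vm.img bs=1M count=65536

# 5. Read it fully from $M2 so its shards fill $M2's lru list.
cat /mnt/m2/vm.img > /dev/null

# 6-7. Delete from $M1, then stat from $M2 (expected to fail with ENOENT).
rm -f /mnt/m1/vm.img
stat /mnt/m2/vm.img

# 8. More I/O from $M2; on affected builds the $M2 mount process crashes or hangs.
dd if=/dev/zero of=/mnt/m2/second.img bs=1M count=1024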

-Krutika

--- Additional comment from Krutika Dhananjay on 2018-07-19 12:09:50 EDT ---

(In reply to Krutika Dhananjay from comment #2)
> Found the RCA.
> 
> For this bug to be hit, the LRU list in shard needs to become full. Its size
> is 16K. That means at least 16K shards should have been accessed from a
> glusterfs/RHHI mount.
> 
> (How do you know when you've hit that mark and can stop generating more data
> when you're testing? Take the statedump of the mount. Under section
> "[features/shard.$VOLNAME-shard]" you should see the following line
> "inode-count=16384")
> 
> Next, a vm should have been migrated from the host where it was created/used
> for a while to another host.
> 
> Next, delete the vm from the new host.
> 
> Access this vm from the old host. You should get "No such file or directory".
> (I don't know if RHV accessed this file from the first host triggering
> destruction ("forget") of the base inode of this vm. But it is also quite
> likely in this case that fuse forced the destruction of the inode upon
> seeing high memory pressure, and at some point this now invalid pointer is
> accessed leading to strange behavior - like a crash or a hang).
> 
> Now perform some more io on one of the other vms that the first host might
> be managing. At some point, your client will either crash or hang.
> 
> ======================================================
> Here is a simpler way to hit the bug on your non-RHHI (even single node)
> setup:
> 
> 1. Create a replica 3 volume and start it.
> 2. Enable shard on it. And set shard-block-size to 4MB.
> 3. Create 2 fuse mounts - $M1 and $M2.
> 4. From $M1, create a 65GB size file (use dd maybe).

Sorry, typo. This should be 64GB (although the bug is reproducible with a 65GB file too!)

-Krutika

> (Why 64GB? To hit the lru limit, you need 16K shards. That's 16K * 4MB =
> 64GB. This is where setting shard-block size to 4MB helps for the purpose of
> this test. With default 64MB size, more time and space will be needed to
> recreate the issue since now a 1TB image will need to be created to hit the
> bug).
> 
> 5. Read that file entirely from $M2. (use cat maybe).
> 6. Delete the file from $M1.
> 7. Stat the file from $M2. (Should fail with "No such file or directory").
> 8. Now start dd on a second file from $M2.
> 
> Mount process associated with $M2 must crash soon.
> 
> -Krutika

Comment 1 Worker Ant 2018-07-23 07:33:01 UTC
COMMIT: https://review.gluster.org/20544 committed in master by "Krutika Dhananjay" <kdhananj> with a commit message- features/shard: Make lru limit of inode list configurable

Currently this lru limit is hard-coded to 16384. This patch makes it
configurable to make it easier to hit the lru limit and enable testing
of different cases that arise when the limit is reached.

The option is features.shard-lru-limit. It is by design allowed to
be configured only in init() but not in reconfigure(). This is to avoid
all the complexity associated with eviction of least recently used shards
when the list is shrunk.

Change-Id: Ifdcc2099f634314fafe8444e2d676e192e89e295
updates: bz#1605056
Signed-off-by: Krutika Dhananjay <kdhananj>
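For example, to make the limit easy to hit while testing (rough sketch; the volume name and the limit value of 25 are illustrative, and since the option is read only in init() it should be set before the client mounts; check 'gluster volume set help' for the allowed range):

# With a small lru limit and 4MB shards, a ~100MB file already fills the list,
# instead of the 64GB needed with the default limit of 16384.
gluster volume set $V features.shard-lru-limit 25
gluster volume set $V features.shard-block-size 4MB
mount -t glusterfs $H:/$V /mnt/m1      # mount (or remount) after setting the option
dd if=/dev/zero of=/mnt/m1/test.img bs=1M count=128    # 128MB > 25 * 4MB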

Comment 2 Worker Ant 2018-07-23 07:43:43 UTC
REVIEW: https://review.gluster.org/20550 (features/shard: Hold a ref on base inode when adding a shard to lru list) posted (#1) for review on master by Krutika Dhananjay

Comment 3 Worker Ant 2018-10-16 03:38:29 UTC
COMMIT: https://review.gluster.org/20550 committed in master by "Krutika Dhananjay" <kdhananj> with a commit message- features/shard: Hold a ref on base inode when adding a shard to lru list

In __shard_update_shards_inode_list(), previously shard translator
was not holding a ref on the base inode whenever a shard was added to
the lru list. But if the base shard is forgotten and destroyed either
by fuse due to memory pressure or due to the file being deleted at some
point by a different client with this client still containing stale
shards in its lru list, the client would crash at the time of locking
lru_base_inode->lock owing to illegal memory access.

So now the base shard is ref'd into the inode ctx of every shard that
is added to lru list until it gets lru'd out.

The patch also handles the case where none of the shards associated
with a file that is about to be deleted are part of the LRU list and
where an unlink at the beginning of the operation destroys the base
inode (because there are no refkeepers) and hence all of the shards
that are about to be deleted will be resolved without the existence
of a base shard in-memory. This, if not handled properly, could lead
to a crash.

Change-Id: Ic15ca41444dd04684a9458bd4a526b1d3e160499
updates: bz#1605056
Signed-off-by: Krutika Dhananjay <kdhananj>
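On a build with this patch (and, for speed, a low features.shard-lru-limit as per the earlier patch), re-running the tail of the repro in the description above should leave the second mount healthy. A rough check, with the hypothetical paths from the earlier sketch:

# The stat still fails with ENOENT, but the follow-up I/O completes and the
# mount stays responsive instead of crashing or hanging.
stat /mnt/m2/vm.img || echo "ENOENT as expected"
dd if=/dev/zero of=/mnt/m2/second.img bs=1M count=1024
ls /mnt/m2 > /dev/null && echo "mount still responsive"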

Comment 4 Shyamsundar 2018-10-23 15:14:52 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-5.0, please open a new bug report.

glusterfs-5.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2018-October/000115.html
[2] https://www.gluster.org/pipermail/gluster-users/

Comment 5 Shyamsundar 2019-03-25 16:30:33 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-6.0, please open a new bug report.

glusterfs-6.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2019-March/000120.html
[2] https://www.gluster.org/pipermail/gluster-users/

