Bug 1556895 - [RHHI] Fuse mount crashed with only one VM running with its image on that volume
Summary: [RHHI] Fuse mount crashed with only one VM running with its image on that volume
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: sharding
Version: rhhiv-1.5
Hardware: x86_64
OS: Linux
Target Milestone: ---
Target Release: RHGS 3.4.0
Assignee: Pranith Kumar K
Duplicates: 1585044 (view as bug list)
Depends On:
Blocks: 1503137 1556891 1559831 1585044 1585046
Reported: 2018-03-15 13:08 UTC by SATHEESARAN
Modified: 2018-09-04 06:45 UTC
CC List: 8 users

Fixed In Version: glusterfs-3.12.2-6
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1556891
Clones: 1557876 1585044 1585046 (view as bug list)
Last Closed: 2018-09-04 06:44:14 UTC
Target Upstream Version:

Attachments
fuse-mount-log (132.94 KB, text/plain)
2018-03-15 14:51 UTC, SATHEESARAN
sosreport (15.90 MB, application/x-xz)
2018-03-15 15:13 UTC, SATHEESARAN
coredump-file (1.06 MB, application/x-gzip)
2018-03-15 15:14 UTC, SATHEESARAN

Links:
Red Hat Product Errata RHSA-2018:2607 (last updated 2018-09-04 06:45:42 UTC)

Description SATHEESARAN 2018-03-15 13:08:46 UTC
+++ This bug was initially created as a clone of Bug #1556891 +++

Description of problem:
With a single-node RHHI installation, the Hosted Engine VM is the only VM running, with its image on the gluster 'engine' volume. After some time, the fuse mount crashed.

Version-Release number of selected component (if applicable):
RHV 4.2.2-4
RHGS 3.4.0 - glusterfs-3.12.2-5.el7rhgs (interim build)

How reproducible:
Hit it once

Steps to Reproduce:
1. Create a distribute volume (1x1) and fuse-mount it
2. Run the Hosted Engine VM with its image on the gluster volume (see the sketch below)
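
A minimal shell sketch of that setup, with the sharding settings described later in comment 2; the host name, brick path, and mount point are placeholders:

# 1x1 plain distribute volume on a single brick (host/path are placeholders)
gluster volume create engine host1:/gluster_bricks/engine/engine
# Sharding with 64MB shards, matching the configuration in comment 2
gluster volume set engine features.shard on
gluster volume set engine features.shard-block-size 64MB
gluster volume start engine
# Fuse-mount; the Hosted Engine VM disk image lives under this mount point
mount -t glusterfs host1:/engine /mnt/engine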

Actual results:
Observed fuse mount crash

Expected results:
Fuse mount should not crash

Comment 1 Nithya Balachandran 2018-03-15 14:06:20 UTC
Is there a coredump or backtrace for this crash? Do the symbols indicate that the crash was in dht?

Comment 2 SATHEESARAN 2018-03-15 14:45:50 UTC
The Engine VM is running on the plain distribute volume, with sharding enabled and shard-block-size set to 64MB.

1. Cluster info
There is only one node in the cluster

2. Volume info
[root@ ]# gluster volume info engine
Volume Name: engine
Type: Distribute
Volume ID: 17806a7c-64fb-4a9f-a313-f4e99df6231c
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Options Reconfigured:
auth.ssl-allow: rhsqa-grafton10.lab.eng.blr.redhat.com,rhsqa-grafton11.lab.eng.blr.redhat.com,rhsqa-grafton12.lab.eng.blr.redhat.com
client.ssl: on
server.ssl: on
cluster.eager-lock: enable
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
user.cifs: off
network.ping-timeout: 30
network.remote-dio: off
performance.strict-o-direct: on
performance.low-prio-threads: 32
features.shard: on
storage.owner-gid: 36
storage.owner-uid: 36
transport.address-family: inet
nfs.disable: on

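Most of the options above correspond to the RHGS 'virt' group profile used for VM image stores; a sketch of how such a configuration is usually applied (36:36 is the vdsm:kvm owner that RHV expects):

# Apply the virt option group shipped with RHGS (/var/lib/glusterd/groups/virt)
gluster volume set engine group virt
# Hand the volume root to vdsm:kvm (uid/gid 36), as RHV requires
gluster volume set engine storage.owner-uid 36
gluster volume set engine storage.owner-gid 36
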
[root@ ]# gluster volume status engine
Status of volume: engine
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
ngine                                       49152     0          Y       49791

Task Status of Volume engine
------------------------------------------------------------------------------
There are no active volume tasks

3. Other information
1. Gluster encryption (SSL) is enabled on both the management and data paths
2. Sharding is enabled on this volume, with shard-block-size set to 64MB
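
Those two settings can be verified directly on a live system; a sketch using the volume name from above:

# Confirm sharding is enabled and check the configured shard size
gluster volume get engine features.shard
gluster volume get engine features.shard-block-size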

Comment 3 SATHEESARAN 2018-03-15 14:47:29 UTC
(In reply to Nithya Balachandran from comment #1)
> Is there a coredump or backtrace for this crash? Do they symbols indicate
> that the crash was in dht?


You were too quick to respond, before I could provide all the details.
Thanks for the follow-up. I will update the required info one by one.

Comment 4 SATHEESARAN 2018-03-15 14:50:47 UTC
[2018-03-14 18:37:46.880095] I [MSGID: 109066] [dht-rename.c:1741:dht_rename] 0-engine-dht: renaming /6e464c6f-1f1a-45b7-a7a7-8faf7a88e155/master/tasks/bbd1fada-3cf8-42ba-8440-9a93990c37d9.temp (hash=engine-client-0/cache=engine-client-0) => /6e464c6f-1f1a-45b7-a7a7-8faf7a88e155/master/tasks/bbd1fada-3cf8-42ba-8440-9a93990c37d9 (hash=engine-client-0/cache=<nul>)
The message "I [MSGID: 109066] [dht-rename.c:1741:dht_rename] 0-engine-dht: renaming /6e464c6f-1f1a-45b7-a7a7-8faf7a88e155/master/tasks/591f5eb6-8451-48c9-8654-2ab3b69a2fb2 (hash=engine-client-0/cache=engine-client-0) => /6e464c6f-1f1a-45b7-a7a7-8faf7a88e155/master/tasks/591f5eb6-8451-48c9-8654-2ab3b69a2fb2.backup (hash=engine-client-0/cache=<nul>)" repeated 9 times between [2018-03-14 18:37:35.920700] and [2018-03-14 18:37:36.339496]
The message "I [MSGID: 109066] [dht-rename.c:1741:dht_rename] 0-engine-dht: renaming /6e464c6f-1f1a-45b7-a7a7-8faf7a88e155/master/tasks/591f5eb6-8451-48c9-8654-2ab3b69a2fb2.temp (hash=engine-client-0/cache=engine-client-0) => /6e464c6f-1f1a-45b7-a7a7-8faf7a88e155/master/tasks/591f5eb6-8451-48c9-8654-2ab3b69a2fb2 (hash=engine-client-0/cache=<nul>)" repeated 9 times between [2018-03-14 18:37:35.923482] and [2018-03-14 18:37:36.342288]
The message "I [MSGID: 109066] [dht-rename.c:1741:dht_rename] 0-engine-dht: renaming /6e464c6f-1f1a-45b7-a7a7-8faf7a88e155/master/tasks/02dc32cb-7692-453a-bd12-be617103c229 (hash=engine-client-0/cache=engine-client-0) => /6e464c6f-1f1a-45b7-a7a7-8faf7a88e155/master/tasks/02dc32cb-7692-453a-bd12-be617103c229.backup (hash=engine-client-0/cache=<nul>)" repeated 9 times between [2018-03-14 18:37:36.162693] and [2018-03-14 18:37:36.575512]
The message "I [MSGID: 109066] [dht-rename.c:1741:dht_rename] 0-engine-dht: renaming /6e464c6f-1f1a-45b7-a7a7-8faf7a88e155/master/tasks/02dc32cb-7692-453a-bd12-be617103c229.temp (hash=engine-client-0/cache=engine-client-0) => /6e464c6f-1f1a-45b7-a7a7-8faf7a88e155/master/tasks/02dc32cb-7692-453a-bd12-be617103c229 (hash=engine-client-0/cache=<nul>)" repeated 9 times between [2018-03-14 18:37:36.165706] and [2018-03-14 18:37:36.578852]
pending frames:
frame : type(1) op(FSYNC)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash: 
2018-03-15 01:31:32
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.12.2
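
For reference, picking up the question in comment 1: a minimal sketch of pulling a backtrace out of the coredump attached in comment 7. The binary path is the standard one for a glusterfs fuse client; the core filename is a placeholder:

# Install debug symbols for the glusterfs packages (RHEL/RHGS host)
debuginfo-install -y glusterfs glusterfs-fuse
# Load the core against the fuse client binary and dump every thread's stack
gdb /usr/sbin/glusterfs /path/to/core.<pid>
(gdb) thread apply all bt full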

Comment 5 SATHEESARAN 2018-03-15 14:51:42 UTC
Created attachment 1408453 [details]
fuse-mount-log

Comment 6 SATHEESARAN 2018-03-15 15:13:36 UTC
Created attachment 1408470 [details]
sosreport

Comment 7 SATHEESARAN 2018-03-15 15:14:10 UTC
Created attachment 1408471 [details]
coredump-file

Comment 8 SATHEESARAN 2018-03-15 15:20:36 UTC
One other detail missed in comment 2: the brick is created over a VDO volume.
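
To capture the state of that layer as well, the VDO tooling can be queried; a sketch, assuming the standard vdo/vdostats utilities are installed on the host:

# Report configuration and health of all VDO volumes on the host
vdo status
# Show physical/logical usage and space savings per VDO volume
vdostats --human-readable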

Comment 18 SATHEESARAN 2018-04-29 10:35:34 UTC
Tested with RHV 4.2 & RHGS 3.4.0 nightly (3.12.2-8)

This issue is not seen with the steps in comment 0.

Comment 19 Sunil Kumar Acharya 2018-06-12 09:18:36 UTC
*** Bug 1585044 has been marked as a duplicate of this bug. ***

Comment 21 errata-xmlrpc 2018-09-04 06:44:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

