Bug 1556895

Summary: [RHHI] Fuse mount crashed with only one VM running with its image on that volume
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: SATHEESARAN <sasundar>
Component: sharding Assignee: Pranith Kumar K <pkarampu>
Status: CLOSED ERRATA QA Contact: SATHEESARAN <sasundar>
Severity: high Docs Contact:
Priority: unspecified    
Version: rhhiv-1.5 CC: nbalacha, pkarampu, rhs-bugs, sabose, sankarshan, sasundar, sheggodu, storage-qa-internal
Target Milestone: ---   
Target Release: RHGS 3.4.0   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: glusterfs-3.12.2-6 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1556891
Clones: 1557876 1585044 1585046 Environment:
Last Closed: 2018-09-04 06:44:14 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1503137, 1556891, 1559831, 1585044, 1585046    
Attachments:
fuse-mount-log (flags: none)
sosreport (flags: none)
coredump-file (flags: none)

Description SATHEESARAN 2018-03-15 13:08:46 UTC
+++ This bug was initially created as a clone of Bug #1556891 +++

Description of problem:
-----------------------
In a single-node RHHI installation, the Hosted Engine VM is the only VM running, with its image on the engine gluster volume. After some time, the fuse mount crashed.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
RHV 4.2.2-4
RHGS 3.4.0 - glusterfs-3.12.2-5.el7rhgs (interim build)

How reproducible:
-----------------
Hit it once

Steps to Reproduce:
-------------------
1. Create a distribute volume (1x1) and fuse mount it (see the command sketch after this list)
2. Run the Hosted Engine VM with its image on that gluster volume
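
For reference, a minimal sketch of the equivalent commands, assuming the brick host and path from comment 2; the mount point /mnt/engine is illustrative only:

gluster volume create engine 10.70.36.244:/gluster_bricks/engine/engine   # 1x1 (single-brick) distribute volume
gluster volume start engine
mount -t glusterfs 10.70.36.244:/engine /mnt/engine                       # fuse mount on the hypervisor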

Actual results:
---------------
Observed fuse mount crash

Expected results:
-----------------
Fuse mount should not crash

Comment 1 Nithya Balachandran 2018-03-15 14:06:20 UTC
Is there a coredump or backtrace for this crash? Do the symbols indicate that the crash was in dht?

Comment 2 SATHEESARAN 2018-03-15 14:45:50 UTC
The Engine VM is running on the plain distribute volume with sharding enabled and shard-block-size set to 64MB

1. Cluster info
----------------
There is only one node in the cluster

2. Volume info
---------------
[root@ ]# gluster volume info engine
 
Volume Name: engine
Type: Distribute
Volume ID: 17806a7c-64fb-4a9f-a313-f4e99df6231c
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: 10.70.36.244:/gluster_bricks/engine/engine
Options Reconfigured:
auth.ssl-allow: rhsqa-grafton10.lab.eng.blr.redhat.com,rhsqa-grafton11.lab.eng.blr.redhat.com,rhsqa-grafton12.lab.eng.blr.redhat.com
client.ssl: on
server.ssl: on
cluster.eager-lock: enable
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
user.cifs: off
network.ping-timeout: 30
network.remote-dio: off
performance.strict-o-direct: on
performance.low-prio-threads: 32
features.shard: on
storage.owner-gid: 36
storage.owner-uid: 36
transport.address-family: inet
nfs.disable: on

[root@ ]# gluster volume status engine
Status of volume: engine
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.36.244:/gluster_bricks/engine/e
ngine                                       49152     0          Y       49791
 
Task Status of Volume engine
------------------------------------------------------------------------------
There are no active volume tasks

3. Other information
---------------------
1. Gluster encryption is enabled on both the management and data paths
2. Sharding is enabled on this volume with shard-block-size set to 64MB (see the option sketch below)
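
For reference, a hedged sketch of how point 2 and the data-path encryption map to volume options (matching the "Options Reconfigured" output above; features.shard-block-size is not listed there because 64MB is the gluster default):

gluster volume set engine features.shard on
gluster volume set engine features.shard-block-size 64MB
gluster volume set engine client.ssl on     # data-path encryption, client side
gluster volume set engine server.ssl on     # data-path encryption, server side

Management-path encryption (point 1) is enabled separately on each node via the glusterd secure-access file rather than through a volume option.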

Comment 3 SATHEESARAN 2018-03-15 14:47:29 UTC
(In reply to Nithya Balachandran from comment #1)
> Is there a coredump or backtrace for this crash? Do they symbols indicate
> that the crash was in dht?

Nithya, 

You were quick to respond, before I could provide all the details.
Thanks for the follow-up. I will update the required info one by one.

Comment 4 SATHEESARAN 2018-03-15 14:50:47 UTC
[2018-03-14 18:37:46.880095] I [MSGID: 109066] [dht-rename.c:1741:dht_rename] 0-engine-dht: renaming /6e464c6f-1f1a-45b7-a7a7-8faf7a88e155/master/tasks/bbd1fada-3cf8-42ba-8440-9a93990c37d9.temp (hash=engine-client-0/cache=engine-client-0) => /6e464c6f-1f1a-45b7-a7a7-8faf7a88e155/master/tasks/bbd1fada-3cf8-42ba-8440-9a93990c37d9 (hash=engine-client-0/cache=<nul>)
The message "I [MSGID: 109066] [dht-rename.c:1741:dht_rename] 0-engine-dht: renaming /6e464c6f-1f1a-45b7-a7a7-8faf7a88e155/master/tasks/591f5eb6-8451-48c9-8654-2ab3b69a2fb2 (hash=engine-client-0/cache=engine-client-0) => /6e464c6f-1f1a-45b7-a7a7-8faf7a88e155/master/tasks/591f5eb6-8451-48c9-8654-2ab3b69a2fb2.backup (hash=engine-client-0/cache=<nul>)" repeated 9 times between [2018-03-14 18:37:35.920700] and [2018-03-14 18:37:36.339496]
The message "I [MSGID: 109066] [dht-rename.c:1741:dht_rename] 0-engine-dht: renaming /6e464c6f-1f1a-45b7-a7a7-8faf7a88e155/master/tasks/591f5eb6-8451-48c9-8654-2ab3b69a2fb2.temp (hash=engine-client-0/cache=engine-client-0) => /6e464c6f-1f1a-45b7-a7a7-8faf7a88e155/master/tasks/591f5eb6-8451-48c9-8654-2ab3b69a2fb2 (hash=engine-client-0/cache=<nul>)" repeated 9 times between [2018-03-14 18:37:35.923482] and [2018-03-14 18:37:36.342288]
The message "I [MSGID: 109066] [dht-rename.c:1741:dht_rename] 0-engine-dht: renaming /6e464c6f-1f1a-45b7-a7a7-8faf7a88e155/master/tasks/02dc32cb-7692-453a-bd12-be617103c229 (hash=engine-client-0/cache=engine-client-0) => /6e464c6f-1f1a-45b7-a7a7-8faf7a88e155/master/tasks/02dc32cb-7692-453a-bd12-be617103c229.backup (hash=engine-client-0/cache=<nul>)" repeated 9 times between [2018-03-14 18:37:36.162693] and [2018-03-14 18:37:36.575512]
The message "I [MSGID: 109066] [dht-rename.c:1741:dht_rename] 0-engine-dht: renaming /6e464c6f-1f1a-45b7-a7a7-8faf7a88e155/master/tasks/02dc32cb-7692-453a-bd12-be617103c229.temp (hash=engine-client-0/cache=engine-client-0) => /6e464c6f-1f1a-45b7-a7a7-8faf7a88e155/master/tasks/02dc32cb-7692-453a-bd12-be617103c229 (hash=engine-client-0/cache=<nul>)" repeated 9 times between [2018-03-14 18:37:36.165706] and [2018-03-14 18:37:36.578852]
pending frames:
frame : type(1) op(FSYNC)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash: 
2018-03-15 01:31:32
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.12.2
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xa0)[0x7f496c6163f0]
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f496c620334]
/lib64/libc.so.6(+0x36280)[0x7f496ac75280]
[0x7f494402b148]
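
A hedged sketch of how the attached coredump (comment 7) could be inspected to answer comment 1; the core file path is illustrative and the debuginfo package names are the usual RHGS ones:

debuginfo-install glusterfs glusterfs-fuse    # pull in symbols so the unresolved frame above can be decoded
gdb /usr/sbin/glusterfs /path/to/coredump-file
(gdb) bt                    # backtrace of the crashing thread
(gdb) thread apply all bt   # all threads, to correlate with the pending FSYNC frame listed above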

Comment 5 SATHEESARAN 2018-03-15 14:51:42 UTC
Created attachment 1408453 [details]
fuse-mount-log

Comment 6 SATHEESARAN 2018-03-15 15:13:36 UTC
Created attachment 1408470 [details]
sosreport

Comment 7 SATHEESARAN 2018-03-15 15:14:10 UTC
Created attachment 1408471 [details]
coredump-file

Comment 8 SATHEESARAN 2018-03-15 15:20:36 UTC
The other piece of information missed in comment 2 is that the brick is created on a VDO volume.
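
A hedged sketch of how the VDO layer under the brick can be confirmed; the VDO volume name vdo_engine is illustrative only:

lsblk                               # shows the device stack under /gluster_bricks/engine
vdostats --human-readable           # space usage and savings for all VDO volumes
vdo status --name vdo_engine        # detailed status of the VDO volume backing the brick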

Comment 18 SATHEESARAN 2018-04-29 10:35:34 UTC
Tested with RHV 4.2 & RHGS 3.4.0 nightly (3.12.2-8)

This issue is not seen with the steps in comment 0.

Comment 19 Sunil Kumar Acharya 2018-06-12 09:18:36 UTC
*** Bug 1585044 has been marked as a duplicate of this bug. ***

Comment 21 errata-xmlrpc 2018-09-04 06:44:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607