Bug 1556895

Summary: [RHHI] Fuse mount crashed with only one VM running with its image on that volume
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: SATHEESARAN <sasundar>
Component: sharding Assignee: Pranith Kumar K <pkarampu>
Status: CLOSED ERRATA QA Contact: SATHEESARAN <sasundar>
Severity: high Docs Contact:
Priority: unspecified    
Version: rhhiv-1.5 CC: nbalacha, pkarampu, rhs-bugs, sabose, sankarshan, sasundar, sheggodu, storage-qa-internal
Target Milestone: ---   
Target Release: RHGS 3.4.0   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: glusterfs-3.12.2-6 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1556891
Clones: 1557876 1585044 1585046 Environment:
Last Closed: 2018-09-04 06:44:14 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1503137, 1556891, 1559831, 1585044, 1585046    
Attachments:
fuse-mount-log (flags: none)
sosreport (flags: none)
coredump-file (flags: none)

Description SATHEESARAN 2018-03-15 13:08:46 UTC
+++ This bug was initially created as a clone of Bug #1556891 +++

Description of problem:
-----------------------
In a single-node RHHI installation, the Hosted Engine VM is the only VM running, with its image on the engine gluster volume. After some time, the fuse mount crashed.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
RHV 4.2.2-4
RHGS 3.4.0 - glusterfs-3.12.2-5.el7rhgs (interim build)

How reproducible:
-----------------
Hit it once

Steps to Reproduce:
-------------------
1. Create a distribute volume (1x1) and fuse mount it (see the command sketch after this list)
2. Run the Hosted Engine VM with its image on that gluster volume
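
For reference, a minimal sketch of the equivalent commands, assuming the brick host and path from comment 2; the mount point /mnt/engine is illustrative only:

gluster volume create engine 10.70.36.244:/gluster_bricks/engine/engine   # 1x1 (single-brick) distribute volume
gluster volume start engine
mount -t glusterfs 10.70.36.244:/engine /mnt/engine                       # fuse mount on the hypervisor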

Actual results:
---------------
Observed fuse mount crash

Expected results:
-----------------
Fuse mount should not crash

Comment 1 Nithya Balachandran 2018-03-15 14:06:20 UTC
Is there a coredump or backtrace for this crash? Do the symbols indicate that the crash was in dht?

Comment 2 SATHEESARAN 2018-03-15 14:45:50 UTC
The Engine VM is running on the plain distribute volume with sharding enabled and shard-block-size set to 64MB

1. Cluster info
----------------
There is only one node in the cluster

2. Volume info
---------------
[root@ ]# gluster volume info engine
 
Volume Name: engine
Type: Distribute
Volume ID: 17806a7c-64fb-4a9f-a313-f4e99df6231c
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: 10.70.36.244:/gluster_bricks/engine/engine
Options Reconfigured:
auth.ssl-allow: rhsqa-grafton10.lab.eng.blr.redhat.com,rhsqa-grafton11.lab.eng.blr.redhat.com,rhsqa-grafton12.lab.eng.blr.redhat.com
client.ssl: on
server.ssl: on
cluster.eager-lock: enable
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
user.cifs: off
network.ping-timeout: 30
network.remote-dio: off
performance.strict-o-direct: on
performance.low-prio-threads: 32
features.shard: on
storage.owner-gid: 36
storage.owner-uid: 36
transport.address-family: inet
nfs.disable: on

[root@ ]# gluster volume status engine
Status of volume: engine
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.36.244:/gluster_bricks/engine/e
ngine                                       49152     0          Y       49791
 
Task Status of Volume engine
------------------------------------------------------------------------------
There are no active volume tasks

3. Other information
---------------------
1. Gluster encryption is enabled on both the management and data paths
2. Sharding is enabled on this volume with shard-block-size set to 64MB (see the option sketch below)
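
For reference, a hedged sketch of how point 2 and the data-path encryption map to volume options (matching the "Options Reconfigured" output above; features.shard-block-size is not listed there because 64MB is the gluster default):

gluster volume set engine features.shard on
gluster volume set engine features.shard-block-size 64MB
gluster volume set engine client.ssl on     # data-path encryption, client side
gluster volume set engine server.ssl on     # data-path encryption, server side

Management-path encryption (point 1) is enabled separately on each node via the glusterd secure-access file rather than through a volume option.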

Comment 3 SATHEESARAN 2018-03-15 14:47:29 UTC
(In reply to Nithya Balachandran from comment #1)
> Is there a coredump or backtrace for this crash? Do they symbols indicate
> that the crash was in dht?

Nithya, 

You were quick to respond, before I could provide all the details.
Thanks for the follow-up. I will update the required info one by one.

Comment 4 SATHEESARAN 2018-03-15 14:50:47 UTC
[2018-03-14 18:37:46.880095] I [MSGID: 109066] [dht-rename.c:1741:dht_rename] 0-engine-dht: renaming /6e464c6f-1f1a-45b7-a7a7-8faf7a88e155/master/tasks/bbd1fada-3cf8-42ba-8440-9a93990c37d9.temp (hash=engine-client-0/cache=engine-client-0) => /6e464c6f-1f1a-45b7-a7a7-8faf7a88e155/master/tasks/bbd1fada-3cf8-42ba-8440-9a93990c37d9 (hash=engine-client-0/cache=<nul>)
The message "I [MSGID: 109066] [dht-rename.c:1741:dht_rename] 0-engine-dht: renaming /6e464c6f-1f1a-45b7-a7a7-8faf7a88e155/master/tasks/591f5eb6-8451-48c9-8654-2ab3b69a2fb2 (hash=engine-client-0/cache=engine-client-0) => /6e464c6f-1f1a-45b7-a7a7-8faf7a88e155/master/tasks/591f5eb6-8451-48c9-8654-2ab3b69a2fb2.backup (hash=engine-client-0/cache=<nul>)" repeated 9 times between [2018-03-14 18:37:35.920700] and [2018-03-14 18:37:36.339496]
The message "I [MSGID: 109066] [dht-rename.c:1741:dht_rename] 0-engine-dht: renaming /6e464c6f-1f1a-45b7-a7a7-8faf7a88e155/master/tasks/591f5eb6-8451-48c9-8654-2ab3b69a2fb2.temp (hash=engine-client-0/cache=engine-client-0) => /6e464c6f-1f1a-45b7-a7a7-8faf7a88e155/master/tasks/591f5eb6-8451-48c9-8654-2ab3b69a2fb2 (hash=engine-client-0/cache=<nul>)" repeated 9 times between [2018-03-14 18:37:35.923482] and [2018-03-14 18:37:36.342288]
The message "I [MSGID: 109066] [dht-rename.c:1741:dht_rename] 0-engine-dht: renaming /6e464c6f-1f1a-45b7-a7a7-8faf7a88e155/master/tasks/02dc32cb-7692-453a-bd12-be617103c229 (hash=engine-client-0/cache=engine-client-0) => /6e464c6f-1f1a-45b7-a7a7-8faf7a88e155/master/tasks/02dc32cb-7692-453a-bd12-be617103c229.backup (hash=engine-client-0/cache=<nul>)" repeated 9 times between [2018-03-14 18:37:36.162693] and [2018-03-14 18:37:36.575512]
The message "I [MSGID: 109066] [dht-rename.c:1741:dht_rename] 0-engine-dht: renaming /6e464c6f-1f1a-45b7-a7a7-8faf7a88e155/master/tasks/02dc32cb-7692-453a-bd12-be617103c229.temp (hash=engine-client-0/cache=engine-client-0) => /6e464c6f-1f1a-45b7-a7a7-8faf7a88e155/master/tasks/02dc32cb-7692-453a-bd12-be617103c229 (hash=engine-client-0/cache=<nul>)" repeated 9 times between [2018-03-14 18:37:36.165706] and [2018-03-14 18:37:36.578852]
pending frames:
frame : type(1) op(FSYNC)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash: 
2018-03-15 01:31:32
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.12.2
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xa0)[0x7f496c6163f0]
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f496c620334]
/lib64/libc.so.6(+0x36280)[0x7f496ac75280]
[0x7f494402b148]
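
A hedged sketch of how the attached coredump (comment 7) could be inspected to answer comment 1; the core file path is illustrative and the debuginfo package names are the usual RHGS ones:

debuginfo-install glusterfs glusterfs-fuse    # pull in symbols so the unresolved frame above can be decoded
gdb /usr/sbin/glusterfs /path/to/coredump-file
(gdb) bt                    # backtrace of the crashing thread
(gdb) thread apply all bt   # all threads, to correlate with the pending FSYNC frame listed above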

Comment 5 SATHEESARAN 2018-03-15 14:51:42 UTC
Created attachment 1408453 [details]
fuse-mount-log

Comment 6 SATHEESARAN 2018-03-15 15:13:36 UTC
Created attachment 1408470 [details]
sosreport

Comment 7 SATHEESARAN 2018-03-15 15:14:10 UTC
Created attachment 1408471 [details]
coredump-file

Comment 8 SATHEESARAN 2018-03-15 15:20:36 UTC
The other piece of information missed in comment 2 is that the brick is created on a VDO volume.
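
A hedged sketch of how the VDO layer under the brick can be confirmed; the VDO volume name vdo_engine is illustrative only:

lsblk                               # shows the device stack under /gluster_bricks/engine
vdostats --human-readable           # space usage and savings for all VDO volumes
vdo status --name vdo_engine        # detailed status of the VDO volume backing the brick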

Comment 18 SATHEESARAN 2018-04-29 10:35:34 UTC
Tested with RHV 4.2 & RHGS 3.4.0 nightly (3.12.2-8)

This issue is not seen with the steps in comment 0.

Comment 19 Sunil Kumar Acharya 2018-06-12 09:18:36 UTC
*** Bug 1585044 has been marked as a duplicate of this bug. ***

Comment 21 errata-xmlrpc 2018-09-04 06:44:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607