Bug 1785611

Summary: glusterfsd crashes after a few seconds
Product: [Community] GlusterFS
Component: core
Version: mainline
Hardware: armv7l
OS: Linux
Status: CLOSED NEXTRELEASE
Severity: medium
Priority: unspecified
Reporter: Xavi Hernandez <jahernan>
Assignee: Xavi Hernandez <jahernan>
CC: bugs, jahernan, robin.van.oosten
Target Milestone: ---
Target Release: ---
Clone Of: 1785323
Bug Blocks: 1785323
Last Closed: 2020-01-10 00:58:52 UTC

Description Xavi Hernandez 2019-12-20 12:51:35 UTC
+++ This bug was initially created as a clone of Bug #1785323 +++

Description of problem:
glusterfsd crashes after a few seconds

How reproducible:
After running "gluster volume start gv0 force", glusterfsd is started, but it crashes after a few seconds.

Additional info:

OS:		Armbian 5.95 Odroidxu4 Ubuntu bionic default
Kernel:		Linux 4.14.141
Build date:	02.09.2019
Gluster:	7.0
Hardware:	node1 - node4:	Odroid HC2 + WD RED 10TB
		node5:		Odroid HC2 + Samsung SSD 850 EVO 250GB

root@hc2-1:~# systemctl status glusterd
● glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/lib/systemd/system/glusterd.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2019-12-19 13:32:41 CET; 1s ago
     Docs: man:glusterd(8)
  Process: 12734 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, s
 Main PID: 12735 (glusterd)
   CGroup: /system.slice/glusterd.service
           ├─12735 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
           ├─12772 /usr/sbin/glusterfsd -s hc2-1 --volfile-id gv0.hc2-1.data-brick1-gv0 -p /var/run/gluster/vols/gv0/hc2-1-data
           └─12794 /usr/sbin/glusterfs -s localhost --volfile-id shd/gv0 -p /var/run/gluster/shd/gv0/gv0-shd.pid -l /var/log/gl

Dec 19 13:32:37 hc2-1 systemd[1]: Starting GlusterFS, a clustered file-system server...
Dec 19 13:32:41 hc2-1 systemd[1]: Started GlusterFS, a clustered file-system server.


root@hc2-1:~# systemctl status glusterd
● glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/lib/systemd/system/glusterd.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2019-12-19 13:32:41 CET; 15s ago
     Docs: man:glusterd(8)
  Process: 12734 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, s
 Main PID: 12735 (glusterd)
   CGroup: /system.slice/glusterd.service
           ├─12735 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
           └─12794 /usr/sbin/glusterfs -s localhost --volfile-id shd/gv0 -p /var/run/gluster/shd/gv0/gv0-shd.pid -l /var/log/gl

Dec 19 13:32:45 hc2-1 data-brick1-gv0[12772]: dlfcn 1
Dec 19 13:32:45 hc2-1 data-brick1-gv0[12772]: libpthread 1
Dec 19 13:32:45 hc2-1 data-brick1-gv0[12772]: llistxattr 1
Dec 19 13:32:45 hc2-1 data-brick1-gv0[12772]: setfsid 1
Dec 19 13:32:45 hc2-1 data-brick1-gv0[12772]: spinlock 1
Dec 19 13:32:45 hc2-1 data-brick1-gv0[12772]: epoll.h 1
Dec 19 13:32:45 hc2-1 data-brick1-gv0[12772]: xattr.h 1
Dec 19 13:32:45 hc2-1 data-brick1-gv0[12772]: st_atim.tv_nsec 1
Dec 19 13:32:45 hc2-1 data-brick1-gv0[12772]: package-string: glusterfs 7.0
Dec 19 13:32:45 hc2-1 data-brick1-gv0[12772]: ---------
root@hc2-1:~# 


root@hc2-9:~# gluster volume status
Status of volume: gv0
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick hc2-1:/data/brick1/gv0                N/A       N/A        N       N/A  
Brick hc2-2:/data/brick1/gv0                49152     0          Y       1322 
Brick hc2-5:/data/brick1/gv0                49152     0          Y       1767 
Brick hc2-3:/data/brick1/gv0                49152     0          Y       1474 
Brick hc2-4:/data/brick1/gv0                49152     0          Y       1472 
Brick hc2-5:/data/brick2/gv0                49153     0          Y       1787 
Self-heal Daemon on localhost               N/A       N/A        Y       1314 
Self-heal Daemon on hc2-5                   N/A       N/A        Y       1808 
Self-heal Daemon on hc2-3                   N/A       N/A        Y       1485 
Self-heal Daemon on hc2-4                   N/A       N/A        Y       1486 
Self-heal Daemon on hc2-1                   N/A       N/A        Y       13522
Self-heal Daemon on hc2-2                   N/A       N/A        Y       1348 
 
Task Status of Volume gv0
------------------------------------------------------------------------------
There are no active volume tasks



root@hc2-9:~# gluster volume heal gv0 info summary
Brick hc2-1:/data/brick1/gv0
Status: Transport endpoint is not connected
Total Number of entries: -
Number of entries in heal pending: -
Number of entries in split-brain: -
Number of entries possibly healing: -

Brick hc2-2:/data/brick1/gv0
Status: Connected
Total Number of entries: 977
Number of entries in heal pending: 977
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick hc2-5:/data/brick1/gv0
Status: Connected
Total Number of entries: 977
Number of entries in heal pending: 977
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick hc2-3:/data/brick1/gv0
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick hc2-4:/data/brick1/gv0
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick hc2-5:/data/brick2/gv0
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0


root@hc2-9:~# gluster volume info
 
Volume Name: gv0
Type: Distributed-Replicate
Volume ID: 9fcb6792-3899-4802-828f-84f37c026881
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (2 + 1) = 6
Transport-type: tcp
Bricks:
Brick1: hc2-1:/data/brick1/gv0
Brick2: hc2-2:/data/brick1/gv0
Brick3: hc2-5:/data/brick1/gv0 (arbiter)
Brick4: hc2-3:/data/brick1/gv0
Brick5: hc2-4:/data/brick1/gv0
Brick6: hc2-5:/data/brick2/gv0 (arbiter)
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
storage.fips-mode-rchecksum: on
transport.address-family: inet

--- Additional comment from Xavi Hernandez on 2019-12-19 19:22:54 CET ---

Currently I can't test it on an ARM machine. Is it possible for you to open the core dump with gdb, with symbols loaded, and run this command (shorthand for "thread apply all bt", which prints a backtrace of every thread) to get some information about the reason for the crash?

(gdb) t a a bt

--- Additional comment from Robin van Oosten on 2019-12-19 20:23:47 CET ---

I can open the core dump with gdb, but where do I find the symbols file?

gdb /usr/sbin/glusterfs /core
.
.
.
Reading symbols from /usr/sbin/glusterfs...(no debugging symbols found)...done.

--- Additional comment from Robin van Oosten on 2019-12-19 21:35:15 CET ---



--- Additional comment from Robin van Oosten on 2019-12-19 21:39:17 CET ---

After "apt install glusterfs-dbg" I was able to load the symbols file.

Reading symbols from /usr/sbin/glusterfsd...Reading symbols from /usr/lib/debug/.build-id/31/453c4877ad5c7f1a2553147feb1c0816f67654.debug...done.

See attachment 1646676.

--- Additional comment from Xavi Hernandez on 2019-12-19 22:08:39 CET ---

You will also need to install the debug symbols for libc, because gdb doesn't seem able to correctly decode the backtrace frames inside that library.

--- Additional comment from Robin van Oosten on 2019-12-19 23:35:19 CET ---

Installed libc6-dbg now.

Comment 1 Worker Ant 2019-12-20 13:23:07 UTC
REVIEW: https://review.gluster.org/23912 (multiple: fix bad type cast) posted (#1) for review on master by Xavi Hernandez

Comment 2 Worker Ant 2020-01-10 00:58:52 UTC
REVIEW: https://review.gluster.org/23912 (multiple: fix bad type cast) merged (#3) on master by Amar Tumballi
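
The merged patch is only identified above by its title, "multiple: fix bad type cast"; the diff itself is not quoted in this report. As a rough, hypothetical illustration (not taken from the GlusterFS sources) of why this class of bug can go unnoticed on x86_64 yet crash a 32-bit armv7l brick process shortly after start, consider a cast that lies about the width of a value:

/* Hypothetical sketch only -- not the actual GlusterFS code or patch. */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* The callee legitimately expects a pointer to a 64-bit slot... */
static void store_value(uint64_t *slot)
{
    *slot = UINT64_C(0x1122334455667788);
}

int main(void)
{
    /* ...but the caller only has 32 bits of storage and silences the
     * compiler with a cast. The 64-bit store writes 4 bytes past 'small',
     * which is undefined behaviour: depending on stack layout, alignment
     * rules and ABI, one architecture may appear to work while another
     * (for example armv7l) corrupts adjacent memory or faults. */
    uint32_t small = 0;

    store_value((uint64_t *)&small);   /* bad type cast */

    printf("small = 0x%" PRIx32 "\n", small);
    return 0;
}

Whether such misbehaviour is visible immediately or only after some activity depends on what happens to sit next to the mis-sized object, which would be consistent with a brick daemon that starts cleanly and then crashes a few seconds later.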