Bug 1785611 - glusterfsd crashes after a few seconds
Summary: glusterfsd crashes after a few seconds
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: core
Version: mainline
Hardware: armv7l
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Xavi Hernandez
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1785323
 
Reported: 2019-12-20 12:51 UTC by Xavi Hernandez
Modified: 2020-01-10 00:58 UTC
CC List: 3 users

Fixed In Version:
Clone Of: 1785323
Environment:
Last Closed: 2020-01-10 00:58:52 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments


Links
Gluster.org Gerrit 23912: multiple: fix bad type cast (Merged, last updated 2020-01-10 00:58:51 UTC)

Description Xavi Hernandez 2019-12-20 12:51:35 UTC
+++ This bug was initially created as a clone of Bug #1785323 +++

Description of problem:
glusterfsd crashes after a few seconds

How reproducible:
After the command "gluster volume start gv0 force", glusterfsd starts but crashes after a few seconds.
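
A minimal sketch of the reproduction, assuming the gv0 volume shown in the transcripts below (the exact delay before the crash varies):

  gluster volume start gv0 force    # brick process (glusterfsd) starts on this node
  sleep 10                          # brick crashes within a few seconds
  gluster volume status gv0         # the local brick then shows Online: N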

Additional info:

OS:		Armbian 5.95 Odroidxu4 Ubuntu bionic default
Kernel:		Linux 4.14.141
Build date:	02.09.2019
Gluster:	7.0
Hardware:	node1 - node4:	Odroid HC2 + WD RED 10TB
		node5:		Odroid HC2 + Samsung SSD 850 EVO 250GB

root@hc2-1:~# systemctl status glusterd
● glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/lib/systemd/system/glusterd.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2019-12-19 13:32:41 CET; 1s ago
     Docs: man:glusterd(8)
  Process: 12734 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, s
 Main PID: 12735 (glusterd)
   CGroup: /system.slice/glusterd.service
           ├─12735 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
           ├─12772 /usr/sbin/glusterfsd -s hc2-1 --volfile-id gv0.hc2-1.data-brick1-gv0 -p /var/run/gluster/vols/gv0/hc2-1-data
           └─12794 /usr/sbin/glusterfs -s localhost --volfile-id shd/gv0 -p /var/run/gluster/shd/gv0/gv0-shd.pid -l /var/log/gl

Dec 19 13:32:37 hc2-1 systemd[1]: Starting GlusterFS, a clustered file-system server...
Dec 19 13:32:41 hc2-1 systemd[1]: Started GlusterFS, a clustered file-system server.


root@hc2-1:~# systemctl status glusterd
● glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/lib/systemd/system/glusterd.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2019-12-19 13:32:41 CET; 15s ago
     Docs: man:glusterd(8)
  Process: 12734 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, s
 Main PID: 12735 (glusterd)
   CGroup: /system.slice/glusterd.service
           ├─12735 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
           └─12794 /usr/sbin/glusterfs -s localhost --volfile-id shd/gv0 -p /var/run/gluster/shd/gv0/gv0-shd.pid -l /var/log/gl

Dec 19 13:32:45 hc2-1 data-brick1-gv0[12772]: dlfcn 1
Dec 19 13:32:45 hc2-1 data-brick1-gv0[12772]: libpthread 1
Dec 19 13:32:45 hc2-1 data-brick1-gv0[12772]: llistxattr 1
Dec 19 13:32:45 hc2-1 data-brick1-gv0[12772]: setfsid 1
Dec 19 13:32:45 hc2-1 data-brick1-gv0[12772]: spinlock 1
Dec 19 13:32:45 hc2-1 data-brick1-gv0[12772]: epoll.h 1
Dec 19 13:32:45 hc2-1 data-brick1-gv0[12772]: xattr.h 1
Dec 19 13:32:45 hc2-1 data-brick1-gv0[12772]: st_atim.tv_nsec 1
Dec 19 13:32:45 hc2-1 data-brick1-gv0[12772]: package-string: glusterfs 7.0
Dec 19 13:32:45 hc2-1 data-brick1-gv0[12772]: ---------
root@hc2-1:~# 


root@hc2-9:~# gluster volume status
Status of volume: gv0
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick hc2-1:/data/brick1/gv0                N/A       N/A        N       N/A  
Brick hc2-2:/data/brick1/gv0                49152     0          Y       1322 
Brick hc2-5:/data/brick1/gv0                49152     0          Y       1767 
Brick hc2-3:/data/brick1/gv0                49152     0          Y       1474 
Brick hc2-4:/data/brick1/gv0                49152     0          Y       1472 
Brick hc2-5:/data/brick2/gv0                49153     0          Y       1787 
Self-heal Daemon on localhost               N/A       N/A        Y       1314 
Self-heal Daemon on hc2-5                   N/A       N/A        Y       1808 
Self-heal Daemon on hc2-3                   N/A       N/A        Y       1485 
Self-heal Daemon on hc2-4                   N/A       N/A        Y       1486 
Self-heal Daemon on hc2-1                   N/A       N/A        Y       13522
Self-heal Daemon on hc2-2                   N/A       N/A        Y       1348 
 
Task Status of Volume gv0
------------------------------------------------------------------------------
There are no active volume tasks



root@hc2-9:~# gluster volume heal gv0 info summary
Brick hc2-1:/data/brick1/gv0
Status: Transport endpoint is not connected
Total Number of entries: -
Number of entries in heal pending: -
Number of entries in split-brain: -
Number of entries possibly healing: -

Brick hc2-2:/data/brick1/gv0
Status: Connected
Total Number of entries: 977
Number of entries in heal pending: 977
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick hc2-5:/data/brick1/gv0
Status: Connected
Total Number of entries: 977
Number of entries in heal pending: 977
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick hc2-3:/data/brick1/gv0
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick hc2-4:/data/brick1/gv0
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick hc2-5:/data/brick2/gv0
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0


root@hc2-9:~# gluster volume info
 
Volume Name: gv0
Type: Distributed-Replicate
Volume ID: 9fcb6792-3899-4802-828f-84f37c026881
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (2 + 1) = 6
Transport-type: tcp
Bricks:
Brick1: hc2-1:/data/brick1/gv0
Brick2: hc2-2:/data/brick1/gv0
Brick3: hc2-5:/data/brick1/gv0 (arbiter)
Brick4: hc2-3:/data/brick1/gv0
Brick5: hc2-4:/data/brick1/gv0
Brick6: hc2-5:/data/brick2/gv0 (arbiter)
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
storage.fips-mode-rchecksum: on
transport.address-family: inet

--- Additional comment from Xavi Hernandez on 2019-12-19 19:22:54 CET ---

Currently I can't test it on an ARM machine. Could you open the coredump with gdb, with symbols loaded, and run this command to get some information about the reason for the crash?

(gdb) t a a bt
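
For reference, "t a a bt" is gdb shorthand for "thread apply all bt", which prints a backtrace of every thread. A minimal sketch of the session, with the binary and core paths taken from the comments below:

  gdb /usr/sbin/glusterfsd /core
  (gdb) thread apply all bt        # long form of "t a a bt"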

--- Additional comment from Robin van Oosten on 2019-12-19 20:23:47 CET ---

I can open the coredump with gdb, but where do I find the symbols file?

gdb /usr/sbin/glusterfs /core
.
.
.
Reading symbols from /usr/sbin/glusterfs...(no debugging symbols found)...done.

--- Additional comment from Robin van Oosten on 2019-12-19 21:35:15 CET ---



--- Additional comment from Robin van Oosten on 2019-12-19 21:39:17 CET ---

After "apt install glusterfs-dbg" I was able to load the symbols file.

Reading symbols from /usr/sbin/glusterfsd...Reading symbols from /usr/lib/debug/.build-id/31/453c4877ad5c7f1a2553147feb1c0816f67654.debug...done.

See attachment 1646676 [details].

--- Additional comment from Xavi Hernandez on 2019-12-19 22:08:39 CET ---

You will also need to install the debug symbols for libc, because gdb doesn't seem able to correctly decode the backtraces inside that library.

--- Additional comment from Robin van Oosten on 2019-12-19 23:35:19 CET ---

Installed libc6-dbg now.

Comment 1 Worker Ant 2019-12-20 13:23:07 UTC
REVIEW: https://review.gluster.org/23912 (multiple: fix bad type cast) posted (#1) for review on master by Xavi Hernandez

Comment 2 Worker Ant 2020-01-10 00:58:52 UTC
REVIEW: https://review.gluster.org/23912 (multiple: fix bad type cast) merged (#3) on master by Amar Tumballi
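
For context, a minimal hypothetical sketch (not the actual GlusterFS code; all names are made up) of how a bad type cast can go unnoticed on 64-bit x86 but corrupt memory and crash on 32-bit armv7l, where pointers and long are only 32 bits wide:

/* Hypothetical illustration of a bad type cast, not the merged fix itself. */
#include <stdint.h>
#include <string.h>
#include <stdio.h>

/* Callee writes a 64-bit result through the pointer it receives. */
static void get_value(void *out)
{
    uint64_t v = 0x1122334455667788ULL;
    memcpy(out, &v, sizeof(v));   /* writes 8 bytes */
}

int main(void)
{
    uint32_t n = 0;               /* only 4 bytes of storage */

    /* Bad cast: &n is handed over as if it pointed to 8 bytes, so the
       extra 4 bytes overwrite whatever sits next to n on the stack.
       On 64-bit machines this often goes unnoticed; on 32-bit armv7l
       it typically corrupts the stack and the process crashes. */
    get_value((void *)&n);

    printf("n = %u\n", (unsigned)n);
    return 0;
}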

