+++ This bug was initially created as a clone of Bug #1785323 +++

Description of problem:

glusterfsd crashes a few seconds after being started.

How reproducible:

After the command "gluster volume start gv0 force", glusterfsd is started but crashes after a few seconds.

Additional info:

OS: Armbian 5.95 Odroidxu4 Ubuntu bionic default
Kernel: Linux 4.14.141
Build date: 02.09.2019
Gluster: 7.0

Hardware:
node1 - node4: Odroid HC2 + WD RED 10TB
node5: Odroid HC2 + Samsung SSD 850 EVO 250GB

root@hc2-1:~# systemctl status glusterd
● glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/lib/systemd/system/glusterd.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2019-12-19 13:32:41 CET; 1s ago
     Docs: man:glusterd(8)
  Process: 12734 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, s
 Main PID: 12735 (glusterd)
   CGroup: /system.slice/glusterd.service
           ├─12735 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
           ├─12772 /usr/sbin/glusterfsd -s hc2-1 --volfile-id gv0.hc2-1.data-brick1-gv0 -p /var/run/gluster/vols/gv0/hc2-1-data
           └─12794 /usr/sbin/glusterfs -s localhost --volfile-id shd/gv0 -p /var/run/gluster/shd/gv0/gv0-shd.pid -l /var/log/gl

Dec 19 13:32:37 hc2-1 systemd[1]: Starting GlusterFS, a clustered file-system server...
Dec 19 13:32:41 hc2-1 systemd[1]: Started GlusterFS, a clustered file-system server.

root@hc2-1:~# systemctl status glusterd
● glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/lib/systemd/system/glusterd.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2019-12-19 13:32:41 CET; 15s ago
     Docs: man:glusterd(8)
  Process: 12734 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, s
 Main PID: 12735 (glusterd)
   CGroup: /system.slice/glusterd.service
           ├─12735 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
           └─12794 /usr/sbin/glusterfs -s localhost --volfile-id shd/gv0 -p /var/run/gluster/shd/gv0/gv0-shd.pid -l /var/log/gl

Dec 19 13:32:45 hc2-1 data-brick1-gv0[12772]: dlfcn 1
Dec 19 13:32:45 hc2-1 data-brick1-gv0[12772]: libpthread 1
Dec 19 13:32:45 hc2-1 data-brick1-gv0[12772]: llistxattr 1
Dec 19 13:32:45 hc2-1 data-brick1-gv0[12772]: setfsid 1
Dec 19 13:32:45 hc2-1 data-brick1-gv0[12772]: spinlock 1
Dec 19 13:32:45 hc2-1 data-brick1-gv0[12772]: epoll.h 1
Dec 19 13:32:45 hc2-1 data-brick1-gv0[12772]: xattr.h 1
Dec 19 13:32:45 hc2-1 data-brick1-gv0[12772]: st_atim.tv_nsec 1
Dec 19 13:32:45 hc2-1 data-brick1-gv0[12772]: package-string: glusterfs 7.0
Dec 19 13:32:45 hc2-1 data-brick1-gv0[12772]: ---------
root@hc2-1:~#

root@hc2-9:~# gluster volume status
Status of volume: gv0
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick hc2-1:/data/brick1/gv0                N/A       N/A        N       N/A
Brick hc2-2:/data/brick1/gv0                49152     0          Y       1322
Brick hc2-5:/data/brick1/gv0                49152     0          Y       1767
Brick hc2-3:/data/brick1/gv0                49152     0          Y       1474
Brick hc2-4:/data/brick1/gv0                49152     0          Y       1472
Brick hc2-5:/data/brick2/gv0                49153     0          Y       1787
Self-heal Daemon on localhost               N/A       N/A        Y       1314
Self-heal Daemon on hc2-5                   N/A       N/A        Y       1808
Self-heal Daemon on hc2-3                   N/A       N/A        Y       1485
Self-heal Daemon on hc2-4                   N/A       N/A        Y       1486
Self-heal Daemon on hc2-1                   N/A       N/A        Y       13522
Self-heal Daemon on hc2-2                   N/A       N/A        Y       1348

Task Status of Volume gv0
------------------------------------------------------------------------------
There are no active volume tasks
root@hc2-9:~# gluster volume heal gv0 info summary
Brick hc2-1:/data/brick1/gv0
Status: Transport endpoint is not connected
Total Number of entries: -
Number of entries in heal pending: -
Number of entries in split-brain: -
Number of entries possibly healing: -

Brick hc2-2:/data/brick1/gv0
Status: Connected
Total Number of entries: 977
Number of entries in heal pending: 977
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick hc2-5:/data/brick1/gv0
Status: Connected
Total Number of entries: 977
Number of entries in heal pending: 977
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick hc2-3:/data/brick1/gv0
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick hc2-4:/data/brick1/gv0
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick hc2-5:/data/brick2/gv0
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

root@hc2-9:~# gluster volume info

Volume Name: gv0
Type: Distributed-Replicate
Volume ID: 9fcb6792-3899-4802-828f-84f37c026881
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (2 + 1) = 6
Transport-type: tcp
Bricks:
Brick1: hc2-1:/data/brick1/gv0
Brick2: hc2-2:/data/brick1/gv0
Brick3: hc2-5:/data/brick1/gv0 (arbiter)
Brick4: hc2-3:/data/brick1/gv0
Brick5: hc2-4:/data/brick1/gv0
Brick6: hc2-5:/data/brick2/gv0 (arbiter)
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
storage.fips-mode-rchecksum: on
transport.address-family: inet

--- Additional comment from Xavi Hernandez on 2019-12-19 19:22:54 CET ---

Currently I can't test this on an ARM machine. Could you open the coredump with gdb, with symbols loaded, and run this command to get some information about the cause of the crash?

(gdb) t a a bt

--- Additional comment from Robin van Oosten on 2019-12-19 20:23:47 CET ---

I can open the coredump with gdb, but where do I find the symbols file?

gdb /usr/sbin/glusterfs /core
.
.
.
Reading symbols from /usr/sbin/glusterfs...(no debugging symbols found)...done.

--- Additional comment from Robin van Oosten on 2019-12-19 21:35:15 CET ---

--- Additional comment from Robin van Oosten on 2019-12-19 21:39:17 CET ---

After "apt install glusterfs-dbg" I was able to load the symbols file.

Reading symbols from /usr/sbin/glusterfsd...Reading symbols from /usr/lib/debug/.build-id/31/453c4877ad5c7f1a2553147feb1c0816f67654.debug...done.

See attachment 1646676 [details].

--- Additional comment from Xavi Hernandez on 2019-12-19 22:08:39 CET ---

You will also need to install the debug symbols for libc, because gdb doesn't seem able to correctly decode the backtraces inside that library.

--- Additional comment from Robin van Oosten on 2019-12-19 23:35:19 CET ---

Installed libc6-dbg now.
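For anyone hitting the same crash, the debugging steps worked out in the comments above boil down to the following session (a sketch assuming the Ubuntu/Debian package names mentioned earlier and a core file written to the current directory; paths and package names may differ on other setups):

root@hc2-1:~# apt install glusterfs-dbg libc6-dbg
root@hc2-1:~# gdb /usr/sbin/glusterfsd /core
(gdb) t a a bt

In gdb, "t a a bt" abbreviates "thread apply all backtrace", which prints the stack of every thread in the coredump; with both sets of debug symbols installed, the faulting frame can then be matched against the glusterfs source.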
REVIEW: https://review.gluster.org/23912 (multiple: fix bad type cast) posted (#1) for review on master by Xavi Hernandez
REVIEW: https://review.gluster.org/23912 (multiple: fix bad type cast) merged (#3) on master by Amar Tumballi
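For context on what a "bad type cast" of this kind can look like, here is a minimal C sketch of the bug class (hypothetical code for illustration only, not the actual GlusterFS source): writing through a pointer cast to a wider type than the object it points to is undefined behavior, and on a 32-bit platform such as the Odroid HC2's ARM SoC it is likely to corrupt adjacent stack memory and crash, while often going unnoticed on 64-bit x86.

#include <stdint.h>

/* Hypothetical illustration: a caller hands out the address of a
 * 32-bit variable, but the callee treats it as a pointer to a
 * 64-bit value. */
static void get_value(void *out)
{
    /* BUG: writes 8 bytes into a 4-byte object, clobbering
     * whatever lives next to it on the stack. */
    *(uint64_t *)out = 42;
}

int main(void)
{
    uint32_t v = 0;
    get_value(&v);     /* undefined behavior: out-of-bounds write */

    /* Safe pattern: give the variable the pointee type the callee
     * expects, then narrow explicitly afterwards. */
    uint64_t v64 = 0;
    get_value(&v64);
    v = (uint32_t)v64; /* explicit, in-bounds conversion */
    return (int)v;
}

This would explain why the crash reproduced on the reporter's ARM nodes while Xavi could not easily test it elsewhere: the same out-of-bounds write can be harmlessly absorbed by stack-slot padding on one architecture and fatal on another.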