Description of problem:

A glusterfs brick crashes with a segfault caused by a broken gfid symlink.

# gdb /usr/sbin/glusterfsd core.12867
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterfsd -s glusterv3-04.namecheapcloud.net --volfile-id easywp_pro'.
Program terminated with signal 11, Segmentation fault.
#0  __strftime_internal (s=0x7f85e44863f0 "", maxsize=256, format=0x7f861bb39dfb "%F %T", tp=0x7f85e44863b0, tzset_called=tzset_called@entry=0x7f85e4486320, loc=0xce4bbb31a32ca014) at strftime_l.c:472
472       struct __locale_data *const current = loc->__locales[LC_TIME];
Missing separate debuginfos, use: debuginfo-install glusterfs-fuse-4.1.7-1.el7.x86_64
(gdb) bt
#0  __strftime_internal (s=0x7f85e44863f0 "", maxsize=256, format=0x7f861bb39dfb "%F %T", tp=0x7f85e44863b0, tzset_called=tzset_called@entry=0x7f85e4486320, loc=0xce4bbb31a32ca014) at strftime_l.c:472
#1  0x00007f861a172423 in __GI___strftime_l (s=<optimized out>, maxsize=<optimized out>, format=<optimized out>, tp=<optimized out>, loc=<optimized out>) at strftime_l.c:459
#2  0x00007f861ba89414 in gf_glusterlog_log_repetitions.isra.6 () from /lib64/libglusterfs.so.0
#3  0x00007f861ba89953 in gf_log_flush_message () from /lib64/libglusterfs.so.0
#4  0x00007f861ba89a39 in gf_log_flush_list () from /lib64/libglusterfs.so.0
#5  0x00007f861ba89cbd in gf_log_set_log_buf_size () from /lib64/libglusterfs.so.0
#6  0x00007f861ba89d17 in gf_log_disable_suppression_before_exit () from /lib64/libglusterfs.so.0
#7  0x00007f861ba905c5 in gf_print_trace () from /lib64/libglusterfs.so.0
#8  <signal handler called>
#9  __GI_____strtoul_l_internal (nptr=nptr@entry=0x7f85e449e18c "0cee86d5-51c9-4094-a670-091ebc518c08", endptr=endptr@entry=0x0, base=base@entry=16, group=group@entry=0, loc=0xce4bbb31a32ca014) at ../stdlib/strtol_l.c:241
#10 0x00007f861a0efe22 in __GI_strtoul (nptr=nptr@entry=0x7f85e449e18c "0cee86d5-51c9-4094-a670-091ebc518c08", endptr=endptr@entry=0x0, base=base@entry=16) at ../stdlib/strtol.c:103
#11 0x00007f861b20f4bf in uuid_parse (in=0x7f85e449e18c "0cee86d5-51c9-4094-a670-091ebc518c08", uu=0x7f85e44ab2f0 "\024\240,\243\061\273KΑg\345`Q|*j\f\356\206\325Q\311@\224\246p\t\036\274Q\214\b\024\240,\243\061\273KΑg\345`Q|*j\f\356\206\325Q\311@\224\246p\t\036\274Q\214\b\024\240,\243\061\273KΑg\345`Q|*j\f\356\206\325Q\311@\224\246p\t\036\274Q\214\b\024\240,\243\061\273KΑg\345`Q|*j\f\356\206\325Q\311@\224\246p\t\036\274Q\214\b\024\240,\243\061\273KΑg\345`Q|*j\f\356\206\325Q\311@\224\246p\t\036\274Q\214\b\024\240,\243\061\273KΑg\345`Q|*j\f\356\206\325Q\311@\224\246p\t\036\274Q\214\b\024\240,\243\061\273", <incomplete sequence \316>...) at libuuid/src/parse.c:65
#12 0x00007f860e395f4f in posix_make_ancestryfromgfid () from /usr/lib64/glusterfs/4.1.7/xlator/storage/posix.so
#13 0x088c51bc1e0970a6 in ?? ()

If we check this gfid, we can see that it is a broken symlink:

# file 0cee86d5-51c9-4094-a670-091ebc518c08
0cee86d5-51c9-4094-a670-091ebc518c08: broken symbolic link to `../../14/a0/14a02ca3-31bb-4bce-9167-e560517c2a6a/..'
# file 14a02ca3-31bb-4bce-9167-e560517c2a6a
14a02ca3-31bb-4bce-9167-e560517c2a6a: broken symbolic link to `../../0c/ee/0cee86d5-51c9-4094-a670-091ebc518c08/wp-admin'

After removing those broken symlinks and running 'gluster volume start force', the brick comes back online. We have another gluster cluster on version 4.1.5; it has broken symlinks as well, but its bricks have never crashed.
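The broken gfid links the reporter found one by one with `file` can also be enumerated in a single pass. This is a generic GNU find sketch on a throwaway directory standing in for a brick root; the paths and link names are illustrative only, not from the affected volume:

```shell
# Scratch "brick" layout with one healthy and one dangling gfid symlink
# (placeholder paths; on a real brick you would point find at
# /path/to/brick/.glusterfs instead).
BRICK=$(mktemp -d)
mkdir -p "$BRICK/.glusterfs/0c/ee" "$BRICK/realdir"
# A healthy gfid symlink resolves to a real directory...
ln -s ../../../realdir "$BRICK/.glusterfs/0c/ee/good-gfid"
# ...while a broken one points at a path that no longer exists.
ln -s ../../14/a0/14a02ca3-31bb-4bce-9167-e560517c2a6a/.. \
      "$BRICK/.glusterfs/0c/ee/bad-gfid"

# GNU find's -xtype l matches symlinks whose target fails to resolve,
# i.e. exactly the dangling links under .glusterfs.
find "$BRICK/.glusterfs" -xtype l
```

On the affected brick this would have listed both gfid links from the `file` output above in one command.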
Version-Release number of selected component (if applicable):

# uname -r
4.18.16-1.el7.elrepo.x86_64
# rpm -qa | grep gluster
glusterfs-4.1.7-1.el7.x86_64
glusterfs-fuse-4.1.7-1.el7.x86_64
glusterfs-geo-replication-4.1.7-1.el7.x86_64
centos-release-gluster41-1.0-3.el7.centos.noarch
glusterfs-libs-4.1.7-1.el7.x86_64
glusterfs-client-xlators-4.1.7-1.el7.x86_64
glusterfs-extra-xlators-4.1.7-1.el7.x86_64
glusterfs-api-4.1.7-1.el7.x86_64
glusterfs-server-4.1.7-1.el7.x86_64
glusterfs-devel-4.1.7-1.el7.x86_64
python2-gluster-4.1.7-1.el7.x86_64
glusterfs-cli-4.1.7-1.el7.x86_64
# rpm -qa | grep gcc
libgcc-4.8.5-36.el7.x86_64
# rpm -qa | grep glibc
glibc-common-2.17-260.el7.x86_64
glibc-debuginfo-common-2.17-260.el7.x86_64
glibc-2.17-260.el7.x86_64
glibc-debuginfo-2.17-260.el7.x86_64

How reproducible:

Steps to Reproduce:
1. Deploy gluster 4.1.7 in distributed replicated mode
2. Find a symlink in the gfid directory (.glusterfs) of one of the bricks and break it
3. The brick with the broken symlink crashes

Actual results:
Brick crashes with a segfault.

Expected results:
Brick should ignore the broken symlink.
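Step 2 of the reproducer can be sketched on a scratch directory rather than a live brick (never do this on production data); the gfid value and paths below are illustrative placeholders:

```shell
# Simulate a gfid symlink under .glusterfs and then break it, as the
# reproducer's step 2 does on a real brick (all paths are throwaway).
BRICK=$(mktemp -d)
mkdir -p "$BRICK/.glusterfs/91/e4" "$BRICK/target"
LINK="$BRICK/.glusterfs/91/e4/91e48e9c-8474-45db-9f7c-90fbeceeca6a"
ln -s ../../../target "$LINK"   # healthy link while the target exists
rm -rf "$BRICK/target"          # break it: the target is now gone
file "$LINK"                    # reports a broken symbolic link
```

On a real 4.1.7 brick, a readdir that walks through such a link (e.g. a quota crawl, per the follow-up comment) is what triggers the crash path seen in the backtrace.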
Tested by injecting a broken symlink on the release-6.0 branch; the crash does not happen there. From the log:

---
[2019-02-21 13:33:34.841279] E [posix-handle.c:325:posix_is_malformed_link] (--> /usr/local/lib/libglusterfs.so.0(_gf_log_callingfn+0x11a)[0x7f2fa4d718ea] (--> /usr/local/lib/glusterfs/7dev/xlator/storage/posix.so(+0xe315)[0x7f2f9264d315] (--> /usr/local/lib/glusterfs/7dev/xlator/storage/posix.so(+0xe3fd)[0x7f2f9264d3fd] (--> /usr/local/lib/glusterfs/7dev/xlator/storage/posix.so(+0xe702)[0x7f2f9264d702] (--> /usr/local/lib/glusterfs/7dev/xlator/storage/posix.so(+0x34054)[0x7f2f92673054] ))))) 0-demo-posix: malformed internal link .. for /home/testdir/bricks/abcd.0/.glusterfs/91/e4/91e48e9c-8474-45db-9f7c-90fbeceeca6a
[2019-02-21 13:33:34.841315] W [MSGID: 113077] [posix-inode-fd-ops.c:5354:posix_readdirp_fill] 0-demo-posix: Failed to create handle path, fd=0x7f2f680078a8, gfid=91e48e9c-8474-45db-9f7c-90fbeceeca6a
---

So the broken-gfid case is handled in the latest release. I will test more possibilities around backend changes later. As we don't support touching the gluster backend directly, we are not treating this as a priority. It would be good to know how you got into this situation.
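The `posix_is_malformed_link` check visible in the log above rejects gfid symlinks whose target cannot be a valid handle path (the "malformed internal link .." message). A hypothetical shell sketch of that idea, not the actual C implementation: treat a link as malformed when the last component of its target is `.` or `..`, as in the logged case.

```shell
# Hypothetical re-implementation of the sanity check for illustration
# only: a directory gfid link should end in a real name, never "." or "..".
is_malformed_link() {
    target=$(readlink "$1") || return 1   # not a symlink: nothing to check
    case "${target##*/}" in
        .|..) return 0 ;;                 # malformed, like the logged ".."
        *)    return 1 ;;
    esac
}

d=$(mktemp -d)
ln -s ../../14/a0/14a02ca3-31bb-4bce-9167-e560517c2a6a/..       "$d/bad"
ln -s ../../14/a0/14a02ca3-31bb-4bce-9167-e560517c2a6a/wp-admin "$d/ok"
is_malformed_link "$d/bad" && echo "bad is malformed"
```

With such a check in place, the posix xlator can log and skip the entry (as the 113077 warning shows) instead of feeding the bogus path into `uuid_parse`.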
Hi Amar Tumballi, I didn't check it on release-6.0, only on stable 4.1.7. I also found the root cause: this issue happens only when quota is enabled, and it reproduces every time on 4.1.7. After disabling quota, the brick comes back online without any tricks like removing the broken symlinks. Please try it with quota enabled; I'll also try to reproduce it on release-6.0 from my side. Thanks!
Hi, were you able to hit this on the latest release? As the 4.x series is no longer supported, we will have to close this bug. If the issue persists on the latest release, please file a bug there and we will take it forward. Regards, Hari.
Closing this bug as we haven't seen this reported on the latest master. If we come across it again, please feel free to reopen it.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days