Bug 1677555 - Glusterfs brick is crashed due to segfault caused by broken gfid symlink
Summary: Glusterfs brick is crashed due to segfault caused by broken gfid symlink
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: GlusterFS
Classification: Community
Component: quota
Version: 4.1
Hardware: x86_64
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: hari gowtham
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-02-15 08:44 UTC by loman
Modified: 2023-09-14 05:23 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-10 10:38:15 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description loman 2019-02-15 08:44:10 UTC
Description of problem:
A GlusterFS brick crashes with a segfault caused by a broken gfid symlink.

# gdb /usr/sbin/glusterfsd core.12867

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterfsd -s glusterv3-04.namecheapcloud.net --volfile-id easywp_pro'.
Program terminated with signal 11, Segmentation fault.
#0  __strftime_internal (s=0x7f85e44863f0 "", maxsize=256, format=0x7f861bb39dfb "%F %T", tp=0x7f85e44863b0, 
    tzset_called=tzset_called@entry=0x7f85e4486320, loc=0xce4bbb31a32ca014) at strftime_l.c:472
472       struct __locale_data *const current = loc->__locales[LC_TIME];
Missing separate debuginfos, use: debuginfo-install glusterfs-fuse-4.1.7-1.el7.x86_64
(gdb) bt
#0  __strftime_internal (s=0x7f85e44863f0 "", maxsize=256, format=0x7f861bb39dfb "%F %T", tp=0x7f85e44863b0, 
    tzset_called=tzset_called@entry=0x7f85e4486320, loc=0xce4bbb31a32ca014) at strftime_l.c:472
#1  0x00007f861a172423 in __GI___strftime_l (s=<optimized out>, maxsize=<optimized out>, format=<optimized out>, tp=<optimized out>, 
    loc=<optimized out>) at strftime_l.c:459
#2  0x00007f861ba89414 in gf_glusterlog_log_repetitions.isra.6 () from /lib64/libglusterfs.so.0
#3  0x00007f861ba89953 in gf_log_flush_message () from /lib64/libglusterfs.so.0
#4  0x00007f861ba89a39 in gf_log_flush_list () from /lib64/libglusterfs.so.0
#5  0x00007f861ba89cbd in gf_log_set_log_buf_size () from /lib64/libglusterfs.so.0
#6  0x00007f861ba89d17 in gf_log_disable_suppression_before_exit () from /lib64/libglusterfs.so.0
#7  0x00007f861ba905c5 in gf_print_trace () from /lib64/libglusterfs.so.0
#8  <signal handler called>
#9  __GI_____strtoul_l_internal (nptr=nptr@entry=0x7f85e449e18c "0cee86d5-51c9-4094-a670-091ebc518c08", endptr=endptr@entry=0x0, 
    base=base@entry=16, group=group@entry=0, loc=0xce4bbb31a32ca014) at ../stdlib/strtol_l.c:241
#10 0x00007f861a0efe22 in __GI_strtoul (nptr=nptr@entry=0x7f85e449e18c "0cee86d5-51c9-4094-a670-091ebc518c08", endptr=endptr@entry=0x0, 
    base=base@entry=16) at ../stdlib/strtol.c:103
#11 0x00007f861b20f4bf in uuid_parse (in=0x7f85e449e18c "0cee86d5-51c9-4094-a670-091ebc518c08", 
    uu=0x7f85e44ab2f0 "\024\240,\243\061\273KΑg\345`Q|*j\f\356\206\325Q\311@\224\246p\t\036\274Q\214\b\024\240,\243\061\273KΑg\345`Q|*j\f\356\206\325Q\311@\224\246p\t\036\274Q\214\b\024\240,\243\061\273KΑg\345`Q|*j\f\356\206\325Q\311@\224\246p\t\036\274Q\214\b\024\240,\243\061\273KΑg\345`Q|*j\f\356\206\325Q\311@\224\246p\t\036\274Q\214\b\024\240,\243\061\273KΑg\345`Q|*j\f\356\206\325Q\311@\224\246p\t\036\274Q\214\b\024\240,\243\061\273KΑg\345`Q|*j\f\356\206\325Q\311@\224\246p\t\036\274Q\214\b\024\240,\243\061\273", <incomplete sequence \316>...) at libuuid/src/parse.c:65
#12 0x00007f860e395f4f in posix_make_ancestryfromgfid () from /usr/lib64/glusterfs/4.1.7/xlator/storage/posix.so
#13 0x088c51bc1e0970a6 in ?? ()
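
A side note on reading the backtrace above: the `uu` buffer printed in frame #11 and the `loc` pointer in frames #0 and #9 can be decoded by hand. The Python sketch below converts gdb's octal escapes back into UUIDs and lays the pointer out in little-endian memory order (the host is x86_64). The helper names are illustrative only. The byte string is the first 32 bytes of the `uu` dump, with the character gdb rendered as `Α` written as its UTF-8 bytes `\316\221`.

```python
import uuid

# First 32 bytes of the `uu` argument in frame #11, copied from the gdb dump.
UU_BUFFER_PREFIX = (
    b"\024\240,\243\061\273K\316\221g\345`Q|*j"
    b"\f\356\206\325Q\311@\224\246p\t\036\274Q\214\b"
)

# Value of `loc` in frames #0 and #9.
LOC_POINTER = 0xCE4BBB31A32CA014


def decode_gfids(buf):
    """Interpret each complete 16-byte chunk of buf as a binary UUID."""
    usable = len(buf) - len(buf) % 16
    return [str(uuid.UUID(bytes=buf[i:i + 16])) for i in range(0, usable, 16)]


def pointer_as_memory_bytes(value, width=8):
    """Hex dump of a pointer value as it would sit in little-endian memory."""
    return value.to_bytes(width, "little").hex()


print(decode_gfids(UU_BUFFER_PREFIX))
print(pointer_as_memory_bytes(LOC_POINTER))
```

The first call yields 14a02ca3-31bb-4bce-9167-e560517c2a6a followed by 0cee86d5-51c9-4094-a670-091ebc518c08, and the second yields 14a02ca331bb4bce. In other words, the buffer holds the two gfids from the broken symlinks in alternation, and the faulting `loc` pointer equals the first eight bytes of gfid 14a02ca3-31bb-4bce-9167-e560517c2a6a. That is consistent with memory having been overwritten by gfid data before the crash, although the trace alone does not show where the overwrite happened.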

If we check this gfid, we see that it is a broken symlink:

# file 0cee86d5-51c9-4094-a670-091ebc518c08
0cee86d5-51c9-4094-a670-091ebc518c08: broken symbolic link to `../../14/a0/14a02ca3-31bb-4bce-9167-e560517c2a6a/..'

# file 14a02ca3-31bb-4bce-9167-e560517c2a6a
14a02ca3-31bb-4bce-9167-e560517c2a6a: broken symbolic link to `../../0c/ee/0cee86d5-51c9-4094-a670-091ebc518c08/wp-admin'
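
The two links above point at each other: 0cee86d5-51c9-4094-a670-091ebc518c08 names 14a02ca3-31bb-4bce-9167-e560517c2a6a as its parent and vice versa, so following parents never reaches the brick root. Below is a minimal Python sketch of detecting such a loop. It assumes the usual handle layout, `.glusterfs/<aa>/<bb>/<gfid>`, with directory handles linking to `../../<pp>/<qq>/<parent-gfid>/<basename>` and the root gfid being 00000000-0000-0000-0000-000000000001. The function names are hypothetical, and this is not the code in posix.so.

```python
import os
import re

# Assumption: gfid of the brick root directory.
ROOT_GFID = "00000000-0000-0000-0000-000000000001"

# Assumption: directory handles look like ../../<pp>/<qq>/<parent-gfid>/<basename>.
LINK_RE = re.compile(r"^\.\./\.\./[0-9a-f]{2}/[0-9a-f]{2}/([0-9a-f-]{36})/(.+)$")


def handle_path(glusterfs_dir, gfid):
    """Path of the handle for gfid inside a brick's .glusterfs directory."""
    return os.path.join(glusterfs_dir, gfid[:2], gfid[2:4], gfid)


def walk_ancestry(glusterfs_dir, gfid, max_depth=4096):
    """Follow parent gfids toward the root without resolving the symlinks.

    Returns (status, chain), where status is one of "ok", "cycle",
    "malformed", "unreadable" or "too-deep", and chain lists the gfids
    visited in order.
    """
    chain = []
    current = gfid
    while current != ROOT_GFID:
        if current in chain:
            return "cycle", chain + [current]
        if len(chain) >= max_depth:
            return "too-deep", chain
        chain.append(current)
        try:
            # readlink only reads the link text, so a dangling or looping
            # link cannot make this call fail with ELOOP.
            target = os.readlink(handle_path(glusterfs_dir, current))
        except OSError:
            return "unreadable", chain
        match = LINK_RE.match(target)
        if match is None:
            return "malformed", chain
        current = match.group(1)
    return "ok", chain
```

Called as `walk_ancestry("/path/to/brick/.glusterfs", "0cee86d5-51c9-4094-a670-091ebc518c08")` (the brick path is a placeholder), a pair of links like the one above would be reported as a cycle. The explicit `chain` and `max_depth` bounds are the point of the sketch: the walk terminates even when the on-disk links do not.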


After removing those broken symlinks and running 'gluster volume start force', the brick came back online. We have another Gluster cluster on version 4.1.5 that also has broken symlinks, but its bricks have never crashed.


Version-Release number of selected component (if applicable):

# uname -r
4.18.16-1.el7.elrepo.x86_64

# rpm -qa|grep gluster
glusterfs-4.1.7-1.el7.x86_64
glusterfs-fuse-4.1.7-1.el7.x86_64
glusterfs-geo-replication-4.1.7-1.el7.x86_64
centos-release-gluster41-1.0-3.el7.centos.noarch
glusterfs-libs-4.1.7-1.el7.x86_64
glusterfs-client-xlators-4.1.7-1.el7.x86_64
glusterfs-extra-xlators-4.1.7-1.el7.x86_64
glusterfs-api-4.1.7-1.el7.x86_64
glusterfs-server-4.1.7-1.el7.x86_64
glusterfs-devel-4.1.7-1.el7.x86_64
python2-gluster-4.1.7-1.el7.x86_64
glusterfs-cli-4.1.7-1.el7.x86_64


# rpm -qa|grep gcc
libgcc-4.8.5-36.el7.x86_64

# rpm -qa|grep glibc
glibc-common-2.17-260.el7.x86_64
glibc-debuginfo-common-2.17-260.el7.x86_64
glibc-2.17-260.el7.x86_64
glibc-debuginfo-2.17-260.el7.x86_64


How reproducible:


Steps to Reproduce:
1. Deploy Gluster 4.1.7 in distributed-replicated mode
2. Find a symlink in the gfid directory (.glusterfs) of one of the bricks and break it
3. The brick with the broken symlink crashes
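
To see whether a brick already contains such links, a read-only scan of its `.glusterfs` directory is enough. This is a minimal Python sketch: `find_broken_symlinks` is an illustrative name, and the brick path in the usage note is a placeholder.

```python
import os


def find_broken_symlinks(root):
    """Return sorted (path, target) pairs for symlinks under root that do not resolve."""
    broken = []
    # followlinks=False keeps the walk from descending through symlinked
    # directories, which matters in .glusterfs where directory handles are links.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # os.path.exists() follows the link and returns False when the
            # target is missing or the link chain loops.
            if os.path.islink(path) and not os.path.exists(path):
                broken.append((path, os.readlink(path)))
    return sorted(broken)
```

For example, `find_broken_symlinks("/path/to/brick/.glusterfs")` lists each dangling handle together with the target it points to. The scan only reads link text and metadata, so it does not modify the backend.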

Actual results:
Crashed with segfault


Expected results:
The brick should ignore the broken symlink

Comment 1 Amar Tumballi 2019-02-21 13:37:25 UTC
Tested by injecting a broken symlink on the release-6.0 branch; the crash does not happen.

From the log:

---
[2019-02-21 13:33:34.841279] E [posix-handle.c:325:posix_is_malformed_link] (--> /usr/local/lib/libglusterfs.so.0(_gf_log_callingfn+0x11a)[0x7f2fa4d718ea] (--> /usr/local/lib/glusterfs/7dev/xlator/storage/posix.so(+0xe315)[0x7f2f9264d315] (--> /usr/local/lib/glusterfs/7dev/xlator/storage/posix.so(+0xe3fd)[0x7f2f9264d3fd] (--> /usr/local/lib/glusterfs/7dev/xlator/storage/posix.so(+0xe702)[0x7f2f9264d702] (--> /usr/local/lib/glusterfs/7dev/xlator/storage/posix.so(+0x34054)[0x7f2f92673054] ))))) 0-demo-posix: malformed internal link .. for /home/testdir/bricks/abcd.0/.glusterfs/91/e4/91e48e9c-8474-45db-9f7c-90fbeceeca6a
[2019-02-21 13:33:34.841315] W [MSGID: 113077] [posix-inode-fd-ops.c:5354:posix_readdirp_fill] 0-demo-posix: Failed to create handle path, fd=0x7f2f680078a8, gfid=91e48e9c-8474-45db-9f7c-90fbeceeca6a
---

So the broken gfid issue is handled in the latest release. I will test more backend-change scenarios later. As we don't support touching the Gluster backend directly, I am not treating this as a priority.

It would be good to know how you reached this situation.

Comment 2 loman 2019-02-24 11:25:55 UTC
Hi Amar Tumballi,

I didn't check it on release-6.0, only on stable 4.1.7.

I also found the root cause: the issue happens when quota is enabled, and it reproduces every time on 4.1.7. After disabling quota, the brick comes back online without having to remove the broken symlinks.

Please try enabling quota. I will also try to reproduce it on release-6.0 from my side.

Thanks!

Comment 3 Amar Tumballi 2019-07-10 06:16:19 UTC
Tha

Comment 4 hari gowtham 2019-07-23 08:47:02 UTC
Hi,

Were you able to hit this on the latest release?

As the 4 series is not supported anymore, we will have to close this bug.
If the issue persists on the latest release, please do file a bug there.
We will take it forward from there.

Regards,
Hari.

Comment 5 hari gowtham 2019-10-10 10:38:15 UTC
Closing this bug as we haven't seen this reported on the latest master.

If you come across this, please feel free to reopen it.

Comment 6 Red Hat Bugzilla 2023-09-14 05:23:41 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

