1363613 – Crash of glusterd when using long username with geo-replication

Keywords:
Status:	CLOSED EOL
Alias:	None
Product:	GlusterFS
Classification:	Community
Component:	geo-replication
Sub Component:
Version:	3.8
Hardware:	x86_64
OS:	Linux
Priority:	high
Severity:	medium
Target Milestone:	---
Assignee:	Saravanakumar
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1368138 1403108 1403109

Reported:	2016-08-03 08:06 UTC by Mrten
Modified:	2017-11-07 10:37 UTC
CC List:	2 users
Fixed In Version:
Clone Of:
Clones:	1368138 1403109
Environment:
Last Closed:	2017-11-07 10:37:11 UTC
Regression:	---
Mount Type:	---
Documentation:	---
CRM:
Verified Versions:
Embargoed:
Dependent Products:

Description Mrten 2016-08-03 08:06:41 UTC

Description of problem:

I have existing data on the slave that I want to use for geo-replication, in the hope that I won't have to transfer 400G of data over geo-rep (the data is already present at the slave's location, just not in gluster).

Following this manual:

https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3.1/html/Administration_Guide/sect-Preparing_to_Deploy_Geo-replication.html#Setting_Up_the_Environment_for_a_Secure_Geo-replication_Slave

the crash happens at step 9.

This is (probably) expected:

root@gluster-3:/home/mrten# gluster volume geo-replication gl0 georeplication::glbackup create push-pem
gluster-4.glstr::glbackup is not empty. Please delete existing files in gluster-4.glstr::glbackup and retry, or use force to continue without deleting the existing files.
geo-replication command failed

So force it:

root@gluster-3:/home/mrten# gluster volume geo-replication gl0 georeplication::glbackup create push-pem force
Connection failed. Please check if gluster daemon is operational.
geo-replication command failed

At this stage, there is a crash log in /var/log/glusterfs/etc-glusterfs-glusterd.vol.log:

pending frames:
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 6
time of crash:
2016-08-03 08:00:49
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.14
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_msg_backtrace_nomem+0x92)[0x7f6db19d5a32]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(gf_print_trace+0x31d)[0x7f6db19facdd]
/lib/x86_64-linux-gnu/libc.so.6(+0x36cb0)[0x7f6db0dd3cb0]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7f6db0dd3c37]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f6db0dd7028]
/lib/x86_64-linux-gnu/libc.so.6(+0x732a4)[0x7f6db0e102a4]
/lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x5c)[0x7f6db0ea7bbc]
/lib/x86_64-linux-gnu/libc.so.6(+0x109a90)[0x7f6db0ea6a90]
/lib/x86_64-linux-gnu/libc.so.6(__stpncpy_chk+0x0)[0x7f6db0ea5ef0]
/usr/lib/x86_64-linux-gnu/glusterfs/3.7.14/xlator/mgmt/glusterd.so(+0xc5d6b)[0x7f6dacf83d6b]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_foreach_match+0x77)[0x7f6db19cf8a7]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_foreach+0x18)[0x7f6db19cfa18]
/usr/lib/x86_64-linux-gnu/glusterfs/3.7.14/xlator/mgmt/glusterd.so(glusterd_op_stage_gsync_create+0x1cea)[0x7f6dacf92faa]
/usr/lib/x86_64-linux-gnu/glusterfs/3.7.14/xlator/mgmt/glusterd.so(glusterd_op_stage_validate+0xdb)[0x7f6dacf2184b]
/usr/lib/x86_64-linux-gnu/glusterfs/3.7.14/xlator/mgmt/glusterd.so(gd_stage_op_phase+0x16a)[0x7f6dacfb20ea]
/usr/lib/x86_64-linux-gnu/glusterfs/3.7.14/xlator/mgmt/glusterd.so(gd_sync_task_begin+0x6de)[0x7f6dacfb3bbe]
/usr/lib/x86_64-linux-gnu/glusterfs/3.7.14/xlator/mgmt/glusterd.so(glusterd_op_begin_synctask+0x30)[0x7f6dacfb3ef0]
/usr/lib/x86_64-linux-gnu/glusterfs/3.7.14/xlator/mgmt/glusterd.so(__glusterd_handle_gsync_set+0x628)[0x7f6dacf871b8]
/usr/lib/x86_64-linux-gnu/glusterfs/3.7.14/xlator/mgmt/glusterd.so(glusterd_big_locked_handler+0x30)[0x7f6dacf0c240]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(synctask_wrap+0x12)[0x7f6db1a232d2]
/lib/x86_64-linux-gnu/libc.so.6(+0x49800)[0x7f6db0de6800]

and glusterd is gone.

These are the log messages just before the crash, perhaps related:

[2016-08-03 08:00:49.674995] I [MSGID: 106316] [glusterd-geo-rep.c:3096:glusterd_op_stage_gsync_create] 0-management: georeplication::glbackup is not a valid slave volume. Error: gluster-4.glstr::glbackup is not empty. Please delete existing files in gluster-4.glstr::glbackup and retry, or use force to continue without deleting the existing files.. Force creating geo-rep session.
[2016-08-03 08:00:49.675032] W [MSGID: 106029] [glusterd-geo-rep.c:2522:glusterd_get_statefile_name] 0-management: Config file (/var/lib/glusterd/geo-replication/gl0_gluster-4.glstr_glbackup/gsyncd.conf) missing. Looking for template config file (/var/lib/glusterd/geo-replication/gsyncd_template.conf) [No such file or directory]
[2016-08-03 08:00:49.675048] I [MSGID: 106294] [glusterd-geo-rep.c:2531:glusterd_get_statefile_name] 0-management: Using default config template(/var/lib/glusterd/geo-replication/gsyncd_template.conf).



Version-Release number of selected component (if applicable):
3.7.14 but saw it in 3.7.13 as well.

How reproducible:
Every time

Additional info:

This is on Ubuntu 14.04, using the gluster PPA, kernel 3.13.0-92-generic.

Comment 1 Mrten 2016-08-03 13:25:16 UTC

Also crashes on 3.8.1:

[2016-08-03 13:23:03.870624] I [MSGID: 106294] [glusterd-geo-rep.c:2560:glusterd_get_statefile_name] 0-management: Using default config template(/var/lib/glusterd/geo-replication/gsyncd_template.conf).
[2016-08-03 13:23:03.870493] E [MSGID: 106316] [glusterd-geo-rep.c:2744:glusterd_verify_slave] 0-management: Not a valid slave
pending frames:
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 6
time of crash:
2016-08-03 13:23:04
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.8.1
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_msg_backtrace_nomem+0x92)[0x7f520b958b02]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(gf_print_trace+0x31d)[0x7f520b96204d]
/lib/x86_64-linux-gnu/libc.so.6(+0x36cb0)[0x7f520ad52cb0]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7f520ad52c37]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f520ad56028]
/lib/x86_64-linux-gnu/libc.so.6(+0x732a4)[0x7f520ad8f2a4]
/lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x5c)[0x7f520ae26bbc]
/lib/x86_64-linux-gnu/libc.so.6(+0x109a90)[0x7f520ae25a90]
/lib/x86_64-linux-gnu/libc.so.6(__stpncpy_chk+0x0)[0x7f520ae24ef0]
/usr/lib/x86_64-linux-gnu/glusterfs/3.8.1/xlator/mgmt/glusterd.so(+0x97d1b)[0x7f5206b60d1b]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_foreach_match+0x77)[0x7f520b952847]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_foreach+0x18)[0x7f520b9529b8]
/usr/lib/x86_64-linux-gnu/glusterfs/3.8.1/xlator/mgmt/glusterd.so(+0xa70f2)[0x7f5206b700f2]
/usr/lib/x86_64-linux-gnu/glusterfs/3.8.1/xlator/mgmt/glusterd.so(+0x3283b)[0x7f5206afb83b]
/usr/lib/x86_64-linux-gnu/glusterfs/3.8.1/xlator/mgmt/glusterd.so(+0xc6b3a)[0x7f5206b8fb3a]
/usr/lib/x86_64-linux-gnu/glusterfs/3.8.1/xlator/mgmt/glusterd.so(+0xc860e)[0x7f5206b9160e]
/usr/lib/x86_64-linux-gnu/glusterfs/3.8.1/xlator/mgmt/glusterd.so(+0xc8940)[0x7f5206b91940]
/usr/lib/x86_64-linux-gnu/glusterfs/3.8.1/xlator/mgmt/glusterd.so(+0x9b158)[0x7f5206b64158]
/usr/lib/x86_64-linux-gnu/glusterfs/3.8.1/xlator/mgmt/glusterd.so(+0x1db00)[0x7f5206ae6b00]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(synctask_wrap+0x12)[0x7f520b98d9a2]
/lib/x86_64-linux-gnu/libc.so.6(+0x49800)[0x7f520ad65800]

Comment 2 Mrten 2016-08-03 13:41:47 UTC

The abort seems deliberate: glibc reports a detected buffer overflow and terminates glusterd.

This is from an strace (3.8.1); I can't find this output anywhere in the logs:

[pid 15408] open("/dev/tty", O_RDWR|O_NOCTTY|O_NONBLOCK) = -1 ENXIO (No such device or address)
[pid 15408] writev(2, [{"*** ", 4}, {"buffer overflow detected", 24}, {" ***: ", 6}, {"/usr/sbin/glusterd", 18}, {" terminated\n", 12}], 5) = 64
[pid 15408] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6acead1000
[pid 15408] write(2, "======= Backtrace: =========\n", 29) = 29
[pid 15408] writev(2, [{"/lib/x86_64-linux-gnu/libc.so.6", 31}, {"(", 1}, {"+0x", 3}, {"7329f", 5}, {")", 1}, {"[0x", 3}, {"7f6acd80e29f", 12}, {"]\n", 2}], 8) = 58
[pid 15408] writev(2, [{"/lib/x86_64-linux-gnu/libc.so.6", 31}, {"(", 1}, {"__fortify_fail", 14}, {"+0x", 3}, {"5c", 2}, {")", 1}, {"[0x", 3}, {"7f6acd8a5bbc", 12}, {"]\n", 2}], 9) = 69
[pid 15408] writev(2, [{"/lib/x86_64-linux-gnu/libc.so.6", 31}, {"(", 1}, {"+0x", 3}, {"109a90", 6}, {")", 1}, {"[0x", 3}, {"7f6acd8a4a90", 12}, {"]\n", 2}], 8) = 59
[pid 15408] writev(2, [{"/lib/x86_64-linux-gnu/libc.so.6", 31}, {"(", 1}, {"__stpncpy_chk", 13}, {"+0x", 3}, {"0", 1}, {")", 1}, {"[0x", 3}, {"7f6acd8a3ef0", 12}, {"]\n", 2}], 9) = 67
[pid 15408] writev(2, [{"/usr/lib/x86_64-linux-gnu/glusterfs/3.8.1/xlator/mgmt/glusterd.so", 65}, {"(", 1}, {"+0x", 3}, {"97d1b", 5}, {")", 1}, {"[0x", 3}, {"7f6ac95dfd1b", 12}, {"]\n", 2}], 8) = 92
[pid 15408] writev(2, [{"/usr/lib/x86_64-linux-gnu/libglusterfs.so.0", 43}, {"(", 1}, {"dict_foreach_match", 18}, {"+0x", 3}, {"77", 2}, {")", 1}, {"[0x", 3}, {"7f6ace3d1847", 12}, {"]\n", 2}], 9) = 85
[pid 15408] writev(2, [{"/usr/lib/x86_64-linux-gnu/libglusterfs.so.0", 43}, {"(", 1}, {"dict_foreach", 12}, {"+0x", 3}, {"18", 2}, {")", 1}, {"[0x", 3}, {"7f6ace3d19b8", 12}, {"]\n", 2}], 9) = 79
[pid 15408] writev(2, [{"/usr/lib/x86_64-linux-gnu/glusterfs/3.8.1/xlator/mgmt/glusterd.so", 65}, {"(", 1}, {"+0x", 3}, {"a70f2", 5}, {")", 1}, {"[0x", 3}, {"7f6ac95ef0f2", 12}, {"]\n", 2}], 8) = 92
[pid 15408] writev(2, [{"/usr/lib/x86_64-linux-gnu/glusterfs/3.8.1/xlator/mgmt/glusterd.so", 65}, {"(", 1}, {"+0x", 3}, {"3283b", 5}, {")", 1}, {"[0x", 3}, {"7f6ac957a83b", 12}, {"]\n", 2}], 8) = 92
[pid 15408] writev(2, [{"/usr/lib/x86_64-linux-gnu/glusterfs/3.8.1/xlator/mgmt/glusterd.so", 65}, {"(", 1}, {"+0x", 3}, {"c6b3a", 5}, {")", 1}, {"[0x", 3}, {"7f6ac960eb3a", 12}, {"]\n", 2}], 8) = 92
[pid 15408] writev(2, [{"/usr/lib/x86_64-linux-gnu/glusterfs/3.8.1/xlator/mgmt/glusterd.so", 65}, {"(", 1}, {"+0x", 3}, {"c860e", 5}, {")", 1}, {"[0x", 3}, {"7f6ac961060e", 12}, {"]\n", 2}], 8) = 92
[pid 15408] writev(2, [{"/usr/lib/x86_64-linux-gnu/glusterfs/3.8.1/xlator/mgmt/glusterd.so", 65}, {"(", 1}, {"+0x", 3}, {"c8940", 5}, {")", 1}, {"[0x", 3}, {"7f6ac9610940", 12}, {"]\n", 2}], 8) = 92
[pid 15408] writev(2, [{"/usr/lib/x86_64-linux-gnu/glusterfs/3.8.1/xlator/mgmt/glusterd.so", 65}, {"(", 1}, {"+0x", 3}, {"9b158", 5}, {")", 1}, {"[0x", 3}, {"7f6ac95e3158", 12}, {"]\n", 2}], 8) = 92
[pid 15408] writev(2, [{"/usr/lib/x86_64-linux-gnu/glusterfs/3.8.1/xlator/mgmt/glusterd.so", 65}, {"(", 1}, {"+0x", 3}, {"1db00", 5}, {")", 1}, {"[0x", 3}, {"7f6ac9565b00", 12}, {"]\n", 2}], 8) = 92
[pid 15408] writev(2, [{"/usr/lib/x86_64-linux-gnu/libglusterfs.so.0", 43}, {"(", 1}, {"synctask_wrap", 13}, {"+0x", 3}, {"12", 2}, {")", 1}, {"[0x", 3}, {"7f6ace40c9a2", 12}, {"]\n", 2}], 9) = 80
[pid 15408] writev(2, [{"/lib/x86_64-linux-gnu/libc.so.6", 31}, {"(", 1}, {"+0x", 3}, {"49800", 5}, {")", 1}, {"[0x", 3}, {"7f6acd7e4800", 12}, {"]\n", 2}], 8) = 58
[pid 15408] write(2, "======= Memory map: ========\n", 29) = 29

I've omitted the memory map, lots of output there.

Comment 3 Mrten 2016-08-03 15:01:51 UTC

A stack trace from gdb for good measure:

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7f29402bb700 (LWP 29241)]
0x00007f29431d9c37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56	../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x00007f29431d9c37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007f29431dd028 in __GI_abort () at abort.c:89
#2  0x00007f29432162a4 in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7f2943322113 "*** %s ***: %s terminated\n") at ../sysdeps/posix/libc_fatal.c:175
#3  0x00007f29432adbbc in __GI___fortify_fail (msg=<optimized out>, msg@entry=0x7f29433220aa "buffer overflow detected") at fortify_fail.c:38
#4  0x00007f29432aca90 in __GI___chk_fail () at chk_fail.c:28
#5  0x00007f29432abef0 in __strncpy_chk (s1=s1@entry=0x7f292c3febd0 "", s2=<optimized out>, n=n@entry=14, s1len=s1len@entry=9) at strncpy_chk.c:30
#6  0x00007f293efe7d1b in strncpy (__len=14, __src=<optimized out>, __dest=0x7f292c3febd0 "") at /usr/include/x86_64-linux-gnu/bits/string3.h:120
#7  get_slavehost_from_voluuid (dict=dict@entry=0x7f29415403c8, key=<optimized out>, value=<optimized out>, data=data@entry=0x7f292c3fead0) at glusterd-geo-rep.c:2917
#8  0x00007f2943dd9847 in dict_foreach_match (dict=0x7f29415403c8, match=0x7f2943dd6d60 <dict_match_everything>, match_data=0x0, action=0x7f293efe7bf0 <get_slavehost_from_voluuid>, action_data=0x7f292c3fead0) at dict.c:1236
#9  0x00007f2943dd99b8 in dict_foreach (dict=<optimized out>, fn=fn@entry=0x7f293efe7bf0 <get_slavehost_from_voluuid>, data=data@entry=0x7f292c3fead0) at dict.c:1194
#10 0x00007f293eff70f2 in glusterd_get_slavehost_from_voluuid (slave_host=<optimized out>, slave_vol=<optimized out>, slave1=0x7f292c3fead0, volinfo=0x7f29450e26a0) at glusterd-geo-rep.c:2963
#11 glusterd_op_stage_gsync_create (dict=dict@entry=0x7f2941541494, op_errstr=op_errstr@entry=0x7f292c406c00) at glusterd-geo-rep.c:3256
#12 0x00007f293ef8283b in glusterd_op_stage_validate (op=op@entry=GD_OP_GSYNC_CREATE, dict=dict@entry=0x7f2941541494, op_errstr=op_errstr@entry=0x7f292c406c00, rsp_dict=rsp_dict@entry=0x7f29415415ec) at glusterd-op-sm.c:5646
#13 0x00007f293f016b3a in gd_stage_op_phase (op=<optimized out>, op_ctx=op_ctx@entry=0x7f29415413e8, req_dict=0x7f2941541494, op_errstr=op_errstr@entry=0x7f292c406c00, txn_opinfo=txn_opinfo@entry=0x7f292c406c20) at glusterd-syncop.c:1272
#14 0x00007f293f01860e in gd_sync_task_begin (op_ctx=op_ctx@entry=0x7f29415413e8, req=req@entry=0x7f29450d48cc) at glusterd-syncop.c:1900
#15 0x00007f293f018940 in glusterd_op_begin_synctask (req=req@entry=0x7f29450d48cc, op=op@entry=GD_OP_GSYNC_CREATE, dict=0x7f29415413e8) at glusterd-syncop.c:1973
#16 0x00007f293efeb158 in __glusterd_handle_gsync_set (req=req@entry=0x7f29450d48cc) at glusterd-geo-rep.c:347
#17 0x00007f293ef6db00 in glusterd_big_locked_handler (req=0x7f29450d48cc, actor_fn=0x7f293efeab30 <__glusterd_handle_gsync_set>) at glusterd-handler.c:80
#18 0x00007f2943e149a2 in synctask_wrap (old_task=<optimized out>) at syncop.c:375
#19 0x00007f29431ec800 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#20 0x0000000000000000 in ?? ()

Got this by installing the glusterfs-dbg package.

Comment 4 Mrten 2016-08-03 15:30:38 UTC

I think I got it: the slave_vol_config struct is defined as

struct slave_vol_config {
       char      old_slvhost[_POSIX_HOST_NAME_MAX+1];
       char      old_slvuser[_POSIX_LOGIN_NAME_MAX];
       unsigned  old_slvidx;
       char      slave_voluuid[GF_UUID_BUF_SIZE];
};

and _POSIX_LOGIN_NAME_MAX is ... 9.

My login name is 14 characters long, so the copy into old_slvuser overflows and glusterd crashes.

I'd suggest using LOGIN_NAME_MAX, which is 256, instead of _POSIX_LOGIN_NAME_MAX.

Don't switch _POSIX_HOST_NAME_MAX to HOST_NAME_MAX, though: that would shrink the buffer from 255 to 64.
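
The size mismatch can be sketched in a few lines of C. This is a minimal illustration, not glusterd code: name_fits and runtime_login_name_max are hypothetical helpers, and the fallback #define only applies if <limits.h> hides the POSIX constant under strict compiler flags. The sketch shows that a 14-character name needs 15 bytes including the terminating NUL, which cannot fit a 9-byte field (the gdb trace in comment 3 reports n=14 against s1len=9), and how the limit of the running system can be queried with sysconf().

```c
#include <limits.h>
#include <string.h>
#include <unistd.h>

/* POSIX fixes this minimum at 9; the fallback only matters if <limits.h>
 * does not expose the constant. */
#ifndef _POSIX_LOGIN_NAME_MAX
#define _POSIX_LOGIN_NAME_MAX 9
#endif

/* Hypothetical helper: does `name`, plus its terminating NUL, fit in a
 * buffer of `bufsize` bytes? Returns 1 if it fits, 0 otherwise. */
static int name_fits(const char *name, size_t bufsize)
{
    return strlen(name) + 1 <= bufsize;
}

/* Hypothetical helper: the login-name limit of the running system, or
 * the POSIX minimum when sysconf() reports no definite limit. */
static long runtime_login_name_max(void)
{
    long v = sysconf(_SC_LOGIN_NAME_MAX);
    return v > 0 ? v : _POSIX_LOGIN_NAME_MAX;
}
```

With these helpers, name_fits("fourteen_chars", _POSIX_LOGIN_NAME_MAX) is false, while an 8-character name still fits in the 9-byte field.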

Comment 5 Saravanakumar 2016-08-18 14:11:31 UTC

(In reply to Mrten from comment #4)
> I think I got it: slave_vol_config struct is a 
> 
> struct slave_vol_config {
>        char      old_slvhost[_POSIX_HOST_NAME_MAX+1];
>        char      old_slvuser[_POSIX_LOGIN_NAME_MAX];
>        unsigned  old_slvidx;
>        char      slave_voluuid[GF_UUID_BUF_SIZE];
> };
> 
> and _POSIX_LOGIN_NAME_MAX is ... 9.
> 
> my login name is 14 characters long, so, crash.
> 
> I'd suggest using LOGIN_NAME_MAX instead of _POSIX_LOGIN_NAME_MAX, which is
> 256 long.
> 
> Don't switch the _POSIX_HOST_NAME_MAX to HOST_NAME_MAX though, that's 255 vs
> 64.

Thanks for the detailed bug report and RCA.

Unfortunately, using LOGIN_NAME_MAX would not honour POSIX.
(It would also be inconsistent to use _POSIX_HOST_NAME_MAX for the host name but LOGIN_NAME_MAX for the login name.)

I have posted a patch that checks whether the user name length is within _POSIX_LOGIN_NAME_MAX, so glusterd should no longer crash.

This is under review - http://review.gluster.org/#/c/15199
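
The patch itself is not reproduced in this report, so the following is only a sketch of the general idea of checking the length before copying, not the code under review. The struct slave_user_buf and the function set_slave_user are hypothetical names invented for illustration; the field stands in for old_slvuser, and returning -ENAMETOOLONG is an assumed error convention.

```c
#include <errno.h>
#include <limits.h>
#include <string.h>

/* POSIX fixes this minimum at 9; the fallback only matters if <limits.h>
 * does not expose the constant. */
#ifndef _POSIX_LOGIN_NAME_MAX
#define _POSIX_LOGIN_NAME_MAX 9
#endif

/* Hypothetical stand-in for the old_slvuser field of slave_vol_config. */
struct slave_user_buf {
    char name[_POSIX_LOGIN_NAME_MAX];
};

/* Copy `user` into `out` only if it fits, including the terminating NUL.
 * Returns 0 on success, or -ENAMETOOLONG (leaving `out` untouched) when
 * the name is too long, instead of overflowing the buffer. */
static int set_slave_user(struct slave_user_buf *out, const char *user)
{
    size_t len = strlen(user);

    if (len >= sizeof(out->name))
        return -ENAMETOOLONG;

    memcpy(out->name, user, len + 1); /* len + 1 copies the NUL as well */
    return 0;
}
```

A caller would treat a non-zero return as a validation failure and report it to the user, so an over-long name is rejected with an error rather than aborting the daemon.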

Comment 6 Niels de Vos 2016-09-12 05:39:56 UTC

All 3.8.x bugs are now reported against version 3.8 (without .x). For more information, see http://www.gluster.org/pipermail/gluster-devel/2016-September/050859.html

Comment 7 Niels de Vos 2017-11-07 10:37:11 UTC

This bug is getting closed because the 3.8 version is marked End-Of-Life. There will be no further updates to this version. Please open a new bug against a version that still receives bugfixes if you are still facing this issue in a more current release.