Description of problem:

I have some existing data on the slave that I'm going to use for geo-rep, this in the hope that I don't have to transfer 400G of data over geo-rep (the data is already available at the location of the slave, just not in gluster).

Following this manual:
https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3.1/html/Administration_Guide/sect-Preparing_to_Deploy_Geo-replication.html#Setting_Up_the_Environment_for_a_Secure_Geo-replication_Slave

the crash happens at step 9.

This is (probably) expected:

root@gluster-3:/home/mrten# gluster volume geo-replication gl0 georeplication::glbackup create push-pem
gluster-4.glstr::glbackup is not empty. Please delete existing files in gluster-4.glstr::glbackup and retry, or use force to continue without deleting the existing files.
geo-replication command failed

So force it:

root@gluster-3:/home/mrten# gluster volume geo-replication gl0 georeplication::glbackup create push-pem force
Connection failed. Please check if gluster daemon is operational.
geo-replication command failed

At this stage, there is a crash log in /var/log/glusterfs/etc-glusterfs-glusterd.vol.log:

pending frames:
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 6
time of crash: 2016-08-03 08:00:49
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.14
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_msg_backtrace_nomem+0x92)[0x7f6db19d5a32]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(gf_print_trace+0x31d)[0x7f6db19facdd]
/lib/x86_64-linux-gnu/libc.so.6(+0x36cb0)[0x7f6db0dd3cb0]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7f6db0dd3c37]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f6db0dd7028]
/lib/x86_64-linux-gnu/libc.so.6(+0x732a4)[0x7f6db0e102a4]
/lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x5c)[0x7f6db0ea7bbc]
/lib/x86_64-linux-gnu/libc.so.6(+0x109a90)[0x7f6db0ea6a90]
/lib/x86_64-linux-gnu/libc.so.6(__stpncpy_chk+0x0)[0x7f6db0ea5ef0]
/usr/lib/x86_64-linux-gnu/glusterfs/3.7.14/xlator/mgmt/glusterd.so(+0xc5d6b)[0x7f6dacf83d6b]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_foreach_match+0x77)[0x7f6db19cf8a7]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_foreach+0x18)[0x7f6db19cfa18]
/usr/lib/x86_64-linux-gnu/glusterfs/3.7.14/xlator/mgmt/glusterd.so(glusterd_op_stage_gsync_create+0x1cea)[0x7f6dacf92faa]
/usr/lib/x86_64-linux-gnu/glusterfs/3.7.14/xlator/mgmt/glusterd.so(glusterd_op_stage_validate+0xdb)[0x7f6dacf2184b]
/usr/lib/x86_64-linux-gnu/glusterfs/3.7.14/xlator/mgmt/glusterd.so(gd_stage_op_phase+0x16a)[0x7f6dacfb20ea]
/usr/lib/x86_64-linux-gnu/glusterfs/3.7.14/xlator/mgmt/glusterd.so(gd_sync_task_begin+0x6de)[0x7f6dacfb3bbe]
/usr/lib/x86_64-linux-gnu/glusterfs/3.7.14/xlator/mgmt/glusterd.so(glusterd_op_begin_synctask+0x30)[0x7f6dacfb3ef0]
/usr/lib/x86_64-linux-gnu/glusterfs/3.7.14/xlator/mgmt/glusterd.so(__glusterd_handle_gsync_set+0x628)[0x7f6dacf871b8]
/usr/lib/x86_64-linux-gnu/glusterfs/3.7.14/xlator/mgmt/glusterd.so(glusterd_big_locked_handler+0x30)[0x7f6dacf0c240]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(synctask_wrap+0x12)[0x7f6db1a232d2]
/lib/x86_64-linux-gnu/libc.so.6(+0x49800)[0x7f6db0de6800]

and glusterd is gone.

These are the log messages just before the crash, perhaps related:

[2016-08-03 08:00:49.674995] I [MSGID: 106316] [glusterd-geo-rep.c:3096:glusterd_op_stage_gsync_create] 0-management: georeplication::glbackup is not a valid slave volume. Error: gluster-4.glstr::glbackup is not empty. Please delete existing files in gluster-4.glstr::glbackup and retry, or use force to continue without deleting the existing files.. Force creating geo-rep session.
[2016-08-03 08:00:49.675032] W [MSGID: 106029] [glusterd-geo-rep.c:2522:glusterd_get_statefile_name] 0-management: Config file (/var/lib/glusterd/geo-replication/gl0_gluster-4.glstr_glbackup/gsyncd.conf) missing. Looking for template config file (/var/lib/glusterd/geo-replication/gsyncd_template.conf) [No such file or directory]
[2016-08-03 08:00:49.675048] I [MSGID: 106294] [glusterd-geo-rep.c:2531:glusterd_get_statefile_name] 0-management: Using default config template(/var/lib/glusterd/geo-replication/gsyncd_template.conf).

Version-Release number of selected component (if applicable):
3.7.14 but saw it in 3.7.13 as well.

How reproducible:
Every time

Additional info:
This is on Ubuntu 14.04, using the gluster PPA, kernel 3.13.0-92-generic.
Also crashes on 3.8.1:

[2016-08-03 13:23:03.870624] I [MSGID: 106294] [glusterd-geo-rep.c:2560:glusterd_get_statefile_name] 0-management: Using default config template(/var/lib/glusterd/geo-replication/gsyncd_template.conf).
[2016-08-03 13:23:03.870493] E [MSGID: 106316] [glusterd-geo-rep.c:2744:glusterd_verify_slave] 0-management: Not a valid slave
pending frames:
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 6
time of crash: 2016-08-03 13:23:04
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.8.1
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_msg_backtrace_nomem+0x92)[0x7f520b958b02]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(gf_print_trace+0x31d)[0x7f520b96204d]
/lib/x86_64-linux-gnu/libc.so.6(+0x36cb0)[0x7f520ad52cb0]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7f520ad52c37]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f520ad56028]
/lib/x86_64-linux-gnu/libc.so.6(+0x732a4)[0x7f520ad8f2a4]
/lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x5c)[0x7f520ae26bbc]
/lib/x86_64-linux-gnu/libc.so.6(+0x109a90)[0x7f520ae25a90]
/lib/x86_64-linux-gnu/libc.so.6(__stpncpy_chk+0x0)[0x7f520ae24ef0]
/usr/lib/x86_64-linux-gnu/glusterfs/3.8.1/xlator/mgmt/glusterd.so(+0x97d1b)[0x7f5206b60d1b]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_foreach_match+0x77)[0x7f520b952847]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_foreach+0x18)[0x7f520b9529b8]
/usr/lib/x86_64-linux-gnu/glusterfs/3.8.1/xlator/mgmt/glusterd.so(+0xa70f2)[0x7f5206b700f2]
/usr/lib/x86_64-linux-gnu/glusterfs/3.8.1/xlator/mgmt/glusterd.so(+0x3283b)[0x7f5206afb83b]
/usr/lib/x86_64-linux-gnu/glusterfs/3.8.1/xlator/mgmt/glusterd.so(+0xc6b3a)[0x7f5206b8fb3a]
/usr/lib/x86_64-linux-gnu/glusterfs/3.8.1/xlator/mgmt/glusterd.so(+0xc860e)[0x7f5206b9160e]
/usr/lib/x86_64-linux-gnu/glusterfs/3.8.1/xlator/mgmt/glusterd.so(+0xc8940)[0x7f5206b91940]
/usr/lib/x86_64-linux-gnu/glusterfs/3.8.1/xlator/mgmt/glusterd.so(+0x9b158)[0x7f5206b64158]
/usr/lib/x86_64-linux-gnu/glusterfs/3.8.1/xlator/mgmt/glusterd.so(+0x1db00)[0x7f5206ae6b00]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(synctask_wrap+0x12)[0x7f520b98d9a2]
/lib/x86_64-linux-gnu/libc.so.6(+0x49800)[0x7f520ad65800]
The abort seems deliberate: glibc reports "buffer overflow detected" and terminates glusterd. This is from an strace (3.8.1); I can't find this output anywhere in the logs:

[pid 15408] open("/dev/tty", O_RDWR|O_NOCTTY|O_NONBLOCK) = -1 ENXIO (No such device or address)
[pid 15408] writev(2, [{"*** ", 4}, {"buffer overflow detected", 24}, {" ***: ", 6}, {"/usr/sbin/glusterd", 18}, {" terminated\n", 12}], 5) = 64
[pid 15408] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6acead1000
[pid 15408] write(2, "======= Backtrace: =========\n", 29) = 29
[pid 15408] writev(2, [{"/lib/x86_64-linux-gnu/libc.so.6", 31}, {"(", 1}, {"+0x", 3}, {"7329f", 5}, {")", 1}, {"[0x", 3}, {"7f6acd80e29f", 12}, {"]\n", 2}], 8) = 58
[pid 15408] writev(2, [{"/lib/x86_64-linux-gnu/libc.so.6", 31}, {"(", 1}, {"__fortify_fail", 14}, {"+0x", 3}, {"5c", 2}, {")", 1}, {"[0x", 3}, {"7f6acd8a5bbc", 12}, {"]\n", 2}], 9) = 69
[pid 15408] writev(2, [{"/lib/x86_64-linux-gnu/libc.so.6", 31}, {"(", 1}, {"+0x", 3}, {"109a90", 6}, {")", 1}, {"[0x", 3}, {"7f6acd8a4a90", 12}, {"]\n", 2}], 8) = 59
[pid 15408] writev(2, [{"/lib/x86_64-linux-gnu/libc.so.6", 31}, {"(", 1}, {"__stpncpy_chk", 13}, {"+0x", 3}, {"0", 1}, {")", 1}, {"[0x", 3}, {"7f6acd8a3ef0", 12}, {"]\n", 2}], 9) = 67
[pid 15408] writev(2, [{"/usr/lib/x86_64-linux-gnu/glusterfs/3.8.1/xlator/mgmt/glusterd.so", 65}, {"(", 1}, {"+0x", 3}, {"97d1b", 5}, {")", 1}, {"[0x", 3}, {"7f6ac95dfd1b", 12}, {"]\n", 2}], 8) = 92
[pid 15408] writev(2, [{"/usr/lib/x86_64-linux-gnu/libglusterfs.so.0", 43}, {"(", 1}, {"dict_foreach_match", 18}, {"+0x", 3}, {"77", 2}, {")", 1}, {"[0x", 3}, {"7f6ace3d1847", 12}, {"]\n", 2}], 9) = 85
[pid 15408] writev(2, [{"/usr/lib/x86_64-linux-gnu/libglusterfs.so.0", 43}, {"(", 1}, {"dict_foreach", 12}, {"+0x", 3}, {"18", 2}, {")", 1}, {"[0x", 3}, {"7f6ace3d19b8", 12}, {"]\n", 2}], 9) = 79
[pid 15408] writev(2, [{"/usr/lib/x86_64-linux-gnu/glusterfs/3.8.1/xlator/mgmt/glusterd.so", 65}, {"(", 1}, {"+0x", 3}, {"a70f2", 5}, {")", 1}, {"[0x", 3}, {"7f6ac95ef0f2", 12}, {"]\n", 2}], 8) = 92
[pid 15408] writev(2, [{"/usr/lib/x86_64-linux-gnu/glusterfs/3.8.1/xlator/mgmt/glusterd.so", 65}, {"(", 1}, {"+0x", 3}, {"3283b", 5}, {")", 1}, {"[0x", 3}, {"7f6ac957a83b", 12}, {"]\n", 2}], 8) = 92
[pid 15408] writev(2, [{"/usr/lib/x86_64-linux-gnu/glusterfs/3.8.1/xlator/mgmt/glusterd.so", 65}, {"(", 1}, {"+0x", 3}, {"c6b3a", 5}, {")", 1}, {"[0x", 3}, {"7f6ac960eb3a", 12}, {"]\n", 2}], 8) = 92
[pid 15408] writev(2, [{"/usr/lib/x86_64-linux-gnu/glusterfs/3.8.1/xlator/mgmt/glusterd.so", 65}, {"(", 1}, {"+0x", 3}, {"c860e", 5}, {")", 1}, {"[0x", 3}, {"7f6ac961060e", 12}, {"]\n", 2}], 8) = 92
[pid 15408] writev(2, [{"/usr/lib/x86_64-linux-gnu/glusterfs/3.8.1/xlator/mgmt/glusterd.so", 65}, {"(", 1}, {"+0x", 3}, {"c8940", 5}, {")", 1}, {"[0x", 3}, {"7f6ac9610940", 12}, {"]\n", 2}], 8) = 92
[pid 15408] writev(2, [{"/usr/lib/x86_64-linux-gnu/glusterfs/3.8.1/xlator/mgmt/glusterd.so", 65}, {"(", 1}, {"+0x", 3}, {"9b158", 5}, {")", 1}, {"[0x", 3}, {"7f6ac95e3158", 12}, {"]\n", 2}], 8) = 92
[pid 15408] writev(2, [{"/usr/lib/x86_64-linux-gnu/glusterfs/3.8.1/xlator/mgmt/glusterd.so", 65}, {"(", 1}, {"+0x", 3}, {"1db00", 5}, {")", 1}, {"[0x", 3}, {"7f6ac9565b00", 12}, {"]\n", 2}], 8) = 92
[pid 15408] writev(2, [{"/usr/lib/x86_64-linux-gnu/libglusterfs.so.0", 43}, {"(", 1}, {"synctask_wrap", 13}, {"+0x", 3}, {"12", 2}, {")", 1}, {"[0x", 3}, {"7f6ace40c9a2", 12}, {"]\n", 2}], 9) = 80
[pid 15408] writev(2, [{"/lib/x86_64-linux-gnu/libc.so.6", 31}, {"(", 1}, {"+0x", 3}, {"49800", 5}, {")", 1}, {"[0x", 3}, {"7f6acd7e4800", 12}, {"]\n", 2}], 8) = 58
[pid 15408] write(2, "======= Memory map: ========\n", 29) = 29

I've omitted the memory map; there is a lot of output there.
A stack trace from gdb for good measure:

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7f29402bb700 (LWP 29241)]
0x00007f29431d9c37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 0x00007f29431d9c37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f29431dd028 in __GI_abort () at abort.c:89
#2 0x00007f29432162a4 in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7f2943322113 "*** %s ***: %s terminated\n") at ../sysdeps/posix/libc_fatal.c:175
#3 0x00007f29432adbbc in __GI___fortify_fail (msg=<optimized out>, msg@entry=0x7f29433220aa "buffer overflow detected") at fortify_fail.c:38
#4 0x00007f29432aca90 in __GI___chk_fail () at chk_fail.c:28
#5 0x00007f29432abef0 in __strncpy_chk (s1=s1@entry=0x7f292c3febd0 "", s2=<optimized out>, n=n@entry=14, s1len=s1len@entry=9) at strncpy_chk.c:30
#6 0x00007f293efe7d1b in strncpy (__len=14, __src=<optimized out>, __dest=0x7f292c3febd0 "") at /usr/include/x86_64-linux-gnu/bits/string3.h:120
#7 get_slavehost_from_voluuid (dict=dict@entry=0x7f29415403c8, key=<optimized out>, value=<optimized out>, data=data@entry=0x7f292c3fead0) at glusterd-geo-rep.c:2917
#8 0x00007f2943dd9847 in dict_foreach_match (dict=0x7f29415403c8, match=0x7f2943dd6d60 <dict_match_everything>, match_data=0x0, action=0x7f293efe7bf0 <get_slavehost_from_voluuid>, action_data=0x7f292c3fead0) at dict.c:1236
#9 0x00007f2943dd99b8 in dict_foreach (dict=<optimized out>, fn=fn@entry=0x7f293efe7bf0 <get_slavehost_from_voluuid>, data=data@entry=0x7f292c3fead0) at dict.c:1194
#10 0x00007f293eff70f2 in glusterd_get_slavehost_from_voluuid (slave_host=<optimized out>, slave_vol=<optimized out>, slave1=0x7f292c3fead0, volinfo=0x7f29450e26a0) at glusterd-geo-rep.c:2963
#11 glusterd_op_stage_gsync_create (dict=dict@entry=0x7f2941541494, op_errstr=op_errstr@entry=0x7f292c406c00) at glusterd-geo-rep.c:3256
#12 0x00007f293ef8283b in glusterd_op_stage_validate (op=op@entry=GD_OP_GSYNC_CREATE, dict=dict@entry=0x7f2941541494, op_errstr=op_errstr@entry=0x7f292c406c00, rsp_dict=rsp_dict@entry=0x7f29415415ec) at glusterd-op-sm.c:5646
#13 0x00007f293f016b3a in gd_stage_op_phase (op=<optimized out>, op_ctx=op_ctx@entry=0x7f29415413e8, req_dict=0x7f2941541494, op_errstr=op_errstr@entry=0x7f292c406c00, txn_opinfo=txn_opinfo@entry=0x7f292c406c20) at glusterd-syncop.c:1272
#14 0x00007f293f01860e in gd_sync_task_begin (op_ctx=op_ctx@entry=0x7f29415413e8, req=req@entry=0x7f29450d48cc) at glusterd-syncop.c:1900
#15 0x00007f293f018940 in glusterd_op_begin_synctask (req=req@entry=0x7f29450d48cc, op=op@entry=GD_OP_GSYNC_CREATE, dict=0x7f29415413e8) at glusterd-syncop.c:1973
#16 0x00007f293efeb158 in __glusterd_handle_gsync_set (req=req@entry=0x7f29450d48cc) at glusterd-geo-rep.c:347
#17 0x00007f293ef6db00 in glusterd_big_locked_handler (req=0x7f29450d48cc, actor_fn=0x7f293efeab30 <__glusterd_handle_gsync_set>) at glusterd-handler.c:80
#18 0x00007f2943e149a2 in synctask_wrap (old_task=<optimized out>) at syncop.c:375
#19 0x00007f29431ec800 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#20 0x0000000000000000 in ?? ()

Got this by installing glusterfs-dbg package.
I think I got it: slave_vol_config is defined as

struct slave_vol_config {
        char old_slvhost[_POSIX_HOST_NAME_MAX+1];
        char old_slvuser[_POSIX_LOGIN_NAME_MAX];
        unsigned old_slvidx;
        char slave_voluuid[GF_UUID_BUF_SIZE];
};

and _POSIX_LOGIN_NAME_MAX is ... 9.

My login name is 14 characters long, so it does not fit in old_slvuser, hence the crash.

I'd suggest using LOGIN_NAME_MAX, which is 256, instead of _POSIX_LOGIN_NAME_MAX.

Don't switch _POSIX_HOST_NAME_MAX to HOST_NAME_MAX though: _POSIX_HOST_NAME_MAX is 255, while HOST_NAME_MAX is only 64.
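The numbers in frame #5 of the gdb trace (n=14, s1len=9) line up with this analysis. As a rough illustration, the sketch below models the kind of test a fortified strncpy applies before copying. It is a simplified model, not glibc's code: OLD_SLVUSER_SIZE, fortify_would_abort and slvuser_copy_would_abort are hypothetical names, the value 9 is hard-coded from the comment above rather than read from <limits.h>, and it assumes that "georeplication" is the 14-character login name in question and that the copy length is the length of the name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Size of old_slvuser: _POSIX_LOGIN_NAME_MAX, which the report gives as 9.
 * Hard-coded so the sketch does not depend on platform headers. */
#define OLD_SLVUSER_SIZE 9

/* Simplified model of the check made before a fortified copy: a request
 * to write n bytes into a destination of dest_size bytes is rejected
 * when n exceeds dest_size. Returns 1 if the process would abort. */
static int fortify_would_abort(size_t n, size_t dest_size)
{
    return n > dest_size;
}

/* Hypothetical helper: would copying strlen(user) bytes of `user`
 * into old_slvuser trip the check above? */
static int slvuser_copy_would_abort(const char *user)
{
    assert(user != NULL);
    return fortify_would_abort(strlen(user), OLD_SLVUSER_SIZE);
}
```

Under these assumptions, slvuser_copy_would_abort("georeplication") is true (14 > 9), while a short name such as "root" passes. A name of exactly 9 characters would also pass this check, yet it would leave the field without room for a terminating NUL.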
(In reply to Mrten from comment #4)
> I think I got it: slave_vol_config struct is a
>
> struct slave_vol_config {
> char old_slvhost[_POSIX_HOST_NAME_MAX+1];
> char old_slvuser[_POSIX_LOGIN_NAME_MAX];
> unsigned old_slvidx;
> char slave_voluuid[GF_UUID_BUF_SIZE];
> };
>
> and _POSIX_LOGIN_NAME_MAX is ... 9.
>
> my login name is 14 characters long, so, crash.
>
> I'd suggest using LOGIN_NAME_MAX instead of _POSIX_LOGIN_NAME_MAX, which is
> 256 long.
>
> Don't switch the _POSIX_HOST_NAME_MAX to HOST_NAME_MAX though, that's 255 vs
> 64.

Thanks for the detailed bug report and RCA.

Unfortunately, using LOGIN_NAME_MAX would not honour POSIX. (It would also be inconsistent to use _POSIX_HOST_NAME_MAX for the host name and LOGIN_NAME_MAX for the login name.)

I have posted a patch which checks whether the length of the login name is within _POSIX_LOGIN_NAME_MAX, so glusterd should no longer crash. The patch is under review: http://review.gluster.org/#/c/15199
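For readers who want a picture of what such a length check can look like, here is a minimal sketch. It is not the patch under review: copy_slave_user and SLVUSER_BUF_SIZE are hypothetical names, and the buffer size of 9 is taken from the RCA quoted above rather than from the gluster sources.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed buffer size: _POSIX_LOGIN_NAME_MAX, quoted as 9 in the RCA. */
#define SLVUSER_BUF_SIZE 9

/* Hypothetical helper: copy `user` into a fixed-size buffer only if the
 * name and its terminating NUL fit. Returns 0 on success, or -1 without
 * touching dst when the name is too long, so the caller can report an
 * error instead of overflowing the buffer. */
static int copy_slave_user(char dst[SLVUSER_BUF_SIZE], const char *user)
{
    assert(dst != NULL && user != NULL);

    size_t len = strlen(user);
    if (len >= SLVUSER_BUF_SIZE)
        return -1;

    memcpy(dst, user, len + 1); /* len + 1 copies the terminating NUL too */
    return 0;
}
```

With this shape, a 14-character name such as "georeplication" is rejected with -1 rather than aborting the daemon. The longest accepted name is 8 characters, because one byte of the 9 is reserved for the NUL.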
All 3.8.x bugs are now reported against version 3.8 (without .x). For more information, see http://www.gluster.org/pipermail/gluster-devel/2016-September/050859.html
This bug is getting closed because the 3.8 version is marked End-Of-Life. There will be no further updates to this version. Please open a new bug against a version that still receives bugfixes if you are still facing this issue in a more current release.