+++ This bug was initially created as a clone of Bug #1542979 +++

Description of problem:

Since the glibc fix for CVE-2018-1000001, geo-replication is broken on my system. "volume geo-rep status" reports Faulty for all three bricks. The geo-rep logs show rsync failing with error code 3:

[2018-02-04 05:25:39.803936] E [resource(/var/lib/gluster):210:errlog] Popen: command returned error cmd=rsync -aR0 --inplace --files-from=- --super --stats --numeric-ids --no-implied-dirs --existing --xattrs --acls --ignore-missing-args . -e ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-SEhnTW/1d72523484023f86f94b481d8714eaec.sock --compress georep:/proc/3897/cwd error=3

Rsync is called from /usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/resource.py, and its strace looks like this:

24724 rt_sigaction(SIGPIPE, {SIG_IGN, [], SA_RESTORER|SA_NOCLDSTOP, 0x7f1a492fd4b0}, NULL, 8) = 0
24724 rt_sigaction(SIGXFSZ, {SIG_IGN, [], SA_RESTORER|SA_NOCLDSTOP, 0x7f1a492fd4b0}, NULL, 8) = 0
24724 getcwd("(unreachable)/", 4095) = 15
24724 lstat(".", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
24724 lstat("/", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
24724 openat(AT_FDCWD, "..", O_RDONLY|O_CLOEXEC) = 3
24724 fstat(3, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
24724 fstat(3, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
24724 fcntl(3, F_GETFL) = 0x8000 (flags O_RDONLY|O_LARGEFILE)
24724 fcntl(3, F_SETFD, FD_CLOEXEC) = 0
24724 mmap(NULL, 135168, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f1a49ea3000
24724 getdents(3, /* 11 entries */, 131072) = 376
24724 getdents(3, /* 0 entries */, 131072) = 0
24724 lseek(3, 0, SEEK_SET) = 0
24724 getdents(3, /* 11 entries */, 131072) = 376
24724 newfstatat(3, ".trashcan", {st_mode=S_IFDIR|0755, st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0
24724 newfstatat(3, "acme", {st_mode=S_IFDIR|0755, st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0
24724 newfstatat(3, "web", {st_mode=S_IFDIR|0777, st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0
24724 newfstatat(3, "XXX", {st_mode=S_IFDIR|0755, st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0
24724 newfstatat(3, "glbackup", {st_mode=S_IFDIR, st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0
24724 getdents(3, /* 0 entries */, 131072) = 0
24724 munmap(0x7f1a49ea3000, 135168) = 0
24724 close(3) = 0
24724 write(2, "rsync: getcwd(): No such file or directory (2)", 46) = 46
24724 write(2, "\n", 1) = 1
24724 rt_sigaction(SIGUSR1, {SIG_IGN, [], SA_RESTORER, 0x7f1a492fd4b0}, NULL, 8) = 0
24724 rt_sigaction(SIGUSR2, {SIG_IGN, [], SA_RESTORER, 0x7f1a492fd4b0}, NULL, 8) = 0
24724 write(2, "rsync error: errors selecting input/output files, dirs (code 3) at util.c(1056) [Receiver=3.1.1]", 96) = 96
24724 write(2, "\n", 1) = 1
24724 exit_group(3) = ?
24724 +++ exited with 3 +++

The fix for the CVE is here:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commit;h=52a713fdd0a30e1bd79818e2e3c4ab44ddca1a94
https://sourceware.org/git/gitweb.cgi?p=glibc.git;a=blobdiff;f=sysdeps/unix/sysv/linux/getcwd.c;h=866b9d26d51ab7b4eda28b28ac4abca85410950d;hp=f5451062898345f93e330c518358ee33da75530e;hb=52a713fdd0a30e1bd79818e2e3c4ab44ddca1a94;hpb=249a5895f120b13290a372a49bb4b499e749806f

As you can see, '(unreachable)/'[0] is not '/', so getcwd fails.
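The failure mode can be reproduced outside of gluster. Below is a minimal sketch, assuming root on a Linux host with a patched glibc; /mnt/demo is a hypothetical mount point, and tmpfs stands in for the geo-rep aux mount:

import os
import subprocess

os.makedirs("/mnt/demo", exist_ok=True)
subprocess.check_call(["mount", "-t", "tmpfs", "tmpfs", "/mnt/demo"])
os.chdir("/mnt/demo")

# Lazy-unmount the filesystem that contains the current working
# directory, as the geo-rep worker does with the master volume mount.
subprocess.check_call(["umount", "-l", "/mnt/demo"])

# The kernel now reports the cwd as "(unreachable)/", and a glibc with
# the CVE-2018-1000001 fix refuses to return such a non-absolute path:
try:
    print(os.getcwd())
except OSError as exc:
    print("getcwd failed:", exc)  # ENOENT, matching the rsync failure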
I've reported this for Ubuntu's glibc here:
https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1746995

I have tried to add an os.chdir("/") to the rsync call in resource.py (see the sketch after the comments below), but that did not help, so I'm not even sure you *can* fix this. A temporary workaround is installing an older glibc.

Version-Release number of selected component (if applicable):
both 3.10 and 3.13; I run gluster on Ubuntu xenial (16.04.3)

How reproducible:
every time

--- Additional comment from Red Hat Bugzilla Rules Engine on 2018-02-07 08:57:05 EST ---

This bug is automatically being proposed for the release of Red Hat Gluster Storage 3 under active development and open for bug fixes, by setting the release flag 'rhgs-3.4.0' to '?'.

If this bug should be proposed for a different release, please manually change the proposed release flag.

--- Additional comment from Florian Weimer on 2018-02-08 09:42:14 EST ---

There is now a report that my rsync patch (bug 1542180) does not fix this. I would have to find a way to apply --ignore-missing-args to the current directory, but I'm not convinced that this is the right thing to do.
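For reference, the os.chdir("/") attempt mentioned above amounts to something like the following sketch (paraphrased; the real invocation in syncdaemon/resource.py builds a much longer rsync argument list, and the destination path here is taken from the log above):

import os
import subprocess

os.chdir("/")  # reset the worker's cwd before spawning rsync

proc = subprocess.Popen(
    ["rsync", "-aR0", "--inplace", "--files-from=-",
     ".", "georep:/proc/3897/cwd"],
    stdin=subprocess.PIPE,
)

# Per the report this does not help: the error above comes from the
# Receiver side of the transfer, and a chdir() in the local parent
# process cannot reach the rsync process that actually fails.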
REVIEW: https://review.gluster.org/19544 (geo-rep: Remove lazy umount and use mount namespaces) posted (#4) for review on master by Kotresh HR
COMMIT: https://review.gluster.org/19544 committed in master by "Kotresh HR" <khiremat> with a commit message-

geo-rep: Remove lazy umount and use mount namespaces

Lazy umounting the master volume by the worker causes issues with rsync's usage of getcwd. Hence removing the lazy umount and using a private mount namespace instead. On the slave, the lazy umount is retained as we can't use a private namespace in a non-root geo-rep setup.

Change-Id: I403375c02cb3cc7d257a5f72bbdb5118b4c8779a
BUG: 1546129
Signed-off-by: Kotresh HR <khiremat>
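The committed approach can be illustrated with a minimal Python sketch (a simplified stand-in, not the actual gsyncd code: it assumes Linux and root, calls unshare(2) via ctypes, and uses a hypothetical /mnt/demo tmpfs in place of the gluster aux mount):

import ctypes
import ctypes.util
import os
import subprocess

CLONE_NEWNS = 0x00020000  # from <linux/sched.h>

libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)

pid = os.fork()
if pid == 0:
    # Worker child: enter a private mount namespace, so the aux mount
    # below is invisible to the rest of the system and is torn down
    # automatically when the worker exits; no lazy umount is needed,
    # and getcwd() keeps working for the worker's whole lifetime.
    if libc.unshare(CLONE_NEWNS) != 0:
        raise OSError(ctypes.get_errno(), "unshare(CLONE_NEWNS) failed")
    # Keep mount events from propagating back to the parent namespace.
    subprocess.check_call(["mount", "--make-rprivate", "/"])
    os.makedirs("/mnt/demo", exist_ok=True)
    subprocess.check_call(["mount", "-t", "tmpfs", "tmpfs", "/mnt/demo"])
    os.chdir("/mnt/demo")
    # ... spawn rsync and do the sync work here ...
    os._exit(0)

os.waitpid(pid, 0)
# Back in the parent namespace, /mnt/demo was never mounted at all.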
Can this please be backported to the 3.12 release?
Yes, I will do it.
This bug is getting closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-v4.1.0, please open a new bug report.

glusterfs-v4.1.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2018-June/000102.html
[2] https://www.gluster.org/pipermail/gluster-users/