Bug 1546129 - Geo-rep: glibc fix breaks geo-replication
Summary: Geo-rep: glibc fix breaks geo-replication
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: geo-replication
Version: mainline
Hardware: All
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Kotresh HR
QA Contact:
URL:
Whiteboard:
Depends On: 1542979
Blocks: 1544382
 
Reported: 2018-02-16 12:12 UTC by Kotresh HR
Modified: 2018-06-20 18:00 UTC (History)
12 users

Fixed In Version: glusterfs-v4.1.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1542979
Environment:
Last Closed: 2018-06-20 18:00:23 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:




Links
System ID Priority Status Summary Last Updated
Launchpad 1746995 None None None 2018-02-16 12:12:34 UTC

Description Kotresh HR 2018-02-16 12:12:35 UTC
+++ This bug was initially created as a clone of Bug #1542979 +++

Description of problem:

Since the glibc fix for CVE-2018-1000001, geo-replication is broken on my system: volume geo-rep status reports Faulty for all three bricks.

The geo-rep logs show that rsync fails with exit code 3:

[2018-02-04 05:25:39.803936] E [resource(/var/lib/gluster):210:errlog] Popen: command returned error    cmd=rsync -aR0 --inplace --files-from=- --super --stats --numeric-ids --no-implied-dirs --existing --xattrs --acls --ignore-missing-args . -e ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-SEhnTW/1d72523484023f86f94b481d8714eaec.sock --compress georep@gluster-4.glstr:/proc/3897/cwd        error=3


Rsync is called from /usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/resource.py, and an strace of the failing rsync process looks like this:


24724 rt_sigaction(SIGPIPE, {SIG_IGN, [], SA_RESTORER|SA_NOCLDSTOP, 0x7f1a492fd4b0}, NULL, 8) = 0
24724 rt_sigaction(SIGXFSZ, {SIG_IGN, [], SA_RESTORER|SA_NOCLDSTOP, 0x7f1a492fd4b0}, NULL, 8) = 0
24724 getcwd("(unreachable)/", 4095)    = 15
24724 lstat(".", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
24724 lstat("/", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
24724 openat(AT_FDCWD, "..", O_RDONLY|O_CLOEXEC) = 3
24724 fstat(3, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
24724 fstat(3, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
24724 fcntl(3, F_GETFL)                 = 0x8000 (flags O_RDONLY|O_LARGEFILE)
24724 fcntl(3, F_SETFD, FD_CLOEXEC)     = 0
24724 mmap(NULL, 135168, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f1a49ea3000
24724 getdents(3, /* 11 entries */, 131072) = 376
24724 getdents(3, /* 0 entries */, 131072) = 0
24724 lseek(3, 0, SEEK_SET)             = 0
24724 getdents(3, /* 11 entries */, 131072) = 376
24724 newfstatat(3, ".trashcan", {st_mode=S_IFDIR|0755, st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0
24724 newfstatat(3, "acme", {st_mode=S_IFDIR|0755, st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0
24724 newfstatat(3, "web", {st_mode=S_IFDIR|0777, st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0
24724 newfstatat(3, "XXX", {st_mode=S_IFDIR|0755, st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0
24724 newfstatat(3, "glbackup", {st_mode=S_IFDIR, st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0
24724 getdents(3, /* 0 entries */, 131072) = 0
24724 munmap(0x7f1a49ea3000, 135168)    = 0
24724 close(3)                          = 0
24724 write(2, "rsync: getcwd(): No such file or directory (2)", 46) = 46
24724 write(2, "\n", 1)                 = 1
24724 rt_sigaction(SIGUSR1, {SIG_IGN, [], SA_RESTORER, 0x7f1a492fd4b0}, NULL, 8) = 0
24724 rt_sigaction(SIGUSR2, {SIG_IGN, [], SA_RESTORER, 0x7f1a492fd4b0}, NULL, 8) = 0
24724 write(2, "rsync error: errors selecting input/output files, dirs (code 3) at util.c(1056) [Receiver=3.1.1]", 96) = 96
24724 write(2, "\n", 1)                 = 1
24724 exit_group(3)                     = ?
24724 +++ exited with 3 +++


The fix for the CVE is here:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commit;h=52a713fdd0a30e1bd79818e2e3c4ab44ddca1a94

https://sourceware.org/git/gitweb.cgi?p=glibc.git;a=blobdiff;f=sysdeps/unix/sysv/linux/getcwd.c;h=866b9d26d51ab7b4eda28b28ac4abca85410950d;hp=f5451062898345f93e330c518358ee33da75530e;hb=52a713fdd0a30e1bd79818e2e3c4ab44ddca1a94;hpb=249a5895f120b13290a372a49bb4b499e749806f

As you can see, '(unreachable)/'[0] is not '/', so the patched getcwd now fails instead of returning the bogus "(unreachable)" path.
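The failure mode can be reproduced without gluster at all: any situation where the kernel can no longer resolve the working directory to a path makes getcwd(3) fail. This minimal sketch (not the actual geo-rep code) uses a deleted directory, which is analogous to the lazily-unmounted mount the geo-rep worker sits in:

```python
import os
import tempfile

# Make the current working directory unreachable by path.
# In the bug it is a lazily-unmounted mount point; deleting
# the directory provokes the same class of getcwd() failure.
d = tempfile.mkdtemp()
os.chdir(d)
os.rmdir(d)  # cwd is still open, but no longer has a path

try:
    os.getcwd()
    unreachable_cwd_detected = False
except FileNotFoundError:  # ENOENT from getcwd(3)
    unreachable_cwd_detected = True

os.chdir("/")  # restore a valid cwd
```

This is exactly the error rsync reports above: "rsync: getcwd(): No such file or directory (2)".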


I've reported this for Ubuntu's glibc here:

https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1746995

I have tried adding an 'os.chdir("/")' before the rsync call in resource.py, but that did not help, so I'm not even sure you *can* fix this on the GlusterFS side.


A temporary workaround is to install an older glibc.


Version-Release number of selected component (if applicable):

Both 3.10 and 3.13; I run gluster on Ubuntu xenial (16.04.3).

How reproducible:

Every time.

--- Additional comment from Red Hat Bugzilla Rules Engine on 2018-02-07 08:57:05 EST ---

This bug is automatically being proposed for the release of Red Hat Gluster Storage 3 under active development and open for bug fixes, by setting the release flag 'rhgs-3.4.0' to '?'.

If this bug should be proposed for a different release, please manually change the proposed release flag.

--- Additional comment from Florian Weimer on 2018-02-08 09:42:14 EST ---

There is now a report that my rsync patch (bug 1542180) does not fix this. I would have to find a way to apply --ignore-missing-args to the current directory, but I'm not convinced that this is the right thing to do.

Comment 1 Worker Ant 2018-02-16 12:15:09 UTC
REVIEW: https://review.gluster.org/19544 (geo-rep: Remove lazy umount and use mount namespaces) posted (#4) for review on master by Kotresh HR

Comment 2 Worker Ant 2018-02-22 09:33:49 UTC
COMMIT: https://review.gluster.org/19544 committed in master by "Kotresh HR" <khiremat@redhat.com> with a commit message- geo-rep: Remove lazy umount and use mount namespaces

Lazy umounting the master volume by the worker causes
issues with rsync's usage of getcwd. Hence removing
the lazy umount and using a private mount namespace
instead. On the slave, the lazy umount is
retained, as we can't use a private namespace in a
non-root geo-rep setup.

Change-Id: I403375c02cb3cc7d257a5f72bbdb5118b4c8779a
BUG: 1546129
Signed-off-by: Kotresh HR <khiremat@redhat.com>
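The fix's approach — running the worker in a private mount namespace instead of lazily unmounting — can be sketched from Python via unshare(2). This is a hedged illustration, not the gsyncd implementation; the CLONE_NEWNS constant and the raw libc call are the only assumptions, and an unprivileged caller simply gets EPERM:

```python
import ctypes
import ctypes.util
import errno
import os

CLONE_NEWNS = 0x00020000  # mount-namespace flag from <sched.h>

def enter_private_mount_namespace():
    """Detach the calling process into a private mount namespace,
    so that any mount it creates vanishes automatically when the
    process exits -- the idea behind dropping the lazy umount.

    Returns True on success, False when unprivileged (EPERM)."""
    libc = ctypes.CDLL(ctypes.util.find_library("c") or None,
                       use_errno=True)
    if libc.unshare(CLONE_NEWNS) != 0:
        err = ctypes.get_errno()
        if err == errno.EPERM:
            return False  # needs CAP_SYS_ADMIN
        raise OSError(err, os.strerror(err))
    return True
```

A privileged worker would unshare first and then mount the master volume; the mount is torn down by the kernel when the worker dies, so rsync never sees a lazily-unmounted cwd.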

Comment 3 Dimitri Ars 2018-05-24 09:01:15 UTC
Can this please be backported to the 3.12 release?

Comment 4 Kotresh HR 2018-05-25 03:18:55 UTC
Yes, I will do it.

Comment 5 Shyamsundar 2018-06-20 18:00:23 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-v4.1.0, please open a new bug report.

glusterfs-v4.1.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2018-June/000102.html
[2] https://www.gluster.org/pipermail/gluster-users/

