+++ This bug was initially created as a clone of Bug #1292463 +++ +++ This bug was initially created as a clone of Bug #1291988 +++ Description of problem: ======================= In a geo-rep setup if a ACTIVE master is rebooted observed ChangelogException as: [2015-12-15 12:30:39.725385] I [resource(/rhgs/brick1/master):1433:service_loop] GLUSTER: Register time: 1450162839 [2015-12-15 12:30:39.741044] I [master(/rhgs/brick1/master):549:crawlwrap] _GMaster: primary master with volume id e560607b-131b-4e35-b6b7-4e9b1e369224 ... [2015-12-15 12:30:39.744363] I [master(/rhgs/brick1/master):558:crawlwrap] _GMaster: crawl interval: 1 seconds [2015-12-15 12:30:39.748029] I [master(/rhgs/brick1/master):1173:crawl] _GMaster: starting history crawl... turns: 1, stime: (1450086953, 0), etime: 1450162839 [2015-12-15 12:30:39.748600] E [repce(agent):117:worker] <top>: call failed: Traceback (most recent call last): File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker res = getattr(self.obj, rmeth)(*in_data[2:]) File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 54, in history num_parallel) File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 100, in cl_history_changelog cls.raise_changelog_err() File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 27, in raise_changelog_err raise ChangelogException(errn, os.strerror(errn)) ChangelogException: [Errno 2] No such file or directory [2015-12-15 12:30:39.749222] E [repce(/rhgs/brick1/master):207:__call__] RepceClient: call 13048:139759777986368:1450162839.75 (history) failed on peer with ChangelogException [2015-12-15 12:30:39.749366] E [resource(/rhgs/brick1/master):1448:service_loop] GLUSTER: Changelog History Crawl failed, [Errno 2] No such file or directory [2015-12-15 12:30:39.749612] I [syncdutils(/rhgs/brick1/master):220:finalize] <top>: exiting. [2015-12-15 12:30:39.753611] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF. [2015-12-15 12:30:39.753813] I [syncdutils(agent):220:finalize] <top>: exiting. [2015-12-15 12:30:40.705105] I [monitor(monitor):229:monitor] Monitor: ------------------------------------------------------------ [2015-12-15 12:30:40.705337] I [monitor(monitor):230:monitor] Monitor: starting gsyncd worker [2015-12-15 12:30:40.717370] I [monitor(monitor):290:monitor] Monitor: worker(/rhgs/brick1/master) died in startup phase [2015-12-15 12:30:40.780814] I [gsyncd(/rhgs/brick2/master):652:main_i] <top>: syncing: gluster://localhost:Master_vol -> ssh://root.35.196:gluster://localhost:Slave_vol [2015-12-15 12:30:40.781518] I [changelogagent(agent):75:__init__] ChangelogAgent: Agent listining... [2015-12-15 12:30:43.313604] I [master(/rhgs/brick2/master):83:gmaster_builder] <top>: setting up xsync change detection mode [2015-12-15 12:30:43.314054] I [master(/rhgs/brick2/master):404:__init__] _GMaster: using 'rsync' as the sync engine [2015-12-15 12:30:43.314963] I [master(/rhgs/brick2/master):83:gmaster_builder] <top>: setting up changelog change detection mode [2015-12-15 12:30:43.315217] I [master(/rhgs/brick2/master):404:__init__] _GMaster: using 'rsync' as the sync engine [2015-12-15 12:30:43.316255] I [master(/rhgs/brick2/master):83:gmaster_builder] <top>: setting up changeloghistory change detection mode [2015-12-15 12:30:43.316457] I [master(/rhgs/brick2/master):404:__init__] _GMaster: using 'rsync' as the sync engine [2015-12-15 12:30:45.333230] I [master(/rhgs/brick2/master):1259:register] _GMaster: xsync temp directory: /var/lib/misc/glusterfsd/Master_vol/ssh%3A%2F%2Froot%4010.70.35.196%3Agluster%3A%2F%2F127.0.0.1%3ASla ve_vol/78ce16e9d2384148a6abf384c52b8ab1/xsync [2015-12-15 12:30:45.333684] I [resource(/rhgs/brick2/master):1433:service_loop] GLUSTER: Register time: 1450162845 [2015-12-15 12:30:45.346733] I [master(/rhgs/brick2/master):549:crawlwrap] _GMaster: primary master with volume id e560607b-131b-4e35-b6b7-4e9b1e369224 ... [2015-12-15 12:30:45.349627] I [master(/rhgs/brick2/master):558:crawlwrap] _GMaster: crawl interval: 1 seconds [2015-12-15 12:30:45.353549] I [master(/rhgs/brick2/master):1173:crawl] _GMaster: starting history crawl... turns: 1, stime: (1450086953, 0), etime: 1450162845 [2015-12-15 12:30:46.354907] E [repce(agent):117:worker] <top>: call failed: Traceback (most recent call last): File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker res = getattr(self.obj, rmeth)(*in_data[2:]) File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 57, in history_scan return Changes.cl_history_scan() File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 89, in cl_history_scan cls.raise_changelog_err() File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 27, in raise_changelog_err raise ChangelogException(errn, os.strerror(errn)) ChangelogException: [Errno 22] Invalid argument [2015-12-15 12:30:46.355663] E [repce(/rhgs/brick2/master):207:__call__] RepceClient: call 13117:140461545350976:1450162845.35 (history_scan) failed on peer with ChangelogException [2015-12-15 12:30:46.355832] E [resource(/rhgs/brick2/master):1448:service_loop] GLUSTER: Changelog History Crawl failed, [Errno 22] Invalid argument [2015-12-15 12:30:46.356125] I [syncdutils(/rhgs/brick2/master):220:finalize] <top>: exiting. [2015-12-15 12:30:46.359954] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF. [2015-12-15 12:30:46.360156] I [syncdutils(agent):220:finalize] <top>: exiting. At the same time the errors logged in the changes.log ===================================================== [2015-12-15 07:02:21.780179] E [mem-pool.c:417:mem_get0] (-->/lib64/libglusterfs.so.0(+0x876c2) [0x7f24942e06c2] -->/lib64/libglusterfs.so.0(_gf_msg+0x79f) [0x7f24942805df] -->/lib64/libglusterfs.so.0(mem_get 0+0x81) [0x7f24942b2811] ) 0-mem-pool: invalid argument [Invalid argument] [2015-12-15 07:02:21.780396] E [mem-pool.c:417:mem_get0] (-->/lib64/libglusterfs.so.0(+0x876c2) [0x7f24942e06c2] -->/lib64/libglusterfs.so.0(_gf_msg+0x79f) [0x7f24942805df] -->/lib64/libglusterfs.so.0(mem_get 0+0x81) [0x7f24942b2811] ) 0-mem-pool: invalid argument [Invalid argument] [2015-12-15 07:02:21.780516] E [mem-pool.c:417:mem_get0] (-->/lib64/libglusterfs.so.0(+0x876c2) [0x7f24942e06c2] -->/lib64/libglusterfs.so.0(_gf_msg+0x79f) [0x7f24942805df] -->/lib64/libglusterfs.so.0(mem_get 0+0x81) [0x7f24942b2811] ) 0-mem-pool: invalid argument [Invalid argument] [2015-12-15 07:02:21.781449] E [mem-pool.c:417:mem_get0] (-->/lib64/libglusterfs.so.0(recursive_rmdir+0x192) [0x7f249429dc12] -->/lib64/libglusterfs.so.0(_gf_msg+0x79f) [0x7f24942805df] -->/lib64/libglusterfs .so.0(mem_get0+0x81) [0x7f24942b2811] ) 0-mem-pool: invalid argument [Invalid argument] [2015-12-15 07:02:21.782716] E [mem-pool.c:417:mem_get0] (-->/lib64/libglusterfs.so.0(recursive_rmdir+0x192) [0x7f249429dc12] -->/lib64/libglusterfs.so.0(_gf_msg+0x79f) [0x7f24942805df] -->/lib64/libglusterfs .so.0(mem_get0+0x81) [0x7f24942b2811] ) 0-mem-pool: invalid argument [Invalid argument] ve_vol/78ce16e9d2384148a6abf384c52b8ab1/.history/.processing/CHANGELOG.1450086954 to processed directory [2015-12-15 07:29:57.229711] I [gf-changelog.c:542:gf_changelog_register_generic] 0-gfchangelog: Registering brick: /rhgs/brick2/master [notify filter: 1] [2015-12-15 07:29:57.230785] E [mem-pool.c:417:mem_get0] (-->/lib64/libglusterfs.so.0(+0x876c2) [0x7f49899686c2] -->/lib64/libglusterfs.so.0(_gf_msg+0x79f) [0x7f49899085df] -->/lib64/libglusterfs.so.0(mem_get0+0x81) [0x7f498993a811] ) 0-mem-pool: invalid argument [Invalid argument] [2015-12-15 07:29:57.230955] E [mem-pool.c:417:mem_get0] (-->/lib64/libglusterfs.so.0(+0x876c2) [0x7f49899686c2] -->/lib64/libglusterfs.so.0(_gf_msg+0x79f) [0x7f49899085df] -->/lib64/libglusterfs.so.0(mem_get0+0x81) [0x7f498993a811] ) 0-mem-pool: invalid argument [Invalid argument] [2015-12-15 07:29:57.232027] E [mem-pool.c:417:mem_get0] (-->/lib64/libglusterfs.so.0(recursive_rmdir+0x192) [0x7f4989925c12] -->/lib64/libglusterfs.so.0(_gf_msg+0x79f) [0x7f49899085df] -->/lib64/libglusterfs.so.0(mem_get0+0x81) [0x7f498993a811] ) 0-mem-pool: invalid argument [Invalid argument] Version-Release number of selected component (if applicable): ============================================================= How reproducible: ================= Observed Traceback only on one setup and only once while able to reproduce invalid argument error with every reboot Steps to Reproduce: =================== 1. Create 2 node master and slave clusters 2. Create and start 2x2 master volume 3. Create and start 2x2 slave volume 4. Setup geo-rep between master and slave volume 5. Reboot master ACTIVE node Actual results: =============== Once the node is up the Worker goes faulty for sometime with traceback and errors [Invalid argument] Expected results: ================= Once the node is down, the bricks on peer node should be ACTIVE. Once the node comes back, the bricks should be PASSIVE without going to faulty or reporting error. --- Additional comment from Vijay Bellur on 2015-12-17 09:26:16 EST --- REVIEW: http://review.gluster.org/12997 (libgfchangelog: Allocate logbuf_pool in master xlator) posted (#1) for review on master by Kotresh HR (khiremat) --- Additional comment from Vijay Bellur on 2015-12-22 04:33:17 EST --- COMMIT: http://review.gluster.org/12997 committed in master by Venky Shankar (vshankar) ------ commit d73f5c9eae0f5c8796598fa1930844c15c2c37d4 Author: Kotresh HR <khiremat> Date: Tue Dec 15 19:55:05 2015 +0530 libgfchangelog: Allocate logbuf_pool in master xlator The master xlator needs to allocate 'logbuf_pool' else 'gf_msg' fails with EINVAL. Change-Id: I6b2d3450250de7e77126d12b75b0dbc4db414bfb BUG: 1292463 Signed-off-by: Kotresh HR <khiremat> Reviewed-on: http://review.gluster.org/12997 Tested-by: NetBSD Build System <jenkins.org> Tested-by: Gluster Build System <jenkins.com> Reviewed-by: Venky Shankar <vshankar>
REVIEW: http://review.gluster.org/13064 (libgfchangelog: Allocate logbuf_pool in master xlator) posted (#1) for review on release-3.7 by Kotresh HR (khiremat)
COMMIT: http://review.gluster.org/13064 committed in release-3.7 by Venky Shankar (vshankar) ------ commit a37ea8462cfd87f30bc38c509f0a079aae14e797 Author: Kotresh HR <khiremat> Date: Tue Dec 15 19:55:05 2015 +0530 libgfchangelog: Allocate logbuf_pool in master xlator The master xlator needs to allocate 'logbuf_pool' else 'gf_msg' fails with EINVAL. BUG: 1293595 Change-Id: I51a1895b5ff9c8eaf6cc15a9cacc415fa8cd7bdd Reviewed-on: http://review.gluster.org/12997 Tested-by: NetBSD Build System <jenkins.org> Tested-by: Gluster Build System <jenkins.com> Reviewed-by: Venky Shankar <vshankar> Signed-off-by: Kotresh HR <khiremat> Reviewed-on: http://review.gluster.org/13064 Reviewed-by: Aravinda VK <avishwan>
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.7, please open a new bug report. glusterfs-3.7.7 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution. [1] https://www.gluster.org/pipermail/gluster-users/2016-February/025292.html [2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user