Bug 1026780
Summary: | Dist-geo-rep : geo-rep worker crashed while init with [Errno 34] Numerical result out of range. | |||
---|---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Vijaykumar Koppad <vkoppad> | |
Component: | geo-replication | Assignee: | Bug Updates Notification Mailing List <rhs-bugs> | |
Status: | CLOSED EOL | QA Contact: | storage-qa-internal <storage-qa-internal> | |
Severity: | medium | Docs Contact: | ||
Priority: | medium | |||
Version: | 2.1 | CC: | avishwan, chrisw, csaba, david.macdonald, nsathyan, rhinduja, rwheeler, vagarwal | |
Target Milestone: | --- | |||
Target Release: | --- | |||
Hardware: | x86_64 | |||
OS: | Linux | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | Bug Fix | ||
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1285200 | Environment: | ||
Last Closed: | Type: | Bug | ||
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1285200, 1294588, 1313311 |
Description
Vijaykumar Koppad
2013-11-05 12:35:14 UTC
diff --git a/geo-replication/syncdaemon/libcxattr.py b/geo-replication/syncdaemon/libcxattr.py
index b5b6956..75c89ef 100644
--- a/geo-replication/syncdaemon/libcxattr.py
+++ b/geo-replication/syncdaemon/libcxattr.py
@@ -54,9 +54,13 @@ class Xattr(object):
 
     @classmethod
     def llistxattr(cls, path, siz=0):
-        ret = cls._query_xattr(path, siz, 'llistxattr')
-        if isinstance(ret, str):
-            ret = ret.split('\0')
+
+        try:
+            ret = cls._query_xattr(path, siz, 'llistxattr')
+            if isinstance(ret, str):
+                ret = ret.split('\0')
+        except:
+            ret = -1
         return ret
 
     @classmethod

Amar, I think try ... catch won't help here as the call is via ctypes. A probable fix would be to handle ERANGE for _all_ getxattr calls. What do you think?

This has happened again in the build glusterfs-3.4.0.58rhs-1.

backtrace
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[2014-02-04 11:05:25.961899] I [master(/bricks/master_brick9):438:crawlwrap] _GMaster: crawl interval: 60 seconds
[2014-02-04 11:05:25.967730] I [master(/bricks/master_brick9):918:update_worker_status] _GMaster: Creating new /var/lib/glusterd/geo-replication/master_10.70.43.76_slave/_bricks_master_brick9.status
[2014-02-04 11:05:25.975267] I [master(/bricks/master_brick9):1129:crawl] _GMaster: starting hybrid crawl...
[2014-02-04 11:05:25.991782] E [syncdutils(/bricks/master_brick1):240:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 150, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 540, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1156, in service_loop
    g1.crawlwrap(oneshot=True)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 422, in crawlwrap
    volinfo_sys = self.volinfo_hook()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 138, in volinfo_hook
    return self.get_sys_volinfo()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 278, in get_sys_volinfo
    fgn_vis, nat_vi = self.master.server.aggregated.foreign_volume_infos(), \
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 885, in foreign_volume_infos
    xattr_list = Xattr.llistxattr_buf('.')
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 87, in llistxattr_buf
    return cls.llistxattr(path, size)
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 57, in llistxattr
    ret = cls._query_xattr(path, siz, 'llistxattr')
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 35, in _query_xattr
    cls.raise_oserr()
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 25, in raise_oserr
    raise OSError(errn, os.strerror(errn))
OSError: [Errno 34] Numerical result out of range
[2014-02-04 11:05:25.995131] I [syncdutils(/bricks/master_brick1):192:finalize] <top>: exiting.
[2014-02-04 11:05:26.662705] I [master(/bricks/master_brick5):58:gmaster_builder] <top>: setting up xsync change detection mode
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

This has happened again in the build, glusterfs-3.6.0.2-1.el6rhs.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[2014-05-16 12:17:02.880822] I [master(/bricks/master_brick5):1251:crawl] _GMaster: processing xsync changelog /var/run/gluster/master/ssh%3A%2F%2Froot%4010.70.43.114%3Agluster%3A%2F%2F127.0.0.1%3Aslave/994c39ebca2dac30ef18cf407ed3322f/xsync/XSYNC-CHANGELOG.1400222822
[2014-05-16 12:17:02.891830] I [master(/bricks/master_brick5):1248:crawl] _GMaster: finished hybrid crawl syncing
[2014-05-16 12:17:02.896193] E [syncdutils(/bricks/master_brick9):270:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 164, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 633, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1298, in service_loop
    g3.crawlwrap(oneshot=True)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 447, in crawlwrap
    volinfo_sys = self.volinfo_hook()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 155, in volinfo_hook
    return self.get_sys_volinfo()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 303, in get_sys_volinfo
    self.master.server.aggregated.foreign_volume_infos(),
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 958, in foreign_volume_infos
    xattr_list = Xattr.llistxattr_buf('.')
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 99, in llistxattr_buf
    return cls.llistxattr(path, size)
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 69, in llistxattr
    ret = cls._query_xattr(path, siz, 'llistxattr')
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 47, in _query_xattr
    cls.raise_oserr()
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 37, in raise_oserr
    raise OSError(errn, os.strerror(errn))
OSError: [Errno 34] Numerical result out of range
[2014-05-16 12:17:02.902026] I [syncdutils(/bricks/master_brick9):214:finalize] <top>: exiting.
[2014-05-16 12:17:02.905908] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

I see this happening with build glusterfs-geo-replication-3.7.1-4.el6rhs.x86_64, even without adding any new node. It happened when the volume type was "disperse" for both master and slave.

[root@georep2 ~]# grep "OSError:" /var/log/glusterfs/geo-replication/master/ssh%3A%2F%2Froot%4010.70.46.154%3Agluster%3A%2F%2F127.0.0.1%3Aslave.log
OSError: [Errno 34] Numerical result out of range
OSError: [Errno 34] Numerical result out of range
[root@georep2 ~]#

[2015-06-22 19:39:07.959126] I [monitor(monitor):221:monitor] Monitor: ------------------------------------------------------------
[2015-06-22 19:39:07.959476] I [monitor(monitor):222:monitor] Monitor: starting gsyncd worker
[2015-06-22 19:39:08.93621] I [gsyncd(/rhs/brick1/b1):649:main_i] <top>: syncing: gluster://localhost:master -> ssh://root.46.103:gluster://localhost:slave
[2015-06-22 19:39:08.95005] I [changelogagent(agent):75:__init__] ChangelogAgent: Agent listining...
[2015-06-22 19:39:11.153399] I [master(/rhs/brick1/b1):83:gmaster_builder] <top>: setting up xsync change detection mode
[2015-06-22 19:39:11.153790] I [master(/rhs/brick1/b1):404:__init__] _GMaster: using 'rsync' as the sync engine
[2015-06-22 19:39:11.155164] I [master(/rhs/brick1/b1):83:gmaster_builder] <top>: setting up changelog change detection mode
[2015-06-22 19:39:11.155376] I [master(/rhs/brick1/b1):404:__init__] _GMaster: using 'rsync' as the sync engine
[2015-06-22 19:39:11.156248] I [master(/rhs/brick1/b1):83:gmaster_builder] <top>: setting up changeloghistory change detection mode
[2015-06-22 19:39:11.156491] I [master(/rhs/brick1/b1):404:__init__] _GMaster: using 'rsync' as the sync engine
[2015-06-22 19:39:13.201039] I [master(/rhs/brick1/b1):1208:register] _GMaster: xsync temp directory: /var/lib/misc/glusterfsd/master/ssh%3A%2F%2Froot%4010.70.46.154%3Agluster%3A%2F%2F127.0.0.1%3Aslave/c19b89ac45352ab8c894d210d136dd56/xsync
[2015-06-22 19:39:13.201385] I [resource(/rhs/brick1/b1):1432:service_loop] GLUSTER: Register time: 1434982153
[2015-06-22 19:39:15.791850] E [syncdutils(/rhs/brick1/b1):276:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 165, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 659, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1438, in service_loop
    g3.crawlwrap(oneshot=True)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 514, in crawlwrap
    volinfo_sys = self.volinfo_hook()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 172, in volinfo_hook
    return self.get_sys_volinfo()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 332, in get_sys_volinfo
    self.master.server.aggregated.foreign_volume_infos(),
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1084, in foreign_volume_infos
    xattr_list = Xattr.llistxattr_buf('.')
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 94, in llistxattr_buf
    return cls.llistxattr(path, size)
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 69, in llistxattr
    ret = cls._query_xattr(path, siz, 'llistxattr')
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 47, in _query_xattr
    cls.raise_oserr()
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 37, in raise_oserr
    raise OSError(errn, os.strerror(errn))
OSError: [Errno 34] Numerical result out of range
[2015-06-22 19:39:15.793708] I [syncdutils(/rhs/brick1/b1):220:finalize] <top>: exiting.
[2015-06-22 19:39:15.795592] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
[2015-06-22 19:39:15.795979] I [syncdutils(agent):220:finalize] <top>: exiting.
[2015-06-22 19:39:16.160041] I [monitor(monitor):282:monitor] Monitor: worker(/rhs/brick1/b1) died in startup phase
[2015-06-22 19:39:26.346771] I [monitor(monitor):221:monitor] Monitor: ------------------------------------------------------------

This happened on one of the nodes in the master cluster, just after starting the geo-rep session, and the status went to Faulty. After multiple tries, the worker comes back and the status correctly becomes Passive. Will be attaching the new logs.
Hit this bug on a normal distributed volume as well, with build glusterfs-3.7.1-14.el7rhgs.x86_64.

[2015-09-08 17:50:29.959184] I [master(/bricks/brick2/master_brick8):1249:crawl] _GMaster: finished hybrid crawl syncing, stime: (1441734629, 0)
[2015-09-08 17:50:29.960731] E [syncdutils(/bricks/brick0/master_brick0):276:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 165, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 659, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1445, in service_loop
    g1.crawlwrap(oneshot=True, register_time=register_time)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 525, in crawlwrap
    volinfo_sys = self.volinfo_hook()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 172, in volinfo_hook
    return self.get_sys_volinfo()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 332, in get_sys_volinfo
    self.master.server.aggregated.foreign_volume_infos(),
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1084, in foreign_volume_infos
    xattr_list = Xattr.llistxattr_buf('.')
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 94, in llistxattr_buf
    return cls.llistxattr(path, size)
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 69, in llistxattr
    ret = cls._query_xattr(path, siz, 'llistxattr')
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 47, in _query_xattr
    cls.raise_oserr()
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 37, in raise_oserr
    raise OSError(errn, os.strerror(errn))
OSError: [Errno 34] Numerical result out of range

[root@georep1 syncdaemon]# gluster volume info master
Volume Name: master
Type: Distributed-Replicate
Volume ID: 114cc338-b4ae-469a-8db7-105b5f671f9c
Status: Started
Number of Bricks: 6 x 2 = 12
Transport-type: tcp

Closing this bug since RHGS 2.1 release reached EOL. Required bugs are cloned to RHGS 3.1. Please re-open this issue if found again.
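
A note on the ERANGE suggestion made earlier in this report. In every traceback above, the error is raised from the second call in llistxattr_buf (`return cls.llistxattr(path, size)`), that is, after the size had already been queried. One plausible reading, which this report does not confirm, is that the xattr list grew between the size query and the fetch. Under that assumption, a minimal sketch of retrying on ERANGE could look like the following. The names `query_with_erange_retry` and `GrowingFakeXattrs` are hypothetical and are not part of gsyncd. `query` stands in for a wrapper such as `Xattr._query_xattr`, and is assumed to return the required size when called with size 0 and to raise OSError(ERANGE) when the buffer is too small.

```python
import errno
import os


def query_with_erange_retry(query, path, max_tries=5):
    """Size-then-fetch with a bounded retry on ERANGE.

    `query(path, size)` is a hypothetical stand-in for an xattr wrapper:
    with size 0 it returns the buffer size needed; with a positive size it
    returns the data, or raises OSError(ERANGE) if the buffer is too small.
    """
    for _ in range(max_tries):
        size = query(path, 0)            # step 1: ask how big the buffer must be
        if size == 0:
            return ''                    # nothing to read
        try:
            return query(path, size)     # step 2: fetch with that size
        except OSError as e:
            if e.errno != errno.ERANGE:
                raise                    # unrelated failure: propagate as-is
            # ERANGE: the data grew after step 1, so query the size again
    # Still too small after max_tries attempts: report ERANGE to the caller
    raise OSError(errno.ERANGE, os.strerror(errno.ERANGE))


class GrowingFakeXattrs(object):
    """Test double: the name list grows once, between size query and fetch."""

    def __init__(self):
        self.data = 'user.a\0'
        self.grown = False

    def __call__(self, path, size):
        if size == 0:
            return len(self.data)
        if not self.grown:
            # Simulate another process adding an xattr at the worst moment
            self.data += 'trusted.glusterfs.example\0'
            self.grown = True
        if size < len(self.data):
            raise OSError(errno.ERANGE, os.strerror(errno.ERANGE))
        return self.data


if __name__ == '__main__':
    names = query_with_erange_retry(GrowingFakeXattrs(), '.').split('\0')
    print(names)
```

Unlike the bare `except:` in the diff at the top of this report, this sketch retries only on ERANGE and re-raises everything else, so a genuine failure such as EACCES is not silently turned into -1. The retry is bounded so that a list that keeps growing cannot loop forever.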