Bug 1201732
| Summary: | [dist-geo-rep]: Directory not empty and Stale file handle errors in geo-rep logs during deletes from master in history/changelog crawl | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | shilpa <smanjara> |
| Component: | geo-replication | Assignee: | Aravinda VK <avishwan> |
| Status: | CLOSED ERRATA | QA Contact: | Rahul Hinduja <rhinduja> |
| Severity: | urgent | Docs Contact: | |
| Priority: | high | | |
| Version: | rhgs-3.0 | CC: | aavati, annair, avishwan, csaba, nlevinki, rhinduja, vagarwal |
| Target Milestone: | --- | | |
| Target Release: | RHGS 3.1.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | glusterfs-3.7.0-2.el6rhs | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1211037 (view as bug list) | Environment: | |
| Last Closed: | 2015-07-29 04:39:08 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1202842, 1211037, 1218922, 1223636 | | |
Description
shilpa 2015-03-13 11:23:46 UTC
Also saw "OSError: [Errno 61] No data available" error for hardlinks: [2015-03-13 22:47:46.456286] E [syncdutils(/bricks/brick0/master_brick0):270:log_raise_exception] <top>: FAIL: Traceback (most recent call last): File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 164, in main main_i() File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 645, in main_i local.service_loop(*[r for r in [remote] if r]) File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1329, in service_loop g3.crawlwrap(oneshot=True) File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 553, in crawlwrap self.crawl(no_stime_update=no_stime_update) File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1334, in crawl self.process(changes) File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1017, in process self.process_change(change, done, retry) File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 981, in process_change self.slave.server.entry_ops(entries) File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 226, in __call__ return self.ins(self.meth, *a) File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 208, in __call__ raise res OSError: [Errno 61] No data available: '.gfid/b8c9ca6c-cb69-4930-b4d5-c50ca7710f66/hardlink_to_files/55030214%%8W5FSWYQIO' [2015-03-13 22:47:46.459382] I [syncdutils(/bricks/brick0/master_brick0):214:finalize] <top>: exiting. [2015-03-13 22:47:46.464014] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF. [2015-03-13 22:47:46.464807] I [syncdutils(agent):214:finalize] <top>: exiting. [2015-03-13 22:47:46.887911] I [monitor(monitor):280:monitor] Monitor: worker(/bricks/brick0/master_brick0) died in startup phase Hit the same issue with just changelog crawl. Georep session was never stopped. 
Executed the following from a FUSE client and an NFS client of the master volume:

FUSE:
=====
```
[root@wingo master]# for i in {1..10}; do cp -rf /etc etc.1 ; sleep 10 ; rm -rf etc.1 ; sleep 10 ; done
```

NFS:
====
```
[root@wingo master_nfs]# for i in {1..10}; do cp -rf /etc etc.2 ; sleep 10 ; rm -rf etc.2 ; sleep 10 ; done
```

Status moved from Active to faulty:
===================================
```
[root@georep1 ~]# gluster volume geo-replication vol0 10.70.46.100::vol1 status

MASTER NODE    MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE                 STATUS     CHECKPOINT STATUS    CRAWL STATUS
----------------------------------------------------------------------------------------------------------------------------------
georep1        vol0          /rhs/brick1/b1    root          10.70.46.101::vol1    Active     N/A                  Changelog Crawl
georep1        vol0          /rhs/brick2/b2    root          10.70.46.100::vol1    Active     N/A                  Changelog Crawl
georep3        vol0          /rhs/brick1/b1    root          10.70.46.101::vol1    Passive    N/A                  N/A
georep3        vol0          /rhs/brick2/b2    root          10.70.46.100::vol1    Passive    N/A                  N/A
georep2        vol0          /rhs/brick1/b1    root          10.70.46.100::vol1    Passive    N/A                  N/A
georep2        vol0          /rhs/brick2/b2    root          10.70.46.101::vol1    Passive    N/A                  N/A

[root@georep1 ~]# gluster volume geo-replication vol0 10.70.46.100::vol1 status

MASTER NODE    MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE                 STATUS     CHECKPOINT STATUS    CRAWL STATUS
-------------------------------------------------------------------------------------------------------------------------------
georep1        vol0          /rhs/brick1/b1    root          10.70.46.101::vol1    faulty     N/A                  N/A
georep1        vol0          /rhs/brick2/b2    root          10.70.46.100::vol1    faulty     N/A                  N/A
georep3        vol0          /rhs/brick1/b1    root          10.70.46.101::vol1    Passive    N/A                  N/A
georep3        vol0          /rhs/brick2/b2    root          10.70.46.100::vol1    Passive    N/A                  N/A
georep2        vol0          /rhs/brick1/b1    root          10.70.46.100::vol1    Passive    N/A                  N/A
georep2        vol0          /rhs/brick2/b2    root          10.70.46.101::vol1    Passive    N/A                  N/A
```

From faulty it went back to Active and the crawl changed to History Crawl:
==========================================================================
```
[root@georep1 ~]# gluster volume geo-replication vol0 10.70.46.100::vol1 status

MASTER NODE    MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE                 STATUS     CHECKPOINT STATUS    CRAWL STATUS
--------------------------------------------------------------------------------------------------------------------------------
georep1        vol0          /rhs/brick1/b1    root          10.70.46.101::vol1    Active     N/A                  History Crawl
georep1        vol0          /rhs/brick2/b2    root          10.70.46.100::vol1    Active     N/A                  History Crawl
georep3        vol0          /rhs/brick1/b1    root          10.70.46.101::vol1    Passive    N/A                  N/A
georep3        vol0          /rhs/brick2/b2    root          10.70.46.100::vol1    Passive    N/A                  N/A
georep2        vol0          /rhs/brick1/b1    root          10.70.46.100::vol1    Passive    N/A                  N/A
georep2        vol0          /rhs/brick2/b2    root          10.70.46.101::vol1    Passive    N/A                  N/A
```

Log Snippet:
============
```
[2015-03-16 18:53:29.265461] I [syncdutils(/rhs/brick1/b1):214:finalize] <top>: exiting.
[2015-03-16 18:53:29.264503] E [syncdutils(/rhs/brick2/b2):270:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 164, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 645, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1329, in service_loop
    g3.crawlwrap(oneshot=True)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 553, in crawlwrap
    self.crawl(no_stime_update=no_stime_update)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1334, in crawl
    self.process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1017, in process
    self.process_change(change, done, retry)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 981, in process_change
    self.slave.server.entry_ops(entries)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 226, in __call__
    return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 208, in __call__
    raise res
OSError: [Errno 39] Directory not empty: '.gfid/70b3b3b8-3e8d-4f32-9123-fab73574ce91/yum'
[2015-03-16 18:53:29.266381] I [syncdutils(/rhs/brick2/b2):214:finalize] <top>: exiting.
[2015-03-16 18:53:29.268383] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
[2015-03-16 18:53:29.268708] I [syncdutils(agent):214:finalize] <top>: exiting.
[2015-03-16 18:53:29.269619] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
[2015-03-16 18:53:29.270034] I [syncdutils(agent):214:finalize] <top>: exiting.
[2015-03-16 18:53:29.685185] I [monitor(monitor):280:monitor] Monitor: worker(/rhs/brick1/b1) died in startup phase
[2015-03-16 18:53:30.220942] I [monitor(monitor):280:monitor] Monitor: worker(/rhs/brick2/b2) died in startup phase
```
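During deletes, ENOTEMPTY (errno 39) and ESTALE (errno 116) on the slave are plausibly races between the two Active workers: on a distributed master, the children of a directory are spread across bricks, so one worker can replay the parent's RMDIR while the other has not yet replayed the children's UNLINKs. Killing the worker on such an error only forces a restart and a re-crawl. Below is a hedged sketch of the kind of errno filtering that would avoid the crash; it is illustrative only, not the actual fix (see the upstream patch referenced later), and `apply_entry_op` is a hypothetical helper, not the real gsyncd entry_ops signature:

```python
import errno
import os

# Errnos a worker could treat as benign while replaying entry
# operations (deletes racing across bricks, entries already gone on the
# slave). Illustrative set, not the exact list used by the real fix.
TOLERABLE = (errno.ENOTEMPTY, errno.ESTALE, errno.ENODATA, errno.ENOENT)


def apply_entry_op(op, *args):
    """Hypothetical helper: run one replayed entry operation and
    swallow benign failures instead of killing the worker."""
    try:
        op(*args)
    except OSError as e:
        if e.errno not in TOLERABLE:
            raise
        # Log and continue; a later changelog batch or the next crawl
        # will reconcile this entry.
        print("skipping benign failure: %s" % e)


# Demo: an RMDIR racing ahead of its children's UNLINKs.
os.makedirs("/tmp/georep_demo/child", exist_ok=True)
apply_entry_op(os.rmdir, "/tmp/georep_demo")  # ENOTEMPTY -> logged, not fatal
```

With this kind of tolerance in place the worker survives the replay instead of cycling through faulty and History Crawl, which matches the behaviour reported during verification below.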
Output of history log:
======================
```
[root@georep1 ssh%3A%2F%2Froot%4010.70.46.100%3Agluster%3A%2F%2F127.0.0.1%3Avol1]# find .
.
./c19b89ac45352ab8c894d210d136dd56
./c19b89ac45352ab8c894d210d136dd56/xsync
./c19b89ac45352ab8c894d210d136dd56/.history
./c19b89ac45352ab8c894d210d136dd56/.history/tracker
./c19b89ac45352ab8c894d210d136dd56/.history/.current
./c19b89ac45352ab8c894d210d136dd56/.history/.processed
./c19b89ac45352ab8c894d210d136dd56/.history/.processing
./c19b89ac45352ab8c894d210d136dd56/.history/.processing/CHANGELOG.1426512222
./c19b89ac45352ab8c894d210d136dd56/.history/.processing/CHANGELOG.1426512176
./c19b89ac45352ab8c894d210d136dd56/.history/.processing/CHANGELOG.1426512161
./c19b89ac45352ab8c894d210d136dd56/.history/.processing/CHANGELOG.1426512131
./c19b89ac45352ab8c894d210d136dd56/.history/.processing/CHANGELOG.1426512191
./c19b89ac45352ab8c894d210d136dd56/.history/.processing/CHANGELOG.1426512207
./c19b89ac45352ab8c894d210d136dd56/.history/.processing/CHANGELOG.1426512146
./c19b89ac45352ab8c894d210d136dd56/.history/.processing/CHANGELOG.1426512116
./c19b89ac45352ab8c894d210d136dd56/.history/.processing/CHANGELOG.1426512101
./c19b89ac45352ab8c894d210d136dd56/tracker
./c19b89ac45352ab8c894d210d136dd56/.current
./c19b89ac45352ab8c894d210d136dd56/.processed
./c19b89ac45352ab8c894d210d136dd56/.processed/archive_201503.tar
./c19b89ac45352ab8c894d210d136dd56/.processing
./c19b89ac45352ab8c894d210d136dd56/.processing/CHANGELOG.1426512372
./c19b89ac45352ab8c894d210d136dd56/.processing/CHANGELOG.1426512342
./c19b89ac45352ab8c894d210d136dd56/.processing/CHANGELOG.1426512282
./c19b89ac45352ab8c894d210d136dd56/.processing/CHANGELOG.1426512357
./c19b89ac45352ab8c894d210d136dd56/.processing/CHANGELOG.1426512327
./c19b89ac45352ab8c894d210d136dd56/.processing/CHANGELOG.1426512267
./c19b89ac45352ab8c894d210d136dd56/.processing/CHANGELOG.1426512387
./c19b89ac45352ab8c894d210d136dd56/.processing/CHANGELOG.1426512312
./c19b89ac45352ab8c894d210d136dd56/.processing/CHANGELOG.1426512237
./c19b89ac45352ab8c894d210d136dd56/.processing/CHANGELOG.1426512252
./c19b89ac45352ab8c894d210d136dd56/.processing/CHANGELOG.1426512297
./764586b145d7206a154a778f64bd2f50
./764586b145d7206a154a778f64bd2f50/xsync
./764586b145d7206a154a778f64bd2f50/.history
./764586b145d7206a154a778f64bd2f50/.history/tracker
./764586b145d7206a154a778f64bd2f50/.history/.current
./764586b145d7206a154a778f64bd2f50/.history/.processed
./764586b145d7206a154a778f64bd2f50/.history/.processing
./764586b145d7206a154a778f64bd2f50/.history/.processing/CHANGELOG.1426512222
./764586b145d7206a154a778f64bd2f50/.history/.processing/CHANGELOG.1426512176
./764586b145d7206a154a778f64bd2f50/.history/.processing/CHANGELOG.1426512161
./764586b145d7206a154a778f64bd2f50/.history/.processing/CHANGELOG.1426512131
./764586b145d7206a154a778f64bd2f50/.history/.processing/CHANGELOG.1426512207
./764586b145d7206a154a778f64bd2f50/.history/.processing/CHANGELOG.1426512146
./764586b145d7206a154a778f64bd2f50/.history/.processing/CHANGELOG.1426512192
./764586b145d7206a154a778f64bd2f50/.history/.processing/CHANGELOG.1426512116
./764586b145d7206a154a778f64bd2f50/.history/.processing/CHANGELOG.1426512101
./764586b145d7206a154a778f64bd2f50/tracker
./764586b145d7206a154a778f64bd2f50/.current
./764586b145d7206a154a778f64bd2f50/.processed
./764586b145d7206a154a778f64bd2f50/.processed/archive_201503.tar
./764586b145d7206a154a778f64bd2f50/.processing
./764586b145d7206a154a778f64bd2f50/.processing/CHANGELOG.1426512372
./764586b145d7206a154a778f64bd2f50/.processing/CHANGELOG.1426512342
./764586b145d7206a154a778f64bd2f50/.processing/CHANGELOG.1426512282
./764586b145d7206a154a778f64bd2f50/.processing/CHANGELOG.1426512357
./764586b145d7206a154a778f64bd2f50/.processing/CHANGELOG.1426512327
./764586b145d7206a154a778f64bd2f50/.processing/CHANGELOG.1426512267
./764586b145d7206a154a778f64bd2f50/.processing/CHANGELOG.1426512387
./764586b145d7206a154a778f64bd2f50/.processing/CHANGELOG.1426512312
./764586b145d7206a154a778f64bd2f50/.processing/CHANGELOG.1426512237
./764586b145d7206a154a778f64bd2f50/.processing/CHANGELOG.1426512252
./764586b145d7206a154a778f64bd2f50/.processing/CHANGELOG.1426512297
[root@georep1 ssh%3A%2F%2Froot%4010.70.46.100%3Agluster%3A%2F%2F127.0.0.1%3Avol1]#
```

Upstream patch sent for review: http://review.gluster.org/#/c/10204/

Verified with build: glusterfs-3.7.1-9.el6rhs.x86_64

The worker no longer crashes with "Directory not empty" or "ESTALE". However, many "directory not empty" messages are still logged during recursive removes, and the affected directories are eventually not removed from the slave; that issue is tracked via https://bugzilla.redhat.com/show_bug.cgi?id=1235633#c5. Moving this bug to the verified state.

```
[root@georep1 ~]# grep -i "OSError" /var/log/glusterfs/geo-replication/master/*
[root@georep1 ~]#

[root@georep3 ~]# grep -i "OSError" /var/log/glusterfs/geo-replication-slaves/1712d430-4ccd-4560-9f7c-d826537a9600\:gluster%3A%2F%2F127.0.0.1%3Aslave.*
[root@georep3 ~]#
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1495.html