| Summary: | Dist-geo-rep: first xsync crawl failed to sync a few hardlinks to the slave when there were some 200K hardlinks | | |
|---|---|---|---|
| Product: | Red Hat Gluster Storage | Reporter: | Vijaykumar Koppad <vkoppad> |
| Component: | geo-replication | Assignee: | Bug Updates Notification Mailing List <rhs-bugs> |
| Status: | CLOSED EOL | QA Contact: | Rahul Hinduja <rhinduja> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 2.1 | CC: | asriram, avishwan, chrisw, csaba, david.macdonald, grajaiya, khiremat, nsathyan, rhinduja, rwheeler, vagarwal |
| Target Milestone: | --- | Keywords: | ZStream |
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | consistency | ||
| Fixed In Version: | | Doc Type: | Known Issue |
| Doc Text: | When there are hundreds of thousands of hardlinks on the master volume prior to starting the Geo-replication session, some hardlinks are not synchronized to the slave volume. | | |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2015-11-25 08:48:03 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Bug Depends On: | | | |
| Bug Blocks: | 1035040 | | |
Description
Vijaykumar Koppad
2013-11-07 10:38:41 UTC
After these failures to sync files to the slave, deleting the files crashed gsyncd with the following backtrace:

```
f-31b8-44a1-b626-c1c302dd218f
...
[2013-11-06 22:54:07.616204] I [master(/bricks/brick3):413:crawlwrap] _GMaster: crawl interval: 3 seconds
[2013-11-06 22:54:07.741698] E [repce(/bricks/brick3):188:__call__] RepceClient: call 10737:140265205274368:1383758647.66 (entry_ops) failed on peer with OSError
[2013-11-06 22:54:07.742059] E [syncdutils(/bricks/brick3):207:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 150, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 535, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1134, in service_loop
    g2.crawlwrap()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 437, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 858, in crawl
    self.process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 815, in process
    if self.process_change(change, done, retry):
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 780, in process_change
    self.slave.server.entry_ops(entries)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 204, in __call__
    return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 189, in __call__
    raise res
OSError: [Errno 61] No data available
```

This looks like the backtrace observed in Bug 1027252.

The backtrace looks the same as Bug 1028343, which is fixed in .42rhs. Can this workload be tested on the same build? The fix for 1028343 should fix this issue as well. Requesting QE to verify on the .42rhs build.
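The precondition for this bug is a master volume that already holds a large number of hardlinks before the geo-replication session starts. A minimal, scaled-down sketch of that layout (the directory and file names are assumptions for illustration; the report used ~200K links on a real master brick, simulated here in a temporary directory):

```shell
# Hedged reproduction sketch: populate a directory with hardlinks before
# geo-rep would start. Paths/counts are stand-ins, not the actual bricks.
MASTER=$(mktemp -d)
for f in $(seq 1 10); do
    touch "$MASTER/file$f"
    # 20 hardlinks per file; the original report had ~200K links in total
    for l in $(seq 1 20); do
        ln "$MASTER/file$f" "$MASTER/file$f.link$l"
    done
done
# each inode now has a link count of 21 (1 original + 20 links)
stat -c %h "$MASTER/file1"       # prints 21
find "$MASTER" -type f | wc -l   # prints 210 (10 files x 21 names each)
```

Counting directory entries with `find | wc -l`, as done later in this report, counts every hardlink name separately, which is why the master/slave totals can be compared directly.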
I tried it on the build glusterfs-3.4.0.43rhs-1 and I still see this issue. On the master, the number of files is:

```
[root@shaktiman ~]# find /mnt/master/ | wc -l
220201
```

while the slave has synced only:

```
[root@spiderman ~]# find /mnt/slave/ | wc -l
218535
```

Some 1.5K files are missing, and they are not shown in the status detail as skipped files either.

```
MASTER NODE               MASTER VOL    MASTER BRICK      SLAVE                  STATUS     CHECKPOINT STATUS    CRAWL STATUS       FILES SYNCD    FILES PENDING    BYTES PENDING    DELETES PENDING    FILES SKIPPED
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
shaktiman.blr.redhat.com  master        /bricks/brick1    10.70.42.171::slave    Active     N/A                  Changelog Crawl    192326         0                0                0                  0
snow.blr.redhat.com       master        /bricks/brick4    10.70.42.229::slave    Passive    N/A                  N/A                5601           0                0                0                  0
targarean.blr.redhat.com  master        /bricks/brick3    10.70.43.159::slave    Active     N/A                  Changelog Crawl    193030         0                0                0                  0
riverrun.blr.redhat.com   master        /bricks/brick2    10.70.42.225::slave    Passive    N/A                  N/A                0              0                0                0                  0
```

Vijaykumar, could you test with entry-timeout set to 0? You'd need to configure "gluster_params" to include this option:

```
# gluster volume geo <master> <slave> config gluster_params "aux-gfid-mount entry-timeout=0"
```

This needs to go in as a known issue; removing the corbett flag.

Modified the Doc Text for this Known Issue. Please review and confirm.

The Doc Text looks fine.

Closing this bug since the RHGS 2.1 release reached EOL. Required bugs are cloned to RHGS 3.1. Please re-open this issue if found again.
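Since the missing files do not appear in the FILES SKIPPED column, one way to identify exactly which entries failed to sync is to diff sorted listings of the master and slave mounts. A sketch under the assumption that the volumes are mounted at /mnt/master and /mnt/slave as in the output above (simulated here with two temporary directories so it runs anywhere):

```shell
# Hedged sketch: list entries present on the master but absent on the slave.
# M and S stand in for /mnt/master and /mnt/slave from this report.
M=$(mktemp -d); S=$(mktemp -d)
touch "$M/a" "$M/b" "$M/c"
ln "$M/a" "$M/a.hardlink"          # a hardlink that (hypothetically) failed to sync
cp -a "$M/a" "$M/b" "$M/c" "$S/"   # slave received everything except the hardlink
# print paths relative to each mount, sorted, then keep lines unique to master
find "$M" -mindepth 1 -printf '%P\n' | sort > /tmp/master.list
find "$S" -mindepth 1 -printf '%P\n' | sort > /tmp/slave.list
comm -23 /tmp/master.list /tmp/slave.list   # prints: a.hardlink
```

`comm -23` suppresses lines common to both lists and lines unique to the slave list, leaving only the entries the slave is missing; on the real mounts this would enumerate the ~1.5K unsynced hardlinks.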