Bug 1141862

Summary: dist-geo-rep: Deleting files on master volume is not propagated to slave volume and geo-rep session goes faulty after a rebalance on slave volume.
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: shilpa <smanjara>
Component: geo-replication
Assignee: Bug Updates Notification Mailing List <rhs-bugs>
Status: CLOSED WONTFIX
QA Contact: storage-qa-internal <storage-qa-internal>
Severity: unspecified
Docs Contact:
Priority: high
Version: rhgs-3.0
CC: avishwan, chrisw, csaba, khiremat, mzywusko, nlevinki, smohan
Target Milestone: ---
Keywords: ZStream
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-04-16 15:57:40 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Slave logs (flags: none)
Master log (flags: none)

Description shilpa 2014-09-15 15:34:19 UTC
Description of problem: After running add-brick and rebalance on the slave volume, deletes performed on the master volume were not propagated to the slave, and the geo-rep session went faulty.


Version-Release number of selected component (if applicable):
glusterfs-3.6.0.28-1.el6rhs.x86_64

How reproducible:
Tried once

Steps to Reproduce:
1. Set up a geo-rep session with 6*2 master and slave volumes. 
2. With the changelog change_detector and a FUSE mount, start the geo-rep session and create files: crefi -T 10 -n 10 --multi -d 10 -b 10 --random --max=5K --min=1k /mnt/master
3. While geo-rep is syncing, run add-brick and rebalance on the slave volume.
4. Once all the files are synced and the rebalance is complete, verify that the arequal-checksums of master and slave are equal.
5. Now run "rm -rf /mnt/master/*" on the master mount; all files on the master volume are deleted.
6. Check the slave mount point to see if all the files have been deleted on slave as well. 
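The checksum comparison in step 4 can be sketched as follows. This is a hypothetical stand-in for the arequal-checksum tool, not the actual utility: it walks each mount in a deterministic order and hashes relative paths plus file contents, so identical trees yield identical digests.

```python
import hashlib
import os

def tree_checksum(root):
    """Hash relative file paths and contents under root.
    Equal trees produce equal digests. (Illustrative stand-in
    for arequal-checksum, for this report only.)"""
    h = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            h.update(os.path.relpath(path, root).encode())
            with open(path, "rb") as f:
                h.update(f.read())
    return h.hexdigest()

# After sync completes, master and slave mounts should match:
# tree_checksum("/mnt/master") == tree_checksum("/mnt/slave")
```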

Actual results:

The deletes never propagated to the slave volume. Geo-rep status showed faulty. 


Expected results:
Deletes should have been propagated. 

Additional info:

[root@Tim master]# find /mnt/master -type f | wc -l
30000
[root@Tim master]# find /mnt/slave -type f | wc -l
30000


Found some traceback errors in logs such as:
[2014-09-15 20:29:50.241961] E [repce(/bricks/brick3/mastervol_b9):207:__call__] RepceClient: call 5055:140136763217664:1410793189.84 (entry_ops) failed on peer with OSError
[2014-09-15 20:29:50.242613] E [syncdutils(/bricks/brick3/mastervol_b9):270:log_raise_exception] <top>: FAIL: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 164, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 643, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1343, in service_loop
    g2.crawlwrap()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 524, in crawlwrap
    self.crawl(no_stime_update=no_stime_update)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1174, in crawl
    self.process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 927, in process
    self.process_change(change, done, retry)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 891, in process_change
    self.slave.server.entry_ops(entries)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 226, in __call__
    return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 208, in __call__
    raise res
OSError: [Errno 39] Directory not empty: '.gfid/b3c8c3cb-07a0-41fb-ae61-2a27b43eb876/level70'
[2014-09-15 20:29:50.245551] I [syncdutils(/bricks/brick3/mastervol_b9):214:finalize] <top>: exiting.
[2014-09-15 20:29:50.249872] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
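The OSError in the traceback is errno 39 (ENOTEMPTY): the rmdir that entry_ops issues on the slave fails because the directory still contains entries, consistent with files being left behind on the slave after the rebalance. A minimal sketch of that failure mode, assuming hypothetical paths and a simplified stand-in for the gsyncd entry_ops rmdir path:

```python
import errno
import os
import tempfile

def rmdir_like_entry_ops(path):
    """Attempt an rmdir the way entry_ops removes a directory on the
    slave; return the errno on failure, or None on success.
    (Simplified illustration, not the gsyncd code itself.)"""
    try:
        os.rmdir(path)
        return None
    except OSError as e:
        return e.errno

base = tempfile.mkdtemp()
sub = os.path.join(base, "level70")
os.mkdir(sub)
# A stale entry on the "slave" blocks the rmdir, as in the traceback:
open(os.path.join(sub, "leftover"), "w").close()

print(rmdir_like_entry_ops(sub) == errno.ENOTEMPTY)  # -> True
```

Once the stale entry is gone, the same rmdir succeeds, which is why the session recovers only after the slave directory is actually emptied.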

# gluster v geo mastervol 10.70.42.197::slavevol status detail
 
MASTER NODE              MASTER VOL    MASTER BRICK                    SLAVE                     STATUS     CHECKPOINT STATUS    CRAWL STATUS    FILES SYNCD    FILES PENDING    BYTES PENDING    DELETES PENDING    FILES SKIPPED   
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Tim.blr.redhat.com       mastervol     /bricks/brick1/mastervol_b1     10.70.42.254::slavevol    faulty     N/A                  N/A             2472           0                0                532                0               
Tim.blr.redhat.com       mastervol     /bricks/brick2/mastervol_b5     10.70.42.254::slavevol    faulty     N/A                  N/A             2191           0                0                560                0               
Tim.blr.redhat.com       mastervol     /bricks/brick3/mastervol_b9     10.70.42.254::slavevol    faulty     N/A                  N/A             2417           0                0                568                0               
green                    mastervol     /bricks/brick1/mastervol_b4     10.70.42.151::slavevol    Passive    N/A                  N/A             0              0                0                0                  0               
green                    mastervol     /bricks/brick2/mastervol_b8     10.70.42.151::slavevol    Passive    N/A                  N/A             0              0                0                0                  0               
green                    mastervol     /bricks/brick3/mastervol_b12    10.70.42.151::slavevol    Passive    N/A                  N/A             0              0                0                0                  0               
purple                   mastervol     /bricks/brick1/mastervol_b3     10.70.42.197::slavevol    faulty     N/A                  N/A             1988           0                0                494                0               
purple                   mastervol     /bricks/brick2/mastervol_b7     10.70.42.197::slavevol    faulty     N/A                  N/A             2081           0                0                530                0               
purple                   mastervol     /bricks/brick3/mastervol_b11    10.70.42.197::slavevol    faulty     N/A                  N/A             2106           0                0                468                0               
Javier.blr.redhat.com    mastervol     /bricks/brick1/mastervol_b2     10.70.42.97::slavevol     Passive    N/A                  N/A             0              0                0                0                  0               
Javier.blr.redhat.com    mastervol     /bricks/brick2/mastervol_b6     10.70.42.97::slavevol     Passive    N/A                  N/A             0              0                0                0                  0               
Javier.blr.redhat.com    mastervol     /bricks/brick3/mastervol_b10    10.70.42.97::slavevol     Passive    N/A                  N/A             0              0                0                0                  0               

# gluster v status
Status of volume: mastervol
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick 10.70.42.190:/bricks/brick1/mastervol_b1		49155	Y	12359
Brick 10.70.43.88:/bricks/brick1/mastervol_b2		49155	Y	16855
Brick 10.70.42.29:/bricks/brick1/mastervol_b3		49155	Y	24731
Brick 10.70.42.88:/bricks/brick1/mastervol_b4		49155	Y	24650
Brick 10.70.42.190:/bricks/brick2/mastervol_b5		49156	Y	12370
Brick 10.70.43.88:/bricks/brick2/mastervol_b6		49156	Y	16866
Brick 10.70.42.29:/bricks/brick2/mastervol_b7		49156	Y	24742
Brick 10.70.42.88:/bricks/brick2/mastervol_b8		49156	Y	24661
Brick 10.70.42.190:/bricks/brick3/mastervol_b9		49157	Y	12381
Brick 10.70.43.88:/bricks/brick3/mastervol_b10		49157	Y	16877
Brick 10.70.42.29:/bricks/brick3/mastervol_b11		49157	Y	24753
Brick 10.70.42.88:/bricks/brick3/mastervol_b12		49157	Y	24672

Comment 4 shilpa 2014-09-16 11:35:33 UTC
Related BZ: https://bugzilla.redhat.com/show_bug.cgi?id=960910

Comment 8 shilpa 2014-09-19 08:51:17 UTC
Created attachment 939150 [details]
Slave logs

Comment 9 shilpa 2014-09-19 09:10:00 UTC
Created attachment 939153 [details]
Master log