Bug 1027252

Summary: Dist-geo-rep: gsyncd process crashed while removing files after remove brick on the master.
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Vijaykumar Koppad <vkoppad>
Component: geo-replication
Assignee: Bug Updates Notification Mailing List <rhs-bugs>
Status: CLOSED ERRATA
QA Contact: Vijaykumar Koppad <vkoppad>
Severity: high
Docs Contact:
Priority: high
Version: 2.1
CC: aavati, amarts, bbandari, csaba, grajaiya, vagarwal, vkoppad
Target Milestone: ---
Keywords: ZStream
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: glusterfs-3.4.0.43rhs-1
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-11-27 15:46:16 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Vijaykumar Koppad 2013-11-06 12:29:29 UTC
Description of problem: The gsyncd process crashed while removing files after a remove-brick on the master. This happened while removing files from the master after add-brick, rebalance, and remove-brick had been performed on the master volume.


Version-Release number of selected component (if applicable): glusterfs-3.4.0.39rhs-1


How reproducible: Didn't try to reproduce.


Steps to Reproduce (a scripted sketch of these steps follows the list):
1. Create and start a geo-rep relationship between the master and the slave.
2. Put some data on the master and let it sync to the slave.
3. Add nodes to the cluster and add bricks to the volume.
4. Start creating data on the master and, in parallel, start a rebalance.
5. Let the data sync and the rebalance complete.
6. Check the geo-rep status.
7. Start creating data on the master and, in parallel, start a remove-brick of the bricks that were added.
8. Let the data sync and the remove-brick complete.
9. Check the geo-rep status.
10. Wait for some time.
11. Start removing files on the master.
12. Check the geo-rep status.
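
The sketch below drives the volume-level operations from the steps above through the gluster CLI. It is a minimal, hypothetical driver: the volume name, slave URL, and brick paths are placeholders (not values from this bug), the geo-rep session is assumed to already exist, the parallel file creation on the mount point is not shown, and the exact geo-rep CLI syntax may differ between glusterfs builds.

#!/usr/bin/env python
# Hypothetical reproduction driver for the steps above. Volume name, slave
# URL and brick paths are placeholders, not values taken from this bug.
import subprocess

MASTER = "master"
SLAVE = "slavehost::slavevol"
NEW_BRICKS = ["newnode1:/bricks/brick2", "newnode2:/bricks/brick2"]


def gluster(args):
    """Run a 'gluster volume ...' command and echo it for the test log."""
    cmd = ["gluster", "volume"] + args
    print("+ " + " ".join(cmd))
    subprocess.check_call(cmd)


# Step 1: start the geo-rep session (assumes it has already been created).
gluster(["geo-replication", MASTER, SLAVE, "start"])

# Steps 3-4: add bricks, then kick off a rebalance while files are being
# created on the master mount point (the parallel file creation itself runs
# on the mount and is not shown here).
gluster(["add-brick", MASTER] + NEW_BRICKS)
gluster(["rebalance", MASTER, "start"])

# Step 6: check the geo-rep status.
gluster(["geo-replication", MASTER, SLAVE, "status"])

# Steps 7-8: remove the bricks that were added, again with file creation
# running in parallel on the mount point.
gluster(["remove-brick", MASTER] + NEW_BRICKS + ["start"])

# Step 9: check the geo-rep status again before running "rm -rf" on the mount.
gluster(["geo-replication", MASTER, SLAVE, "status"])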


Actual results: The geo-rep status for the active replica nodes went to faulty, and while removing files rm failed with "Directory not empty" errors.


Expected results: Removal of the files should complete without errors.



Additional info:

Backtrace in the geo-rep log:

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[2013-11-06 17:08:29.881779] E [repce(/bricks/brick1):188:__call__] RepceClient: call 31582:139962570012416:1383737909.84 (entry_ops) failed on peer with OSError
[2013-11-06 17:08:29.882508] E [syncdutils(/bricks/brick1):207:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 150, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 535, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1134, in service_loop
    g2.crawlwrap()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 437, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 858, in crawl
    self.process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 815, in process
    if self.process_change(change, done, retry):
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 780, in process_change
    self.slave.server.entry_ops(entries)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 204, in __call__
    return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 189, in __call__
    raise res
OSError: [Errno 61] No data available
[2013-11-06 17:08:29.885715] I [syncdutils(/bricks/brick1):159:finalize] <top>: exiting.
[2013-11-06 17:08:29.896702] I [monitor(monitor):81:set_state] Monitor: new state: faulty
[2013-11-06 17:08:39.910705] I [monitor(monitor):129:monitor] Monitor: ------------------------------------------------------------
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
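
In the traceback above, the OSError is raised on the slave inside entry_ops() and re-raised on the master by repce ("raise res" in repce.py). Errno 61 on Linux is ENODATA ("No data available"); one common way Python surfaces it is when reading an extended attribute that does not exist. The snippet below is only an illustration of that errno, assuming Python 3.3+ on Linux; it is not the code path that actually failed inside gsyncd.

# Minimal illustration of OSError errno 61 (ENODATA, "No data available").
# This is NOT the gsyncd code path; it only shows how the errno seen in the
# traceback above typically presents itself on Linux.
import errno
import os
import tempfile

path = tempfile.mkdtemp()
try:
    # Requesting an extended attribute that was never set fails with ENODATA
    # on Linux (os.getxattr requires Python 3.3+).
    os.getxattr(path, "user.does-not-exist")
except OSError as e:
    assert e.errno == errno.ENODATA
    print("errno %d: %s" % (e.errno, e.strerror))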


while doing "rm -rf" on master mount point, some failed with errors 
m: cannot remove `/mnt/master/level08/level18/level28/level38/level48/level58/level68/level78/level88/level98': Directory not empty
rm: cannot remove `/mnt/master/level09/level19/level29/level39/level49/level59/level69/level79/level89': Directory not empty

Corresponding client logs:

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[2013-11-06 11:28:18.902560] I [client.c:2103:client_rpc_notify] 7-master-client-3: disconnected from 10.70.43.158:49154. Client process will keep trying to connect to glusterd until brick's port is available.
[2013-11-06 11:28:18.902601] E [afr-common.c:3919:afr_notify] 7-master-replicate-1: All subvolumes are down. Going offline until atleast one of them comes back up.
[2013-11-06 11:38:32.425362] W [client-rpc-fops.c:695:client3_3_rmdir_cbk] 8-master-client-0: remote operation failed: Directory not empty
[2013-11-06 11:38:32.425698] W [client-rpc-fops.c:695:client3_3_rmdir_cbk] 8-master-client-1: remote operation failed: Directory not empty
[2013-11-06 11:38:44.159290] W [client-rpc-fops.c:695:client3_3_rmdir_cbk] 8-master-client-2: remote operation failed: Directory not empty
[2013-11-06 11:38:44.159412] W [client-rpc-fops.c:695:client3_3_rmdir_cbk] 8-master-client-3: remote operation failed: Directory not empty
[2013-11-06 11:39:12.312085] W [client-rpc-fops.c:695:client3_3_rmdir_cbk] 8-master-client-2: remote operation failed: Directory not empty
[2013-11-06 11:39:12.312149] W [client-rpc-fops.c:695:client3_3_rmdir_cbk] 8-master-client-3: remote operation failed: Directory not empty
[2013-11-06 11:39:12.315267] W [client-rpc-fops.c:322:client3_3_mkdir_cbk] 8-master-client-3: remote operation failed: File exists. Path: /level03/level13/level23/level33/level43/level53/level63/level73/level83
[2013-11-06 11:39:12.315313] W [client-rpc-fops.c:322:client3_3_mkdir_cbk] 8-master-client-2: remote operation failed: File exists. Path: /level03/level13/level23/level33/level43/level53/level63/level73/level83
[2013-11-06 11:39:33.039777] W [client-rpc-fops.c:695:client3_3_rmdir_cbk] 8-master-client-2: remote operation failed: Directory not empty
[2013-11-06 11:39:33.040168] W [client-rpc-fops.c:695:client3_3_rmdir_cbk] 8-master-client-3: remote operation failed: Directory not empty
[2013-11-06 11:40:03.002733] W [client-rpc-fops.c:695:client3_3_rmdir_cbk] 8-master-client-3: remote operation failed:

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
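
The rmdir warnings above are the client-side view of the same failure rm reported on the mount point: the directory is not empty on the corresponding brick, so the rmdir comes back with ENOTEMPTY. As a point of reference only (this is not GlusterFS code), that errno looks like this in Python:

# Minimal illustration of ENOTEMPTY ("Directory not empty"), the errno behind
# the rm failures and the client3_3_rmdir_cbk warnings above.
import errno
import os
import tempfile

d = tempfile.mkdtemp()
open(os.path.join(d, "leftover"), "w").close()  # directory now has an entry
try:
    os.rmdir(d)
except OSError as e:
    assert e.errno == errno.ENOTEMPTY
    print("errno %d: %s" % (e.errno, e.strerror))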

Comment 2 Amar Tumballi 2013-11-11 09:54:18 UTC
Again, the backtrace is the same as in bug 1028343, which is now fixed in the .42rhs build. Can this be tested?

Comment 3 Amar Tumballi 2013-11-13 09:28:30 UTC
Considering bug 1028343 is VERIFIED, moving this bug to ON_QA.

Comment 4 Vijaykumar Koppad 2013-11-14 13:04:44 UTC
This bug was mainly for gsyncd crashing with "No data available". On the build glusterfs-3.4.0.44rhs-1 the gsyncd crash no longer happens, but the rm failure is still there. Hence, moving this bug to VERIFIED and tracking the other issue in Bug 1030438.

Comment 5 errata-xmlrpc 2013-11-27 15:46:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1769.html