Bug 1027252 - Dist-geo-rep : gsyncd process crashed while removing files after remove brick on the master.
Summary: Dist-geo-rep : gsyncd process crashed while removing files after remove brick...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: geo-replication
Version: 2.1
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Bug Updates Notification Mailing List
QA Contact: Vijaykumar Koppad
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-11-06 12:29 UTC by Vijaykumar Koppad
Modified: 2014-08-25 00:50 UTC (History)
7 users

Fixed In Version: glusterfs-3.4.0.43rhs-1
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-11-27 15:46:16 UTC
Embargoed:


Attachments


Links
Red Hat Product Errata RHBA-2013:1769 (priority: normal, status: SHIPPED_LIVE) - Red Hat Storage 2.1 enhancement and bug fix update #1 - last updated 2013-11-27 20:17:39 UTC

Description Vijaykumar Koppad 2013-11-06 12:29:29 UTC
Description of problem: The gsyncd process crashed while removing files after a remove-brick on the master. This happened while removing files from the master after add-brick, rebalance, and remove-brick had been run on the master volume.


Version-Release number of selected component (if applicable): glusterfs-3.4.0.39rhs-1


How reproducible: Didn't try to reproduce.


Steps to Reproduce (an illustrative command sketch follows the list):
1. Create and start a geo-rep session between master and slave.
2. Put some data on the master and let it sync to the slave.
3. Add nodes to the cluster and add bricks to the volume.
4. Start creating data on the master and, in parallel, start a rebalance.
5. Let the data sync and the rebalance complete.
6. Check the geo-rep status.
7. Start creating data on the master and, in parallel, start a remove-brick of the bricks that were added.
8. Let the data sync and the remove-brick complete.
9. Check the geo-rep status.
10. Wait for some time.
11. Start removing files on the master.
12. Check the geo-rep status.
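
The steps above correspond roughly to the following commands. This is a minimal sketch only; the volume name, slave host, node names, and brick paths are illustrative and not taken from this report (the actual brick layout and replica count may differ).

# 1-2. Create and start the geo-rep session, then check its status.
gluster volume geo-replication mastervol slavehost::slavevol create push-pem
gluster volume geo-replication mastervol slavehost::slavevol start
gluster volume geo-replication mastervol slavehost::slavevol status

# 3-6. Expand the cluster, add bricks, and rebalance while data is being created.
gluster peer probe newnode
gluster volume add-brick mastervol newnode:/bricks/brick2
gluster volume rebalance mastervol start
gluster volume rebalance mastervol status

# 7-9. Remove the added bricks while more data is being created.
gluster volume remove-brick mastervol newnode:/bricks/brick2 start
gluster volume remove-brick mastervol newnode:/bricks/brick2 status
gluster volume remove-brick mastervol newnode:/bricks/brick2 commit

# 11. Remove files from the master mount point.
rm -rf /mnt/master/*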


Actual results: The geo-rep status for the active replica nodes went to faulty, and the file removal hit "Directory not empty" errors.


Expected results: Removal of files should complete properly.



Additional info:

Backtrace in the geo-rep log:

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[2013-11-06 17:08:29.881779] E [repce(/bricks/brick1):188:__call__] RepceClient: call 31582:139962570012416:1383737909.84 (entry_ops) failed on peer with OSError
[2013-11-06 17:08:29.882508] E [syncdutils(/bricks/brick1):207:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 150, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 535, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1134, in service_loop
    g2.crawlwrap()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 437, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 858, in crawl
    self.process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 815, in process
    if self.process_change(change, done, retry):
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 780, in process_change
    self.slave.server.entry_ops(entries)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 204, in __call__
    return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 189, in __call__
    raise res
OSError: [Errno 61] No data available
[2013-11-06 17:08:29.885715] I [syncdutils(/bricks/brick1):159:finalize] <top>: exiting.
[2013-11-06 17:08:29.896702] I [monitor(monitor):81:set_state] Monitor: new state: faulty
[2013-11-06 17:08:39.910705] I [monitor(monitor):129:monitor] Monitor: ------------------------------------------------------------
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
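
For reference, errno 61 in the OSError above is ENODATA ("No data available") on Linux. A quick way to confirm the mapping on one of the nodes, assuming a stock Python interpreter is available:

# Print the symbolic name and message for errno 61 (Linux).
python -c 'import errno, os; print("%s: %s" % (errno.errorcode[61], os.strerror(61)))'
# ENODATA: No data available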


While doing "rm -rf" on the master mount point, some removals failed with errors:
rm: cannot remove `/mnt/master/level08/level18/level28/level38/level48/level58/level68/level78/level88/level98': Directory not empty
rm: cannot remove `/mnt/master/level09/level19/level29/level39/level49/level59/level69/level79/level89': Directory not empty

Corresponding client logs:

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[2013-11-06 11:28:18.902560] I [client.c:2103:client_rpc_notify] 7-master-client-3: disconnected from 10.70.43.158:49154. Client process will keep trying to connect to glusterd until brick's port is available.
[2013-11-06 11:28:18.902601] E [afr-common.c:3919:afr_notify] 7-master-replicate-1: All subvolumes are down. Going offline until atleast one of them comes back up.
[2013-11-06 11:38:32.425362] W [client-rpc-fops.c:695:client3_3_rmdir_cbk] 8-master-client-0: remote operation failed: Directory not empty
[2013-11-06 11:38:32.425698] W [client-rpc-fops.c:695:client3_3_rmdir_cbk] 8-master-client-1: remote operation failed: Directory not empty
[2013-11-06 11:38:44.159290] W [client-rpc-fops.c:695:client3_3_rmdir_cbk] 8-master-client-2: remote operation failed: Directory not empty
[2013-11-06 11:38:44.159412] W [client-rpc-fops.c:695:client3_3_rmdir_cbk] 8-master-client-3: remote operation failed: Directory not empty
[2013-11-06 11:39:12.312085] W [client-rpc-fops.c:695:client3_3_rmdir_cbk] 8-master-client-2: remote operation failed: Directory not empty
[2013-11-06 11:39:12.312149] W [client-rpc-fops.c:695:client3_3_rmdir_cbk] 8-master-client-3: remote operation failed: Directory not empty
[2013-11-06 11:39:12.315267] W [client-rpc-fops.c:322:client3_3_mkdir_cbk] 8-master-client-3: remote operation failed: File exists. Path: /level03/level13/level23/level33/level43/level53/level63/level73/level83
[2013-11-06 11:39:12.315313] W [client-rpc-fops.c:322:client3_3_mkdir_cbk] 8-master-client-2: remote operation failed: File exists. Path: /level03/level13/level23/level33/level43/level53/level63/level73/level83
[2013-11-06 11:39:33.039777] W [client-rpc-fops.c:695:client3_3_rmdir_cbk] 8-master-client-2: remote operation failed: Directory not empty
[2013-11-06 11:39:33.040168] W [client-rpc-fops.c:695:client3_3_rmdir_cbk] 8-master-client-3: remote operation failed: Directory not empty
[2013-11-06 11:40:03.002733] W [client-rpc-fops.c:695:client3_3_rmdir_cbk] 8-master-client-3: remote operation failed:

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

Comment 2 Amar Tumballi 2013-11-11 09:54:18 UTC
Again, the backtrace is the same as bug 1028343, which is now fixed in the .42rhs build. Can this be tested?

Comment 3 Amar Tumballi 2013-11-13 09:28:30 UTC
Considering bug 1028343 is VERIFIED, moving this bug to ON_QA.

Comment 4 Vijaykumar Koppad 2013-11-14 13:04:44 UTC
This bug was mainly for gsyncd crashing with "No data available". On the build glusterfs-3.4.0.44rhs-1, the gsyncd crash no longer happens, but the rm failures are still there. Hence moving this bug to VERIFIED and tracking the remaining issue in Bug 1030438.
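
A minimal way to verify along those lines (the package query is generic; the geo-rep session arguments are illustrative and not taken from this report):

# Confirm the installed build, then confirm the session is no longer going faulty.
rpm -qa | grep glusterfs
gluster volume geo-replication mastervol slavehost::slavevol status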

Comment 5 errata-xmlrpc 2013-11-27 15:46:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1769.html

