Description of problem:
gsyncd crashed because it failed to create the brick status file.

Python backtrace:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[2014-01-15 16:00:34.505021] I [master(/bricks/master_brick1):918:update_worker_status] _GMaster: Creating new /var/lib/glusterd/geo-replication/master_10.70.43.76_slave/_bricks_master_brick1.status
[2014-01-15 16:00:34.505701] E [syncdutils(/bricks/master_brick1):207:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 150, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 540, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1157, in service_loop
    g2.crawlwrap()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 459, in crawlwrap
    self.update_worker_remote_node()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 968, in update_worker_remote_node
    self.update_worker_status('remote_node', remote_node)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 920, in update_worker_status
    with open(worker_status_file, 'wb') as f:
IOError: [Errno 2] No such file or directory: '/var/lib/glusterd/geo-replication/master_10.70.43.76_slave/_bricks_master_brick1.status'
[2014-01-15 16:00:34.508407] I [syncdutils(/bricks/master_brick1):159:finalize] <top>: exiting.
[2014-01-15 18:03:48.187860] I [monitor(monitor):223:distribute] <top>: slave bricks: [{'host': '10.70.43.76', 'dir': '/bricks/slave_brick1'}, {'host': '10.70.43.135', 'dir': '/bricks/slave_brick2'}, {'host': '10.70.43.174', 'dir': '/bricks/slave_brick3'}, {'host': '10.70.42.151', 'dir': '/bricks/slave_brick4'}]
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

Version-Release number of selected component (if applicable):
glusterfs-3.4.0.57rhs-1

How reproducible:
Didn't try to reproduce.
Steps to Reproduce:
I observed this crash while doing the following steps.
1. Create and start a geo-rep relationship between master and slave.
2. Create data on the master using the command:
   ./crefi.py -n 100 --multi -b 10 -d 10 --random --max=2K --min=1K /mnt/master/
3. Then run:
   for i in {1..10}; do ./crefi.py -n 100 --multi -b 10 -d 10 --random --max=2K --min=1K --fop=hardlink /mnt/master/ ; done

Actual results:
Geo-rep crashed because it failed to create the brick status file.

Expected results:
First of all, geo-rep shouldn't fail to create a status file; and even if it does fail, it shouldn't crash.

Additional info:
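The non-crashing behavior described under "Expected results" could be sketched roughly as below. This is a hypothetical hardened version of a status-write helper, not the actual gsyncd code: it recreates the parent directory if it has gone missing and treats a failed write as a logged warning rather than a fatal IOError.

```python
import os


def update_worker_status(worker_status_file, key, value):
    """Hypothetical hardened status-file write (illustration only).

    Creates the parent directory if it is missing and returns False
    on failure instead of letting IOError crash the worker.
    """
    try:
        # Recreate the status directory if it does not exist yet.
        status_dir = os.path.dirname(worker_status_file)
        if status_dir and not os.path.isdir(status_dir):
            os.makedirs(status_dir)
        with open(worker_status_file, 'w') as f:
            f.write('%s: %s\n' % (key, value))
        return True
    except (IOError, OSError) as e:
        # Status reporting is best-effort; log and carry on.
        print('warning: could not update %s: %s' % (worker_status_file, e))
        return False
```

The names here (`update_worker_status`, the key/value format) are assumptions made for the sketch; the real fix would belong inside master.py's existing status-handling code.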
If the setup is still there, could you attach the glusterd logs in the BZ? If glusterd failed to create the dir structure, there would be logs relating to that in the glusterd log file.
Created attachment 851031 [details] Glusterd logs of the node where it happened.
Vijaykumar, is this bug reproducible? The directory structure is not deleted until a geo-replication 'delete' command is invoked.
The geo-replication status infrastructure was improved in RHGS 3.1, and this issue is not seen during regression runs of RHGS 3.1. Closing this bug; please reopen if the issue is found again.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days