Bug 1054105

Summary: dist-geo-rep: gsyncd crashed because it failed to create the brick status file.
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Vijaykumar Koppad <vkoppad>
Component: geo-replication    Assignee: Bug Updates Notification Mailing List <rhs-bugs>
Status: CLOSED CURRENTRELEASE QA Contact: storage-qa-internal <storage-qa-internal>
Severity: high Docs Contact:
Priority: low    
Version: 2.1    CC: aavati, avishwan, csaba, david.macdonald, nlevinki, vagarwal, vshankar
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard: status
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-08-06 14:26:45 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Glusterd logs of the node where it happened. none

Description Vijaykumar Koppad 2014-01-16 09:09:48 UTC
Description of problem: gsyncd crashed because it failed to create the brick status file.

Python backtrace
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[2014-01-15 16:00:34.505021] I [master(/bricks/master_brick1):918:update_worker_status] _GMaster: Creating new /var/lib/glusterd/geo-replication/master_10.70.43.76_slave/_bricks_master_brick1.status
[2014-01-15 16:00:34.505701] E [syncdutils(/bricks/master_brick1):207:log_raise_exception] <top>: FAIL: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 150, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 540, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1157, in service_loop
    g2.crawlwrap()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 459, in crawlwrap
    self.update_worker_remote_node()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 968, in update_worker_remote_node
    self.update_worker_status ('remote_node', remote_node)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 920, in update_worker_status
    with open(worker_status_file, 'wb') as f:
IOError: [Errno 2] No such file or directory: '/var/lib/glusterd/geo-replication/master_10.70.43.76_slave/_bricks_master_brick1.status'
[2014-01-15 16:00:34.508407] I [syncdutils(/bricks/master_brick1):159:finalize] <top>: exiting.
[2014-01-15 18:03:48.187860] I [monitor(monitor):223:distribute] <top>: slave bricks: [{'host': '10.70.43.76', 'dir': '/bricks/slave_brick1'}, {'host': '10.70.43.135', 'dir': '/bricks/slave_brick2'}, {'host': '10.70.43.174', 'dir': '/bricks/slave_brick3'}, {'host': '10.70.42.151', 'dir': '/bricks/slave_brick4'}]

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>


Version-Release number of selected component (if applicable): glusterfs-3.4.0.57rhs-1


How reproducible: Didn't try to reproduce.


Steps to Reproduce:
I observed this crash while doing the following steps.

1. create and start a geo-rep relationship between master and slave. 
2. create data on master using the command, "./crefi.py -n 100 --multi -b 10 -d 10 --random --max=2K --min=1K /mnt/master/"
3. then run this "for i in {1..10}; do ./crefi.py -n 100 --multi -b 10 -d 10 --random --max=2K --min=1K --fop=hardlink /mnt/master/ ; done"

Actual results: Geo-rep crashed because it failed to create the brick status file.


Expected results: First of all, geo-rep shouldn't fail to create the status file; and even if creation does fail, gsyncd shouldn't crash.


Additional info:

Comment 2 Venky Shankar 2014-01-16 12:12:57 UTC
If the setup is still there, could you attach the glusterd logs in the BZ? If glusterd failed to create the dir structure, there would be logs relating to that in the glusterd log file.

Comment 3 Vijaykumar Koppad 2014-01-16 12:16:04 UTC
Created attachment 851031 [details]
Glusterd logs of the node where it happened.

Comment 4 Venky Shankar 2014-01-20 02:22:07 UTC
Vijaykumar,

Is this bug reproducible? The directory structure is not deleted until a geo-replication 'delete' command is invoked.

Comment 6 Aravinda VK 2015-08-06 14:26:45 UTC
The geo-replication status infrastructure was improved in RHGS 3.1, and this issue is not seen during RHGS 3.1 regression runs. Closing this bug; please reopen if the issue is seen again.

Comment 7 Red Hat Bugzilla 2023-09-14 01:57:16 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days