Bug 1054105 - dist-geo-rep : gsyncd crashed after failing to create the brick status file.
Summary: dist-geo-rep : gsyncd crashed after failing to create the brick status file.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: geo-replication
Version: 2.1
Hardware: x86_64
OS: Linux
Priority: low
Severity: high
Target Milestone: ---
Assignee: Bug Updates Notification Mailing List
QA Contact: storage-qa-internal@redhat.com
URL:
Whiteboard: status
Depends On:
Blocks:
 
Reported: 2014-01-16 09:09 UTC by Vijaykumar Koppad
Modified: 2023-09-14 01:57 UTC
CC: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-08-06 14:26:45 UTC
Embargoed:


Attachments (Terms of Use)
Glusterd logs of the node where it happened. (166.96 KB, text/x-log)
2014-01-16 12:16 UTC, Vijaykumar Koppad

Description Vijaykumar Koppad 2014-01-16 09:09:48 UTC
Description of problem: gsyncd crashed because of failing to create brick status file.

Python backtrace
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[2014-01-15 16:00:34.505021] I [master(/bricks/master_brick1):918:update_worker_status] _GMaster: Creating new /var/lib/glusterd/geo-replication/master_10.70.43.76_slave/_bricks_master_brick1.status
[2014-01-15 16:00:34.505701] E [syncdutils(/bricks/master_brick1):207:log_raise_exception] <top>: FAIL: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 150, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 540, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1157, in service_loop
    g2.crawlwrap()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 459, in crawlwrap
    self.update_worker_remote_node()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 968, in update_worker_remote_node
    self.update_worker_status ('remote_node', remote_node)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 920, in update_worker_status
    with open(worker_status_file, 'wb') as f:
IOError: [Errno 2] No such file or directory: '/var/lib/glusterd/geo-replication/master_10.70.43.76_slave/_bricks_master_brick1.status'
[2014-01-15 16:00:34.508407] I [syncdutils(/bricks/master_brick1):159:finalize] <top>: exiting.
[2014-01-15 18:03:48.187860] I [monitor(monitor):223:distribute] <top>: slave bricks: [{'host': '10.70.43.76', 'dir': '/bricks/slave_brick1'}, {'host': '10.70.43.135', 'dir': '/bricks/slave_brick2'}, {'host': '10.70.43.174', 'dir': '/bricks/slave_brick3'}, {'host': '10.70.42.151', 'dir': '/bricks/slave_brick4'}]

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>


Version-Release number of selected component (if applicable): glusterfs-3.4.0.57rhs-1


How reproducible: Didn't try to reproduce. 


Steps to Reproduce:
I observed this crash while doing the following steps.

1. create and start a geo-rep relationship between master and slave. 
2. create data on master using the command, "./crefi.py -n 100 --multi -b 10 -d 10 --random --max=2K --min=1K /mnt/master/"
3. then run this "for i in {1..10}; do ./crefi.py -n 100 --multi -b 10 -d 10 --random --max=2K --min=1K --fop=hardlink /mnt/master/ ; done"

Actual results: Geo-rep crashed because it failed to create the brick status file.


Expected results: Geo-rep shouldn't fail to create the status file in the first place, and even if creation does fail, it shouldn't crash.
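The traceback shows `update_worker_status` opening the status file directly, so a missing parent directory surfaces as an unhandled IOError that kills the worker. A minimal sketch of the defensive behavior the expected results describe (hypothetical helper, not the actual gsyncd code) would create the parent directory on demand and downgrade a failed write to a warning:

```python
import errno
import logging
import os


def update_worker_status(worker_status_file, field, value):
    """Write one field to the brick status file.

    Hypothetical sketch: creates the parent directory if it is missing
    and logs (rather than raises) on write failure, so the worker keeps
    running even when the status file cannot be created.
    """
    try:
        # Recreate the directory structure if glusterd has not (or it
        # was removed); EEXIST is fine, anything else is a real error.
        os.makedirs(os.path.dirname(worker_status_file))
    except OSError as e:
        if e.errno != errno.EEXIST:
            raise
    try:
        with open(worker_status_file, 'wb') as f:
            f.write(('%s: %s\n' % (field, value)).encode())
    except IOError as e:
        # A status-file failure should not crash the monitoring path.
        logging.warning('failed to update %s: %s', worker_status_file, e)
```

This mirrors the call seen in the traceback (`update_worker_status('remote_node', remote_node)`) but makes the write best-effort instead of fatal.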


Additional info:

Comment 2 Venky Shankar 2014-01-16 12:12:57 UTC
If the setup is still there, could you attach the glusterd logs in the BZ? If glusterd failed to create the dir structure, there would be logs relating to that in the glusterd log file.

Comment 3 Vijaykumar Koppad 2014-01-16 12:16:04 UTC
Created attachment 851031 [details]
Glusterd logs of the node where it happened.

Comment 4 Venky Shankar 2014-01-20 02:22:07 UTC
Vijaykumar,

Is this bug reproducible? The directory structure is not deleted until a geo-replication 'delete' command is invoked.

Comment 6 Aravinda VK 2015-08-06 14:26:45 UTC
The geo-replication status infrastructure was improved in RHGS 3.1, and this issue is not seen during RHGS 3.1 regression runs. Closing this bug; please reopen if the issue is seen again.

Comment 7 Red Hat Bugzilla 2023-09-14 01:57:16 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

