Description of problem:
gsyncd crashed because it failed to create the brick status file.

Python backtrace:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[2014-01-15 16:00:34.505021] I [master(/bricks/master_brick1):918:update_worker_status] _GMaster: Creating new /var/lib/glusterd/geo-replication/master_10.70.43.76_slave/_bricks_master_brick1.status
[2014-01-15 16:00:34.505701] E [syncdutils(/bricks/master_brick1):207:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 150, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 540, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1157, in service_loop
    g2.crawlwrap()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 459, in crawlwrap
    self.update_worker_remote_node()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 968, in update_worker_remote_node
    self.update_worker_status('remote_node', remote_node)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 920, in update_worker_status
    with open(worker_status_file, 'wb') as f:
IOError: [Errno 2] No such file or directory: '/var/lib/glusterd/geo-replication/master_10.70.43.76_slave/_bricks_master_brick1.status'
[2014-01-15 16:00:34.508407] I [syncdutils(/bricks/master_brick1):159:finalize] <top>: exiting.
[2014-01-15 18:03:48.187860] I [monitor(monitor):223:distribute] <top>: slave bricks: [{'host': '10.70.43.76', 'dir': '/bricks/slave_brick1'}, {'host': '10.70.43.135', 'dir': '/bricks/slave_brick2'}, {'host': '10.70.43.174', 'dir': '/bricks/slave_brick3'}, {'host': '10.70.42.151', 'dir': '/bricks/slave_brick4'}]
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

Version-Release number of selected component (if applicable):
glusterfs-3.4.0.57rhs-1

How reproducible:
Didn't try to reproduce.
Steps to Reproduce:
I observed this crash while doing the following steps.
1. Create and start a geo-rep relationship between master and slave.
2. Create data on the master using the command:
   ./crefi.py -n 100 --multi -b 10 -d 10 --random --max=2K --min=1K /mnt/master/
3. Then run:
   for i in {1..10}; do ./crefi.py -n 100 --multi -b 10 -d 10 --random --max=2K --min=1K --fop=hardlink /mnt/master/ ; done

Actual results:
Geo-rep crashed because it failed to create the brick status file.

Expected results:
First of all, geo-rep shouldn't fail to create a status file; and even if it does fail, it shouldn't crash.

Additional info:
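The non-crashing behavior described under "Expected results" could be sketched roughly as below. This is a hypothetical hardened version of a status-write helper, not the actual gsyncd code: it recreates the parent directory if it has gone missing and treats a failed write as a logged warning rather than a fatal IOError.

```python
import os


def update_worker_status(worker_status_file, key, value):
    """Hypothetical hardened status-file write (illustration only).

    Creates the parent directory if it is missing and returns False
    on failure instead of letting IOError crash the worker.
    """
    try:
        # Recreate the status directory if it does not exist yet.
        status_dir = os.path.dirname(worker_status_file)
        if status_dir and not os.path.isdir(status_dir):
            os.makedirs(status_dir)
        with open(worker_status_file, 'w') as f:
            f.write('%s: %s\n' % (key, value))
        return True
    except (IOError, OSError) as e:
        # Status reporting is best-effort; log and carry on.
        print('warning: could not update %s: %s' % (worker_status_file, e))
        return False
```

The names here (`update_worker_status`, the key/value format) are assumptions made for the sketch; the real fix would belong inside master.py's existing status-handling code.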
If the setup is still there, could you attach the glusterd logs in the BZ? If glusterd failed to create the dir structure, there would be logs relating to that in the glusterd log file.
Created attachment 851031 [details] Glusterd logs of the node where it happened.
Vijaykumar, is this bug reproducible? The directory structure is not deleted until a geo-replication 'delete' command is invoked.
The geo-replication status infrastructure was improved in RHGS 3.1, and this issue is not seen during regression runs of RHGS 3.1. Closing this bug; please reopen if the issue is found again.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days