Bug 1054105

Summary: dist-geo-rep: gsyncd crashed because it failed to create the brick status file.
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Vijaykumar Koppad <vkoppad>
Component: geo-replication    Assignee: Bug Updates Notification Mailing List <rhs-bugs>
Status: CLOSED CURRENTRELEASE QA Contact: storage-qa-internal <storage-qa-internal>
Severity: high Docs Contact:
Priority: low    
Version: 2.1    CC: aavati, avishwan, csaba, david.macdonald, nlevinki, vagarwal, vshankar
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard: status
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-08-06 14:26:45 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Glusterd logs of the node where it happened. none

Description Vijaykumar Koppad 2014-01-16 09:09:48 UTC
Description of problem: gsyncd crashed because it failed to create the brick status file.

Python backtrace
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[2014-01-15 16:00:34.505021] I [master(/bricks/master_brick1):918:update_worker_status] _GMaster: Creating new /var/lib/glusterd/geo-replication/master_10.70.43.76_slave/_bricks_master_brick1.status
[2014-01-15 16:00:34.505701] E [syncdutils(/bricks/master_brick1):207:log_raise_exception] <top>: FAIL: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 150, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 540, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1157, in service_loop
    g2.crawlwrap()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 459, in crawlwrap
    self.update_worker_remote_node()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 968, in update_worker_remote_node
    self.update_worker_status ('remote_node', remote_node)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 920, in update_worker_status
    with open(worker_status_file, 'wb') as f:
IOError: [Errno 2] No such file or directory: '/var/lib/glusterd/geo-replication/master_10.70.43.76_slave/_bricks_master_brick1.status'
[2014-01-15 16:00:34.508407] I [syncdutils(/bricks/master_brick1):159:finalize] <top>: exiting.
[2014-01-15 18:03:48.187860] I [monitor(monitor):223:distribute] <top>: slave bricks: [{'host': '10.70.43.76', 'dir': '/bricks/slave_brick1'}, {'host': '10.70.43.135', 'dir': '/bricks/slave_brick2'}, {'host': '10.70.43.174', 'dir': '/bricks/slave_brick3'}, {'host': '10.70.42.151', 'dir': '/bricks/slave_brick4'}]

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>


Version-Release number of selected component (if applicable): glusterfs-3.4.0.57rhs-1


How reproducible: Didn't try to reproduce.


Steps to Reproduce:
I observed this crash while doing the following steps.

1. create and start a geo-rep relationship between master and slave. 
2. create data on master using the command, "./crefi.py -n 100 --multi -b 10 -d 10 --random --max=2K --min=1K /mnt/master/"
3. then run this "for i in {1..10}; do ./crefi.py -n 100 --multi -b 10 -d 10 --random --max=2K --min=1K --fop=hardlink /mnt/master/ ; done"

Actual results: Geo-rep crashed because it failed to create the brick status file.


Expected results: First of all, geo-rep shouldn't fail to create the status file; and even if creation does fail, gsyncd shouldn't crash.


Additional info:

Comment 2 Venky Shankar 2014-01-16 12:12:57 UTC
If the setup is still there, could you attach the glusterd logs in the BZ? If glusterd failed to create the dir structure, there would be logs relating to that in the glusterd log file.

Comment 3 Vijaykumar Koppad 2014-01-16 12:16:04 UTC
Created attachment 851031 [details]
Glusterd logs of the node where it happened.

Comment 4 Venky Shankar 2014-01-20 02:22:07 UTC
Vijaykumar,

Is this bug reproducible? The directory structure is not deleted until a geo-replication 'delete' command is invoked.

Comment 6 Aravinda VK 2015-08-06 14:26:45 UTC
The geo-replication status infrastructure was improved in RHGS 3.1, and this issue is not seen during RHGS 3.1 regression runs. Closing this bug; please reopen if the issue is seen again.

Comment 7 Red Hat Bugzilla 2023-09-14 01:57:16 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days