Bug 1623749
Summary: | Geo-rep: Few workers fails to start with out any failure | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Amar Tumballi <atumball> |
Component: | geo-replication | Assignee: | Sunny Kumar <sunkumar> |
Status: | CLOSED ERRATA | QA Contact: | Rochelle <rallan> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | rhgs-3.4 | CC: | apaladug, avishwan, bugs, chpai, csaba, khiremat, mchangir, rallan, rhs-bugs, sanandpa, sankarshan, sheggodu, storage-qa-internal, sunkumar |
Target Milestone: | --- | Keywords: | ZStream |
Target Release: | RHGS 3.4.z Batch Update 1 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | glusterfs-3.12.2-21 | Doc Type: | Bug Fix |
Doc Text: |
Previously, workers failed during startup because of a deadlock while waiting for the flock. When a monitor starts the workers, they update the status file and use flock to synchronize. If worker two was forked while worker one had the status file open for an update, worker two inherited a reference to the file descriptor. Because the lock was released only by closing the file descriptor, and that reference still existed in worker two, worker one failed to release the lock, which caused worker two to deadlock while coming up. With this fix, the flock is unlocked explicitly and the status file is updated so that the reference is not leaked to any worker or agent process. As a result of this fix, all workers come up without fail.
|
Story Points: | --- |
Clone Of: | 1614799 | Environment: | |
Last Closed: | 2018-10-31 08:46:14 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1614799, 1630145 | ||
Bug Blocks: |
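
The deadlock described in the Doc Text can be reproduced outside of geo-replication. The following is a minimal sketch, not the actual gsyncd patch: the function names (`try_lock`, `update_status`, `demo`) and the temporary status file are hypothetical, and it assumes a Linux system with `flock(2)` semantics, where the lock belongs to the open file description shared by a forked child.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Return True if a fresh open of `path` can take an exclusive flock."""
    fd = os.open(path, os.O_RDWR)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except OSError:
        return False
    finally:
        # This descriptor is not shared, so closing it drops any lock taken here.
        os.close(fd)


def update_status(path, explicit_unlock):
    """Lock the file, fork a child that inherits the descriptor, then release.

    Returns True if the file can be locked again while the child is alive.
    """
    fd = os.open(path, os.O_RDWR)
    fcntl.flock(fd, fcntl.LOCK_EX)

    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child (stands in for "worker two"): it holds an inherited copy of fd
        # and waits until the parent closes the pipe.
        os.close(w)
        os.read(r, 1)
        os._exit(0)

    os.close(r)
    if explicit_unlock:
        # Behaviour after the fix (assumed shape): unlock before closing.
        fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)  # close alone does not unlock while the child holds a copy

    free = try_lock(path)

    os.close(w)  # EOF on the pipe lets the child exit
    os.waitpid(pid, 0)
    return free


def demo():
    """Run both variants against a temporary stand-in for the status file."""
    with tempfile.NamedTemporaryFile() as status:
        close_only = update_status(status.name, explicit_unlock=False)
        unlocked = update_status(status.name, explicit_unlock=True)
    return close_only, unlocked


if __name__ == "__main__":
    close_only, unlocked = demo()
    print("lock free after close only:    ", close_only)
    print("lock free after explicit unlock:", unlocked)
```

With close only, the lock is expected to remain held for as long as the child keeps its inherited descriptor, which is the condition that blocked the second worker. With an explicit `LOCK_UN`, the lock is released regardless of which processes still reference the descriptor.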
Description
Amar Tumballi
2018-08-30 06:07:01 UTC
This bug is marked for 3.4.1 mainly because with the patch, the upstream tests which were failing consistently in geo-rep are now passing successfully. Hence it makes sense to get it into the product, IMO.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:3432