| Summary: | [GlusterD]: After volume sync (vol dir), bricks went to offline with error message "glusterfsd: Port is already in use" | | |
|---|---|---|---|
| Product: | Red Hat Gluster Storage | Reporter: | Byreddy <bsrirama> |
| Component: | glusterd | Assignee: | Satish Mohan <smohan> |
| Status: | CLOSED NEXTRELEASE | QA Contact: | Byreddy <bsrirama> |
| Severity: | high | Docs Contact: | |
| Priority: | low | ||
| Version: | rhgs-3.1 | CC: | amukherj, rcyriac, rhs-bugs, sasundar, storage-qa-internal, vbellur |
| Target Milestone: | --- | Keywords: | ZStream |
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-08-17 04:36:02 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Description (Byreddy, 2016-01-12 04:39:49 UTC)
The problem here is that once the sync happened through handshaking, a start was attempted for a brick that was already running, which resulted in a "port is already in use" error. In this case we should not have attempted to start the bricks, and here is the reason: brick pidfiles are stored in /var/lib/glusterd/vols/<volname>/run/, which is itself wrong, since we should not store any runtime information in the configuration path. We already have a bug to correct the pidfile path across the Gluster stack; refer to [1]. Because of this, when the configuration folder was deleted, the pidfile was also removed on node 2, leading node 2 to conclude that the brick process was not running when it actually was, and hence the "port already in use" failure.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1258561

Reducing the severity to low, as we have not seen usage of volume sync by users/customers.

(In reply to Atin Mukherjee from comment #3)
> Reducing the severity to low, as we have not seen usage of volume sync by
> users/customers.

This issue is not related to volume sync invoked by users/customers; it is all about the correct placement of PID files. PIDs should be maintained in the per-node run-state directory, which is /var/run or /var/run/gluster, as per standard Linux convention.

This will affect users who go for an offline upgrade, or an upgrade from an ISO image, where they back up /var/lib/glusterd and restore it post upgrade.

Based on the above facts, raising the severity to HIGH.

(In reply to SATHEESARAN from comment #6)
> Based on the above facts, raising the severity to HIGH.

Missed updating the severity field.

(In reply to SATHEESARAN from comment #6)
> PIDs should be maintained in the per-node run-state directory, which is
> /var/run or /var/run/gluster, as per standard Linux convention.

Yes, agreed. Although volume sync is one of the use cases that triggered this problem, the underlying issue is the misplaced pidfiles.

> This will affect users who go for an offline upgrade, or an upgrade from an
> ISO image, where they back up /var/lib/glusterd and restore it post upgrade.
>
> Based on the above facts, raising the severity to HIGH.

I kind of disagree here. If we follow the documented procedure for offline upgrade, in which the brick processes are stopped, you would never encounter this. However, the incorrect pidfiles will still be placed in the configuration directory.

(In reply to Atin Mukherjee from comment #8)
> I kind of disagree here. If we follow the documented procedure for offline
> upgrade, in which the brick processes are stopped, you would never encounter
> this. However, the incorrect pidfiles will still be placed in the
> configuration directory.

Thanks, Atin, for correcting me; I agree. The offline upgrade and peer replacement procedures mandate stopping the bricks, so this issue will never be encountered unless the customer/user accidentally removes /var/lib/glusterd while bricks are running. As you suggested, placing the PID files per the standard will avoid unforeseen problems in the future. I suggest prioritizing this for an RHGS 3.1 zstream release.
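The failure chain discussed above (pidfile lost with the configuration directory, glusterd concluding the brick is down, a second start attempt hitting the already-bound port) can be sketched in Python. This is a minimal simulation, not GlusterD's actual code: the names `brick_seems_running` and `start_brick` are invented for illustration, and a plain TCP socket stands in for the brick's listener.

```python
import os
import socket
import tempfile

def brick_seems_running(pidfile):
    """Liveness check in the style described above: a missing pidfile
    is taken to mean the brick is not running."""
    if not os.path.exists(pidfile):
        return False
    with open(pidfile) as f:
        pid = int(f.read().strip())
    try:
        os.kill(pid, 0)   # signal 0 probes the process without touching it
        return True
    except OSError:
        return False

def start_brick(port):
    """Stand-in for a brick process binding its listen port."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("127.0.0.1", port))   # raises OSError(EADDRINUSE) if taken
    s.listen(1)
    return s

# The brick is already up and holding its port.
brick = start_brick(0)
port = brick.getsockname()[1]

# Its pidfile lives under the "configuration" directory -- the misplacement
# this bug is about.
conf_dir = tempfile.mkdtemp()
pidfile = os.path.join(conf_dir, "brick.pid")
with open(pidfile, "w") as f:
    f.write(str(os.getpid()))

# Volume sync wipes the configuration directory, taking the pidfile with it.
os.remove(pidfile)

# The daemon now wrongly concludes the brick is down and starts it again.
if not brick_seems_running(pidfile):
    try:
        start_brick(port)
    except OSError:
        print("glusterfsd: Port is already in use")   # the reported symptom
```

Keeping the pidfile in a per-node run-state directory such as /var/run/gluster would survive the deletion of /var/lib/glusterd, so the liveness check would still find the running brick and the second start would never be attempted.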