Bug 1647506
| Summary: | glusterd_brick_start wrongly discovers already-running brick | | |
| --- | --- | --- | --- |
| Product: | [Community] GlusterFS | Reporter: | patrice |
| Component: | glusterd | Assignee: | Sanju <srakonde> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | |
| Severity: | high | Docs Contact: | |
| Priority: | medium | | |
| Version: | 4.1 | CC: | amukherj, bugs, moagrawa, pasik, patrice, srakonde |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-02-24 11:02:26 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | 1504697: /var/log/glusterfs files; 1505089: glusterfs source rpm | | |
Description (patrice, 2018-11-07 16:03:27 UTC)
Mohit - can you please check it? I think this is addressed through https://bugzilla.redhat.com/show_bug.cgi?id=1595320, which is fixed in glusterfs-5. Mohit - would you check whether this can be backported to the 4.1 branch?

Hi,

Can you please share a dump of /var/log/glusterfs, along with the output of the commands below, with a timestamp of when the issue was reproduced?

1) ps -aef | grep gluster
2) gluster v info

Thanks,
Mohit Agrawal

Here are some traces:

    [root@pb-gluster-ope-0 ~]# docker exec -it gluster bash
    [root@pb-gluster-ope-0 /]# ps -eaf
    UID        PID  PPID  C STIME TTY      TIME     CMD
    root         1     0  0 13:31 ?        00:00:00 /usr/bin/python2 /usr/bin/supervisord -c /etc/supervisord.conf
    root         8     1  0 13:31 ?        00:00:00 /usr/sbin/glusterd -N -l /var/log/glusterfs/glusterfs.log --log-level INFO
    root         9     1  0 13:31 ?        00:00:00 /usr/bin/python /usr/bin/gmanager.py
    root        18     0  0 13:31 pts/1    00:00:00 bash
    root        33     1  0 13:32 ?        00:00:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/d7c25a447b41ab7d.socket --xlator-option *replicate*.nod
    root        42     1  0 13:32 ?        00:00:00 /usr/sbin/glusterfsd -s pilot-0 --volfile-id glusterPGSQL.pilot-0.mnt-glusterPGSQL-1 -p /var/run/gluster/vols/glusterPGSQL/pilot-0-mnt-glusterPGSQL-1.pid -S /var/run/gluster/8ae1d08240e9da74.socket --brick-name /mnt/gluster
    root        97    18  0 13:32 pts/1    00:00:00 ps -eaf
    [root@pb-gluster-ope-0 /]# find /run -name *.pid
    /run/supervisor/supervisord.pid
    /run/gluster/vols/glusterPGSQL/pilot-0-mnt-glusterPGSQL-1.pid
    /run/gluster/vols/glustervol1/pilot-0-local-glusterV0-1.pid
    /run/gluster/glustershd/glustershd.pid
    [root@pb-gluster-ope-0 /]# cat /run/gluster/vols/glusterPGSQL/pilot-0-mnt-glusterPGSQL-1.pid
    42
    [root@pb-gluster-ope-0 /]# cat /run/gluster/vols/glustervol1/pilot-0-local-glusterV0-1.pid
    42
    [root@pb-gluster-ope-0 /]# ls -l /run/gluster/vols/glusterPGSQL/pilot-0-mnt-glusterPGSQL-1.pid /run/gluster/vols/glustervol1/pilot-0-local-glusterV0-1.pid
    -rw-r--r--. 1 root root 3 Nov 12 13:32 /run/gluster/vols/glusterPGSQL/pilot-0-mnt-glusterPGSQL-1.pid
    -rw-r--r--. 1 root root 3 Nov 12 13:31 /run/gluster/vols/glustervol1/pilot-0-local-glusterV0-1.pid

The /var/log/glusterfs files are in the attached file.

BR
P.

Created attachment 1504697 [details]
/var/log/glusterfs files
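The trace above is the crux of the bug: both brick pidfiles contain 42, yet ps shows a single glusterfsd with that PID, and it serves the glusterPGSQL brick. In a container, PIDs are small and quickly reused, so a liveness test on the stale glustervol1 pidfile succeeds against the wrong process and that brick is never started. Below is a minimal shell sketch of the distinction, assuming a Linux /proc filesystem; it is illustrative, not the actual glusterd_brick_start logic, and the volfile-id value is inferred from the pidfile name:

```sh
# Hypothetical stale-pidfile check, run inside the container above.
# "kill -0" only proves that *some* process owns the PID; matching
# /proc/<pid>/cmdline against the brick's volfile-id proves it is
# actually this brick's glusterfsd.
pidfile=/run/gluster/vols/glustervol1/pilot-0-local-glusterV0-1.pid
volfile_id=glustervol1.pilot-0.local-glusterV0-1  # inferred, illustrative

pid=$(cat "$pidfile")
if kill -0 "$pid" 2>/dev/null; then
    echo "PID $pid is alive (naive conclusion: brick already running)"
    # cmdline is NUL-separated, so join it with spaces before matching.
    if tr '\0' ' ' < "/proc/$pid/cmdline" | grep -q "$volfile_id"; then
        echo "and it really is this brick"
    else
        echo "but it is NOT this brick: the pidfile is stale"
    fi
else
    echo "PID $pid is dead: the pidfile is stale"
fi
```

Against the trace above, PID 42's command line names the glusterPGSQL brick, so the second test would flag the glustervol1 pidfile as stale even though the PID is live.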
And the gluster v info result:

    [root@pb-gluster-ope-0 /]# gluster v info

    Volume Name: glusterPGSQL
    Type: Replicate
    Volume ID: 322d4e1f-483a-4dae-8a3c-8f6fbc51dd57
    Status: Started
    Snapshot Count: 0
    Number of Bricks: 1 x 3 = 3
    Transport-type: tcp
    Bricks:
    Brick1: pilot-0:/mnt/glusterPGSQL/1
    Brick2: pilot-1:/mnt/glusterPGSQL/1
    Brick3: pilot-2:/mnt/glusterPGSQL/1
    Options Reconfigured:
    cluster.self-heal-daemon: enable
    transport.address-family: inet
    nfs.disable: on
    performance.client-io-threads: off

    Volume Name: glustervol1
    Type: Replicate
    Volume ID: 10dca74f-2cf6-476c-89e1-69e6a67a7bde
    Status: Started
    Snapshot Count: 0
    Number of Bricks: 1 x 3 = 3
    Transport-type: tcp
    Bricks:
    Brick1: pilot-0:/local/glusterV0/1
    Brick2: pilot-1:/local/glusterV0/1
    Brick3: pilot-2:/local/glusterV0/1
    Options Reconfigured:
    cluster.self-heal-daemon: enable
    transport.address-family: inet
    nfs.disable: on
    performance.client-io-threads: off

Hi,

Thanks for sharing the info. After analyzing the logs, it is difficult to map the pid of the running brick process. Usually we see this kind of situation only in container environments where the user has configured the brick-multiplex feature (multiple bricks attached to the same brick process), but in this case you are hitting the issue even without enabling brick multiplexing. Would it be possible for you to test a patch if I share one with you, or how can I share a test build (rpm) with you?

Thanks,
Mohit Agrawal

Hi,

Yes, I can test an rpm if you provide it! (but it could take a few days to test)

BR
P.

Created attachment 1505089 [details]
glusterfs source rpm
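Mohit notes above that this failure mode is usually seen only when brick multiplexing is enabled, while this report reproduces it with multiplexing off. A quick way to confirm that, assuming GlusterFS 4.1's support for querying cluster-wide options (my suggested check, not from the thread):

```sh
# cluster.brick-multiplex is a cluster-wide option, so it is queried
# with the special volume name "all". On this setup it should report
# the option as disabled, confirming multiplexing is not involved.
gluster volume get all cluster.brick-multiplex
```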
Hi,

I have attached the glusterfs source rpm. Please build the other rpms from this source rpm and share the result. To build the rpms, follow these instructions:

1) rpm -hiv <source_rpm>
2) cd rpmbuild/SPECS; rpmbuild -ba glusterfs.spec

The command will place the built rpms in rpmbuild/RPMS/x86_64.

Thanks,
Mohit Agrawal

Have you tested out the fix?

Did you get a chance to test the rpm provided?

Hi,

Sorry, I have no way to test the rpm (no more labs!)

BR

We are not seeing this issue with the latest master; closing the bug.
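For anyone retracing Mohit's build instructions against the attached source rpm, the two steps expand to roughly the following. The yum-builddep line is my addition (it requires the yum-utils package), and the paths assume rpmbuild's default ~/rpmbuild tree:

```sh
# Install the source rpm, unpacking the spec and sources into ~/rpmbuild.
rpm -hiv glusterfs-*.src.rpm

# My addition, not in the original steps: pull in the build
# dependencies declared by the spec file (requires yum-utils).
yum-builddep -y ~/rpmbuild/SPECS/glusterfs.spec

# Build the binary (and source) rpms from the spec.
cd ~/rpmbuild/SPECS
rpmbuild -ba glusterfs.spec

# As noted in the comment above, the binary rpms land here.
ls ~/rpmbuild/RPMS/x86_64/
```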