Bug 1444926
| Summary: | Brick Multiplexing: creating a volume with same base name and base brick after it was deleted brings down all the bricks associated with the same brick process | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Nag Pavan Chilakam <nchilaka> |
| Component: | core | Assignee: | Mohit Agrawal <moagrawa> |
| Status: | CLOSED ERRATA | QA Contact: | Nag Pavan Chilakam <nchilaka> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | rhgs-3.3 | CC: | amukherj, moagrawa, nchilaka, rhs-bugs, storage-qa-internal |
| Target Milestone: | --- | | |
| Target Release: | RHGS 3.3.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | brick-multiplexing | | |
| Fixed In Version: | glusterfs-3.8.4-25 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-09-21 04:39:40 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1451598, 1458113 | | |
| Bug Blocks: | 1417151 | | |
Description
Nag Pavan Chilakam
2017-04-24 14:41:40 UTC
Please attach the logs, as for me the issue is different. Below are the details of the issue observed on my setup, reproduced over multiple iterations:

1. 4-node cluster
2. Create DR1
3. Start DR1
4. Create DR2
5. Start DR2
6. Create DR3
7. Start DR3
8. Delete DR1
9. Delete the DR1 bricks from all nodes
10. Create DR1
11. Start DR1

Output:

```
[root@localhost ~]# gluster v status
Status of volume: dr1
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 192.168.122.4:/rhs/brick1/dr1         49152     0          Y       4444
Brick 192.168.122.6:/rhs/brick1/dr1         49157     0          Y       3323
Brick 192.168.122.79:/rhs/brick1/dr1        49152     0          Y       3916
Brick 192.168.122.109:/rhs/brick1/dr1       49152     0          Y       4745
Self-heal Daemon on localhost               N/A       N/A        Y       3343
Self-heal Daemon on 192.168.122.4           N/A       N/A        Y       4464
Self-heal Daemon on 192.168.122.109         N/A       N/A        Y       4765
Self-heal Daemon on 192.168.122.79          N/A       N/A        Y       3936

Task Status of Volume dr1
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: dr2
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 192.168.122.4:/rhs/brick1/dr2         N/A       N/A        N       N/A
Brick 192.168.122.6:/rhs/brick1/dr2         N/A       N/A        N       N/A
Brick 192.168.122.79:/rhs/brick1/dr2        N/A       N/A        N       N/A
Brick 192.168.122.109:/rhs/brick1/dr2       N/A       N/A        N       N/A
Self-heal Daemon on localhost               N/A       N/A        Y       3343
Self-heal Daemon on 192.168.122.109         N/A       N/A        Y       4765
Self-heal Daemon on 192.168.122.79          N/A       N/A        Y       3936
Self-heal Daemon on 192.168.122.4           N/A       N/A        Y       4464

Task Status of Volume dr2
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: dr3
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 192.168.122.4:/rhs/brick1/dr3         N/A       N/A        N       N/A
Brick 192.168.122.6:/rhs/brick1/dr3         N/A       N/A        N       N/A
Brick 192.168.122.79:/rhs/brick1/dr3        N/A       N/A        N       N/A
Brick 192.168.122.109:/rhs/brick1/dr3       N/A       N/A        N       N/A
Self-heal Daemon on localhost               N/A       N/A        Y       3343
Self-heal Daemon on 192.168.122.4           N/A       N/A        Y       4464
Self-heal Daemon on 192.168.122.79          N/A       N/A        Y       3936
Self-heal Daemon on 192.168.122.109         N/A       N/A        Y       4765

Task Status of Volume dr3
------------------------------------------------------------------------------
There are no active volume tasks
```

For me, the newly re-created dr1 shows its details correctly, but the other volumes do not. Initial debugging suggests it is a path issue.
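For reference, steps 1-11 above can be expressed as a shell sketch. This is a minimal, hypothetical reproduction script, assuming a four-node trusted pool with hostnames n1..n4 and one brick per node under /rhs/brick1; the report does not state the exact volume type, so the replica 2 layout here is an assumption.

```bash
#!/bin/bash
# Hypothetical reproduction sketch for steps 1-11 above.
# Assumptions: peers n1..n4 are already in the trusted pool, and each
# node has a brick directory under /rhs/brick1.

# Brick multiplexing makes all bricks share a single glusterfsd process.
gluster volume set all cluster.brick-multiplex on

# Steps 2-7: create and start DR1, DR2 and DR3.
for vol in dr1 dr2 dr3; do
    gluster --mode=script volume create $vol replica 2 \
        n1:/rhs/brick1/$vol n2:/rhs/brick1/$vol \
        n3:/rhs/brick1/$vol n4:/rhs/brick1/$vol force
    gluster volume start $vol
done

# Steps 8-9: delete DR1 and remove its brick directories on every node.
gluster --mode=script volume stop dr1
gluster --mode=script volume delete dr1
for node in n1 n2 n3 n4; do
    ssh $node rm -rf /rhs/brick1/dr1
done

# Steps 10-11: re-create DR1 with the same base name and the same brick
# paths. With the bug present, the dr2/dr3 bricks sharing the multiplexed
# brick process go offline, as in the status output above.
gluster --mode=script volume create dr1 replica 2 \
    n1:/rhs/brick1/dr1 n2:/rhs/brick1/dr1 \
    n3:/rhs/brick1/dr1 n4:/rhs/brick1/dr1 force
gluster volume start dr1
gluster volume status
```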
Also note that I cannot start or delete the volumes; every attempt fails with the errors below:

```
[root@dhcp35-45 ~]# gluster v list
dr1
dr2
dr3
[root@dhcp35-45 ~]# for i in $(gluster v list);do gluster v stop $i;done
Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
volume stop: dr1: failed: Commit failed on 10.70.35.112. Error: error
Commit failed on 10.70.35.23. Error: error
Commit failed on 10.70.35.138. Error: error
Commit failed on 10.70.35.122. Error: error
Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
volume stop: dr2: failed: Commit failed on 10.70.35.23. Error: error
Commit failed on 10.70.35.122. Error: error
Commit failed on 10.70.35.112. Error: error
Commit failed on 10.70.35.138. Error: error
Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
volume stop: dr3: failed: Commit failed on 10.70.35.23. Error: error
Commit failed on 10.70.35.122. Error: error
Commit failed on 10.70.35.138. Error: error
Commit failed on 10.70.35.112. Error: error
[root@dhcp35-45 ~]#
[root@dhcp35-45 ~]# gluster v status
Volume dr1 is not started
Volume dr2 is not started
Volume dr3 is not started
[root@dhcp35-45 ~]# gluster v start dr1
volume start: dr1: failed: Pre Validation failed on 10.70.35.122. Volume dr1 already started
Pre Validation failed on 10.70.35.23. Volume dr1 already started
Pre Validation failed on 10.70.35.112. Volume dr1 already started
Pre Validation failed on 10.70.35.138. Volume dr1 already started
[root@dhcp35-45 ~]# gluster v start dr2
volume start: dr2: failed: Pre Validation failed on 10.70.35.122. Volume dr2 already started
Pre Validation failed on 10.70.35.112. Volume dr2 already started
Pre Validation failed on 10.70.35.138. Volume dr2 already started
Pre Validation failed on 10.70.35.23. Volume dr2 already started
[root@dhcp35-45 ~]# gluster v start dr3
volume start: dr3: failed: Pre Validation failed on 10.70.35.23. Volume dr3 already started
Pre Validation failed on 10.70.35.112. Volume dr3 already started
Pre Validation failed on 10.70.35.122. Volume dr3 already started
Pre Validation failed on 10.70.35.138. Volume dr3 already started
[root@dhcp35-45 ~]# gluster v status
Volume dr1 is not started
Volume dr2 is not started
Volume dr3 is not started
```

Upstream patch: https://review.gluster.org/#/c/17101/

Upstream patches: https://review.gluster.org/#/q/topic:bug-1444596

Downstream patches:
https://code.engineering.redhat.com/gerrit/#/c/105595/
https://code.engineering.redhat.com/gerrit/#/c/105596/

Even on 3.8.4-25 the issue exists; I will have to move this to FailedQA. The steps are below:

1) Created three 1x3 volumes v1, v2, v3 with brick multiplexing enabled; all bricks get the same PID.
2) Started I/O on v2 and v3.
3) Stopped v1 ---> I/O still going on.
4) Deleted v1 ---> still good.
5) Deleted the bricks of v1. The bricks were just directories under the LV (each volume had a separate LV), so the posix health-check failure message pops up, as below:

```
Broadcast message from systemd-journald.eng.blr.redhat.com (Tue 2017-05-16 12:40:33 IST):

rhs-brick1-myr-1[28967]: [2017-05-16 07:10:33.029490] M [MSGID: 113075] [posix-helpers.c:1893:posix_health_check_thread_proc] 0-myr-1-posix: health-check failed, going down

Message from syslogd@dhcp35-45 at May 16 12:40:33 ...
 rhs-brick1-myr-1[28967]:[2017-05-16 07:10:33.029490] M [MSGID: 113075] [posix-helpers.c:1893:posix_health_check_thread_proc] 0-myr-1-posix: health-check failed, going down
```

I/O stops, and the volume mount becomes inaccessible with a transport-endpoint error.
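The health-check messages above come from the posix translator's periodic check of the brick directory. A minimal sketch of the shared-PID premise from step 1 and the trigger from step 5, assuming the brick path implied by the broadcast message (`/rhs/brick1/myr-1`); the commands are an illustration, not part of the original comment:

```bash
# Step 1 premise: with cluster.brick-multiplex on, every local brick of
# v1, v2 and v3 should report the same Pid in 'gluster volume status'.
gluster volume status | awk '/^Brick/ {print $2, $NF}'

# Only one glusterfsd should be carrying all of those brick instances.
ps -ef | grep '[g]lusterfsd'

# Step 5 trigger: the deleted volume's bricks were plain directories
# under a still-mounted LV. Removing them leaves the multiplexed brick
# process with a dead path; its posix health-check thread then fails
# and takes the whole shared process down.
rm -rf /rhs/brick1/myr-1   # assumed brick directory of the deleted volume

# The surviving volumes' bricks go offline (Online column flips to N).
gluster volume status
```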
Trying to mount v2 on a new directory is also failing:

```
[2017-05-16 07:22:12.931554] I [MSGID: 114064] [client-handshake.c:148:client_notify_parents_child_up] 0-myr-3-client-1: Defering sending CHILD_UP message as the client translators are not yet ready to serve.
[2017-05-16 07:22:12.931596] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-myr-3-client-2: Connected to myr-3-client-2, attached to remote volume '/rhs/brick3/myr-3'.
[2017-05-16 07:22:12.931632] I [MSGID: 114047] [client-handshake.c:1226:client_setvolume_cbk] 0-myr-3-client-2: Server and Client lk-version numbers are not same, reopening the fds
[2017-05-16 07:22:12.931775] I [MSGID: 114064] [client-handshake.c:148:client_notify_parents_child_up] 0-myr-3-client-2: Defering sending CHILD_UP message as the client translators are not yet ready to serve.
[2017-05-16 07:22:12.931803] I [MSGID: 114035] [client-handshake.c:201:client_set_lk_version_cbk] 0-myr-3-client-1: Server lk version = 1
[2017-05-16 07:22:12.931862] I [MSGID: 114035] [client-handshake.c:201:client_set_lk_version_cbk] 0-myr-3-client-2: Server lk version = 1
[2017-05-16 07:22:12.932443] I [MSGID: 114057] [client-handshake.c:1450:select_server_supported_programs] 0-myr-3-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2017-05-16 07:22:12.933664] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-myr-3-client-0: Connected to myr-3-client-0, attached to remote volume '/rhs/brick3/myr-3'.
[2017-05-16 07:22:12.933687] I [MSGID: 114047] [client-handshake.c:1226:client_setvolume_cbk] 0-myr-3-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2017-05-16 07:22:12.933767] I [MSGID: 114064] [client-handshake.c:148:client_notify_parents_child_up] 0-myr-3-client-0: Defering sending CHILD_UP message as the client translators are not yet ready to serve.
[2017-05-16 07:22:12.933915] I [MSGID: 114035] [client-handshake.c:201:client_set_lk_version_cbk] 0-myr-3-client-0: Server lk version = 1
[2017-05-16 07:22:23.900464] I [fuse-bridge.c:5251:fuse_graph_setup] 0-fuse: switched to graph 0
[2017-05-16 07:22:23.902241] I [fuse-bridge.c:4153:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.22
[2017-05-16 07:22:23.902504] I [MSGID: 108006] [afr-common.c:4827:afr_local_init] 0-myr-3-replicate-0: no subvolumes up
[2017-05-16 07:22:23.902904] W [fuse-bridge.c:767:fuse_attr_cbk] 0-glusterfs-fuse: 2: LOOKUP() / => -1 (Transport endpoint is not connected)
[2017-05-16 07:22:23.907736] I [fuse-bridge.c:5092:fuse_thread_proc] 0-fuse: unmounting /mnt/test4
The message "I [MSGID: 108006] [afr-common.c:4827:afr_local_init] 0-myr-3-replicate-0: no subvolumes up" repeated 2 times between [2017-05-16 07:22:23.902504] and [2017-05-16 07:22:23.906642]
[2017-05-16 07:22:23.908145] W [glusterfsd.c:1291:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dc5) [0x7f9fbfee1dc5] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x7f9fc1577f45] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x7f9fc1577d6b] ) 0-: received signum (15), shutting down
[2017-05-16 07:22:23.908179] I [fuse-bridge.c:5803:fini] 0-fuse: Unmounting '/mnt/test4'.
```

Validation: not seeing the issue any more on 3.8.4-27, hence moving to VERIFIED.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774