Description of problem:
=======================
I have a 2x(4+2) EC volume. When I stop the volume, the stop succeeds, but the brick processes fail to terminate on most of the nodes.
On my setup I have been able to reproduce this 3/3 times.

I have been testing cgroups scripts on my nodes (bz#1484446). As part of this testing, to clear the previous cgroup policies, I stop the volume and the glusterd service to start fresh, but reuse the same volume.

Below is the brick log error:
==============================
[2018-04-04 11:20:14.003583] I [glusterfsd-mgmt.c:264:glusterfs_handle_terminate] 0-glusterfs: detaching not-only child /gluster/brick1/zen
[2018-04-04 11:20:14.003669] I [server.c:1545:notify] 0-zen-server: disconnecting 10.70.36.42:928
[2018-04-04 11:20:14.003753] I [server.c:1545:notify] 0-zen-server: disconnecting 10.70.35.205:1016
[2018-04-04 11:20:14.003832] I [server.c:1545:notify] 0-zen-server: disconnecting 10.70.47.121:910
[2018-04-04 11:20:14.003894] I [server.c:1545:notify] 0-zen-server: disconnecting 10.70.36.43:926
[2018-04-04 11:20:14.003989] I [server.c:1545:notify] 0-zen-server: disconnecting 10.70.46.35:928
[2018-04-04 11:20:14.004063] I [server.c:1545:notify] 0-zen-server: disconnecting 10.70.35.14:1017
[2018-04-04 11:20:14.004142] I [server.c:1545:notify] 0-zen-server: disconnecting 10.70.35.169:1014
[2018-04-04 11:20:14.004235] I [server.c:1545:notify] 0-zen-server: disconnecting 10.70.35.29:1013
[2018-04-04 11:20:14.004303] I [glusterfsd-mgmt.c:264:glusterfs_handle_terminate] 0-glusterfs: detaching not-only child /gluster/brick2/zen
[2018-04-04 11:20:14.004326] I [server.c:1545:notify] 0-zen-server: disconnecting 10.70.35.177:1016
[2018-04-04 11:20:14.004396] I [server.c:1545:notify] 0-zen-server: disconnecting 10.70.35.145:1017
[2018-04-04 11:20:14.004524] I [rpcsvc.c:1958:rpcsvc_spawn_threads] 0-rpc-service: terminating 1 threads for program 'GlusterFS 3.3'
[2018-04-04 11:20:14.004521] I [MSGID: 115036] [server.c:527:server_rpc_notify] 0-zen-server: disconnecting connection from rhs-client18.lab.eng.blr.redhat.com-2048-2018/04/02-11:55:02:784702-zen-client-0-0-4
[2018-04-04 11:20:14.004777] I [rpcsvc.c:1895:rpcsvc_request_handler] 0-rpc-service: program 'GlusterFS 3.3' thread terminated; total count:1
[2018-04-04 11:20:14.005310] I [MSGID: 115036] [server.c:527:server_rpc_notify] 0-zen-server: disconnecting connection from dhcp35-205.lab.eng.blr.redhat.com-24945-2018/04/04-11:19:48:448051-zen-client-0-0-0
[2018-04-04 11:20:14.005385] I [server.c:1545:notify] 0-zen-server: disconnecting 10.70.36.42:927
[2018-04-04 11:20:14.005482] I [server.c:1545:notify] 0-zen-server: disconnecting 10.70.35.205:1003
[2018-04-04 11:20:14.005543] I [server.c:1545:notify] 0-zen-server: disconnecting 10.70.47.121:909
[2018-04-04 11:20:14.005581] I [server.c:1545:notify] 0-zen-server: disconnecting 10.70.36.43:925
[2018-04-04 11:20:14.005656] I [server.c:1545:notify] 0-zen-server: disconnecting 10.70.46.35:927
[2018-04-04 11:20:14.005700] I [server.c:1545:notify] 0-zen-server: disconnecting 10.70.35.169:1001
[2018-04-04 11:20:14.005737] I [server.c:1545:notify] 0-zen-server: disconnecting 10.70.35.14:999
[2018-04-04 11:20:14.005769] I [server.c:1545:notify] 0-zen-server: disconnecting 10.70.35.29:997
[2018-04-04 11:20:14.005795] I [server.c:1545:notify] 0-zen-server: disconnecting 10.70.35.177:1000
[2018-04-04 11:20:14.005829] I [server.c:1545:notify] 0-zen-server: disconnecting 10.70.35.145:1003
[2018-04-04 11:20:14.007002] I [MSGID: 101055] [client_t.c:443:gf_client_unref] 0-zen-server: Shutting down connection dhcp35-205.lab.eng.blr.redhat.com-24945-2018/04/04-11:19:48:448051-zen-client-0-0-0
[2018-04-04 11:20:14.007085] I [MSGID: 101191] [event-epoll.c:644:event_dispatch_epoll_worker] 0-epoll: Exited thread with index 2
[2018-04-04 11:20:14.007215] I [MSGID: 101055] [client_t.c:443:gf_client_unref] 0-zen-server: Shutting down connection rhs-client18.lab.eng.blr.redhat.com-2048-2018/04/02-11:55:02:784702-zen-client-0-0-4
[2018-04-04 11:20:14.007332] I [MSGID: 115036] [server.c:527:server_rpc_notify] 0-zen-server: disconnecting connection from dhcp47-121.lab.eng.blr.redhat.com-25030-2018/04/02-11:58:04:314774-zen-client-0-0-4
[2018-04-04 11:20:14.008663] I [MSGID: 101055] [client_t.c:443:gf_client_unref] 0-zen-server: Shutting down connection dhcp47-121.lab.eng.blr.redhat.com-25030-2018/04/02-11:58:04:314774-zen-client-0-0-4
[2018-04-04 11:20:14.008734] I [MSGID: 115036] [server.c:527:server_rpc_notify] 0-zen-server: disconnecting connection from rhs-client19.lab.eng.blr.redhat.com-26003-2018/04/02-11:35:49:681109-zen-client-0-0-4
[2018-04-04 11:20:14.010321] I [MSGID: 101055] [client_t.c:443:gf_client_unref] 0-zen-server: Shutting down connection rhs-client19.lab.eng.blr.redhat.com-26003-2018/04/02-11:35:49:681109-zen-client-0-0-4
[2018-04-04 11:20:14.010388] I [MSGID: 115036] [server.c:527:server_rpc_notify] 0-zen-server: disconnecting connection from dhcp46-35.lab.eng.blr.redhat.com-23510-2018/04/02-11:58:17:827441-zen-client-0-0-4
[2018-04-04 11:20:14.011310] I [MSGID: 101055] [client_t.c:443:gf_client_unref] 0-zen-server: Shutting down connection dhcp46-35.lab.eng.blr.redhat.com-23510-2018/04/02-11:58:17:827441-zen-client-0-0-4
[2018-04-04 11:20:14.011379] I [MSGID: 115036] [server.c:527:server_rpc_notify] 0-zen-server: disconnecting connection from dhcp35-14.lab.eng.blr.redhat.com-22523-2018/04/04-11:19:50:520470-zen-client-0-0-0
[2018-04-04 11:20:14.012338] I [MSGID: 101055] [client_t.c:443:gf_client_unref] 0-zen-server: Shutting down connection dhcp35-14.lab.eng.blr.redhat.com-22523-2018/04/04-11:19:50:520470-zen-client-0-0-0
[2018-04-04 11:20:14.012494] I [MSGID: 115036] [server.c:527:server_rpc_notify] 0-zen-server: disconnecting connection from dhcp35-169.lab.eng.blr.redhat.com-22468-2018/04/04-11:19:50:518854-zen-client-0-0-0
[2018-04-04 11:20:14.013536] I [MSGID: 101055] [client_t.c:443:gf_client_unref] 0-zen-server: Shutting down connection dhcp35-169.lab.eng.blr.redhat.com-22468-2018/04/04-11:19:50:518854-zen-client-0-0-0
[2018-04-04 11:20:14.013601] I [MSGID: 115036] [server.c:527:server_rpc_notify] 0-zen-server: disconnecting connection from dhcp35-29.lab.eng.blr.redhat.com-22618-2018/04/04-11:19:50:527898-zen-client-0-0-0
[2018-04-04 11:20:14.015057] I [MSGID: 101055] [client_t.c:443:gf_client_unref] 0-zen-server: Shutting down connection dhcp35-29.lab.eng.blr.redhat.com-22618-2018/04/04-11:19:50:527898-zen-client-0-0-0
[2018-04-04 11:20:14.015151] I [MSGID: 115036] [server.c:527:server_rpc_notify] 0-zen-server: disconnecting connection from dhcp35-177.lab.eng.blr.redhat.com-22739-2018/04/04-11:19:50:543607-zen-client-0-0-0
[2018-04-04 11:20:14.016138] I [MSGID: 101055] [client_t.c:443:gf_client_unref] 0-zen-server: Shutting down connection dhcp35-177.lab.eng.blr.redhat.com-22739-2018/04/04-11:19:50:543607-zen-client-0-0-0
[2018-04-04 11:20:14.016208] I [MSGID: 115036] [server.c:527:server_rpc_notify] 0-zen-server: disconnecting connection from dhcp35-145.lab.eng.blr.redhat.com-23044-2018/04/04-11:19:50:578712-zen-client-0-0-0
[2018-04-04 11:20:14.017090] I [MSGID: 101055] [client_t.c:443:gf_client_unref] 0-zen-server: Shutting down connection dhcp35-145.lab.eng.blr.redhat.com-23044-2018/04/04-11:19:50:578712-zen-client-0-0-0
[2018-04-04 11:20:14.017158] I [MSGID: 115036] [server.c:527:server_rpc_notify] 0-zen-server: disconnecting connection from rhs-client18.lab.eng.blr.redhat.com-2048-2018/04/02-11:55:02:784702-zen-client-6-0-4
[2018-04-04 11:20:14.018069] I [MSGID: 101055] [client_t.c:443:gf_client_unref] 0-zen-server: Shutting down connection rhs-client18.lab.eng.blr.redhat.com-2048-2018/04/02-11:55:02:784702-zen-client-6-0-4
[2018-04-04 11:20:14.018135] I [MSGID: 115036] [server.c:527:server_rpc_notify] 0-zen-server: disconnecting connection from dhcp35-205.lab.eng.blr.redhat.com-24945-2018/04/04-11:19:48:448051-zen-client-6-0-0
[2018-04-04 11:20:14.018960] I [MSGID: 101055] [client_t.c:443:gf_client_unref] 0-zen-server: Shutting down connection dhcp35-205.lab.eng.blr.redhat.com-24945-2018/04/04-11:19:48:448051-zen-client-6-0-0
[2018-04-04 11:20:14.019024] I [MSGID: 115036] [server.c:527:server_rpc_notify] 0-zen-server: disconnecting connection from dhcp47-121.lab.eng.blr.redhat.com-25030-2018/04/02-11:58:04:314774-zen-client-6-0-4
[2018-04-04 11:20:14.019844] I [MSGID: 101055] [client_t.c:443:gf_client_unref] 0-zen-server: Shutting down connection dhcp47-121.lab.eng.blr.redhat.com-25030-2018/04/02-11:58:04:314774-zen-client-6-0-4
[2018-04-04 11:20:14.019900] I [MSGID: 115036] [server.c:527:server_rpc_notify] 0-zen-server: disconnecting connection from rhs-client19.lab.eng.blr.redhat.com-26003-2018/04/02-11:35:49:681109-zen-client-6-0-4
[2018-04-04 11:20:14.020796] I [MSGID: 101055] [client_t.c:443:gf_client_unref] 0-zen-server: Shutting down connection rhs-client19.lab.eng.blr.redhat.com-26003-2018/04/02-11:35:49:681109-zen-client-6-0-4
[2018-04-04 11:20:14.020855] I [MSGID: 115036] [server.c:527:server_rpc_notify] 0-zen-server: disconnecting connection from dhcp46-35.lab.eng.blr.redhat.com-23510-2018/04/02-11:58:17:827441-zen-client-6-0-4
[2018-04-04 11:20:14.021619] I [MSGID: 101055] [client_t.c:443:gf_client_unref] 0-zen-server: Shutting down connection dhcp46-35.lab.eng.blr.redhat.com-23510-2018/04/02-11:58:17:827441-zen-client-6-0-4
[2018-04-04 11:20:14.021680] I [MSGID: 115036] [server.c:527:server_rpc_notify] 0-zen-server: disconnecting connection from dhcp35-169.lab.eng.blr.redhat.com-22468-2018/04/04-11:19:50:518854-zen-client-6-0-0
[2018-04-04 11:20:14.022527] I [MSGID: 101055] [client_t.c:443:gf_client_unref] 0-zen-server: Shutting down connection dhcp35-169.lab.eng.blr.redhat.com-22468-2018/04/04-11:19:50:518854-zen-client-6-0-0
[2018-04-04 11:20:14.022589] I [MSGID: 115036] [server.c:527:server_rpc_notify] 0-zen-server: disconnecting connection from dhcp35-14.lab.eng.blr.redhat.com-22523-2018/04/04-11:19:50:520470-zen-client-6-0-0
[2018-04-04 11:20:14.023321] I [MSGID: 101055] [client_t.c:443:gf_client_unref] 0-zen-server: Shutting down connection dhcp35-14.lab.eng.blr.redhat.com-22523-2018/04/04-11:19:50:520470-zen-client-6-0-0
[2018-04-04 11:20:14.023382] I [MSGID: 115036] [server.c:527:server_rpc_notify] 0-zen-server: disconnecting connection from dhcp35-29.lab.eng.blr.redhat.com-22618-2018/04/04-11:19:50:527898-zen-client-6-0-0
[2018-04-04 11:20:14.024344] I [MSGID: 101055] [client_t.c:443:gf_client_unref] 0-zen-server: Shutting down connection dhcp35-29.lab.eng.blr.redhat.com-22618-2018/04/04-11:19:50:527898-zen-client-6-0-0
[2018-04-04 11:20:14.024426] I [MSGID: 115036] [server.c:527:server_rpc_notify] 0-zen-server: disconnecting connection from dhcp35-177.lab.eng.blr.redhat.com-22739-2018/04/04-11:19:50:543607-zen-client-6-0-0
[2018-04-04 11:20:14.025395] I [MSGID: 101055] [client_t.c:443:gf_client_unref] 0-zen-server: Shutting down connection dhcp35-177.lab.eng.blr.redhat.com-22739-2018/04/04-11:19:50:543607-zen-client-6-0-0
[2018-04-04 11:20:14.025465] I [MSGID: 115036] [server.c:527:server_rpc_notify] 0-zen-server: disconnecting connection from dhcp35-145.lab.eng.blr.redhat.com-23044-2018/04/04-11:19:50:578712-zen-client-6-0-0
[2018-04-04 11:20:14.026335] I [MSGID: 101055] [client_t.c:443:gf_client_unref] 0-zen-server: Shutting down connection dhcp35-145.lab.eng.blr.redhat.com-23044-2018/04/04-11:19:50:578712-zen-client-6-0-0
[2018-04-04 11:20:14.028963] E [glusterfsd-mgmt.c:236:glusterfs_handle_terminate] 0-glusterfs: can't terminate /gluster/brick1/zen - not found
[2018-04-04 11:20:14.030190] E [glusterfsd-mgmt.c:236:glusterfs_handle_terminate] 0-glusterfs: can't terminate /gluster/brick2/zen - not found

glusterd log:
---------
[2018-04-04 11:20:14.062647] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already stopped
[2018-04-04 11:20:14.062724] I [MSGID: 106568] [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: nfs service is stopped
[2018-04-04 11:20:14.063164] I [MSGID: 106568] [glusterd-proc-mgmt.c:87:glusterd_proc_stop] 0-management: Stopping glustershd daemon running in pid: 24946
[2018-04-04 11:20:15.063503] I [MSGID: 106568] [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: glustershd service is stopped
[2018-04-04 11:20:15.063685] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: bitd already stopped
[2018-04-04 11:20:15.063721] I [MSGID: 106568] [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: bitd service is stopped
[2018-04-04 11:20:15.063799] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub already stopped
[2018-04-04 11:20:15.063830] I [MSGID: 106568] [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: scrub service is stopped
[2018-04-04 11:20:15.064516] I [MSGID: 106144] [glusterd-pmap.c:383:pmap_registry_remove] 0-pmap: removing brick /gluster/brick1/zen on port 49152
[2018-04-04 11:20:15.064823] I [MSGID: 106144] [glusterd-pmap.c:383:pmap_registry_remove] 0-pmap: removing brick /gluster/brick2/zen on port 49152

[root@dhcp35-205 scripts]# gluster v info g

Volume Name: zen
Type: Distributed-Disperse
Volume ID: a6470510-3f32-4f34-8004-521d9670bec9
Status: Stopped
Snapshot Count: 0
Number of Bricks: 2 x (4 + 2) = 12
Transport-type: tcp
Bricks:
Brick1: dhcp35-205.lab.eng.blr.redhat.com:/gluster/brick1/zen
Brick2: dhcp35-169.lab.eng.blr.redhat.com:/gluster/brick1/zen
Brick3: dhcp35-145.lab.eng.blr.redhat.com:/gluster/brick1/zen
Brick4: dhcp35-177.lab.eng.blr.redhat.com:/gluster/brick1/zen
Brick5: dhcp35-29.lab.eng.blr.redhat.com:/gluster/brick1/zen
Brick6: dhcp35-14.lab.eng.blr.redhat.com:/gluster/brick1/zen
Brick7: dhcp35-205.lab.eng.blr.redhat.com:/gluster/brick2/zen
Brick8: dhcp35-169.lab.eng.blr.redhat.com:/gluster/brick2/zen
Brick9: dhcp35-145.lab.eng.blr.redhat.com:/gluster/brick2/zen
Brick10: dhcp35-177.lab.eng.blr.redhat.com:/gluster/brick2/zen
Brick11: dhcp35-29.lab.eng.blr.redhat.com:/gluster/brick2/zen
Brick12: dhcp35-14.lab.eng.blr.redhat.com:/gluster/brick2/zen
Options Reconfigured:
nfs.disable: on
transport.address-family: inet
disperse.other-eager-lock: off
disperse.parallel-writes: on
disperse.eager-lock: on
[root@dhcp35-205 scripts]# gluster v status
Volume zen is not started

Version-Release number of selected component (if applicable):
----------
3.12.2-6

How reproducible:
--------------
4/4 on my setup

Steps to Reproduce:
1. Created a 2x(4+2) volume.
2. Did some cgroups testing.
3. Did a volume stop and glusterd stop to clear previous cgroup policies.
4. Did a glusterd stop on all nodes.
5. Checked that stale brick processes exist.

Manually killed the brick processes, started glusterd on all nodes, did a volume start, and reran steps 3, 4 and 5 ---> still seeing stale brick processes.

Actual results:
---------------
It seems the bricks are not getting terminated. However, the bricks are removed from the portmapper entries as part of volume stop, and hence volume stop succeeds even though the brick processes were not terminated.

Expected results:
------------
Either volume stop must fail if the environment is not optimal for volume stop, or the bricks must be terminated properly.
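
For anyone trying to confirm the same symptom on their own nodes, here is a minimal sketch that scans brick log text for the "can't terminate ... not found" errors shown in the brick log above. The function name `failed_terminations` and the sample text are only illustrative (the sample lines are copied from this report). The regular expression assumes the log format seen here and may need adjusting for other GlusterFS versions.

```python
import re

# Pattern for the E-level message logged by glusterfs_handle_terminate,
# based only on the brick log lines quoted in this report (an assumption
# about the format, not a documented interface).
TERMINATE_ERROR = re.compile(
    r"\[(?P<ts>[^\]]+)\] E "
    r"\[glusterfsd-mgmt\.c:\d+:glusterfs_handle_terminate\] "
    r"\S+: can't terminate (?P<brick>\S+) - not found"
)


def failed_terminations(log_text):
    """Return (timestamp, brick_path) pairs for bricks that could not be terminated."""
    return [(m.group("ts"), m.group("brick")) for m in TERMINATE_ERROR.finditer(log_text)]


# Sample lines taken from the brick log in this report.
SAMPLE_LOG = "\n".join([
    "[2018-04-04 11:20:14.003583] I [glusterfsd-mgmt.c:264:glusterfs_handle_terminate] "
    "0-glusterfs: detaching not-only child /gluster/brick1/zen",
    "[2018-04-04 11:20:14.028963] E [glusterfsd-mgmt.c:236:glusterfs_handle_terminate] "
    "0-glusterfs: can't terminate /gluster/brick1/zen - not found",
    "[2018-04-04 11:20:14.030190] E [glusterfsd-mgmt.c:236:glusterfs_handle_terminate] "
    "0-glusterfs: can't terminate /gluster/brick2/zen - not found",
])

if __name__ == "__main__":
    for ts, brick in failed_terminations(SAMPLE_LOG):
        print(f"{ts}: terminate failed for {brick}")
```

A match here only says that the terminate request failed for that brick path. Whether a stale brick process is actually left behind still has to be checked on the node itself.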
sosreports: http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/nchilaka/bug.1563640
They contain an strace of glusterd in /var/log/glusterfs, if that is of help. Statedumps of glusterd and the stale fsd are also available in /var/run/gluster.
This was already reported earlier as BZ 1548829, which is approved for RHGS 3.4.0.

*** This bug has been marked as a duplicate of bug 1548829 ***