Bug 1563640 - brick processes not getting terminated when volume is stopped
Summary: brick processes not getting terminated when volume is stopped
Keywords:
Status: CLOSED DUPLICATE of bug 1548829
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: core
Version: rhgs-3.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Vijay Bellur
QA Contact: Rahul Hinduja
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-04-04 11:29 UTC by Nag Pavan Chilakam
Modified: 2018-04-04 13:51 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-04-04 13:51:42 UTC
Embargoed:



Description Nag Pavan Chilakam 2018-04-04 11:29:41 UTC
Description of problem:
=======================
I have a 2x(4+2) EC volume. When I stop the volume, the stop succeeds, but the brick processes fail to terminate on most of the nodes.
On my setup I have been able to reproduce this 3/3 times.

I have been testing cgroups scripts on my nodes (bz#1484446).
As part of this testing, to clear previous cgroup policies, I stop the volume and the glusterd service so that I can restart fresh while reusing the same volume.

below is the brick log error:
==============================




[2018-04-04 11:20:14.003583] I [glusterfsd-mgmt.c:264:glusterfs_handle_terminate] 0-glusterfs: detaching not-only child /gluster/brick1/zen
[2018-04-04 11:20:14.003669] I [server.c:1545:notify] 0-zen-server: disconnecting 10.70.36.42:928
[2018-04-04 11:20:14.003753] I [server.c:1545:notify] 0-zen-server: disconnecting 10.70.35.205:1016
[2018-04-04 11:20:14.003832] I [server.c:1545:notify] 0-zen-server: disconnecting 10.70.47.121:910
[2018-04-04 11:20:14.003894] I [server.c:1545:notify] 0-zen-server: disconnecting 10.70.36.43:926
[2018-04-04 11:20:14.003989] I [server.c:1545:notify] 0-zen-server: disconnecting 10.70.46.35:928
[2018-04-04 11:20:14.004063] I [server.c:1545:notify] 0-zen-server: disconnecting 10.70.35.14:1017
[2018-04-04 11:20:14.004142] I [server.c:1545:notify] 0-zen-server: disconnecting 10.70.35.169:1014
[2018-04-04 11:20:14.004235] I [server.c:1545:notify] 0-zen-server: disconnecting 10.70.35.29:1013
[2018-04-04 11:20:14.004303] I [glusterfsd-mgmt.c:264:glusterfs_handle_terminate] 0-glusterfs: detaching not-only child /gluster/brick2/zen
[2018-04-04 11:20:14.004326] I [server.c:1545:notify] 0-zen-server: disconnecting 10.70.35.177:1016
[2018-04-04 11:20:14.004396] I [server.c:1545:notify] 0-zen-server: disconnecting 10.70.35.145:1017
[2018-04-04 11:20:14.004524] I [rpcsvc.c:1958:rpcsvc_spawn_threads] 0-rpc-service: terminating 1 threads for program 'GlusterFS 3.3'
[2018-04-04 11:20:14.004521] I [MSGID: 115036] [server.c:527:server_rpc_notify] 0-zen-server: disconnecting connection from rhs-client18.lab.eng.blr.redhat.com-2048-2018/04/02-11:55:02:784702-zen-client-0-0-4
[2018-04-04 11:20:14.004777] I [rpcsvc.c:1895:rpcsvc_request_handler] 0-rpc-service: program 'GlusterFS 3.3' thread terminated; total count:1
[2018-04-04 11:20:14.005310] I [MSGID: 115036] [server.c:527:server_rpc_notify] 0-zen-server: disconnecting connection from dhcp35-205.lab.eng.blr.redhat.com-24945-2018/04/04-11:19:48:448051-zen-client-0-0-0
[2018-04-04 11:20:14.005385] I [server.c:1545:notify] 0-zen-server: disconnecting 10.70.36.42:927
[2018-04-04 11:20:14.005482] I [server.c:1545:notify] 0-zen-server: disconnecting 10.70.35.205:1003
[2018-04-04 11:20:14.005543] I [server.c:1545:notify] 0-zen-server: disconnecting 10.70.47.121:909
[2018-04-04 11:20:14.005581] I [server.c:1545:notify] 0-zen-server: disconnecting 10.70.36.43:925
[2018-04-04 11:20:14.005656] I [server.c:1545:notify] 0-zen-server: disconnecting 10.70.46.35:927
[2018-04-04 11:20:14.005700] I [server.c:1545:notify] 0-zen-server: disconnecting 10.70.35.169:1001
[2018-04-04 11:20:14.005737] I [server.c:1545:notify] 0-zen-server: disconnecting 10.70.35.14:999
[2018-04-04 11:20:14.005769] I [server.c:1545:notify] 0-zen-server: disconnecting 10.70.35.29:997
[2018-04-04 11:20:14.005795] I [server.c:1545:notify] 0-zen-server: disconnecting 10.70.35.177:1000
[2018-04-04 11:20:14.005829] I [server.c:1545:notify] 0-zen-server: disconnecting 10.70.35.145:1003
[2018-04-04 11:20:14.007002] I [MSGID: 101055] [client_t.c:443:gf_client_unref] 0-zen-server: Shutting down connection dhcp35-205.lab.eng.blr.redhat.com-24945-2018/04/04-11:19:48:448051-zen-client-0-0-0
[2018-04-04 11:20:14.007085] I [MSGID: 101191] [event-epoll.c:644:event_dispatch_epoll_worker] 0-epoll: Exited thread with index 2
[2018-04-04 11:20:14.007215] I [MSGID: 101055] [client_t.c:443:gf_client_unref] 0-zen-server: Shutting down connection rhs-client18.lab.eng.blr.redhat.com-2048-2018/04/02-11:55:02:784702-zen-client-0-0-4
[2018-04-04 11:20:14.007332] I [MSGID: 115036] [server.c:527:server_rpc_notify] 0-zen-server: disconnecting connection from dhcp47-121.lab.eng.blr.redhat.com-25030-2018/04/02-11:58:04:314774-zen-client-0-0-4
[2018-04-04 11:20:14.008663] I [MSGID: 101055] [client_t.c:443:gf_client_unref] 0-zen-server: Shutting down connection dhcp47-121.lab.eng.blr.redhat.com-25030-2018/04/02-11:58:04:314774-zen-client-0-0-4
[2018-04-04 11:20:14.008734] I [MSGID: 115036] [server.c:527:server_rpc_notify] 0-zen-server: disconnecting connection from rhs-client19.lab.eng.blr.redhat.com-26003-2018/04/02-11:35:49:681109-zen-client-0-0-4
[2018-04-04 11:20:14.010321] I [MSGID: 101055] [client_t.c:443:gf_client_unref] 0-zen-server: Shutting down connection rhs-client19.lab.eng.blr.redhat.com-26003-2018/04/02-11:35:49:681109-zen-client-0-0-4
[2018-04-04 11:20:14.010388] I [MSGID: 115036] [server.c:527:server_rpc_notify] 0-zen-server: disconnecting connection from dhcp46-35.lab.eng.blr.redhat.com-23510-2018/04/02-11:58:17:827441-zen-client-0-0-4
[2018-04-04 11:20:14.011310] I [MSGID: 101055] [client_t.c:443:gf_client_unref] 0-zen-server: Shutting down connection dhcp46-35.lab.eng.blr.redhat.com-23510-2018/04/02-11:58:17:827441-zen-client-0-0-4
[2018-04-04 11:20:14.011379] I [MSGID: 115036] [server.c:527:server_rpc_notify] 0-zen-server: disconnecting connection from dhcp35-14.lab.eng.blr.redhat.com-22523-2018/04/04-11:19:50:520470-zen-client-0-0-0
[2018-04-04 11:20:14.012338] I [MSGID: 101055] [client_t.c:443:gf_client_unref] 0-zen-server: Shutting down connection dhcp35-14.lab.eng.blr.redhat.com-22523-2018/04/04-11:19:50:520470-zen-client-0-0-0
[2018-04-04 11:20:14.012494] I [MSGID: 115036] [server.c:527:server_rpc_notify] 0-zen-server: disconnecting connection from dhcp35-169.lab.eng.blr.redhat.com-22468-2018/04/04-11:19:50:518854-zen-client-0-0-0
[2018-04-04 11:20:14.013536] I [MSGID: 101055] [client_t.c:443:gf_client_unref] 0-zen-server: Shutting down connection dhcp35-169.lab.eng.blr.redhat.com-22468-2018/04/04-11:19:50:518854-zen-client-0-0-0
[2018-04-04 11:20:14.013601] I [MSGID: 115036] [server.c:527:server_rpc_notify] 0-zen-server: disconnecting connection from dhcp35-29.lab.eng.blr.redhat.com-22618-2018/04/04-11:19:50:527898-zen-client-0-0-0
[2018-04-04 11:20:14.015057] I [MSGID: 101055] [client_t.c:443:gf_client_unref] 0-zen-server: Shutting down connection dhcp35-29.lab.eng.blr.redhat.com-22618-2018/04/04-11:19:50:527898-zen-client-0-0-0
[2018-04-04 11:20:14.015151] I [MSGID: 115036] [server.c:527:server_rpc_notify] 0-zen-server: disconnecting connection from dhcp35-177.lab.eng.blr.redhat.com-22739-2018/04/04-11:19:50:543607-zen-client-0-0-0
[2018-04-04 11:20:14.016138] I [MSGID: 101055] [client_t.c:443:gf_client_unref] 0-zen-server: Shutting down connection dhcp35-177.lab.eng.blr.redhat.com-22739-2018/04/04-11:19:50:543607-zen-client-0-0-0
[2018-04-04 11:20:14.016208] I [MSGID: 115036] [server.c:527:server_rpc_notify] 0-zen-server: disconnecting connection from dhcp35-145.lab.eng.blr.redhat.com-23044-2018/04/04-11:19:50:578712-zen-client-0-0-0
[2018-04-04 11:20:14.017090] I [MSGID: 101055] [client_t.c:443:gf_client_unref] 0-zen-server: Shutting down connection dhcp35-145.lab.eng.blr.redhat.com-23044-2018/04/04-11:19:50:578712-zen-client-0-0-0
[2018-04-04 11:20:14.017158] I [MSGID: 115036] [server.c:527:server_rpc_notify] 0-zen-server: disconnecting connection from rhs-client18.lab.eng.blr.redhat.com-2048-2018/04/02-11:55:02:784702-zen-client-6-0-4
[2018-04-04 11:20:14.018069] I [MSGID: 101055] [client_t.c:443:gf_client_unref] 0-zen-server: Shutting down connection rhs-client18.lab.eng.blr.redhat.com-2048-2018/04/02-11:55:02:784702-zen-client-6-0-4
[2018-04-04 11:20:14.018135] I [MSGID: 115036] [server.c:527:server_rpc_notify] 0-zen-server: disconnecting connection from dhcp35-205.lab.eng.blr.redhat.com-24945-2018/04/04-11:19:48:448051-zen-client-6-0-0
[2018-04-04 11:20:14.018960] I [MSGID: 101055] [client_t.c:443:gf_client_unref] 0-zen-server: Shutting down connection dhcp35-205.lab.eng.blr.redhat.com-24945-2018/04/04-11:19:48:448051-zen-client-6-0-0
[2018-04-04 11:20:14.019024] I [MSGID: 115036] [server.c:527:server_rpc_notify] 0-zen-server: disconnecting connection from dhcp47-121.lab.eng.blr.redhat.com-25030-2018/04/02-11:58:04:314774-zen-client-6-0-4
[2018-04-04 11:20:14.019844] I [MSGID: 101055] [client_t.c:443:gf_client_unref] 0-zen-server: Shutting down connection dhcp47-121.lab.eng.blr.redhat.com-25030-2018/04/02-11:58:04:314774-zen-client-6-0-4
[2018-04-04 11:20:14.019900] I [MSGID: 115036] [server.c:527:server_rpc_notify] 0-zen-server: disconnecting connection from rhs-client19.lab.eng.blr.redhat.com-26003-2018/04/02-11:35:49:681109-zen-client-6-0-4
[2018-04-04 11:20:14.020796] I [MSGID: 101055] [client_t.c:443:gf_client_unref] 0-zen-server: Shutting down connection rhs-client19.lab.eng.blr.redhat.com-26003-2018/04/02-11:35:49:681109-zen-client-6-0-4
[2018-04-04 11:20:14.020855] I [MSGID: 115036] [server.c:527:server_rpc_notify] 0-zen-server: disconnecting connection from dhcp46-35.lab.eng.blr.redhat.com-23510-2018/04/02-11:58:17:827441-zen-client-6-0-4
[2018-04-04 11:20:14.021619] I [MSGID: 101055] [client_t.c:443:gf_client_unref] 0-zen-server: Shutting down connection dhcp46-35.lab.eng.blr.redhat.com-23510-2018/04/02-11:58:17:827441-zen-client-6-0-4
[2018-04-04 11:20:14.021680] I [MSGID: 115036] [server.c:527:server_rpc_notify] 0-zen-server: disconnecting connection from dhcp35-169.lab.eng.blr.redhat.com-22468-2018/04/04-11:19:50:518854-zen-client-6-0-0
[2018-04-04 11:20:14.022527] I [MSGID: 101055] [client_t.c:443:gf_client_unref] 0-zen-server: Shutting down connection dhcp35-169.lab.eng.blr.redhat.com-22468-2018/04/04-11:19:50:518854-zen-client-6-0-0
[2018-04-04 11:20:14.022589] I [MSGID: 115036] [server.c:527:server_rpc_notify] 0-zen-server: disconnecting connection from dhcp35-14.lab.eng.blr.redhat.com-22523-2018/04/04-11:19:50:520470-zen-client-6-0-0
[2018-04-04 11:20:14.023321] I [MSGID: 101055] [client_t.c:443:gf_client_unref] 0-zen-server: Shutting down connection dhcp35-14.lab.eng.blr.redhat.com-22523-2018/04/04-11:19:50:520470-zen-client-6-0-0
[2018-04-04 11:20:14.023382] I [MSGID: 115036] [server.c:527:server_rpc_notify] 0-zen-server: disconnecting connection from dhcp35-29.lab.eng.blr.redhat.com-22618-2018/04/04-11:19:50:527898-zen-client-6-0-0
[2018-04-04 11:20:14.024344] I [MSGID: 101055] [client_t.c:443:gf_client_unref] 0-zen-server: Shutting down connection dhcp35-29.lab.eng.blr.redhat.com-22618-2018/04/04-11:19:50:527898-zen-client-6-0-0
[2018-04-04 11:20:14.024426] I [MSGID: 115036] [server.c:527:server_rpc_notify] 0-zen-server: disconnecting connection from dhcp35-177.lab.eng.blr.redhat.com-22739-2018/04/04-11:19:50:543607-zen-client-6-0-0
[2018-04-04 11:20:14.025395] I [MSGID: 101055] [client_t.c:443:gf_client_unref] 0-zen-server: Shutting down connection dhcp35-177.lab.eng.blr.redhat.com-22739-2018/04/04-11:19:50:543607-zen-client-6-0-0
[2018-04-04 11:20:14.025465] I [MSGID: 115036] [server.c:527:server_rpc_notify] 0-zen-server: disconnecting connection from dhcp35-145.lab.eng.blr.redhat.com-23044-2018/04/04-11:19:50:578712-zen-client-6-0-0
[2018-04-04 11:20:14.026335] I [MSGID: 101055] [client_t.c:443:gf_client_unref] 0-zen-server: Shutting down connection dhcp35-145.lab.eng.blr.redhat.com-23044-2018/04/04-11:19:50:578712-zen-client-6-0-0
[2018-04-04 11:20:14.028963] E [glusterfsd-mgmt.c:236:glusterfs_handle_terminate] 0-glusterfs: can't terminate /gluster/brick1/zen - not found
[2018-04-04 11:20:14.030190] E [glusterfsd-mgmt.c:236:glusterfs_handle_terminate] 0-glusterfs: can't terminate /gluster/brick2/zen - not found
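
To pull the relevant lines out of a brick log like the one above, a small filter can help. The following is a minimal sketch and not part of the original report: it assumes the log line layout shown above, and the names `LOG_LINE` and `terminate_events` are hypothetical helpers, not Gluster APIs.

```python
import re

# Assumed layout, taken from the log lines in this report:
# [timestamp] LEVEL [optional MSGID] [file:line:function] message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) "
    r"(?:\[MSGID: \d+\] )?\[(?P<src>[^\]]+)\] (?P<msg>.*)$"
)
DETACH = re.compile(r"detaching not-only child (?P<brick>\S+)")
NOT_FOUND = re.compile(r"can't terminate (?P<brick>\S+) - not found")


def terminate_events(lines):
    """Return (detached, not_found): brick paths named in
    glusterfs_handle_terminate messages, in log order."""
    detached, not_found = [], []
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if not m or "glusterfs_handle_terminate" not in m.group("src"):
            continue
        msg = m.group("msg")
        d = DETACH.search(msg)
        n = NOT_FOUND.search(msg)
        if d:
            detached.append(d.group("brick"))
        elif n:
            not_found.append(n.group("brick"))
    return detached, not_found
```

Run against the log above, this would list each brick once under the detach messages and once under the "not found" errors, which makes the repeated terminate request for the same brick easy to spot.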




glusterd log:
---------
[2018-04-04 11:20:14.062647] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already stopped
[2018-04-04 11:20:14.062724] I [MSGID: 106568] [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: nfs service is stopped
[2018-04-04 11:20:14.063164] I [MSGID: 106568] [glusterd-proc-mgmt.c:87:glusterd_proc_stop] 0-management: Stopping glustershd daemon running in pid: 24946
[2018-04-04 11:20:15.063503] I [MSGID: 106568] [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: glustershd service is stopped
[2018-04-04 11:20:15.063685] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: bitd already stopped
[2018-04-04 11:20:15.063721] I [MSGID: 106568] [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: bitd service is stopped
[2018-04-04 11:20:15.063799] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub already stopped
[2018-04-04 11:20:15.063830] I [MSGID: 106568] [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: scrub service is stopped
[2018-04-04 11:20:15.064516] I [MSGID: 106144] [glusterd-pmap.c:383:pmap_registry_remove] 0-pmap: removing brick /gluster/brick1/zen on port 49152
[2018-04-04 11:20:15.064823] I [MSGID: 106144] [glusterd-pmap.c:383:pmap_registry_remove] 0-pmap: removing brick /gluster/brick2/zen on port 49152
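
The portmap removals in the glusterd log can be extracted the same way, so they can be compared against what is still running on the node. This is only a sketch under the assumption that the `pmap_registry_remove` message keeps the wording shown above; `PMAP_REMOVE` and `removed_bricks` are hypothetical names.

```python
import re

# Assumes the message format seen in this report:
# "... pmap_registry_remove] 0-pmap: removing brick <path> on port <port>"
PMAP_REMOVE = re.compile(
    r"pmap_registry_remove\] \S+ removing brick (?P<brick>\S+) on port (?P<port>\d+)"
)


def removed_bricks(lines):
    """Map each brick path that glusterd removed from the portmap to its port."""
    removed = {}
    for line in lines:
        m = PMAP_REMOVE.search(line)
        if m:
            removed[m.group("brick")] = int(m.group("port"))
    return removed
```

For the two lines above, both brick paths map to port 49152.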

[root@dhcp35-205 scripts]# gluster v info
Volume Name: zen
Type: Distributed-Disperse
Volume ID: a6470510-3f32-4f34-8004-521d9670bec9
Status: Stopped
Snapshot Count: 0
Number of Bricks: 2 x (4 + 2) = 12
Transport-type: tcp
Bricks:
Brick1: dhcp35-205.lab.eng.blr.redhat.com:/gluster/brick1/zen
Brick2: dhcp35-169.lab.eng.blr.redhat.com:/gluster/brick1/zen
Brick3: dhcp35-145.lab.eng.blr.redhat.com:/gluster/brick1/zen
Brick4: dhcp35-177.lab.eng.blr.redhat.com:/gluster/brick1/zen
Brick5: dhcp35-29.lab.eng.blr.redhat.com:/gluster/brick1/zen
Brick6: dhcp35-14.lab.eng.blr.redhat.com:/gluster/brick1/zen
Brick7: dhcp35-205.lab.eng.blr.redhat.com:/gluster/brick2/zen
Brick8: dhcp35-169.lab.eng.blr.redhat.com:/gluster/brick2/zen
Brick9: dhcp35-145.lab.eng.blr.redhat.com:/gluster/brick2/zen
Brick10: dhcp35-177.lab.eng.blr.redhat.com:/gluster/brick2/zen
Brick11: dhcp35-29.lab.eng.blr.redhat.com:/gluster/brick2/zen
Brick12: dhcp35-14.lab.eng.blr.redhat.com:/gluster/brick2/zen
Options Reconfigured:
nfs.disable: on
transport.address-family: inet
disperse.other-eager-lock: off
disperse.parallel-writes: on
disperse.eager-lock: on
[root@dhcp35-205 scripts]# gluster v status
Volume zen is not started


Version-Release number of selected component (if applicable):
----------
3.12.2-6

How reproducible:
--------------
4/4 on my setup




Steps to Reproduce:
1. Created a 2x(4+2) volume.
2. Did some cgroups testing.
3. Did a volume stop and a glusterd stop to clear previous cgroup policies.
4. Did a glusterd stop on all nodes.
5. Checked that stale brick processes exist.
6. Killed the brick processes manually.
7. Started glusterd on all nodes.
8. Did a volume start.
9. Reran steps 3, 4 and 5: still seeing stale brick processes.
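
The check for stale brick processes in step 5 can be scripted. The sketch below is an illustration, not the procedure used in this report: it assumes brick processes run under the executable name `glusterfsd` and that `ps -eo pid,args` is available on the node. The function names are hypothetical.

```python
import subprocess


def stale_bricks(ps_output):
    """Return {pid: command line} for every glusterfsd process
    found in the text output of `ps -eo pid,args`."""
    found = {}
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        # Skips the header row and any malformed line.
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        pid, args = int(parts[0]), parts[1]
        # Compare only the executable's base name, so that a command such as
        # `grep glusterfsd` is not counted as a brick process.
        exe = args.split()[0].rsplit("/", 1)[-1]
        if exe == "glusterfsd":
            found[pid] = args
    return found


def list_stale_bricks():
    """Run ps on the local node and return the glusterfsd processes still alive."""
    out = subprocess.run(
        ["ps", "-eo", "pid,args"], capture_output=True, text=True, check=True
    ).stdout
    return stale_bricks(out)
```

Calling `list_stale_bricks()` on each node after the volume and glusterd have been stopped should return an empty dict; any entry is a leftover brick process.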

Actual results:
---------------
It seems the bricks are not getting terminated. However, the bricks are removed from the portmapper entries as part of volume stop, and hence volume stop succeeds even though the brick processes were not terminated.

Expected results:
------------
Either volume stop must fail if the environment is not optimal for volume stop, or the bricks must be terminated properly.
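
The expected behaviour amounts to "confirm the process is gone before reporting success". A minimal sketch of that check from the outside follows. It is not how glusterd implements volume stop; `pid_alive` and `wait_for_exit` are hypothetical helpers, and the PID is assumed to be known (for example from a process listing).

```python
import os
import time


def pid_alive(pid):
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 performs the existence check without sending a signal
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def wait_for_exit(pid, timeout=10.0, interval=0.5, is_alive=pid_alive):
    """Poll until the process exits. Return True if it exited within
    `timeout` seconds, False if it is still alive at the deadline."""
    deadline = time.monotonic() + timeout
    while is_alive(pid):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True
```

A wrapper around volume stop could call `wait_for_exit` for each brick PID and report a failure when it returns False, instead of reporting success while the brick process is still running.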

Comment 2 Nag Pavan Chilakam 2018-04-04 11:47:08 UTC
sosreports: http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/nchilaka/bug.1563640

It contains an strace of glusterd in /var/log/glusterfs, if that is of help.
Statedumps of glusterd and of the stale brick process (glusterfsd) are also available in /var/run/gluster.

Comment 3 Atin Mukherjee 2018-04-04 13:51:42 UTC
This was already reported as BZ 1548829, which is approved for RHGS 3.4.0.

*** This bug has been marked as a duplicate of bug 1548829 ***

