Bug 1262212 - Brick process does not start after being killed with SIGKILL and then running `gluster volume start force'
Status: NEW
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: core
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assigned To: Bug Updates Notification Mailing List
QA Contact: Anoop
Keywords: ZStream
Depends On:
Blocks:
 
Reported: 2015-09-11 03:47 EDT by Shruti Sampat
Modified: 2017-03-25 12:25 EDT (History)
CC List: 1 user

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Shruti Sampat 2015-09-11 03:47:30 EDT
Description of problem:
-----------------------

In a 3-way replicated volume, one brick in each replica set was killed with SIGKILL while I/O was running on a fuse client. After a while, repeated attempts to start the killed bricks with `gluster volume start force' failed. The following is from the logs -

Brick logs when the volume is started with force option -

<snip>

+------------------------------------------------------------------------------+
[2015-09-10 23:36:12.837482] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2015-09-10 23:36:14.947021] W [socket.c:642:__socket_rwv] 0-3-test-quota: readv on /var/run/gluster/quotad.socket failed (No data available)
[2015-09-10 23:36:15.980138] I [login.c:81:gf_auth] 0-auth/login: allowed user names: 7b8f0589-7451-4995-8d78-a0da9c702f7a
[2015-09-10 23:36:15.980174] I [MSGID: 115029] [server-handshake.c:610:server_setvolume] 0-3-test-server: accepted client from dhcp37-70.lab.eng.blr.redhat.com-21710-2015/09/11-06:11:12:923821-3-test-client-0-0-0 (version: 3.7.1)
[2015-09-10 23:36:15.982765] I [login.c:81:gf_auth] 0-auth/login: allowed user names: 7b8f0589-7451-4995-8d78-a0da9c702f7a
[2015-09-10 23:36:15.982796] I [MSGID: 115029] [server-handshake.c:610:server_setvolume] 0-3-test-server: accepted client from dhcp37-197.lab.eng.blr.redhat.com-21886-2015/09/11-06:10:46:657748-3-test-client-0-0-0 (version: 3.7.1)
[2015-09-10 23:36:15.982915] I [MSGID: 115029] [server-handshake.c:610:server_setvolume] 0-3-test-server: accepted client from vm10-rhsqa13.lab.eng.blr.redhat.com-14776-2015/09/10-05:36:14:214793-3-test-client-0-0-4 (version: 3.7.1)
[2015-09-10 23:36:16.012835] I [login.c:81:gf_auth] 0-auth/login: allowed user names: 7b8f0589-7451-4995-8d78-a0da9c702f7a
[2015-09-10 23:36:16.012871] I [MSGID: 115029] [server-handshake.c:610:server_setvolume] 0-3-test-server: accepted client from dhcp37-197.lab.eng.blr.redhat.com-21894-2015/09/11-06:10:47:670581-3-test-client-0-0-0 (version: 3.7.1)
[2015-09-10 23:36:16.013150] I [login.c:81:gf_auth] 0-auth/login: allowed user names: 7b8f0589-7451-4995-8d78-a0da9c702f7a
[2015-09-10 23:36:16.013197] I [MSGID: 115029] [server-handshake.c:610:server_setvolume] 0-3-test-server: accepted client from dhcp37-135.lab.eng.blr.redhat.com-20213-2015/09/11-06:10:48:664075-3-test-client-0-0-0 (version: 3.7.1)
[2015-09-10 23:36:16.025388] I [login.c:81:gf_auth] 0-auth/login: allowed user names: 7b8f0589-7451-4995-8d78-a0da9c702f7a
[2015-09-10 23:36:16.025420] I [MSGID: 115029] [server-handshake.c:610:server_setvolume] 0-3-test-server: accepted client from dhcp37-135.lab.eng.blr.redhat.com-20205-2015/09/11-06:10:47:604671-3-test-client-0-0-0 (version: 3.7.1)
[2015-09-10 23:36:16.025539] I [login.c:81:gf_auth] 0-auth/login: allowed user names: 7b8f0589-7451-4995-8d78-a0da9c702f7a
[2015-09-10 23:36:16.025571] I [MSGID: 115029] [server-handshake.c:610:server_setvolume] 0-3-test-server: accepted client from dhcp37-135.lab.eng.blr.redhat.com-20197-2015/09/11-06:10:46:587730-3-test-client-0-0-0 (version: 3.7.1)

</snip>

From glusterd logs -

<snip>

The message "I [MSGID: 106005] [glusterd-handler.c:4899:__glusterd_brick_rpc_notify] 0-management: Brick 10.70.37.126:/rhs/brick2/b1 has disconnected from glusterd." repeated 39 times between [2015-09-11 00:59:50.495698] and [2015-09-11 01:01:47.519805]
The message "I [MSGID: 106005] [glusterd-handler.c:4899:__glusterd_brick_rpc_notify] 0-management: Brick 10.70.37.126:/rhs/brick3/b1 has disconnected from glusterd." repeated 39 times between [2015-09-11 00:59:50.496232] and [2015-09-11 01:01:47.521145]
[2015-09-11 01:01:50.520025] W [socket.c:642:__socket_rwv] 0-management: readv on /var/run/gluster/d688303ff19aece29c724dfbabf0aa3f.socket failed (Invalid argument)
[2015-09-11 01:01:50.520770] I [MSGID: 106005] [glusterd-handler.c:4899:__glusterd_brick_rpc_notify] 0-management: Brick 10.70.37.126:/rhs/brick2/b1 has disconnected from glusterd.
[2015-09-11 01:01:50.521500] W [socket.c:642:__socket_rwv] 0-management: readv on /var/run/gluster/8639fa8939074b2eba37825a7012056c.socket failed (Invalid argument)
[2015-09-11 01:01:50.522167] I [MSGID: 106005] [glusterd-handler.c:4899:__glusterd_brick_rpc_notify] 0-management: Brick 10.70.37.126:/rhs/brick3/b1 has disconnected from glusterd.
[2015-09-11 01:01:53.520813] W [socket.c:642:__socket_rwv] 0-management: readv on /var/run/gluster/d688303ff19aece29c724dfbabf0aa3f.socket failed (Invalid argument)
[2015-09-11 01:01:53.522477] W [socket.c:642:__socket_rwv] 0-management: readv on /var/run/gluster/8639fa8939074b2eba37825a7012056c.socket failed (Invalid argument)
[2015-09-11 01:01:56.521453] W [socket.c:642:__socket_rwv] 0-management: readv on /var/run/gluster/d688303ff19aece29c724dfbabf0aa3f.socket failed (Invalid argument)
[2015-09-11 01:01:56.522860] W [socket.c:642:__socket_rwv] 0-management: readv on /var/run/gluster/8639fa8939074b2eba37825a7012056c.socket failed (Invalid argument)

</snip>

Restarting glusterd also does not help.
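The "Invalid argument" readv errors on the per-brick unix sockets under /var/run/gluster suggest glusterd may be trying to talk to the SIGKILLed bricks over stale socket files. A quick sanity check on the affected node might look like this (the volume name `3-test` is taken from the brick logs above; adjust paths for your setup):

```shell
# Confirm whether any brick (glusterfsd) processes are actually running
pgrep -af glusterfsd || echo "no glusterfsd processes running"

# Compare with what glusterd believes the brick status/PIDs are
gluster volume status 3-test

# Look for leftover per-brick unix socket files; a socket with no
# matching glusterfsd process is a sign of a stale socket left behind
# by the SIGKILLed brick
ls -l /var/run/gluster/*.socket
```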

Version-Release number of selected component (if applicable):
--------------------------------------------------------------
glusterfs-3.7.1-14.el7rhgs.x86_64

How reproducible:
------------------
Seen once; haven't tried to reproduce on another volume.

Steps to Reproduce:
-------------------
1. While I/O is running from a fuse client on a 2x3 volume, kill one brick from each replica set with SIGKILL.
2. After a while, start the volume with the force option - `gluster volume start <vol-name> force'
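As a rough script, the steps above might look like the following. VOL, BRICK, and MNT are placeholders (not taken from this report) for the volume name, a brick path on one server node, and the fuse mount point on the client; step 1 runs on the client and steps 2-4 on the server.

```shell
#!/bin/sh
# Sketch of the reproduction steps; all names below are placeholders.
VOL=testvol
BRICK=/rhs/brick2/b1
MNT=/mnt/glusterfs

# 1. Start I/O on the fuse client (run on the client machine)
dd if=/dev/zero of=$MNT/testfile bs=1M count=10240 &

# 2. On the server, find the PID of one brick per replica set and
#    kill it with SIGKILL (Pid is the last column of volume status)
BRICK_PID=$(gluster volume status $VOL | awk -v b="$BRICK" '$0 ~ b {print $NF}')
kill -9 "$BRICK_PID"

# 3. After a while, try to restart the killed bricks
sleep 60
gluster volume start $VOL force

# 4. Check whether the killed bricks came back online
gluster volume status $VOL
```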

Actual results:
---------------
The bricks that were killed in step 1 do not start, either with `gluster volume start force' or after restarting glusterd.

Expected results:
------------------
Brick processes are expected to start after `gluster volume start force'.
