Bug 1733425 - Setting volume option when one of the glusterd is stopped in the cluster, post glusterd restart seeing couldn't find vol info in glusterd logs and shd, brick process offline
Summary: Setting volume option when one of the glusterd is stopped in the cluster, pos...
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: mainline
Hardware: x86_64
OS: Linux
high
high
Target Milestone: ---
Assignee: Sanju
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1727269
TreeView+ depends on / blocked
 
Reported: 2019-07-26 05:08 UTC by Atin Mukherjee
Modified: 2019-08-26 06:27 UTC (History)
10 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1727269
Environment:
Last Closed: 2019-08-26 06:27:50 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:


Attachments (Terms of Use)


Links
System ID Priority Status Summary Last Updated
Gluster.org Gerrit 23042 None Merged glusterd: stop stale bricks during handshaking in brick mux mode 2019-08-26 06:27:49 UTC

Comment 1 Atin Mukherjee 2019-07-26 05:09:14 UTC
Description of problem:
When glusterd is stopped on one node and performed volume set operation on only two volumes(one is upgraded and another volume is newly created.). Restarted glusterd on the node where it stopped on that node seeing "Couldn't find volinfo" errors and shd and brick process on the two volume where volume option is set are offline.


Version-Release number of selected component (if applicable):
mainline

How reproducible:
1/1

Steps to Reproduce:
1. Create 1000 volumes and start them
2. Performed in-service upgrade on the cluster. 
3. Create 1000 more volumes (1X3 replicate) and start them
4. Stop glusterd on node N1.
5. Perform volume set operation on two volumes from n2 'performance.readdir-ahead on'. One which is being used for upgrade and another newly created volume. 
6. Start glusterd on node N1.

Actual results:
glusterfsd and shd on these two volumes are offline.
seeing "couldn't find vol info" in glusterd logs. 

Expected results:
glusterfsd and shd should be online and should not see any errors with "Couldn't find volinfo" in glusterd log.

Additional info:

glusterd log snippet
#########################################
[2019-07-05 06:49:29.078858] E [glusterd-handshake.c:343:build_volfile_path] 0-management: Couldn't find volinfo
[2019-07-05 06:49:29.078892] E [MSGID: 106619] [glusterd-svc-helper.c:791:__glusterd_send_svc_configure_req] 0-management: Unable to stat  (No such file or directory)
[2019-07-05 06:49:30.079325] E [glusterd-handshake.c:343:build_volfile_path] 0-management: Couldn't find volinfo
[2019-07-05 06:49:31.079862] E [glusterd-handshake.c:343:build_volfile_path] 0-management: Couldn't find volinfo
[2019-07-05 06:49:32.080355] E [glusterd-handshake.c:343:build_volfile_path] 0-management: Couldn't find volinfo
[2019-07-05 06:49:33.080899] E [glusterd-handshake.c:343:build_volfile_path] 0-management: Couldn't find volinfo
[2019-07-05 06:49:34.081374] E [glusterd-handshake.c:343:build_volfile_path] 0-management: Couldn't find volinfo
The message "E [MSGID: 106619] [glusterd-svc-helper.c:791:__glusterd_send_svc_configure_req] 0-management: Unable to stat  (No such file or directory)" repeated 5 times between [2019-07-05 06:49:29.078892] and [2019-07-05 06:49:34.081406]
[2019-07-05 06:49:35.081508] W [MSGID: 106619] [glusterd-svc-helper.c:931:glusterd_attach_svc] 0-glusterd: attach failed for glustershd(volume=ronnie_-999)
[2019-07-05 06:49:35.081583] E [MSGID: 106048] [glusterd-shd-svc.c:477:glusterd_shdsvc_start] 0-glusterd: Failed to attach shd svc(volume=ronnie_-999) to pid=-1
[2019-07-05 06:49:44.928310] E [MSGID: 106558] [glusterd-volgen.c:3289:volgen_graph_build_clients] 0-glusterd: volume inconsistency: brick count is 0
[2019-07-05 06:49:44.928395] E [MSGID: 106068] [glusterd-shd-svc.c:191:glusterd_shdsvc_create_volfile] 0-management: Failed to create volfile
[2019-07-05 06:49:44.941696] I [glusterd-utils.c:6560:glusterd_brick_start] 0-management: discovered already-running brick /bricks/brick9/testvol_-999
[2019-07-05 06:49:44.941768] I [MSGID: 106142] [glusterd-pmap.c:290:pmap_registry_bind] 0-pmap: adding brick /bricks/brick9/testvol_-999 on port 49155
[2019-07-05 06:49:44.941824] I [rpc-clnt.c:1014:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2019-07-05 06:49:44.945117] I [MSGID: 106620] [glusterd-svc-helper.c:900:glusterd_attach_svc] 0-glusterd: adding svc glustershd (volume=testvol_-999) to existing process with pid 590
[2019-07-05 06:49:44.945285] E [glusterd-handshake.c:343:build_volfile_path] 0-management: Couldn't find volinfo
[2019-07-05 06:49:44.945316] E [MSGID: 106619] [glusterd-svc-helper.c:791:__glusterd_send_svc_configure_req] 0-management: Unable to stat  (No such file or directory)
[2019-07-05 06:49:45.018727] I [socket.c:3837:socket_submit_outgoing_msg] 0-socket.management: not connected (priv->connected = -1)
[2019-07-05 06:49:45.018748] E [rpcsvc.c:1583:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x2, Program: GlusterD svc cli, ProgVers: 2, Proc: 27) to rpc-transport (socket.management)
[2019-07-05 06:49:45.018761] E [MSGID: 106430] [glusterd-utils.c:600:glusterd_submit_reply] 0-glusterd: Reply submission failed
[2019-07-05 06:49:45.023858] W [MSGID: 106056] [glusterd-snapshot-utils.c:376:glusterd_snap_volinfo_find_by_volume_id] 0-management: Snap volume not found
[2019-07-05 06:49:45.945716] E [glusterd-handshake.c:343:build_volfile_path] 0-management: Couldn't find volinfo
[2019-07-05 06:49:45.945750] E [MSGID: 106619] [glusterd-svc-helper.c:791:__glusterd_send_svc_configure_req] 0-management: Unable to stat  (No such file or directory)
[2019-07-05 06:49:46.447404] E [glusterd-handshake.c:343:build_volfile_path] 0-management: Couldn't find volinfo
[2019-07-05 06:49:46.946292] E [glusterd-handshake.c:343:build_volfile_path] 0-management: Couldn't find volinfo
[2019-07-05 06:49:47.946837] E [glusterd-handshake.c:343:build_volfile_path] 0-management: Couldn't find volinfo
[2019-07-05 06:49:48.947321] E [glusterd-handshake.c:343:build_volfile_path] 0-management: Couldn't find volinfo
[2019-07-05 06:49:49.947840] E [glusterd-handshake.c:343:build_volfile_path] 0-management: Couldn't find volinfo
[2019-07-05 06:49:50.948339] E [glusterd-handshake.c:343:build_volfile_path] 0-management: Couldn't find volinfo
[2019-07-05 06:49:51.948895] E [glusterd-handshake.c:343:build_volfile_path] 0-management: Couldn't find volinfo
[2019-07-05 06:49:52.949406] E [glusterd-handshake.c:343:build_volfile_path] 0-management: Couldn't find volinfo
[2019-07-05 06:49:53.950006] E [glusterd-handshake.c:343:build_volfile_path] 0-management: Couldn't find volinfo
[2019-07-05 06:49:54.950527] E [glusterd-handshake.c:343:build_volfile_path] 0-management: Couldn't find volinfo
[2019-07-05 06:49:55.951051] E [glusterd-handshake.c:343:build_volfile_path] 0-management: Couldn't find volinfo
[2019-07-05 06:49:56.951573] E [glusterd-handshake.c:343:build_volfile_path] 0-management: Couldn't find volinfo
[2019-07-05 06:49:57.952095] E [glusterd-handshake.c:343:build_volfile_path] 0-management: Couldn't find volinfo
[2019-07-05 06:49:58.952597] E [glusterd-handshake.c:343:build_volfile_path] 0-management: Couldn't find volinfo
The message "E [MSGID: 106619] [glusterd-svc-helper.c:791:__glusterd_send_svc_configure_req] 0-management: Unable to stat  (No such file or directory)" repeated 13 times between [2019-07-05 06:49:45.945750] and [2019-07-05 06:49:58.952676]
[2019-07-05 06:49:59.952756] W [MSGID: 106619] [glusterd-svc-helper.c:931:glusterd_attach_svc] 0-glusterd: attach failed for glustershd(volume=testvol_-999)
[2019-07-05 06:49:59.953137] E [MSGID: 106048] [glusterd-shd-svc.c:477:glusterd_shdsvc_start] 0-glusterd: Failed to attach shd svc(volume=testvol_-999) to pid=590

Comment 2 Worker Ant 2019-07-26 05:14:31 UTC
REVIEW: https://review.gluster.org/23042 (glusterd: stop stale bricks during handshaking in brick mux mode) posted (#3) for review on master by Atin Mukherjee

Comment 3 Worker Ant 2019-08-26 06:27:50 UTC
REVIEW: https://review.gluster.org/23042 (glusterd: stop stale bricks during handshaking in brick mux mode) merged (#6) on master by Atin Mukherjee


Note You need to log in before you can comment on or make changes to this bug.