1653303 – volume delete through heketi throwing "Error: Unable to get snapshot information from volume 1543236374-B-10-4: dial tcp 10.70.35.38:22: getsockopt: no route to host"

Keywords:
Status:	CLOSED INSUFFICIENT_DATA
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	heketi
Sub Component:
Version:	rhgs-3.4
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	John Mulligan
QA Contact:	Prasanth
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2018-11-26 13:52 UTC by Nag Pavan Chilakam
Modified:	2019-04-15 10:06 UTC
CC List:	8 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2019-03-12 20:46:53 UTC
Embargoed:
Dependent Products:

Description Nag Pavan Chilakam 2018-11-26 13:52:58 UTC

Description of problem:
===============
(For now, I am raising this bug more as a question.)
I was deleting volumes through heketi and rebooted a node while the deletes were running.
During this time, heketi reported error messages such as the following:
"Error: Unable to get snapshot information from volume 1543236374-B-10-4: dial tcp 10.70.35.38:22: getsockopt: no route to host"

What does the above error message mean?
Is it something meaningful?
Is it benign or could it be an issue?


Version-Release number of selected component (if applicable):
===========
3.12.2-29

How reproducible:
=============
hit it once

Steps to Reproduce:
1. Created a 6-node cluster.
2. Used heketi to create and delete volumes (from different terminals, without issuing conflicting create/delete operations on the same volume).
3. Rebooted a node during step 2.
Saw the errors below from heketi.

While the "Some of the peers are down" error is understandable, what does the following error from heketi mean?
"Error: Unable to get snapshot information from volume 1543236374-B-10-4: dial tcp 10.70.35.38:22: getsockopt: no route to host"



Volume a519b56095c240f5980d8bb3716fab42 deleted

real    0m29.136s
user    0m0.093s
sys     0m0.045s
Error: Unable to delete volume 1543236374-B-4-1: volume delete: 1543236374-B-4-1: failed: Some of the peers are down

real    0m9.101s
user    0m0.087s
sys     0m0.024s
Error: Unable to get snapshot information from volume 1543236374-B-17-2: dial tcp 10.70.35.38:22: getsockopt: connection refused

real    0m0.095s
user    0m0.080s
sys     0m0.026s
Error: Unable to delete volume 1543236374-B-12-1: volume delete: 1543236374-B-12-1: failed: Some of the peers are down

real    0m9.103s
user    0m0.085s
sys     0m0.028s
Error: Unable to delete volume 1543236374-B-2-3: volume delete: 1543236374-B-2-3: failed: Some of the peers are down

real    0m5.101s
user    0m0.081s
sys     0m0.024s
Error: Unable to get snapshot information from volume 1543236374-B-13-5: dial tcp 10.70.35.38:22: getsockopt: no route to host

real    0m16.120s
user    0m0.086s
sys     0m0.034s
Error: Unable to delete volume 1543236374-B-7-2: volume delete: 1543236374-B-7-2: failed: Some of the peers are down

real    0m6.102s
user    0m0.083s
sys     0m0.029s
Error: Unable to delete volume 1543236374-B-8-1: volume delete: 1543236374-B-8-1: failed: Some of the peers are down

real    0m5.099s
user    0m0.079s
sys     0m0.029s
Error: Unable to delete volume 1543236374-B-7-4: volume delete: 1543236374-B-7-4: failed: Some of the peers are down

real    0m7.103s
user    0m0.089s
sys     0m0.021s
Error: Unable to delete volume 1543236374-B-14-5: volume delete: 1543236374-B-14-5: failed: Some of the peers are down

real    0m5.096s
user    0m0.078s
sys     0m0.027s
Error: Unable to delete volume 1543236374-B-8-2: volume delete: 1543236374-B-8-2: failed: Some of the peers are down

real    0m8.104s
user    0m0.081s
sys     0m0.031s
Error: Unable to delete volume 1543236374-B-6-5: volume delete: 1543236374-B-6-5: failed: Some of the peers are down

real    0m9.105s
user    0m0.086s
sys     0m0.029s
Error: Unable to delete volume 1543236374-B-4-2: volume delete: 1543236374-B-4-2: failed: Some of the peers are down

real    0m10.110s
user    0m0.083s
sys     0m0.036s
Error: Unable to delete volume 1543236374-B-19-3: volume delete: 1543236374-B-19-3: failed: Some of the peers are down

user    0m0.077s
sys     0m0.033s
Error: Unable to get snapshot information from volume 1543236374-B-10-4: dial tcp 10.70.35.38:22: getsockopt: no route to host

real    0m4.098s
user    0m0.081s
sys     0m0.029s
Error: Unable to delete volume 1543236374-B-17-3: volume delete: 1543236374-B-17-3: failed: Some of the peers are down

real    0m7.100s
user    0m0.086s
sys     0m0.022s
Error: Unable to delete volume 1543236374-B-13-4: volume delete: 1543236374-B-13-4: failed: Some of the peers are down

real    0m10.104s
user    0m0.080s
sys     0m0.037s
Volume c857024ab678c4075bdf778f479ebcd3 deleted

real    1m24.213s
user    0m0.115s
sys     0m0.082s
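
The two "dial tcp" variants in the output above carry different socket errors: "connection refused" is the text for ECONNREFUSED and "no route to host" is the text for EHOSTUNREACH. The following is a minimal sketch, not taken from heketi, of a probe that tells the two apart for a TCP port. The function names `explain` and `probe` are hypothetical, and the interpretation in the comments is a likely reading of a rebooting node rather than something confirmed in this bug.

```python
import errno
import socket


def explain(exc: OSError) -> str:
    """Map a connect() failure to a short, hedged interpretation."""
    if exc.errno == errno.ECONNREFUSED:
        # The host answered, but nothing accepted the connection on that port
        # (for example, a service that has not started yet after a reboot).
        return "connection refused: host reachable, port not accepting connections"
    if exc.errno == errno.EHOSTUNREACH:
        # The host could not be reached at the network level
        # (for example, the machine is still down).
        return "no route to host: host not reachable on the network"
    return f"other error: {exc}"


def probe(host: str, port: int = 22, timeout: float = 5.0) -> str:
    """Try one TCP connection and describe the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "port open"
    except socket.timeout:
        return "timed out"
    except OSError as exc:
        return explain(exc)
```

Calling `probe("10.70.35.38")` repeatedly while the node reboots would be one way to see which state the node is in at a given moment. In the output above, the "connection refused" failure returned in about 0.1s while the "no route to host" failures took several seconds, which fits this reading but does not prove it.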

Comment 2 Atin Mukherjee 2018-11-26 14:29:33 UTC

Did you mean to file this bug under the heketi component? From the error message, it seems the volume can't be deleted because a peer is down.

Comment 3 Atin Mukherjee 2018-11-27 14:58:07 UTC

Moving this to heketi since I haven't heard back from Nag yet.

Comment 4 John Mulligan 2018-11-28 22:28:09 UTC

FWIW, the "dial tcp" errors are probably coming from heketi directly, while the "Some of the peers are down" errors are probably generated by glusterd and just getting piped through heketi.
This is almost certainly due to the reboot of the node, but let's sort through the rubble and see what we can come up with to confirm.
To do that, we'll need to start with the heketi logs. In addition, I'd like to have a db dump if possible.
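
As a rough aid for sorting the pasted output along the lines suggested above, here is a small sketch that tags each "Error:" line with its probable origin. The patterns are copied from the messages in the description. The origin labels only restate the guess in this comment and are not confirmed, and `classify` and `tally` are hypothetical helper names.

```python
import re
from collections import Counter

# (pattern, probable origin). The origins are assumptions based on comment 4.
PATTERNS = [
    (
        re.compile(r"dial tcp \S+: getsockopt: (no route to host|connection refused)"),
        "heketi (connection to node failed)",
    ),
    (
        re.compile(r"Some of the peers are down"),
        "glusterd (message passed through heketi)",
    ),
]


def classify(line: str) -> str:
    """Return the probable origin of one error line."""
    for pattern, origin in PATTERNS:
        if pattern.search(line):
            return origin
    return "unclassified"


def tally(lines):
    """Count error lines per probable origin, ignoring timing and success lines."""
    return Counter(classify(line) for line in lines if line.startswith("Error:"))
```

Feeding the command output through `tally` gives a quick count of how many failures fall into each group, which can then be matched against the heketi logs once they are available.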

Comment 5 Nag Pavan Chilakam 2018-12-03 07:13:25 UTC

(In reply to John Mulligan from comment #4)
> FWIW, the "dial tcp" errors are probably coming from heketi directly, while
> the "Some of the peers are down" errors are probably generated by glusterd
> and just getting piped through heketi.
> This is almost certainly due to the reboot of the node, but let's sort
> through the rubble and see what we can come up with to confirm.
> To do that, we'll need to start with the heketi logs. In addition, I'd like
> to have a db dump if possible.

FYI, I will work on this and update the bug, but it may take some time. Until then, please leave the needinfo on me.
