Description of problem:
===============
(For now, I am raising this bug more as a question.)

I was deleting volumes through heketi and rebooted a node. During this time, heketi reported errors such as:

"Error: Unable to get snapshot information from volume 1543236374-B-10-4: dial tcp 10.70.35.38:22: getsockopt: no route to host"

What does this error message mean? Is it meaningful? Is it benign, or could it indicate an issue?

Version-Release number of selected component (if applicable):
===========
3.12.2-29

How reproducible:
=============
Hit it once

Steps to Reproduce:
1. Created a 6-node cluster.
2. Used heketi to create and delete volumes from different terminals (never issuing a create and a delete for the same volume at the same time).
3. Rebooted a node during step 2.

Heketi reported the errors below. The "Some of the peers are down" errors make sense while a peer is down, but what does the following heketi error mean?
"Error: Unable to get snapshot information from volume 1543236374-B-10-4: dial tcp 10.70.35.38:22: getsockopt: no route to host"

Volume a519b56095c240f5980d8bb3716fab42 deleted
real 0m29.136s
user 0m0.093s
sys 0m0.045s

Error: Unable to delete volume 1543236374-B-4-1: volume delete: 1543236374-B-4-1: failed: Some of the peers are down
real 0m9.101s
user 0m0.087s
sys 0m0.024s

Error: Unable to get snapshot information from volume 1543236374-B-17-2: dial tcp 10.70.35.38:22: getsockopt: connection refused
real 0m0.095s
user 0m0.080s
sys 0m0.026s

Error: Unable to delete volume 1543236374-B-12-1: volume delete: 1543236374-B-12-1: failed: Some of the peers are down
real 0m9.103s
user 0m0.085s
sys 0m0.028s

Error: Unable to delete volume 1543236374-B-2-3: volume delete: 1543236374-B-2-3: failed: Some of the peers are down
real 0m5.101s
user 0m0.081s
sys 0m0.024s

Error: Unable to get snapshot information from volume 1543236374-B-13-5: dial tcp 10.70.35.38:22: getsockopt: no route to host
real 0m16.120s
user 0m0.086s
sys 0m0.034s

Error: Unable to delete volume 1543236374-B-7-2: volume delete: 1543236374-B-7-2: failed: Some of the peers are down
real 0m6.102s
user 0m0.083s
sys 0m0.029s

Error: Unable to delete volume 1543236374-B-8-1: volume delete: 1543236374-B-8-1: failed: Some of the peers are down
real 0m5.099s
user 0m0.079s
sys 0m0.029s

Error: Unable to delete volume 1543236374-B-7-4: volume delete: 1543236374-B-7-4: failed: Some of the peers are down
real 0m7.103s
user 0m0.089s
sys 0m0.021s

Error: Unable to delete volume 1543236374-B-14-5: volume delete: 1543236374-B-14-5: failed: Some of the peers are down
real 0m5.096s
user 0m0.078s
sys 0m0.027s

Error: Unable to delete volume 1543236374-B-8-2: volume delete: 1543236374-B-8-2: failed: Some of the peers are down
real 0m8.104s
user 0m0.081s
sys 0m0.031s

Error: Unable to delete volume 1543236374-B-6-5: volume delete: 1543236374-B-6-5: failed: Some of the peers are down
real 0m9.105s
user 0m0.086s
sys 0m0.029s

Error: Unable to delete volume 1543236374-B-4-2: volume delete: 1543236374-B-4-2: failed: Some of the peers are down
real 0m10.110s
user 0m0.083s
sys 0m0.036s

Error: Unable to delete volume 1543236374-B-19-3: volume delete: 1543236374-B-19-3: failed: Some of the peers are down
user 0m0.077s
sys 0m0.033s

Error: Unable to get snapshot information from volume 1543236374-B-10-4: dial tcp 10.70.35.38:22: getsockopt: no route to host
real 0m4.098s
user 0m0.081s
sys 0m0.029s

Error: Unable to delete volume 1543236374-B-17-3: volume delete: 1543236374-B-17-3: failed: Some of the peers are down
real 0m7.100s
user 0m0.086s
sys 0m0.022s

Error: Unable to delete volume 1543236374-B-13-4: volume delete: 1543236374-B-13-4: failed: Some of the peers are down
real 0m10.104s
user 0m0.080s
sys 0m0.037s

Volume c857024ab678c4075bdf778f479ebcd3 deleted
real 1m24.213s
user 0m0.115s
sys 0m0.082s
Did you mean to file this bug under the heketi component? From the error messages, it does seem that the volumes can't be deleted because a peer is down.
Moving this to heketi since I haven't heard back from Nag yet.
FWIW, the "dial tcp" errors are probably coming from heketi directly, while the "Some of the peers are down" errors are probably generated by glusterd and just getting piped through heketi. This is almost certainly due to the node reboot, but let's sort through the rubble and see what we can come up with to confirm it. To do that, we'll need to start with the heketi logs. In addition, I'd like to have a db dump if possible.
(In reply to John Mulligan from comment #4)
> FWIW the "dial tcp" stuff is probably coming from heketi directly while the
> "Some of the peers are down" are probably generated by glusterd and just
> getting piped through heketi.
> This is almost certainly due to the reboot of the node but let's sort
> through the rubble and see what we can come up with to confirm.
> To do that we'll need to start with the heketi logs. In addition, I'd like
> to have a db dump if possible.

FYI, I will work on this and post an update, but it may take some time. Until then, please leave the needinfo on me.