Bug 1129675 - [SNAPSHOT]: Once the snapshot is restored, the gluster volume heal <vol-name> shows "Transport endpoint is not connected" and writes from client are pending for this brick
Summary: [SNAPSHOT]: Once the snapshot is restored, the gluster volume heal <vol-name> ...
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: snapshot
Version: rhgs-3.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Bug Updates Notification Mailing List
QA Contact: Rahul Hinduja
URL:
Whiteboard: SNAPSHOT
Depends On:
Blocks: 1087818
Reported: 2014-08-13 12:57 UTC by Rahul Hinduja
Modified: 2018-04-16 16:03 UTC (History)

Fixed In Version:
Doc Type: Known Issue
Doc Text:
If glusterd is down on one of the nodes in the cluster, or if the node itself is down, performing a snapshot restore operation leads to inconsistency.
Workaround (if any): Perform a snapshot restore only when all the nodes and their corresponding glusterd services are running. If either of the following conditions occurs after a restore, restart the glusterd service using the "service glusterd start" command:
- Executing the "gluster volume heal <vol-name> info" command displays the error message "Transport endpoint not connected".
- An error occurs when clients try to connect to the glusterd service.
Clone Of:
Environment:
Last Closed: 2018-04-16 16:03:58 UTC



Description Rahul Hinduja 2014-08-13 12:57:16 UTC
Description of problem:
========================

A snapshot restore was performed while glusterd was down on one of the nodes in the cluster. The restore succeeds and an entry is recorded in the missed_snaps_list as 2:1. When glusterd is brought back online on that node, the missed entry is processed and updated to 2:2, which means the restore succeeded on this node as well.

But if you then issue the command "gluster volume heal <vol-name> info", it reports the error "Transport endpoint is not connected" for the restored brick (on the node where glusterd was down during the restore), even though gluster volume status shows that the brick is online.

As follows:
===========

[root@inception ~]# gluster volume heal vol1 info
Brick inception.lab.eng.blr.redhat.com:/var/run/gluster/snaps
Number of entries: 0

Brick rhs-arch-srv2.lab.eng.blr.redhat.com:/var/run/gluster/snaps/48bef55ddc1a4266ba49a7873d91c457/brick2/b2
Status: Transport endpoint is not connected

Brick rhs-arch-srv3.lab.eng.blr.redhat.com:/var/run/gluster/snaps/48bef55ddc1a4266ba49a7873d91c457/brick3/b2/
Number of entries: 0

Brick rhs-arch-srv4.lab.eng.blr.redhat.com:/var/run/gluster/snaps/48bef55ddc1a4266ba49a7873d91c457/brick4/b2/
Number of entries: 0

[root@inception ~]# 
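To spot the affected brick in longer output, the heal info listing can be filtered. The following is a minimal sketch, not part of the original report: the function name disconnected_bricks is hypothetical, and it assumes the output layout shown above (a "Brick ..." line followed by either an entry count or a "Status:" line).

```shell
#!/bin/sh
# Hypothetical helper: list bricks that heal info reports as unreachable.
# Reads "gluster volume heal <vol-name> info" output on stdin and prints
# one brick (host:/path) per line.
disconnected_bricks() {
    awk '
        # Remember the most recent brick; "Brick " is 6 characters long.
        /^Brick / { brick = substr($0, 7) }
        # Print that brick when its status line shows the transport error.
        /^Status: Transport endpoint is not connected/ { print brick }
    '
}

# Usage on a cluster node (assumes the gluster CLI is installed):
#   gluster volume heal vol1 info | disconnected_bricks
```

An empty result means no brick in the listing carried the "Transport endpoint is not connected" status.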



Version-Release number of selected component (if applicable):
==============================================================

glusterfs-3.6.0.27-1.el6rhs.x86_64


How reproducible:
=================
always


Steps to Reproduce:
===================
1. Create a 4-node cluster.
2. Create a 2*2 volume.
3. Create a snapshot (snap1) of the volume.
4. Check "gluster volume heal <vol-name> info"; it should be successful.
5. Bring down glusterd on one of the nodes (for example, node2).
6. Take the volume offline using "gluster volume stop <vol-name>".
7. Restore the volume to snap1. The restore should be successful.
8. Start the volume.
9. Start glusterd on node2.
10. Check "gluster volume status <vol-name>"; it should list all the processes as online.
11. Check "gluster volume heal <vol-name> info".
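
Steps 4 to 11 can be sketched as a script. This is an illustration under stated assumptions, not the reporter's exact procedure: it assumes the volume and snapshot from steps 1 to 3 already exist, that the script runs on a node other than the one being taken down, and that glusterd on that node can be controlled over ssh. The names run and reproduce are hypothetical. By default the commands are only printed (DRY_RUN=1).

```shell
#!/bin/sh
# Sketch of steps 4-11. With DRY_RUN=1 (the default) each command is
# printed with a "+ " prefix instead of being executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# reproduce <vol-name> <snap-name> <node-to-take-down>
reproduce() {
    vol="$1"; snap="$2"; node="$3"
    run gluster volume heal "$vol" info            # step 4
    run ssh "$node" service glusterd stop          # step 5 (ssh access is an assumption)
    run gluster --mode=script volume stop "$vol"   # step 6, script mode skips the prompt
    run gluster snapshot restore "$snap"           # step 7
    run gluster volume start "$vol"                # step 8
    run ssh "$node" service glusterd start         # step 9
    run gluster volume status "$vol"               # step 10
    run gluster volume heal "$vol" info            # step 11
}

# Usage: reproduce vol1 snap1 node2
```

Set DRY_RUN=0 only on a disposable test cluster, since the script stops the volume and restores it to the snapshot.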

Actual results:
===============

The command reports "Status: Transport endpoint is not connected" for the brick on the node where glusterd was down during the restore (node2).


Expected results:
=================

Once glusterd is brought online at step 9, "gluster volume heal <vol-name> info" should not report "Transport endpoint is not connected".

Comment 2 Rahul Hinduja 2014-08-13 13:55:04 UTC
Marking this bug urgent because the client also does not connect to the brick on node2. Any writes from the client to this brick are pending.

Comment 7 Shalaka 2014-09-20 09:37:32 UTC
Please review and sign-off edited doc text.

Comment 8 Shalaka 2014-09-26 05:45:25 UTC
Canceling need_info as Rajesh reviewed and signed-off doc text during online review meeting.

