Bug 1129675 - [SNAPSHOT]: Once the snapshot is restored, gluster volume heal <vol-name> shows "Transport endpoint is not connected" and writes from the client are pending for this brick
Status: CLOSED WONTFIX
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: snapshot
Version: 3.0
Hardware: x86_64 Linux
Priority: unspecified
Severity: urgent
Assigned To: Bug Updates Notification Mailing List
QA Contact: Rahul Hinduja
Whiteboard: SNAPSHOT
Keywords: Triaged, ZStream
Depends On:
Blocks: 1087818
Reported: 2014-08-13 08:57 EDT by Rahul Hinduja
Modified: 2018-04-16 12:03 EDT

See Also:
Fixed In Version:
Doc Type: Known Issue
Doc Text:
If glusterd is down on one of the nodes in the cluster, or if the node itself is down, performing a snapshot restore operation leads to inconsistency.
Workaround: Perform a snapshot restore only if all the nodes and their corresponding glusterd services are running. If either of the following conditions occurs after the restore, restart the glusterd service using the "service glusterd start" command:
- Executing the "gluster volume heal <vol-name> info" command displays the error message "Transport endpoint is not connected".
- An error occurs when clients try to connect to the glusterd service.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-04-16 12:03:58 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Description Rahul Hinduja 2014-08-13 08:57:16 EDT
Description of problem:
========================

Consider a scenario where a snapshot restore is performed while glusterd is down on one of the nodes in the cluster. The restore succeeds, and an entry is recorded in the missed_snaps_list with the value 2:1. When glusterd on that node is brought back online, the missed restore is carried out and its entry is updated to 2:2, which means the restore succeeded on this node as well.

But if you then run "gluster volume heal <vol-name> info", it reports the error "Transport endpoint is not connected" for the restored brick (the one on the node where glusterd was down during the restore), even though "gluster volume status" shows that the brick is online.

As follows:
===========

[root@inception ~]# gluster volume heal vol1 info
Brick inception.lab.eng.blr.redhat.com:/var/run/gluster/snaps
Number of entries: 0

Brick rhs-arch-srv2.lab.eng.blr.redhat.com:/var/run/gluster/snaps/48bef55ddc1a4266ba49a7873d91c457/brick2/b2
Status: Transport endpoint is not connected

Brick rhs-arch-srv3.lab.eng.blr.redhat.com:/var/run/gluster/snaps/48bef55ddc1a4266ba49a7873d91c457/brick3/b2/
Number of entries: 0

Brick rhs-arch-srv4.lab.eng.blr.redhat.com:/var/run/gluster/snaps/48bef55ddc1a4266ba49a7873d91c457/brick4/b2/
Number of entries: 0

[root@inception ~]# 
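
A minimal sketch for picking the affected bricks out of output like the above. It assumes the heal-info layout shown in this report (a "Brick ..." line followed by a "Status: ..." line); the function name disconnected_bricks is hypothetical and is not part of the gluster CLI.

```shell
# Print every brick whose heal-info status is
# "Transport endpoint is not connected".
# Reads the output of "gluster volume heal <vol-name> info" on stdin.
disconnected_bricks() {
    awk '
        # Remember the most recent brick; "Brick " is 6 characters long.
        /^Brick / { brick = substr($0, 7) }
        # Report the remembered brick when its status line shows the error.
        /^Status: Transport endpoint is not connected/ { print brick }
    '
}
```

Possible usage: gluster volume heal vol1 info | disconnected_bricks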



Version-Release number of selected component (if applicable):
==============================================================

glusterfs-3.6.0.27-1.el6rhs.x86_64


How reproducible:
=================
always


Steps to Reproduce:
===================
1. Create 4 node cluster
2. Create a 2*2 volume
3. Create a snapshot (snap1) of the volume
4. Check "gluster volume heal <vol-name> info"; it should be successful
5. Bring down glusterd on one of the nodes (for example, node2)
6. Take the volume offline using "gluster volume stop <vol-name>"
7. Restore the volume to snap1. Restore should be successful
8. Start the volume
9. Start glusterd on node2
10. Check "gluster volume status <vol-name>"; it should list all the processes as online.
11. Check "gluster volume heal <vol-name> info"
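
Steps 3 to 11 above could be scripted roughly as follows. This is only a sketch under assumptions that are not in the report: the volume name, snapshot name, and node2 host name are passed in by the caller, node2 is reachable over ssh, and the cluster and the 2*2 volume from steps 1 and 2 already exist. The helper names run and reproduce are hypothetical. With DRY_RUN=1 (the default) the commands are only printed. Newer gluster releases may append a timestamp to the snapshot name, in which case the restore step would need the full name.

```shell
# Execute a command, or just print it when DRY_RUN is 1 (the default).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# reproduce <vol-name> <snap-name> <node2-host>
# Arguments are assumptions supplied by the caller, not values from the report.
reproduce() {
    vol=$1
    snap=$2
    node2=$3
    run gluster snapshot create "$snap" "$vol"           # step 3
    run gluster volume heal "$vol" info                  # step 4
    run ssh "$node2" service glusterd stop               # step 5
    # --mode=script suppresses the interactive confirmation prompts.
    run gluster --mode=script volume stop "$vol"         # step 6
    run gluster --mode=script snapshot restore "$snap"   # step 7
    run gluster volume start "$vol"                      # step 8
    run ssh "$node2" service glusterd start              # step 9
    run gluster volume status "$vol"                     # step 10
    run gluster volume heal "$vol" info                  # step 11
}
```

Possible usage: "reproduce vol1 snap1 node2" prints the command sequence; setting DRY_RUN=0 would execute it against a real cluster.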

Actual results:
===============

The command reports "Status: Transport endpoint is not connected" for the brick on the node where glusterd was down during the restore (node2).


Expected results:
=================

Once glusterd is brought online at step 9, "gluster volume heal <vol-name> info" should not fail with "Transport endpoint is not connected".
Comment 2 Rahul Hinduja 2014-08-13 09:55:04 EDT
Marking this bug as urgent because the client also does not connect to the brick on node2. Any writes from the client to this brick remain pending.
Comment 7 Shalaka 2014-09-20 05:37:32 EDT
Please review and sign-off edited doc text.
Comment 8 Shalaka 2014-09-26 01:45:25 EDT
Canceling need_info as Rajesh reviewed and signed-off doc text during online review meeting.
