Description of problem:
=======================
Currently, when multiple bricks are offline and snapshot creation is issued, the snapshot fails complaining that only one brick is offline. It should report all of the bricks which are offline.

For example:
============
2 bricks of a node are down:

[root@inception ~]# gluster v status vol1 | grep "inception"
Brick inception.lab.eng.blr.redhat.com:/rhs/brick2/b2    N/A    N    28623
Brick inception.lab.eng.blr.redhat.com:/rhs/brick3/b3    N/A    N    28684
[root@inception ~]#

Snapshot create only complains about 1 brick:

[root@inception ~]# gluster snapshot create RS2 vol1
snapshot create: failed: brick inception.lab.eng.blr.redhat.com:/rhs/brick2/b2 is not started. Please start the stopped brick and then issue snapshot create command or use [force] option in snapshot create to override this behavior.
Snapshot command failed
[root@inception ~]#

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.6.0.24-1.el6rhs.x86_64

How reproducible:
=================
1/1

Steps to Reproduce:
===================
1. Bring 2 brick processes of a node down.
2. Create a snapshot from the same node.

Actual results:
===============
[root@inception ~]# gluster snapshot create RS2 vol1
snapshot create: failed: brick inception.lab.eng.blr.redhat.com:/rhs/brick2/b2 is not started. Please start the stopped brick and then issue snapshot create command or use [force] option in snapshot create to override this behavior.
Snapshot command failed
[root@inception ~]#

Expected results:
=================
The above output gives the illusion that only one brick is offline, whereas in actuality there are 2 bricks offline. The message should be very clear about which processes are offline. A tabular format would be a much better approach.
Can you please update the patch link in the BZ?
Version: glusterfs-3.7.1-2.el6rhs.x86_64
========
Created a snapshot when multiple bricks are offline in the volume. It prints one message for every node where bricks are down. In a 4 node cluster, if some bricks from 3 nodes are down, it prints the message three times, as below:

gluster snapshot create S1 vol0
snapshot create: failed: One or more bricks are not running. Please run volume status command to see brick status. Please start the stopped brick and then issue snapshot create command or use [force] option in snapshot create to override this behavior.
One or more bricks are not running. Please run volume status command to see brick status. Please start the stopped brick and then issue snapshot create command or use [force] option in snapshot create to override this behavior.
One or more bricks are not running. Please run volume status command to see brick status. Please start the stopped brick and then issue snapshot create command or use [force] option in snapshot create to override this behavior.
Snapshot command failed

It should show the message only once irrespective of the number of nodes where bricks are down, or print specific details from each node mentioning which brick is down.

Moving it back to 'Assigned'.
Mainline - http://review.gluster.org/#/c/11234/
3.7 - http://review.gluster.org/#/c/11293/
Downstream - https://code.engineering.redhat.com/gerrit/51039
As per the current framework, the best we can do right now is to display the node information along with the error string. That should bring some structure to the error display on screen. Please file an RFE for the future, so that this issue can be tackled more elegantly.
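For illustration only, here is a minimal C sketch of the approach described above: prefixing each node's pre-validation error with its hostname before concatenating everything into the error string shown to the user. This is not the actual glusterd code; the helper name append_node_err, the buffer size, and the aggregation logic are assumptions made for the example.

/* Sketch: aggregate per-node pre-validation errors with a hostname prefix. */
#include <stdio.h>
#include <string.h>

#define ERR_MAX 4096

/* Hypothetical helper: append one node's error, prefixed with its hostname,
 * to the combined error string returned to the CLI. */
static void
append_node_err (char *op_errstr, const char *hostname, const char *node_err)
{
        char buf[ERR_MAX];

        snprintf (buf, sizeof (buf),
                  "Pre Validation failed on %s. %s\n", hostname, node_err);
        strncat (op_errstr, buf, ERR_MAX - strlen (op_errstr) - 1);
}

int
main (void)
{
        char op_errstr[ERR_MAX] = "";
        const char *brick_err = "One or more bricks are not running. "
                                "Please run volume status command to see "
                                "brick status.";

        /* Two nodes failing pre-validation produce two prefixed entries. */
        append_node_err (op_errstr, "rhs-arch-srv2.lab.eng.blr.redhat.com",
                         brick_err);
        append_node_err (op_errstr, "rhs-arch-srv3.lab.eng.blr.redhat.com",
                         brick_err);

        printf ("snapshot create: failed: %s", op_errstr);
        return 0;
}

With this kind of aggregation, each repeated message carries the originating node's name, which matches the style of output seen in the verification below.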
Version: glusterfs-3.7.1-8.el6rhs.x86_64
========
Killed some bricks in the volume from 3 nodes in the cluster and created a snapshot on the volume. It fails with the below message, with details on which nodes have bricks that are not running:

gluster snapshot create S1 vol0
snapshot create: failed: Pre Validation failed on rhs-arch-srv2.lab.eng.blr.redhat.com. One or more bricks are not running. Please run volume status command to see brick status. Please start the stopped brick and then issue snapshot create command or use [force] option in snapshot create to override this behavior.
Pre Validation failed on rhs-arch-srv3.lab.eng.blr.redhat.com. One or more bricks are not running. Please run volume status command to see brick status. Please start the stopped brick and then issue snapshot create command or use [force] option in snapshot create to override this behavior.
Pre Validation failed on rhs-arch-srv4.lab.eng.blr.redhat.com. One or more bricks are not running. Please run volume status command to see brick status. Please start the stopped brick and then issue snapshot create command or use [force] option in snapshot create to override this behavior.
Snapshot command failed

As per comment 7, marking this bug 'Verified'. Will raise an RFE to handle the failure scenarios more elegantly.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-1495.html