Bug 958797 - Gluster: after killing a brick process, the brick status still shows the brick as "UP" for 5min
Summary: Gluster: after killing a brick process, the brick status still shows the brick as "UP" for 5min
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: Frontend.WebAdmin
Version: ---
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ovirt-4.2.2
Target Release: ---
Assignee: Sahina Bose
QA Contact: SATHEESARAN
URL:
Whiteboard:
Depends On: 1379309
Blocks: 958803
 
Reported: 2013-05-02 12:36 UTC by Prasanth
Modified: 2018-05-10 06:23 UTC
CC List: 15 users

Fixed In Version: no gerrit references
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 958803
Environment:
Last Closed: 2018-05-10 06:23:12 UTC
oVirt Team: Gluster
Embargoed:
rule-engine: ovirt-4.2+
rule-engine: planning_ack+
rule-engine: devel_ack+
sasundar: testing_ack+


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 66758 0 master MERGED engine: Handle geo-rep and brick event 2017-12-27 10:56:45 UTC

Description Prasanth 2013-05-02 12:36:35 UTC
Description of problem:

After killing a brick process, the brick status still shows the brick as "UP"


Version-Release number of selected component (if applicable):  Red Hat Enterprise Virtualization Manager Version: 3.2.0-10.20.master.el6ev 


How reproducible: 100%


Steps to Reproduce:
1. Create a "Gluster Cluster"
2. Add one or more hosts
3. Create a volume and start it
4. Find the PID of the bricks using "gluster volume status <VOLNAME>" and kill one or more bricks using # kill -9 <PID>

Once done, check the brick status from the UI
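
For reference, a minimal command sequence for this reproduction ("#" is the root shell prompt; the volume name is a placeholder):

# gluster volume status <VOLNAME>     (note the PID listed for each brick)
# kill -9 <PID>                       (kill one of the brick processes)
# gluster volume status <VOLNAME>     (gluster itself now reports the killed brick with Online = N)

The web admin UI, however, keeps showing the brick as "UP" until the next periodic sync.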
  
Actual results: The brick status is not updated immediately.


Expected results: The brick status in the UI should be updated immediately, changing to "DOWN" when the brick goes down and back to "UP" when it comes up.


Additional info: It's easily reproducible. However, if you still need the logs, please let me know.

Comment 1 Shireesh 2013-05-13 10:34:51 UTC
Brick status is updated every 5 minutes. Have you checked again after waiting for 5 minutes?
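
For anyone who wants a shorter window while testing, the 5-minute cycle is the engine's periodic Gluster sync. Assuming the interval is exposed through engine-config (GlusterRefreshRateHeavy is the likely key for the heavy per-volume sync, but treat the name as an assumption and check the key list on your setup), it could be lowered roughly like this:

# engine-config -l | grep -i gluster          (list the Gluster-related config keys)
# engine-config -g GlusterRefreshRateHeavy    (show the current interval, in seconds)
# engine-config -s GlusterRefreshRateHeavy=60
# service ovirt-engine restart                (the new value takes effect only after an engine restart)

This is only a sketch for test setups; a failed brick is still reported only on the next poll.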

Comment 2 Prasanth 2013-05-16 08:08:59 UTC
(In reply to comment #1)
> Brick status is updated every 5 minutes. Have you checked waiting for 5
> minutes?

Yes, I know it gets updated every 5 minutes, and I've tested and confirmed that. But as I already mentioned in the "Actual Results", the status of critical processes like this should be updated, and the admin notified, without much delay. A brick being down can affect part of the volume or its entire functioning, depending on the volume type. This can even affect production if all the bricks in a volume go down and no alerts or notifications are sent to the admin for 5 minutes. I know this is a resource-consuming task, but we should find a suitable solution to mitigate this issue.

As a minimum requirement, we should at least provide a manual refresh button for all the sub-tabs, especially for bricks. On refresh, this should fetch the latest status from all the servers and update the fields accordingly.

Comment 4 Itamar Heim 2013-12-08 07:43:10 UTC
still planned for 3.4?

Comment 5 Sahina Bose 2013-12-09 08:52:59 UTC
Yes - a manual refresh option to be provided for 3.4

Comment 8 Sahina Bose 2014-03-26 10:43:28 UTC
The patch to introduce the sync button did not make it into 3.4, so retargeting to 3.5.

Comment 9 Eyal Edri 2015-02-25 08:39:49 UTC
3.5.1 is already full of bugs (over 80), and since none of these bugs was marked as urgent for the 3.5.1 release in the tracker bug, moving to 3.5.2.

Comment 10 Eyal Edri 2015-04-28 11:22:01 UTC
Moving to 3.5.4 due to capacity planning for 3.5.3.
If you believe this should remain in 3.5.3, please sync with PM/dev/QE and get a full triple ack for it. Also, ensure the priority is set accordingly.

Comment 11 Sandro Bonazzola 2015-10-26 12:42:55 UTC
This is an automated message. oVirt 3.6.0 RC3 has been released and GA is targeted for next week, Nov 4th 2015.
Please review this bug and, if it is not a blocker, please postpone it to a later release.
All bugs not postponed by the GA release will be automatically re-targeted to:

- 3.6.1 if severity >= high
- 4.0 if severity < high

Comment 12 Sahina Bose 2015-10-30 12:02:26 UTC
Retargeting to coincide with alerting mechanism from gluster

Comment 13 Yaniv Lavi 2015-11-01 12:30:47 UTC
(In reply to Sahina Bose from comment #12)
> Retargeting to coincide with alerting mechanism from gluster

Please use target milestone and flags to change to 4.0 from now on.

Comment 14 Red Hat Bugzilla Rules Engine 2015-11-30 20:41:55 UTC
Target release should be set once a package build is known to fix an issue. Since this bug is not in the MODIFIED state, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.

Comment 15 Sandro Bonazzola 2016-05-02 10:00:28 UTC
Moving from 4.0 alpha to 4.0 beta since 4.0 alpha has already been released and the bug is not ON_QA.

Comment 16 Sahina Bose 2018-01-31 07:07:39 UTC
Moving to MODIFIED as the dependent RFE is now merged.

Comment 17 Sandro Bonazzola 2018-02-22 11:14:51 UTC
If there has been no code change since Jan 31, this should probably be ON_QA. Can you check?

Comment 18 Sahina Bose 2018-02-23 05:44:30 UTC
Yes, moved to ON_QA

Comment 19 Sandro Bonazzola 2018-04-05 11:50:06 UTC
Sahina, on which version of ovirt-engine has this been fixed?

Comment 20 Sahina Bose 2018-04-23 09:18:29 UTC
ovirt-engine-4.2.2.2 - this bug was dependent on bug 1379309

Comment 21 SATHEESARAN 2018-05-10 02:11:29 UTC
Tested with RHV 4.2.3 and gluster 3.12

RHV makes use of the eventing mechanism, and the RHV UI now syncs much faster.

One problem observed is that RHV still also uses the CLI-based polling method, and that takes some time to respond (which is expected). Sometimes the brick kill event is detected only via the CLI-based polling. I will track this issue in a separate bug.
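
As a rough sketch of how the eventing side can be verified on the gluster nodes (the webhook URL below is a placeholder for the engine's actual events endpoint, not a confirmed path):

# gluster-eventsapi status                                        (glustereventsd should be shown as running on all nodes)
# gluster-eventsapi webhook-add http://<engine-events-endpoint>   (register the engine webhook if it is not already listed)
# gluster-eventsapi webhook-test http://<engine-events-endpoint>  (send a test event to confirm the nodes can reach the engine)

If events are not delivered, status changes are picked up only by the slower CLI-based polling mentioned above.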

Comment 22 Sandro Bonazzola 2018-05-10 06:23:12 UTC
This bugzilla is included in oVirt 4.2.2 release, published on March 28th 2018.

Since the problem described in this bug report should be
resolved in oVirt 4.2.2 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

