Bug 1036039 - [RHSC] Bricks status is not getting synched when gluster CLI output shows the port as N/A
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: rhsc
Version: 2.1
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: RHGS 2.1.2
Assignee: Sahina Bose
QA Contact: Shruti Sampat
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-11-29 09:08 UTC by Shruti Sampat
Modified: 2015-05-13 16:27 UTC
CC: 7 users

Fixed In Version: cb11
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-02-25 08:06:51 UTC
Target Upstream Version:


Attachments (Terms of Use)
engine logs (14.26 MB, text/x-log)
2013-11-29 09:10 UTC, Shruti Sampat


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2014:0208 0 normal SHIPPED_LIVE Red Hat Storage 2.1 enhancement and bug fix update #2 2014-02-25 12:20:30 UTC
oVirt gerrit 21908 0 None None None Never

Description Shruti Sampat 2013-11-29 09:08:50 UTC
Description of problem:
------------------------

When glusterd was stopped and then started on a machine, the gluster CLI command for volume status returned the following output - 

[root@rhs glusterfs_rpms]# gluster v status                                                                                                                                                                    
Status of volume: dis_rep_vol
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick 10.70.37.84:/rhs/brick4/b1                        N/A     Y       21427
Brick 10.70.37.132:/rhs/brick4/b1                       49153   Y       15844
Brick 10.70.37.84:/rhs/brick5/b1                        N/A     Y       21438
Brick 10.70.37.132:/rhs/brick5/b1                       49154   Y       15856
Brick 10.70.37.64:/rhs/brick5/b1                        49154   Y       6428
Brick 10.70.37.176:/rhs/brick5/b1                       49154   Y       14884
NFS Server on localhost                                 2049    Y       3285
Self-heal Daemon on localhost                           N/A     Y       3293
NFS Server on 10.70.37.176                              2049    Y       5005
Self-heal Daemon on 10.70.37.176                        N/A     Y       5012
NFS Server on 10.70.37.132                              2049    Y       30595
Self-heal Daemon on 10.70.37.132                        N/A     Y       30605
NFS Server on 10.70.37.64                               2049    Y       22804
Self-heal Daemon on 10.70.37.64                         N/A     Y       22812
 
Task Status of Volume dis_rep_vol
------------------------------------------------------------------------------
There are no active volume tasks


The port number for a couple of bricks, as seen above, is N/A. Because of this, the brick status that had been set to DOWN when glusterd went down was not set back to UP after glusterd was restarted. The following is from the engine logs -

2013-11-28 20:57:38,270 ERROR [org.ovirt.engine.core.bll.gluster.GlusterSyncJob] (DefaultQuartzScheduler_Worker-67) Error while refreshing brick statuses for volume dis_rep_vol of cluster test: org.ovirt.eng
ine.core.common.errors.VdcBLLException: VdcBLLException: java.lang.NumberFormatException: For input string: "N/A" (Failed with error ENGINE and code 5001)
        at org.ovirt.engine.core.bll.VdsHandler.handleVdsResult(VdsHandler.java:122) [bll.jar:]
        at org.ovirt.engine.core.bll.VDSBrokerFrontendImpl.RunVdsCommand(VDSBrokerFrontendImpl.java:33) [bll.jar:]
        at org.ovirt.engine.core.bll.gluster.GlusterJob.runVdsCommand(GlusterJob.java:64) [bll.jar:]
        at org.ovirt.engine.core.bll.gluster.GlusterSyncJob.getVolumeAdvancedDetails(GlusterSyncJob.java:848) [bll.jar:]
        at org.ovirt.engine.core.bll.gluster.GlusterSyncJob.refreshBrickStatuses(GlusterSyncJob.java:806) [bll.jar:]
        at org.ovirt.engine.core.bll.gluster.GlusterSyncJob.refreshClusterHeavyWeightData(GlusterSyncJob.java:791) [bll.jar:]
        at org.ovirt.engine.core.bll.gluster.GlusterSyncJob.refreshHeavyWeightData(GlusterSyncJob.java:766) [bll.jar:]
        at sun.reflect.GeneratedMethodAccessor64.invoke(Unknown Source) [:1.7.0_45]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.7.0_45]
        at java.lang.reflect.Method.invoke(Method.java:606) [rt.jar:1.7.0_45]
        at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:60) [scheduler.jar:]
        at org.quartz.core.JobRunShell.run(JobRunShell.java:213) [quartz.jar:]
        at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557) [quartz.jar:]
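The stack trace above comes from handing the literal string "N/A" from the CLI output to an integer parser. A minimal standalone reproduction (not the actual engine code) of the failure:

```java
public class Repro {
    public static void main(String[] args) {
        try {
            // Same failure mode the sync job hit when the port column read "N/A"
            Integer.parseInt("N/A");
        } catch (NumberFormatException e) {
            System.out.println("NumberFormatException: " + e.getMessage());
        }
    }
}
```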


Version-Release number of selected component (if applicable):
Red Hat Storage Console Version: 2.1.2-0.25.master.el6_5 
glusterfs 3.4.0.44.1u2rhs

How reproducible:
Saw it a couple of times.

Steps to Reproduce:
1. In a cluster of 4 nodes, kill glusterd on one of the nodes, and observe that the status of the bricks residing on that node is set to DOWN in the UI.
2. Start glusterd on the node and wait 5 minutes for the brick status to be synced correctly, as UP.

Actual results:
The brick status is not set to UP, even after more than 10 minutes.
The exception pasted above appears in the engine logs.

Expected results:
The brick status should have been set to UP.

Additional info:

Comment 1 Shruti Sampat 2013-11-29 09:10:58 UTC
Created attachment 830549 [details]
engine logs

Comment 3 Sahina Bose 2013-12-02 09:36:46 UTC
If the port is returned as N/A for a brick, the brick should be shown as DOWN - according to the gluster team.

The engine code has been changed to handle this case so that an exception is not thrown.
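The handling described in this comment could be sketched as follows. This is an illustrative sketch only; the class and method names (BrickPortParser, parsePort, isBrickUp) are hypothetical, not the actual oVirt engine code:

```java
// Hypothetical sketch of tolerant port parsing; names are illustrative,
// not the real oVirt engine classes.
public class BrickPortParser {
    // Gluster reports the port as "N/A" when the brick process has no
    // port assigned; treat that as "no port" instead of throwing.
    public static int parsePort(String port) {
        try {
            return Integer.parseInt(port.trim());
        } catch (NumberFormatException e) {
            return 0; // 0 = unknown/unassigned port
        }
    }

    // Per this comment: a brick whose port is N/A is shown as DOWN,
    // even if the CLI's Online column reads Y.
    public static boolean isBrickUp(String port, boolean onlineFlag) {
        return onlineFlag && parsePort(port) > 0;
    }

    public static void main(String[] args) {
        System.out.println(parsePort("49153"));      // 49153
        System.out.println(parsePort("N/A"));        // 0
        System.out.println(isBrickUp("N/A", true));  // false
    }
}
```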

Comment 4 Shruti Sampat 2013-12-17 07:04:18 UTC
Verified as fixed in Red Hat Storage Console Version: 2.1.2-0.27.beta.el6_5. Brick status remains DOWN when "gluster volume status" returns ports as N/A. No exception seen in the engine logs.

Comment 6 errata-xmlrpc 2014-02-25 08:06:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-0208.html

