| Summary: | [RHSC] Bricks status is not getting synched when gluster CLI output shows the port as N/A | | |
|---|---|---|---|
| Product: | Red Hat Gluster Storage | Reporter: | Shruti Sampat <ssampat> |
| Component: | rhsc | Assignee: | Sahina Bose <sabose> |
| Status: | CLOSED ERRATA | QA Contact: | Shruti Sampat <ssampat> |
| Severity: | high | Docs Contact: | |
| Priority: | medium | | |
| Version: | 2.1 | CC: | dpati, dtsang, knarra, mmahoney, pprakash, rhs-bugs, sdharane |
| Target Milestone: | --- | Keywords: | ZStream |
| Target Release: | RHGS 2.1.2 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | cb11 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2014-02-25 08:06:51 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Attachments: | | | |
Created attachment 830549 [details]
engine logs

If the port is returned as N/A for a brick, the brick should be shown as DOWN, according to the gluster team. The engine code has been changed so that an exception is not thrown in such cases.

Verified as fixed in Red Hat Storage Console Version 2.1.2-0.27.beta.el6_5. Brick status remains DOWN when "gluster volume status" returns ports as N/A, and no exception is seen in the engine logs.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-0208.html
Description of problem:
------------------------
When glusterd was stopped and then started on a machine, the gluster CLI command for volume status returned the following output:

```
[root@rhs glusterfs_rpms]# gluster v status
Status of volume: dis_rep_vol
Gluster process                                 Port    Online  Pid
------------------------------------------------------------------------------
Brick 10.70.37.84:/rhs/brick4/b1                N/A     Y       21427
Brick 10.70.37.132:/rhs/brick4/b1               49153   Y       15844
Brick 10.70.37.84:/rhs/brick5/b1                N/A     Y       21438
Brick 10.70.37.132:/rhs/brick5/b1               49154   Y       15856
Brick 10.70.37.64:/rhs/brick5/b1                49154   Y       6428
Brick 10.70.37.176:/rhs/brick5/b1               49154   Y       14884
NFS Server on localhost                         2049    Y       3285
Self-heal Daemon on localhost                   N/A     Y       3293
NFS Server on 10.70.37.176                      2049    Y       5005
Self-heal Daemon on 10.70.37.176                N/A     Y       5012
NFS Server on 10.70.37.132                      2049    Y       30595
Self-heal Daemon on 10.70.37.132                N/A     Y       30605
NFS Server on 10.70.37.64                       2049    Y       22804
Self-heal Daemon on 10.70.37.64                 N/A     Y       22812

Task Status of Volume dis_rep_vol
------------------------------------------------------------------------------
There are no active volume tasks
```

As seen above, the port number for a couple of bricks is N/A. Because of this, the brick status that was set to DOWN when glusterd went down was not set back to UP after glusterd was started.
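The root cause is easy to demonstrate in isolation: Java's `Integer.parseInt` rejects the literal string "N/A" that gluster prints in the Port column. A minimal standalone demonstration (not oVirt engine code):

```java
// Minimal demonstration (not oVirt engine code) of why an "N/A" port
// breaks numeric parsing: Integer.parseInt throws NumberFormatException.
public class PortParseFailure {
    public static void main(String[] args) {
        String portColumn = "N/A"; // as printed for some bricks by "gluster v status"
        try {
            Integer.parseInt(portColumn);
        } catch (NumberFormatException e) {
            // Prints: java.lang.NumberFormatException: For input string: "N/A"
            System.out.println(e);
        }
    }
}
```

Unless such input is caught, the exception propagates out of the parsing code, which is what aborts the brick status refresh below.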
The following is from the engine logs:

```
2013-11-28 20:57:38,270 ERROR [org.ovirt.engine.core.bll.gluster.GlusterSyncJob] (DefaultQuartzScheduler_Worker-67) Error while refreshing brick statuses for volume dis_rep_vol of cluster test: org.ovirt.engine.core.common.errors.VdcBLLException: VdcBLLException: java.lang.NumberFormatException: For input string: "N/A" (Failed with error ENGINE and code 5001)
        at org.ovirt.engine.core.bll.VdsHandler.handleVdsResult(VdsHandler.java:122) [bll.jar:]
        at org.ovirt.engine.core.bll.VDSBrokerFrontendImpl.RunVdsCommand(VDSBrokerFrontendImpl.java:33) [bll.jar:]
        at org.ovirt.engine.core.bll.gluster.GlusterJob.runVdsCommand(GlusterJob.java:64) [bll.jar:]
        at org.ovirt.engine.core.bll.gluster.GlusterSyncJob.getVolumeAdvancedDetails(GlusterSyncJob.java:848) [bll.jar:]
        at org.ovirt.engine.core.bll.gluster.GlusterSyncJob.refreshBrickStatuses(GlusterSyncJob.java:806) [bll.jar:]
        at org.ovirt.engine.core.bll.gluster.GlusterSyncJob.refreshClusterHeavyWeightData(GlusterSyncJob.java:791) [bll.jar:]
        at org.ovirt.engine.core.bll.gluster.GlusterSyncJob.refreshHeavyWeightData(GlusterSyncJob.java:766) [bll.jar:]
        at sun.reflect.GeneratedMethodAccessor64.invoke(Unknown Source) [:1.7.0_45]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.7.0_45]
        at java.lang.reflect.Method.invoke(Method.java:606) [rt.jar:1.7.0_45]
        at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:60) [scheduler.jar:]
        at org.quartz.core.JobRunShell.run(JobRunShell.java:213) [quartz.jar:]
        at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557) [quartz.jar:]
```

Version-Release number of selected component (if applicable):
Red Hat Storage Console Version: 2.1.2-0.25.master.el6_5
glusterfs 3.4.0.44.1u2rhs

How reproducible:
Saw it a couple of times.

Steps to Reproduce:
1. In a cluster of 4 nodes, kill glusterd on one of the nodes; the status of the bricks residing on that node is set to DOWN in the UI.
2. Start glusterd on the node and wait about 5 minutes for the brick status to be synched back to UP.

Actual results:
The brick status is not set to UP, even after more than 10 minutes. The exception pasted above appears in the engine logs.

Expected results:
The brick status should have been set to UP.

Additional info:
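The handling described in the fix comment (treat an N/A port as "no port", and report the brick DOWN instead of throwing) can be sketched as follows. This is a minimal illustration under assumed names: `BrickPortParser`, `parseBrickPort`, and `statusFor` are hypothetical and do not correspond to the actual org.ovirt.engine classes.

```java
// Hypothetical sketch of tolerant port parsing; class and method names
// are illustrative, not the actual oVirt engine API.
public class BrickPortParser {

    public enum BrickStatus { UP, DOWN }

    /** Returns the brick port, or null when gluster reports it as "N/A". */
    public static Integer parseBrickPort(String portField) {
        if (portField == null || portField.trim().equalsIgnoreCase("N/A")) {
            return null; // no port allocated; do not attempt numeric parsing
        }
        return Integer.parseInt(portField.trim());
    }

    /** Per the gluster team: a brick without a port is considered DOWN. */
    public static BrickStatus statusFor(String portField, boolean online) {
        Integer port = parseBrickPort(portField);
        return (port != null && online) ? BrickStatus.UP : BrickStatus.DOWN;
    }

    public static void main(String[] args) {
        System.out.println(statusFor("49153", true)); // UP
        System.out.println(statusFor("N/A", true));   // DOWN
    }
}
```

With this shape of check in place, the sync job can keep refreshing the remaining bricks rather than aborting the whole volume refresh on the first N/A port.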