Bug 1038988

Summary: Gluster brick sync does not work when host has multiple interfaces
Product: [Retired] oVirt
Reporter: Sahina Bose <sabose>
Component: ovirt-engine-webadmin
Assignee: Ramesh N <rnachimu>
Status: CLOSED CURRENTRELEASE
QA Contact: bugs <bugs>
Severity: unspecified
Priority: unspecified
Version: 3.3
CC: acathrow, dtsang, ecohen, iheim, kmayilsa, knarra, mgoldboi, mmahoney, pprakash, rnachimu, sdharane, ssampat, yeylon
Target Milestone: ---
Target Release: 3.4.0
Hardware: Unspecified
OS: Unspecified
Whiteboard: gluster
Fixed In Version: ovirt-3.4.0-beta2
Doc Type: Bug Fix
Last Closed: 2014-03-31 12:32:33 UTC
Type: Bug
Bug Blocks: 1024889    

Description Sahina Bose 2013-12-06 09:57:14 UTC
Description of problem:

Syncing of gluster volume info does not work when a host has multiple networks, as described in the user scenario below:

Because of a few issues I had with keepalived, I moved my storage network to its own VLAN, but it seems to have broken part of the oVirt gluster management.

Scenario:
2 hosts

1x Engine, VDSM, Gluster
1x VDSM, Gluster

So to properly split the gluster data and ovirtmgmt traffic, I simply assigned them two host names and two IPs.

172.16.0.1 (ovirtmgmt) hvx.melb.example.net
172.16.1.1 (gluster) gsx.melb.example.net

However, the oVirt engine does not seem to like this: it would not pick up the gluster volume as "running" until I did a restart through the UI.

2013-12-06 13:15:08,940 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand] (DefaultQuartzScheduler_Worker-75) START, GlusterVolumesListVDSCommand(HostName = HV01, HostId = 91c776e4-8454-4b2a-90b2-8700b6f58d9d), log id: 6efbe3fe
2013-12-06 13:15:08,973 WARN  [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturnForXmlRpc] (DefaultQuartzScheduler_Worker-75) Could not find server gs01.melb.example.net in cluster 99408929-82cf-4dc7-a532-9d998063fa95
2013-12-06 13:15:08,976 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand] (DefaultQuartzScheduler_Worker-75) FINISH, GlusterVolumesListVDSCommand, return: {a285e87a-d191-4b55-98f5-a4e0bcb85517=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@9a3ec542}, log id: 6efbe3fe
2013-12-06 13:15:08,989 ERROR [org.ovirt.engine.core.bll.gluster.GlusterSyncJob] (DefaultQuartzScheduler_Worker-75) Error while updating Volume DATA!: java.lang.NullPointerException
        at org.ovirt.engine.core.common.utils.gluster.GlusterCoreUtil.findBrick(GlusterCoreUtil.java:65) [common.jar:]
        at org.ovirt.engine.core.common.utils.gluster.GlusterCoreUtil.findBrick(GlusterCoreUtil.java:51) [common.jar:]
        at org.ovirt.engine.core.common.utils.gluster.GlusterCoreUtil.containsBrick(GlusterCoreUtil.java:39) [common.jar:]
        at org.ovirt.engine.core.bll.gluster.GlusterSyncJob.removeDeletedBricks(GlusterSyncJob.java:518) [bll.jar:]
        at org.ovirt.engine.core.bll.gluster.GlusterSyncJob.updateBricks(GlusterSyncJob.java:510) [bll.jar:]


Volume information isn't being pulled because the engine thinks gs01.melb.example.net is not in the cluster, when in fact it is, but registered under hv01.melb.example.net.
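
A simplified sketch of the failure mode (illustrative only; the classes and method bodies below are invented for this sketch, not the actual engine source) shows how a hostname-keyed brick lookup can dereference a null server name and throw the NullPointerException seen above:

    import java.util.List;
    import java.util.Objects;

    class Brick {
        String serverHostname;  // as reported by gluster, e.g. "gs01.melb.example.net"
        String brickDir;
        Brick(String host, String dir) { serverHostname = host; brickDir = dir; }
    }

    class BrickMatcher {
        // Returns the engine-side name for a gluster server, or null when the
        // engine only knows the server by another name (hv01 vs gs01) --
        // compare "Could not find server gs01.melb.example.net in cluster".
        static String resolveEngineHostname(String glusterHostname) {
            return null;  // lookup by the gluster-reported name fails
        }

        static Brick findBrick(List<Brick> engineBricks, Brick fetched) {
            String engineName = resolveEngineHostname(fetched.serverHostname);
            for (Brick b : engineBricks) {
                // NPE here when engineName is null, mirroring the trace
                // at GlusterCoreUtil.findBrick
                if (engineName.equals(b.serverHostname)
                        && Objects.equals(b.brickDir, fetched.brickDir)) {
                    return b;
                }
            }
            return null;
        }
    }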



Version-Release number of selected component (if applicable):
3.3

How reproducible:
Always

Steps to Reproduce:
As above

Expected results:

Gluster brick sync should identify hosts by the gluster host UUID instead of the hostname/IP.
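
As a minimal sketch of what UUID-based matching could look like (class and field names are assumptions for illustration, not the engine code):

    import java.util.List;
    import java.util.UUID;

    class Brick {
        UUID serverUuid;   // gluster's stable per-server UUID
        String brickDir;
        Brick(UUID uuid, String dir) { serverUuid = uuid; brickDir = dir; }
    }

    class UuidBrickMatcher {
        // Match on the gluster-assigned server UUID instead of whichever
        // hostname/IP the volume happens to be advertised on, so multiple
        // interfaces per host no longer break the lookup.
        static Brick findBrick(List<Brick> engineBricks, UUID serverUuid, String dir) {
            for (Brick b : engineBricks) {
                if (b.serverUuid.equals(serverUuid) && b.brickDir.equals(dir)) {
                    return b;
                }
            }
            return null;
        }
    }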


Additional info:

Comment 1 Sahina Bose 2013-12-06 09:58:38 UTC
Tim, 
can you enhance the vdsm verb for glusterVolumesList to return the host UUID as well?
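
For illustration, the engine side could then read a per-brick host UUID out of the glusterVolumesList response roughly like this (the "hostUuid" field name and the helper below are assumptions, not the actual vdsm/engine API):

    import java.util.Map;
    import java.util.UUID;

    class BrickUuidParser {
        // Hypothetical parsing of one brick entry from the XML-RPC response,
        // assuming vdsm is extended to return a host UUID per brick.
        static UUID parseBrickHostUuid(Map<String, Object> brickMap) {
            Object uuid = brickMap.get("hostUuid");  // assumed new field
            return uuid == null ? null : UUID.fromString((String) uuid);
        }
    }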

Comment 2 Sahina Bose 2013-12-06 10:02:44 UTC
Changing target release to 3.4, as there's a change required in glusterfs as well.

Comment 3 Sahina Bose 2014-02-05 06:08:42 UTC
The issue with gluster sync when a host has multiple interfaces has been fixed.

However, please note that if the engine and gluster refer to the same host by different IP addresses, engine operations like remove-brick and add-brick will fail, as gluster is not aware of the IP address used in these commands.

The safest approach is to use FQDNs in these cases.
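
For example, issuing brick operations with the FQDN that gluster itself knows the peer by (volume name and brick path below are hypothetical):

    gluster volume add-brick data gs02.melb.example.net:/bricks/data/brick1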

Comment 4 Sandro Bonazzola 2014-03-31 12:32:33 UTC
This is an automated message: moving to CLOSED CURRENTRELEASE since oVirt 3.4.0 has been released.