Description of problem:
Syncing of gluster volume info does not work when a host has multiple networks, as described in the user scenario below:
Because of a few issues I had with keepalived, I moved my storage network to its own VLAN, but this seems to have broken part of the oVirt gluster management.
Same scenario:
2 Hosts
1x Engine, VDSM, Gluster
1x VDSM, Gluster
So, to properly split the gluster data and ovirtmgmt traffic, I simply assigned them two host names and two IPs.
172.16.0.1 (ovirtmgmt) hvx.melb.example.net
172.16.1.1 (gluster) gsx.melb.example.net
However, the oVirt engine does not seem to like this: it would not pick up the gluster volume as "running" until I did a restart through the UI.
2013-12-06 13:15:08,940 INFO [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand] (DefaultQuartzScheduler_Worker-75) START, GlusterVolumesListVDSCommand(HostName = HV01, HostId = 91c776e4-8454-4b2a-90b2-8700b6f58d9d), log id: 6efbe3fe
2013-12-06 13:15:08,973 WARN [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturnForXmlRpc] (DefaultQuartzScheduler_Worker-75) Could not find server gs01.melb.example.net in cluster 99408929-82cf-4dc7-a532-9d998063fa95
2013-12-06 13:15:08,976 INFO [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand] (DefaultQuartzScheduler_Worker-75) FINISH, GlusterVolumesListVDSCommand, return: {a285e87a-d191-4b55-98f5-a4e0bcb85517=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@9a3ec542}, log id: 6efbe3fe
2013-12-06 13:15:08,989 ERROR [org.ovirt.engine.core.bll.gluster.GlusterSyncJob] (DefaultQuartzScheduler_Worker-75) Error while updating Volume DATA!: java.lang.NullPointerException
at org.ovirt.engine.core.common.utils.gluster.GlusterCoreUtil.findBrick(GlusterCoreUtil.java:65) [common.jar:]
at org.ovirt.engine.core.common.utils.gluster.GlusterCoreUtil.findBrick(GlusterCoreUtil.java:51) [common.jar:]
at org.ovirt.engine.core.common.utils.gluster.GlusterCoreUtil.containsBrick(GlusterCoreUtil.java:39) [common.jar:]
at org.ovirt.engine.core.bll.gluster.GlusterSyncJob.removeDeletedBricks(GlusterSyncJob.java:518) [bll.jar:]
at org.ovirt.engine.core.bll.gluster.GlusterSyncJob.updateBricks(GlusterSyncJob.java:510) [bll.jar:]
Volume information isn't being pulled because the engine thinks gs01.melb.example.net is not within the cluster, when in fact it is, but registered under hv01.melb.example.net.
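The mismatch can be illustrated with a minimal sketch. It is written in Python rather than the engine's Java, and the dictionary and the function name find_host_id are hypothetical, not the engine's actual data model. It only shows why a lookup keyed on the gluster-side host name comes back empty when the host was registered under its management name:

```python
# Hypothetical illustration; this is not the engine's actual code.
# Hosts as the engine knows them, keyed by the name used at registration.
registered_hosts = {
    # HostId taken from the log line for HV01
    "hv01.melb.example.net": "91c776e4-8454-4b2a-90b2-8700b6f58d9d",
}


def find_host_id(brick_hostname):
    """Return the engine host ID for a brick's host name, or None if unknown."""
    return registered_hosts.get(brick_hostname)


# Gluster reports the brick under the storage-network name,
# which the engine has never seen, so the lookup finds nothing.
host_id = find_host_id("gs01.melb.example.net")
print(host_id)  # None
```

Any caller that uses such a result without checking for the missing case would fail in a way similar to the NullPointerException in the trace above.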
Version-Release number of selected component (if applicable):
3.3
How reproducible:
Always
Steps to Reproduce:
As above
Expected results:
Gluster brick sync should use the gluster host UUID.
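A rough sketch of what UUID-based matching could look like is given here. It is in Python rather than the engine's Java, and the Host and Brick classes, the find_host function, the placeholder UUID and the brick directory are all assumptions made for illustration. The only fact relied on is that gluster identifies each peer by a UUID, which stays the same regardless of which host name or IP address is used to reach the host:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Host:
    name: str          # name the engine registered the host under
    gluster_uuid: str  # gluster peer UUID of that host


@dataclass(frozen=True)
class Brick:
    server_uuid: str   # UUID of the gluster peer serving the brick
    hostname: str      # name gluster reports for the brick's server
    directory: str


def find_host(hosts: List[Host], brick: Brick) -> Optional[Host]:
    """Match a brick to a host by gluster UUID instead of by host name."""
    by_uuid = {h.gluster_uuid: h for h in hosts}
    return by_uuid.get(brick.server_uuid)


# Placeholder UUID and hypothetical brick path, for illustration only.
PEER_UUID = "00000000-0000-0000-0000-000000000001"
hosts = [Host("hv01.melb.example.net", PEER_UUID)]
brick = Brick(PEER_UUID, "gs01.melb.example.net", "/data/brick1")

match = find_host(hosts, brick)
print(match.name if match else "no matching host")
```

With this kind of matching, the brick reported under gs01.melb.example.net still resolves to the host registered as hv01.melb.example.net, because the two names share one peer UUID.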
Additional info:
The issue with gluster sync when a host has multiple interfaces has been fixed.
However, please note that if the engine and gluster operate on the same host with different IP addresses, engine operations such as remove-brick and add-brick will fail, because gluster is not aware of the IP address used for these commands.
The safest way is to work with FQDNs in these cases.