Bug 1300095

Summary: Attaching gluster network does not initiate peer probe against gluster dedicated interfaces
Product: [oVirt] ovirt-engine
Reporter: Ivan Bulatovic <combuster>
Component: BLL.Gluster
Assignee: Sahina Bose <sabose>
Status: CLOSED CURRENTRELEASE
QA Contact: SATHEESARAN <sasundar>
Severity: medium
Docs Contact:
Priority: high
Version: 3.6.1.3
CC: blackfede, bmcclain, bugs, combuster, michal.skrivanek, rhev-integ, sabose
Target Milestone: ovirt-4.0.4
Flags:
  rule-engine: ovirt-4.0.z+
  rule-engine: planning_ack+
  sabose: devel_ack+
  sasundar: testing_ack+
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: ovirt-4.0.1
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-09-26 12:38:16 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Gluster
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments (description, flags):
  engine_log (flags: none)
  vdsm + engine log (flags: none)
  snip engine.log while attaching the gluster network to the host (flags: none)
  Another snip of engine.log while attaching gluster network to the host (flags: none)

Description Ivan Bulatovic 2016-01-20 00:10:03 UTC
Description of problem:

After tagging a network for gluster usage and attaching it to the desired network interface on the nodes in the cluster, I am unable to create a volume by adding bricks. The error shown in the webadmin is:

Error while executing action Create Gluster Volume: Volume create failed
error: Host ip_address is not in 'Peer in Cluster' state
return code: 30800

The IP address in the error is the one configured on the gluster network interface of one of the nodes.

What puzzles me the most is that I can't find any trace of this in the engine.log or in vdsm.log on the hosts, except for the following entry in the oVirt engine log:

2016-01-20 00:27:30,908 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-39) [56b09c31] Correlation ID: 56b09c31, Job ID: cf0134de-ba05-406f-ac7c-c9b6410793eb, Call Stack: null, Custom Event ID: -1, Message: Creation of Gluster Volume rep10 failed on cluster DC1C.

Version-Release number of selected component (if applicable):

ovirt-engine-3.6.1.3-1.el7.centos.noarch
glusterfs-server-3.7.6-1.el7.x86_64

How reproducible:

Always: on a clean install and on every attempt after cleaning all related configuration, unless the peers are probed manually.

Steps to Reproduce:
1. Create a network and select it as a gluster network
2. Attach the network to the desired interfaces and provide an IP address
3. Create a replica volume from two bricks

Additional info:

I know that replica 2 is not supported, but this should not have any effect on the issue. Gluster peer status only shows the hostname that resolves to the IP address of the ovirtmgmt interface on the nodes. I have 4 NICs: two are bonded and attached to the ovirtmgmt network, and the other two are bonded and attached to the gluster network.
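
For anyone checking the same symptom, the sketch below shows one way to test whether a given address is already known to gluster as a peer in the 'Peer in Cluster' state. It parses the text printed by `gluster peer status`. The sample text, hostname and UUID are illustrative and not taken from this bug's logs, the helper names `parse_peer_status` and `is_peer_in_cluster` are made up for this example, and the parser assumes the usual `Hostname:` / `Uuid:` / `State:` / `Other names:` layout of that command's output.

```python
def parse_peer_status(text):
    """Parse `gluster peer status` text into a list of peer dicts.

    Assumes the usual layout: one 'Hostname:' / 'Uuid:' / 'State:' group per
    peer, optionally followed by 'Other names:' and one address per line.
    """
    peers = []
    current = None
    in_other_names = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            in_other_names = False
            continue
        if line.startswith("Hostname:"):
            current = {"names": [line.split(":", 1)[1].strip()], "state": None}
            peers.append(current)
            in_other_names = False
        elif current is None:
            # Header lines such as 'Number of Peers: N' come before any peer.
            continue
        elif line.startswith("Uuid:"):
            in_other_names = False
        elif line.startswith("State:"):
            current["state"] = line.split(":", 1)[1].strip()
            in_other_names = False
        elif line.startswith("Other names:"):
            in_other_names = True
        elif in_other_names:
            # Alternate addresses that were probed for the same peer.
            current["names"].append(line)
    return peers


def is_peer_in_cluster(peers, address):
    """Return True if `address` is a known name of a peer in 'Peer in Cluster' state."""
    return any(
        address in peer["names"]
        and (peer["state"] or "").startswith("Peer in Cluster")
        for peer in peers
    )


# Illustrative sample only: the peer is known by its management FQDN,
# and the gluster-network IP has never been probed.
SAMPLE = """Number of Peers: 1

Hostname: ovirttest2.example.com
Uuid: 00000000-0000-0000-0000-000000000002
State: Peer in Cluster (Connected)
"""

peers = parse_peer_status(SAMPLE)
print(is_peer_in_cluster(peers, "ovirttest2.example.com"))
print(is_peer_in_cluster(peers, "198.51.100.12"))
```

With output like the sample, the gluster-network address is not listed for any peer, which matches the "not in 'Peer in Cluster' state" error above.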

Comment 1 Sahina Bose 2016-02-10 14:09:58 UTC
Could you provide the following information?
1. Which cluster version in oVirt have you added the gluster servers to?
2. Could you attach engine logs from the time you attached the gluster network to the interface?

Peer probing of the alternate interface should happen as part of the periodic sync, if the host status is UP. I need to see the engine log to know what's going on.

Regarding the error message "Host ip_address is not in 'Peer in Cluster' state": this message is returned directly from the gluster CLI and forwarded to the caller (UI).

Comment 2 Ivan Bulatovic 2016-02-10 22:46:19 UTC
Created attachment 1123009 [details]
engine_log

Hi Sahina,

I tried again yesterday with 3.6-snapshot. The issue is still there. Since this was a clean install, the compatibility version is 3.6, and the cluster node type is virt and gluster. The gluster network shows a Gluster role and is attached, but it is not set as required.

I'm providing the log covering the timespan [attach gluster network -> create a volume].

Comment 3 Sahina Bose 2016-02-22 09:58:22 UTC
Thanks. From the logs I see that the vglust network is associated with bond1, which has a static IP. However, I see no logs indicating that this IP has been probed as an alternate address in gluster.
A peer probe happens if an interface is associated with a logical network with the role "gluster" and the interface has an IP address that has not already been probed.
This looks like a bug; I will try to reproduce it.
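
The probe condition described above can be sketched roughly as follows. This is not the engine code (the engine is written in Java); it is a minimal Python illustration in which the `HostInterface` class, the `addresses_to_probe` function, and the interface names and addresses are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class HostInterface:
    name: str
    network_role: Optional[str]  # role of the attached logical network, if any
    address: Optional[str]       # IP configured on the interface, if any


def addresses_to_probe(interfaces: Iterable[HostInterface],
                       already_probed: Set[str]) -> List[str]:
    """Return the addresses that would qualify for a peer probe.

    An address qualifies when its interface is attached to a network with
    the "gluster" role and the address is not already known to gluster.
    """
    result = []
    for nic in interfaces:
        if nic.network_role != "gluster":
            continue
        if not nic.address:
            continue
        if nic.address in already_probed:
            continue
        result.append(nic.address)
    return result


# Hypothetical host: bond0 carries ovirtmgmt, bond1 carries the gluster network.
nics = [
    HostInterface("bond0", "management", "192.0.2.11"),
    HostInterface("bond1", "gluster", "198.51.100.11"),
]
print(addresses_to_probe(nics, already_probed={"ovirttest1.example.com"}))
```

Each returned address would then be handed to the equivalent of `gluster peer probe <address>`. An interface that reports no IP yields nothing to probe.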

Could you also attach vdsm.log from host "ovirttest1"?

Comment 4 Ivan Bulatovic 2016-02-24 14:45:12 UTC
Created attachment 1130237 [details]
vdsm + engine log

Hi Sahina,

This is a combined snip from the engine log and the vdsm log of the SPM node. I've attached additional content from vdsm.log, because I'm not sure how often the peer probe is triggered from the engine.

gluster peer status returns only the hostname/FQDN that resolves to the IP on the ovirtmgmt network. This was a clean install, and the IPs from the gluster network were not probed (neither automatically nor manually).

Have you managed to reproduce this? Maybe I'm missing a step somewhere.

ovirtmgmt network is not tagged for gluster traffic, gluster network is.

Comment 5 Sahina Bose 2016-04-29 12:14:56 UTC
Ivan, yes, and sorry I couldn't get to this earlier. We have always tested in a cluster with only the gluster service enabled, hence we hadn't come across this before.
In your case both the gluster and virt services are enabled. In such cases, the current code that removes/syncs the server information is skipped, because we do not want hypervisor servers to be removed from the cluster when someone detaches a peer from the gluster CLI.

I will send a patch to handle this case of probing the alternate interface, which is currently embedded in the code block that is skipped.
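
A schematic of the control flow described in this comment, for readers following along. It is only an illustration in Python: the function and step names are invented, the real engine code is Java and is not reproduced here, and the "after" variant is one possible shape of the fix, not the actual patch.

```python
def sync_steps_before_fix(virt_enabled: bool) -> list:
    """Steps run by the periodic sync when the probe sits inside the guarded block."""
    steps = []
    if not virt_enabled:
        # Guarded so that hypervisors are not removed from the cluster
        # when a peer is detached through the gluster CLI.
        steps.append("remove_detached_servers")
        # Side effect of the guard: this step is skipped for virt+gluster clusters.
        steps.append("probe_gluster_network_addresses")
    return steps


def sync_steps_after_fix(virt_enabled: bool) -> list:
    """Same sync with the probe moved out of the guarded block (assumed shape)."""
    steps = []
    if not virt_enabled:
        steps.append("remove_detached_servers")
    steps.append("probe_gluster_network_addresses")
    return steps


print(sync_steps_before_fix(virt_enabled=True))
print(sync_steps_after_fix(virt_enabled=True))
```

In the "before" variant a virt + gluster cluster runs no steps at all, so the gluster-network address is never probed, while a gluster-only cluster runs both steps.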

Comment 6 Yaniv Kaul 2016-05-04 18:25:10 UTC
A draft on master means it'll miss 3.6.6. Moving to 3.6.7. Let me know if it's not OK.

Comment 7 Sahina Bose 2016-06-20 07:56:06 UTC
Patch not in 3.6 branch, changing to POST

Comment 8 Federico Fortini 2016-07-13 16:24:40 UTC
I've verified that this issue is still present on oVirt 4.0.0.
I've configured the cluster for BOTH Gluster and Virt, and VDSM creates a peer using the FQDN of the ovirtmgmt interface.

Using engine:
ovirt-engine  4.0.0.6-1.el7.centos  @ovirt-4.0
Using node:
ovirt-release-host-node.noarch   4.0.0-5.el7

Comment 9 SATHEESARAN 2016-07-19 16:54:29 UTC
The test failed with RHEV 3.6.8-3

1. I had 3 hosts with 2 network interfaces
2. Added these hosts to the 3.6 gluster + virt cluster using the IP (from nic1)
3. Created a network only for gluster, with the gluster type checked and VM network unchecked
4. Attached this network to the other interface

Observation:
No peer probe was initiated with the IP of the interface to which the gluster network was attached.

I have repeated the same test with gluster-only-cluster and it worked well.

Comment 10 Sahina Bose 2016-07-20 06:17:41 UTC
The patches for this have been merged into the ovirt-4.0 branch and have made it into the ovirt-4.0.1 build. Moving to ON_QA with the corrected version.

Comment 11 SATHEESARAN 2016-08-23 03:36:01 UTC
I have tested with RHV 4.0.2-7 and gluster 3.7.9 installed with vdsm-4.18.11-1.el7ev.x86_64

Each machine had access to 2 networks, with IPs X and Y
1. Added 2 gluster nodes to the gluster-only cluster with IP X
2. Created a new logical network, configured as 'Gluster network', using the 'Manage Network' option
3. Attached this network to the host interface with IP Y

No re-probe happened with the new IP Y

Marking this bug as failed QA.

Comment 12 Red Hat Bugzilla Rules Engine 2016-08-23 03:36:06 UTC
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.

Comment 13 SATHEESARAN 2016-08-23 03:37:47 UTC
Created attachment 1193125 [details]
snip engine.log while attaching the gluster network to the host

Comment 14 SATHEESARAN 2016-08-23 03:38:34 UTC
Created attachment 1193126 [details]
Another snip of engine.log while attaching gluster network to the host

Comment 16 SATHEESARAN 2016-09-19 10:05:36 UTC
(In reply to SATHEESARAN from comment #11)
> I have tested with RHV 4.0.2-7 and gluster 3.7.9 installed with
> vdsm-4.18.11-1.el7ev.x86_64
> 
> Each machine had access to 2 networks, with IPs X and Y
> 1. Added 2 gluster nodes to the gluster-only cluster with IP X
> 2. Created a new logical network, configured as 'Gluster network',
> using the 'Manage Network' option
> 3. Attached this network to the host interface with IP Y
> 
> No re-probe happened with the new IP Y
> 
> Marking this bug as failed QA.

I have identified why this fix was not working for me.
After attaching the gluster network to the interface, the interface doesn't show the IP, and the peer probe doesn't happen with this IP.

After clicking 'Refresh Host Capabilities', I could see the IP appear on that interface, and the re-probe also happened with that IP.

Comment 17 SATHEESARAN 2016-09-19 10:15:30 UTC
This issue is fixed in 4.0.z. It was moved to FailedQA because I missed the known issue https://bugzilla.redhat.com/show_bug.cgi?id=1246047

Comment 18 SATHEESARAN 2016-09-19 10:35:32 UTC
Tested with RHV 4.0.4-2 and RHGS 3.1.3, and everything works.

Refer to comment 16.