Bug 1300095 - Attaching gluster network does not initiate peer probe against gluster dedicated interfaces
Status: CLOSED CURRENTRELEASE
Product: ovirt-engine
Classification: oVirt
Component: BLL.Gluster
Version: 3.6.1.3
Hardware: x86_64 Linux
Priority: high, Severity: medium
Target Milestone: ovirt-4.0.4
Target Release: ---
Assigned To: Sahina Bose
QA Contact: SATHEESARAN
Depends On:
Blocks:
Reported: 2016-01-19 19:10 EST by Ivan Bulatovic
Modified: 2016-09-26 08:38 EDT (History)

See Also:
Fixed In Version: ovirt-4.0.1
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-09-26 08:38:16 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Gluster
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
rule-engine: ovirt-4.0.z+
rule-engine: planning_ack+
sabose: devel_ack+
sasundar: testing_ack+


Attachments
engine_log (119.82 KB, text/plain), 2016-02-10 17:46 EST, Ivan Bulatovic
vdsm + engine log (751.02 KB, text/plain), 2016-02-24 09:45 EST, Ivan Bulatovic
snip engine.log while attaching the gluster network to the host (9.76 KB, text/plain), 2016-08-22 23:37 EDT, SATHEESARAN
Another snip of engine.log while attaching gluster network to the host (13.79 KB, text/plain), 2016-08-22 23:38 EDT, SATHEESARAN


External Trackers
Tracker ID Priority Status Summary Last Updated
oVirt gerrit 56854 master MERGED engine: Sync gluster servers with virt service 2016-06-16 10:40 EDT
oVirt gerrit 59379 master MERGED engine: gluster sync - fix covertity issue 2016-06-20 00:48 EDT
oVirt gerrit 59581 ovirt-engine-4.0 MERGED engine: Sync gluster servers with virt service 2016-06-27 04:11 EDT
oVirt gerrit 59582 ovirt-engine-4.0 MERGED engine: gluster sync - fix covertity issue 2016-06-29 04:37 EDT

Description Ivan Bulatovic 2016-01-19 19:10:03 EST
Description of problem:

After tagging a network for gluster usage and attaching it to the desired network interface on the nodes in the cluster, I am unable to create a volume by adding bricks. The error shown in webadmin is:

Error while executing action Create Gluster Volume: Volume create failed
error: Host ip_address is not in 'Peer in Cluster' state
return code: 30800

The IP address is the one assigned to the gluster network interface on one of the nodes.

What puzzles me the most is that I can't find any trace of this in engine.log or in vdsm.log on the hosts, except for the following entry in the oVirt engine log:

2016-01-20 00:27:30,908 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-39) [56b09c31] Correlation ID: 56b09c31, Job ID: cf0134de-ba05-406f-ac7c-c9b6410793eb, Call Stack: null, Custom Event ID: -1, Message: Creation of Gluster Volume rep10 failed on cluster DC1C.

Version-Release number of selected component (if applicable):

ovirt-engine-3.6.1.3-1.el7.centos.noarch
glusterfs-server-3.7.6-1.el7.x86_64

How reproducible:

Always: on a clean install and on every attempt after cleaning all related configuration, unless the peers are probed manually.

Steps to Reproduce:
1. Create a network and select it as a gluster network
2. Attach the network to the desired interfaces and provide an IP address
3. Create a replica volume from two bricks

Additional info:

I know that replica 2 is not supported, but this should not have any effect on the issue. Gluster peer status only shows the hostname that resolves to the IP address of the ovirtmgmt interface on the nodes. I have 4 NICs: two are bonded and attached to the ovirtmgmt network, and the other two are bonded and attached to the gluster network.
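
A minimal sketch of how one might check whether a given address is known to gluster in the 'Peer in Cluster' state, assuming the usual text layout of `gluster peer status` (Hostname / Uuid / State / Other names). The sample output, host names, UUIDs, addresses, and the helper name `peers_in_cluster` are illustrative assumptions and are not taken from this report.

```python
import re

# Illustrative sample of `gluster peer status` output; host names, UUIDs and
# addresses are made up for this sketch.
SAMPLE_PEER_STATUS = """\
Number of Peers: 2

Hostname: node2.example.com
Uuid: 11111111-2222-3333-4444-555555555555
State: Peer in Cluster (Connected)
Other names:
10.10.10.2

Hostname: node3.example.com
Uuid: 66666666-7777-8888-9999-000000000000
State: Accepted peer request (Connected)
"""


def peers_in_cluster(status_text):
    """Return every name/address of peers whose state is 'Peer in Cluster'."""
    known = set()
    # Peers are separated by blank lines; the first block is the peer count.
    for block in re.split(r"\n\s*\n", status_text.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        fields = dict(line.split(": ", 1) for line in lines if ": " in line)
        if not fields.get("State", "").startswith("Peer in Cluster"):
            continue
        known.add(fields["Hostname"])
        # Alternate addresses follow the "Other names:" header, one per line.
        if "Other names:" in lines:
            start = lines.index("Other names:") + 1
            known.update(lines[start:])
    return known
```

In the situation described above, the gluster-network IP of a node would be absent from the returned set, which matches the "not in 'Peer in Cluster' state" error.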
Comment 1 Sahina Bose 2016-02-10 09:09:58 EST
Could you provide the following information:
1. Which cluster version in oVirt have you added the gluster servers to?
2. Could you attach the engine logs from the time you attached the gluster network to the interface?

Peer probing of the alternate interface should happen as part of the periodic sync if the host status is UP. I need to see the engine log to know what's going on.

Regarding the error message "Host ip_address is not in 'Peer in Cluster' state": this message is returned directly by the gluster CLI and forwarded to the caller (UI).
Comment 2 Ivan Bulatovic 2016-02-10 17:46 EST
Created attachment 1123009 [details]
engine_log

Hi Sahina,

I tried again yesterday with 3.6-snapshot, and the issue is still there. Since this was a clean install, the compatibility version is 3.6 and the cluster node type is virt and gluster. The gluster network shows the Gluster role and is attached, but it is not set as required.

I'm providing the log covering the timespan [attach gluster network -> create a volume].
Comment 3 Sahina Bose 2016-02-22 04:58:22 EST
Thanks. From the logs I see that the vglust network is associated with bond1, which has a static IP. However, I see no log entries indicating that this IP has been probed as an alternate address in gluster.
A peer probe happens if an interface is associated with a logical network that has the "gluster" role and the interface has an IP address that has not already been probed.
This looks like a bug; I will try to reproduce it.

Could you also attach vdsm.log from host "ovirttest1"?
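
The probe condition stated in this comment can be sketched as follows. This is a toy model under stated assumptions, not the engine's code: the `HostInterface` type, its fields, and the function `addresses_to_probe` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HostInterface:
    """Hypothetical, simplified view of a host NIC as the engine sees it."""
    name: str
    network_roles: frozenset  # roles of the attached logical network
    address: Optional[str]    # None until an IP is reported for the NIC


def addresses_to_probe(interfaces, known_peer_addresses):
    """Return gluster-network IPs that have not been probed yet."""
    pending = []
    for nic in interfaces:
        if "gluster" not in nic.network_roles:
            continue  # only interfaces on a gluster-role network qualify
        if nic.address is None:
            continue  # no IP reported yet, so there is nothing to probe
        if nic.address in known_peer_addresses:
            continue  # already known to gluster
        pending.append(nic.address)
    return pending
```

With a management NIC on bond0 and a gluster-role NIC on bond1 that has a static IP, only the bond1 address would be returned, and only until it has been probed once.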
Comment 4 Ivan Bulatovic 2016-02-24 09:45 EST
Created attachment 1130237 [details]
vdsm + engine log

Hi Sahina,

This is a combined snippet from the engine log and the vdsm log of the SPM node. I've attached additional content from vdsm.log because I'm not sure how often the peer probe is triggered from the engine.

gluster peer status returns only the hostname/FQDN that resolves to an IP on the ovirtmgmt network. This was a clean install, and the IPs from the gluster network were not probed (neither automatically nor manually).

Have you managed to reproduce this? Maybe I'm missing a step somewhere.

The ovirtmgmt network is not tagged for gluster traffic; the gluster network is.
Comment 5 Sahina Bose 2016-04-29 08:14:56 EDT
Ivan, yes, and sorry I couldn't get to this earlier. We have always tested in a cluster with only the gluster service enabled, so we hadn't come across this before.
In your case, both the gluster and virt services are enabled. In such cases, the current code that removes/syncs the server information is skipped, because we do not want hypervisor servers to be removed from the cluster when someone detaches a peer from the gluster CLI.

I will send a patch to handle this case of probing the alternate interface, which is currently embedded in the code block that is skipped.
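
The control flow described in this comment can be modeled roughly as below. This is not the engine's actual code: the function name, parameters, and action strings are hypothetical, and the shape of the patched branch is only an assumption about how a fix could keep the removal logic skipped while still probing.

```python
def sync_gluster_servers(virt_enabled, probed, gluster_ips, patched):
    """Simulate one periodic sync pass and return the actions it would take."""
    actions = []

    def probe_alternate_interfaces():
        # Probe every gluster-network IP that gluster does not know yet.
        for ip in gluster_ips:
            if ip not in probed:
                actions.append(f"peer probe {ip}")

    if not virt_enabled:
        # Gluster-only cluster: full sync, including removal of servers
        # that were detached through the gluster CLI.
        actions.append("remove detached servers")
        probe_alternate_interfaces()
    elif patched:
        # Assumed shape of a fix: never remove hypervisors, but still probe.
        probe_alternate_interfaces()
    # Unpatched virt + gluster cluster: the whole block is skipped,
    # so the alternate interface is never probed.
    return actions
```

In this model, an unpatched virt + gluster cluster yields no actions at all, which corresponds to the missing peer probe reported here, while a gluster-only cluster probes as expected.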
Comment 6 Yaniv Kaul 2016-05-04 14:25:10 EDT
A draft on master means it'll miss 3.6.6. Moving to 3.6.7. Let me know if it's not OK.
Comment 7 Sahina Bose 2016-06-20 03:56:06 EDT
Patch not in 3.6 branch, changing to POST
Comment 8 Federico Fortini 2016-07-13 12:24:40 EDT
I've verified that this issue is still present on oVirt 4.0.0.
I've configured the cluster for BOTH Gluster and VIRT, and VDSM creates a pair using the FQDN of the ovirtmgmt interface.

Using engine:
ovirt-engine  4.0.0.6-1.el7.centos  @ovirt-4.0
Using node:
ovirt-release-host-node.noarch   4.0.0-5.el7
Comment 9 SATHEESARAN 2016-07-19 12:54:29 EDT
The test failed with RHEV 3.6.8-3

1. I had 3 hosts with 2 network interfaces each
2. Added these hosts to the 3.6 gluster + virt cluster using the IP (from nic1)
3. Created a network only for gluster, with the gluster type checked and VM network unchecked
4. Attached this network to the other interface

Observation:
Peer probe was not initiated with the IP to which the gluster network was attached.

I repeated the same test with a gluster-only cluster and it worked well.
Comment 10 Sahina Bose 2016-07-20 02:17:41 EDT
The patches for this have been merged into the ovirt-4.0 branch and have made it into the ovirt-4.0.1 build. Moving to ON_QA with the corrected version.
Comment 11 SATHEESARAN 2016-08-22 23:36:01 EDT
I have tested with RHV 4.0.2-7 and gluster 3.7.9 installed with vdsm-4.18.11-1.el7ev.x86_64.

Each machine had access to 2 networks, with IPs X and Y.
1. Added 2 gluster nodes to the gluster-only cluster with IP X
2. Created a new logical network configured as 'Gluster network' using the 'Manage Network' option
3. Attached this network to the host interface with IP Y

No re-probe happened with the new IP Y.

Marking this bug as failed QA.
Comment 12 Red Hat Bugzilla Rules Engine 2016-08-22 23:36:06 EDT
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.
Comment 13 SATHEESARAN 2016-08-22 23:37 EDT
Created attachment 1193125 [details]
snip engine.log while attaching the gluster network to the host
Comment 14 SATHEESARAN 2016-08-22 23:38 EDT
Created attachment 1193126 [details]
Another snip of engine.log while attaching gluster network to the host
Comment 16 SATHEESARAN 2016-09-19 06:05:36 EDT
(In reply to SATHEESARAN from comment #11)

I identified why this fix was not working for me.
After attaching the gluster network to the interface, the interface does not show the IP, so the peer probe does not happen with this IP.

When I click 'Refresh Host Capabilities', the IP appears on that interface and the re-probe happens with that IP.
Comment 17 SATHEESARAN 2016-09-19 06:15:30 EDT
This issue is fixed in 4.0.z. It was moved to FailedQA because I had missed the known issue: https://bugzilla.redhat.com/show_bug.cgi?id=1246047
Comment 18 SATHEESARAN 2016-09-19 06:35:32 EDT
Tested with RHV 4.0.4-2 and RHGS 3.1.3, and everything works.

Refer to comment 16.
