Red Hat Bugzilla – Bug 1300095
Attaching gluster network does not initiate peer probe against gluster dedicated interfaces
Last modified: 2016-09-26 08:38:16 EDT
Description of problem:
After tagging a network for gluster usage and connecting it to the desired network interface on the nodes in the cluster, I am unable to create a volume by adding bricks. The error shown in webadmin is:
Error while executing action Create Gluster Volume: Volume create failed
error: Host ip_address is not in 'Peer in Cluster' state
return code: 30800
The IP address is the one assigned to the gluster network interface on one of the nodes.
What puzzles me most is that I can't find any trace of this in engine.log or in vdsm.log on the hosts, except for the following entry in the oVirt engine log:
2016-01-20 00:27:30,908 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-39) [56b09c31] Correlation ID: 56b09c31, Job ID: cf0134de-ba05-406f-ac7c-c9b6410793eb, Call Stack: null, Custom Event ID: -1, Message: Creation of Gluster Volume rep10 failed on cluster DC1C.
Version-Release number of selected component (if applicable):
How reproducible:
Always: on a clean install and on every attempt after cleaning all related configuration, unless the peers are probed manually.
Steps to Reproduce:
1. Create a network and select it as a gluster network
2. Attach the network to the desired interfaces and provide an IP address
3. Create a replica volume from two bricks
I know that replica 2 is not supported, but this should not affect the issue. gluster peer status shows only the hostname that resolves to the IP address of the ovirtmgmt interface on the nodes. I have 4 NICs: two are bonded and attached to the ovirtmgmt network, and the other two are bonded and attached to the gluster network.
Could you provide the following information:
1. Which cluster version in oVirt have you added the gluster servers to?
2. Could you attach the engine logs from the time you attached the gluster network to the interface?
Peer probing of the alternate interface should happen as part of the periodic sync if the host status is UP. I need to see the engine log to know what's going on.
Regarding the error message "Host ip_address is not in 'Peer in Cluster' state": this message is returned directly by the gluster CLI and forwarded to the caller (the UI).
Created attachment 1123009
I tried again yesterday with 3.6-snapshot, and the issue is still there. Since this was a clean install, the compatibility version is 3.6, and the cluster node type is virt and gluster. The gluster network shows the Gluster role and is attached, but it is not set as required.
I'm attaching the log covering the time span from attaching the gluster network to creating the volume.
Thanks. From the logs I see that the vglust network is associated with bond1, which has a static IP. However, I see no log entries indicating that this IP has been probed as an alternate address in gluster.
A peer probe happens if an interface is associated with a logical network that has the "gluster" role, and the interface has an IP address that has not already been probed.
This looks like a bug; I will try to reproduce it.
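The probing rule described in this comment can be sketched as a small filter. This is a minimal Python sketch under stated assumptions: the NetworkInterface type, the "gluster" and "management" role strings, the "UP" status check (taken from the earlier comment about the periodic sync) and the sample addresses are all hypothetical and are not the oVirt engine's actual classes or code.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional, Set


# Hypothetical model of a host NIC as the engine might see it (not an oVirt class).
@dataclass
class NetworkInterface:
    name: str
    network_roles: FrozenSet[str]  # roles of the logical network attached to this NIC
    ip_address: Optional[str]      # None if the engine has not recorded an IP yet


def ips_to_probe(host_status: str,
                 interfaces: Iterable[NetworkInterface],
                 already_probed: Set[str]) -> List[str]:
    """Return the alternate IPs that a periodic sync would peer-probe (sketch)."""
    if host_status != "UP":
        return []  # assumption: the sync only probes alternate interfaces on UP hosts
    candidates = []
    for nic in interfaces:
        if "gluster" not in nic.network_roles:
            continue  # NIC is not attached to a gluster-role network
        if nic.ip_address is None:
            continue  # no IP known for this NIC, so there is nothing to probe
        if nic.ip_address in already_probed:
            continue  # address is already known to gluster as a peer address
        candidates.append(nic.ip_address)
    return candidates


# Hypothetical example: bond0 carries management traffic, bond1 the gluster network.
nics = [
    NetworkInterface("bond0", frozenset({"management"}), "10.0.0.11"),
    NetworkInterface("bond1", frozenset({"gluster"}), "192.168.50.11"),
]
print(ips_to_probe("UP", nics, {"10.0.0.11"}))  # ['192.168.50.11']
```

In this sketch an interface whose IP the engine has not recorded (ip_address is None) is never probed, which is consistent with the later observation in this thread that the re-probe only happened after 'Refresh Host Capabilities'.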
Could you also attach vdsm.log from host "ovirttest1"?
Created attachment 1130237
vdsm + engine log
This is a combined snippet from the engine log and the SPM node's vdsm log. I've included additional content from vdsm.log because I'm not sure how often the engine triggers a peer probe.
gluster peer status returns only the hostname/FQDN that resolves to the IP on the ovirtmgmt network. This was a clean install, and the IPs on the gluster network were not probed (either automatically or manually).
Have you managed to reproduce this? Maybe I'm missing a step somewhere.
The ovirtmgmt network is not tagged for gluster traffic; the gluster network is.
Ivan, yes, and sorry I couldn't get to this earlier. We have always tested in a cluster with only the gluster service enabled, which is why we hadn't come across this before.
In your case you have both the gluster and virt services enabled. In such cases, the current code that removes/syncs the server information is skipped, because we do not want hypervisor servers to be removed from the cluster when someone detaches a peer using the gluster CLI.
I will send a patch to handle probing of the alternate interface in this case; that logic currently sits inside the code block that is skipped.
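The control-flow problem described in this comment can be outlined as follows. This is a hypothetical Python sketch, not the oVirt engine code or the actual patch: the Cluster type, the step names, and the exact branch condition are assumptions. It only illustrates why alternate-interface probing never runs in a gluster + virt cluster while it sits in the skipped branch, and one assumed way of moving it out.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical cluster model (not an oVirt class).
@dataclass
class Cluster:
    gluster_service: bool
    virt_service: bool


def sync_steps_before_fix(cluster: Cluster) -> List[str]:
    """Assumed outline of the original behaviour: everything is in one branch."""
    steps = []
    # The server removal/sync block is skipped when virt is also enabled,
    # and alternate-interface probing is nested inside that same block.
    if cluster.gluster_service and not cluster.virt_service:
        steps.append("sync_server_list")
        steps.append("probe_alternate_interfaces")
    return steps


def sync_steps_after_fix(cluster: Cluster) -> List[str]:
    """One assumed restructuring: probing is moved out of the skipped block."""
    steps = []
    # Server removal stays limited to gluster-only clusters, so hypervisors are
    # not dropped when a peer is detached using the gluster CLI.
    if cluster.gluster_service and not cluster.virt_service:
        steps.append("sync_server_list")
    # Alternate-interface probing runs for any cluster with the gluster service.
    if cluster.gluster_service:
        steps.append("probe_alternate_interfaces")
    return steps


virt_and_gluster = Cluster(gluster_service=True, virt_service=True)
print(sync_steps_before_fix(virt_and_gluster))  # []
print(sync_steps_after_fix(virt_and_gluster))   # ['probe_alternate_interfaces']
```

Keeping the two concerns in separate conditions preserves the protection against removing hypervisors while still letting gluster + virt clusters pick up new gluster-network IPs.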
A draft on master means it'll miss 3.6.6. Moving to 3.6.7. Let me know if it's not OK.
The patch is not in the 3.6 branch; changing the status to POST.
I've verified that this issue is still present in oVirt 4.0.0.
I configured the cluster for BOTH Gluster and Virt, and VDSM created the peer using the FQDN of the ovirtmgmt interface.
ovirt-engine 126.96.36.199-1.el7.centos @ovirt-4.0
The test failed with RHEV 3.6.8-3
1. I had 3 hosts, each with 2 network interfaces
2. Added these hosts to the 3.6 gluster + virt cluster using the IP from nic1
3. Created a network only for gluster, with the gluster type checked and VM network unchecked
4. Attached this network to the other interface
A peer probe was not initiated with the IP of the interface to which the gluster network was attached.
I repeated the same test with a gluster-only cluster, and it worked well.
The patches for this have been merged in the ovirt-4.0 branch and made it into the ovirt-4.0.1 build. Moving to ON_QA with the corrected version.
I tested with RHV 4.0.2-7 and gluster 3.7.9, with vdsm-4.18.11-1.el7ev.x86_64 installed.
Each machine had access to 2 networks, with IPs x and y.
1. Added 2 gluster nodes to the gluster-only cluster using IP x
2. Created a new logical network, configured as 'Gluster network' using the 'Manage Network' option
3. Attached this network to the host interface with IP y
No re-probe happened with the new IP y.
Marking this bug as failed QA.
The target release should be set once a package build is known to fix an issue. Since this bug is not in MODIFIED state, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.
Created attachment 1193125
engine.log snippet captured while attaching the gluster network to the host
Created attachment 1193126
Another engine.log snippet captured while attaching the gluster network to the host
(In reply to SATHEESARAN from comment #11)
> I have tested with RHV 4.0.2-7 and gluster3.7.9 installed with
> Each machine had 2 access to 2 networks, and had an IP x, y
> 1. Added 2 gluster nodes to the gluster only cluster with IP x
> 2. Create a new logical network which is configured as 'Gluster network'
> using 'Manage Network' option
> 3. Attached this network to the host interface with IP Y
> There is no re-probe that happened with the new IP Y
> Marking this bug as failed QA.
I identified why this fix is not working for me.
After the gluster network is attached to the interface, the interface does not show the IP, so no peer probe happens with this IP.
After clicking 'Refresh Host Capabilities', the IP appears on that interface and the re-probe with that IP also happens.
This issue is fixed in 4.0.z. It was moved to FailedQA because the known issue https://bugzilla.redhat.com/show_bug.cgi?id=1246047 was missed.
Tested with RHV 4.0.4-2 and RHGS 3.1.3, and everything works.