Bug 1040235 - POSIXFS storage domain backed by Gluster is inaccessible in IS26
Summary: POSIXFS storage domain backed by Gluster is inaccessible in IS26
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine-webadmin-portal
Version: 3.3.0
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: high
Target Milestone: ---
Target Release: 3.3.1
Assignee: Sahina Bose
QA Contact:
URL:
Whiteboard: gluster
Depends On:
Blocks:
 
Reported: 2013-12-11 01:53 UTC by Stephen Gordon
Modified: 2016-02-10 18:58 UTC
CC: 12 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-01-30 09:09:37 UTC
oVirt Team: Gluster
Target Upstream Version:



Description Stephen Gordon 2013-12-11 01:53:40 UTC
Description of problem:

Upgraded RHEV-M and hosts from is25.1 to is26 and my POSIXFS domain (which is backed by Gluster storage) went down. The message in the event log shows:

    Gluster command [<UNKNOWN>] failed on server torricelli.

The engine.log shows:

    2013-12-10 20:15:00,172 ERROR [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] (DefaultQuartzScheduler_Worker-26) [cbfb82] Command GlusterServersListVDS execution failed. Exception: VDSNetworkException: org.apache.xmlrpc.XmlRpcException: <type 'exceptions.Exception'>:method "glusterHostsList" is not supported

Based on what I can find, this seems to suggest the issue is a result of vdsm-gluster not being installed on the host. However, this host didn't have vdsm-gluster installed previously either, and things were working OK (presumably because the storage was being accessed as POSIXFS).

Now, vdsm-gluster isn't being shipped for 3.3 (it was pulled in is26), but the system appears to require it to be installed for POSIXFS storage domains backed by Gluster?
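For anyone triaging similar failures: the unsupported VDSM verb can be pulled straight out of the engine.log error line. A minimal sketch (the log line below is a shortened version of the one quoted above, and the sed expression is mine, not part of any tooling):

```shell
# Pull the unsupported VDSM verb out of an engine.log error line
# (shortened version of the line quoted in the description).
LOG_LINE='Command GlusterServersListVDS execution failed: method "glusterHostsList" is not supported'
echo "$LOG_LINE" | sed -n 's/.*method "\([^"]*\)" is not supported.*/\1/p'
# prints: glusterHostsList
```

Against a live manager you would run the same sed over the engine log (typically /var/log/ovirt-engine/engine.log on a standard install).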

Version-Release number of selected component (if applicable):

Host:

vdsm-4.13.2-0.1.rc.el6ev.x86_64
vdsm-cli-4.13.2-0.1.rc.el6ev.noarch
vdsm-python-cpopen-4.13.0-0.10.beta1.el6ev.x86_64
vdsm-python-4.13.2-0.1.rc.el6ev.x86_64
vdsm-xmlrpc-4.13.2-0.1.rc.el6ev.noarch


Manager:

rhevm-log-collector-3.3.1-2.el6ev.noarch
redhat-support-plugin-rhev-3.3.0-8.el6ev.noarch
rhev-guest-tools-iso-3.3-6.noarch
rhevm-backend-3.3.0-0.37.beta1.el6ev.noarch
rhevm-reports-3.3.0-14.el6ev.noarch
rhevm-restapi-3.3.0-0.37.beta1.el6ev.noarch
rhevm-spice-client-x64-msi-3.3-4.el6_4.noarch
rhevm-setup-3.3.0-0.37.beta1.el6ev.noarch
rhevm-setup-plugins-3.3.0-1.el6ev.noarch
rhevm-spice-client-x64-cab-3.3-4.el6_4.noarch
rhevm-branding-rhev-3.3.0-1.2.beta1.el6ev.noarch
rhevm-iso-uploader-3.3.0-1.el6ev.noarch
rhevm-dependencies-3.3.4-1.el6ev.noarch
rhevm-userportal-3.3.0-0.37.beta1.el6ev.noarch
quartz-rhevm-1.8.3-5.noarch
rhevm-doc-3.3.0-1.el6eng.noarch
rhevm-dwh-3.3.0-16.el6ev.noarch
rhevm-lib-3.3.0-0.37.beta1.el6ev.noarch
rhevm-webadmin-portal-3.3.0-0.37.beta1.el6ev.noarch
rhevm-sdk-python-3.3.0.17-1.el6ev.noarch
rhevm-spice-client-x86-msi-3.3-4.el6_4.noarch
rhevm-spice-client-x86-cab-3.3-4.el6_4.noarch
rhevm-websocket-proxy-3.3.0-0.37.beta1.el6ev.noarch
rhevm-dbscripts-3.3.0-0.37.beta1.el6ev.noarch
rhevm-3.3.0-0.37.beta1.el6ev.noarch
rhevm-image-uploader-3.3.1-1.el6ev.noarch
rhevm-cli-3.3.0.8-1.el6ev.noarch
rhevm-tools-3.3.0-0.37.beta1.el6ev.noarch

Actual results:

POSIXFS storage domain backed by Gluster is inaccessible.

Expected results:

POSIXFS storage domain backed by gluster to be accessible.

Additional info:

The only mount option in this environment is vers=3; there are no backup volumes.

Comment 1 Stephen Gordon 2013-12-11 03:11:10 UTC
Ok, this is weird. I was poking around the cluster settings and noticed that the virt service was disabled, the gluster service was enabled, and the radio button to change the option was greyed out (to be clear, this cluster is explicitly for virt use). To try to rectify this, I took the following steps:

1) Set AllowClusterWithVirtGlusterEnabled to true, restarted ovirt-engine.
2) Set cluster to virt enabled, gluster disabled.
3) Set AllowClusterWithVirtGlusterEnabled to false, restarted ovirt-engine.
4) Still encountered same issue as in description when activating host.
5) Set AllowClusterWithVirtGlusterEnabled to true, restarted ovirt-engine.
6) Did *not* re-enable gluster service for the cluster, host activates.

So for some reason the host will activate with AllowClusterWithVirtGlusterEnabled set to true but not when it is set to false - even when only the virt service is enabled?
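For anyone retracing steps 1, 3, and 5: the flag is toggled with the engine-config tool on the manager host. A sketch (the flag name is from this bug; the engine-config invocation is the standard RHEV-M/oVirt usage, so treat it as an assumption for your exact build):

```shell
# Toggle the flag and restart the engine (run on the RHEV-M host).
engine-config -s AllowClusterWithVirtGlusterEnabled=true
service ovirt-engine restart

# Check the current value:
engine-config -g AllowClusterWithVirtGlusterEnabled
```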

Not sure how much value there is in pursuing this under the circumstances, but it's unclear to me exactly why this would be.

Comment 2 Stephen Gordon 2013-12-11 03:24:38 UTC
Lowered priority based on comment # 1, leaving severity as is based on immediate impact when encountered.

Comment 4 Sahina Bose 2013-12-11 08:34:12 UTC
There was a recent change to check whether gluster is running on hosts in gluster-enabled clusters, which is why this error has shown up. But the issue seems to be that a virt cluster has the gluster service enabled.

Could you help me root cause the issue?

The RHEVM setup should have set the AllowClusterWithVirtGlusterEnabled=true. Was this value changed in the is25.1 setup?

This value controls whether both the virt service and the gluster service may be enabled on the same cluster.

So IIUC, you had a cluster with ONLY virt service enabled in is25.1 and then you upgraded to is26.

Now, the cluster had gluster service enabled and virt service box greyed out?

Comment 5 Stephen Gordon 2013-12-11 12:05:24 UTC
(In reply to Sahina Bose from comment #4)
> There was a recent change to check if gluster is running on hosts in gluster
> enabled clusters which is why this error has shown up. But the issue seems
> to be that a virt cluster has gluster service enabled. 
> 
> Could you help me root cause the issue?
> 
> The RHEVM setup should have set the AllowClusterWithVirtGlusterEnabled=true.
> Was this value changed in the is25.1 setup?

I believe I looked at this at the time and left it as default, is there a way to check the historical values from the logs/database?

> This value is to control if allowing both virt service and gluster service
> is enabled on the cluster.
> 
> So IIUC, you had a cluster with ONLY virt service enabled in is25.1 and then
> you upgraded to is26.

The only host in the cluster does in fact have gluster on it (unsupported config) but the gluster setup is not managed via RHEV-M, hence only virt was enabled.

> Now, the cluster had gluster service enabled and virt service box greyed out?

Yes, this was probably the most surprising thing, as from a RHEV perspective this host was already set up and running VMs. Does RHEV-M/VDSM now default this setting based on detection of the glusterd service?

When I set AllowClusterWithVirtGlusterEnabled to true, and set the cluster to virt enabled and gluster disabled, the cluster comes up. With any other combination of these settings it does not (error shown in the bug description).

I'm unsure whether this bug is simply a result of my (unsupported) configuration, something that only people upgrading between beta builds will encounter, or a wider issue.

Comment 7 Sahina Bose 2013-12-18 01:48:10 UTC
(In reply to Stephen Gordon from comment #5)
> (In reply to Sahina Bose from comment #4)
> > There was a recent change to check if gluster is running on hosts in gluster
> > enabled clusters which is why this error has shown up. But the issue seems
> > to be that a virt cluster has gluster service enabled. 
> > 
> > Could you help me root cause the issue?
> > 
> > The RHEVM setup should have set the AllowClusterWithVirtGlusterEnabled=true.
> > Was this value changed in the is25.1 setup?
> 
> I believe I looked at this at the time and left it as default, is there a
> way to check the historical values from the logs/database?
> 

There's a backup of the database taken when setup runs, which can be found at /var/lib/ovirt-engine/backups. You could look at the previous value there.

> > This value is to control if allowing both virt service and gluster service
> > is enabled on the cluster.
> > 
> > So IIUC, you had a cluster with ONLY virt service enabled in is25.1 and then
> > you upgraded to is26.
> 
> The only host in the cluster does in fact have gluster on it (unsupported
> config) but the gluster setup is not managed via RHEV-M, hence only virt was
> enabled.
> 
> > Now, the cluster had gluster service enabled and virt service box greyed out?
> 
> Yes, this was probably the most surprising thing as from a RHEV perspective
> this host was already setup and running VMs. Does RHEV-M/VDSM now default
> this setting based on the detection of the glusterd service?
> 

No, there's no such check, and AFAIK nothing changes the cluster's setting.

> When I set AllowClusterWithVirtGlusterEnabled to true, and set the cluster
> to virt enabled and gluster disabled, the cluster comes up. With any other
> combination of these settings it does not (error shown in the bug
> description).
> 
> I'm unsure whether this bug is simply a result of my (unsupported)
> configuration, something that only people upgrading between beta builds will
> encounter, or a wider issue.

We will probably need to test this scenario to make sure there's no bug.
Sas, have you encountered this?

Comment 8 Stephen Gordon 2013-12-18 02:25:48 UTC
# grep AllowClusterWithVirtGlusterEnabled /var/lib/ovirt-engine/backups/*
/var/lib/ovirt-engine/backups/engine-20131120192236.vohOBZ.sql:291	AllowClusterWithVirtGlusterEnabled	false	general
/var/lib/ovirt-engine/backups/engine-20131210200110.zSCm8A.sql:291	AllowClusterWithVirtGlusterEnabled	true	general
/var/lib/ovirt-engine/backups/ovirt-engine_db_backup_2013_04_29_16_02_00.sql:291	AllowClusterWithVirtGlusterEnabled	false	general
/var/lib/ovirt-engine/backups/ovirt-engine_db_backup_2013_06_11_17_16_22.sql:291	AllowClusterWithVirtGlusterEnabled	false	general
/var/lib/ovirt-engine/backups/ovirt-engine_db_backup_2013_06_11_17_35_20.sql:291	AllowClusterWithVirtGlusterEnabled	false	general
/var/lib/ovirt-engine/backups/ovirt-engine_db_backup_2013_11_14_17_47_44.sql:291	AllowClusterWithVirtGlusterEnabled	false	general

Comment 9 Kanagaraj 2013-12-19 06:19:33 UTC
AllowClusterWithVirtGlusterEnabled doesn't have any implications for the lifecycle of the hosts.

AllowClusterWithVirtGlusterEnabled controls the following:

If this value is 'false'

1) Radio buttons will be shown in the 'Cluster' popup instead of checkboxes for Virt and Gluster services. So the user cannot create a hybrid cluster

2) Backend will not allow the user to create a cluster with both Virt and Gluster services enabled

If the value is 'true'

1) Checkboxes will be shown in the 'Cluster' popup for Virt and Gluster services. So the user can create a hybrid cluster

2) Backend will allow the user to create a cluster with both Virt and Gluster services enabled


By default RHEVM has AllowClusterWithVirtGlusterEnabled=false to avoid hybrid clusters, so I doubt changing this value would result in any change in host behavior.
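Put as a sketch (a hypothetical helper, names invented, not actual engine code), the backend rule described above amounts to: a hybrid (virt + gluster) cluster is only permitted when the flag is true, while a single-service cluster is always permitted.

```shell
# Hypothetical sketch of the backend validation described above.
# allowed VIRT GLUSTER ALLOW_HYBRID  ->  exit 0 if the combination is permitted
allowed() {
    virt=$1; gluster=$2; allow_hybrid=$3
    if [ "$virt" = true ] && [ "$gluster" = true ]; then
        # Hybrid cluster: only allowed when the flag is set.
        [ "$allow_hybrid" = true ]
    else
        # A single service (or none) is always allowed.
        true
    fi
}

allowed true true false  && echo permitted || echo rejected   # rejected
allowed true false false && echo permitted || echo rejected   # permitted
```

Note that, as described, the flag gates cluster creation/editing, not host lifecycle, which is what makes the activation behavior in comment 1 surprising.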


The checkbox or radio buttons might be greyed out if the cluster has VMs running or gluster volumes present.



Stephen, could you share the db backup file so we can see what the cluster configuration was before the upgrade?

Comment 13 SATHEESARAN 2014-01-30 08:48:34 UTC
> We will probably need to test this scenario to make sure there's no bug.
> Sas, have you encountered this?

Apologies for the very late reply; I missed this needinfo on me.

I have been doing updates from IS(x-1) to IS(x) all along, with hosts in both virt and gluster clusters.

And I have never hit any issues so far.

Comment 14 Sean Cohen 2014-01-30 09:09:37 UTC
As this bug does not reproduce in QA, I am going ahead and closing it.
Sean

