Bug 966342 - [RHEVM-RHS] One of the RHS Node in gluster cluster regularly goes non-operational as soon as its up
Status: CLOSED NEXTRELEASE
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine-webadmin-portal
Version: 3.1.4
Hardware: Unspecified
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.3.0
Assigned To: Sahina Bose
QA Contact: SATHEESARAN
Whiteboard: gluster
Depends On:
Blocks:
Reported: 2013-05-23 02:05 EDT by SATHEESARAN
Modified: 2016-02-10 13:59 EST (History)
12 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
virt rhev integration
Last Closed: 2013-06-24 02:39:37 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Gluster
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
Screenshot of RHEVM, showing one non-operational RHS Nodes (216.93 KB, image/png)
2013-05-23 02:05 EDT, SATHEESARAN

Description SATHEESARAN 2013-05-23 02:05:20 EDT
Created attachment 752001 [details]
Screenshot of RHEVM, showing one non-operational RHS Nodes

Description of problem:

The RHS Node is shown as non-operational immediately after it comes up. This happens at regular intervals.

Events tab for that RHS Node says, "Host rhs-node-3 cannot access one of the Storage Domains attached to the Data Center data-center-1. Setting Host state to Non-Operational"

Version-Release number of selected component (if applicable):
RHS   : glusterfs-3.3.0.8rhs-1.el6rhs [ RHS2.0 Update 5 ]
RHEVM : RHEVM 3.1.4 [3.1.0-53.el6e]/[si28.1]
VDSM  
RHS Nodes in gluster cluster: 
RHEL-H Node in virt cluster :

How reproducible:


Steps to Reproduce:
1. Create a POSIX FS data center with compatibility version 3.1.
2. Create a gluster cluster in the newly created data center.
3. Add RHS Nodes to this gluster cluster. The RHS Nodes used for this gluster cluster contain the RHS 2.0 Update 5 glusterfs RPMs.
4. Create a new virt cluster and add a RHEL-H [RHEL 6.4] host to it.
5. Create a 6x2 Distributed-Replicate volume using the RHS Nodes.
6. Create a Data Domain [storage domain] and attach it to the RHEL-H host.
7. Create an NFS export on the RHEVM machine itself and use it for the ISO Domain.
8. Attach the ISO Domain to the data center.

Actual results:
One of the RHS Nodes is shown as Non-Operational. After some time it comes up, then immediately turns Non-Operational again. This happens regularly.

Expected results:
The data center should be up, and so should the Nodes in the cluster.

Additional info:
Comment 2 SATHEESARAN 2013-05-23 02:24:31 EDT
Additional Info:

Before hitting this issue, I noticed that the Data Domain was throwing errors while the ISO Domain was being attached. I also observed the data center going down and coming up while attaching the ISO Domain.

Later, this was no longer happening.
Comment 4 Sahina Bose 2013-05-23 06:54:31 EDT
Does it come back UP?
Comment 5 SATHEESARAN 2013-05-23 07:51:35 EDT
Sahina, 

The behavior is fluctuating.

In the Events for the Node, I can see the following every 5 minutes:

<snip>
2013-May-23, 17:20

Host rhs-node-3 cannot access one of the Storage Domains attached to the Data Center data-center-1. Setting Host state to Non-Operational.

571ab2f0

2013-May-23, 17:20

Detected new Host rhs-node-3. Host state was set to Up.
</snip>

The host comes up and immediately goes down again.
Comment 6 SATHEESARAN 2013-05-23 08:45:18 EDT
(In reply to SATHEESARAN from comment #5)
> Sahina, 
> 
> The behavior is fluctuating,
> 
> I could able to see in Events for the Node,
> 
> <snip>	
> 2013-May-23, 17:20
> 	
> Host rhs-node-3 cannot access one of the Storage Domains attached to the
> Data Center data-center-1. Setting Host state to Non-Operational.
> 	
> 571ab2f0
> 	
> 2013-May-23, 17:20
> 	
> Detected new Host rhs-node-3. Host state was set to Up.
> </snip>
> 
> for every 5 minutes.
> 
> The host is coming up and immediately goes down,

I am still seeing this issue now.
Comment 7 SATHEESARAN 2013-05-23 08:45:46 EDT
(In reply to SATHEESARAN from comment #2)
> Additional Info:
> 
> Prior to hitting this issue, I was witnessing that Data-domain was throwing
> error, while attaching ISO domain. Also, I observed Data center going down
> and coming up, while attaching the ISO domain.
> 
> But later this was not happening.

While attaching the ISO Domain, I see an error message. This blocks further test runs.

This seems like a bug; I have to explore more to understand the problem, but I cannot find a trace of it now.

2013-05-22 23:32:49,357 ERROR [org.ovirt.engine.core.bll.storage.NFSStorageHelper] (pool-3-thread-43) [3ffa59c6] The connection with details 10.70.37.72:/exports/iso failed because of error code 100 and error message is: general exception

2013-05-22 23:32:49,357 ERROR [org.ovirt.engine.core.bll.storage.NFSStorageHelper] (pool-3-thread-46) [626e53ca] The connection with details 10.70.37.72:/exports/iso failed because of error code 100 and error message is: general exception

2013-05-22 23:32:49,358 ERROR [org.ovirt.engine.core.bll.storage.NFSStorageHelper] (pool-3-thread-47) [2777b6d] The connection with details 10.70.37.72:/exports/iso failed because of error code 100 and error message is: general exception

2013-05-22 23:32:49,359 ERROR [org.ovirt.engine.core.bll.storage.ConnectStorageToVdsCommand] (pool-3-thread-46) [626e53ca] Transaction rolled-back for command: org.ovirt.engine.core.bll.storage.ConnectStorageToVdsCommand.

2013-05-22 23:32:49,360 ERROR [org.ovirt.engine.core.bll.storage.ConnectStorageToVdsCommand] (pool-3-thread-43) [3ffa59c6] Transaction rolled-back for command: org.ovirt.engine.core.bll.storage.ConnectStorageToVdsCommand.

2013-05-22 23:32:49,360 ERROR [org.ovirt.engine.core.bll.storage.ConnectStorageToVdsCommand] (pool-3-thread-47) [2777b6d] Transaction rolled-back for command: org.ovirt.engine.core.bll.storage.ConnectStorageToVdsCommand.
Comment 8 SATHEESARAN 2013-05-23 08:48:24 EDT
(In reply to SATHEESARAN from comment #7)

This is the error I was observing earlier while attaching the ISO Domain to the data center. It can also be seen in engine.log.

Later this error vanished, followed by the RHS Nodes going up and down at regular intervals.
Comment 9 Sahina Bose 2013-06-14 04:22:30 EDT
rhs-node-3 was set as SPM:
2013-05-22 18:21:31,897 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (pool-3-thread-41) [6cc90984] START, ConnectStorageServerVDSCommand(HostName = rhs-node-3, HostId = 825c6db4-c299-11e2-a8a9-525400e469d5, storagePoolId = 00000000-0000-0000-0000-000000000000, storageType = NFS, connectionList = [{ id: 18defe0a-3d29-489c-9760-4eaca5813220, connection: 10.70.37.72:/exports/iso, iqn: null, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };]), log id: 40f6fab8

and there was an issue with a storage domain, so the host was set to Non-Operational:
vds rhs-node-3 reported domain 7fb41558-ed54-4781-bd55-8e26cf22c362:data-domain as in problem, moving the vds to status NonOperational

The Gluster sync job was setting the Host to Up; however, it does not check the storage domain status.

Gluster cluster hosts should not have SPM status. This is fixed in 3.2.
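The interaction described in this comment (storage monitoring moves the host to Non-Operational, the periodic gluster sync job moves it back to Up without checking storage-domain status) can be sketched as a minimal model. All class and function names below are invented for illustration; this is not the actual ovirt-engine code:

```python
# Minimal, hypothetical model of the 5-minute status flapping.
# None of these names come from ovirt-engine; they only illustrate
# the sequencing described in this comment.

class Host:
    def __init__(self, name):
        self.name = name
        self.status = "Up"
        self.storage_domain_ok = False  # rhs-node-3 cannot reach data-domain

def storage_monitor(host):
    # Engine monitoring: a host that cannot access a storage domain
    # attached to its data center is moved to Non-Operational.
    if not host.storage_domain_ok:
        host.status = "NonOperational"

def gluster_sync_job(host):
    # The buggy behavior: the periodic gluster sync job sets the host
    # to Up based on gluster peer status alone, without consulting
    # storage-domain accessibility.
    host.status = "Up"

host = Host("rhs-node-3")
history = []
for _ in range(3):          # three sync cycles (~5 minutes each)
    gluster_sync_job(host)  # "Host state was set to Up."
    history.append(host.status)
    storage_monitor(host)   # "Setting Host state to Non-Operational."
    history.append(host.status)

print(history)
# -> ['Up', 'NonOperational', 'Up', 'NonOperational', 'Up', 'NonOperational']
```

The sketch makes the fix direction clear: either the sync job must also check storage-domain status before marking the host Up, or gluster-only hosts should not participate in SPM/storage-domain monitoring at all, which is the approach taken in 3.2.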

Note You need to log in before you can comment on or make changes to this bug.