Bug 966342
| Field | Value |
|---|---|
| Summary | [RHEVM-RHS] One of the RHS nodes in the Gluster cluster regularly goes non-operational as soon as it is up |
| Product | Red Hat Enterprise Virtualization Manager |
| Reporter | SATHEESARAN <sasundar> |
| Component | ovirt-engine-webadmin-portal |
| Assignee | Sahina Bose <sabose> |
| Status | CLOSED NEXTRELEASE |
| QA Contact | SATHEESARAN <sasundar> |
| Severity | medium |
| Priority | medium |
| Version | 3.1.4 |
| Target Release | 3.3.0 |
| CC | acathrow, dyasny, ecohen, grajaiya, hchiramm, iheim, Rhev-m-bugs, rhs-bugs, sabose, sasundar, vbellur, ykaul |
| Hardware | Unspecified |
| OS | Linux |
| Whiteboard | gluster |
| Doc Type | Bug Fix |
| Environment | virt rhev integration |
| Last Closed | 2013-06-24 06:39:37 UTC |
| Type | Bug |
| oVirt Team | Gluster |
**SATHEESARAN (comment #2):**

Additional info: Prior to hitting this issue, the data domain was throwing errors while the ISO domain was being attached. I also observed the data center going down and coming back up while attaching the ISO domain. But later this was no longer happening.

**Sahina Bose:** Does it come back UP?

**SATHEESARAN (comment #5):**

Sahina,

The behavior is fluctuating. In the Events tab for the node I can see, every 5 minutes:

<snip>
2013-May-23, 17:20  Host rhs-node-3 cannot access one of the Storage Domains attached to the Data Center data-center-1. Setting Host state to Non-Operational. 571ab2f0
2013-May-23, 17:20  Detected new Host rhs-node-3. Host state was set to Up.
</snip>

The host comes up and immediately goes down again.

**SATHEESARAN (in reply to comment #5):**

I am seeing this issue even now.

**SATHEESARAN (in reply to comment #2):**

While attaching the ISO domain, I see an error message. This blocks further test runs. It seems like a bug; I have to explore more to understand this problem, but I could not find a trace of it now.
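The Up/Non-Operational flapping in the events above is regular enough to detect by parsing the event text. A minimal illustrative sketch (the strings follow the event format quoted above; none of this is oVirt code):

```python
import re

# Event lines in the format quoted in the comment above.
events = [
    "2013-May-23, 17:20 Host rhs-node-3 cannot access one of the Storage "
    "Domains attached to the Data Center data-center-1. Setting Host state "
    "to Non-Operational.",
    "2013-May-23, 17:20 Detected new Host rhs-node-3. Host state was set to Up.",
]

def host_states(lines):
    """Yield (host, state) pairs parsed from RHEVM event lines."""
    for line in lines:
        m = re.search(r"Host (\S+) cannot access", line)
        if m:
            yield m.group(1), "NonOperational"
            continue
        m = re.search(r"Detected new Host (\S+)\.", line)
        if m:
            yield m.group(1), "Up"

print(list(host_states(events)))
# [('rhs-node-3', 'NonOperational'), ('rhs-node-3', 'Up')]
```

A host alternating between these two states on consecutive monitoring cycles is exactly the flapping reported here.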
The errors seen in engine.log while attaching the ISO domain:

2013-05-22 23:32:49,357 ERROR [org.ovirt.engine.core.bll.storage.NFSStorageHelper] (pool-3-thread-43) [3ffa59c6] The connection with details 10.70.37.72:/exports/iso failed because of error code 100 and error message is: general exception
2013-05-22 23:32:49,357 ERROR [org.ovirt.engine.core.bll.storage.NFSStorageHelper] (pool-3-thread-46) [626e53ca] The connection with details 10.70.37.72:/exports/iso failed because of error code 100 and error message is: general exception
2013-05-22 23:32:49,358 ERROR [org.ovirt.engine.core.bll.storage.NFSStorageHelper] (pool-3-thread-47) [2777b6d] The connection with details 10.70.37.72:/exports/iso failed because of error code 100 and error message is: general exception
2013-05-22 23:32:49,359 ERROR [org.ovirt.engine.core.bll.storage.ConnectStorageToVdsCommand] (pool-3-thread-46) [626e53ca] Transaction rolled-back for command: org.ovirt.engine.core.bll.storage.ConnectStorageToVdsCommand.
2013-05-22 23:32:49,360 ERROR [org.ovirt.engine.core.bll.storage.ConnectStorageToVdsCommand] (pool-3-thread-43) [3ffa59c6] Transaction rolled-back for command: org.ovirt.engine.core.bll.storage.ConnectStorageToVdsCommand.
2013-05-22 23:32:49,360 ERROR [org.ovirt.engine.core.bll.storage.ConnectStorageToVdsCommand] (pool-3-thread-47) [2777b6d] Transaction rolled-back for command: org.ovirt.engine.core.bll.storage.ConnectStorageToVdsCommand.

**SATHEESARAN (in reply to comment #7):** This is the error I was observing earlier while attaching the ISO domain to the data center. It can also be seen in engine.log.
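Failed storage connections like these can be pulled out of engine.log mechanically. A small illustrative parser (the pattern matches the message format quoted above; it is not part of oVirt):

```python
import re

# Two of the engine.log lines quoted above.
log_lines = [
    "2013-05-22 23:32:49,357 ERROR [org.ovirt.engine.core.bll.storage."
    "NFSStorageHelper] (pool-3-thread-43) [3ffa59c6] The connection with "
    "details 10.70.37.72:/exports/iso failed because of error code 100 "
    "and error message is: general exception",
    "2013-05-22 23:32:49,359 ERROR [org.ovirt.engine.core.bll.storage."
    "ConnectStorageToVdsCommand] (pool-3-thread-46) [626e53ca] Transaction "
    "rolled-back for command: "
    "org.ovirt.engine.core.bll.storage.ConnectStorageToVdsCommand.",
]

PATTERN = re.compile(
    r"The connection with details (\S+) failed because of error code (\d+)"
)

def failed_connections(lines):
    """Return (connection, error_code) for every failed storage connection."""
    hits = []
    for line in lines:
        m = PATTERN.search(line)
        if m:
            hits.append((m.group(1), int(m.group(2))))
    return hits

print(failed_connections(log_lines))
# [('10.70.37.72:/exports/iso', 100)]
```

Running this over the full engine.log would show which NFS/Gluster endpoints keep failing and how often.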
**SATHEESARAN:** But later this error vanished, and was followed by the RHS nodes going up and down at regular intervals.

rhs-node-3 was set as SPM:

2013-05-22 18:21:31,897 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (pool-3-thread-41) [6cc90984] START, ConnectStorageServerVDSCommand(HostName = rhs-node-3, HostId = 825c6db4-c299-11e2-a8a9-525400e469d5, storagePoolId = 00000000-0000-0000-0000-000000000000, storageType = NFS, connectionList = [{ id: 18defe0a-3d29-489c-9760-4eaca5813220, connection: 10.70.37.72:/exports/iso, iqn: null, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };]), log id: 40f6fab8

and there was an issue with a storage domain, so the host was set to Non-Operational:

vds rhs-node-3 reported domain 7fb41558-ed54-4781-bd55-8e26cf22c362:data-domain as in problem, moving the vds to status NonOperational

**Sahina Bose:** The Gluster sync job was setting the host to Up; however, it does not check the storage domain status. Gluster cluster hosts should not have SPM status. This is fixed in 3.2.
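The root cause above amounts to a state decision made from Gluster peer status alone. A hypothetical sketch of the corrected rule (names and types are illustrative, not the actual oVirt engine code): a host may only move to Up when it reports no storage domain as in problem.

```python
from dataclasses import dataclass

@dataclass
class Host:
    name: str
    gluster_peer_connected: bool
    # Storage domains this host reported as "in problem".
    problem_domains: frozenset = frozenset()

def next_state(host):
    """Illustrative state decision. The buggy sync job behaved as if this
    returned "Up" whenever gluster_peer_connected was True, ignoring
    problem_domains, so the storage monitor flipped the host back to
    Non-Operational on every cycle."""
    if not host.gluster_peer_connected:
        return "NonResponsive"
    if host.problem_domains:
        return "NonOperational"
    return "Up"

flapping = Host(
    "rhs-node-3", True,
    frozenset({"7fb41558-ed54-4781-bd55-8e26cf22c362:data-domain"}),
)
print(next_state(flapping))
# NonOperational
```

With both checks applied by the same job, the host stays Non-Operational until the domain problem clears, instead of oscillating every five minutes.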
**SATHEESARAN (comment #0):**

Created attachment 752001 [details]: Screenshot of RHEVM showing one non-operational RHS node

Description of problem:
An RHS node is shown as non-operational immediately after it comes up, and this happens at regular intervals. The Events tab for that RHS node says: "Host rhs-node-3 cannot access one of the Storage Domains attached to the Data Center data-center-1. Setting Host state to Non-Operational"

Version-Release number of selected component (if applicable):
RHS: glusterfs-3.3.0.8rhs-1.el6rhs [RHS 2.0 Update 5]
RHEVM: RHEVM 3.1.4 [3.1.0-53.el6e]/[si28.1]
VDSM:
RHS Nodes in gluster cluster:
RHEL-H Node in virt cluster:

How reproducible:

Steps to Reproduce:
1. Create a POSIX FS data center with compatibility version 3.1.
2. Create a Gluster cluster in the newly created data center.
3. Add RHS nodes to this Gluster cluster. The RHS nodes used for this cluster carry the RHS 2.0 Update 5 glusterfs rpms.
4. Create a new virt cluster and add a RHEL-H host [RHEL 6.4] to it.
5. Create a 6x2 distributed-replicate volume using the RHS nodes.
6. Create a data domain [storage domain] and attach it to the RHEL-H host.
7. Create an NFS export on the RHEVM machine itself and use it for the ISO domain.
8. Attach the ISO domain to the data center.

Actual results:
One of the RHS nodes is shown as Non-Operational. After some time it comes up and immediately turns non-operational again. This happens regularly.

Expected results:
The data center should be up, and so should the nodes in the cluster.

Additional info:
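Step 5 of the reproducer builds a 6x2 distributed-replicate volume, i.e. 12 bricks arranged as 6 replica-2 pairs. A hypothetical helper that assembles the gluster CLI command for it (host names and brick paths are made up; gluster groups consecutive bricks into replica sets, so consecutive bricks must land on different hosts):

```python
def volume_create_cmd(name, hosts, bricks_per_host, replica=2):
    """Build a `gluster volume create` command for a distributed-replicate
    volume. Iterating hosts in the inner loop keeps consecutive bricks on
    different hosts, so each replica pair spans two nodes."""
    bricks = [
        f"{host}:/bricks/{name}/brick{i}"
        for i in range(bricks_per_host)
        for host in hosts
    ]
    return f"gluster volume create {name} replica {replica} " + " ".join(bricks)

# 4 nodes x 3 bricks each = 12 bricks = 6 replica-2 pairs (a 6x2 volume).
cmd = volume_create_cmd(
    "dr-6x2", ["rhs-node-1", "rhs-node-2", "rhs-node-3", "rhs-node-4"], 3
)
print(cmd.split()[:6])
# ['gluster', 'volume', 'create', 'dr-6x2', 'replica', '2']
```

The exact node count and brick layout used in the original test are not recorded in this bug; the helper only illustrates the shape of the command.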