Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1099209

Summary: Invalid status on Data Center - Setting Data Center status to Non Responsive - Network error during communication with the Host
Product: [Retired] oVirt
Reporter: Sokratis <sokratis123k>
Component: ovirt-engine-core
Assignee: Liron Aravot <laravot>
Status: CLOSED NOTABUG
QA Contact: Pavel Stehlik <pstehlik>
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.4
CC: acathrow, amureini, bugs, gklein, iheim, laravot, sokratis123k, yeylon
Target Milestone: ---
Target Release: 3.4.2
Hardware: x86_64
OS: Linux
Whiteboard: storage
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-06-01 19:28:43 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments (flags: none):
- ovirt engine log
- ovirt-node-01 vdsm.log
- ovirt-node-02 vdsm.log

Description Sokratis 2014-05-19 18:24:29 UTC
Created attachment 897272 [details]
ovirt engine log

Description of problem:

On 19/05/2014 at 17:14 the Data Center switched to Non-Responsive, and every attempt to activate the Master Data Domain failed.

Version-Release number of selected component (if applicable): oVirt 3.4.1


How reproducible:


Steps to Reproduce:
1. Select Master Data Domain and try to Activate

Actual results:

Activation failed with the following error:

Network error during communication with the Host

engine.log is attached

Thank you,

Sokratis

Comment 1 Allon Mureinik 2014-05-20 07:21:48 UTC
Liron, can you please take a look?

Comment 2 Allon Mureinik 2014-05-20 07:22:26 UTC
Sokratis, can you also add VDSM's logs?

Comment 3 Sokratis 2014-05-20 08:08:03 UTC
There are 2 hosts connected at the moment. Do you need vdsm.log from both?

Comment 4 Sokratis 2014-05-20 08:25:42 UTC
Created attachment 897498 [details]
ovirt-node-01 vdsm.log

Comment 5 Sokratis 2014-05-20 08:26:06 UTC
Created attachment 897500 [details]
ovirt-node-02 vdsm.log

Comment 6 Sokratis 2014-05-20 08:26:45 UTC
I added vdsm.log from both hosts anyway. Let me know if you need anything else.

Thanks!

Comment 7 Sokratis 2014-05-21 07:45:55 UTC
After rebooting both oVirt nodes, the data center came up again and has been running normally for almost a day. I will keep monitoring this for a couple of days.

In the meantime, please let me know if you find anything in the logs that indicates a problem.

Thank you!

Comment 8 Allon Mureinik 2014-05-21 17:59:42 UTC
Liron, any insights?

Comment 9 Sokratis 2014-05-31 15:31:26 UTC
Any updates regarding this?

Comment 10 Liron Aravot 2014-06-01 13:48:04 UTC
Sokratis, the main issue here seems to be a storage access problem:

Thread-53::DEBUG::2014-05-19 17:14:38,989::task::1185::TaskManager.Task::(prepare) Task=`3e2832b1-451a-4f09-a33b-b5d1cd652491`::finished: {'4e6c401f-7ab7-4b16-99f5-e32cc564495e': {'code': 0, 'version': 0, 'acquired': True, 'delay': '0.000514588', 'lastCheck': '9.5', 'valid': True}, 'f742258e-ecf8-436f-9da9-b5cdb4c305ba': {'code': 0, 'version': 3, 'acquired': True, 'delay': '0.000332391', 'lastCheck': '223.1', 'valid': True}}

Note the high "lastCheck" value for the second domain. Can you check your hosts' access to the storage?
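For illustration, the lastCheck values in that repoStats output can be screened with a short sketch. This is not part of vdsm; the 30-second staleness threshold and the `stale_domains` helper are assumptions chosen for this example, not vdsm's configured monitoring interval.

```python
# Flag storage domains whose repoStats "lastCheck" (seconds since the last
# successful domain check) exceeds a threshold, using the values from the
# log line above. The 30-second threshold is an illustrative assumption.
repo_stats = {
    '4e6c401f-7ab7-4b16-99f5-e32cc564495e': {
        'code': 0, 'version': 0, 'acquired': True,
        'delay': '0.000514588', 'lastCheck': '9.5', 'valid': True},
    'f742258e-ecf8-436f-9da9-b5cdb4c305ba': {
        'code': 0, 'version': 3, 'acquired': True,
        'delay': '0.000332391', 'lastCheck': '223.1', 'valid': True},
}

def stale_domains(stats, threshold=30.0):
    """Return UUIDs of domains whose last check is older than threshold seconds."""
    return [uuid for uuid, s in stats.items()
            if float(s['lastCheck']) > threshold]

print(stale_domains(repo_stats))
# The second domain (lastCheck 223.1s) is flagged; the first (9.5s) is not.
```

A lastCheck of 223 seconds means the monitoring thread had not completed a check of that domain for nearly four minutes, which is consistent with the storage-access problem described in this comment.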

Comment 11 Sokratis 2014-06-01 17:19:12 UTC
What I actually noticed was very high load on the oVirt nodes at the time (>300), which is why I rebooted them. Since then (10 days ago) I haven't had any problems with host access to the storage.

It could be a stuck process or just a random problem with storage. I don't think we can investigate this further. I suggest you close it and if the problem occurs again I can just open a new request.

Thanks for your help!

Sokratis

Comment 12 Allon Mureinik 2014-06-01 19:28:43 UTC
(In reply to Sokratis from comment #11)
> I suggest you close it and if the
> problem occurs again I can just open a new request.
Closing based on this comment.

Sokratis, please don't hesitate to reopen this issue if the problem reoccurs.