Bug 1130114 - Vdsm's monitorDomain ERROR ("Storage domain does not exist") floods the logs
Summary: Vdsm's monitorDomain ERROR ("Storage domain does not exist") floods the logs
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: oVirt
Classification: Retired
Component: ovirt-engine-core
Version: 3.5
Hardware: Unspecified
OS: Unspecified
unspecified
low
Target Milestone: ---
: 3.5.0
Assignee: Nir Soffer
QA Contact: Aharon Canan
URL:
Whiteboard: storage
Depends On:
Blocks:
Reported: 2014-08-14 11:20 UTC by Ori Gofen
Modified: 2016-02-10 16:47 UTC
8 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-08-14 17:21:24 UTC
oVirt Team: Storage


Attachments
vdsm+engine logs (1.00 MB, application/gzip)
2014-08-14 11:20 UTC, Ori Gofen

Description Ori Gofen 2014-08-14 11:20:54 UTC
Created attachment 926742
vdsm+engine logs

Description of problem:

This bug is related to the problem described in BZ #1101009, though the following scenario causes vdsm to loop through _findDomain tracebacks and errors.

Thread-24::ERROR::2014-08-14 13:44:35,456::sdc::143::Storage.StorageDomainCache::(_findDomain) domain 52d8ebbf-6a2e-4968-9a0a-11f46ddbc612 not found
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sdc.py", line 141, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 171, in _findUnfetchedDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: ('52d8ebbf-6a2e-4968-9a0a-11f46ddbc612',)
Thread-14::DEBUG::2014-08-14 13:44:35,456::__init__::225::IOProcess::(_processLogs) Queuing request...
Thread-24::ERROR::2014-08-14 13:44:35,457::domainMonitor::239::Storage.DomainMonitorThread::(_monitorDomain) Error while collecting domain 52d8ebbf-6a2e-4968-9a0a-11f46ddbc612 monitoring information
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/domainMonitor.py", line 204, in _monitorDomain
    self.domain = sdCache.produce(self.sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 98, in produce
    domain.getRealDomain()
  File "/usr/share/vdsm/storage/sdc.py", line 52, in getRealDomain
    return self._cache._realProduce(self._sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 122, in _realProduce
    domain = self._findDomain(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 141, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 171, in _findUnfetchedDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: ('52d8ebbf-6a2e-4968-9a0a-11f46ddbc612',)

rather than throwing the error only once (as described in BZ #1101009).
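To gauge how badly the log is flooded, one could count the repeated `_findDomain` "not found" errors per storage domain UUID. The following is a minimal Python sketch, assuming vdsm.log lines follow the format shown in the excerpt above; the script, the `count_missing_domains` function and the log path in the usage comment are hypothetical and not part of vdsm.

```python
import re
import sys
from collections import Counter

# Hypothetical helper, not part of vdsm. Assumes the log line format from the
# excerpt above, e.g.
# "Thread-24::ERROR::...::Storage.StorageDomainCache::(_findDomain) domain <uuid> not found"
NOT_FOUND = re.compile(
    r"::ERROR::.*Storage\.StorageDomainCache::\(_findDomain\) "
    r"domain ([0-9a-f-]{36}) not found"
)


def count_missing_domains(lines):
    """Return a Counter mapping storage domain UUID -> number of 'not found' errors."""
    counts = Counter()
    for line in lines:
        match = NOT_FOUND.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (log location is an assumption): python count_missing.py /var/log/vdsm/vdsm.log
    with open(sys.argv[1]) as log:
        for sd_uuid, n in count_missing_domains(log).most_common():
            print(f"{sd_uuid}: {n}")
```

Only the StorageDomainCache "not found" line is counted, so each monitoring cycle contributes one hit even though it also logs a second DomainMonitorThread traceback. A large count for a single UUID points at the repeated errors reported here rather than a one-off error.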

Setup:
Two initialized setups (host_1 on setup_1, host_2 on setup_2).

Steps to Reproduce 1:
1. Add host_1 to setup_2 (do not remove it from setup_1).
2. Remove host_1 from setup_2.
3. Reinstall host_1 on setup_1.
Actual results:
The vdsm log gets flooded with errors and tracebacks.

Steps to Reproduce 2:
1. Add a storage domain.

Actual results:
The vdsm log shows several errors and a traceback.

Version-Release number of selected component (if applicable):
rc1

How reproducible:
100%

Expected results:
vdsm should know how to handle this situation.

Additional info:

***** note ******
host_1 also contained an NFS server; the engine throws several errors during the "Steps to Reproduce 1" flow:

2014-08-14 14:00:38,877 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (org.ovirt.thread.pool-8-thread-25) [2ea93c87] Command ConnectStorageServerVDSCommand(HostName = vdsb, HostId = a437569e-70a7-444e-99b9-f5e7b4b43bce, storagePoolId = 00000000-0000-0000-0000-000000000000, storageType = NFS, connectionList = [{ id: 6684d658-23f5-49a0-81f7-94eaeebcbc5a, connection: 10.35.102.78:/nfsshare, iqn: null, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };]) execution failed. Exception: VDSNetworkException: java.net.SocketException: Socket closed

I didn't consider this to be a bug; it is here just for general knowledge when going over the engine's log.

Comment 1 Nir Soffer 2014-08-14 14:35:55 UTC
(In reply to Ori from comment #0)
> Created attachment 926742
> vdsm+engine logs
> 
> Description of problem:
> 

I don't see any description of a problem.

What is the problem?

> 
> Setup:
> Have two initialized setups(with host_1 on setup_1,host_2 on setup_2)

What are setup_1 and setup_2? We cannot use this info to reproduce anything.

> 
> Steps to Reproduce 1:
> 1.add host_1 to setup_2 (do not remove it from setup_1)

What do you mean by adding a host to another setup without removing it from the other?

> 2.remove host_1 from setup_2
> 3.reinstall host_1 on setup_1 
> Actual results:
> vdsm log get's flooded with errors and tracebacks

This flow is very unclear; I cannot reproduce it from this description.
You must give a much more detailed description.

Since we cannot do anything with this description, we will not handle this in
this bug. Please open another bug for this flow.

> 
> Steps to Reproduce 2:
> 1.add storage domain
> 
> Actual results:
> vdsm log throws several errors and a traceback

We will continue to handle this in this bug.

> 
> Version-Release number of selected component (if applicable):
> rc1
> 
> How reproducible:
> 100%
> 
> Expected results:
> vdsm should know to handle this situation

This does not mean anything.

Comment 2 Nir Soffer 2014-08-14 14:43:08 UTC
Ori, please create a clean vdsm log showing the errors when creating a storage domain.

Comment 3 Ori Gofen 2014-08-14 16:16:51 UTC
(In reply to Nir Soffer from comment #1)
> (In reply to Ori from comment #0)
> > Created attachment 926742
> > vdsm+engine logs
> > 
> > Description of problem:
> > 
> 
> I don't see any description of a problem.
> 
> What is the problem?

As I mentioned in the description, BZ #1101009 provides a lot of information about this bug. The main issue here is that on several occasions vdsm searches for and monitors domains that either have not been created yet or do not "belong" to it.

> > 
> > Setup:
> > Have two initialized setups(with host_1 on setup_1,host_2 on setup_2)
> 
> What are setup_1 and setup_2? we cannot use this info to reproduce anything.

setup_1 and setup_2 are two different setups!

> > 
> > Steps to Reproduce 1:
> > 1.add host_1 to setup_2 (do not remove it from setup_1)
>
> What do you mean by adding a host to another setup without removing it from
> the other?

When you have one setup, let's say "setup_1", it has a host connected to it; let's call that host "host_1". Then you add host_1 to a different setup, in our case "setup_2". Do not remove host_1 from setup_1 or move it to maintenance while doing so.

> > 2.remove host_1 from setup_2
> > 3.reinstall host_1 on setup_1 
> > Actual results:
> > vdsm log get's flooded with errors and tracebacks
> 
> This flow is very unclear - I cannot reproduce this according to this
> description.
> You must give much more detailed description.
> Since we cannot do anything with this description, we will not handle this
> in 
> this bug - please open another for this flow.

Detailed Steps to reproduce:

Before reproducing this bug, make sure you have:

2X machines (or VMs) with oVirt engine 3.5 rc1 installed on both
2X setups of oVirt (one on each machine); to set up oVirt, run engine-setup
2X initialized DCs (one on each setup); to initialize a DC you need to add a host and create a storage domain.

** make sure both hosts are "up" **


OK, now the steps:

1. Add one of the hosts (both of them are up, just to remind you) to the other setup. After the host has been added and is in the Up state, move to step 2.
2. Now you have 2 setups: one with 1 host (it is now non-responsive) and one with 2 hosts (both up). Remove the host we just added from the setup that has two hosts.
3. Now reinstall the non-responsive host on the first DC (DC = data center).

Hopefully this is clear enough.

> > 
> > Steps to Reproduce 2:
> > 1.add storage domain
> > 
> > Actual results:
> > vdsm log throws several errors and a traceback
> 
> We will continue to handle this in this bug.
> 
> > 
> > Version-Release number of selected component (if applicable):
> > rc1
> > 
> > How reproducible:
> > 100%
> > 
> > Expected results:
> > vdsm should know to handle this situation
> 
> This does not mean anything.

Haven't you read BZ #1101009?

> Ori, please create clean vdsm log showing the errors when creating a storage domain.

Haven't you read BZ #1101009?

Flow one is the bug! This bug is about those errors during vdsm monitoring.
We don't want to clean them; we want to solve them.

However, the attachments you are asking for can be found in... BZ #1101009.

Comment 4 Nir Soffer 2014-08-14 16:53:52 UTC
Allon, according to comment 3, it looks like Ori is using the system in a way which is not supported. I don't see any business value in this, and suggest to close this as WONTFIX.

Comment 5 Allon Mureinik 2014-08-14 17:21:24 UTC
(In reply to Nir Soffer from comment #4)
> Allon, according to comment 3, it looks like Ori is using the system in a
> way which is not supported. I don't see any business value in this, and
> suggest to close this as WONTFIX.
Agreed.

