Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1109156

Summary: [vdsm] Adding first NFS data domain failed couple of times before added successfully
Product: [Retired] oVirt
Component: vdsm
Reporter: Jiri Belka <jbelka>
Assignee: Federico Simoncelli <fsimonce>
QA Contact: Jiri Belka <jbelka>
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: unspecified
Version: 3.5
Target Release: 3.5.0
Keywords: Reopened
Hardware: Unspecified
OS: Unspecified
Whiteboard: storage
oVirt Team: Storage
Fixed In Version: ovirt-engine-3.5.0_beta1
Doc Type: Bug Fix
Type: Bug
Last Closed: 2014-10-17 12:24:25 UTC
CC: amureini, bazulay, bugs, derez, fsimonce, gklein, iheim, jbelka, mgoldboi, rbalakri, yeylon
Attachments:
sosreport-LogCollector-20140613131328.tar.xz (no flags)

Description Jiri Belka 2014-06-13 11:18:26 UTC
Created attachment 908492 [details]
sosreport-LogCollector-20140613131328.tar.xz

Description of problem:
Adding an NFS data domain failed a couple of times before it was added successfully, each time with the following error in the Admin Portal:

~~~
Operation Canceled
Error while executing action: A Request to the Server failed with the following Status Code: 500
~~~

FYI, this was the first data domain, and the DC was in status 'Uninitialized' (obviously).

~~~
Thread-13::DEBUG::2014-06-13 13:10:08,328::task::595::TaskManager.Task::(_updateState) Task=`2a1320ef-de70-4b07-9b13-8290361dd806`::moving from state preparing -> state finished
Thread-13::DEBUG::2014-06-13 13:10:08,328::resourceManager::940::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
Thread-13::DEBUG::2014-06-13 13:10:08,328::resourceManager::977::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-13::DEBUG::2014-06-13 13:10:08,328::task::993::TaskManager.Task::(_decref) Task=`2a1320ef-de70-4b07-9b13-8290361dd806`::ref 0 aborting False
Thread-13::DEBUG::2014-06-13 13:10:08,384::BindingXMLRPC::325::vds::(wrapper) client [10.34.60.239] flowID [2f4d8180]
Thread-13::DEBUG::2014-06-13 13:10:08,385::task::595::TaskManager.Task::(_updateState) Task=`bf8ce66d-40e5-40fa-9932-8837d479710b`::moving from state init -> state preparing
Thread-13::INFO::2014-06-13 13:10:08,385::logUtils::44::dispatcher::(wrapper) Run and protect: getSpmStatus(spUUID='00000002-0002-0002-0002-0000000002b0', options=None)
Thread-13::ERROR::2014-06-13 13:10:08,385::task::866::TaskManager.Task::(_setError) Task=`bf8ce66d-40e5-40fa-9932-8837d479710b`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 607, in getSpmStatus
    pool = self.getPool(spUUID)
  File "/usr/share/vdsm/storage/hsm.py", line 325, in getPool
    raise se.StoragePoolUnknown(spUUID)
StoragePoolUnknown: Unknown pool id, pool not connected: ('00000002-0002-0002-0002-0000000002b0',)
~~~
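The traceback shows getSpmStatus failing because the host's HSM has no record of the requested pool UUID: getPool looks the UUID up in its map of connected pools and raises StoragePoolUnknown when it is absent, which the engine surfaces as the HTTP 500 above. A minimal illustrative sketch of that lookup pattern (not the actual vdsm code; the class and dictionary here are simplified assumptions):

```python
# Sketch of the failure mode from the traceback: querying SPM status for a
# storage pool UUID that this host has not (yet) connected to raises
# StoragePoolUnknown. Names are illustrative, not real vdsm internals.

class StoragePoolUnknown(Exception):
    """Raised when a pool UUID is not in the host's connected-pool map."""

class HSM:
    def __init__(self):
        # spUUID -> pool state; populated only after connectStoragePool.
        self.pools = {}

    def getPool(self, spUUID):
        if spUUID not in self.pools:
            raise StoragePoolUnknown(spUUID)
        return self.pools[spUUID]

    def getSpmStatus(self, spUUID):
        pool = self.getPool(spUUID)
        return {"spmStatus": pool["spm_status"]}

if __name__ == "__main__":
    hsm = HSM()
    sp_uuid = "00000002-0002-0002-0002-0000000002b0"
    try:
        # Engine asks for SPM status before the pool exists on this host.
        hsm.getSpmStatus(sp_uuid)
    except StoragePoolUnknown as e:
        print("Unknown pool id, pool not connected:", e.args)

    # Once the pool is connected, the same call succeeds.
    hsm.pools[sp_uuid] = {"spm_status": "SPM"}
    print(hsm.getSpmStatus(sp_uuid))
```

This matches the reported symptom: repeated attempts fail until the pool connection is actually established, after which the same request succeeds.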

I manually checked and cleaned the NFS share before every try...

Please investigate what's going on...

Version-Release number of selected component (if applicable):
vdsm-4.15.0-78.git349f848.el6.x86_64

How reproducible:
???

Steps to Reproduce:
1. Add the first NFS data domain to the DC.

Actual results:
Fails several times, then the domain is eventually added successfully.

Expected results:
work out of the box

Additional info:
My DC is at compatibility version 3.4, and the cluster is at version 3.4 as well.

Comment 1 Allon Mureinik 2014-06-14 10:50:35 UTC
Fede, could this be related to your refactoring in the pool metadata handling? Please take a look.

Comment 2 Daniel Erez 2014-06-18 14:54:43 UTC
According to the described scenario, it looks like a duplicate of bug 1109156 (should probably be included in 3.5 alpha-3 build).

Comment 3 Federico Simoncelli 2014-07-14 10:59:53 UTC
I haven't been able to reproduce this (never noticed in my environment) and I couldn't find any vdsm log attached.

Can you please try to reproduce and ping me to debug the issue together?
Thanks.

Comment 4 Daniel Erez 2014-07-14 11:28:26 UTC
(In reply to Daniel Erez from comment #2)
> According to the described scenario, it looks like a duplicate of bug
> 1109156 (should probably be included in 3.5 alpha-3 build).

Sorry, meant duplicate of bug 1107945.

Comment 5 Federico Simoncelli 2014-07-14 11:31:28 UTC
(In reply to Daniel Erez from comment #4)
> (In reply to Daniel Erez from comment #2)
> > According to the described scenario, it looks like a duplicate of bug
> > 1109156 (should probably be included in 3.5 alpha-3 build).
> 
> Sorry, meant duplicate of bug 1107945.

Thanks. Tracking bug 1107945 state in order to verify this scenario as well.

Comment 6 Jiri Belka 2014-07-24 11:49:14 UTC
I can't reproduce with ovirt-3.5.0-beta1.1.

Comment 7 Jiri Belka 2014-07-24 11:51:25 UTC
#6 is valid anyway ;)

Comment 8 Sandro Bonazzola 2014-10-17 12:24:25 UTC
oVirt 3.5 has been released and should include the fix for this issue.