Bug 1223925 - After forcing an NFS export removal, VDSM is still looking for it
Summary: After forcing an NFS export removal, VDSM is still looking for it
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: oVirt
Classification: Retired
Component: vdsm
Version: 3.5
Hardware: x86_64
OS: Linux
Priority: high
Severity: medium
Target Milestone: m1
Target Release: 3.6.0
Assignee: Liron Aravot
QA Contact: Aharon Canan
URL:
Whiteboard: storage
Depends On:
Blocks:
 
Reported: 2015-05-21 17:47 UTC by Mario Ohnewald
Modified: 2016-03-10 06:13 UTC (History)
15 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-06-04 12:56:20 UTC
oVirt Team: Storage
Embargoed:


Attachments
VDSM is looking for storage domain 036b5575-51fa-4f14-8b05-890d7807894c (1.15 KB, text/plain)
2015-05-21 17:47 UTC, Mario Ohnewald
no flags Details
DataCenter - Screenshot 1 (43.98 KB, image/png)
2015-05-21 17:48 UTC, Mario Ohnewald
no flags Details
DataCenter - Screenshot 2 (42.05 KB, image/png)
2015-05-21 17:49 UTC, Mario Ohnewald
no flags Details
Storage - Screenshot (57.78 KB, image/png)
2015-05-21 17:50 UTC, Mario Ohnewald
no flags Details
oVirt Architecture (79.70 KB, image/png)
2015-05-21 17:57 UTC, Mario Ohnewald
no flags Details
engine.log (127.76 KB, text/plain)
2015-05-21 18:01 UTC, Mario Ohnewald
no flags Details
node01 vdsm.log (77.94 KB, text/plain)
2015-05-21 18:01 UTC, Mario Ohnewald
no flags Details
node02 vdsm.log (48.67 KB, text/plain)
2015-05-21 18:03 UTC, Mario Ohnewald
no flags Details

Description Mario Ohnewald 2015-05-21 17:47:16 UTC
Created attachment 1028305 [details]
VDSM is looking for storage domain 036b5575-51fa-4f14-8b05-890d7807894c

Description of problem: After forcing an NFS export removal, VDSM is still looking for it, won't find it, and declares my DataCenter, Cluster and Storage as being in an Unknown state.


Version-Release number of selected component (if applicable):

# rpm -qa | grep -e ovirt -e vdsm (VDSM Node: CentOS 6.6)
ovirt-log-collector-3.4.1-1.el6.noarch
vdsm-python-4.14.6-0.el6.x86_64
vdsm-cli-4.14.6-0.el6.noarch
ovirt-release-11.2.0-1.noarch
ovirt-engine-sdk-python-3.5.1.0-1.el6.noarch
vdsm-python-zombiereaper-4.14.6-0.el6.noarch
vdsm-xmlrpc-4.14.6-0.el6.noarch
vdsm-4.14.6-0.el6.x86_64
ovirt-engine-lib-3.4.0-1.el6.noarch
vdsm-gluster-4.14.6-0.el6.noarch

# rpm -qa | grep -e ovirt -e vdsm  (Engine: CentOS 6.6)
ovirt-engine-sdk-python-3.5.2.1-1.el6.noarch
ovirt-engine-websocket-proxy-3.5.2.1-1.el6.noarch
ovirt-engine-tools-3.5.2.1-1.el6.noarch
ovirt-engine-setup-plugin-ovirt-engine-3.5.2.1-1.el6.noarch
ovirt-engine-extensions-api-impl-3.5.2.1-1.el6.noarch
ovirt-image-uploader-3.5.1-1.el6.noarch
ovirt-release35-003-1.noarch
ovirt-engine-setup-plugin-ovirt-engine-common-3.5.2.1-1.el6.noarch
ovirt-engine-backend-3.5.2.1-1.el6.noarch
ovirt-engine-cli-3.5.0.5-1.el6.noarch
ovirt-engine-lib-3.5.2.1-1.el6.noarch
ovirt-engine-setup-base-3.5.2.1-1.el6.noarch
ovirt-engine-setup-3.5.2.1-1.el6.noarch
ovirt-iso-uploader-3.5.2-1.el6.noarch
ovirt-engine-userportal-3.5.2.1-1.el6.noarch
ovirt-engine-3.5.2.1-1.el6.noarch
ovirt-host-deploy-1.3.1-1.el6.noarch
ovirt-engine-setup-plugin-websocket-proxy-3.5.2.1-1.el6.noarch
ovirt-engine-webadmin-portal-3.5.2.1-1.el6.noarch
ovirt-host-deploy-java-1.3.1-1.el6.noarch
vdsm-jsonrpc-java-1.0.15-1.el6.noarch
ovirt-log-collector-3.5.2-1.el6.noarch
ovirt-engine-dbscripts-3.5.2.1-1.el6.noarch
ovirt-engine-jboss-as-7.1.1-1.el6.x86_64
ovirt-engine-restapi-3.5.2.1-1.el6.noarch



How reproducible: Don't know


Additional info:
° See attached file
° http://lists.ovirt.org/pipermail/users/2015-May/032889.html
° http://lists.ovirt.org/pipermail/users/2015-May/032921.html => nevertheless: why doesn't the gluster storage come up if the storage domain 036b5575-51fa-4f14-8b05-890d7807894c fails? As it is, the DC is broken and I cannot export or move the machines away...

- engine.log and the vdsm.log files will follow shortly

Comment 1 Mario Ohnewald 2015-05-21 17:48:41 UTC
Created attachment 1028306 [details]
DataCenter - Screenshot 1

Comment 2 Mario Ohnewald 2015-05-21 17:49:14 UTC
Created attachment 1028307 [details]
DataCenter - Screenshot 2

DataCenter - Screenshot 2

Comment 3 Mario Ohnewald 2015-05-21 17:50:06 UTC
Created attachment 1028309 [details]
Storage - Screenshot

Comment 4 Mario Ohnewald 2015-05-21 17:57:34 UTC
Created attachment 1028332 [details]
oVirt Architecture

Green:
-----------
The oVirt Engine talks to the nodes over the green SSH link. The NFS and ISO traffic runs over this link, too.

Red (not relevant for this bug):
-----------
WAN Uplink to the Internet

Blue (not relevant for this bug):
-----------
Database VLAN for our Database Servers

Grey (not relevant for this bug):
-----------
iLO Management for Fencing

Comment 5 Mario Ohnewald 2015-05-21 18:01:04 UTC
Created attachment 1028333 [details]
engine.log

Comment 6 Mario Ohnewald 2015-05-21 18:01:39 UTC
Created attachment 1028334 [details]
node01 vdsm.log

Comment 7 Mario Ohnewald 2015-05-21 18:03:23 UTC
Created attachment 1028335 [details]
node02 vdsm.log

Comment 8 Mario Ohnewald 2015-05-21 18:06:32 UTC
Sorry, I forgot:

- node01 was already put into maintenance mode, rebooted and activated again.
- vdsm was also restarted a couple of times.

The problem still exists. Node01 (and Node02) are still looking for that zombie storage domain.

Comment 9 Elad 2015-05-25 12:25:18 UTC
Hi Mario, 

I'm trying to understand the exact scenario and I would like to know the following:

1) With which oVirt version did you encounter the bug? The bug is reported against 3.5, but in the description you mentioned that your host has vdsm-4.14.6-0.el6.x86_64 installed, which is a 3.4 VDSM build.

2) The scenario here seems a bit confusing. In order to remove an NFS export domain with format=true (this also applies to data and ISO domains), the domain has to be detached from the DC first. This means the domain is no longer attached to the storage pool (DC) before you try to remove it; therefore, the domain is not part of any DC while it is being removed.
Can you please specify the steps to reproduce? 


Thanks
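
For reference, a minimal sketch of that detach-then-remove flow against the oVirt 3.5 REST API. The engine URL, credentials, and the <DC_ID>/<SD_ID>/host placeholders below are assumptions to adapt, and the endpoints should be verified against your own engine's /ovirt-engine/api before use:

--------------------
Deactivate the export domain (puts it into Maintenance):
# curl -k -u admin@internal:PASSWORD -H "Content-Type: application/xml" -X POST -d "<action/>" https://engine.example.com/ovirt-engine/api/datacenters/<DC_ID>/storagedomains/<SD_ID>/deactivate

Detach it from the data center:
# curl -k -u admin@internal:PASSWORD -X DELETE https://engine.example.com/ovirt-engine/api/datacenters/<DC_ID>/storagedomains/<SD_ID>

Only then remove it, letting a host format the backing storage:
# curl -k -u admin@internal:PASSWORD -H "Content-Type: application/xml" -X DELETE -d "<storage_domain><host><name>node01</name></host><format>true</format></storage_domain>" https://engine.example.com/ovirt-engine/api/storagedomains/<SD_ID>
--------------------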

Comment 10 Mario Ohnewald 2015-05-28 15:30:47 UTC
1.) I don't know exactly what you mean. The "About" screen of my oVirt Engine shows: "oVirt Engine Version: 3.5.2.1-1.el6". For the rest, see the rpm -qa output above.

2.) It happened after I did an oVirt Engine update (I was on 3.5.1). After a successful update I rebooted the oVirt Engine. After it came up I had problems with my NFS exports, so I removed (forced) them. Afterwards I found out that the VDSM nodes were missing a route => therefore the NFS mount simply could not work.

Anyway, the NFS export had already been removed on the engine by then. That's when the problem really started.


Is this information of any help?
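
As an aside, the missing-route symptom described above can be checked from a VDSM node with standard tools; a generic sketch (the NFS server address is a placeholder, not taken from the original report):

--------------------
# ip route show
# ping -c 3 <NFS_SERVER_IP>
# showmount -e <NFS_SERVER_IP>
--------------------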

Comment 11 Elad 2015-05-31 08:22:06 UTC
Mario, 
Thanks for the input.

VDSM is still looking for the domain because you force-removed it: the domain was removed from the engine DB but not from the storage.

Liron - Is there anything he can do in order to make VDSM stop looking for the domain?

Mario, can you please check the current master storage domain availability? Is it reachable from VDSM?

Thanks
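
One way to check that reachability directly on a VDSM node is with vdsClient (shipped in the vdsm-cli package listed above) and the mount table; a generic sketch, with the master domain UUID as a placeholder:

--------------------
# vdsClient -s 0 getConnectedStoragePoolsList
# vdsClient -s 0 getStorageDomainsList
# vdsClient -s 0 getStorageDomainInfo <MASTER_SD_UUID>
# mount | grep rhev
--------------------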

Comment 12 Mario Ohnewald 2015-05-31 20:15:13 UTC
My master storage domain is on glusterfs and is in "Unknown" state, but it is reachable from the VDSM hosts.

My glusterfs storage domain works:

--------------------
d444b009-8aa3-4785-bd06-38bad0ce22c0::DEBUG::2015-05-31 21:01:14,097::persistentDict::299::Storage.PersistentDict::(flush) about to write lines (FileMetadataRW)=['CLASS=Data', 'DESCRIPTION=RaidVolBGluster', 'IOOPTIMEOUTSEC=10', 'LEASERETRIES=3', 'LEASETIMESEC=60', 'LOCKPOLICY=', 'LOCKRENEWALINTERVALSEC=5', 'MASTER_VERSION=1', 'POOL_DESCRIPTION=HP_Proliant_DL180G6', 'POOL_DOMAINS=6d882c77-cdbc-48ef-ae21-1a6d45e7f8a1:Active,036b5575-51fa-4f14-8b05-890d7807894c:Active,abc51e26-7175-4b38-b3a8-95c6928fbc2b:Active,23602741-2967-4fad-a749-d58e1459d5c8:Attached', 'POOL_SPM_ID=1', 'POOL_SPM_LVER=0', 'POOL_UUID=b384b3da-02a6-44f3-a3f6-56751ce8c26d', 'REMOTE_PATH=127.0.0.1:/RaidVolB', 'ROLE=Master', 'SDUUID=abc51e26-7175-4b38-b3a8-95c6928fbc2b', 'TYPE=GLUSTERFS', 'VERSION=3', '_SHA_CKSUM=844e8802cbb42703b444c6768aed00fdf84a071e']
d444b009-8aa3-4785-bd06-38bad0ce22c0::DEBUG::2015-05-31 21:01:14,104::persistentDict::175::Storage.PersistentDict::(transaction) Finished transaction
--------------------

Live migration between the nodes also works, so I think my master storage domain is okay. My glusterfs logs also look good and replication works.
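
The POOL_DOMAINS line above is the interesting part: the force-removed domain 036b5575-51fa-4f14-8b05-890d7807894c is still listed as Active in the master domain's pool metadata. A quick way to confirm that on a node (the paths below assume the usual /rhev/data-center layout and the master SDUUID from the log excerpt; adjust as needed):

--------------------
# grep -c 036b5575-51fa-4f14-8b05-890d7807894c /var/log/vdsm/vdsm.log
# grep POOL_DOMAINS /rhev/data-center/mnt/glusterSD/*/abc51e26-7175-4b38-b3a8-95c6928fbc2b/dom_md/metadata
--------------------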

Comment 13 Tal Nisan 2015-06-04 12:56:20 UTC
Seems like this is not a bug: a domain was forcefully removed and VDSM was then still looking for it; once it stopped, the environment came back to life.

