Created attachment 1028305 [details]
VDSM is looking for storage domain 036b5575-51fa-4f14-8b05-890d7807894c

Description of problem:
After forcing the removal of an NFS export, VDSM is still looking for it, won't find it, and declares my DataCenter, Cluster, and Storage as being in an Unknown state.

Version-Release number of selected component (if applicable):

# rpm -qa | grep -e ovirt -e vdsm   (VDSM node: CentOS 6.6)
ovirt-log-collector-3.4.1-1.el6.noarch
vdsm-python-4.14.6-0.el6.x86_64
vdsm-cli-4.14.6-0.el6.noarch
ovirt-release-11.2.0-1.noarch
ovirt-engine-sdk-python-3.5.1.0-1.el6.noarch
vdsm-python-zombiereaper-4.14.6-0.el6.noarch
vdsm-xmlrpc-4.14.6-0.el6.noarch
vdsm-4.14.6-0.el6.x86_64
ovirt-engine-lib-3.4.0-1.el6.noarch
vdsm-gluster-4.14.6-0.el6.noarch

# rpm -qa | grep -e ovirt -e vdsm   (Engine: CentOS 6.6)
ovirt-engine-sdk-python-3.5.2.1-1.el6.noarch
ovirt-engine-websocket-proxy-3.5.2.1-1.el6.noarch
ovirt-engine-tools-3.5.2.1-1.el6.noarch
ovirt-engine-setup-plugin-ovirt-engine-3.5.2.1-1.el6.noarch
ovirt-engine-extensions-api-impl-3.5.2.1-1.el6.noarch
ovirt-image-uploader-3.5.1-1.el6.noarch
ovirt-release35-003-1.noarch
ovirt-engine-setup-plugin-ovirt-engine-common-3.5.2.1-1.el6.noarch
ovirt-engine-backend-3.5.2.1-1.el6.noarch
ovirt-engine-cli-3.5.0.5-1.el6.noarch
ovirt-engine-lib-3.5.2.1-1.el6.noarch
ovirt-engine-setup-base-3.5.2.1-1.el6.noarch
ovirt-engine-setup-3.5.2.1-1.el6.noarch
ovirt-iso-uploader-3.5.2-1.el6.noarch
ovirt-engine-userportal-3.5.2.1-1.el6.noarch
ovirt-engine-3.5.2.1-1.el6.noarch
ovirt-host-deploy-1.3.1-1.el6.noarch
ovirt-engine-setup-plugin-websocket-proxy-3.5.2.1-1.el6.noarch
ovirt-engine-webadmin-portal-3.5.2.1-1.el6.noarch
ovirt-host-deploy-java-1.3.1-1.el6.noarch
vdsm-jsonrpc-java-1.0.15-1.el6.noarch
ovirt-log-collector-3.5.2-1.el6.noarch
ovirt-engine-dbscripts-3.5.2.1-1.el6.noarch
ovirt-engine-jboss-as-7.1.1-1.el6.x86_64
ovirt-engine-restapi-3.5.2.1-1.el6.noarch

How reproducible:
Don't know.

Additional info:
° See file attached
° http://lists.ovirt.org/pipermail/users/2015-May/032889.html
° http://lists.ovirt.org/pipermail/users/2015-May/032921.html

=> Nevertheless: why doesn't the gluster storage come up if the storage domain 036b5575-51fa-4f14-8b05-890d7807894c fails? Like this, the DC is broken and I cannot export or move the machines away...

- engine.log and vdsm.logs will follow shortly
Created attachment 1028306 [details] DataCenter - Screenshot 1
Created attachment 1028307 [details]
DataCenter - Screenshot 2
Created attachment 1028309 [details] Storage - Screenshot
Created attachment 1028332 [details]
oVirt Architecture

Green:
-----------
The oVirt Engine talks to the nodes via the green SSH line. The NFS and ISO traffic runs over this link, too.

Red (not relevant for this bug):
-----------
WAN uplink to the Internet

Blue (not relevant for this bug):
-----------
Database VLAN for our database servers

Grey (not relevant for this bug):
-----------
iLO management for fencing
Created attachment 1028333 [details] engine.log
Created attachment 1028334 [details] node01 vdsm.log
Created attachment 1028335 [details] node02 vdsm.log
Sorry, I forgot:
- node01 was already put into maintenance mode, rebooted, and activated again.
- VDSM was also restarted a couple of times.

The problem still exists. node01 (and node02) are still looking for that zombie storage domain.
Hi Mario,

I'm trying to understand the exact scenario and I would like to know the following:

1) With which oVirt version did you encounter the bug? The bug is reported for 3.5, but in the description you mention that your host has vdsm-4.14.6-0.el6.x86_64 installed, which is a 3.4 VDSM build.

2) The scenario here seems to be a bit confusing. In order to remove an NFS export domain with format=true (relevant also for data and ISO domains), the domain has to be detached from the DC first. This means the domain is not attached to the storage pool (DC) before you try to remove it, and therefore the domain is not part of any DC while it is being removed. Can you please specify the steps to reproduce?

Thanks
1) I don't know exactly what you mean. The "About" screen of my oVirt Engine shows: "oVirt Engine Version: 3.5.2.1-1.el6". For the rest, see rpm -qa above.

2) It happened after I did an oVirt Engine update (I was on 3.5.1). After a successful update I rebooted the oVirt Engine. After it came up, I had problems with my NFS exports, so I removed them (forced). Afterwards I found out that the VDSM nodes were missing a route, therefore the NFS mount simply could not work. Anyway, by then the NFS export had already been removed on the engine. That's when the problem really started.

Is this information any help?
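Since the root cause described above was a missing route on the nodes, a quick sanity check before re-adding an NFS export could look like the following. This is a minimal sketch, not a VDSM or oVirt tool: `check_route` is a hypothetical helper, and the example IP is a placeholder for your NFS server's address.

```shell
# Hypothetical helper: ask the kernel routing table whether the host has
# a route to the given address; 'ip route get' exits non-zero if not.
check_route() {
    if ip route get "$1" >/dev/null 2>&1; then
        echo "reachable"
    else
        echo "no route"
    fi
}

# Example usage (replace 192.0.2.10 with your NFS server's IP):
# check_route 192.0.2.10
# Once routed, 'showmount -e 192.0.2.10' can confirm the export list.
```

Had such a check been run before the forced removal, it would have shown that the mount failure was a routing problem rather than a storage problem.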
Mario,

Thanks for the input. VDSM is still looking for the domain because you force-removed it, so it was removed from the engine DB but not from the storage.

Liron - Is there anything he can do to make VDSM stop looking for the domain?

Mario, can you please check the availability of the current master storage domain? Is it reachable from VDSM?

Thanks
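One way to confirm whether VDSM is still chasing the removed domain is to look for its UUID in vdsm.log. A minimal sketch, assuming the default VDSM log location; `check_sd_refs` is a hypothetical helper, not a VDSM command:

```shell
# Hypothetical helper: report whether a storage domain UUID still shows
# up in a given vdsm.log file.
check_sd_refs() {
    sd_uuid="$1"
    log="$2"
    if grep -q "$sd_uuid" "$log"; then
        echo "still referenced"
    else
        echo "not referenced"
    fi
}

# Example with the UUID from this report and the default log path:
# check_sd_refs 036b5575-51fa-4f14-8b05-890d7807894c /var/log/vdsm/vdsm.log
```

Running this after restarting VDSM would show whether the references keep reappearing (i.e. the domain is still recorded somewhere on the storage side) or were only historical log entries.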
My master storage domain is glusterfs and in "Unknown" state, but it is reachable from the VDSMs. My glusterfs storage domain works:
--------------------
d444b009-8aa3-4785-bd06-38bad0ce22c0::DEBUG::2015-05-31 21:01:14,097::persistentDict::299::Storage.PersistentDict::(flush) about to write lines (FileMetadataRW)=['CLASS=Data', 'DESCRIPTION=RaidVolBGluster', 'IOOPTIMEOUTSEC=10', 'LEASERETRIES=3', 'LEASETIMESEC=60', 'LOCKPOLICY=', 'LOCKRENEWALINTERVALSEC=5', 'MASTER_VERSION=1', 'POOL_DESCRIPTION=HP_Proliant_DL180G6', 'POOL_DOMAINS=6d882c77-cdbc-48ef-ae21-1a6d45e7f8a1:Active,036b5575-51fa-4f14-8b05-890d7807894c:Active,abc51e26-7175-4b38-b3a8-95c6928fbc2b:Active,23602741-2967-4fad-a749-d58e1459d5c8:Attached', 'POOL_SPM_ID=1', 'POOL_SPM_LVER=0', 'POOL_UUID=b384b3da-02a6-44f3-a3f6-56751ce8c26d', 'REMOTE_PATH=127.0.0.1:/RaidVolB', 'ROLE=Master', 'SDUUID=abc51e26-7175-4b38-b3a8-95c6928fbc2b', 'TYPE=GLUSTERFS', 'VERSION=3', '_SHA_CKSUM=844e8802cbb42703b444c6768aed00fdf84a071e']
d444b009-8aa3-4785-bd06-38bad0ce22c0::DEBUG::2015-05-31 21:01:14,104::persistentDict::175::Storage.PersistentDict::(transaction) Finished transaction
--------------------
Live migration between the nodes also works. So I think my master storage domain is okay. My glusterfs logs also look good and replication works.
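Notably, the POOL_DOMAINS value in the persistentDict output above still lists the removed domain 036b5575-51fa-4f14-8b05-890d7807894c as Active, which would explain why VDSM keeps looking for it. A minimal sketch for turning that comma-separated value into readable "uuid state" pairs (`list_pool_domains` is a hypothetical helper using plain POSIX tools):

```shell
# Hypothetical helper: print one "uuid state" pair per line from a
# POOL_DOMAINS metadata value of the form uuid1:State1,uuid2:State2,...
list_pool_domains() {
    printf '%s\n' "$1" | tr ',' '\n' | sed 's/:/ /'
}

# Example with the value from the log excerpt above:
# list_pool_domains "6d882c77-cdbc-48ef-ae21-1a6d45e7f8a1:Active,036b5575-51fa-4f14-8b05-890d7807894c:Active,abc51e26-7175-4b38-b3a8-95c6928fbc2b:Active,23602741-2967-4fad-a749-d58e1459d5c8:Attached"
```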
This seems not to be a bug: a domain was forcefully removed and VDSM was then looking for it; once it stopped, the environment came back to life.