Bug 835920 - 3.1 - vdsm - beta1 PosixFS: after reconstruct, data-center is UP and storage is unknown (stuck)
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: vdsm
Version: 6.3
Hardware: x86_64 Linux
Priority: high
Severity: urgent
Target Milestone: beta
Target Release: ---
Assigned To: Laszlo Hornyak
QA Contact: Daniel Paikov
Whiteboard: storage
Duplicates: 814331 835949
Depends On:
Blocks:
Reported: 2012-06-27 10:34 EDT by Haim
Modified: 2013-09-30 19:25 EDT
Fixed In Version: vdsm-4.9.6-27.0
Doc Type: Bug Fix
Doc Text:
In an earlier version of Red Hat Enterprise Virtualization, reconstruction of data domains would sometimes fail when working with PosixFS (Gluster) and migrating data domains. After a reconstruct command was sent to VDSM, the storage domain could acquire an "unknown" status while the status of the data center remained "UP", even though reconstruct and spmStart both succeeded on VDSM. This was because VDSM reported the storage type as "SHAREDFS" instead of "POSIXFS". VDSM has now been updated and storage migration works as expected.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-12-04 14:01:29 EST
Type: Bug
Attachments
engine.log (120.12 KB, application/x-gzip)
2012-06-27 10:37 EDT, Haim

Description Haim 2012-06-27 10:34:07 EDT
Description of problem:

On PosixFS (using GlusterFS), after a reconstruct command is sent to vdsm, the storage domain goes to unknown while the data-center status is UP.
Both reconstruct and spmStart succeeded on vdsm.

There are no errors on the vdsm side (the host is SPM). I think the problem lies in the engine failing to change the storage status in the DB.

The following errors repeat in the engine log:

2012-06-27 20:34:11,595 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-86) [3fb85499] IrsBroker::Failed::GetStoragePoolInfoVDS due to: IRSErrorException: IRSErrorException: 
2012-06-27 20:34:21,680 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.GetStoragePoolInfoVDSCommand] (QuartzScheduler_Worker-47) [7cc11628] irsBroker::BuildStorageDynamicFromXmlRpcStruct::Failed building Storage dynamic, xmlRpcStruct = org.ovirt.engine.core.vdsbroker.xmlrpc.XmlRpcStruct@1b4351d6
2012-06-27 20:34:21,680 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.GetStoragePoolInfoVDSCommand] (QuartzScheduler_Worker-47) [7cc11628] org.ovirt.engine.core.vdsbroker.irsbroker.IRSErrorException: IRSErrorException:
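
To see how often each of these errors repeats, something like the following could be used. It is only a minimal sketch: the line layout (timestamp, level, bracketed logger class, thread, correlation id, message) is assumed from the three lines quoted above, and `count_errors` is a hypothetical helper, not part of the engine.

```python
import re
from collections import Counter

# Layout assumed from the log lines quoted in this report:
# <date> <time>,<ms> <LEVEL> [<logger class>] (<thread>) [<correlation id>] <message>
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) +"
    r"\[(?P<logger>[^\]]+)\] "
    r"\((?P<thread>[^)]+)\) "
    r"\[(?P<corr>[^\]]*)\] "
    r"(?P<msg>.*)$"
)


def count_errors(lines):
    """Count ERROR lines per short logger class name (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match and match.group("level") == "ERROR":
            # Keep only the class name, e.g. "GetStoragePoolInfoVDSCommand".
            counts[match.group("logger").rsplit(".", 1)[-1]] += 1
    return counts


# The three lines quoted in this report, used as sample input.
SAMPLE = [
    "2012-06-27 20:34:11,595 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-86) [3fb85499] IrsBroker::Failed::GetStoragePoolInfoVDS due to: IRSErrorException: IRSErrorException: ",
    "2012-06-27 20:34:21,680 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.GetStoragePoolInfoVDSCommand] (QuartzScheduler_Worker-47) [7cc11628] irsBroker::BuildStorageDynamicFromXmlRpcStruct::Failed building Storage dynamic, xmlRpcStruct = org.ovirt.engine.core.vdsbroker.xmlrpc.XmlRpcStruct@1b4351d6",
    "2012-06-27 20:34:21,680 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.GetStoragePoolInfoVDSCommand] (QuartzScheduler_Worker-47) [7cc11628] org.ovirt.engine.core.vdsbroker.irsbroker.IRSErrorException: IRSErrorException:",
]

if __name__ == "__main__":
    for name, n in count_errors(SAMPLE).most_common():
        print(name, n)
```

Since the attached engine.log is gzipped, the lines could presumably be read with `gzip.open(path, "rt")` and passed to the same function.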


Database capture of all related tables:

engine=# SELECT * from storage_domain_static;
                  id                  |               storage                |   storage_name   | storage_domain_type | storage_type | storage_domain_format_type |         _create_date          |         _update_date          | recoverable 
--------------------------------------+--------------------------------------+------------------+---------------------+--------------+----------------------------+-------------------------------+-------------------------------+-------------
 fdbeb420-5422-4047-b241-2254cb131e34 | 879b3711-765e-45ac-b8aa-c71612822522 | myglusterDomin2  |                   0 |            1 | 0                          | 2012-06-26 21:34:49.734133+03 | 2012-06-26 21:35:43.316215+03 | t
 faa99bc4-ecc5-4945-b0c4-3a8530709e04 | e5a46a66-f0e2-49a6-8699-e0c845465d8a | myglusterDomin1  |                   1 |            1 | 0                          | 2012-06-26 21:26:39.106975+03 | 2012-06-26 21:35:43.316215+03 | t
 748039c1-9e96-459c-809f-6590fe11a37b | 0dabafbc-fd45-47c4-b8ff-ec833880226e | myDom            |                   0 |            6 | 0                          | 2012-06-26 21:55:53.281407+03 | 2012-06-26 21:56:04.709448+03 | t
 e8c3bc49-3d28-433d-a215-63ff96fcbc97 | 2bd3367b-53f0-4fdb-9c3d-b472df8e64c7 | gluster2-volumes |                   0 |            1 | 0                          | 2012-06-27 16:15:47.882953+03 | 2012-06-27 16:15:51.892524+03 | t
(4 rows)

engine=# SELECT * from storage_domain_dynamic;
                  id                  | available_disk_size | used_disk_size 
--------------------------------------+---------------------+----------------
 e8c3bc49-3d28-433d-a215-63ff96fcbc97 |                  13 |              4
 fdbeb420-5422-4047-b241-2254cb131e34 |                  14 |              3
 faa99bc4-ecc5-4945-b0c4-3a8530709e04 |                  14 |              3
 748039c1-9e96-459c-809f-6590fe11a37b |                  14 |              3
(4 rows)

engine=# SELECT * from storage_pool;
                  id                  |   name   |       description       | storage_pool_type | storage_pool_format_type | status | master_domain_version |              spm_vds_id              | compatibility_version |         _create_date          |         _update_date          | quota_enforcement_type 
--------------------------------------+----------+-------------------------+-------------------+--------------------------+--------+-----------------------+--------------------------------------+-----------------------+-------------------------------+-------------------------------+------------------------
 75659836-bedc-11e1-ad25-001a4a16970e | Default  | The default Data Center |                 3 |                          |      0 |                     0 |                                      | 3.1                   | 2012-06-25 18:43:06.947764+03 | 2012-06-25 19:22:34.435044+03 |                      0
 880465a5-2db7-42c7-b567-16c2b1a074e0 | gluster  |                         |                 1 | 0                        |      4 |                     2 |                                      | 3.1                   | 2012-06-26 21:25:06.494617+03 | 2012-06-26 21:42:52.410258+03 |                      2
 b66cb5e6-1e47-4644-bcd2-fdd8d6b5f394 | kaka2    |                         |                 4 |                          |      0 |                     0 | 00000000-0000-0000-0000-000000000000 | 3.1                   | 2012-06-25 20:41:47.429586+03 |                               |                      0
 e7c3db96-290e-413a-b06f-78628230b4f1 | Gluster2 |                         |                 1 | 0                        |      1 |                     1 | 1d1d51f4-bee2-11e1-a36f-001a4a16970e | 3.1                   | 2012-06-27 16:15:08.456696+03 | 2012-06-27 16:16:38.055742+03 |                      0
 def8e8d2-9711-4adb-b86d-309051c7027a | PosixFS  |                         |                 6 | 0                        |      1 |                     1 | 0fc63d04-c072-11e1-9011-001a4a16970e | 3.1                   | 2012-06-26 21:43:09.057281+03 | 2012-06-27 19:35:54.730954+03 |                      0
(5 rows)
Comment 1 Haim 2012-06-27 10:37:36 EDT
Created attachment 594803 [details]
engine.log
Comment 2 mkublin 2012-06-28 04:58:17 EDT
After investigating with Haim, it looks like the problem is not in reconstruct. The problem is that vdsm reports storage type SHAREDFS instead of POSIXFS in getStoragePoolInfo. Moving to Ayal.
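
The kind of mismatch described here can be illustrated with a minimal sketch. It assumes the receiving side resolves the reported type name against a fixed set of known names; the `StorageType` enum, the `parse_storage_type` and `try_parse` helpers, and the `type` key are all hypothetical and do not reproduce the actual engine or vdsm code.

```python
from enum import Enum


class StorageType(Enum):
    # Hypothetical subset of the type names the receiving side accepts.
    NFS = "NFS"
    POSIXFS = "POSIXFS"


def parse_storage_type(pool_info):
    """Resolve the reported type name; raise if the name is not known."""
    name = pool_info["type"]  # assumed key, for illustration only
    try:
        return StorageType[name]
    except KeyError:
        raise ValueError(f"unknown storage type reported: {name!r}") from None


def try_parse(pool_info):
    """Return the resolved type name, or a short failure description."""
    try:
        return parse_storage_type(pool_info).name
    except ValueError as err:
        return f"failed: {err}"


if __name__ == "__main__":
    # A name outside the known set cannot be resolved ...
    print(try_parse({"type": "SHAREDFS"}))
    # ... while the expected name resolves normally.
    print(try_parse({"type": "POSIXFS"}))
```

Under these assumptions, a pool reported as SHAREDFS cannot be mapped to a known type, which would be consistent with the repeated "Failed building Storage dynamic" errors in the engine log.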
Comment 4 mkublin 2012-07-10 01:48:16 EDT
*** Bug 814331 has been marked as a duplicate of this bug. ***
Comment 5 mkublin 2012-07-10 01:49:30 EDT
*** Bug 835949 has been marked as a duplicate of this bug. ***
Comment 6 Laszlo Hornyak 2012-07-10 07:55:45 EDT
http://gerrit.ovirt.org/6103
Comment 7 Laszlo Hornyak 2012-07-13 02:04:34 EDT
I73b0d29cf39a45589d90335e88ae84c5744796e1
Comment 11 Haim 2012-08-12 11:58:47 EDT
Verified on si13.2 with vdsm 4.9-27. Managed to create a 2-host setup with 2 PosixFS data domains and migrate the master between both domains.
Comment 13 Laszlo Hornyak 2012-10-24 03:05:58 EDT
I think it is ok to include it in release notes.
Comment 14 Jacob Wyatt 2012-11-02 16:59:26 EDT
2 nodes, 1 engine

Glusterfs 3.3.1
vdsm 4.10  
ovirt-engine 3.1
Fedora 17

Successfully created a "brick" on each of 2 cluster nodes and then created a volume via the ovirt web interface.  I added that volume as the Data(Master) volume to the cluster and it initializes and starts but then the nodes constantly contend for SPM.  If I put one of the nodes in maintenance mode everything is fine.

I hope this is the right bug. I linked over from a duplicate that appeared to match my issue.  Thanks.
Comment 16 errata-xmlrpc 2012-12-04 14:01:29 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-1508.html
