Bug 835920 - 3.1 - vdsm - beta1 PosixFS: after reconstruct, data-center is UP and storage is unknown (stuck)
Summary: 3.1 - vdsm - beta1 PosixFS: after reconstruct, data-center is UP and storage is unknown (stuck)
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: vdsm
Version: 6.3
Hardware: x86_64
OS: Linux
Priority: high
Severity: urgent
Target Milestone: beta
Target Release: ---
Assignee: Laszlo Hornyak
QA Contact: Daniel Paikov
URL:
Whiteboard: storage
Keywords:
Duplicates: 814331, 835949
Depends On:
Blocks:
 
Reported: 2012-06-27 14:34 UTC by Haim
Modified: 2013-09-30 23:25 UTC (History)
CC List: 20 users

Doc Text:
In an earlier version of Red Hat Enterprise Virtualization, when working with PosixFS (Gluster) and migrating data domains, reconstruction of the data domains would sometimes fail: when reconstruct commands were sent to VDSM, the storage domain acquired an "unknown" status while the status of the data center remained "UP", even though reconstruct and spmStart both succeeded on VDSM. This was because VDSM was reporting the storage type as "SHAREDFS" instead of "POSIXFS". VDSM has now been updated, and storage migration now works as expected.
Clone Of:
Last Closed: 2012-12-04 19:01:29 UTC


Attachments
engine.log (120.12 KB, application/x-gzip)
2012-06-27 14:37 UTC, Haim
no flags


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2012:1508 normal SHIPPED_LIVE Important: rhev-3.1.0 vdsm security, bug fix, and enhancement update 2012-12-04 23:48:05 UTC

Description Haim 2012-06-27 14:34:07 UTC
Description of problem:

On PosixFS (using GlusterFS), after a reconstruct command is sent to vdsm, the storage domain goes to "unknown" while the data-center status stays UP.
Both reconstruct and spmStart succeeded on vdsm.

No errors on the vdsm side (host is SPM) - I think the problem lies in the fact that the engine failed to change the storage status in the DB...

The following error repeats in the engine logs:

2012-06-27 20:34:11,595 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-86) [3fb85499] IrsBroker::Failed::GetStoragePoolInfoVDS due to: IRSErrorException: IRSErrorException: 
2012-06-27 20:34:21,680 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.GetStoragePoolInfoVDSCommand] (QuartzScheduler_Worker-47) [7cc11628] irsBroker::BuildStorageDynamicFromXmlRpcStruct::Failed building Storage dynamic, xmlRpcStruct = org.ovirt.engine.core.vdsbroker.xmlrpc.XmlRpcStruct@1b4351d6
2012-06-27 20:34:21,680 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.GetStoragePoolInfoVDSCommand] (QuartzScheduler_Worker-47) [7cc11628] org.ovirt.engine.core.vdsbroker.irsbroker.IRSErrorException: IRSErrorException:
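
As a rough illustration of the suspicion above (that once this call fails the engine never updates the storage status in the DB), here is a minimal Python sketch. The class and function names are made up for illustration and do not come from the actual engine code, which is Java:

# Minimal sketch: the domain status row in the DB is only refreshed when the
# pool info from vdsm can be parsed; a parse failure leaves it untouched.

class FakeDb:
    """Stand-in for the engine's DB layer (hypothetical)."""
    def __init__(self):
        self.domain_status = {"e8c3bc49-3d28-433d-a215-63ff96fcbc97": "unknown"}

    def update_storage_domain_status(self, domain_id, status):
        self.domain_status[domain_id] = status

def refresh_pool_status(db, parse_pool_info, raw_info):
    try:
        info = parse_pool_info(raw_info)
    except Exception:
        # Corresponds to the IRSErrorException above: the updates below never
        # run, so the domain stays "unknown" while the data-center stays UP.
        return
    for domain_id, status in info["domains"].items():
        db.update_storage_domain_status(domain_id, status)

def failing_parse(raw_info):
    raise ValueError("cannot build storage dynamic from XML-RPC struct")

db = FakeDb()
refresh_pool_status(db, failing_parse, raw_info={})
print(db.domain_status)  # the domain is still 'unknown'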


Database capture of all related tables:

engine=# SELECT * from storage_domain_static;
                  id                  |               storage                |   storage_name   | storage_domain_type | storage_type | storage_domain_format_type |         _create_date          |         _update_date          | recoverable 
--------------------------------------+--------------------------------------+------------------+---------------------+--------------+----------------------------+-------------------------------+-------------------------------+-------------
 fdbeb420-5422-4047-b241-2254cb131e34 | 879b3711-765e-45ac-b8aa-c71612822522 | myglusterDomin2  |                   0 |            1 | 0                          | 2012-06-26 21:34:49.734133+03 | 2012-06-26 21:35:43.316215+03 | t
 faa99bc4-ecc5-4945-b0c4-3a8530709e04 | e5a46a66-f0e2-49a6-8699-e0c845465d8a | myglusterDomin1  |                   1 |            1 | 0                          | 2012-06-26 21:26:39.106975+03 | 2012-06-26 21:35:43.316215+03 | t
 748039c1-9e96-459c-809f-6590fe11a37b | 0dabafbc-fd45-47c4-b8ff-ec833880226e | myDom            |                   0 |            6 | 0                          | 2012-06-26 21:55:53.281407+03 | 2012-06-26 21:56:04.709448+03 | t
 e8c3bc49-3d28-433d-a215-63ff96fcbc97 | 2bd3367b-53f0-4fdb-9c3d-b472df8e64c7 | gluster2-volumes |                   0 |            1 | 0                          | 2012-06-27 16:15:47.882953+03 | 2012-06-27 16:15:51.892524+03 | t
(4 rows)

engine=# SELECT * from storage_domain_dynamic;
                  id                  | available_disk_size | used_disk_size 
--------------------------------------+---------------------+----------------
 e8c3bc49-3d28-433d-a215-63ff96fcbc97 |                  13 |              4
 fdbeb420-5422-4047-b241-2254cb131e34 |                  14 |              3
 faa99bc4-ecc5-4945-b0c4-3a8530709e04 |                  14 |              3
 748039c1-9e96-459c-809f-6590fe11a37b |                  14 |              3
(4 rows)

engine=# SELECT * from storage_pool;
                  id                  |   name   |       description       | storage_pool_type | storage_pool_format_type | status | master_domain_version |              spm_vds_id              | compatibility_version |         _create_date          |         _update_date          | quota_enforcement_type 
--------------------------------------+----------+-------------------------+-------------------+--------------------------+--------+-----------------------+--------------------------------------+-----------------------+-------------------------------+-------------------------------+------------------------
 75659836-bedc-11e1-ad25-001a4a16970e | Default  | The default Data Center |                 3 |                          |      0 |                     0 |                                      | 3.1                   | 2012-06-25 18:43:06.947764+03 | 2012-06-25 19:22:34.435044+03 |                      0
 880465a5-2db7-42c7-b567-16c2b1a074e0 | gluster  |                         |                 1 | 0                        |      4 |                     2 |                                      | 3.1                   | 2012-06-26 21:25:06.494617+03 | 2012-06-26 21:42:52.410258+03 |                      2
 b66cb5e6-1e47-4644-bcd2-fdd8d6b5f394 | kaka2    |                         |                 4 |                          |      0 |                     0 | 00000000-0000-0000-0000-000000000000 | 3.1                   | 2012-06-25 20:41:47.429586+03 |                               |                      0
 e7c3db96-290e-413a-b06f-78628230b4f1 | Gluster2 |                         |                 1 | 0                        |      1 |                     1 | 1d1d51f4-bee2-11e1-a36f-001a4a16970e | 3.1                   | 2012-06-27 16:15:08.456696+03 | 2012-06-27 16:16:38.055742+03 |                      0
 def8e8d2-9711-4adb-b86d-309051c7027a | PosixFS  |                         |                 6 | 0                        |      1 |                     1 | 0fc63d04-c072-11e1-9011-001a4a16970e | 3.1                   | 2012-06-26 21:43:09.057281+03 | 2012-06-27 19:35:54.730954+03 |                      0
(5 rows)

Comment 1 Haim 2012-06-27 14:37:36 UTC
Created attachment 594803 [details]
engine.log

Comment 2 mkublin 2012-06-28 08:58:17 UTC
After investigating with Haim, it looks like the problem is not in reconstruct; the problem is that vdsm reports storage type SHAREDFS instead of POSIXFS in getStoragePoolInfo. Moving to Ayal.
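
For context, a minimal sketch of that mismatch (illustrative only: the tables, names, and the constant 6, which matches the storage_pool_type of the PosixFS pool in the DB capture above, are assumptions, not the actual vdsm or engine code):

# Hypothetical vdsm-side table mapping the internal domain-type constant to
# the string reported in getStoragePoolInfo.
VDSM_TYPE_NAMES = {6: "SHAREDFS"}    # before the fix; "POSIXFS" afterwards

# Hypothetical set of type names the engine-side parser accepts.
ENGINE_ACCEPTED_NAMES = {"UNKNOWN", "NFS", "FCP", "ISCSI", "LOCALFS", "CIFS", "POSIXFS"}

def build_storage_dynamic(reported_type):
    # Mimics the BuildStorageDynamicFromXmlRpcStruct step that fails in the log.
    if reported_type not in ENGINE_ACCEPTED_NAMES:
        raise ValueError("unrecognized storage type: %s" % reported_type)
    return {"type": reported_type}

try:
    build_storage_dynamic(VDSM_TYPE_NAMES[6])
except ValueError as err:
    print(err)    # unrecognized storage type: SHAREDFS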

Comment 4 mkublin 2012-07-10 05:48:16 UTC
*** Bug 814331 has been marked as a duplicate of this bug. ***

Comment 5 mkublin 2012-07-10 05:49:30 UTC
*** Bug 835949 has been marked as a duplicate of this bug. ***

Comment 6 Laszlo Hornyak 2012-07-10 11:55:45 UTC
http://gerrit.ovirt.org/6103

Comment 7 Laszlo Hornyak 2012-07-13 06:04:34 UTC
I73b0d29cf39a45589d90335e88ae84c5744796e1

Comment 11 Haim 2012-08-12 15:58:47 UTC
Verified on si13.2 with vdsm 4.9-27. Managed to create a 2-host setup with 2 PosixFS data domains and migrate the master between the two domains.

Comment 13 Laszlo Hornyak 2012-10-24 07:05:58 UTC
I think it is ok to include it in release notes.

Comment 14 Jacob Wyatt 2012-11-02 20:59:26 UTC
2 nodes, 1 engine

Glusterfs 3.3.1
vdsm 4.10  
ovirt-engine 3.1
Fedora 17

Successfully created a "brick" on each of the 2 cluster nodes and then created a volume via the oVirt web interface. I added that volume as the Data (Master) domain to the cluster; it initializes and starts, but then the nodes constantly contend for SPM. If I put one of the nodes in maintenance mode, everything is fine.

I hope this is the right bug. I linked over from a duplicate that appeared to match my issue.  Thanks.

Comment 16 errata-xmlrpc 2012-12-04 19:01:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-1508.html

