Bug 1367602 - Gluster: oVirt breaks "distribute 2" storage
Summary: Gluster: oVirt breaks "distribute 2" storage
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: vdsm
Classification: oVirt
Component: Gluster
Version: 4.18.11
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: sankarshan
QA Contact: SATHEESARAN
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-08-16 23:11 UTC by Badalyan Vyacheslav
Modified: 2016-08-19 10:51 UTC
CC List: 2 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2016-08-19 10:51:38 UTC
oVirt Team: Gluster
Embargoed:
rule-engine: planning_ack?
rule-engine: devel_ack?
rule-engine: testing_ack?



Description Badalyan Vyacheslav 2016-08-16 23:11:17 UTC
Description of problem:

After about 1 week of operation, roughly 70% of the images look like this in the logs:
[2016-08-16 01:57:58.641508] I [MSGID: 109081] [dht-common.c:3946:dht_setxattr] 0-SSD-dht: fixing the layout of /d3824bfc-921c-4ab2-abc7-21ede51df915/images/3cc42ff5-d152-43ef-a1cc-d0df53ceecca
[2016-08-16 01:57:58.641524] I [MSGID: 109045] [dht-selfheal.c:1756:dht_fix_layout_of_directory] 0-SSD-dht: subvolume 0 (SSD-client-0): 457373 chunks
[2016-08-16 01:57:58.641533] I [MSGID: 109045] [dht-selfheal.c:1756:dht_fix_layout_of_directory] 0-SSD-dht: subvolume 1 (SSD-client-1): 457366 chunks
[2016-08-16 01:57:58.642157] I [MSGID: 109064] [dht-layout.c:824:dht_layout_dir_mismatch] 0-SSD-dht: subvol: SSD-client-0; inode layout - 0 - 2147500080 - 1; disk layout - 2147500081 - 4294967295 - 1
[2016-08-16 01:57:58.642171] I [MSGID: 109018] [dht-common.c:882:dht_revalidate_cbk] 0-SSD-dht: Mismatching layouts for /d3824bfc-921c-4ab2-abc7-21ede51df915/images/3cc42ff5-d152-43ef-a1cc-d0df53ceecca, gfid = 1f4b36ab-75c5-4ba4-b48f-bf156fdf254e
[2016-08-16 01:57:58.642182] I [MSGID: 109064] [dht-layout.c:824:dht_layout_dir_mismatch] 0-SSD-dht: subvol: SSD-client-1; inode layout - 2147500081 - 4294967295 - 1; disk layout - 0 - 2147500080 - 1
[2016-08-16 01:57:58.642205] I [MSGID: 109018] [dht-common.c:882:dht_revalidate_cbk] 0-SSD-dht: Mismatching layouts for /d3824bfc-921c-4ab2-abc7-21ede51df915/images/3cc42ff5-d152-43ef-a1cc-d0df53ceecca, gfid = 1f4b36ab-75c5-4ba4-b48f-bf156fdf254e
[2016-08-16 01:57:58.642435] I [dht-rebalance.c:2517:gf_defrag_process_dir] 0-SSD-dht: migrate data called on /d3824bfc-921c-4ab2-abc7-21ede51df915/images/3cc42ff5-d152-43ef-a1cc-d0df53ceecca
[2016-08-16 01:57:58.643860] I [dht-rebalance.c:2728:gf_defrag_process_dir] 0-SSD-dht: Migration operation on dir /d3824bfc-921c-4ab2-abc7-21ede51df915/images/3cc42ff5-d152-43ef-a1cc-d0df53ceecca took 0.00 secs
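For reference, the "inode layout"/"disk layout" ranges that DHT reports as mismatching are stored in the trusted.glusterfs.dht extended attribute on each brick's copy of the directory. A minimal way to inspect it directly on the servers (the brick path below is a placeholder, not taken from this setup):

# Run as root on each Gluster node; /bricks/ssd/brick1 is a hypothetical brick path.
# The hex value of trusted.glusterfs.dht encodes the hash range assigned to that subvolume.
getfattr -d -m . -e hex /bricks/ssd/brick1/d3824bfc-921c-4ab2-abc7-21ede51df915/images/3cc42ff5-d152-43ef-a1cc-d0df53ceecca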


Version-Release number of selected component (if applicable):
[root@intel1 ~]# rpm -qa | grep vdsm
vdsm-hook-ethtool-options-4.18.11-1.el7.centos.noarch
vdsm-hook-vmfex-dev-4.18.11-1.el7.centos.noarch
vdsm-hook-nestedvt-4.18.11-1.el7.centos.noarch
vdsm-hook-pincpu-4.18.11-1.el7.centos.noarch
vdsm-cli-4.18.11-1.el7.centos.noarch
vdsm-hook-ipv6-4.18.11-1.el7.centos.noarch
vdsm-yajsonrpc-4.18.11-1.el7.centos.noarch
vdsm-hook-spiceoptions-4.18.11-1.el7.centos.noarch
vdsm-python-4.18.11-1.el7.centos.noarch
vdsm-hook-qos-4.18.11-1.el7.centos.noarch
vdsm-xmlrpc-4.18.11-1.el7.centos.noarch
vdsm-gluster-4.18.11-1.el7.centos.noarch
vdsm-hook-smbios-4.18.11-1.el7.centos.noarch
vdsm-infra-4.18.11-1.el7.centos.noarch
vdsm-4.18.11-1.el7.centos.x86_64
vdsm-hook-numa-4.18.11-1.el7.centos.noarch
vdsm-api-4.18.11-1.el7.centos.noarch
vdsm-hook-qemucmdline-4.18.11-1.el7.centos.noarch
vdsm-jsonrpc-4.18.11-1.el7.centos.noarch
vdsm-hook-hugepages-4.18.11-1.el7.centos.noarch
[root@intel1 ~]# rpm -qa | grep gluster
glusterfs-client-xlators-3.7.13-1.el7.x86_64
glusterfs-extra-xlators-3.7.13-1.el7.x86_64
glusterfs-cli-3.7.13-1.el7.x86_64
glusterfs-libs-3.7.13-1.el7.x86_64
glusterfs-fuse-3.7.13-1.el7.x86_64
glusterfs-geo-replication-3.7.13-1.el7.x86_64
python-gluster-3.7.13-1.el7.noarch
vdsm-gluster-4.18.11-1.el7.centos.noarch
glusterfs-api-3.7.13-1.el7.x86_64
glusterfs-3.7.13-1.el7.x86_64
glusterfs-server-3.7.13-1.el7.x86_64


2 Gluster nodes (hyperconverged with oVirt):
Node 1 - volume on RAID 10 SSD
Node 2 - volume on RAID 10 SSD

You also need two hosts with any storage, one of them acting as SPM, and all hosts must have running VMs.

Steps to Reproduce:

1. Create a "distribute 2" Gluster volume (a rough CLI equivalent is sketched after these steps).
2. Click "Optimize for Virt Store" in the oVirt UI.
3. Start the volume and add it as a storage domain.
4. Move the disks of all VMs to the new volume while the VMs are ONLINE (live storage migration).
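For reference, a rough CLI equivalent of steps 1-3; the host names and brick paths are placeholders, and "Optimize for Virt Store" in the oVirt UI roughly corresponds to applying the virt option group plus the vdsm/kvm ownership options:

# Hypothetical hosts node1/node2 and brick path /bricks/ssd/brick1.
gluster volume create SSD node1:/bricks/ssd/brick1 node2:/bricks/ssd/brick1
# Roughly what "Optimize for Virt Store" applies:
gluster volume set SSD group virt
gluster volume set SSD storage.owner-uid 36
gluster volume set SSD storage.owner-gid 36
gluster volume start SSD
# Step 4 (moving the VM disks) is then done via live storage migration in the oVirt UI.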

Actual results:

If the VM is running on HOST ONE but the copy process (SPM) runs on HOST TWO, you get a split-brain! I tried to move 20 MySQL VMs and 4-6 of the images were broken.

Expected results:
Gluster! Magic! Configure it once and get 10 years of performance and profit! :))))


And also... add arbiter support to the UI, add tier configuration to the UI... and please tell the Gluster team that TIER also MUST have a replica 3 arbiter 1 variant! PLEEEEASE!

Comment 1 Badalyan Vyacheslav 2016-08-16 23:22:12 UTC
The inode layout and disk layout look as if the file was created at the same time by two nodes... and the values are strangely mirrored...

Comment 2 Sahina Bose 2016-08-19 10:51:38 UTC
The layout issues look like Gluster bugs. Two hosts is not a supported configuration for Gluster as an image store. You need to use replica 3, with or without arbiter, to avoid the split-brain issue. Please re-open a bug under the GlusterFS product if you face issues after moving to replica 3.
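For reference, a minimal sketch of such a replica 3 arbiter 1 volume (host names and brick paths below are placeholders, not from this setup):

# Two full data copies plus one arbiter brick that stores only metadata.
gluster volume create SSD replica 3 arbiter 1 \
    node1:/bricks/ssd/brick1 \
    node2:/bricks/ssd/brick1 \
    node3:/bricks/arbiter/brick1
gluster volume start SSD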

There's already an RFE to provide arbiter configuration support in the UI - Bug 1254073.

Regarding tier and arbiter - what's the requirement? Attaching an arbiter volume as a hot tier to an existing Gluster volume? If so, please file a bug against GlusterFS with the requirements.

Closing this bug.

