Bug 1895553 - Add pre-flight in cockpit deployment flow to check for disk block sizes used for bricks and LV cache to be identical
Summary: Add pre-flight in cockpit deployment flow to check for disk block sizes used ...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: cockpit-ovirt
Classification: oVirt
Component: gluster-ansible
Version: 0.14.13
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ovirt-4.4.3-2
Target Release: ---
Assignee: Gobinda Das
QA Contact: SATHEESARAN
URL:
Whiteboard:
Depends On:
Blocks: 1857663 1901507 1902301
 
Reported: 2020-11-07 03:48 UTC by SATHEESARAN
Modified: 2021-01-22 12:51 UTC (History)
CC List: 6 users

Fixed In Version: cockpit-ovirt-0.14.15
Clone Of: 1857663
Clones: 1901507
Environment:
Last Closed: 2021-01-22 12:51:52 UTC
oVirt Team: Gluster
Embargoed:
sasundar: ovirt-4.4?
aoconnor: blocker+
sasundar: planning_ack?
sbonazzo: devel_ack+
sasundar: testing_ack+


Attachments


Links:
oVirt gerrit 112299 (master, MERGED): Include pre-flight check for disk block sizes for the disks used for bricks and LV cache to be identical (last updated 2021-01-22 12:51:17 UTC)

Description SATHEESARAN 2020-11-07 03:48:32 UTC
Description of problem:
-----------------------
The precondition for the disk block size is that all disks used for a volume should have the same block size: for example, all 'vmstore' bricks should be 512B, or all 'vmstore' bricks should be 4KB; no mix is allowed.

The same holds for the LV cache, i.e. an SSD can be attached only to a thinpool composed of bricks of the same disk block size. This means an SSD with a 4KB block size can't be attached to a thinpool consisting of 512B block sized disks.

These checks should be done well before deployment, and a proper assertion with a clear error message should be thrown.
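The intended pre-flight logic can be sketched in plain shell. This is a minimal sketch, not the actual cockpit code: on a real host each size would come from `blockdev --getss <device>` for every brick device, whereas here the `sizes` list is hypothetical sample data.

```shell
# Hypothetical sector sizes, as `blockdev --getss` would report them for
# each brick device (running blockdev needs root and a real block device).
sizes="4096 4096 512"

first=""
mixed=false
for s in $sizes; do
  if [ -z "$first" ]; then
    first=$s                  # remember the first disk's block size
  elif [ "$s" != "$first" ]; then
    mixed=true                # any differing size means an invalid mix
  fi
done

if $mixed; then
  echo "FAIL: mix of 4K and 512B block sized disks"
else
  echo "OK: all disks use ${first}-byte sectors"
fi
```

With the sample data above the script prints the FAIL message, which is exactly the condition the pre-flight assertion should catch before deployment starts.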


Version-Release number of selected component (if applicable):
--------------------------------------------------------------
cockpit-ovirt-dashboard

Actual results:
---------------
No error is thrown when a mix of 4K and 512B block sized disks is used for one volume

Expected results:
-----------------
A preflight check is added to validate that the gluster volumes do not contain disks of varied block sizes, and proper error messages are thrown in such situations

--- Additional comment from SATHEESARAN on 2020-07-16 10:14:47 UTC ---

Here are a few examples of valid and invalid scenarios.

Valid situations are:

1. 512B block sized disks used for engine volume
2. 4K block sized disks used for vmstore volume
3. 4K block sized SSD used as LV cache for thinpool composed of 4KB block sized disks
4. 512B block sized SSD used as LV cache for thinpool composed of 512B block sized disks

Invalid scenarios:

1. The 'vmstore' volume configuration has bricks created from 4KB block sized disks on host1 and host2, while host3 uses a 512B brick for the arbiter, etc.

--- Additional comment from RHEL Program Management on 2020-09-29 02:32:22 UTC ---

This BZ is being approved for a RHHI-V 1.8.z update, upon receipt of the 3 ACKs (PM, Devel, QA) for the release flag 'rhhiv-1.8.z'

--- Additional comment from Rejy M Cyriac on 2020-09-29 09:53:53 UTC ---

Based on current GA time-lines for RHV 4.4.3 (24 November) and RHGS 3.5.3 (01 December), in which the gluster-ansible fix (Bug 1857667) is included,
either this BZ may have to be re-targeted for a later RHV 4.4.z update,
or the gluster-ansible fix would have to be shipped earlier

--- Additional comment from Gobinda Das on 2020-10-15 07:34:57 UTC ---

(In reply to Rejy M Cyriac from comment #3)
> Based on current GA time-lines for RHV 4.4.3 (24 November), and RHGS 3.5.3
> (01 December) at which the gluster-ansible fix (Bug 1857667) is included,
> either this BZ may have to be re-targeted for a later RHV 4.4.z update,
> or the gluster-ansible fix would have to shipped earlier

Hi Rejy,
 I think we need to ship all gluster-ansible bug fixes which are targeted for rhgs-3.5.3 and rhv-4.4.3 in RHGS-3.5.2-Async

--- Additional comment from SATHEESARAN on 2020-10-29 06:45:06 UTC ---

(In reply to Rejy M Cyriac from comment #3)
> Based on current GA time-lines for RHV 4.4.3 (24 November), and RHGS 3.5.3
> (01 December) at which the gluster-ansible fix (Bug 1857667) is included,
> either this BZ may have to be re-targeted for a later RHV 4.4.z update,
> or the gluster-ansible fix would have to shipped earlier

This is already addressed by Gobinda in comment 4.
We are targeting this for RHHI-V 1.8 update 2.

--- Additional comment from SATHEESARAN on 2020-11-07 03:45:48 UTC ---

Tested with gluster-ansible-roles-1.0.5-22.el8rhgs

When creating the volume with its bricks on disks of varied block sizes, an error is thrown with proper information:

<snip>
        "item": {
            "pvname": "/dev/sdf",
            "vgname": "gluster_vg_sdf"
        },
        "rc": 0,
        "start": "2020-11-07 01:58:33.888422",
        "stderr": "",
        "stderr_lines": [],
        "stdout": "4096",
        "stdout_lines": [
            "4096"
        ]
    },
    "msg": "The logical block size of disk is not 512 bytes"


</snip>

The check works only with the CLI-based automated deployment of RHHI-V.

There will be yet another bug to fix this in the cockpit-based deployment.

Comment 1 SATHEESARAN 2020-11-07 03:51:35 UTC
The changes are made in /etc/ansible/roles/gluster.ansible/playbooks/hc-ansible-deployment/tasks/gluster_deployment.yml,
but the changes also need to be made in the playbook used by the web console (cockpit), which is located at /usr/share/cockpit/ovirt-dashboard/ansible/hc_wizard.yml

Comment 2 SATHEESARAN 2020-11-20 00:57:08 UTC
@Gobinda,

I do not see any patches attached to this bug.
Also, please provide the fixed-in-version when moving the bug to ON_QA.

Comment 3 Gobinda Das 2020-11-20 05:23:19 UTC
@sas,
Sorry, it's a mistake; I thought it was an RHHI bug for gluster-ansible.
I have sent patch for cockpit: https://gerrit.ovirt.org/#/c/112299/

Comment 4 SATHEESARAN 2020-12-03 13:16:43 UTC
Tested with cockpit-ovirt-dashboard-0.14.15

The deployment fails in one particular scenario, when there is a combination of 4K and 512B block sized devices on the same host.

For example:

host1       host2       host3
sdb(4K)     sdb(4K)     sdb(4K)
sdc(512B)   sdc(512B)   sdc(512B)
sdd(512B)   sdd(512B)   sdd(512B)

The above case is a valid scenario, but the deployment still fails.

Deployment fails:
1. When one of the disks for a volume is 4K while the rest are 512B, and vice versa.
The above behaviour is correct.

Deployment is successful:
1. When all disks are 512B
2. When all disks are 4K

The problematic scenario is when both 4K and 512B disks are used on the hosts, even though
the devices for each volume are of the same disk block size.

Comment 5 SATHEESARAN 2020-12-03 13:39:43 UTC
I have got the code that performs the check correctly:

<snip>
    - name: Check if block device is 4KN
      shell: >
         blockdev --getss {{ item.pvname }} | grep -Po -q "4096"  && echo true || echo false
      register: is4KN
      with_items: "{{ gluster_infra_volume_groups }}"

    - set_fact:
        numdisks: "{{ hostvars[ansible_play_hosts[0]]['is4KN']['results'] | length - 1}}"
        mylist: []

    - name: Create the list of indexes
      set_fact:
        mylist: "{{ mylist + [item] }}"
      with_sequence: start=0 end={{numdisks}}

    - assert:
        that: 
          - hostvars[ansible_play_hosts[0]]['is4KN']['results'][{{item.1}}]['stdout'] == hostvars["{{item.0}}"]['is4KN']['results'][{{item.1}}]['stdout']
        fail_msg: Mix of 4KN and 512B block size disks
      with_nested:
        - "{{ ansible_play_hosts }}"
        - "{{ mylist }}"
      run_once: true
</snip>
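The per-device expression in the first task above can be exercised in isolation. In this sketch the hypothetical `get_ss` function stands in for `blockdev --getss`, which needs root and a real block device; the grep pipeline is the same one used in the playbook task (GNU grep is assumed, for `-P`).

```shell
# Stand-in for `blockdev --getss <dev>`; a real run would query the device.
get_ss() { echo "$1"; }

# Same expression as in the playbook task:
# a sector size matching "4096" yields true, anything else false.
check_4kn() {
  get_ss "$1" | grep -Po -q "4096" && echo true || echo false
}

check_4kn 4096   # prints "true"  (4KN device)
check_4kn 512    # prints "false" (512B device)
```

The task registers this true/false string per device in `is4KN`, and the final assert compares the registered stdout values across all hosts in the play.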

Comment 6 SATHEESARAN 2020-12-04 00:38:02 UTC
Gobinda has clarified in a mail to the RHHI Program list that the restriction on using both 4K and 512B disks
on the same host is as per design.

<mail_snip>
On Thu, Dec 3, 2020 at 8:01 PM Gobinda Das <godas> wrote:

    Hi sas,
     I don't think this is an issue; the result you are getting is by design.
    The idea of this fix is to solve performance issues, so at no point in time should we allow a mix of block sizes across volumes.
    For example, if the data volume is 4k and vmstore is 512B, then even though customers invested more in 4k devices, they will not get the expected 4k performance because the other volume is slower.
    Another scenario is lvm cache: if you have a mix of block sizes and the lvm cache disk is 512B, then again you will end up with the same situation.
</mail_snip>

Comment 7 SATHEESARAN 2020-12-04 00:42:37 UTC
Verified with cockpit-ovirt-dashboard-0.14.15

When using different block sized disks for gluster volumes, either across hosts
or on the same host, the deployment fails.

testcase1: The following scenario results in a deployment failure

host1    host2     host3
sdb(4k)  sdb(512B) sdb(4K)
sdc      sdc       sdc
sdd      sdd       sdd


testcase2: The following scenario also results in a deployment failure, as there are mixed block size devices
on the same host

host1     host2     host3
sdb(4k)   sdb(4K)   sdb(4K)
sdc(512B) sdc(512B) sdc(512B)
sdd       sdd       sdd

