Description of problem
======================

When an admin uses the "Create Brick" button[1] to initialize a given storage
device (eg. vdc), Console partitions it in the following way:

* a single mbr partition (eg. vdc1) is created on the device
* an lvm pv (physical volume) is created in the vdc1 partition
* an lvm vg (volume group) is created in the pv
* an lvm thin pool and thin volume are created inside the vg

So that the final partitioning looks like this:

[vdc---------------------------------------------- ... --]
[mbr-gap][vdc1------------------------------------ ... --]
         [pv-gap][pv-data------------------------- ... --]
^        ^       ^
lba 0    1MB     2MB (pe_start)

Where lba is the address of a sector counted from the start of the device.
As you can see, the first mbr partition starts on a 1MB boundary (the default
value), and there is another gap for lvm metadata, again 1MB by default. This
means that pe_start is actually located 2MB from the start of the device.

The problem is that when you need to tweak data alignment (eg. when RAID is
used), Console uses the --dataalignment option of the pvcreate command to do
it. But pvcreate cares only about the start of the device it's created on -
in our case vdc1 - and doesn't know (by design) about the 1MB wide mbr gap.
So for example when one needs to set the data alignment to a value which is
not a multiple of 1MB, let's say 1.25MB, Console aligns the data in a wrong
way, as you end up with 1MB (mbr gap) + 1.25MB (lvm data alignment) = 2.25MB
instead.

Moreover, Console seems to break our own recommendations. The process of
aligning partitions properly is described in great detail in chapter
"11.2. Brick Configuration"[2] of the RHS Admin Guide, and mbr partitions are
not mentioned there at all.

[1] the button is available inside the "Storage Devices" tab for each Host.
[2] https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3.1/html/Administration_Guide/Brick_Configuration.html

Version-Release number of selected component (if applicable)
============================================================
rhsc-3.1.0-0.62.el6.noarch

How reproducible
================
100%

Steps to Reproduce
==================
1. Prepare a clean disk device on one host managed by Console.
2. Use the "Create Brick" function to initialize the disk device while
   selecting a nontrivial setup which requires a special data alignment that
   is not a multiple of 1MB (eg. some RAID).
3. Check the partitioning created on the disk device.

Actual results
==============

Let's see the partition table on the disk device (/dev/vdc in my case):

~~~
# fdisk -cul /dev/vdc

Disk /dev/vdc: 107.4 GB, 107374182400 bytes
255 heads, 63 sectors/track, 13054 cylinders, total 209715200 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0009f9f6

   Device Boot      Start         End      Blocks   Id  System
/dev/vdc1            2048   209715199   104856576   8e  Linux LVM
~~~

As we can see, Console created the mbr partition vdc1 on the device. This
means that the first sector of the vdc1 partition starts on lba 2048 (1MB
from the beginning of the disk). So far so good.

But when we look at the pv setup:

~~~
# pvs -o +pe_start /dev/vdc1
  PV         VG                Fmt  Attr PSize   PFree 1st PE
  /dev/vdc1  vg-alignmentbrick lvm2 a--  100.00g 3.50g  1.25m
~~~

We see that a data alignment of 1.25MB was used. And if I read the manpage of
pvcreate right, this means that the actual first sector of the pv (pe_start)
lies at 1MB + 1.25MB = 2.25MB. And this is a problem, because the actual data
alignment is 2.25MB, which is not a multiple of the required data alignment
value 1.25MB, and so the alignment setup is wrong.
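For illustration, here is a minimal sketch approximating the sequence Console
appears to run today (the device name /dev/vdc and the 1.25MB alignment are
just the example values from this report; the exact commands Console uses may
differ):

~~~
# Rough approximation of the current "Create Brick" flow (destructive!):
# a single mbr partition starting at the default 1MiB boundary...
parted -s /dev/vdc mklabel msdos
parted -s /dev/vdc mkpart primary 1MiB 100%

# ...and a PV inside it with the requested alignment. pvcreate measures
# --dataalignment from the start of /dev/vdc1, not /dev/vdc, so pe_start
# lands at 1MiB (mbr gap) + 1.25MiB = 2.25MiB from the start of the disk,
# which is not a multiple of 1.25MiB.
pvcreate --dataalignment 1280k /dev/vdc1

# "1st PE" is reported relative to the partition, hiding the mbr offset.
pvs -o +pe_start /dev/vdc1
~~~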
Expected results
================

Console makes sure that the actual data alignment matches the requirements.

It seems to me that the easiest way to achieve this would be to drop the mbr
partition entirely. Note that chapter "11.2. Brick Configuration"[2] doesn't
mention mbr partitioning at all.
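For comparison, a minimal sketch of the whole-device layout suggested above,
roughly following chapter "11.2. Brick Configuration"[2] (device name, sizes,
and the 1280KiB alignment are illustrative values, not Console's actual
implementation):

~~~
# No partition table: create the PV directly on the whole device, so
# pe_start is measured from the start of the disk and the requested
# alignment is honored.
pvcreate --dataalignment 1280k /dev/vdc
vgcreate vg-alignmentbrick /dev/vdc

# Thin pool + thin volume, as Console creates today (chunk size matched
# to the stripe geometry; 1280k here is just the example value).
lvcreate -L 95G -T vg-alignmentbrick/pool-brick1 --chunksize 1280k
lvcreate -V 100G -T vg-alignmentbrick/pool-brick1 -n brick1

# "1st PE" now equals the real on-disk offset.
pvs -o +pe_start /dev/vdc
~~~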
If this BZ is fixed by not creating an mbr partition when a brick is created
on a clean device, as suggested in the expected results section, would it
make sense to prevent the storage admin from creating a brick on an already
existing (not created by Console) mbr partition?
(In reply to Martin Bukatovic from comment #2)
> If this BZ is fixed by not creating an mbr partition when a brick is
> created on a clean device, as suggested in the expected results section,
> would it make sense to prevent the storage admin from creating a brick on
> an already existing (not created by Console) mbr partition?

Yes, this makes sense. Let me clone bz#1211140 to downstream.
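For the guard discussed here, one possible (hypothetical) pre-check before
allowing brick creation; blkid sets PTTYPE when any partition table already
exists on the device (the device name is illustrative):

~~~
# Probe the device itself (-p): PTTYPE is set to "dos" for an existing
# mbr partition table (or "gpt" for GPT).
if blkid -p -o export /dev/vdc | grep -q '^PTTYPE='; then
    echo "refusing: /dev/vdc already carries a partition table" >&2
fi
~~~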
Before creating any bricks:
=============================================================================================

~~~
[root@dhcp35-99 ~]# df -h
Filesystem             Size  Used Avail Use% Mounted on
/dev/mapper/rhgs-root   18G  1.5G   16G   9% /
devtmpfs               1.9G     0  1.9G   0% /dev
tmpfs                  1.9G     0  1.9G   0% /dev/shm
tmpfs                  1.9G  8.5M  1.9G   1% /run
tmpfs                  1.9G     0  1.9G   0% /sys/fs/cgroup
/dev/vda1              497M   89M  409M  18% /boot
tmpfs                  389M     0  389M   0% /run/user/0
[root@dhcp35-99 ~]# pvs
  PV         VG   Fmt  Attr PSize  PFree
  /dev/vda2  rhgs lvm2 a--  19.51g 40.00m
[root@dhcp35-99 ~]# vgs
  VG   #PV #LV #SN Attr   VSize  VFree
  rhgs   1   2   0 wz--n- 19.51g 40.00m
[root@dhcp35-99 ~]# lvs
  LV   VG   Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  root rhgs -wi-ao---- 17.47g
  swap rhgs -wi-ao----  2.00g
[root@dhcp35-99 ~]# fdisk -l | grep vd
Disk /dev/vda: 21.5 GB, 21474836480 bytes, 41943040 sectors
/dev/vda1   *        2048     1026047      512000   83  Linux
/dev/vda2         1026048    41943039    20458496   8e  Linux LVM
Disk /dev/vdb: 53.7 GB, 53687091200 bytes, 104857600 sectors
Disk /dev/vdc: 53.7 GB, 53687091200 bytes, 104857600 sectors
Disk /dev/vdd: 53.7 GB, 53687091200 bytes, 104857600 sectors
Disk /dev/vde: 53.7 GB, 53687091200 bytes, 104857600 sectors
Disk /dev/vdf: 53.7 GB, 53687091200 bytes, 104857600 sectors
Disk /dev/vdg: 53.7 GB, 53687091200 bytes, 104857600 sectors
~~~

=======================================================================================================================

After creating a brick from Console: RAID 6 was used, with a stripe size of
128KB and 6 disks, so data alignment = 4 data disks (6 disks - 2 parity) *
128KB = 512KB.

~~~
[root@dhcp35-99 ~]# df -h
Filesystem                     Size  Used Avail Use% Mounted on
/dev/mapper/rhgs-root           18G  1.5G   16G   9% /
devtmpfs                       1.9G     0  1.9G   0% /dev
tmpfs                          1.9G     0  1.9G   0% /dev/shm
tmpfs                          1.9G  8.6M  1.9G   1% /run
tmpfs                          1.9G     0  1.9G   0% /sys/fs/cgroup
/dev/vda1                      497M   89M  409M  18% /boot
tmpfs                          389M     0  389M   0% /run/user/0
/dev/mapper/vg--brick1-brick1   50G   33M   50G   1% /rhgs/brick1
[root@dhcp35-99 ~]# pvs
  PV         VG        Fmt  Attr PSize  PFree
  /dev/vda2  rhgs      lvm2 a--  19.51g 40.00m
  /dev/vdg   vg-brick1 lvm2 a--  50.00g  1.50m
[root@dhcp35-99 ~]# vgs
  VG        #PV #LV #SN Attr   VSize  VFree
  rhgs        1   2   0 wz--n- 19.51g 40.00m
  vg-brick1   1   2   0 wz--n- 50.00g  1.50m
[root@dhcp35-99 ~]# lvs
  LV          VG        Attr       LSize  Pool        Origin Data%  Meta%  Move Log Cpy%Sync Convert
  root        rhgs      -wi-ao---- 17.47g
  swap        rhgs      -wi-ao----  2.00g
  brick1      vg-brick1 Vwi-aot--- 50.00g pool-brick1        0.07
  pool-brick1 vg-brick1 twi-aot--- 49.75g                    0.07   0.03
[root@dhcp35-99 ~]# fdisk -l /dev/vdg

Disk /dev/vdg: 53.7 GB, 53687091200 bytes, 104857600 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

[root@dhcp35-99 ~]# pvs -o +pe_start /dev/vdg
  PV         VG        Fmt  Attr PSize  PFree 1st PE
  /dev/vdg   vg-brick1 lvm2 a--  50.00g 1.50m 512.00k
[root@dhcp35-99 ~]# cat /etc/fstab

#
# /etc/fstab
# Created by anaconda on Sat Dec 26 04:42:40 2015
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
/dev/mapper/rhgs-root                     /            xfs  defaults        0 0
UUID=0ebedc93-5a4c-4709-8860-b398ed59ec7e /boot        xfs  defaults        0 0
/dev/mapper/rhgs-swap                     swap         swap defaults        0 0
/dev/mapper/vg--brick1-brick1             /rhgs/brick1 xfs  inode64,noatime 0 0
~~~
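A quick sanity check of the result above, assuming the device and alignment
from this RAID 6 run (the variable names are just for illustration):

~~~
# pe_start in KiB without the unit suffix, e.g. "512.00" for /dev/vdg:
pe_start_k=$(pvs --noheadings --nosuffix --units k -o pe_start /dev/vdg | tr -d ' ')
# RAID 6 with 6 disks and 128KiB stripes: (6 - 2) * 128 = 512KiB.
align_k=512
# 512 % 512 == 0, so pe_start is a whole multiple of the alignment.
[ $(( ${pe_start_k%.*} % align_k )) -eq 0 ] && echo "aligned to ${align_k}KiB"
~~~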
=======================================================================================================

For RAID 10, the stripe size is 256KB and the number of disks used is 4, so
data alignment = 2 data disks (4 disks / 2-way mirror) * 256KB = 512KB.

~~~
[root@dhcp35-99 ~]# df -h
Filesystem                     Size  Used Avail Use% Mounted on
/dev/mapper/rhgs-root           18G  1.6G   16G   9% /
devtmpfs                       1.9G     0  1.9G   0% /dev
tmpfs                          1.9G     0  1.9G   0% /dev/shm
tmpfs                          1.9G  8.6M  1.9G   1% /run
tmpfs                          1.9G     0  1.9G   0% /sys/fs/cgroup
/dev/vda1                      497M   89M  409M  18% /boot
tmpfs                          389M     0  389M   0% /run/user/0
/dev/mapper/vg--brick1-brick1   50G   33M   50G   1% /rhgs/brick1
/dev/mapper/vg--brick2-brick2   50G   33M   50G   1% /rhgs/brick2
[root@dhcp35-99 ~]# pvs
  PV         VG        Fmt  Attr PSize  PFree
  /dev/vda2  rhgs      lvm2 a--  19.51g 40.00m
  /dev/vdf   vg-brick2 lvm2 a--  50.00g  1.50m
  /dev/vdg   vg-brick1 lvm2 a--  50.00g  1.50m
[root@dhcp35-99 ~]# vgs
  VG        #PV #LV #SN Attr   VSize  VFree
  rhgs        1   2   0 wz--n- 19.51g 40.00m
  vg-brick1   1   2   0 wz--n- 50.00g  1.50m
  vg-brick2   1   2   0 wz--n- 50.00g  1.50m
[root@dhcp35-99 ~]# lvs
  LV          VG        Attr       LSize  Pool        Origin Data%  Meta%  Move Log Cpy%Sync Convert
  root        rhgs      -wi-ao---- 17.47g
  swap        rhgs      -wi-ao----  2.00g
  brick1      vg-brick1 Vwi-aot--- 50.00g pool-brick1        0.07
  pool-brick1 vg-brick1 twi-aot--- 49.75g                    0.07   0.03
  brick2      vg-brick2 Vwi-aot--- 50.00g pool-brick2        0.06
  pool-brick2 vg-brick2 twi-aot--- 49.75g                    0.06   0.04
[root@dhcp35-99 ~]# fdisk -l /dev/vdf

Disk /dev/vdf: 53.7 GB, 53687091200 bytes, 104857600 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

[root@dhcp35-99 ~]# pvs -o +pe_start /dev/vdf
  PV         VG        Fmt  Attr PSize  PFree 1st PE
  /dev/vdf   vg-brick2 lvm2 a--  50.00g 1.50m 512.00k
~~~
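The data-alignment arithmetic used in these verification runs, collected in a
small hypothetical helper (the RAID 6 / RAID 10 rules follow the Admin Guide
formula of data disks times stripe size; the function name is illustrative):

~~~
# align_kib LEVEL NDISKS STRIPE_KIB -> data alignment in KiB
align_kib() {
    case "$1" in
        raid6)  echo $(( ($2 - 2) * $3 )) ;;  # 2 disks hold parity
        raid10) echo $(( $2 / 2 * $3 )) ;;    # half the disks are mirrors
        jbod)   echo "$3" ;;                  # single disk: stripe size itself
    esac
}
align_kib raid6  6 128   # -> 512
align_kib raid10 4 256   # -> 512
align_kib jbod   1 256   # -> 256
~~~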
===========================================================================================================================

With normal brick creation without any RAID configuration, the usual JBOD
stripe size of 256KB is used.

~~~
[root@dhcp35-99 ~]# df -h
Filesystem                     Size  Used Avail Use% Mounted on
/dev/mapper/rhgs-root           18G  1.6G   16G   9% /
devtmpfs                       1.9G     0  1.9G   0% /dev
tmpfs                          1.9G     0  1.9G   0% /dev/shm
tmpfs                          1.9G  8.6M  1.9G   1% /run
tmpfs                          1.9G     0  1.9G   0% /sys/fs/cgroup
/dev/vda1                      497M   89M  409M  18% /boot
tmpfs                          389M     0  389M   0% /run/user/0
/dev/mapper/vg--brick1-brick1   50G   33M   50G   1% /rhgs/brick1
/dev/mapper/vg--brick2-brick2   50G   33M   50G   1% /rhgs/brick2
/dev/mapper/vg--brick3-brick3   50G   33M   50G   1% /rhgs/brick3
[root@dhcp35-99 ~]# pvs
  PV         VG        Fmt  Attr PSize  PFree
  /dev/vda2  rhgs      lvm2 a--  19.51g 40.00m
  /dev/vde   vg-brick3 lvm2 a--  50.00g  1.75m
  /dev/vdf   vg-brick2 lvm2 a--  50.00g  1.50m
  /dev/vdg   vg-brick1 lvm2 a--  50.00g  1.50m
[root@dhcp35-99 ~]# vgs
  VG        #PV #LV #SN Attr   VSize  VFree
  rhgs        1   2   0 wz--n- 19.51g 40.00m
  vg-brick1   1   2   0 wz--n- 50.00g  1.50m
  vg-brick2   1   2   0 wz--n- 50.00g  1.50m
  vg-brick3   1   2   0 wz--n- 50.00g  1.75m
[root@dhcp35-99 ~]# lvs
  LV          VG        Attr       LSize  Pool        Origin Data%  Meta%  Move Log Cpy%Sync Convert
  root        rhgs      -wi-ao---- 17.47g
  swap        rhgs      -wi-ao----  2.00g
  brick1      vg-brick1 Vwi-aot--- 50.00g pool-brick1        0.07
  pool-brick1 vg-brick1 twi-aot--- 49.75g                    0.07   0.03
  brick2      vg-brick2 Vwi-aot--- 50.00g pool-brick2        0.06
  pool-brick2 vg-brick2 twi-aot--- 49.75g                    0.06   0.04
  brick3      vg-brick3 Vwi-aot--- 50.00g pool-brick3        0.06
  pool-brick3 vg-brick3 twi-aot--- 49.75g                    0.06   0.04
[root@dhcp35-99 ~]# fdisk -l /dev/vde

Disk /dev/vde: 53.7 GB, 53687091200 bytes, 104857600 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

[root@dhcp35-99 ~]# pvs -o +pe_start /dev/vde
  PV         VG        Fmt  Attr PSize  PFree 1st PE
  /dev/vde   vg-brick3 lvm2 a--  50.00g 1.75m 256.00k
~~~

Conclusion: no mbr partitions were created when creating the bricks from
Console, so the bricks are aligned according to the RAID parameters.
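To double-check the conclusion, a hedged verification on one of the brick
devices (the parted error string varies between versions, so treat this as
illustrative):

~~~
# No partition table is expected on the whole-device PV:
parted -s /dev/vde print 2>&1 | grep -qi 'unrecognised disk label' \
    && echo "ok: no partition table on /dev/vde"
# The LVM2 PV signature sits directly on the device (no PTTYPE expected):
blkid -p /dev/vde
~~~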
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0310.html
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days