1733408 – [RHEL 7.5] VDO - out of space issue

Bug 1733408 - [RHEL 7.5] VDO - out of space issue

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	vdo
Sub Component:
Version:	7.5
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Andy Walsh
QA Contact:	vdo-qe
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2019-07-26 03:09 UTC by xhe@redhat.com
Modified:	2021-09-06 12:33 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2019-10-15 02:53:21 UTC
Target Upstream Version:
Embargoed:

Description xhe@redhat.com 2019-07-26 03:09:57 UTC

Description of problem:
VDO - out of space issue

I tried to reproduce the out of space issue mentioned in https://access.redhat.com/articles/3966841 , but it seems I hit a different out of space issue. I am creating this bug to track it.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:

1. create partitions of size of 5G, 7G and 3G 
# fdisk -l
   Device Boot      Start         End      Blocks   Id  System
/dev/sda3        35000320    49680383     7340032   83  Linux
/dev/sda4        49680384   584843263   267581440    5  Extended
/dev/sda5        49682432    60168191     5242880   83  Linux
/dev/sda6        60170240    66461695     3145728   83  Linux

2. create vdo volume1 with 7G physical size
# vdo create --name=vdo1 --device=/dev/sda3 --force

3. create vdo volume2 with 5G physical size
# vdo create --name=vdo2 --device=/dev/sda5 --force
# vdo status -n vdo1 |egrep 'Slab size|Logical size| Physical size|block size|logical blocks|logical blocks used|overhead blocks used|physical blocks|data blocks used|dev\/mapper'
    Logical size: 4185108K
    Physical size: 7G
    Slab size: 2G
      /dev/mapper/vdo1:
        block size: 4096
        data blocks used: 0
        logical blocks: 1046277
        logical blocks used: 0
        overhead blocks used: 787140
        physical blocks: 1835008
# vdo status -n vdo2 |egrep 'Slab size|Logical size| Physical size|block size|logical blocks|logical blocks used|overhead blocks used|physical blocks|data blocks used|dev\/mapper'
    Logical size: 2091956K
    Physical size: 5G
    Slab size: 2G
      /dev/mapper/vdo2:
        block size: 4096
        data blocks used: 0
        logical blocks: 522989
        logical blocks used: 0
        overhead blocks used: 786786
        physical blocks: 1310720

4. create the third vdo volume with 3G physical size
# vdo create --name=vdo_test1 --device=/dev/sda6 --force
Creating VDO vdo_test1
vdo: ERROR - vdoformat: formatVDO failed on '/dev/sda6': VDO Status: Out of space

Actual results:


Expected results:


Additional info:

Comment 2 xhe@redhat.com 2019-07-26 03:23:05 UTC

I used --force because I previously hit an 'ext3 signature detected ... at offset 1080' error, and vdo suggested using --force to override it.

$ vdo create --name=vdo1 --device=/dev/sda5 
Creating VDO vdo1
vdo: ERROR - ext3 signature detected on /dev/sda5 at offset 1080; use --force to override

Comment 3 xhe@redhat.com 2019-07-26 04:02:20 UTC

Hi Andrew,

You are right, the issue happens on a small physical disk (e.g. /dev/sda6 = 3G), as you mentioned in JIRA. I remember you said the limit is 1.5G; do I need to calculate it from the actual physical size, like this: 3G/2 (= 1.5G)?

I used a 5G disk to create a vdo volume, and I also created 7 vdo volumes of 2G each on this 5G disk, /dev/sda7. That works until I create the eighth one, vdo9, which fails with "Not enough available memory". I guess that is an expected error, right?

$ fdisk -l
   Device Boot      Start         End      Blocks   Id  System
/dev/sda7        62400512    72886271     5242880   83  Linux
/dev/sda8        72888320    79179775     3145728   83  Linux

$ lsblk
NAME     MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda        8:0    0 465.8G  0 disk 
├─sda5     8:5    0     7G  0 part 
│ └─vdo1 253:0    0     4G  0 vdo  
├─sda7     8:7    0     5G  0 part 
│ ├─vdo2 253:1    0     2G  0 vdo  
│ ├─vdo3 253:2    0     2G  0 vdo  
│ ├─vdo4 253:3    0     2G  0 vdo  
│ ├─vdo5 253:4    0     2G  0 vdo  
│ ├─vdo6 253:5    0     2G  0 vdo  
│ ├─vdo7 253:6    0     2G  0 vdo  
│ └─vdo8 253:7    0     2G  0 vdo  

$ vdo create --name=vdo9 --device=/dev/sda7 --force
Creating VDO vdo9
vdo: ERROR - Not enough available memory in system for index requirement of 256M

Comment 4 xhe@redhat.com 2019-07-26 04:17:44 UTC

Let me evaluate which vdo volumes are at risk of 'out of space', as we predict in Insights.
      Slab Size <2G                      - 85%
      Slab Size >=2G and <32G  - 90%
      Slab Size 32G                      - 95%
As we can see, the slab size of all my current vdo volumes is 2G, so let me use the 90% limit.
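
The limits listed above can be sketched as a small Python lookup. This is only an illustration of the table in this comment; the function name `fill_limit` and the byte-based argument are assumptions, not part of Insights or VDO.

```python
# Fill limits from the table above, keyed by slab size.
GIB = 1024 ** 3

def fill_limit(slab_size_bytes):
    """Return the fill limit (fraction of physical space used) for a slab size."""
    if slab_size_bytes < 2 * GIB:
        return 0.85
    if slab_size_bytes < 32 * GIB:
        return 0.90
    return 0.95  # 32G, the largest slab size in the table

print(fill_limit(2 * GIB))  # 0.9
```

With the 2G slab size reported by `vdo status` here, the lookup selects the 90% limit.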

$ vdostats 
Device               1K-blocks      Used Available Use% Space saving%
/dev/mapper/vdo3       5242880   3147144   2095736  60%           N/A
/dev/mapper/vdo2       5242880   3147144   2095736  60%           N/A
/dev/mapper/vdo1       7340032   3148560   4191472  42%           N/A
/dev/mapper/vdo7       5242880   3147144   2095736  60%           N/A
/dev/mapper/vdo6       5242880   3147144   2095736  60%           N/A
/dev/mapper/vdo5       5242880   3147144   2095736  60%           N/A
/dev/mapper/vdo4       5242880   3147144   2095736  60%           N/A
/dev/mapper/vdo8       5242880   3147144   2095736  60%           N/A

$ vdo status -n vdo1 |egrep 'Slab size|Logical size| Physical size|block size|logical blocks|logical blocks used|overhead blocks used|physical blocks|data blocks used|dev\/mapper'
    Logical size: 4185108K
    Physical size: 7G
    Slab size: 2G
      /dev/mapper/vdo1:
        block size: 4096
        data blocks used: 0
        logical blocks: 1046277
        logical blocks used: 0
        overhead blocks used: 787140
        physical blocks: 1835008

---------------- vdo1 is safe ----------------
vdo_physical_used_pct = (vdo_physical_used + vdo_overhead_used) / vdo_physical_size  
vdo_physical_used_pct = (data blocks used + overhead blocks used)/ Physical size    = (0+787140/1024/1024)/7G = 0.75/7 = 0.10

IF vdo_physical_used_pct >= 90%:
   False


$ vdo status -n vdo2 |egrep 'Slab size|Logical size| Physical size|block size|logical blocks|logical blocks used|overhead blocks used|physical blocks|data blocks used|dev\/mapper'
    Logical size: 2091956K
    Physical size: 5G
    Slab size: 2G
      /dev/mapper/vdo2:
        block size: 4096
        data blocks used: 0
        logical blocks: 522989
        logical blocks used: 0
        overhead blocks used: 786786
        physical blocks: 1310720

---------------- vdo2 is safe ----------------
vdo_physical_used_pct = (vdo_physical_used + vdo_overhead_used) / vdo_physical_size  
vdo_physical_used_pct = (data blocks used + overhead blocks used)/ Physical size    = (0+786786/1024/1024)/5G = 0.75/5 = 0.15

IF vdo_physical_used_pct 0.15 >= 90%:
   False
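
As a cross-check, the block counts from `vdo status` can be compared with the `vdostats` columns shown earlier in this comment. The sketch below assumes every block counter is in 4096-byte blocks (per "block size: 4096") and that `vdostats` truncates Use% to a whole number; the helper name `to_vdostats` is hypothetical.

```python
BLOCK_KIB = 4  # "block size: 4096" bytes = 4 KiB per block

def to_vdostats(data_blocks, overhead_blocks, physical_blocks):
    """Convert 4K block counters into (1K-blocks, Used, Available, Use%)."""
    size_k = physical_blocks * BLOCK_KIB
    used_k = (data_blocks + overhead_blocks) * BLOCK_KIB
    use_pct = 100 * used_k // size_k  # assumed: truncated, not rounded
    return size_k, used_k, size_k - used_k, use_pct

vdo1 = to_vdostats(0, 787140, 1835008)
vdo2 = to_vdostats(0, 786786, 1310720)
print(vdo1)  # (7340032, 3148560, 4191472, 42)
print(vdo2)  # (5242880, 3147144, 2095736, 60)
```

Both tuples agree with the vdo1 and vdo2 rows of the `vdostats` output above, which suggests the used percentage is better computed from block counts directly than from counts divided by 1024/1024.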

Comment 5 xhe@redhat.com 2019-07-26 04:22:06 UTC

Hi Andrew, 

My vdo1 and vdo2 above are both far too safe in terms of physical_used_pct. I want to create an unsafe vdo volume that can trigger the same issue as the KCS article. Could you advise me on how to trigger it in my testing? Thanks!

Comment 6 Jakub Krysl 2019-07-26 08:11:22 UTC

I just had a call with Xiaonan explaining how to change the slab size and how to hit the issue. She managed to test her formula and can now continue testing it with various slab sizes.
Clearing needinfo.

Comment 7 xhe@redhat.com 2019-07-26 08:46:14 UTC

I reproduced this issue with the help of Jakub Krysl <jkrysl>. Thanks Jakub.

Here is my reproducer:
---------------------
$ vdo create --name=vdo2 --device=/dev/sda6 --force --vdoSlabSize 128 
Creating VDO vdo2
Starting VDO vdo2
Starting compression on VDO vdo2
VDO instance 9 volume is ready at /dev/mapper/vdo2

$ dd if=/dev/sda9 of=/dev/mapper/vdo2 bs=4K count=946631 status=progress
3431813120 bytes (3.4 GB) copied, 114.076388 s, 30.1 MB/s
dd: error writing ‘/dev/mapper/vdo2’: No space left on device  <--- see here
844569+0 records in
844568+0 records out
3459350528 bytes (3.5 GB) copied, 114.979 s, 30.1 MB/s

$ vdo status -n vdo2 |egrep 'Slab size|Logical size| Physical size|block size|logical blocks|logical blocks used|overhead blocks used|physical blocks|data blocks used|dev\/mapper|slab count'
    Logical size: 3378272K
    Physical size: 6G
    Slab size: 128M
      /dev/mapper/vdo2:
        block size: 4096
        data blocks used: 161863
        logical blocks: 844568
        logical blocks used: 844568
        overhead blocks used: 728175
        physical blocks: 1572864
        slab count: 26

$ lsblk
NAME     MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda        8:0    0 465.8G  0 disk 
├─sda5     8:5    0     7G  0 part 
│ └─vdo1 253:0    0     4G  0 vdo  
├─sda6     8:6    0     6G  0 part 
│ └─vdo2 253:1    0   3.2G  0 vdo  

$ vdostats 
Device                    Size      Used Available Use% Space saving%
/dev/mapper/vdo2          6.0G      3.4G      2.6G  56%           80%  

$ rpm -qa|egrep 'kernel-[0-9]|vdo'
kernel-3.10.0-862.el7.x86_64
vdo-6.1.0.149-16.x86_64
kmod-kvdo-6.1.0.153-15.el7.x86_64

$ uname -a
Linux rdma-qe-04.lab.bos.redhat.com 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux

Comment 8 xhe@redhat.com 2019-07-26 08:58:58 UTC

Let me calculate the physical used percentage again based on comment #7. Unfortunately, it still doesn't match my formula.

vdo_physical_used_pct = (vdo_physical_used + vdo_overhead_used) / vdo_physical_size  
vdo_physical_used_pct = (data blocks used + overhead blocks used)/ Physical size    = (161863+728175)/1024*1024/5G = 0.8488/5 = 0.16

IF vdo_physical_used_pct 0.16976 >=0.85:
    False

Comment 9 sclafani 2019-07-26 22:02:57 UTC

> dd: error writing ‘/dev/mapper/vdo2’: No space left on device  <--- see here

This error is not VDO reporting that you've used up the physical space. You're getting it because you're trying to write beyond the end of the logical size of the VDO volume.

>         logical blocks: 844568
>         logical blocks used: 844568
> 844569+0 records in
> 844568+0 records out

You didn't specify a logical size for the VDO volume, so it defaulted to the maximum size that could never result in the over-provisioning of the VDO volume. That means you can never fill the physical VDO volume before filling the logical volume. You'd get the same error if VDO were replaced with a linear device with 844568 blocks of space.
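
A quick arithmetic check of this point, as a Python sketch using only numbers quoted from comment #7 (the helper name `logical_capacity_bytes` is hypothetical):

```python
BLOCK_SIZE = 4096  # "block size: 4096" from vdo status

def logical_capacity_bytes(logical_blocks):
    """Bytes that fit in the logical address space of the volume."""
    return logical_blocks * BLOCK_SIZE

capacity = logical_capacity_bytes(844568)  # "logical blocks: 844568"
requested = 946631 * 4096                  # dd bs=4K count=946631

print(capacity)              # 3459350528
print(requested > capacity)  # True
```

The capacity equals the 3459350528 bytes that dd reported as copied, so dd stopped exactly at the end of the logical device.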

I don't think this is the problem you're trying to solve, but unfortunately I don't understand what problem you are trying to solve.

Comment 10 Jakub Krysl 2019-07-29 07:49:39 UTC

(In reply to sclafani from comment #9)
> 
> I don't think this is the problem you're trying to solve, but unfortunately
> I don't understand what problem you are trying to solve.

AFAIK Xiaonan is trying to "fix" the ENOSPC issue ( https://access.redhat.com/articles/3966841 ) for most people using Insights, where Insights should do something when the fill limit is hit. The limit depends on slab size as follows:
      Slab Size <2G                      - 85%
      Slab Size >=2G and <32G  - 90%
      Slab Size 32G                      - 95%
And she is trying to verify if the formula is correct.

We got in touch later after the last comment and I explained the relation between logical size, physical size and space saving. So hopefully there will be no issues any more. :)

I'll leave the needinfo intact for Xiaonan to reply if there are any other issues she needs help with. If not, please close the BZ as the issue is not in VDO but in the way VDO is used.

Comment 12 Andy Walsh 2019-08-02 21:16:42 UTC

I think I agree with sclafani on this issue.  The VDO logical size in your example is only 3.2G and you're getting ~80% space savings.  You could theoretically fit another 4x the 3.2G you already wrote into the same volume before running out of physical space.
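
The "another 4x" estimate can be reproduced from the counters in comment #7. This sketch assumes space saving is (logical blocks used - data blocks used) / logical blocks used, which agrees with the 80% that vdostats printed, and that further writes would deduplicate and compress at the same ratio.

```python
# Counters from "vdo status -n vdo2" in comment #7 (4K blocks).
data_used = 161863
logical_used = 844568
overhead = 728175
physical = 1572864

saving = 1 - data_used / logical_used              # fraction of writes saved
free_data_blocks = physical - overhead - data_used
headroom = free_data_blocks / data_used            # multiples of data already stored

print(f"space saving: {saving:.1%}")
print(f"free physical data blocks: {free_data_blocks}")
print(f"headroom: {headroom:.1f}x the data already written")
```

Under those assumptions the remaining physical space holds a little over four times what was already written, far more than the 3.2G logical size allows.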

Looking at Comment#8, I'm not sure you're calculating the physical size appropriately.

If you take the vdostats output and apply them exactly as shown:
used_pct = (data blocks used + overhead blocks used) / physical blocks
0.56 = (161863 + 728175) / 1572864 <-- This is 56% as shown in your example from comment#7.

Note that all block counts in the output (unless otherwise stated) are in units of 4K, so to convert to a human-readable size in GiB you first have to multiply by 4 (to convert from 4K blocks to 1K blocks).


I tried to reproduce this with similar conditions, but with no deduplication present, and I'm able to fill the device:
[root@localhost ~]# df -hl .
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1        20G  1.8G   19G   9% /
[root@localhost ~]# truncate -s 18G loop
[root@localhost ~]# losetup loop0 !$
losetup loop0 loop
[root@localhost ~]# vgcreate vdo_base /dev/loop0
  Physical volume "/dev/loop0" successfully created.
  Volume group "vdo_base" successfully created
[root@localhost ~]# lvcreate -L 6G -n vdo_base vdo_base
  Logical volume "vdo_base" created.
[root@localhost ~]# vdo create --name vdo2 --device /dev/vdo_base/vdo_base --vdoSlabSize 128
Creating VDO vdo2
Starting VDO vdo2
Starting compression on VDO vdo2
VDO instance 0 volume is ready at /dev/mapper/vdo2
[root@localhost ~]# lsblk
NAME                MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0                 7:0    0   18G  0 loop 
└─vdo_base-vdo_base 252:0    0    6G  0 lvm  
  └─vdo2            252:1    0  3.2G  0 vdo  
vda                 253:0    0   20G  0 disk 
└─vda1              253:1    0   20G  0 part /
[root@localhost ~]# sudo dd if=/dev/urandom of=/dev/mapper/vdo2 oflag=direct bs=4K count=946631 status=progress
3412733952 bytes (3.4 GB, 3.2 GiB) copied, 72 s, 47.4 MB/s
dd: error writing '/dev/mapper/vdo2': No space left on device
844689+0 records in
844688+0 records out
3459842048 bytes (3.5 GB, 3.2 GiB) copied, 72.982 s, 47.4 MB/s
[root@localhost ~]# vdo status -n vdo2 |egrep 'Slab size|Logical size| Physical size|block size|logical blocks|logical blocks used|overhead blocks used|physical blocks|data blocks used|dev\/mapper|slab count'
    Logical size: 3378752K
    Physical size: 6G
    Slab size: 128M
      /dev/mapper/vdo2:
        block size: 4096
        data blocks used: 844688
        logical blocks: 844688
        logical blocks used: 844688
        overhead blocks used: 728175
        physical blocks: 1572864
        slab count: 26
[root@localhost ~]# vdostats
Device               1K-blocks      Used Available Use% Space saving%
/dev/mapper/vdo2       6291456   6291452         4  99%            0%
[root@localhost ~]# vdostats --hum
Device                    Size      Used Available Use% Space saving%
/dev/mapper/vdo2          6.0G      6.0G      4.0K  99%            0%


Taking those numbers, I'm able to get the 99% usage:
used_pct = (data blocks used + overhead blocks used) / physical blocks
0.99 = (844688 + 728175) / 1572864
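
Putting the two runs side by side against the fill limit from comment #10, as a Python sketch. The 85% limit is taken from the "Slab Size <2G" row, since both volumes use 128M slabs; the function name `used_fraction` is hypothetical, and truncating the percentage is an assumption that matches the vdostats output.

```python
def used_fraction(data_blocks, overhead_blocks, physical_blocks):
    """Physical usage as a fraction; all arguments are 4K block counts."""
    return (data_blocks + overhead_blocks) / physical_blocks

LIMIT = 0.85  # slab size 128M falls in the "<2G" row

dedup_run = used_fraction(161863, 728175, 1572864)    # comment #7, data deduplicated well
urandom_run = used_fraction(844688, 728175, 1572864)  # this comment, /dev/urandom data

for name, frac in (("dedup", dedup_run), ("urandom", urandom_run)):
    print(name, f"{int(frac * 100)}%", "at risk" if frac >= LIMIT else "ok")
```

Only the /dev/urandom run crosses the limit; the run from comment #7 stays at 56% because most of its blocks deduplicated or compressed.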

Comment 13 xhe@redhat.com 2019-08-06 09:34:16 UTC

Thanks Michael and Andy,

I can reproduce it in my local host now!
[root@ibm-hs21-8853-2 ~]# vdostats --hum
Device                    Size      Used Available Use% Space saving%
/dev/mapper/vdo_test1      6.0G      6.0G    480.0K  99%            0%
