Bug 1510567 - Calltrace when vdo create --indexMem with not enough free memory
Summary: Calltrace when vdo create --indexMem with not enough free memory
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kmod-kvdo
Version: 7.5
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: bjohnsto
QA Contact: Jakub Krysl
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-11-07 16:53 UTC by Jakub Krysl
Modified: 2021-09-03 11:51 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-04-10 16:25:27 UTC
Target Upstream Version:
Embargoed:


Attachments
Console logs (88.24 KB, text/plain)
2017-11-07 16:53 UTC, Jakub Krysl
no flags


Links
System: Red Hat Product Errata
ID: RHEA-2018:0900
Private: 0
Priority: None
Status: None
Summary: None
Last Updated: 2018-04-10 16:26:06 UTC

Description Jakub Krysl 2017-11-07 16:53:22 UTC
Created attachment 1349064 [details]
Console logs

Description of problem:
When creating a VDO volume without enough free memory, the "vdo create" command
does not check for sufficient free memory, and neither does kvdo. Instead it tries to create the VDO anyway, leading to an out-of-memory calltrace and possibly crashing the server with a panic (no more processes left to kill). The exact result depends on how much free memory is actually available: with 500 MB free the creation deadlocked, and with 1500 MB free it produced rapid process-killing calltraces followed by a deadlock.
This is quite hard to test because of bug 1510558.

Note: I reduced free memory with 'modprobe scsi_debug dev_size_mb=SPARE_MEM_IN_MB'
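
For illustration only (not output from an actual run; the device name and sizes are placeholders), one way to set this up. scsi_debug backs its fake disk with RAM, so loading it with a large dev_size_mb immediately consumes that much memory:

free -m                                                                          # note how much memory is currently free
modprobe scsi_debug dev_size_mb=$(( $(awk '/MemFree/ {print int($2/1024)}' /proc/meminfo) - 500 ))   # leave only ~500 MB free
free -m                                                                          # confirm available memory has dropped
vdo create --name vdo1 --device /dev/sdX1                                        # this is where the calltraces appear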

Version-Release number of selected component (if applicable):
vdo-6.1.0.34-8
kmod-kvdo-6.1.0.34-7

How reproducible:
100%

Steps to Reproduce:
1. create vdo with not enough memory
2. observe calltraces

Actual results:
calltraces

Expected results:
vdo should not try to create the volume when there is not enough free memory, and kvdo should abort UDS index creation when it runs out of memory
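
For illustration only, a rough sketch (in shell, not the actual vdo/kvdo code) of the kind of pre-flight check being asked for: compare MemAvailable from /proc/meminfo against the index memory requirement before attempting the create. The 256M figure, the vdo1 name, and /dev/sdX1 are placeholders:

avail_kb=$(awk '/MemAvailable/ {print $2}' /proc/meminfo)
need_kb=$((256 * 1024))        # assumed default index requirement of 256M, in kB
if [ "$avail_kb" -lt "$need_kb" ]; then
    echo "not enough available memory for the index requirement" >&2
else
    vdo create --name vdo1 --device /dev/sdX1
fi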

Additional info:

Comment 2 Corey Marthaler 2017-11-08 17:58:29 UTC
Reproduced this in our environment as well. VDO doesn't really work with our low-memory virt testing environment. It appears fine on larger machines, but that is going to make automated regression testing difficult.


# Appears to work fine on large mem machine

[root@harding-02 ~]# cat /proc/cpuinfo | grep processor | wc -l
32

[root@harding-02 ~]# cat /proc/meminfo
MemTotal:       65755220 kB
MemFree:        62470472 kB
MemAvailable:   62276952 kB

[root@harding-02 ~]# vdo list

[root@harding-02 ~]# vdo create --name vdo1 --device /dev/mapper/mpathb
Creating VDO vdo1
Starting VDO vdo1
Starting compression on VDO vdo1
VDO instance 0 volume is ready at /dev/mapper/vdo1
[root@harding-02 ~]# vdo create --name vdo2 --device /dev/mapper/mpathc
Creating VDO vdo2
Starting VDO vdo2
Starting compression on VDO vdo2
VDO instance 1 volume is ready at /dev/mapper/vdo2
[root@harding-02 ~]# vdo create --name vdo3 --device /dev/mapper/mpathd
Creating VDO vdo3
Starting VDO vdo3
Starting compression on VDO vdo3
VDO instance 2 volume is ready at /dev/mapper/vdo3
[root@harding-02 ~]# vdo create --name vdo4 --device /dev/mapper/mpathe
Creating VDO vdo4
Starting VDO vdo4
Starting compression on VDO vdo4
VDO instance 3 volume is ready at /dev/mapper/vdo4




# Does not work on low mem machines

[root@host-117 ~]# cat /proc/cpuinfo | grep processor | wc -l
1
[root@host-117 ~]#  cat /proc/meminfo
MemTotal:        1016048 kB
MemFree:          490388 kB
MemAvailable:     586536 kB


[root@host-117 ~]# vdo create --name PV1 --device /dev/sda1
Creating VDO PV1
Starting VDO PV1
Starting compression on VDO PV1
VDO instance 1 volume is ready at /dev/mapper/PV1
[root@host-117 ~]# vdo create --name PV2 --device /dev/sdb1
Creating VDO PV2
Starting VDO PV2
Starting compression on VDO PV2
VDO instance 2 volume is ready at /dev/mapper/PV2
[root@host-117 ~]# time vdo create --name PV3 --device /dev/sdc1
Creating VDO PV3
vdo: ERROR - vdoformat: command failed, signal 9

real    23m50.128s
user    0m8.170s
sys     2m40.920s

Ends up hitting OOM killer:
Nov  8 11:18:56 host-117 kernel: Out of memory: Kill process 10213 (vdoformat) score 9 or sacrifice child
Nov  8 11:18:56 host-117 kernel: Killed process 10213 (vdoformat) total-vm:32784kB, anon-rss:3352kB, file-rss:756kB, shmem-rss:0kB

Comment 3 Thomas Jaskiewicz 2017-11-08 20:10:20 UTC
The original complaint asks us to not try to create a VDO when there is not enough memory.  We find this a very hard request to fulfill.

Starting up a VDO requires allocating hundreds of megabytes of kernel memory, and the Linux memory allocator is not friendly to such users.  We do specify flags that eschew the use of the OOM killer for our allocation, but sometimes see the OOM killer invoked by another process trying to allocate memory at the same point in time.

Looking at the kern.log attached to this report, I do see a related issue that does need to be addressed.  VDO first tries to load or rebuild the existing index, and if that fails then creates a new index.  This seems to be the wrong thing to do if the failure to load/rebuild is because of a memory allocation failure.

Comment 5 bjohnsto 2017-11-28 18:36:07 UTC
I do have a couple of questions/comments here.

1. Doing something in the VDO manager and/or in vdoformat.c will not protect against the scenario where we move a VDO volume from one system to another. The kernel code could still fail on the new system if it has less memory.

2. Even if the code checks on create, there is still the issue of stopping a volume and then starting it with less kernel memory available than when we created it.

3. The checking seems to cover only the index, not all the memory VDO will allocate. I'm not sure checking for index memory only is the best idea.

Comment 7 Jakub Krysl 2017-12-15 11:14:26 UTC
Now there is a check when creating vdo with not enough memory for the index:
# vdo create --name vdo --device /dev/mapper/rhel_storageqe--74-home --indexMem 7
Creating VDO vdo
vdo: ERROR - Not enough available memory in system for index requirement of 7G

The check works even when there is not enough memory for default index:
# vdo create --name vdo --device /dev/mapper/rhel_storageqe--74-home
Creating VDO vdo
vdo: ERROR - Not enough available memory in system for index requirement of 256M

When starting a stopped vdo, the behaviour now is calltraces: the index does not start, its state is set to 'error', and the vdo itself starts. With this check applied to 'vdo start' as well, the index would still not start (not enough memory) and its state would probably be 'offline', but the vdo itself could start so the customer can access their data.
I think this behaviour is much better than calltraces and index state 'error', so would it be possible to implement the check for 'vdo start' too, to get this (or similar) behaviour (i.e. vdo started without its index)?
Thanks


Note: As I understand it, checking for all the memory vdo requires is nearly impossible, so checking for the index memory is probably the only available mitigation. I agree it is not ideal, but it is probably the only option we have to at least partly avoid this. So if checking the index memory requirement is possible, it should be done everywhere we can.
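
Illustration only, not vdo's actual implementation: the same MemAvailable comparison as above, but against a user-supplied --indexMem size in GB (here the 7G from the earlier example), which could sit in front of 'vdo start' as well as 'vdo create':

index_gb=7                                            # value passed via --indexMem
need_kb=$((index_gb * 1024 * 1024))                   # 7G -> kB
avail_kb=$(awk '/MemAvailable/ {print $2}' /proc/meminfo)
if [ "$avail_kb" -lt "$need_kb" ]; then
    echo "Not enough available memory in system for index requirement of ${index_gb}G" >&2
fi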

Comment 8 bjohnsto 2018-01-04 18:13:08 UTC
I'll look into moving the check to the start/stop code, but I have some concerns about whether something like vdoFormat will raise an error before my check gets to run. If so, we might end up with two different error messages. If it doesn't, and my check gives the same error in both places, I'll make the change. Otherwise I think we should take the fix as is for this release.

Comment 9 bjohnsto 2018-01-08 21:54:59 UTC
I did confirm that, at least in my tests, moving the code to the start function causes vdoFormat to give a different error.

I've put the call back before vdoFormat and added the same call into the start command. It's not ideal, since we're copying code, but it will provide what you're looking for.

Comment 10 Jakub Krysl 2018-01-09 09:19:59 UTC
Giving back because of recent changes.

Comment 12 Jakub Krysl 2018-02-14 11:50:47 UTC
# free -m
              total        used        free      shared  buff/cache   available
Mem:           7696        7379         124           9         191          57
Swap:          7935           0        7935
# vdo start --name vdo
Starting VDO vdo
vdo: ERROR - Not enough available memory in system for index requirement of 256M

Starting a vdo with not enough memory (when there was enough at create time) now produces the same error as creating a vdo with not enough memory.

Comment 15 errata-xmlrpc 2018-04-10 16:25:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:0900

