Bug 1510567
| Summary: | Calltrace when vdo create --indexMem with not enough free memory | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Jakub Krysl <jkrysl> | ||||
| Component: | kmod-kvdo | Assignee: | bjohnsto | ||||
| Status: | CLOSED ERRATA | QA Contact: | Jakub Krysl <jkrysl> | ||||
| Severity: | high | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 7.5 | CC: | awalsh, bjohnsto, cmarthal, limershe, salmy, tjaskiew | ||||
| Target Milestone: | rc | ||||||
| Target Release: | --- | ||||||
| Hardware: | x86_64 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2018-04-10 16:25:27 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: |
Reproduced this in our environment as well. VDO doesn't really work with our low-memory virt testing environment. On larger machines it appears fine, but that is going to make automated regression testing difficult.

# Appears to work fine on large mem machine
[root@harding-02 ~]# cat /proc/cpuinfo | grep processor | wc -l
32
[root@harding-02 ~]# cat /proc/meminfo
MemTotal: 65755220 kB
MemFree: 62470472 kB
MemAvailable: 62276952 kB
[root@harding-02 ~]# vdo list
[root@harding-02 ~]# vdo create --name vdo1 --device /dev/mapper/mpathb
Creating VDO vdo1
Starting VDO vdo1
Starting compression on VDO vdo1
VDO instance 0 volume is ready at /dev/mapper/vdo1
[root@harding-02 ~]# vdo create --name vdo2 --device /dev/mapper/mpathc
Creating VDO vdo2
Starting VDO vdo2
Starting compression on VDO vdo2
VDO instance 1 volume is ready at /dev/mapper/vdo2
[root@harding-02 ~]# vdo create --name vdo3 --device /dev/mapper/mpathd
Creating VDO vdo3
Starting VDO vdo3
Starting compression on VDO vdo3
VDO instance 2 volume is ready at /dev/mapper/vdo3
[root@harding-02 ~]# vdo create --name vdo4 --device /dev/mapper/mpathe
Creating VDO vdo4
Starting VDO vdo4
Starting compression on VDO vdo4
VDO instance 3 volume is ready at /dev/mapper/vdo4

# Does not work on low mem machines
[root@host-117 ~]# cat /proc/cpuinfo | grep processor | wc -l
1
[root@host-117 ~]# cat /proc/meminfo
MemTotal: 1016048 kB
MemFree: 490388 kB
MemAvailable: 586536 kB
[root@host-117 ~]# vdo create --name PV1 --device /dev/sda1
Creating VDO PV1
Starting VDO PV1
Starting compression on VDO PV1
VDO instance 1 volume is ready at /dev/mapper/PV1
[root@host-117 ~]# vdo create --name PV2 --device /dev/sdb1
Creating VDO PV2
Starting VDO PV2
Starting compression on VDO PV2
VDO instance 2 volume is ready at /dev/mapper/PV2
[root@host-117 ~]# time vdo create --name PV3 --device /dev/sdc1
Creating VDO PV3
vdo: ERROR - vdoformat: command failed, signal 9
real 23m50.128s
user 0m8.170s
sys 2m40.920s

Ends up hitting the OOM killer:
Nov 8 11:18:56 host-117 kernel: Out of memory: Kill process 10213 (vdoformat) score 9 or sacrifice child
Nov 8 11:18:56 host-117 kernel: Killed process 10213 (vdoformat) total-vm:32784kB, anon-rss:3352kB, file-rss:756kB, shmem-rss:0kB

The original complaint asks us not to try to create a VDO volume when there is not enough memory. We find this a very hard request to fulfill. Starting up a VDO volume requires allocating hundreds of megabytes of kernel memory, and the Linux memory allocator is not friendly to such users. We do specify flags that avoid invoking the OOM killer for our own allocations, but we sometimes see the OOM killer invoked by another process trying to allocate memory at the same moment.

Looking at the kern.log attached to this report, I do see a related issue that needs to be addressed. VDO first tries to load or rebuild the existing index, and if that fails, it creates a new index. This seems to be the wrong thing to do if the load/rebuild failed because of a memory allocation failure.

I do have a couple of questions/comments here:
1. Doing something in VDO manager and/or in vdoformat.c will not protect against a scenario where we move a VDO volume from one system to another. The kernel code could still fail on the new system if it has less memory.
2. Even if the code checks on create, there is still the issue of stopping a volume and then starting it with less kernel memory available than when it was created.
3. The check seems to cover only the index, not all the memory VDO will allocate. I'm not sure checking only the index memory is the best idea.
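The load/rebuild fallback concern above can be sketched as follows. This is a minimal Python illustration, not the kvdo/UDS code (which is kernel C). The function `open_index` and its two callables are hypothetical, and failures are assumed to arrive as `OSError` carrying an errno. The idea is that an allocation failure (`ENOMEM`) during load or rebuild is passed up instead of triggering creation of a new index, which would also need memory and would discard the existing index.

```python
import errno
from typing import Callable


def open_index(load_or_rebuild: Callable[[], str],
               create_new: Callable[[], str]) -> str:
    """Open an existing index, creating a new one only for non-memory failures.

    Hypothetical sketch: both callables raise OSError with an errno on failure.
    """
    try:
        return load_or_rebuild()
    except OSError as err:
        if err.errno == errno.ENOMEM:
            # Out of memory: a new index would need the same memory and would
            # throw away a possibly valid index, so report the failure instead.
            raise
        # Any other failure (for example, an unreadable index): start fresh.
        return create_new()
```

Other failures, such as an index that cannot be parsed, still fall back to creating a new index; only the memory case changes.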
Now there is a check when creating a VDO volume without enough memory for the index:
# vdo create --name vdo --device /dev/mapper/rhel_storageqe--74-home --indexMem 7
Creating VDO vdo
vdo: ERROR - Not enough available memory in system for index requirement of 7G

The check also works when there is not enough memory for the default index:
# vdo create --name vdo --device /dev/mapper/rhel_storageqe--74-home
Creating VDO vdo
vdo: ERROR - Not enough available memory in system for index requirement of 256M

When starting a stopped VDO volume, the current behaviour is call traces: the index does not start and its state is set to error, while the VDO volume itself starts. With this check added to 'vdo start' as well, the index would not start (not enough memory) and its state would probably be offline, but the VDO volume itself could start so the customer can access their data. I think this behaviour is much better than call traces and an index state of 'error'. Is it possible to add the check to 'vdo start' too, to get this (or similar) behaviour, e.g. the VDO volume started without an index? Thanks

Note: As I understand it, checking for all the memory VDO requires is nearly impossible, so checking the index memory is probably the only way to mitigate this. I agree it is not the best idea, but it is probably the only option we have to at least partly avoid the problem. So if checking the index memory requirement is possible, it should be done everywhere we can.

I'll look into moving the check to the start/stop code, but I have some concerns about whether something like vdoFormat will cause an error before my check runs. If so, we might end up with two different error messages. If it doesn't, and my check gives the same error in both cases, I'll make the change. Otherwise I think we should take the fix as is for this release.

I did confirm that, at least in my tests, moving the code to the start function causes vdoFormat to give a different error. I've put the call back before vdoFormat, and have added the same call to the start command.
It's not ideal, since we're copying code, but it will provide what you're looking for.

Giving back because of recent changes.
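The code-copying concern could be avoided with one helper that both the create path (before vdoformat runs) and the start path call. The sketch below is hypothetical and is not the actual vdo manager code: the names `available_bytes`, `check_index_memory`, `create_volume`, and `start_volume` are invented for illustration. It assumes the requirement equals the `--indexMem` value (in GB) and compares it against `MemAvailable` from `/proc/meminfo`. The error text mirrors the message shown in the comments above.

```python
import re

GIB = 1024 ** 3


def available_bytes(meminfo_text: str) -> int:
    """Return MemAvailable (reported in kB) from /proc/meminfo text, in bytes."""
    match = re.search(r"^MemAvailable:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemAvailable not found")
    return int(match.group(1)) * 1024


def format_index_mem(gigabytes: float) -> str:
    """Render an --indexMem value the way the error message does (7G, 256M)."""
    if gigabytes >= 1:
        return f"{gigabytes:g}G"
    return f"{round(gigabytes * 1024)}M"


def check_index_memory(index_mem_gb: float, meminfo_text: str) -> None:
    """Raise if the index would not fit in currently available memory.

    Assumption: the requirement is taken to be exactly the --indexMem value.
    """
    if index_mem_gb * GIB > available_bytes(meminfo_text):
        raise RuntimeError(
            "Not enough available memory in system for index requirement of "
            + format_index_mem(index_mem_gb))


def create_volume(index_mem_gb: float, meminfo_text: str) -> str:
    # Check first so vdoformat never starts when the index cannot fit.
    check_index_memory(index_mem_gb, meminfo_text)
    return "vdoformat would run here"


def start_volume(index_mem_gb: float, meminfo_text: str) -> str:
    # Same helper, so create and start report the identical error.
    check_index_memory(index_mem_gb, meminfo_text)
    return "VDO target would be loaded here"
```

With the `free -m` figures in the verification comment (57 MiB available), a 0.25 GB index fails this check and reports "256M", matching the error shown there.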
# free -m
total used free shared buff/cache available
Mem: 7696 7379 124 9 191 57
Swap: 7935 0 7935
# vdo start --name vdo
Starting VDO vdo
vdo: ERROR - Not enough available memory in system for index requirement of 256M
Starting a VDO volume without enough memory (when there was enough to create it) now produces the same error as creating a VDO volume without enough memory.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:0900
Created attachment 1349064
Console logs

Description of problem:
When creating a VDO volume without enough free memory, the "vdo create" command does not check for enough free memory, and neither does kvdo. Instead it tries to create the volume, leading to out-of-memory call traces and possibly crashing the server with a panic (no more processes to kill). The result depends on how much free memory there actually is. I tried it with 500 MB, which led to a deadlock; 1500 MB led to call traces from rapid process killing, followed by a deadlock. This is quite hard to test because of bug 1510558.

Note: I reduced free memory with 'modprobe scsi_debug dev_size_mb=SPARE_MEM_IN_MB'

Version-Release number of selected component (if applicable):
vdo-6.1.0.34-8
kmod-kvdo-6.1.0.34-7

How reproducible:
100%

Steps to Reproduce:
1. Create a VDO volume without enough memory.
2. Observe call traces.

Actual results:
Call traces

Expected results:
vdo does not try to create the volume when there is not enough free memory, or kvdo aborts UDS creation when out of memory.

Additional info:
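The memory-reduction trick from the description can be scripted. The sketch below is a hypothetical helper (`scsi_debug_size_mb` is an invented name). It assumes that scsi_debug's RAM-backed store consumes roughly `dev_size_mb` MiB once the module is loaded, and that `MemAvailable` is a reasonable estimate of usable memory. The amount actually left free will drift with other allocations.

```python
import re


def mem_available_mb(meminfo_text: str) -> int:
    """MemAvailable from /proc/meminfo text, converted from kB to MiB."""
    match = re.search(r"^MemAvailable:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemAvailable not found")
    return int(match.group(1)) // 1024


def scsi_debug_size_mb(meminfo_text: str, leave_free_mb: int) -> int:
    """dev_size_mb value that should leave about leave_free_mb MiB available."""
    spare = mem_available_mb(meminfo_text) - leave_free_mb
    if spare <= 0:
        raise ValueError("already below the requested amount of free memory")
    return spare
```

For example, passing the contents of `/proc/meminfo` with `leave_free_mb=500` gives a value to use as `modprobe scsi_debug dev_size_mb=<value>`, approximating the 500 MB case above.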