Bug 1703278 - Scheduled NFS backup is not stored directly on the NFS location; it is first written to /tmp and then moved to NFS storage
Summary: Scheduled NFS backup is not stored directly on the NFS location; it is first written to /tmp and then moved to NFS storage
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: Appliance
Version: 5.10.3
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: GA
Target Release: 5.11.0
Assignee: Nick LaMuro
QA Contact: Jaroslav Henner
Docs Contact: Red Hat CloudForms Documentation
URL:
Whiteboard:
Depends On: 1732808
Blocks: 1704905 1717025
 
Reported: 2019-04-26 01:22 UTC by Nikhil Gupta
Modified: 2020-01-10 16:07 UTC
CC List: 10 users

Fixed In Version: 5.11.0.5
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1717025
Environment:
Last Closed: 2019-12-13 14:55:27 UTC
Category: Bug
Cloudforms Team: CFME Core
Target Upstream Version:
Embargoed:


Attachments
evm.log (3.84 MB, text/plain)
2019-08-21 15:43 UTC, Jaroslav Henner

Description Nikhil Gupta 2019-04-26 01:22:32 UTC
Description of problem:
I have scheduled an NFS backup, and when the backup runs, I get an error that there is not enough free disk space under /tmp on the CFME appliance:
~~~
[----] W, [2019-04-23T16:01:20.852389 #5840:8eb1e70]  WARN -- : MIQ(EvmDatabaseOps.validate_free_space) Destination location: [/tmp/20190423-5840-1abizg5], does not have enough free disk space: [1028640768 bytes] for database of size: [64529440940 bytes]
~~~
This is accurate, as /tmp is only 1 GB while the database is over 60 GB.

non-vmdb6 # df -h /tmp/.
Filesystem                   Size  Used Avail Use% Mounted on
/dev/mapper/VG--CFME-lv_tmp 1014M   33M  982M   4% /tmp

vmdb # df -h 
/dev/mapper/vg_pg-lv_pg                134G   62G   73G  46% /var/opt/rh/rh-postgresql95/lib/pgsql
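
For reference, the validation in the WARN line above amounts to comparing the destination filesystem's available bytes against the estimated database size. A minimal sketch of that kind of check, assuming a `df -P`-based lookup (illustrative only; not the actual EvmDatabaseOps code):

~~~
# Sketch of a free-space validation like the one logged above. Hypothetical
# helper; the real check is EvmDatabaseOps.validate_free_space.
def enough_free_space?(destination, db_size_bytes)
  # `df -P` prints POSIX format; column 4 of the last line is available 1K blocks
  avail_kb = `df -P #{destination}`.lines.last.split[3].to_i
  avail_kb * 1024 >= db_size_bytes
end

# With a ~1 GB /tmp and a ~60 GB database, the check correctly fails:
enough_free_space?("/tmp", 64_529_440_940) # => false
~~~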


1. The appliance should dump the database directly to the NFS mount point, after validating the free space on the NFS share.

2. We used the "Extend Temporary Storage" option in appliance_console, which added a '/var/www/miq_tmp' disk with plenty of available space:

   /dev/vdb1                              100G   33M  100G   1% /var/www/miq_tmp

   I know that the appliance uses the '/var/www/miq_tmp' temporary storage directory for certain image download functions. Why does the backup not use this disk? Can the database backup be configured to use this temporary storage? (A sketch of one possibility follows below.)
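
On the configurability question: Ruby's `Dir.mktmpdir` (the source of the `/tmp/miq_...` scratch directories seen in the logs) resolves its base directory via `Dir.tmpdir`, which honors the `TMPDIR` environment variable. Whether every CFME backup code path respects this is an assumption, so treat the following as a sketch rather than a supported configuration:

~~~
require "tmpdir"

# Dir.tmpdir consults $TMPDIR (then $TMP/$TEMP) before falling back to /tmp,
# so redirecting it moves Ruby-created scratch dirs onto the big disk.
# Assumption: the backup worker process inherits this environment variable.
ENV["TMPDIR"] = "/var/www/miq_tmp"
Dir.mktmpdir("miq_") { |dir| puts dir } # => /var/www/miq_tmp/miq_20190426-...
~~~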

Version-Release number of selected component (if applicable):
cfme-5.10.3

How reproducible:
Always

Steps to Reproduce:
1. Make sure database dump size is more than 1 GB.
2. Schedule NFS database backup from UI.


Actual results:
Receive an error in the logs that /tmp does not have enough free disk space.

Expected results:
Database dump should be done directly on the NFS share.

Additional info:
After manually extending the /tmp logical volume, the database backup completed successfully.

Comment 11 Nick LaMuro 2019-05-08 01:29:20 UTC
Have a proposed fix for this issue pushed:

https://github.com/ManageIQ/manageiq/pull/18745/files


Working on validating that this fixes the issue and getting it reviewed.



-Nick

Comment 14 Jerry Keselman 2019-05-31 18:05:48 UTC
Dennis,

I'm not sure why this was reassigned to me. From reading through all the comments, it appears that Nick L already has a fix merged
and he is just waiting for that PR to be backported before this BZ can be moved to POST.

Jerry

Comment 16 Jaroslav Henner 2019-07-23 12:37:04 UTC
I tried using inotifywait to watch the events happening in the /tmp dir on cfme-5.11.0.15-1.el8cf.x86_64. I found that some files are still created in /tmp:
# inotifywait -mr /tmp
...
/tmp/ CREATE,ISDIR miq_20190723-7895-1vb9rj4
/tmp/ OPEN,ISDIR miq_20190723-7895-1vb9rj4
/tmp/ ACCESS,ISDIR miq_20190723-7895-1vb9rj4
/tmp/ CLOSE_NOWRITE,CLOSE,ISDIR miq_20190723-7895-1vb9rj4
/tmp/ CREATE 20190723-7895-k9a36r
/tmp/ OPEN 20190723-7895-k9a36r
/tmp/ CREATE 20190723-7895-10k4lz1
/tmp/ OPEN 20190723-7895-10k4lz1
/tmp/ OPEN 20190723-7895-k9a36r
/tmp/ MODIFY 20190723-7895-10k4lz1
/tmp/ CLOSE_WRITE,CLOSE 20190723-7895-10k4lz1
/tmp/ CLOSE_WRITE,CLOSE 20190723-7895-k9a36r
/tmp/ OPEN 20190723-7895-10k4lz1
/tmp/ ACCESS 20190723-7895-10k4lz1
/tmp/ CLOSE_NOWRITE,CLOSE 20190723-7895-10k4lz1
/tmp/ DELETE 20190723-7895-10k4lz1
/tmp/ CLOSE_WRITE,CLOSE 20190723-7895-k9a36r
/tmp/ DELETE 20190723-7895-k9a36r
/tmp/miq_20190723-7895-1vb9rj4/ DELETE_SELF 
/tmp/ DELETE,ISDIR miq_20190723-7895-1vb9rj4


It is true, though, that on cfme-5.10.7.1-1.el7cf.x86_64 there are more events:
[root@dhcp-8-198-222 ~]# inotifywait -mr /tmp/ | tee files
Setting up watches.  Beware: since -r was given, this may take a while!
Watches established.
/tmp/ CREATE,ISDIR miq_20190723-12303-ba57n7
/tmp/ OPEN,ISDIR miq_20190723-12303-ba57n7
/tmp/ CLOSE_NOWRITE,CLOSE,ISDIR miq_20190723-12303-ba57n7
/tmp/ CREATE 20190723-12303-131ag0m
/tmp/ OPEN 20190723-12303-131ag0m
/tmp/ CREATE 20190723-12303-1ugpteq
/tmp/ OPEN 20190723-12303-1ugpteq
/tmp/ OPEN 20190723-12303-131ag0m
/tmp/ MODIFY 20190723-12303-131ag0m
/tmp/ ACCESS 20190723-12303-131ag0m
/tmp/ MODIFY 20190723-12303-131ag0m
/tmp/ ACCESS 20190723-12303-131ag0m
... *** many more times *** ...
/tmp/ MODIFY 20190723-12303-131ag0m
/tmp/ ACCESS 20190723-12303-131ag0m
/tmp/ CLOSE_WRITE,CLOSE 20190723-12303-1ugpteq
/tmp/ CLOSE_WRITE,CLOSE 20190723-12303-131ag0m
/tmp/ DELETE 20190723-12303-1ugpteq
/tmp/ CLOSE_WRITE,CLOSE 20190723-12303-131ag0m
/tmp/ DELETE 20190723-12303-131ag0m
/tmp/miq_20190723-12303-ba57n7/ DELETE_SELF 
/tmp/ DELETE,ISDIR miq_20190723-12303-ba57n7

Comment 17 Jaroslav Henner 2019-08-09 09:21:03 UTC
Maybe I am being too strict here, but while the inotify test is certainly not bad, it is not completely conclusive, because I am not sure what file name to search for in the inotify output to prove the dump file is not created there.

So to really test this, I need to get a big DB, restore it on some appliance, and then schedule the backup. Then I will see whether it fails by filling /tmp completely.
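
In case the inotify output stays ambiguous, a hypothetical monitor for this test (the one-second interval is an arbitrary choice): sample the available space on /tmp while the backup runs; a dump staged there would show up as a steep, sustained drop.

~~~
# Hypothetical helper: a dump staged in /tmp shows up as a steep drop in
# available space, regardless of which temp file names inotify reports.
loop do
  avail_kb = `df -P /tmp`.lines.last.split[3].to_i
  puts "#{Time.now.utc} /tmp available: #{avail_kb / 1024} MiB"
  sleep 1
end
~~~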

Comment 18 Nick LaMuro 2019-08-09 16:52:33 UTC
First off:  I am all for you testing with a big database.  I didn't have one available when I was doing the refactoring for this last summer, so it would be nice for this to get a proper stress test.



That said, the only thing that should be in the `/tmp` dir is a FIFO file that is in charge of streaming the data from the output of `pg_dump`/`pg_basebackup` to the input of the file it is being sent to. This was done in Ruby to allow the data to be streamed in the same fashion across all backup endpoints.

So what you might be seeing is the FIFO being hit a bunch, but nothing should be committed to disk long term. But I am not an expert with `inotifywait`, so I am unsure.
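
For illustration, a minimal sketch of the FIFO streaming approach described above, assuming a `pg_dump` invocation and an NFS-mounted destination path (both hypothetical; the actual implementation is the PR linked in comment 11):

~~~
require "tmpdir"

# Minimal sketch of streaming a dump through a FIFO: the named pipe is the
# only thing that touches /tmp; the dump bytes flow straight to the mount.
# Paths and pg_dump flags are assumptions, not the ManageIQ implementation.
Dir.mktmpdir("miq_") do |dir|
  fifo = File.join(dir, "db_dump.fifo")
  File.mkfifo(fifo)                                 # Ruby >= 2.3

  # Writer: pg_dump's output goes into the pipe, not onto the /tmp disk.
  pid = spawn("pg_dump", "-Fc", "vmdb_production", :out => fifo)

  # Reader: copy the stream directly to the (NFS-mounted) destination.
  File.open(fifo, "rb") do |src|
    File.open("/mnt/nfs_backup/region_34.backup", "wb") do |dst|
      IO.copy_stream(src, dst)
    end
  end
  Process.wait(pid)
end
~~~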

Comment 19 Jaroslav Henner 2019-08-21 15:42:05 UTC
It didn't work. I got:
[----] W, [2019-08-21T11:35:22.876992 #9628:2afa1dfbca24]  WARN -- : MIQ(EvmDatabaseOps.validate_free_space) Destination location: [/tmp/miq_20190821-9628-1vfpo3o/db_backup/region_34/test/region_34_20190821_153522.backup], does not have enough free disk space: [9221177344 bytes] for database of size: [12932101607 bytes]
...
/var/www/miq/vmdb/lib/evm_database_ops.rb:41:in `validate_free_space': Destination location: [/tmp/miq_20190821-9628-1vfpo3o/db_backup/region_34/test/region_34_20190821_153522.backup], does not have enough free disk space: [9221177344 bytes] for database of size: [12932101607 bytes] (MiqException::MiqDatabaseBackupInsufficientSpace)
...
[----] E, [2019-08-21T11:35:23.017358 #9628:2afa176a85c4] ERROR -- : MIQ(MiqQueue#deliver) Message id: [34000043942339], Error: [undefined method `path' for nil:NilClass]
[----] E, [2019-08-21T11:35:23.017836 #9628:2afa176a85c4] ERROR -- : [NoMethodError]: undefined method `path' for nil:NilClass  Method:[block (2 levels) in <class:LogProxy>]
[----] E, [2019-08-21T11:35:23.018249 #9628:2afa176a85c4] ERROR -- : /opt/rh/cfme-gemset/bundler/gems/cfme-gems-pending-ca1c762f8036/lib/gems/pending/util/mount/miq_generic_mount_session.rb:493:in `source_for_log'
/opt/rh/cfme-gemset/bundler/gems/cfme-gems-pending-ca1c762f8036/lib/gems/pending/util/mount/miq_generic_mount_session.rb:265:in `rescue in add'
/opt/rh/cfme-gemset/bundler/gems/cfme-gems-pending-ca1c762f8036/lib/gems/pending/util/mount/miq_generic_mount_session.rb:261:in `add'
/var/www/miq/vmdb/lib/evm_database_ops.rb:160:in `block in with_file_storage'
/opt/rh/cfme-gemset/bundler/gems/cfme-gems-pending-ca1c762f8036/lib/gems/pending/util/miq_file_storage.rb:31:in `with_interface_class'
/var/www/miq/vmdb/lib/evm_database_ops.rb:133:in `with_file_storage'
/var/www/miq/vmdb/lib/evm_database_ops.rb:57:in `backup'
/var/www/miq/vmdb/app/models/database_backup.rb:53:in `_backup'
/var/www/miq/vmdb/app/models/database_backup.rb:37:in `backup'
/var/www/miq/vmdb/app/models/database_backup.rb:14:in `backup'
/var/www/miq/vmdb/app/models/miq_queue.rb:479:in `block in dispatch_method'
/usr/share/ruby/timeout.rb:93:in `block in timeout'
/usr/share/ruby/timeout.rb:33:in `block in catch'
/usr/share/ruby/timeout.rb:33:in `catch'

Comment 20 Jaroslav Henner 2019-08-21 15:43:15 UTC
Created attachment 1606635 [details]
evm.log

Comment 21 Nick LaMuro 2019-08-21 16:07:57 UTC
Jaroslav,

Can you provide some additional information about the error:


1. The command trying to be run
2. The output of `df -P LOCATION_OF_DB_DUMP_DIR`, or similar
3. The type of mount (I assume NFS, but unsure)
4. The version of MIQ/CFME you are now testing with


Based on the error you provided above, it seems to be working as expected: only 9 GB is free on the location it is targeting, but 12 GB is required by the DB. However, I don't know whether that location is `/tmp` or the share.


Thanks,
-Nick

