Bug 2014035 - engine-backup failed in case "/tmp" doesn't have enough space, no warning is provided to the user regarding out-of-space
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: Backup-Restore.Engine
Version: ---
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ovirt-4.5.0
: 4.5.0
Assignee: Yedidyah Bar David
QA Contact: Guilherme Santos
URL:
Whiteboard:
Duplicates: 2010075
Depends On:
Blocks:
Reported: 2021-10-14 10:41 UTC by Tzahi Ashkenazi
Modified: 2022-04-28 09:26 UTC
CC List: 1 user

Fixed In Version: ovirt-engine-4.5.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-04-28 09:26:34 UTC
oVirt Team: Integration
Embargoed:
pm-rhel: ovirt-4.5?




Links
System                | ID        | Private | Priority | Status | Summary                                                    | Last Updated
Red Hat Issue Tracker | RHV-43809 | 0       | None     | None   | None                                                       | 2021-10-14 10:53:34 UTC
oVirt gerrit          | 118329    | 0       | master   | MERGED | packaging: engine-backup: Report low free space on failure | 2022-01-30 14:49:39 UTC

Description Tzahi Ashkenazi 2021-10-14 10:41:40 UTC
Description of problem:

While running the engine-backup tool with scope=all (which includes the DWH db),
the backup fails if /tmp does not have enough free space for the process, and no warning about this is written to the engine-backup log.
/tmp is used as the temporary folder in which the dump file is created.

In our case:

1. The DWH db size on the DWH VM is 33GB:
   [root@rhev-red-01-dwh ~]# du -sh /var/lib/pgsql/data/
   33G	/var/lib/pgsql/data/
2. The /tmp folder size on the engine is 2.1GB:
   [root@rhev-red-01 ~]# df -Th /tmp/
   Filesystem            Type  Size  Used Avail Use% Mounted on
   /dev/mapper/ovirt-tmp xfs   2.0G   47M  2.0G   3% /tmp
3. We have checked that in this case, with a DWH db size of 33GB, the minimum
   free space required in the /tmp folder is 4.1GB.
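A pre-flight check based on numbers like these could compare the space available in the temporary directory with the space the dump is expected to need, before the backup starts. The following is a minimal bash sketch, not part of engine-backup: the function names are hypothetical, the required size has to be supplied by the caller, and it assumes a `df` that supports the POSIX `-P` output format.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight free-space check; not taken from engine-backup.

# available_bytes DIR
# Print the number of bytes available on the filesystem holding DIR.
# 'df -Pk' prints POSIX format in 1024-byte blocks: a header line, then one
# line for the filesystem, with the available blocks in the 4th column.
available_bytes() {
	local kib
	kib=$(df -Pk -- "$1" | awk 'NR == 2 { print $4 }') || return 2
	[[ "${kib}" =~ ^[0-9]+$ ]] || return 2
	echo $(( kib * 1024 ))
}

# check_free_space DIR REQUIRED_BYTES
# Return 0 if DIR has at least REQUIRED_BYTES available, 1 if not (after
# printing a warning to stderr), 2 if the free space could not be read.
check_free_space() {
	local dir="$1" required="$2" avail
	avail=$(available_bytes "${dir}") || return 2
	if (( avail < required )); then
		echo "WARNING: ${dir} has ${avail} bytes available, but ${required} are needed" >&2
		return 1
	fi
	return 0
}
```

For the case above, `check_free_space /tmp $(( 41 * 1024 ** 3 / 10 ))` (roughly 4.1GB) would return 1 against the 2.0G /tmp filesystem shown by `df`. The hard part remains choosing the required size, which is what the comments below discuss.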

Neither the engine-backup output nor its log reports any error about the low space for this operation:

engine-backup --scope=all --mode=backup --file=rhev-red-01_backup_03102021_all.tar  --log=rhev-red-01_backup_03102021_all.log
Start of engine-backup with mode 'backup'
scope: all
archive file: rhev-red-01_backup_03102021_all.tar
log file: rhev-red-01_backup_03102021_all.log
Backing up:
Notifying engine
- Files
- Engine database 'engine'
- DWH database 'ovirt_engine_history'
Notifying engine
FATAL: Database ovirt_engine_history backup failed

From the backup log:
2021-10-03 09:12:21 3945450: Start of engine-backup mode backup scope all file rhev-red-01_backup_03102021_all.tar
2021-10-03 09:12:21 3945450: OUTPUT: Start of engine-backup with mode 'backup'
2021-10-03 09:12:21 3945450: OUTPUT: scope: all
2021-10-03 09:12:21 3945450: OUTPUT: archive file: rhev-red-01_backup_03102021_all.tar
2021-10-03 09:12:21 3945450: OUTPUT: log file: rhev-red-01_backup_03102021_all.log
2021-10-03 09:12:21 3945450: OUTPUT: Backing up:
2021-10-03 09:12:21 3945450: Generating pgpass
2021-10-03 09:12:21 3945450: OUTPUT: Notifying engine
2021-10-03 09:12:21 3945450: pg_cmd running: psql -w -U engine -h localhost -p 5432  engine -t -c SELECT LogEngineBackupEvent('files', now(), 0, 'engine-backup: Backup Started, scope=files, log=/rhev-red-01_backup_03102021_all.log', 'rhev-red-01.rdu2.scalelab.redhat.com', '/rhev-red-01_backup_03102021_all.log');


2021-10-03 09:12:21 3945450: pg_cmd running: psql -w -U engine -h localhost -p 5432  engine -t -c SELECT LogEngineBackupEvent('db', now(), 0, 'engine-backup: Backup Started, scope=db, log=/rhev-red-01_backup_03102021_all.log', 'rhev-red-01.rdu2.scalelab.redhat.com', '/rhev-red-01_backup_03102021_all.log');


2021-10-03 09:12:21 3945450: pg_cmd running: psql -w -U engine -h localhost -p 5432  engine -t -c SELECT LogEngineBackupEvent('dwhdb', now(), 0, 'engine-backup: Backup Started, scope=dwhdb, log=/rhev-red-01_backup_03102021_all.log', 'rhev-red-01.rdu2.scalelab.redhat.com', '/rhev-red-01_backup_03102021_all.log');


2021-10-03 09:12:21 3945450: pg_cmd running: psql -w -U engine -h localhost -p 5432  engine -t -c SELECT LogEngineBackupEvent('cinderlib', now(), 0, 'engine-backup: Backup Started, scope=cinderlib, log=/rhev-red-01_backup_03102021_all.log', 'rhev-red-01.rdu2.scalelab.redhat.com', '/rhev-red-01_backup_03102021_all.log');


2021-10-03 09:12:21 3945450: pg_cmd running: psql -w -U engine -h localhost -p 5432  engine -t -c SELECT LogEngineBackupEvent('grafanadb', now(), 0, 'engine-backup: Backup Started, scope=grafanadb, log=/rhev-red-01_backup_03102021_all.log', 'rhev-red-01.rdu2.scalelab.redhat.com', '/rhev-red-01_backup_03102021_all.log');


2021-10-03 09:12:21 3945450: Creating temp folder /tmp/engine-backup.nRZ4XZ7gmn/tar
2021-10-03 09:12:21 3945450: OUTPUT: - Files
2021-10-03 09:12:21 3945450: Backing up files to /tmp/engine-backup.nRZ4XZ7gmn/tar/files
2021-10-03 09:12:41 3945450: OUTPUT: - Engine database 'engine'
2021-10-03 09:12:41 3945450: Backing up database to /tmp/engine-backup.nRZ4XZ7gmn/tar/db/engine_backup.db
2021-10-03 09:12:41 3945450: pg_cmd running: pg_dump -w -U engine -h localhost -p 5432  engine -E UTF8 --disable-dollar-quoting --disable-triggers --format=custom
2021-10-03 09:12:50 3945450: OUTPUT: - DWH database 'ovirt_engine_history'
2021-10-03 09:12:50 3945450: Backing up dwh database to /tmp/engine-backup.nRZ4XZ7gmn/tar/db/dwh_backup.db
2021-10-03 09:12:50 3945450: pg_cmd running: pg_dump -w -U ovirt_engine_history -h 172.29.91.192 -p 5432  ovirt_engine_history -E UTF8 --disable-dollar-quoting --disable-triggers --format=custom
2021-10-03 09:16:12 3945450: FATAL: Database ovirt_engine_history backup failed
2021-10-03 09:16:12 3945450: OUTPUT: Notifying engine
2021-10-03 09:16:12 3945450: pg_cmd running: psql -w -U engine -h localhost -p 5432  engine -t -c SELECT LogEngineBackupEvent('files', now(), -1, 'engine-backup: Database ovirt_engine_history backup failed, scope=files, log=/rhev-red-01_backup_03102021_all.log', 'rhev-red-01.rdu2.scalelab.redhat.com', '/rhev-red-01_backup_03102021_all.log');


2021-10-03 09:16:12 3945450: pg_cmd running: psql -w -U engine -h localhost -p 5432  engine -t -c SELECT LogEngineBackupEvent('db', now(), -1, 'engine-backup: Database ovirt_engine_history backup failed, scope=db, log=/rhev-red-01_backup_03102021_all.log', 'rhev-red-01.rdu2.scalelab.redhat.com', '/rhev-red-01_backup_03102021_all.log');

2021-10-03 09:16:12 3945450: pg_cmd running: psql -w -U engine -h localhost -p 5432  engine -t -c SELECT LogEngineBackupEvent('dwhdb', now(), -1, 'engine-backup: Database ovirt_engine_history backup failed, scope=dwhdb, log=/rhev-red-01_backup_03102021_all.log', 'rhev-red-01.rdu2.scalelab.redhat.com', '/rhev-red-01_backup_03102021_all.log');
2021-10-03 09:16:12 3945450: pg_cmd running: psql -w -U engine -h localhost -p 5432  engine -t -c SELECT LogEngineBackupEvent('cinderlib', now(), -1, 'engine-backup: Database ovirt_engine_history backup failed, scope=cinderlib, log=/rhev-red-01_backup_03102021_all.log', 'rhev-red-01.rdu2.scalelab.redhat.com', '/rhev-red-01_backup_03102021_all.log');
2021-10-03 09:16:12 3945450: pg_cmd running: psql -w -U engine -h localhost -p 5432  engine -t -c SELECT LogEngineBackupEvent('grafanadb', now(), -1, 'engine-backup: Database ovirt_engine_history backup failed, scope=grafanadb, log=/rhev-red-01_backup_03102021_all.log', 'rhev-red-01.rdu2.scalelab.redhat.com', '/rhev-red-01_backup_03102021_all.log');



Version-Release number of selected component (if applicable):

rhv-release-4.4.9-4-001.noarch



Additional info:

The workaround is to run engine-backup with --dirtmp pointing to a path with more free space.

Comment 1 Yedidyah Bar David 2021-10-14 11:36:04 UTC
This bug is not very easy to fix, because the pg_dump log goes to the same temporary space as the dump itself, so if this space is exhausted, the log will contain no indication of it either. You can already see this in the example in comment 0: there are no errors from pg_dump. In theory we could try various complex things, such as keeping the log in memory or elsewhere, or pre-allocating space for it, but I'm not sure it's worth it.
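The "keep it in memory" idea can be illustrated with plain shell redirection: only the command's standard output (the dump) goes to the temporary file, while its standard error is held in a shell variable, so an error message written while the filesystem is full is not lost with it. This is a minimal bash sketch under that assumption; the helper name and the `STDERR_TEXT` variable are hypothetical, and this is not how engine-backup is implemented.

```shell
#!/usr/bin/env bash
# Sketch: stdout goes to OUTFILE, stderr is kept in a shell variable, so
# diagnostics do not depend on free space in OUTFILE's filesystem.

# run_capturing_stderr OUTFILE CMD [ARGS...]
# Run CMD, store its stderr in the global STDERR_TEXT and return CMD's
# exit status (the status of an assignment from a command substitution).
run_capturing_stderr() {
	local outfile="$1"
	shift
	# Redirections apply left to right: '2>&1' first points stderr at the
	# command-substitution pipe, then '>file' moves only stdout to the file.
	STDERR_TEXT=$("$@" 2>&1 >"${outfile}")
}
```

A caller could then run, for example, `run_capturing_stderr "${dumpfile}" pg_dump -w -U ovirt_engine_history -h 172.29.91.192 -p 5432 ovirt_engine_history --format=custom || echo "pg_dump failed: ${STDERR_TEXT}"`, where `dumpfile` is a hypothetical path in the temporary directory. Error output is usually small, so holding it in memory is cheap.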

Some other options:

- We can add a generic error message, always.
- We can check free space on the tmpdir after failures, and error if it's (close to) full.
- We can guess that the free space needed is, say, at least 30% of the db size, and warn/err/abort if it's not enough. That's just a guess, though: the dump is compressed, and the compression ratio depends on the actual data. We can check the db size using 'pg_database_size' (which also works for a remote db); in the case referenced in comment 0 it returned around 19GB.
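The size-based guess in the last option could look roughly like the following bash sketch. Both helpers are hypothetical and not engine-backup code: `db_size_bytes` asks PostgreSQL for `pg_database_size()` of the connected database through `psql`, so it works for a remote DWH db too, and `estimate_required_bytes` applies a percentage that defaults to the rough 30% guessed above. As noted, the real ratio depends on how well the data compresses.

```shell
#!/usr/bin/env bash
# Sketch of the "guess from db size" option. The 30% default is the rough
# guess from the discussion, not a measured compression ratio.

# db_size_bytes HOST PORT USER DBNAME
# Print pg_database_size() of DBNAME, in bytes.
# -w: never prompt for a password; -A -t: unaligned output, values only.
db_size_bytes() {
	psql -w -A -t -h "$1" -p "$2" -U "$3" -d "$4" \
		-c 'SELECT pg_database_size(current_database())'
}

# estimate_required_bytes DB_BYTES [PERCENT]
# Print PERCENT (default 30) percent of DB_BYTES, rounded up.
estimate_required_bytes() {
	local db_bytes="$1" percent="${2:-30}"
	echo $(( (db_bytes * percent + 99) / 100 ))
}
```

For the roughly 19GB reported here, 30% gives about 5.7GB. That is more than the 4.1GB that comment 0 found sufficient, so in this case the guess would have erred on the safe side. The estimate could then be compared with the free space in the temporary directory before pg_dump starts.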

Comment 2 Yedidyah Bar David 2021-10-14 11:38:59 UTC
*** Bug 2010075 has been marked as a duplicate of this bug. ***

Comment 3 Tzahi Ashkenazi 2021-10-14 11:55:40 UTC
(In reply to Yedidyah Bar David from comment #1)

> which in the case referenced in comment 0 returned around 19GB.

The original size of the DWH db was 33GB when I opened the BZ.
After that, I ran "vacuum full analyze" on the DWH database "ovirt_engine_history"
in order to decrease its size.

Comment 4 Yedidyah Bar David 2021-10-14 11:58:58 UTC
(In reply to Tzahi Ashkenazi from comment #3)
> (In reply to Yedidyah Bar David from comment #1)
> 
> which in the case referenced in comment 0 returned around 19GB.
> 
> the original size of the DWH db was 33GB  when I opened the BZ 
> after I have run on the DWH table > "ovirt_engine_history"  > vacuum full
> analyze
> in order to decrease the size of the DWH db.

This makes sense - I suppose after a full vacuum, 'du' and 'pg_database_size' should be quite similar.

Comment 5 Sandro Bonazzola 2021-10-21 07:14:42 UTC
We can go with either a generic message or just check free space on error and issue the specific error message.
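The second approach, checking free space only after a failure, could be sketched in bash as below. This only illustrates the idea and is not the merged gerrit change: `used_percent`, `low_space_hint` and the threshold value are assumptions, and the script reads the Use% column of POSIX `df -P` output.

```shell
#!/usr/bin/env bash
# Sketch of a post-failure free-space check; not the actual engine-backup fix.

# used_percent DIR
# Print the Use% of the filesystem holding DIR, without the '%' sign.
used_percent() {
	df -Pk -- "$1" | awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# low_space_hint DIR THRESHOLD_PCT
# Print a hint if the filesystem holding DIR is at least THRESHOLD_PCT% used;
# print nothing otherwise, or if the usage cannot be determined.
low_space_hint() {
	local dir="$1" threshold="$2" pct
	pct=$(used_percent "${dir}")
	if [[ "${pct}" =~ ^[0-9]+$ ]] && (( pct >= threshold )); then
		echo "${dir} is ${pct}% full; the failure may be caused by lack of free space"
	fi
}
```

A caller would run something like `low_space_hint /tmp 95` right after a failure such as `FATAL: Database ovirt_engine_history backup failed`, and print the hint next to that message. The check has to run before the temporary directory is cleaned up, because removing the partial dump frees the space again and hides the cause.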

Comment 7 Sandro Bonazzola 2022-04-28 09:26:34 UTC
This bugzilla is included in oVirt 4.5.0 release, published on April 20th 2022.

Since the problem described in this bug report should be resolved in oVirt 4.5.0 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

