Bug 1974325

Summary: Pulp2-Pulp3 migration failed with "Katello::Errors::Pulp3Error: could not extend file "base/906156/908228": No space left on device"
Product: Red Hat Satellite
Reporter: Ashish Humbe <ahumbe>
Component: Repositories
Assignee: Justin Sherrill <jsherril>
Status: CLOSED ERRATA
QA Contact: Danny Synk <dsynk>
Severity: high
Priority: unspecified
Version: 6.9.0
CC: apatel, aupadhye, bbuckingham, dalley, dsynk, jjeffers, jsherril, kgaikwad, mdellweg, osousa, peter.vreman, ttereshc
Target Milestone: 6.9.7
Keywords: Triaged
Target Release: Unused
Hardware: Unspecified
OS: Unspecified
Fixed In Version: tfm-rubygem-katello-3.18.1.43-1
Last Closed: 2021-11-10 16:23:39 UTC
Type: Bug
Bug Blocks: 1957813

Description Ashish Humbe 2021-06-21 11:39:19 UTC
Description of problem:

Pulp2-Pulp3 migration failed with error 

2021-06-10 15:36:22 +0000: Migrating rpm content to Pulp 3 41993/161886
Migration failed, You will want to investigate: https://satellite.example.com/foreman_tasks/tasks/367aeee6-a854-4385-8143-8cbae7db1090
rake aborted!
ForemanTasks::TaskError: Task 367aeee6-a854-4385-8143-8cbae7db1090: Katello::Errors::Pulp3Error: could not extend file "base/906156/908228": No space left on device
HINT: Check free disk space.
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.18.1.29/lib/katello/tasks/pulp3_migration.rake:35:in `block (2 levels) in <top (required)>'

As recommended, the customer had more than double the disk space available for /var/opt/rh/rh-postgresql12/lib/pgsql, but they still ran out of disk space.


# du -sh /var/opt/rh/rh-postgresql12/lib/pgsql
5.8G    /var/opt/rh/rh-postgresql12/lib/pgsql

There are two requests in this Bugzilla:

1. We should predict the required PostgreSQL database size and ask customers to maintain the minimum required disk space accordingly.
2. To recover from this situation, running the "satellite-maintain content prepare-abort" command was not sufficient. The customer had to reboot the server and then run the "satellite-maintain content prepare" command to proceed. We need to document the steps required to recover from this kind of failure.


Version-Release number of selected component (if applicable):
Satellite 6.9.2

Comment 8 Peter Vreman 2021-07-02 12:54:46 UTC
Last night I hit a full disk again on postgres.

The filesystem is 30GB, with 25GB used normally. But during the nightly Red Hat repositories sync (~70 repositories) it used 5GB more temporary space.

It really looks like the minimum requirement for postgres has to be raised to 50GB.

Comment 9 Tanya Tereshchenko 2021-07-02 12:59:54 UTC
I'm posting here recommendations from Pulp devs based on internal testing.
Those numbers are not official but it would be good to know if in practice folks see similar numbers.

In comparison to Satellite with Pulp 2, PostgreSQL for Satellite with Pulp 3 will be increased by:  
 - 70-80% of the MongoDB size (with migration data)
 - 20-30% of the MongoDB size (without migration data, when it is removed)

Please note that the percentage above is of the MongoDB size, and not of the old PostgreSQL size.
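
As a rough illustration of the arithmetic above, the following sketch turns the quoted percentage ranges into an estimated range of additional PostgreSQL space. The function name, the `GrowthEstimate` type and the 12 GB example input are assumptions made for illustration; the percentages are the unofficial figures from this comment, not a supported sizing formula.

```python
from typing import NamedTuple


class GrowthEstimate(NamedTuple):
    low_gb: float
    high_gb: float


def estimate_postgres_growth(mongodb_gb: float, with_migration_data: bool = True) -> GrowthEstimate:
    """Estimate extra PostgreSQL space as a percentage range of the MongoDB size."""
    # Unofficial ranges quoted in the comment: 70-80% while migration data
    # is present, 20-30% once it has been removed.
    low_pct, high_pct = (70, 80) if with_migration_data else (20, 30)
    return GrowthEstimate(mongodb_gb * low_pct / 100, mongodb_gb * high_pct / 100)


if __name__ == "__main__":
    mongodb_gb = 12.0  # hypothetical MongoDB size in GB
    print("with migration data:   ", estimate_postgres_growth(mongodb_gb, True))
    print("without migration data:", estimate_postgres_growth(mongodb_gb, False))
```

The estimate is relative to the MongoDB size only, so the current PostgreSQL size has to be added on top to get a total.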

Comment 10 Peter Vreman 2021-07-02 14:45:59 UTC
I do not know what is meant by with/without migration data, but I can see at least one 5GB log file:
~~~
[crash/LI] root@li-lc-2222:/var/opt/rh/rh-postgresql12/lib/pgsql/data/log# ls -lh
total 5.3G
-rw-------. 1 postgres postgres 115K Jul  2 14:18 postgresql-Fri.log
-rw-------. 1 postgres postgres 221M Jun 28 20:17 postgresql-Mon.log
-rw-------. 1 postgres postgres 1.7K Jun  5 02:37 postgresql-Sat.log
-rw-------. 1 postgres postgres  87M Jun  6 20:27 postgresql-Sun.log
-rw-------. 1 postgres postgres  77M Jul  1 16:12 postgresql-Thu.log
-rw-------. 1 postgres postgres 4.9G Jun 29 19:45 postgresql-Tue.log
-rw-------. 1 postgres postgres 3.4M Jun 30 20:36 postgresql-Wed.log
[crash/LI] root@li-lc-2222:/var/opt/rh/rh-postgresql12/lib/pgsql/data/log#
~~~
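
A listing like the one above can also be produced programmatically, which may help when watching for a single oversized log file. This is a minimal sketch, not part of any Satellite tooling: the `log_usage` helper is hypothetical, and the path in the usage section is simply the one from the transcript.

```python
from pathlib import Path


def log_usage(log_dir: str) -> list[tuple[str, int]]:
    """Return (file name, size in bytes) for each *.log file, largest first."""
    files = [
        (p.name, p.stat().st_size)
        for p in Path(log_dir).glob("*.log")
        if p.is_file()
    ]
    return sorted(files, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Path taken from the listing above; adjust for other installations.
    for name, size in log_usage("/var/opt/rh/rh-postgresql12/lib/pgsql/data/log"):
        print(f"{size / 1024**2:10.1f} MiB  {name}")
```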

My statistics:


Before:
- postgres: 6.3GB
- mongodb: 12GB

After (now, 3 days after migration)
- postgres: 25GB (with a known peak hitting 30GB making the filesystem full)
- mongodb: 12GB
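
To compare these figures with the percentage estimates given earlier in the thread, the growth can be expressed as a share of the MongoDB size. The helper below is a hypothetical sketch. With the numbers above, (25 - 6.3) / 12 is about 1.56, or roughly 156% of the MongoDB size, although the 25GB figure may include other data on the same filesystem, such as the log files listed above.

```python
def growth_as_share_of_mongodb(pg_before_gb: float, pg_after_gb: float, mongodb_gb: float) -> float:
    """Return the PostgreSQL growth divided by the MongoDB size."""
    return (pg_after_gb - pg_before_gb) / mongodb_gb


if __name__ == "__main__":
    # Figures reported above: 6.3GB before, 25GB after (30GB peak), 12GB MongoDB.
    steady = growth_as_share_of_mongodb(6.3, 25.0, 12.0)
    peak = growth_as_share_of_mongodb(6.3, 30.0, 12.0)
    print(f"steady state: {steady:.2f} x MongoDB size")
    print(f"peak:         {peak:.2f} x MongoDB size")
```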

Comment 11 Justin Sherrill 2021-07-06 20:08:36 UTC
Peter,

Can you share your postgresql.conf file, which should contain your logging settings?  I'm guessing you're using non-default logging that records long queries, and I'm betting the migration process has a lot of long queries :)

Comment 12 Daniel Alley 2021-07-07 03:13:49 UTC
Peter, since Tanya is away for the next week I will try to explain. "With migration data" and "without migration data" refer to the additional overhead of the migration process, which requires extra space in Postgresql. That overhead will go away in 6.10, once the migration data is no longer required.

Satellite 6.8
* MongoDB (Pulp 2 data)
* Postgresql (everything else)

Satellite 6.9
* MongoDB (Pulp 2 data)
* Postgresql (Pulp 3 data + Pulp 2-to-pulp3 migration data + everything else)  --  requires (80% x MongoDB size) additional Postgresql capacity over 6.8

Satellite 6.10
* Postgresql (Pulp 3 data + everything else)    --   requires (30% x MongoDB size) additional Postgresql capacity over 6.8, but less capacity required than 6.9

Although as Tanya mentioned these numbers are estimates based on the testing we've done, they may not apply to every installation. But we'd love to gather further information about how accurate they are.

Comment 13 Peter Vreman 2021-07-12 13:58:58 UTC
postgresql.conf unchanged since Sat6.8 installation in Nov 2020:

~~~
[crash/LI] root@li-lc-2222:~# ls -l /var/opt/rh/rh-postgresql12/lib/pgsql/data/postgresql.conf
-rw-------. 1 postgres postgres 27014 Nov  3  2020 /var/opt/rh/rh-postgresql12/lib/pgsql/data/postgresql.conf

[crash/LI] root@li-lc-2222:~# grep log /var/opt/rh/rh-postgresql12/lib/pgsql/data/postgresql.conf
# "postgres -c log_connections=on".  Some parameters can be changed at run time
#wal_level = replica                    # minimal, replica, or logical
#wal_log_hints = off                    # also do full page writes of non-critical updates
#archive_command = ''           # command to use to archive a logfile segment
#archive_timeout = 0            # force a logfile segment switch after this
#restore_command = ''           # command to use to restore an archived logfile segment
#wal_keep_segments = 0          # in logfile segments; 0 disables
#max_logical_replication_workers = 4    # taken from max_worker_processes
#max_sync_workers_per_subscription = 2  # taken from max_logical_replication_workers
#log_destination = 'stderr'             # Valid values are combinations of
                                        # stderr, csvlog, syslog, and eventlog,
                                        # depending on platform.  csvlog
                                        # requires logging_collector to be on.
# This is used when logging to stderr:
logging_collector = on # Enable capturing of stderr and csvlog
                                        # into log files. Required to be on for
                                        # csvlogs.
# These are only used if logging_collector is on:
#log_directory = 'log'                  # directory where log files are written,
log_filename = 'postgresql-%a.log' # log file name pattern,
#log_file_mode = 0600                   # creation mode for log files,
log_truncate_on_rotation = on # If on, an existing log file with the
                                        # same name as the new log file will be
log_rotation_age = 1d # Automatic rotation of logfiles will
log_rotation_size = 200000 # Automatic rotation of logfiles will
                                        # happen after that much log output.
# These are relevant when logging to syslog:
#syslog_facility = 'LOCAL0'
#syslog_ident = 'postgres'
#syslog_sequence_numbers = on
#syslog_split_messages = on
# This is only relevant when logging to eventlog (win32):
#log_min_messages = warning             # values in order of decreasing detail:
                                        #   log
#log_min_error_statement = error        # values in order of decreasing detail:
                                        #   log
#log_min_duration_statement = -1        # -1 is disabled, 0 logs all statements
                                        # and their durations, > 0 logs only
#log_transaction_sample_rate = 0.0      # Fraction of transactions whose statements
                                        # are logged regardless of their duration. 1.0 logs all
                                        # statements from all transactions, 0.0 never logs.
#log_checkpoints = off
#log_connections = off
#log_disconnections = off
#log_duration = off
#log_error_verbosity = default          # terse, default, or verbose messages
#log_hostname = off
#log_line_prefix = '%m [%p] '           # special values:
#log_lock_waits = off                   # log lock waits >= deadlock_timeout
#log_statement = 'none'                 # none, ddl, mod, all
#log_replication_commands = off
#log_temp_files = -1                    # log temporary files equal or larger
                                        # -1 disables, 0 logs all temp files
log_timezone = UTC
#log_parser_stats = off
#log_planner_stats = off
#log_executor_stats = off
#log_statement_stats = off
#log_autovacuum_min_duration = -1       # -1 disables, 0 logs all actions and
                                        # their durations, > 0 logs only
                                        #   log
default_text_search_config = 'pg_catalog.english'
log_line_prefix = '%t '
log_min_duration_statement = 1000
[crash/LI] root@li-lc-2222:~#
~~~
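
The grep output mixes commented defaults with active settings, so it can be easier to read after filtering. The sketch below is a deliberately naive parser (the `active_settings` name is hypothetical, and it ignores `#` inside quoted values and the `name value` form without `=`). It keeps only uncommented `name = value` lines, with later entries overriding earlier ones, as PostgreSQL does. Applied to the output above, it would show `log_min_duration_statement = 1000`, meaning statements that run longer than 1000 ms are logged.

```python
def active_settings(conf_text: str) -> dict[str, str]:
    """Return uncommented `name = value` settings; later entries win."""
    settings: dict[str, str] = {}
    for raw in conf_text.splitlines():
        # Drop comments (naive: does not handle '#' inside quoted values).
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip()] = value.strip().strip("'")
    return settings


if __name__ == "__main__":
    # A few lines copied from the grep output above.
    sample = """\
#log_min_duration_statement = -1        # -1 is disabled, 0 logs all statements
log_rotation_age = 1d # Automatic rotation of logfiles will
log_rotation_size = 200000 # Automatic rotation of logfiles will
log_line_prefix = '%t '
log_min_duration_statement = 1000
"""
    for name, value in active_settings(sample).items():
        print(f"{name} = {value!r}")
```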

Comment 14 Justin Sherrill 2021-07-15 12:59:12 UTC
Connecting redmine issue https://projects.theforeman.org/issues/33028 from this bug

Comment 15 Bryan Kearney 2021-07-15 16:01:44 UTC
Upstream bug assigned to jsherril

Comment 19 Peter Vreman 2021-09-21 15:39:13 UTC
I expected this small user-feedback fix in 6.9.6. It is important feedback that tells users to resize the postgres directory so it is large enough before starting content migration in the background.

Comment 20 Brad Buckingham 2021-09-21 15:45:30 UTC
Hello Peter,

Thanks for the feedback.

We weren't able to get this one into 6.9.6 (which is being released just before the 6.10 Beta); however, we are planning to have another 6.9.z prior to the 6.10 GA.  This will be one of the candidates for that zstream.

Comment 22 Danny Synk 2021-10-21 21:24:40 UTC
Verified on Satellite 6.9.7, snap 1 (tfm-rubygem-katello-3.18.1.44-1.el7sat.noarch).

Steps to Test:

1. Deploy Satellite 6.9.7, snap 1.
2. Run `foreman-maintain content migration-stats`:

~~~
# foreman-maintain content migration-stats
Running Retrieve Pulp 2 to Pulp 3 migration statistics
================================================================================
Retrieve Pulp 2 to Pulp 3 migration statistics: 
============Migration Summary================
Migrated/Total RPMs: 0/0
Migrated/Total errata: 0/0
Migrated/Total repositories: 0/0
All content has been migrated.

Note: ensure there is sufficient storage space for /var/lib/pulp/published to double in size before starting the migration process.
Check the size of /var/lib/pulp/published with 'du -sh /var/lib/pulp/published/'

Note: ensure there is sufficient storage space for postgresql.
You will need additional space for your postgresql database.  The partition holding '/var/opt/rh/rh-postgresql12/lib/pgsql/data/'
   will need additional free space equivalent to the size of your Mongo db database (/var/lib/mongodb/).
                                                                      [OK]
--------------------------------------------------------------------------------
~~~

3. Enable and synchronize a repository.
4. Run `foreman-maintain content migration-stats` again:

~~~
# foreman-maintain content migration-stats
Running Retrieve Pulp 2 to Pulp 3 migration statistics
================================================================================
Retrieve Pulp 2 to Pulp 3 migration statistics: 
============Migration Summary================
Migrated/Total RPMs: 0/1406
Migrated/Total errata: 0/533
Migrated/Total repositories: 0/1
Estimated migration time based on yum content: fewer than 5 minutes

Note: ensure there is sufficient storage space for /var/lib/pulp/published to double in size before starting the migration process.
Check the size of /var/lib/pulp/published with 'du -sh /var/lib/pulp/published/'

Note: ensure there is sufficient storage space for postgresql.
You will need additional space for your postgresql database.  The partition holding '/var/opt/rh/rh-postgresql12/lib/pgsql/data/'
   will need additional free space equivalent to the size of your Mongo db database (/var/lib/mongodb/).
                                                                      [OK]
--------------------------------------------------------------------------------
~~~

Expected Results:
A warning message about the required storage space for the PostgreSQL database is displayed regardless of the amount of Pulp content synchronized to the Satellite.

Actual Results:
A warning message about the required storage space for the PostgreSQL database is displayed regardless of the amount of Pulp content synchronized to the Satellite.
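
The storage note printed by `migration-stats` could be checked before starting a migration with something like the sketch below. It is an illustration only and not part of foreman-maintain: both helper names are hypothetical, the paths are the ones quoted in the note, and the directory size is a simple sum of file sizes rather than an exact match for `du`.

```python
import os
import shutil


def dir_size_bytes(path: str) -> int:
    """Sum the apparent sizes of regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                # File vanished or is unreadable; skip it.
                continue
    return total


def has_room_for_migration(pg_data_dir: str, mongodb_dir: str) -> bool:
    """True if the partition holding pg_data_dir has free space >= size of mongodb_dir."""
    free = shutil.disk_usage(pg_data_dir).free
    return free >= dir_size_bytes(mongodb_dir)


if __name__ == "__main__":
    pg_data = "/var/opt/rh/rh-postgresql12/lib/pgsql/data/"
    mongodb = "/var/lib/mongodb/"
    if os.path.isdir(pg_data) and os.path.isdir(mongodb):
        print("enough free space:", has_room_for_migration(pg_data, mongodb))
    else:
        print("expected directories not found; nothing checked")
```

Given the reports earlier in this bug of growth beyond the MongoDB size, treating the result as a lower bound rather than a guarantee seems prudent.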

Comment 27 errata-xmlrpc 2021-11-10 16:23:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Satellite 6.9.7 Async Bug Fix Update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4612