Description of problem:

In OSP 16.2 (Z2), even though we configured in the overcloud environment files [1] the interval at which the glance cache pruner should run on the controller, we saw that the cache grows indefinitely, well beyond the configured size limit of 100 GB [2], regardless of last access time and overall hits.

Steps to Reproduce:

OSP 16.2 deployed with the Glance cache enabled and the extra configuration in the extra_config.yaml TripleO environment file, as highlighted below [1].

Actual results:

Even though Puppet generates the needed cron file in /var/lib/config-data/puppet-generated/glance_api/var/spool/cron [2][3], further investigation led us to discover that there isn't any cron job running in the glance_api container of OSP 16.2 Z2 [4].

Expected results:

A cron job should be present and running in the glance_api container to invoke the glance cache pruner.

Additional info:

From the OSP 16.2 official documentation:
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.2/html/creating_and_managing_images/ch-image-service

[1] #----------------- From the TripleO template environment used ----------------#

parameter_defaults:
  GlanceCacheEnabled: true
  GlanceImageCacheMaxSize: 107374182400
  ....
  ControllerExtraConfig:
    ....
    glance::cache::pruner::minute: '*/10'
    ....

[2] #------------------ From /var/lib/config-data/puppet-generated/glance_api/etc/glance/glance-api.conf of control0001-lb -----------------#

Optional: Tune the glance_cache_pruner to an alternative frequency when you redeploy the overcloud. The following example shows a frequency of 5 minutes:

# The upper limit on cache size, in bytes, after which the cache-pruner cleans
# up the image cache.
#image_cache_max_size = 10737418240
image_cache_max_size=107374182400

# The amount of time, in seconds, an incomplete image remains in the cache.
#image_cache_stall_time = 86400
image_cache_stall_time=86400

# Base directory for image cache.
image_cache_dir=/var/lib/glance/image-cache

[3] Contents of /var/lib/config-data/puppet-generated/glance_api/var/spool/cron:

```
[root@control0001-cdm ~]# ls -al /var/lib/config-data/puppet-generated/glance_api/var/spool/cron
total 4
drwx------. 2 root root  20 May 10 16:00 .
drwxr-xr-x. 3 root root  18 Sep  2  2021 ..
-rw-------. 1 root root 507 Feb 11 09:47 glance
[root@control0001-cdm ~]# cat /var/lib/config-data/puppet-generated/glance_api/var/spool/cron/glance
# HEADER: This file was autogenerated at 2022-02-11 09:47:04 +0100 by puppet.
# HEADER: While it can still be managed manually, it is definitely not recommended.
# HEADER: Note particularly that the comments starting with 'Puppet Name' should
# HEADER: not be deleted, as doing so could cause duplicate cron jobs.
# Puppet Name: glance-cache-cleaner
PATH=/bin:/usr/bin:/usr/sbin
1 0 * * * glance-cache-cleaner
# Puppet Name: glance-cache-pruner
PATH=/bin:/usr/bin:/usr/sbin
*/10 * * * * glance-cache-pruner
```

[4] Cron configuration and processes inside the glance_api container:

```
[root@control0001-cdm ~]# podman exec --user=0 -it glance_api /bin/bash
[root@control0001-cdm /]# more /etc/cron* /etc/cron*/* | cat
*** /etc/cron.d: directory ***
*** /etc/cron.daily: directory ***
::::::::::::::
/etc/cron.deny
::::::::::::::
*** /etc/cron.hourly: directory ***
*** /etc/cron.monthly: directory ***
::::::::::::::
/etc/crontab
::::::::::::::
SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=root

# For details see man 4 crontabs

# Example of job definition:
# .---------------- minute (0 - 59)
# |  .------------- hour (0 - 23)
# |  |  .---------- day of month (1 - 31)
# |  |  |  .------- month (1 - 12) OR jan,feb,mar,apr ...
# |  |  |  |  .---- day of week (0 - 6) (Sunday=0 or 7) OR sun,mon,tue,wed,thu,fri,sat
# |  |  |  |  |
# *  *  *  *  * user-name  command to be executed
*** /etc/cron.weekly: directory ***
::::::::::::::
/etc/cron.d/0hourly
::::::::::::::
# Run the hourly jobs
SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=root
01 * * * * root run-parts /etc/cron.hourly
::::::::::::::
/etc/cron.hourly/0anacron
::::::::::::::
#!/bin/sh
# Check whether 0anacron was run today already
if test -r /var/spool/anacron/cron.daily; then
    day=`cat /var/spool/anacron/cron.daily`
fi
if [ `date +%Y%m%d` = "$day" ]; then
    exit 0
fi

# Do not run jobs when on battery power
online=1
for psupply in AC ADP0 ; do
    sysfile="/sys/class/power_supply/$psupply/online"
    if [ -f $sysfile ] ; then
        if [ `cat $sysfile 2>/dev/null`x = 1x ]; then
            online=1
            break
        else
            online=0
        fi
    fi
done
if [ $online = 0 ]; then
    exit 0
fi
/usr/sbin/anacron -s

[root@control0001-cdm /]# ps lax
F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
4 42415 1 0 20 0 4232 864 - Ss ? 0:00 dumb-init --single-child -- kolla_start
4 42415 8 1 20 0 922308 123280 - S ? 16:53 /usr/bin/python3 /usr/bin/glance-api --config-file /usr/share/glance/glance-api-dist.conf --config-file /etc/glance/glance-api.conf --config-file /etc/glance/g
5 42415 50 8 20 0 1128968 163032 - S ? 4:57 /usr/bin/python3 /usr/bin/glance-api --config-file /usr/share/glance/glance-api-dist.conf --config-file /etc/glance/glance-api.conf --config-file /etc/glance/g
5 42415 51 8 20 0 1264700 168444 - S ? 5:05 /usr/bin/python3 /usr/bin/glance-api --config-file /usr/share/glance/glance-api-dist.conf --config-file /etc/glance/glance-api.conf --config-file /etc/glance/g
5 42415 52 8 20 0 1132612 159244 - S ? 5:13 /usr/bin/python3 /usr/bin/glance-api --config-file /usr/share/glance/glance-api-dist.conf --config-file /etc/glance/glance-api.conf --config-file /etc/glance/g
5 42415 53 8 20 0 1263780 168900 - S ? 5:05 /usr/bin/python3 /usr/bin/glance-api --config-file /usr/share/glance/glance-api-dist.conf --config-file /etc/glance/glance-api.conf --config-file /etc/glance/g
5 42415 54 8 20 0 1132012 157984 - S ? 5:10 /usr/bin/python3 /usr/bin/glance-api --config-file /usr/share/glance/glance-api-dist.conf --config-file /etc/glance/glance-api.conf --config-file /etc/glance/g
5 42415 55 8 20 0 1129588 165068 - S ? 5:21 /usr/bin/python3 /usr/bin/glance-api --config-file /usr/share/glance/glance-api-dist.conf --config-file /etc/glance/glance-api.conf --config-file /etc/glance/g
5 42415 56 8 20 0 1132728 166996 - S ? 4:30 /usr/bin/python3 /usr/bin/glance-api --config-file /usr/share/glance/glance-api-dist.conf --config-file /etc/glance/glance-api.conf --config-file /etc/glance/g
5 42415 57 8 20 0 1130416 173112 - S ? 4:38 /usr/bin/python3 /usr/bin/glance-api --config-file /usr/share/glance/glance-api-dist.conf --config-file /etc/glance/glance-api.conf --config-file /etc/glance/g
5 42415 58 8 20 0 1133296 168140 - S ? 5:06 /usr/bin/python3 /usr/bin/glance-api --config-file /usr/share/glance/glance-api-dist.conf --config-file /etc/glance/glance-api.conf --config-file /etc/glance/g
5 42415 59 8 20 0 1264160 177212 - S ? 4:48 /usr/bin/python3 /usr/bin/glance-api --config-file /usr/share/glance/glance-api-dist.conf --config-file /etc/glance/glance-api.conf --config-file /etc/glance/g
5 42415 60 8 20 0 1264504 164080 - S ? 4:46 /usr/bin/python3 /usr/bin/glance-api --config-file /usr/share/glance/glance-api-dist.conf --config-file /etc/glance/glance-api.conf --config-file /etc/glance/g
5 42415 61 8 20 0 1128608 150644 - S ? 4:46 /usr/bin/python3 /usr/bin/glance-api --config-file /usr/share/glance/glance-api-dist.conf --config-file /etc/glance/glance-api.conf --config-file /etc/glance/g
4 0 54403 0 20 0 22140 4216 - Ss pts/0 0:00 /bin/bash
4 0 54417 54403 20 0 45472 2116 - R+ pts/0 0:00 ps lax
```
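For reference, a quick way to observe the reported growth is to compare the on-disk size of the image cache against the configured limit. This is a minimal sketch, not taken from the report; it assumes the container paths shown in [2] and that du is available in the glance_api image:

```
# Sketch: check the image cache size against image_cache_max_size (100 GB here).
sudo podman exec glance_api du -sb /var/lib/glance/image-cache
# A working pruner should keep this below 107374182400 bytes; in this report it
# keeps growing because no cron daemon is running inside the container.
```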
The latest TripleO deploys a separate glance_api_cron container, but this was added to fix a different issue [1], and that patch is present in master and wallaby only [2].

[1] https://bugs.launchpad.net/tripleo/+bug/1892467
[2] https://review.opendev.org/q/topic:bug%252F1892467
    https://review.opendev.org/q/topic:bug/1892467-stable/wallaby

We'd need to backport these changes, or part of that patch, to deploy the glance_api_cron container.
(In reply to Takashi Kajinami from comment #1)
> The latest TripleO deploys a separate glance_api_cron container, but this
> was added to fix a different issue [1], and that patch is present in master
> and wallaby only [2].
>
> [1] https://bugs.launchpad.net/tripleo/+bug/1892467
> [2] https://review.opendev.org/q/topic:bug%252F1892467
>     https://review.opendev.org/q/topic:bug/1892467-stable/wallaby
>
> We'd need to backport these changes, or part of that patch, to deploy the
> glance_api_cron container.

I missed the fact that Glance now has an in-process periodic task to run the prefetch job, so I believe we no longer need cron.
https://github.com/openstack/glance/commit/73fefddd969e75e2f2d82866f4b8b9eeca488770
Sorry, ignore comment #2. That is about a completely different feature...
Hello,

From https://github.com/openstack/glance/commit/73fefddd969e75e2f2d82866f4b8b9eeca488770 I'm seeing these two statements:

> In Train, Glance API has added a new periodic job ``cache_images`` which will
> run after every predefined time interval to fetch the queued images into cache.
> The default time interval for the ``cache_images`` periodic job is 300
> seconds. Admin/Operator can configure this interval in glance-api.conf file or
> glance-cache.conf file using ``cache_prefetcher_interval`` configuration
> option. The ``cache_images`` periodic job will only run if cache middleware
> is enabled in your cloud.

> The cache_images method will fetch all images which are in queued state
> for caching in cache directory. The default value is 300.

The options described in these statements are different from what TripleO pushes today in OSP 16.2 (see [1] and [2] below). Another point: I don't see any way to specify the maximum size of the cache used. Is my observation correct?

BR
Riccardo

[1] #------------------ From /var/lib/config-data/puppet-generated/glance_api/etc/glance/glance-api.conf of control0001-lb -----------------#

Optional: Tune the glance_cache_pruner to an alternative frequency when you redeploy the overcloud. The following example shows a frequency of 5 minutes:

# The upper limit on cache size, in bytes, after which the cache-pruner cleans
# up the image cache.
#image_cache_max_size = 10737418240
image_cache_max_size=107374182400

# The amount of time, in seconds, an incomplete image remains in the cache.
#image_cache_stall_time = 86400
image_cache_stall_time=86400

# Base directory for image cache.
image_cache_dir=/var/lib/glance/image-cache

[2] #------------------ From /var/lib/config-data/puppet-generated/glance_api/var/spool/cron/glance of control0001-lb -----------------#

# HEADER: This file was autogenerated at 2022-02-11 09:47:04 +0100 by puppet.
# HEADER: While it can still be managed manually, it is definitely not recommended.
# HEADER: Note particularly that the comments starting with 'Puppet Name' should
# HEADER: not be deleted, as doing so could cause duplicate cron jobs.
# Puppet Name: glance-cache-cleaner
PATH=/bin:/usr/bin:/usr/sbin
1 0 * * * glance-cache-cleaner
# Puppet Name: glance-cache-pruner
PATH=/bin:/usr/bin:/usr/sbin
*/10 * * * * glance-cache-pruner
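For clarity, here is a sketch (not taken from the customer environment) of how the prefetcher interval named in the commit quoted above would sit next to the existing cache options in glance-api.conf; the values are the ones already shown in [1], and cache_prefetcher_interval only controls how often queued images are prefetched:

```
[DEFAULT]
# Periodic "cache_images" prefetch job added in Train: how often queued images
# are pulled into the cache. It does not prune the cache or enforce a size limit.
cache_prefetcher_interval = 300

# Size limit that glance-cache-pruner enforces when it actually runs (100 GB here).
image_cache_max_size = 107374182400
image_cache_stall_time = 86400
image_cache_dir = /var/lib/glance/image-cache
```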
Hello,

The customer is asking for an update on this bug. Do we have any progress to share with them?

BR
Riccardo
Added the upstream Gerrit patches to the link list. We need to backport these into 16.2.
Hello,

The customer is asking for an estimate of when this bug will be fixed in OSP 16.2. Could you share in which OSP 16.2 Z stream this fix will be introduced?

BR
Riccardo
Hello,

Could you confirm that this BZ will be included in z4? The customer is asking us for progress on this topic.

BR
Riccardo
Test failed.

openstack-tripleo-heat-templates-11.6.1-2.20221010235135.el8ost.noarch
RHOS release 16.2

1. Deployed environment used:

parameter_defaults:
  GlanceCacheEnabled: true
  GlanceImageCacheMaxSize: '21474836480'
  GlanceImageCacheStallTime: '1200'
  ControllerExtraConfig:
    glance::cache::pruner::minute: '*/10'

2. Contents of the cron file:

```
[heat-admin@controller-0 ~]$ sudo cat /var/lib/config-data/puppet-generated/glance_api/var/spool/cron/glance
# HEADER: This file was autogenerated at 2022-11-16 09:20:57 +0000 by puppet.
# HEADER: While it can still be managed manually, it is definitely not recommended.
# HEADER: Note particularly that the comments starting with 'Puppet Name' should
# HEADER: not be deleted, as doing so could cause duplicate cron jobs.
# Puppet Name: glance-cache-cleaner
PATH=/bin:/usr/bin:/usr/sbin
1 0 * * * glance-cache-cleaner
# Puppet Name: glance-cache-pruner
PATH=/bin:/usr/bin:/usr/sbin
*/10 * * * * glance-cache-pruner

[heat-admin@controller-0 ~]$ sudo podman exec -it glance_api /bin/bash
bash-4.4$ more /etc/cron* /etc/cron*/* | cat
*** /etc/cron.d: directory ***
*** /etc/cron.daily: directory ***
::::::::::::::
/etc/cron.deny
::::::::::::::
*** /etc/cron.hourly: directory ***
*** /etc/cron.monthly: directory ***
::::::::::::::
/etc/crontab
::::::::::::::
SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=root

# For details see man 4 crontabs

# Example of job definition:
# .---------------- minute (0 - 59)
# |  .------------- hour (0 - 23)
# |  |  .---------- day of month (1 - 31)
# |  |  |  .------- month (1 - 12) OR jan,feb,mar,apr ...
# |  |  |  |  .---- day of week (0 - 6) (Sunday=0 or 7) OR sun,mon,tue,wed,thu,fri,sat
# |  |  |  |  |
# *  *  *  *  * user-name  command to be executed
*** /etc/cron.weekly: directory ***
::::::::::::::
/etc/cron.d/0hourly
::::::::::::::
# Run the hourly jobs
SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=root
01 * * * * root run-parts /etc/cron.hourly
::::::::::::::
/etc/cron.hourly/0anacron
::::::::::::::
#!/bin/sh
# Check whether 0anacron was run today already
if test -r /var/spool/anacron/cron.daily; then
    day=`cat /var/spool/anacron/cron.daily`
fi
if [ `date +%Y%m%d` = "$day" ]; then
    exit 0
fi

# Do not run jobs when on battery power
online=1
for psupply in AC ADP0 ; do
    sysfile="/sys/class/power_supply/$psupply/online"
    if [ -f $sysfile ] ; then
        if [ `cat $sysfile 2>/dev/null`x = 1x ]; then
            online=1
            break
        else
            online=0
        fi
    fi
done
if [ $online = 0 ]; then
    exit 0
fi
/usr/sbin/anacron -s
```

3. Contents of glance-api.conf:

```
(overcloud) [stack@undercloud-0 ~]$ ctlplane-0 sudo crudini --get /var/lib/config-data/puppet-generated/glance_api/etc/glance/glance-api.conf DEFAULT image_cache_max_size
Warning: Permanently added 'controller-0.ctlplane,192.168.24.24' (ECDSA) to the list of known hosts.
21474836480
(overcloud) [stack@undercloud-0 ~]$ ctlplane-0 sudo crudini --get /var/lib/config-data/puppet-generated/glance_api/etc/glance/glance-api.conf DEFAULT image_cache_stall_time
Warning: Permanently added 'controller-0.ctlplane,192.168.24.24' (ECDSA) to the list of known hosts.
1200
(overcloud) [stack@undercloud-0 ~]$ ctlplane-0 sudo crudini --get /var/lib/config-data/puppet-generated/glance_api/etc/glance/glance-api.conf DEFAULT image_cache_dir
Warning: Permanently added 'controller-0.ctlplane,192.168.24.24' (ECDSA) to the list of known hosts.
/var/lib/glance/image-cache
```

Test: deployed the environment with image_cache_max_size=21474836480 and created an image of size 23622320128.
1. Created the glance_pod.sh file and copied overcloudrc to the controller, following section 1.1.6.2, "Using a periodic job to pre-cache an image", in the documentation:
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.2/html/creating_and_managing_images/ch-image-service

2. Create image:

```
glance --insecure image-create --disk-format raw --container-format bare --visibility public --file vm-disk-22G --name alpine3.15_22G --progress
+------------------+----------------------------------------------------------------------------------+
| Property         | Value                                                                            |
+------------------+----------------------------------------------------------------------------------+
| checksum         | 64bc8f36677d57424d04d945f04b5789                                                 |
| container_format | bare                                                                             |
| created_at       | 2022-11-19T12:00:08Z                                                             |
| direct_url       | swift+config://ref1/glance/7ca0be5c-8791-452b-abad-36a2ba90d90d                  |
| disk_format      | raw                                                                              |
| id               | 7ca0be5c-8791-452b-abad-36a2ba90d90d                                             |
| min_disk         | 0                                                                                |
| min_ram          | 0                                                                                |
| name             | alpine3.15_22G                                                                   |
| os_hash_algo     | sha512                                                                           |
| os_hash_value    | 911e5eb6c27d0b7078ec2edd1c7ca1ed6b8f788c8cf00793c5e209136afb7424c4edff4e1b95ad2f |
|                  | 30162c6764848542cd258b8dc49242d2260137faae3cb478                                 |
| os_hidden        | False                                                                            |
| owner            | 072f7e1987be43b3a8cdd7e91d274662                                                 |
| protected        | False                                                                            |
| size             | 23622320128                                                                      |
| status           | active                                                                           |
| stores           | default_backend                                                                  |
| tags             | []                                                                               |
| updated_at       | 2022-11-19T12:03:48Z                                                             |
| virtual_size     | Not available                                                                    |
| visibility       | public                                                                           |
+------------------+----------------------------------------------------------------------------------+
```

3. Queue image to cache:

```
bash-4.4$ glance-cache-manage --host=172.17.1.43 queue-image 7ca0be5c-8791-452b-abad-36a2ba90d90d
bash-4.4$ glance-cache-manage --host=172.17.1.43 list-queued
Found 1 queued images...
+--------------------------------------+
| ID                                   |
+--------------------------------------+
| 7ca0be5c-8791-452b-abad-36a2ba90d90d |
+--------------------------------------+
```

4. List cached images:

```
bash-4.4$ glance-cache-manage --host=172.17.1.43 list-cached
Found 1 cached images...
+--------------------------------------+----------------------------+----------------------------+-------------+------+
| ID                                   | Last Accessed (UTC)        | Last Modified (UTC)        | Size        | Hits |
+--------------------------------------+----------------------------+----------------------------+-------------+------+
| 7ca0be5c-8791-452b-abad-36a2ba90d90d | 2022-11-19T12:11:15.967292 | 2022-11-19T12:11:15.967292 | 23622320128 | 0    |
+--------------------------------------+----------------------------+----------------------------+-------------+------+
```

5. Wait 10 minutes.

6. List cached images again:

```
bash-4.4$ glance-cache-manage --host=172.17.1.43 list-cached
Found 1 cached images...
+--------------------------------------+----------------------------+----------------------------+-------------+------+
| ID                                   | Last Accessed (UTC)        | Last Modified (UTC)        | Size        | Hits |
+--------------------------------------+----------------------------+----------------------------+-------------+------+
| 7ca0be5c-8791-452b-abad-36a2ba90d90d | 2022-11-19T12:11:15.967292 | 2022-11-19T12:11:15.967292 | 23622320128 | 0    |
+--------------------------------------+----------------------------+----------------------------+-------------+------+
```
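As a side note, a minimal sketch (not part of the QE run) of how to check whether anything is scheduled to execute the pruner at all; the container names are the ones already mentioned in this report, and whether the puppet-generated crontab is visible inside the container is an assumption to verify:

```
# Is the separate glance_api_cron container (comment #1) deployed at all?
sudo podman ps --format '{{.Names}}' | grep glance
# Is a cron daemon running inside glance_api?
sudo podman exec glance_api ps ax | grep -i crond
# Does the container see the puppet-generated crontab for the glance user?
sudo podman exec --user=0 glance_api cat /var/spool/cron/glance
# Note: the cached image (23622320128 bytes) is larger than image_cache_max_size
# (21474836480), so a working pruner run should evict it.
```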
Step 2 seems great, I like that /var/lib/config-data/puppet-generated/glance_api/var/spool/cron/glance contains the relevant entry, but are we missing the cron entries in the containers?

Step 3 seems quite annoying, we're missing values there :/

Regarding the failure, do we know if:

1) The cron job was never run?
2) The cron job was run but the glance-cache-* commands failed?

What happens if we run glance-cache-cleaner/glance-cache-pruner manually? Is the cache actually cleaned up?
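For reference, a minimal sketch of the manual check asked for above, reusing the container name and the API address from the QE test (172.17.1.43 is the host used there, not a general value); this is an illustration, not output from the environment:

```
# Run the pruner/cleaner by hand inside the glance_api container,
# then re-list the cache; the oversized image should be gone if pruning works.
sudo podman exec -it glance_api glance-cache-pruner
sudo podman exec -it glance_api glance-cache-cleaner
sudo podman exec -it glance_api glance-cache-manage --host=172.17.1.43 list-cached
```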
(In reply to Cyril Roelandt from comment #26)
> Step 2 seems great, I like that
> /var/lib/config-data/puppet-generated/glance_api/var/spool/cron/glance
> contains the relevant entry, but are we missing the cron entries in the
> containers?
>
> Step 3 seems quite annoying, we're missing values there :/
>
> Regarding the failure, do we know if:
>
> 1) The cron job was never run?

The cron job is run every 10 minutes:

Nov 20 04:01:02 controller-0 run-parts[7217]: (/etc/cron.hourly) finished 0anacron
Nov 20 04:10:01 controller-0 CROND[4994]: (glance) CMD (glance-cache-pruner)
Nov 20 04:10:03 controller-0 CROND[4993]: (glance) CMDOUT (2022-11-20 04:10:03.133 4994 INFO glance.image_cache [-] Image cache loaded driver 'sqlite'.#033[00m)
Nov 20 04:20:01 controller-0 CROND[5044]: (glance) CMD (glance-cache-pruner)
Nov 20 04:20:03 controller-0 CROND[5043]: (glance) CMDOUT (2022-11-20 04:20:03.008 5044 INFO glance.image_cache [-] Image cache loaded driver 'sqlite'.#033[00m)
Nov 20 04:30:01 controller-0 CROND[5092]: (glance) CMD (glance-cache-pruner)

> 2) The cron job was run but the glance-cache-* commands failed?

Need to check.

> What happens if we run glance-cache-cleaner/glance-cache-pruner manually? Is
> the cache actually cleaned up?

The answer to this is yes, the cache gets cleaned up.
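The lines above look like the controller host's cron log; a sketch of how they could be pulled out, assuming a standard RHEL 8 host (log location and syslog identifier may differ):

```
# Cron log on the controller host (path assumed):
sudo grep -E 'glance-cache-(pruner|cleaner)' /var/log/cron
# or via the journal, filtering on the CROND syslog identifier seen above:
sudo journalctl -t CROND | grep glance-cache
```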
@Mridula: Can we expect https://review.opendev.org/c/openstack/tripleo-heat-templates/+/865501 to be backported all the way to Wallaby (for 17.1 inclusion)? What about Train (for 16.2 & 16.1)?
(In reply to Cyril Roelandt from comment #29)
> @Mridula: Can we expect
> https://review.opendev.org/c/openstack/tripleo-heat-templates/+/865501 to be
> backported all the way to Wallaby (for 17.1 inclusion)? What about Train
> (for 16.2 & 16.1)?

We will be backporting it to Train (16.2) and targeting it for z5.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform 16.2.5 (Train) bug fix and enhancement advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2023:1763