Bug 2227676
| Summary: | [RHOSP16.2.5] The glance_api_cron container is created and tripleo_glance_api_cron_healthcheck.service fails post FFU from RHOSP 13 to 16.2.5 | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Anjana <anbs> |
| Component: | openstack-tripleo-heat-templates | Assignee: | Manoj Katari <mkatari> |
| Status: | CLOSED ERRATA | QA Contact: | msava |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 16.2 (Train) | CC: | abishop, drosenfe, eshames, mburns, mkatari, nkawamot |
| Target Milestone: | z6 | Keywords: | Triaged |
| Target Release: | 16.2 (Train on RHEL 8.4) | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | openstack-tripleo-heat-templates-11.6.1-2.20230717085025.1608f56.el8ost | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-11-08 19:19:16 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 2243639, 2243643 | | |
| Bug Blocks: | | | |
Is it this: https://bugzilla.redhat.com/show_bug.cgi?id=2142951 ? If so, please mark it as a duplicate.

@Takashi Sure, assigned it to me.

Test failed with openstack-tripleo-heat-templates-11.6.1-2.20230808225213.9adcac6.el8ost.noarch. Glance was deployed with the cache enabled.

1. glance-api.conf:

```
[DEFAULT]
image_member_quota=128
show_image_direct_url=True
show_multiple_locations=True
enable_v2_api=True
node_staging_uri=file:///var/lib/glance/staging
enabled_import_methods=[web-download]
bind_host=172.17.1.95
bind_port=9292
workers=4
enabled_backends=default_backend:rbd
image_cache_max_size=10737418240
image_cache_stall_time=86400
image_cache_dir=/var/lib/glance/image-cache
registry_host=0.0.0.0
debug=True
log_file=/var/log/glance/api.log
log_dir=/var/log/glance
transport_url=rabbit://guest:YoHgDTqqXbLC0rsSKBAjo1WkA.redhat.local:5672,guest:YoHgDTqqXbLC0rsSKBAjo1WkA.redhat.local:5672,guest:YoHgDTqqXbLC0rsSKBAjo1WkA.redhat.local:5672/?ssl=0
cache_prefetcher_interval=300
enable_v1_api=False

[cinder]

[cors]

[database]
connection=mysql+pymysql://glance:Jre4yOvrHzWGcO8XRlgqiEkIS.1.84/glance?read_default_file=/etc/my.cnf.d/tripleo.cnf&read_default_group=tripleo

[file]

[glance.store.http.store]

[glance.store.rbd.store]

[glance.store.sheepdog.store]

[glance.store.swift.store]

[glance.store.vmware_datastore.store]

[glance_store]
default_backend=default_backend
os_region_name=regionOne

[image_format]

[keystone_authtoken]
www_authenticate_uri=http://10.0.0.103:5000
region_name=regionOne
memcached_servers=controller-0.internalapi.redhat.local:11211,controller-1.internalapi.redhat.local:11211,controller-2.internalapi.redhat.local:11211
memcache_use_advanced_pool=True
auth_type=password
auth_url=http://172.17.1.84:5000
username=glance
password=Jre4yOvrHzWGcO8XRlgqiEkIS
user_domain_name=Default
project_name=service
project_domain_name=Default

[oslo_concurrency]
lock_path=/var/lib/glance/tmp
```

2. API cron healthcheck:

```
[heat-admin@controller-0 ~]$ systemctl status tripleo_glance_api_cron_healthcheck.service
● tripleo_glance_api_cron_healthcheck.service - glance_api_cron healthcheck
   Loaded: loaded (/etc/systemd/system/tripleo_glance_api_cron_healthcheck.service; disabled; vendor preset: disabled)
   Active: inactive (dead) since Thu 2023-10-05 10:13:10 UTC; 55s ago
  Process: 221326 ExecStart=/usr/bin/podman exec --user root glance_api_cron /usr/share/openstack-tripleo-common/healthcheck/cron glance (code=exited, status=0/SUCCESS)
 Main PID: 221326 (code=exited, status=0/SUCCESS)
```

3. Running glance containers:

```
[heat-admin@controller-0 ~]$ sudo podman ps|grep glance
21e1658033b2  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-glance-api:16.2_20230925.1  kolla_start  About an hour ago  Up About an hour ago  glance_api
32e131a6da70  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-glance-api:16.2_20230925.1  kolla_start  About an hour ago  Up About an hour ago  glance_api_cron
```

4. Crontab check:

```
[heat-admin@controller-0 ~]$ sudo podman exec -it glance_api_cron crontab -l
no crontab for root
WARN[0000] Error resizing exec session c8f76a7a5bd9df1289a85ea7f414baf26ed7a074f2658fc4fe2d8da331722e93: could not open ctl file for terminal resize for container 32e131a6da70622303d6a49c4f1e7f66bf33cbd9548cda36bb91febb5bb57962: open /var/lib/containers/storage/overlay-containers/32e131a6da70622303d6a49c4f1e7f66bf33cbd9548cda36bb91febb5bb57962/userdata/c8f76a7a5bd9df1289a85ea7f414baf26ed7a074f2658fc4fe2d8da331722e93/ctl: no such device or address
[heat-admin@controller-0 ~]$ sudo podman exec -it glance_api_cron /bin/bash
[root@controller-0 /]# crontab -l
no crontab for root
```

5. Create an image and queue it for caching on the controller:

```
bash-4.4$ glance-cache-manage --host=192.168.24.43 queue-image 455f343c-c7db-4484-bdcd-3f386b22ba18
Queue image 455f343c-c7db-4484-bdcd-3f386b22ba18 for caching? [y/N] y
Failed to queue the specified image for caching. Got error: [Errno 111] Connection refused
```

Max, I'm moving this back to ON_QA because of issues in your verification process.

First, the bug we're fixing is one in which the glance cache is NOT enabled prior to the FFU. In that situation, glance's cron job is not supposed to be enabled after the FFU. The problem (the bug) is that the cron job was created, and was failing because it had nothing to do, since the cache isn't enabled.

For this BZ, I envision the test procedure doing something like this:

- Deploy OSP-13 with the glance cache disabled.
- FFU to 16.2 and verify there is no glance_api_cron container running at all.

Separate from verifying the BZ, I want to note that the crontab check you ran in item 4 is not correct. If you look carefully at tripleo_glance_api_cron_healthcheck.service, you will see it runs this command:

```
ExecStart=/usr/bin/podman exec --user root glance_api_cron /usr/share/openstack-tripleo-common/healthcheck/cron glance
```

Note the last argument is "glance". That gets passed into /usr/share/openstack-tripleo-common/healthcheck/cron, and it represents the user associated with the cron job. In this case, the cron job runs as the glance user, not root. If you want to see what I mean, try this:

```
$ sudo podman exec -it glance_api_cron crontab -u glance -l
```

Just be sure to do that when the glance cache is enabled (but verify this BZ with it disabled).
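As a sketch of what that check might return on a cache-enabled deployment: the job names below correspond to glance's cache maintenance utilities (glance-cache-pruner and glance-cache-cleaner), but the schedules and comment line are illustrative assumptions, not output captured from this environment:

```
$ sudo podman exec -it glance_api_cron crontab -u glance -l
# Puppet-managed crontab for the glance user (schedules illustrative)
*/30 * * * * glance-cache-pruner
1 0 * * * glance-cache-cleaner
```

Two active entries is exactly what the healthcheck script's `nb_lines -ge 2` test (shown in the description below) requires. With the cache disabled, there should be no crontab for the glance user, and after the fix, no glance_api_cron container at all.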
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform 16.2.6 (Train) bug fix and enhancement advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:6307

Description of problem:

In RHOSP 16.2.5, the glance_api_cron container is created even though the glance image cache feature is not enabled [1]. In that case, tripleo_glance_api_cron_healthcheck.service fails [2]. This appears to be because no cron jobs are created in the glance_api_cron container when the glance image cache is not enabled [3].

[1]

```
[root@control01tbmoc ~]# podman ps|grep glance
f873fc92a78e  undercloudtb.ctlplane.localdomain:8787/rhosp-rhel8/openstack-glance-api:16.2  kolla_start  2 weeks ago  Up 2 weeks ago  glance_api
3f9bfe1b37e0  undercloudtb.ctlplane.localdomain:8787/rhosp-rhel8/openstack-glance-api:16.2  kolla_start  2 weeks ago  Up 2 weeks ago  glance_api_cron
```

/var/lib/config-data/puppet-generated/glance_api/etc/glance/glance-api.conf:

```
...
[paste_deploy]
...
flavor=keystone
...
```

[2]

```
[root@control01tbmoc ~]# systemctl status tripleo_glance_api_cron_healthcheck.service
● tripleo_glance_api_cron_healthcheck.service - glance_api_cron healthcheck
   Loaded: loaded (/etc/systemd/system/tripleo_glance_api_cron_healthcheck.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Wed 2023-07-26 10:32:33 +08; 31s ago
  Process: 418699 ExecStart=/usr/bin/podman exec --user root glance_api_cron /usr/share/openstack-tripleo-common/healthcheck/cron glance (code=exited, status=1/FAILURE)
 Main PID: 418699 (code=exited, status=1/FAILURE)
```

[3]

```
[root@control01tbmoc ~]# podman exec -it glance_api_cron crontab -l
no crontab for root
WARN[0000] Error resizing exec session 9ba6504bcfb042f73bb11837a07ab5ebd3fd0933b259fb5382a446f14160ec8c: could not open ctl file for terminal resize for container 3f9bfe1b37e0e1c8b51989b0151bde4382e41f14c8f841127eafa6be79ea451c: open /var/lib/containers/storage/overlay-containers/3f9bfe1b37e0e1c8b51989b0151bde4382e41f14c8f841127eafa6be79ea451c/userdata/9ba6504bcfb042f73bb11837a07ab5ebd3fd0933b259fb5382a446f14160ec8c/ctl: no such device or address
[root@control01tbmoc ~]# podman exec -it glance_api_cron cat /usr/share/openstack-tripleo-common/healthcheck/cron
#!/bin/bash
file="${1:-root}"
if [ -f /var/spool/cron/${file} ]; then
    nb_lines=$(grep -cEv '^#' /var/spool/cron/${file})
    if [ $nb_lines -ge 2 ]; then
        exit 0
    fi
fi
exit 1
WARN[0000] Error resizing exec session f40464e695091fc8a10cb04f34f5b8c5d3cad262af1c90cbec2859fd7ecd5551: could not open ctl file for terminal resize for container 3f9bfe1b37e0e1c8b51989b0151bde4382e41f14c8f841127eafa6be79ea451c: open /var/lib/containers/storage/overlay-containers/3f9bfe1b37e0e1c8b51989b0151bde4382e41f14c8f841127eafa6be79ea451c/userdata/f40464e695091fc8a10cb04f34f5b8c5d3cad262af1c90cbec2859fd7ecd5551/ctl: no such device or address
```

Version-Release number of selected component (if applicable):

puppet-glance-15.5.0-2.20220804175403.d54e942.el8ost.noarch

How reproducible:

Always, when the image cache is disabled in RHOSP 16.2.5.

Actual results:

tripleo_glance_api_cron_healthcheck.service fails.

Expected results:

tripleo_glance_api_cron_healthcheck.service does not fail.

Additional info:

A similar error is tracked in https://bugzilla.redhat.com/show_bug.cgi?id=2159566, but that bugzilla is for RHOSP 16.2.4 and its resolution is to upgrade to 16.2.5. That solution cannot be applied here because the customer environment is already on 16.2.5.
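To make the failure mode concrete, here is a minimal sketch of the healthcheck's decision logic from the script above, exercised against a temporary stand-in for /var/spool/cron/glance (the cron entries are illustrative assumptions, not taken from a live system):

```bash
#!/bin/bash
# Mirrors the healthcheck's test: the spool file must exist and contain
# at least two non-comment lines, otherwise the systemd unit fails.
check() {
    if [ -f "$1" ] && [ "$(grep -cEv '^#' "$1")" -ge 2 ]; then
        echo "exit 0 (healthcheck passes)"
    else
        echo "exit 1 (healthcheck fails)"
    fi
}

spool=$(mktemp -u)   # path only; the file does not exist yet

# Cache disabled: no crontab for the glance user, so the file test fails.
check "$spool"       # -> exit 1 (healthcheck fails)

# Cache enabled: two maintenance jobs (illustrative) satisfy the line count.
cat > "$spool" <<'EOF'
# comment lines are excluded from the count
*/30 * * * * glance-cache-pruner
1 0 * * * glance-cache-cleaner
EOF
check "$spool"       # -> exit 0 (healthcheck passes)
rm -f "$spool"
```

This also shows why the pre-fix behavior was a guaranteed failure: with the cache disabled, nothing ever populated a crontab in the glance_api_cron container, so the healthcheck could never pass.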