Description of problem:
Overcloud deployment failed due to a failure in finding/uploading the Octavia image. This happens on a 2nd deployment, after deleting a 1st deployment that passed.

Version-Release number of selected component (if applicable):
octavia-amphora-image-x86_64-14.0-20181126.1.el7ost.noarch
python2-octaviaclient-1.6.0-0.20180816134807.64d007f.el7ost.noarch
puppet-octavia-13.3.1-0.20181013113433.e19b590.el7ost.noarch

How reproducible:
Not sure

Steps to Reproduce:
1. Deploy overcloud with Octavia - success
2. Delete the overcloud: openstack overcloud delete overcloud -y - success
3. Deploy the same overcloud again - failed

Actual results:
Overcloud deploy failed

Expected results:
Overcloud deploy passes

Additional info (Ansible stdout, reformatted for readability):

PLAY [Undercloud[0]] ***********************************************************

TASK [Gathering Facts] *********************************************************
Sunday 02 December 2018  10:13:52 -0500 (0:00:00.117)       0:00:00.117 *******
ok: [undercloud-0]

TASK [octavia-undercloud : set file if already set by heat variable (backwards-compatibility)] ***
Sunday 02 December 2018  10:14:19 -0500 (0:00:26.617)       0:00:26.734 *******
skipping: [undercloud-0]

TASK [octavia-undercloud : set location if CentOS] *****************************
Sunday 02 December 2018  10:14:19 -0500 (0:00:00.051)       0:00:26.786 *******
skipping: [undercloud-0]

TASK [octavia-undercloud : set location if Red Hat] ****************************
Sunday 02 December 2018  10:14:19 -0500 (0:00:00.050)       0:00:26.836 *******
ok: [undercloud-0]

TASK [octavia-undercloud : check if amphora image file exists] *****************
Sunday 02 December 2018  10:14:19 -0500 (0:00:00.071)       0:00:26.907 *******
ok: [undercloud-0]

TASK [octavia-undercloud : include_tasks] **************************************
Sunday 02 December 2018  10:14:24 -0500 (0:00:04.501)       0:00:31.409 *******
included: /usr/share/openstack-tripleo-common/playbooks/roles/octavia-undercloud/tasks/image_mgmt.yml for undercloud-0

TASK [octavia-undercloud : check if name is a symlink] *************************
Sunday 02 December 2018  10:14:24 -0500 (0:00:00.086)       0:00:31.496 *******
ok: [undercloud-0]

TASK [octavia-undercloud : bypass image naming logic if image name is provided (backwards-compatibility)] ***
Sunday 02 December 2018  10:14:24 -0500 (0:00:00.222)       0:00:31.718 *******
skipping: [undercloud-0]

TASK [octavia-undercloud : set the actual glance image name if it is a symlink] ***
Sunday 02 December 2018  10:14:24 -0500 (0:00:00.045)       0:00:31.764 *******
ok: [undercloud-0]

TASK [octavia-undercloud : set the actual glance image name if it is not a symlink] ***
Sunday 02 December 2018  10:14:24 -0500 (0:00:00.065)       0:00:31.830 *******
skipping: [undercloud-0]

TASK [octavia-undercloud : check there an image in glance already] *************
Sunday 02 December 2018  10:14:24 -0500 (0:00:00.056)       0:00:31.886 *******
fatal: [undercloud-0]: FAILED! => {"changed": true, "cmd": "openstack image show octavia-amphora-14.0-20181126.1.x86_64 -c checksum -f value", "delta": "0:00:05.487314", "end": "2018-12-02 10:14:30.578032", "msg": "non-zero return code", "rc": 1, "start": "2018-12-02 10:14:25.090718", "stderr": "Could not find resource octavia-amphora-14.0-20181126.1.x86_64", "stderr_lines": ["Could not find resource octavia-amphora-14.0-20181126.1.x86_64"], "stdout": "", "stdout_lines": []}
...ignoring

TASK [octavia-undercloud : get md5 from glance if image already exists there] ***
Sunday 02 December 2018  10:14:30 -0500 (0:00:05.875)       0:00:37.762 *******
skipping: [undercloud-0]

TASK [octavia-undercloud : determine if the image needs to be replaced] ********
Sunday 02 December 2018  10:14:30 -0500 (0:00:00.050)       0:00:37.813 *******
skipping: [undercloud-0]

TASK [octavia-undercloud : move existing image if the names match and the md5s are not the same] ***
Sunday 02 December 2018  10:14:30 -0500 (0:00:00.052)       0:00:37.865 *******
skipping: [undercloud-0]

TASK [octavia-undercloud : decide whether to upload new image] *****************
Sunday 02 December 2018  10:14:30 -0500 (0:00:00.051)       0:00:37.917 *******
ok: [undercloud-0]

TASK [octavia-undercloud : upload image to glance] *****************************
Sunday 02 December 2018  10:14:30 -0500 (0:00:00.077)       0:00:37.994 *******
fatal: [undercloud-0]: FAILED! => {"changed": false, "cmd": "openstack image create --disk-format qcow2 --container-format bare --tag amphora-image --file /usr/share/openstack-octavia-amphora-images/octavia-amphora.qcow2 octavia-amphora-14.0-20181126.1.x86_64", "delta": "0:02:06.264787", "end": "2018-12-02 10:16:37.307766", "msg": "non-zero return code", "rc": 1, "start": "2018-12-02 10:14:31.042979", "stderr": "Error finding address for https://10.0.0.101:13292/v2/images/91c20681-e65a-447d-9cd5-18214119a5ff/file: [Errno 32] Broken pipe", "stderr_lines": ["Error finding address for https://10.0.0.101:13292/v2/images/91c20681-e65a-447d-9cd5-18214119a5ff/file: [Errno 32] Broken pipe"], "stdout": "", "stdout_lines": []}

PLAY RECAP *********************************************************************
undercloud-0               : ok=8    changed=1    unreachable=0    failed=1

Overcloud configuration failed.

Sunday 02 December 2018  10:16:37 -0500 (0:02:06.476)       0:02:44.470 *******
===============================================================================
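For readers following the task names above, the image_mgmt.yml flow boils down to: compute the local image's md5, ask Glance for the checksum of an image with the expected name, and upload only when the image is missing or its checksum differs. A minimal sketch of that decision, using a temp file and a hard-coded "not found" result in place of the real qcow2 and the `openstack image show` call:

```shell
#!/bin/sh
# Sketch of the upload decision made by image_mgmt.yml:
# upload only if glance has no image with that name, or if its
# checksum differs from the local file's md5.
set -eu

LOCAL_IMAGE=$(mktemp)              # stand-in for octavia-amphora.qcow2
echo "amphora disk contents" > "$LOCAL_IMAGE"

local_md5=$(md5sum "$LOCAL_IMAGE" | cut -d' ' -f1)

# stand-in for: openstack image show <name> -c checksum -f value
glance_md5=""                      # empty => image not found in glance

if [ -z "$glance_md5" ]; then
    decision="upload: image not in glance"
elif [ "$glance_md5" != "$local_md5" ]; then
    decision="upload: checksum changed, replacing image"
else
    decision="skip: image up to date"
fi
echo "$decision"
rm -f "$LOCAL_IMAGE"
```

In the failed run above the role correctly reached the "upload" branch (the first fatal is ignored by design); the deployment then died in the upload itself, not in this decision logic.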
(In reply to Udi Shkalim from comment #0)
> Overcloud deployment failed due to failure in finding/uploading Octavia
> image.
> This happens on a 2nd deployment after deleting the 1st deployment that
> passed.

Did not reproduce on my setup. Udi, please try to reproduce on the same env.
"Error finding address for https://10.0.0.101:13292/v2/images/91c20681-e65a-447d-9cd5-18214119a5ff/file: [Errno 32] Broken pipe" The error comes from overcloud Glance and does not seem to be related to Octavia. Please double check that Glance is running correctly. You may also want to check the logs from Glance in the overcloud.
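When chasing an "Error finding address ... Broken pipe" like this one, the host:port embedded in the URL tells you which endpoint to probe (here the TLS-terminated Glance API on the VIP, port 13292). A small sketch for pulling those out of the error text so you know where to point curl or which haproxy backend to inspect; the error string is copied verbatim from this report:

```shell
#!/bin/sh
# Extract the endpoint host and port from a glanceclient
# "Error finding address" message.
err="Error finding address for https://10.0.0.101:13292/v2/images/91c20681-e65a-447d-9cd5-18214119a5ff/file: [Errno 32] Broken pipe"

host=$(echo "$err" | sed -n 's|.*https://\([0-9.]*\):\([0-9]*\)/.*|\1|p')
port=$(echo "$err" | sed -n 's|.*https://\([0-9.]*\):\([0-9]*\)/.*|\2|p')

echo "check endpoint: $host port $port"
```

From there the usual checks are the Glance api container/service on the controllers and the haproxy and glance-api logs for that port.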
(In reply to Carlos Goncalves from comment #3)
> "Error finding address for
> https://10.0.0.101:13292/v2/images/91c20681-e65a-447d-9cd5-18214119a5ff/file:
> [Errno 32] Broken pipe"
>
> The error comes from overcloud Glance and does not seem to be related to
> Octavia. Please double check that Glance is running correctly. You may also
> want to check the logs from Glance in the overcloud.

The issue reproduced again. Alex, did you try with the same setup as mine (Ceph + SSL)?
The error is expected - check out the command Octavia is trying to run:

"cmd": "openstack image create --disk-format qcow2 --container-format bare --tag amphora-image --file /usr/share/openstack-octavia-amphora-images/octavia-amphora.qcow2 octavia-amphora-14.0-20181126.1.x86_64"

This will fail since you have an extra variable - octavia-amphora-14.0-20181126.1.x86_64
That is not an extra variable, it is the image name.
(In reply to Udi Shkalim from comment #4)
> The issue reproduced again. Alex, did you try with the same setup as mine?
> ceph + ssl?

I tried yesterday and I did not manage to reproduce.
Is telemetry also enabled alongside Ceph? If so, this could be a duplicate of rhbz #1632401.
I re-ran the deployment with telemetry disabled and was able to re-create the bug.
Setup: root.eng.tlv2.redhat.com
The problem is not related to Octavia. I could not create even a simple CirrOS image in overcloud Glance:

(overcloud) [stack@undercloud-0 ~]$ wget https://download.cirros-cloud.net/0.3.5/cirros-0.3.5-x86_64-disk.img -4
--2018-12-06 07:30:10--  https://download.cirros-cloud.net/0.3.5/cirros-0.3.5-x86_64-disk.img
Resolving download.cirros-cloud.net (download.cirros-cloud.net)... 64.90.42.85
Connecting to download.cirros-cloud.net (download.cirros-cloud.net)|64.90.42.85|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 13267968 (13M) [text/plain]
Saving to: 'cirros-0.3.5-x86_64-disk.img'

100%[==================================================>] 13,267,968  2.65MB/s   in 11s

2018-12-06 07:30:23 (1.10 MB/s) - 'cirros-0.3.5-x86_64-disk.img' saved [13267968/13267968]

(overcloud) [stack@undercloud-0 ~]$ openstack image create --file cirros-0.3.5-x86_64-disk.img --disk-format qcow2 --container-format bare cirros-0.3.5-x86_64-disk
Error finding address for https://10.0.0.101:13292/v2/images/3a8b08b8-53c5-4088-9902-b9e3a05e063b/file: [Errno 32] Broken pipe
(In reply to Udi Shkalim from comment #8)
> I re-run the deployment with disabled telemetry and was able to re-create
> the bug
> setup: root.eng.tlv2.redhat.com

Is this still the same environment? I sourced overcloudrc on your environment, hit the issue, and then noticed that there is no overcloud deployed at all. Or is there a reproducer environment available somewhere else?
No, I re-deployed it. I'll try to reproduce and update with new information as soon as I can.
Looks like this is a Ceph issue. The Ceph containers are in a restarting loop; this is some logging output from one of them:

[root@ceph-0 ~]# docker logs --follow --tail 100 ceph-osd-ceph-0-vde
2018-12-11 21:45:05  /entrypoint.sh: static: does not generate config
/dev/vde2
main_activate: path = /dev/vde1
get_dm_uuid: get_dm_uuid /dev/vde1 uuid path is /sys/dev/block/252:65/dm/uuid
command: Running command: /usr/sbin/blkid -o udev -p /dev/vde1
command: Running command: /sbin/blkid -p -s TYPE -o value -- /dev/vde1
command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mount_options_xfs
mount: Mounting /dev/vde1 on /var/lib/ceph/tmp/mnt.AsoYcj with options noatime,largeio,inode64,swalloc
command_check_call: Running command: /usr/bin/mount -t xfs -o noatime,largeio,inode64,swalloc -- /dev/vde1 /var/lib/ceph/tmp/mnt.AsoYcj
command: Running command: /usr/sbin/restorecon /var/lib/ceph/tmp/mnt.AsoYcj
activate: Cluster uuid is 7dcc2702-f95d-11e8-9647-525400d88adc
command: Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=fsid
activate: Cluster name is ceph
activate: OSD uuid is f7cb1359-f813-451b-9d96-0d711ce0d8eb
activate: OSD id is 12
command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup init
command: Running command: /usr/bin/ceph-detect-init --default sysvinit
activate: Marking with init system none
command: Running command: /usr/sbin/restorecon -R /var/lib/ceph/tmp/mnt.AsoYcj/none
command: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/tmp/mnt.AsoYcj/none
activate: ceph osd.12 data dir is ready at /var/lib/ceph/tmp/mnt.AsoYcj
move_mount: Moving mount to final location...
command_check_call: Running command: /bin/mount -o noatime,largeio,inode64,swalloc -- /dev/vde1 /var/lib/ceph/osd/ceph-12
command_check_call: Running command: /bin/umount -l -- /var/lib/ceph/tmp/mnt.AsoYcj
2018-12-11 21:45:08  /entrypoint.sh: SUCCESS
exec: PID 227668: spawning /usr/bin/ceph-osd --cluster ceph -f -i 12 --setuser ceph --setgroup disk
exec: Waiting 227668 to quit
starting osd.12 at - osd_data /var/lib/ceph/osd/ceph-12 /var/lib/ceph/osd/ceph-12/journal
2018-12-11 21:45:09.996142 7f330d70ad80 -1 journal do_read_entry(210849792): bad header magic
2018-12-11 21:45:10.088785 7f330d70ad80 -1 osd.12 33 log_to_monitors {default=true}
2018-12-11 21:45:10.101011 7f330d70ad80 -1 osd.12 33 init authentication failed: (1) Operation not permitted
teardown: managing teardown after SIGCHLD
teardown: Waiting PID 227668 to terminate
teardown: Process 227668 is terminated
teardown: Bye Bye, container will die with return code -1

And then the container exits, and restarts.

John, Giulio - can you have a look at this, please? Also, moving this over to DFG:Ceph.
Udi, I'm going to close this as not a bug, since we think it was caused by not cleaning the disks, as described in doc bug 1613918. If you reproduce it after cleaning your disks, feel free to re-open.

John
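For anyone landing on this bug later: the cleanup in question is wiping the partition structures that the previous deployment's OSDs left on the Ceph disks before re-deploying. A hedged dry-run sketch; the device list below is an example, and since sgdisk --zap-all is destructive, the commands are only printed, not executed:

```shell
#!/bin/sh
# Dry-run sketch: build the list of zap commands for the previous
# deployment's OSD disks instead of executing them. Verify the
# device list on the actual node before running anything for real.
DISKS="/dev/vdb /dev/vdc /dev/vdd /dev/vde"   # example devices, adjust per node

plan=$(for disk in $DISKS; do
    # sgdisk --zap-all wipes both GPT and MBR structures on the disk
    echo "sgdisk --zap-all $disk"
done)

printf '%s\n' "$plan"
```

Running this per storage node (and only then re-deploying) matches the disk-cleaning procedure the doc bug describes.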
I believe this is in the production documentation as well, but I added a note to the doc bug to verify: https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/deploying_an_overcloud_with_containerized_red_hat_ceph/creation