Bug 1655312

Summary: Failure to find/upload octavia image during deployment after overcloud delete
Product: Red Hat OpenStack Reporter: Udi Shkalim <ushkalim>
Component: rhosp-directorAssignee: RHOS Maint <rhos-maint>
Status: CLOSED NOTABUG QA Contact: Gurenko Alex <agurenko>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 14.0 (Rocky)CC: astafeye, bcafarel, cgoncalves, cschwede, dbecker, emacchi, gfidente, ihrachys, johfulto, lpeer, majopela, mburns, morazi, nyechiel, pgrist, ushkalim
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-12-17 14:14:00 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Udi Shkalim 2018-12-02 16:22:36 UTC
Description of problem:
Overcloud deployment failed due to failure in finding/uploading Octavia image.
This happens on a 2nd deployment after deleting the 1st deployment that passed.

Version-Release number of selected component (if applicable):
octavia-amphora-image-x86_64-14.0-20181126.1.el7ost.noarch
python2-octaviaclient-1.6.0-0.20180816134807.64d007f.el7ost.noarch
puppet-octavia-13.3.1-0.20181013113433.e19b590.el7ost.noarch

How reproducible:
not sure

Steps to Reproduce:
1. Deploy overcloud with Octavia - success
2. Delete the overcloud - openstack overcloud delete overcloud -y - success
3. Deploy the same overcloud again - failed

Actual results:
Overcloud deploy failed

Expected results:
overcloud deploy pass

Additional info:
stdout_lines": ["", "PLAY [Undercloud[0]] ***********************************************************", "", "TASK [Gathering Facts] *********************************************************", "Sunday 02 December 2018  10:13:52 -0500 (0:00:00.117)       0:00:00.117 ******* ", "ok: [undercloud-0]", "", "TASK [octavia-undercloud : set file if already set by heat variable (backwards-compatibility)] ***", "Sunday 02 December 2018  10:14:19 -0500 (0:00:26.617)       0:00:26.734 ******* ", "skipping: [undercloud-0]", "", "TASK [octavia-undercloud : set location if CentOS] *****************************", "Sunday 02 December 2018  10:14:19 -0500 (0:00:00.051)       0:00:26.786 ******* ", "skipping: [undercloud-0]", "", "TASK [octavia-undercloud : set location if Red Hat] ****************************", "Sunday 02 December 2018  10:14:19 -0500 (0:00:00.050)       0:00:26.836 ******* ", "ok: [undercloud-0]", "", "TASK [octavia-undercloud : check if amphora image file exists] *****************", "Sunday 02 December 2018  10:14:19 -0500 (0:00:00.071)       0:00:26.907 ******* ", "ok: [undercloud-0]", "", "TASK [octavia-undercloud : include_tasks] **************************************", "Sunday 02 December 2018  10:14:24 -0500 (0:00:04.501)       0:00:31.409 ******* ", "included: /usr/share/openstack-tripleo-common/playbooks/roles/octavia-undercloud/tasks/image_mgmt.yml for undercloud-0", "", "TASK [octavia-undercloud : check if name is a symlink] *************************", "Sunday 02 December 2018  10:14:24 -0500 (0:00:00.086)       0:00:31.496 ******* ", "ok: [undercloud-0]", "", "TASK [octavia-undercloud : bypass image naming logic if image name is provided (backwards-compatibility)] ***", "Sunday 02 December 2018  10:14:24 -0500 (0:00:00.222)       0:00:31.718 ******* ", "skipping: [undercloud-0]", "", "TASK [octavia-undercloud : set the actual glance image name if it is a symlink] ***", "Sunday 02 December 2018  10:14:24 -0500 (0:00:00.045)       0:00:31.764 ******* ", "ok: [undercloud-0]", "", "TASK [octavia-undercloud : set the actual glance image name if it is not a symlink] ***", "Sunday 02 December 2018  10:14:24 -0500 (0:00:00.065)       0:00:31.830 ******* ", "skipping: [undercloud-0]", "", "TASK [octavia-undercloud : check there an image in glance already] *************", "Sunday 02 December 2018  10:14:24 -0500 (0:00:00.056)       0:00:31.886 ******* ", "fatal: [undercloud-0]: FAILED! => {\"changed\": true, \"cmd\": \"openstack image show octavia-amphora-14.0-20181126.1.x86_64 -c checksum -f value\", \"delta\": \"0:00:05.487314\", \"end\": \"2018-12-02 10:14:30.578032\", \"msg\": \"non-zero return code\", \"rc\": 1, \"start\": \"2018-12-02 10:14:25.090718\", \"stderr\": \"Could not find resource octavia-amphora-14.0-20181126.1.x86_64\", \"stderr_lines\": [\"Could not find resource octavia-amphora-14.0-20181126.1.x86_64\"], \"stdout\": \"\", \"stdout_lines\": []}", "...ignoring", "", "TASK [octavia-undercloud : get md5 from glance if image already exists there] ***", "Sunday 02 December 2018  10:14:30 -0500 (0:00:05.875)       0:00:37.762 ******* ", "skipping: [undercloud-0]", "", "TASK [octavia-undercloud : determine if the image needs to be replaced] ********", "Sunday 02 December 2018  10:14:30 -0500 (0:00:00.050)       0:00:37.813 ******* ", "skipping: [undercloud-0]", "", "TASK [octavia-undercloud : move existing image if the names match and the md5s are not the same] ***", "Sunday 02 December 2018  10:14:30 -0500 (0:00:00.052)       0:00:37.865 ******* ", "skipping: [undercloud-0]", "", "TASK [octavia-undercloud : decide whether to upload new image] *****************", "Sunday 02 December 2018  10:14:30 -0500 (0:00:00.051)       0:00:37.917 ******* ", "ok: [undercloud-0]", "", "TASK [octavia-undercloud : upload image to glance] *****************************", "Sunday 02 December 2018  10:14:30 -0500 (0:00:00.077)       0:00:37.994 ******* ", "fatal: [undercloud-0]: FAILED! => {\"changed\": false, \"cmd\": \"openstack image create --disk-format qcow2 --container-format bare --tag amphora-image --file /usr/share/openstack-octavia-amphora-images/octavia-amphora.qcow2 octavia-amphora-14.0-20181126.1.x86_64\", \"delta\": \"0:02:06.264787\", \"end\": \"2018-12-02 10:16:37.307766\", \"msg\": \"non-zero return code\", \"rc\": 1, \"start\": \"2018-12-02 10:14:31.042979\", \"stderr\": \"Error finding address for https://10.0.0.101:13292/v2/images/91c20681-e65a-447d-9cd5-18214119a5ff/file: [Errno 32] Broken pipe\", \"stderr_lines\": [\"Error finding address for https://10.0.0.101:13292/v2/images/91c20681-e65a-447d-9cd5-18214119a5ff/file: [Errno 32] Broken pipe\"], \"stdout\": \"\", \"stdout_lines\": []}", "", "PLAY RECAP *********************************************************************", "undercloud-0               : ok=8    changed=1    unreachable=0    Overcloud configuration failed.
failed=1   ", "", "Sunday 02 December 2018  10:16:37 -0500 (0:02:06.476)       0:02:44.470 ******* ", "=============================================================================== "]}

Comment 2 Alexander Stafeyev 2018-12-02 20:49:14 UTC
(In reply to Udi Shkalim from comment #0)
> Description of problem:
> Overcloud deployment failed due to failure in finding/uploading Octavia
> image.
> This happens on a 2nd deployment after deleting the 1st deployment that
> passed.
> 
> Version-Release number of selected component (if applicable):
> octavia-amphora-image-x86_64-14.0-20181126.1.el7ost.noarch
> python2-octaviaclient-1.6.0-0.20180816134807.64d007f.el7ost.noarch
> puppet-octavia-13.3.1-0.20181013113433.e19b590.el7ost.noarch
> 
> How reproducible:
> not sure
> 
> Steps to Reproduce:
> 1. Deploy overcloud with Octavia - success
> 2. Delete the overcloud - openstack overcloud delete overcloud -y - success
> 3. Deploy the same overcloud again - failed
> 
> Actual results:
> Overcloud deploy failed
> 
> Expected results:
> overcloud deploy pass
> 
> Additional info:
> stdout_lines": ["", "PLAY [Undercloud[0]]
> ***********************************************************", "", "TASK
> [Gathering Facts]
> *********************************************************", "Sunday 02
> December 2018  10:13:52 -0500 (0:00:00.117)       0:00:00.117 ******* ",
> "ok: [undercloud-0]", "", "TASK [octavia-undercloud : set file if already
> set by heat variable (backwards-compatibility)] ***", "Sunday 02 December
> 2018  10:14:19 -0500 (0:00:26.617)       0:00:26.734 ******* ", "skipping:
> [undercloud-0]", "", "TASK [octavia-undercloud : set location if CentOS]
> *****************************", "Sunday 02 December 2018  10:14:19 -0500
> (0:00:00.051)       0:00:26.786 ******* ", "skipping: [undercloud-0]", "",
> "TASK [octavia-undercloud : set location if Red Hat]
> ****************************", "Sunday 02 December 2018  10:14:19 -0500
> (0:00:00.050)       0:00:26.836 ******* ", "ok: [undercloud-0]", "", "TASK
> [octavia-undercloud : check if amphora image file exists]
> *****************", "Sunday 02 December 2018  10:14:19 -0500 (0:00:00.071)  
> 0:00:26.907 ******* ", "ok: [undercloud-0]", "", "TASK [octavia-undercloud :
> include_tasks] **************************************", "Sunday 02 December
> 2018  10:14:24 -0500 (0:00:04.501)       0:00:31.409 ******* ", "included:
> /usr/share/openstack-tripleo-common/playbooks/roles/octavia-undercloud/tasks/
> image_mgmt.yml for undercloud-0", "", "TASK [octavia-undercloud : check if
> name is a symlink] *************************", "Sunday 02 December 2018 
> 10:14:24 -0500 (0:00:00.086)       0:00:31.496 ******* ", "ok:
> [undercloud-0]", "", "TASK [octavia-undercloud : bypass image naming logic
> if image name is provided (backwards-compatibility)] ***", "Sunday 02
> December 2018  10:14:24 -0500 (0:00:00.222)       0:00:31.718 ******* ",
> "skipping: [undercloud-0]", "", "TASK [octavia-undercloud : set the actual
> glance image name if it is a symlink] ***", "Sunday 02 December 2018 
> 10:14:24 -0500 (0:00:00.045)       0:00:31.764 ******* ", "ok:
> [undercloud-0]", "", "TASK [octavia-undercloud : set the actual glance image
> name if it is not a symlink] ***", "Sunday 02 December 2018  10:14:24 -0500
> (0:00:00.065)       0:00:31.830 ******* ", "skipping: [undercloud-0]", "",
> "TASK [octavia-undercloud : check there an image in glance already]
> *************", "Sunday 02 December 2018  10:14:24 -0500 (0:00:00.056)      
> 0:00:31.886 ******* ", "fatal: [undercloud-0]: FAILED! => {\"changed\":
> true, \"cmd\": \"openstack image show octavia-amphora-14.0-20181126.1.x86_64
> -c checksum -f value\", \"delta\": \"0:00:05.487314\", \"end\": \"2018-12-02
> 10:14:30.578032\", \"msg\": \"non-zero return code\", \"rc\": 1, \"start\":
> \"2018-12-02 10:14:25.090718\", \"stderr\": \"Could not find resource
> octavia-amphora-14.0-20181126.1.x86_64\", \"stderr_lines\": [\"Could not
> find resource octavia-amphora-14.0-20181126.1.x86_64\"], \"stdout\": \"\",
> \"stdout_lines\": []}", "...ignoring", "", "TASK [octavia-undercloud : get
> md5 from glance if image already exists there] ***", "Sunday 02 December
> 2018  10:14:30 -0500 (0:00:05.875)       0:00:37.762 ******* ", "skipping:
> [undercloud-0]", "", "TASK [octavia-undercloud : determine if the image
> needs to be replaced] ********", "Sunday 02 December 2018  10:14:30 -0500
> (0:00:00.050)       0:00:37.813 ******* ", "skipping: [undercloud-0]", "",
> "TASK [octavia-undercloud : move existing image if the names match and the
> md5s are not the same] ***", "Sunday 02 December 2018  10:14:30 -0500
> (0:00:00.052)       0:00:37.865 ******* ", "skipping: [undercloud-0]", "",
> "TASK [octavia-undercloud : decide whether to upload new image]
> *****************", "Sunday 02 December 2018  10:14:30 -0500 (0:00:00.051)  
> 0:00:37.917 ******* ", "ok: [undercloud-0]", "", "TASK [octavia-undercloud :
> upload image to glance] *****************************", "Sunday 02 December
> 2018  10:14:30 -0500 (0:00:00.077)       0:00:37.994 ******* ", "fatal:
> [undercloud-0]: FAILED! => {\"changed\": false, \"cmd\": \"openstack image
> create --disk-format qcow2 --container-format bare --tag amphora-image
> --file /usr/share/openstack-octavia-amphora-images/octavia-amphora.qcow2
> octavia-amphora-14.0-20181126.1.x86_64\", \"delta\": \"0:02:06.264787\",
> \"end\": \"2018-12-02 10:16:37.307766\", \"msg\": \"non-zero return code\",
> \"rc\": 1, \"start\": \"2018-12-02 10:14:31.042979\", \"stderr\": \"Error
> finding address for
> https://10.0.0.101:13292/v2/images/91c20681-e65a-447d-9cd5-18214119a5ff/file:
> [Errno 32] Broken pipe\", \"stderr_lines\": [\"Error finding address for
> https://10.0.0.101:13292/v2/images/91c20681-e65a-447d-9cd5-18214119a5ff/file:
> [Errno 32] Broken pipe\"], \"stdout\": \"\", \"stdout_lines\": []}", "",
> "PLAY RECAP
> *********************************************************************",
> "undercloud-0               : ok=8    changed=1    unreachable=0   
> Overcloud configuration failed.
> failed=1   ", "", "Sunday 02 December 2018  10:16:37 -0500 (0:02:06.476)    
> 0:02:44.470 ******* ",
> "============================================================================
> === "]}

Did not reproduce on my setup. Udi please try to reproduce on same env.

Comment 3 Carlos Goncalves 2018-12-03 08:28:20 UTC
"Error finding address for https://10.0.0.101:13292/v2/images/91c20681-e65a-447d-9cd5-18214119a5ff/file: [Errno 32] Broken pipe"

The error comes from overcloud Glance and does not seem to be related to Octavia. Please double check that Glance is running correctly. You may also want to check the logs from Glance in the overcloud.

Comment 4 Udi Shkalim 2018-12-03 08:56:35 UTC
(In reply to Carlos Goncalves from comment #3)
> "Error finding address for
> https://10.0.0.101:13292/v2/images/91c20681-e65a-447d-9cd5-18214119a5ff/file:
> [Errno 32] Broken pipe"
> 
> The error comes from overcloud Glance and does not seem to be related to
> Octavia. Please double check that Glance is running correctly. You may also
> want to check the logs from Glance in the overcloud.

The issue reproduced again. Alex, did you try with the same setup as mine? ceph + ssl?
The error is expected - checkout the command Octavia is trying to run:
"cmd\": \"openstack image
> create --disk-format qcow2 --container-format bare --tag amphora-image
> --file /usr/share/openstack-octavia-amphora-images/octavia-amphora.qcow2
> octavia-amphora-14.0-20181126.1.x86_64\"

This will fail since you have an extra variable - > octavia-amphora-14.0-20181126.1.x86_64\"

Comment 5 Carlos Goncalves 2018-12-03 10:16:10 UTC
That is not an extra variable, it is the image name.

Comment 6 Alexander Stafeyev 2018-12-03 14:19:00 UTC
(In reply to Udi Shkalim from comment #4)
> (In reply to Carlos Goncalves from comment #3)
> > "Error finding address for
> > https://10.0.0.101:13292/v2/images/91c20681-e65a-447d-9cd5-18214119a5ff/file:
> > [Errno 32] Broken pipe"
> > 
> > The error comes from overcloud Glance and does not seem to be related to
> > Octavia. Please double check that Glance is running correctly. You may also
> > want to check the logs from Glance in the overcloud.
> 
> The issue reproduced again. Alex, did you try with the same setup as mine?
> ceph + ssl?
> The error is expected - checkout the command Octavia is trying to run:
> "cmd\": \"openstack image
> > create --disk-format qcow2 --container-format bare --tag amphora-image
> > --file /usr/share/openstack-octavia-amphora-images/octavia-amphora.qcow2
> > octavia-amphora-14.0-20181126.1.x86_64\"
> 
> This will fail since you have an extra variable - >
> octavia-amphora-14.0-20181126.1.x86_64\"

I tried yesterday and I did not manage to reproduce.

Comment 7 Carlos Goncalves 2018-12-05 14:26:49 UTC
Is telemetry also enabled alongside with Ceph? If so, it could be a duplicate of rhbz #1632401.

Comment 8 Udi Shkalim 2018-12-06 11:51:46 UTC
I re-run the deployment with disabled telemetry and was able to re-create the bug
setup: root.eng.tlv2.redhat.com

Comment 10 Carlos Goncalves 2018-12-06 12:38:53 UTC
The problem is not related to Octavia. I could not create either a simple CirrOS image in overcloud Glance.

(overcloud) [stack@undercloud-0 ~]$ wget https://download.cirros-cloud.net/0.3.5/cirros-0.3.5-x86_64-disk.img -4
--2018-12-06 07:30:10--  https://download.cirros-cloud.net/0.3.5/cirros-0.3.5-x86_64-disk.img
Resolving download.cirros-cloud.net (download.cirros-cloud.net)... 64.90.42.85
`Connecting to download.cirros-cloud.net (download.cirros-cloud.net)|64.90.42.85|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 13267968 (13M) [text/plain]
Saving to: ‘cirros-0.3.5-x86_64-disk.img’

100%[=========================================================================================================================================================================>] 13,267,968  2.65MB/s   in 11s    

2018-12-06 07:30:23 (1.10 MB/s) - ‘cirros-0.3.5-x86_64-disk.img’ saved [13267968/13267968]

(overcloud) [stack@undercloud-0 ~]$ openstack image create --file cirros-0.3.5-x86_64-disk.img --disk-format qcow2 --container-format bare cirros-0.3.5-x86_64-disk
Error finding address for https://10.0.0.101:13292/v2/images/3a8b08b8-53c5-4088-9902-b9e3a05e063b/file: [Errno 32] Broken pipe

Comment 12 Christian Schwede (cschwede) 2018-12-11 13:57:20 UTC
(In reply to Udi Shkalim from comment #8)
> I re-run the deployment with disabled telemetry and was able to re-create
> the bug
> setup: root.eng.tlv2.redhat.com

Is this still the same environment? I sourced overcloudrc on your envionrment, hit the issue, and then noticed that there is no overcloud deployed at all? Or is there a reproducer environment available somewhere else?

Comment 13 Udi Shkalim 2018-12-11 14:04:36 UTC
No, I re-deployed it. I'll try to reproduce and update with new information as soon as I can

Comment 15 Christian Schwede (cschwede) 2018-12-11 21:49:11 UTC
Looks like this is a Ceph issue. The Ceph containers are in a restarting loop, this is some logging output from one of them:

[root@ceph-0 ~]# docker logs --follow --tail 100 ceph-osd-ceph-0-vde
2018-12-11 21:45:05  /entrypoint.sh: static: does not generate config
/dev/vde2
main_activate: path = /dev/vde1
get_dm_uuid: get_dm_uuid /dev/vde1 uuid path is /sys/dev/block/252:65/dm/uuid
command: Running command: /usr/sbin/blkid -o udev -p /dev/vde1
command: Running command: /sbin/blkid -p -s TYPE -o value -- /dev/vde1
command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mount_options_xfs
mount: Mounting /dev/vde1 on /var/lib/ceph/tmp/mnt.AsoYcj with options noatime,largeio,inode64,swalloc
command_check_call: Running command: /usr/bin/mount -t xfs -o noatime,largeio,inode64,swalloc -- /dev/vde1 /var/lib/ceph/tmp/mnt.AsoYcj
command: Running command: /usr/sbin/restorecon /var/lib/ceph/tmp/mnt.AsoYcj
activate: Cluster uuid is 7dcc2702-f95d-11e8-9647-525400d88adc
command: Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=fsid
activate: Cluster name is ceph
activate: OSD uuid is f7cb1359-f813-451b-9d96-0d711ce0d8eb
activate: OSD id is 12
command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup init
command: Running command: /usr/bin/ceph-detect-init --default sysvinit
activate: Marking with init system none
command: Running command: /usr/sbin/restorecon -R /var/lib/ceph/tmp/mnt.AsoYcj/none
command: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/tmp/mnt.AsoYcj/none
activate: ceph osd.12 data dir is ready at /var/lib/ceph/tmp/mnt.AsoYcj
move_mount: Moving mount to final location...
command_check_call: Running command: /bin/mount -o noatime,largeio,inode64,swalloc -- /dev/vde1 /var/lib/ceph/osd/ceph-12
command_check_call: Running command: /bin/umount -l -- /var/lib/ceph/tmp/mnt.AsoYcj
2018-12-11 21:45:08  /entrypoint.sh: SUCCESS
exec: PID 227668: spawning /usr/bin/ceph-osd --cluster ceph -f -i 12 --setuser ceph --setgroup disk
exec: Waiting 227668 to quit
starting osd.12 at - osd_data /var/lib/ceph/osd/ceph-12 /var/lib/ceph/osd/ceph-12/journal
2018-12-11 21:45:09.996142 7f330d70ad80 -1 journal do_read_entry(210849792): bad header magic
2018-12-11 21:45:10.088785 7f330d70ad80 -1 osd.12 33 log_to_monitors {default=true}
2018-12-11 21:45:10.101011 7f330d70ad80 -1 osd.12 33 init authentication failed: (1) Operation not permitted
teardown: managing teardown after SIGCHLD
teardown: Waiting PID 227668 to terminate 
teardown: Process 227668 is terminated
teardown: Bye Bye, container will die with return code -1

And then the container exits, and restarts.

John, Giulio - can you have a look at this, please? Also, moving this over to DFG:Ceph.

Comment 20 John Fulton 2018-12-17 14:14:00 UTC
Udi,

I'm going to close this not a bug, since we think this was caused by not cleaning the disks as described in docbug 1613918. If you reproduce it after cleaning your disks, then feel free to re-open. 

  John

Comment 21 Paul Grist 2018-12-17 14:18:58 UTC
I believe this is the production documentation as well, but I added this to the doc bug to check

https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/deploying_an_overcloud_with_containerized_red_hat_ceph/creation