Created attachment 1250080 [details] output of loop listing servers

Hi,

I have a RHOS 10 environment that is fully deployed and functional. In a bash window I ran the following loop:

    while true; do date; OS_CLOUD=rhosops-test-ggillies openstack server list; done 2>&1 | tee server_list_loop.log

I have attached the resulting log, along with the log of the overcloud deploy.

While this loop was running, I reran my openstack overcloud deploy command with no changes. This should essentially be a no-op, or at the very least should not cause any control plane outage, since absolutely nothing is changing.

The log shows, however, that keystone throws an error first, and nova throws an error a bit later. This seems to indicate that all stack deploy operations are disruptive, including configuration changes, node scale up/down, etc.

This is easily reproducible.

Regards,
Graeme
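For anyone reproducing this, a variant of the loop above that logs one timestamped OK/FAIL line per attempt can make outage windows easier to spot than scanning the full table output. This is only a sketch: the function name poll_api and its arguments are made up for illustration and are not part of any OpenStack tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of any OpenStack tool).
# Runs a command repeatedly and prints one timestamped OK/FAIL line per attempt.
# Usage: poll_api <iterations> <interval_seconds> <command...>
poll_api() {
    local iterations=$1 interval=$2 i rc
    shift 2
    for ((i = 0; i < iterations; i++)); do
        # Discard the command's own output; only its exit status matters here.
        "$@" >/dev/null 2>&1
        rc=$?
        if ((rc == 0)); then
            echo "$(date -u +%FT%TZ) OK"
        else
            echo "$(date -u +%FT%TZ) FAIL rc=${rc}"
        fi
        sleep "$interval"
    done
}
```

It could be run during the redeploy as, for example, `poll_api 600 1 env OS_CLOUD=rhosops-test-ggillies openstack server list | tee poll.log`, after which `grep FAIL poll.log` lists the failed attempts.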
Created attachment 1250081 [details] log of overcloud deploy command
So with Alex's patches for the norpm provider (https://review.openstack.org/#/c/435011 and the nova filter patch https://review.openstack.org/435099) we have a definite improvement in the number of restarts:
- nova-api went from 3 to 1
- nova-* went from 2 to 1
- swift no longer has any restarts
- neutron-* and httpd stayed at 1 and 2 respectively

Here are the restarts divided by step:
* Step1
  restart ntpd
* Step2
  restart ntpd
* Step3
  restart ntpd
  restart httpd
* Step4
  restart ntpd
  restart openstack-nova-conductor
  restart openstack-nova-scheduler
  restart openstack-nova-consoleauth
  restart openstack-nova-novncproxy
  restart httpd
  restart openstack-nova-api
  restart neutron-dhcp-agent
  restart neutron-server
  restart neutron-l3-agent
  restart neutron-metadata-agent
* Step5
  restart ntpd

Emilien has a review up that will move all the wsgi configuration into a single step, which should fix at least httpd.
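To compare restart counts between runs, a tally like the one above can be reduced to a count per service. The sketch below assumes a simplified input of one "<step> <service>" pair per line. That format and the function name count_restarts are assumptions for illustration, not the actual deploy log format.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "<step> <service>" lines on stdin (an assumed,
# simplified format) and prints "<count> <service>", most restarted first.
count_restarts() {
    awk 'NF == 2 { n[$2]++ } END { for (s in n) print n[s], s }' |
        sort -k1,1nr -k2,2
}
```

For example, `printf 'Step3 httpd\nStep4 httpd\nStep4 openstack-nova-api\n' | count_restarts` reports httpd twice and openstack-nova-api once, matching the numbers quoted above.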
Added BZ 1426434 to track the norpm provider issue
Looks like this is going to be used as a tracking bug? If so, you can add Tracking keyword.
*** Bug 1436728 has been marked as a duplicate of this bug. ***
So we have a number of related upstream bugs, and I'm not sure we have downstream BZs associated with them (if we do, please link them here):
https://bugs.launchpad.net/tripleo/+bug/1664650
https://bugs.launchpad.net/puppet-nova/+bug/1665443
https://bugs.launchpad.net/tripleo/+bug/1665405
https://bugs.launchpad.net/tripleo/+bug/1665426
Verified on latest build 2017-05-23.4.

Compute: only ssh restarted.
Controller: only Apache (twice), Glance (twice) and Heat (once) restarted.

There do not appear to have been any interruptions during the deploy command re-run. I used the same server list loop as in the original comment to monitor during the re-run.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:1585