In more targeted testing of the OSP10 minor update we found an issue with the way the pacemaker resources are created and how we defined the order constraints. We will have to provide a fix for all OSP releases, but for OSP10 we will probably go with the workaround I created and tested. The issue is that the pacemaker resource order constraints are of kind=Optional, which means they are not enforced when the pacemaker cluster is shut down on a node. This causes haproxy to be stopped before the VIP is migrated away from the node, and subsequent API failures. The VIP migration will be applied in the yum_update.sh script, but we should also update the reboot documentation/procedure for operators. A sketch of the manual drain is below.
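For the operator reboot procedure, the essence of the workaround is to drain the VIPs off a node before stopping pacemaker there, since the Optional order constraints will not order haproxy after the VIPs on shutdown. A minimal sketch of such a manual drain (the VIP name ip-192.168.24.37 is just an example from a test environment; yum_update.sh automates the same idea):

# On the controller that is about to be updated/rebooted:
# 1. Move the VIP away. 'pcs resource move' creates a -INFINITY location
#    ban on the current node, which must be cleared afterwards.
pcs resource move ip-192.168.24.37 --wait=300

# 2. Only now stop the cluster on this node; haproxy no longer holds
#    an active VIP, so API traffic fails over cleanly.
pcs cluster stop

# ... update packages / reboot the node ...

pcs cluster start

# 3. Clear the ban so the VIP is allowed back on this node.
pcs resource clear ip-192.168.24.37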
Verified.

# Check THT version:
(undercloud) [stack@undercloud-0 ~]$ rpm -qa|grep openstack-tripleo-heat-templates
openstack-tripleo-heat-templates-10.6.1-0.20190905170437.b33b839.el8ost.noarch

# Perform UC & OC nodes update and converge:
[stack@undercloud-0 ~]$ . stackrc;openstack overcloud status ;tail -n 20 overcloud_update_run_Controller.log overcloud_update_run_Compute.log overcloud_update_converge.log
+-----------+---------------------+---------------------+-------------------+
| Plan Name | Created             | Updated             | Deployment Status |
+-----------+---------------------+---------------------+-------------------+
| overcloud | 2019-09-08 15:35:13 | 2019-09-08 15:35:13 | DEPLOY_SUCCESS    |
+-----------+---------------------+---------------------+-------------------+
sys:1: ResourceWarning: unclosed <ssl.SSLSocket fd=4, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.24.2', 41548), raddr=('192.168.24.2', 13000)>
sys:1: ResourceWarning: unclosed <ssl.SSLSocket fd=7, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.24.2', 55260), raddr=('192.168.24.2', 13989)>

==> overcloud_update_run_Controller.log <==
2019-09-08 10:51:35 |
2019-09-08 10:51:35 | PLAY [Run update] **************************************************************
2019-09-08 10:51:35 | skipping: no hosts matched
2019-09-08 10:51:35 |
2019-09-08 10:51:35 | PLAY [Run update] **************************************************************
2019-09-08 10:51:35 | skipping: no hosts matched
2019-09-08 10:51:35 |
2019-09-08 10:51:35 | PLAY RECAP *********************************************************************
2019-09-08 10:51:35 | controller-0 : ok=309 changed=149 unreachable=0 failed=0 skipped=667 rescued=0 ignored=2
2019-09-08 10:51:35 | controller-1 : ok=295 changed=141 unreachable=0 failed=0 skipped=681 rescued=0 ignored=2
2019-09-08 10:51:35 | controller-2 : ok=295 changed=141 unreachable=0 failed=0 skipped=681 rescued=0 ignored=2
2019-09-08 10:51:35 |
2019-09-08 10:51:35 | Sunday 08 September 2019 10:51:33 +0000 (0:00:00.067) 1:12:21.531 ******
2019-09-08 10:51:35 | ===============================================================================
2019-09-08 10:51:35 |
2019-09-08 10:51:35 | Updated nodes - Controller
2019-09-08 10:51:35 | Success
2019-09-08 10:51:35 | 2019-09-08 10:51:35.390 1016217 INFO osc_lib.shell [-] END return value: None
2019-09-08 10:51:35 | sys:1: ResourceWarning: unclosed <ssl.SSLSocket fd=4, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.24.2', 49194)>
2019-09-08 10:51:35 | sys:1: ResourceWarning: unclosed <ssl.SSLSocket fd=7, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.24.2', 42574), raddr=('192.168.24.2', 13989)>

==> overcloud_update_run_Compute.log <==
2019-09-08 11:02:04 | skipping: no hosts matched
2019-09-08 11:02:04 |
2019-09-08 11:02:04 | PLAY [Run update] **************************************************************
2019-09-08 11:02:04 | skipping: no hosts matched
2019-09-08 11:02:04 |
2019-09-08 11:02:04 | PLAY [Run update] **************************************************************
2019-09-08 11:02:04 | skipping: no hosts matched
2019-09-08 11:02:04 |
2019-09-08 11:02:04 | PLAY RECAP *********************************************************************
2019-09-08 11:02:04 | compute-0 : ok=157 changed=64 unreachable=0 failed=0 skipped=764 rescued=0 ignored=2
2019-09-08 11:02:04 | compute-1 : ok=157 changed=64 unreachable=0 failed=0 skipped=764 rescued=0 ignored=2
2019-09-08 11:02:04 |
2019-09-08 11:02:04 | Sunday 08 September 2019 11:02:04 +0000 (0:00:00.166) 0:10:14.126 ******
2019-09-08 11:02:04 | ===============================================================================
2019-09-08 11:02:05 |
2019-09-08 11:02:05 | Updated nodes - Compute
2019-09-08 11:02:05 | Success
2019-09-08 11:02:05 | 2019-09-08 11:02:05.513 80385 INFO osc_lib.shell [-] END return value: None
2019-09-08 11:02:05 | sys:1: ResourceWarning: unclosed <ssl.SSLSocket fd=4, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.24.2', 57320)>
2019-09-08 11:02:05 | sys:1: ResourceWarning: unclosed <ssl.SSLSocket fd=7, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.24.2', 57152), raddr=('192.168.24.2', 13989)>

==> overcloud_update_converge.log <==
2019-09-08 11:47:50 | controller-1 : ok=290 changed=119 unreachable=0 failed=0 skipped=679 rescued=0 ignored=2
2019-09-08 11:47:50 | controller-2 : ok=290 changed=119 unreachable=0 failed=0 skipped=679 rescued=0 ignored=2
2019-09-08 11:47:50 | undercloud : ok=11 changed=7 unreachable=0 failed=0 skipped=32 rescued=0 ignored=0
2019-09-08 11:47:50 |
2019-09-08 11:47:50 | Sunday 08 September 2019 11:47:49 +0000 (0:00:00.571) 0:28:52.770 ******
2019-09-08 11:47:50 | ===============================================================================
2019-09-08 11:47:52 |
2019-09-08 11:47:52 | Ansible passed.
2019-09-08 11:47:52 | Overcloud configuration completed.
2019-09-08 11:47:52 | 2019-09-08 11:47:52.449 100958 WARNING tripleoclient.plugin [-] Waiting for messages on queue 'tripleo' with no timeout.
2019-09-08 11:47:56 | Overcloud Endpoint: https://10.0.0.101:13000
2019-09-08 11:47:56 | Overcloud Horizon Dashboard URL: https://10.0.0.101:443/dashboard
2019-09-08 11:47:56 | Overcloud rc file: /home/stack/overcloudrc
2019-09-08 11:47:56 | Overcloud Deployed
2019-09-08 11:47:56 | 2019-09-08 11:47:56.264 100958 INFO tripleoclient.v1.overcloud_update.UpdateConverge [-] Update converge on stack overcloud complete.
2019-09-08 11:47:56 | 2019-09-08 11:47:56.265 100958 INFO osc_lib.shell [-] END return value: None
2019-09-08 11:47:56 | sys:1: ResourceWarning: unclosed <ssl.SSLSocket fd=7, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.24.2', 39356), raddr=('192.168.24.2', 13808)>
2019-09-08 11:47:56 | sys:1: ResourceWarning: unclosed <ssl.SSLSocket fd=4, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.24.2', 45436)>
2019-09-08 11:47:56 | sys:1: ResourceWarning: unclosed <ssl.SSLSocket fd=6, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.24.2', 37320), raddr=('192.168.24.2', 13004)>
2019-09-08 11:47:56 | sys:1: ResourceWarning: unclosed <ssl.SSLSocket fd=8, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.24.2', 50276), raddr=('192.168.24.2', 13989)>

#check Vip was moved during oc update :
(undercloud) [stack@undercloud-0 ~]$ sudo grep -r "Moving Vip" /var/log
Binary file /var/log/journal/fc3029402122689e280ffb4295f04f09/system.journal matches
/var/log/secure:Sep 8 19:01:37 undercloud-0 sudo[926413]: stack : TTY=pts/0 ; PWD=/home/stack ; USER=root ; COMMAND=/bin/grep -r Moving Vip /var/log

(undercloud) [stack@undercloud-0 ~]$ grep -A 5 'Moving VIP' overcloud_update_run_Controller.log
2019-09-08 09:40:54 | changed: [controller-0] => {"changed": true, "cmd": "CLUSTER_NODE=$(crm_node -n)\necho \"Retrieving all the VIPs which are hosted on this node\"\nVIPS_TO_MOVE=$(crm_mon --as-xml | xmllint --xpath '//resource[@resource_agent = \"ocf::heartbeat:IPaddr2\" and @role = \"Started\" and @managed = \"true\" and ./node[@name = \"'${CLUSTER_NODE}'\"]]/@id' - | sed -e 's/id=//g' -e 's/\"//g')\nfor v in ${VIPS_TO_MOVE}; do\n echo \"Moving VIP $v on another node\"\n pcs resource move $v --wait=300\ndone\necho \"Removing the location constraints that were created to move the VIPs\"\nfor v in ${VIPS_TO_MOVE}; do\n echo \"Removing location ban for VIP $v\"\n ban_id=$(cibadmin --query | xmllint --xpath 'string(//rsc_location[@rsc=\"'${v}'\" and @node=\"'${CLUSTER_NODE}'\" and @score=\"-INFINITY\"]/@id)' -)\n if [ -n \"$ban_id\" ]; then\n pcs constraint remove ${ban_id}\n else\n echo \"Could not retrieve and clear location constraint for VIP $v\" 2>&1\n fi\ndone\n", "delta": "0:00:07.403534", "end": "2019-09-08 09:39:49.661484", "rc": 0, "start": "2019-09-08 09:39:42.257950", "stderr": "Error: Resource 'ip-172.17.1.121' is not running on any node", "stderr_lines": ["Error: Resource 'ip-172.17.1.121' is not running on any node"], "stdout": "Retrieving all the VIPs which are hosted on this node\nMoving VIP ip-192.168.24.37 on another node\nWarning: Creating location constraint cli-ban-ip-192.168.24.37-on-controller-0 with a score of -INFINITY for resource ip-192.168.24.37 on node controller-0.\nThis will prevent ip-192.168.24.37 from running on controller-0 until the constraint is removed. This will be the case even if controller-0 is the last node in the cluster.\nResource 'ip-192.168.24.37' is running on node controller-1.\nMoving VIP ip-172.17.1.121 on another node\nWarning: Creating location constraint cli-ban-ip-172.17.1.121-on-controller-0 with a score of -INFINITY for resource ip-172.17.1.121 on node controller-0.\nThis will prevent ip-172.17.1.121 from running on controller-0 until the constraint is removed. This will be the case even if controller-0 is the last node in the cluster.\nRemoving the location constraints that were created to move the VIPs\nRemoving location ban for VIP ip-192.168.24.37\nRemoving location ban for VIP ip-172.17.1.121", "stdout_lines": ["Retrieving all the VIPs which are hosted on this node", "Moving VIP ip-192.168.24.37 on another node", "Warning: Creating location constraint cli-ban-ip-192.168.24.37-on-controller-0 with a score of -INFINITY for resource ip-192.168.24.37 on node controller-0.", "This will prevent ip-192.168.24.37 from running on controller-0 until the constraint is removed. This will be the case even if controller-0 is the last node in the cluster.", "Resource 'ip-192.168.24.37' is running on node controller-1.", "Moving VIP ip-172.17.1.121 on another node", "Warning: Creating location constraint cli-ban-ip-172.17.1.121-on-controller-0 with a score of -INFINITY for resource ip-172.17.1.121 on node controller-0.", "This will prevent ip-172.17.1.121 from running on controller-0 until the constraint is removed. This will be the case even if controller-0 is the last node in the cluster.", "Removing the location constraints that were created to move the VIPs", "Removing location ban for VIP ip-192.168.24.37", "Removing location ban for VIP ip-172.17.1.121"]}
2019-09-08 09:40:54 |
2019-09-08 09:40:54 | TASK [Stop pacemaker cluster] **************************************************
2019-09-08 09:40:54 | Sunday 08 September 2019 09:39:49 +0000 (0:00:07.861) 0:00:37.415 ******
2019-09-08 09:40:54 | changed: [controller-0] => {"changed": true, "out": "offline"}
2019-09-08 09:40:54 |
--
2019-09-08 10:08:08 | changed: [controller-1] => {"changed": true, "cmd": "CLUSTER_NODE=$(crm_node -n)\necho \"Retrieving all the VIPs which are hosted on this node\"\nVIPS_TO_MOVE=$(crm_mon --as-xml | xmllint --xpath '//resource[@resource_agent = \"ocf::heartbeat:IPaddr2\" and @role = \"Started\" and @managed = \"true\" and ./node[@name = \"'${CLUSTER_NODE}'\"]]/@id' - | sed -e 's/id=//g' -e 's/\"//g')\nfor v in ${VIPS_TO_MOVE}; do\n echo \"Moving VIP $v on another node\"\n pcs resource move $v --wait=300\ndone\necho \"Removing the location constraints that were created to move the VIPs\"\nfor v in ${VIPS_TO_MOVE}; do\n echo \"Removing location ban for VIP $v\"\n ban_id=$(cibadmin --query | xmllint --xpath 'string(//rsc_location[@rsc=\"'${v}'\" and @node=\"'${CLUSTER_NODE}'\" and @score=\"-INFINITY\"]/@id)' -)\n if [ -n \"$ban_id\" ]; then\n pcs constraint remove ${ban_id}\n else\n echo \"Could not retrieve and clear location constraint for VIP $v\" 2>&1\n fi\ndone\n", "delta": "0:00:14.716948", "end": "2019-09-08 10:08:08.517700", "rc": 0, "start": "2019-09-08 10:07:53.800752", "stderr": "Error: Resource 'ip-172.17.1.121' is not running on any node", "stderr_lines": ["Error: Resource 'ip-172.17.1.121' is not running on any node"], "stdout": "Retrieving all the VIPs which are hosted on this node\nMoving VIP ip-192.168.24.37 on another node\nWarning: Creating location constraint cli-ban-ip-192.168.24.37-on-controller-1 with a score of -INFINITY for resource ip-192.168.24.37 on node controller-1.\nThis will prevent ip-192.168.24.37 from running on controller-1 until the constraint is removed. This will be the case even if controller-1 is the last node in the cluster.\nResource 'ip-192.168.24.37' is running on node controller-0.\nMoving VIP ip-10.0.0.101 on another node\nWarning: Creating location constraint cli-ban-ip-10.0.0.101-on-controller-1 with a score of -INFINITY for resource ip-10.0.0.101 on node controller-1.\nThis will prevent ip-10.0.0.101 from running on controller-1 until the constraint is removed. This will be the case even if controller-1 is the last node in the cluster.\nResource 'ip-10.0.0.101' is running on node controller-2.\nMoving VIP ip-172.17.1.121 on another node\nWarning: Creating location constraint cli-ban-ip-172.17.1.121-on-controller-1 with a score of -INFINITY for resource ip-172.17.1.121 on node controller-1.\nThis will prevent ip-172.17.1.121 from running on controller-1 until the constraint is removed. This will be the case even if controller-1 is the last node in the cluster.\nMoving VIP ip-172.17.3.118 on another node\nWarning: Creating location constraint cli-ban-ip-172.17.3.118-on-controller-1 with a score of -INFINITY for resource ip-172.17.3.118 on node controller-1.\nThis will prevent ip-172.17.3.118 from running on controller-1 until the constraint is removed. This will be the case even if controller-1 is the last node in the cluster.\nResource 'ip-172.17.3.118' is running on node controller-0.\nRemoving the location constraints that were created to move the VIPs\nRemoving location ban for VIP ip-192.168.24.37\nRemoving location ban for VIP ip-10.0.0.101\nRemoving location ban for VIP ip-172.17.1.121\nRemoving location ban for VIP ip-172.17.3.118", "stdout_lines": ["Retrieving all the VIPs which are hosted on this node", "Moving VIP ip-192.168.24.37 on another node", "Warning: Creating location constraint cli-ban-ip-192.168.24.37-on-controller-1 with a score of -INFINITY for resource ip-192.168.24.37 on node controller-1.", "This will prevent ip-192.168.24.37 from running on controller-1 until the constraint is removed. This will be the case even if controller-1 is the last node in the cluster.", "Resource 'ip-192.168.24.37' is running on node controller-0.", "Moving VIP ip-10.0.0.101 on another node", "Warning: Creating location constraint cli-ban-ip-10.0.0.101-on-controller-1 with a score of -INFINITY for resource ip-10.0.0.101 on node controller-1.", "This will prevent ip-10.0.0.101 from running on controller-1 until the constraint is removed. This will be the case even if controller-1 is the last node in the cluster.", "Resource 'ip-10.0.0.101' is running on node controller-2.", "Moving VIP ip-172.17.1.121 on another node", "Warning: Creating location constraint cli-ban-ip-172.17.1.121-on-controller-1 with a score of -INFINITY for resource ip-172.17.1.121 on node controller-1.", "This will prevent ip-172.17.1.121 from running on controller-1 until the constraint is removed. This will be the case even if controller-1 is the last node in the cluster.", "Moving VIP ip-172.17.3.118 on another node", "Warning: Creating location constraint cli-ban-ip-172.17.3.118-on-controller-1 with a score of -INFINITY for resource ip-172.17.3.118 on node controller-1.", "This will prevent ip-172.17.3.118 from running on controller-1 until the constraint is removed. This will be the case even if controller-1 is the last node in the cluster.", "Resource 'ip-172.17.3.118' is running on node controller-0.", "Removing the location constraints that were created to move the VIPs", "Removing location ban for VIP ip-192.168.24.37", "Removing location ban for VIP ip-10.0.0.101", "Removing location ban for VIP ip-172.17.1.121", "Removing location ban for VIP ip-172.17.3.118"]}
2019-09-08 10:09:47 |
2019-09-08 10:09:47 |
2019-09-08 10:09:47 | TASK [Stop pacemaker cluster] **************************************************
2019-09-08 10:09:47 | Sunday 08 September 2019 10:08:08 +0000 (0:00:15.140) 0:28:56.257 ******
2019-09-08 10:09:47 | changed: [controller-1] => {"changed": true, "out": "offline"}
--
2019-09-08 10:31:08 | changed: [controller-2] => {"changed": true, "cmd": "CLUSTER_NODE=$(crm_node -n)\necho \"Retrieving all the VIPs which are hosted on this node\"\nVIPS_TO_MOVE=$(crm_mon --as-xml | xmllint --xpath '//resource[@resource_agent = \"ocf::heartbeat:IPaddr2\" and @role = \"Started\" and @managed = \"true\" and ./node[@name = \"'${CLUSTER_NODE}'\"]]/@id' - | sed -e 's/id=//g' -e 's/\"//g')\nfor v in ${VIPS_TO_MOVE}; do\n echo \"Moving VIP $v on another node\"\n pcs resource move $v --wait=300\ndone\necho \"Removing the location constraints that were created to move the VIPs\"\nfor v in ${VIPS_TO_MOVE}; do\n echo \"Removing location ban for VIP $v\"\n ban_id=$(cibadmin --query | xmllint --xpath 'string(//rsc_location[@rsc=\"'${v}'\" and @node=\"'${CLUSTER_NODE}'\" and @score=\"-INFINITY\"]/@id)' -)\n if [ -n \"$ban_id\" ]; then\n pcs constraint remove ${ban_id}\n else\n echo \"Could not retrieve and clear location constraint for VIP $v\" 2>&1\n fi\ndone\n", "delta": "0:00:11.120663", "end": "2019-09-08 10:30:40.072117", "rc": 0, "start": "2019-09-08 10:30:28.951454", "stderr": "", "stderr_lines": [], "stdout": "Retrieving all the VIPs which are hosted on this node\nMoving VIP ip-10.0.0.101 on another node\nWarning: Creating location constraint cli-ban-ip-10.0.0.101-on-controller-2 with a score of -INFINITY for resource ip-10.0.0.101 on node controller-2.\nThis will prevent ip-10.0.0.101 from running on controller-2 until the constraint is removed. This will be the case even if controller-2 is the last node in the cluster.\nResource 'ip-10.0.0.101' is running on node controller-1.\nMoving VIP ip-172.17.1.17 on another node\nWarning: Creating location constraint cli-ban-ip-172.17.1.17-on-controller-2 with a score of -INFINITY for resource ip-172.17.1.17 on node controller-2.\nThis will prevent ip-172.17.1.17 from running on controller-2 until the constraint is removed. This will be the case even if controller-2 is the last node in the cluster.\nResource 'ip-172.17.1.17' is running on node controller-0.\nMoving VIP ip-172.17.4.41 on another node\nWarning: Creating location constraint cli-ban-ip-172.17.4.41-on-controller-2 with a score of -INFINITY for resource ip-172.17.4.41 on node controller-2.\nThis will prevent ip-172.17.4.41 from running on controller-2 until the constraint is removed. This will be the case even if controller-2 is the last node in the cluster.\nResource 'ip-172.17.4.41' is running on node controller-1.\nRemoving the location constraints that were created to move the VIPs\nRemoving location ban for VIP ip-10.0.0.101\nRemoving location ban for VIP ip-172.17.1.17\nRemoving location ban for VIP ip-172.17.4.41", "stdout_lines": ["Retrieving all the VIPs which are hosted on this node", "Moving VIP ip-10.0.0.101 on another node", "Warning: Creating location constraint cli-ban-ip-10.0.0.101-on-controller-2 with a score of -INFINITY for resource ip-10.0.0.101 on node controller-2.", "This will prevent ip-10.0.0.101 from running on controller-2 until the constraint is removed. This will be the case even if controller-2 is the last node in the cluster.", "Resource 'ip-10.0.0.101' is running on node controller-1.", "Moving VIP ip-172.17.1.17 on another node", "Warning: Creating location constraint cli-ban-ip-172.17.1.17-on-controller-2 with a score of -INFINITY for resource ip-172.17.1.17 on node controller-2.", "This will prevent ip-172.17.1.17 from running on controller-2 until the constraint is removed. This will be the case even if controller-2 is the last node in the cluster.", "Resource 'ip-172.17.1.17' is running on node controller-0.", "Moving VIP ip-172.17.4.41 on another node", "Warning: Creating location constraint cli-ban-ip-172.17.4.41-on-controller-2 with a score of -INFINITY for resource ip-172.17.4.41 on node controller-2.", "This will prevent ip-172.17.4.41 from running on controller-2 until the constraint is removed. This will be the case even if controller-2 is the last node in the cluster.", "Resource 'ip-172.17.4.41' is running on node controller-1.", "Removing the location constraints that were created to move the VIPs", "Removing location ban for VIP ip-10.0.0.101", "Removing location ban for VIP ip-172.17.1.17", "Removing location ban for VIP ip-172.17.4.41"]}
2019-09-08 10:31:08 |
2019-09-08 10:31:08 | TASK [Stop pacemaker cluster] **************************************************
2019-09-08 10:31:08 | Sunday 08 September 2019 10:30:40 +0000 (0:00:11.523) 0:51:27.820 ******
2019-09-08 10:31:08 | changed: [controller-2] => {"changed": true, "out": "offline"}
2019-09-08 10:31:39 |
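For readability, this is the VIP-move script that ran on each controller (the "cmd" field of the task results above) with the JSON escaping removed, lightly re-indented, and with two phase comments added:

# Phase 1: find every VIP (IPaddr2 resource) hosted on this node and move it away.
CLUSTER_NODE=$(crm_node -n)
echo "Retrieving all the VIPs which are hosted on this node"
VIPS_TO_MOVE=$(crm_mon --as-xml | xmllint --xpath '//resource[@resource_agent = "ocf::heartbeat:IPaddr2" and @role = "Started" and @managed = "true" and ./node[@name = "'${CLUSTER_NODE}'"]]/@id' - | sed -e 's/id=//g' -e 's/"//g')
for v in ${VIPS_TO_MOVE}; do
    echo "Moving VIP $v on another node"
    pcs resource move $v --wait=300
done

# Phase 2: remove the -INFINITY location bans that 'pcs resource move' created,
# so the VIPs may later run on this node again.
echo "Removing the location constraints that were created to move the VIPs"
for v in ${VIPS_TO_MOVE}; do
    echo "Removing location ban for VIP $v"
    ban_id=$(cibadmin --query | xmllint --xpath 'string(//rsc_location[@rsc="'${v}'" and @node="'${CLUSTER_NODE}'" and @score="-INFINITY"]/@id)' -)
    if [ -n "$ban_id" ]; then
        pcs constraint remove ${ban_id}
    else
        echo "Could not retrieve and clear location constraint for VIP $v" 2>&1
    fi
done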
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2019:2811
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.