Description of problem: nova-compute does not send Basic.Publish to conductor and service times out in Red Hat OpenStack Platform In the entire cloud, we see that all compute services flap from up to down, regularly. Version-Release number of selected component (if applicable): How reproducible: Steps to Reproduce: 1. 2. 3. Actual results: Expected results: Additional info: We can see that the service status normally updates every 10 seconds but between 17:03:00 and 17:04:20, we receive no updates for compute-5: ~~~ [akaris@wks-akaris packet_capture]$ head -3 nova_service_list.b.txt | cut -b 1-180 Wed Sep 4 16:54:07 UTC 2019 +--------------------------------------+------------------+------------------------------------------------------+----------+---------+-------+----------------------------+-------- | Id | Binary | Host | Zone | Status | State | Updated_at | Disable (...) | 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute | ovsdpdk-5.localdomain | nova | enabled | up | 2019-09-04T17:01:10.000000 | - Wed Sep 4 17:01:18 UTC 2019 | 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute | ovsdpdk-5.localdomain | nova | enabled | up | 2019-09-04T17:01:10.000000 | - Wed Sep 4 17:01:25 UTC 2019 | 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute | ovsdpdk-5.localdomain | nova | enabled | up | 2019-09-04T17:01:20.000000 | - Wed Sep 4 17:01:32 UTC 2019 | 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute | ovsdpdk-5.localdomain | nova | enabled | up | 2019-09-04T17:01:30.000000 | - Wed Sep 4 17:01:38 UTC 2019 | 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute | ovsdpdk-5.localdomain | nova | enabled | up | 2019-09-04T17:01:30.000000 | - Wed Sep 4 17:01:45 UTC 2019 | 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute | ovsdpdk-5.localdomain | nova | enabled | up | 2019-09-04T17:01:40.000000 | - Wed Sep 4 17:01:52 UTC 2019 | 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute | ovsdpdk-5.localdomain | nova | enabled | up | 2019-09-04T17:01:50.000000 | - Wed Sep 4 17:01:58 UTC 2019 | 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute | ovsdpdk-5.localdomain | nova | enabled | up | 2019-09-04T17:01:50.000000 | - Wed Sep 4 17:02:05 UTC 2019 | 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute | ovsdpdk-5.localdomain | nova | enabled | up | 2019-09-04T17:02:00.000000 | - Wed Sep 4 17:02:11 UTC 2019 | 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute | ovsdpdk-5.localdomain | nova | enabled | up | 2019-09-04T17:02:10.000000 | - Wed Sep 4 17:02:18 UTC 2019 | 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute | ovsdpdk-5.localdomain | nova | enabled | up | 2019-09-04T17:02:10.000000 | - Wed Sep 4 17:02:24 UTC 2019 | 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute | ovsdpdk-5.localdomain | nova | enabled | up | 2019-09-04T17:02:20.000000 | - Wed Sep 4 17:02:31 UTC 2019 | 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute | ovsdpdk-5.localdomain | nova | enabled | up | 2019-09-04T17:02:30.000000 | - Wed Sep 4 17:02:37 UTC 2019 | 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute | ovsdpdk-5.localdomain | nova | enabled | up | 2019-09-04T17:02:30.000000 | - Wed Sep 4 17:02:44 UTC 2019 | 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute | ovsdpdk-5.localdomain | nova | enabled | up | 2019-09-04T17:02:40.000000 | - Wed Sep 4 17:02:50 UTC 2019 | 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute | ovsdpdk-5.localdomain | nova | enabled | up | 2019-09-04T17:02:50.000000 | - Wed Sep 4 17:02:57 UTC 2019 | 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute | ovsdpdk-5.localdomain | nova | enabled | up | 2019-09-04T17:02:50.000000 | - Wed Sep 4 17:03:03 UTC 2019 | 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute | ovsdpdk-5.localdomain | nova | enabled | up | 2019-09-04T17:03:00.000000 | - Wed Sep 4 17:03:10 UTC 2019 | 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute | ovsdpdk-5.localdomain | nova | enabled | up | 2019-09-04T17:03:00.000000 | - Wed Sep 4 17:03:16 UTC 2019 | 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute | ovsdpdk-5.localdomain | nova | enabled | up | 2019-09-04T17:03:00.000000 | - Wed Sep 4 17:03:23 UTC 2019 | 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute | ovsdpdk-5.localdomain | nova | enabled | up | 2019-09-04T17:03:00.000000 | - Wed Sep 4 17:03:29 UTC 2019 | 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute | ovsdpdk-5.localdomain | nova | enabled | up | 2019-09-04T17:03:00.000000 | - Wed Sep 4 17:03:36 UTC 2019 | 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute | ovsdpdk-5.localdomain | nova | enabled | up | 2019-09-04T17:03:00.000000 | - Wed Sep 4 17:03:42 UTC 2019 | 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute | ovsdpdk-5.localdomain | nova | enabled | up | 2019-09-04T17:03:00.000000 | - Wed Sep 4 17:03:49 UTC 2019 | 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute | ovsdpdk-5.localdomain | nova | enabled | up | 2019-09-04T17:03:00.000000 | - Wed Sep 4 17:03:55 UTC 2019 | 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute | ovsdpdk-5.localdomain | nova | enabled | up | 2019-09-04T17:03:00.000000 | - Wed Sep 4 17:04:02 UTC 2019 | 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute | ovsdpdk-5.localdomain | nova | enabled | down | 2019-09-04T17:03:00.000000 | - Wed Sep 4 17:04:09 UTC 2019 | 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute | ovsdpdk-5.localdomain | nova | enabled | down | 2019-09-04T17:03:00.000000 | - Wed Sep 4 17:04:16 UTC 2019 | 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute | ovsdpdk-5.localdomain | nova | enabled | down | 2019-09-04T17:03:00.000000 | - Wed Sep 4 17:04:22 UTC 2019 | 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute | ovsdpdk-5.localdomain | nova | enabled | up | 2019-09-04T17:04:20.000000 | - Wed Sep 4 17:04:29 UTC 2019 | 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute | ovsdpdk-5.localdomain | nova | enabled | up | 2019-09-04T17:04:30.000000 | - (...) ~~~ We see that something goes wrong on the compute node which doesn't send the update. Let's filter more specifically for the update string: ~~~ [akaris@wks-akaris packet_capture]$ tshark -tud -nn -r ovsdpdk-5.bond_api.b.pcap00 'tcp contains 051d586e-e5ca-4eba-b3f4-56101081a3df' | cut -b 1-110,200-300 48 2019-09-04 16:53:36.519548 0.000000 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 3c056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applicat 233 2019-09-04 16:53:46.527159 10.007611 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 32056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applica 432 2019-09-04 16:53:56.521869 9.994710 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 3c056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applicat 865 2019-09-04 16:54:12.163059 15.641190 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 32056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applica 1069 2019-09-04 16:54:22.219429 10.056370 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 32056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applica 1420 2019-09-04 16:54:32.215538 9.996109 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 3c056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applicat 1605 2019-09-04 16:54:42.222844 10.007306 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 32056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applica 1789 2019-09-04 16:54:52.218813 9.995969 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 3c056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applicat 1972 2019-09-04 16:55:02.215341 9.996528 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 3c056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applicat 2154 2019-09-04 16:55:12.211146 9.995805 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 3c056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applicat 2341 2019-09-04 16:55:22.207132 9.995986 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 3c056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applicat 2538 2019-09-04 16:55:32.205246 9.998114 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 3c056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applicat 4147 2019-09-04 16:56:49.573234 77.367988 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 32056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applica 4492 2019-09-04 16:56:59.573441 10.000207 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 32056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applica 4689 2019-09-04 16:57:09.577283 10.003842 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 32056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applica 4871 2019-09-04 16:57:19.582084 10.004801 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 32056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applica 5058 2019-09-04 16:57:29.587445 10.005361 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 32056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applica 5246 2019-09-04 16:57:39.594848 10.007403 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 32056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applica 5470 2019-09-04 16:57:49.591466 9.996618 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 3c056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applicat 5949 2019-09-04 16:58:06.860078 17.268612 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 32056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applica 6291 2019-09-04 16:58:25.234476 18.374398 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 32056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applica 6657 2019-09-04 16:58:43.998619 18.764143 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 32056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applica 7039 2019-09-04 16:58:53.993563 9.994944 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 3c056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applicat 7549 2019-09-04 16:59:20.874215 26.880652 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 32056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applica 7898 2019-09-04 16:59:30.872210 9.997995 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 3c056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applicat 8090 2019-09-04 16:59:40.877245 10.005035 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 32056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applica 8273 2019-09-04 16:59:50.881285 10.004040 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 32056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applica 8461 2019-09-04 17:00:00.887092 10.005807 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 32056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applica 8782 2019-09-04 17:00:10.893754 10.006662 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 32056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applica 8968 2019-09-04 17:00:20.898925 10.005171 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 32056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applica 9163 2019-09-04 17:00:30.904088 10.005163 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 32056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applica 9449 2019-09-04 17:00:40.907867 10.003779 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 32056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applica 9634 2019-09-04 17:00:50.911825 10.003958 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 32056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applica 9817 2019-09-04 17:01:00.916945 10.005120 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 32056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applica 9999 2019-09-04 17:01:10.921319 10.004374 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 32056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applica 10182 2019-09-04 17:01:20.916708 9.995389 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 3c056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applicat 10369 2019-09-04 17:01:30.920120 10.003412 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 32056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applica 10560 2019-09-04 17:01:40.916439 9.996319 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 3c056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applicat 10826 2019-09-04 17:01:50.911884 9.995445 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 3c056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applicat 11129 2019-09-04 17:02:00.915896 10.004012 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 32056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applica 11455 2019-09-04 17:02:10.922269 10.006373 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 32056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applica 11643 2019-09-04 17:02:20.926652 10.004383 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 32056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applica 11829 2019-09-04 17:02:30.934475 10.007823 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 32056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applica 12014 2019-09-04 17:02:40.930769 9.996294 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 3c056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applicat 12199 2019-09-04 17:02:50.927600 9.996831 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 3c056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applicat 12398 2019-09-04 17:03:00.922429 9.994829 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 3c056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applicat 14026 2019-09-04 17:04:20.507236 79.584807 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 32056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applica 14377 2019-09-04 17:04:30.504271 9.997035 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 3c056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applicat 14566 2019-09-04 17:04:40.512945 10.008674 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 32056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applica 14752 2019-09-04 17:04:50.507646 9.994701 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 3c056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applicat 14935 2019-09-04 17:05:00.512634 10.004988 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 32056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applica 15116 2019-09-04 17:05:10.520156 10.007522 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 32056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applica 15306 2019-09-04 17:05:20.518651 9.998495 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 3c056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applicat 15502 2019-09-04 17:05:30.516064 9.997413 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 3c056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applicat 15907 2019-09-04 17:05:52.404665 21.888601 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 32056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applica 16402 2019-09-04 17:06:11.158801 18.754136 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 32056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applica 16751 2019-09-04 17:06:21.157077 9.998276 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 3c056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applicat 16942 2019-09-04 17:06:31.152929 9.995852 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 3c056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applicat 17126 2019-09-04 17:06:41.153814 10.000885 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 32056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applica 17336 2019-09-04 17:06:51.150523 9.996709 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 3c056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applicat 17533 2019-09-04 17:07:01.159175 10.008652 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 32056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applica 17712 2019-09-04 17:07:11.158986 9.999811 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 3c056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applicat 18069 2019-09-04 17:07:29.937111 18.778125 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 32056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applica ~~~ And our issue is here: ~~~ 11829 2019-09-04 17:02:30.934475 10.007823 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 32056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applica 12014 2019-09-04 17:02:40.930769 9.996294 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 3c056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applicat 12199 2019-09-04 17:02:50.927600 9.996831 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 3c056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applicat 12398 2019-09-04 17:03:00.922429 9.994829 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 3c056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applicat 14026 2019-09-04 17:04:20.507236 79.584807 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 32056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applica 14377 2019-09-04 17:04:30.504271 9.997035 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 3c056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applicat 14566 2019-09-04 17:04:40.512945 10.008674 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 32056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applica 14752 2019-09-04 17:04:50.507646 9.994701 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 3c056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body (applicat ~~~ That matchces the update intervals as reported by nova service-list.
During that time period, The nova-compute.log shows: 2019-09-04 17:02:53.230 1 DEBUG oslo_service.periodic_task [req-220c716c-3175-444e-86c1-b4f44497cdf5 - - - - -] Running periodic task ComputeManager.update_available_resource run_periodic_tasks /usr/lib/python2.7/site-packages/oslo_service/periodic_task.py:215 2019-09-04 17:02:53.248 1 DEBUG nova.compute.resource_tracker [req-220c716c-3175-444e-86c1-b4f44497cdf5 - - - - -] Auditing locally available compute resources for dcvbgfdellpod-overcloud-computeovsdpdk-5.localdomain (node: dcvbgfdellpod-overcloud-computeovsdpdk-5.localdomain) update_available_resource /usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py:689 2019-09-04 17:03:07.133 1 DEBUG oslo_concurrency.processutils [req-220c716c-3175-444e-86c1-b4f44497cdf5 - - - - -] Running cmd (subprocess): /usr/bin/python2 -m oslo_concurrency.prlimit --as=1073741824 --cpu=30 -- env LC_ALL=C LANG=C qemu-img info /var/lib/nova/instances/3962a175-766a-4a07-a38f-6854c1e46326/disk --force-share execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:372 2019-09-04 17:03:07.324 1 DEBUG oslo_concurrency.processutils [req-220c716c-3175-444e-86c1-b4f44497cdf5 - - - - -] CMD "/usr/bin/python2 -m oslo_concurrency.prlimit --as=1073741824 --cpu=30 -- env LC_ALL=C LANG=C qemu-img info /var/lib/nova/instances/3962a175-766a-4a07-a38f-6854c1e46326/disk --force-share" returned: 0 in 0.191s execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:409 2019-09-04 17:03:07.325 1 DEBUG oslo_concurrency.processutils [req-220c716c-3175-444e-86c1-b4f44497cdf5 - - - - -] Running cmd (subprocess): /usr/bin/python2 -m oslo_concurrency.prlimit --as=1073741824 --cpu=30 -- env LC_ALL=C LANG=C qemu-img info /var/lib/nova/instances/3962a175-766a-4a07-a38f-6854c1e46326/disk --force-share execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:372 2019-09-04 17:03:07.588 1 DEBUG oslo_concurrency.processutils [req-220c716c-3175-444e-86c1-b4f44497cdf5 - - - - -] CMD "/usr/bin/python2 -m oslo_concurrency.prlimit --as=1073741824 --cpu=30 -- env LC_ALL=C LANG=C qemu-img info /var/lib/nova/instances/3962a175-766a-4a07-a38f-6854c1e46326/disk --force-share" returned: 0 in 0.263s execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:409 2019-09-04 17:04:20.299 1 DEBUG oslo_concurrency.processutils [req-220c716c-3175-444e-86c1-b4f44497cdf5 - - - - -] Running cmd (subprocess): /usr/bin/python2 -m oslo_concurrency.prlimit --as=1073741824 --cpu=30 -- env LC_ALL=C LANG=C qemu-img info /var/lib/nova/instances/69695e7c-e285-4ade-b592-05014234676d/disk --force-share execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:372 2019-09-04 17:04:20.581 1 DEBUG oslo_concurrency.processutils [req-220c716c-3175-444e-86c1-b4f44497cdf5 - - - - -] CMD "/usr/bin/python2 -m oslo_concurrency.prlimit --as=1073741824 --cpu=30 -- env LC_ALL=C LANG=C qemu-img info /var/lib/nova/instances/69695e7c-e285-4ade-b592-05014234676d/disk --force-share" returned: 0 in 0.282s execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:409 2019-09-04 17:04:20.582 1 DEBUG oslo_concurrency.processutils [req-220c716c-3175-444e-86c1-b4f44497cdf5 - - - - -] Running cmd (subprocess): /usr/bin/python2 -m oslo_concurrency.prlimit --as=1073741824 --cpu=30 -- env LC_ALL=C LANG=C qemu-img info /var/lib/nova/instances/69695e7c-e285-4ade-b592-05014234676d/disk --force-share execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:372 2019-09-04 17:04:20.876 1 DEBUG oslo_concurrency.processutils [req-220c716c-3175-444e-86c1-b4f44497cdf5 - - - - -] CMD "/usr/bin/python2 -m oslo_concurrency.prlimit --as=1073741824 --cpu=30 -- env LC_ALL=C LANG=C qemu-img info /var/lib/nova/instances/69695e7c-e285-4ade-b592-05014234676d/disk --force-share" returned: 0 in 0.294s execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:409
nova compute sends status reports every 10 seconds via AMQP which look similar to this, but these messages are not sent out any more (I masked the hypervisor name in the below to ovsdpdk-5.localdomain): ~~~ "{\"_context_domain\": null, \"_msg_id\": \"88ffd4f05e214cfba46899f44b50b67e\", \"_context_global_request_id\": null, \"_context_quota_class\": null, \"_context_read_only\": false, \"_context_request_id\": \"req-c6ecaa26-d99b-436e-a641-48e998492709\", \"_context_service_catalog\": [], \"args\": {\"objmethod\": \"save\", \"args\": [], \"objinst\": {\"nova_object.version\": \"1.22\", \"nova_object.changes\": [\"report_count\"], \"nova_object.name\": \"Service\", \"nova_object.data\": {\"binary\": \"nova-compute\", \"uuid\": \"051d586e-e5ca-4eba-b3f4-56101081a3df\", \"deleted\": false, \"created_at\": \"2019-08-26T22:57:15Z\", \"updated_at\": \"2019-09-04T16:59:50Z\", \"report_count\": 53574, \"topic\": \"compute\", \"host\": \"ovsdpdk-5.localdomain\", \"version\": 30, \"disabled\": false, \"forced_down\": false, \"last_seen_up\": \"2019-09-04T16:59:50Z\", \"deleted_at\": null, \"disabled_reason\": null, \"id\": 185}, \"nova_object.namespace\": \"nova\"}, \"kwargs\": {}}, \"_unique_id\": \"02e250c11d264100b92c4f33e03fc25f\", \"_context_resource_uuid\": null, \"_context_instance_lock_checked\": false, \"_context_is_admin_project\": true, \"_context_user\": null, \"_context_user_id\": null, \"_context_project_name\": null, \"_context_read_deleted\": \"no\", \"_context_user_identity\": \"- - - - -\", \"_reply_q\": \"reply_301d63b27bf147b8a3bb3ff2556486b1\", \"_context_auth_token\": null, \"_context_show_deleted\": false, \"_context_tenant\": null, \"_context_roles\": [], \"_context_is_admin\": true, \"version\": \"3.0\", \"_context_project_id\": null, \"_context_project_domain\": null, \"_context_timestamp\": \"2019-09-04T15:26:28.894135\", \"_context_user_domain\": null, \"_context_user_name\": null, \"method\": \"object_action\", \"_context_project\": null, \"_context_remote_address\": null}" ~~~
The nova log entries might hint at: https://github.com/openstack/nova/blob/master/nova/virt/images.py#L74 And: https://github.com/openstack/oslo.concurrency/blob/master/oslo_concurrency/prlimit.py#L47 But I think now that this looks like a Red Herring. We can further see that the nova process goes dead silent and the only thing that passes over the socket is , for most of the time, TCP keepalives. The heartbeat mechanism for oslo still works, too ... however, I suspect that the nova thread responsible for updating the service hangs.
cat ./sos_commands/docker/docker_ps CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 371e235371da 10.243.0.5:8787/ven-ericsson-osp13_containers_for_vbgf_pod-neutron-openvswitch-agent:13.0-90 "kolla_start" 8 days ago Up 8 days (healthy) neutron_ovs_agent 8a16bf665b92 10.243.0.5:8787/ven-ericsson-osp13_containers_for_vbgf_pod-nova-compute:13.0-98 "kolla_start" 8 days ago Up About an hour (healthy) nova_compute 0a5c0e25091c 10.243.0.5:8787/ven-ericsson-osp13_containers_for_vbgf_pod-cron:13.0-90 "kolla_start" 8 days ago Up 8 days logrotate_crond c0daf6a5ddd0 10.243.0.5:8787/ven-ericsson-osp13_containers_for_vbgf_pod-ceilometer-compute:13.0-82 "kolla_start" 8 days ago Up 8 days ceilometer_agent_compute e4677ab3e60a 10.243.0.5:8787/ven-ericsson-osp13_containers_for_vbgf_pod-nova-compute:13.0-98 "kolla_start" 8 days ago Up About an hour (healthy) nova_migration_target e3ae860918a3 10.243.0.5:8787/ven-ericsson-osp13_containers_for_vbgf_pod-iscsid:13.0-82 "kolla_start" 8 days ago Up 8 days (healthy) iscsid fed9252ce94e 10.243.0.5:8787/ven-ericsson-osp13_containers_for_vbgf_pod-nova-libvirt:13.0-101 "kolla_start" 8 days ago Up About an hour nova_libvirt 2e2b67b81a63 10.243.0.5:8787/ven-ericsson-osp13_containers_for_vbgf_pod-nova-libvirt:13.0-101 "kolla_start" 8 days ago Up About an hour nova_virtlogd
At first glance this looks very much like a symptom of https://bugs.launchpad.net/nova/+bug/1840912, which I already have an upstream patch posted for. I'm still pushing this.
*** This bug has been marked as a duplicate of bug 1733930 ***
Here we can see consecutive log messages in resource tracker separated by over a minute. 2019-09-04 17:00:36.434 1 DEBUG nova.compute.resource_tracker [req-220c716c-3175-444e-86c1-b4f44497cdf5 - - - - -] Auditing locally available compute resources for overcloud-compute ovsdpdk-5.localdomain (node: overcloud-computeovsdpdk-5.localdomain) update_available_resource /usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py:689 2019-09-04 17:01:49.026 1 DEBUG oslo_concurrency.processutils [req-220c716c-3175-444e-86c1-b4f44497cdf5 - - - - -] Running cmd (subprocess): /usr/bin/python2 -m oslo_concurrency.prlimit --as=1073 741824 --cpu=30 -- env LC_ALL=C LANG=C qemu-img info /var/lib/nova/instances/3962a175-766a-4a07-a38f-6854c1e46326/disk --force-share execute /usr/lib/python2.7/site-packages/oslo_concurrency/proc essutils.py:372