Bug 1749447 - nova-compute does not send Basic.Publish to conductor and service times out in Red Hat OpenStack Platform 13
Summary: nova-compute does not send Basic.Publish to conductor and service times out i...
Keywords:
Status: CLOSED DUPLICATE of bug 1733930
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: ---
: ---
Assignee: Matthew Booth
QA Contact: OSP DFG:Compute
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2019-09-05 15:58 UTC by Andreas Karis
Modified: 2023-03-24 15:25 UTC (History)
9 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-09-06 13:35:16 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 677736 0 'None' MERGED libvirt: Fix service-wide pauses caused by un-proxied libvirt calls 2020-11-30 20:56:51 UTC
Red Hat Issue Tracker OSP-23682 0 None None None 2023-03-24 15:25:56 UTC

Description Andreas Karis 2019-09-05 15:58:45 UTC
Description of problem:
nova-compute does not send Basic.Publish to conductor and service times out in Red Hat OpenStack Platform

In the entire cloud, we see that all compute services flap from up to down, regularly.



Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

We can see that the service status normally updates every 10 seconds but between 17:03:00 and 17:04:20, we receive no updates for compute-5:
~~~
[akaris@wks-akaris packet_capture]$ head -3 nova_service_list.b.txt  | cut -b 1-180
Wed Sep  4 16:54:07 UTC 2019
+--------------------------------------+------------------+------------------------------------------------------+----------+---------+-------+----------------------------+--------
| Id                                   | Binary           | Host                                                 | Zone     | Status  | State | Updated_at                 | Disable
(...)
| 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute     | ovsdpdk-5.localdomain | nova     | enabled | up    | 2019-09-04T17:01:10.000000 | -      
Wed Sep  4 17:01:18 UTC 2019
| 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute     | ovsdpdk-5.localdomain | nova     | enabled | up    | 2019-09-04T17:01:10.000000 | -      
Wed Sep  4 17:01:25 UTC 2019
| 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute     | ovsdpdk-5.localdomain | nova     | enabled | up    | 2019-09-04T17:01:20.000000 | -      
Wed Sep  4 17:01:32 UTC 2019
| 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute     | ovsdpdk-5.localdomain | nova     | enabled | up    | 2019-09-04T17:01:30.000000 | -      
Wed Sep  4 17:01:38 UTC 2019
| 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute     | ovsdpdk-5.localdomain | nova     | enabled | up    | 2019-09-04T17:01:30.000000 | -      
Wed Sep  4 17:01:45 UTC 2019
| 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute     | ovsdpdk-5.localdomain | nova     | enabled | up    | 2019-09-04T17:01:40.000000 | -      
Wed Sep  4 17:01:52 UTC 2019
| 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute     | ovsdpdk-5.localdomain | nova     | enabled | up    | 2019-09-04T17:01:50.000000 | -      
Wed Sep  4 17:01:58 UTC 2019
| 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute     | ovsdpdk-5.localdomain | nova     | enabled | up    | 2019-09-04T17:01:50.000000 | -      
Wed Sep  4 17:02:05 UTC 2019
| 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute     | ovsdpdk-5.localdomain | nova     | enabled | up    | 2019-09-04T17:02:00.000000 | -      
Wed Sep  4 17:02:11 UTC 2019
| 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute     | ovsdpdk-5.localdomain | nova     | enabled | up    | 2019-09-04T17:02:10.000000 | -      
Wed Sep  4 17:02:18 UTC 2019
| 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute     | ovsdpdk-5.localdomain | nova     | enabled | up    | 2019-09-04T17:02:10.000000 | -      
Wed Sep  4 17:02:24 UTC 2019
| 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute     | ovsdpdk-5.localdomain | nova     | enabled | up    | 2019-09-04T17:02:20.000000 | -      
Wed Sep  4 17:02:31 UTC 2019
| 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute     | ovsdpdk-5.localdomain | nova     | enabled | up    | 2019-09-04T17:02:30.000000 | -      
Wed Sep  4 17:02:37 UTC 2019
| 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute     | ovsdpdk-5.localdomain | nova     | enabled | up    | 2019-09-04T17:02:30.000000 | -      
Wed Sep  4 17:02:44 UTC 2019
| 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute     | ovsdpdk-5.localdomain | nova     | enabled | up    | 2019-09-04T17:02:40.000000 | -      
Wed Sep  4 17:02:50 UTC 2019
| 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute     | ovsdpdk-5.localdomain | nova     | enabled | up    | 2019-09-04T17:02:50.000000 | -      
Wed Sep  4 17:02:57 UTC 2019
| 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute     | ovsdpdk-5.localdomain | nova     | enabled | up    | 2019-09-04T17:02:50.000000 | -      
Wed Sep  4 17:03:03 UTC 2019
| 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute     | ovsdpdk-5.localdomain | nova     | enabled | up    | 2019-09-04T17:03:00.000000 | -      
Wed Sep  4 17:03:10 UTC 2019
| 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute     | ovsdpdk-5.localdomain | nova     | enabled | up    | 2019-09-04T17:03:00.000000 | -      
Wed Sep  4 17:03:16 UTC 2019
| 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute     | ovsdpdk-5.localdomain | nova     | enabled | up    | 2019-09-04T17:03:00.000000 | -      
Wed Sep  4 17:03:23 UTC 2019
| 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute     | ovsdpdk-5.localdomain | nova     | enabled | up    | 2019-09-04T17:03:00.000000 | -      
Wed Sep  4 17:03:29 UTC 2019
| 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute     | ovsdpdk-5.localdomain | nova     | enabled | up    | 2019-09-04T17:03:00.000000 | -      
Wed Sep  4 17:03:36 UTC 2019
| 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute     | ovsdpdk-5.localdomain | nova     | enabled | up    | 2019-09-04T17:03:00.000000 | -      
Wed Sep  4 17:03:42 UTC 2019
| 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute     | ovsdpdk-5.localdomain | nova     | enabled | up    | 2019-09-04T17:03:00.000000 | -      
Wed Sep  4 17:03:49 UTC 2019
| 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute     | ovsdpdk-5.localdomain | nova     | enabled | up    | 2019-09-04T17:03:00.000000 | -      
Wed Sep  4 17:03:55 UTC 2019
| 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute     | ovsdpdk-5.localdomain | nova     | enabled | up    | 2019-09-04T17:03:00.000000 | -      
Wed Sep  4 17:04:02 UTC 2019
| 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute     | ovsdpdk-5.localdomain | nova     | enabled | down  | 2019-09-04T17:03:00.000000 | -      
Wed Sep  4 17:04:09 UTC 2019
| 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute     | ovsdpdk-5.localdomain | nova     | enabled | down  | 2019-09-04T17:03:00.000000 | -      
Wed Sep  4 17:04:16 UTC 2019
| 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute     | ovsdpdk-5.localdomain | nova     | enabled | down  | 2019-09-04T17:03:00.000000 | -      
Wed Sep  4 17:04:22 UTC 2019
| 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute     | ovsdpdk-5.localdomain | nova     | enabled | up    | 2019-09-04T17:04:20.000000 | -      
Wed Sep  4 17:04:29 UTC 2019
| 051d586e-e5ca-4eba-b3f4-56101081a3df | nova-compute     | ovsdpdk-5.localdomain | nova     | enabled | up    | 2019-09-04T17:04:30.000000 | - 
(...)
~~~

We see that something goes wrong on the compute node which doesn't send the update. Let's filter more specifically for the update string:
~~~
[akaris@wks-akaris packet_capture]$ tshark -tud -nn -r ovsdpdk-5.bond_api.b.pcap00 'tcp contains 051d586e-e5ca-4eba-b3f4-56101081a3df' | cut -b 1-110,200-300 
   48 2019-09-04 16:53:36.519548 0.000000 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 3c056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applicat
  233 2019-09-04 16:53:46.527159 10.007611 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 32056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applica
  432 2019-09-04 16:53:56.521869 9.994710 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 3c056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applicat
  865 2019-09-04 16:54:12.163059 15.641190 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 32056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applica
 1069 2019-09-04 16:54:22.219429 10.056370 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 32056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applica
 1420 2019-09-04 16:54:32.215538 9.996109 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 3c056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applicat
 1605 2019-09-04 16:54:42.222844 10.007306 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 32056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applica
 1789 2019-09-04 16:54:52.218813 9.995969 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 3c056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applicat
 1972 2019-09-04 16:55:02.215341 9.996528 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 3c056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applicat
 2154 2019-09-04 16:55:12.211146 9.995805 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 3c056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applicat
 2341 2019-09-04 16:55:22.207132 9.995986 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 3c056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applicat
 2538 2019-09-04 16:55:32.205246 9.998114 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 3c056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applicat
 4147 2019-09-04 16:56:49.573234 77.367988 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 32056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applica
 4492 2019-09-04 16:56:59.573441 10.000207 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 32056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applica
 4689 2019-09-04 16:57:09.577283 10.003842 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 32056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applica
 4871 2019-09-04 16:57:19.582084 10.004801 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 32056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applica
 5058 2019-09-04 16:57:29.587445 10.005361 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 32056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applica
 5246 2019-09-04 16:57:39.594848 10.007403 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 32056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applica
 5470 2019-09-04 16:57:49.591466 9.996618 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 3c056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applicat
 5949 2019-09-04 16:58:06.860078 17.268612 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 32056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applica
 6291 2019-09-04 16:58:25.234476 18.374398 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 32056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applica
 6657 2019-09-04 16:58:43.998619 18.764143 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 32056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applica
 7039 2019-09-04 16:58:53.993563 9.994944 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 3c056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applicat
 7549 2019-09-04 16:59:20.874215 26.880652 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 32056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applica
 7898 2019-09-04 16:59:30.872210 9.997995 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 3c056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applicat
 8090 2019-09-04 16:59:40.877245 10.005035 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 32056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applica
 8273 2019-09-04 16:59:50.881285 10.004040 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 32056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applica
 8461 2019-09-04 17:00:00.887092 10.005807 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 32056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applica
 8782 2019-09-04 17:00:10.893754 10.006662 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 32056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applica
 8968 2019-09-04 17:00:20.898925 10.005171 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 32056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applica
 9163 2019-09-04 17:00:30.904088 10.005163 aa:bb:cc:dd:ee:ff 11:22:33:44:55:66 192.168.100.37 192.168.100.25 32056 12 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applica
 9449 2019-09-04 17:00:40.907867 10.003779 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 32056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applica
 9634 2019-09-04 17:00:50.911825 10.003958 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 32056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applica
 9817 2019-09-04 17:01:00.916945 10.005120 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 32056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applica
 9999 2019-09-04 17:01:10.921319 10.004374 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 32056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applica
10182 2019-09-04 17:01:20.916708 9.995389 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 3c056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applicat
10369 2019-09-04 17:01:30.920120 10.003412 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 32056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applica
10560 2019-09-04 17:01:40.916439 9.996319 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 3c056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applicat
10826 2019-09-04 17:01:50.911884 9.995445 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 3c056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applicat
11129 2019-09-04 17:02:00.915896 10.004012 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 32056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applica
11455 2019-09-04 17:02:10.922269 10.006373 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 32056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applica
11643 2019-09-04 17:02:20.926652 10.004383 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 32056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applica
11829 2019-09-04 17:02:30.934475 10.007823 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 32056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applica
12014 2019-09-04 17:02:40.930769 9.996294 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 3c056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applicat
12199 2019-09-04 17:02:50.927600 9.996831 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 3c056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applicat
12398 2019-09-04 17:03:00.922429 9.994829 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 3c056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applicat
14026 2019-09-04 17:04:20.507236 79.584807 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 32056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applica
14377 2019-09-04 17:04:30.504271 9.997035 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 3c056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applicat
14566 2019-09-04 17:04:40.512945 10.008674 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 32056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applica
14752 2019-09-04 17:04:50.507646 9.994701 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 3c056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applicat
14935 2019-09-04 17:05:00.512634 10.004988 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 32056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applica
15116 2019-09-04 17:05:10.520156 10.007522 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 32056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applica
15306 2019-09-04 17:05:20.518651 9.998495 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 3c056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applicat
15502 2019-09-04 17:05:30.516064 9.997413 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 3c056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applicat
15907 2019-09-04 17:05:52.404665 21.888601 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 32056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applica
16402 2019-09-04 17:06:11.158801 18.754136 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 32056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applica
16751 2019-09-04 17:06:21.157077 9.998276 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 3c056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applicat
16942 2019-09-04 17:06:31.152929 9.995852 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 3c056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applicat
17126 2019-09-04 17:06:41.153814 10.000885 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 32056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applica
17336 2019-09-04 17:06:51.150523 9.996709 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 3c056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applicat
17533 2019-09-04 17:07:01.159175 10.008652 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 32056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applica
17712 2019-09-04 17:07:11.158986 9.999811 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 3c056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applicat
18069 2019-09-04 17:07:29.937111 18.778125 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 32056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applica
~~~

And our issue is here:
~~~
11829 2019-09-04 17:02:30.934475 10.007823 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 32056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applica
12014 2019-09-04 17:02:40.930769 9.996294 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 3c056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applicat
12199 2019-09-04 17:02:50.927600 9.996831 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 3c056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applicat
12398 2019-09-04 17:03:00.922429 9.994829 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 3c056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applicat
14026 2019-09-04 17:04:20.507236 79.584807 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 32056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applica
14377 2019-09-04 17:04:30.504271 9.997035 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 3c056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applicat
14566 2019-09-04 17:04:40.512945 10.008674 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 32056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applica
14752 2019-09-04 17:04:50.507646 9.994701 aa:bb:cc:dd:ee:ff 22:22:22:22:22:22 192.168.100.37 192.168.100.32 3c056 36 Basic.Publish x=nova rk=conductor Content-Header type=application/json Content-Body  (applicat
~~~

That matchces the update intervals as reported by nova service-list.

Comment 2 Andreas Karis 2019-09-05 15:59:52 UTC
During that time period, The nova-compute.log shows:

2019-09-04 17:02:53.230 1 DEBUG oslo_service.periodic_task [req-220c716c-3175-444e-86c1-b4f44497cdf5 - - - - -] Running periodic task ComputeManager.update_available_resource run_periodic_tasks /usr/lib/python2.7/site-packages/oslo_service/periodic_task.py:215
2019-09-04 17:02:53.248 1 DEBUG nova.compute.resource_tracker [req-220c716c-3175-444e-86c1-b4f44497cdf5 - - - - -] Auditing locally available compute resources for dcvbgfdellpod-overcloud-computeovsdpdk-5.localdomain (node: dcvbgfdellpod-overcloud-computeovsdpdk-5.localdomain) update_available_resource /usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py:689
2019-09-04 17:03:07.133 1 DEBUG oslo_concurrency.processutils [req-220c716c-3175-444e-86c1-b4f44497cdf5 - - - - -] Running cmd (subprocess): /usr/bin/python2 -m oslo_concurrency.prlimit --as=1073741824 --cpu=30 -- env LC_ALL=C LANG=C qemu-img info /var/lib/nova/instances/3962a175-766a-4a07-a38f-6854c1e46326/disk --force-share execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:372
2019-09-04 17:03:07.324 1 DEBUG oslo_concurrency.processutils [req-220c716c-3175-444e-86c1-b4f44497cdf5 - - - - -] CMD "/usr/bin/python2 -m oslo_concurrency.prlimit --as=1073741824 --cpu=30 -- env LC_ALL=C LANG=C qemu-img info /var/lib/nova/instances/3962a175-766a-4a07-a38f-6854c1e46326/disk --force-share" returned: 0 in 0.191s execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:409
2019-09-04 17:03:07.325 1 DEBUG oslo_concurrency.processutils [req-220c716c-3175-444e-86c1-b4f44497cdf5 - - - - -] Running cmd (subprocess): /usr/bin/python2 -m oslo_concurrency.prlimit --as=1073741824 --cpu=30 -- env LC_ALL=C LANG=C qemu-img info /var/lib/nova/instances/3962a175-766a-4a07-a38f-6854c1e46326/disk --force-share execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:372
2019-09-04 17:03:07.588 1 DEBUG oslo_concurrency.processutils [req-220c716c-3175-444e-86c1-b4f44497cdf5 - - - - -] CMD "/usr/bin/python2 -m oslo_concurrency.prlimit --as=1073741824 --cpu=30 -- env LC_ALL=C LANG=C qemu-img info /var/lib/nova/instances/3962a175-766a-4a07-a38f-6854c1e46326/disk --force-share" returned: 0 in 0.263s execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:409
2019-09-04 17:04:20.299 1 DEBUG oslo_concurrency.processutils [req-220c716c-3175-444e-86c1-b4f44497cdf5 - - - - -] Running cmd (subprocess): /usr/bin/python2 -m oslo_concurrency.prlimit --as=1073741824 --cpu=30 -- env LC_ALL=C LANG=C qemu-img info /var/lib/nova/instances/69695e7c-e285-4ade-b592-05014234676d/disk --force-share execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:372
2019-09-04 17:04:20.581 1 DEBUG oslo_concurrency.processutils [req-220c716c-3175-444e-86c1-b4f44497cdf5 - - - - -] CMD "/usr/bin/python2 -m oslo_concurrency.prlimit --as=1073741824 --cpu=30 -- env LC_ALL=C LANG=C qemu-img info /var/lib/nova/instances/69695e7c-e285-4ade-b592-05014234676d/disk --force-share" returned: 0 in 0.282s execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:409
2019-09-04 17:04:20.582 1 DEBUG oslo_concurrency.processutils [req-220c716c-3175-444e-86c1-b4f44497cdf5 - - - - -] Running cmd (subprocess): /usr/bin/python2 -m oslo_concurrency.prlimit --as=1073741824 --cpu=30 -- env LC_ALL=C LANG=C qemu-img info /var/lib/nova/instances/69695e7c-e285-4ade-b592-05014234676d/disk --force-share execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:372
2019-09-04 17:04:20.876 1 DEBUG oslo_concurrency.processutils [req-220c716c-3175-444e-86c1-b4f44497cdf5 - - - - -] CMD "/usr/bin/python2 -m oslo_concurrency.prlimit --as=1073741824 --cpu=30 -- env LC_ALL=C LANG=C qemu-img info /var/lib/nova/instances/69695e7c-e285-4ade-b592-05014234676d/disk --force-share" returned: 0 in 0.294s execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:409

Comment 4 Andreas Karis 2019-09-05 16:02:09 UTC
nova compute sends status reports every 10 seconds via AMQP which look similar to this, but these messages are not sent out any more (I masked the hypervisor name in the below to ovsdpdk-5.localdomain):
~~~
"{\"_context_domain\": null, \"_msg_id\": \"88ffd4f05e214cfba46899f44b50b67e\", \"_context_global_request_id\": null, \"_context_quota_class\": null, \"_context_read_only\": false, \"_context_request_id\": \"req-c6ecaa26-d99b-436e-a641-48e998492709\", \"_context_service_catalog\": [], \"args\": {\"objmethod\": \"save\", \"args\": [], \"objinst\": {\"nova_object.version\": \"1.22\", \"nova_object.changes\": [\"report_count\"], \"nova_object.name\": \"Service\", \"nova_object.data\": {\"binary\": \"nova-compute\", \"uuid\": \"051d586e-e5ca-4eba-b3f4-56101081a3df\", \"deleted\": false, \"created_at\": \"2019-08-26T22:57:15Z\", \"updated_at\": \"2019-09-04T16:59:50Z\", \"report_count\": 53574, \"topic\": \"compute\", \"host\": \"ovsdpdk-5.localdomain\", \"version\": 30, \"disabled\": false, \"forced_down\": false, \"last_seen_up\": \"2019-09-04T16:59:50Z\", \"deleted_at\": null, \"disabled_reason\": null, \"id\": 185}, \"nova_object.namespace\": \"nova\"}, \"kwargs\": {}}, \"_unique_id\": \"02e250c11d264100b92c4f33e03fc25f\", \"_context_resource_uuid\": null, \"_context_instance_lock_checked\": false, \"_context_is_admin_project\": true, \"_context_user\": null, \"_context_user_id\": null, \"_context_project_name\": null, \"_context_read_deleted\": \"no\", \"_context_user_identity\": \"- - - - -\", \"_reply_q\": \"reply_301d63b27bf147b8a3bb3ff2556486b1\", \"_context_auth_token\": null, \"_context_show_deleted\": false, \"_context_tenant\": null, \"_context_roles\": [], \"_context_is_admin\": true, \"version\": \"3.0\", \"_context_project_id\": null, \"_context_project_domain\": null, \"_context_timestamp\": \"2019-09-04T15:26:28.894135\", \"_context_user_domain\": null, \"_context_user_name\": null, \"method\": \"object_action\", \"_context_project\": null, \"_context_remote_address\": null}"
~~~

Comment 5 Andreas Karis 2019-09-05 16:03:59 UTC
The nova log entries might hint at:
https://github.com/openstack/nova/blob/master/nova/virt/images.py#L74

And:
https://github.com/openstack/oslo.concurrency/blob/master/oslo_concurrency/prlimit.py#L47

But I think now that this looks like a Red Herring.

We can further see that the nova process goes dead silent and the only thing that passes over the socket is , for most of the time, TCP keepalives. The heartbeat mechanism for oslo still works, too ... however, I suspect that the nova thread responsible for updating the service hangs.

Comment 6 Andreas Karis 2019-09-05 16:05:06 UTC
cat ./sos_commands/docker/docker_ps
CONTAINER ID        IMAGE                                                                                          COMMAND             CREATED             STATUS                       PORTS               NAMES
371e235371da        10.243.0.5:8787/ven-ericsson-osp13_containers_for_vbgf_pod-neutron-openvswitch-agent:13.0-90   "kolla_start"       8 days ago          Up 8 days (healthy)                              neutron_ovs_agent
8a16bf665b92        10.243.0.5:8787/ven-ericsson-osp13_containers_for_vbgf_pod-nova-compute:13.0-98                "kolla_start"       8 days ago          Up About an hour (healthy)                       nova_compute
0a5c0e25091c        10.243.0.5:8787/ven-ericsson-osp13_containers_for_vbgf_pod-cron:13.0-90                        "kolla_start"       8 days ago          Up 8 days                                        logrotate_crond
c0daf6a5ddd0        10.243.0.5:8787/ven-ericsson-osp13_containers_for_vbgf_pod-ceilometer-compute:13.0-82          "kolla_start"       8 days ago          Up 8 days                                        ceilometer_agent_compute
e4677ab3e60a        10.243.0.5:8787/ven-ericsson-osp13_containers_for_vbgf_pod-nova-compute:13.0-98                "kolla_start"       8 days ago          Up About an hour (healthy)                       nova_migration_target
e3ae860918a3        10.243.0.5:8787/ven-ericsson-osp13_containers_for_vbgf_pod-iscsid:13.0-82                      "kolla_start"       8 days ago          Up 8 days (healthy)                              iscsid
fed9252ce94e        10.243.0.5:8787/ven-ericsson-osp13_containers_for_vbgf_pod-nova-libvirt:13.0-101               "kolla_start"       8 days ago          Up About an hour                                 nova_libvirt
2e2b67b81a63        10.243.0.5:8787/ven-ericsson-osp13_containers_for_vbgf_pod-nova-libvirt:13.0-101               "kolla_start"       8 days ago          Up About an hour                                 nova_virtlogd

Comment 12 Matthew Booth 2019-09-06 13:18:50 UTC
At first glance this looks very much like a symptom of https://bugs.launchpad.net/nova/+bug/1840912, which I already have an upstream patch posted for. I'm still pushing this.

Comment 14 Matthew Booth 2019-09-06 13:35:16 UTC

*** This bug has been marked as a duplicate of bug 1733930 ***

Comment 15 Andreas Karis 2019-09-06 16:32:38 UTC
Here we can see consecutive log messages in resource tracker separated by over a minute.

2019-09-04 17:00:36.434 1 DEBUG nova.compute.resource_tracker [req-220c716c-3175-444e-86c1-b4f44497cdf5 - - - - -] Auditing locally available compute resources for overcloud-compute
ovsdpdk-5.localdomain (node: overcloud-computeovsdpdk-5.localdomain) update_available_resource /usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py:689

2019-09-04 17:01:49.026 1 DEBUG oslo_concurrency.processutils [req-220c716c-3175-444e-86c1-b4f44497cdf5 - - - - -] Running cmd (subprocess): /usr/bin/python2 -m oslo_concurrency.prlimit --as=1073
741824 --cpu=30 -- env LC_ALL=C LANG=C qemu-img info /var/lib/nova/instances/3962a175-766a-4a07-a38f-6854c1e46326/disk --force-share execute /usr/lib/python2.7/site-packages/oslo_concurrency/proc
essutils.py:372


Note You need to log in before you can comment on or make changes to this bug.