Bug 1560683

Summary: All OVS threads restart and lead to port flaps in openvswitch-2.6.1-26
Product: Red Hat OpenStack
Reporter: Andreas Karis <akaris>
Component: openvswitch
Assignee: Open vSwitch development team <ovs-team>
Status: CLOSED NOTABUG
QA Contact: Ofer Blaut <oblaut>
Severity: urgent
Priority: urgent
Version: 10.0 (Newton)
CC: apevec, chrisw, fleitner, marjones, rhos-maint, srevivo
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Last Closed: 2018-03-27 14:14:40 UTC
Type: Bug

Description Andreas Karis 2018-03-26 18:16:29 UTC
Description of problem:
All OVS threads restart and lead to port flaps in openvswitch-2.6.1-26

Version-Release number of selected component (if applicable):
~~~
[akaris@collab-shell compute-29]$ grep openvswitch installed-rpms 
openstack-neutron-openvswitch-9.4.1-1.el7ost.noarch         Wed Sep 20 17:21:21 2017
openvswitch-2.6.1-26.git20180130.el7.x86_64                 Sun Mar 18 09:05:29 2018
openvswitch-debuginfo-2.6.1-26.git20180130.el7.x86_64       Sun Mar 18 09:05:34 2018
openvswitch-ovn-central-2.6.1-26.git20180130.el7.x86_64     Sun Mar 18 09:05:29 2018
openvswitch-ovn-common-2.6.1-26.git20180130.el7.x86_64      Sun Mar 18 09:05:29 2018
openvswitch-ovn-host-2.6.1-26.git20180130.el7.x86_64        Sun Mar 18 09:05:29 2018
openvswitch-test-2.6.1-26.git20180130.el7.noarch            Sun Mar 18 09:05:29 2018
python-openvswitch-2.6.1-26.git20180130.el7.noarch          Sun Mar 18 09:05:29 2018
~~~


Additional info:
No segfaults.
~~~
[akaris@collab-shell compute-29]$ grep -i segf var/log/messages
[akaris@collab-shell compute-29]$ head -1 !$
head -1 var/log/messages
Mar 25 03:38:01 compute-29 rsyslogd: [origin software="rsyslogd" swVersion="8.24.0" x-pid="1572" x-info="http://www.rsyslog.com"] rsyslogd was HUPed
[akaris@collab-shell compute-29]$ tail -1 !$
tail -1 var/log/messages
Mar 26 16:32:43 compute-29 kernel: nr_pdflush_threads exported in /proc is scheduled for removal
~~~

~~~
[akaris@collab-shell compute-29]$ grep ovs-vswitchd ./sos_commands/process/ps_auxwwwm
root      273328 1603  0.0 987590176 360532 ?    -    Mar18 191746:56 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach
[akaris@collab-shell compute-29]$ grep ovsdb-server ./sos_commands/process/ps_auxwwwm
root      273292  0.0  0.0  51976  4120 ?        -    Mar18   5:46 ovsdb-server /etc/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/var/run/openvswitch/db.sock --private-key=db:Open_vSwitch,SSL,private_key --certificate=db:Open_vSwitch,SSL,certificate --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --no-chdir --log-file=/var/log/openvswitch/ovsdb-server.log --pidfile=/var/run/openvswitch/ovsdb-server.pid --detach
~~~

It looks like all the PMD threads restarted. The customer posted this at 14:06 EDT (18:06 UTC). The PMD threads below show about 3 hours 8 minutes of CPU time; since PMD threads busy-poll at close to 100% CPU, that is roughly their wall-clock lifetime, which puts the restart at around 14:58 UTC and matches the log entries at 14:57:
~~~
[root@det1-compute-29 openvswitch]# ps -T -p 273328
    PID    SPID TTY          TIME CMD
 273328  273328 ?        05:37:32 ovs-vswitchd
 273328  273329 ?        00:00:00 vfio-sync
 273328  273547 ?        00:00:00 eal-intr-thread
 273328  273548 ?        00:00:00 lcore-slave-1
 273328  273549 ?        00:00:00 lcore-slave-2
 273328  273550 ?        00:00:00 lcore-slave-3
 273328  273551 ?        00:00:00 lcore-slave-44
 273328  273552 ?        00:00:00 lcore-slave-45
 273328  273553 ?        00:00:00 lcore-slave-46
 273328  273554 ?        00:00:00 lcore-slave-47
 273328  273555 ?        00:00:02 dpdk_watchdog2
 273328  273556 ?        00:00:07 vhost_thread1
 273328  273557 ?        00:00:00 ct_clean3
 273328  273559 ?        01:39:57 urcu4
 273328  273634 ?        00:00:00 handler18
 273328  273635 ?        01:07:49 revalidator17
 273328  289036 ?        03:07:39 pmd230
 273328  289037 ?        03:07:39 pmd227
 273328  289038 ?        03:07:39 pmd228
 273328  289039 ?        03:07:39 pmd229
 273328  289040 ?        03:07:39 pmd231
 273328  289041 ?        03:07:39 pmd232
 273328  289042 ?        03:07:39 pmd233
 273328  289043 ?        03:07:39 pmd234
 273328  289044 ?        03:07:39 pmd235
 273328  289045 ?        03:07:39 pmd236
 273328  289046 ?        03:07:39 pmd237
 273328  289047 ?        03:07:39 pmd238
 273328  289048 ?        03:07:39 pmd239
 273328  289049 ?        03:07:39 pmd240
 273328  289050 ?        03:07:39 pmd241
 273328  289051 ?        03:07:39 pmd242
[root@det1-compute-29 openvswitch]#
~~~
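
Thread creation times can also be read directly instead of back-computing them from CPU time (a sketch, not taken from the sosreport; assumes procps-ng ps reports per-thread start times and uses the ovs-vswitchd PID from above):
~~~
# SPID is the thread ID; lstart is the time the thread was created.
ps -T -p 273328 -o spid,lstart,time,comm | grep pmd
~~~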

PMD threads restarted at these times:
~~~
[akaris@collab-shell compute-29]$ grep 'Created' var/log/openvswitch/ovs-vswitchd.log | grep threads
2018-03-26T06:11:33.069Z|00453|dpif_netdev|INFO|Created 12 pmd threads on numa node 1
2018-03-26T06:11:33.085Z|00454|dpif_netdev|INFO|Created 4 pmd threads on numa node 0
2018-03-26T06:11:33.173Z|00455|dpif_netdev|INFO|Created 12 pmd threads on numa node 1
2018-03-26T06:11:33.189Z|00456|dpif_netdev|INFO|Created 4 pmd threads on numa node 0
2018-03-26T09:20:49.071Z|00457|dpif_netdev|INFO|Created 12 pmd threads on numa node 1
2018-03-26T09:20:49.087Z|00458|dpif_netdev|INFO|Created 4 pmd threads on numa node 0
2018-03-26T14:46:18.329Z|00459|dpif_netdev|INFO|Created 12 pmd threads on numa node 1
2018-03-26T14:46:18.348Z|00460|dpif_netdev|INFO|Created 4 pmd threads on numa node 0
2018-03-26T14:46:18.406Z|00461|dpif_netdev|INFO|Created 12 pmd threads on numa node 1
2018-03-26T14:46:18.422Z|00462|dpif_netdev|INFO|Created 4 pmd threads on numa node 0
2018-03-26T14:47:23.021Z|00463|dpif_netdev|INFO|Created 12 pmd threads on numa node 1
2018-03-26T14:47:23.038Z|00464|dpif_netdev|INFO|Created 4 pmd threads on numa node 0
2018-03-26T14:47:23.102Z|00465|dpif_netdev|INFO|Created 12 pmd threads on numa node 1
2018-03-26T14:47:23.118Z|00466|dpif_netdev|INFO|Created 4 pmd threads on numa node 0
2018-03-26T14:57:12.333Z|00467|dpif_netdev|INFO|Created 12 pmd threads on numa node 1
2018-03-26T14:57:12.349Z|00468|dpif_netdev|INFO|Created 4 pmd threads on numa node 0
[akaris@collab-shell det1-compute-29]$ 
~~~

From the logs:
~~~
2018-03-26T14:57:12.199Z|01080|dpdk(vhost_thread1)|INFO|State of queue 0 ( tx_qid 0 ) of vhost device '/var/run/openvswitch/vhuab336177-86'changed to 'enabled'
2018-03-26T14:57:12.199Z|01081|dpdk(vhost_thread1)|INFO|State of queue 2 ( tx_qid 1 ) of vhost device '/var/run/openvswitch/vhuab336177-86'changed to 'disabled'
2018-03-26T14:57:12.199Z|01082|dpdk(vhost_thread1)|INFO|State of queue 4 ( tx_qid 2 ) of vhost device '/var/run/openvswitch/vhuab336177-86'changed to 'disabled'
2018-03-26T14:57:12.199Z|01083|dpdk(vhost_thread1)|INFO|State of queue 6 ( tx_qid 3 ) of vhost device '/var/run/openvswitch/vhuab336177-86'changed to 'disabled'
2018-03-26T14:57:12.199Z|01084|dpdk(vhost_thread1)|INFO|State of queue 8 ( tx_qid 4 ) of vhost device '/var/run/openvswitch/vhuab336177-86'changed to 'disabled'
2018-03-26T14:57:12.199Z|01085|dpdk(vhost_thread1)|INFO|State of queue 10 ( tx_qid 5 ) of vhost device '/var/run/openvswitch/vhuab336177-86'changed to 'disabled'
2018-03-26T14:57:12.199Z|01086|dpdk(vhost_thread1)|INFO|State of queue 12 ( tx_qid 6 ) of vhost device '/var/run/openvswitch/vhuab336177-86'changed to 'disabled'
2018-03-26T14:57:12.199Z|01087|dpdk(vhost_thread1)|INFO|State of queue 14 ( tx_qid 7 ) of vhost device '/var/run/openvswitch/vhuab336177-86'changed to 'disabled'
2018-03-26T14:57:12.199Z|01088|dpdk(vhost_thread1)|INFO|State of queue 0 ( tx_qid 0 ) of vhost device '/var/run/openvswitch/vhuab336177-86'changed to 'enabled'
2018-03-26T14:57:12.199Z|01089|dpdk(vhost_thread1)|INFO|State of queue 2 ( tx_qid 1 ) of vhost device '/var/run/openvswitch/vhuab336177-86'changed to 'disabled'
2018-03-26T14:57:12.199Z|01090|dpdk(vhost_thread1)|INFO|State of queue 4 ( tx_qid 2 ) of vhost device '/var/run/openvswitch/vhuab336177-86'changed to 'disabled'
2018-03-26T14:57:12.199Z|01091|dpdk(vhost_thread1)|INFO|State of queue 6 ( tx_qid 3 ) of vhost device '/var/run/openvswitch/vhuab336177-86'changed to 'disabled'
2018-03-26T14:57:12.199Z|01092|dpdk(vhost_thread1)|INFO|State of queue 8 ( tx_qid 4 ) of vhost device '/var/run/openvswitch/vhuab336177-86'changed to 'disabled'
2018-03-26T14:57:12.199Z|01093|dpdk(vhost_thread1)|INFO|State of queue 10 ( tx_qid 5 ) of vhost device '/var/run/openvswitch/vhuab336177-86'changed to 'disabled'
2018-03-26T14:57:12.199Z|01094|dpdk(vhost_thread1)|INFO|State of queue 12 ( tx_qid 6 ) of vhost device '/var/run/openvswitch/vhuab336177-86'changed to 'disabled'
2018-03-26T14:57:12.199Z|01095|dpdk(vhost_thread1)|INFO|State of queue 14 ( tx_qid 7 ) of vhost device '/var/run/openvswitch/vhuab336177-86'changed to 'disabled'
2018-03-26T14:57:12.204Z|01096|dpdk(vhost_thread1)|INFO|State of queue 0 ( tx_qid 0 ) of vhost device '/var/run/openvswitch/vhuab336177-86'changed to 'enabled'
2018-03-26T14:57:12.205Z|01098|dpdk(vhost_thread1)|INFO|State of queue 0 ( tx_qid 0 ) of vhost device '/var/run/openvswitch/vhu93923256-61'changed to 'enabled'
~~~
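
To get a quick per-port count of these flap events from the log, something like this can be used (a sketch; the awk field split relies on the single quotes in the log lines above):
~~~
grep "changed to 'disabled'" var/log/openvswitch/ovs-vswitchd.log \
  | awk -F"'" '{print $2}' | sort | uniq -c | sort -rn
~~~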

Regards,

Andreas

Comment 7 Flavio Leitner 2018-03-26 19:06:57 UTC
OVS recreates the PMD threads when the datapath needs to be reconfigured, e.g. when the PMD affinity changes or when a port needs to be reconfigured, IIRC.
/* Return true if needs to revalidate datapath flows. */
static bool 
dpif_netdev_run(struct dpif *dpif)
{
[...]
    if (dp_netdev_is_reconf_required(dp) || ports_require_restart(dp)) {
>       reconfigure_pmd_threads(dp);
    }


[...]
static void
reconfigure_pmd_threads(struct dp_netdev *dp)
[...]
    dp_netdev_destroy_all_pmds(dp);
[...]
    /* Restores the non-pmd. */
    dp_netdev_set_nonpmd(dp);
    /* Restores all pmd threads. */
>   dp_netdev_reset_pmd_threads(dp);


static void
dp_netdev_reset_pmd_threads(struct dp_netdev *dp)
[...]
   HMAP_FOR_EACH (port, node, &dp->ports) {
        if (netdev_is_pmd(port->netdev)) {
[...]
            HMAPX_FOR_EACH (numa_node, &numas) {
>               dp_netdev_set_pmds_on_numa(dp, (uintptr_t) numa_node->data);
            }


/* Starts pmd threads for the numa node 'numa_id', if not already started.
 * The function takes care of filling the threads tx port cache. */
static void
dp_netdev_set_pmds_on_numa(struct dp_netdev *dp, int numa_id)
[...]
    n_pmds = get_n_pmd_threads_on_numa(dp, numa_id);
    /* If there are already pmd threads created for the numa node
     * in which 'netdev' is on, do nothing.  Else, creates the
     * pmd threads for the numa node. */
    if (!n_pmds) {
[...]
        VLOG_INFO("Created %d pmd threads on numa node %d", can_have, numa_id);


So, the reason could be in dp_netdev_is_reconf_required(dp) or in ports_require_restart(dp).

The first function is related to changes to the pmd-cpu-mask or to a port's rx queue affinity. The second one is related to port changes, such as adding a new port.

Comment 8 Flavio Leitner 2018-03-26 19:07:48 UTC
By port changes, I also mean MTU changes, port additions, or port removals.
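
One way to catch such triggering changes as they happen is to monitor the relevant OVSDB columns (a sketch, not something that was run here; the column selection is illustrative). Each command blocks and streams updates:
~~~
# Prints a row whenever an Interface's mtu, options (e.g. n_rxq) or
# other_config (e.g. pmd-rxq-affinity) changes, and on port add/remove.
ovsdb-client monitor Open_vSwitch Interface name,mtu,options,other_config
# Changes to the global pmd-cpu-mask would show up in the Open_vSwitch table:
ovsdb-client monitor Open_vSwitch Open_vSwitch other_config
~~~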

Comment 10 Andreas Karis 2018-03-26 19:44:57 UTC
Customer says that this is a production environment and no changes have been applied during that period nor can they install a debug RPM.

Comment 11 Andreas Karis 2018-03-26 22:04:06 UTC
Hi,

I just ran this against a lab:
~~~
[root@overcloud-compute-0 ~]# ovs-appctl dpif-netdev/pmd-rxq-show
pmd thread numa_id 0 core_id 28:
	isolated : false
	port: dpdk0	queue-id: 0
	port: vhubfcfb345-a6	queue-id: 3
	port: vhub2ba0cd8-f3	queue-id: 3
	port: vhua3cadd2d-f1	queue-id: 3
	port: vhu565f9267-93	queue-id: 3
pmd thread numa_id 0 core_id 4:
	isolated : true
	port: dpdk1	queue-id: 1
pmd thread numa_id 0 core_id 6:
	isolated : true
	port: dpdk1	queue-id: 2
pmd thread numa_id 0 core_id 2:
	isolated : true
	port: dpdk1	queue-id: 0
pmd thread numa_id 0 core_id 8:
	isolated : true
	port: dpdk1	queue-id: 3
pmd thread numa_id 0 core_id 22:
	isolated : false
	port: vhubfcfb345-a6	queue-id: 0
	port: vhub2ba0cd8-f3	queue-id: 0
	port: vhua3cadd2d-f1	queue-id: 0
	port: vhu565f9267-93	queue-id: 0
pmd thread numa_id 0 core_id 24:
	isolated : false
	port: vhubfcfb345-a6	queue-id: 1
	port: vhub2ba0cd8-f3	queue-id: 1
	port: vhua3cadd2d-f1	queue-id: 1
	port: vhu565f9267-93	queue-id: 1
pmd thread numa_id 0 core_id 26:
	isolated : false
	port: vhubfcfb345-a6	queue-id: 2
	port: vhub2ba0cd8-f3	queue-id: 2
	port: vhua3cadd2d-f1	queue-id: 2
	port: vhu565f9267-93	queue-id: 2
[root@overcloud-compute-0 ~]# ovs-vsctl set interface dpdk0 options:n_rxq=2     other_config:pmd-rxq-affinity="0:26,1:28"
[root@overcloud-compute-0 ~]# 
[root@overcloud-compute-0 ~]# ovs-appctl dpif-netdev/pmd-rxq-show
pmd thread numa_id 0 core_id 8:
	isolated : true
	port: dpdk1	queue-id: 3
pmd thread numa_id 0 core_id 22:
	isolated : false
	port: vhubfcfb345-a6	queue-id: 0 2
	port: vhub2ba0cd8-f3	queue-id: 0 2
	port: vhua3cadd2d-f1	queue-id: 0 2
	port: vhu565f9267-93	queue-id: 0 2
pmd thread numa_id 0 core_id 24:
	isolated : false
	port: vhubfcfb345-a6	queue-id: 1 3
	port: vhub2ba0cd8-f3	queue-id: 1 3
	port: vhua3cadd2d-f1	queue-id: 1 3
	port: vhu565f9267-93	queue-id: 1 3
pmd thread numa_id 0 core_id 2:
	isolated : true
	port: dpdk1	queue-id: 0
pmd thread numa_id 0 core_id 26:
	isolated : true
	port: dpdk0	queue-id: 0
pmd thread numa_id 0 core_id 4:
	isolated : true
	port: dpdk1	queue-id: 1
pmd thread numa_id 0 core_id 6:
	isolated : true
	port: dpdk1	queue-id: 2
pmd thread numa_id 0 core_id 28:
	isolated : true
	port: dpdk0	queue-id: 1
[root@overcloud-compute-0 ~]# 
~~~

The only resulting log messages are:
~~~
2018-03-26T22:00:39.513Z|00153|dpdk|INFO|Port 0: a0:36:9f:e5:da:80
2018-03-26T22:00:39.532Z|00154|dpif_netdev|INFO|Created 8 pmd threads on numa node 0
~~~

There are no port flap (queue enable/disable) messages.

The PMDs arguably *do* restart:
~~~
[root@overcloud-compute-0 ~]# ps -T -p927988 | grep pmd
 927988  946769 ?        00:00:15 pmd246
 927988  946770 ?        00:00:15 pmd247
 927988  946771 ?        00:00:15 pmd248
 927988  946772 ?        00:00:15 pmd249
 927988  946773 ?        00:00:15 pmd250
 927988  946774 ?        00:00:15 pmd251
 927988  946775 ?        00:00:15 pmd253
 927988  946776 ?        00:00:15 pmd252
~~~
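
To catch both symptoms at once while reproducing, the vswitchd log can be followed for the two relevant message types (a sketch; the patterns match the log lines quoted above):
~~~
tail -F /var/log/openvswitch/ovs-vswitchd.log \
  | grep -E "Created [0-9]+ pmd threads|State of queue"
~~~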

The ovs-vsctl set commands do show up in the logs, though:
~~~
[root@overcloud-compute-0 ~]# journalctl --since today | grep ovs-vsctl | tail 
Mar 26 21:57:11 overcloud-compute-0 ovs-vsctl[946176]: ovs|00001|vsctl|INFO|Called as ovs-vsctl set interface dpdk0 options:n_rxq=4 other_config:pmd-rxq-affinity=0:2,1:4,2:6,3:8
Mar 26 21:57:23 overcloud-compute-0 ovs-vsctl[946189]: ovs|00001|vsctl|INFO|Called as ovs-vsctl set interface dpdk0 options:n_rxq=2 other_config:pmd-rxq-affinity=0:2,1:4
Mar 26 21:58:24 overcloud-compute-0 ovs-vsctl[946286]: ovs|00001|db_ctl_base|ERR|unknown command 'unset'; use --help for help
Mar 26 21:58:29 overcloud-compute-0 ovs-vsctl[946293]: ovs|00001|vsctl|INFO|Called as ovs-vsctl remove interface dpdk0 other_config pmd-rxq-affinity
Mar 26 21:59:27 overcloud-compute-0 ovs-vsctl[946389]: ovs|00001|vsctl|INFO|Called as ovs-vsctl set interface dpdk0 options:n_rxq=2 other_config:pmd-rxq-affinity=0:26,1:28
Mar 26 21:59:58 overcloud-compute-0 ovs-vsctl[946446]: ovs|00001|db_ctl_base|ERR|unknown command 'unset'; use --help for help
Mar 26 22:00:01 overcloud-compute-0 ovs-vsctl[946457]: ovs|00001|vsctl|INFO|Called as ovs-vsctl remove interface dpdk0 other_config pmd-rxq-affinity
Mar 26 22:00:14 overcloud-compute-0 ovs-vsctl[946491]: ovs|00001|vsctl|INFO|Called as ovs-vsctl set interface dpdk0 options:n_rxq=1
Mar 26 22:00:39 overcloud-compute-0 ovs-vsctl[946534]: ovs|00001|vsctl|INFO|Called as ovs-vsctl set interface dpdk0 options:n_rxq=2 other_config:pmd-rxq-affinity=0:26,1:28
Mar 26 22:02:26 overcloud-compute-0 ovs-vsctl[946767]: ovs|00001|vsctl|INFO|Called as ovs-vsctl set interface dpdk0 options:n_rxq=1 other_config:pmd-rxq-affinity=0:26
[root@overcloud-compute-0 ~]# 
~~~

Comment 12 Andreas Karis 2018-03-26 22:14:44 UTC
I just ran an interface attach to an instance:
~~~
NETID1=`neutron net-list | grep ' private ' | awk '{print $2}'`
neutron port-create $NETID1
nova interface-attach --port=9032c6cd-24c5-4e1a-8456-626c6218aeb6 rhel-multiq-1
~~~

~~~
2018-03-26T22:11:27.832Z|00157|dpdk|INFO|Socket /var/run/openvswitch/vhu9032c6cd-24 created for vhost-user port vhu9032c6cd-24
2018-03-26T22:11:27.999Z|00158|bridge|INFO|bridge br-int: added interface vhu9032c6cd-24 on port 7
2018-03-26T22:11:28.248Z|00159|dpif_netdev|INFO|Created 8 pmd threads on numa node 0
2018-03-26T22:11:29.086Z|00057|dpdk(vhost_thread1)|INFO|State of queue 0 ( tx_qid 0 ) of vhost device '/var/run/openvswitch/vhu9032c6cd-24'changed to 'enabled'
2018-03-26T22:11:29.087Z|00058|dpdk(vhost_thread1)|INFO|State of queue 2 ( tx_qid 1 ) of vhost device '/var/run/openvswitch/vhu9032c6cd-24'changed to 'disabled'
2018-03-26T22:11:29.087Z|00059|dpdk(vhost_thread1)|INFO|State of queue 4 ( tx_qid 2 ) of vhost device '/var/run/openvswitch/vhu9032c6cd-24'changed to 'disabled'
2018-03-26T22:11:29.087Z|00060|dpdk(vhost_thread1)|INFO|State of queue 6 ( tx_qid 3 ) of vhost device '/var/run/openvswitch/vhu9032c6cd-24'changed to 'disabled'
2018-03-26T22:11:29.087Z|00061|dpdk(vhost_thread1)|INFO|State of queue 0 ( tx_qid 0 ) of vhost device '/var/run/openvswitch/vhu9032c6cd-24'changed to 'enabled'
2018-03-26T22:11:29.087Z|00062|dpdk(vhost_thread1)|INFO|State of queue 2 ( tx_qid 1 ) of vhost device '/var/run/openvswitch/vhu9032c6cd-24'changed to 'disabled'
2018-03-26T22:11:29.087Z|00063|dpdk(vhost_thread1)|INFO|State of queue 4 ( tx_qid 2 ) of vhost device '/var/run/openvswitch/vhu9032c6cd-24'changed to 'disabled'
2018-03-26T22:11:29.087Z|00064|dpdk(vhost_thread1)|INFO|State of queue 6 ( tx_qid 3 ) of vhost device '/var/run/openvswitch/vhu9032c6cd-24'changed to 'disabled'
2018-03-26T22:11:29.091Z|00065|dpdk(vhost_thread1)|INFO|State of queue 0 ( tx_qid 0 ) of vhost device '/var/run/openvswitch/vhu9032c6cd-24'changed to 'enabled'
2018-03-26T22:11:29.092Z|00066|dpdk(vhost_thread1)|INFO|vHost Device '/var/run/openvswitch/vhu9032c6cd-24' has been added on numa node 0
2018-03-26T22:11:29.092Z|00067|dpdk(vhost_thread1)|INFO|State of queue 0 ( tx_qid 0 ) of vhost device '/var/run/openvswitch/vhu9032c6cd-24'changed to 'enabled'
2018-03-26T22:11:29.092Z|00068|dpdk(vhost_thread1)|INFO|State of queue 2 ( tx_qid 1 ) of vhost device '/var/run/openvswitch/vhu9032c6cd-24'changed to 'enabled'
2018-03-26T22:11:29.092Z|00069|dpdk(vhost_thread1)|INFO|State of queue 4 ( tx_qid 2 ) of vhost device '/var/run/openvswitch/vhu9032c6cd-24'changed to 'enabled'
2018-03-26T22:11:29.092Z|00070|dpdk(vhost_thread1)|INFO|State of queue 6 ( tx_qid 3 ) of vhost device '/var/run/openvswitch/vhu9032c6cd-24'changed to 'enabled'
2018-03-26T22:11:29.134Z|00160|dpif_netdev|INFO|Created 8 pmd threads on numa node 0
2018-03-26T22:11:39.342Z|00161|connmgr|INFO|br-int<->tcp:127.0.0.1:6633: 8 flow_mods 10 s ago (7 adds, 1 deletes)
~~~

That already looks a bit more like it.


Note that even with verbose = true and debug = false, neutron logs the following for this operation:
~~~
2018-03-26 22:10:49.382 928228 INFO neutron.agent.securitygroups_rpc [req-6c6cdcb4-241d-4ab5-820d-0e0a28d68187 3c56432361d5467aba2f48350b0dda7b 55a1755e44c049a3b5a9da108c4b685a - - -] Security group member updated [u'63cea94f-53c6-4fbe-9893-20d93f2efc6d']
2018-03-26 22:10:50.963 928228 INFO neutron.agent.securitygroups_rpc [req-5b5a9318-d011-4681-980a-b0d439535282 - - - - -] Refresh firewall rules
2018-03-26 22:10:51.152 928228 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-5b5a9318-d011-4681-980a-b0d439535282 - - - - -] Configuration for devices up [] and devices down [] completed.
2018-03-26 22:11:29.116 928228 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-5b5a9318-d011-4681-980a-b0d439535282 - - - - -] Port 9032c6cd-24c5-4e1a-8456-626c6218aeb6 updated. Details: {u'profile': {}, u'network_qos_policy_id': None, u'qos_policy_id': None, u'allowed_address_pairs': [], u'admin_state_up': True, u'network_id': u'3ad6f556-10fb-4273-bfb5-92a399f43b14', u'segmentation_id': 906, u'device_owner': u'compute:None', u'physical_network': u'dpdk', u'mac_address': u'fa:16:3e:09:d5:2e', u'device': u'9032c6cd-24c5-4e1a-8456-626c6218aeb6', u'port_security_enabled': True, u'port_id': u'9032c6cd-24c5-4e1a-8456-626c6218aeb6', u'fixed_ips': [{u'subnet_id': u'66e1523c-f984-4eb6-80c5-f8ed9f47ee97', u'ip_address': u'192.168.0.101'}, {u'subnet_id': u'b0b89e95-0032-4cb6-8cd4-b125afea377d', u'ip_address': u'2000:192:168:1:f816:3eff:fe09:d52e'}], u'network_type': u'vlan'}
2018-03-26 22:11:29.143 928228 INFO neutron.agent.l2.extensions.qos [req-5b5a9318-d011-4681-980a-b0d439535282 - - - - -] QoS extension did have no information about the port 9032c6cd-24c5-4e1a-8456-626c6218aeb6 that we were trying to reset
2018-03-26 22:11:29.150 928228 INFO neutron.agent.securitygroups_rpc [req-5b5a9318-d011-4681-980a-b0d439535282 - - - - -] Preparing filters for devices set([u'9032c6cd-24c5-4e1a-8456-626c6218aeb6'])
2018-03-26 22:11:29.436 928228 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-5b5a9318-d011-4681-980a-b0d439535282 - - - - -] Configuration for devices up [u'9032c6cd-24c5-4e1a-8456-626c6218aeb6'] and devices down [] completed.
~~~

Comment 13 Andreas Karis 2018-03-26 22:16:08 UTC
The libvirt logs don't change in that case, though:

~~~
[root@overcloud-compute-0 ~]# cat /var/log/libvirt/qemu/instance-00000021.log  | tail -1
2018-03-26T19:21:25.195204Z qemu-kvm: -chardev pty,id=charserial1: char device redirected to /dev/pts/0 (label charserial1)
[root@overcloud-compute-0 ~]# cat /var/log/libvirt/qemu/instance-0000001d.log  | tail -1
2018-03-26T19:21:25.545330Z qemu-kvm: -chardev pty,id=charserial1: char device redirected to /dev/pts/1 (label charserial1)
[root@overcloud-compute-0 ~]#
~~~

Comment 15 Andreas Karis 2018-03-26 22:29:05 UTC
Bringing the interface up and down inside the guest doesn't change anything in the logs:
~~~
[root@rhel-multiq-1 ~]# ip link ls dev eth3
5: eth3: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN mode DEFAULT qlen 1000
    link/ether fa:16:3e:09:d5:2e brd ff:ff:ff:ff:ff:ff
[root@rhel-multiq-1 ~]# ip link set dev eth3 up
[root@rhel-multiq-1 ~]# ip link ls dev eth3
5: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000
    link/ether fa:16:3e:09:d5:2e brd ff:ff:ff:ff:ff:ff
[root@rhel-multiq-1 ~]# ip link set dev eth3 down
[root@rhel-multiq-1 ~]# 
~~~

~~~
[root@overcloud-compute-0 ~]# tail -f /var/log/openvswitch/ovs-vswitchd.log -n0
~~~

Comment 16 Andreas Karis 2018-03-26 22:29:49 UTC
An instance reboot yields exactly that in my environment:
~~~
2018-03-26T22:22:59.200Z|00071|dpdk(vhost_thread1)|INFO|vHost Device '/var/run/openvswitch/vhub2ba0cd8-f3' has been removed
2018-03-26T22:23:04.888Z|00072|dpdk(vhost_thread1)|INFO|State of queue 0 ( tx_qid 0 ) of vhost device '/var/run/openvswitch/vhub2ba0cd8-f3'changed to 'enabled'
2018-03-26T22:23:04.888Z|00073|dpdk(vhost_thread1)|INFO|State of queue 2 ( tx_qid 1 ) of vhost device '/var/run/openvswitch/vhub2ba0cd8-f3'changed to 'disabled'
2018-03-26T22:23:04.888Z|00074|dpdk(vhost_thread1)|INFO|State of queue 4 ( tx_qid 2 ) of vhost device '/var/run/openvswitch/vhub2ba0cd8-f3'changed to 'disabled'
2018-03-26T22:23:04.888Z|00075|dpdk(vhost_thread1)|INFO|State of queue 6 ( tx_qid 3 ) of vhost device '/var/run/openvswitch/vhub2ba0cd8-f3'changed to 'disabled'
2018-03-26T22:23:04.888Z|00076|dpdk(vhost_thread1)|INFO|State of queue 0 ( tx_qid 0 ) of vhost device '/var/run/openvswitch/vhub2ba0cd8-f3'changed to 'enabled'
2018-03-26T22:23:04.888Z|00077|dpdk(vhost_thread1)|INFO|State of queue 2 ( tx_qid 1 ) of vhost device '/var/run/openvswitch/vhub2ba0cd8-f3'changed to 'disabled'
2018-03-26T22:23:04.888Z|00078|dpdk(vhost_thread1)|INFO|State of queue 4 ( tx_qid 2 ) of vhost device '/var/run/openvswitch/vhub2ba0cd8-f3'changed to 'disabled'
2018-03-26T22:23:04.889Z|00079|dpdk(vhost_thread1)|INFO|State of queue 6 ( tx_qid 3 ) of vhost device '/var/run/openvswitch/vhub2ba0cd8-f3'changed to 'disabled'
2018-03-26T22:23:04.893Z|00080|dpdk(vhost_thread1)|INFO|State of queue 0 ( tx_qid 0 ) of vhost device '/var/run/openvswitch/vhub2ba0cd8-f3'changed to 'enabled'
2018-03-26T22:23:04.894Z|00081|dpdk(vhost_thread1)|INFO|vHost Device '/var/run/openvswitch/vhub2ba0cd8-f3' has been added on numa node 0
2018-03-26T22:23:04.894Z|00082|dpdk(vhost_thread1)|INFO|State of queue 0 ( tx_qid 0 ) of vhost device '/var/run/openvswitch/vhub2ba0cd8-f3'changed to 'enabled'
2018-03-26T22:23:04.894Z|00083|dpdk(vhost_thread1)|INFO|State of queue 2 ( tx_qid 1 ) of vhost device '/var/run/openvswitch/vhub2ba0cd8-f3'changed to 'enabled'
2018-03-26T22:23:04.894Z|00084|dpdk(vhost_thread1)|INFO|State of queue 4 ( tx_qid 2 ) of vhost device '/var/run/openvswitch/vhub2ba0cd8-f3'changed to 'enabled'
2018-03-26T22:23:04.894Z|00085|dpdk(vhost_thread1)|INFO|State of queue 6 ( tx_qid 3 ) of vhost device '/var/run/openvswitch/vhub2ba0cd8-f3'changed to 'enabled'
~~~

However, such a reboot shows up in the instance's libvirt log (see below), so it can't be the explanation here either:
~~~
[root@overcloud-compute-0 ~]# cat /var/log/libvirt/qemu/instance-0000001d.log  | tail -1
2018-03-26T19:21:25.545330Z qemu-kvm: -chardev pty,id=charserial1: char device redirected to /dev/pts/1 (label charserial1)
[root@overcloud-compute-0 ~]# cat /var/log/libvirt/qemu/instance-00000021.log  | tail -1
2018-03-26T22:22:59.965387Z qemu-kvm: -chardev pty,id=charserial1: char device redirected to /dev/pts/0 (label charserial1)
[root@overcloud-compute-0 ~]# 
~~~

Comment 17 Andreas Karis 2018-03-26 22:42:34 UTC
Rebooted both instances in my lab with virsh reboot:
~~~
[root@overcloud-compute-0 ~]# tail -f /var/log/openvswitch/ovs-vswitchd.log -n0
2018-03-26T22:34:53.677Z|00086|timeval(vhost_thread1)|WARN|Unreasonably long 714476ms poll interval (4ms user, 12ms system)
2018-03-26T22:34:53.677Z|00087|timeval(vhost_thread1)|WARN|context switches: 738 voluntary, 4 involuntary
2018-03-26T22:34:53.677Z|00088|coverage(vhost_thread1)|INFO|Event coverage, avg rate over last: 5 seconds, last minute, last hour,  hash=38e4f481:
2018-03-26T22:34:53.677Z|00089|coverage(vhost_thread1)|INFO|bridge_reconfigure         0.0/sec     0.000/sec        0.0039/sec   total: 42
2018-03-26T22:34:53.677Z|00090|coverage(vhost_thread1)|INFO|ofproto_flush              0.0/sec     0.000/sec        0.0000/sec   total: 3
2018-03-26T22:34:53.677Z|00091|coverage(vhost_thread1)|INFO|ofproto_recv_openflow      1.2/sec     1.400/sec        1.3917/sec   total: 16524
2018-03-26T22:34:53.677Z|00092|coverage(vhost_thread1)|INFO|ofproto_update_port        0.0/sec     0.000/sec        0.0031/sec   total: 41
2018-03-26T22:34:53.677Z|00093|coverage(vhost_thread1)|INFO|rev_reconfigure            0.0/sec     0.000/sec        0.0061/sec   total: 59
2018-03-26T22:34:53.677Z|00094|coverage(vhost_thread1)|INFO|rev_bond                   0.0/sec     0.000/sec        0.0000/sec   total: 1
2018-03-26T22:34:53.677Z|00095|coverage(vhost_thread1)|INFO|rev_port_toggled           0.0/sec     0.000/sec        0.0008/sec   total: 6
2018-03-26T22:34:53.677Z|00096|coverage(vhost_thread1)|INFO|rev_flow_table             0.0/sec     0.000/sec        0.0022/sec   total: 99
2018-03-26T22:34:53.677Z|00097|coverage(vhost_thread1)|INFO|rev_mac_learning           0.0/sec     0.000/sec        0.0044/sec   total: 41
2018-03-26T22:34:53.677Z|00098|coverage(vhost_thread1)|INFO|dumped_new_flow            0.4/sec     0.250/sec        0.1875/sec   total: 2034
2018-03-26T22:34:53.677Z|00099|coverage(vhost_thread1)|INFO|xlate_actions              0.4/sec     0.333/sec        0.2497/sec   total: 2849
2018-03-26T22:34:53.677Z|00100|coverage(vhost_thread1)|INFO|ccmap_expand               0.0/sec     0.000/sec        0.0000/sec   total: 1
2018-03-26T22:34:53.677Z|00101|coverage(vhost_thread1)|INFO|ccmap_shrink               0.0/sec     0.000/sec        0.0011/sec   total: 39
2018-03-26T22:34:53.677Z|00102|coverage(vhost_thread1)|INFO|cmap_expand                0.2/sec     0.250/sec        0.1819/sec   total: 1754
2018-03-26T22:34:53.677Z|00103|coverage(vhost_thread1)|INFO|cmap_shrink                0.0/sec     0.267/sec        0.1686/sec   total: 1635
2018-03-26T22:34:53.677Z|00104|coverage(vhost_thread1)|INFO|dpif_port_add              0.0/sec     0.000/sec        0.0003/sec   total: 12
2018-03-26T22:34:53.677Z|00105|coverage(vhost_thread1)|INFO|dpif_flow_flush            0.0/sec     0.000/sec        0.0000/sec   total: 4
2018-03-26T22:34:53.677Z|00106|coverage(vhost_thread1)|INFO|dpif_flow_get              0.0/sec     0.000/sec        0.0000/sec   total: 9
2018-03-26T22:34:53.677Z|00107|coverage(vhost_thread1)|INFO|dpif_flow_put              0.0/sec     0.000/sec        0.0014/sec   total: 26
2018-03-26T22:34:53.677Z|00108|coverage(vhost_thread1)|INFO|dpif_flow_del              0.0/sec     0.283/sec        0.1878/sec   total: 2042
2018-03-26T22:34:53.677Z|00109|coverage(vhost_thread1)|INFO|dpif_execute               0.0/sec     0.000/sec        0.0000/sec   total: 3
2018-03-26T22:34:53.677Z|00110|coverage(vhost_thread1)|INFO|flow_extract               0.0/sec     0.000/sec        0.0000/sec   total: 9
2018-03-26T22:34:53.677Z|00111|coverage(vhost_thread1)|INFO|miniflow_malloc            0.4/sec     0.500/sec        0.5219/sec   total: 6337
2018-03-26T22:34:53.677Z|00112|coverage(vhost_thread1)|INFO|hindex_expand              0.0/sec     0.000/sec        0.0000/sec   total: 3
2018-03-26T22:34:53.677Z|00113|coverage(vhost_thread1)|INFO|hmap_pathological          0.0/sec     0.000/sec        0.0000/sec   total: 24
2018-03-26T22:34:53.677Z|00114|coverage(vhost_thread1)|INFO|hmap_expand               29.4/sec    28.933/sec       29.6872/sec   total: 350678
2018-03-26T22:34:53.677Z|00115|coverage(vhost_thread1)|INFO|hmap_shrink                0.0/sec     0.000/sec        0.0181/sec   total: 101
2018-03-26T22:34:53.677Z|00116|coverage(vhost_thread1)|INFO|mac_learning_learned       0.0/sec     0.000/sec        0.0056/sec   total: 55
2018-03-26T22:34:53.677Z|00117|coverage(vhost_thread1)|INFO|mac_learning_expired       0.0/sec     0.000/sec        0.0056/sec   total: 46
2018-03-26T22:34:53.677Z|00118|coverage(vhost_thread1)|INFO|netdev_received            1.6/sec     1.550/sec        1.8369/sec   total: 80322417
2018-03-26T22:34:53.677Z|00119|coverage(vhost_thread1)|INFO|netdev_sent                1.8/sec     2.050/sec        2.6439/sec   total: 80327953
2018-03-26T22:34:53.677Z|00120|coverage(vhost_thread1)|INFO|netdev_get_stats           3.2/sec     3.200/sec        3.0489/sec   total: 35262
2018-03-26T22:34:53.677Z|00121|coverage(vhost_thread1)|INFO|txn_unchanged              0.0/sec     0.000/sec        0.0050/sec   total: 60
2018-03-26T22:34:53.677Z|00122|coverage(vhost_thread1)|INFO|txn_incomplete             0.2/sec     0.200/sec        0.2075/sec   total: 2429
2018-03-26T22:34:53.677Z|00123|coverage(vhost_thread1)|INFO|txn_success                0.2/sec     0.200/sec        0.1997/sec   total: 2346
2018-03-26T22:34:53.677Z|00124|coverage(vhost_thread1)|INFO|poll_create_node         298.2/sec   306.050/sec      289.0506/sec   total: 3290859
2018-03-26T22:34:53.677Z|00125|coverage(vhost_thread1)|INFO|poll_zero_timeout          0.2/sec     0.450/sec        0.1875/sec   total: 1425
2018-03-26T22:34:53.677Z|00126|coverage(vhost_thread1)|INFO|rconn_queued               1.2/sec     1.400/sec        1.3906/sec   total: 16439
2018-03-26T22:34:53.677Z|00127|coverage(vhost_thread1)|INFO|rconn_sent                 1.2/sec     1.400/sec        1.3906/sec   total: 16439
2018-03-26T22:34:53.677Z|00128|coverage(vhost_thread1)|INFO|seq_change               85264.8/sec 83818.367/sec    72893.1503/sec   total: 711286486
2018-03-26T22:34:53.677Z|00129|coverage(vhost_thread1)|INFO|pstream_open               0.0/sec     0.000/sec        0.0000/sec   total: 7
2018-03-26T22:34:53.677Z|00130|coverage(vhost_thread1)|INFO|stream_open                0.0/sec     0.000/sec        0.0000/sec   total: 13
2018-03-26T22:34:53.677Z|00131|coverage(vhost_thread1)|INFO|unixctl_received           0.0/sec     0.000/sec        0.0008/sec   total: 3
2018-03-26T22:34:53.677Z|00132|coverage(vhost_thread1)|INFO|unixctl_replied            0.0/sec     0.000/sec        0.0008/sec   total: 3
2018-03-26T22:34:53.677Z|00133|coverage(vhost_thread1)|INFO|util_xalloc              1071.6/sec  1186.800/sec     1111.1939/sec   total: 12685317
2018-03-26T22:34:53.677Z|00134|coverage(vhost_thread1)|INFO|vconn_open                 0.0/sec     0.000/sec        0.0000/sec   total: 12
2018-03-26T22:34:53.677Z|00135|coverage(vhost_thread1)|INFO|vconn_received             1.2/sec     1.400/sec        1.3917/sec   total: 16527
2018-03-26T22:34:53.677Z|00136|coverage(vhost_thread1)|INFO|vconn_sent                 1.2/sec     1.400/sec        1.3906/sec   total: 16442
2018-03-26T22:34:53.677Z|00137|coverage(vhost_thread1)|INFO|netdev_set_policing        0.0/sec     0.000/sec        0.0006/sec   total: 9
2018-03-26T22:34:53.677Z|00138|coverage(vhost_thread1)|INFO|netdev_arp_lookup          0.0/sec     0.000/sec        0.0000/sec   total: 3
2018-03-26T22:34:53.677Z|00139|coverage(vhost_thread1)|INFO|netdev_get_ifindex         0.0/sec     0.000/sec        0.0000/sec   total: 4
2018-03-26T22:34:53.677Z|00140|coverage(vhost_thread1)|INFO|netdev_get_hwaddr          0.0/sec     0.000/sec        0.0047/sec   total: 50
2018-03-26T22:34:53.677Z|00141|coverage(vhost_thread1)|INFO|netdev_set_hwaddr          0.0/sec     0.000/sec        0.0000/sec   total: 3
2018-03-26T22:34:53.677Z|00142|coverage(vhost_thread1)|INFO|netdev_get_ethtool         0.0/sec     0.000/sec        0.0036/sec   total: 51
2018-03-26T22:34:53.677Z|00143|coverage(vhost_thread1)|INFO|netlink_received           0.8/sec     0.800/sec        0.7989/sec   total: 9422
2018-03-26T22:34:53.677Z|00144|coverage(vhost_thread1)|INFO|netlink_recv_jumbo         0.8/sec     0.800/sec        0.7928/sec   total: 9332
2018-03-26T22:34:53.677Z|00145|coverage(vhost_thread1)|INFO|netlink_sent               0.8/sec     0.800/sec        0.7936/sec   total: 9359
2018-03-26T22:34:53.677Z|00146|coverage(vhost_thread1)|INFO|nln_changed                0.0/sec     0.000/sec        0.0011/sec   total: 17
2018-03-26T22:34:53.677Z|00147|coverage(vhost_thread1)|INFO|46 events never hit
2018-03-26T22:34:53.677Z|00148|dpdk(vhost_thread1)|INFO|vHost Device '/var/run/openvswitch/vhub2ba0cd8-f3' has been removed
2018-03-26T22:34:58.339Z|00149|dpdk(vhost_thread1)|INFO|State of queue 0 ( tx_qid 0 ) of vhost device '/var/run/openvswitch/vhub2ba0cd8-f3'changed to 'enabled'
2018-03-26T22:34:58.339Z|00150|dpdk(vhost_thread1)|INFO|State of queue 2 ( tx_qid 1 ) of vhost device '/var/run/openvswitch/vhub2ba0cd8-f3'changed to 'disabled'
2018-03-26T22:34:58.339Z|00151|dpdk(vhost_thread1)|INFO|State of queue 4 ( tx_qid 2 ) of vhost device '/var/run/openvswitch/vhub2ba0cd8-f3'changed to 'disabled'
2018-03-26T22:34:58.339Z|00152|dpdk(vhost_thread1)|INFO|State of queue 6 ( tx_qid 3 ) of vhost device '/var/run/openvswitch/vhub2ba0cd8-f3'changed to 'disabled'
2018-03-26T22:34:58.339Z|00153|dpdk(vhost_thread1)|INFO|State of queue 0 ( tx_qid 0 ) of vhost device '/var/run/openvswitch/vhub2ba0cd8-f3'changed to 'enabled'
2018-03-26T22:34:58.339Z|00154|dpdk(vhost_thread1)|INFO|State of queue 2 ( tx_qid 1 ) of vhost device '/var/run/openvswitch/vhub2ba0cd8-f3'changed to 'disabled'
2018-03-26T22:34:58.339Z|00155|dpdk(vhost_thread1)|INFO|State of queue 4 ( tx_qid 2 ) of vhost device '/var/run/openvswitch/vhub2ba0cd8-f3'changed to 'disabled'
2018-03-26T22:34:58.339Z|00156|dpdk(vhost_thread1)|INFO|State of queue 6 ( tx_qid 3 ) of vhost device '/var/run/openvswitch/vhub2ba0cd8-f3'changed to 'disabled'
2018-03-26T22:34:58.343Z|00157|dpdk(vhost_thread1)|INFO|State of queue 0 ( tx_qid 0 ) of vhost device '/var/run/openvswitch/vhub2ba0cd8-f3'changed to 'enabled'
2018-03-26T22:34:58.343Z|00158|dpdk(vhost_thread1)|INFO|vHost Device '/var/run/openvswitch/vhub2ba0cd8-f3' has been added on numa node 0
2018-03-26T22:34:58.344Z|00159|dpdk(vhost_thread1)|INFO|State of queue 0 ( tx_qid 0 ) of vhost device '/var/run/openvswitch/vhub2ba0cd8-f3'changed to 'enabled'
2018-03-26T22:34:58.344Z|00160|dpdk(vhost_thread1)|INFO|State of queue 2 ( tx_qid 1 ) of vhost device '/var/run/openvswitch/vhub2ba0cd8-f3'changed to 'enabled'
2018-03-26T22:34:58.344Z|00161|dpdk(vhost_thread1)|INFO|State of queue 4 ( tx_qid 2 ) of vhost device '/var/run/openvswitch/vhub2ba0cd8-f3'changed to 'enabled'
2018-03-26T22:34:58.344Z|00162|dpdk(vhost_thread1)|INFO|State of queue 6 ( tx_qid 3 ) of vhost device '/var/run/openvswitch/vhub2ba0cd8-f3'changed to 'enabled'
~~~

Guess what, that does *not* show up in the libvirt logs:
~~~
[root@overcloud-compute-0 ~]# cat /var/log/libvirt/qemu/instance-00000021.log  | tail -1 
2018-03-26T22:22:59.965387Z qemu-kvm: -chardev pty,id=charserial1: char device redirected to /dev/pts/0 (label charserial1)
[root@overcloud-compute-0 ~]# cat /var/log/libvirt/qemu/instance-0000001d.log  | tail -1
2018-03-26T19:21:25.545330Z qemu-kvm: -chardev pty,id=charserial1: char device redirected to /dev/pts/1 (label charserial1)
[root@overcloud-compute-0 ~]# date
Mon Mar 26 22:36:28 UTC 2018
[root@overcloud-compute-0 ~]# 
~~~

It also doesn't show up in the neutron logs or in the nova-compute logs (the only entries we see are from the resource tracker, which runs every minute):
~~~
[root@overcloud-compute-0 ~]# tail -20 /var/log/nova/nova-compute.log
2018-03-26 22:35:50.513 928293 INFO nova.compute.resource_tracker [req-b5d9434b-1336-4ea6-b91e-e7de2a6d071d - - - - -] Auditing locally available compute resources for node overcloud-compute-0.localdomain
2018-03-26 22:35:51.309 928293 WARNING nova.scheduler.client.report [req-b5d9434b-1336-4ea6-b91e-e7de2a6d071d - - - - -] Unable to refresh my resource provider record
2018-03-26 22:35:51.400 928293 INFO nova.compute.resource_tracker [req-b5d9434b-1336-4ea6-b91e-e7de2a6d071d - - - - -] Total usable vcpus: 24, total allocated vcpus: 8
2018-03-26 22:35:51.401 928293 INFO nova.compute.resource_tracker [req-b5d9434b-1336-4ea6-b91e-e7de2a6d071d - - - - -] Final resource view: name=overcloud-compute-0.localdomain phys_ram=130978MB used_ram=4096MB phys_disk=558GB used_disk=32GB total_vcpus=24 used_vcpus=8 pci_stats=[]
2018-03-26 22:35:51.423 928293 WARNING nova.scheduler.client.report [req-b5d9434b-1336-4ea6-b91e-e7de2a6d071d - - - - -] Unable to refresh my resource provider record
2018-03-26 22:35:51.424 928293 INFO nova.compute.resource_tracker [req-b5d9434b-1336-4ea6-b91e-e7de2a6d071d - - - - -] Compute_service record updated for overcloud-compute-0.localdomain:overcloud-compute-0.localdomain
2018-03-26 22:36:52.511 928293 INFO nova.compute.resource_tracker [req-b5d9434b-1336-4ea6-b91e-e7de2a6d071d - - - - -] Auditing locally available compute resources for node overcloud-compute-0.localdomain
2018-03-26 22:36:53.300 928293 WARNING nova.scheduler.client.report [req-b5d9434b-1336-4ea6-b91e-e7de2a6d071d - - - - -] Unable to refresh my resource provider record
2018-03-26 22:36:53.345 928293 WARNING nova.scheduler.client.report [req-b5d9434b-1336-4ea6-b91e-e7de2a6d071d - - - - -] No authentication information found for placement API. Placement is optional in Newton, but required in Ocata. Please enable the placement service before upgrading.
2018-03-26 22:36:53.388 928293 INFO nova.compute.resource_tracker [req-b5d9434b-1336-4ea6-b91e-e7de2a6d071d - - - - -] Total usable vcpus: 24, total allocated vcpus: 8
2018-03-26 22:36:53.388 928293 INFO nova.compute.resource_tracker [req-b5d9434b-1336-4ea6-b91e-e7de2a6d071d - - - - -] Final resource view: name=overcloud-compute-0.localdomain phys_ram=130978MB used_ram=4096MB phys_disk=558GB used_disk=32GB total_vcpus=24 used_vcpus=8 pci_stats=[]
2018-03-26 22:36:53.411 928293 WARNING nova.scheduler.client.report [req-b5d9434b-1336-4ea6-b91e-e7de2a6d071d - - - - -] Unable to refresh my resource provider record
2018-03-26 22:36:53.411 928293 INFO nova.compute.resource_tracker [req-b5d9434b-1336-4ea6-b91e-e7de2a6d071d - - - - -] Compute_service record updated for overcloud-compute-0.localdomain:overcloud-compute-0.localdomain
2018-03-26 22:37:53.511 928293 INFO nova.compute.resource_tracker [req-b5d9434b-1336-4ea6-b91e-e7de2a6d071d - - - - -] Auditing locally available compute resources for node overcloud-compute-0.localdomain
2018-03-26 22:37:54.292 928293 WARNING nova.scheduler.client.report [req-b5d9434b-1336-4ea6-b91e-e7de2a6d071d - - - - -] Unable to refresh my resource provider record
2018-03-26 22:37:54.380 928293 INFO nova.compute.resource_tracker [req-b5d9434b-1336-4ea6-b91e-e7de2a6d071d - - - - -] Total usable vcpus: 24, total allocated vcpus: 8
2018-03-26 22:37:54.380 928293 INFO nova.compute.resource_tracker [req-b5d9434b-1336-4ea6-b91e-e7de2a6d071d - - - - -] Final resource view: name=overcloud-compute-0.localdomain phys_ram=130978MB used_ram=4096MB phys_disk=558GB used_disk=32GB total_vcpus=24 used_vcpus=8 pci_stats=[]
2018-03-26 22:37:54.403 928293 WARNING nova.scheduler.client.report [req-b5d9434b-1336-4ea6-b91e-e7de2a6d071d - - - - -] No authentication information found for placement API. Placement is optional in Newton, but required in Ocata. Please enable the placement service before upgrading.
2018-03-26 22:37:54.403 928293 WARNING nova.scheduler.client.report [req-b5d9434b-1336-4ea6-b91e-e7de2a6d071d - - - - -] Unable to refresh my resource provider record
2018-03-26 22:37:54.403 928293 INFO nova.compute.resource_tracker [req-b5d9434b-1336-4ea6-b91e-e7de2a6d071d - - - - -] Compute_service record updated for overcloud-compute-0.localdomain:overcloud-compute-0.localdomain
[root@overcloud-compute-0 ~]# tail -1 /var/log/neutron/openvswitch-agent.log 
2018-03-26 22:11:29.436 928228 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-5b5a9318-d011-4681-980a-b0d439535282 - - - - -] Configuration for devices up [u'9032c6cd-24c5-4e1a-8456-626c6218aeb6'] and devices down [] completed.
[root@overcloud-compute-0 ~]# 
~~~

At this point, I'm going to investigate the instances.

Comment 18 Andreas Karis 2018-03-26 22:52:57 UTC
[internal]
In that case, though, the PMDs don't restart (the last PMD restart in my lab was at 22:12, while the instance restart was at 22:34):
~~~
[root@overcloud-compute-0 ~]# ps -T -p927988 | grep pmd
 927988  947714 ?        00:35:45 pmd262
 927988  947715 ?        00:35:45 pmd265
 927988  947716 ?        00:35:45 pmd264
 927988  947717 ?        00:35:45 pmd263
 927988  947718 ?        00:35:44 pmd268
 927988  947719 ?        00:35:44 pmd269
 927988  947720 ?        00:35:45 pmd267
 927988  947721 ?        00:35:44 pmd266
[root@overcloud-compute-0 ~]# date
Mon Mar 26 22:47:17 UTC 2018
~~~
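
For completeness, the qemu process start times can be compared directly against the PMD thread creation times (a sketch, not run here; assumes the guests run as qemu-kvm processes and procps-ng ps):
~~~
# Instance (qemu) process start times:
ps -o pid,lstart,comm -C qemu-kvm
# Per-thread creation times of the PMD threads:
ps -T -p 927988 -o spid,lstart,comm | grep pmd
~~~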

And an instance virsh reboot yields a 'VM Resumed' entry in the nova logs:

~~~
2018-03-26 22:32:48.514 928293 WARNING nova.scheduler.client.report [req-b5d9434b-1336-4ea6-b91e-e7de2a6d071d - - - - -] Unable to refresh my resource provider record
2018-03-26 22:32:48.514 928293 INFO nova.compute.resource_tracker [req-b5d9434b-1336-4ea6-b91e-e7de2a6d071d - - - - -] Compute_service record updated for overcloud-compute-0.localdomain:overcloud-compute-0.localdomain
2018-03-26 22:33:47.512 928293 INFO nova.compute.resource_tracker [req-b5d9434b-1336-4ea6-b91e-e7de2a6d071d - - - - -] Auditing locally available compute resources for node overcloud-compute-0.localdomain
2018-03-26 22:33:48.305 928293 WARNING nova.scheduler.client.report [req-b5d9434b-1336-4ea6-b91e-e7de2a6d071d - - - - -] No authentication information found for placement API. Placement is optional in Newton, but required in Ocata. Please enable the placement service before upgrading.
2018-03-26 22:33:48.306 928293 WARNING nova.scheduler.client.report [req-b5d9434b-1336-4ea6-b91e-e7de2a6d071d - - - - -] Unable to refresh my resource provider record
2018-03-26 22:33:48.394 928293 INFO nova.compute.resource_tracker [req-b5d9434b-1336-4ea6-b91e-e7de2a6d071d - - - - -] Total usable vcpus: 24, total allocated vcpus: 8
2018-03-26 22:33:48.394 928293 INFO nova.compute.resource_tracker [req-b5d9434b-1336-4ea6-b91e-e7de2a6d071d - - - - -] Final resource view: name=overcloud-compute-0.localdomain phys_ram=130978MB used_ram=4096MB phys_disk=558GB used_disk=32GB total_vcpus=24 used_vcpus=8 pci_stats=[]
2018-03-26 22:33:48.416 928293 WARNING nova.scheduler.client.report [req-b5d9434b-1336-4ea6-b91e-e7de2a6d071d - - - - -] Unable to refresh my resource provider record
2018-03-26 22:33:48.416 928293 INFO nova.compute.resource_tracker [req-b5d9434b-1336-4ea6-b91e-e7de2a6d071d - - - - -] Compute_service record updated for overcloud-compute-0.localdomain:overcloud-compute-0.localdomain
2018-03-26 22:34:48.511 928293 INFO nova.compute.resource_tracker [req-b5d9434b-1336-4ea6-b91e-e7de2a6d071d - - - - -] Auditing locally available compute resources for node overcloud-compute-0.localdomain
2018-03-26 22:34:49.296 928293 WARNING nova.scheduler.client.report [req-b5d9434b-1336-4ea6-b91e-e7de2a6d071d - - - - -] Unable to refresh my resource provider record
2018-03-26 22:34:49.390 928293 INFO nova.compute.resource_tracker [req-b5d9434b-1336-4ea6-b91e-e7de2a6d071d - - - - -] Total usable vcpus: 24, total allocated vcpus: 8
2018-03-26 22:34:49.391 928293 INFO nova.compute.resource_tracker [req-b5d9434b-1336-4ea6-b91e-e7de2a6d071d - - - - -] Final resource view: name=overcloud-compute-0.localdomain phys_ram=130978MB used_ram=4096MB phys_disk=558GB used_disk=32GB total_vcpus=24 used_vcpus=8 pci_stats=[]
2018-03-26 22:34:49.414 928293 WARNING nova.scheduler.client.report [req-b5d9434b-1336-4ea6-b91e-e7de2a6d071d - - - - -] No authentication information found for placement API. Placement is optional in Newton, but required in Ocata. Please enable the placement service before upgrading.
2018-03-26 22:34:49.414 928293 WARNING nova.scheduler.client.report [req-b5d9434b-1336-4ea6-b91e-e7de2a6d071d - - - - -] Unable to refresh my resource provider record
2018-03-26 22:34:49.414 928293 INFO nova.compute.resource_tracker [req-b5d9434b-1336-4ea6-b91e-e7de2a6d071d - - - - -] Compute_service record updated for overcloud-compute-0.localdomain:overcloud-compute-0.localdomain
2018-03-26 22:34:53.697 928293 INFO nova.compute.manager [req-d6fa4908-6700-4b79-ad2f-46eb44abdc75 - - - - -] [instance: 489afc38-47f3-42aa-b062-a1196c99e6a2] VM Resumed (Lifecycle Event)
2018-03-26 22:35:50.513 928293 INFO nova.compute.resource_tracker [req-b5d9434b-1336-4ea6-b91e-e7de2a6d071d - - - - -] Auditing locally available compute resources for node overcloud-compute-0.localdomain
2018-03-26 22:35:51.309 928293 WARNING nova.scheduler.client.report [req-b5d9434b-1336-4ea6-b91e-e7de2a6d071d - - - - -] Unable to refresh my resource provider record
2018-03-26 22:35:51.400 928293 INFO nova.compute.resource_tracker [req-b5d9434b-1336-4ea6-b91e-e7de2a6d071d - - - - -] Total usable vcpus: 24, total allocated vcpus: 8
2018-03-26 22:35:51.401 928293 INFO nova.compute.resource_tracker [req-b5d9434b-1336-4ea6-b91e-e7de2a6d071d - - - - -] Final resource view: name=overcloud-compute-0.localdomain phys_ram=130978MB used_ram=4096MB phys_disk=558GB used_disk=32GB total_vcpus=24 used_vcpus=8 pci_stats=[]
2018-03-26 22:35:51.423 928293 WARNING nova.scheduler.client.report [req-b5d9434b-1336-4ea6-b91e-e7de2a6d071d - - - - -] Unable to refresh my resource provider record
2018-03-26 22:35:51.424 928293 INFO nova.compute.resource_tracker [req-b5d9434b-1336-4ea6-b91e-e7de2a6d071d - - - - -] Compute_service record updated for overcloud-compute-0.localdomain:overcloud-compute-0.localdomain
2018-03-26 22:36:52.511 928293 INFO nova.compute.resource_tracker [req-b5d9434b-1336-4ea6-b91e-e7de2a6d071d - - - - -] Auditing locally available compute resources for node overcloud-compute-0.localdomain
2018-03-26 22:36:53.300 928293 WARNING nova.scheduler.client.report [req-b5d9434b-1336-4ea6-b91e-e7de2a6d071d - - - - -] Unable to refresh my resource provider record
2018-03-26 22:36:53.345 928293 WARNING nova.scheduler.client.report [req-b5d9434b-1336-4ea6-b91e-e7de2a6d071d - - - - -] No authentication information found for placement API. Placement is optional in Newton, but required in Ocata. Please enable the placement service before upgrading.
2018-03-26 22:36:53.388 928293 INFO nova.compute.resource_tracker [req-b5d9434b-1336-4ea6-b91e-e7de2a6d071d - - - - -] Total usable vcpus: 24, total allocated vcpus: 8
2018-03-26 22:36:53.388 928293 INFO nova.compute.resource_tracker [req-b5d9434b-1336-4ea6-b91e-e7de2a6d071d - - - - -] Final resource view: name=overcloud-compute-0.localdomain phys_ram=130978MB used_ram=4096MB phys_disk=558GB used_disk=32GB total_vcpus=24 used_vcpus=8 pci_stats=[]
2018-03-26 22:36:53.411 928293 WARNING nova.scheduler.client.report [req-b5d9434b-1336-4ea6-b91e-e7de2a6d071d - - - - -] Unable to refresh my resource provider record
2018-03-26 22:36:53.411 928293 INFO nova.compute.resource_tracker [req-b5d9434b-1336-4ea6-b91e-e7de2a6d071d - - - - -] Compute_service record updated for overcloud-compute-0.localdomain:overcloud-compute-0.localdomain
2018-03-26 22:37:53.511 928293 INFO nova.compute.resource_tracker [req-b5d9434b-1336-4ea6-b91e-e7de2a6d071d - - - - -] Auditing locally available compute resources for node overcloud-compute-0.localdomain
2018-03-26 22:37:54.292 928293 WARNING nova.scheduler.client.report [req-b5d9434b-1336-4ea6-b91e-e7de2a6d071d - - - - -] Unable to refresh my resource provider record
2018-03-26 22:37:54.380 928293 INFO nova.compute.resource_tracker [req-b5d9434b-1336-4ea6-b91e-e7de2a6d071d - - - - -] Total usable vcpus: 24, total allocated vcpus: 8
2018-03-26 22:37:54.380 928293 INFO nova.compute.resource_tracker [req-b5d9434b-1336-4ea6-b91e-e7de2a6d071d - - - - -] Final resource view: name=overcloud-compute-0.localdomain phys_ram=130978MB used_ram=4096MB phys_disk=558GB used_disk=32GB total_vcpus=24 used_vcpus=8 pci_stats=[]
~~~


And this does not happen in the customer's case:

~~~
[akaris@collab-shell compute-29]$ grep Resumed var/log/nova/nova-compute.log 
[akaris@collab-shell compute-29]$
~~~

Comment 19 Andreas Karis 2018-03-27 14:14:40 UTC
This was an issue in the application layer.