Description of problem:

The following errors are seen on all controller and compute nodes of a RHOSP13 setup.

systemctl status docker
~~~
Mar 24 08:47:20 host8..com dockerd-current[7497]: 2020-03-24 08:47:20.804+0000: 8707: error : virNetSocketReadWire:1806 : End of file while reading data: Input/output error
Mar 24 08:47:34 host8..com dockerd-current[7497]: 2020-03-24 08:47:34.668+0000: 8707: error : virNetSocketReadWire:1806 : End of file while reading data: Input/output error
Mar 24 08:48:20 host8..com dockerd-current[7497]: 2020-03-24 08:48:20.835+0000: 8707: error : virNetSocketReadWire:1806 : End of file while reading data: Input/output error
Mar 24 08:48:35 host8..com dockerd-current[7497]: 2020-03-24 08:48:35.015+0000: 8707: error : virNetSocketReadWire:1806 : End of file while reading data: Input/output error
Mar 24 08:49:20 host8..com dockerd-current[7497]: 2020-03-24 08:49:20.765+0000: 8707: error : virNetSocketReadWire:1806 : End of file while reading data: Input/output error
Mar 24 08:49:34 host8..com dockerd-current[7497]: 2020-03-24 08:49:34.738+0000: 8707: error : virNetSocketReadWire:1806 : End of file while reading data: Input/output error
Mar 24 08:50:21 host8..com dockerd-current[7497]: 2020-03-24 08:50:21.006+0000: 8707: error : virNetSocketReadWire:1806 : End of file while reading data: Input/output error
Mar 24 08:50:34 host8..com dockerd-current[7497]: 2020-03-24 08:50:34.784+0000: 8707: error : virNetSocketReadWire:1806 : End of file while reading data: Input/output error
Mar 24 08:51:20 host8..com dockerd-current[7497]: 2020-03-24 08:51:20.844+0000: 8707: error : virNetSocketReadWire:1806 : End of file while reading data: Input/output error
~~~

For context on the environment in which these errors are seen: there is a filebeat container running on those nodes (as part of an ELK stack), but stopping the filebeat container does not stop these log messages from appearing.

Version-Release number of selected component (if applicable):
docker-1.13.1-109.gitcccb291.el7_7.x86_64                   Sun Feb 23 19:02:51 2020
docker-client-1.13.1-109.gitcccb291.el7_7.x86_64            Sun Feb 23 19:02:48 2020
docker-common-1.13.1-109.gitcccb291.el7_7.x86_64            Sun Feb 23 19:02:47 2020
docker-rhel-push-plugin-1.13.1-109.gitcccb291.el7_7.x86_64  Sun Feb 23 19:02:47 2020
python-docker-2.4.2-2.el7.noarch                            Fri Jun 28 01:59:48 2019
python-docker-pycreds-0.3.0-11.el7.noarch                   Sun Feb 23 19:43:45 2020
python-obfuscateduser40-agent-docker-cmd-1.5.4-1.el7ost.noarch  Fri Jun 28 02:03:27 2019

How reproducible:
Not sure as of now; these errors are not seen in every RHOSP13 environment.

Actual results:

Expected results:

Additional info:
There is no impact, but we would like to know the cause of this.
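The timestamps above recur at fixed offsets within each minute, which suggests one or more periodic clients polling libvirtd. A quick way to gauge frequency and timing on an affected node is sketched below; it assumes the messages reach journald via the docker unit, as in the `systemctl status docker` output above.

~~~
# Count recent occurrences of the error on this node:
journalctl -u docker --since "1 hour ago" | grep -c 'virNetSocketReadWire'

# Print them with timestamps to see whether they follow a fixed polling
# interval (which would point at a periodic client of libvirtd, e.g. a
# monitoring or collection agent):
journalctl -u docker --since "1 hour ago" | grep 'virNetSocketReadWire'
~~~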
Hi,

I have a customer who is seeing this message as well.

2020-07-15 13:00:16.389 1 ERROR nova.virt.libvirt.driver [req-779c9ef4-4d2d-4568-bd85-99f80acb67f4 485fe140b1dd426b92cb0744ee3ddf05 d6a5f6c4f88e41b499f823dd147a24d2 - default default] [instance: e0f46da7-9105-4b2e-8ef0-58928e1ad325] Failed to start libvirt guest: libvirtError: Cannot recv data: Connection reset by peer
2020-07-15 13:00:16.391 1 INFO os_vif [req-779c9ef4-4d2d-4568-bd85-99f80acb67f4 485fe140b1dd426b92cb0744ee3ddf05 d6a5f6c4f88e41b499f823dd147a24d2 - default default] Successfully unplugged vif VIFOpenVSwitch(active=False,address=fa:16:3e:86:ea:af,bridge_name='br-int',has_traffic_filtering=True,id=77ee2aaa-f1d4-4d8d-a99f-878ff0842c52,network=Network(c14dce0c-0b08-4b65-b5e1-f70c426e246f),plugin='ovs',port_profile=VIFPortProfileOpenVSwitch,preserve_on_delete=False,vif_name='tap77ee2aaa-f1')
2020-07-15 13:00:16.395 1 ERROR nova.compute.manager [req-779c9ef4-4d2d-4568-bd85-99f80acb67f4 485fe140b1dd426b92cb0744ee3ddf05 d6a5f6c4f88e41b499f823dd147a24d2 - default default] [instance: e0f46da7-9105-4b2e-8ef0-58928e1ad325] Cannot reboot instance: Cannot recv data: Connection reset by peer: libvirtError: Cannot recv data: Connection reset by peer
2020-07-15 13:00:17.752 1 INFO nova.compute.manager [req-779c9ef4-4d2d-4568-bd85-99f80acb67f4 485fe140b1dd426b92cb0744ee3ddf05 d6a5f6c4f88e41b499f823dd147a24d2 - default default] [instance: e0f46da7-9105-4b2e-8ef0-58928e1ad325] Successfully reverted task state from reboot_started_hard on failure for instance.
2020-07-15 13:00:17.759 1 ERROR oslo_messaging.rpc.server [req-779c9ef4-4d2d-4568-bd85-99f80acb67f4 485fe140b1dd426b92cb0744ee3ddf05 d6a5f6c4f88e41b499f823dd147a24d2 - default default] Exception during message handling: libvirtError: Cannot recv data: Connection reset by peer
2020-07-15 13:00:17.759 1 ERROR oslo_messaging.rpc.server libvirtError: Cannot recv data: Connection reset by peer
2020-07-15 13:00:17.759 1 ERROR oslo_messaging.rpc.server
2020-07-15 13:00:22.497 1 INFO nova.compute.manager [req-d48db47e-a5cd-476e-b415-539f307687c6 - - - - -] Running instance usage audit for host compute02.localdomain from 2020-07-15 12:00:00 to 2020-07-15 13:00:00. 7 instances.
2020-07-15 13:00:30.940 1 INFO nova.compute.resource_tracker [req-d48db47e-a5cd-476e-b415-539f307687c6 - - - - -] Final resource view: name=compute02.localdomain phys_ram=64163MB used_ram=58368MB phys_disk=8371GB used_disk=540GB total_vcpus=64 used_vcpus=24 pci_stats=[]
2020-07-15 13:01:31.947 1 INFO nova.compute.resource_tracker [req-d48db47e-a5cd-476e-b415-539f307687c6 - - - - -] Final resource view: name=compute02.localdomain phys_ram=64163MB used_ram=58368MB phys_disk=8371GB used_disk=540GB total_vcpus=64 used_vcpus=24 pci_stats=[]
2020-07-15 13:02:31.952 1 INFO nova.compute.resource_tracker [req-d48db47e-a5cd-476e-b415-539f307687c6 - - - - -] Final resource view: name=compute02.localdomain phys_ram=64163MB used_ram=58368MB phys_disk=8371GB used_disk=540GB total_vcpus=64 used_vcpus=24 pci_stats=[]
2020-07-15 13:02:56.420 1 INFO nova.compute.manager [-] [instance: e0f46da7-9105-4b2e-8ef0-58928e1ad325] During _sync_instance_power_state the DB power_state (4) does not match the vm_power_state from the hypervisor (0). Updating power_state in the DB to match the hypervisor.

* Snapshot of the virNetSocketReadWire:1806 error: https://access.redhat.com/hydra/rest/cases/02702586/attachments/52410106-1526-4180-89d0-54d0be99cf41?usePresignedUrl=true
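Since "Cannot recv data: Connection reset by peer" usually means the libvirtd end of the socket went away mid-RPC, a first step is to check whether libvirtd on the compute node died or was restarted around the failure time. A minimal sketch; the container name nova_libvirt and the docker runtime are assumptions about the default OSP 13 layout:

~~~
# Has the libvirt container restarted recently?
sudo docker ps -a --filter name=nova_libvirt

# How long has libvirtd itself been running inside the container?
sudo docker exec nova_libvirt ps -o pid,etime,args -C libvirtd

# Any daemon-side errors around the timestamps seen in nova-compute.log?
sudo docker logs --since 1h nova_libvirt 2>&1 | tail -n 50
~~~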
@Kevin, this doesn't look relevant to CloudOps; perhaps the compute DFG should take a look at it.
[My comment from last week didn't register as the browser crashed.]

From a chat with libvirt folks, that error is usually seen when libvirtd dies and the client loses the connection (as expected). That said, if this is reproducible, then more fine-grained libvirt logs[1] might tell us more.

@Michal: does this ring any bells for you?

[1] https://kashyapc.fedorapeople.org/virt/openstack/request-nova-libvirt-qemu-debug-logs.txt
Created attachment 1724139 [details] VM xml - g2.xml
Created attachment 1724140 [details] libvirtd.log
(In reply to chhu from comment #21)
> Created attachment 1724140 [details]
> libvirtd.log

This is the daemon log; can you please provide the client log? Per my analysis in comment 17 it's the client that is disconnecting, and I'm wondering why.
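For reference, a client-side log for a single virsh invocation can typically be captured via libvirt's client logging environment variables, along the lines of the sketch below. The filter set and output path are assumptions modelled on the daemon-side settings referenced earlier; adjust as needed.

~~~
# Capture a client-side libvirt debug log for one virsh run:
LIBVIRT_LOG_FILTERS="1:libvirt 1:rpc 1:util" \
LIBVIRT_LOG_OUTPUTS="1:file:/tmp/client_libvirtd.log" \
virsh start g2
~~~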
Created attachment 1724151 [details] client_libvirtd.log
Created attachment 1724156 [details] client_hang_libvirtd.log
Okay, so from the client logs I see some weirdness happening. These are my findings from investigating the problematic run.

Here, the client (virsh) starts up and connects to the daemon.

client:

2020-10-26 12:36:22.226+0000: 14506: info : libvirt version: 6.0.0, package: 25.module+el8.2.1+7154+47ffd890 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2020-06-25-15:13:13, )
2020-10-26 12:36:22.226+0000: 14506: info : hostname: dell-per740-05.rhts.eng.pek2.redhat.com
2020-10-26 12:36:22.226+0000: 14506: debug : virGlobalInit:244 : register drivers ...
2020-10-26 12:36:22.226+0000: 14507: debug : virThreadJobSet:97 : Thread 14507 is now running job vshEventLoop
2020-10-26 12:36:22.226+0000: 14506: debug : doRemoteOpen:1094 : proceeding with name =
2020-10-26 12:36:22.226+0000: 14507: debug : virEventRunDefaultImpl:350 : running default event implementation
2020-10-26 12:36:22.226+0000: 14506: debug : doRemoteOpen:1103 : Connecting with transport 1
2020-10-26 12:36:22.226+0000: 14507: debug : virEventPollCleanupTimeouts:520 : Cleanup 1
2020-10-26 12:36:22.226+0000: 14507: debug : virEventPollCleanupHandles:569 : Cleanup 1
2020-10-26 12:36:22.226+0000: 14506: debug : remoteGetUNIXSocketHelper:799 : Built UNIX sockname /var/run/libvirt/libvirt-sock for transport unix prefix libvirt flags=0x0
2020-10-26 12:36:22.226+0000: 14507: debug : virEventPollMakePollFDs:396 : Prepare n=0 w=1, f=4 e=1 d=0
2020-10-26 12:36:22.226+0000: 14506: debug : remoteGetUNIXSocket:885 : Chosen UNIX sockname /var/run/libvirt/libvirt-sock daemon libvirtd for mode legacy transport unix flags=0x0
2020-10-26 12:36:22.226+0000: 14506: debug : doRemoteOpen:1127 : Chosen UNIX socket /var/run/libvirt/libvirt-sock
2020-10-26 12:36:22.226+0000: 14507: debug : virEventPollCalculateTimeout:333 : Calculate expiry of 1 timers
2020-10-26 12:36:22.226+0000: 14507: debug : virEventPollCalculateTimeout:366 : No timeout is pending
2020-10-26 12:36:22.226+0000: 14507: info : virEventPollRunOnce:635 : EVENT_POLL_RUN: nhandles=1 timeout=-1

The daemon accepts the connection request and opens the QEMU driver.

server:

2020-10-26 12:36:22.228+0000: 10048: debug : virThreadJobSet:94 : Thread 10048 (virNetServerHandleJob) is now running job remoteDispatchConnectOpen
2020-10-26 12:36:22.228+0000: 10048: debug : virConnectOpen:1074 : name=
2020-10-26 12:36:22.228+0000: 10048: debug : virConfLoadConfig:1533 : Loading config file '/etc/libvirt/libvirt.conf'
2020-10-26 12:36:22.228+0000: 10048: debug : virConfReadFile:737 : filename=/etc/libvirt/libvirt.conf
2020-10-26 12:36:22.228+0000: 10048: debug : virConfGetValueString:882 : Get value string (nil) 0
2020-10-26 12:36:22.228+0000: 10048: debug : virConnectOpenInternal:859 : Trying to probe for default URI
2020-10-26 12:36:22.228+0000: 10048: debug : virConnectOpenInternal:866 : QEMU driver URI probe returned 'qemu:///system'
2020-10-26 12:36:22.228+0000: 10048: debug : virConfGetValueStringList:930 : Get value string list (nil) 0
2020-10-26 12:36:22.228+0000: 10048: debug : virConnectOpenInternal:903 : Split "qemu:///system" to URI components: scheme qemu server <null> user <null> port 0 path /system
2020-10-26 12:36:22.228+0000: 10048: debug : virConnectOpenInternal:954 : trying driver 9 (QEMU) ...
2020-10-26 12:36:22.228+0000: 10048: debug : virConnectOpenInternal:969 : Checking for supported URI schemes
2020-10-26 12:36:22.228+0000: 10048: debug : virConnectOpenInternal:972 : Matched URI scheme 'qemu'
2020-10-26 12:36:22.228+0000: 10048: debug : virConnectOpenInternal:1005 : driver 9 QEMU returned SUCCESS
2020-10-26 12:36:22.228+0000: 10048: debug : virThreadJobClear:119 : Thread 10048 (virNetServerHandleJob) finished job remoteDispatchConnectOpen with ret=0

The client proceeds with sending the virDomainCreate() request for the g2 domain.

client:

2020-10-26 12:36:22.231+0000: 14506: debug : virDomainCreate:6525 : dom=0x5623e67c0e80, (VM: name=g2, uuid=743eee8c-16f1-11eb-a35f-20040febab30)
2020-10-26 12:36:22.231+0000: 14506: debug : virNetMessageNew:45 : msg=0x5623e67c47d0 tracked=0
2020-10-26 12:36:22.231+0000: 14507: debug : virEventPollDispatchTimeouts:427 : Dispatch 2
2020-10-26 12:36:22.231+0000: 14507: debug : virEventPollDispatchHandles:473 : Dispatch 2
2020-10-26 12:36:22.231+0000: 14507: debug : virEventPollDispatchHandles:487 : i=0 w=1
2020-10-26 12:36:22.231+0000: 14506: debug : virNetMessageEncodePayload:388 : Encode length as 56
2020-10-26 12:36:22.231+0000: 14507: info : virEventPollDispatchHandles:501 : EVENT_POLL_DISPATCH_HANDLE: watch=1 events=1
2020-10-26 12:36:22.231+0000: 14507: debug : virEventPollDispatchHandles:487 : i=1 w=2
2020-10-26 12:36:22.231+0000: 14507: debug : virEventPollCleanupTimeouts:520 : Cleanup 2
2020-10-26 12:36:22.231+0000: 14507: debug : virEventPollCleanupHandles:569 : Cleanup 2
2020-10-26 12:36:22.231+0000: 14506: info : virNetClientSendInternal:2085 : RPC_CLIENT_MSG_TX_QUEUE: client=0x5623e67c3f20 len=56 prog=536903814 vers=1 proc=9 type=0 status=0 serial=8
2020-10-26 12:36:22.231+0000: 14507: debug : virEventRunDefaultImpl:350 : running default event implementation
2020-10-26 12:36:22.231+0000: 14507: debug : virEventPollCleanupTimeouts:520 : Cleanup 2
2020-10-26 12:36:22.231+0000: 14507: debug : virEventPollCleanupHandles:569 : Cleanup 2
2020-10-26 12:36:22.231+0000: 14506: debug : virNetClientCallNew:2038 : New call 0x5623e67c32e0: msg=0x5623e67c47d0, expectReply=1, nonBlock=0
2020-10-26 12:36:22.231+0000: 14507: debug : virEventPollMakePollFDs:396 : Prepare n=0 w=1, f=4 e=1 d=0
2020-10-26 12:36:22.231+0000: 14507: debug : virEventPollMakePollFDs:396 : Prepare n=1 w=2, f=6 e=1 d=0
2020-10-26 12:36:22.231+0000: 14507: debug : virEventPollCalculateTimeout:333 : Calculate expiry of 2 timers

The daemon performs the startup.
server:

2020-10-26 12:36:22.231+0000: 10050: debug : virThreadJobSet:94 : Thread 10050 (virNetServerHandleJob) is now running job remoteDispatchDomainCreate
2020-10-26 12:36:22.231+0000: 10050: debug : virDomainCreate:6525 : dom=0x7f90e802e670, (VM: name=g2, uuid=743eee8c-16f1-11eb-a35f-20040febab30)
2020-10-26 12:36:22.231+0000: 10050: debug : qemuDomainObjBeginJobInternal:9784 : Starting job: job=none agentJob=none asyncJob=start (vm=0x7f90a0106e10 name=g2, current job=none agentJob=none async=none)
2020-10-26 12:36:22.231+0000: 10050: debug : qemuDomainObjBeginJobInternal:9841 : Started async job: start (vm=0x7f90a0106e10 name=g2)
2020-10-26 12:36:22.231+0000: 10050: debug : qemuProcessStart:7077 : conn=0x7f90f4019b10 driver=0x7f90a00f7f60 vm=0x7f90a0106e10 name=g2 id=-1 asyncJob=start migrateFrom=<null> migrateFd=-1 migratePath=<null> snapshot=(nil) vmop=0 flags=0x1

While the daemon is performing the startup, the client waits for the reply to the issued RPC, but also sets a timer to wake up in 5 seconds so that a keepalive message can be sent:

client:

2020-10-26 12:36:22.231+0000: 14507: info : virEventPollRunOnce:635 : EVENT_POLL_RUN: nhandles=1 timeout=5000
2020-10-26 12:36:27.571+0000: 14572: info : libvirt version: 6.0.0, package: 25.module+el8.2.1+7154+47ffd890 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2020-06-25-15:13:13, )

But the last message logged is entering the poll(). The next one is the greeting message from another run of virsh (after 5 seconds, because there's a "sleep 5" in the loop in comment 19). Therefore, I think virsh crashed when entering poll().

Chenli, is there any record of this? Something in the system log, perhaps even a coredump?
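As an aside, the 5-second wakeup visible above is the client keepalive timer. When chasing unexplained disconnects it can help to tune or disable it per virsh invocation, to rule keepalive traffic in or out; a sketch follows (the interval and count values are arbitrary):

~~~
# Send a keepalive probe every 10 seconds and give up after 3 unanswered
# probes:
virsh --keepalive-interval 10 --keepalive-count 3 start g2

# Disable the client keepalive mechanism entirely for one run:
virsh --keepalive-interval 0 start g2
~~~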
Thanks for Michal Privoznik's debugging!

After further debugging and testing on the env, we found the `virsh start` hang is a qemu bug; I'll file another qemu bug for the VM hang issue.

The libvirtd.log error message below happens when the VM hangs and the `virsh start` terminal is exited manually:
- "error : virNetSocketReadWire:1832 : End of file while reading data: Input/output error"
Right. So in my investigation in comment 25 the virsh did not crash, but was killed manually by the user. Therefore, we don't have a reproducer. If somebody who is able to reproduce this could provide logs, including client logs, that would be great.

Just for the record, the error does not necessarily mean there is a problem. It is reported whenever a client disconnects without closing the connection properly. It is only a problem if the client was doing, or wanted to do, something more.
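To illustrate the point, the daemon-side message can be provoked harmlessly by making a client go away without a clean close. A sketch, assuming libvirtd runs as a host service and logs to the journal:

~~~
# Keep a libvirt connection open, then kill the client before it can close
# the socket cleanly:
virsh event --all --loop &
VIRSH_PID=$!
sleep 2
kill -9 "$VIRSH_PID"

# The daemon should now log the (in this case harmless) EOF error:
journalctl -u libvirtd --since "1 minute ago" | grep 'virNetSocketReadWire'
~~~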
(In reply to chhu from comment #28)
> Thanks for Michal Privoznik's debugging!
>
> After further debugging and testing on the env, we found the `virsh start`
> hang is a qemu bug; I'll file another qemu bug for the VM hang issue.

Tracked in Bug 1892669 - VM start hang - qemu stuck in query-balloon
(In reply to jhardee from comment #9)
> Hi,
>
> I have a customer who is seeing this message as well.
>
> 2020-07-15 13:00:16.389 1 ERROR nova.virt.libvirt.driver
> [req-779c9ef4-4d2d-4568-bd85-99f80acb67f4 485fe140b1dd426b92cb0744ee3ddf05
> d6a5f6c4f88e41b499f823dd147a24d2 - default default] [instance:
> e0f46da7-9105-4b2e-8ef0-58928e1ad325] Failed to start libvirt guest:
> libvirtError: Cannot recv data: Connection reset by peer
> ....

Hi Ketan,

The issue in comments 14 and 19 is a separate bug, Bug 1892669.

As Michal said in comment 29:
"the error (virNetSocketReadWire:1832 : End of file while reading data: Input/output error) does not necessarily mean there's a problem. It is reported if a client disconnects without closing the connection properly. It is a problem if the client was doing something/wanted to do something more."

According to the nova log in comment 9, the customer hit VM start/reboot failures; we need to know what happened to the VM before the failures. Could you please help provide the information below? Thank you!
--------------------------------------------------------
Tested packages:
- OSP version: OSP 13 (compute node with RHEL 7.9)?
- Compute node: realtime kernel? libvirt and qemu-kvm versions?

Test env:
- VM XML
- Env with realtime? With SR-IOV PF, VF?
- VM storage: NFS? Ceph? iSCSI?

Reproduction steps:
1. 2. 3. ....

Related logs:
- sos report, or
- compute node:
  - /var/log/containers/nova/nova-compute.log
  - /var/log/containers/libvirt/libvirtd.log
  - /var/log/libvirt/qemu/instance-0000000*.log
- controller node:
  - /var/log/containers/nova/nova-conductor.log
  - /var/log/containers/nova/nova-scheduler.log

Please apply these settings for the libvirtd log before reproducing the bug:
- On the compute node:
  # sudo vi /var/lib/config-data/puppet-generated/nova_libvirt/etc/libvirt/libvirtd.conf
    log_filters="1:libvirt 1:qemu 1:conf 1:security 3:event 3:json 3:file 3:object 1:util"
    log_outputs="1:file:/var/log/libvirt/libvirtd.log"
  # sudo podman restart nova_libvirt
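Once reproduced, a quick way to line up the client-visible failure with the daemon-side disconnect across the collected files is sketched below (paths as listed above; adjust if the layout differs in the actual environment):

~~~
# On the compute node, correlate by timestamp:
sudo grep -n 'virNetSocketReadWire' /var/log/containers/libvirt/libvirtd.log
sudo grep -n 'Connection reset by peer' /var/log/containers/nova/nova-compute.log
sudo tail -n 50 /var/log/libvirt/qemu/instance-0000000*.log
~~~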
(In reply to chhu from comment #31)
> According to the nova log in comment 9, the customer hit VM start/reboot
> failures; we need to know what happened to the VM before the failures.
> Could you please help provide the information below? Thank you!
> [...]

Ketan, any update on the above?
FYI, a new bug was found in libvirt: https://bugzilla.redhat.com/show_bug.cgi?id=1918211. It has a similar error message to this bug.
(In reply to Han Han from comment #37)
> FYI, a new bug was found in libvirt:
> https://bugzilla.redhat.com/show_bug.cgi?id=1918211. It has a similar error
> message to this bug.

Right, but Michal confirms in your new bug that it is not the same cause: https://bugzilla.redhat.com/show_bug.cgi?id=1918211#c4

(Also, your new bug seems to be reproducible only in a nested KVM setup, not with VMs on physical hosts. And in _this_ bug, I think, the setup is not nested.)
A solid reproducer seems to be completely elusive here; without one, we can't do much. So, closing this bug for now. If a clear reproducer emerges, please feel free to reopen with appropriate details.