Bug 1377092 - [PPC] - Hotplug/hotunplug of vNICs ends up with an error and the VM shuts down
Summary: [PPC] - Hotplug/hotunplug of vNICs ends up with an error and the VM shuts down
Keywords:
Status: CLOSED DUPLICATE of bug 1270717
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.3
Hardware: ppc64le
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: pre-dev-freeze
Assignee: Laurent Vivier
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-09-18 10:54 UTC by Michael Burman
Modified: 2016-09-20 08:55 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-09-20 08:55:37 UTC
Target Upstream Version:


Attachments
Logs (1.48 MB, application/x-gzip), 2016-09-18 10:54 UTC, Michael Burman
more logs (48.56 KB, application/x-gzip), 2016-09-19 14:30 UTC, Michael Burman
screenshot (232.30 KB, application/x-gzip), 2016-09-20 08:20 UTC, Michael Burman

Description Michael Burman 2016-09-18 10:54:13 UTC
Created attachment 1202167 [details]
Logs

Description of problem:
[ppc] - Hotplug/hotunplug vNIC ends up with an error and the VM shuts down

After adding a few vNICs to a running VM, we fail with several errors (a different one each time) and the VM shuts down immediately.

Version-Release number of selected component (if applicable):
3.6.9.2-0.1.el6
vdsm-4.17.35-1.el7ev.noarch

How reproducible:
100%, after adding 3 vNICs to the VM

Steps to Reproduce:
1. Add a few vNICs to a VM on a ppc server

Actual results:
At some point (usually after adding the 3rd vNIC), the VM shuts down, sometimes with an error in the engine, sometimes not.

Expected results:
Should work as expected.

Additional info:
Discovered during tier1 run.

Comment 1 Dan Kenigsberg 2016-09-18 11:26:47 UTC
Please supply kernel, qemu-kvm and libvirt versions, as this seems unrelated to oVirt.

Michael, could you try to reproduce this from the `virsh device-attach` command line?
<interface type="bridge">
	<mac address="00:1a:4a:16:01:5c"/>
	<model type="virtio"/>
	<source bridge="ovirtmgmt"/>
	<filterref filter="vdsm-no-mac-spoofing"/>
	<link state="up"/>
	<bandwidth/>
</interface>
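
A minimal reproduction sketch (the virsh subcommand is `attach-device`; the file name nic.xml and the domain placeholder below are only illustrative):

    # save the <interface> XML above to nic.xml, then hot-plug it into the running guest
    virsh attach-device <domain> nic.xml --live
    # repeat a few times with different MAC addresses; hot-unplug again with
    virsh detach-device <domain> nic.xml --live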

Michal, maybe the attached libvirt logs are enough for you to understand where the problem sits?

Comment 2 Michael Burman 2016-09-18 12:28:27 UTC
3.10.0-327.22.2.el7.ppc64le
libvirt-2.0.0-9.el7.ppc64le
qemu-kvm-rhev-2.6.0-23.el7.ppc64le

Comment 3 Yaniv Kaul 2016-09-18 12:40:49 UTC
Also, can you confirm it's working correctly on x86_64?

Comment 4 Michael Burman 2016-09-18 14:01:40 UTC
- Working as expected on x86_64


- This was indeed reproduced from the `virsh device-attach` command line

When I added the 3rd vNIC, the VM crashed: 'Lost connection with qemu process'

2016-09-18 16:53:41,632 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-1) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM mb1 is down with error. Exit message: Lost connection with qemu process.


2016-09-18 13:46:17.863+0000: 79576: info : virObjectUnref:259 : OBJECT_UNREF: obj=0x3fff3c10a4a0
2016-09-18 13:53:37.343+0000: shutting down
2016-09-18T13:53:38.528359Z qemu-kvm: terminating on signal 15 from pid 4455

Comment 5 Meni Yakove 2016-09-19 09:10:20 UTC
Working on x86; the same automation test was run on both.

Comment 6 Yaniv Kaul 2016-09-19 09:18:39 UTC
(In reply to Michael Burman from comment #4)
> - Working as expected on x86_64
> 
> 
> - This was indeed reproduced from the `virsh device-attach` command line
> 
> When I added the 3rd vNIC, the VM crashed: 'Lost connection with qemu
> process'

Oops - looks like a qemu-kvm issue. Do you see a core?

Can you move it to VirtQE and the bug to qemu-kvm under RHEL?

> 
> 2016-09-18 16:53:41,632 ERROR
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (ForkJoinPool-1-worker-1) [] Correlation ID: null, Call Stack: null, Custom
> Event ID: -1, Message: VM mb1 is down with error. Exit message: Lost
> connection with qemu process.
> 
> 
> 2016-09-18 13:46:17.863+0000: 79576: info : virObjectUnref:259 :
> OBJECT_UNREF: obj=0x3fff3c10a4a0
> 2016-09-18 13:53:37.343+0000: shutting down
> 2016-09-18T13:53:38.528359Z qemu-kvm: terminating on signal 15 from pid 4455

Comment 7 Michal Privoznik 2016-09-19 09:19:54 UTC
(In reply to Dan Kenigsberg from comment #1)
> Please supply kernel, qemu-kvm and libvirt versions, as this seems unrelated
> to oVirt.
> 
> Michael, could you try to reproduce this from the `virsh device-attach`
> command line?
> <interface type="bridge">
> 	<mac address="00:1a:4a:16:01:5c"/>
> 	<model type="virtio"/>
> 	<source bridge="ovirtmgmt"/>
> 	<filterref filter="vdsm-no-mac-spoofing"/>
> 	<link state="up"/>
> 	<bandwidth/>
> </interface>
> 
> Michal, maybe the attached libvirt logs are enough for you to understand
> where the problem sits?

I've tried to reproduce this too and failed, with the same packages as Michael reports. Also, for the debug logs, you need to set them up:

http://wiki.libvirt.org/page/DebugLogs

The attached logs contain just the basic information.
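
A sketch of what that page describes (the exact filter values are an assumption; take the recommended ones from the wiki):

    # /etc/libvirt/libvirtd.conf
    log_filters="1:qemu 1:libvirt 4:object 4:json 4:event 1:util"
    log_outputs="1:file:/var/log/libvirt/libvirtd.log"
    # restart the daemon afterwards: systemctl restart libvirtd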

Comment 10 Laurent Vivier 2016-09-19 12:17:29 UTC
> 2016-09-18 13:46:17.863+0000: 79576: info : virObjectUnref:259 :
> OBJECT_UNREF: obj=0x3fff3c10a4a0
> 2016-09-18 13:53:37.343+0000: shutting down
> 2016-09-18T13:53:38.528359Z qemu-kvm: terminating on signal 15 from pid 4455

Signal 15 is SIGTERM; this is not a crash, someone else killed QEMU.

We need more logs: 

kernel logs (journalctl)
qemu log (/var/log/libvirt/qemu/XXX.log)

A coredump would be great.
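
Roughly something like this on the host (a sketch; the qemu log file name depends on the domain name, left as a placeholder here):

    journalctl -b --utc > journal.txt              # host/kernel logs since boot
    cp /var/log/libvirt/qemu/<domain>.log .        # per-domain qemu log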

Comment 11 Michael Burman 2016-09-19 14:29:58 UTC
Hi

Attaching the logs you requested. Please note I ran libvirtd in debug mode as described here - http://wiki.libvirt.org/page/DebugLogs -
but I have to say I'm not sure it actually worked.

And I can't find any coredump.
It looks like libvirt has killed qemu.

2016-09-19 14:24:30.195+0000: shutting down
2016-09-19T14:24:30.653201Z qemu-kvm: terminating on signal 15 from pid 14603

ps -aux | grep 14603
root      14603  0.1  0.0 1340800 51520 ?       Ssl  09:16   0:01 /usr/sbin/libvirtd --listen

attaching logs

Comment 12 Michael Burman 2016-09-19 14:30:47 UTC
Created attachment 1202516 [details]
more logs

Comment 13 Laurent Vivier 2016-09-19 14:45:06 UTC
It seems you have a lot of I/O errors; is that normal?

Sep 19 07:24:31 ibm-p8-rhevm-hv-02 kernel: blk_update_request: I/O error, dev dm-11, sector 0
Sep 19 07:24:31 ibm-p8-rhevm-hv-02 kernel: blk_update_request: I/O error, dev dm-11, sector 104877952
Sep 19 07:24:31 ibm-p8-rhevm-hv-02 kernel: blk_update_request: I/O error, dev dm-11, sector 104877952
Sep 19 07:24:31 ibm-p8-rhevm-hv-02 kernel: blk_update_request: I/O error, dev dm-11, sector 0
Sep 19 07:24:31 ibm-p8-rhevm-hv-02 kernel: blk_update_request: I/O error, dev dm-11, sector 104877952
Sep 19 07:24:31 ibm-p8-rhevm-hv-02 kernel: blk_update_request: I/O error, dev dm-11, sector 104877952
Sep 19 07:24:31 ibm-p8-rhevm-hv-02 kernel: blk_update_request: I/O error, dev dm-11, sector 0

Comment 14 Michael Burman 2016-09-19 14:48:19 UTC
I don't believe it's related (it's from some storage tests).
And it's not from the time when I reproduced the issue.

Comment 15 Laurent Vivier 2016-09-19 15:03:35 UTC
(In reply to Michael Burman from comment #14)
> I don't believe it's related (it's from some storage tests).
> And it's not from the time when I reproduced the issue.

mb5.log is between

    2016-09-19 13:56:01 and 2016-09-19 14:00:44

but messages is between

    Sep 19 07:20:47 and Sep 19 09:06:07

As I don't know your TZ, I can't see what happens when the VM is stopped.
(messages is in local time, and I think mb5.log is in UTC; you can use journalctl --utc.)

Could you point out from the logs the lines related to the VM shutdown?
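
For example, something like (times taken from mb5.log above; adjust the range as needed):

    journalctl --utc --since "2016-09-19 13:56:01" --until "2016-09-19 14:00:44"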

Comment 16 Michael Burman 2016-09-19 15:20:02 UTC
Sep 19 09:00:47 ibm-p8-rhevm-hv-02.klab.eng.bos.redhat.com systemd-machined[9372]: Machine qemu-1-mb5 terminated.

Comment 17 Laurent Vivier 2016-09-19 17:14:17 UTC
Thanks.

So the machine started at 8:56:01 and stopped at 9:00:44 (mb5.log, TZ converted).

Apart from the I/O errors, which are not related, we have no errors in this range (in messages and mb5.log).

So I had a look at the previously attached logs (see the end of this comment for the conclusion...).

From vdsm.log:

jsonrpc.Executor/2::DEBUG::2016-09-18 04:14:14,500::__init__::532::jsonrpc.JsonRpcServer::(_handle_request) Calling 'VM.hotplugNic' in bridge with {u'params': {u'nic': {u'nicModel': u'pv', u'macAddr': u'00:1a:4a:16:01:58', u'linkActive': u'true', u'network': u'ovirtmgmt', u'filter': u'vdsm-no-mac-spoofing', u'specParams': {u'inbound': {}, u'outbound': {}}, u'deviceId': u'24650a7e-2cf7-4f71-9966-b976167ecf03', u'device': u'bridge', u'type': u'interface'}, u'vmId': u'41794d2d-0b60-40b9-8d44-685a1ef86a34'}, u'vmID': u'41794d2d-0b60-40b9-8d44-685a1ef86a34'}

jsonrpc.Executor/2::INFO::2016-09-18 04:14:14,598::vm::2051::virt.vm::(hotplugNic) vmId=`41794d2d-0b60-40b9-8d44-685a1ef86a34`::Hotplug NIC xml: <interface type="bridge">
	<mac address="00:1a:4a:16:01:58"/>
	<model type="virtio"/>
	<source bridge="ovirtmgmt"/>
	<filterref filter="vdsm-no-mac-spoofing"/>
	<link state="up"/>
	<bandwidth/>
</interface>

jsonrpc.Executor/2::DEBUG::2016-09-18 04:14:15,679::__init__::557::jsonrpc.JsonRpcServer::(_handle_request) Return 'VM.hotplugNic' in bridge ...

jsonrpc.Executor/2::INFO::2016-09-18 04:14:15,681::__init__::514::jsonrpc.JsonRpcServer::(_serveRequest) RPC call VM.hotplugNic succeeded in 1.18 seconds

jsonrpc.Executor/7::INFO::2016-09-18 04:14:16,594::throttledlog::105::throttled::(log) Current getAllVmStats: {u'41794d2d-0b60-40b9-8d44-685a1ef86a34': 'Up'}

jsonrpc.Executor/0::DEBUG::2016-09-18 04:14:17,049::__init__::532::jsonrpc.JsonRpcServer::(_handle_request) Calling 'Host.getVMFullList' in bridge with {u'vmList': [u'41794d2d-0b60-40b9-8d44-685a1ef86a34']}
jsonrpc.Executor/0::DEBUG::2016-09-18 04:14:17,050::__init__::557::jsonrpc.JsonRpcServer::(_handle_request) Return 'Host.getVMFullList' in bridge with [{u'acpiEnable': u'true', u'emulatedMachine': u'pseries...
jsonrpc.Executor/0::INFO::2016-09-18 04:14:17,051::__init__::514::jsonrpc.JsonRpcServer::(_serveRequest) RPC call Host.getVMFullList succeeded in 0.00 seconds
libvirtEventLoop::DEBUG::2016-09-18 04:14:18,000::vm::4648::virt.vm::(onLibvirtLifecycleEvent) vmId=`41794d2d-0b60-40b9-8d44-685a1ef86a34`::event Crashed detail 0 opaque None
...
VM Channels Listener::DEBUG::2016-09-18 04:14:19,525::guestagent::225::virt.vm::(_connect) vmId=`41794d2d-0b60-40b9-8d44-685a1ef86a34`::Attempting connection to /var/lib/libvirt/qemu/channels/41794d2d-0b60-40b9-8d44-685a1ef86a34.com.redhat.rhevm.vdsm
VM Channels Listener::DEBUG::2016-09-18 04:14:19,525::guestagent::239::virt.vm::(_connect) vmId=`41794d2d-0b60-40b9-8d44-685a1ef86a34`::Failed to connect to /var/lib/libvirt/qemu/channels/41794d2d-0b60-40b9-8d44-685a1ef86a34.com.redhat.rhevm.vdsm with 2

In engine.log, we can see that the hotplug fails because "Virtual machine does not exist".

2016-09-18 12:14:14,226 INFO  [org.ovirt.engine.core.bll.network.vm.AddVmInterfaceCommand] (ajp-/127.0.0.1:8702-3) [7585b009] Running command: AddVmInterfaceCommand internal: false. Entities affected :  ID: 41794d2d-0b60-40b9-8d44-685a1ef86a34 Type: VMAction group CONFIGURE_VM_NETWORK with role type USER,  ID: 1367b657-70ac-43b9-b170-89cc2cb070c4 Type: VnicProfileAction group CONFIGURE_VM_NETWORK with role type USER
2016-09-18 12:14:14,396 INFO  [org.ovirt.engine.core.bll.network.vm.ActivateDeactivateVmNicCommand] (ajp-/127.0.0.1:8702-3) [1970b307] Running command: ActivateDeactivateVmNicCommand internal: true. Entities affected :  ID: 41794d2d-0b60-40b9-8d44-685a1ef86a34 Type: VM
2016-09-18 12:14:21,609 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-1) [5a2e1963] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM mb is down with error. Exit message: Lost connection with qemu process.
2016-09-18 12:14:22,671 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugNicVDSCommand] (ajp-/127.0.0.1:8702-7) [567447b3] Failed in 'HotPlugNicVDS' method
2016-09-18 12:14:22,690 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ajp-/127.0.0.1:8702-7) [567447b3] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM host_mixed_1 command failed: Virtual machine does not exist
2016-09-18 12:14:22,733 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ajp-/127.0.0.1:8702-7) [567447b3] Correlation ID: 567447b3, Call Stack: null, Custom Event ID: -1, Message: Failed to plug Network Interface nic3 (VirtIO) to VM mb. (User: admin@internal)

At the same moment in libvirtd.log, we have:

2016-09-18 09:14:17.884+0000: 4455: info : qemuMonitorIOWrite:534 : QEMU_MONITOR_IO_WRITE: mon=0x3fff7800b7f0 buf={"execute":"query-block","id":"libvirt-313"}
...
2016-09-18 09:14:17.999+0000: 4455: info : qemuMonitorIOProcess:429 : QEMU_MONITOR_IO_PROCESS: mon=0x3fff7800b7f0 buf={"timestamp": {"seconds": 1474190057, "microseconds": 999111}, "event": "GUEST_PANICKED", "data": {"action": "pause"}}
 len=120

So the problem could be triggered by the guest kernel.
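
One way to confirm that from the host would be something like this (a sketch; the domain name is taken from the logs above, and virsh event needs a reasonably recent libvirt):

    virsh domstate mb --reason      # after the panic, the reason should mention a crash/panic
    virsh event mb --all --loop     # watch lifecycle events, including guest panic, as they arrive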

Do we have some logs from the guest kernel?
What is the version of the kernel in the guest?

Thanks

Comment 18 David Gibson 2016-09-20 01:34:23 UTC
A few observations,

 * RHEL default config suppresses core dumps, which is why you didn't see one.  We could configure to allow them, but..

 * Since it looks like the SIGTERM comes from outside qemu, a core dump of qemu itself probably wouldn't be useful.

 * What we really want to know is what's the process with PID XXXX when we get the message "qemu-kvm: terminating on signal 15 from pid XXXX" - that's the process that is deciding to kill qemu.
    - My first guess would be libvirtd, since what else would be managing the qemu process that way.  But.. it pays to verify those sorts of assumptions.

 * As Laurent says, it looks like libvirt has detected a guest panic, which might explain a decision to kill qemu.  We'll need guest logs to debug that.  Or, better still, a guest vmcore - that would require setting up the guest to capture vmcores.
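
A rough sketch of the last two points, the PID check and the guest vmcore setup (package/service names are the usual RHEL 7 ones and are an assumption here):

    # on the host: identify the process that sent SIGTERM
    ps -p <pid> -o pid,comm,args
    # in the guest: set up kdump so a vmcore lands in /var/crash on panic
    yum install -y kexec-tools
    grubby --update-kernel=ALL --args="crashkernel=auto"
    systemctl enable kdump          # then reboot the guest so crashkernel= takes effect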

Comment 19 Yaniv Kaul 2016-09-20 07:29:58 UTC
(In reply to David Gibson from comment #18)
> A few observations,
> 
>  * RHEL default config suppresses core dumps, which is why you didn't see
> one.  We could configure to allow them, but..

VDSM should configure core dumps.

> 
>  * Since it looks like the SIGTERM comes from outside qemu, a core dump of
> qemu itself probably wouldn't be useful.
> 
>  * What we really want to know is what's the process with PID XXXX when we
> get the message "qemu-kvm: terminating on signal 15 from pid XXXX" - that's
> the process that is deciding to kill qemu.
>     - My first guess would be libvirtd, since what else would be managing
> the qemu process that way.  But.. it pays to verify those sorts of
> assumptions.
> 
>  * As Laurent says, it looks like libvirt has detected a guest panic, which
> might explain a decision to kill qemu.  We'll need guest logs to debug that.
> Or, better still, a guest vmcore - that would require setting up the guest
> to capture vmcores.

Since when is libvirt making such decisions? Isn't it for the upper management layers to decide?

Comment 20 Michael Burman 2016-09-20 08:19:15 UTC
The guest kernel is 3.10.0-327.el7.ppc64le

- The process with the XXXX PID is libvirtd, as I wrote in comment 11 ^^

- Yes, indeed, when adding the 3rd vNIC to the guest VM, a kernel panic occurs and the VM immediately goes down.
kernel:Kernel panic - not syncing: Fatal exception

- Attaching a screenshot of the kernel panic and the messages log from the guest.

- We can't capture any cores; we tried everything that Google offers, and nothing worked.
Please tell me how exactly, or contact me and I'll provide you access to the guest.

Comment 21 Michael Burman 2016-09-20 08:20:42 UTC
Created attachment 1202772 [details]
screenshot

Comment 22 Laurent Vivier 2016-09-20 08:25:37 UTC
Looks like a duplicate of BZ 1270717.

This is fixed in the guest kernel, kernel-3.10.0-328.el7.

Could you try it?

Thanks

Comment 23 Michal Privoznik 2016-09-20 08:31:20 UTC
(In reply to Yaniv Kaul from comment #19)
> (In reply to David Gibson from comment #18)
> > 

> >  * As Laurent says, it looks like libvirt has detected a guest panic, which
> > might explain a decision to kill qemu.  We'll need guest logs to debug that.
> > Or, better still, a guest vmcore - that would require setting up the guest
> > to capture vmcores.
> 
> Since when is libvirt making such decisions? Isn't it for the upper
> management layers to decide?

Since mgmt apps offloaded that decision to libvirt. There's an <on_crash/> element, and depending on its value libvirt might kill the domain. /me goes and checks the attached logs ... D'oh. Michael, the logs you've attached - well, can you attach the actual logs, please? You've attached the config file and /var/log/messages. What I wanted is what the wiki page describes: the domain log (okay, this one is attached) and the daemon log (/var/log/libvirt/libvirtd.log) - NOT attached and very crucial.
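
For reference, the element looks like this in the domain XML (a sketch; destroy is only one of the possible actions, alongside restart, preserve, coredump-destroy and coredump-restart):

    <on_crash>destroy</on_crash>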

Comment 24 Michael Burman 2016-09-20 08:51:49 UTC
Hi Laurent

We updated the kernel to 3.10.0-506.el7.ppc64le and now it is working as expected. Thanks.

Comment 25 Laurent Vivier 2016-09-20 08:55:37 UTC

*** This bug has been marked as a duplicate of bug 1270717 ***

