Bug 1284725 - host side OVS core dump when start testpmd in guest
Status: CLOSED CURRENTRELEASE
Product: Red Hat OpenStack
Classification: Red Hat
Component: openvswitch-dpdk
Version: unspecified
Hardware: x86_64 Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 8.0 (Liberty)
Assigned To: Flavio Leitner
Depends On:
Blocks:
Reported: 2015-11-23 21:45 EST by Peter Xu
Modified: 2016-01-25 16:41 EST

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-01-25 16:41:21 EST
Type: Bug


Attachments: None
Comment 6 Flavio Leitner 2016-01-11 08:17:22 EST
Peter,

Could you please attach a core?
Thanks,
fbl
Comment 7 Peter Xu 2016-01-11 20:51:55 EST
Hi, Flavio, 

I am using my current environment for other tests, and I am afraid that replacing packages might break something. Sorry for not being able to provide the core right now.

I thought the backtrace was meaningful enough, so I didn't keep the core. I will keep it the next time I encounter a problem.

If you would not mind, I would suggest the following options:

1. Do you have a free environment to test in? I think that installing these packages:

openvswitch-dpdk-2.4.0-0.10346.git97bab959.1.el7.x86_64
qemu-kvm-rhev-2.3.0-31.el7.x86_64

and then starting testpmd in the guest should trigger the bug (it reproduced 100% of the time for me, unless I missed something important). When the bug triggers, the guest does not die; however, ovs-vswitchd dumps core, so the guest network goes down.

2. This bug only happens with specific QEMU packages (it happens with neither the older ones nor the newer ones), and if we rebase to a newer DPDK some day, the problem might resolve itself too. So I would also have no problem with putting this bug aside, at least for now.

What do you think?
Comment 8 Flavio Leitner 2016-01-18 11:51:17 EST
The plan is to rebase openvswitch-dpdk to 2.5 soon, but we would still support 2.4 anyway.  However, if you can't reproduce the issue with a newer qemu, then I would assume it's fixed somehow.

This is the DPDK code:
    case VHOST_USER_RESET_OWNER:
        ops->reset_owner(ctx);
        break;

This is the QEMU vhost-user spec:
 * VHOST_USER_RESET_OWNER

      Id: 4
      Master payload: N/A

      This is no longer used. Used to be sent to request disabling
      all rings, but some clients interpreted it to also discard
      connection state (this interpretation would lead to bugs).
      It is recommended that clients either ignore this message,
      or use it to disable all rings.

So maybe the newer qemu doesn't use it anymore.
Having said that, I'd follow your suggestion in #2.
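
To illustrate the recommendation in the spec text above, here is a minimal sketch of a backend that handles VHOST_USER_RESET_OWNER by disabling all rings while keeping its connection state. This is not the DPDK implementation and makes no claim about the cause of the crash: only the message name and its id (4) come from the spec; struct backend_conn, disable_all_rings(), handle_request() and MAX_RINGS are hypothetical names made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Message id taken from the QEMU vhost-user spec quoted above. */
#define VHOST_USER_RESET_OWNER 4

/* Hypothetical ring count for the example (e.g. one RX and one TX ring). */
#define MAX_RINGS 2

/* Hypothetical per-connection backend state; not DPDK's real structures. */
struct backend_conn {
    bool connected;               /* connection state that must survive */
    bool ring_enabled[MAX_RINGS]; /* per-ring enable flags */
};

/* Stop processing every ring, but leave the connection itself untouched. */
void disable_all_rings(struct backend_conn *conn)
{
    for (size_t i = 0; i < MAX_RINGS; i++) {
        conn->ring_enabled[i] = false;
    }
}

/*
 * Handle one request id.
 * Returns 0 if the request was recognised, -1 otherwise.
 */
int handle_request(struct backend_conn *conn, int request)
{
    assert(conn != NULL);

    switch (request) {
    case VHOST_USER_RESET_OWNER:
        /* Follow the spec's advice: only disable the rings and
         * do not discard any connection state. */
        disable_all_rings(conn);
        return 0;
    default:
        return -1;
    }
}
```

Calling handle_request(&conn, VHOST_USER_RESET_OWNER) on a connected backend should leave conn.connected set and clear every ring_enabled flag. The other option the spec allows, ignoring the message, would simply return 0 without touching the rings.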
Comment 9 Peter Xu 2016-01-18 20:45:21 EST
(In reply to Flavio Leitner from comment #8)

Okay. Is there anything I can provide (regarding the needinfo)?

Peter
Comment 10 Flavio Leitner 2016-01-25 16:41:21 EST
Closing with current release since the newer qemu fixes the issue.
