Created attachment 1211993 [details]
Result of: strace -Cvfr -p `pidof ovs-vswitchd` -o vswitchd.strace

When the openvswitch service is started, the ovs-vswitchd process takes 100% of CPU time.

Top dump:

 1227 root 10 -10 48208 11240 9908 R 100.0 0.1 1534:17 ovs-vswitchd

The error seems to have appeared without any specific cause. Once it started, it now always occurs whenever the openvswitch service is started.

The result of:

strace -Cvfr -p `pidof ovs-vswitchd` -o vswitchd.strace

shows the following piece of log recurring at an enormous rate:

 1227 0.000013 recvfrom(3, 0x55f5a48f6dcd, 339, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
 1227 0.000013 recvmsg(11, {msg_namelen=0}, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
 1227 0.000012 accept(10, 0x7ffc6ea263d0, [128]) = -1 EAGAIN (Resource temporarily unavailable)
 1227 0.000014 recvmsg(14, {msg_namelen=0}, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
 1227 0.000012 recvmsg(13, {msg_namelen=0}, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
 1227 0.000013 recvmsg(13, {msg_namelen=0}, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
 1227 0.000012 recvmsg(13, {msg_namelen=0}, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
 1227 0.000014 poll([{fd=13, events=POLLIN}, {fd=14, events=POLLIN}, {fd=11, events=POLLIN}, {fd=8, events=POLLIN}, {fd=10, events=POLLIN}, {fd=3, events=POLLIN}], 6, 0) = 0 (Timeout)
 1227 0.000015 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={53405, 379161}, ru_stime={38423, 776487}, ru_maxrss=11240, ru_ixrss=0, ru_idrss=0, ru_isrss=0, ru_minflt=496, ru_majflt=0, ru_nswap=0, ru_inblock=1688, ru_oublock=0, ru_msgsnd=0, ru_msgrcv=0, ru_nsignals=0, ru_nvcsw=53779241, ru_nivcsw=260447}) = 0
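The strace excerpt has the classic shape of a busy-wait loop: every read on the non-blocking descriptors fails with EAGAIN because no data is ready, yet poll() is called with a timeout of 0, so it returns immediately instead of putting the process to sleep, and the loop spins at 100% CPU. Below is a minimal, self-contained sketch of that pattern for illustration only; it is not the actual ovs-vswitchd main loop, and the socketpair, buffer size, and iteration bound are assumptions made just for the demo.

/* busy_wait_sketch.c - illustrates why a zero poll() timeout burns CPU. */
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <sys/socket.h>

int main(void)
{
    int sv[2];

    /* A connected socket pair stands in for the daemon's descriptors;
     * nothing is ever written to it, so reads always come back empty. */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        perror("socketpair");
        return 1;
    }
    fcntl(sv[0], F_SETFL, fcntl(sv[0], F_GETFL) | O_NONBLOCK);

    struct pollfd pfd = { .fd = sv[0], .events = POLLIN };
    long eagain = 0, timeouts = 0;

    /* Bounded here so the demo terminates; the broken loop never exits. */
    for (long i = 0; i < 100000; i++) {
        char buf[339];

        /* Non-blocking read with nothing pending: -1 / EAGAIN, exactly
         * like the recvfrom()/recvmsg()/accept() calls in the strace. */
        if (recv(sv[0], buf, sizeof buf, MSG_DONTWAIT) < 0 && errno == EAGAIN) {
            eagain++;
        }

        /* Zero timeout: poll() returns 0 immediately ("Timeout" in the
         * strace) instead of sleeping until an fd becomes readable. */
        if (poll(&pfd, 1, 0) == 0) {
            timeouts++;
        }
    }

    printf("EAGAIN results: %ld, immediate poll timeouts: %ld\n",
           eagain, timeouts);
    return 0;
}

Each iteration produces one EAGAIN and one immediate timeout, which is why vswitchd.strace grows at such an enormous rate; a loop like this stops spinning as soon as poll() is given a nonzero (or infinite) timeout when there is no pending work.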
Created attachment 1215516 [details]
patched ovs with fix and debug

Hi. Can you test this package and let me know if it fixes the problem? If it doesn't, can you enable debug for poll-loop and send the logs, please?

ovs-appctl vlog/set poll_loop:DBG

Thanks.
Cascardo.
Created attachment 1216532 [details]
openvswitch logs after patch from 2016-10-30

The patch seems to have fixed the issue. I am attaching the collected openvswitch logs.
Hi, Marcin. The fix has been applied upstream; can you test master and see if it fixes your problem? I would like to make sure it is really fixed by the fix itself and not by the debug messages I added.

Thanks.
Cascardo.
Created attachment 1220123 [details] ovs-vswitchd log
Hi guys, I have reproduced the problem on my environment, running Open vSwitch 2.6.90.
The logs seem to indicate timeouts every 500 ms for the revalidator and every 5 s for the stats update. Those are all normal timeouts. Also, only 0% CPU use is shown. Either the logs do not cover the time when the issue happened, or there is something that is not being accounted for.

Cascardo.
I grabbed the log only after verifying that the ovs-vswitchd process was consuming high CPU. I also set the OVS logging level to debug (ovs-appctl vlog/set dbg). I will try to upgrade to OVS 2.6.1.

Thanks,
Mor
Hi Marcin, I'm network QE and I am reviewing this BZ, but I cannot reproduce the issue. Could you please give more info, such as the kernel version, openvswitch package, steps to reproduce, etc.?

Thanks,
QJ
Hi, it's not always reproducible on my environment, and I have no steps to reproduce it. I am attaching additional debug-level logs taken from the OVN host.

Kernel version: 3.10.0-512.el7.x86_64
OVS packages: openvswitch-2.6.90-1.fc24.x86_64
Created attachment 1222398 [details] ovs-vswitchd log
Created attachment 1222447 [details] strace log
I will try to upgrade to the latest master release and update with my findings.
It seems to be fixed with the latest master release.
(In reply to Mor from comment #10)
>
> It's not always reproducible on my environment, and I have no steps to
> reproduce it.
>
> I am attaching additional debug-level logs taken from the OVN host.
>
> Kernel version:
> 3.10.0-512.el7.x86_64
>
> OVS packages:
> openvswitch-2.6.90-1.fc24.x86_64

Thank you for the update. Without steps to reproduce, could you please give something like your command-line history? I'd like to know what was done before the issue happened, to help me reproduce it.

Thanks,
QJ
In our case, the changes are done by a software provider (which provides CMS access to OVN). This software component uses the OVS API, so unfortunately the command-line history won't help us track the changes.
The validated fix has been applied to the fdBeta package. Please open a fresh bug if there is another issue to be resolved. Thanks!
(In reply to Mor from comment #16)
> In our case, the changes are done by a software provider (which provides
> CMS access to OVN). This software component uses the OVS API, so
> unfortunately the command-line history won't help us track the changes.

The issue has now been fixed in openvswitch-2.5.0-22.git20160727.el7fdb, and I cannot reproduce it anymore. Could you please help verify it?

Thank you,
QJ
Hi QJ,

In our environment we require OVS 2.6.0 or higher, so I am unable to verify it on 2.5.0.

Thanks,
Mor
(In reply to Mor from comment #21)
> Hi QJ,
>
> In our environment we require OVS 2.6.0 or higher, so I am unable to verify
> it on 2.5.0.

It's OK. Thank you anyway.
clearing needinfo (already resolved)