Bug 1546719
Summary: | OVS errors in ovs_idl.connection thread not handled properly | |||
---|---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Marcin Mirecki <mmirecki> | |
Component: | python-ovsdbapp | Assignee: | Terry Wilson <twilson> | |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Eran Kuris <ekuris> | |
Severity: | high | Docs Contact: | ||
Priority: | medium | |||
Version: | 13.0 (Queens) | CC: | amuller, danken, ekuris, jlibosva, jschluet, njohnston, shdunne, twilson | |
Target Milestone: | z9 | Keywords: | TestOnly, Triaged, ZStream | |
Target Release: | 13.0 (Queens) | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | python-ovsdbapp-0.10.4-1.el7ost | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1757512 | Environment: | ||
Last Closed: | 2019-12-10 14:26:46 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1757512 |
Description
Marcin Mirecki
2018-02-19 12:11:53 UTC
If ovs.db.idl.run() raises an Exception, it means that the server has most likely sent us a message that ovsdbapp can't handle. run() is called here outside of a Transaction, so there is no way to pass the Exception back to the other thread like we do for Exceptions in do_commit().

I think the only thing we could really do would be to log the exception and continue, though I'm not sure this would be good. If we've gotten an exception when updating our in-memory copy of the DB, that means we are now no longer reflecting what is in the database. Messages that we have sent to the DB have modified it, but those changes will not be reflected when we examine idl.tables[table].rows, etc.

Another thing to try would be to try to force a reconnect, but that will cause the whole database to be dumped back into memory--including whatever message most likely would have been sent that caused the Exception in the first place.

I'm just not sure that an Exception in idl.run() is actually recoverable. We can log as much information as we can, but I think that stopping might actually be the best action to take in this exceptional instance. Maybe shutting down the thread more cleanly. But ovsdbapp is designed to allow a txn to be queued even if we aren't currently connected (it will be run upon connection), so we still wouldn't be notifying the caller that anything had happened until they time out.

Since it isn't really possible to recover (the in-memory copy of the database will be out of sync if we've failed on a read from the database in idl.run()), I've added a patch to log the exception and to ensure that we eventually time out when queueing a transaction.

According to Terry, this bug should be verified by functional test - impl_idl.
It looks like it failed in the latest run.
https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DFG/view/network/view/networking-ovn/job/DFG-network-networking-ovn-13-dsvm-functional-rhos/482/consoleFull

failed QA according to CI functional test failures.

(In reply to Eran Kuris from comment #20)
> According to Terry, this bug should be verified by functional test -
> impl_idl.
> It looks like it failed in the latest run.
> https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DFG/view/network/view/networking-ovn/job/DFG-network-networking-ovn-13-dsvm-functional-rhos/482/consoleFull
>
> failed QA according to CI functional test failures.

Moving back to ON_QA as this is ovsdbapp and the failure above is from networking-ovn and related to backported port groups.
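For readers following the thread, the behaviour described in the patch comment (log the exception from the polling call, stop the thread, and make sure callers queueing a transaction eventually time out) can be sketched roughly as below. This is not the ovsdbapp code: `Connection`, `FakeTxn`, `TimeoutException`, `commit()` and the `idl_run` callable are hypothetical names used only for illustration, and a bounded `queue.Queue` is assumed as the transaction queue.

```python
import logging
import queue
import threading

LOG = logging.getLogger(__name__)


class TimeoutException(Exception):
    """Raised when a transaction cannot be queued or completed in time."""


class FakeTxn:
    """Hypothetical stand-in for a transaction carrying a result queue."""

    def __init__(self):
        self.results = queue.Queue(1)


class Connection:
    """Hypothetical connection whose worker thread polls the IDL."""

    def __init__(self, idl_run, timeout):
        self.idl_run = idl_run        # callable standing in for idl.run()
        self.timeout = timeout        # seconds a caller is willing to wait
        self.txns = queue.Queue(1)    # bounded, so put() can time out
        self._stop = threading.Event()
        self.thread = threading.Thread(target=self.run, daemon=True)

    def start(self):
        self.thread.start()

    def stop(self):
        self._stop.set()
        self.thread.join(self.timeout)

    def run(self):
        while not self._stop.is_set():
            try:
                self.idl_run()
            except Exception:
                # The in-memory copy can no longer be trusted, so log
                # everything we can and stop instead of continuing.
                LOG.exception("Unhandled error while updating the "
                              "in-memory copy of the database")
                return
            try:
                txn = self.txns.get(timeout=0.01)
            except queue.Empty:
                continue
            txn.results.put("committed")

    def commit(self, txn):
        # Both the enqueue and the wait for a result are bounded, so a
        # dead worker thread surfaces to the caller as a timeout.
        try:
            self.txns.put(txn, timeout=self.timeout)
        except queue.Full:
            raise TimeoutException("timed out queueing transaction")
        try:
            return txn.results.get(timeout=self.timeout)
        except queue.Empty:
            raise TimeoutException("timed out waiting for result")
```

In this sketch, the first `commit()` after the worker has died times out waiting for a result and the next one times out on the enqueue itself, because the abandoned transaction still fills the bounded queue. Either way the caller is notified rather than blocking forever.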