Bug 1787319

Summary: [OVN] ovn-controller: crash due to use after free in I-P engine
Product: Red Hat Enterprise Linux Fast Datapath
Reporter: Dumitru Ceara <dceara>
Component: ovn2.11
Assignee: Dumitru Ceara <dceara>
Status: CLOSED ERRATA
QA Contact: Jianlin Shi <jishi>
Severity: high
Priority: unspecified
Version: FDP 20.A
CC: ctrautma, jishi, ralongi
Fixed In Version: ovn2.11-2.11.1-33
Clone Of: 1787318
Last Closed: 2020-11-10 15:23:08 UTC
Bug Depends On: 1787318

Description Dumitru Ceara 2020-01-02 11:37:37 UTC
+++ This bug was initially created as a clone of Bug #1787318 +++

Description of problem:
With the attached scaled configuration, ovn-controller might access freed memory and crash when logical switches are deleted.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Start ovn-northd and point it to the attached northbound db (ovnnb_db.db).
2. Start ovn-controller.
3. Start OVS and bind the logical_switch_ports locally:

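# Bind every VIF port (type == "") to a local OVS internal interface.
for i in $(ovn-nbctl --bare --columns name find logical_switch_port type=\"\"); do
    # The interface name is the port name up to the first "-".
    vm=$(echo "$i" | cut -f 1 -d "-")
    ovs-vsctl add-port br-int "$vm" -- set interface "$vm" type=internal
    ovs-vsctl set interface "$vm" external_ids:iface-id="$i"
done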

4. Delete all logical switches:

for s in $(ovn-nbctl list logical_switch | grep -E "^name" | cut -f 2 -d ':' | cut -f 2 -d '"'); do
    ovn-nbctl ls-del "$s"
done
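
A possible alternative for step 4, as a minimal sketch: ovn-nbctl accepts the same --bare --columns options for "list" that step 3 uses for "find", so the switch names can be read without the grep/cut pipeline. The NBCTL variable and the two function names below are hypothetical, added only so the loop can be exercised against a stub; they are not part of OVN.

```shell
#!/bin/sh
# Sketch only: NBCTL is a hypothetical override for the ovn-nbctl binary,
# so the functions can be tested against a stub command.
NBCTL=${NBCTL:-ovn-nbctl}

# Print one logical switch name per line.
# --bare output separates records with blank lines, so drop those.
list_switch_names() {
    $NBCTL --bare --columns name list logical_switch | sed '/^$/d'
}

# Delete every logical switch, one ls-del call per switch.
delete_all_switches() {
    list_switch_names | while IFS= read -r s; do
        $NBCTL ls-del "$s"
    done
}
```

With NBCTL left unset, calling delete_all_switches runs the real ovn-nbctl against the configured northbound database.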

Actual results:
ovn-controller might crash:
Program received signal SIGSEGV, Segmentation fault.
0x00000000004b8f47 in hmap_first_with_hash (hmap=hmap@entry=0x91da08, hmap=hmap@entry=0x91da08, hash=2346380341)
    at ./include/openvswitch/hmap.h:328
328         return hmap_next_with_hash__(hmap->buckets[hash & hmap->mask], hash);


Expected results:
ovn-controller shouldn't use memory after it was freed.

Additional info:
Fixed upstream by commits:
2a4965c0e187db0c4218556ed9b06f988e88cb62: ovn-controller: Refactor I-P engine_run() tracking.
5ed53faecef12c09330ced445418c961cb1f8caf: ovn-controller: Add per node states to I-P engine.
2117ba0a91f36206d0f3665e8680c15f1f6fa0a0: ovn-controller: Add separate I-P engine node for processing ct-zones.
94cbc59dc0f1cb56e56d1551956efe5824561864: ovn-controller: Fix use of dangling pointers in I-P runtime_data.

Comment 2 Jianlin Shi 2020-03-23 09:41:24 UTC
Hi Dumitru,

I failed to reproduce the issue on ovn2.11-2.11.1-24.el7fdp.x86_64 with the steps in https://bugzilla.redhat.com/show_bug.cgi?id=1787318#c3.

Comment 3 Dumitru Ceara 2020-03-30 15:24:16 UTC
Hi Jianlin,

The crash was made more visible by commit [1], but that commit was squashed into the patches for ovn2.11-2.11.1-26, which also fix the crash.
The steps described in https://bugzilla.redhat.com/show_bug.cgi?id=1787318#c3 don't reproduce the issue because they exercise the code path added by [1].

I don't see a straightforward way of reproducing the issue without [1]. There are, in theory, code paths that would trigger the memory corruption, but I couldn't hit them.

Regards,
Dumitru

[1] https://github.com/ovn-org/ovn/commit/fc1e1640cd47f255c68488b0ec36052b0af58fd2#diff-452d44dee1f09b8a972c69ef7499a69c

Comment 4 Jianlin Shi 2020-03-31 03:49:34 UTC
set VERIFIED per comment 3

Comment 5 Dan Williams 2020-11-10 15:23:08 UTC
All these bugs have been verified and have shipped in FDP 20.G or earlier.