
Bug 1787319

Summary: [OVN] ovn-controller: crash due to use after free in I-P engine
Product: Red Hat Enterprise Linux Fast Datapath
Component: ovn2.11
Version: FDP 20.A
Status: CLOSED ERRATA
Severity: high
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Reporter: Dumitru Ceara <dceara>
Assignee: Dumitru Ceara <dceara>
QA Contact: Jianlin Shi <jishi>
CC: ctrautma, jishi, ralongi
Fixed In Version: ovn2.11-2.11.1-33
Clone Of: 1787318
Bug Depends On: 1787318
Last Closed: 2020-11-10 15:23:08 UTC

Description Dumitru Ceara 2020-01-02 11:37:37 UTC
+++ This bug was initially created as a clone of Bug #1787318 +++

Description of problem:
With the attached scaled configuration, ovn-controller might access freed memory and crash when logical switches are deleted.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Start ovn-northd and point it to the attached northbound db (ovnnb_db.db).
2. Start ovn-controller.
3. Start OVS and bind the logical_switch_ports locally:

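# Bind every logical switch port with an empty "type" to a local OVS internal port.
for i in $(ovn-nbctl --bare --columns name find logical_switch_port type=\"\"); do
    # The OVS port name is the part of the logical port name before the first "-".
    vm=$(echo "$i" | cut -f 1 -d "-")
    ovs-vsctl add-port br-int "$vm" -- set interface "$vm" type=internal
    ovs-vsctl set interface "$vm" external_ids:iface-id="$i"
done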

4. Delete all logical switches:
for s in $(ovn-nbctl list logical_switch | grep -E "^name" | cut -f 2 -d ':' | cut -f 2 -d '"'); do
    ovn-nbctl ls-del "$s"
done

Actual results:
ovn-controller might crash:
Program received signal SIGSEGV, Segmentation fault.
0x00000000004b8f47 in hmap_first_with_hash (hmap=hmap@entry=0x91da08, hmap=hmap@entry=0x91da08, hash=2346380341)
    at ./include/openvswitch/hmap.h:328
328         return hmap_next_with_hash__(hmap->buckets[hash & hmap->mask], hash);
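The faulting line indexes a bucket array with `hash & mask` and then walks the chain, so it reads whatever memory the map pointer currently refers to. The following is a minimal sketch of that lookup shape, not the OVS implementation: the `toy_*` types and functions are hypothetical stand-ins, and the 256-bucket size used in the usage note is an arbitrary assumption. It also shows one defensive option, clearing the fields on destroy so that a stale user trips an assertion instead of silently reading freed memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for a chained hash map.
 * These are NOT the OVS hmap types. */
struct toy_node {
    struct toy_node *next;
    uint32_t hash;
};

struct toy_map {
    struct toy_node **buckets;  /* Array of (mask + 1) bucket heads. */
    size_t mask;                /* Bucket count minus one (power of two). */
};

/* Allocates n_buckets empty buckets; n_buckets must be a power of two.
 * Returns 0 on success, -1 on allocation failure. */
int toy_map_init(struct toy_map *map, size_t n_buckets)
{
    map->buckets = calloc(n_buckets, sizeof *map->buckets);
    map->mask = map->buckets ? n_buckets - 1 : 0;
    return map->buckets ? 0 : -1;
}

/* Pushes node onto the head of the bucket selected by hash & mask. */
void toy_map_insert(struct toy_map *map, struct toy_node *node, uint32_t hash)
{
    struct toy_node **head = &map->buckets[hash & map->mask];

    node->hash = hash;
    node->next = *head;
    *head = node;
}

/* Same shape as the faulting line: index the bucket array, then walk. */
struct toy_node *toy_map_first_with_hash(const struct toy_map *map,
                                         uint32_t hash)
{
    assert(map->buckets != NULL);  /* Fails if the map was destroyed. */

    struct toy_node *n = map->buckets[hash & map->mask];
    while (n && n->hash != hash) {
        n = n->next;
    }
    return n;
}

/* Frees the bucket array and clears the fields, so later lookups on this
 * struct hit the assertion above rather than reading freed memory. */
void toy_map_destroy(struct toy_map *map)
{
    free(map->buckets);
    map->buckets = NULL;
    map->mask = 0;
}
```

A caller would run `toy_map_init(&map, 256)`, insert nodes, and look them up with `toy_map_first_with_hash()`. Note that the guard only helps while the `struct toy_map` itself is still valid; if the structure that contains the map has been freed, as the dangling-pointer description in this bug suggests, no check inside the lookup can be relied on.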


Expected results:
ovn-controller should not access memory after it has been freed.

Additional info:
Fixed upstream by commits:
2a4965c0e187db0c4218556ed9b06f988e88cb62: ovn-controller: Refactor I-P engine_run() tracking.
5ed53faecef12c09330ced445418c961cb1f8caf: ovn-controller: Add per node states to I-P engine.
2117ba0a91f36206d0f3665e8680c15f1f6fa0a0: ovn-controller: Add separate I-P engine node for processing ct-zones.
94cbc59dc0f1cb56e56d1551956efe5824561864: ovn-controller: Fix use of dangling pointers in I-P runtime_data.
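The commit titles above mention per-node states and dangling pointers in the I-P engine data. One way such a scheme can guard against stale data is sketched below. This is an illustration only: the `toy_*` names, the set of states, and the helper functions are all hypothetical and are not taken from the upstream ovn-controller code. The idea is that a node's data is handed out only if the node was fully recomputed in the current iteration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names; not the upstream ovn-controller definitions. */
enum toy_node_state {
    TOY_STALE,      /* Data not (yet) recomputed in this iteration. */
    TOY_UPDATED,    /* Recomputed in this iteration and changed. */
    TOY_UNCHANGED,  /* Recomputed in this iteration, no change. */
    TOY_ABORTED     /* Recompute abandoned; data may be inconsistent. */
};

struct toy_engine_node {
    const char *name;
    enum toy_node_state state;
    void *data;
};

/* Marks a node stale at the start of each engine iteration. */
void toy_node_begin_iteration(struct toy_engine_node *node)
{
    assert(node != NULL);
    node->state = TOY_STALE;
}

/* True only if the node's data was fully computed in this iteration. */
bool toy_node_data_valid(const struct toy_engine_node *node)
{
    return node->state == TOY_UPDATED || node->state == TOY_UNCHANGED;
}

/* Returns the node's data, or NULL when it must not be dereferenced. */
void *toy_node_get_data(const struct toy_engine_node *node)
{
    return toy_node_data_valid(node) ? node->data : NULL;
}
```

With this shape, code that consumes a node's data after an incomplete run gets NULL and has to handle that case explicitly, instead of following pointers that may refer to records already deleted from the database.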

Comment 2 Jianlin Shi 2020-03-23 09:41:24 UTC
Hi Dumitru,

I failed to reproduce the issue on ovn2.11-2.11.1-24.el7fdp.x86_64 with the steps in https://bugzilla.redhat.com/show_bug.cgi?id=1787318#c3.

Comment 3 Dumitru Ceara 2020-03-30 15:24:16 UTC
Hi Jianlin,

Commit [1] made the crash more visible, but it was squashed into the patches for ovn2.11-2.11.1-26, which also fix the crash.
The steps described in https://bugzilla.redhat.com/show_bug.cgi?id=1787318#c3 do not reproduce the issue because they exercise the code path added by [1].

I don't see a straightforward way of reproducing the issue without [1]. There are, in theory, code paths that would trigger the memory corruption, but I couldn't hit them.

Regards,
Dumitru

[1] https://github.com/ovn-org/ovn/commit/fc1e1640cd47f255c68488b0ec36052b0af58fd2#diff-452d44dee1f09b8a972c69ef7499a69c

Comment 4 Jianlin Shi 2020-03-31 03:49:34 UTC
Setting to VERIFIED per comment 3.

Comment 5 Dan Williams 2020-11-10 15:23:08 UTC
All these bugs have been verified and have shipped in FDP 20.G or earlier.