Bug 1787318

Summary: [OVN] ovn-controller: crash due to use after free in I-P engine
Product: Red Hat Enterprise Linux Fast Datapath
Reporter: Dumitru Ceara <dceara>
Component: ovn2.12
Assignee: Dumitru Ceara <dceara>
Status: CLOSED ERRATA
QA Contact: Jianlin Shi <jishi>
Severity: high
Priority: unspecified
Version: FDP 20.A
CC: ctrautma, jishi, mmichels, ralongi
Hardware: Unspecified
OS: Unspecified
Last Closed: 2020-03-10 10:08:18 UTC
Type: Bug
Bug Blocks: 1787319, 1802325, 1802716    
Attachments:
NB database for replicating the issue. (Flags: none)

Description Dumitru Ceara 2020-01-02 11:36:39 UTC
Created attachment 1649164
NB database for replicating the issue.

Description of problem:
With the attached scaled configuration, deleting logical switches can cause ovn-controller to access freed memory and crash.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Start ovn-northd and point it to the attached northbound db (ovnnb_db.db).
2. Start ovn-controller.
3. Start OVS and bind the logical_switch_ports locally:

# Bind every VIF port (logical_switch_port with an empty type) to br-int.
for i in $(ovn-nbctl --bare --columns name find logical_switch_port type=\"\"); do
    vm=$(echo "$i" | cut -f 1 -d "-")
    ovs-vsctl add-port br-int "$vm" -- set interface "$vm" type=internal
    ovs-vsctl set interface "$vm" external_ids:iface-id="$i"
done

4. Delete all logical switches:

for s in $(ovn-nbctl list logical_switch | grep -E "^name" | cut -f 2 -d ':' | cut -f 2 -d '"'); do
    ovn-nbctl ls-del "$s"
done
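
Note on step 3: the local interface name is taken as everything before the first "-" in the logical switch port name. A minimal sketch of that derivation, using a hypothetical port name (the real names come from the attached NB database):

```shell
#!/bin/bash
# Hypothetical logical switch port name, for illustration only;
# the real names come from the attached NB database.
lsp="vm1-port"

# Keep everything before the first "-".
# Equivalent to: echo "$lsp" | cut -f 1 -d "-"
vm="${lsp%%-*}"

echo "$vm"
```

The parameter expansion avoids spawning a subshell per port, which may matter slightly with a scaled configuration, but the cut pipeline from step 3 gives the same result.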

Actual results:
ovn-controller may crash with a segmentation fault:
Program received signal SIGSEGV, Segmentation fault.
0x00000000004b8f47 in hmap_first_with_hash (hmap=hmap@entry=0x91da08, hmap=hmap@entry=0x91da08, hash=2346380341)
    at ./include/openvswitch/hmap.h:328
328         return hmap_next_with_hash__(hmap->buckets[hash & hmap->mask], hash);


Expected results:
ovn-controller should not access memory after it has been freed.

Additional info:
Fixed upstream by commits:
2a4965c0e187db0c4218556ed9b06f988e88cb62: ovn-controller: Refactor I-P engine_run() tracking.
5ed53faecef12c09330ced445418c961cb1f8caf: ovn-controller: Add per node states to I-P engine.
2117ba0a91f36206d0f3665e8680c15f1f6fa0a0: ovn-controller: Add separate I-P engine node for processing ct-zones.
94cbc59dc0f1cb56e56d1551956efe5824561864: ovn-controller: Fix use of dangling pointers in I-P runtime_data.

Comment 3 Jianlin Shi 2020-02-04 09:25:24 UTC
Reproduced on ovn2.12-2.12.0-19 with the steps from the description:

#!/bin/bash

systemctl restart openvswitch
systemctl restart ovn-northd
ovn-nbctl set-connection ptcp:6641
ovn-sbctl set-connection ptcp:6642

ovs-vsctl set open . external-ids:system_id=hv1 external-ids:ovn-remote=tcp:20.0.30.25:6642 external-ids:ovn-encap-type=geneve external-ids:ovn-encap-ip=20.0.30.25

systemctl restart ovn-controller

cp ovnnb_db.db /var/lib/ovn -f
systemctl restart ovn-northd

for i in $(ovn-nbctl --bare --columns name find logical_switch_port type=\"\"); do
    vm=$(echo $i | cut -f 1 -d "-")
    ovs-vsctl add-port br-int $vm -- set interface $vm type=internal
    ovs-vsctl set interface $vm external_ids:iface-id=$i
done

for s in $(ovn-nbctl list logical_switch | grep -E "^name" | cut -f 2 -d ':' | cut -f 2 -d '"'); do ovn-nbctl ls-del $s; done

[root@dell-per740-12 bz1787318]# rpm -qa | grep -E "openvswitch|ovn"
openvswitch2.12-2.12.0-21.el7fdp.x86_64
ovn2.12-2.12.0-19.el7fdp.x86_64
ovn2.12-host-2.12.0-19.el7fdp.x86_64
openvswitch-selinux-extra-policy-1.0-14.el7fdp.noarch
ovn2.12-central-2.12.0-19.el7fdp.x86_64

Log in /var/log/messages:

Feb  4 04:17:12 dell-per740-12 kernel: ovn-controller[109991]: segfault at 45ed761a8 ip 000056012d39b027 sp 00007fff94ebd890 error 4 in ovn-controller[56012d2b4000+23d000]
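
The kernel message above gives both the faulting instruction pointer and the base address of the mapping, so the offset of the crash inside the mapping can be computed. A minimal sketch, assuming the mapping starts at file offset 0 of the ovn-controller binary (not verified in this report):

```shell
#!/bin/bash
# Line copied from /var/log/messages as quoted in this comment.
line='Feb  4 04:17:12 dell-per740-12 kernel: ovn-controller[109991]: segfault at 45ed761a8 ip 000056012d39b027 sp 00007fff94ebd890 error 4 in ovn-controller[56012d2b4000+23d000]'

# Extract the instruction pointer and the mapping base address (both hex).
ip=$(echo "$line" | sed -E 's/.* ip ([0-9a-f]+) .*/\1/')
base=$(echo "$line" | sed -E 's/.*\[([0-9a-f]+)\+[0-9a-f]+\]$/\1/')

# Offset of the faulting instruction relative to the start of the mapping.
offset=$(printf '0x%x' $(( 0x$ip - 0x$base )))
echo "$offset"
```

For the line above this prints 0xe7027. With matching debuginfo installed, such an offset could be handed to a symbolizer like addr2line to locate the crashing function; that step was not run as part of this report.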


Verified on ovn2.12-2.12.0-26:

[root@dell-per740-12 bz1787318]# rpm -qa | grep -E "openvswitch|ovn"
openvswitch2.12-2.12.0-21.el7fdp.x86_64
ovn2.12-2.12.0-26.el7fdp.x86_64
ovn2.12-central-2.12.0-26.el7fdp.x86_64
openvswitch-selinux-extra-policy-1.0-14.el7fdp.noarch
ovn2.12-host-2.12.0-26.el7fdp.x86_64

No segfault is logged in /var/log/messages.
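
The log check can also be scripted. A minimal sketch, assuming kernel messages are written to a syslog file in the format quoted earlier in this comment; the helper name count_segfaults is hypothetical:

```shell
#!/bin/bash
# count_segfaults LOGFILE
# Print how many kernel segfault lines for ovn-controller appear in LOGFILE.
# Hypothetical helper, not part of OVN or OVS.
count_segfaults() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so mask the exit status to keep the helper usable under "set -e".
    grep -c 'kernel: ovn-controller\[[0-9]*\]: segfault at' "$1" || true
}
```

Running count_segfaults /var/log/messages after the reproducer would be expected to print 0 on a fixed build, provided the log has not been rotated in between.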

Comment 5 errata-xmlrpc 2020-03-10 10:08:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0752