Bug 1547070

Summary: [DB] [DNS] - Updating the host's capabilities while running a VM may cause 'ERROR: duplicate key value violates unique constraint "name_server_pkey"'
Product: [oVirt] ovirt-engine
Component: BLL.Network
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: high
Version: 4.2.1.4
Target Milestone: ovirt-4.2.2
Target Release: ---
Hardware: x86_64
OS: Linux
Reporter: Michael Burman <mburman>
Assignee: Alona Kaplan <alkaplan>
QA Contact: Michael Burman <mburman>
Docs Contact:
CC: alkaplan, bugs, danken, ipinto, lrotenbe, ylavi
Flags: rule-engine: ovirt-4.2+, ylavi: exception+
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-04-05 09:39:11 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Network
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments: engine log (flags: none)

Description Michael Burman 2018-02-20 13:09:59 UTC
Created attachment 1398219
engine log

Description of problem:
[DB] [DNS] - Updating the host's capabilities while running a VM may cause 'ERROR: duplicate key value violates unique constraint "name_server_pkey"'

The issue is that the host's dns_configuration is part of the vds_dynamic table.
If the table is updated outside of getCapabilities (for example, when running a VM) while getCaps (or another update of the vdsDynamic table) is performed at the same time, a race can occur.

These are the two problematic lines:

removeNameServersByDnsResolverConfigurationId(entity.getId());
saveNameServersByDnsResolverConfigurationId(entity.getId(), entity.getNameServers());

If a context switch happens during or after the removal in thread A and another thread (thread B) updates the DNS configuration in the meantime, thread A's save fails with the duplicate name server error once it resumes.

Caused by: org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint "name_server_pkey"
  Detail: Key (dns_resolver_configuration_id, address)=(86405e21-b3fd-4968-aa66-287daa178a81, 10.35.28.28) already exists.
  Where: SQL statement "INSERT INTO
    name_server(
      address,
      position,
      dns_resolver_configuration_id)
    VALUES (
      v_address,
      v_position,
      v_dns_resolver_configuration_id)"
PL/pgSQL function insertnameserver(uuid,character varying,smallint) line 3 at SQL statement
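
For illustration, here is a minimal sketch of the interleaving described above. It is not the ovirt-engine code: the class NameServerRaceSketch, its in-memory "table", and the replace() method are hypothetical stand-ins, and the synchronized block is only one possible way to make remove-and-save atomic, not necessarily the fix that was applied. The interleaving is replayed step by step on a single thread so that the failure is deterministic.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.AtomicBoolean;

class NameServerRaceSketch {
    // Hypothetical in-memory stand-in for the name_server table.
    // Each entry models the primary key (dns_resolver_configuration_id, address).
    private final Set<String> table = new HashSet<>();

    // Models removeNameServersByDnsResolverConfigurationId.
    void remove(String configId) {
        table.removeIf(key -> key.startsWith(configId + "|"));
    }

    // Models saveNameServersByDnsResolverConfigurationId; rejects duplicates
    // the way the unique constraint would.
    void save(String configId, List<String> addresses) {
        for (String address : addresses) {
            if (!table.add(configId + "|" + address)) {
                throw new IllegalStateException(
                        "duplicate key (" + configId + ", " + address + ")");
            }
        }
    }

    // Assumed mitigation: remove and save run as one critical section.
    synchronized void replace(String configId, List<String> addresses) {
        remove(configId);
        save(configId, addresses);
    }

    // Replays the problematic interleaving: A removes, B removes and saves,
    // then A saves. Returns true if A's save hits the duplicate key.
    static boolean interleavedUpdateFails() {
        NameServerRaceSketch db = new NameServerRaceSketch();
        List<String> servers = Collections.singletonList("10.35.28.28");
        db.save("cfg-1", servers);      // initial state
        db.remove("cfg-1");             // thread A: remove
        db.remove("cfg-1");             // thread B: remove (after context switch)
        db.save("cfg-1", servers);      // thread B: save
        try {
            db.save("cfg-1", servers);  // thread A resumes: save
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Runs two real threads through the synchronized replace().
    // Returns true if any update failed with a duplicate key.
    static boolean serializedUpdateFails() {
        NameServerRaceSketch db = new NameServerRaceSketch();
        List<String> servers = Collections.singletonList("10.35.28.28");
        AtomicBoolean failed = new AtomicBoolean(false);
        Runnable update = () -> {
            try {
                for (int i = 0; i < 1000; i++) {
                    db.replace("cfg-1", servers);
                }
            } catch (IllegalStateException e) {
                failed.set(true);
            }
        };
        Thread a = new Thread(update);
        Thread b = new Thread(update);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return failed.get();
    }

    public static void main(String[] args) {
        System.out.println("interleaved update fails: " + interleavedUpdateFails());
        System.out.println("serialized update fails: " + serializedUpdateFails());
    }
}
```

In a real database the same effect would more likely be achieved with locking or a single transaction around both statements; the sketch only shows why the two calls must not be interleaved between threads.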


Version-Release number of selected component (if applicable):
4.2.2-0.1.el7

How reproducible:
100%

Steps to Reproduce:
1. Refresh the host's capabilities while a VM start operation is in progress

Actual results:
The duplicate name server error occurs.

Comment 1 Dan Kenigsberg 2018-03-13 12:40:06 UTC
This bug seems to have caused a failure to start a VM; raising severity.

http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1064/testReport/junit/(root)/004_basic_sanity/run_vms/

Comment 2 Yaniv Kaul 2018-03-15 14:04:13 UTC
Is this on track to 4.2.2? If not, please defer to 4.2.3.

Comment 3 Michael Burman 2018-03-29 09:03:02 UTC
- Network status is PASS (manual testing)
- No regression introduced (Network tier2 is PASS)
- Waiting for an ACK from the Virt team, since they saw this bug multiple times in their automation runs and need to confirm that no regression was introduced there either.

Tested on - 4.2.2.5-0.1.el7 and vdsm-4.20.23-1.el7ev.x86_64 

Keeping ON_QA until the ACK from Virt team.

Comment 4 Israel Pinto 2018-04-01 12:18:02 UTC
In Virt we don't see it anymore. 
Michael, you can verify it.

Israel

Comment 5 Michael Burman 2018-04-01 14:18:01 UTC
Thanks Israel, 

Verified on - 4.2.2.5-0.1.el7 and vdsm-4.20.23-1.el7ev.x86_64

Comment 6 Sandro Bonazzola 2018-04-05 09:39:11 UTC
This bugzilla is included in oVirt 4.2.2 release, published on March 28th 2018.

Since the problem described in this bug report should be
resolved in oVirt 4.2.2 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.