Created attachment 1398219
engine log

Description of problem:
[DB] [DNS] - Updating the host's capabilities while running a VM may cause 'ERROR: duplicate key value violates unique constraint "name_server_pkey"'

The issue is that the host's dns_configuration is part of the vds_dynamic table. If the table is updated outside of getCapabilities (for example, it is updated when running a VM) and getCaps (or another update of the vdsDynamic table) runs at the same time, a race can happen.

These are the two problematic lines:

    removeNameServersByDnsResolverConfigurationId(entity.getId());
    saveNameServersByDnsResolverConfigurationId(entity.getId(), entity.getNameServers());

If a context switch happens during or after the removal in thread A, and another thread (thread B) updates the DNS configuration in the meantime, then thread A hits the duplicate name server error when it resumes and runs the save.

Caused by: org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint "name_server_pkey"
  Detail: Key (dns_resolver_configuration_id, address)=(86405e21-b3fd-4968-aa66-287daa178a81, 10.35.28.28) already exists.
  Where: SQL statement "INSERT INTO name_server( address, position, dns_resolver_configuration_id) VALUES ( v_address, v_position, v_dns_resolver_configuration_id)"
    PL/pgSQL function insertnameserver(uuid,character varying,smallint) line 3 at SQL statement

Version-Release number of selected component (if applicable):
4.2.2-0.1.el7

How reproducible:
100%

Steps to Reproduce:
1. Refresh caps during a start VM operation

Actual results:
The duplicate name server error occurs.
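The interleaving described in this report can be reproduced in a minimal, self-contained sketch. Everything here is an assumption for illustration: `NameServerRaceSketch`, `NameServerTable`, and the `replace` method are hypothetical names, an in-memory set stands in for the `name_server` primary key, and the synchronized `replace` is only one possible way to make remove-then-save atomic, not necessarily the fix that shipped in the engine.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class NameServerRaceSketch {

    // Hypothetical in-memory stand-in for the name_server table. The set models
    // the primary key (dns_resolver_configuration_id, address).
    static final class NameServerTable {
        private final Set<String> keys = new HashSet<>();

        void remove(String configId) {
            keys.removeIf(k -> k.startsWith(configId + "|"));
        }

        void save(String configId, List<String> addresses) {
            for (String address : addresses) {
                if (!keys.add(configId + "|" + address)) {
                    throw new IllegalStateException(
                            "duplicate key: (" + configId + ", " + address + ")");
                }
            }
        }

        // Assumed mitigation: run remove + save as one atomic unit.
        synchronized void replace(String configId, List<String> addresses) {
            remove(configId);
            save(configId, addresses);
        }

        synchronized int size() {
            return keys.size();
        }
    }

    // Replays the problematic interleaving step by step on a single thread,
    // so the outcome is deterministic. Returns true if the second save fails.
    static boolean interleavedUpdateFails() {
        NameServerTable table = new NameServerTable();
        String id = "cfg-1";
        List<String> servers = List.of("10.35.28.28");
        table.save(id, servers);   // row that exists before either update
        table.remove(id);          // thread A: remove, then context switch
        table.remove(id);          // thread B: remove
        table.save(id, servers);   // thread B: save
        try {
            table.save(id, servers); // thread A resumes: save hits the existing key
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Two real threads perform the same update through the atomic replace.
    // Returns the number of rows left in the table.
    static int atomicUpdateRowCount() {
        NameServerTable table = new NameServerTable();
        String id = "cfg-1";
        List<String> servers = List.of("10.35.28.28");
        Thread a = new Thread(() -> table.replace(id, servers));
        Thread b = new Thread(() -> table.replace(id, servers));
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return table.size();
    }

    public static void main(String[] args) {
        System.out.println("interleaved update fails: " + interleavedUpdateFails());
        System.out.println("rows after atomic updates: " + atomicUpdateRowCount());
    }
}
```

In a real database the equivalent of `replace` would have to be enforced at the database or transaction level (for example by serializing updates per host), since a JVM-level lock only models the idea.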
This bug seems to have caused a failure to start a VM, so I am raising the severity:
http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1064/testReport/junit/(root)/004_basic_sanity/run_vms/
Is this on track for 4.2.2? If not, please defer to 4.2.3.
- Network status is PASS (manual testing)
- No regression introduced (Network tier2 is PASS)
- Waiting for an ACK from the Virt team, as they saw this bug multiple times in their automation runs, and to make sure no regression was introduced there as well.

Tested on 4.2.2.5-0.1.el7 and vdsm-4.20.23-1.el7ev.x86_64

Keeping ON_QA until the ACK from the Virt team.
In Virt we don't see it anymore. Michael, you can verify it.
Israel
Thanks Israel. Verified on 4.2.2.5-0.1.el7 and vdsm-4.20.23-1.el7ev.x86_64
This bugzilla is included in oVirt 4.2.2 release, published on March 28th 2018. Since the problem described in this bug report should be resolved in oVirt 4.2.2 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.