Bug 1786675

Summary: IPI on upshift openstack failed due to 'Security group rule already exists' error
Product: OpenShift Container Platform Reporter: Johnny Liu <jialiu>
Component: Installer Assignee: Adolfo Duarte <adduarte>
Installer sub component: OpenShift on OpenStack QA Contact: David Sanz <dsanzmor>
Status: CLOSED CURRENTRELEASE Docs Contact:
Severity: high    
Priority: high CC: adduarte, mfedosin, ppitonak, pprinett, xtian
Version: 4.3.0 Keywords: TestBlocker
Target Milestone: ---   
Target Release: 4.4.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1788062 1788585 Environment:
Last Closed: 2020-04-22 21:54:12 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1788062, 1788585    

Description Johnny Liu 2019-12-27 03:33:16 UTC
Description of problem:

Version-Release number of the following components:
4.3.0-0.nightly-2019-12-25-124912

How reproducible:
Always

Steps to Reproduce:
1. Trigger an IPI install on upshift OSP.

Actual results:
Installation failed with the following terraform error log:
<--snip-->
level=debug msg="module.masters.openstack_compute_instance_v2.master_conf[1]: Creation complete after 49s [id=37039f09-bd05-4815-8c63-989cf0c0eeb0]"
level=debug msg="module.bootstrap.openstack_compute_instance_v2.bootstrap: Creation complete after 49s [id=ec459c7d-fca2-4713-9b8d-30593aaa45fc]"

level=debug msg="module.masters.openstack_compute_instance_v2.master_conf[0]: Still creating... [50s elapsed]"

level=debug msg="module.masters.openstack_compute_instance_v2.master_conf[0]: Creation complete after 59s [id=0790db4d-5277-4186-ae08-9194e9d060b1]"
level=error
level=error msg="Error: Error creating openstack_networking_secgroup_rule_v2: Expected HTTP response code [] when accessing [POST https://rhos-d.infra.prod.upshift.rdu2.redhat.com:13696/v2.0/security-group-rules], but got 409 instead"
level=error msg="{\"NeutronError\": {\"message\": \"Security group rule already exists. Rule id is 566d9f7c-a8d5-476c-8073-2ff193f0fe25.\", \"type\": \"SecurityGroupRuleExists\", \"detail\": \"\"}}"
level=error
level=error msg="  on ../../../../../tmp/openshift-install-677071673/topology/sg-master.tf line 231, in resource \"openstack_networking_secgroup_rule_v2\" \"master_ingress_kubelet_secure_from_worker\":"
level=error msg=" 231: resource \"openstack_networking_secgroup_rule_v2\" \"master_ingress_kubelet_secure_from_worker\" {"
level=error
level=error
level=error
level=error msg="Error: Error creating openstack_networking_secgroup_rule_v2: Expected HTTP response code [] when accessing [POST https://rhos-d.infra.prod.upshift.rdu2.redhat.com:13696/v2.0/security-group-rules], but got 409 instead"
level=error msg="{\"NeutronError\": {\"message\": \"Security group rule already exists. Rule id is b0efd6d6-e5e6-4a70-afd8-5d64fb08009b.\", \"type\": \"SecurityGroupRuleExists\", \"detail\": \"\"}}"
level=error
level=error msg="  on ../../../../../tmp/openshift-install-677071673/topology/sg-master.tf line 261, in resource \"openstack_networking_secgroup_rule_v2\" \"master_ingress_services_udp\":"
level=error msg=" 261: resource \"openstack_networking_secgroup_rule_v2\" \"master_ingress_services_udp\" {"
level=error
level=error
level=error
level=error msg="Error: Error creating openstack_networking_secgroup_rule_v2: Expected HTTP response code [] when accessing [POST https://rhos-d.infra.prod.upshift.rdu2.redhat.com:13696/v2.0/security-group-rules], but got 409 instead"
level=error msg="{\"NeutronError\": {\"message\": \"Security group rule already exists. Rule id is 54b43ba0-a394-4591-86b9-bc8592a7ba70.\", \"type\": \"SecurityGroupRuleExists\", \"detail\": \"\"}}"
level=error
level=error msg="  on ../../../../../tmp/openshift-install-677071673/topology/sg-master.tf line 271, in resource \"openstack_networking_secgroup_rule_v2\" \"master_ingress_vrrp\":"
level=error msg=" 271: resource \"openstack_networking_secgroup_rule_v2\" \"master_ingress_vrrp\" {"
level=error
level=error
level=error
level=error msg="Error: Error creating openstack_networking_secgroup_rule_v2: Expected HTTP response code [] when accessing [POST https://rhos-d.infra.prod.upshift.rdu2.redhat.com:13696/v2.0/security-group-rules], but got 409 instead"
level=error msg="{\"NeutronError\": {\"message\": \"Security group rule already exists. Rule id is bdcb485d-763e-41ea-9afa-2258284824d7.\", \"type\": \"SecurityGroupRuleExists\", \"detail\": \"\"}}"
level=error
level=error msg="  on ../../../../../tmp/openshift-install-677071673/topology/sg-worker.tf line 19, in resource \"openstack_networking_secgroup_rule_v2\" \"worker_ingress_ssh\":"
level=error msg="  19: resource \"openstack_networking_secgroup_rule_v2\" \"worker_ingress_ssh\" {"
level=error
level=error
level=error
level=error msg="Error: Error creating openstack_networking_secgroup_rule_v2: Expected HTTP response code [] when accessing [POST https://rhos-d.infra.prod.upshift.rdu2.redhat.com:13696/v2.0/security-group-rules], but got 409 instead"
level=error msg="{\"NeutronError\": {\"message\": \"Security group rule already exists. Rule id is 0ffcba9c-a3f1-4a8d-99ec-4d9817671ecd.\", \"type\": \"SecurityGroupRuleExists\", \"detail\": \"\"}}"
level=error
level=error msg="  on ../../../../../tmp/openshift-install-677071673/topology/sg-worker.tf line 150, in resource \"openstack_networking_secgroup_rule_v2\" \"worker_ingress_kubelet_insecure\":"
level=error msg=" 150: resource \"openstack_networking_secgroup_rule_v2\" \"worker_ingress_kubelet_insecure\" {"
level=error
level=error
level=fatal msg="failed to fetch Cluster: failed to generate asset \"Cluster\": failed to create cluster: failed to apply using Terraform"
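
To see at a glance which Terraform resources hit the 409, the log above can be summarized with a short script. This is only a sketch: the function name `failed_rules` is hypothetical, and the patterns assume the exact line shapes shown in this log (a "Rule id is <uuid>" line followed by an "on <file>.tf line N, in resource ..." line, with quotes escaped as `\"`).

```python
import re

# Patterns assume the log format shown in this report; other installer
# versions may format these lines differently.
RULE_ID = re.compile(r"Rule id is ([0-9a-f-]{36})")
RESOURCE = re.compile(
    r'on \S*?([\w-]+\.tf) line (\d+), in resource \\"\w+\\" \\"(\w+)\\"'
)


def failed_rules(log_text):
    """Pair each reported rule id with the Terraform resource that failed.

    Returns a list of (tf_file, line_number, resource_name, rule_id) tuples.
    """
    results = []
    pending_id = None
    for line in log_text.splitlines():
        match = RULE_ID.search(line)
        if match:
            # Remember the id until the matching "on ... line N" entry appears.
            pending_id = match.group(1)
            continue
        match = RESOURCE.search(line)
        if match and pending_id is not None:
            results.append(
                (match.group(1), int(match.group(2)), match.group(3), pending_id)
            )
            pending_id = None
    return results
```

Run against the log above, this would list the five affected rules from sg-master.tf and sg-worker.tf together with the ids Neutron reported for them.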

Expected results:
Installation succeeds.

Additional info:
After the failure, querying upshift OpenStack with the neutron client shows that the security group rules had indeed already been created. I am not sure whether this is caused by a performance issue on upshift OpenStack.
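
Since the rules do exist after the failure, one way a client could tolerate this response is to treat a 409 of type SecurityGroupRuleExists as "already created" and reuse the reported rule id. The sketch below illustrates that idea only; it is not the approach taken by the fix PR, which is not described in this report. `Conflict`, `existing_rule_id`, and `ensure_rule` are hypothetical names, and the message format is assumed to match the one in the log.

```python
import json


class Conflict(Exception):
    """Hypothetical stand-in for an HTTP 409 response carrying Neutron's body."""

    def __init__(self, body):
        super().__init__(body)
        self.body = body


def existing_rule_id(body):
    """Return the rule id from a SecurityGroupRuleExists body, else None."""
    try:
        err = json.loads(body)["NeutronError"]
    except (ValueError, KeyError, TypeError):
        return None
    if not isinstance(err, dict) or err.get("type") != "SecurityGroupRuleExists":
        return None
    # Message format as seen in the log: "... Rule id is <uuid>."
    marker = "Rule id is "
    message = err.get("message", "")
    index = message.find(marker)
    if index < 0:
        return None
    return message[index + len(marker):].rstrip(".").strip()


def ensure_rule(create):
    """Call create(); if Neutron says the rule exists, return the existing id.

    `create` is any callable that returns the new rule id or raises Conflict.
    Any other conflict is re-raised unchanged.
    """
    try:
        return create()
    except Conflict as exc:
        rule_id = existing_rule_id(exc.body)
        if rule_id is None:
            raise
        return rule_id
```

Whether reusing the existing rule is safe depends on why the duplicate request was sent in the first place, which this report does not establish.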

Comment 9 David Sanz 2020-01-08 09:50:05 UTC
Verified on 4.4.0-0.nightly-2020-01-08-072157

Comment 13 Johnny Liu 2020-01-08 11:25:57 UTC
Hmm, it seems upshift made some enhancements that improved performance, and I can no longer reproduce this bug. I tried the following builds, and all of them succeeded.
4.2.1
4.4.0-0.nightly-2020-01-07-172830
4.4.0-0.nightly-2020-01-05-221122
4.3.0-0.nightly-2020-01-08-005052
4.3.0-0.nightly-2020-01-01-081457
4.3.0-0.nightly-2020-01-06-005750


Of the above builds, only 4.4.0-0.nightly-2020-01-07-172830 contains the fix PR.

What QE can do now is ensure that the fix PR does not introduce any regressions.