Bug 1917579

Summary: DNS daemonset rollout is taking 5.27 hours on a 250 node cluster during the upgrades
Product: OpenShift Container Platform
Reporter: Naga Ravi Chaitanya Elluri <nelluri>
Component: Networking
Assignee: aos-network-edge-staff <aos-network-edge-staff>
Networking sub component: DNS
QA Contact: Hongan Li <hongli>
Status: CLOSED DUPLICATE
Docs Contact:
Severity: unspecified
Priority: unspecified
CC: amcdermo, aos-bugs, nelluri, sdodson, sgreene, wking
Version: 4.7
Keywords: Upgrades
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Linux
Whiteboard: aos-scalability-47
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-01-19 17:53:12 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Naga Ravi Chaitanya Elluri 2021-01-18 20:00:04 UTC
Description of problem:
The DNS cluster operator takes ~5 hours to roll out its daemonsets on a 250 node cluster during the upgrade from 4.6.9 -> 4.7.0-fc.2, even though max unavailable replicas is set to 10% instead of 1 after patching https://bugzilla.redhat.com/show_bug.cgi?id=1880148:

  updateStrategy:
    rollingUpdate:
      maxUnavailable: 10%
    type: RollingUpdate

The replicas still seem to roll out one at a time: https://snapshot.raintank.io/dashboard/snapshot/NOB8VPaH1UUNQIdl2gl5CwREdiPiWYwX.

Logs: http://dell-r510-01.perf.lab.eng.rdu2.redhat.com/large-scale/4.7-sdn-kube-1.20/bugs/dns-slow-rollout-upgrades/
 
Version-Release number of selected component (if applicable):
4.7.0-fc.2

How reproducible:
We just ran the upgrade once for this version.

Steps to Reproduce:
1. Install a large scale cluster using 4.6.9 bits.
2. Upgrade to 4.7.0-fc.2.
3. Monitor the upgrade timing of the DNS cluster operator.

Actual results:
1 replica of DNS daemonset is rolled out at a time.

Expected results:
10% of the replicas are rolled out at a time.
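
To put the actual and expected results in numbers, here is a rough back-of-the-envelope sketch. It uses only the figures from this report (250 nodes, 5.27 hours, maxUnavailable 10%). The helper name max_unavailable_pods is made up for illustration, the round-up of percentages follows what the Kubernetes documentation describes for DaemonSets, and the estimate assumes a batch of pods takes about as long to update as a single pod does today, which may not hold in practice.

```python
def max_unavailable_pods(desired_pods: int, max_unavailable: str) -> int:
    """Resolve a maxUnavailable value such as "10%" or "1" to a pod count.

    Hypothetical helper for illustration only. Percentages are rounded up,
    which is an assumption based on the documented DaemonSet behaviour.
    """
    if max_unavailable.endswith("%"):
        percent = int(max_unavailable[:-1])
        # Integer ceiling of desired_pods * percent / 100.
        return (desired_pods * percent + 99) // 100
    return int(max_unavailable)


NODES = 250           # cluster size from this report
ROLLOUT_HOURS = 5.27  # observed rollout time from this report

# Pods that may be unavailable at once with maxUnavailable: 10%.
batch = max_unavailable_pods(NODES, "10%")

# Observed pace if pods are really replaced one at a time.
seconds_per_pod = ROLLOUT_HOURS * 3600 / NODES

# Number of batches needed, rounded up, and the resulting rough estimate.
# Assumption: each batch takes as long as one pod takes today.
batches = (NODES + batch - 1) // batch
expected_minutes = batches * seconds_per_pod / 60

print(f"pods per batch at 10%: {batch}")
print(f"observed seconds per pod: {seconds_per_pod:.1f}")
print(f"rough expected rollout: {expected_minutes:.1f} minutes")
```

Under those assumptions, 10% of 250 nodes is 25 pods per batch, the observed pace works out to about 76 seconds per pod, and a rollout in batches of 25 would take on the order of 13 minutes rather than 5 hours.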

Additional info:

Comment 1 Andrew McDermott 2021-01-19 17:53:12 UTC

*** This bug has been marked as a duplicate of bug 1903887 ***