Bug 1443687 - Few Completed/Failed builds are missing duration
Summary: Few Completed/Failed builds are missing duration
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Build
Version: 3.5.1
Hardware: Unspecified
OS: Unspecified
medium
low
Target Milestone: ---
: ---
Assignee: Ben Parees
QA Contact: Vikas Laad
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2017-04-19 18:04 UTC by Vikas Laad
Modified: 2017-08-16 19:51 UTC (History)
6 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Build durations are recorded as part of a storage hook Consequence: Build duration is sometimes calculated incorrectly and reported with an invalid value Fix: Calculate build duration when recording build time of build completion Result: Build durations are reported correctly and align with the build start/completion times
Clone Of:
Environment:
Last Closed: 2017-08-10 05:20:02 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
completed build json (2.90 KB, text/plain)
2017-04-19 18:04 UTC, Vikas Laad
no flags Details
failed build desc (1.85 KB, text/plain)
2017-04-19 18:04 UTC, Vikas Laad
no flags Details
completed build desc (1.91 KB, text/plain)
2017-04-19 18:05 UTC, Vikas Laad
no flags Details
build json v3.6.65 (5.14 KB, text/plain)
2017-05-08 18:04 UTC, Vikas Laad
no flags Details
build desc v3.6.65 (2.20 KB, text/plain)
2017-05-08 18:04 UTC, Vikas Laad
no flags Details
logs with loglevel5 v3.6.65 (231.04 KB, text/plain)
2017-05-08 19:53 UTC, Vikas Laad
no flags Details
build json v3.6.65 (5.31 KB, text/plain)
2017-05-09 18:39 UTC, Vikas Laad
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2017:1716 0 normal SHIPPED_LIVE Red Hat OpenShift Container Platform 3.6 RPM Release Advisory 2017-08-10 09:02:50 UTC

Description Vikas Laad 2017-04-19 18:04:08 UTC
Created attachment 1272696 [details]
completed build json

Description of problem:
Some of the Completed/Failed builds are missing duration

proj9       cakephp-mysql-example-19   Source    Git@0014dde   Complete                             2 hours ago         20s
proj9       cakephp-mysql-example-2    Source    Git@0014dde   Complete                             2 hours ago         8s
proj9       cakephp-mysql-example-20   Source    Git@0014dde   Complete                             2 hours ago         
proj9       cakephp-mysql-example-21   Source    Git@0014dde   Complete                             2 hours ago         5s
proj9       cakephp-mysql-example-22   Source    Git@0014dde   Complete                             2 hours ago         48s
proj9       cakephp-mysql-example-23   Source    Git@0014dde   Complete                             2 hours ago         
proj9       cakephp-mysql-example-24   Source    Git@0014dde   Complete                             2 hours ago         35s
proj9       cakephp-mysql-example-25   Source    Git@0014dde   Complete                             2 hours ago         
proj9       cakephp-mysql-example-26   Source    Git@0014dde   Complete                             2 hours ago         46s
proj9       cakephp-mysql-example-27   Source    Git@0014dde   Complete                             2 hours ago         42s

proj13      cakephp-mysql-example-38   Source    Git@0014dde   Failed (PushImageToRegistryFailed)   5 minutes ago       4m47s
proj15      cakephp-mysql-example-36   Source    Git@0014dde   Failed                               About an hour ago   
proj17      cakephp-mysql-example-36   Source    Git@0014dde   Failed                               About an hour ago   2m15s
proj19      cakephp-mysql-example-36   Source    Git@0014dde   Failed                               About an hour ago   
proj2       cakephp-mysql-example-36   Source    Git@0014dde   Failed                               About an hour ago   1m36s



Version-Release number of selected component (if applicable):
openshift v3.5.5.7
kubernetes v1.5.2+43a9be4
etcd 3.1.0

Client:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-common-1.12.6-16.el7.x86_64
 Go version:      go1.7.4
 Git commit:      3a094bd/1.12.6
 Built:           Tue Mar 21 13:30:59 2017
 OS/Arch:         linux/amd64

Server:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-common-1.12.6-16.el7.x86_64
 Go version:      go1.7.4
 Git commit:      3a094bd/1.12.6
 Built:           Tue Mar 21 13:30:59 2017
 OS/Arch:         linux/amd64

Steps to Reproduce:
1. Create 20 cakephp build configs
2. start 20 concurrent builds
3. This happens after few builds are completed fine

Actual results:
Some builds are missing duration column

Expected results:
All the builds should have duration column populated


Additional info:
Please see attached description of Failed and Completed builds.

Comment 1 Vikas Laad 2017-04-19 18:04:45 UTC
Created attachment 1272698 [details]
failed build desc

Comment 2 Vikas Laad 2017-04-19 18:05:15 UTC
Created attachment 1272699 [details]
completed build desc

Comment 3 Ben Parees 2017-04-21 09:24:53 UTC
the build in https://bugzilla.redhat.com/attachment.cgi?id=1272696 at least shows a started timestamp and a completion timestamp, which makes me think you're hitting this:
https://github.com/openshift/origin/issues/12549

which was fixed here:
https://github.com/openshift/origin/pull/12569

Comment 4 Vikas Laad 2017-05-08 18:03:51 UTC
Tested with following version of openshift and docker, I still see the problem. Please see attachments for details

root@ip-172-31-4-124: ~ # openshift version
openshift v3.6.65
kubernetes v1.6.1+5115d708d7
etcd 3.1.0
root@ip-172-31-4-124: ~ # docker version
Client:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-1.12.6-19.git92b10e4.el7.x86_64
 Go version:      go1.7.4
 Git commit:      92b10e4/1.12.6
 Built:           Tue May  2 15:06:29 2017
 OS/Arch:         linux/amd64

Server:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-1.12.6-19.git92b10e4.el7.x86_64
 Go version:      go1.7.4
 Git commit:      92b10e4/1.12.6
 Built:           Tue May  2 15:06:29 2017
 OS/Arch:         linux/amd64

Comment 5 Vikas Laad 2017-05-08 18:04:18 UTC
Created attachment 1277151 [details]
build json v3.6.65

Comment 6 Vikas Laad 2017-05-08 18:04:38 UTC
Created attachment 1277152 [details]
build desc v3.6.65

Comment 7 Ben Parees 2017-05-08 18:18:41 UTC
I'm not seeing a missing duration, but I am seeing an incorrect duration (it's reported as 0s):

Started:		Mon, 08 May 2017 13:53:19 EDT
Duration:		0s
  FetchInputs:		  1s
  CommitContainer:	  14s
  Assemble:		  59s
  PostCommit:		  0s
  PushImage:		  14s


The startTimestamp value is the same as the completionTimestamp value.  This is likely because we introduced logic to force-set the startTimestamp to the same value as the completionTimestamp, if the startTimestamp was not already set when the build completes.

That should only happen when a build completes so fast that the pod controller doesn't even see the pod "start", so we never updated the startTimestamp value.  However this build clearly took quite a bit of time, so it's not obvious to me why the pod controller would not have seen the pod in a running state and updated the build with a start timestamp before the completion occurred.

if you can recreate this with loglevel 5 logs for both the master and the build in question, that would be helpful for us to track down the chain of events leading to this.

Comment 8 Vikas Laad 2017-05-08 18:22:34 UTC
Hi Ben,

This is how I see the completed builds where duration is missing in oc get builds command

proj6       cakephp-mysql-example-7    Source    Git@0014dde   Complete   13 minutes ago   4s
proj6       cakephp-mysql-example-8    Source    Git@0014dde   Complete   11 minutes ago   
proj6       cakephp-mysql-example-9    Source    Git@0014dde   Complete   8 minutes ago    6s
proj7       cakephp-mysql-example-1    Source    Git@0014dde   Complete   40 minutes ago   2m17s
proj7       cakephp-mysql-example-10   Source    Git@0014dde   Complete   8 minutes ago    
proj7       cakephp-mysql-example-11   Source    Git@0014dde   Complete   5 minutes ago    
proj7       cakephp-mysql-example-12   Source    Git@0014dde   Complete   2 minutes ago    
proj7       cakephp-mysql-example-2    Source    Git@0014dde   Complete   30 minutes ago   11s

I am re-running the test with loglevel 5 will update the bug again.

Comment 9 Vikas Laad 2017-05-08 19:53:50 UTC
Created attachment 1277163 [details]
logs with loglevel5 v3.6.65

Comment 11 Ben Parees 2017-05-08 20:58:02 UTC
do you know the name of one of the builds that had a zero/missing duration in this run?

Comment 12 Vikas Laad 2017-05-09 13:15:11 UTC
Yes, proj26/builds/cakephp-mysql-example-15 Logs of this build with logleve 5 is also attached.

Comment 13 Vikas Laad 2017-05-09 18:39:57 UTC
Created attachment 1277421 [details]
build json v3.6.65

Comment 14 Ben Parees 2017-05-10 19:12:25 UTC
found the issue, PR here:
https://github.com/openshift/origin/pull/14131

Comment 16 Vikas Laad 2017-06-05 18:25:30 UTC
Verified on following version

openshift v3.6.94
kubernetes v1.6.1+5115d708d7
etcd 3.1.0


Complete many iterations of 50 concurrent builds where lot of builds also failed due to  context deadline exceeded errors from docker. All the builds have Duration field populated.

Comment 18 errata-xmlrpc 2017-08-10 05:20:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1716


Note You need to log in before you can comment on or make changes to this bug.