Bug 1825623

Summary: RHCOS crio tests failing (version parsing?)
Product: OpenShift Container Platform Reporter: Colin Walters <walters>
Component: Node Assignee: Peter Hunt <pehunt>
Node sub component: CRI-O QA Contact: Sunil Choudhary <schoudha>
Status: CLOSED UPSTREAM Docs Contact:
Severity: unspecified    
Priority: unspecified CC: aos-bugs, jhonce, jokerman, miabbott, mpatel, nagrawal, pehunt, umohnani, wking
Version: 4.5   
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-08-17 17:00:06 UTC Type: Bug

Description Colin Walters 2020-04-19 12:25:04 UTC
We have a coreos-assembler test for RHCOS:
https://github.com/coreos/coreos-assembler/blob/master/mantle/kola/tests/crio/crio.go

That just started failing after:

Upgraded:
  cri-o 1.17.3-1.rhaos4.4.el8 -> 1.18.0-1.rhaos4.5.el8

Job link:
https://jenkins-rhcos-art.cloud.privileged.psi.redhat.com/job/rhcos-art-rhcos-4.5/981/console

The relevant logs (you have to dig into the kola_artifacts.zip) are:

Apr 19 11:57:04.838286 crio[1615]: time="2020-04-19 11:57:04.838064771Z" level=info msg="Using conmon executable: /usr/libexec/crio/conmon"
Apr 19 11:57:04.838760 crio[1615]: time="2020-04-19 11:57:04.838744324Z" level=info msg="No seccomp profile specified, using the internal default"
Apr 19 11:57:04.838812 crio[1615]: time="2020-04-19 11:57:04.838804199Z" level=info msg="AppArmor is disabled by the system or at CRI-O build-time"
Apr 19 11:57:04.851766 crio[1615]: time="2020-04-19 11:57:04.851703693Z" level=info msg="Found CNI network crio (type=bridge) at /etc/cni/net.d/100-crio-bridge.conf"
Apr 19 11:57:04.859226 crio[1615]: time="2020-04-19 11:57:04.859167194Z" level=info msg="Found CNI network 200-loopback.conf (type=loopback) at /etc/cni/net.d/200-loopback.conf"
Apr 19 11:57:04.882144 crio[1615]: time="2020-04-19 11:57:04.881910488Z" level=info msg="Found CNI network podman (type=bridge) at /etc/cni/net.d/87-podman-bridge.conflist"
Apr 19 11:57:04.882144 crio[1615]: time="2020-04-19 11:57:04.881926497Z" level=info msg="Update default CNI network name to crio"
Apr 19 11:57:04.928708 crio[1615]: time="2020-04-19 11:57:04.928654298Z" level=fatal msg="Version string empty"
Apr 19 11:57:04.933869 systemd[1]: crio.service: Main process exited, code=exited, status=1/FAILURE
Apr 19 11:57:04.934014 systemd[1]: crio.service: Failed with result 'exit-code'.

Grepping the source for `Version string empty` led me to the semver parsing code; in the main codebase it looks like that parsing fails when we try to write our version string.

And yeah:

```
[core@cosa-devsh ~]$ crio version
Version:       
GitCommit:     
GitTreeState:  
BuildDate:     
GoVersion:     go1.13.4
Compiler:      gc
Platform:      linux/amd64
Linkmode:      unknown: `ldd crio` failed: ldd: ./crio: No such file or directory
  (exit status 1)
```
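To illustrate the failure mode above, here is a minimal sketch of a semver-style parser that rejects an empty string before doing anything else. `parseMajorMinorPatch` is a hypothetical stand-in written for this example, not the parser CRI-O actually uses; only the error text is taken from the log above.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// parseMajorMinorPatch is a hypothetical, simplified stand-in for a semver
// parser. It only extracts major.minor.patch and ignores any suffix.
func parseMajorMinorPatch(s string) ([3]uint64, error) {
	var out [3]uint64
	if s == "" {
		// Same message as in the crio log; a binary built without an
		// injected version would hit this branch.
		return out, errors.New("Version string empty")
	}
	// Drop any pre-release or build suffix such as "-5.dev.rhaos4.5...".
	core := s
	if i := strings.IndexAny(core, "-+"); i >= 0 {
		core = core[:i]
	}
	parts := strings.Split(core, ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("expected major.minor.patch, got %q", s)
	}
	for i, p := range parts {
		n, err := strconv.ParseUint(p, 10, 64)
		if err != nil {
			return out, fmt.Errorf("invalid numeric component %q: %w", p, err)
		}
		out[i] = n
	}
	return out, nil
}

func main() {
	for _, v := range []string{"1.18.0", ""} {
		parsed, err := parseMajorMinorPatch(v)
		if err != nil {
			fmt.Printf("%q: error: %v\n", v, err)
			continue
		}
		fmt.Printf("%q: %v\n", v, parsed)
	}
}
```

If a daemon treats this error as fatal at startup, an empty `Version:` field like the one shown above is enough to keep the service from starting.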

This is probably related to how ART (Koji/Brew) builds from the lookaside cache rather than from the actual upstream git repositories.  I think there's an UPSTREAM_GIT environment variable or something.

Comment 1 Colin Walters 2020-04-19 12:33:00 UTC
Ah it's SOURCE_GIT_COMMIT, see:
https://github.com/openshift/machine-config-operator/pull/744

But that said...even with that I'm not sure that will have the correct tag.

Oh wow, this "latest-version" script...yeah.

So...what I would do is just hardcode the major.minor in the crio source. If you want an auto-bumping release version, you could try counting git commits since the last tag, etc., but that wouldn't be in the critical path.
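A minimal sketch of that suggestion, assuming a hardcoded base version plus an optional build-time override. The names `baseVersion`, `buildVersion`, and `resolveVersion` are hypothetical and not taken from the CRI-O source; the `-ldflags "-X ..."` mechanism is the standard Go linker option for setting string variables.

```go
package main

import (
	"fmt"
	"strings"
)

// baseVersion is hardcoded in source, so the binary always carries a
// parseable version even when the build system injects nothing.
// (Hypothetical name and value for illustration.)
const baseVersion = "1.18.0"

// buildVersion may be set at link time, for example:
//   go build -ldflags "-X main.buildVersion=1.18.0-42.gabcdef"
// It stays empty when the build does not set it.
var buildVersion string

// resolveVersion prefers the injected value and falls back to the
// hardcoded base, so the injected value is never on the critical path.
func resolveVersion(injected string) string {
	if strings.TrimSpace(injected) == "" {
		return baseVersion
	}
	return injected
}

func main() {
	fmt.Println("Version:", resolveVersion(buildVersion))
}
```

With this layout, a build that loses its git metadata still reports the hardcoded version instead of an empty string.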

Comment 3 W. Trevor King 2020-04-19 14:02:17 UTC
Doozer also sets a BUILD_VERSION environment variable [1].

[1]: https://github.com/openshift/installer/pull/1829
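As a rough sketch of how a build helper might consume the two environment variables mentioned in this thread (`BUILD_VERSION` and `SOURCE_GIT_COMMIT`), the following assembles Go linker flags from whichever of them are set. The helper name `buildLdflags` and the target variables `main.buildVersion` and `main.gitCommit` are assumptions made for this example, not names from CRI-O or Doozer.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// buildLdflags returns "-X" linker flags for whichever build variables are
// present. getenv is passed in so the function can be exercised without
// touching the real environment. Target variable names are hypothetical.
func buildLdflags(getenv func(string) string) string {
	var flags []string
	if v := getenv("BUILD_VERSION"); v != "" {
		flags = append(flags, "-X main.buildVersion="+v)
	}
	if c := getenv("SOURCE_GIT_COMMIT"); c != "" {
		flags = append(flags, "-X main.gitCommit="+c)
	}
	return strings.Join(flags, " ")
}

func main() {
	// Prints an empty line when neither variable is set.
	fmt.Println(buildLdflags(os.Getenv))
}
```

The result would be passed to `go build -ldflags`; when neither variable is set, the flags are empty and the binary keeps whatever defaults are compiled in.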

Comment 5 Colin Walters 2020-04-20 12:17:28 UTC
OK so https://github.com/cri-o/cri-o/pull/3613 merged, but there's no new build - are we waiting on something for that?

Comment 6 Micah Abbott 2020-04-20 13:50:27 UTC
*** Bug 1825924 has been marked as a duplicate of this bug. ***

Comment 7 Urvashi Mohnani 2020-04-20 18:32:49 UTC
The new cri-o 1.18 build is available at https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=28067923

Comment 8 Micah Abbott 2020-04-23 01:56:35 UTC
The new `cri-o` build landed in RHCOS 45.81.202004210032-0

Comment 10 Sunil Choudhary 2020-06-10 13:53:46 UTC
`crio --version` now shows the correct version string.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-06-09-223121   True        False         25m     Cluster version is 4.5.0-0.nightly-2020-06-09-223121

$ oc get nodes
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-134-0.us-east-2.compute.internal     Ready    worker   33m   v1.18.3+a637491
ip-10-0-147-128.us-east-2.compute.internal   Ready    master   43m   v1.18.3+a637491
ip-10-0-178-86.us-east-2.compute.internal    Ready    worker   33m   v1.18.3+a637491
ip-10-0-191-178.us-east-2.compute.internal   Ready    master   44m   v1.18.3+a637491
ip-10-0-208-118.us-east-2.compute.internal   Ready    master   44m   v1.18.3+a637491
ip-10-0-208-172.us-east-2.compute.internal   Ready    worker   33m   v1.18.3+a637491

$ oc debug node/ip-10-0-134-0.us-east-2.compute.internal
Starting pod/ip-10-0-134-0us-east-2computeinternal-debug ...
...
sh-4.2# chroot /host
   
sh-4.4# crio --version
crio version 
Version:    1.18.1-5.dev.rhaos4.5.git5e39296.el8
GoVersion:  go1.13.4
Compiler:   gc
Platform:   linux/amd64
Linkmode:   dynamic

Comment 11 Jhon Honce 2022-08-17 17:00:06 UTC
Closed as stale, please re-open if this issue is still active.