Bug 1770387 - network_attachment_definition_enabled_instance_up{networks="any"} should not be 1 if pod is not running
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.3.0
Assignee: Feng Pan
QA Contact: Weibin Liang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-11-08 20:48 UTC by Weibin Liang
Modified: 2020-01-23 11:12 UTC
CC List: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-01-23 11:11:45 UTC
Target Upstream Version:
Embargoed:



Links
System / ID | Status | Summary | Last Updated
Github openshift/multus-admission-controller pull 18 | closed | Bug 1770387: [Backport] Fixed bug with wrong counts | 2019-12-09 06:57:08 UTC
Github openshift/multus-admission-controller pull 20 | open | Bug 1770387: Fix check default namespace telemetry backport | 2019-12-06 12:42:07 UTC
Red Hat Product Errata RHBA-2020:0062 | - | - | 2020-01-23 11:12:03 UTC

Description Weibin Liang 2019-11-08 20:48:21 UTC
Description of problem:
Following https://docs.google.com/document/d/1pdEQnX1FXHP1h89lwZeEOt5m3uU-OyD5PqIUuHdo--8/edit#: if I skip step 1 (creating the net-attach-def) and only run step 2 to create a pod, the pod stays in ContainerCreating forever. Even so, both network_attachment_definition_enabled_instance_up{networks="any"} and network_attachment_definition_instances{networks="any"} report 1. Because the pod is not actually running, both metrics should be 0.

Version-Release number of selected component (if applicable):
4.3.0-0.nightly-2019-11-08-094604

How reproducible:
Always

Steps to Reproduce:
[root@dhcp-41-193 FILE]# oc get net-attach-def --all-namespaces
NAMESPACE   NAME             AGE
test1       macvlan-bridge   68m
[root@dhcp-41-193 FILE]# oc delete net-attach-def macvlan-bridge
networkattachmentdefinition.k8s.cni.cncf.io "macvlan-bridge" deleted
[root@dhcp-41-193 FILE]# oc login -u testuser-0 -p OC_IT3-uRzrF
Login successful.

You have one project on this server: "test1"

Using project "test1".
[root@dhcp-41-193 FILE]# oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/networking/multus-cni/Pods/1interface-macvlan-bridge.yaml
pod/macvlan-bridge-pod-pd4ng created
[root@dhcp-41-193 FILE]# oc login -u kubeadmin -p 5CvKS-2xJay-TX85m-AyfXe
Login successful.

You have access to 54 projects, the list has been suppressed. You can list all projects with 'oc projects'

Using project "test1".
[root@dhcp-41-193 FILE]# oc rsh -n openshift-multus multus-admission-controller-5rh6d
sh-4.2# curl localhost:9091/metrics | grep  network_attachment_definition
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1202  100  1202    0     0  1016k      0 --:--:-- --:--:-- --:--:-- 1173k
# HELP network_attachment_definition_enabled_instance_up Metric to identify clusters with network attachment definition enabled instances.
# TYPE network_attachment_definition_enabled_instance_up gauge
network_attachment_definition_enabled_instance_up{networks="any"} 1
network_attachment_definition_enabled_instance_up{networks="sriov"} 0
# HELP network_attachment_definition_instances Metric to get number of instance using network attachment definition in the cluster.
# TYPE network_attachment_definition_instances gauge
network_attachment_definition_instances{networks="any"} 1
network_attachment_definition_instances{networks="macvlan"} 0
network_attachment_definition_instances{networks="sriov"} 0
sh-4.2# exit
exit
[root@dhcp-41-193 FILE]# oc get pods
NAME                       READY   STATUS              RESTARTS   AGE
macvlan-bridge-pod-pd4ng   0/1     ContainerCreating   0          72s
[root@dhcp-41-193 FILE]# oc get net-attach-def --all-namespaces
No resources found.
[root@dhcp-41-193 FILE]# 
[root@dhcp-41-193 FILE]# oc delete pod macvlan-bridge-pod-pd4ng
pod "macvlan-bridge-pod-pd4ng" deleted
[root@dhcp-41-193 FILE]# oc rsh -n openshift-multus multus-admission-controller-5rh6d
sh-4.2# curl localhost:9091/metrics | grep  network_attachment_definition
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1202  100  1202    0     0   970k      0 --:--:-- --:--:-- --:--:-- 1173k
# HELP network_attachment_definition_enabled_instance_up Metric to identify clusters with network attachment definition enabled instances.
# TYPE network_attachment_definition_enabled_instance_up gauge
network_attachment_definition_enabled_instance_up{networks="any"} 0
network_attachment_definition_enabled_instance_up{networks="sriov"} 0
# HELP network_attachment_definition_instances Metric to get number of instance using network attachment definition in the cluster.
# TYPE network_attachment_definition_instances gauge
network_attachment_definition_instances{networks="any"} 0
network_attachment_definition_instances{networks="macvlan"} 0
network_attachment_definition_instances{networks="sriov"} 0
sh-4.2# 

Actual results:
network_attachment_definition_enabled_instance_up{networks="any"} 1
network_attachment_definition_instances{networks="any"} 1

Expected results:
network_attachment_definition_enabled_instance_up{networks="any"} 0
network_attachment_definition_instances{networks="any"} 0
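A small helper can replace the manual `grep` over the curl output when comparing the actual and expected values. This is a sketch that assumes each series is printed as `name{labels} value` on one line with no whitespace inside the label set, as in the output captured in this report; `metric_value` is a hypothetical name, not part of `oc` or the admission controller.

```shell
#!/bin/sh
# Hypothetical helper (not part of oc or multus-admission-controller).
# Reads Prometheus text-format metrics on stdin and prints the value of the
# series whose "name{labels}" field exactly matches $1; prints nothing if absent.
# Assumes no whitespace inside the label set, as in this report's output.
metric_value() {
  awk -v series="$1" '
    /^#/ { next }                    # skip "# HELP" and "# TYPE" lines
    $1 == series { print $2; exit }  # field 1 is name{labels}, field 2 is the value
  '
}

# Usage from the workstation (the pipe runs locally; curl -s hides the progress meter):
#   oc rsh -n openshift-multus <admission-controller-pod> curl -s localhost:9091/metrics \
#     | metric_value 'network_attachment_definition_instances{networks="any"}'
```

Per the expected results above, that command should print 0 while the pod is stuck in ContainerCreating.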

Additional info:

Comment 1 Pawel Krupa 2019-11-08 21:25:12 UTC
Judging by the metrics, I think this is a problem with a networking component; reassigning.

Comment 2 Aneesh Puttur 2019-11-11 20:06:53 UTC
Yes. Checking pod status periodically is expensive, so the controller does not check it.
The metrics are incremented when the pod-creation event is captured; the pod's state (stuck in ContainerCreating, in error, etc.) is not considered.
The metrics are decremented when the delete event fires.
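The behaviour described in this comment can be modelled in a few lines to show why a pod stuck in ContainerCreating still counts. This is a toy model, not the controller's actual Go code; the event format `ADD <pod> <phase>` / `DELETE <pod>` and the name `count_any` are hypothetical.

```shell
#!/bin/sh
# Toy model of the event-driven gauge described in Comment 2; NOT the
# multus-admission-controller implementation. Hypothetical input, one event per line:
#   ADD <pod-name> <phase>    (phase is recorded but never consulted)
#   DELETE <pod-name>
# Prints the resulting networks="any" instance count.
count_any() {
  awk '
    $1 == "ADD"    { n++ }   # counted as soon as the create event is seen
    $1 == "DELETE" { n-- }   # only a delete event lowers the count
    END { print n + 0 }      # n + 0 prints 0 when there were no events
  '
}

# Replays the original report, where the pod never leaves ContainerCreating:
#   printf 'ADD macvlan-bridge-pod-pd4ng ContainerCreating\n' | count_any
#   printf 'ADD macvlan-bridge-pod-pd4ng ContainerCreating\nDELETE macvlan-bridge-pod-pd4ng\n' | count_any
```

The first replay prints 1 and the second prints 0, matching the two metric dumps in the description. The trade-off is that skipping periodic status polling leaves the gauge blind to pods that never start.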

Comment 6 Weibin Liang 2019-12-04 16:33:03 UTC
Verification failed on 4.3.0-0.nightly-2019-12-04-054458.

1. Create a pod
oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/networking/multus-cni/Pods/1interface-macvlan-bridge.yaml

2. check pod:
pod is in ContainerCreating state

3. Metrics show: 
network_attachment_definition_enabled_instance_up{networks="any"} 0
network_attachment_definition_instances{networks="any"} 0

4. create net-attach-def
curl -s https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/networking/multus-cni/NetworkAttachmentDefinitions/macvlan-bridge.yaml | sed s/eth0/ens5/g | oc create -f-

5. check pod:
pod is in Running state

6. Metrics show:
network_attachment_definition_enabled_instance_up{networks="any"} 0
network_attachment_definition_instances{networks="any"} 0

Expected: network_attachment_definition_instances{networks="macvlan"} 1

Comment 7 Aneesh Puttur 2019-12-04 18:04:32 UTC
Tried on an AWS cluster and could not reproduce.
1. oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/networking/multus-cni/Pods/1interface-macvlan-bridge.yaml

2. oc get pods
NAME                       READY   STATUS              RESTARTS   AGE
macvlan-bridge-pod-9swhw   0/1     ContainerCreating   0          4s

3. curl -s https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/networking/multus-cni/NetworkAttachmentDefinitions/macvlan-bridge.yaml | sed s/eth0/ens3/g | oc create -f-

4. oc get pods
NAME                       READY   STATUS    RESTARTS   AGE
macvlan-bridge-pod-9swhw   1/1     Running   0          3m20s

5. oc logs -f  multus-admission-controller-nzl2p -n openshift-multus
....
I1204 17:54:36.245547       1 webhook.go:142] AdmissionReview request allowed: Network Attachment Definition '{"cniVersion":"0.3.0","ipam":{"gateway":"10.1.1.1","rangeEnd":"10.1.1.200","rangeStart":"10.1.1.100","routes":[{"dst":"0.0.0.0/0"}],"subnet":"10.1.1.0/24","type":"host-local"},"master":"ens3","mode":"bridge","type":"macvlan"}' is valid
I1204 17:54:49.247354       1 localmetrics.go:50] UPdating net-attach-def metrics for macvlan with value 1
I1204 17:54:49.247388       1 localmetrics.go:50] UPdating net-attach-def metrics for any with value 1

6. oc rsh -n openshift-multus  multus-admission-controller-nzl2p  curl localhost:9091/metrics
...
.....
# HELP network_attachment_definition_enabled_instance_up Metric to identify clusters with network attachment definition enabled instances.
# TYPE network_attachment_definition_enabled_instance_up gauge
network_attachment_definition_enabled_instance_up{networks="any"} 1
network_attachment_definition_enabled_instance_up{networks="sriov"} 0
# HELP network_attachment_definition_instances Metric to get number of instance using network attachment definition in the cluster.
# TYPE network_attachment_definition_instances gauge
network_attachment_definition_instances{networks="any"} 1
network_attachment_definition_instances{networks="macvlan"} 1
network_attachment_definition_instances{networks="sriov"} 0
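The metric updates in step 5 can also be filtered out of the admission controller log instead of reading it with `oc logs -f`. A minimal sketch, assuming the `localmetrics.go` line format shown above (including the log's own `UPdating` capitalization, matched case-insensitively on the first two letters); `metric_updates` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print "<network> <value>" for each net-attach-def
# metric update logged by the admission controller. Assumes the line format
#   ... localmetrics.go:<line>] UPdating net-attach-def metrics for <net> with value <n>
# Lines that do not match (e.g. webhook.go admission messages) are dropped.
metric_updates() {
  sed -n 's/.*localmetrics\.go:[0-9]*] [Uu][Pp]dating net-attach-def metrics for \([^ ]*\) with value \([0-9-]*\).*/\1 \2/p'
}

# Usage:
#   oc logs multus-admission-controller-nzl2p -n openshift-multus | metric_updates
```

For the log excerpt above this yields `macvlan 1` followed by `any 1`.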

Comment 8 Weibin Liang 2019-12-04 18:47:37 UTC
(In reply to Aneesh Puttur from comment #7)
> Tried with AWS cluster and could not recreate.


Worked with Aneesh: the problem shows up when the pod and the NAD are created in a new project.

Comment 11 Weibin Liang 2019-12-10 18:43:06 UTC
Tested and verified on 4.3.0-0.nightly-2019-12-10-120829

Comment 13 errata-xmlrpc 2020-01-23 11:11:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062

