Bug 1690259
| Summary: | [3.11][crio] Event process failure error appears in atomic-openshift-node log | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Samuel <smoro> |
| Component: | Node | Assignee: | Peter Hunt <pehunt> |
| Status: | CLOSED WONTFIX | QA Contact: | weiwei jiang <wjiang> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 3.11.0 | CC: | amurdaca, aos-bugs, bmeng, dornelas, dwalsh, jhonce, jokerman, mmccomas, mpatel, nagrawal, ph.hutter, pruan, rbost, vlaad |
| Target Milestone: | --- | Keywords: | NeedsTestCase |
| Target Release: | 3.11.z | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | 1591118 | Environment: | |
| Last Closed: | 2020-05-13 21:13:09 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1186913 | ||
|
Description
Samuel
2019-03-19 07:36:13 UTC
Looks like that issue is still present, at least running 3.11 / cri-o 1.11.7-1 ``` [root@master1 ~]# grep 'Failed to create' /var/log/messages | tail -5 Mar 19 08:36:37 master1 origin-node: E0319 08:36:37.844187 97021 manager.go:1130] Failed to create existing container: /kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod47be93eaccaa069bb51ece81ac83d4dd.slice/crio-a404d24228a566d7868275171610473cc576bad2186767e6cd12dd958b2986cf.scope: invalid character 'c' looking for beginning of value Mar 19 08:36:37 master1 origin-node: E0319 08:36:37.845545 97021 manager.go:1130] Failed to create existing container: /kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podda15a24e_3d44_11e9_b6a9_52540016f6b8.slice/crio-048fa6c53c26e783ddb6d23c921552ed63182468c150863e3176f4092f8f2e76.scope: invalid character 'c' looking for beginning of value Mar 19 08:36:37 master1 origin-node: E0319 08:36:37.850936 97021 manager.go:1130] Failed to create existing container: /kubepods.slice/kubepods-pod6584060d_426b_11e9_b846_525400ddfbf0.slice/crio-0c255b94555d6263c8f776b87e9a6d77cc89062f84b7c5ddd99e212a628b3103.scope: invalid character 'c' looking for beginning of value Mar 19 08:36:37 master1 origin-node: E0319 08:36:37.853019 97021 manager.go:1130] Failed to create existing container: /kubepods.slice/kubepods-pod42752a46_321d_11e9_844d_52540016f6b8.slice/crio-b4423ddd6c52fc11c3fa9bb9b7abaf94d04a09fa2953c11da2871bb1a8c12b81.scope: invalid character 'c' looking for beginning of value Mar 19 08:36:37 master1 origin-node: E0319 08:36:37.854713 97021 manager.go:1130] Failed to create existing container: /kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pode6f04d9558a3d478e7971667ef0cc6f4.slice/crio-f9e540c0da1dea83e1664030aa33cc3373c1e72623baf990540a6eddc50a43dc.scope: invalid character 'c' looking for beginning of value [root@master1 ~]# grep 'Failed to create' /var/log/messages | head -1 Mar 14 18:47:25 master1 origin-node: E0314 18:47:25.598035 4205 manager.go:1130] Failed to create existing container: /kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podc96e1c1a_3205_11e9_b6f4_525400ddfbf0.slice/crio-36002b3eb498cc6805b72c8593659eb73e1c1a61940a6d067697bb3082598050.scope: invalid character 'c' looking for beginning of value [root@master1 ~]# grep 'Failed to create' /var/log/messages | wc -l 50681 ``` According to github, 1.11.7 was released mid-october 2018. Since the fix discussed for 1.10 was validated in august, I've been hoping they were cherry-picked. And maybe they did. Yet looking at my logs, those "invalid character 'c'" are still there, spamming. We have a 1.11.11 release. Can you try this out? https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=829231 Sure, doesn't seem to help though: [root@master3 ~]# grep 'Failed to create' /var/log/messages | tail -5 ; uptime ; rpm -qa | grep cri-o Mar 19 15:48:05 master3 origin-node: E0319 15:48:05.882733 26633 manager.go:1130] Failed to create existing container: /kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod6b449b4a_3202_11e9_b6f4_525400ddfbf0.slice/crio-3263424e308267d201f2b63fb2c415acd40f80b77a45298a19ba70b99b93096b.scope: invalid character 'c' looking for beginning of value Mar 19 15:48:05 master3 origin-node: E0319 15:48:05.885366 26633 manager.go:1130] Failed to create existing container: /kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod857c7e5cc6472964d18dacaed15e0068.slice/crio-5939681e5e03ff1b6effa7c9496d9facff00128cf69f0cc0cff3e2fb02c6a1da.scope: invalid character 'c' looking for beginning of value Mar 19 15:48:05 master3 origin-node: E0319 15:48:05.887436 26633 manager.go:1130] Failed to create existing container: /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod7fbd123d_3202_11e9_b6f4_525400ddfbf0.slice/crio-f4e1906fef97a7bf7692a36c0375ebb6427fbed32aef0bc3f71c438b34e05430.scope: invalid character 'c' looking for beginning of value Mar 19 15:48:05 master3 origin-node: E0319 15:48:05.889320 26633 manager.go:1130] Failed to create existing container: /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podf08aaa63_3204_11e9_b6f4_525400ddfbf0.slice/crio-a630dedc5ca6c46df97e01be98e8400094947117dfc69fe513dedc9669d92630.scope: invalid character 'c' looking for beginning of value Mar 19 15:48:05 master3 origin-node: E0319 15:48:05.890950 26633 manager.go:1130] Failed to create existing container: /kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-poda02ff2c5f7bde3d006c9b493d09fc61e.slice/crio-d014ae537449af1c60cde717bc26fc3168ce1f6abff5ad156711c8b907ddc81d.scope: invalid character 'c' looking for beginning of value 15:48:47 up 29 min, 1 user, load average: 2.53, 2.83, 3.29 cri-o-1.11.11-1.rhaos3.11.git474f73d.el7.x86_64 Regards. Meanwhile, I should point out that this 1.11.11 package seems to generate an awful lot of logs. My /var/log/messages is rotated weekly. Average size from previous copies is something like 50M, no compression, ... While my current /var/log/messages is almost 1G today. Taking one log at random, that comes back again and again: [root@master1 log]# grep /opt/cni/bin messages | wc -l 876318 [root@master1 log]# grep /opt/cni/bin messages | tail -3 Mar 21 15:18:54 master1 crio: time="2019-03-21 15:18:54.116562963+01:00" level=error msg="Error adding network: failed to find plugin "bridge" in path [/opt/cni/bin/ /opt/bridge/bin]" Mar 21 15:18:54 master1 crio: time="2019-03-21 15:18:54.116628238+01:00" level=error msg="Error while adding to cni network: failed to find plugin "bridge" in path [/opt/cni/bin/ /opt/bridge/bin]" Mar 21 15:18:54 master1 crio: time="2019-03-21 15:18:54.116729436+01:00" level=error msg="Error deleting network: failed to find plugin "bridge" in path [/opt/cni/bin/ /opt/bridge/bin]" I'ld suggest not publishing that package as is ... Regards. Hey Samuel, I've set up a cluster to reproduce (3.11 with 1.11.11), and haven't been able to. The second set of logs look like faulty CNI configs, which may be a problem with how the cluster was set up. Can you send me the contents of your plugin_dir and network_dir ? You should be able to find which they are by looking at /etc/crio/crio.conf Best Peter Also, can you give me more details on how your cluster was brought up? I'm tracking this issue for a customer having this problem in OpenShfit 3.11 and Cri-o 1.11.10. > Since the fix discussed for 1.10 was validated in august, I've been hoping they were cherry-picked. I see the fix from 1.10 is at least present in 1.11 so perhaps we're seeing this issue manifest another way? 3.11 -> https://github.com/cri-o/cri-o/commit/d0e48a80d Verified 3.10 fix for comparison -> https://github.com/cri-o/cri-o/pull/1616/files As a follow-up, I've got another customer complaining about those messages (3.11.98).
> How was that cluster brought up?
Nothing out of the ordinary:
os_sdn_network_plugin_name: redhat/openshift-ovs-multitenant
os_firewall_use_firewalld: True
openshift_master_cluster_hostname: xxx
openshift_master_cluster_public_hostname: xxx
openshift_master_default_subdomain: xxx
openshift_master_api_port: '443'
openshift_master_console_port: "{{ openshift_master_api_port }}"
openshift_master_session_name: openshift-internal
openshift_master_named_certificates:
- certfile: xxx
keyfile: xxx
cafile: xxx
names:
- xxx
openshift_master_overwrite_named_certificates: true
oreg_url: "xxx/openshift3/ose-${component}:${version}"
oreg_auth_user: foo
oreg_auth_password: bar
openshift_examples_modify_imagestreams: true
openshift_docker_blocked_registries: all
openshift_docker_insecure_registries: []
openshift_metrics_install_metrics: False
openshift_metrics_server_install: False
openshift_logging_install_eventrouter: False
openshift_logging_install_logging: False
openshift_cluster_monitoring_operator_install: False
openshift_crio_var_sock: /var/run/crio/crio.sock
openshift_use_crio: True
openshift_use_crio_only: False
openshift_crio_edits:
- key: kubeletArguments.container-runtime
value: [ remote ]
- key: kubeletArguments.container-runtime-endpoint
value: [ "{{ openshift_crio_var_sock }}" ]
- key: kubeletArguments.image-service-endpoint
value: [ "{{ openshift_crio_var_sock }}" ]
- key: kubeletArguments.runtime-request-timeout
value: [ 10m ]
#openshift_node_groups: / don't have a copy, main diff with crio default node groups being a zone label
Last time I've seen that issue, the openshift_use_crio_only was set to True, the use_firewalld to False. We also had some non-default openshift_node_groups.
|