Bug 1844939
| Summary: | Some systemd services randomly killed (KILL) | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Brendan Shirren <bshirren> |
| Component: | RHCOS | Assignee: | Jonathan Lebon <jlebon> |
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Michael Nguyen <mnguyen> |
| Severity: | low | Docs Contact: | |
| Priority: | low | | |
| Version: | 4.3.z | CC: | bbreard, bshirren, dornelas, imcleod, jlebon, jligon, mharri, miabbott, nstielau |
| Target Milestone: | --- | | |
| Target Release: | 4.6.0 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-08-21 20:59:23 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1186913 | | |
| Attachments: | | | |

Description
Brendan Shirren
2020-06-08 05:12:02 UTC
Created attachment 1695987
journalctl - 1min around issue

This is likely the OOM killer. Can you provide full logs from boot start? Also, how much memory is provided to the machines?

@Jonathan thanks for checking. It has me baffled: I'm not seeing signs of an OOM kill, segfault, page fault, hung tasks or any of the usual suspects. Infra node = 16GB RAM / 4 CPU. Will attach journalctl logs covering the period of the incident (May 18 18:43 to 18:47 AEST).

-- Logs begin at Fri 2020-05-15 17:38:10 AEST, end at Tue 2020-05-19 07:48:32 AEST. --
May 18 18:41:10 infra-0 hyperkube[1370]: I0518 18:41:10.033026 1370 eviction_manager.go:229] eviction manager: synchronize housekeeping
May 18 18:41:10 infra-0 hyperkube[1370]: I0518 18:41:10.056977 1370 helpers.go:781] eviction manager: observations: signal=memory.available, available: 12662468Ki, capacity: 16421332Ki, time: 2020-05-18 18:41:10.034400571 +1000 AEST m=+2018428.815971594
May 18 18:41:10 infra-0 hyperkube[1370]: I0518 18:41:10.057198 1370 helpers.go:781] eviction manager: observations: signal=allocatableMemory.available, available: 14987256Ki, capacity: 15909332Ki, time: 2020-05-18 18:41:10.056903168 +1000 AEST m=+2018428.838474157
May 18 18:41:10 infra-0 hyperkube[1370]: I0518 18:41:10.057291 1370 helpers.go:781] eviction manager: observations: signal=nodefs.available, available: 37761312Ki, capacity: 133156844Ki, time: 2020-05-18 18:41:10.034400571 +1000 AEST m=+2018428.815971594
May 18 18:41:10 infra-0 hyperkube[1370]: I0518 18:41:10.057370 1370 helpers.go:781] eviction manager: observations: signal=nodefs.inodesFree, available: 63914439, capacity: 66583488, time: 2020-05-18 18:41:10.034400571 +1000 AEST m=+2018428.815971594
May 18 18:41:10 infra-0 hyperkube[1370]: I0518 18:41:10.057455 1370 helpers.go:781] eviction manager: observations: signal=imagefs.available, available: 37761312Ki, capacity: 133156844Ki, time: 2020-05-18 18:41:10.034400571 +1000 AEST m=+2018428.815971594
May 18 18:41:10 infra-0 hyperkube[1370]: I0518 18:41:10.057534 1370 helpers.go:781] eviction manager: observations: signal=imagefs.inodesFree, available: 63914439, capacity: 66583488, time: 2020-05-18 18:41:10.034400571 +1000 AEST m=+2018428.815971594
May 18 18:41:10 infra-0 hyperkube[1370]: I0518 18:41:10.057608 1370 helpers.go:781] eviction manager: observations: signal=pid.available, available: 4193622, capacity: 4Mi, time: 2020-05-18 18:41:10.055950277 +1000 AEST m=+2018428.837521278
May 18 18:41:10 infra-0 hyperkube[1370]: I0518 18:41:10.057792 1370 eviction_manager.go:320] eviction manager: no resources are starved

@Brendan please provide the full journal from the affected node(s).

@Brendan, have you been able to reproduce this again since that first time it happened?

I've never seen it reproduced, sorry. It only happened once, on a customer VM. Starting to suspect an issue at the VM level (hypervisor), just no certain signs.

Ack. Going to close this for now. Feel free to re-open if you hit it again and have more information! (Deep down I still think this is the OOM killer and the journal just doesn't have any traces of it.)
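
For anyone triaging a similar report, the eviction manager observations quoted in this bug can be turned into headroom percentages with a short script. The following is only a sketch: the regular expression is inferred from the log lines in this bug rather than from any kubelet format guarantee, and `parse_observations` and `percent_available` are hypothetical helper names. It reports a point-in-time snapshot, so a brief memory spike between kubelet housekeeping passes would not show up in it.

```python
import re

# Pattern inferred from the "eviction manager: observations" lines quoted in
# this bug; it is an assumption, not a documented kubelet log format.
OBS_RE = re.compile(
    r"signal=(?P<signal>[\w.]+), "
    r"available: (?P<available>\d+)(?P<avail_unit>Ki|Mi|Gi)?, "
    r"capacity: (?P<capacity>\d+)(?P<cap_unit>Ki|Mi|Gi)?"
)

# Binary suffix multipliers; bare counts (inodes, PIDs) use a multiplier of 1.
UNIT = {None: 1, "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def parse_observations(text):
    """Return {signal: (available, capacity)} with unit suffixes expanded."""
    result = {}
    for m in OBS_RE.finditer(text):
        available = int(m["available"]) * UNIT[m["avail_unit"]]
        capacity = int(m["capacity"]) * UNIT[m["cap_unit"]]
        result[m["signal"]] = (available, capacity)
    return result


def percent_available(observations):
    """Return {signal: percent of capacity still available}, one decimal."""
    return {s: round(100 * a / c, 1) for s, (a, c) in observations.items()}


# Two observations copied from the journal excerpt in this bug.
SAMPLE = (
    "eviction manager: observations: signal=memory.available, "
    "available: 12662468Ki, capacity: 16421332Ki, time: 2020-05-18\n"
    "eviction manager: observations: signal=pid.available, "
    "available: 4193622, capacity: 4Mi, time: 2020-05-18\n"
)

if __name__ == "__main__":
    for signal, pct in percent_available(parse_observations(SAMPLE)).items():
        print(f"{signal}: {pct}% available")
```

On the sample above, `memory.available` works out to roughly 77% of capacity, consistent with the "no resources are starved" line at 18:41:10. That neither confirms nor rules out an OOM kill a few minutes later.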