Bug 1450626
| Summary: | [RFE] Enhance journald to allow rate-limits to be applied per unit instead of just per server | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Peter Portante <pportant> |
| Component: | systemd | Assignee: | systemd-maint |
| Status: | CLOSED WONTFIX | QA Contact: | qe-baseos-daemons |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 7.5-Alt | CC: | aaron_wilk, aivaraslaimikis, dtardon, msekleta, pdwyer, systemd-maint-list |
| Target Milestone: | rc | Keywords: | FutureFeature |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1719577 (view as bug list) | Environment: | |
| Last Closed: | 2021-01-15 07:35:56 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Peter Portante
2017-05-14 04:43:24 UTC
(In reply to Peter Portante from comment #0)
> See https://bugzilla.redhat.com/show_bug.cgi?id=1445797
>
> For kubernetes, and potentially other sub-systems in a similar situation,
> having rate limits applied per-service does not work when most of the
> logging traffic sent to journald comes through one service.

kubernetes and others should then run containers as separate units. Kubernetes can create a scope unit for each container. IIRC docker used to do it by default, hence I am not sure how come you have this problem in the first place. Anyway, container processes then live in a separate cgroup and a malicious container can't mess up logging for other containers.

Also, as Linux containers are built on top of namespaces and cgroups, having all containers run in the same service (=~ cgroup) means you can't really manage system resources on a per-container basis.

Btw, is there any other kernel-based process group mechanism other than cgroups that we could leverage here?

(In reply to Michal Sekletar from comment #2)
> (In reply to Peter Portante from comment #0)
> > See https://bugzilla.redhat.com/show_bug.cgi?id=1445797
> >
> > For kubernetes, and potentially other sub-systems in a similar situation,
> > having rate limits applied per-service does not work when most of the
> > logging traffic sent to journald comes through one service.
>
> kubernetes and others should then run containers as separate units.
> Kubernetes can create a scope unit for each container. IIRC docker used to
> do it by default, hence I am not sure how come you have this problem in the
> first place. Anyway, container processes then live in a separate cgroup and
> a malicious container can't mess up logging for other containers.
>
> Also, as Linux containers are built on top of namespaces and cgroups, having
> all containers run in the same service (=~ cgroup) means you can't really
> manage system resources on a per-container basis.
>
> Btw, is there any other kernel-based process group mechanism other than
> cgroups that we could leverage here?

They don't all run in the same namespace/cgroup, but are in fact isolated. There are a few "best practices" recommended by K8s for collecting logs from containers:

- Use a node-level logging agent that runs on every node.
- Include a dedicated sidecar container for logging in an application pod.
  - The sidecar container streams application logs to its own stdout.
  - The sidecar container runs a logging agent, which is configured to pick up logs from an application container.
- Push logs directly to a backend from within an application.

https://kubernetes.io/docs/concepts/cluster-administration/logging/

The particular scenario that I believe Peter (as well as my team) is having trouble with is where the journald log driver is used to send container stdout/stderr to journald. For example, this could occur when using a node-level logging agent that reads events from journald and forwards them to, say, ELK.

The issue is that containers log to stdout/stderr, which is picked up by docker's journald log driver and written by docker.service to the journal for all containers on the node/host. When this happens, the journald log driver appends journald metadata to each event; the fields are listed in the link below. One solution could be to have journald rate limit by unit and, if it exists, CONTAINER_ID_FULL, instead of just service/unit.
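For concreteness, here is a minimal sketch of the metadata path described above; it is not taken from the bug itself, and the container name `myapp` and the image reference are hypothetical placeholders. With docker's journald log driver, every container's stdout/stderr enters the journal through docker.service (so today's per-service rate limit is shared), but each entry already carries per-container fields such as CONTAINER_NAME and CONTAINER_ID_FULL that could in principle key a finer-grained limit:

```sh
# Start a container whose stdout/stderr is captured by the journald log driver.
# ("myapp" and the image reference are hypothetical placeholders.)
docker run -d --name myapp --log-driver=journald registry.example.com/myapp:latest

# All of its output is submitted to journald by docker.service, but the entries
# carry per-container metadata fields, so they can be selected per container:
journalctl CONTAINER_NAME=myapp --since "1 hour ago"

# Show the attached fields, including CONTAINER_ID_FULL, for one entry:
journalctl -o verbose CONTAINER_NAME=myapp -n 1
```

The request in this bug is essentially for journald to take such fields into account when rate limiting, instead of charging everything against docker.service.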
Documentation on the journald log driver - https://docs.docker.com/config/containers/logging/journald/
General documentation on docker log drivers - https://docs.docker.com/config/containers/logging/configure/

To work around this issue and report to the application owners when their app is consuming too much of the rate limit enforced on docker.service, we detect when a journald "Suppressed" event occurs by cron'ing something like this:

    nice -n 18 sh -c 'journalctl --since "1 hour ago" --unit systemd-journald.service | grep -i "Suppressed" | wc -l'

...and if the returned count is >0 we then look deeper:

    journalctl -o json-pretty --since "1 hour ago" | jq -s '[.[] | { name: (if .CONTAINER_NAME then .CONTAINER_NAME else ._COMM end), cursor:.__CURSOR }] | group_by(.name) | map({name: .[0].name, length: [.[].cursor] | length}) | sort_by(.length)'

It's a bit messy (a consolidated sketch of both checks appears at the end of this report), but under normal circumstances the tenants/containers do not exceed the rate limit journald applies to docker.service. So when it does happen, we want to notify the application owners that they should probably have a look, if they aren't already doing so due to some alert they've received.

What I think Peter, and certainly myself, are requesting is that journald somehow rate limit on a per-container basis instead of on all of docker.service, which impacts other applications'/containers' ability to log to journald. The above comment about checking whether CONTAINER_ID_FULL exists was merely a suggestion; I'm sure there are other ways this could be done.

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.
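The two detection steps from the workaround above can be combined into a single cron'able check. This is a rough, hedged sketch rather than an official tool: the one-hour window and the Suppressed-message grep come from the comment above, while the script structure, variable names, and output handling are assumptions.

```sh
#!/bin/sh
# Sketch of the workaround described above: count journald suppression notices
# for the last hour and, if any occurred, summarize which containers/processes
# wrote the most journal entries in that window so application owners can be
# notified.

suppressed=$(journalctl --since "1 hour ago" --unit systemd-journald.service \
                | grep -ci "suppressed")

if [ "$suppressed" -gt 0 ]; then
    # Group the last hour of journal entries by CONTAINER_NAME (falling back to
    # _COMM for non-container processes) and sort by entry count, ascending.
    journalctl -o json --since "1 hour ago" | jq -s '
        [ .[] | { name: (if .CONTAINER_NAME then .CONTAINER_NAME else ._COMM end),
                  cursor: .__CURSOR } ]
        | group_by(.name)
        | map({ name: .[0].name, length: ([.[].cursor] | length) })
        | sort_by(.length)'
fi
```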