Bug 1667185 - [3.11] More consistent modification of journald.conf.
Summary: [3.11] More consistent modification of journald.conf.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.11.0
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.11.z
Assignee: Jeremiah Stuever
QA Contact: Gaoyun Pei
URL:
Whiteboard:
Depends On:
Blocks: 1676905 1676906
Reported: 2019-01-17 16:50 UTC by Rupesh Patel
Modified: 2023-09-07 19:39 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1676905 1676906
Environment:
Last Closed: 2019-03-14 02:17:59 UTC
Target Upstream Version:
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1183244 0 urgent CLOSED sosreport tracebacks with memory error with huge system journall 2022-03-13 13:55:50 UTC
Red Hat Knowledge Base (Solution) 3987941 0 Upgrade None High resources utilization by journald on openshift box. 2019-03-19 14:57:13 UTC
Red Hat Product Errata RHBA-2019:0407 0 None None None 2019-03-14 02:18:07 UTC

Comment 1 Vadim Rutkovsky 2019-01-18 13:45:46 UTC
openshift-ansible doesn't set default journald config, but journald settings can be tuned using `journald_vars_to_replace` ansible var.

Comment 2 Ryan Howe 2019-01-23 20:41:33 UTC
The real bug here is that since this journald.conf is managed by the installer we should add a line at the top saying we modify these values: 

https://github.com/openshift/openshift-ansible/blob/release-3.11/roles/openshift_node/defaults/main.yml#L111-L123

Also should have a way to skip this if not desired. 

Furthermore, this journald role [1] only makes changes if the key is already present in the file, either commented out or set.


[1]
   https://github.com/openshift/openshift-ansible/blob/release-3.11/roles/openshift_node/tasks/journald.yml#L15


Rough example (not tested). This role would then only be called when journald_manage is true, which would be the default.

```
# Remove every existing line for a managed key, commented out or not.
- name: Remove Journald lines
  lineinfile:
    dest: /etc/systemd/journald.conf
    regexp: '^(\#| )?{{ item.var }}=\s*.*?$'
    state: absent
  with_items: "{{ journald_vars_to_replace | default([]) }}"

# Re-add each key with the desired value under the [Journal] header.
- name: Update journald setup
  lineinfile:
    dest: /etc/systemd/journald.conf
    line: '{{ item.var }}={{ item.val }}'
    insertafter: '\[Journal\]'
  with_items: "{{ journald_vars_to_replace | default([]) }}"

# Mark the file as managed so admins know the installer rewrites it.
- name: Tag journald managed by Ansible
  lineinfile:
    dest: /etc/systemd/journald.conf
    line: '# Managed by OpenShift Ansible'
    insertbefore: '\[Journal\]'
```
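
For readers who want to try the remove-then-re-add idea without running a playbook, here is a minimal Python sketch of the same logic. It is only an illustration and not the code that was merged: the function name `apply_journald_vars` and the sample values `4G` and `50M` are hypothetical, the file is assumed to contain a `[Journal]` line, and the new lines are inserted in list order, which simplifies what repeated `lineinfile` calls would do.

```python
import re

MARKER = "# Managed by OpenShift Ansible"


def apply_journald_vars(conf_text, vars_to_replace):
    """Return conf_text with each managed key removed and re-added under [Journal]."""
    if not vars_to_replace:
        # An empty list means "do not touch journald.conf at all".
        return conf_text

    lines = conf_text.splitlines()

    # Step 1: drop every existing line for a managed key, commented out or not.
    for item in vars_to_replace:
        pattern = re.compile(r"^(#| )?" + re.escape(item["var"]) + r"=\s*.*?$")
        lines = [line for line in lines if not pattern.match(line)]

    # Step 2: tag the file once, just before the [Journal] header.
    idx = lines.index("[Journal]")  # assumption: the header is present
    if MARKER not in lines:
        lines.insert(idx, MARKER)
        idx += 1

    # Step 3: add the desired settings directly after the header.
    new_lines = ["{}={}".format(item["var"], item["val"]) for item in vars_to_replace]
    lines[idx + 1:idx + 1] = new_lines

    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = "[Journal]\n#SystemMaxUse=\nSystemMaxFileSize=10M\n#Seal=yes\n"
    wanted = [
        {"var": "SystemMaxUse", "val": "4G"},        # hypothetical value
        {"var": "SystemMaxFileSize", "val": "50M"},  # hypothetical value
    ]
    print(apply_journald_vars(sample, wanted), end="")
```

Because every managed key is removed before it is written back, running the function a second time on its own output returns the same text, which is the consistency this bug asks for.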

Comment 3 Ryan Howe 2019-01-23 20:43:45 UTC
Also I think setting the following in your hosts file will skip this too. 

journald_vars_to_replace=""

Comment 4 Vadim Rutkovsky 2019-01-23 23:17:41 UTC
(In reply to Ryan Howe from comment #2)
> The real bug here is that since this journald.conf is managed by the
> installer we should add a line at the top saying we modify these values: 
> 
> https://github.com/openshift/openshift-ansible/blob/release-3.11/roles/
> openshift_node/defaults/main.yml#L111-L123

Agree

 
> Also should have a way to skip this if not desired. 


That should be possible with journald_vars_to_replace=[]; could you verify that it does that?

Comment 5 Scott Dodson 2019-01-24 14:50:16 UTC
Let's address the Ansible code that results in unpredictable variable definition and provide a reliable method for the customer to override our defaults.

However, the data provided in the description of this bug does not provide a clear enough picture to warrant adjusting the defaults. It's not clear which data sets are from synthetic testing versus observed data from the customer who may be running different packages.  If you really want us to change the defaults please conduct a full test in a controlled environment where the only thing that's changed is journal config.

Comment 6 Scott Dodson 2019-01-24 14:52:50 UTC
We need to document how to define journald_vars_to_replace, both with non-default values and as an empty set; the empty set avoids updating any journald config.

Comment 7 Rupesh Patel 2019-01-29 11:19:22 UTC
(In reply to Scott Dodson from comment #5)
> Let's address the Ansible code that results in unpredictable variable
> definition and provide a reliable method for the customer to override our
> defaults.
> 
> However, the data provided in the description of this bug does not provide a
> clear enough picture to warrant adjusting the defaults. It's not clear which
> data sets are from synthetic testing versus observed data from the customer
> who may be running different packages.  If you really want us to change the
> defaults please conduct a full test in a controlled environment where the
> only thing that's changed is journal config.

Hi,
I don't have a test system available right now; let me know if the details below help.

 o The two settings I'm talking about here are:

    SystemMaxUse=8G
    SystemMaxFileSize=10M

    These are the default values in the journald settings we ship with the OpenShift installation.
 
    Consider a scenario where a service runs in debug mode or otherwise generates a huge amount of data (as in the customer case) and fills the 8G of space (SystemMaxUse=8G).
    Because a single file cannot be larger than 10M, 8G of journal data is split across roughly 800 files, and that is where the customer hits the issue.

    A utility like sosreport, which collects OpenShift logs using journalctl, gets stuck or is killed due to high memory utilization. To overcome this, I found that the two options above can be tuned to reduce memory usage drastically.
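
The file-count estimate above can be reproduced with a short calculation. This is a rough sketch rather than a measurement: it assumes every journal file grows to SystemMaxFileSize before it is rotated, it treats K, M, G and T as powers of 1024 as journald.conf(5) describes, and the helper names `parse_size` and `max_journal_files` as well as the `128M` comparison value are hypothetical.

```python
# Size suffixes, interpreted as powers of 1024.
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}


def parse_size(text):
    """Convert a size such as '10M' or '8G' to a number of bytes."""
    suffix = text[-1].upper()
    if suffix in UNITS:
        return int(text[:-1]) * UNITS[suffix]
    return int(text)  # plain number of bytes


def max_journal_files(system_max_use, system_max_file_size):
    """Estimate how many full-size journal files fit in the space limit."""
    return parse_size(system_max_use) // parse_size(system_max_file_size)


if __name__ == "__main__":
    # Installer defaults discussed in this bug.
    print(max_journal_files("8G", "10M"))
    # Hypothetical larger file size, for comparison only.
    print(max_journal_files("8G", "128M"))
```

With the installer defaults this gives 819 files, in line with the "around 800" figure; raising SystemMaxFileSize or lowering SystemMaxUse reduces the number of files journalctl has to open.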

Comment 8 Jeremiah Stuever 2019-02-12 14:00:56 UTC
https://github.com/openshift/openshift-ansible/pull/11176

Allows for empty set to skip all modifications:
journald_vars_to_replace=[]

Comment 10 Gaoyun Pei 2019-02-27 09:24:23 UTC
Verified this bug with openshift-ansible-3.11.87-1.git.0.a7b07ff.el7.noarch.rpm

By default, journald.conf is updated by openshift-ansible as shown below:

[root@ip-172-18-4-92 ~]# cat /etc/systemd/journald.conf
# Managed by OpenShift Ansible
#  This file is part of systemd.
#
#  systemd is free software; you can redistribute it and/or modify it
#  under the terms of the GNU Lesser General Public License as published by
#  the Free Software Foundation; either version 2.1 of the License, or
#  (at your option) any later version.
#
# Entries in this file show the compile time defaults.
# You can change settings by editing this file.
# Defaults can be restored by simply deleting this file.
#
# See journald.conf(5) for details.

[Journal]
Storage=persistent
Compress=True
#Seal=yes
#SplitMode=uid
SyncIntervalSec=1s
RateLimitInterval=1s
RateLimitBurst=10000
SystemMaxUse=8G
#SystemKeepFree=
SystemMaxFileSize=10M
#RuntimeMaxUse=
#RuntimeKeepFree=
#RuntimeMaxFileSize=
MaxRetentionSec=1month
MaxFileSec=1day
ForwardToSyslog=False
#ForwardToKMsg=no
#ForwardToConsole=no
ForwardToWall=False
#TTYPath=/dev/console
#MaxLevelStore=debug
#MaxLevelSyslog=debug
#MaxLevelKMsg=notice
#MaxLevelConsole=info
#MaxLevelWall=emerg
#LineMax=48K



With journald_vars_to_replace=[] set in the Ansible inventory file, the journald.conf update steps are skipped:

TASK [openshift_node : Update journald setup] **********************************
Wednesday 27 February 2019  13:09:08 +0800 (0:00:00.376)       0:02:27.358 **** 

TASK [openshift_node : Tag journald managed by Ansible] ************************
Wednesday 27 February 2019  13:09:08 +0800 (0:00:00.146)       0:02:27.505 **** 
skipping: [ec2-3-87-114-42.compute-1.amazonaws.com] => {"changed": false, "skip_reason": "Conditional result was False"}

Comment 12 errata-xmlrpc 2019-03-14 02:17:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0407

