Bug 1564179

Summary:

first master's docker-daemon has higher debug level than the rest of the cluster

Product:

OpenShift Container Platform

Reporter:

Nicholas Schuetz <nick>

Component:

Installer

Assignee:

Jay Boyd <jaboyd>

Status:

CLOSED ERRATA

QA Contact:

Weihua Meng <wmeng>

Severity:

unspecified

Docs Contact:

Priority:

unspecified

Version:

3.9.0

CC:

aos-bugs, jokerman, mmccomas, nick, smunilla, wsun

Target Milestone:

---

Flags:

wmeng: needinfo-

Target Release:

3.9.z

Hardware:

Unspecified

OS:

Unspecified

Whiteboard:

Fixed In Version:

Doc Type:

Bug Fix

Doc Text:

Cause: Service Catalog pods had a high log verbosity set by default. Consequence: Service Catalog pods on the master node produced a large amount of log data. Fix: The default log verbosity was reset to a lower level.

Story Points:

---

Clone Of:

Environment:

Last Closed:

2018-05-17 06:43:35 UTC

Type:

Bug

Attachments:

Description	Flags
dockerd-current logs on master01	none

Description Nicholas Schuetz 2018-04-05 14:57:06 UTC

After deploying OCP 3.9.14 in HA mode, I noticed that within a few days my master01 box fills up /var/log/journal and then begins to evict pods. The reason is that the docker daemon runs in a more verbose mode on that VM alone (master01) than on the rest of the cluster. This is a problem because the node starts evicting pods once the journal fills the disk.

A short-term fix is to put a script in /etc/cron.daily that does this:
journalctl --vacuum-size=100M
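
The cron job described above could look roughly like the following sketch. Only the `journalctl --vacuum-size=100M` command comes from the report; the file name `/etc/cron.daily/journal-vacuum`, the `vacuum_journal` function, and the `JOURNALCTL` override are hypothetical names introduced here for illustration.

```shell
#!/bin/sh
# Hypothetical /etc/cron.daily/journal-vacuum (the file name is an assumption).
# JOURNALCTL may be overridden to dry-run the script; it defaults to journalctl.

vacuum_journal() {
    # Remove archived journal files until they occupy at most the given size.
    # $1: size limit understood by journalctl (default 100M, as in the report).
    "${JOURNALCTL:-journalctl}" --vacuum-size="${1:-100M}"
}
```

Ending the script with a `vacuum_journal 100M` call and marking the file executable would let cron run it daily. This only treats the symptom: it trims the journal but does not reduce the rate at which it grows.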

It stands to reason that the debug level of all OCP-related daemons should be uniform across all VMs in the cluster.

I'm guessing this is a bug in openshift-ansible; if not, please reassign.

-Nick

Comment 1 Nicholas Schuetz 2018-04-05 17:07:35 UTC

It's worth noting that this occurs whether or not I use the first master as the install host. I get the same result when using an external bastion host (not a cluster member node). Also, I use 10GB root volume sizes, so it's possible that there is a weekly job that "vacuums" the journal and docker is filling it up before that can occur (within a couple of days).

This will likely be an issue on AWS/Cloud installs where the default root disk is also 10GB in size.

Comment 2 Nicholas Schuetz 2018-04-08 16:28:24 UTC

The journal is filling up at a rate of 528.0M per day. Again, this is default behavior, and I have to add a script in /etc/cron.daily to keep it from causing master01 to fail after a few days.
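
To put that growth rate in perspective, the arithmetic can be sketched as below. It assumes a constant rate, treats the reported "M" as MiB, and optimistically treats the whole 10GB root volume as free space; `days_until_full` is a hypothetical helper, not part of any tool mentioned in this report.

```shell
#!/bin/sh
# Estimate how many whole days remain before the journal fills the free space.
# $1: free space in MiB, $2: journal growth in MiB per day.
days_until_full() {
    free_mib=$1
    rate_mib=$2
    if [ "$rate_mib" -le 0 ]; then
        echo "growth rate must be positive" >&2
        return 1
    fi
    # Integer division: partial days are dropped.
    echo $(( free_mib / rate_mib ))
}

# Example: 10 GiB root volume, entirely free, growing by 528 MiB per day.
days_until_full 10240 528   # prints 19
```

In practice the operating system and images already occupy part of the root volume, so the real margin is smaller, which is consistent with the "few days" observed. The journal's current size can be read with `journalctl --disk-usage`.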

Comment 3 Scott Dodson 2018-04-09 15:26:36 UTC

Have you been able to confirm whether or not this is triggered by the installer? 
I don't see any code in 3.9 that would trigger docker logging to be set to debug level.

Which specific option is set incorrectly? Can you provide your inventory so we have a proposed reproducer?
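
One way to answer the question about the docker option is to compare the configured daemon log level on each host. The sketch below assumes the RHEL docker package layout, where daemon flags live in `/etc/sysconfig/docker`, and it only looks for the `--log-level` flag. It does not detect `-D`/`--debug` or settings made elsewhere. `docker_log_level` is a hypothetical helper name.

```shell
#!/bin/sh
# Print the dockerd log level configured in a sysconfig-style file.
# $1: path to the file (default /etc/sysconfig/docker, an assumed location).
docker_log_level() {
    file=${1:-/etc/sysconfig/docker}
    # Ignore comment lines, then take the last --log-level=VALUE
    # or --log-level VALUE that appears.
    level=$(grep -v '^[[:space:]]*#' "$file" \
        | sed -n 's/.*--log-level[= ]\([a-z]*\).*/\1/p' | tail -n 1)
    # "unset" means the flag is absent, so dockerd uses its default level.
    echo "${level:-unset}"
}
```

Running the function on master01 and on another master and comparing the two results would show whether the daemon configuration itself differs between hosts.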

Comment 4 Nicholas Schuetz 2018-04-09 16:17:38 UTC

The hosts file used for deployment is below.

[OSEv3:children]
masters
nodes
new_nodes
etcd
lb
glusterfs

## Set variables common for all OSEv3 hosts
[OSEv3:vars]
openshift_deployment_type=openshift-enterprise
#openshift_deployment_type=origin
#containerized=true

##internal image repos
##openshift_additional_repos=[{'id': 'ose-devel', 'name': 'rhaos-3.9', 'baseurl': 'http://repo.home.nicknach.net/repo/rhaos-3.9', 'enabled': 1, 'gpgcheck': 0}]
openshift_docker_additional_registries=repo.home.nicknach.net
openshift_docker_insecure_registries=repo.home.nicknach.net
openshift_docker_blocked_registries=registry.access.redhat.com,docker.io
oreg_url=repo.home.nicknach.net/openshift3/ose-${component}:${version}
openshift_examples_modify_imagestreams=true
openshift_metrics_image_prefix=repo.home.nicknach.net/openshift3/
openshift_metrics_image_version=v3.9.14
openshift_logging_image_prefix=repo.home.nicknach.net/openshift3/
openshift_logging_image_version=v3.9.14
ansible_service_broker_image_prefix=repo.home.nicknach.net/openshift3/ose-
ansible_service_broker_image_tag=v3.9.14
ansible_service_broker_etcd_image_prefix=repo.home.nicknach.net/rhel7/
ansible_service_broker_etcd_image_tag=latest
openshift_service_catalog_image_prefix=repo.home.nicknach.net/openshift3/ose-
openshift_service_catalog_image_version=v3.9.14
openshift_cockpit_deployer_prefix=repo.home.nicknach.net/openshift3/
openshift_web_console_prefix=repo.home.nicknach.net/openshift3/ose-
openshift_web_console_version=v3.9.14
openshift_prometheus_image_prefix=repo.home.nicknach.net/openshift3/
openshift_prometheus_image_version=v3.9.14
openshift_prometheus_alertmanager_image_prefix=repo.home.nicknach.net/openshift3/
openshift_prometheus_alertmanager_image_version=v3.9.14
openshift_prometheus_alertbuffer_image_prefix=repo.home.nicknach.net/openshift3/
openshift_prometheus_alertbuffer_image_version=v3.9.14
openshift_prometheus_node_exporter_image_prefix=repo.home.nicknach.net/openshift3/
openshift_prometheus_node_exporter_image_version=v3.9.14
openshift_prometheus_proxy_image_prefix=repo.home.nicknach.net/openshift3/
openshift_prometheus_proxy_image_version=v3.9.14
template_service_broker_prefix=repo.home.nicknach.net/openshift3/ose-
template_service_broker_version=v3.9.14
openshift_storage_glusterfs_image=repo.home.nicknach.net/rhgs3/rhgs-server-rhel7
openshift_storage_glusterfs_version=latest
openshift_storage_glusterfs_heketi_image=repo.home.nicknach.net/rhgs3/rhgs-volmanager-rhel7
openshift_storage_glusterfs_heketi_version=latest

# release ver
#openshift_release=v3.9.14
#openshift_image_tag=v3.9.14

## enable ntp
#openshift_clock_enabled=false

## disable template imports
#openshift_install_examples=false

## If ansible_ssh_user is not root, ansible_sudo must be set to true
ansible_ssh_user=root
#ansible_ssh_user=cloud-user
#ansible_sudo=true
#ansible_become=yes

## authentication stuff
## htpasswd file auth
#openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider', 'filename': '/etc/origin/master/htpasswd'}]
#openshift_master_htpasswd_users={'ocpuser':'welcome1'}

## ldap auth (AD)
#openshift_master_identity_providers=[{"name":"NNWIN","challenge":true,"login":true,"kind":"LDAPPasswordIdentityProvider","attributes":{"id":["dn"],"email":["mail"],"name":["cn"],"preferredUsername":["sAMAccountName"]},"bindDN":"CN=SVC-nn-ose,OU=SVC,OU=FNA,DC=nnwin,DC=ad,DC=nncorp,DC=com","bindPassword":"<REDACTED>","insecure":true,"url":"ldap://uswin.nicknach.com:389/DC=uswin,DC=ad,DC=nncorp,DC=com?sAMAccountName?sub"}]
#openshift_master_ldap_ca_file=/etc/ssl/certs/NNWINDC_Cert_Chain.pem

## ldap auth (IPA)
openshift_master_identity_providers=[{"name":"myipa","challenge":true,"login":true,"kind":"LDAPPasswordIdentityProvider","attributes":{"id":["dn"],"email":["mail"],"name":["cn"],"preferredUsername":["uid"]},"bindDN":"","bindPassword":"","ca":"my-ldap-ca-bundle.crt","insecure":false,"url":"ldap://gw.home.nicknach.net/cn=users,cn=accounts,dc=home,dc=nicknach,dc=net?uid"}]
openshift_master_ldap_ca_file=~/my-ldap-ca-bundle.crt

#openshift_master_named_certificates=[{"certfile": "/etc/origin/master/ocp.nicknach.net.crt", "keyfile": "/etc/origin/master/ocp.nicknach.net.key", "names": ["console.ocp.nicknach.net"]}]
#openshift_master_overwrite_named_certificates=false

## registry on nfs
openshift_hosted_registry_storage_kind=nfs
openshift_hosted_registry_storage_access_modes=['ReadWriteMany']
openshift_hosted_registry_storage_host=storage.home.nicknach.net
openshift_hosted_registry_storage_nfs_directory=/data/openshift/enterprise
#openshift_hosted_registry_storage_nfs_options='*(rw,root_squash,sync,no_wdelay)'
openshift_hosted_registry_storage_volume_name=docker-registry
openshift_hosted_registry_storage_volume_size=20Gi

# etcd on nfs
openshift_hosted_etcd_storage_kind=nfs
openshift_hosted_etcd_storage_access_modes=["ReadWriteOnce"]
openshift_hosted_etcd_storage_host=storage.home.nicknach.net
openshift_hosted_etcd_storage_nfs_directory=/data/openshift/enterprise
#openshift_hosted_etcd_storage_nfs_options="*(rw,root_squash,sync,no_wdelay)"
openshift_hosted_etcd_storage_volume_name=etcd 
openshift_hosted_etcd_storage_volume_size=1Gi
openshift_hosted_etcd_storage_labels={'storage':'etcd'}

# logging on nfs
openshift_logging_install_logging=true
openshift_logging_storage_kind=nfs
openshift_logging_storage_access_modes=['ReadWriteOnce']
openshift_logging_storage_host=storage.home.nicknach.net
openshift_logging_storage_nfs_directory=/data/openshift/enterprise
#openshift_logging_storage_nfs_options='*(rw,root_squash,sync,no_wdelay)'
openshift_logging_storage_volume_name=logging
openshift_logging_storage_volume_size=10Gi
openshift_logging_storage_labels={'storage':'logging'}
openshift_logging_es_pv_selector=region=infra

# metrics on nfs
openshift_metrics_install_metrics=true
openshift_metrics_storage_kind=nfs
openshift_metrics_storage_access_modes=['ReadWriteOnce']
openshift_metrics_storage_host=storage.home.nicknach.net
openshift_metrics_storage_nfs_directory=/data/openshift/enterprise
#openshift_metrics_storage_nfs_options='*(rw,root_squash,sync,no_wdelay)'
openshift_metrics_storage_volume_name=metrics
openshift_metrics_storage_volume_size=15Gi
openshift_metrics_storage_labels={'storage':'metrics'}
openshift_metrics_hawkular_nodeselector={'region':'infra'}
openshift_metrics_heapster_nodeselector={'region':'infra'}
openshift_metrics_cassandra_nodeselector={'region':'infra'}

# prometheus on nfs
openshift_hosted_prometheus_deploy=true
openshift_prometheus_storage_kind=nfs
openshift_prometheus_storage_access_modes=['ReadWriteOnce']
openshift_prometheus_storage_host=storage.home.nicknach.net
openshift_prometheus_storage_nfs_directory=/data/openshift/enterprise
#openshift_prometheus_storage_nfs_options='*(rw,root_squash,sync,no_wdelay)'
openshift_prometheus_storage_volume_name=prometheus
openshift_prometheus_storage_volume_size=7Gi
openshift_prometheus_storage_labels={'storage':'prometheus'}
openshift_prometheus_node_selector={'region':'infra'}
openshift_prometheus_storage_type='pvc'
# For prometheus-alertmanager
openshift_prometheus_alertmanager_storage_kind=nfs
openshift_prometheus_alertmanager_storage_access_modes=['ReadWriteOnce']
openshift_prometheus_alertmanager_storage_host=storage.home.nicknach.net
openshift_prometheus_alertmanager_storage_nfs_directory=/data/openshift/enterprise
#openshift_prometheus_alertmanager_storage_nfs_options='*(rw,root_squash,sync,no_wdelay)'
openshift_prometheus_alertmanager_storage_volume_name=prometheus-alertmanager
openshift_prometheus_alertmanager_storage_volume_size=6Gi
openshift_prometheus_alertmanager_storage_labels={'storage':'prometheus-alertmanager'}
openshift_prometheus_alertmanager_storage_type='pvc'
# For prometheus-alertbuffer
openshift_prometheus_alertbuffer_storage_kind=nfs
openshift_prometheus_alertbuffer_storage_access_modes=['ReadWriteOnce']
openshift_prometheus_alertbuffer_storage_host=storage.home.nicknach.net
openshift_prometheus_alertbuffer_storage_nfs_directory=/data/openshift/enterprise
#openshift_prometheus_alertbuffer_storage_nfs_options='*(rw,root_squash,sync,no_wdelay)'
openshift_prometheus_alertbuffer_storage_volume_name=prometheus-alertbuffer
openshift_prometheus_alertbuffer_storage_volume_size=5Gi
openshift_prometheus_alertbuffer_storage_labels={'storage':'prometheus-alertbuffer'}
openshift_prometheus_alertbuffer_storage_type='pvc'

# disable checks
openshift_disable_check=disk_availability,docker_storage,memory_availability,docker_image_availability,package_availability,package_version

# cluster stuff (uncomment for multi-master mode)
openshift_master_cluster_method=native
openshift_master_cluster_hostname=api.ocp.nicknach.net
openshift_master_cluster_public_hostname=console.ocp.nicknach.net


## cns
openshift_storage_glusterfs_namespace=app-storage
openshift_storage_glusterfs_storageclass=true
#openshift_hosted_registry_storage_kind=glusterfs
#openshift_metrics_install_metrics=true
#openshift_metrics_storage_kind=dynamic
#openshift_logging_es_pvc_size=10Gi
#openshift_logging_install_logging=true
#openshift_logging_storage_kind=dynamic
#openshift_storage_glusterfs_block_deploy=true
#openshift_storage_glusterfs_registry_namespace=infra-storage
#openshift_storage_glusterfs_registry_storageclass=false
#openshift_storage_glusterfs_registry_block_deploy=true
#openshift_storage_glusterfs_registry_block_host_vol_size=50
#openshift_storage_glusterfs_registry_block_storageclass=true
#openshift_storage_glusterfs_registry_block_storageclass_default=true
#openshift_storageclass_default=false

##  cloud provider configs
##  AWS
#openshift_cloudprovider_kind=aws
#openshift_cloudprovider_aws_access_key=
#openshift_cloudprovider_aws_secret_key=
##  GCE
#openshift_cloudprovider_kind=gce
##  Openstack
#openshift_cloudprovider_kind=openstack
#openshift_cloudprovider_openstack_auth_url=https://controller.home.nicknach.com:35357/v2.0
#openshift_cloudprovider_openstack_username=svc-openshift-np
#openshift_cloudprovider_openstack_password=kX7mE10dkX7mE10d
#openshift_cloudprovider_openstack_tenant_id=f741ba7204ec47c9886c050891dd592e
#openshift_cloudprovider_openstack_tenant_name=nn-dev
#openshift_cloudprovider_openstack_region=RegionOne
#openshift_cloudprovider_openstack_lb_subnet_id=d7c61f2a-d591-461d-af28-308ade046c0d

## set the router region
#openshift_hosted_manage_router=true
#openshift_hosted_router_selector=region=infra

## domain stuff
openshift_master_default_subdomain=apps.ocp.nicknach.net

## network stuff
#os_sdn_network_plugin_name='redhat/openshift-ovs-multitenant'
# set these if you are behind a proxy
#openshift_http_proxy=http://192.168.0.254:3128
#openshift_https_proxy=http://192.168.0.254:3128
#openshift_no_proxy=

## use these if there is a conflict with the docker bridge and/or SDN networks
#osm_cluster_network_cidr=10.129.0.0/14
#openshift_portal_net=172.31.0.0/16

## use these if you want to switch the console/api port to something other than 8443
#openshift_master_public_api_url=https://api.ocp.nicknach.net:443
#openshift_master_public_console_url=https://console.ocp.nicknach.net:443/console
#openshift_master_api_port=443
#openshift_master_console_port=443

## adjust max pods for scale testing
#openshift_node_kubelet_args={'pods-per-core': ['15'], 'max-pods': ['500'], 'image-gc-high-threshold': ['85'], 'image-gc-low-threshold': ['80']}
## adjust scheduler
#osm_controller_args={'node-monitor-period': ['2s'], 'node-monitor-grace-period': ['16s'], 'pod-eviction-timeout': ['30s']}
#osm_controller_args={'resource-quota-sync-period': ['10s']}

## load balancer
[lb]
lb.ocp.nicknach.net

## host group for etcd (uncomment for multi-master)
[etcd]
master01.ocp.nicknach.net
master02.ocp.nicknach.net
master03.ocp.nicknach.net

## host group for masters
[masters]
master01.ocp.nicknach.net
master02.ocp.nicknach.net
master03.ocp.nicknach.net

[nodes]
master01.ocp.nicknach.net openshift_node_labels="{'region': 'masters', 'zone': 'a', 'role': 'master'}" openshift_schedulable=true
master02.ocp.nicknach.net openshift_node_labels="{'region': 'masters', 'zone': 'a', 'role': 'master'}" openshift_schedulable=true
master03.ocp.nicknach.net openshift_node_labels="{'region': 'masters', 'zone': 'a', 'role': 'master'}" openshift_schedulable=true
infra01.ocp.nicknach.net openshift_node_labels="{'region': 'infra', 'zone': 'a', 'role': 'infra'}" openshift_schedulable=true
infra02.ocp.nicknach.net openshift_node_labels="{'region': 'infra', 'zone': 'a', 'role': 'infra'}" openshift_schedulable=true
infra03.ocp.nicknach.net openshift_node_labels="{'region': 'infra', 'zone': 'a', 'role': 'infra'}" openshift_schedulable=true
node01.ocp.nicknach.net openshift_node_labels="{'region': 'primary', 'zone': 'a', 'role': 'compute'}" openshift_schedulable=true
node02.ocp.nicknach.net openshift_node_labels="{'region': 'primary', 'zone': 'a', 'role': 'compute'}" openshift_schedulable=true
node03.ocp.nicknach.net openshift_node_labels="{'region': 'primary', 'zone': 'a', 'role': 'compute'}" openshift_schedulable=true

## if using gluster (Container Native Storage)
[glusterfs]
node01.ocp.nicknach.net glusterfs_devices='[ "/dev/vdc" ]'
node02.ocp.nicknach.net glusterfs_devices='[ "/dev/vdc" ]'
node03.ocp.nicknach.net glusterfs_devices='[ "/dev/vdc" ]'

#[glusterfs_registry]
#infra01.ocp.nicknach.net glusterfs_devices='[ "/dev/vdc" ]'
#infra02.ocp.nicknach.net glusterfs_devices='[ "/dev/vdc" ]'
#infra03.ocp.nicknach.net glusterfs_devices='[ "/dev/vdc" ]'

[new_nodes]
## hold for use when adding new nodes

Comment 5 Nicholas Schuetz 2018-04-09 17:13:31 UTC

Created attachment 1419442 [details]
dockerd-current logs on master01

Here's a sample of the excessive logs being displayed on master01.

Comment 6 Nicholas Schuetz 2018-04-09 17:15:09 UTC

Aha! The apiserver (deployed to master01) appears to be the culprit.


kube-service-catalog                apiserver-lc6gd                               1/1       Running     1          4d        10.129.0.9    master01.ocp.nicknach.net
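
A rough way to confirm which container is responsible for the journal volume is to count entries per container. This sketch assumes docker is using the journald logging driver, which tags each entry with a `CONTAINER_NAME` field, and that the input is `journalctl -o export` output. `top_loggers` is a hypothetical helper name.

```shell
#!/bin/sh
# Count journal entries per container from `journalctl -o export` output on stdin.
# Prints "COUNT NAME" lines, busiest container first.
top_loggers() {
    grep '^CONTAINER_NAME=' \
        | sed 's/^CONTAINER_NAME=//' \
        | sort | uniq -c | sort -rn \
        | awk '{ print $1, $2 }'
}
```

For example, `journalctl -o export --since "1 hour ago" | top_loggers | head` on master01 would list the noisiest containers for the last hour.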

Comment 7 Scott Dodson 2018-04-09 20:09:14 UTC

This was fixed in master here https://github.com/openshift/openshift-ansible/pull/7681/files#diff-f5c4b4675369f72d180a86be3772fe87R43

Comment 8 Scott Dodson 2018-04-09 20:09:41 UTC

Needs a backport of at least the verbosity log changes.

Comment 9 Jay Boyd 2018-04-11 13:17:31 UTC

PR for 3.9:   https://github.com/openshift/openshift-ansible/pull/7910

Comment 10 Jay Boyd 2018-04-13 12:13:43 UTC

Merged on April 12.

Comment 12 Weihua Meng 2018-04-17 15:01:45 UTC

Fixed.
openshift-ansible-3.9.22-1.git.0.2e15102.el7.noarch.rpm

# oc describe pod apiserver-jc96t
    Command:
      /usr/bin/service-catalog
    Args:
      apiserver
      --storage-type
      etcd
      --secure-port
      6443
      --etcd-servers
      https://qe-wmengrpm39-master-etcd-1:2379
      --etcd-cafile
      /etc/origin/master/master.etcd-ca.crt
      --etcd-certfile
      /etc/origin/master/master.etcd-client.crt
      --etcd-keyfile
      /etc/origin/master/master.etcd-client.key
      -v
      3

# oc describe pod controller-manager-r9nnt
    Command:
      /usr/bin/service-catalog
    Args:
      controller-manager
      --port
      8080
      -v
      3
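
The same check can be scripted. The sketch below reads `oc describe pod` output on stdin and prints the value that follows `-v` in the Args list, relying on the one-argument-per-line layout shown above. `verbosity_from_describe` is a hypothetical helper name.

```shell
#!/bin/sh
# Print the argument that follows "-v" in `oc describe pod` output read from stdin.
verbosity_from_describe() {
    awk '
        # The previous line was "-v": trim this line, print it, and stop.
        want { gsub(/^[ \t]+|[ \t]+$/, ""); print; exit }
        # A line holding only "-v" means the next line is the verbosity value.
        /^[ \t]*-v[ \t]*$/ { want = 1 }
    '
}
```

For example, `oc describe pod apiserver-jc96t -n kube-service-catalog | verbosity_from_describe` should print `3` on a fixed cluster. The pod name is the one from this comment and the namespace is the one shown in comment 6; both will differ per cluster.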

Comment 18 errata-xmlrpc 2018-05-17 06:43:35 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1566