Bug 1564179
| Summary: | first master's docker-daemon has higher debug level than the rest of the cluster | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Nicholas Schuetz <nick> |
| Component: | Installer | Assignee: | Jay Boyd <jaboyd> |
| Status: | CLOSED ERRATA | QA Contact: | Weihua Meng <wmeng> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.9.0 | CC: | aos-bugs, jokerman, mmccomas, nick, smunilla, wsun |
| Target Milestone: | --- | Flags: | wmeng: needinfo- |
| Target Release: | 3.9.z | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | Cause: Service Catalog pods had a high log verbosity set by default. Consequence: Service Catalog pods on the master node produced a large amount of log data. Fix: The default log verbosity was reset to a lower level. | | |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-05-17 06:43:35 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Description
Nicholas Schuetz
2018-04-05 14:57:06 UTC
It's worth noting that this occurs whether or not I use the first master as the install host. I get the same result when using an external bastion host (not a cluster member node). Also, I use 10GB root volumes, so it's possible that there is a weekly job that "vacuums" the journal and docker is filling it up before that can occur (within a couple of days). This will likely be an issue on AWS/cloud installs, where the default root disk is also 10GB. The journal is filling up at a rate of 528.0M per day. Again, this is default behavior, and I have to add a script in /etc/cron.daily to keep it from causing master01 to fail after a few days.

Have you been able to confirm whether or not this is triggered by the installer? I don't see any code in 3.9 that would set docker logging to debug level. Which specific option is set incorrectly? Can you provide your inventory so we have a proposed reproducer?

Hosts file used for deployment below.
[OSEv3:children]
masters
nodes
new_nodes
etcd
lb
glusterfs
## Set variables common for all OSEv3 hosts
[OSEv3:vars]
openshift_deployment_type=openshift-enterprise
#openshift_deployment_type=origin
#containerized=true
##internal image repos
##openshift_additional_repos=[{'id': 'ose-devel', 'name': 'rhaos-3.9', 'baseurl': 'http://repo.home.nicknach.net/repo/rhaos-3.9', 'enabled': 1, 'gpgcheck': 0}]
openshift_docker_additional_registries=repo.home.nicknach.net
openshift_docker_insecure_registries=repo.home.nicknach.net
openshift_docker_blocked_registries=registry.access.redhat.com,docker.io
oreg_url=repo.home.nicknach.net/openshift3/ose-${component}:${version}
openshift_examples_modify_imagestreams=true
openshift_metrics_image_prefix=repo.home.nicknach.net/openshift3/
openshift_metrics_image_version=v3.9.14
openshift_logging_image_prefix=repo.home.nicknach.net/openshift3/
openshift_logging_image_version=v3.9.14
ansible_service_broker_image_prefix=repo.home.nicknach.net/openshift3/ose-
ansible_service_broker_image_tag=v3.9.14
ansible_service_broker_etcd_image_prefix=repo.home.nicknach.net/rhel7/
ansible_service_broker_etcd_image_tag=latest
openshift_service_catalog_image_prefix=repo.home.nicknach.net/openshift3/ose-
openshift_service_catalog_image_version=v3.9.14
openshift_cockpit_deployer_prefix=repo.home.nicknach.net/openshift3/
openshift_web_console_prefix=repo.home.nicknach.net/openshift3/ose-
openshift_web_console_version=v3.9.14
openshift_prometheus_image_prefix=repo.home.nicknach.net/openshift3/
openshift_prometheus_image_version=v3.9.14
openshift_prometheus_alertmanager_image_prefix=repo.home.nicknach.net/openshift3/
openshift_prometheus_alertmanager_image_version=v3.9.14
openshift_prometheus_alertbuffer_image_prefix=repo.home.nicknach.net/openshift3/
openshift_prometheus_alertbuffer_image_version=v3.9.14
openshift_prometheus_node_exporter_image_prefix=repo.home.nicknach.net/openshift3/
openshift_prometheus_node_exporter_image_version=v3.9.14
openshift_prometheus_proxy_image_prefix=repo.home.nicknach.net/openshift3/
openshift_prometheus_proxy_image_version=v3.9.14
template_service_broker_prefix=repo.home.nicknach.net/openshift3/ose-
template_service_broker_version=v3.9.14
openshift_storage_glusterfs_image=repo.home.nicknach.net/rhgs3/rhgs-server-rhel7
openshift_storage_glusterfs_version=latest
openshift_storage_glusterfs_heketi_image=repo.home.nicknach.net/rhgs3/rhgs-volmanager-rhel7
openshift_storage_glusterfs_heketi_version=latest
# release ver
#openshift_release=v3.9.14
#openshift_image_tag=v3.9.14
## enable ntp
#openshift_clock_enabled=false
## disable template imports
#openshift_install_examples=false
## If ansible_ssh_user is not root, ansible_sudo must be set to true
ansible_ssh_user=root
#ansible_ssh_user=cloud-user
#ansible_sudo=true
#ansible_become=yes
## authentication stuff
## htpasswd file auth
#openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider', 'filename': '/etc/origin/master/htpasswd'}]
#openshift_master_htpasswd_users={'ocpuser':'welcome1'}
## ldap auth (AD)
#openshift_master_identity_providers=[{"name":"NNWIN","challenge":true,"login":true,"kind":"LDAPPasswordIdentityProvider","attributes":{"id":["dn"],"email":["mail"],"name":["cn"],"preferredUsername":["sAMAccountName"]},"bindDN":"CN=SVC-nn-ose,OU=SVC,OU=FNA,DC=nnwin,DC=ad,DC=nncorp,DC=com","bindPassword":"<REDACTED>","insecure":true,"url":"ldap://uswin.nicknach.com:389/DC=uswin,DC=ad,DC=nncorp,DC=com?sAMAccountName?sub"}]
#openshift_master_ldap_ca_file=/etc/ssl/certs/NNWINDC_Cert_Chain.pem
## ldap auth (IPA)
openshift_master_identity_providers=[{"name":"myipa","challenge":true,"login":true,"kind":"LDAPPasswordIdentityProvider","attributes":{"id":["dn"],"email":["mail"],"name":["cn"],"preferredUsername":["uid"]},"bindDN":"","bindPassword":"","ca":"my-ldap-ca-bundle.crt","insecure":false,"url":"ldap://gw.home.nicknach.net/cn=users,cn=accounts,dc=home,dc=nicknach,dc=net?uid"}]
openshift_master_ldap_ca_file=~/my-ldap-ca-bundle.crt
#openshift_master_named_certificates=[{"certfile": "/etc/origin/master/ocp.nicknach.net.crt", "keyfile": "/etc/origin/master/ocp.nicknach.net.key", "names": ["console.ocp.nicknach.net"]}]
#openshift_master_overwrite_named_certificates=false
## registry on nfs
openshift_hosted_registry_storage_kind=nfs
openshift_hosted_registry_storage_access_modes=['ReadWriteMany']
openshift_hosted_registry_storage_host=storage.home.nicknach.net
openshift_hosted_registry_storage_nfs_directory=/data/openshift/enterprise
#openshift_hosted_registry_storage_nfs_options='*(rw,root_squash,sync,no_wdelay)'
openshift_hosted_registry_storage_volume_name=docker-registry
openshift_hosted_registry_storage_volume_size=20Gi
# etcd on nfs
openshift_hosted_etcd_storage_kind=nfs
openshift_hosted_etcd_storage_access_modes=["ReadWriteOnce"]
openshift_hosted_etcd_storage_host=storage.home.nicknach.net
openshift_hosted_etcd_storage_nfs_directory=/data/openshift/enterprise
#openshift_hosted_etcd_storage_nfs_options="*(rw,root_squash,sync,no_wdelay)"
openshift_hosted_etcd_storage_volume_name=etcd
openshift_hosted_etcd_storage_volume_size=1Gi
openshift_hosted_etcd_storage_labels={'storage':'etcd'}
# logging on nfs
openshift_logging_install_logging=true
openshift_logging_storage_kind=nfs
openshift_logging_storage_access_modes=['ReadWriteOnce']
openshift_logging_storage_host=storage.home.nicknach.net
openshift_logging_storage_nfs_directory=/data/openshift/enterprise
#openshift_logging_storage_nfs_options='*(rw,root_squash,sync,no_wdelay)'
openshift_logging_storage_volume_name=logging
openshift_logging_storage_volume_size=10Gi
openshift_logging_storage_labels={'storage':'logging'}
openshift_logging_es_pv_selector=region=infra
# metrics on nfs
openshift_metrics_install_metrics=true
openshift_metrics_storage_kind=nfs
openshift_metrics_storage_access_modes=['ReadWriteOnce']
openshift_metrics_storage_host=storage.home.nicknach.net
openshift_metrics_storage_nfs_directory=/data/openshift/enterprise
#openshift_metrics_storage_nfs_options='*(rw,root_squash,sync,no_wdelay)'
openshift_metrics_storage_volume_name=metrics
openshift_metrics_storage_volume_size=15Gi
openshift_metrics_storage_labels={'storage':'metrics'}
openshift_metrics_hawkular_nodeselector={'region':'infra'}
openshift_metrics_heapster_nodeselector={'region':'infra'}
openshift_metrics_cassandra_nodeselector={'region':'infra'}
# prometheus on nfs
openshift_hosted_prometheus_deploy=true
openshift_prometheus_storage_kind=nfs
openshift_prometheus_storage_access_modes=['ReadWriteOnce']
openshift_prometheus_storage_host=storage.home.nicknach.net
openshift_prometheus_storage_nfs_directory=/data/openshift/enterprise
#openshift_prometheus_storage_nfs_options='*(rw,root_squash,sync,no_wdelay)'
openshift_prometheus_storage_volume_name=prometheus
openshift_prometheus_storage_volume_size=7Gi
openshift_prometheus_storage_labels={'storage':'prometheus'}
openshift_prometheus_node_selector={'region':'infra'}
openshift_prometheus_storage_type='pvc'
# For prometheus-alertmanager
openshift_prometheus_alertmanager_storage_kind=nfs
openshift_prometheus_alertmanager_storage_access_modes=['ReadWriteOnce']
openshift_prometheus_alertmanager_storage_host=storage.home.nicknach.net
openshift_prometheus_alertmanager_storage_nfs_directory=/data/openshift/enterprise
#openshift_prometheus_alertmanager_storage_nfs_options='*(rw,root_squash,sync,no_wdelay)'
openshift_prometheus_alertmanager_storage_volume_name=prometheus-alertmanager
openshift_prometheus_alertmanager_storage_volume_size=6Gi
openshift_prometheus_alertmanager_storage_labels={'storage':'prometheus-alertmanager'}
openshift_prometheus_alertmanager_storage_type='pvc'
# For prometheus-alertbuffer
openshift_prometheus_alertbuffer_storage_kind=nfs
openshift_prometheus_alertbuffer_storage_access_modes=['ReadWriteOnce']
openshift_prometheus_alertbuffer_storage_host=storage.home.nicknach.net
openshift_prometheus_alertbuffer_storage_nfs_directory=/data/openshift/enterprise
#openshift_prometheus_alertbuffer_storage_nfs_options='*(rw,root_squash,sync,no_wdelay)'
openshift_prometheus_alertbuffer_storage_volume_name=prometheus-alertbuffer
openshift_prometheus_alertbuffer_storage_volume_size=5Gi
openshift_prometheus_alertbuffer_storage_labels={'storage':'prometheus-alertbuffer'}
openshift_prometheus_alertbuffer_storage_type='pvc'
# disable checks
openshift_disable_check=disk_availability,docker_storage,memory_availability,docker_image_availability,package_availability,package_version
# cluster stuff (uncomment for multi-master mode)
openshift_master_cluster_method=native
openshift_master_cluster_hostname=api.ocp.nicknach.net
openshift_master_cluster_public_hostname=console.ocp.nicknach.net
## cns
openshift_storage_glusterfs_namespace=app-storage
openshift_storage_glusterfs_storageclass=true
#openshift_hosted_registry_storage_kind=glusterfs
#openshift_metrics_install_metrics=true
#openshift_metrics_storage_kind=dynamic
#openshift_logging_es_pvc_size=10Gi
#openshift_logging_install_logging=true
#openshift_logging_storage_kind=dynamic
#openshift_storage_glusterfs_block_deploy=true
#openshift_storage_glusterfs_registry_namespace=infra-storage
#openshift_storage_glusterfs_registry_storageclass=false
#openshift_storage_glusterfs_registry_block_deploy=true
#openshift_storage_glusterfs_registry_block_host_vol_size=50
#openshift_storage_glusterfs_registry_block_storageclass=true
#openshift_storage_glusterfs_registry_block_storageclass_default=true
#openshift_storageclass_default=false
## cloud provider configs
## AWS
#openshift_cloudprovider_kind=aws
#openshift_cloudprovider_aws_access_key=
#openshift_cloudprovider_aws_secret_key=
## GCE
#openshift_cloudprovider_kind=gce
## Openstack
#openshift_cloudprovider_kind=openstack
#openshift_cloudprovider_openstack_auth_url=https://controller.home.nicknach.com:35357/v2.0
#openshift_cloudprovider_openstack_username=svc-openshift-np
#openshift_cloudprovider_openstack_password=kX7mE10dkX7mE10d
#openshift_cloudprovider_openstack_tenant_id=f741ba7204ec47c9886c050891dd592e
#openshift_cloudprovider_openstack_tenant_name=nn-dev
#openshift_cloudprovider_openstack_region=RegionOne
#openshift_cloudprovider_openstack_lb_subnet_id=d7c61f2a-d591-461d-af28-308ade046c0d
## set the router region
#openshift_hosted_manage_router=true
#openshift_hosted_router_selector=region=infra
## domain stuff
openshift_master_default_subdomain=apps.ocp.nicknach.net
## network stuff
#os_sdn_network_plugin_name='redhat/openshift-ovs-multitenant'
# set these if you are behind a proxy
#openshift_http_proxy=http://192.168.0.254:3128
#openshift_https_proxy=http://192.168.0.254:3128
#openshift_no_proxy=
## use these if there is a conflict with the docker bridge and/or SDN networks
#osm_cluster_network_cidr=10.129.0.0/14
#openshift_portal_net=172.31.0.0/16
## use these if you want to switch the console/api port to something other than 8443
#openshift_master_public_api_url=https://api.ocp.nicknach.net:443
#openshift_master_public_console_url=https://console.ocp.nicknach.net:443/console
#openshift_master_api_port=443
#openshift_master_console_port=443
## adjust max pods for scale testing
#openshift_node_kubelet_args={'pods-per-core': ['15'], 'max-pods': ['500'], 'image-gc-high-threshold': ['85'], 'image-gc-low-threshold': ['80']}
## adjust scheduler
#osm_controller_args={'node-monitor-period': ['2s'], 'node-monitor-grace-period': ['16s'], 'pod-eviction-timeout': ['30s']}
#osm_controller_args={'resource-quota-sync-period': ['10s']}
## load balancer
[lb]
lb.ocp.nicknach.net
## host group for etcd (uncomment for multi-master)
[etcd]
master01.ocp.nicknach.net
master02.ocp.nicknach.net
master03.ocp.nicknach.net
## host group for masters
[masters]
master01.ocp.nicknach.net
master02.ocp.nicknach.net
master03.ocp.nicknach.net
[nodes]
master01.ocp.nicknach.net openshift_node_labels="{'region': 'masters', 'zone': 'a', 'role': 'master'}" openshift_schedulable=true
master02.ocp.nicknach.net openshift_node_labels="{'region': 'masters', 'zone': 'a', 'role': 'master'}" openshift_schedulable=true
master03.ocp.nicknach.net openshift_node_labels="{'region': 'masters', 'zone': 'a', 'role': 'master'}" openshift_schedulable=true
infra01.ocp.nicknach.net openshift_node_labels="{'region': 'infra', 'zone': 'a', 'role': 'infra'}" openshift_schedulable=true
infra02.ocp.nicknach.net openshift_node_labels="{'region': 'infra', 'zone': 'a', 'role': 'infra'}" openshift_schedulable=true
infra03.ocp.nicknach.net openshift_node_labels="{'region': 'infra', 'zone': 'a', 'role': 'infra'}" openshift_schedulable=true
node01.ocp.nicknach.net openshift_node_labels="{'region': 'primary', 'zone': 'a', 'role': 'compute'}" openshift_schedulable=true
node02.ocp.nicknach.net openshift_node_labels="{'region': 'primary', 'zone': 'a', 'role': 'compute'}" openshift_schedulable=true
node03.ocp.nicknach.net openshift_node_labels="{'region': 'primary', 'zone': 'a', 'role': 'compute'}" openshift_schedulable=true
## if using gluster (Container Native Storage)
[glusterfs]
node01.ocp.nicknach.net glusterfs_devices='[ "/dev/vdc" ]'
node02.ocp.nicknach.net glusterfs_devices='[ "/dev/vdc" ]'
node03.ocp.nicknach.net glusterfs_devices='[ "/dev/vdc" ]'
#[glusterfs_registry]
#infra01.ocp.nicknach.net glusterfs_devices='[ "/dev/vdc" ]'
#infra02.ocp.nicknach.net glusterfs_devices='[ "/dev/vdc" ]'
#infra03.ocp.nicknach.net glusterfs_devices='[ "/dev/vdc" ]'
[new_nodes]
## hold for use when adding new nodes
Created attachment 1419442
dockerd-current logs on master01

Here's a sample of the excessive logging on master01.
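To see which container is behind the excess journal volume on a node, one option is to count journal entries per container. The following is a minimal sketch, assuming docker uses the journald log driver (which tags each entry with a `CONTAINER_NAME` field) and that the journal is read with `journalctl -o json`, which prints one JSON object per line. The function name `count_log_lines_by_container` is hypothetical.

```shell
# Hypothetical helper: read `journalctl -o json` output on stdin and print
# "<count> <container name>" pairs, busiest container first.
# Assumes the journald log driver sets CONTAINER_NAME on container entries;
# entries without that field (host services) are skipped by sed -n.
count_log_lines_by_container() {
  sed -n 's/.*"CONTAINER_NAME" *: *"\([^"]*\)".*/\1/p' \
    | sort | uniq -c | sort -rn \
    | awk '{ print $1, $2 }'   # drop the padding uniq -c adds
}
```

Possible usage on the affected master: `journalctl --since today -o json | count_log_lines_by_container | head -n 5`. It uses `sed` rather than `jq` so that nothing extra needs to be installed on the node.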
Aha! The service catalog apiserver (deployed to master01) appears to be the culprit:

```
kube-service-catalog   apiserver-lc6gd   1/1   Running   1   4d   10.129.0.9   master01.ocp.nicknach.net
```

This was fixed in master here: https://github.com/openshift/openshift-ansible/pull/7681/files#diff-f5c4b4675369f72d180a86be3772fe87R43. It needs a backport of at least the log verbosity changes.

Merged on April 12.

Fixed.
openshift-ansible-3.9.22-1.git.0.2e15102.el7.noarch.rpm
```
# oc describe pod apiserver-jc96t
Command:
  /usr/bin/service-catalog
Args:
  apiserver
  --storage-type
  etcd
  --secure-port
  6443
  --etcd-servers
  https://qe-wmengrpm39-master-etcd-1:2379
  --etcd-cafile
  /etc/origin/master/master.etcd-ca.crt
  --etcd-certfile
  /etc/origin/master/master.etcd-client.crt
  --etcd-keyfile
  /etc/origin/master/master.etcd-client.key
  -v
  3
```

```
# oc describe pod controller-manager-r9nnt
Command:
  /usr/bin/service-catalog
Args:
  controller-manager
  --port
  8080
  -v
  3
```
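The verification above reads the `-v` value by eye. Below is a minimal sketch of the same check as a shell filter. It assumes `oc describe pod` lists each argument on its own line, as shown in the output. `catalog_log_verbosity` is a hypothetical helper, and it would not catch a combined form such as `-v=3`.

```shell
# Hypothetical helper: print the value that follows a standalone "-v"
# argument in `oc describe pod` output read from stdin.
# Assumes one argument per line; prints nothing if no -v line is found.
# Only the first -v is reported, so multi-container pods may need more care.
catalog_log_verbosity() {
  awk '
    { gsub(/^[ \t]+|[ \t]+$/, "") }   # trim the indentation around each line
    prev == "-v" { print; exit }      # the verbosity value is on the next line
    { prev = $0 }
  '
}
```

Possible usage: `oc describe pod apiserver-jc96t -n kube-service-catalog | catalog_log_verbosity`. Against the verified build above, this should report the same `3` shown in the output.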
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:1566