Description of problem:
When running a cluster on AWS with S3 as the back-end storage for the docker registry, the oadm diagnostics command reports an error.

Version-Release number of selected component (if applicable):

How reproducible:
Deploy a cluster on AWS with an S3-backed registry and run oadm diagnostics.

Steps to Reproduce:
1. Deploy a cluster on AWS.
2. Back the docker registry with an S3 bucket.
3. Run oadm diagnostics (a minimal command sketch follows this report).

Actual results:
ERROR: [DClu1007 from diagnostic ClusterRegistry@openshift/origin/pkg/diagnostics/cluster/registry.go:209]
       The "docker-registry" service has multiple associated pods each using ephemeral storage.
       These are likely to have inconsistent stores of images. Builds and deployments that use
       images from the registry may fail sporadically. Use a single registry or add a shared
       storage volume to the registries.

Expected results:
No error. When manually browsing the S3 bucket, all files are in place.

Additional info:
I believe this is a new health check in the latest release, as I never received it before three weeks ago.
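As a rough reproduction sketch (assuming the default "docker-registry" DeploymentConfig in the "default" project; adjust names to your environment):

    # Scale the S3-backed registry to more than one replica.
    oc project default
    oc scale dc/docker-registry --replicas=2

    # Re-run the registry diagnostic; DClu1007 is reported even though the
    # pods share the S3 back end rather than ephemeral storage.
    oadm diagnostics ClusterRegistry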
This occurs when the registry is scaled up and backed by S3 storage. The diagnostic should check for the presence of a user-replaced registry configuration, as described under https://docs.openshift.com/enterprise/3.2/install_config/install/docker_registry.html#storage-for-the-registry, and assume the user knows what they're doing if one is present.
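For reference, a user-replaced registry configuration with an S3 back end, wired up roughly as the linked documentation describes, looks like the following sketch; the bucket, region, and credential values are placeholders and the authoritative file contents are in the documentation:

    # Create a custom registry config with an S3 storage back end (values are
    # placeholders), store it in a secret, mount it into the registry, and
    # point the registry at it.
    cat > registry-config.yml <<'EOF'
    version: 0.1
    log:
      level: debug
    http:
      addr: :5000
    storage:
      cache:
        blobdescriptor: inmemory
      s3:
        accesskey: <AWS_ACCESS_KEY>
        secretkey: <AWS_SECRET_KEY>
        region: us-east-1
        bucket: <bucket-name>
        encrypt: true
        secure: true
        rootdirectory: /registry
    auth:
      openshift:
        realm: openshift
    middleware:
      repository:
        - name: openshift
    EOF

    oc secrets new registry-config config.yml=registry-config.yml
    oc volume dc/docker-registry --add --type=secret \
        --secret-name=registry-config -m /etc/docker/registry/
    oc env dc/docker-registry \
        REGISTRY_CONFIGURATION_PATH=/etc/docker/registry/config.yml

Per the comment above, the diagnostic should treat the presence of such a replaced configuration as a sign that the administrator has provided shared storage and not raise DClu1007.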
I have a PR at https://github.com/openshift/origin/pull/10313
Commit pushed to master at https://github.com/openshift/origin:
https://github.com/openshift/origin/commit/d7d58deba96d942f244490509b3b933ffe5659c5
diagnostics: fix bug 1359771
Confirmed with ami:devenv-rhel7_4801; the bug has been fixed:

openshift version
openshift v1.3.0-alpha.3+e1e7edb
kubernetes v1.3.0+507d3a7
etcd 2.3.0+git

oadm diagnostics --config=openshift.local.config/master/admin.kubeconfig
[Note] Determining if client configuration exists for client/cluster diagnostics
Info:  Successfully read a client config file at 'openshift.local.config/master/admin.kubeconfig'
Info:  Successfully read a client config file at '/openshift.local.config/master/admin.kubeconfig'
Info:  Using context for cluster-admin access: 'default/172-18-8-237:8443/system:admin'

[Note] Running diagnostic: ConfigContexts[default/172-18-8-237:8443/system:admin]
       Description: Validate client config context is complete and has connectivity
Info:  The current client config context is 'default/172-18-8-237:8443/system:admin':
       The server URL is 'https://172.18.8.237:8443'
       The user authentication is 'system:admin/172-18-8-237:8443'
       The current project is 'default'
       Successfully requested project list; has access to project(s):
         [default kube-system openshift openshift-infra test zhouy]

[Note] Running diagnostic: ConfigContexts[default/ec2-54-196-94-236-compute-1-amazonaws-com:8443/system:admin]
       Description: Validate client config context is complete and has connectivity
Info:  For client config context 'default/ec2-54-196-94-236-compute-1-amazonaws-com:8443/system:admin':
       The server URL is 'https://ec2-54-196-94-236.compute-1.amazonaws.com:8443'
       The user authentication is 'system:admin/172-18-8-237:8443'
       The current project is 'default'
       Successfully requested project list; has access to project(s):
         [kube-system openshift openshift-infra test zhouy default]

[Note] Running diagnostic: DiagnosticPod
       Description: Create a pod to run diagnostics from the application standpoint
WARN:  [DCli2006 from diagnostic DiagnosticPod@openshift/origin/pkg/diagnostics/client/run_diagnostics_pod.go:134]
       Timed out preparing diagnostic pod logs for streaming, so this diagnostic cannot run.
       It is likely that the image 'openshift/origin-deployer:v1.3.0-alpha.3' was not pulled and running yet.
       Last error: (*errors.StatusError[2]) container "pod-diagnostics" in pod "pod-diagnostic-test-jxd7s" is waiting to start: ContainerCreating

[Note] Running diagnostic: ClusterRegistry
       Description: Check that there is a working Docker registry

[Note] Running diagnostic: ClusterRoleBindings
       Description: Check that the default ClusterRoleBindings are present and contain the expected subjects

[Note] Running diagnostic: ClusterRoles
       Description: Check that the default ClusterRoles are present and contain the expected permissions

[Note] Running diagnostic: ClusterRouterName
       Description: Check there is a working router
WARN:  [DClu2001 from diagnostic ClusterRouter@openshift/origin/pkg/diagnostics/cluster/router.go:129]
       There is no "router" DeploymentConfig. The router may have been named something different,
       in which case this warning may be ignored. A router is not strictly required; however it is
       needed for accessing pods from external networks and its absence likely indicates an
       incomplete installation of the cluster. Use the 'oadm router' command to create a router.

[Note] Running diagnostic: MasterNode
       Description: Check if master is also running node (for Open vSwitch)
Info:  Found a node with same IP as master: ip-172-18-8-237.ec2.internal

[Note] Skipping diagnostic: MetricsApiProxy
       Description: Check the integrated heapster metrics can be reached via the API proxy
       Because: The heapster service does not exist in the openshift-infra project at this time,
       so it is not available for the Horizontal Pod Autoscaler to use as a source of metrics.

[Note] Running diagnostic: NodeDefinitions
       Description: Check node records on master

[Note] Skipping diagnostic: ServiceExternalIPs
       Description: Check for existing services with ExternalIPs that are disallowed by master config
       Because: No master config file was detected

[Note] Summary of diagnostics execution (version v1.3.0-alpha.3+e1e7edb):
[Note] Warnings seen: 2
Moving to MODIFIED for enterprise to manage.
This has been merged into ose and is in OSE v3.3.0.28 or newer.
Confirmed with the latest OCP; the issue has been fixed:

openshift version
openshift v3.3.0.28
kubernetes v1.3.0+507d3a7
etcd 2.3.0+git

[root@ip-172-18-10-128 ~]# oadm diagnostics
[Note] Determining if client configuration exists for client/cluster diagnostics
Info:  Successfully read a client config file at '/root/.kube/config'
Info:  Using context for cluster-admin access: 'default/ip-172-18-10-128-ec2-internal:8443/system:admin'
[Note] Performing systemd discovery

[Note] Running diagnostic: ConfigContexts[default/ec2-54-161-124-51-compute-1-amazonaws-com:8443/system:admin]
       Description: Validate client config context is complete and has connectivity
Info:  For client config context 'default/ec2-54-161-124-51-compute-1-amazonaws-com:8443/system:admin':
       The server URL is 'https://ec2-54-161-124-51.compute-1.amazonaws.com:8443'
       The user authentication is 'system:admin/ip-172-18-10-128-ec2-internal:8443'
       The current project is 'default'
       Successfully requested project list; has access to project(s):
         [logging management-infra openshift openshift-infra default install-test kube-system]

[Note] Running diagnostic: ConfigContexts[default/ip-172-18-10-128-ec2-internal:8443/system:admin]
       Description: Validate client config context is complete and has connectivity
Info:  The current client config context is 'default/ip-172-18-10-128-ec2-internal:8443/system:admin':
       The server URL is 'https://ip-172-18-10-128.ec2.internal:8443'
       The user authentication is 'system:admin/ip-172-18-10-128-ec2-internal:8443'
       The current project is 'default'
       Successfully requested project list; has access to project(s):
         [management-infra openshift openshift-infra default install-test kube-system logging]

[Note] Running diagnostic: DiagnosticPod
       Description: Create a pod to run diagnostics from the application standpoint
Info:  Output from the diagnostic pod (image openshift3/ose-deployer:v3.3.0.28):
       [Note] Running diagnostic: PodCheckAuth
              Description: Check that service account credentials authenticate as expected
       Info:  Service account token successfully authenticated to master
       Info:  Service account token was authenticated by the integrated registry.
       [Note] Running diagnostic: PodCheckDns
              Description: Check that DNS within a pod works as expected
       [Note] Summary of diagnostics execution (version v3.3.0.28):
       [Note] Completed with no errors or warnings seen.

[Note] Running diagnostic: ClusterRegistry
       Description: Check that there is a working Docker registry

[Note] Running diagnostic: ClusterRoleBindings
       Description: Check that the default ClusterRoleBindings are present and contain the expected subjects
Info:  clusterrolebinding/cluster-readers has more subjects than expected.
       Use the `oadm policy reconcile-cluster-role-bindings` command to update the role binding to remove extra subjects.
Info:  clusterrolebinding/cluster-readers has extra subject {ServiceAccount management-infra management-admin }.

[Note] Running diagnostic: ClusterRoles
       Description: Check that the default ClusterRoles are present and contain the expected permissions

[Note] Running diagnostic: ClusterRouterName
       Description: Check there is a working router

[Note] Running diagnostic: MasterNode
       Description: Check if master is also running node (for Open vSwitch)
Info:  Found a node with same IP as master: ip-172-18-10-128.ec2.internal

[Note] Skipping diagnostic: MetricsApiProxy
       Description: Check the integrated heapster metrics can be reached via the API proxy
       Because: The heapster service does not exist in the openshift-infra project at this time,
       so it is not available for the Horizontal Pod Autoscaler to use as a source of metrics.

[Note] Running diagnostic: NodeDefinitions
       Description: Check node records on master
WARN:  [DClu0003 from diagnostic NodeDefinition@openshift/origin/pkg/diagnostics/cluster/node_definitions.go:112]
       Node ip-172-18-10-128.ec2.internal is ready but is marked Unschedulable.
       This is usually set manually for administrative reasons.
       An administrator can mark the node schedulable with:
         oadm manage-node ip-172-18-10-128.ec2.internal --schedulable=true
       While in this state, pods should not be scheduled to deploy on the node.
       Existing pods will continue to run until completed or evacuated (see other options for 'oadm manage-node').

[Note] Running diagnostic: ServiceExternalIPs
       Description: Check for existing services with ExternalIPs that are disallowed by master config

[Note] Running diagnostic: AnalyzeLogs
       Description: Check for recent problems in systemd service logs
Info:  Checking journalctl logs for 'atomic-openshift-master' service
Info:  Checking journalctl logs for 'atomic-openshift-node' service
WARN:  [DS2005 from diagnostic AnalyzeLogs@openshift/origin/pkg/diagnostics/systemd/analyze_logs.go:120]
       Found 'atomic-openshift-node' journald log message:
         W0901 21:43:32.203430 16664 subnets.go:236] Could not find an allocated subnet for node: ip-172-18-10-128.ec2.internal, Waiting...
       This warning occurs when the node is trying to request the SDN subnet it should be configured with
       according to the master, but either can't connect to it or has not yet been assigned a subnet.
       This can occur before the master becomes fully available and defines a record for the node to use;
       the node will wait until that occurs, so the presence of this message in the node log isn't
       necessarily a problem as long as the SDN is actually working, but this message may help indicate
       the problem if it is not working. If the master is available and this log message persists, then
       it may be a sign of a different misconfiguration. Check the master's URL in the node kubeconfig.
        * Is the protocol http? It should be https.
        * Can you reach the address and port from the node using curl -k?
Info:  Checking journalctl logs for 'docker' service

[Note] Running diagnostic: MasterConfigCheck
       Description: Check the master config file
WARN:  [DH0005 from diagnostic MasterConfigCheck@openshift/origin/pkg/diagnostics/host/check_master_config.go:52]
       Validation of master config file '/etc/origin/master/master-config.yaml' warned:
         assetConfig.loggingPublicURL: Invalid value: "": required to view aggregated container logs in the console
         assetConfig.metricsPublicURL: Invalid value: "": required to view cluster metrics in the console

[Note] Running diagnostic: NodeConfigCheck
       Description: Check the node config file
Info:  Found a node config file: /etc/origin/node/node-config.yaml

[Note] Running diagnostic: UnitStatus
       Description: Check status for related systemd units

[Note] Summary of diagnostics execution (version v3.3.0.28):
[Note] Warnings seen: 3
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:1933