Bug 1359771
| Summary: | False error: oadm diagnostics reports emptyDir error when using S3 | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Ryan Cook <rcook> |
| Component: | Node | Assignee: | Luke Meyer <lmeyer> |
| Status: | CLOSED ERRATA | QA Contact: | zhou ying <yinzhou> |
| Severity: | low | Docs Contact: | |
| Priority: | low | | |
| Version: | 3.2.1 | CC: | aos-bugs, erich, jokerman, lmeyer, mmccomas, tdawson |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | (see below) | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-09-27 09:41:30 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |

Doc Text:

Cause: Diagnostics reported an error whenever the registry pod was not backed by a persistent storage volume, without considering alternative storage methods.

Consequence: If the registry had been reconfigured to use S3 as storage, diagnostics reported an error anyway.

Fix: The diagnostic now checks whether the registry configuration has been customized and, if so, does not report the error; the administrator who did the configuration is assumed to know what they are doing (a rough sketch of this check follows immediately below).

Result: No more false alerts on an S3-configured registry.
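To make the Fix entry above concrete, here is a minimal sketch of the check it describes. This is not the code from the referenced pull request; the types and the two heuristics used (a REGISTRY_CONFIGURATION_PATH environment variable or a secret-mounted config file) are simplified assumptions about how a customized registry configuration might be detected.

```go
package main

import "fmt"

// Simplified stand-ins for the fields the real diagnostic inspects on the
// docker-registry deployment; these are NOT the actual openshift/origin types.
type Volume struct {
	Name     string
	EmptyDir bool   // true if the volume is an emptyDir
	Secret   string // name of a mounted secret, "" if none
}

type EnvVar struct {
	Name, Value string
}

type RegistryPodSpec struct {
	Volumes []Volume
	Env     []EnvVar
}

// configCustomized guesses whether an admin has replaced the default registry
// configuration (for example to use S3 storage). The two heuristics here are
// assumptions for illustration, not the exact checks in the real diagnostic.
func configCustomized(spec RegistryPodSpec) bool {
	for _, e := range spec.Env {
		if e.Name == "REGISTRY_CONFIGURATION_PATH" && e.Value != "" {
			return true
		}
	}
	for _, v := range spec.Volumes {
		if v.Secret != "" {
			return true // a secret-mounted config.yml usually means customization
		}
	}
	return false
}

// checkRegistryStorage mirrors the reported behavior: only complain about
// emptyDir-backed storage when no customization is detected.
func checkRegistryStorage(spec RegistryPodSpec) {
	if configCustomized(spec) {
		fmt.Println("Info: registry config appears customized; skipping storage check")
		return
	}
	for _, v := range spec.Volumes {
		if v.EmptyDir {
			fmt.Printf("ERROR: registry volume %q is an emptyDir; images are lost on restart\n", v.Name)
		}
	}
}

func main() {
	// An S3-backed registry typically still has an emptyDir volume in the pod,
	// but also carries a custom config; the check should stay quiet.
	s3Registry := RegistryPodSpec{
		Volumes: []Volume{
			{Name: "registry-storage", EmptyDir: true},
			{Name: "registry-config", Secret: "registry-config"},
		},
		Env: []EnvVar{{Name: "REGISTRY_CONFIGURATION_PATH", Value: "/etc/registryconfig/config.yml"}},
	}
	checkRegistryStorage(s3Registry) // prints only the "skipping" info line
}
```

The point is only the control flow: once a customization is detected, the diagnostic assumes the admin knows what they are doing and stays quiet instead of reporting the emptyDir error.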
Description
Ryan Cook
2016-07-25 12:29:46 UTC
This occurs with the registry scaled up and using backing S3 storage. The diagnostic should check for the presence of a user-replaced config, as described under https://docs.openshift.com/enterprise/3.2/install_config/install/docker_registry.html#storage-for-the-registry, and assume the user knows what they're doing if one is present (a sketch of such a config appears after the diagnostics output below).

I have a PR at https://github.com/openshift/origin/pull/10313

Commit pushed to master at https://github.com/openshift/origin
https://github.com/openshift/origin/commit/d7d58deba96d942f244490509b3b933ffe5659c5
diagnostics: fix bug 1359771

Confirmed with ami:devenv-rhel7_4801, the bug has been fixed:
openshift version
openshift v1.3.0-alpha.3+e1e7edb
kubernetes v1.3.0+507d3a7
etcd 2.3.0+git
oadm diagnostics --config=openshift.local.config/master/admin.kubeconfig
[Note] Determining if client configuration exists for client/cluster diagnostics
Info: Successfully read a client config file at 'openshift.local.config/master/admin.kubeconfig'
Info: Successfully read a client config file at '/openshift.local.config/master/admin.kubeconfig'
Info: Using context for cluster-admin access: 'default/172-18-8-237:8443/system:admin'
[Note] Running diagnostic: ConfigContexts[default/172-18-8-237:8443/system:admin]
Description: Validate client config context is complete and has connectivity
Info: The current client config context is 'default/172-18-8-237:8443/system:admin':
The server URL is 'https://172.18.8.237:8443'
The user authentication is 'system:admin/172-18-8-237:8443'
The current project is 'default'
Successfully requested project list; has access to project(s):
[default kube-system openshift openshift-infra test zhouy]
[Note] Running diagnostic: ConfigContexts[default/ec2-54-196-94-236-compute-1-amazonaws-com:8443/system:admin]
Description: Validate client config context is complete and has connectivity
Info: For client config context 'default/ec2-54-196-94-236-compute-1-amazonaws-com:8443/system:admin':
The server URL is 'https://ec2-54-196-94-236.compute-1.amazonaws.com:8443'
The user authentication is 'system:admin/172-18-8-237:8443'
The current project is 'default'
Successfully requested project list; has access to project(s):
[kube-system openshift openshift-infra test zhouy default]
[Note] Running diagnostic: DiagnosticPod
Description: Create a pod to run diagnostics from the application standpoint
WARN: [DCli2006 from diagnostic DiagnosticPod@openshift/origin/pkg/diagnostics/client/run_diagnostics_pod.go:134]
Timed out preparing diagnostic pod logs for streaming, so this diagnostic cannot run.
It is likely that the image 'openshift/origin-deployer:v1.3.0-alpha.3' was not pulled and running yet.
Last error: (*errors.StatusError[2]) container "pod-diagnostics" in pod "pod-diagnostic-test-jxd7s" is waiting to start: ContainerCreating
[Note] Running diagnostic: ClusterRegistry
Description: Check that there is a working Docker registry
[Note] Running diagnostic: ClusterRoleBindings
Description: Check that the default ClusterRoleBindings are present and contain the expected subjects
[Note] Running diagnostic: ClusterRoles
Description: Check that the default ClusterRoles are present and contain the expected permissions
[Note] Running diagnostic: ClusterRouterName
Description: Check there is a working router
WARN: [DClu2001 from diagnostic ClusterRouter@openshift/origin/pkg/diagnostics/cluster/router.go:129]
There is no "router" DeploymentConfig. The router may have been named
something different, in which case this warning may be ignored.
A router is not strictly required; however it is needed for accessing
pods from external networks and its absence likely indicates an incomplete
installation of the cluster.
Use the 'oadm router' command to create a router.
[Note] Running diagnostic: MasterNode
Description: Check if master is also running node (for Open vSwitch)
Info: Found a node with same IP as master: ip-172-18-8-237.ec2.internal
[Note] Skipping diagnostic: MetricsApiProxy
Description: Check the integrated heapster metrics can be reached via the API proxy
Because: The heapster service does not exist in the openshift-infra project at this time,
so it is not available for the Horizontal Pod Autoscaler to use as a source of metrics.
[Note] Running diagnostic: NodeDefinitions
Description: Check node records on master
[Note] Skipping diagnostic: ServiceExternalIPs
Description: Check for existing services with ExternalIPs that are disallowed by master config
Because: No master config file was detected
[Note] Summary of diagnostics execution (version v1.3.0-alpha.3+e1e7edb):
[Note] Warnings seen: 2
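For context on what the "user-replaced config" mentioned in the description above looks like: the linked documentation has the registry read a custom config.yml (mounted into the registry pod) instead of its default configuration. Below is a rough sketch of such a file with S3 storage; the keys follow the docker/distribution S3 storage driver, every value is a placeholder, and the exact options for a given release should be taken from the linked docs rather than from this sketch.

```yaml
version: 0.1
log:
  level: info
http:
  addr: :5000
storage:
  cache:
    blobdescriptor: inmemory
  delete:
    enabled: true
  s3:
    accesskey: <AWS_ACCESS_KEY_ID>      # placeholder
    secretkey: <AWS_SECRET_ACCESS_KEY>  # placeholder
    region: us-east-1                   # placeholder
    bucket: <registry-bucket>           # placeholder
    encrypt: true
    secure: true
    rootdirectory: /registry
auth:
  openshift:
    realm: openshift
middleware:
  repository:
    - name: openshift
```

With a configuration like this in place, the registry's emptyDir volume no longer matters for image persistence, which is exactly why the old diagnostic's error was a false alarm.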
Moving to MODIFIED for enterprise to manage. This has been merged into ose and is in OSE v3.3.0.28 or newer.

Confirmed with the latest OCP, the issue has been fixed:
openshift version
openshift v3.3.0.28
kubernetes v1.3.0+507d3a7
etcd 2.3.0+git
[root@ip-172-18-10-128 ~]# oadm diagnostics
[Note] Determining if client configuration exists for client/cluster diagnostics
Info: Successfully read a client config file at '/root/.kube/config'
Info: Using context for cluster-admin access: 'default/ip-172-18-10-128-ec2-internal:8443/system:admin'
[Note] Performing systemd discovery
[Note] Running diagnostic: ConfigContexts[default/ec2-54-161-124-51-compute-1-amazonaws-com:8443/system:admin]
Description: Validate client config context is complete and has connectivity
Info: For client config context 'default/ec2-54-161-124-51-compute-1-amazonaws-com:8443/system:admin':
The server URL is 'https://ec2-54-161-124-51.compute-1.amazonaws.com:8443'
The user authentication is 'system:admin/ip-172-18-10-128-ec2-internal:8443'
The current project is 'default'
Successfully requested project list; has access to project(s):
[logging management-infra openshift openshift-infra default install-test kube-system]
[Note] Running diagnostic: ConfigContexts[default/ip-172-18-10-128-ec2-internal:8443/system:admin]
Description: Validate client config context is complete and has connectivity
Info: The current client config context is 'default/ip-172-18-10-128-ec2-internal:8443/system:admin':
The server URL is 'https://ip-172-18-10-128.ec2.internal:8443'
The user authentication is 'system:admin/ip-172-18-10-128-ec2-internal:8443'
The current project is 'default'
Successfully requested project list; has access to project(s):
[management-infra openshift openshift-infra default install-test kube-system logging]
[Note] Running diagnostic: DiagnosticPod
Description: Create a pod to run diagnostics from the application standpoint
Info: Output from the diagnostic pod (image openshift3/ose-deployer:v3.3.0.28):
[Note] Running diagnostic: PodCheckAuth
Description: Check that service account credentials authenticate as expected
Info: Service account token successfully authenticated to master
Info: Service account token was authenticated by the integrated registry.
[Note] Running diagnostic: PodCheckDns
Description: Check that DNS within a pod works as expected
[Note] Summary of diagnostics execution (version v3.3.0.28):
[Note] Completed with no errors or warnings seen.
[Note] Running diagnostic: ClusterRegistry
Description: Check that there is a working Docker registry
[Note] Running diagnostic: ClusterRoleBindings
Description: Check that the default ClusterRoleBindings are present and contain the expected subjects
Info: clusterrolebinding/cluster-readers has more subjects than expected.
Use the `oadm policy reconcile-cluster-role-bindings` command to update the role binding to remove extra subjects.
Info: clusterrolebinding/cluster-readers has extra subject {ServiceAccount management-infra management-admin }.
[Note] Running diagnostic: ClusterRoles
Description: Check that the default ClusterRoles are present and contain the expected permissions
[Note] Running diagnostic: ClusterRouterName
Description: Check there is a working router
[Note] Running diagnostic: MasterNode
Description: Check if master is also running node (for Open vSwitch)
Info: Found a node with same IP as master: ip-172-18-10-128.ec2.internal
[Note] Skipping diagnostic: MetricsApiProxy
Description: Check the integrated heapster metrics can be reached via the API proxy
Because: The heapster service does not exist in the openshift-infra project at this time,
so it is not available for the Horizontal Pod Autoscaler to use as a source of metrics.
[Note] Running diagnostic: NodeDefinitions
Description: Check node records on master
WARN: [DClu0003 from diagnostic NodeDefinition@openshift/origin/pkg/diagnostics/cluster/node_definitions.go:112]
Node ip-172-18-10-128.ec2.internal is ready but is marked Unschedulable.
This is usually set manually for administrative reasons.
An administrator can mark the node schedulable with:
oadm manage-node ip-172-18-10-128.ec2.internal --schedulable=true
While in this state, pods should not be scheduled to deploy on the node.
Existing pods will continue to run until completed or evacuated (see
other options for 'oadm manage-node').
[Note] Running diagnostic: ServiceExternalIPs
Description: Check for existing services with ExternalIPs that are disallowed by master config
[Note] Running diagnostic: AnalyzeLogs
Description: Check for recent problems in systemd service logs
Info: Checking journalctl logs for 'atomic-openshift-master' service
Info: Checking journalctl logs for 'atomic-openshift-node' service
WARN: [DS2005 from diagnostic AnalyzeLogs@openshift/origin/pkg/diagnostics/systemd/analyze_logs.go:120]
Found 'atomic-openshift-node' journald log message:
W0901 21:43:32.203430 16664 subnets.go:236] Could not find an allocated subnet for node: ip-172-18-10-128.ec2.internal, Waiting...
This warning occurs when the node is trying to request the
SDN subnet it should be configured with according to the master,
but either can't connect to it or has not yet been assigned a subnet.
This can occur before the master becomes fully available and defines a
record for the node to use; the node will wait until that occurs,
so the presence of this message in the node log isn't necessarily a
problem as long as the SDN is actually working, but this message may
help indicate the problem if it is not working.
If the master is available and this log message persists, then it may
be a sign of a different misconfiguration. Check the master's URL in
the node kubeconfig.
* Is the protocol http? It should be https.
* Can you reach the address and port from the node using curl -k?
Info: Checking journalctl logs for 'docker' service
[Note] Running diagnostic: MasterConfigCheck
Description: Check the master config file
WARN: [DH0005 from diagnostic MasterConfigCheck@openshift/origin/pkg/diagnostics/host/check_master_config.go:52]
Validation of master config file '/etc/origin/master/master-config.yaml' warned:
assetConfig.loggingPublicURL: Invalid value: "": required to view aggregated container logs in the console
assetConfig.metricsPublicURL: Invalid value: "": required to view cluster metrics in the console
[Note] Running diagnostic: NodeConfigCheck
Description: Check the node config file
Info: Found a node config file: /etc/origin/node/node-config.yaml
[Note] Running diagnostic: UnitStatus
Description: Check status for related systemd units
[Note] Summary of diagnostics execution (version v3.3.0.28):
[Note] Warnings seen: 3
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1933