Description of problem:
Running the script https://raw.githubusercontent.com/openshift/openshift-sdn/master/hack/debug.sh on an OSE master fails with a no-execute-permission error:

+ ssh root@10.66.79.154 env KUBECONFIG=/tmp/openshift-sdn-debug-v9h5Nnl0A/.kubeconfig /tmp/openshift-sdn-debug-v9h5Nnl0A/debug.sh --node
Warning: Permanently added '10.66.79.154' (ECDSA) to the list of known hosts.
env: /tmp/openshift-sdn-debug-v9h5Nnl0A/debug.sh: Permission denied

Version-Release number of selected component (if applicable):
https://raw.githubusercontent.com/openshift/openshift-sdn/master/hack/debug.sh

How reproducible:
Always

Steps to Reproduce:
1. Set up an OSE multi-node environment.
2. Make sure the master can ssh to every node without a password.
3. Run debug.sh (https://raw.githubusercontent.com/openshift/openshift-sdn/master/hack/debug.sh) on the master.

Actual results:
As described above.

Expected results:
No such error; the script collects the node information.

Additional info:
I also hit the following error when running the script on a local machine (neither master nor node):

Analyzing openshift-161.lab.eng.nay.redhat.com (10.66.79.161)
sed: can't read ${CONFIG_FILE}: No such file or directory
Could not find node name in ${CONFIG_FILE}
Both of these problems are fixed by https://github.com/openshift/openshift-sdn/pull/191
fixed in openshift-sdn master
I still hit some errors; see https://paste.fedoraproject.org/284326/44600099/
Paste URLs get deleted after a while; it's better to attach files to the bug. Anyway, the relevant bit here is:

> /tmp/openshift-sdn-debug-41J6TeEeo/debug.sh: line 106: no: No such file or directory
> /tmp/openshift-sdn-debug-41J6TeEeo/debug.sh: eval: line 103: syntax error near unexpected token `newline'
> /tmp/openshift-sdn-debug-41J6TeEeo/debug.sh: eval: line 103: `pod_node=value>'

That suggests that this command is unexpectedly outputting junk:

> oc get pods --all-namespaces --template '{{range .items}}{{if .status.containerStatuses}}{{if not .spec.hostNetwork}}{{.spec.nodeName}}:{{.metadata.name}}:{{.metadata.namespace}}:{{.status.podIP}}:{{printf "%.21s" (index .status.containerStatuses 0).containerID}} {{end}}{{end}}{{end}}'

What does that output if you run it?

(Also, this seems unrelated to the original 'execute permission' problem. At first I thought it might be because you were using "sh -x", but I can't reproduce the problem here even with that. So I think it's because there's something unexpected about your setup that debug.sh isn't dealing with.)
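For illustration, this is roughly how that kind of "syntax error near unexpected token" message can arise when text containing "<" or ">" reaches eval unquoted. It is a minimal sketch, not the code from debug.sh: the variable name pod_ip, both function names, and the choice of Go's "<no value>" placeholder as the offending text are assumptions.

```shell
#!/bin/bash
# Hypothetical reproduction; not taken from debug.sh.

# Print the shell's error text when an unquoted "<no value>" reaches eval.
# The "<" and ">" are parsed as redirections, so the assignment never happens.
show_eval_failure() {
    # Run in a subshell so a parse error cannot abort the calling script.
    ( eval 'pod_ip=<no value>' ) 2>&1 || true
}

# Quoting the value keeps "<" and ">" from being treated as redirections.
show_quoted_assignment() {
    local pod_ip
    eval "pod_ip='<no value>'"
    printf '%s\n' "$pod_ip"
}
```

Calling show_eval_failure prints the shell's complaint instead of assigning anything, while show_quoted_assignment stores the text literally. Quoting only hides the parse error, though; a value like that still has to be handled as "missing" by whatever consumes it.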
Here it gets the wrong container ID; the 'docker://' prefix should be filtered off:

openshift-145.lab.eng.nay.redhat.com:docker-registry-2-0nyal:default:10.1.1.5:docker://042d5f6ff8b3

oc get pods --all-namespaces --template '{{range .items}}{{if .status.containerStatuses}}{{if not .spec.hostNetwork}}{{.spec.nodeName}}:{{.metadata.name}}:{{.metadata.namespace}}:{{.status.podIP}}:{{printf "%.21s" (index .status.containerStatuses 0).containerID}} {{end}}{{end}}{{end}}'
openshift-145.lab.eng.nay.redhat.com:docker-registry-2-0nyal:default:10.1.1.5:docker://042d5f6ff8b3
openshift-145.lab.eng.nay.redhat.com:docker-registry-2-0swp1:default:10.1.1.6:docker://a67b8a16750f
openshift-145.lab.eng.nay.redhat.com:nodejs-example-1-build:xiama:<no value>:docker://6e5c97786244
openshift-145.lab.eng.nay.redhat.com:nodejs-example-2-build:xiama:<no value>:docker://ae39ee4ccefd
openshift-145.lab.eng.nay.redhat.com:nodejs-example-2-x4oan:xiama:10.1.1.13:docker://00ce994dece2
Oh, sorry, my fault; please ignore comment 6. The root cause appears to be that '<no value>' in the output leaves pod_id empty.

Output:

> oc get pods --all-namespaces --template '{{range .items}}{{if .status.containerStatuses}}{{if not .spec.hostNetwork}}{{.spec.nodeName}}:{{.metadata.name}}:{{.metadata.namespace}}:{{.status.podIP}}:{{printf "%.21s" (index .status.containerStatuses 0).containerID}} {{end}}{{end}}{{end}}'
openshift-145.lab.eng.nay.redhat.com:docker-registry-2-0nyal:default:10.1.1.5:docker://042d5f6ff8b3
openshift-145.lab.eng.nay.redhat.com:docker-registry-2-0swp1:default:10.1.1.6:docker://a67b8a16750f
openshift-145.lab.eng.nay.redhat.com:nodejs-example-1-build:xiama:<no value>:docker://6e5c97786244
openshift-145.lab.eng.nay.redhat.com:nodejs-example-2-build:xiama:<no value>:docker://ae39ee4ccefd
openshift-145.lab.eng.nay.redhat.com:nodejs-example-2-x4oan:xiama:10.1.1.13:docker://00ce994dece2

Some pods, such as the build pod, do not have a 'podIP':

# oc get pod nodejs-example-1-build
NAME                     READY     STATUS      RESTARTS   AGE
nodejs-example-1-build   0/1       Completed   0          2h

<--snip--->
"status": {
    "phase": "Succeeded",
    "conditions": [
        {
            "type": "Ready",
            "status": "False",
            "lastProbeTime": null,
            "lastTransitionTime": "2015-10-29T01:16:42Z",
            "reason": "ContainersNotReady",
            "message": "containers with unready status: [sti-build]"
        }
    ],
    "hostIP": "10.66.79.145",
    "startTime": "2015-10-29T01:13:31Z",
    "containerStatuses": [
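One way to cope with records like these is to skip any pod whose IP field is empty or "<no value>" before parsing further. The sketch below is not the fix that went into debug.sh; the function name filter_pods_with_ip is hypothetical, and it assumes one node:name:namespace:ip:containerID record per line, as in the output above.

```shell
#!/bin/bash
# Hypothetical helper; not taken from debug.sh.
# Reads "node:name:namespace:ip:containerID" records on stdin and prints only
# those that carry a usable pod IP, with fields separated by spaces.
filter_pods_with_ip() {
    local node name namespace ip cid
    # The last variable receives the rest of the line, so the colon inside
    # "docker://..." stays part of the container ID.
    while IFS=: read -r node name namespace ip cid; do
        # Go templates print "<no value>" for a field that is missing.
        [ -z "$ip" ] && continue
        [ "$ip" = "<no value>" ] && continue
        printf '%s %s %s %s %s\n' "$node" "$name" "$namespace" "$ip" "$cid"
    done
}
```

For example, piping the five records above through filter_pods_with_ip would drop the two nodejs-example-*-build lines and keep the other three. Reading the fields with read rather than eval also means that stray "<" or ">" characters are never interpreted by the shell.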
https://raw.githubusercontent.com/danwinship/openshift-sdn/debug-more-fixes/hack/debug.sh should fix that
I think you mean https://raw.githubusercontent.com/danwinship/openshift-sdn/debug-pods-with-no-ip/hack/debug.sh

I tested with this script and the issue is fixed. I will verify the bug once the change is merged.
oops, yes, that branch. it's merged now.
verified this bug