Related upstream issue: https://github.com/openshift/origin/issues/17464
From a4/node_stack.txt.20180911_100543:

goroutine 356 [chan receive, 1256 minutes]:
github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/proxy/config.(*EndpointsConfig).Run.func2(0xc420062120, 0xc4217f5900)
	/builddir/build/BUILD/atomic-openshift-git-0.95ef3c2/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/proxy/config/config.go:138 +0x40
created by github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/proxy/config.(*EndpointsConfig).Run
	/builddir/build/BUILD/atomic-openshift-git-0.95ef3c2/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/proxy/config/config.go:140 +0xb9

goroutine 358 [chan receive, 1256 minutes]:
github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/proxy/config.(*ServiceConfig).Run.func2(0xc420062120, 0xc421053080)
	/builddir/build/BUILD/atomic-openshift-git-0.95ef3c2/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/proxy/config/config.go:242 +0x40
created by github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/proxy/config.(*ServiceConfig).Run
	/builddir/build/BUILD/atomic-openshift-git-0.95ef3c2/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/proxy/config/config.go:244 +0xb9
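If a fresh dump is needed from an affected node, here is a rough sketch of one way to capture it. The pgrep pattern and the atomic-openshift-node unit name are assumptions about the environment, so adjust them as needed.

  # Sketch only, not a verified procedure: capture a goroutine dump from the
  # running node process. The pgrep pattern and the atomic-openshift-node
  # unit name are assumptions about this environment.
  NODE_PID=$(pgrep -f 'openshift start node' | head -n1)
  # SIGQUIT makes the Go runtime print every goroutine stack to stderr and exit;
  # systemd should then restart the node service.
  kill -QUIT "$NODE_PID"
  # The dump ends up in the node service journal:
  journalctl -u atomic-openshift-node --since "5 minutes ago" > node_stack.txt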
Can we get the node changed to loglevel 4? That should give us more information on what the node is doing.
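For reference, a minimal sketch of one way to raise the node loglevel on a 3.x node, assuming the usual sysconfig layout; check the file first, since older 3.x releases carry the level in OPTIONS=--loglevel=N while newer 3.x releases use DEBUG_LOGLEVEL=N.

  # Sketch, assuming the node options live in /etc/sysconfig/atomic-openshift-node.
  # Check which variable the file actually uses before editing it.
  grep -E 'loglevel|LOGLEVEL' /etc/sysconfig/atomic-openshift-node
  sed -i 's/--loglevel=[0-9]*/--loglevel=4/' /etc/sysconfig/atomic-openshift-node
  systemctl restart atomic-openshift-node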
I failed to reproduce this. I deployed a busybox daemonset with a hostport and ran a watch on its iptables rule:

Every 2.0s: sudo iptables -n -v -t nat -L KUBE-HOSTPORTS -w        Tue Nov 20 17:42:30 2018

Chain KUBE-HOSTPORTS (2 references)
 pkts bytes target                    prot opt in  out  source       destination
     0     0 KUBE-HP-Q3TH7YJFUJLCPTEB  tcp  --  *   *    0.0.0.0/0    0.0.0.0/0    /* hello-daemonset-826c2_testing hostport 14236 */ tcp dpt:14236

I also made sure there were NO other pods running on the node except for a fluentd pod from a fluentd daemonset:

$ oc get pod -o wide -n logging
NAME                    READY  STATUS   RESTARTS  AGE  IP          NODE
. . .
logging-fluentd-wr981   1/1    Running  0         2m   10.130.0.9  node-0.datadyne.lab.example.com
. . .

When I removed the fluentd label on that node (thus removing the pod), nothing changed in the hostport rule. Same when I re-added it and when I *deleted* the fluentd daemonset.

The only thing I can think of that might be different is that busybox restarts itself regularly (the container, not the pod).

$ openshift version
openshift v3.6.173.0.130
kubernetes v1.6.1+5115d708d7
etcd 3.2.1
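For anyone repeating this, a rough sketch of the kind of daemonset used above; the name, namespace, image, and command are illustrative, only the hostPort value matches the rule shown.

  # Illustrative sketch of the reproduction setup, not the exact objects used
  # in the test above; only hostPort 14236 matches the rule shown.
  # Note: the namespace's service account needs an SCC that allows host ports.
  oc apply -n testing -f - <<'EOF'
  apiVersion: extensions/v1beta1
  kind: DaemonSet
  metadata:
    name: hello-daemonset
  spec:
    template:
      metadata:
        labels:
          app: hello-daemonset
      spec:
        containers:
        - name: busybox
          image: busybox
          command: ["sh", "-c", "sleep 3600"]
          ports:
          - containerPort: 14236
            hostPort: 14236
  EOF

  # Watch the corresponding hostport rule on the node:
  watch -n 2 sudo iptables -n -v -t nat -L KUBE-HOSTPORTS -w

A sleep-based command like this would also account for the periodic container restarts mentioned above, since the container exits when the sleep finishes and the kubelet restarts it in place.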
Followed the same testing steps used for v3.6 and ran them on v3.11.72; could not reproduce the lost hostport rules problem.
Customer upgraded to 3.7.72. They saw the issue again when the node services restarted. Uploading details.
Requesting that the customer reproduce with a higher-verbosity log level.
Ack, closing in favor of https://bugzilla.redhat.com/show_bug.cgi?id=1723924