Bug 1573648

Summary: jenkins slave does not respect no_proxy
Product: OpenShift Container Platform
Reporter: Steven Walter <stwalter>
Component: Build
Assignee: Gabe Montero <gmontero>
Status: CLOSED ERRATA
QA Contact: Wenjing Zheng <wzheng>
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.6.1
CC: aos-bugs, bparees, clemens.utschig-utschig, gmontero, stwalter
Target Milestone: ---
Target Release: 3.6.z
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Jenkins core/remoting has subpar handling of the no_proxy environment variable, which affects communication between Jenkins agents and the master when starting a build using the Kubernetes plugin in the OpenShift Jenkins image.
Consequence: Pipelines using the Kubernetes plugin are unable to start agents with that plugin when HTTP proxies are defined.
Fix: The sample Maven and NodeJS OpenShift Jenkins images have been updated to automatically add the server URL and tunnel hosts to the no_proxy list to ensure that communication works when HTTP proxies are defined.
Result: Jenkins Pipelines can now leverage the Kubernetes plugin to start pods based on the OpenShift Jenkins Maven and NodeJS images.
Story Points: ---
Clone Of:
: 1578987 (view as bug list)
Environment:
Last Closed: 2018-06-28 07:54:52 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Steven Walter 2018-05-01 21:00:03 UTC
Description of problem:
Customer has set proxy variables as per our docs (https://docs.openshift.com/container-platform/3.6/install_config/http_proxies.html) using ansible. Jenkins master respects the proxy variables but images built off our slave image do not.

Version-Release number of selected component (if applicable):
3.6

How reproducible:
Unconfirmed


At install customer set these variables:

openshift_http_proxy="http://user:pass@xxxxxxx.com:80"
openshift_https_proxy="http://user:pass@xxxxxxx.com:80"
openshift_no_proxy='.svc,.default,.local,localhost,.example.com,10.250.0.0/16,10.251.0.0/16,10.183.195.106,10.183.195.107,10.183.195.108,10.183.195.109,10.183.195.11,10.183.195.111,10.183.195.112,10.183.195.113,10.183.195.13,10.250.127.4'

openshift_generate_no_proxy_hosts=True

openshift_builddefaults_http_proxy="http://user:pass@xxxxxxx.com:80"
openshift_builddefaults_https_proxy="http://user:pass@xxxxxxx.com:80"
openshift_builddefaults_no_proxy='.svc,.default,.local,localhost,.example.com,10.250.0.0/16,10.251.0.0/16,10.183.195.106,10.183.195.107,10.183.195.108,10.183.195.109,10.183.195.11,10.183.195.111,10.183.195.112,10.183.195.113,10.183.195.13,10.250.127.4'
openshift_builddefaults_git_http_proxy="http://user:pass@xxxxxx.com:80"
openshift_builddefaults_git_https_proxy="http://user:pass@xxxxxx.com:80"
openshift_builddefaults_git_no_proxy='.svc,.default,.local,localhost,.example.com,10.250.0.0/16,10.251.0.0/16,10.183.195.106,10.183.195.107,10.183.195.108,10.183.195.109,10.183.195.11,10.183.195.111,10.183.195.112,10.183.195.113,10.183.195.13,10.250.127.4'




Which then modifies these files.
## = Password

Master Nodes
/etc/origin/master/master-config.yaml
        - name: HTTP_PROXY
          value: http://user:pass@xxxxxx.com:80
        - name: HTTPS_PROXY
          value: http://user:pass@xxxxxx.com:80
        - name: NO_PROXY
          value: .svc,.default,.local,localhost,.example.com,10.250.0.0/16,10.251.0.0/16,10.183.195.106,10.183.195.107,10.183.195.108,10.183.195.109,10.183.195.11,10.183.195.111,10.183.195.112,10.183.195.113,10.183.195.13,10.250.127.4
        - name: http_proxy
          value: http://user:pass@xxxxxx.com:80
        - name: https_proxy
          value: http://user:pass@xxxxxx.com:80
        - name: no_proxy
          value: .svc,.default,.local,localhost,.example.com,10.250.0.0/16,10.251.0.0/16,10.183.195.106,10.183.195.107,10.183.195.108,10.183.195.109,10.183.195.11,10.183.195.111,10.183.195.112,10.183.195.113,10.183.195.13,10.250.127.4
        gitHTTPProxy: http://user:pass@xxxxxx.com:80
        gitHTTPSProxy: http://user:pass@xxxxxx.com:80
        gitNoProxy: .svc,.default,.local,localhost,.example.com,10.250.0.0/16,10.251.0.0/16,10.183.195.106,10.183.195.107,10.183.195.108,10.183.195.109,10.183.195.11,10.183.195.111,10.183.195.112,10.183.195.113,10.183.195.13,10.250.127.4


All Nodes
/etc/sysconfig/docker

HTTP_PROXY='http://user:pass@xxxxxx.com:80'
HTTPS_PROXY='http://user:pass@xxxxxx.com:80'
NO_PROXY='.svc,.example.com,.cluster.local,.default,.local,.svc,10.183.195.106,10.183.195.107,10.183.195.108,10.183.195.109,10.183.195.11,10.183.195.111,10.183.195.112,10.183.195.113,10.183.195.13,10.250.127.4,10.250.0.0/16,10.251.0.0/16,example.com,localhost'


We note that the variables appear inside the running container but are not used. We are concerned because the variable which shows up is lowercase no_proxy and not uppercase NO_PROXY. Why is this set as lowercase and does this matter?

Comment 3 Ben Parees 2018-05-02 07:18:26 UTC
> We note that the variables appear inside the running container but are not used.

inside what running container?  the slave pod?  your application container?

> Jenkins master respects the proxy variables but images built off our slave image do not

please provide the pod yaml for your master and slave pods

> We are concerned because the variable which shows up is lowercase no_proxy and not uppercase NO_PROXY. Why is this set as lowercase and does this matter?

There's no great standard here; different software respects different cases: some respect upper, some lower, some both. The safest thing is to ensure both are set.
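
As a minimal illustration of the "set both" advice (a hypothetical lookup helper for this sketch, not anything in the Jenkins code), a consumer that only reads one spelling silently misses the other:

public class ProxyEnvLookup {

    // Hypothetical defensive lookup: prefer the lower-case spelling,
    // fall back to the upper-case one.
    static String proxyFromEnv() {
        String p = System.getenv("http_proxy");
        return p != null ? p : System.getenv("HTTP_PROXY");
    }

    public static void main(String[] args) {
        System.out.println("effective proxy: " + proxyFromEnv());
    }
}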


What is failing in your build when the proxy vars are not set?  Based on the information you provided my current theory is that accessing your git repo requires going through a proxy and the proxy is not being set when the git-clone occurs on a slave node, but is being set when the git-clone occurs on the master.

Comment 4 clemens utschig 2018-05-02 10:41:02 UTC
Hey @Ben
we run jenkins with slaves triggered through node ('xxx').

In both pods (the jenkins master & the slave that is bootstrapped), env variables are set for HTTP_PROXY / HTTPS_PROXY & NO_PROXY.

Jenkins slave callback to master (during pod start) is failing with 

java.io.IOException: http://jenkins.bns-cd.svc:80/tcpSlaveAgentListener/ is invalid: 407 Proxy Authentication Required 

at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:165) 
at hudson.remoting.Engine.innerRun(Engine.java:335) 
at hudson.remoting.Engine.run(Engine.java:287)

which to me looks like that HTTP_PROXY is honored - but NO_PROXY is not - because a .svc call (e.g. http://jenkins.bns-cd.svc) should not go to the proxy at all (because of NO_PROXY settings as in the OP comment).
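
For context, the JDK's own proxy selection is driven by the http.proxyHost / http.nonProxyHosts system properties rather than the no_proxy environment variable, which is presumably why Jenkins remoting parses no_proxy itself (as discussed below). A minimal sketch of the system-property path, with hypothetical values mirroring this setup:

import java.net.Proxy;
import java.net.ProxySelector;
import java.net.URI;
import java.util.List;

public class NonProxyHostsSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical values for illustration only
        System.setProperty("http.proxyHost", "inhproxy.eu.boehringer.com");
        System.setProperty("http.proxyPort", "80");
        System.setProperty("http.nonProxyHosts", "*.svc|localhost");

        List<Proxy> result = ProxySelector.getDefault()
                .select(new URI("http://jenkins.bns-cd.svc:80/"));
        // Prints [DIRECT]: *.svc matches jenkins.bns-cd.svc,
        // so the connection bypasses the proxy
        System.out.println(result);
    }
}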

We also tried to set the proxy & exceptions in jenkins / manage jenkins / plugins / advanced - this did NOT yield success either. 

Sorry for the badly filed bug in the first place - I hope this post now provides all the info needed. (And we run on the latest jenkins & slave.)

Comment 5 clemens utschig 2018-05-02 10:49:25 UTC
slave pod yaml

apiVersion: v1
kind: Pod
metadata:
  annotations:
    openshift.io/scc: restricted
  creationTimestamp: '2018-04-30T13:44:39Z'
  labels:
    jenkins: slave
    jenkins/nodejs-6-angular: 'true'
  name: nodejs-6-angular-6746b6221e1cc
  namespace: bns-cd
  resourceVersion: '25329347'
  selfLink: /api/v1/namespaces/bns-cd/pods/nodejs-6-angular-6746b6221e1cc
  uid: a0e20491-4c7c-11e8-8865-0050569e3732
spec:
  containers:
    - args:
        - e7a7491e3d9421d206aa1642abdd03616d79f3a2db3313fe4c80e7a8230e39c3
        - nodejs-6-angular-6746b6221e1cc
      env:
        - name: JENKINS_LOCATION_URL
          value: 'https://jenkins-bns-cd.inh-devapps.eu.xxxx.com/'
        - name: JENKINS_SECRET
          value: e7a7491e3d9421d206aa1642abdd03616d79f3a2db3313fe4c80e7a8230e39c3
        - name: JENKINS_JNLP_URL
          value: >-
            http://jenkins.bns-cd.svc:80/computer/nodejs-6-angular-6746b6221e1cc/slave-agent.jnlp
        - name: JENKINS_TUNNEL
          value: 'jenkins-jnlp.bns-cd.svc:50000'
        - name: JENKINS_NAME
          value: nodejs-6-angular-6746b6221e1cc
        - name: JENKINS_URL
          value: 'http://jenkins.bns-cd.svc:80'
        - name: HOME
          value: /tmp
      image: 'docker-registry.default.svc:5000/cd/jenkins-nodejs-6-angular'
      imagePullPolicy: IfNotPresent
      name: jnlp
      resources: {}
      securityContext:
        capabilities:
          drop:
            - KILL
            - MKNOD
            - SETGID
            - SETUID
            - SYS_CHROOT
        privileged: false
        runAsUser: 1000240000
        seLinuxOptions:
          level: 's0:c16,c0'
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      volumeMounts:
        - mountPath: /tmp
          name: workspace-volume
        - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
          name: default-token-3kxcl
          readOnly: true
      workingDir: /tmp
  dnsPolicy: ClusterFirst
  imagePullSecrets:
    - name: default-dockercfg-9nmlf
  nodeName: inhas65290.eu.boehringer.com
  nodeSelector:
    region: primary
  restartPolicy: Never
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 1000240000
    seLinuxOptions:
      level: 's0:c16,c0'
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  volumes:
    - emptyDir: {}
      name: workspace-volume
    - name: default-token-3kxcl
      secret:
        defaultMode: 420
        secretName: default-token-3kxcl
status:
  conditions:
    - lastProbeTime: null
      lastTransitionTime: '2018-04-30T13:44:39Z'
      status: 'True'
      type: Initialized
    - lastProbeTime: null
      lastTransitionTime: '2018-04-30T13:44:41Z'
      message: 'containers with unready status: [jnlp]'
      reason: ContainersNotReady
      status: 'False'
      type: Ready
    - lastProbeTime: null
      lastTransitionTime: '2018-04-30T13:44:39Z'
      status: 'True'
      type: PodScheduled
  containerStatuses:
    - containerID: >-
        docker://e341e5aebef6836ce18d5bd957222046c195eae9f568a07e0ceb8b5b5ffae26f
      image: 'docker-registry.default.svc:5000/cd/jenkins-nodejs-6-angular:latest'
      imageID: >-
        docker-pullable://docker-registry.default.svc:5000/cd/jenkins-nodejs-6-angular@sha256:d3575d25a0c32d22c3f44504c77abc32d068f9a0890c61b8ff4922b8abed8756
      lastState: {}
      name: jnlp
      ready: false
      restartCount: 0
      state:
        terminated:
          containerID: >-
            docker://e341e5aebef6836ce18d5bd957222046c195eae9f568a07e0ceb8b5b5ffae26f
          exitCode: 255
          finishedAt: '2018-04-30T13:44:41Z'
          reason: Error
          startedAt: '2018-04-30T13:44:40Z'
  hostIP: 10.183.195.13
  phase: Failed
  qosClass: BestEffort
  startTime: '2018-04-30T13:44:39Z'

xxxx => omitted

Comment 6 clemens utschig 2018-05-02 10:52:13 UTC
slave is inheriting from latest slave image - see below

FROM registry.access.redhat.com/openshift3/jenkins-slave-base-rhel7

MAINTAINER Richard Attermeyer <richard.attermeyer>

# Labels consumed by Red Hat build service
LABEL com.redhat.component="jenkins-slave-nodejs-rhel7-docker" \
      name="openshift3/jenkins-slave-nodejs-rhel7" \
      version="3.6" \
      architecture="x86_64" \
      release="4" \
      io.k8s.display-name="Jenkins Slave Nodejs" \
      io.k8s.description="The jenkins slave nodejs image has the nodejs tools on top of the jenkins slave base image." \
      io.openshift.tags="openshift,jenkins,slave,nodejs"

ENV NODEJS_VERSION=6.10 \
    NPM_CONFIG_PREFIX=$HOME/.npm-global \
    PATH=$HOME/node_modules/.bin/:$HOME/.npm-global/bin/:$PATH \
    BASH_ENV=/usr/local/bin/scl_enable \
    ENV=/usr/local/bin/scl_enable \
    PROMPT_COMMAND=". /usr/local/bin/scl_enable"


# Install cypress dependencies
# Please note: xorg-x11-server-Xvfb is not available on RHEL via yum anymore, so "RUN yum install -y xorg-x11-server-Xvfb" won't work.
#   Therefore this Dockerfile uses the version from CentOS instead.
ADD http://mirror.centos.org/centos/7/os/x86_64/Packages/xorg-x11-server-Xvfb-1.19.3-11.el7.x86_64.rpm /root/xorg-x11-server-Xvfb.x86_64.rpm
RUN yum -y install /root/xorg-x11-server-Xvfb.x86_64.rpm && \
    yum install -y gtk2-2.24* && \
    yum install -y libXtst*
# provides libXss
RUN yum install -y libXScrnSaver*
# provides libgconf-2
RUN yum install -y GConf2*
# provides libasound
RUN yum install -y alsa-lib* && \
    yum install -y nss-devel libnotify-devel gnu-free-sans-fonts


# Install NodeJS + Yarn + Angular CLI + cypress
# unfortunately nodejs6 is not yet available on rhel 7 on the scl
# see: https://www.softwarecollections.org/en/
# and the base image relies on scl_enable
COPY contrib/bin/scl_enable /usr/local/bin/scl_enable
COPY npmrc $HOME/.npmrc
RUN curl --silent --location https://rpm.nodesource.com/setup_6.x | bash - && \
    curl --silent --location https://dl.yarnpkg.com/rpm/yarn.repo -o /etc/yum.repos.d/yarn.repo && \
    yum install -y yarn nodejs gcc-c++ make && \
    yum clean all -y && \
    npm install -g @angular/cli.2 && \
    npm install -g cypress


# install google-chrome (for angular)
ADD https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm /root/google-chrome-stable_current_x86_64.rpm
RUN yum -y install /root/google-chrome-stable_current_x86_64.rpm && \
    ln -s /usr/lib64/libOSMesa.so.8 /opt/google/chrome/libosmesa.so && \
    yum clean all && \
    dbus-uuidgen > /etc/machine-id

RUN chown -R 1001:0 $HOME && \
    chmod -R g+rw $HOME

USER 1001

we have verified that ENV with proxy settings is available in the slave container

Comment 7 Gabe Montero 2018-05-02 16:56:49 UTC
Some analysis of the data provided:

1) The stack trace snippet provided:

java.io.IOException: http://jenkins.bns-cd.svc:80/tcpSlaveAgentListener/ is invalid: 407 Proxy Authentication Required 

at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:165) 
at hudson.remoting.Engine.innerRun(Engine.java:335) 
at hudson.remoting.Engine.run(Engine.java:287)

"407 Proxy Authentication Required ", based on various internet searches, means that it is hitting a URL that is expecting proxy authentication

That would imply that Jenkins is *NOT* finding the "http.proxyHost" or "http_proxy" env vars / sys props when it needs to ... or that the slave pod to master pod communication is going through the proxy when it should not ... i.e. the opposite of what was suggested initially in Comment #4

2) And this snippet from the pod spec:

        - name: JENKINS_JNLP_URL
          value: >-
            http://jenkins.bns-cd.svc:80/computer/nodejs-6-angular-6746b6221e1cc/slave-agent.jnlp

 http://jenkins.bns-cd.svc:80/ should be coming from the jenkins url set in the k8s cloud configuration

 the openshift jenkins image's kube-slave-common.sh sets the jenkins URL to http://${JENKINS_SERVICE_HOST}:${JENKINS_SERVICE_PORT} for the k8s cloud config in Jenkins when *INITIALLY* configuring jenkins on the very first start up of the jenkins image (assuming you are running jenkins on a PV) ... it leaves it alone on any subsequent jenkins restarts

By default, that resolves to an IP, not a host name. But typical differences in how the cluster is brought up might explain that.

Or has there been a subsequent configuration change of that value?

So, after some inspection of the jenkins code:

a) The list of URLs JnlpAgentEndpointResolver.resolve is supposed to iterate appears to come from the cloud provider's jenkins url setting ... i.e. what 2) discusses

b) but it does look at the "http.proxyHost", "http_proxy" and "no_proxy" env vars when deciding if the proxy URL should be used in conjunction with the jenkins URL

So, short term, what to do:

I) Can Clemens Utschig or whoever is appropriate confirm what is set for the "Jenkins URL" in the kubernetes cloud configuration ... i.e. from the console, "Manage Jenkins" -> "Configure System", "Cloud" and the "Kubernetes" entry.

I would expect "http://jenkins.bns-cd.svc:80" is there, but please confirm

II) When Clemens says "we have verified that ENV with proxy settings is available in the slave container" .... please provide the precise keys and values around proxy/noproxy variables found, and how they were obtained ... so I can match with the Jenkins code to confirm whether it should find it or not, as Ben mentioned, case sensitivity is in play here ... I saw no presence of the lower case env vars in the pod yaml provided

III) Do we expect the slave to master pod communication to be going through the proxy?  If not, sounds like a cluster construction issue .... again, the 407 means it is going through a proxy

IV) On the "We also tried to set the proxy & exceptions in jenkins / manage jenkins / plugins / advanced - this did NOT yield success either. " point .... can the precise changes performed be provided, and can the precise exceptions / problems seen be provided

Comment 8 clemens utschig 2018-05-02 17:53:41 UTC
I) Can Clemens Utschig or whoever is appropriate confirm what is set for the "Jenkins URL" in the kubernetes cloud configuration ... i.e. from the c

the jenkins url is set to http://jenkins.bns-cd.svc:80 (I assume; I can't reach the cluster right now)

II) When Clemens says "we have verified that ENV with proxy settings is available in the slave container" .... please provide the precise keys and values around proxy/noproxy variables found, and how they were obtained ... so I can match with the Jenkins code to confirm whether it should find it or not, as Ben mentioned, case sensitivity is in play here ... I saw no presence of the lower case env vars in the pod yaml provided

-> we have provided them already - they are captured thru the pod terminal -> typing env

no_proxy=.svc,.default,.local,localhost,.boehringer.com,10.250.0.0/16,10.251.0.0/16,10.183.195.106,10.183.195.107,10.183.195.108,10.183.195.109,10.183.195.11,10.183.195.111,10.183.195.112,10.183.195.113,10.183.195.13,10.250.127.4
HTTP_PROXY=http://xxx:yyyyyzzzze@inhproxy.eu.boehringer.com:80
HTTPS_PROXY=http://xxx:yyyyyzzzze@inhproxy.eu.boehringer.com:80


III) Do we expect the slave to master pod communication to be going through the proxy?  If not, sounds like a cluster construction issue .... again, the 407 means it is going through a proxy

-> correct, it's NOT honoring NO_PROXY settings - I would NOT expect the cluster to do any communication internally through the proxy ...

IV) On the "We also tried to set the proxy & exceptions in jenkins / manage jenkins / plugins / advanced - this did NOT yield success either. " point .... can the precise changes performed be provided, and can the precise exceptions / problems seen be provided

-> manage jenkins / plugins / advanced - enter proxy host, port, user / pw and also the exception list including *.svc - test with the jenkins callback url - call works.

Comment 9 clemens utschig 2018-05-02 18:08:05 UTC
ps - we need the proxy for when the slave goes out to the internet to fetch stuff from github or other sources. so we need NO_PROXY to work and also proxy settings :)

Comment 10 clemens utschig 2018-05-02 18:12:35 UTC
verified slave url in jenkins / cloud / kubernetes: 

jenkins url: http://jenkins.bns-cd.svc:80

jenkins tunnel: jenkins-jnlp.bns-cd.svc:50000

also checked jenkins.bns-cd.svc is a valid service

Comment 11 clemens utschig 2018-05-02 18:15:34 UTC
slave pod / terminal / env

PWD=/tmp
JENKINS_JNLP_URL=http://jenkins.bns-cd.svc:80/computer/nodejs-6-angular-720426b0cf091/slave-agent.jnlp
KUBERNETES_PORT_53_UDP_PORT=53
JENKINS_URL=http://jenkins.bns-cd.svc:80
JENKINS_LOCATION_URL=https://jenkins-bns-cd.inh-devapps.eu.boehringer.com/
HTTPS_PROXY=http://x2inhocproxy:xxxxxxxxxxx@inhproxy.eu.boehringer.com:80
https_proxy=http://x2inhocproxy:xxxxxxxxxxx@inhproxy.eu.boehringer.com:80
JENKINS_TUNNEL=jenkins-jnlp.bns-cd.svc:50000
JENKINS_SERVICE_HOST=10.250.133.71
HOME=/tmp
JENKINS_SECRET=faec3d2c544c132579790af5db95c79a0d13076928cc3aa7fc2c34f2e5f49b31
SHLVL=2
KUBERNETES_PORT_53_UDP_PROTO=udp
KUBERNETES_PORT_443_TCP_PROTO=tcp
KUBERNETES_SERVICE_PORT_HTTPS=443
no_proxy=.svc,.default,.local,localhost,.boehringer.com,10.250.0.0/16,10.251.0.0/16,10.183.195.106,10.183.195.107,10.183.195.108,10.183.195.109,10.183.195.11,10.183.195.111,10.183.195.112,10.183.195.113,10.183.195.13,10.250.127.4
HTTP_PROXY=http://x2inhocproxy:xxxxxxxxxxxxx@inhproxy.eu.boehringer.com:80

Comment 12 clemens utschig 2018-05-02 18:29:23 UTC
re precise exception - see above - the slave starts - and terminates immediately as it can't reach master.

Comment 13 Gabe Montero 2018-05-02 18:38:05 UTC
-> correct, it's NOT honoring NO_PROXY settings - I would NOT expect the cluster to do any communication internally through the proxy ...

The fact that you get the "407 Proxy Authentication Required " says it is though

-> we have provided them already - they are captured thru the pod terminal -> typing env

Apologies for missing it....thanks for reposting.

So, one of my concerns has been confirmed.  Jenkins does not check upper case HTTP_PROXY, etc., only http_proxy, etc., and the java System.getenv is case sensitive on linux, per the javadoc.

It does look for "no_proxy" lower case though.

So from what I see in the jenkins code (and I saw this in several versions of remoting.jar based on the line numbers), it *WOULD* pick up the no_proxy setting.
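
A quick way to see the case sensitivity described above - a minimal sketch, assuming a Linux shell where only the upper-case variable is exported:

public class EnvCaseSketch {
    public static void main(String[] args) {
        // On Linux, System.getenv is case sensitive per its javadoc, so with
        // only HTTP_PROXY exported the lower-case lookup returns null -
        // exactly the miss described above.
        System.out.println("HTTP_PROXY = " + System.getenv("HTTP_PROXY"));
        System.out.println("http_proxy = " + System.getenv("http_proxy"));
    }
}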


For 
-> manage jenkins / plugins / advanced - enter proxy host, port, user / pw and also the exception list including *.svc - test with the jenkins callback url - call works.

That typically does not come into play for certain aspects of the Jenkins core, and I see that the plugin mgr proxy setting is not leveraged in the remoting path.

Short term things to try:
- set an env var on the slave container in the pod template config that sets http_proxy=http://xxx:yyyyyzzzze@inhproxy.eu.boehringer.com:80 and https_proxy=http://xxx:yyyyyzzzze@inhproxy.eu.boehringer.com:80
- in conjunction, you have two choices a) leave the default no_proxy ... based on what I see in the jenkins code, I would expect it to filter URLs ending with .svc, not apply the proxy setting, and you get the 407
b) no-op the no_proxy setting ... the http_proxy setting should be applied, and in theory you should not get the 407

or 

- perhaps change the jenkins url in the cloud config to the service ip / port ... that way, you can better confirm it does not go through the proxy, and would rule out http://jenkins.bns-cd.svc:80 resolving to something unexpected that does go through the proxy

Comment 14 Gabe Montero 2018-05-02 18:39:14 UTC
re precise exception - see above - the slave starts - and terminates immediately as it can't reach master.

I assume you mean Comment #4

So the exception in comment #4 is when you configured the proxy in adv setting, or not?

Comment 15 Gabe Montero 2018-05-02 18:44:08 UTC
OK ... 

Given 

HTTPS_PROXY=http://x2inhocproxy:xxxxxxxxxxx@inhproxy.eu.boehringer.com:80                                                                                                                      
https_proxy=http://x2inhocproxy:xxxxxxxxxxx@inhproxy.eu.boehringer.com:80                                                                                                                      


And this snippet from the jenkins code:  

    static URLConnection openURLConnection(URL url, String credentials, String proxyCredentials,
                                           SSLSocketFactory sslSocketFactory, boolean disableHttpsCertValidation) throws IOException {
        String httpProxy = null;
        // If http.proxyHost property exists, openConnection() uses it.
        if (System.getProperty("http.proxyHost") == null) {
            httpProxy = System.getenv("http_proxy");
        }


that env var needs to be "http_proxy", not "https_proxy"

Comment 16 clemens utschig 2018-05-02 18:46:33 UTC
Gabe ( > clemens comments)

- So the exception in comment #4 is when you configured the proxy in adv setting, or not?

> it occurs whether or NOT this is configured - so jenkins remoting does not care it seems.

- set an env var on the slave container in the pod template config that sets http_proxy=http://xxx:yyyyyzzzze@inhproxy.eu.boehringer.com:80 and https_proxy=http://xxx:yyyyyzzzze@inhproxy.eu.boehringer.com:80

> how are they set? Where is this magic template? For me that's a sucky solution - we configured the global proxy settings on OC as per the doc - why on earth do I have to set something else?!

- in conjunction, you have two choices a) leave the default no_proxy ... based on what I see in the jenkins code, I would expect it to filter URLs ending with .svc, not apply the proxy setting, and you get the 407

> so NOT set no proxy? ... but that would only fly for cluster cnames, and not for anything outside of the cluster BUT inside our network :(

b) no-op the no_proxy setting ... the http_proxy setting should be applied, and in theory you should not get the 407

> no op? remove / leave empty .. ?!

it sounds to me like a fat jenkins bug...?! ...

Comment 17 clemens utschig 2018-05-02 18:49:55 UTC
Gabe - from the code snippet 2018-05-02 14:44:08 EDT

this is an even worse bug ... because the global settings

/etc/origin/master/master-config.yaml
        - name: HTTP_PROXY
          value: http://user:pass@xxxxxx.com:80
        - name: HTTPS_PROXY
          value: http://user:pass@xxxxxx.com:80
        - name: NO_PROXY
          value: 

are NOT pushed to the slave ... :( 
so realistically the only chance we have is to start setting env vars?! ... which would be http_proxy and no_proxy then ... BUT ... what's the codebase to pick up no_proxy?

Comment 18 Ben Parees 2018-05-02 19:37:09 UTC
I'm not sure how the variables are being set on either the jenkins master or the jenkins slave, but I'm pretty sure they do not come from the master-config. (I would have to see the context around the excerpt you provided to know for sure what that is setting).

I also need to take a step back in this thread to understand what behavior is desired.

My understanding of the current issue is that "no_proxy" is not being respected by the Jenkins slave process, despite it being set (at least as lowercase) per the pod yaml you supplied.  As a result, the jenkins slave process communication is going to the proxy and getting 407ed by the proxy.

So I think the next steps are:

1) try manually setting NO_PROXY (since you already seem to have no_proxy set) to at least confirm that works (Gabe can continue checking the code to see what form of no_proxy it *should* respect)
2) We need to understand where all your proxy variables are coming from on the master+slave to see if there's an openshift issue here w/ what we're setting up.  Again, I do not think they are coming from your master-config.yaml.

Comment 19 clemens utschig 2018-05-02 19:41:01 UTC
Ben
re 1) try manually setting NO_PROXY (since you already seem to have no_proxy set) to at least confirm that works (Gabe can continue checking the code to see what form of no_proxy it *should* respect)

> how? where is the pod config template (it's not of type template) - so I have NO idea how to set it..

Should I set it on the jenkins DC? ... if so - with *.svc, or just .svc. 

Seriously - the documentation (and support) for a pretty simple proxy case sucks!

we are completely dead in the water migrating from an AWS deployment of OC - without proxy inhouse.

Comment 20 clemens utschig 2018-05-02 19:42:26 UTC
Ben 
re I'm not sure how the variables are being set on either the jenkins master or the jenkins slave, but i'm pretty sure they do not come from the master-config. (I would have to see the context around the excerpt you provided to know for sure what that is setting).

> we are NOT setting anything on the DC of jenkins - when we build the image - the env vars are injected without us doing anything in the BC or the like

Comment 21 Ben Parees 2018-05-02 20:03:02 UTC
Ok, that makes sense.  They are being injected into the image when you build the image because your master-config has build default env vars set up (presumably that is the section of the master-config you pasted).

So those proxy/no_proxy env vars are baked directly into your slave image right now; they aren't coming from jenkins or the pod definition.

If you haven't rebuilt your image since editing the master-config, you will need to do so, or the changes you made will not be reflected in the image you built.

I would be interested to see the output of a docker inspect on your slave image, just to confirm what env vars are baked into it.

And to summarize I think there are a few issues here:

1) Let's find a no_proxy/NO_PROXY env var that works when it is set in the slave container, assuming the jenkins slave process respects any such env var. Right now I think that means rebuilding your slave now that your master-config contains properly defined "no_proxy" and "NO_PROXY" build default env vars.

2) Setting those vars by baking them into the image isn't an ideal solution (I realize you probably didn't do it intentionally); Jenkins + its slaves should be configured properly w/ the proxy/noproxy information. That said, I think there are some limitations around doing that today with the proxy plugin for Jenkins. Gabe can confirm/elaborate, but I view that as a followup after we get *something* working.

Comment 22 clemens utschig 2018-05-02 20:16:39 UTC
Hey ben 
I am re-building master and slave now (ps - I have no idea how we can disable this automated injection into the build).

And will verify tmrw am CET the settings - can you tell me what I am supposed to look at - I will run env again - to see what's in master and what's in the slave image.

Best, clemens

Comment 23 Ben Parees 2018-05-02 21:02:06 UTC
> ps - I have no idea how we can disable this automated injection into the build

you'd have to remove the builddefaulter configuration that is currently doing it:

https://docs.openshift.org/latest/install_config/build_defaults_overrides.html#manually-setting-global-build-defaults

> can you tell me what I am supposed to look at - I will run env again - to see what's in master and what's in the slave image.

i'd like to see two things:

1) the output of env from within the slave pod (as you provided earlier)
2) the output of docker inspect yourslaveimage:tag  (this will show us the env vars that are baked into the image itself)

If you can't run docker inspect, you can also get this information via "oc describe istag yourslaveimagestream:tag", assuming you're pushing the slave image to an openshift imagestream.  (The information will be reported as Docker Labels).

Comment 24 Ben Parees 2018-05-02 21:03:19 UTC
(btw removing that config from your builddefaulter will likely cause your builds to start failing since I assume you need those proxy settings for your builds to succeed.  The issue is that those env vars will both be present during the build, and also baked into the resulting image produced by the build.  The latter can cause confusion).

Comment 25 clemens utschig 2018-05-03 06:31:06 UTC
master env

LC_ALL=en_US.UTF-8
NO_PROXY=.svc,.default,.local,localhost,.boehringer.com,10.250.0.0/16,10.251.0.0/16,10.183.195.106,10.183.195.107,10.183.195.108,10.183.195.109,10.183.195.11,10.183.195.111,10.183.195.112,10.183.195.113,10.183.195.13,10.250.127.4
KUBERNETES_PORT_53_UDP=udp://10.250.0.1:53
KUBERNETES_PORT_53_TCP_PORT=53
http_proxy=http://x2inhocproxy:xxxxxxxx@inhproxy.eu.boehringer.com:80
JENKINS_JNLP_PORT_50000_TCP_PORT=50000
KUBERNETES_PORT_53_UDP_ADDR=10.250.0.1
JENKINS_UC=https://updates.jenkins-ci.org
OPENSHIFT_BUILD_NAMESPACE=bix-shared
JENKINS_SERVICE_PORT=80
JENKINS_JNLP_PORT=tcp://10.250.27.49:50000
HTTPS_PROXY=http://x2inhocproxy:xxxxxxxxx@inhproxy.eu.boehringer.com:80
https_proxy=http://x2inhocproxy:xxxxxxxxx@inhproxy.eu.boehringer.com:80
JENKINS_SERVICE_HOST=10.250.133.71
KUBERNETES_MASTER=https://kubernetes.default:443
no_proxy=.svc,.default,.local,localhost,.boehringer.com,10.250.0.0/16,10.251.0.0/16,10.183.195.106,10.183.195.107,10.183.195.108,10.183.195.109,10.183.195.11,10.183.195.111,10.183.195.112,10.183.195.113,10.183.195.13,10.250.127.4
HTTP_PROXY=http://x2inhocproxy:xxxxxxx@inhproxy.eu.boehringer.com:80

Comment 26 clemens utschig 2018-05-03 06:36:10 UTC
slave env (as fast as I could grab it)

KUBERNETES_PORT_53_UDP_PROTO=udp
KUBERNETES_PORT_443_TCP_PROTO=tcp
KUBERNETES_SERVICE_PORT_HTTPS=443
no_proxy=.svc,.default,.local,localhost,.boehringer.com,10.250.0.0/16,10.251.0.0/16,10.183.195.106,10.183.195.107,10.183.195.108,10.183.195.109,10.183.195.11,10.183.195.111,10.183.195.112,10.183.195.113,10.183.195.13,10.250.127.4
HTTP_PROXY=http://x2inhocproxy:xxxxxxxxxxx@inhproxy.eu.boehringer.com:80
JENKINS_JNLP_SERVICE_PORT_AGENT=50000
JENKINS_PORT_80_TCP_PORT=80
JENKINS_PORT=tcp://10.250.133.71:80
JENKINS_JNLP_PORT_50000_TCP=tcp://10.250.27.49:50000
KUBERNETES_PORT_53_TCP_PROTO=tcp
KUBERNETES_SERVICE_PORT_DNS_TCP=53
KUBERNETES_PORT_443_TCP_ADDR=10.250.0.1
JENKINS_PORT_80_TCP_PROTO=tcp
KUBERNETES_PORT_443_TCP=tcp://10.250.0.1:443
JENKINS_JNLP_SERVICE_PORT=50000
container=oci
JENKINS_PORT_80_TCP_ADDR=10.250.133.71

Comment 27 clemens utschig 2018-05-03 06:37:47 UTC
C:\Users\utschig>oc describe istag jenkins:latest
Image Name:     sha256:3aaf2d384d0f19ac61dc81ea97177b243ad91e7e03660b9869fbe099e87962a9
Docker Image:   docker-registry.default.svc:5000/bix-shared/jenkins@sha256:3aaf2d384d0f19ac61dc81ea97177b243ad91e7e03660b9869fbe099e87962a9
Name:           sha256:3aaf2d384d0f19ac61dc81ea97177b243ad91e7e03660b9869fbe099e87962a9
Created:        10 hours ago
Annotations:    openshift.io/image.managed=true
Image Size:     436.3 MB (first layer 254.9 kB, last binary layer 74.83 MB)
Image Created:  10 hours ago
Author:         <none>
Arch:           amd64
Command:        /usr/libexec/s2i/run
Working Dir:    <none>
User:           1001
Exposes Ports:  50000/tcp, 8080/tcp
Docker Labels:  architecture=x86_64
                authoritative-source-url=registry.access.redhat.com
                build-date=2017-09-01T16:17:26.812452
                com.redhat.build-host=ip-10-29-120-11.ec2.internal
                com.redhat.component=openshift-jenkins-2-docker
                description=Jenkins is a continuous integration server
                distribution-scope=public
                io.k8s.description=Jenkins is a continuous integration server
                io.k8s.display-name=Jenkins 2
                io.openshift.build.commit.author=Utschig-Utschig,Clemens (IT) BIG-AT-V <clemens.utschig-utschig>
                io.openshift.build.commit.date=Fri Jan 5 15:29:14 2018 +0000
                io.openshift.build.commit.id=7c335219ed3fdae83fd0b8f87cf0c69d669faf88
                io.openshift.build.commit.message=kube-slave-common.sh - replace more IPs with services
                io.openshift.build.commit.ref=master
                io.openshift.build.name=bixjenkins-8
                io.openshift.build.namespace=bix-shared
                io.openshift.build.source-context-dir=jenkins-customization
                io.openshift.expose-services=8080:http
                io.openshift.s2i.scripts-url=image:///usr/libexec/s2i
                io.openshift.tags=jenkins,jenkins2,ci
                name=openshift3/jenkins-2-rhel7
                release=17
                summary=Provides the latest release of Red Hat Enterprise Linux 7 in a fully featured and supported base image.
                url=https://access.redhat.com/containers/#/registry.access.redhat.com/openshift3/jenkins-2-rhel7/images/v3.6.173.0.21-17
                vcs-ref=0459742e070cfe8410f0b0b2cf72a3b87d020fb8
                vcs-type=git
                vendor=Red Hat, Inc.
                version=v3.6.173.0.21
Environment:    PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
                container=oci
                JENKINS_VERSION=2.46
                HOME=/var/lib/jenkins
                JENKINS_HOME=/var/lib/jenkins
                JENKINS_UC=https://updates.jenkins-ci.org
                LANG=en_US.UTF-8
                LC_ALL=en_US.UTF-8
                HTTP_PROXY=http://x2inhocproxy:xxxxxx@inhproxy.eu.boehringer.com:80
                HTTPS_PROXY=http://x2inhocproxy:xxxxx@inhproxy.eu.boehringer.com:80
                NO_PROXY=.svc,.default,.local,localhost,.boehringer.com,10.250.0.0/16,10.251.0.0/16,10.183.195.106,10.183.195.107,10.183.195.108,10.183.195.109,10.183.195.11,10.183.195.111,10.183.195.112,10.183.195.113,10.183.195.13,10.250.127.4
                http_proxy=http://x2inhocproxy:xxxxx@inhproxy.eu.boehringer.com:80
                https_proxy=http://x2inhocproxy:xxxxx@inhproxy.eu.boehringer.com:80
                no_proxy=.svc,.default,.local,localhost,.boehringer.com,10.250.0.0/16,10.251.0.0/16,10.183.195.106,10.183.195.107,10.183.195.108,10.183.195.109,10.183.195.11,10.183.195.111,10.183.195.112,10.183.195.113,10.183.195.13,10.250.127.4
                JAVA_OPTS=-Dhudson.tasks.MailSender.SEND_TO_UNKNOWN_USERS=true -Dhudson.tasks.MailSender.SEND_TO_USERS_WITHOUT_READ=true
                OPENSHIFT_BUILD_NAME=bixjenkins-8
                OPENSHIFT_BUILD_NAMESPACE=bix-shared
                OPENSHIFT_BUILD_SOURCE=https://bitbucket.bix-digital.com/scm/cicd/bixjenkins.git
                OPENSHIFT_BUILD_COMMIT=7c335219ed3fdae83fd0b8f87cf0c69d669faf88
Volumes:        /var/lib/jenkins

Comment 28 clemens utschig 2018-05-03 06:39:33 UTC
C:\Users\utschig>oc describe istag jenkins-nodejs-6-angular:latest
Image Name:     sha256:87aa6e93a6d88ce479fe7e273cc0fc4fadd0a3ba5822c277e49b4ac7e3ce07b2
Docker Image:   docker-registry.default.svc:5000/cd/jenkins-nodejs-6-angular@sha256:87aa6e93a6d88ce479fe7e273cc0fc4fadd0a3ba5822c277e49b4ac7e3ce07b2
Name:           sha256:87aa6e93a6d88ce479fe7e273cc0fc4fadd0a3ba5822c277e49b4ac7e3ce07b2
Created:        10 hours ago
Annotations:    openshift.io/image.managed=true
Image Size:     1.214 GB (first layer 166.6 MB, last binary layer 74.87 MB)
Image Created:  10 hours ago
Author:         Richard Attermeyer <richard.attermeyer>
Arch:           amd64
Entrypoint:     /usr/local/bin/run-jnlp-client
Working Dir:    <none>
User:           1001
Exposes Ports:  <none>
Docker Labels:  License=GPLv2+
                architecture=x86_64
                authoritative-source-url=registry.access.redhat.com
                build-date=2018-04-18T04:07:14.688798
                com.redhat.build-host=ip-10-29-120-29.ec2.internal
                com.redhat.component=jenkins-slave-nodejs-rhel7-docker
                description=The jenkins slave base image is intended to be built on top of, to add your own tools that your jenkins job needs. The slave base image includes all the jenkins logic to operate as a slave, so users just have to yum install any additional packages their specific jenkins job will need
                distribution-scope=public
                io.k8s.description=The jenkins slave nodejs image has the nodejs tools on top of the jenkins slave base image.
                io.k8s.display-name=Jenkins Slave Nodejs
                io.openshift.build.commit.author=Schweikert,Christian (IT BI X) BIX-DE-I <christian.schweikert>
                io.openshift.build.commit.date=Thu Apr 12 15:18:10 2018 +0000
                io.openshift.build.commit.id=846f0913831705b2f952748cedc86148774887cc
                io.openshift.build.commit.message=Merge pull request #5 in CICD/dockerimages-jenkins-slaves from feature/BIX-342..
                io.openshift.build.commit.ref=master
                io.openshift.build.name=nodejs-6-angular-slave-32
                io.openshift.build.namespace=cd
                io.openshift.build.source-context-dir=nodejs-6-angular
                io.openshift.tags=openshift,jenkins,slave,nodejs
                name=openshift3/jenkins-slave-nodejs-rhel7
                release=4
                summary=Provides the latest release of Red Hat Enterprise Linux 7 in a fully featured and supported base image.
                url=https://access.redhat.com/containers/#/registry.access.redhat.com/openshift3/jenkins-slave-base-rhel7/images/v3.6.173.0.113-3
                vcs-ref=59fe52c7eb78ada3d2ba6ce9ec3be55656001a74
                vcs-type=git
                vendor=Red Hat, Inc.
                version=3.6
Environment:    PATH=/home/jenkins/node_modules/.bin/:/home/jenkins/.npm-global/bin/:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
                container=oci
                HOME=/home/jenkins
                HTTP_PROXY=http://x2inhocproxy:xxxxxxxxxx@inhproxy.eu.boehringer.com:80
                HTTPS_PROXY=http://x2inhocproxy:xxxxxxxxx@inhproxy.eu.boehringer.com:80
                NO_PROXY=.svc,.default,.local,localhost,.boehringer.com,10.250.0.0/16,10.251.0.0/16,10.183.195.106,10.183.195.107,10.183.195.108,10.183.195.109,10.183.195.11,10.183.195.111,10.183.195.112,10.183.195.113,10.183.195.13,10.250.127.4
                http_proxy=http://x2inhocproxy:xxxxxxxxxx@inhproxy.eu.boehringer.com:80
                https_proxy=http://x2inhocproxy:xxxxxxxxx@inhproxy.eu.boehringer.com:80
                no_proxy=.svc,.default,.local,localhost,.boehringer.com,10.250.0.0/16,10.251.0.0/16,10.183.195.106,10.183.195.107,10.183.195.108,10.183.195.109,10.183.195.11,10.183.195.111,10.183.195.112,10.183.195.113,10.183.195.13,10.250.127.4
                NODEJS_VERSION=6.10
                NPM_CONFIG_PREFIX=/home/jenkins/.npm-global
                BASH_ENV=/usr/local/bin/scl_enable
                ENV=/usr/local/bin/scl_enable
                PROMPT_COMMAND=. /usr/local/bin/scl_enable
                OPENSHIFT_BUILD_NAME=nodejs-6-angular-slave-32
                OPENSHIFT_BUILD_NAMESPACE=cd
                OPENSHIFT_BUILD_SOURCE=https://bitbucket.bix-digital.com/scm/cicd/dockerimages-jenkins-slaves.git
                OPENSHIFT_BUILD_COMMIT=846f0913831705b2f952748cedc86148774887cc

Comment 29 clemens utschig 2018-05-03 06:41:50 UTC
still failing - same error


/usr/local/bin/scl_enable: line 3: scl_source: No such file or directory
May 03, 2018 6:40:03 AM hudson.remoting.jnlp.Main createEngine
INFO: Setting up slave: nodejs-6-angular-15002fa5876cd
May 03, 2018 6:40:03 AM hudson.remoting.jnlp.Main$CuiListener <init>
INFO: Jenkins agent is running in headless mode.
May 03, 2018 6:40:03 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [http://jenkins.bns-cd.svc:80]
May 03, 2018 6:40:03 AM hudson.remoting.jnlp.Main$CuiListener error
SEVERE: http://jenkins.bns-cd.svc:80/tcpSlaveAgentListener/ is invalid: 407 Proxy Authentication Required
java.io.IOException: http://jenkins.bns-cd.svc:80/tcpSlaveAgentListener/ is invalid: 407 Proxy Authentication Required
	at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:165)
	at hudson.remoting.Engine.innerRun(Engine.java:335)
	at hudson.remoting.Engine.run(Engine.java:287)

Comment 30 Ben Parees 2018-05-03 06:54:07 UTC
ok so I see both the no_proxy and NO_PROXY env vars in your slave image but I don't see them in the pod env you grabbed.  I am guessing that's because the node already had an old version of the slave image and thus did not pull the new one.  I think you will need to enable the force pull option[1] in your slave pod template to ensure your slave is running w/ the updated image.  Sorry I didn't think about that earlier.

If that still doesn't work we'll have to wait for the results of Gabe's investigation into whether the slave client properly respects no_proxy/NO_PROXY.

[1] https://github.com/jenkinsci/kubernetes-plugin/blob/master/src/main/resources/org/csanchez/jenkins/plugins/kubernetes/ContainerTemplate/config.jelly#L17

Comment 31 clemens utschig 2018-05-03 07:34:32 UTC
see comment: 2018-05-03 02:36:10 EDT
slave env (as fast as I could grab it)

this shows what I could get quickly .. it contains no_proxy :)

Comment 32 Ben Parees 2018-05-03 07:46:47 UTC
But it doesn't contain NO_PROXY, right? (even though your new image does, which would imply the pod didn't run the new image)

Comment 33 clemens utschig 2018-05-03 08:41:56 UTC
Ben - I am rebuilding the image now as well - right now I am just not fast enough to grab the whole env, but I am pretty sure it's there.

Comment 34 clemens utschig 2018-05-03 08:59:46 UTC
I have verified latest image is pulled - it just stops too fast to grab the env - but describe shows that the ENV is there.

Comment 35 Ben Parees 2018-05-03 09:18:10 UTC
Seems like no_proxy is supposed to work:

https://issues.jenkins-ci.org/plugins/servlet/mobile#issue/jenkins-32326

Gabe, we may need to try to recreate this locally (set garbage proxy env vars and then set no_proxy and ensure the slave can reach the master)

Comment 36 clemens utschig 2018-05-03 09:44:35 UTC
I checked and no_proxy is set when doing env on the slave pod

no_proxy=.svc,.default,.local,localhost,.boehringer.com,10.250.0.0/16,10.251.0.0/16,10.183.195.106,10.183.195.107,10.183.195.108,10.183.195.109,10.183.195.11,10.183.195.111,10.183.195.112,10.183.195.113,10.183.195.13,10.250.127.

Comment 37 clemens utschig 2018-05-03 10:08:50 UTC
running with Jenkins ver. 2.46.3 - which is pulled thru 
openshift3/jenkins-2-rhel7

the fix seems to be 2.9 and above?! ...

Comment 38 clemens utschig 2018-05-03 10:26:39 UTC
build yaml:

apiVersion: v1
kind: BuildConfig
metadata:
  creationTimestamp: '2018-01-05T08:49:53Z'
  labels:
    build: bixjenkins
  name: bixjenkins
  namespace: bix-shared
  resourceVersion: '25618183'
  selfLink: /oapi/v1/namespaces/bix-shared/buildconfigs/bixjenkins
  uid: 65b79010-f1f5-11e7-8d02-0050569e2dbf
spec:
  nodeSelector: null
  output:
    to:
      kind: ImageStreamTag
      name: 'jenkins:latest'
  postCommit: {}
  resources: {}
  runPolicy: Serial
  source:
    contextDir: jenkins-customization
    git:
      uri: 'https://cd_user@bitbucket.bix-digital.com/scm/cicd/bixjenkins.git'
    sourceSecret:
      name: cd-user-pwd
    type: Git
  strategy:
    dockerStrategy:
      from:
        kind: ImageStreamTag
        name: 'jenkins:2'
        namespace: openshift
    type: Docker
  triggers: []
status:
  lastVersion: 8

Comment 39 Ben Parees 2018-05-03 11:18:08 UTC
2.9 is pretty old, so I would expect the openshift jenkins image to include the fix if indeed that is the release it was delivered in.  I don't think we ever shipped anything older than 2.35 or so.

Comment 40 clemens utschig 2018-05-03 12:43:22 UTC
Cloning "https://bitbucket.bix-digital.com/scm/cicd/bixjenkins.git" ...
	Commit:	7c335219ed3fdae83fd0b8f87cf0c69d669faf88 (kube-slave-common.sh - replace more IPs with services)
	Author:	Utschig-Utschig,Clemens (IT) BIG-AT-V <clemens.utschig-utschig>
	Date:	Fri Jan 5 15:29:14 2018 +0000
Step 1 : FROM registry.access.redhat.com/openshift3/jenkins-2-rhel7@sha256:c47b5d8c9ba8a57255e5191cbf0ed9e0cb998bc823846ba52c34cca11a3cf2a0
 ---> 8789fa88d268

Interesting that it does not grab latest (although there is no latest tag) ... ?!

Comment 41 Gabe Montero 2018-05-03 13:25:08 UTC
i.e. no_proxy ... yes Ben, I noted earlier that the lower case form was supposed to work, not the upper case form.

And yeah, any jenkins JIRAs around this that I found were supposed to have been resolved in earlier versions than the ones we have shipped in 3.6, which is why I didn't mention them.

I'll start working on recreates today. In addition to the end to end test, I should be able to pull out the regexp logic in org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.inNoProxyEnvVar(String) and put it in a simple test program, and then start feeding it Clemens' input of:


no_proxy=.svc,.default,.local,localhost,.boehringer.com,10.250.0.0/16,10.251.0.0/16,10.183.195.106,10.183.195.107,10.183.195.108,10.183.195.109,10.183.195.11,10.183.195.111,10.183.195.112,10.183.195.113,10.183.195.13,10.250.127.


for the regexp and 

http://jenkins.bns-cd.svc:80/tcpSlaveAgentListener/

for the candidate to match

Comment 42 clemens utschig 2018-05-03 13:39:11 UTC
looks like : 
openshift3/jenkins-2-rhel7 

always resolves to 


 https://access.redhat.com/containers/#/registry.access.redhat.com/openshift3/jenkins-2-rhel7/images/v3.6.173.0.21-17


which is really really weird ... ?!

Comment 44 Gabe Montero 2018-05-03 18:08:52 UTC
OK, my experiment putting the jenkins no proxy regexp logic into a simple main program revealed what is going on, and allowed me to quickly experiment with various permutations.

First, turns out, the JENKINS_URL value of 'http://jenkins.bns-cd.svc:80' 

is *NOT* passing the regexp logic they employ 

when running against `no_proxy=.svc,.default,.local,localhost,.boehringer.com,10.250.0.0/16,10.251.0.0/16,10.183.195.106,10.183.195.107,10.183.195.108,10.183.195.109,10.183.195.11,10.183.195.111,10.183.195.112,10.183.195.113,10.183.195.13,10.250.127.`

I also confirmed that the same algorithm is used from the latest 2.107 code (and the remoting dependency of 3.14 that it has) down to 2.46.3, etc. (and the remoting dependency of 3.7 that it has).

I was able to get the no_proxy match to work when I

1) stripped the "http://" prefix and ":80" suffix ... reducing the string to "jenkins.bns-cd.svc"
2) changed ".svc" to "bns-cd.svc" in the no_proxy setting ... it doesn't like a "single element" domain.

Here is my simple program that pulls the org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.inNoProxyEnvVar(String) logic into a main program:

public class TestNoProxyRegexp {

	public static void main(String[] args) {
		boolean rc = false;
		String host = "jenkins.bnd-cd.svc";
		//String host = "http://jenkins.bnd-cd.svc";
		//String host = "jenkins.bns-cd.svc:80";
		//String host = "http://jenkins.bns-cd.svc:80/tcpSlaveAgentListener/";
		//String host = "http://jenkins.bns-cd.svc:80/";
		//String host = "http://jenkins.bns-cd.svc";
		//String noProxy = ".svc,.default,.local,localhost,.boehringer.com,10.250.0.0/16,10.251.0.0/16,10.183.195.106,10.183.195.107,10.183.195.108,10.183.195.109,10.183.195.11,10.183.195.111,10.183.195.112,10.183.195.113,10.183.195.13,10.250.127.";
		String noProxy = "bnd-cd.svc,.default,.local,localhost,.boehringer.com,10.250.0.0/16,10.251.0.0/16,10.183.195.106,10.183.195.107,10.183.195.108,10.183.195.109,10.183.195.11,10.183.195.111,10.183.195.112,10.183.195.113,10.183.195.13,10.250.127.";
        noProxy = noProxy.trim()
                // Remove spaces
                .replaceAll("\\s+", "")
                // Convert .foobar.com to foobar.com
                .replaceAll("((?<=^|,)\\.)*(([a-z0-9]+(-[a-z0-9]+)*\\.)+[a-z]{2,})(?=($|,))", "$2");

        if (!noProxy.isEmpty()) {
            // IPV4 and IPV6
            if (host.matches("^(?:[0-9]{1,3}\\.){3}[0-9]{1,3}$") || host
                    .matches("^(?:[a-fA-F0-9]{1,4}:){7}[a-fA-F0-9]{1,4}$")) {
                rc =  noProxy.matches(".*(^|,)\\Q" + host + "\\E($|,).*");
                System.out.println("GGM checkpoint 1 " + rc);
            } else {
                int depth = 0;
                // Loop while we have a valid domain name: acme.com
                // We add a safeguard to avoid a case where the host would always be valid because the regex would
                // for example fail to remove subdomains.
                // According to Wikipedia (no RFC defines it), 128 is the max number of subdivision for a valid
                // FQDN:
                // https://en.wikipedia.org/wiki/Subdomain#Overview
                while (host.matches("^([a-z0-9]+(-[a-z0-9]+)*\\.)+[a-z]{2,}$") && depth < 128) {
                    ++depth;
                    // Check if the no_proxy contains the host
                    if (noProxy.matches(".*(^|,)\\Q" + host + "\\E($|,).*")) {
                    	rc = true;
                        System.out.println("GGM checkpoint 2");
                        break;
                    }
                    // Remove first subdomain: master.jenkins.acme.com -> jenkins.acme.com
                    else {
                        host = host.replaceFirst("^[a-z0-9]+(-[a-z0-9]+)*\\.", "");
                        System.out.println("GGM checkpoint 3, host name " + host);
                    }
                }
            }
        }
        System.out.println("GGM " + rc);
	}

}
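(For reference, running the program above as-is prints "GGM checkpoint 3, host name bns-cd.svc", then "GGM checkpoint 2", then "GGM true" - the match only succeeds after the "jenkins." subdomain is stripped, and only because the no_proxy entry is the two-segment "bns-cd.svc".)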

Comment 45 Gabe Montero 2018-05-03 18:12:02 UTC
Oops ... forgot one point

they use java.net.URL.getHost() before calling the above logic, so that strips the "http://" prefix and the ":80" suffix.

But the no_proxy change - ".svc" to "bns-cd.svc" - is still needed.
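For illustration, here is a minimal sketch of that stripping (hypothetical class name; the URL is the one the slave resolves):

import java.net.URL;

public class TestGetHost {

	public static void main(String[] args) throws Exception {
		// getHost() drops the scheme, port, and path, leaving just the host name
		URL url = new URL("http://jenkins.bns-cd.svc:80/tcpSlaveAgentListener/");
		System.out.println(url.getHost()); // prints "jenkins.bns-cd.svc"
	}

}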

Comment 46 clemens utschig 2018-05-03 20:52:14 UTC
Gabe - awesome that we are getting to a resolution - or at least know what's going wrong.


As we set those env props globally, they are injected into the build, and people create jenkins instances on the fly - so we have NO way of doing this (your workaround) scripted. (q1) I don't think there is a way in openshift to reference an already existing ENV in a DC (or, in our case, prepend to the one that exists).

q2) Is CIDR working or not working (as in the no_proxy settings above)? I assume not, given that it's a regexp?! That's a must-fix as well - to allow dynamic addition of nodes.

q3) The ENV is also not honored on the jenkins master (it should automatically populate the plugins / advanced / proxy section) - I assume.

So can we have this bug be the tracking bug to fix all this - because clearly, right now, jenkins is UNUSABLE behind a corporate proxy on kubernetes / openshift.

Comment 47 Ben Parees 2018-05-03 21:55:32 UTC
To address the earlier question about why :latest is pointing to a v3.6 image... after v3.6 we introduced version-specific tags for all jenkins-related images. This was to address compatibility issues in which the v3.7 images could not be used on v3.6 clusters. To ensure that v3.6 clusters which were already configured to use the "latest" tag would never pick up the v3.7+ images and break, we decided to lock the "latest" tag to always point to v3.6 (I think you were impacted by this break when it first occurred).

So for a v3.7+ cluster, jenkins images (master+slave) should be referenced using a version specific tag that aligns w/ the cluster version.


regarding q1, no, you can't prepend, but you can certainly override the entire env value with a new one (see the sketch below). Or of course you can rebuild your slave images w/ the new env value baked in. You can also explicitly define env vars for slave pods via the slave pod template configuration, and this would override the value that is baked into your images. I'm not clear on what you mean by not being able to do it scripted. Are you concerned about the case where you need to change the proxy settings and roll them out after people have built their own jenkins slave images w/ the old values baked in?
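For example, a minimal sketch of overriding the whole value on the master's DC from the command line (values illustrative only):

oc set env dc/jenkins no_proxy='bns-cd.svc,.default,.local,localhost,.boehringer.com'

On the next rollout the container sees only the new value; there is no merge with the old one.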


regarding q2, to the extent that there's any standardization around it, no_proxy doesn't support CIDR in general:  https://unix.stackexchange.com/questions/23452/set-a-network-range-in-the-no-proxy-environment-variable

So I'm not surprised the Jenkins implementation is also deficient in this respect. We can certainly open an issue against Jenkins to consider changing/supporting it.

I'll let Gabe speak to q3 but I know there are some complexities around proxy management in Jenkins and the recommended approach tends to be the proxy plugin which carries its own limitations.

So all that said, clearly there are documentation deficiencies and less-than-ideal configuration steps, but I still believe we can provide you and your users with a working configuration by:

1) baking the proper proxy related env vars into the image (now that Gabe has uncovered the nuances of the Jenkins no_proxy implementation)

2) baking the env vars into a slave pod template configuration that is installed as a configmap (which jenkins will then pick up automatically)

3) baking the env vars into a slave pod template configuration that is part of a custom jenkins configuration that is baked into a custom jenkins image

Comment 48 clemens utschig 2018-05-04 06:43:10 UTC
here is the setup we run. we have ONE image - that sits on top of the core openshift one - and people in our org create DCs in many different projects, whose names we don't know. so we cannot populate global settings with project names (e.g. bns-cd) as you suggested, because we don't know them.

We also use the same base configuration (git repo & dockerfile) in a second cluster that is @AWS without proxy - so the only way I see to make this work is to get a code fix from you folks that fixes this end 2 end for 3.6++, both for master and slave.

Comment 49 clemens utschig 2018-05-04 07:05:46 UTC
ok adding
            - name: no_proxy
              value: 'bns-cd.svc,default,localhost,local,17.0.0.1'

to the master jenkins DC - does NOT help.

We need an ETA for the code fix on this - we cannot introduce changing proxy settings in config maps all over the place, as it's a development cluster - and hence we will potentially have 100s of projects with a DC referencing the jenkins image.

Comment 50 Gabe Montero 2018-05-04 21:43:07 UTC
Centered on the Jenkins aspects of this (ignoring the "how no_proxy and http_proxy get set" list of questions/concerns for the moment), I was able to set both http_proxy and no_proxy and get a slave-based build to work (cumbersome as it was to do).

I also constructed a debug jenkins remoting.jar file that dumps out what is found for the env vars, what the regexp code decides when determining whether no_proxy should be applied, and ultimately whether a proxy connection is attempted for access to both the jenkins and jenkins-jnlp endpoints within the slave pod.

Quick details on my no_proxy setting:
- my pod template's jenkins url is set to the jenkins service IP ... i.e. the default when instantiating our vanilla template on an `oc cluster up`
- my no_proxy includes the IP of the jenkins and jenkins-jnlp service


**** those env vars have to be set on the slave pods ... either as env vars on the pod templates or env vars on the image used; neither Jenkins nor the K8s plugin *DOES ANYTHING* wrt taking those env vars, if they are set on master, and setting them on the slave pods ******** ... submitting a PR against the k8s plugin to allow for that somehow is, I think, warranted though.

And of course the use of IPs is not a long-term solution for you, Clemens, but some sort of baseline proof was needed to show this stuff worked at all in Jenkins, however cumbersome it is.

I played around with the value of the http_proxy setting to confirm it was getting used for URLs not in no_proxy, and I removed either the jenkins or jenkins-jnlp service IP from no_proxy and proved that the slave pod would fail with connection/comm errors

The debug jar ....
- I updated a container running registry.access.redhat.com/openshift3/jenkins-2-rhel7:v3.6.173.0.21-17 with my updated build of the 3.7-level remoting.jar
- I oc rsync'ed that jar to /tmp/target
- I then oc rsh'ed into the pod, went to /var/lib/jenkins/war/WEB-INF/lib, and removed the existing remoting-3.7.jar
- I copied the remoting-3.7.jar I uploaded to /tmp/target into /var/lib/jenkins/war/WEB-INF/lib (see the consolidated sketch below)
- I restarted jenkins
- on my next build attempt, the slave pod logs had all my debug statements, showing whether the proxy was used, and whether the host in question was or was not in the no_proxy list
- I committed that container to an image, and pushed it to docker.io/gmontero/jenkins-2-rhel7-v3.6.173.0.21-17-with-debug:latest

Now, since /var/lib/jenkins is the image volume, none of my changes in that dir are included.

But the /tmp/target contents are there.
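Roughly, the jar swap looks like this (a sketch; the pod name and local path are placeholders):

# upload the patched jar into the running master pod
oc rsync ./target/ jenkins-1-abcde:/tmp/target/
# then, inside an `oc rsh jenkins-1-abcde` session, swap it into the exploded war
rm /var/lib/jenkins/war/WEB-INF/lib/remoting-3.7.jar
cp /tmp/target/remoting-3.7.jar /var/lib/jenkins/war/WEB-INF/lib/
# finally, restart jenkins so the patched jar is loaded and served to slaves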

Now for the request .... 

Clemens - I know this only addresses a subset of your concerns, and is a multi-step, manual task ... but to eliminate at least some of the concerns, is it possible for you to take the image I pushed to docker.io, launch your master with it, update the remoting jar and restart Jenkins, set the ENVs on either the pod template or the slave image, and capture the slave POD logs (my debug statements have "GGM" in them), so we can at least confirm whether the http_proxy / no_proxy settings are getting processed as expected?

If so, and if we can sort out the Jenkins side of things at least, we can then zero in on the openshift side.

thanks

Comment 51 clemens utschig 2018-05-06 07:30:15 UTC
hey gabe - super happy to do all that BUT ... can you please help me understand

<update the set the ENVs on either the pod template> 

Where is this template hidden? We cannot update the dockerfile we use to build the image, as it's shared with a non-proxy cluster @AWS.

Let me know and I'll try the rest tmrw CET time.

Comment 52 clemens utschig 2018-05-06 07:35:52 UTC
hey gabe - question 2 - is there anything I need to do on the slave (except the template), or is this all based on the new remoting jar?

Comment 53 Gabe Montero 2018-05-07 00:03:56 UTC
Hey Clemens,

Thanks for giving the repro a go.

As to the pod template, I am referring to the kubernetes plugin configuration within Jenkins.  And yeah, I should describe the path to this in more detail for you.

So, via the "traditional" way, from the Jenkins Console's main screen, go to "Manage Jenkins", then "Configure System".

The bottom of that screen will be the "Cloud" portion of the Jenkins configuration.  You should see a "Kubernetes" cloud defined, with a set of global config settings for the kubernetes plugin.  You will then see an "Images" subsection, and a set of "Kubernetes Pod Template" settings.  The openshift jenkins image ships 2 by default, one maven image and one nodejs image.

The image that you have been referring to ... that should also be one of these images.

Under the "kubernetes pod templates" you will also see a set of "Containers" and then "Container template" entries.  Within these "Container templates", you should see references to your specific image as well.

Also, within the "Container template" you should see an "EnvVars" section, with a button to add env vars.

In my test, I injected the "http_proxy" and "no_proxy" env vars by setting those name / value pairs there, vs. building an image with those env vars baked in.

Now, it should not matter ... I've seen jenkins pick up env vars from both images and templates before ... *MAYBE* that is where the discrepancy in what we are seeing stems from.

I'll go back on Monday and build my slave images with no_proxy and http_proxy baked in, and confirm that.

But in parallel, with your Jenkins restarted with the remoting jar from my image placed in /var/lib/jenkins/war/WEB-INF/lib, then, with either 

a) your slave images running as is 
b) or by setting the env vars in the kubernetes plugin configuration as described above

my new remoting jar will print out what it is finding in the JVM sys/env props, what decisions it has made wrt those props, and whether it is using proxies (those print statements will start with "GGM")

You should not need to do anything else with your slaves (i.e. your second question).

If you can get pod logs with the "GGM" prints, analyze them or upload them for me to look at as needed.

I'll report back when I've obtained useful results from baking in no_proxy / http_proxy.  

Hopefully we can put the Jenkins side of things to rest after all this.

thanks again

Comment 54 clemens utschig 2018-05-07 11:36:56 UTC
master: with your latest image - and NO changes

logs: INFO: Jenkins agent is running in headless mode.
May 07, 2018 11:33:15 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [http://jenkins.bixpr-cd.svc:80]
May 07, 2018 11:33:15 AM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver openURLConnection
INFO: GGM openURLConnection http://jenkins.bixpr-cd.svc:80/tcpSlaveAgentListener/ creds null proxy creds null prox http://x2inhocproxy:xxxxxxxx@inhproxy.eu.boehringer.com:80
May 07, 2018 11:33:15 AM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver inNoProxyEnvVar
INFO: GGM inNoProxyEnvVar host jenkins.bixpr-cd.svc no proxy .svc,.default,.local,localhost,.boehringer.com,10.250.0.0/16,10.251.0.0/16,10.183.195.106,10.183.195.107,10.183.195.108,10.183.195.109,10.183.195.11,10.183.195.111,10.183.195.112,10.183.195.113,10.183.195.13,10.250.127.4
May 07, 2018 11:33:15 AM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver inNoProxyEnvVar
INFO: GGM inNoProxyEnvVar host svc no proxy .svc,.default,.local,localhost,boehringer.com,10.250.0.0/16,10.251.0.0/16,10.183.195.106,10.183.195.107,10.183.195.108,10.183.195.109,10.183.195.11,10.183.195.111,10.183.195.112,10.183.195.113,10.183.195.13,10.250.127.4 returning(3) false
May 07, 2018 11:33:15 AM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver openURLConnection
INFO: GGM openURLConnection http://jenkins.bixpr-cd.svc:80/tcpSlaveAgentListener/ opening conn with proxy HTTP @ inhproxy.eu.boehringer.com/10.183.157.6:80
May 07, 2018 11:33:15 AM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver openURLConnection
INFO: GGM openURLConnection http://jenkins.bixpr-cd.svc:80/tcpSlaveAgentListener/ returning conn sun.net.www.protocol.http.HttpURLConnection:http://jenkins.bixpr-cd.svc:80/tcpSlaveAgentListener/
May 07, 2018 11:33:15 AM hudson.remoting.jnlp.Main$CuiListener error
SEVERE: http://jenkins.bixpr-cd.svc:80/tcpSlaveAgentListener/ is invalid: 407 Proxy Authentication Required
java.io.IOException: http://jenkins.bixpr-cd.svc:80/tcpSlaveAgentListener/ is invalid: 407 Proxy Authentication Required
	at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:166)
	at hudson.remoting.Engine.innerRun(Engine.java:335)
	at hudson.remoting.Engine.run(Engine.java:287)
May 07, 2018 11:33:18 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud$ProvisioningCallback call
SEVERE: Error in provisioning; slave=KubernetesSlave name: maven-ada5f716026, template=org.csanchez.jenkins.plugins.kubernetes.PodTemplate@4782b3
May 07, 2018 11:33:22 AM hudson.slaves.NodeProvisioner$2 run

Comment 55 clemens utschig 2018-05-07 11:43:28 UTC
changed both the jenkins service / jnlp urls to the IP:

slave log:

Using 64 bit Java since OPENSHIFT_JENKINS_JVM_ARCH is not set
Downloading http://10.250.108.230:80/jnlpJars/remoting.jar ...

and it stops here - looks like it can't connect?!

service:

Selectors: name=jenkins
Type: ClusterIP
IP: 10.250.108.230
Hostname: jenkins.bixpr-cd.svc
Session affinity: None


May 07, 2018 11:40:02 AM hudson.slaves.NodeProvisioner$StandardStrategyImpl apply
INFO: Started provisioning Kubernetes Pod Template from openshift with 1 executors. Remaining excess workload: 0
May 07, 2018 11:40:02 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud$ProvisioningCallback call
INFO: Created Pod: maven-b39d5cdf92b
May 07, 2018 11:40:02 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud$ProvisioningCallback call
INFO: Waiting for Pod to be scheduled (0/100): maven-b39d5cdf92b
May 07, 2018 11:40:08 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud$ProvisioningCallback call
INFO: Waiting for slave to connect (0/100): maven-b39d5cdf92b

Comment 56 clemens utschig 2018-05-07 12:09:59 UTC
is there some magic miracle - naming convention or the like - that makes the download work?! ... because the jar is called *-3.7.jar

after ages - I got the following error message (on the slave).

Using 64 bit Java since OPENSHIFT_JENKINS_JVM_ARCH is not set
Downloading http://10.250.108.230:80/jnlpJars/remoting.jar ...
Running java -XX:+UseParallelGC -XX:MaxPermSize=100m -XX:MinHeapFreeRatio=20 -XX:MaxHeapFreeRatio=40 -XX:GCTimeRatio=4 -XX:AdaptiveSizePolicyWeight=90 -XX:MaxMetaspaceSize=100m -cp /home/jenkins/remoting.jar hudson.remoting.jnlp.Main -headless -url http://10.250.108.230:80 -tunnel 10.250.28.120:50000 bea6648a2655796b8d956748b8d3e2ecb465daa68cdcd8fa544c47f1a4586b08 maven-c8dc38c2f98
Error: Could not find or load main class hudson.remoting.jnlp.Main

Comment 57 clemens utschig 2018-05-07 12:13:36 UTC
  1986 Fri May 04 16:46:32 UTC 2018 hudson/remoting/jnlp/GuiListener.class
  1852 Fri May 04 16:46:32 UTC 2018 hudson/remoting/jnlp/Main$CuiListener.class
  1765 Fri May 04 16:46:32 UTC 2018 hudson/remoting/jnlp/GuiListener$2.class
  1204 Fri May 04 16:46:32 UTC 2018 hudson/remoting/jnlp/MainMenu.class
   202 Fri May 04 16:46:32 UTC 2018 hudson/remoting/jnlp/Main$1.class
 14370 Fri May 04 16:46:30 UTC 2018 hudson/remoting/jnlp/title.png
  3391 Fri May 04 16:46:32 UTC 2018 hudson/remoting/jnlp/MainDialog.class
  1260 Fri May 04 16:46:32 UTC 2018 hudson/remoting/jnlp/GuiListener$1.class
  2136 Fri May 04 16:46:32 UTC 2018 hudson/remoting/jnlp/GUI.class
  9585 Fri May 04 16:46:32 UTC 2018 hudson/remoting/jnlp/Main.class
   938 Fri May 04 16:46:32 UTC 2018 hudson/remoting/SocketInputStream.class
  1239 Fri May 04 16:46:32 UTC 2018 hudson/remoting/Request$Cancel.class

it's in there though ... but /home/jenkins is empty ...

is there some naming convention .... ?!

Comment 58 clemens utschig 2018-05-07 12:20:44 UTC
if I point jenkins back to the CNAME instead of the IP address ...

the jenkins slave comes up - with the well-known error

Using 64 bit Java since OPENSHIFT_JENKINS_JVM_ARCH is not set
Downloading http://jenkins.bixpr-cd.svc:80/jnlpJars/remoting.jar ...
Running java -XX:+UseParallelGC -XX:MaxPermSize=100m -XX:MinHeapFreeRatio=20 -XX:MaxHeapFreeRatio=40 -XX:GCTimeRatio=4 -XX:AdaptiveSizePolicyWeight=90 -XX:MaxMetaspaceSize=100m -cp /home/jenkins/remoting.jar hudson.remoting.jnlp.Main -headless -url http://jenkins.bixpr-cd.svc:80 -tunnel jenkins-jnlp.bixpr-cd.sv:50000 a9af87f068a79e7adcd69ee85bd006f55bc4c402c5f786a29dfeecc5448d0042 maven-d5602d52f71
May 07, 2018 12:18:45 PM hudson.remoting.jnlp.Main createEngine
INFO: Setting up slave: maven-d5602d52f71
May 07, 2018 12:18:45 PM hudson.remoting.jnlp.Main$CuiListener <init>
INFO: Jenkins agent is running in headless mode.
May 07, 2018 12:18:45 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [http://jenkins.bixpr-cd.svc:80]
May 07, 2018 12:18:45 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver openURLConnection
INFO: GGM openURLConnection http://jenkins.bixpr-cd.svc:80/tcpSlaveAgentListener/ creds null proxy creds null prox http://x2inhocproxy:xxxxxxxx@inhproxy.eu.boehringer.com:80
May 07, 2018 12:18:45 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver inNoProxyEnvVar
INFO: GGM inNoProxyEnvVar host jenkins.bixpr-cd.svc no proxy .svc,.default,.local,localhost,.boehringer.com,10.250.0.0/16,10.251.0.0/16,10.183.195.106,10.183.195.107,10.183.195.108,10.183.195.109,10.183.195.11,10.183.195.111,10.183.195.112,10.183.195.113,10.183.195.13,10.250.127.4
May 07, 2018 12:18:45 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver inNoProxyEnvVar
INFO: GGM inNoProxyEnvVar host svc no proxy .svc,.default,.local,localhost,boehringer.com,10.250.0.0/16,10.251.0.0/16,10.183.195.106,10.183.195.107,10.183.195.108,10.183.195.109,10.183.195.11,10.183.195.111,10.183.195.112,10.183.195.113,10.183.195.13,10.250.127.4 returning(3) false
May 07, 2018 12:18:45 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver openURLConnection
INFO: GGM openURLConnection http://jenkins.bixpr-cd.svc:80/tcpSlaveAgentListener/ opening conn with proxy HTTP @ inhproxy.eu.boehringer.com/10.183.157.6:80
May 07, 2018 12:18:45 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver openURLConnection
INFO: GGM openURLConnection http://jenkins.bixpr-cd.svc:80/tcpSlaveAgentListener/ returning conn sun.net.www.protocol.http.HttpURLConnection:http://jenkins.bixpr-cd.svc:80/tcpSlaveAgentListener/
May 07, 2018 12:18:46 PM hudson.remoting.jnlp.Main$CuiListener error
SEVERE: http://jenkins.bixpr-cd.svc:80/tcpSlaveAgentListener/ is invalid: 407 Proxy Authentication Required
java.io.IOException: http://jenkins.bixpr-cd.svc:80/tcpSlaveAgentListener/ is invalid: 407 Proxy Authentication Required
	at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:166)
	at hudson.remoting.Engine.innerRun(Engine.java:335)
	at hudson.remoting.Engine.run(Engine.java:287)

Comment 59 clemens utschig 2018-05-07 16:50:02 UTC
the only thing I did not try is to set the no_proxy as an env ... (here it's currently with the CIDR setting - which is supported by openshift).

Comment 60 Gabe Montero 2018-05-07 17:39:42 UTC
Hi Clemens,

Both Ben and Gabe here (though Gabe is typing).

So, we were able to make some inroads with the analysis.  It is looking more and more like Jenkins' substandard regexp for no_proxy is the immediate roadblock.

1) As you probably saw, the debug in https://bugzilla.redhat.com/show_bug.cgi?id=1573648#c54 proved the analysis from last week.  The Jenkins regexp for no_proxy does *NOT* account for a single-segment domain like "svc", ".svc" or "*svc"

We clearly see it constructing a proxy URL

2) Now ... why your use of IP failed to download the remoting.jar

This was insightful for us as well ... we use `curl` in our jenkins slave image to download the Jenkins remoting jar.

As it turns out, curl honors both `http_proxy` and `no_proxy`.

When you switched to IPs, but did not update the `no_proxy` list (at least your comment https://bugzilla.redhat.com/show_bug.cgi?id=1573648#c55 did not say you updated it), the curl download tries to go through the proxy and hangs/fails (the missing class is a red-herring manifestation of that problem).

When I tried it with IPs, I included the precise IP addresses of both jenkins and jenkins-jnlp services in no_proxy.

When I do not include the service IPs in no_proxy on the slave pod, the download of the remoting jar fails in the same way as you saw.

Also, this revealed to us that `curl`'s regexp for the no_proxy env var is much better than Jenkins'.  It *DOES* honor the use of ".svc", for example.

We then went back and were in fact able to verify that with simple experiments from the command line ... we saw curl handle no_proxy settings like ".svc" or ".com" across varying experiments, setting those env vars in various fashions before executing the curl (including running curl with verbose logging on).
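That kind of check can be reproduced with something like the following (a sketch; the proxy host is illustrative):

# with -v, curl logs whether it talks to the proxy or directly to the host
http_proxy=http://proxy.example.com:80 no_proxy=.svc \
  curl -v http://jenkins.bns-cd.svc/tcpSlaveAgentListener/ -o /dev/null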

Action items / resolution:

a) In your comment https://bugzilla.redhat.com/show_bug.cgi?id=1573648#c49 you said you added 'bns-cd.svc' to 'no_proxy' and it did *NOT* help.  Did the project in fact change run to run, so that it was no longer 'bns-cd'?  Or is 'bns-cd' not the project name, and is it possible that the jenkins and jenkins-jnlp service hostnames end in different domain suffixes?
I did NOT think that was the case based on the prior data provided, but am asking now just in case.

Or did you only add bns-cd.svc to the no_proxy for the master, so that it was not added to the no_proxy value in the slave image?  It needs to be in the slave image.  We see it set in the slave image in the output you provided today, but is there any chance it was not set when you tried https://bugzilla.redhat.com/show_bug.cgi?id=1573648#c49 ?

The jenkins slave bootup communicates twice with the jenkins service (i.e. the "jenkins server url" in the plugin configuration): once to download the jar file, once to initiate communication.  It then initiates a connection to the jnlp service (the "JENKINS_TUNNEL", as I saw in some of the data you provided).  We want all three to be covered by "no_proxy".

Seeing a run with the debug jar, using hostnames, where no_proxy is set on the slave and has the correct domain suffix (like 'bns-cd.svc') for both the jenkins and jenkins-jnlp services, would provide closure on that element of this problem.

b) Assuming what is happening in a) is what we think is happening, and after reviewing the resolv.conf search patterns OpenShift specifies, Ben suggests changing the jenkins server url and jenkins jnlp url to end in "svc.cluster.local" and adding "svc.cluster.local" to the no_proxy value in the slave image (this can be done by updating the master-config and then rebuilding the slave images, based on what has been done previously).
Or you can manipulate those settings from the kubernetes plugin configuration panel, as I detailed before.

thanks

Comment 61 clemens utschig 2018-05-08 06:58:28 UTC
if I add '.bixpr-cd.svc,eu.boehringer.com' as the no_proxy value:

Using 64 bit Java since OPENSHIFT_JENKINS_JVM_ARCH is not set
Downloading http://jenkins.bixpr-cd.svc:80/jnlpJars/remoting.jar ...
Running java -XX:+UseParallelGC -XX:MaxPermSize=100m -XX:MinHeapFreeRatio=20 -XX:MaxHeapFreeRatio=40 -XX:GCTimeRatio=4 -XX:AdaptiveSizePolicyWeight=90 -XX:MaxMetaspaceSize=100m -cp /home/jenkins/remoting.jar hudson.remoting.jnlp.Main -headless -url http://jenkins.bixpr-cd.svc:80 -tunnel jenkins-jnlp.bixpr-cd.svc:50000 12781525397c8ceabb34632ba1d4d2394d340a7a4d83d834d92a8ad9586a4c3b maven-576cd988d3015
May 08, 2018 6:55:30 AM hudson.remoting.jnlp.Main createEngine
INFO: Setting up slave: maven-576cd988d3015
May 08, 2018 6:55:32 AM hudson.remoting.jnlp.Main$CuiListener <init>
INFO: Jenkins agent is running in headless mode.
May 08, 2018 6:55:34 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [http://jenkins.bixpr-cd.svc:80]
May 08, 2018 6:55:34 AM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver openURLConnection
INFO: GGM openURLConnection http://jenkins.bixpr-cd.svc:80/tcpSlaveAgentListener/ creds null proxy creds null prox http://x2inhocproxy:xxxxxxxx@inhproxy.eu.boehringer.com:80
May 08, 2018 6:55:34 AM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver inNoProxyEnvVar
INFO: GGM inNoProxyEnvVar host jenkins.bixpr-cd.svc no proxy .bixpr-cd.svc,eu.boehringer.com
May 08, 2018 6:55:34 AM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver inNoProxyEnvVar
INFO: GGM inNoProxyEnvVar host bixpr-cd.svc no proxy bixpr-cd.svc,eu.boehringer.com returning(2) true
May 08, 2018 6:55:34 AM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver openURLConnection
INFO: GGM openURLConnection http://jenkins.bixpr-cd.svc:80/tcpSlaveAgentListener/ opening without proxy 
May 08, 2018 6:55:36 AM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver openURLConnection
INFO: GGM openURLConnection http://jenkins.bixpr-cd.svc:80/tcpSlaveAgentListener/ returning conn sun.net.www.protocol.http.HttpURLConnection:http://jenkins.bixpr-cd.svc:80/tcpSlaveAgentListener/
May 08, 2018 6:56:36 AM hudson.remoting.jnlp.Main$CuiListener error
SEVERE: Read timed out
java.net.SocketTimeoutException: Read timed out
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
	at java.net.SocketInputStream.read(SocketInputStream.java:171)
	at java.net.SocketInputStream.read(SocketInputStream.java:141)
	at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
	at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
	at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:735)
	at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:678)
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1587)
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492)
	at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
	at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:164)
	at hudson.remoting.Engine.innerRun(Engine.java:335)
	at hudson.remoting.Engine.run(Engine.java:287)

Comment 62 clemens utschig 2018-05-08 07:42:33 UTC
Never mind - some network hiccup

Using 64 bit Java since OPENSHIFT_JENKINS_JVM_ARCH is not set
Downloading http://jenkins.bixpr-cd.svc:80/jnlpJars/remoting.jar ...
Running java -XX:+UseParallelGC -XX:MaxPermSize=100m -XX:MinHeapFreeRatio=20 -XX:MaxHeapFreeRatio=40 -XX:GCTimeRatio=4 -XX:AdaptiveSizePolicyWeight=90 -XX:MaxMetaspaceSize=100m -cp /home/jenkins/remoting.jar hudson.remoting.jnlp.Main -headless -url http://jenkins.bixpr-cd.svc:80 -tunnel jenkins-jnlp.bixpr-cd.svc:50000 38297bf1d7dec5d085bbc31a8f3195d2e1390d249c96a7688e309f7aa5814cfc maven-576ee3c0457b1
May 08, 2018 6:57:14 AM hudson.remoting.jnlp.Main createEngine
INFO: Setting up slave: maven-576ee3c0457b1
May 08, 2018 6:57:14 AM hudson.remoting.jnlp.Main$CuiListener <init>
INFO: Jenkins agent is running in headless mode.
May 08, 2018 6:57:15 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [http://jenkins.bixpr-cd.svc:80]
May 08, 2018 6:57:15 AM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver openURLConnection
INFO: GGM openURLConnection http://jenkins.bixpr-cd.svc:80/tcpSlaveAgentListener/ creds null proxy creds null prox http://x2inhocproxy:xxxxxxxx@inhproxy.eu.boehringer.com:80
May 08, 2018 6:57:15 AM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver inNoProxyEnvVar
INFO: GGM inNoProxyEnvVar host jenkins.bixpr-cd.svc no proxy .bixpr-cd.svc,eu.boehringer.com
May 08, 2018 6:57:15 AM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver inNoProxyEnvVar
INFO: GGM inNoProxyEnvVar host bixpr-cd.svc no proxy bixpr-cd.svc,eu.boehringer.com returning(2) true
May 08, 2018 6:57:15 AM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver openURLConnection
INFO: GGM openURLConnection http://jenkins.bixpr-cd.svc:80/tcpSlaveAgentListener/ opening without proxy 
May 08, 2018 6:57:15 AM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver openURLConnection
INFO: GGM openURLConnection http://jenkins.bixpr-cd.svc:80/tcpSlaveAgentListener/ returning conn sun.net.www.protocol.http.HttpURLConnection:http://jenkins.bixpr-cd.svc:80/tcpSlaveAgentListener/
May 08, 2018 6:57:16 AM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve
INFO: Remoting server accepts the following protocols: [JNLP4-connect, CLI2-connect, JNLP-connect, Ping, CLI-connect, JNLP2-connect]
May 08, 2018 6:57:16 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Agent discovery successful
  Agent address: jenkins-jnlp.bixpr-cd.svc
  Agent port:    50000
  Identity:      e0:9d:d9:c0:02:78:a0:53:40:a5:d5:03:38:68:16:90
May 08, 2018 6:57:16 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Handshaking
May 08, 2018 6:57:16 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to jenkins-jnlp.bixpr-cd.svc:50000
May 08, 2018 6:57:16 AM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver getResolvedHttpProxyAddress
INFO: GGM getResolvedHttpProxyAddress host jenkins-jnlp.bixpr-cd.svc port 50000 proxies java.util.ArrayList$Itr@71265072
May 08, 2018 6:57:16 AM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver getResolvedHttpProxyAddress
INFO: GGM getResolvedHttpProxyAddress host jenkins-jnlp.bixpr-cd.svc port 50000 proxy http://x2inhocproxy:xxxxxxxx@inhproxy.eu.boehringer.com:80
May 08, 2018 6:57:16 AM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver inNoProxyEnvVar
INFO: GGM inNoProxyEnvVar host jenkins-jnlp.bixpr-cd.svc no proxy .bixpr-cd.svc,eu.boehringer.com
May 08, 2018 6:57:16 AM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver inNoProxyEnvVar
INFO: GGM inNoProxyEnvVar host bixpr-cd.svc no proxy bixpr-cd.svc,eu.boehringer.com returning(2) true
May 08, 2018 6:57:16 AM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver getResolvedHttpProxyAddress
INFO: GGM getResolvedHttpProxyAddress host jenkins-jnlp.bixpr-cd.svc port 50000 returning null
May 08, 2018 6:57:16 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Trying protocol: JNLP4-connect
May 08, 2018 6:58:15 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Remote identity confirmed: e0:9d:d9:c0:02:78:a0:53:40:a5:d5:03:38:68:16:90
May 08, 2018 6:58:15 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connected

it's working now it seems ... ;)

so - what have we learnt:
a) proxy settings are not propagated from the master
b) no_proxy needs to be set on the slave pod template / or as an ENV
c) no_proxy does NOT honor a simple domain ending (.svc)
d) no_proxy does NOT honor CIDR
e) the curl of the remoting jar is broken behind a proxy (at the least, the error message when it does not work is misleading)

we cannot start changing global service settings (as we are migrating between two clusters and don't want to change stable cnames in the services). So we need fixes for this whole thing (we also cannot predict the project names that will be used - e.g. bixpr-cd).

Comment 63 Gabe Montero 2018-05-14 20:24:22 UTC
I have some at least "partial" updates:

1) I dropped https://github.com/openshift/jenkins/commit/4d27b1113496a71a3a1420b28b0671f7826bc601 into our current release and upstream pipelines last week on May 9th

This change to our existing maven/nodejs agent/slave images dynamically adds whichever addresses the k8s plugin has seeded the slave pod with for the jenkins master and jnlp endpoints to the "no_proxy" env var.  With that change, slave bringup works with the type of static no_proxy setting Clemens provided, in conjunction with the setting of http_proxy for access to other endpoints.

In other words, the user will *NOT* have to manage the agent/master communication in their no_proxy setting.  The image will do it.

The change has worked for the various consumers in those pipelines I mentioned to date.  Ideally, I'd like to give it a full week of validation/testing in that pipeline, and if all is well, backport to our existing release streams (3.6, 3.7, 3.9) on Wednesday May 16 to go out as needed in errata, etc.

Also note, this new behavior can be turned off by setting a documented env var to any non-empty value on the slave image used (a sketch of the overall defaulting follows below).
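Conceptually, the defaulting the images now perform looks something like this (a sketch only, not the literal script; the opt-out variable name is made up - see the commit above for the real one):

# append the master and jnlp endpoint hosts to no_proxy at slave startup
if [ -z "${DISABLE_NO_PROXY_DEFAULTING}" ]; then   # hypothetical opt-out env var
  master_host=$(echo "${JENKINS_URL}" | sed -e 's|^.*://||' -e 's|[:/].*$||')
  tunnel_host=$(echo "${JENKINS_TUNNEL}" | cut -d: -f1)
  export no_proxy="${no_proxy:+${no_proxy},}${master_host},${tunnel_host}"
fi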

But our thought is this change could satisfy the "meets minimum" requirement for this scenario.

2) Last week I also submitted PR https://github.com/jenkinsci/kubernetes-plugin/pull/321 against the kubernetes plugin so that, if desired, it will propagate the http_proxy and no_proxy env vars set on the jenkins master to each of the jenkins agents/slaves it starts up.

The maintainer of that plugin seems amenable to the change, though it will be off by default.  So if we want our images to do it by default, after we bump to the appropriate version of the k8s plugin, we will have to set the appropriate flag to initiate the propagation in our images.

3) I opened issue https://issues.jenkins-ci.org/browse/JENKINS-51223 and provided PR https://github.com/jenkinsci/remoting/pull/269 to update Jenkins core to better process no_proxy.  We've at least gotten acknowledgement from the upstream Jenkins maintainer, and will work with them to get to an amenable change.  Though we won't be able to consume said change until upstream Jenkins / CloudBees includes it in an LTS release we can consume.

And certainly 1), along with 2), should cover most of what is needed for the immediate scenario here.


I'll report back as various threads make progress.

Comment 65 Gabe Montero 2018-05-16 18:02:14 UTC
PR https://github.com/openshift/jenkins/pull/607 has merged, which brings in the defaulting of no_proxy discussed in https://bugzilla.redhat.com/show_bug.cgi?id=1573648#c63 to the 3.6.z stream

Per process, I am creating clones for the 3.7, 3.9, and 3.10 release streams to initiate testing for the PRs merged in each of those as well.

Will report back when I see the 3.6 image in brew-pulp with these changes.  We can then checkpoint to see if it makes sense to commence internal testing and initiate the errata update on successful tests.

Comment 66 Gabe Montero 2018-05-29 14:55:24 UTC
OK, I'm going to send this to QA for verification against 3.6

Tag v3.6.173.0.122.20180525.154052 for the brew-pulp images has the fix.

So, for example, image brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/jenkins-slave-maven-rhel7:v3.6.173.0.122.20180525.154052

See the testing instructions at https://bugzilla.redhat.com/show_bug.cgi?id=1578993#c1 but simply substitute the 3.6 image for the 3.10 image referenced, and run this against a 3.6 cluster.

This verification ensures that slave-to-master / master-to-slave communication is not adversely affected by the http_proxy/no_proxy configuration.

Also, per other discussion points in this bugzilla, unrelated to what QA will verify:

1) v1.6.2 of the kubernetes plugin has my fix to allow that plugin to be configured such that any http_proxy/no_proxy settings on the master will be propagated to any slaves.  That version of the plugin was pulled into our master branch last week and is undergoing testing now.  When sufficiently satisfied with the results, we will initiate backports to older releases as needed.

2) My PR to fix jenkins core/remoting so that it can tolerate no_proxy values like ".svc" merged last week as well.  I am still waiting on CloudBees to provide a target as to which versions of Jenkins that change will land in.

Comment 68 Wenjing Zheng 2018-06-04 08:13:41 UTC
Change back to ON_QA to double confirm:
@Gabe, I found that the release version below still does not show this issue when following steps like https://bugzilla.redhat.com/show_bug.cgi?id=1578993#c1 - is there anything special about 3.6 when verifying this bug? Other release versions, like 3.7 and 3.9, can reproduce this issue with the same steps.

registry.access.redhat.com/openshift3/jenkins-slave-maven-rhel7                              latest                           0c8695d0aa95        12 days ago         1.019 GB

Comment 69 Gabe Montero 2018-06-04 13:54:01 UTC
You can use the same steps to verify 3.6 as well, @Wenjing Zheng

Comment 70 Wenjing Zheng 2018-06-05 03:21:26 UTC
OK, thanks for the reply! Per comment #67, I will verify this bug now.

Comment 72 errata-xmlrpc 2018-06-28 07:54:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2007