Bug 1320490

Summary: Failed to scp the key of edge/reencrypt route to F5 server when using hostnetwork scc
Product: OpenShift Container Platform
Reporter: zhaozhanqi <zzhao>
Component: Documentation
Assignee: Vikram Goyal <vigoyal>
Status: CLOSED CURRENTRELEASE
QA Contact: Vikram Goyal <vigoyal>
Severity: medium
Docs Contact: Vikram Goyal <vigoyal>
Priority: medium
Version: 3.2.0
CC: aos-bugs, bbennett, bmeng, eparis, erich, gmarcote, jokerman, mmccomas, ramr, rchopra, xtian
Target Milestone: ---
Keywords: Reopened
Target Release: ---
Hardware: All
OS: All
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Last Closed: 2018-01-24 10:08:07 UTC
Type: Bug

Description zhaozhanqi 2016-03-23 10:48:06 UTC
Description of problem:
With the F5 router running under the hostnetwork SCC, create an edge route. The router logs then print: Error copying certificate openshift_route_default_secured-edge-route-https-cert to F5 BIG-IP.
	Output from scp command: unknown user 1000010000

Version-Release number of selected component (if applicable):
oc v3.2.0.6
kubernetes v1.2.0-36-g4a3f9c5
F5 router image ID: 4f888f02bb09

How reproducible:
always

Steps to Reproduce:
1. Start OpenShift and the F5 server.
2. Create the F5 router with the hostnetwork SCC.
3. Create an edge route with:
   oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/routing/edge/route_edge.json
4. Check the router logs.

Actual results:
[root@ip-10-3-90-123 ~]# oc logs router-1-x3jcb
W0323 04:23:54.988062       1 f5.go:243] Strict certificate verification is *DISABLED*
I0323 04:23:55.478979       1 router.go:161] Router is including routes in all namespaces
E0323 04:24:52.135020       1 f5.go:1535] Error copying certificate openshift_route_default_secured-edge-route-https-cert to F5 BIG-IP.
	Output from scp command: unknown user 1000010000

	Error: exit status 255
E0323 04:24:52.183018       1 f5.go:1528] Error deleting tempfile for certificate openshift_route_default_secured-edge-route-https-cert from F5 BIG-IP.
	Output from ssh command: No user exists for uid 1000010000

	Error: exit status 255
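The two messages in these logs match startup checks in the OpenSSH clients: as far as I know, scp aborts with "unknown user <uid>" and ssh with "No user exists for uid <uid>" when the process UID has no entry in the passwd database, and both exit with status 255. The sketch below reproduces that lookup with Python's `pwd` module. `has_passwd_entry` and `openssh_startup_error` are hypothetical helper names, and the returned strings are modeled on the log lines above rather than copied from OpenSSH's source.

```python
import os
import pwd


def has_passwd_entry(uid):
    """Return True if `uid` resolves to an entry in the passwd database."""
    try:
        pwd.getpwuid(uid)
    except KeyError:
        return False
    return True


def openssh_startup_error(uid):
    """Return the (scp, ssh) messages for a UID with no passwd entry, else None.

    Assumption: this mirrors the getpwuid(getuid()) check the OpenSSH clients
    make at startup; both exit with status 255 when that lookup fails.
    """
    if has_passwd_entry(uid):
        return None
    return (f"unknown user {uid}", f"No user exists for uid {uid}")


if __name__ == "__main__":
    uid = os.getuid()
    errors = openssh_startup_error(uid)
    if errors is None:
        print(f"uid {uid} has a passwd entry; scp and ssh can start")
    else:
        print(f"scp would fail with: {errors[0]}")
        print(f"ssh would fail with: {errors[1]}")
```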


Expected results:

No error should be logged, and the certificate should be copied to the F5 BIG-IP successfully.

Additional info:

If the privileged SCC is used in step 2 instead, the error message is:

E0323 03:56:44.886367       1 f5.go:1535] Error copying certificate openshift_route_default_secured-edge-route-https-cert to F5 BIG-IP.
	Output from scp command: Warning: Permanently added '10.3.88.53' (RSA) to the list of known hosts.
Permission denied (publickey,keyboard-interactive,hostbased).
lost connection

	Error: exit status 1
E0323 03:56:51.341102       1 f5.go:1528] Error deleting tempfile for certificate openshift_route_default_secured-edge-route-https-cert from F5 BIG-IP.
	Output from ssh command: Warning: Permanently added '10.3.88.53' (RSA) to the list of known hosts.
Permission denied (publickey,keyboard-interactive,hostbased).

	Error: exit status 255
E0323 03:56:51.341219       1 controller.go:85] exit status 1

Comment 1 zhaozhanqi 2016-03-23 11:33:29 UTC
Passthrough routes also cannot be synced to the F5 server's policies. Since this blocks all F5 TLS testing, raising the severity to high.

Comment 2 Ram Ranganathan 2016-03-24 05:34:28 UTC
This is caused by the change in how the router user is now added to the hostnetwork SCC: the router user/UID inside the container now has restricted access and capabilities.

@zhaozhanqi, remove the router user from the hostnetwork SCC and add it to the privileged SCC. That should make it work.

# oadm policy remove-scc-from-user hostnetwork -z router
# oadm policy add-scc-to-user privileged -z router

Comment 3 zhaozhanqi 2016-03-24 05:57:02 UTC
@Ram Ranganathan 

I also tried the privileged SCC; see 'Additional info' in the bug description.

Comment 4 Ram Ranganathan 2016-03-24 17:16:04 UTC
@zhaozhanqi, my mistake: it was late and I missed the Additional info section.
In any case, with the privileged SCC scp does proceed (there is no user ID error); the failure there looks like a credentials issue (permission denied: invalid username/password?).

Comment 5 zhaozhanqi 2016-03-25 03:02:58 UTC
@Ram Ranganathan 

Yes, but I can manually scp the file to the F5 server using the ca key (--external-host-private-key=).

Comment 11 Rajat Chopra 2016-04-11 19:48:14 UTC
I checked your environment, the router.pem inside the container is not the same as ~/.ssh/id_rsa on the host. Did you start the router with the correct path to the external host key?
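One way to answer this is to compare the key mounted in the router container with the host's key by digest, computing it once on each side. A minimal sketch, assuming Python is available in both places; `file_sha256`, `same_key`, and the script name `keycheck.py` are hypothetical, and the actual paths (router.pem inside the container, ~/.ssh/id_rsa on the host) depend on how the router was started.

```python
import hashlib
import os
import sys


def file_sha256(path):
    """Return the hex SHA-256 digest of a file's bytes (expands ~ in the path)."""
    digest = hashlib.sha256()
    with open(os.path.expanduser(path), "rb") as handle:
        # Read in chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_key(path_a, path_b):
    """True if both files contain byte-identical content."""
    return file_sha256(path_a) == file_sha256(path_b)


if __name__ == "__main__":
    # One argument: print the digest (run in the container and on the host,
    # then compare the output). Two arguments: compare two local files.
    if len(sys.argv) == 2:
        print(file_sha256(sys.argv[1]))
    elif len(sys.argv) == 3:
        print("match" if same_key(sys.argv[1], sys.argv[2]) else "MISMATCH")
    else:
        sys.exit("usage: keycheck.py FILE [OTHER_FILE]")
```

The comparison is deliberately byte-level, so the same key saved with different line endings or an extra trailing newline reports a mismatch; normalize the files first if that matters.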

Comment 16 zhaozhanqi 2016-04-13 02:31:51 UTC
Hi,

This is still an issue when the router service account uses the hostnetwork SCC, which the router needs by default, as this message shows: "error: router could not be created; service account "router" is not allowed to access the host network on nodes, grant access with oadm policy add-scc-to-user hostnetwork -z router"

The following are the router logs when using the hostnetwork SCC:

[root@ip-10-3-90-123 ~]# oc logs router-1-hg56m
W0412 22:24:49.482868       1 f5.go:243] Strict certificate verification is *DISABLED*
I0412 22:24:50.019568       1 router.go:161] Router is including routes in all namespaces
E0412 22:24:52.814037       1 f5.go:1535] Error copying certificate openshift_route_default_secured-edge-route-https-cert to F5 BIG-IP.
	Output from scp command: unknown user 1000010000

	Error: exit status 255
E0412 22:24:52.852055       1 f5.go:1528] Error deleting tempfile for certificate openshift_route_default_secured-edge-route-https-cert from F5 BIG-IP.
	Output from ssh command: No user exists for uid 1000010000

	Error: exit status 255
E0412 22:24:52.852140       1 controller.go:85] exit status 255

Comment 17 Rajat Chopra 2016-08-09 19:12:14 UTC
This error happens when the secret is stale. See comment #13 (https://bugzilla.redhat.com/show_bug.cgi?id=1320490#c13).

Closing this bug. Re-open if the error is seen even when the keys are correct.

Comment 18 zhaozhanqi 2016-08-16 06:49:39 UTC
Please refer to comment 12. This still does not work with the hostnetwork SCC.

Comment 19 zhaozhanqi 2016-08-16 06:59:01 UTC
(In reply to zhaozhanqi from comment #18)
> Please refer to comment 12. This still does not work with the hostnetwork SCC.

Typo: that should be comment 16.

Comment 20 Ram Ranganathan 2016-08-16 19:03:21 UTC
@zhaozhanqi / @rchopra, the main issue I see here is that you cannot run scp with the generated UID (for example, 1000020000).

By default, that preallocated user ID is what the /usr/bin/openshift-router process runs as inside the container, because the hostnetwork SCC sets runAsUser to MustRunAsRange.
For scp to work, runAsUser would need to be RunAsAny.

Creating an SCC with that setting (and adding the router service account to it) should work. Using the privileged SCC should also work, I think, though that grants a "wee" bit more permission than is needed.
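The UID logic described here can be modeled as a small function. This is a simplified sketch, not OpenShift's admission code, and it rests on assumptions: the project's pre-allocated range comes from the `openshift.io/sa.scc.uid-range` namespace annotation (format `start/size`); the SCC `runAsUser` type that OpenShift names `MustRunAsRange` defaults to the first UID in that range; and `RunAsAny` injects no default, so the container keeps the image's own user. `parse_uid_range`, `effective_uid`, and the `image_uid` argument are hypothetical names used for illustration.

```python
def parse_uid_range(annotation):
    """Parse a uid-range annotation value such as "1000010000/10000"."""
    start, size = (int(part) for part in annotation.split("/"))
    return start, size


def effective_uid(strategy, image_uid, uid_range=None, requested_uid=None):
    """Simplified model of the UID a container ends up running as.

    strategy:      SCC runAsUser type, "MustRunAsRange" or "RunAsAny".
    image_uid:     UID from the image's USER directive (hypothetical input).
    uid_range:     the project's pre-allocated range, e.g. "1000010000/10000".
    requested_uid: runAsUser set explicitly in the pod spec, if any.
    """
    if strategy == "RunAsAny":
        # No default UID is injected: an explicit request wins, else the image's user.
        return image_uid if requested_uid is None else requested_uid
    if strategy == "MustRunAsRange":
        if uid_range is None:
            raise ValueError("MustRunAsRange needs a uid range")
        start, size = parse_uid_range(uid_range)
        if requested_uid is None:
            # Default is the first UID of the range, e.g. 1000010000.
            return start
        if not start <= requested_uid < start + size:
            raise ValueError(
                f"uid {requested_uid} is outside the range {start}/{size}"
            )
        return requested_uid
    raise ValueError(f"unsupported runAsUser strategy: {strategy!r}")


if __name__ == "__main__":
    project_range = "1000010000/10000"  # example value matching the logs above
    print(effective_uid("MustRunAsRange", image_uid=0, uid_range=project_range))
    print(effective_uid("RunAsAny", image_uid=0, uid_range=project_range))
```

Under `MustRunAsRange` the router lands on a UID such as 1000010000, which has no passwd entry in the image, and that is what trips scp. Under `RunAsAny` it keeps the image's user, which presumably does exist in the image's /etc/passwd.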

Since it's late, just updating the docs is the better bet here. You will probably need to work out the right "magic" SCC/oadm policy incantations first!

Comment 21 Rajat Chopra 2016-08-16 22:25:15 UTC
For now, to cover this bug, the documentation changes are proposed in this PR: https://github.com/openshift/openshift-docs/pull/2660

Comment 22 Eric Rich 2016-08-23 15:30:00 UTC
I also believe that https://bugzilla.redhat.com/show_bug.cgi?id=1369513 is needed, as the customer pointed out that this was also an issue.

Comment 23 Vikram Goyal 2018-01-24 10:08:07 UTC
Since comment 21 (https://bugzilla.redhat.com/show_bug.cgi?id=1320490#c21) points out that a fix was applied, I am closing this bug as CURRENTRELEASE.