1683163 – consecutive cdi-upload an image results in "500 internal server error"

Bug 1683163 - consecutive cdi-upload an image results in "500 internal server error"

Summary: consecutive cdi-upload an image results in "500 internal server error"

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Container Native Virtualization (CNV)
Classification:	Red Hat
Component:	Storage
Sub Component:
Version:	1.4
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	medium
Target Milestone:	---
Target Release:	2.1.0
Assignee:	Irit Goihman
QA Contact:	Asher Shoshan
Docs Contact:
URL:
Whiteboard:
Depends On:	1707418 1725730
Blocks:
TreeView+	depends on / blocked

Reported:	2019-02-26 11:25 UTC by Asher Shoshan
Modified:	2019-10-31 14:09 UTC (History)
CC List:	10 users (show)
Fixed In Version:	1.9.3
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2019-10-31 14:08:47 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments	(Terms of Use)

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	kubevirt containerized-data-importer pull 830	0	None	closed	Upload to the same target PVC should have a meaningful error	2020-07-30 05:26:36 UTC
Red Hat Product Errata	RHEA-2019:3282	0	None	None	None	2019-10-31 14:09:01 UTC

Description Asher Shoshan 2019-02-26 11:25:24 UTC

Expecting to have a meaningful message describing the upload retry.

Comment 1 Adam Litke 2019-02-26 13:18:33 UTC

Please provide error logs for the uploadserver pod(s) and uploadproxy pod.

Comment 2 Nelly Credi 2019-04-22 06:42:41 UTC

@Asher ping

Comment 3 Asher Shoshan 2019-04-22 14:32:54 UTC

In the second try the user receives 500 :

curl -v --insecure -H "Authorization: Bearer $TOKEN" --data-binary @cirros-0.4.0-x86_64-disk.img https://cnv-executor-ashoshan-master1.example.com:32758/v1alpha1/upload
*   Trying 172.16.0.10...
* TCP_NODELAY set
* Connected to cnv-executor-ashoshan-master1.example.com (172.16.0.10) port 32758 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* ignoring certificate verify locations due to disabled peer verification
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Client hello (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=cdi-uploadproxy.cdi
*  start date: Apr  3 15:56:50 2019 GMT
*  expire date: Apr  2 15:56:50 2020 GMT
*  issuer: CN=proxy.upload.cdi.kubevirt.io
*  SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x557bd428e270)
> POST /v1alpha1/upload HTTP/2
> Host: cnv-executor-ashoshan-master1.example.com:32758
> User-Agent: curl/7.59.0
> Accept: */*
> Authorization: Bearer eyJhbGciOiJQUzUxMiIsImtpZCI6IiJ9.eyJwdmNOYW1lIjoidXBsb2FkLWRhdGF2b2x1bWUiLCJuYW1lc3BhY2UiOiJkZWZhdWx0IiwiY3JlYXRpb25UaW1lc3RhbXAiOiIyMDE5LTA0LTIyVDE0OjIyOjU3LjUyMzQ2NDIzOVoifQ.EJ3KdvD5epggCeyn9MUaJwOzLfEGpyoHv9ACOfaLQcsPKBH3yU0er1FEvUDaK0twfEjhPYn1nbsjcwTbGyVxCH3J8t0_pqMTWxzvhL4iyVc8kOhgnqis8LtYrPGIywVFY_yClSaWLVY4_YbBNehrBqD6iMRsMZcR80q9trwpisTO80uotlqTmxUJFm2GHDw3HK-JAEe6MztUajPchtomYJwUFj_xXVjpMEueIWA937Ru214D0Czwerg_rEHIyFX7zU7cPtpFR-PBqr-5ltPTZIKN4DYLvD4Gn0IPT_bv7KA3W4pr8LQajPV2fbD4bhgs4pSL6NbPqMmJR27VFn1XbQ
> Content-Length: 12716032
> Content-Type: application/x-www-form-urlencoded
> 
* Connection state changed (MAX_CONCURRENT_STREAMS == 250)!
< HTTP/2 500 
< content-length: 0
< date: Mon, 22 Apr 2019 14:24:34 GMT
* HTTP error before end of send, stop sending
< 
* Connection #0 to host cnv-executor-ashoshan-master1.example.com left intact


In the second try (after the first successful one), there is no pod "cdi-upload-xxxxx", where xxxxx is the datavolume name.

Log of cdi-uploadproxy:


oc -n cdi logs cdi-uploadproxy-7b76f58c56-4x4nj
I0404 14:26:05.678554       1 uploadproxy.go:47] Note: increase the -v level in the api deployment for more detailed logging, eg. -v=2 or -v=3
W0404 14:26:05.679237       1 client_config.go:548] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0407 16:48:49.515268       1 uploadproxy.go:149] Received valid token: pvc: pvc-vm6, namespace: default
I0407 16:53:14.788850       1 uploadproxy.go:149] Received valid token: pvc: pvc-win, namespace: default
E0407 16:55:40.542190       1 uploadproxy.go:164] Error proxying Post https://cdi-upload-pvc-win.default.svc/v1alpha1/upload: http: unexpected EOF reading trailer
I0410 15:00:39.926174       1 uploadproxy.go:149] Received valid token: pvc: pvc-vm6, namespace: default
I0422 14:23:20.455059       1 uploadproxy.go:149] Received valid token: pvc: upload-datavolume, namespace: default
I0422 14:24:33.062385       1 uploadproxy.go:149] Received valid token: pvc: upload-datavolume, namespace: default
E0422 14:24:34.364639       1 uploadproxy.go:164] Error proxying Post https://cdi-upload-upload-datavolume.default.svc/v1alpha1/upload: dial tcp: lookup cdi-upload-upload-datavolume.default.svc on 172.16.0.20:53: no such host

Comment 4 Irit Goihman 2019-04-30 09:25:03 UTC

@Asher, what CDI version are you using?
Can you provide more details about the scenario?
- Are you trying again with the same datavolume and token?
- Was the first try successful?

Comment 5 Asher Shoshan 2019-04-30 09:49:33 UTC

Tried with the same token and the same datavolume for the second time.
The first try was successful

Comment 6 Nelly Credi 2019-05-28 12:55:37 UTC

@Irit, can you please if any progress was made?

Comment 7 Irit Goihman 2019-05-28 13:05:59 UTC

I am working on a fix and will publish it soon.

Comment 8 Asher Shoshan 2019-07-08 14:17:17 UTC

Hi,
trying to verify the bug with: OCP 4.0 virt-cdi-operator:v2.0.0-27

This time I'm getting http 503, when trying the 2nd time to upload an image via curl

1st time is successful

virtctl upload working fine for 1st time, and replies with proper message (pvc already cloned..) when doing the same again

Comment 9 Irit Goihman 2019-07-08 14:30:22 UTC

error 503 means service unavailable, this is the change I made.
If you check uploadproxy logs you will see the exact error.
curl is an external tool, it shows the http response status it gets from the server.

Comment 10 Asher Shoshan 2019-07-08 15:38:22 UTC

I expected upload via curl to reply with same msg as virtctl (PVC xxx already successfully imported/cloned/updated) on the 2nd attempt.

Comment 11 Irit Goihman 2019-07-09 07:06:29 UTC

this is not how curl works.
While I have control over error messages in virtctl, curl is a completely different tool and in order to get further info you need to check the logs. I fixed them to show a meaningful error in the 2nd upload.

Comment 12 Adam Litke 2019-07-09 11:27:37 UTC

I agree with Irit that we are handling this error condition correctly according to HTTP the protocol by returning a 503 error code in this scenario.

Comment 13 Asher Shoshan 2019-07-10 08:13:18 UTC

Guys,
From the end-user perspective, it is the same issue .  i.e. http 503 is returned (when before - it was 500).
However when doing this with virtctl, end-user getting the exact and clear response.

please advise on how to proceed here.

Comment 14 Irit Goihman 2019-07-10 11:07:55 UTC

Asher,
It is expected behavior from curl, it shows http server status code and writes an error according to that.
Trying to upload again to the same pvc can't be successful. the uploadserver pod is gone after the successful upload, meaning that the service is no longer available.
status code 503 reflects the desired status.
The user knows to check the logs for further information in case he sees such error when he tries to upload with curl.

Comment 16 Qixuan Wang 2019-09-18 10:52:39 UTC

Tested with hco-bundle-registry-v2.1.0-47, CDI v2.1.0-14, rook-ceph-block, I can't reproduce this bug.

Comment 17 Natalie Gavrielov 2019-09-26 10:17:21 UTC

Ran the following scenario:
1. create a data volume for upload
2. create a token
3. upload an image using curl

Result:

First try - upload succeds (200 OK):

# curl -v --insecure -H "Authorization: Bearer $TOKEN" --data-binary @cirros-0.4.0-x86_64-disk.qcow2  https://cdi-uploadproxy-openshift-cnv.apps.working.oc4/v1alpha1/upload
*   Trying 192.168.126.51...
* TCP_NODELAY set
* Connected to cdi-uploadproxy-openshift-cnv.apps.working.oc4 (192.168.126.51) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server did not agree to a protocol
* Server certificate:
*  subject: CN=*.apps.working.oc4
*  start date: Sep 19 09:40:09 2019 GMT
*  expire date: Sep 18 09:40:10 2021 GMT
*  issuer: CN=ingress-operator@1568886010
*  SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway.
> POST /v1alpha1/upload HTTP/1.1
> Host: cdi-uploadproxy-openshift-cnv.apps.working.oc4
> User-Agent: curl/7.64.0
> Accept: */*
> Authorization: Bearer eyJhbGciOiJQUzI1NiIsImtpZCI6IiJ9.eyJleHAiOjE1Njk0OTI1MDYsImlhdCI6MTU2OTQ5MjIwNiwiaXNzIjoiY2RpLWFwaXNlcnZlciIsIm5hbWUiOiJ1cGxvYWQtZGF0YXZvbHVtZSIsIm5hbWVzcGFjZSI6ImRzdCIsIm5iZiI6MTU2OTQ5MjIwNiwib3BlcnRhdGlvbiI6IlVwbG9hZCIsInJlc291cmNlIjp7Imdyb3VwIjoiIiwicmVzb3VyY2UiOiJwZXJzaXN0ZW50dm9sdW1lY2xhaW1zIiwidmVyc2lvbiI6InYxIn19.lYjbDXq4NRPXwE8xQ7w12lJ3DU0vgDLuwdet4jIFKe7SnAudF8D5143Fn78uD6XsSyqQcLWy71L6ucbOkkMzmSHY2qK5HpoZBSO3X4_lFjSUd3FAW1HvDaxDf6G6b5h543xnTe-AGkxZYGDad_vvwGVBtg_CFkpdmA75MsxFlv0_P-KMjfbqwbxZLMYXe2UXCE_kouOpddg0qLUB5_3q_dNuYFGiqba87L9xM8oo75F259XzRiDOujNqV1OAMKa311tILGDUSWwsrLyCSalh9Uv9yRxE8xNvkKBtqEbFrrMLGw5VNRX4fI0hsn8DWSR_xG1Rwc4PAeoSr_GeKdGCDg
> Content-Length: 12716032
> Content-Type: application/x-www-form-urlencoded
> Expect: 100-continue
>
< HTTP/1.1 100 Continue
* We are completely uploaded and fine
< HTTP/1.1 200 OK
< Date: Thu, 26 Sep 2019 10:05:59 GMT
< Content-Length: 0
< Set-Cookie: 20168e521efcc36db27fbf0d0d57e7b6=aeb599c16161329e675706790810ee2a; path=/; HttpOnly; Secure
<
* Connection #0 to host cdi-uploadproxy-openshift-cnv.apps.working.oc4 left intact


The second try we are getting error (503 Service Unavailable):

# curl -v --insecure -H "Authorization: Bearer $TOKEN" --data-binary @cirros-0.4.0-x86_64-disk.qcow2  https://cdi-uploadproxy-openshift-cnv.apps.working.oc4/v1alpha1/upload
*   Trying 192.168.126.51...
* TCP_NODELAY set
* Connected to cdi-uploadproxy-openshift-cnv.apps.working.oc4 (192.168.126.51) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server did not agree to a protocol
* Server certificate:
*  subject: CN=*.apps.working.oc4
*  start date: Sep 19 09:40:09 2019 GMT
*  expire date: Sep 18 09:40:10 2021 GMT
*  issuer: CN=ingress-operator@1568886010
*  SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway.
> POST /v1alpha1/upload HTTP/1.1
> Host: cdi-uploadproxy-openshift-cnv.apps.working.oc4
> User-Agent: curl/7.64.0
> Accept: */*
> Authorization: Bearer eyJhbGciOiJQUzI1NiIsImtpZCI6IiJ9.eyJleHAiOjE1Njk0OTI1MDYsImlhdCI6MTU2OTQ5MjIwNiwiaXNzIjoiY2RpLWFwaXNlcnZlciIsIm5hbWUiOiJ1cGxvYWQtZGF0YXZvbHVtZSIsIm5hbWVzcGFjZSI6ImRzdCIsIm5iZiI6MTU2OTQ5MjIwNiwib3BlcnRhdGlvbiI6IlVwbG9hZCIsInJlc291cmNlIjp7Imdyb3VwIjoiIiwicmVzb3VyY2UiOiJwZXJzaXN0ZW50dm9sdW1lY2xhaW1zIiwidmVyc2lvbiI6InYxIn19.lYjbDXq4NRPXwE8xQ7w12lJ3DU0vgDLuwdet4jIFKe7SnAudF8D5143Fn78uD6XsSyqQcLWy71L6ucbOkkMzmSHY2qK5HpoZBSO3X4_lFjSUd3FAW1HvDaxDf6G6b5h543xnTe-AGkxZYGDad_vvwGVBtg_CFkpdmA75MsxFlv0_P-KMjfbqwbxZLMYXe2UXCE_kouOpddg0qLUB5_3q_dNuYFGiqba87L9xM8oo75F259XzRiDOujNqV1OAMKa311tILGDUSWwsrLyCSalh9Uv9yRxE8xNvkKBtqEbFrrMLGw5VNRX4fI0hsn8DWSR_xG1Rwc4PAeoSr_GeKdGCDg
> Content-Length: 12716032
> Content-Type: application/x-www-form-urlencoded
> Expect: 100-continue
>
< HTTP/1.1 503 Service Unavailable
< Date: Thu, 26 Sep 2019 10:06:27 GMT
< Content-Length: 0
< Set-Cookie: 20168e521efcc36db27fbf0d0d57e7b6=aeb599c16161329e675706790810ee2a; path=/; HttpOnly; Secure
* HTTP error before end of send, stop sending
<
* Closing connection 0
#

Marking as verified following comment 14

Comment 18 Natalie Gavrielov 2019-09-26 10:17:43 UTC

Note: tested CNV 2.1

Comment 20 errata-xmlrpc 2019-10-31 14:08:47 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:3282

Note You need to log in before you can comment on or make changes to this bug.