Bug 2041902

Summary: 504 timeout error when trying to upload an image to NFS store.
Product: Red Hat OpenStack Reporter: Tzach Shefi <tshefi>
Component: openstack-glanceAssignee: Cyril Roelandt <cyril>
Status: CLOSED NOTABUG QA Contact:
Severity: high Docs Contact: Andy Stillman <astillma>
Priority: unspecified    
Version: 16.2 (Train)CC: athomas, eglynn
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-01-27 06:20:20 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Glance api logs none

Description Tzach Shefi 2022-01-18 13:42:00 UTC
Description of problem: On a system where Glance uses NFS, a Netapp share in my case, when I try to upload am image, cirros in this case, I got this response:

HTTP 504 Gateway Time-out: The server didn't respond in time.


Version-Release number of selected component (if applicable):
python3-glance-19.0.5-2.20210528112824.b3de9da.el8ost.1.noarch
python3-glance-store-1.0.2-2.20210528102039.79e043a.el8ost.noarch
openstack-glance-19.0.5-2.20210528112824.b3de9da.el8ost.1.noarch
python3-glanceclient-2.17.1-2.20210528071101.1aba8f2.el8ost.1.noarch


How reproducible:
Both Liron and I hit the same issue on this system. 
I think he mentioned he may have noticed this in the past before. 


Steps to Reproduce:
1. Deploy a system with Glance over NFS
BTW on this system Cinder also uses NFS, cinder create vol worked fine.


    GlanceBackend: 'file'
    GlanceNfsEnabled: true
    GlanceNfsShare: '172.16.X.Y:/vol_glance'
    GlanceNfsOptions: '_netdev,bg,intr,context=system_u:object_r:svirt_sandbox_file_t:s0,nosharecache'



2. Try to upload a small cirros image
(overcloud) [stack@undercloud-0 ~]$ glance  image-create --name tshefi --file cirr.img --disk-format qcow2 --container-format bare 
+------------------+--------------------------------------+
| Property         | Value                                |
+------------------+--------------------------------------+
| checksum         | None                                 |
| container_format | bare                                 |
| created_at       | 2022-01-18T11:20:30Z                 |
| disk_format      | qcow2                                |
| id               | 5cc8e1f8-2b3b-45bd-b4b2-958847ec5b20 |
| min_disk         | 0                                    |
| min_ram          | 0                                    |
| name             | tshefi                               |
| os_hash_algo     | None                                 |
| os_hash_value    | None                                 |
| os_hidden        | False                                |
| owner            | 8cc5e1fbf0cb488bb80ac0e8f2a211ac     |
| protected        | False                                |
| size             | None                                 |
| status           | queued                               |
| tags             | []                                   |
| updated_at       | 2022-01-18T11:20:30Z                 |
| virtual_size     | Not available                        |
| visibility       | shared                               |
+------------------+--------------------------------------+
HTTP 504 Gateway Time-out: The server didn't respond in time.
^ this is the error, I shouldn't get.

3. Recheck image status a few seconds later, image is available
(overcloud) [stack@undercloud-0 ~]$ glance image-show 5cc8e1f8-2b3b-45bd-b4b2-958847ec5b20
+------------------+----------------------------------------------------------------------------------+
| Property         | Value                                                                            |
+------------------+----------------------------------------------------------------------------------+
| checksum         | 1d3062cd89af34e419f7100277f38b2b                                                 |
| container_format | bare                                                                             |
| created_at       | 2022-01-18T11:20:30Z                                                             |
| direct_url       | file:///var/lib/glance/images/5cc8e1f8-2b3b-45bd-b4b2-958847ec5b20               |
| disk_format      | qcow2                                                                            |
| id               | 5cc8e1f8-2b3b-45bd-b4b2-958847ec5b20                                             |
| min_disk         | 0                                                                                |
| min_ram          | 0                                                                                |
| name             | tshefi                                                                           |
| os_hash_algo     | sha512                                                                           |
| os_hash_value    | 553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b9 |
|                  | 1489acf687183adcd689b53b38e3ddd22e627e7f98a09c46                                 |
| os_hidden        | False                                                                            |
| owner            | 8cc5e1fbf0cb488bb80ac0e8f2a211ac                                                 |
| protected        | False                                                                            |
| size             | 16338944                                                                         |
| status           | active                                                                           |
| stores           | default_backend                                                                  |
| tags             | []                                                                               |
| updated_at       | 2022-01-18T11:26:33Z                                                             |
| virtual_size     | Not available                                                                    |
| visibility       | shared                                                                           |
+------------------+----------------------------------------------------------------------------------+

Also confirm file is indeed upload/found on NFS share:
-rw-r-----. 1 42415 42415 16338944 Jan 18 13:26 5cc8e1f8-2b3b-45bd-b4b2-958847ec5b20


4. Liron also attempted to upload a cirros image via python tempest conf 
again hit same 504 error.
| 4ff37631-a516-42d8-8835-f9a0f51323e1 | cirros-0.4.0-x86_64-disk.img |



Actual results:
Image upload reports:
"HTTP 504 Gateway Time-out: The server didn't respond in time." 

I noticed this on the logs:
[root@controller-0 ~]# grep -ir 5cc8e1f8-2b3b-45bd-b4b2-958847ec5b20  
/var/log/containers/glance/api.log:2022-01-18 11:20:30.645 18 DEBUG glance.api.v2.images [req-eed18d90-9135-4ecb-b305-3e66f1bc2595 be681907aee94c4ba19f0ce901304a5f 8cc5e1fbf0cb488bb80ac0e8f2a211ac - default default] The 'locations' list of image 5cc8e1f8-2b3b-45bd-b4b2-958847ec5b20 is empty, not including 'direct_url' in response _format_image /usr/lib/python3.6/site-packages/glance/api/v2/images.py:1339


Expected results:
Image should upload without the 504 error.

Comment 1 Tzach Shefi 2022-01-18 14:02:04 UTC
Created attachment 1851604 [details]
Glance api logs

The component probably should have been set to python-glance-store.
But I'll let you folks decide. 

Attaching the api logs:

grep 5cc8e1f8-2b3b-45bd-b4b2-958847ec5b20 controller-2.api.log 
2022-01-18 11:20:30.657 18 DEBUG glance.api.middleware.version_negotiation [-] Determining version of request: PUT /v2/images/5cc8e1f8-2b3b-45bd-b4b2-958847ec5b20/file Accept: */* process_request /usr/lib/python3.6/site-packages/glance/api/middleware/version_negotiation.py:45
2022-01-18 11:20:30.657 18 DEBUG glance.api.middleware.version_negotiation [-] new path /v2/images/5cc8e1f8-2b3b-45bd-b4b2-958847ec5b20/file process_request /usr/lib/python3.6/site-packages/glance/api/middleware/version_negotiation.py:70
2022-01-18 11:26:33.983 18 DEBUG glance_store._drivers.filesystem [req-34ce50ec-8205-42bf-a4af-6fc7db40222b be681907aee94c4ba19f0ce901304a5f 8cc5e1fbf0cb488bb80ac0e8f2a211ac - default default] Wrote 16338944 bytes to /var/lib/glance/images/5cc8e1f8-2b3b-45bd-b4b2-958847ec5b20 with checksum 1d3062cd89af34e419f7100277f38b2b and multihash 553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46 add /usr/lib/python3.6/site-packages/glance_store/_drivers/filesystem.py:777
2022-01-18 11:26:34.033 18 INFO eventlet.wsgi.server [req-34ce50ec-8205-42bf-a4af-6fc7db40222b be681907aee94c4ba19f0ce901304a5f 8cc5e1fbf0cb488bb80ac0e8f2a211ac - default default] 172.17.1.120 - - [18/Jan/2022 11:26:34] "PUT /v2/images/5cc8e1f8-2b3b-45bd-b4b2-958847ec5b20/file HTTP/1.1" 204 0 363.376649
2022-01-18 11:28:01.801 18 DEBUG glance.api.middleware.version_negotiation [-] Determining version of request: GET /v2/images/5cc8e1f8-2b3b-45bd-b4b2-958847ec5b20 Accept: */* process_request /usr/lib/python3.6/site-packages/glance/api/middleware/version_negotiation.py:45
2022-01-18 11:28:01.802 18 DEBUG glance.api.middleware.version_negotiation [-] new path /v2/images/5cc8e1f8-2b3b-45bd-b4b2-958847ec5b20 process_request /usr/lib/python3.6/site-packages/glance/api/middleware/version_negotiation.py:70
2022-01-18 11:28:02.792 18 INFO eventlet.wsgi.server [req-41c7f358-86e3-4a3e-9ea0-ee95d4832077 be681907aee94c4ba19f0ce901304a5f 8cc5e1fbf0cb488bb80ac0e8f2a211ac - default default] 172.17.1.120 - - [18/Jan/2022 11:28:02] "GET /v2/images/5cc8e1f8-2b3b-45bd-b4b2-958847ec5b20 HTTP/1.1" 200 1067 0.991540
 

And if you look at the log I guess this is the issue:
2022-01-18 11:26:33.983 18 DEBUG glance_store._drivers.filesystem [req-34ce50ec-8205-42bf-a4af-6fc7db40222b be681907aee94c4ba19f0ce901304a5f 8cc5e1fbf0cb488bb80ac0e8f2a211ac - default default] Wrote 16338944 bytes to /var/lib/glance/images/5cc8e1f8-2b3b-45bd-b4b2-958847ec5b20 with checksum 1d3062cd89af34e419f7100277f38b2b and multihash 553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46 add /usr/lib/python3.6/site-packages/glance_store/_drivers/filesystem.py:777
2022-01-18 11:26:34.030 18 DEBUG glance_store.multi_backend [req-34ce50ec-8205-42bf-a4af-6fc7db40222b be681907aee94c4ba19f0ce901304a5f 8cc5e1fbf0cb488bb80ac0e8f2a211ac - default default] Skipping store.set_acls... not implemented set_acls_for_multi_store /usr/lib/python3.6/site-packages/glance_store/multi_backend.py:549
2022-01-18 11:26:34.033 18 INFO eventlet.wsgi.server [req-34ce50ec-8205-42bf-a4af-6fc7db40222b be681907aee94c4ba19f0ce901304a5f 8cc5e1fbf0cb488bb80ac0e8f2a211ac - default default] Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/eventlet/wsgi.py", line 599, in handle_one_response
    write(b'')
  File "/usr/lib/python3.6/site-packages/eventlet/wsgi.py", line 538, in write
    wfile.flush()
  File "/usr/lib64/python3.6/socket.py", line 604, in write
    return self._sock.send(b)
  File "/usr/lib/python3.6/site-packages/eventlet/greenio/base.py", line 397, in send
    return self._send_loop(self.fd.send, data, flags)
  File "/usr/lib/python3.6/site-packages/eventlet/greenio/base.py", line 384, in _send_loop
    return send_method(data, *args)
ConnectionResetError: [Errno 104] Connection reset by peer

2022-01-18 11:26:34.033 18 INFO eventlet.wsgi.server [req-34ce50ec-8205-42bf-a4af-6fc7db40222b be681907aee94c4ba19f0ce901304a5f 8cc5e1fbf0cb488bb80ac0e8f2a211ac - default default] 172.17.1.120 - - [18/Jan/2022 11:26:34] "PUT /v2/images/5cc8e1f8-2b3b-45bd-b4b2-958847ec5b20/file HTTP/1.1" 204 0 363.376649
2022-01-18 11:26:35.241 18 DEBUG eventlet.wsgi.server [-] (18) accepted ('172.17.1.104', 37736) serv




grep 5cc8e1f8-2b3b-45bd-b4b2-958847ec5b20 controller-0.api.log 
2022-01-18 11:20:30.645 18 DEBUG glance.api.v2.images [req-eed18d90-9135-4ecb-b305-3e66f1bc2595 be681907aee94c4ba19f0ce901304a5f 8cc5e1fbf0cb488bb80ac0e8f2a211ac - default default] The 'locations' list of image 5cc8e1f8-2b3b-45bd-b4b2-958847ec5b20 is empty, not including 'direct_url' in response _format_image /usr/lib/python3.6/site-packages/glance/api/v2/images.py:1339


$ grep 5cc8e1f8-2b3b-45bd-b4b2-958847ec5b20 controller-1.api.log 
Didn't report anything at all per this image.

Comment 4 Tzach Shefi 2022-01-26 08:44:51 UTC
We suspect there is an issue with either the netapp or networking,
I'm still looking into this, it may reach a "closed as not a bug" state.

I'll follow up and update this bug within a few days.

Comment 5 Tzach Shefi 2022-01-27 06:20:20 UTC
Closed as not a bug, rot cause turned out to be a netapp issue. 
After rebooting both cluster nodes Glance uploads and CI jobs no seem to be running.