
Bug 1635213

Summary: civetweb get shows http status 500
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Jos Collin <jcollin>
Component: RGW
Assignee: Matt Benjamin (redhat) <mbenjamin>
Status: CLOSED NOTABUG
QA Contact: Tejas <tchandra>
Severity: low
Docs Contact:
Priority: unspecified
Version: 2.5
CC: cbodley, ceph-eng-bugs, jcollin, kbader, mbenjamin, mkogan, sweil
Target Milestone: rc
Target Release: 3.*
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-08-23 07:06:17 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments: radosgw-log (no flags)

Description Jos Collin 2018-10-02 11:36:46 UTC
Created attachment 1489404 [details]
radosgw-log

Description of problem:
There is a failed_req field in the ceph radosgw "perf dump" output: "failed_req - Aborted requests". The customer is seeing "errno -125" and "errno -5/500" entries in radosgw.log. The log file is attached to this bug.
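The two errno values in the log map to standard Linux error codes, which can be decoded with a minimal Python sketch (assuming a Linux host, since errno numbering is platform-specific; RGW logs the errno negated):

```python
import errno
import os

# RGW logs the raw negative errno; strip the sign to look it up.
# On Linux: -125 -> ECANCELED, -5 -> EIO.
for code in (-125, -5):
    num = abs(code)
    name = errno.errorcode.get(num, "UNKNOWN")
    print(f"errno {code}: {name} ({os.strerror(num)})")
```

ECANCELED typically points at a request that was canceled mid-flight (e.g. the client dropping the connection), while EIO is a generic I/O error that RGW maps to an HTTP 500.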

Version-Release number of selected component (if applicable):
RHCS 2.5

How reproducible:
Always

Steps to Reproduce:
1.
2.
3.

Actual results:
There is a failed_req field in the ceph radosgw "perf dump" output: "failed_req - Aborted requests". The customer is getting "errno -125" and "errno -5/500" errors.

Expected results:


Additional info:

Additional Info from the customer:

There is a failed_req field in the ceph radosgw "perf dump" output, and the documentation is a bit terse about it:
"failed_req - Aborted requests"

We're constantly seeing these, so it would be useful to know what it actually counts as an "Aborted request".
Does it count 404 (not found) as failed_req, or 500 (internal error), or is it something in the underlying system (network, etc.)?

# ceph daemon /var/run/ceph/ceph-client.radosgw.es-si-s3-z4.asok perf dump|grep failed_req
        "failed_req": 6228224,
-------------------------------------------

# tail -f /var/log/ceph/client.radosgw.es-ka-s3-dhn-14.log> /tmp/lokki&date; ceph daemon client.radosgw.es-ka-s3-dhn-14 perf dump|grep failed_req; sleep 10; date; ceph daemon client.radosgw.es-ka-s3-dhn-14 perf dump|grep failed_req; kill %1
[1] 58910
Mon Sep 10 12:34:00 EEST 2018
        "failed_req": 290254755,
Mon Sep 10 12:34:10 EEST 2018
        "failed_req": 290255005,
(generated log-file attached)
So over this 10-second window we had 290255005 - 290254755 = 250 failed requests.
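The delta arithmetic above can be sketched programmatically. In this hypothetical Python version the two JSON samples are hard-coded from the readings quoted above rather than taken from a live admin socket, and the counter is assumed to live under an "rgw" section of the perf dump, as in RHCS 2.x output:

```python
import json

def failed_req(perf_dump_json: str) -> int:
    """Extract the failed_req counter from `ceph daemon ... perf dump` JSON output.

    Assumes the counter lives under an "rgw" section (RHCS 2.x layout).
    """
    return json.loads(perf_dump_json)["rgw"]["failed_req"]

# Hard-coded samples matching the two readings quoted above; a live check
# would take two `perf dump` snapshots ~10 seconds apart instead.
before = failed_req('{"rgw": {"failed_req": 290254755}}')
after = failed_req('{"rgw": {"failed_req": 290255005}}')

print(after - before)  # -> 250 aborted requests in the sampling window
```

Because failed_req is a monotonically increasing counter, only the delta between two snapshots is meaningful, not the absolute value.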

If we count the entries in the log file, it almost matches the number of internal errors plus "not found" responses:
[root@esclosto014008 ~]# grep http_status= /tmp/lokki |egrep -v '=200|=206'|wc -l
    256 

For the 10-second period we have 3 internal errors (which doesn't feel good for such a short window):
[root@esclosto014008 ~]# grep http_status= /tmp/lokki |grep  =500|wc -l
      3 
Those seem to be these:
2018-09-10 12:34:06.309861 7fc6bffcf700  0 ERROR: flush_read_list(): d->client_c->handle_data() returned -5
2018-09-10 12:34:06.309902 7fc6bffcf700  0 WARNING: set_req_state_err err_no=5 resorting to 500
2018-09-10 12:34:06.309930 7fc6bffcf700  0 ERROR: s->cio->send_content_length() returned err=-5
2018-09-10 12:34:06.309933 7fc6bffcf700  0 ERROR: s->cio->print() returned err=-5
2018-09-10 12:34:06.309935 7fc6bffcf700  0 ERROR: STREAM_IO(s)->print() returned err=-5
2018-09-10 12:34:06.309941 7fc6bffcf700  0 ERROR: STREAM_IO(s)->complete_header() returned err=-5
2018-09-10 12:34:06.309957 7fc6bffcf700  1 ====== req done req=0x7fc6bffc9710 op status=-5 http_status=500 ======

And 253 "not found" responses:
[root@esclosto014008 ~]# grep http_status= /tmp/lokki |grep  =404|wc -l
    253 
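The grep-based counting above can be reproduced with a short Python sketch. The sample lines below are a hypothetical subset in the `req done ... http_status=NNN` format shown in the log excerpts, and the regex assumes that field format:

```python
import re
from collections import Counter

# Hypothetical sample lines in the "req done" format from the RGW log above.
log_lines = [
    "====== req done req=0x7fc6bffc9710 op status=0 http_status=200 ======",
    "====== req done req=0x7fc6bffc9711 op status=-2 http_status=404 ======",
    "====== req done req=0x7fc6bffc9712 op status=-5 http_status=500 ======",
    "====== req done req=0x7fc6bffc9713 op status=-2 http_status=404 ======",
]

status_re = re.compile(r"http_status=(\d+)")
counts = Counter(m.group(1) for line in log_lines
                 if (m := status_re.search(line)))

# Everything outside 200/206 is what the grep pipeline above counted.
failed = sum(n for status, n in counts.items() if status not in ("200", "206"))
print(counts, failed)
```

On the real log file this reproduces the `egrep -v '=200|=206'` pipeline while also breaking the failures down per status code, which makes it easier to see whether 404s or 500s dominate failed_req.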


Interestingly, the same object first works (status=200), then returns status=500, and the next request is 200 again:
...
2018-09-10 13:02:21.848517 7f6c5ffbf700  1 civetweb: 0x7f6f48002e90: 127.0.0.1 - - [10/Sep/2018:13:02:17 +0300] "GET /cd262a96-a981-4027-853d-cb9fe63d915e/workspace.tar.lz4 HTTP/1.1" 200 0 - Wget/1.16 (linux-gnu)
2018-09-10 13:02:21.906691 7f6c677ce700  1 civetweb: 0x7f6d94001540: 127.0.0.1 - - [10/Sep/2018:13:02:21 +0300] "GET /cd262a96-a981-4027-853d-cb9fe63d915e/workspace.tar.lz4 HTTP/1.1" 500 0 - Wget/1.16 (linux-gnu)
2018-09-10 13:02:23.785745 7f6c637c6700  1 civetweb: 0x7f6f94002e90: 127.0.0.1 - - [10/Sep/2018:13:02:22 +0300] "GET /cd262a96-a981-4027-853d-cb9fe63d915e/workspace.tar.lz4 HTTP/1.1" 200 0 - Wget/1.16 (linux-gnu)
...

Comment 21 Giridhar Ramaraju 2019-08-05 13:12:09 UTC
Updating the QA Contact to Hemant. Hemant will be rerouting them to the appropriate QE Associate.

Regards,
Giri

Comment 22 Giridhar Ramaraju 2019-08-05 13:12:57 UTC
Updating the QA Contact to Hemant. Hemant will be rerouting them to the appropriate QE Associate.

Regards,
Giri

Comment 24 Red Hat Bugzilla 2023-09-15 00:12:42 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days