Bug 1635213
| Summary: | civetweb get shows http status 500 | ||||||
|---|---|---|---|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Jos Collin <jcollin> | ||||
| Component: | RGW | Assignee: | Matt Benjamin (redhat) <mbenjamin> | ||||
| Status: | CLOSED NOTABUG | QA Contact: | Tejas <tchandra> | ||||
| Severity: | low | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 2.5 | CC: | cbodley, ceph-eng-bugs, jcollin, kbader, mbenjamin, mkogan, sweil | ||||
| Target Milestone: | rc | ||||||
| Target Release: | 3.* | ||||||
| Hardware: | x86_64 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2019-08-23 07:06:17 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: |
|
||||||
Updating the QA Contact to a Hemant. Hemant will be rerouting them to the appropriate QE Associate. Regards, Giri Updating the QA Contact to a Hemant. Hemant will be rerouting them to the appropriate QE Associate. Regards, Giri The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days |
Created attachment 1489404 [details] radosgw-log Description of problem: There is a failed_req field on the ceph radosgw "perf dump": "failed_req - Aborted requests". The customer is getting "errno -125" and "errno -5/500" in the radosgw.log. The log file is attached with this bug. Version-Release number of selected component (if applicable): RHCS 2.5 How reproducible: Always Steps to Reproduce: 1. 2. 3. Actual results: There is a failed_req field on the ceph radosgw "perf dump": "failed_req - Aborted requests". The customer is getting "errno -125" and "errno -5/500". Expected results: Additional info: Additional Info from the customer: There is a failed_req field on the ceph radosgw "perf dump" and documentation is bit terse about it: "failed_req - Aborted requests" We're constantly seeing these so it could be useful to know what it actually is counting as "Aborted request". Is it counting 404 (not found) as failed_req or 500 (internal errors) or is it something in underlying system (network, etc?) # ceph daemon /var/run/ceph/ceph-client.radosgw.es-si-s3-z4.asok perf dump|grep failed_req "failed_req": 6228224, ------------------------------------------- # tail -f /var/log/ceph/client.radosgw.es-ka-s3-dhn-14.log> /tmp/lokki&date; ceph daemon client.radosgw.es-ka-s3-dhn-14 perf dump|grep failed_req; sleep 10; date; ceph daemon client.radosgw.es-ka-s3-dhn-14 perf dump|grep failed_req; kill %1 [1] 58910 Mon Sep 10 12:34:00 EEST 2018 "failed_req": 290254755, Mon Sep 10 12:34:10 EEST 2018 "failed_req": 290255005, (generated log-file attached) So for this 10sec time we had 290255005-290254755 = 250 failed_requests. If we count the numbers from the logfile it almost matches number internal errors + "not found": [root@esclosto014008 ~]# grep http_status= /tmp/lokki |egrep -v '=200|=206'|wc -l 256 For the 10sec period we have 3 internal errors (which feels not so good for just 10 sec period): [root@esclosto014008 ~]# grep http_status= /tmp/lokki |grep =500|wc -l 3 Those seem to be these: 2018-09-10 12:34:06.309861 7fc6bffcf700 0 ERROR: flush_read_list(): d->client_c->handle_data() returned -5 2018-09-10 12:34:06.309902 7fc6bffcf700 0 WARNING: set_req_state_err err_no=5 resorting to 500 2018-09-10 12:34:06.309930 7fc6bffcf700 0 ERROR: s->cio->send_content_length() returned err=-5 2018-09-10 12:34:06.309933 7fc6bffcf700 0 ERROR: s->cio->print() returned err=-5 2018-09-10 12:34:06.309935 7fc6bffcf700 0 ERROR: STREAM_IO(s)->print() returned err=-5 2018-09-10 12:34:06.309941 7fc6bffcf700 0 ERROR: STREAM_IO(s)->complete_header() returned err=-5 2018-09-10 12:34:06.309957 7fc6bffcf700 1 ====== req done req=0x7fc6bffc9710 op status=-5 http_status=500 ====== And 253 times "not found": [root@esclosto014008 ~]# grep http_status= /tmp/lokki |grep =404|wc -l 253 Interestingly we have a same object that is first working (status=200) and then it goes to status=500 and next query is 200 again... :) ... 2018-09-10 13:02:21.848517 7f6c5ffbf700 1 civetweb: 0x7f6f48002e90: 127.0.0.1 - - [10/Sep/2018:13:02:17 +0300] "GET /cd262a96-a981-4027-853d-cb9fe63d915e/workspace.tar.lz4 HTTP/1.1" 200 0 - Wget/1.16 (linux-gnu) 2018-09-10 13:02:21.906691 7f6c677ce700 1 civetweb: 0x7f6d94001540: 127.0.0.1 - - [10/Sep/2018:13:02:21 +0300] "GET /cd262a96-a981-4027-853d-cb9fe63d915e/workspace.tar.lz4 HTTP/1.1" 500 0 - Wget/1.16 (linux-gnu) 2018-09-10 13:02:23.785745 7f6c637c6700 1 civetweb: 0x7f6f94002e90: 127.0.0.1 - - [10/Sep/2018:13:02:22 +0300] "GET /cd262a96-a981-4027-853d-cb9fe63d915e/workspace.tar.lz4 HTTP/1.1" 200 0 - Wget/1.16 (linux-gnu) ...