Bug 1194828
| Summary: | rados_bench tests pass on 7.1 and fail on 6.6 | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Warren <wusui> |
| Component: | RADOS | Assignee: | Samuel Just <sjust> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Warren <wusui> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 1.2.3 | CC: | ceph-eng-bugs, dzafman, icolle, kchai, kdreyer, sgraf, wusui |
| Target Milestone: | rc | ||
| Target Release: | 1.2.3 | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2015-10-05 22:57:01 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
This test doesn't appear to have the ceph install portion. Can you reproduce from a clean machine on 6.6?

The problem that is happening here is explained in 1197287. On 6.6, for some reason the initial iptables looked like:

Chain INPUT (policy ACCEPT)
target     prot opt source               destination
ACCEPT     all  --  anywhere             anywhere            state RELATED,ESTABLISHED
ACCEPT     icmp --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere
ACCEPT     tcp  --  anywhere             anywhere            state NEW tcp dpt:ssh
REJECT     all  --  anywhere             anywhere            reject-with icmp-host-prohibited

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
REJECT     all  --  anywhere             anywhere            reject-with icmp-host-prohibited

This caused a 'sudo ceph osd crush tunables default' command to time out. Running iptables -F fixed this problem.

Can we close this with the doc fixes to 1197287? The iptables -F workaround fixes this issue.

Sorry. The text of John's iptables instructions is not quite what I expected. I will do what he says here, and if it works, then I will verify both this and 1197287.

I can get this to pass by cleaning the iptables. I think for this release we can go with the documented change (which is still iffy).
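
Before flushing the rules, it can help to confirm that the firewall really is what blocks the client from the monitors. The following is only a minimal sketch of such a check, not something taken from this report: it assumes the monitors listen on the default Ceph monitor port 6789, and the function name `can_reach` is hypothetical.

```python
import socket


def can_reach(host, port=6789, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Port 6789 is the default Ceph monitor port; adjust it if the cluster
    is configured differently (assumption, not stated in this report).
    """
    try:
        # create_connection resolves the name and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, name resolution failures,
        # and "no route to host" style rejections from a REJECT rule.
        return False
```

Calling `can_reach("<monitor host>")` from the client node before and after `iptables -F` should return False and then True if the REJECT rule was the cause.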
Description of problem:
The rados bench test fails on 6.6. It appears that 'sudo ceph osd crush tunables default' is timing out.

Version-Release number of selected component (if applicable):
6.6

How reproducible:
100% of the time

Steps to Reproduce:
1. Run teuthology using the following yaml file.
--------------------------------------------------
interactive-on-error: true
roles:
- [mon.a, osd.0, osd.1]
- [mon.b, mon.c, osd.2, osd.3]
- [client.0]
overrides:
  ceph:
    conf:
      global:
        ms inject delay max: 1
        ms inject delay probability: 0.005
        ms inject delay type: osd
        ms inject internal delays: 0.002
        ms inject socket failures: 2500
tasks:
- install.ship_utilities: null
- ceph:
    branch: firefly
    fs: btrfs
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    chance_pgnum_grow: 2
    chance_pgpnum_fix: 1
    timeout: 1200
- radosbench:
    clients:
    - client.0
    time: 1800
--------------------------------------------------

Actual results:
2015-02-20 13:38:48,619.619 INFO:teuthology.orchestra.run.magna106.stderr:2015-02-20 10:38:48.618505 7f0e63ed9700 0 librados: client.admin authentication error (110) Connection timed out
2015-02-20 13:38:48,640.640 INFO:teuthology.orchestra.run.magna106.stderr:Error connecting to cluster: TimedOut
2015-02-20 13:38:48,655.655 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/wusui/teuthology/teuthology/contextutil.py", line 28, in nested
    vars.append(enter())
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/home/wusui/ceph-qa-suite/tasks/ceph.py", line 156, in crush_setup
    args=['sudo', 'ceph', 'osd', 'crush', 'tunables', profile])
  File "/home/wusui/teuthology/teuthology/orchestra/remote.py", line 137, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/wusui/teuthology/teuthology/orchestra/run.py", line 378, in run
    r.wait()
  File "/home/wusui/teuthology/teuthology/orchestra/run.py", line 114, in wait
    label=self.label)
CommandFailedError: Command failed on magna106 with status 1: 'sudo ceph osd crush tunables default'

Expected results:
Teuthology should pass

Additional info:
The teuthology run passes on 7.1
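
The failing step can also be exercised outside teuthology by running the same command with an explicit timeout. The sketch below is an illustration only and is not teuthology's implementation: `run_checked` and `CommandFailed` are hypothetical names, and it targets Python 3 rather than the Python 2.7 shown in the traceback.

```python
import subprocess


class CommandFailed(Exception):
    """Raised when a command exits non-zero or exceeds its timeout."""


def run_checked(args, timeout=60):
    """Run args and return its stdout; raise CommandFailed otherwise."""
    try:
        proc = subprocess.run(
            args,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,  # decode output as text
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # The child is killed by subprocess.run before this is raised.
        raise CommandFailed("timed out after %ss: %r" % (timeout, args))
    if proc.returncode != 0:
        # Include stderr so errors such as "Error connecting to cluster"
        # are visible in the exception message.
        raise CommandFailed(
            "status %d: %r\n%s" % (proc.returncode, args, proc.stderr)
        )
    return proc.stdout
```

For example, `run_checked(['sudo', 'ceph', 'osd', 'crush', 'tunables', 'default'])` on a monitor-facing node would be expected to raise `CommandFailed` while the REJECT rule is in place.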