Description of problem: A check in PG.cc relies on checking last_backfill == hobject_t::get_max(). Older versions could produce last_backfill values with the max bit set, but which do not compare equal to get_max(). This can cause recent writes to appear unrecoverable even though they are recoverable.
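To illustrate the failure mode described above, here is a minimal sketch. It uses a hypothetical ToyObjectId type, not Ceph's hobject_t, and the layout of the "legacy" value is an assumption made purely for illustration. It only shows why a strict equality test against get_max() can reject a value whose max bit is set; the actual fix is in the pull requests linked in this bug and may look different.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <tuple>

// Hypothetical stand-in for an object identifier; this is NOT Ceph's hobject_t.
struct ToyObjectId {
    std::string name;
    uint32_t hash = 0;
    bool max = false;  // "max bit": marks the identifier as past-the-end

    // Canonical maximum: max bit set, every other field left at its default.
    static ToyObjectId get_max() {
        ToyObjectId o;
        o.max = true;
        return o;
    }

    // Tolerant check: looks only at the max bit.
    bool is_max() const { return max; }

    // Strict equality: compares every field, not just the max bit.
    bool operator==(const ToyObjectId& rhs) const {
        return std::tie(max, hash, name) ==
               std::tie(rhs.max, rhs.hash, rhs.name);
    }
};

// Assumed shape of a value written by an older version: the max bit is set,
// but another field differs from the canonical get_max() value.
ToyObjectId make_legacy_max() {
    ToyObjectId o;
    o.max = true;
    o.hash = 0xffffffffu;
    return o;
}

// The style of check the bug describes: equality against get_max().
bool backfill_complete_strict(const ToyObjectId& last_backfill) {
    return last_backfill == ToyObjectId::get_max();
}

// A check that depends only on the max bit.
bool backfill_complete_tolerant(const ToyObjectId& last_backfill) {
    return last_backfill.is_max();
}

// Small self-check showing the two checks disagree on the legacy value.
void check_example() {
    const ToyObjectId legacy = make_legacy_max();
    assert(!backfill_complete_strict(legacy));   // looks "not finished"
    assert(backfill_complete_tolerant(legacy));  // max bit says finished
}
```

In this toy model the strict check treats the legacy value as an incomplete backfill, which is the kind of mismatch that could make recoverable writes look unrecoverable.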
This should be considered a blocker for release of 2.0. It's not a blocker for the beta as long as we advise users upgrading a cluster to the beta to *not set sortbitwise*.
I *think* this should be reproducible by:
1) create a 1.3.2 cluster (3 osds, 1 mon should be fine) with a pool with a bunch of pgs (1000) and size=2
2) create a bunch of objects in that pool (rados bench -t 128 -b 1 for a while)
3) upgrade to 2.0 with noout set (we need to avoid backfill during this part)
4) start running rados bench again
5) while rados bench is running, set sortbitwise
This bug also implies that we aren't setting the sortbitwise flag in our upgrade testing. That suggests that it isn't covered in the docs either. We need to address those two things.
Merged into master, pending merge into jewel next. https://github.com/ceph/ceph/pull/9432
Merged into jewel. https://github.com/ceph/ceph/pull/9427
Oops, merged into jewel: https://github.com/ceph/ceph/pull/9674
We will take this change in as part of the rebase to ceph 10.2.2.
Verified in ceph version 10.2.2-38.el7cp (119a68752a5671253f9daae3f894a90313a6b8e4).

With a high number of pgs and many objects, we have to re-enable the osd targets and start them again after the chown of /var/lib/ceph. I am not sure why this is required, but I had to do this on one of the osd nodes to bring the osds back up. The failed case, without re-enabling twice: http://pulpito.ceph.redhat.com/vasu-2016-08-11_22:03:24-upgrades-ds-jewel---basic-magna/
2016-08-11T22:32:30.843 INFO:tasks.ceph_deploy:Initiate auto relabel after reboot
2016-08-11T22:32:30.844 INFO:teuthology.orchestra.run.magna007:Running: 'sudo touch /.autorelabel'
2016-08-11T22:32:30.912 INFO:tasks.ceph_deploy:Rebooting node magna007
2016-08-11T22:32:30.913 INFO:teuthology.orchestra.run.magna007:Running: 'sudo reboot'
2016-08-11T22:33:30.919 DEBUG:teuthology.orchestra.connection:{'username': 'ubuntu', 'hostname': 'magna007.ceph.redhat.com', 'timeout': 60}
2016-08-11T22:33:48.961 DEBUG:teuthology.orchestra.remote:[Errno None] Unable to connect to port 22 on 10.8.128.7
2016-08-11T22:34:18.962 DEBUG:teuthology.orchestra.connection:{'username': 'ubuntu', 'hostname': 'magna007.ceph.redhat.com', 'timeout': 60}
2016-08-11T22:35:18.969 DEBUG:teuthology.orchestra.remote:timed out
2016-08-11T22:35:48.971 DEBUG:teuthology.orchestra.connection:{'username': 'ubuntu', 'hostname': 'magna007.ceph.redhat.com', 'timeout': 60}
2016-08-11T22:36:48.978 DEBUG:teuthology.orchestra.remote:timed out
2016-08-11T22:37:18.981 DEBUG:teuthology.orchestra.connection:{'username': 'ubuntu', 'hostname': 'magna007.ceph.redhat.com', 'timeout': 60}
2016-08-11T22:37:19.318 INFO:teuthology.orchestra.run.magna007:Running: 'true'
2016-08-11T22:38:19.787 DEBUG:teuthology.orchestra.connection:{'username': 'ubuntu', 'hostname': 'magna007.ceph.redhat.com', 'timeout': 60}
2016-08-11T22:38:20.102 INFO:teuthology.orchestra.run.magna007:Running: 'true'
2016-08-11T22:38:20.266 INFO:tasks.ceph_deploy:Enable systemd files
2016-08-11T22:38:20.267 INFO:teuthology.orchestra.run.magna007:Running: 'sudo systemctl stop firewalld'
2016-08-11T22:38:20.472 INFO:teuthology.orchestra.run.magna007:Running: 'sudo systemctl enable ceph-mon.target'
2016-08-11T22:38:20.624 INFO:teuthology.orchestra.run.magna007.stderr:Created symlink from /etc/systemd/system/multi-user.target.wants/ceph-mon.target to /usr/lib/systemd/system/ceph-mon.target.
2016-08-11T22:38:20.625 INFO:teuthology.orchestra.run.magna007.stderr:Created symlink from /etc/systemd/system/ceph.target.wants/ceph-mon.target to /usr/lib/systemd/system/ceph-mon.target.
2016-08-11T22:38:20.625 INFO:teuthology.orchestra.run.magna007:Running: 'sudo systemctl enable ceph-osd.target'
2016-08-11T22:38:20.742 INFO:teuthology.orchestra.run.magna007.stderr:Created symlink from /etc/systemd/system/multi-user.target.wants/ceph-osd.target to /usr/lib/systemd/system/ceph-osd.target.
2016-08-11T22:38:20.743 INFO:teuthology.orchestra.run.magna007.stderr:Created symlink from /etc/systemd/system/ceph.target.wants/ceph-osd.target to /usr/lib/systemd/system/ceph-osd.target.
2016-08-11T22:38:20.744 INFO:teuthology.orchestra.run.magna007:Running: 'sudo systemctl enable ceph-radosgw.target'
2016-08-11T22:38:20.855 INFO:teuthology.orchestra.run.magna007.stderr:Created symlink from /etc/systemd/system/multi-user.target.wants/ceph-radosgw.target to /usr/lib/systemd/system/ceph-radosgw.target.
2016-08-11T22:38:20.855 INFO:teuthology.orchestra.run.magna007.stderr:Created symlink from /etc/systemd/system/ceph.target.wants/ceph-radosgw.target to /usr/lib/systemd/system/ceph-radosgw.target.
2016-08-11T22:38:20.856 INFO:teuthology.orchestra.run.magna007:Running: 'sudo systemctl enable ceph.target'
2016-08-11T22:38:20.959 INFO:teuthology.orchestra.run.magna007:Running: 'sudo chown -R ceph:ceph /var/lib/ceph'
2016-08-11T22:51:12.792 INFO:teuthology.orchestra.run.magna007:Running: 'sudo chown -R ceph:ceph /var/log/ceph'
2016-08-11T22:51:17.865 INFO:teuthology.orchestra.run.magna007:Running: 'sudo systemctl stop ceph.target'
2016-08-11T22:51:22.927 INFO:teuthology.orchestra.run.magna007:Running: 'sudo systemctl start ceph.target'
2016-08-11T22:51:27.984 INFO:teuthology.orchestra.run.magna007:Running: 'sudo chown -R ceph:ceph /var/lib/ceph'
2016-08-11T22:51:28.822 INFO:teuthology.orchestra.run.magna007:Running: 'sudo chown -R ceph:ceph /var/log/ceph'
2016-08-11T22:51:33.877 INFO:teuthology.orchestra.run.magna007:Running: 'sudo systemctl stop ceph.target'
2016-08-11T22:51:38.936 INFO:teuthology.orchestra.run.magna007:Running: 'sudo systemctl start ceph.target'
2016-08-11T22:51:43.997 INFO:teuthology.orchestra.run.magna007:Running: 'sudo chown -R ceph:ceph /var/lib/ceph'
2016-08-11T22:51:44.816 INFO:teuthology.orchestra.run.magna007:Running: 'sudo chown -R ceph:ceph /var/log/ceph'
2016-08-11T22:51:49.871 INFO:teuthology.orchestra.run.magna007:Running: 'sudo systemctl stop ceph.target'
2016-08-11T22:51:54.930 INFO:teuthology.orchestra.run.magna007:Running: 'sudo systemctl start ceph.target'
2016-08-11T22:51:59.988 INFO:teuthology.orchestra.run.magna007:Running: 'sudo systemctl start ceph-mon@`hostname`.service'
2016-08-11T22:52:05.066 INFO:teuthology.orchestra.run.magna007:Running: 'sudo systemctl status ceph-mon@`hostname`.service'

I don't immediately see code that does this in the upstream ceph-qa-suite. I think this is a bug in the way you are doing the upgrade. You need to stop the daemon *before* doing the chown.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-1755.html