Bug 1748766
| Field | Value |
|---|---|
| Summary | number range depletion when multiple clones created from same master |
| Product | Red Hat Enterprise Linux 7 |
| Component | pki-core |
| Version | 7.7 |
| Status | CLOSED ERRATA |
| Severity | unspecified |
| Priority | high |
| Type | Bug |
| Reporter | Fraser Tweedale <ftweedal> |
| Assignee | Fraser Tweedale <ftweedal> |
| QA Contact | PKI QE <bugzilla-pkiqe> |
| CC | aakkiang, ddas, gkapoor, mharmsen, msauton |
| Target Milestone | rc |
| Target Release | --- |
| Keywords | ZStream |
| Hardware | Unspecified |
| OS | Unspecified |
| Fixed In Version | pki-core-10.5.17-3.el7 |
| Last Closed | 2020-03-31 19:54:16 UTC |
| Bug Blocks | 1754845 |
Moving to ASSIGNED. Already fixed in master, 10.6 and 10.7 branches, but backport to 10.5 is needed. See PR https://github.com/dogtagpki/pki/pull/40.
Pull request: https://github.com/dogtagpki/pki/pull/246
DOGTAG_10_5_BRANCH backport merged upstream. Moving to POST.
commit 851a0bdd79c12c627a04cfc376338c1727cd50d9 (origin/DOGTAG_10_5_BRANCH)
Author: Fraser Tweedale <ftweedal>
Date: Wed Aug 29 22:22:10 2018 +1000
Add missing synchronisation for range management
Several methods in Repository (and CertificateRepository) need
synchronisation on the intrinsic lock. Make these methods
synchronised.
Also take the lock in UpdateNumberRange so that no serial numbers
can be handed out in other threads between peekNextSerialNumber()
and set(Next)?MaxSerial(). Without this synchronisation, it is
possible that the master instance will use some of the serial
numbers it transfers to the clone.
Fixes: https://pagure.io/dogtagpki/issue/3055
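The locking described in this commit can be illustrated with a minimal sketch. Everything below is a simplified assumption and not the actual Dogtag code: the class names RangeRepositorySketch and UpdateNumberRangeSketch are hypothetical, serial numbers are modelled as long for brevity, and the "transfer the upper half of the remaining range" policy is chosen only for illustration. The point it shows is that the peek and the update of the maximum happen under the same intrinsic lock that guards serial number issuance.

```java
/**
 * Hypothetical, simplified stand-in for a serial number repository.
 * Serial numbers are modelled as long for brevity.
 */
class RangeRepositorySketch {
    private long next; // next serial number to hand out
    private long max;  // last serial number of the current range (inclusive)

    RangeRepositorySketch(long next, long max) {
        this.next = next;
        this.max = max;
    }

    /** Consume and return the next serial number, or -1 if the range is depleted. */
    synchronized long getNextSerialNumber() {
        return next > max ? -1 : next++;
    }

    /** Look at the next serial number without consuming it. */
    synchronized long peekNextSerialNumber() {
        return next;
    }

    synchronized long getMaxSerial() {
        return max;
    }

    synchronized void setMaxSerial(long newMax) {
        max = newMax;
    }
}

/** Hypothetical stand-in for the code path that delegates numbers to a clone. */
class UpdateNumberRangeSketch {
    /**
     * Transfer the upper half of the remaining range to a clone.
     * Returns {begin, end} of the transferred range (both inclusive).
     */
    static long[] transfer(RangeRepositorySketch repo) {
        // Hold the repository's intrinsic lock across peek and set, so that no
        // other thread can consume a serial number in between. Java monitors
        // are reentrant, so calling the synchronized methods here is safe.
        synchronized (repo) {
            long peeked = repo.peekNextSerialNumber();
            long end = repo.getMaxSerial();
            long remaining = end - peeked + 1;
            if (remaining <= 0) {
                throw new IllegalStateException("current range is depleted");
            }
            long begin = end - (remaining + 1) / 2 + 1;
            repo.setMaxSerial(begin - 1); // master keeps only [peeked, begin - 1]
            return new long[] { begin, end };
        }
    }
}
```

Without the outer synchronized block, another thread could call getNextSerialNumber() after the peek but before setMaxSerial(), and the master could then issue a number that also falls inside the range handed to the clone.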
commit 5a606e83719272fb488047b28a9ca7d5ce2ea30b
Author: Fraser Tweedale <ftweedal>
Date: Wed Aug 29 21:42:40 2018 +1000
checkRange: small refactor and add commentary
Add some commentary about the behaviour and proper usage of
Repository.checkRange(). Also perform a small refactor, avoiding
a redundant stringify and parse.
Part of: https://pagure.io/dogtagpki/issue/3055
commit 85e356580f64f87c0b01736b71dc3d385db0bcba
Author: Fraser Tweedale <ftweedal>
Date: Wed Aug 29 17:31:34 2018 +1000
rename method getTheSerialNumber -> peekNextSerialNumber
Rename Repository.getTheSerialNumber -> peekNextSerialNumber to more
accurately reflect what it does: peek at the next serial number
without actually consuming it.
Part of: https://pagure.io/dogtagpki/issue/3055
commit 2fb3611db5145dbdd5e7e14daaad1470691494f0
Author: Fraser Tweedale <ftweedal>
Date: Mon Sep 3 15:55:35 2018 +1000
Repository: handle depleted range in initCache()
Repository.initCache() does not handle the case where the current
range has been fully depleted, but the switch to the next range has
not occurred yet. This situation arises when the range has been
fully depleted by servicing UpdateNumberRange requests for clones.
Detect this situation and handle it by switching to the next range
(when available).
Part of: https://pagure.io/dogtagpki/issue/3055
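The depleted-range check described in this commit might look roughly like the following sketch. It is an assumption-laden illustration, not the real Repository.initCache(): the class DepletedRangeSketch and its fields are hypothetical, and serial numbers are modelled as long with a nullable Long marking "no next range assigned".

```java
/** Hypothetical model of a repository's current and next number ranges. */
class DepletedRangeSketch {
    long next;         // next serial number to hand out
    long max;          // last serial number of the current range (inclusive)
    Long nextRangeMin; // first number of the next range, or null if none assigned
    Long nextRangeMax; // last number of the next range, or null if none assigned

    DepletedRangeSketch(long next, long max, Long nextRangeMin, Long nextRangeMax) {
        this.next = next;
        this.max = max;
        this.nextRangeMin = nextRangeMin;
        this.nextRangeMax = nextRangeMax;
    }

    /**
     * On initialisation, detect a fully depleted current range and switch to
     * the next range when one is available. Returns true if a switch happened.
     */
    synchronized boolean initCache() {
        if (next <= max) {
            return false; // current range still has numbers left
        }
        if (nextRangeMin == null || nextRangeMax == null) {
            return false; // depleted, but there is nothing to switch to
        }
        next = nextRangeMin;
        max = nextRangeMax;
        nextRangeMin = null; // the next range has now become the current one
        nextRangeMax = null;
        return true;
    }
}
```

A state such as next = 51, max = 50 models a range whose remaining numbers were all delegated to clones; with a next range of 1001 to 2000 assigned, initCache() moves the repository onto that range.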
commit f1615df509053a8f474b82ea6a2fa0883ab06d09
Author: Fraser Tweedale <ftweedal>
Date: Wed Aug 29 16:55:31 2018 +1000
getTheSerialNumber: only return null if next range not available
When cloning, if the master's current number range has been depleted
due to a previous UpdateNumberRange request,
Repository.getTheSerialNumber() returns null because the next serial
number is out of the current range, but the next range has not been
activated yet. NullPointerException ensues.
Update getTheSerialNumber() to return the next serial number even
when it exceeds the current number range, as long as there is a next
range. If there is no next range, return null (as before). It is
assumed that the next range is non-empty.
Also do a couple of drive-by method extractions to improve
readability.
Part of: https://pagure.io/dogtagpki/issue/3055
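The changed return behaviour can be sketched as below. This is a hypothetical illustration and not the actual Repository code: the class PeekSketch and its parameters are invented, serial numbers are modelled as long, and the commit message does not spell out which value is returned once the current range is exceeded, so the sketch assumes the first number of the next range.

```java
/** Hypothetical illustration of "return null only if there is no next range". */
class PeekSketch {
    /**
     * @param next         next serial number in the current range
     * @param max          last serial number of the current range (inclusive)
     * @param nextRangeMin first number of the next range, or null if none assigned
     * @return the serial number that would be issued next, or null if the
     *         current range is depleted and no next range is available
     */
    static Long getTheSerialNumber(long next, long max, Long nextRangeMin) {
        if (next <= max) {
            return next; // still inside the current range
        }
        // Current range is depleted. Assumption: fall through to the start of
        // the next range; this yields null only when no next range exists.
        return nextRangeMin;
    }
}
```

Callers still need to handle the null case, since a depleted current range with no next range assigned remains possible.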
Test Env:
=========
# rpm -qa pki-*
pki-base-java-10.5.16-6.el7_7.noarch
pki-kra-10.5.16-6.el7_7.noarch
pki-base-10.5.16-6.el7_7.noarch
pki-ca-10.5.16-6.el7_7.noarch
pki-tps-10.5.16-6.el7pki.x86_64
pki-console-10.5.16-1.el7pki.noarch
pki-tks-10.5.16-6.el7pki.noarch
pki-tools-10.5.16-6.el7_7.x86_64
pki-server-10.5.16-6.el7_7.noarch
pki-ocsp-10.5.16-6.el7pki.noarch
pki-symkey-10.5.16-6.el7_7.x86_64
Test Cases:
===========
Scenario 1:
-----------
Master --> clone
Scenario 2:
-----------
Master --> clone 1
--> clone2
Scenario 3:
----------
Master CA --> Clone1
--> Clone 2
--> Clone 3
Master KRA --> Clone 1
--> Clone 2
Tried the above test scenarios but was unable to reproduce the reported issue. Do we need to test any other test cases?
I am checking with ipa team as well if they see any failures when they run ipa-kra-install.
ipa-run for ipa-kra-install test : http://ci-vm-10-0-148-68.hosted.upshift.rdu2.redhat.com/ipa-nightly-tier1/RHEL7.7/28/ipa-replica-promotion/pytest-run.log
Geetika, the scenario we need to test is:
Master --> Clone 1
Clone 1 --> Clone 2
Clone 1 --> Clone 3
That is, Clone 2 and Clone 3 have to be created from Clone 1, not the original Master. It suffices to test this for CA subsystem. All the same machinery is used for KRA so IMO there is no need to test it specifically. (Of course, it does not hurt!)
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:1078
When multiple clones are created from a single master (which is itself a clone), the master's number range(s) can be depleted, causing cloning to fail. There are two specific issues:
1. If the master's current range has been depleted due to a previous UpdateNumberRange request, Repository.getTheSerialNumber() returns null because the next serial number is out of the current range, but the next range has not been activated yet. A NullPointerException occurs.
2. Similar to (1), but it is possible (though unlikely) that a next range has not even been assigned to the master.
Long term, a better solution would be for the master to create a full range assignment for the new clone, instead of delegating part of its own range. This will require changes to the UpdateNumberRange protocol. For now, we should at least address (1) to ensure that a range delegation that depletes a clone's current range does not cause a subsequent clone creation to fail.
How reproducible: always