Bug 1652652
Summary: | Registering a system fails randomly (409 Conflict)
---|---|---|---
Product: | Red Hat Satellite | Reporter: | Lukáš Hellebrandt <lhellebr>
Component: | Hosts | Assignee: | Jonathon Turel <jturel>
Status: | CLOSED ERRATA | QA Contact: | Lukáš Hellebrandt <lhellebr>
Severity: | medium | Docs Contact: |
Priority: | unspecified
Version: | 6.5.0 | CC: | bbuckingham, bcourt, daviddavis, egolov, inecas, jturel, lhellebr, mhulan, mmccune, wpoteat
Target Milestone: | 6.6.0 | Keywords: | Triaged
Target Release: | Unused
Hardware: | Unspecified
OS: | Unspecified
Whiteboard: |
Fixed In Version: | tfm-rubygem-katello-3.10.0.32-1 | Doc Type: | If docs needed, set a value
Doc Text: | Story Points: | ---
Clone Of: |
: | 1679696 1728291 (view as bug list) | Environment: |
Last Closed: | 2019-10-22 19:48:56 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: |
Bug Depends On: | 1679696
Bug Blocks: |
Attachments: |
|
Created attachment 1507981 [details]
rhsm.log
Created attachment 1507982 [details]
production.log
This causes another issue. When the registered system has some subscription and the same system is then re-registered (s-m clean, rename, s-m reg), the old system in Satellite loses these subscriptions. Furthermore, when listing subscriptions / available subscriptions / subscription events for the old system, we get 404. We also get 404 in e.g. the Subscriptions report. This means that not only did the old system lose its subscription when it should not have, but the database also somehow got into an inconsistent state.

After more testing, I became pretty confident that this is not so random. It happens every second time when re-registering the same system after s-m clean and a hostname change.

I think (sorry for my info being so fuzzy in this BZ) that this also caused the following issues:

1) After thousands of registrations, 's-m register' started to say: "Candlepin is in Suspend mode, please check /status resource to get more details"

2) Accessing https://<FQDN>/katello/api/v2/organizations/1 resulted in:
"""
{ "error": {"message":"Katello::Resources::Candlepin::Owner: 404 Not Found (GET /candlepin/owners/Default_Organization)"} }
"""

'katello-service restart' didn't help in either case.

Hi Lukáš, I've not had any luck in reproducing this with your script on the latest snap and an up-to-date RHEL8 virtual machine. Would you mind checking for this behavior with the latest 6.5 bits? Also, what OS is running on your client? Maybe that has something to do with it w.r.t. the version of subscription-manager there, but I am speculating.

Reproduced with Sat 6.5 snap 14 and client RHEL 7, subscription-manager-1.20.10-1.el7.x86_64.

Still no luck and I've registered about a thousand systems with your script. Can you give me access to a server that's exhibiting this problem?

Lukas, thank you! I was able to see the problem and I've added some debug logging on the server which pointed me to a possible culprit for the problem.
Please make sure that these hosts stick around for the rest of the week while I investigate further.

It looks like this bug is related to a possible regression or change in behavior of a particular Candlepin API which is being investigated through the dependent bugs linked up to this BZ. Once there's a new Candlepin build and/or related Satellite change then this bug can move forward.

I've learned that this behavior is the result of a change in Candlepin. Here's an explanation of the flow in Satellite:

# first iteration of registration script
--> generate UUID for the system
--> create system in Candlepin with the generated UUID (+ system facts)
---> Candlepin gives us back a system with that UUID
--> taking UUID from Candlepin, create system in Pulp with the generated UUID

# second iteration of registration script
--> generate (new) UUID for the system
--> create system in Candlepin w/ generated UUID (+ system facts)
!! ---> Candlepin gives us back the system profile from the previous registration
!! ---> Using the UUID coming back from Candlepin (which matches first registration) we fail to create the consumer in Pulp as it already exists

The Candlepin change was a deliberate one; the intention was to prevent multiple systems being created in the database and instead return the previous 'matching' system. Candlepin is looking at the 'dmi.system.uuid' fact which does not change across registrations in the case of this script.

Now I think it's fair to ask if this is really a bug. If a user is registering a system multiple times with a script such as yours they would likely want to run 'subscription-manager unregister' in addition to 'subscription-manager clean' to avoid filling their Satellite with orphaned system profiles. Let me know what you think.

I think that when I run s-m CLEAN, I expect the system to be registrable again after it. Even in the opposite case, the error should be handled more gracefully.
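
The two-iteration flow described above can be sketched as a small simulation. This is only an illustrative toy model, not Satellite, Candlepin, or Pulp code: the classes `FakeCandlepin` and `FakePulp`, the function `register`, and `ConflictError` are all hypothetical names, and the model assumes Candlepin matches purely on the `dmi.system.uuid` fact. It does not try to reproduce the "every second time" pattern reported earlier.

```python
import uuid


class ConflictError(Exception):
    """Stands in for the HTTP 409 Conflict seen in production.log."""


class FakeCandlepin:
    """Toy model: returns an existing consumer when dmi.system.uuid matches."""

    def __init__(self):
        self.consumers_by_dmi = {}

    def create_consumer(self, requested_uuid, facts):
        dmi = facts["dmi.system.uuid"]
        # Assumed behavior: if a profile with this dmi.system.uuid already
        # exists, hand back its UUID instead of storing the requested one.
        return self.consumers_by_dmi.setdefault(dmi, requested_uuid)


class FakePulp:
    """Toy model: refuses to create a consumer that already exists."""

    def __init__(self):
        self.consumers = set()

    def create_consumer(self, consumer_uuid):
        if consumer_uuid in self.consumers:
            raise ConflictError(f"409 Conflict: {consumer_uuid} already exists")
        self.consumers.add(consumer_uuid)


def register(candlepin, pulp, facts):
    """One iteration of the flow: generate UUID, Candlepin, then Pulp."""
    generated = str(uuid.uuid4())
    returned = candlepin.create_consumer(generated, facts)
    pulp.create_consumer(returned)  # fails when Candlepin returned an old UUID
    return returned


if __name__ == "__main__":
    candlepin, pulp = FakeCandlepin(), FakePulp()
    facts = {"dmi.system.uuid": "unchanged-across-registrations"}
    register(candlepin, pulp, facts)
    try:
        register(candlepin, pulp, facts)
    except ConflictError as err:
        print("second registration failed:", err)
```

In this model the second call fails because the UUID coming back from Candlepin belongs to the first registration, which is the conflict the comment above describes.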
Lukas, there is something we can do to ensure the host can still be registered following a 'subscription-manager clean'. The approach will be to reuse the previous system profile and attach to it based on the dmi.system.uuid fact, like Candlepin is doing. The repercussion of this is that your register_systems.sh would no longer work as-is to generate a number of system profiles. You'll need to generate and override the dmi.system.uuid fact and it should work as it does now: generating many profiles but without the errors. I'm working on those changes right now.

Be careful not to create a security issue by allowing a profile to be re-used.

Created redmine issue https://projects.theforeman.org/issues/26191 from this bug

Upstream bug assigned to jturel

I think you set jcallaha as a QA contact by mistake. Re-taking this BZ. Let me know if it was not a mistake.

Moving this bug to POST for triage into Satellite 6 since the upstream issue https://projects.theforeman.org/issues/26191 has been resolved.

I see this as a possible security issue, please let me know if you think it's possible - I haven't tested it yet:
1) I have a System1. I register it, use it for a while, do some config changes, package changes etc.
2) I s-m clean System1.
3) Attacker has a System2. They spoof System1's dmi.system.uuid. They register the system.
=> The attacker can now see some of the settings System1 used previously.

Additionally, if reusing the system profile actually proves to be safe and secure, there is the following issue:
1) I have a system.with.hostname.a registered. I s-m clean it.
2) I have a system.with.hostname.b with the same dmi.system.uuid (the same system with changed hostname or a different system). I register it.
=> The s-m's output says: "The registered system name is: system.with.hostname.b" while actually, the system is named "system.with.hostname.a" in the Satellite.

I don't think I used the word "reuse" properly.
What *actually* happens is:
- if there is a single matching system found by hostname (old behavior) *or* dmi.system.uuid fact (new behavior) then unregister it (old behavior) and proceed with a new registration (old behavior)
- if there are multiple matching systems through above criteria, raise an error about multiple matching profiles being found & instruct the user for action (new behavior)

So, I don't think reuse is the proper word - there really shouldn't be any data lingering around. Thanks for raising the point.

Lukas, Does Jonathon's comment address your concern? Should we place this back ON_QA for verification? If not, can you clarify the current failure? Thanks!

From Jonathon's comment, I could believe there is indeed NO security regression.

However, second part of my comment still applies and is itself a reason not to verify this BZ. Additionally, it shows that at least SOME info from the old system IS USED - which raises security related questions again.

So, I'd suggest to fix the system name (to "system.with.hostname.b" in the example in my previous comment) and double make sure no old system's info can be reused.

(In reply to Lukáš Hellebrandt from comment #30)
> From Jonathon's comment, I could believe there is indeed NO security
> regression.
>
> However, second part of my comment still applies and is itself a reason not
> to verify this BZ. Additionally, it shows that at least SOME info from the
> old system IS USED - which raises security related questions again.
>
> So, I'd suggest to fix the system name (to "system.with.hostname.b" in the
> example in my previous comment) and double make sure no old system's info
> can be reused.
The behavior you're describing around the "original" hostname being used is pre-existing, which is how I extended the functionality to also look at dmi.system.uuid. I think it's also similar to what happens when this is done:

subscription-manager register --consumerid=<something>

Because of that, I think the renaming of the system profile is an altogether separate concern.

If you submit the same system uuid and owner, you will reuse the same record for the consumer. If you want to avoid that, unregister [delete] the old consumer before proceeding. How is it a security risk when you are reusing the same record for the same system with the same credentials?

Jonathon, the old behavior kept the old system hostname, which made sense because the hostname was the same. Now, the hostname changes and only dmi.system.uuid is the same, so this introduces two issues:
1) old system name used, which is IMO wrong itself
2) s-m output showing wrong system name
Therefore I think your fix introduced this issue.

William, if I understand correctly, if a system's info is s-m cleaned, any attacker who knows the same activation key can spoof the system's hostname or dmi.system.uuid, register and reuse the original system's record. AFAIK, it is common for multiple systems to use the same activation key. If you still don't understand what problem I see here, ask yourself "Does a user who runs 'subscription-manager clean' on their system want any other registerable system to reuse their account?"

Lukáš, I think there is some misunderstanding of the purpose of s-m clean vs unregister. The purpose of clean is to clear data from the local system so that it can be reconnected to an existing consumer record in Satellite. The purpose of unregister is to remove the local data from the system AND delete the record from Satellite.
From the sub-man man page:
---
The clean command removes all of the subscription and identity data from the local system without affecting the system information in the subscription management service. This means that any of the subscriptions applied to the system are not available for other systems to use. The clean command is useful in cases where the local subscription information is corrupted or lost somehow, and the system will be re-registered using the register --consumerid=EXISTING_ID command.
---

Using the register command after clean has been done is working as designed to connect to an existing system record based on the information on that system. If the system is connected to a record that is in a different organization than the one associated with the credentials (username/pwd or activation key) specified during registration, then we would absolutely have an issue that would be a high priority to fix. However, as long as things are being connected back to the same organization, things are functioning in the way that we would expect.

Barnaby, as I said, I believe this is not a security *regression*. The attack vector described in comment 33, paragraph 2, is not my reason not to verify this BZ, although I still believe it is a security issue because sharing the same activation key between A and B shouldn't mean that B has access to A's records.

However, I believe comment 33 paragraph 1 is still valid in both points. When a user specifies a system profile name X and gets a confirmation that the system profile name is indeed X, he can quite reasonably expect that the system profile name is X.

Lukáš, I do see what you mean about the host name reuse. The behavior was there before (in a certain light), but it's just made apparent now that we might be matching with different hostnames when the uuid is matched. The question is *should* we update the host's name. I'm not entirely sure what all the consequences are of renaming a system.
Could we be inadvertently affecting things that Satellite manages like DHCP, DNS, or something else? I'm not familiar with that side of things. I know that we enable changing the host's name but to my knowledge it's always been a deliberate choice. Your suggestion means this would be an automatic change. Maybe that wouldn't be desired in some cases? Another option is to raise an error when we match on dmi.system.uuid but have a different host name between the client and server profile.

Ivan, can you advise if we could run into problems when registering a host and updating the server's record of the host with the new hostname? (The name field of the host)

As you mentioned, updating the hostname in Satellite might have consequences such as changing the record in DNS etc., and I would be a bit cautious about doing so automatically. Failing in this case and letting the user resolve the unification of the names would probably be safer IMO.

Thanks for confirming Ivan. Lukas, what do you think about erroring out in this case?

Sounds reasonable.

I think the security concerns I mentioned are the same as bug 1508957. Fixing this BZ may create another way to exploit that bug but let's track it in that bug.

Lukáš, I have an upstream PR opened addressing your previous concern: https://github.com/Katello/katello/pull/8066 Mind having a look before we get it downstream?

I can confirm the PR fixes the issue described in comment 27, paragraph 2 by showing the error "HTTP error (500 - Internal Server Error): The host centos7.fish.example.com matches this registration. Remove or rename it to centos7-fish2.fish.example.com before registering." when registering a system with a different hostname but the same dmi.system.uuid. It also renders the script from OP useless but that is expected and correct.

Should this bug be ON_QA? Has it been fixed by fixing bug 1508957?

This code is part of 6.5 while #1508957 is part of 6.6 since they're separate problems.
It seems this bug was never marked VERIFIED for 6.5 and I think it should be per comment#42

Brad, was this bug moved to Unspecified milestone because it wasn't ever marked as verified?

Verified with Sat 6.6 snap 7. I think this got fixed the same way as bug 1508957:
1) Registered a host
2) s-m cleaned the host
3) changed the host's hostname
4) Tried to register, failed with: "Please unregister or remove hosts which match this host before registering: <old_hostname>"
5) Changed uuid in /etc/rhsm/facts/uuid.facts
6) Tried to register, succeeded. There are now *two different* hosts registered to the Satellite
7) Changed the uuid
8) Tried to register, failed again

=> When there is a system already registered with the same hostname *or* uuid, the registration fails and the situation needs to be resolved manually by someone who can unregister the old host, that is the Satellite user or the root of the old host (if it hasn't been s-m cleaned before). When registering a system with *both* hostname and uuid *different* to any host already in Satellite, it is registered as a new host. I think this is correct behavior. This has potential to break some processes of both QE and customers, keep that in mind. As this is a security issue, I think this is allowable.

=> Verified

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:3172 |
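
The rule stated in the verification comment above (registration fails when an existing host matches on hostname *or* uuid, and succeeds only when both differ) can be written down as a short sketch. This is not Katello code: `check_registration`, `RegistrationError`, and `uuid_facts_override` are hypothetical names, hosts are modeled as plain `(hostname, dmi_uuid)` tuples, and the facts helper assumes the override file in /etc/rhsm/facts/ is a JSON object keyed by fact name.

```python
import json
import uuid


class RegistrationError(Exception):
    """Raised when an already registered host matches the new registration."""


def check_registration(existing_hosts, hostname, dmi_uuid):
    """Reject the registration if any existing host shares hostname or uuid.

    existing_hosts is an iterable of (hostname, dmi_uuid) tuples.
    """
    matches = sorted(
        {name for name, host_uuid in existing_hosts
         if name == hostname or host_uuid == dmi_uuid}
    )
    if matches:
        raise RegistrationError(
            "Please unregister or remove hosts which match this host "
            "before registering: " + ", ".join(matches)
        )


def uuid_facts_override(new_uuid=None):
    """Return JSON text for a custom facts file overriding dmi.system.uuid."""
    return json.dumps({"dmi.system.uuid": new_uuid or str(uuid.uuid4())})


if __name__ == "__main__":
    hosts = [("old.example.com", "uuid-1")]
    # Different hostname and different uuid: accepted as a new host.
    check_registration(hosts, "new.example.com", "uuid-2")
    # Different hostname but same uuid: rejected, as in step 4 above.
    try:
        check_registration(hosts, "new.example.com", "uuid-1")
    except RegistrationError as err:
        print(err)
    # Content one might place in /etc/rhsm/facts/uuid.facts (step 5 above).
    print(uuid_facts_override())
```

Writing the facts file itself is left out on purpose, since it requires root on the client.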
Created attachment 1507980 [details]
register-systems.sh

Description of problem:
Sometimes, when I register a system to the Satellite, subscription-manager fails, showing no error. The return code is 70 and the new system ID and name are not shown, as opposed to a successful run. rhsm.log on the client shows RestlibException and response status 409, see attached traceback. production.log on the Satellite shows "RestClient::Conflict: 409 Conflict" and a traceback, also see attached.

This seems to happen for every second attempt to register in a cycle:
1) # subscription-manager clean
2) change system hostname
3) # subscription-manager register ...

See the attached reproducer script. It basically runs the above 100 times. However, I don't think the bug is totally deterministic - at times, the above doesn't happen and then starts happening again during the same script run.

Version-Release number of selected component (if applicable):
Reproduced on 6.5 snap 3.

Actual results:
No error message output, s-m return code 70, tracebacks, system not registered.

Expected results:
System registered successfully.
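
The clean / rename / register cycle from the description can be sketched as follows. The attached register-systems.sh is not reproduced in this report, so this is only a guess at its shape: the function names, the `hostN.example.com` naming scheme, and the use of `--org` with `--activationkey` for registration are assumptions, as is renaming via `hostnamectl set-hostname`. By default the sketch only prints the commands rather than running them.

```python
import subprocess


def cycle_commands(index, org, activation_key, domain="example.com"):
    """Build the three commands for one clean/rename/register iteration."""
    hostname = f"host{index}.{domain}"  # hypothetical naming scheme
    return [
        ["subscription-manager", "clean"],
        ["hostnamectl", "set-hostname", hostname],
        [
            "subscription-manager", "register",
            f"--org={org}", f"--activationkey={activation_key}",
        ],
    ]


def run_cycles(count, org, activation_key, dry_run=True):
    """Run (or just print) the cycle `count` times and collect failures."""
    failures = []
    for i in range(count):
        for cmd in cycle_commands(i, org, activation_key):
            if dry_run:
                print(" ".join(cmd))
                continue
            result = subprocess.run(cmd)
            if result.returncode != 0:
                # The report above mentions return code 70 on failure.
                failures.append((i, cmd, result.returncode))
    return failures


if __name__ == "__main__":
    run_cycles(2, "Default_Organization", "my-activation-key")
```

Running with `dry_run=False` needs root on a client that can reach the Satellite; the collected failures then show which iterations returned a non-zero code.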