Bug 1652652 - Registering a system fails randomly (409 Conflict)
Summary: Registering a system fails randomly (409 Conflict)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite 6
Classification: Red Hat
Component: Hosts
Version: 6.5.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: 6.6.0
Assignee: Jonathon Turel
QA Contact: Lukáš Hellebrandt
URL:
Whiteboard:
Depends On: 1679696
Blocks:
Reported: 2018-11-22 14:34 UTC by Lukáš Hellebrandt
Modified: 2019-10-22 19:48 UTC

Fixed In Version: tfm-rubygem-katello-3.10.0.32-1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1679696 1728291
Environment:
Last Closed: 2019-10-22 19:48:56 UTC
Target Upstream Version:


Attachments
register-systems.sh (506 bytes, text/plain)
2018-11-22 14:34 UTC, Lukáš Hellebrandt
rhsm.log (1.94 KB, text/plain)
2018-11-22 14:36 UTC, Lukáš Hellebrandt
production.log (17.42 KB, text/plain)
2018-11-22 14:38 UTC, Lukáš Hellebrandt


Links
System ID Priority Status Summary Last Updated
Foreman Issue Tracker 26191 Normal Closed Registering a system fails randomly (409 Conflict) 2020-02-25 03:56:29 UTC

Description Lukáš Hellebrandt 2018-11-22 14:34:31 UTC
Created attachment 1507980
register-systems.sh

Description of problem:
Sometimes, when I register a system to the Satellite, subscription-manager fails without showing any error. The return code is 70, and the new system ID and name are not shown, as they would be after a successful run.

rhsm.log on the client shows RestlibException and response status 409, see attached traceback.

production.log on the Satellite shows "RestClient::Conflict: 409 Conflict" and a traceback, also see attached.

This seems to happen for every second attempt to register in a cycle:

1) # subscription-manager clean
2) change system hostname
3) # subscription-manager register ...

See the attached reproducer script. It basically runs the above 100 times.

However, I don't think the bug is totally deterministic: sometimes the above doesn't happen, and then it starts happening again during the same script run.
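The attached register-systems.sh is not reproduced in this report. As a rough illustration only, the clean / rename / register cycle described above could be sketched as follows. The use of hostnamectl, the host name pattern, and the function names are assumptions; the report only says "change system hostname", and the register arguments are left to the caller because they are elided in the report.

```python
import subprocess


def cycle_commands(new_hostname, register_args):
    """Return the three commands of one reproducer cycle as argv lists."""
    return [
        ["subscription-manager", "clean"],
        # Assumption: the report does not say how the hostname was changed.
        ["hostnamectl", "set-hostname", new_hostname],
        ["subscription-manager", "register", *register_args],
    ]


def run_cycles(count, register_args, runner=subprocess.run, prefix="repro-host"):
    """Run the cycle `count` times and collect failed register steps.

    Returns a list of (iteration, return code) pairs for every iteration
    whose register command exited non-zero.
    """
    failed = []
    for i in range(count):
        # Hypothetical naming scheme, used only to get a new name each time.
        commands = cycle_commands(f"{prefix}-{i}.example.com", register_args)
        for argv in commands[:-1]:
            runner(argv, check=False)
        result = runner(commands[-1], check=False)
        if result.returncode != 0:
            failed.append((i, result.returncode))
    return failed
```

Running this for real needs root on a disposable client. The `runner` parameter exists so the loop can be exercised with a stub instead of real commands.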

Version-Release number of selected component (if applicable):
Reproduced on 6.5 snap 3.

Actual results:
No error message output, s-m return code 70, tracebacks, system not registered.

Expected results:
System registered successfully.

Comment 1 Lukáš Hellebrandt 2018-11-22 14:36:13 UTC
Created attachment 1507981
rhsm.log

Comment 2 Lukáš Hellebrandt 2018-11-22 14:38:06 UTC
Created attachment 1507982
production.log

Comment 4 Lukáš Hellebrandt 2018-12-06 15:02:23 UTC
This causes another issue. When the registered system has some subscription and then the same system is re-registered (s-m clean, rename, s-m reg), the old system in Satellite loses these subscriptions.

Furthermore, when listing subscriptions / available subscriptions / subscription events for the old system, we get 404. We also get 404 in e.g. Subscriptions report.

This means that not only did the old system lose its subscriptions when it should not have, but the database also somehow got into an inconsistent state.

Comment 5 Lukáš Hellebrandt 2018-12-10 12:29:16 UTC
After more testing, I became pretty confident that this is not so random. It happens when re-registering the same system after s-m clean and hostname change, every second time.

Comment 6 Lukáš Hellebrandt 2018-12-11 13:24:52 UTC
I think (sorry for my info being so fuzzy in this BZ) that this also caused the following issues:

1) After thousands of registrations, 's-m register' started to say:

"Candlepin is in Suspend mode, please check /status resource to get more details"


2) Accessing https://<FQDN>/katello/api/v2/organizations/1 resulted in:

"""
{
  "error": {"message":"Katello::Resources::Candlepin::Owner: 404 Not Found  (GET /candlepin/owners/Default_Organization)"}
}
"""


'katello-service restart' didn't help in either case.

Comment 7 Jonathon Turel 2019-02-05 20:59:49 UTC
Hi Lukáš,

I've not had any luck in reproducing this with your script on the latest snap and an up-to-date RHEL8 virtual machine. Would you mind checking for this behavior with the latest 6.5 bits? Also, what OS is running on your client? Maybe that has something to do with it w.r.t the version of subscription-manager there - but I am speculating.

Comment 8 Lukáš Hellebrandt 2019-02-07 12:19:33 UTC
Reproduced with Sat 6.5 snap 14 and client RHEL 7, subscription-manager-1.20.10-1.el7.x86_64.

Comment 9 Jonathon Turel 2019-02-07 19:18:10 UTC
Still no luck and I've registered about a thousand systems with your script. Can you give me access to a server that's exhibiting this problem?

Comment 15 Jonathon Turel 2019-02-19 19:27:46 UTC
Lukas, thank you! I was able to see the problem and I've added some debug logging on the server which pointed me to a possible culprit for the problem. Please make sure that these hosts stick around for the rest of the week while I investigate further.

Comment 16 Jonathon Turel 2019-02-21 18:52:33 UTC
It looks like this bug is related to a possible regression or change in behavior of a particular Candlepin API which is being investigated through the dependent bugs linked up to this BZ. Once there's a new Candlepin build and/or related Satellite change then this bug can move forward.

Comment 17 Jonathon Turel 2019-02-26 04:29:04 UTC
I've learned that this behavior is the result of a change in Candlepin. Here's an explanation of the flow in Satellite:

# first iteration of registration script
--> generate UUID for the system
--> create system in Candlepin with the generated UUID (+ system facts)
---> Candlepin gives us back a system with that UUID
--> taking UUID from Candlepin, create system in Pulp with the generated UUID

# second iteration of registration script
--> generate (new) UUID for the system
--> create system in Candlepin w/ generated UUID (+ system facts)
!! ---> Candlepin gives us back the system profile from the previous registration
!! ---> Using the UUID coming back from Candlepin (which matches first registration) we fail to create the consumer in Pulp as it already exists

The Candlepin change was a deliberate one; the intention was to prevent multiple systems being created in the database and instead return the previous 'matching' system. Candlepin is looking at the 'dmi.system.uuid' fact which does not change across registrations in the case of this script.
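The two iterations above can be mimicked with a toy model. This is only a sketch of the flow as described in this comment, not Satellite, Candlepin, or Pulp code; every class and function name here is hypothetical. It models only the first two iterations, not the "every second attempt" alternation reported earlier.

```python
import uuid


class Conflict(Exception):
    """Stands in for the 409 Conflict seen in production.log."""


class FakeCandlepin:
    """Toy model: returns the earlier consumer when dmi.system.uuid matches."""

    def __init__(self):
        self.consumer_by_dmi = {}

    def create_consumer(self, generated_uuid, facts):
        dmi = facts["dmi.system.uuid"]
        # setdefault keeps the consumer UUID stored by the first registration.
        return self.consumer_by_dmi.setdefault(dmi, generated_uuid)


class FakePulp:
    """Toy model: refuses to create a consumer that already exists."""

    def __init__(self):
        self.consumers = set()

    def create_consumer(self, consumer_uuid):
        if consumer_uuid in self.consumers:
            raise Conflict(f"consumer {consumer_uuid} already exists")
        self.consumers.add(consumer_uuid)


def register(candlepin, pulp, facts):
    """Follow the flow above: generate a UUID, ask Candlepin, then Pulp."""
    generated = str(uuid.uuid4())
    returned = candlepin.create_consumer(generated, facts)
    pulp.create_consumer(returned)  # raises Conflict on the second iteration
    return returned
```

With one `FakeCandlepin` and one `FakePulp`, calling `register` twice with the same `dmi.system.uuid` fact succeeds the first time and raises `Conflict` the second time, because the UUID handed back by the toy Candlepin is the one already known to the toy Pulp.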

Now I think it's fair to ask if this is really a bug. If a user is registering a system multiple times with a script such as yours they would likely want to run 'subscription-manager unregister' in addition to 'subscription-manager clean' to avoid filling their Satellite with orphaned system profiles. Let me know what you think.

Comment 18 Lukáš Hellebrandt 2019-02-26 09:33:07 UTC
I think that when I run s-m CLEAN, I expect the system to be registrable again after it. Even in the opposite case, the error should be handled more gracefully.

Comment 19 Jonathon Turel 2019-02-26 21:48:24 UTC
Lukas,

There is something we can do to ensure the host can still be registered following a 'subscription-manager clean'. The approach will be to reuse the previous system profile and attach to it based on the dmi.system.uuid fact like Candlepin is doing.

The repercussion is that your register-systems.sh would no longer work as-is to generate a number of system profiles. You'll need to generate and override the dmi.system.uuid fact, and then it should work as it does now: generating many profiles, but without the errors. I'm working on those changes right now.
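One way to generate and override the fact is a custom facts file, which subscription-manager reads as JSON from /etc/rhsm/facts/*.facts. The sketch below is an illustration under assumptions: the helper name is hypothetical, the file name uuid.facts is borrowed from the verification steps in comment 49, and a random UUID is assumed to be acceptable as a fact value.

```python
import json
import uuid
from pathlib import Path


def write_uuid_override(facts_dir, filename="uuid.facts"):
    """Write a custom facts file that overrides dmi.system.uuid.

    Returns the path written and the generated UUID string.
    """
    new_uuid = str(uuid.uuid4())
    path = Path(facts_dir) / filename
    # Custom facts files are plain JSON objects mapping fact name to value.
    path.write_text(json.dumps({"dmi.system.uuid": new_uuid}) + "\n")
    return path, new_uuid
```

On a client this would be called as `write_uuid_override("/etc/rhsm/facts")` as root before each `subscription-manager register`, so that every registration presents a different dmi.system.uuid.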

Comment 20 Lukáš Hellebrandt 2019-02-27 11:34:16 UTC
Be careful not to create a security issue by allowing a profile to be re-used.

Comment 21 Jonathon Turel 2019-02-28 14:08:01 UTC
Created redmine issue https://projects.theforeman.org/issues/26191 from this bug

Comment 22 Bryan Kearney 2019-03-07 17:06:45 UTC
Upstream bug assigned to jturel@redhat.com

Comment 23 Bryan Kearney 2019-03-07 17:06:47 UTC
Upstream bug assigned to jturel@redhat.com

Comment 24 Lukáš Hellebrandt 2019-03-08 12:57:17 UTC
I think you set jcallaha as a QA contact by mistake. Re-taking this BZ. Let me know if it was not a mistake.

Comment 25 Bryan Kearney 2019-03-18 20:07:03 UTC
Moving this bug to POST for triage into Satellite 6 since the upstream issue https://projects.theforeman.org/issues/26191 has been resolved.

Comment 27 Lukáš Hellebrandt 2019-03-27 13:14:47 UTC
I see this as a possible security issue, please let me know if you think it's possible - I haven't tested it yet:
1) I have a System1. I register it, use it for a while, do some config changes, package changes etc.
2) I s-m clean System1.
3) Attacker has a System2. They spoof System1's dmi.system.uuid. They register the system.
=> The attacker now can see some of the settings System1 used previously.

Additionally, if reusing the system profile actually proves to be safe and secure, there is the following issue:
1) I have a system.with.hostname.a registered. I s-m clean it.
2) I have a system.with.hostname.b with the same dmi.system.uuid (the same system with changed hostname or a different system). I register it.
=> The s-m's output says: "The registered system name is: system.with.hostname.b" while actually, the system is named "system.with.hostname.a" in the Satellite.

Comment 28 Jonathon Turel 2019-03-27 15:03:21 UTC
I don't think I used the word "reuse" properly.

What *actually* happens is:

- if there is a single matching system found by hostname (old behavior) *or* dmi.system.uuid fact (new behavior) then unregister it (old behavior) and proceed with a new registration (old behavior)

- if there are multiple matching systems through above criteria, raise an error about multiple matching profiles being found & instruct the user for action (new behavior)

So, I don't think reuse is the proper word - there really shouldn't be any data lingering around. Thanks for raising the point.
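The two rules above can be sketched roughly as follows. This is a simplified model, not the Katello implementation: hosts are plain dicts, "unregister" is just removal from a list, and all names are hypothetical.

```python
class MultipleProfilesError(Exception):
    """Raised when more than one existing profile matches (new behavior)."""


def matching_hosts(hosts, hostname, dmi_uuid):
    """Match by hostname (old behavior) or dmi.system.uuid (new behavior)."""
    return [
        h for h in hosts
        if h["name"] == hostname or h["dmi_uuid"] == dmi_uuid
    ]


def register(hosts, hostname, dmi_uuid):
    """Register a host, replacing a single matching profile if there is one."""
    matches = matching_hosts(hosts, hostname, dmi_uuid)
    if len(matches) > 1:
        raise MultipleProfilesError(
            "multiple matching profiles: "
            + ", ".join(h["name"] for h in matches)
        )
    if matches:
        # Single match: unregister it, then proceed with a new registration.
        hosts.remove(matches[0])
    new_host = {"name": hostname, "dmi_uuid": dmi_uuid}
    hosts.append(new_host)
    return new_host
```

In this model nothing from the removed profile is copied into the new one, which is the point made above about no data lingering around.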

Comment 29 Brad Buckingham 2019-04-02 14:17:47 UTC
Lukas,

Does Jonathon's comment address your concern?

Should we place this back ON_QA for verification?  If not, can you clarify the current failure?

Thanks!

Comment 30 Lukáš Hellebrandt 2019-04-02 14:27:52 UTC
From Jonathon's comment, I could believe there is indeed NO security regression.

However, the second part of my comment still applies and is itself a reason not to verify this BZ. Additionally, it shows that at least SOME info from the old system IS USED - which raises security related questions again.

So, I'd suggest fixing the system name (to "system.with.hostname.b" in the example in my previous comment) and making doubly sure no old system's info can be reused.

Comment 31 Jonathon Turel 2019-04-02 14:41:07 UTC
(In reply to Lukáš Hellebrandt from comment #30)
> From Jonathon's comment, I could believe there is indeed NO security
> regression.
> 
> However, second part of my comment still applies and is itself a reason not
> to verify this BZ. Additionally, it shows that at least SOME info from the
> old system IS USED - which raises security related questions again.
> 
> So, I'd suggest to fix the system name (to "system.with.hostname.b" in the
> example in my previous comment) and double make sure no old system's info
> can be reused.

The behavior you're describing around the "original" hostname being used is pre-existing, which is how I extended the functionality to also look at dmi.system.uuid. I think it's also similar to what happens when this is done: subscription-manager register --consumerid=<something>

Because of that, I think the renaming of the system profile is an altogether separate concern.

Comment 32 William Poteat 2019-04-02 15:56:07 UTC
If you submit the same system uuid and owner, you will reuse the same record for the consumer.
If you want to avoid that, unregister [delete] the old consumer before proceeding.

How is it a security risk when you are reusing the same record for the same system with the same credentials?

Comment 33 Lukáš Hellebrandt 2019-04-03 09:24:54 UTC
Jonathon, the old behavior kept the old system hostname which made sense because the hostname was the same. Now, the hostname changes and only dmi.system.uuid is the same so this introduces two issues:
1) old system name used which is IMO wrong itself
2) s-m output showing wrong system name
Therefore I think your fix introduced this issue.

William, if I understand correctly, if a system's info is s-m cleaned, any attacker who knows the same activation key can spoof the system's hostname or dmi.system.uuid, register, and reuse the original system's record. AFAIK, it is common for multiple systems to use the same activation key. If you still don't understand what problem I see here, ask yourself "Does a user who runs 'subscription-manager clean' on their system want any other registerable system to reuse their account?"

Comment 34 Barnaby Court 2019-04-03 13:51:44 UTC
Lukáš, I think there is some misunderstanding of the purpose of s-m clean vs unregister. The purpose of clean is to clear data from the local system so that it can be reconnected to an existing consumer record in Satellite. The purpose of unregister is to remove the local data from the system AND delete the record from satellite.

From the sub-man man page:

---
The clean command removes all of the subscription and identity data from the local system without affecting the system information in the subscription management service. This means that any of the subscriptions applied to the system are not available for other systems to use. The clean command is useful in cases where the local subscription information is corrupted or lost somehow, and the system will be re-registered using the register --consumerid=EXISTING_ID command.
--- 

Using the register command after clean has been done is working as designed to connect to an existing system record based on the information on that system. If the system is connected to a record that is in a different organization than the one associated with the credentials (username/pwd or activation key) specified during registration, then we would absolutely have an issue that would be a high priority to fix. However, as long as things are being connected back to the same organization, things are functioning in the way that we would expect.

Comment 35 Lukáš Hellebrandt 2019-04-03 14:03:10 UTC
Barnaby, as I said, I believe this is not a security *regression*. The attack vector described in comment 33, paragraph 2, is not my reason not to verify this BZ, although I still believe it is a security issue because sharing the same activation key between A and B shouldn't mean that B has access to A's records.

However, I believe comment 33 paragraph 1 is still valid in both points. When a user specifies a system profile name X and gets a confirmation that the system profile name is indeed X, he can quite reasonably expect that the system profile name is X.

Comment 36 Jonathon Turel 2019-04-03 17:32:01 UTC
Lukáš, I do see what you mean about the host name reuse. The behavior was there before (in a certain light), but it's just made apparent now that we might be matching with different hostnames when the uuid is matched.

The question is *should* we update the host's name. I'm not entirely sure what all the consequences are of renaming a system. Could we be inadvertently affecting things that Satellite manages like DHCP, DNS, or something else? I'm not familiar with that side of things.

I know that we enable changing the host's name but to my knowledge it's always been a deliberate choice. Your suggestion means this would be an automatic change. Maybe that wouldn't be desired in some cases? Another option is to raise an error when we match on dmi.system.uuid but have a different host name between the client and server profile.

Ivan, can you advise if we could run into problems when registering a host and updating the server's record of the host with the new hostname? (The name field of the host)

Comment 37 Ivan Necas 2019-04-04 07:20:34 UTC
As you mentioned, updating the hostname in Satellite might have consequences such as changing the record in DNS etc., and I would be a bit cautious about doing so automatically. Failing in this case and letting the user resolve the unification of the names would probably be safer IMO.

Comment 38 Jonathon Turel 2019-04-04 13:15:01 UTC
Thanks for confirming Ivan.

Lukas, what do you think about erroring out in this case?

Comment 39 Lukáš Hellebrandt 2019-04-04 13:20:32 UTC
Sounds reasonable.

Comment 40 Lukáš Hellebrandt 2019-04-04 13:23:56 UTC
I think the security concerns I mentioned are the same as bug 1508957. Fixing this BZ may create another way to exploit that bug, but let's track it in that bug.

Comment 41 Jonathon Turel 2019-04-09 18:20:11 UTC
Lukáš, I have an upstream PR opened addressing your previous concern: https://github.com/Katello/katello/pull/8066

Mind having a look before we get it downstream?

Comment 42 Lukáš Hellebrandt 2019-04-10 11:37:09 UTC
I can confirm the PR fixes the issue described in comment 27, paragraph 2 by showing the error "HTTP error (500 - Internal Server Error): The host centos7.fish.example.com matches this registration. Remove or rename it to centos7-fish2.fish.example.com before registering." when registering a system with a different hostname but the same dmi.system.uuid.

It also renders the script from OP useless but that is expected and correct.

Comment 44 Lukáš Hellebrandt 2019-06-25 11:41:32 UTC
Should this bug be ON_QA? Has it been fixed by fixing bug 1508957?

Comment 45 Jonathon Turel 2019-06-27 14:17:32 UTC
This code is part of 6.5 while #1508957 is part of 6.6 since they're separate problems. It seems this bug was not ever marked VERIFIED for 6.5 and I think it should be per comment#42

Comment 46 Jonathon Turel 2019-06-27 14:18:37 UTC
Brad, was this bug moved to Unspecified milestone because it wasn't ever marked as verified?

Comment 49 Lukáš Hellebrandt 2019-07-17 14:25:01 UTC
Verified with Sat 6.6 snap 7. I think this got fixed the same way as bug 1508957:

1) Registered a host
2) s-m cleaned the host
3) changed the host's hostname
4) Tried to register, failed with: "Please unregister or remove hosts which match this host before registering: <old_hostname>"
5) Changed uuid in /etc/rhsm/facts/uuid.facts
6) Tried to register, succeeded. There are now *two different* hosts registered to the Satellite
7) Changed the uuid
8) Tried to register, failed again

=> When there is a system already registered with the same hostname *or* uuid, the registration fails and the situation needs to be resolved manually by someone who can unregister the old host, that is the Satellite user or the root of the old host (if it hasn't been s-m cleaned before). When registering a system with *both* hostname and uuid *different* to any host already in Satellite, it is registered as a new host. I think this is correct behavior.

This has potential to break some processes of both QE and customers, keep that in mind. As this is a security issue, I think this is allowable. => Verified
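The verified behavior summarized in this comment can be modeled as a small sketch. It is not Satellite code: the registry is a plain dict from host name to uuid, the names are hypothetical, and the error text is the one quoted in step 4.

```python
class RegistrationRefused(Exception):
    """Raised when an already registered host matches by hostname or uuid."""


def register(hosts, hostname, dmi_uuid):
    """Register only if neither the hostname nor the uuid is already known.

    `hosts` maps registered host names to their dmi.system.uuid values.
    """
    clashes = [
        name for name, known_uuid in hosts.items()
        if name == hostname or known_uuid == dmi_uuid
    ]
    if clashes:
        raise RegistrationRefused(
            "Please unregister or remove hosts which match this host "
            "before registering: " + ", ".join(clashes)
        )
    hosts[hostname] = dmi_uuid
```

Replaying steps 1 to 8 against this model gives the same pattern: the renamed host with the old uuid is refused, the renamed host with a new uuid is added as a second host, and a later attempt that again clashes on hostname or uuid is refused until someone removes the matching host.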

Comment 50 Bryan Kearney 2019-10-22 19:48:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:3172

