Bug 2094752
Summary: | rhcd service fails to start when configured with stage environment | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 8 | Reporter: | Jameer Pathan <jpathan> |
Component: | rhc | Assignee: | Link Dupont <link> |
Status: | CLOSED NOTABUG | QA Contact: | jaudet |
Severity: | high | Docs Contact: | |
Priority: | urgent | ||
Version: | 8.6 | CC: | alcohan, anferrei, cmarinea, ecerquei, ehelms, fjansen, gchamoul, jaudet, jhollowa, jholloway, link, mabezerr, nmoumoul, omaciel, pakotvan, perobins, rantunes, sshtein, tcarlin |
Target Milestone: | rc | Keywords: | Triaged |
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2022-06-17 16:55:20 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 2095598, 2095599 |
Description
Jameer Pathan
2022-06-08 09:26:09 UTC
Can you set the 'log-level' to "debug", start rhcd.service again, and attach the output of `journalctl --unit=rhcd.service`?

Jun 08 05:23:28 dhcp-3-227.example.com systemd[1]: Started Red Hat connector daemon.
Jun 08 05:23:29 dhcp-3-227.example.com rhcd[3147]: cannot connect to broker: network Error : websocket: close 1006 (abnormal closure): unexpected EOF
Jun 08 05:23:29 dhcp-3-227.example.com systemd[1]: rhcd.service: Main process exited, code=exited, status=1/FAILURE
Jun 08 05:23:29 dhcp-3-227.example.com systemd[1]: rhcd.service: Failed with result 'exit-code'.
Jun 08 07:31:39 dhcp-3-227.example.com systemd[1]: Started Red Hat connector daemon.
Jun 08 07:31:39 dhcp-3-227.example.com rhcd[3309]: [rhcd] 2022/06/08 07:31:39 /builddir/build/BUILD/rhc/yggdrasil-0.2.1/cmd/yggd/main.go:160: starting rhcd version 0.2.1
Jun 08 07:31:39 dhcp-3-227.example.com rhcd[3309]: [rhcd] 2022/06/08 07:31:39 /builddir/build/BUILD/rhc/yggdrasil-0.2.1/cmd/yggd/main.go:209: listening on socket: @yggd-dispatcher-iEEDtI
Jun 08 07:31:41 dhcp-3-227.example.com rhcd[3309]: cannot connect to broker: network Error : websocket: close 1006 (abnormal closure): unexpected EOF
Jun 08 07:31:41 dhcp-3-227.example.com systemd[1]: rhcd.service: Main process exited, code=exited, status=1/FAILURE
Jun 08 07:31:41 dhcp-3-227.example.com systemd[1]: rhcd.service: Failed with result 'exit-code'.

I provisioned a RHEL 8.6 VM and executed the following:

subscription-manager register --username insights-qa --serverurl https://subscription.rhsm.stage.redhat.com:443
subscription-manager attach --pool=8a82d2b77fc27a39018029d31d115815
vi /etc/rhc/config.toml # set broker and data-host
rhc connect # exit 1

The `rhc connect` command failed, as originally reported. I then provisioned a second RHEL 8.6 VM and executed the following:

subscription-manager register --username insights-qa
vi /etc/rhc/config.toml # set broker and data-host
rhc connect # exit 0

The `rhc connect` command succeeded.

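For anyone triaging similar reports, the failure signature in the journal excerpt above can be picked out mechanically. The following is only a rough sketch: the helper name `broker_failures` and the regular expression are illustrative assumptions based on the log lines quoted in this bug, not something rhc ships.

```python
import re

# Pattern modelled on the rhcd failure lines quoted in this bug (an assumption,
# not a documented log format).
BROKER_FAILURE = re.compile(r"rhcd\[(\d+)\]: cannot connect to broker: (.+)$")


def broker_failures(journal_text):
    """Return (pid, reason) pairs for each broker connection failure line."""
    failures = []
    for line in journal_text.splitlines():
        match = BROKER_FAILURE.search(line)
        if match:
            failures.append((int(match.group(1)), match.group(2).strip()))
    return failures


# Three lines copied from the journalctl output in this report.
sample = """\
Jun 08 07:31:39 dhcp-3-227.example.com systemd[1]: Started Red Hat connector daemon.
Jun 08 07:31:41 dhcp-3-227.example.com rhcd[3309]: cannot connect to broker: network Error : websocket: close 1006 (abnormal closure): unexpected EOF
Jun 08 07:31:41 dhcp-3-227.example.com systemd[1]: rhcd.service: Main process exited, code=exited, status=1/FAILURE
"""

for pid, reason in broker_failures(sample):
    print(pid, reason)
```

Feeding it the output of `journalctl --unit=rhcd.service` should list one entry per failed start, which makes it easier to compare hosts registered against stage and production RHSM.
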
Perhaps this indicates there is some difficulty with the certificate provided by stage RHSM?

It seems that the certificate generated when you subscribe the host to STAGE RHSM has changed. Now the subject has not only the CN, but also has the O = org_id.

New cert (openssl x509 -in /etc/pki/consumer/cert.pem -text --noout):

Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number: 1626572696888250470 (0x1692bf3ff75f9066)
        Signature Algorithm: sha256WithRSAEncryption
        Issuer: C = US, ST = North Carolina, O = "Red Hat, Inc.", OU = Red Hat Network, CN = Red Hat Candlepin Authority, emailAddress = ca-support
        Validity
            Not Before: Jun 8 10:34:34 2022 GMT
            Not After : Jun 8 11:34:34 2023 GMT
        Subject: O = 11789772, CN = 6011c284-905d-47f2-a450-2a0bafd71e3c

Old cert (openssl x509 -in /root/old-cert.pem -text --noout):

Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number: 9215938700743403232 (0x7fe5976b9195e6e0)
        Signature Algorithm: sha256WithRSAEncryption
        Issuer: C = US, ST = North Carolina, O = "Red Hat, Inc.", OU = Red Hat Network, CN = Red Hat Candlepin Authority, emailAddress = ca-support
        Validity
            Not Before: Jul 2 13:06:48 2021 GMT
            Not After : Jul 2 14:06:48 2022 GMT
        Subject: CN = 0272bb50-57e8-4476-981a-ac25fb969abd

When I try to start rhcd using the new cert I got:

[root@iqe-vm-rhc-e2e-sbbylrgbnp ~]# rhcd --broker wss://connect.cloud.stage.redhat.com:443 --cert-file /etc/pki/consumer/cert.pem --key-file /etc/pki/consumer/key.pem --log-level debug
[rhcd] 2022/06/08 10:00:25 /builddir/build/BUILD/rhc/yggdrasil-0.2.1/cmd/yggd/main.go:160: starting rhcd version 0.2.1
[rhcd] 2022/06/08 10:00:25 /builddir/build/BUILD/rhc/yggdrasil-0.2.1/cmd/yggd/main.go:209: listening on socket: @yggd-dispatcher-oHfBIE
cannot connect to broker: network Error : websocket: close 1006 (abnormal closure): unexpected EOF

Using the old cert I don't have any errors:

[root@iqe-vm-rhc-e2e-sbbylrgbnp ~]# rhcd --broker wss://connect.cloud.stage.redhat.com:443 --cert-file /root/old-cert.pem --key-file /root/old-key.pem --log-level debug
[rhcd] 2022/06/08 10:00:52 /builddir/build/BUILD/rhc/yggdrasil-0.2.1/cmd/yggd/main.go:160: starting rhcd version 0.2.1
[rhcd] 2022/06/08 10:00:52 /builddir/build/BUILD/rhc/yggdrasil-0.2.1/cmd/yggd/main.go:209: listening on socket: @yggd-dispatcher-VOtOqf
[rhcd] 2022/06/08 10:00:53 /builddir/build/BUILD/rhc/yggdrasil-0.2.1/cmd/yggd/main.go:337: starting worker: rhc-package-manager-worker
[rhcd] 2022/06/08 10:00:53 /builddir/build/BUILD/rhc/yggdrasil-0.2.1/cmd/yggd/main.go:337: starting worker: rhc-worker-playbook.worker
[rhcd] 2022/06/08 10:00:53 /builddir/build/BUILD/rhc/yggdrasil-0.2.1/cmd/yggd/main.go:357: cannot start watching '/etc/rhc/tags.toml': lstat /etc/rhc/tags.toml: no such file or directory
[rhcd] 2022/06/08 10:00:53 /builddir/build/BUILD/rhc/yggdrasil-0.2.1/cmd/yggd/exec.go:54: started process: 18303
[rhcd] 2022/06/08 10:00:53 /builddir/build/BUILD/rhc/yggdrasil-0.2.1/cmd/yggd/exec.go:92: watching process: 18303
[rhcd] 2022/06/08 10:00:53 /builddir/build/BUILD/rhc/yggdrasil-0.2.1/cmd/yggd/exec.go:54: started process: 18307
[rhcd] 2022/06/08 10:00:53 /builddir/build/BUILD/rhc/yggdrasil-0.2.1/cmd/yggd/exec.go:92: watching process: 18307
[rhcd] 2022/06/08 10:00:53 /builddir/build/BUILD/rhc/yggdrasil-0.2.1/cmd/yggd/grpc.go:69: worker registered: {pid:18303 handler:package-manager addr:@ygg-package-manager-RmNFAr features:map[] detachedContent:false}
[rhcd] 2022/06/08 10:00:53 /builddir/build/BUILD/rhc/yggdrasil-0.2.1/cmd/yggd/mqtt.go:131: published message 1d6d806e-30b4-4028-a7a1-f16c73ab2e20 to topic redhat/insights/0272bb50-57e8-4476-981a-ac25fb969abd/control/out

I'm able to reproduce this as well. Will start looking into a root cause.

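To make the subject difference concrete, here is a small sketch that splits the two `Subject:` strings quoted above into fields and derives the control topic from the CN. The topic layout is inferred from the `published message ... to topic` log line in this comment; the helper names `parse_subject` and `control_topic` are hypothetical, and the comma split is deliberately naive (it would mis-handle quoted values such as the issuer's O = "Red Hat, Inc.").

```python
def parse_subject(subject):
    """Split a simple openssl-style subject ('O = 1, CN = abc') into a dict.

    Naive on purpose: does not handle quoted values that contain commas.
    """
    fields = {}
    for part in subject.split(","):
        key, _, value = part.partition("=")
        fields[key.strip()] = value.strip()
    return fields


def control_topic(subject, direction="out"):
    """Build the control topic seen in the rhcd log from the subject's CN."""
    cn = parse_subject(subject)["CN"]
    return f"redhat/insights/{cn}/control/{direction}"


# Subject strings copied from the two certificates in this comment.
old_subject = "CN = 0272bb50-57e8-4476-981a-ac25fb969abd"
new_subject = "O = 11789772, CN = 6011c284-905d-47f2-a450-2a0bafd71e3c"

print(parse_subject(new_subject))
print(control_topic(old_subject))
print(control_topic(new_subject))
```

Both subjects yield a usable CN, so the extra O field alone would not be expected to break a client that only reads the CommonName.
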
Same error when I connect directly to the broker using `mqttcli/sub`:

[root@rhel-8-dev ~]# sub -broker wss://connect.cloud.stage.redhat.com:443 -cert-file /etc/pki/consumer/cert.pem -key-file /etc/pki/consumer/key.pem -topic redhat/insights/8924d620-3d96-48fd-b0ee-d730a92fc07f/control/in
2022/06/08 11:41:45 connect failed: network Error : websocket: close 1006 (abnormal closure): unexpected EOF

Parsing the certificate subject isn't the problem here. I stepped through the parseCertCN function[1] and it returns the CommonName from the subject correctly. And because it happens to mqttcli[2] as well (which doesn't parse the contents of the certificate at all), I strongly doubt this is a client-side issue. I wonder if there is some problem with TLS handshaking with the broker.

1: https://github.com/RedHatInsights/yggdrasil/blob/main/cmd/yggd/util.go#L31
2: https://git.sr.ht/~spc/mqttcli/tree/main/item/mqtt.go#L39

JWT authentication is unaffected; this only affects mTLS authentication (which rhcd uses).

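One way to narrow down the TLS handshake question is to attempt the handshake on its own, without the WebSocket or MQTT layers. The sketch below uses only the Python standard library; `build_context` and `probe_tls` are hypothetical helper names, and it assumes the broker's server certificate chains to a CA in the system trust store. A successful handshake is not conclusive: with TLS 1.3 the client can finish the handshake before the server has evaluated the client certificate, and the broker may still reject the connection at a later layer.

```python
import socket
import ssl


def build_context(cert_file=None, key_file=None):
    """Create a verifying TLS client context, optionally with a client cert (mTLS)."""
    context = ssl.create_default_context()
    if cert_file and key_file:
        # Present the consumer certificate, as rhcd does via --cert-file/--key-file.
        context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return context


def probe_tls(host, port, context, timeout=10.0):
    """Attempt a TLS handshake and return the negotiated protocol version string."""
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version()
```

On an affected host this could be called as `probe_tls("connect.cloud.stage.redhat.com", 443, build_context("/etc/pki/consumer/cert.pem", "/etc/pki/consumer/key.pem"))`. An `ssl.SSLError` would point at the TLS layer, while a clean return would suggest the rejection happens after the handshake.
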
[link@thelio rhc-demo]$ yggctl generate control-message --type command '{"command":"ping"}' | pub -config ldupont-teNiem6C.cfg -topic redhat/insights/8924d620-3d96-48fd-b0ee-d730a92fc07f/control/in
2022/06/08 12:26:46 connected: wss://connect.cloud.stage.redhat.com:443
2022/06/08 12:26:46 published: [redhat/insights/8924d620-3d96-48fd-b0ee-d730a92fc07f/control/in] [123 34 116 121 112 101 34 58 34 99 111 109 109 97 110 100 34 44 34 109 101 115 115 97 103 101 95 105 100 34 58 34 101 54 50 52 102 50 53 98 45 101 52 101 57 45 52 49 102 50 45 98 102 53 48 45 53 102 52 51 99 99 48 52 53 97 100 54 34 44 34 114 101 115 112 111 110 115 101 95 116 111 34 58 34 34 44 34 118 101 114 115 105 111 110 34 58 49 44 34 115 101 110 116 34 58 34 50 48 50 50 45 48 54 45 48 56 84 49 50 58 50 54 58 52 53 46 52 52 50 50 54 52 56 57 55 45 48 52 58 48 48 34 44 34 99 111 110 116 101 110 116 34 58 123 34 99 111 109 109 97 110 100 34 58 34 112 105 110 103 34 44 34 97 114 103 117 109 101 110 116 115 34 58 110 117 108 108 125 125 10]

[link@thelio rhc-demo]$ sub -config ldupont-Izah3bu7.cfg -topic redhat/insights/8924d620-3d96-48fd-b0ee-d730a92fc07f/control/in
2022/06/08 12:28:18 connected: wss://connect.cloud.stage.redhat.com:443
2022/06/08 12:28:18 subscribed: redhat/insights/8924d620-3d96-48fd-b0ee-d730a92fc07f/control/in
2022/06/08 12:28:20 [redhat/insights/8924d620-3d96-48fd-b0ee-d730a92fc07f/control/in] {"type":"command","message_id":"94facd7c-ac17-4f8a-9e68-cccec1430117","response_to":"","version":1,"sent":"2022-06-08T12:28:20.234849158-04:00","content":{"command":"ping","arguments":null}}

A note from our own testing of these workflows that may not be as obvious in the description, there are no errors thrown if you:

1) Register to production subscription.rhsm.redhat.com
2) Install rhc
3) Configure rhc to point at the stage environment (keeping RHSM pointed at production and using production certificates)
4) Start rhcd

So the connection to the broker in stage appears to work if the box is registered to production RHSM rather than stage RHSM.

(In reply to Eric Helms from comment #11)
> A note from our own testing of these workflows that may not be as obvious in
> the description, there are no errors thrown if you:
>
> 1) Register to production subscription.rhsm.redhat.com
> 2) Install rhc
> 3) Configure rhc to point at the stage environment (keeping RHSM pointed at
> production and using production certificates)
> 4) Start rhcd
>
>
> So the connection to the broker in stage appears to work if the box is
> registered to production RHSM rather than stage RHSM.

The service might start successfully, but did you test whether or not you can publish messages to the host? I think the broker will silently accept authenticated connections from clients, but if they do not have the right ACLs to the topic(s) the client subscribed to, no messages will be received on those topics. We do have some machinery in place in cloud-connector to tell hosts to disconnect if they don't have the correct credentials, so it does seem odd that a production certificate would work correctly when used to authenticate to the stage broker.

Resolving as NOTABUG, this was not actually a bug in rhc.
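
For readers comparing the `pub` and `sub` output earlier in this thread: `pub` prints the published payload as a list of decimal byte values, while `sub` prints the received payload as text. A minimal sketch for turning the byte list back into JSON follows; `decode_go_bytes` is a hypothetical helper name, and the sample is the trailing `content` object taken from the byte list in the `pub` output above.

```python
import json


def decode_go_bytes(printed):
    """Decode a byte slice printed as '[123 34 ...]' back into UTF-8 text."""
    numbers = printed.strip().strip("[]").split()
    return bytes(int(n) for n in numbers).decode("utf-8")


# The bytes of the "content" object from the published control message.
content_bytes = (
    "[123 34 99 111 109 109 97 110 100 34 58 34 112 105 110 103 34 44 "
    "34 97 114 103 117 109 101 110 116 115 34 58 110 117 108 108 125]"
)

text = decode_go_bytes(content_bytes)
print(text)
print(json.loads(text))
```

Decoding the full list from the `pub` output the same way should yield a control message with the same shape as the one `sub` printed, which helps confirm that a message published with JWT credentials actually reached the topic.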