Auth Server and Nodes in different AWS VPCs

I am trying to add a node to a cluster that is in a different AWS VPC than the auth server.
teleport start --roles=node,proxy --token=secret-value --ca-pin=sha256:bla --auth-server=teleport-nlb.mjh.group
The errors I am getting are:
ERRO [PROC:1] Node failed to establish connection to cluster: Get https://teleport-nlb.mjh.group:3025/v1/webapi/find: remote error: tls: internal error. service/connect.go:65
ERRO [PROC:1] Proxy failed to establish connection to cluster: x509: certificate signed by unknown authority. service/connect.go:65
Adding a node in the same VPC is not a problem.

Do you have a load balancer in front of your auth server?

If so, what type of load balancer?

Do you have cross-zone load balancing enabled?

Thanks for jumping in! Yes, I have an NLB in front of the auth server. I have not enabled cross-zone load balancing.

Switch on cross-zone load balancing and give this another try. This is a fairly common error which occurs when connections aren’t balanced in the way Teleport expects.

That worked, but I am getting another error now:
Proxy failed to establish connection to cluster: x509: certificate signed by unknown authority. service/connect.go:65

Connecting directly does not work either:
Sep 13 20:31:54 graylog teleport[20411]: DEBU [PROC:1] Service has started. service:auth.broadcast service/supervisor.go:242
Sep 13 20:31:54 graylog teleport[20411]: DEBU [PROC:1] Connected state: never updated. service/connect.go:99
Sep 13 20:31:54 graylog teleport[20411]: INFO [PROC] Connecting to the cluster graylog with TLS client certificate. service/connect.go:127
Sep 13 20:31:54 graylog teleport[20411]: DEBU [PROC] Attempting to connect to Auth Server directly. service/connect.go:795
Sep 13 20:31:54 graylog teleport[20411]: ERRO [PROC:1] Proxy failed to establish connection to cluster: x509: certificate signed by unknown authority. service/connect.go:65
Sep 13 20:31:54 graylog teleport[20411]: DEBU [PROC] Attempting to connect to Auth Server through tunnel. service/connect.go:803
Sep 13 20:31:54 graylog teleport[20411]: DEBU [CLIENT] HTTPS client init(proxyAddr=34.226.251.154:3025, insecure=false) client/weblogin.go:252
Sep 13 20:31:55 graylog teleport[20411]: ERRO [PROC:1] Node failed to establish connection to cluster: Get https://34.226.251.154:3025/v1/webapi/find: x509: cannot validate certificate for 34.226.251.154 because it doesn’t contain any IP SANs. service/connect.go:65
Sep 13 20:31:55 graylog teleport[20411]: DEBU [SQLITE] Initialized event ID iterator to 436 lite/periodic.go:143
Sep 13 20:31:55 graylog teleport[20411]: DEBU [BUFFER] Add Watcher(name=auth:1:cache, prefixes=/authorities, /cluster_configuration/general, /cluster_configuration/name, /cluster_configuration/static_tokens, /namespaces, /nodes/default, /proxies, /reverseTunnels, /roles,

What’s under auth_service -> public_addr in the Teleport config file on your auth servers? Also, do you have anything set for advertise_ip in the config on the auth servers?

You may need to explicitly add teleport-nlb.mjh.group:3025 to auth_service -> public_addr on each auth server and restart Teleport. This causes the auth server to regenerate its TLS certificates and include the hostname that you’re using to connect. Once this is done, things should work as expected.
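To illustrate why the hostname has to be listed on the certificate, here is a minimal Python sketch of how a TLS client decides whether a certificate covers the address it dialed. It is a simplified model of standard hostname verification, not Teleport's or Go's actual code. The function names and the SAN list are assumptions made up for this example, modelled on the names in this thread.

```python
import ipaddress


def is_ip(host):
    """Return True if host is an IP literal rather than a DNS name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def matches_dns_san(host, pattern):
    """Compare a DNS name with one SAN entry; '*.' covers one left-most label."""
    host = host.lower().rstrip(".")
    pattern = pattern.lower().rstrip(".")
    if pattern.startswith("*."):
        head, sep, tail = host.partition(".")
        return bool(head) and sep == "." and tail == pattern[2:]
    return host == pattern


def cert_covers(host, dns_sans, ip_sans=()):
    """IP literals are checked only against IP SANs, names only against DNS SANs."""
    if is_ip(host):
        addr = ipaddress.ip_address(host)
        return any(addr == ipaddress.ip_address(ip) for ip in ip_sans)
    return any(matches_dns_san(host, p) for p in dns_sans)


# Hypothetical SAN list for illustration only (assumed, not read from a real cert).
sans = [
    "ip-172-31-59-53-ec2-internal",
    "teleport-nlb.mjh.group",
    "*.teleport.cluster.local",
    "teleport.cluster.local",
]

print(cert_covers("teleport-nlb.mjh.group", sans))  # True
print(cert_covers("34.226.251.154", sans))          # False: no IP SANs listed
```

Under this model, dialing the auth server by IP fails unless the certificate carries that IP as a SAN, which matches the "doesn’t contain any IP SANs" error above. Dialing by a hostname works only once that hostname is on the certificate.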

This is my configuration file on the auth server:
teleport:
  nodename: ip-172-31-59-53-ec2-internal
  advertise_ip: 34.226.251.154
  log:
    output: syslog
    severity: INFO

  data_dir: /var/lib/teleport
  storage:
    type: dir
    path: /var/lib/teleport/backend

auth_service:
  enabled: yes
  listen_addr: 0.0.0.0:3025
  public_addr: teleport-nlb.mjh.group:3025
  authentication:
    second_factor: otp

ssh_service:
  enabled: yes
  listen_addr: 0.0.0.0:3022

proxy_service:
  enabled: yes
  listen_addr: 0.0.0.0:3023
  tunnel_listen_addr: 0.0.0.0:3080
  web_listen_addr: 0.0.0.0:3080
  public_addr: 34.226.251.154:3080x

The error I am getting after applying those changes is:
ERRO [PROC:1] Node failed to establish connection to cluster: Get https://teleport-nlb.mjh.group:3025/v1/webapi/find: remote error: tls: internal error. service/connect.go:65

@dschmidt Sorry for the delay. If you’re still having trouble, could you please run openssl s_client -connect teleport-nlb.mjh.group:3025 on the host and paste the output?

Certainly, here it is:

$ openssl s_client -connect teleport-nlb.mjh.group:3025 
CONNECTED(00000003)
depth=0 O = Admin, CN = ip-172-31-59-53-ec2-internal.ip-172-31-59-53-ec2-internal
verify error:num=20:unable to get local issuer certificate
verify return:1
depth=0 O = Admin, CN = ip-172-31-59-53-ec2-internal.ip-172-31-59-53-ec2-internal
verify error:num=21:unable to verify the first certificate
verify return:1
---
Certificate chain
 0 s:/O=Admin/CN=ip-172-31-59-53-ec2-internal.ip-172-31-59-53-ec2-internal
   i:/O=ip-172-31-59-53-ec2-internal/CN=ip-172-31-59-53-ec2-internal/serialNumber=164510503386702552417114567311249527143
---
Server certificate
-----BEGIN CERTIFICATE-----
MIIEHTCCAwWgAwIBAgIQEr7zKqeAnrr2iByj2bhLmTANBgkqhkiG9w0BAQsFADCB
gDElMCMGA1UEChMcaXAtMTcyLTMxLTU5LTUzLWVjMi1pbnRlcm5hbDElMCMGA1UE
AxMcaXAtMTcyLTMxLTU5LTUzLWVjMi1pbnRlcm5hbDEwMC4GA1UEBRMnMTY0NTEw
NTAzMzg2NzAyNTUyNDE3MTE0NTY3MzExMjQ5NTI3MTQzMB4XDTE5MDkyMDE3MzMy
NloXDTI5MDkxNzE3MzQyNlowVDEOMAwGA1UEChMFQWRtaW4xQjBABgNVBAMTOWlw
LTE3Mi0zMS01OS01My1lYzItaW50ZXJuYWwuaXAtMTcyLTMxLTU5LTUzLWVjMi1p
bnRlcm5hbDCCASIwDQYJKoZIhvcNAQEBBQADggEPADCCAQoCggEBANDzKXmrkz2c
LkbvWv9XTFQFv0VWPbQPIBsRmB0g+kS4xs87yTROgyYEcHo6Uyp/923rwJIx/Qp9
Rgs8FbbbFDdy783MIa7iDuQiIgkULSV76jX3/HI9zkjb6k9Yc7cR1r5L8/b1RYc5
El4NhaKk1Oa6DniDsMvoClZV01mV5cNClYxnuBUpDSO/UjrN/iPdAPANGyFfI84b
rU7crpzQnUUmYl0RZ5WJt70vOT/QbEtlRrGDZrmAJDbLYKE3fySgdZxJUa22myjX
x9dAZJrm9naKii3JDwtGEq5uwbOSG7NdywE8MtjWXy6NTCfsMCun/lflWxa4/MjW
UyprO3Dr4CsCAwEAAaOBvTCBujAOBgNVHQ8BAf8EBAMCBaAwHQYDVR0lBBYwFAYI
KwYBBQUHAwEGCCsGAQUFBwMCMAwGA1UdEwEB/wQCMAAwewYDVR0RBHQwcoIcaXAt
MTcyLTMxLTU5LTUzLWVjMi1pbnRlcm5hbIIgdGVsZXBvcnQtbmxiLnN1aXRlcGxh
dGZvcm0uZ3JvdXCCGCoudGVsZXBvcnQuY2x1c3Rlci5sb2NhbIIWdGVsZXBvcnQu
Y2x1c3Rlci5sb2NhbDANBgkqhkiG9w0BAQsFAAOCAQEAgkXy+EfevdyAwHkuJvoJ
VwcT79/l014+jxr54oHLf2KQKnWM2avsEXIbFqxT42INKjjOpHzLPxDYhuY6GzbE
6kQk8dSxXz/xJTtRowTciV+rwJNtIVrFFDqz8T/wgfmvlD7SgppBpRD0fLapRG11
MEWEmHmQL5TLX2AH70HcTc67G/CUe8kzKlN8vRhlPFXIuzLAR4Fpl3ubeZgydUkT
XJ2VN/qB+vbXKmtYo/WE4pnfpP6Nt8VnqqHbUP6DFeao6VNhmwHu0o/cEV0L0bW9
6JWPbxkgr7tHGHDL2gmgsz1WpWo0p//LSM/5c5YPzfyQqm8EgTn4TuU6FR/uTV7A
oQ==
-----END CERTIFICATE-----
subject=/O=Admin/CN=ip-172-31-59-53-ec2-internal.ip-172-31-59-53-ec2-internal
issuer=/O=ip-172-31-59-53-ec2-internal/CN=ip-172-31-59-53-ec2-internal/serialNumber=164510503386702552417114567311249527143
---
Acceptable client certificate CA names
/O=ip-172-31-59-53-ec2-internal/CN=ip-172-31-59-53-ec2-internal/serialNumber=164510503386702552417114567311249527143
/O=ip-172-31-59-53-ec2-internal/CN=ip-172-31-59-53-ec2-internal/serialNumber=66315096447619284790383964950628989312
Client Certificate Types: RSA sign, ECDSA sign
Requested Signature Algorithms: RSA+SHA256:ECDSA+SHA256:RSA+SHA384:ECDSA+SHA384:RSA+SHA512:ECDSA+SHA512:RSA+SHA1:ECDSA+SHA1
Shared Requested Signature Algorithms: RSA+SHA256:ECDSA+SHA256:RSA+SHA384:ECDSA+SHA384:RSA+SHA512:ECDSA+SHA512:RSA+SHA1:ECDSA+SHA1
Peer signing digest: SHA512
Server Temp Key: ECDH, P-256, 256 bits
---
SSL handshake has read 1959 bytes and written 427 bytes
---
New, TLSv1/SSLv3, Cipher is ECDHE-RSA-AES128-GCM-SHA256
Server public key is 2048 bit
Secure Renegotiation IS supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
SSL-Session:
    Protocol  : TLSv1.2
    Cipher    : ECDHE-RSA-AES128-GCM-SHA256
    Session-ID: 4BCEA52883130652705C82585318A41E321EDD55CB8A547035428C1390F3D93F
    Session-ID-ctx: 
    Master-Key: 8CD71425C1BF76A7E4DB3B12C938430CAB4D4E5724BDB8AAD13BB46ED41FF3A23DA57C683733203865D6C70045B6D956
    Key-Arg   : None
    Krb5 Principal: None
    PSK identity: None
    PSK identity hint: None
    TLS session ticket:
    0000 - a5 00 18 45 3a c0 9f e3-e0 d3 6e fc 7c fc c4 5e   ...E:.....n.|..^
    0010 - 32 6b 61 98 37 71 cd b7-29 4f e8 13 50 e9 c2 4b   2ka.7q..)O..P..K
    0020 - a0 b3 31 d8 9e dd 6f 71-21 a7 4b 9a 30 95 db 59   ..1...oq!.K.0..Y
    0030 - 76 d8 66 99 9b 7e 9d 89-7e 21 26 62 bf 87 28 aa   v.f..~..~!&b..(.
    0040 - 42 50 5c ff 9d 18 fe 8b-93 c0 cf 1b 5a ee 78 a4   BP\.........Z.x.
    0050 - 3b d2 4c 57 30 24 06 d9-7d bf 59 a6 10 de 2f 07   ;.LW0$..}.Y.../.
    0060 - e9 b2 1e 17 52 94 21 9a-d7 4b 21 c5 6b 84 3e 8a   ....R.!..K!.k.>.
    0070 - fa 0e 43 3a 4d 8b 6b 5e-                          ..C:M.k^

    Start Time: 1569445288
    Timeout   : 300 (sec)
    Verify return code: 21 (unable to verify the first certificate)
---

@dschmidt That looks fine to me. How many auth servers do you have - just the one?

It may be necessary to delete the contents of /var/lib/teleport to allow the auth server’s certificates to be regenerated completely and for new proxies/nodes to join the cluster.

Yes, just one auth server. I will try regenerating the certs.


This helped somewhat, but I am still getting certificate errors when trying to join the cluster:
On the client:
Sep 30 19:11:20 graylog teleport[15859]: ERRO [PROC:1] Node failed to establish connection to cluster: Get https://teleport-nlb.suiteplatform.group:3025/v1/webapi/find: remote

On the server:

Sep 30 19:18:02 ip-172-31-59-53.ec2.internal teleport[5256]: 2019/09/30 19:18:02 http: TLS handshake error from 3.85.35.181:33640: remote error: tls: bad certificate

Your first error from the client is truncated a little early; what does the rest of the error say?

Sep 30 21:06:55 graylog teleport[15859]: ERRO [PROC:1] Proxy failed to establish connection to cluster: x509: certificate signed by unknown authority

Out of interest, what’s the output of teleport version?

On the client: Teleport v4.0.6 git:v4.0.6-0-g1b1ee727 go1.12.1
On the server: Teleport v4.0.4 git:v4.0.4-0-g1a2ed507 go1.12.1

When using an NLB instead, I am getting this error:
Proxy failed to establish connection to cluster: x509: certificate is valid for *.suiteplatform.group, not 677261796c6f67.teleport.cluster.local.
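A side note on the name in that error: the first label of 677261796c6f67.teleport.cluster.local looks like a hex string. The short Python sketch below decodes it. The helper name and the assumption that the label is a hex-encoded cluster name are mine, inferred from this error message rather than confirmed anywhere in the thread.

```python
def decode_cluster_label(server_name, suffix=".teleport.cluster.local"):
    """Hypothetical helper: hex-decode the label in front of the given suffix.

    Returns None if the suffix is missing or the label is not valid hex/UTF-8.
    """
    if not server_name.endswith(suffix):
        return None
    label = server_name[: -len(suffix)]
    try:
        return bytes.fromhex(label).decode("utf-8")
    except ValueError:  # also covers UnicodeDecodeError
        return None


print(decode_cluster_label("677261796c6f67.teleport.cluster.local"))  # graylog
```

The result, graylog, is the cluster name shown in the earlier "Connecting to the cluster graylog" log line. If that reading is right, the client is expecting a certificate for an internal Teleport name, and whatever answered the connection presented a *.suiteplatform.group certificate instead. That would be consistent with TLS being terminated in front of the auth server rather than passed through to it.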

Did you try deleting the contents of /var/lib/teleport on the joining node after you changed the public_addr of the auth server, then trying to join the cluster again?

I’ve been experiencing issues myself related to old client CAs/certs being cached on nodes and causing the x509: certificate signed by unknown authority error.

I did, same result unfortunately.