Teleport Node Tunnelling Issue

Hi. We set up an HA Teleport cluster (v4.0.4) in AWS following your Terraform guide. We have 2 proxies and 2 auth servers, use Let's Encrypt certs, and have an internet-facing NLB for the proxies and an internal NLB for the auth servers. Ports 443, 3023, 3024 and 3080 are open on the proxies, and port 3025 on the auth servers.

We can add nodes to the cluster as long as they can reach the auth server on its internal IP, but joining a server outside the cluster via node tunnelling fails.

The node tries to connect, but on the proxy we see this error:

TLS handshake error from public_ip:55388: remote error: tls: bad certificate

and on the node itself we see:

Node failed to establish connection to cluster: ssh: handshake failed: no matching keys found. service/connect.go:65

The configs for the node and the proxy are below. I assume we have something set incorrectly under proxy_service on the proxies, but could someone please advise?

Also note that we use Cloudflare for external DNS, which also provides a certificate. Could this be causing the issue?

Thanks
Mark.

Node config:

teleport:
  auth_token: 1234567
  ca_pin: sha256:1234567879
  nodename: node
  advertise_ip: public_ip
  log:
    output: syslog
    severity: INFO
  data_dir: /var/lib/teleport
  storage:
    type: dir
    path: /var/lib/teleport/backend
  auth_servers:
    - external_url
auth_service:
  enabled: no
ssh_service:
  enabled: yes
  listen_addr: 0.0.0.0:3022
proxy_service:
  enabled: no

Proxy config:

teleport:
  auth_token: /var/lib/teleport/token
  ca_pin: sha256:1234567879
  nodename: proxy01
  advertise_ip: 10.0.0.1
  log:
    output: syslog
    severity: INFO
  data_dir: /var/lib/teleport
  storage:
    type: dir
    path: /var/lib/teleport/backend
  auth_servers:
    - nlb01-01234567.elb.ap-southeast-1.amazonaws.com
auth_service:
  enabled: no
ssh_service:
  enabled: no
proxy_service:
  enabled: yes
  listen_addr: 0.0.0.0:3023
  tunnel_listen_addr: 0.0.0.0:3024
  web_listen_addr: 0.0.0.0:3080
  public_addr: external_url:443
  ssh_public_addr: external_url:3023
  tunnel_public_addr: external_url:3024
  https_cert_file: /var/lib/teleport/fullchain.pem
  https_key_file: /var/lib/teleport/privkey.pem

Do you have cross-zone load balancing enabled?

Hi Gus. I checked again: we had enabled it for the internet-facing NLB, but not for the internal NLB. We've now enabled it there too, but still see the same issue. This is what we see on the proxy:

teleport[26383]: 2019/09/13 10:20:15 http: TLS handshake error from <public_ip>:44184: remote error: tls: bad certificate

Is your NLB listening on port 443 and forwarding traffic to the proxy servers on port 3080? Are the Let's Encrypt certs configured on the proxy servers themselves or on the NLB?

When joining a node to a cluster using IoT/NAT tunnelling, the auth_servers address on the node needs to point at the proxy server's web interface, not the auth server. You should probably set this explicitly to proxy_address:443 if you aren't already; it isn't quite clear from the redacted config.

You may also want to delete the contents of /var/lib/teleport on the node and restart Teleport so it generates fresh certificates.