Node fails to join: "certificate signed by unknown authority"

I have one AWS EC2 node running Ubuntu Server 18.04 LTS (HVM) which runs all three Teleport services. I have another almost identical EC2 instance which I am trying to join to the first to make a cluster, and I keep getting the error:

ERRO [PROC:1] Node failed to establish connection to cluster: Get https://m1.dnets.com:3025/v1/webapi/find: x509: certificate signed by unknown authority. time/sleep.go:149

Both machines are Teleport v4.2.9 git:v4.2.9-0-ga4bd6c36 go1.13.2

Both are started using systemd, here is the teleport.service file from the proxy-auth-node machine, which is called m1.dnets.com:

[Unit]
Description=Teleport SSH Service
After=network.target

[Service]
Type=simple
Restart=on-failure
ExecStart=/usr/local/bin/teleport start --config=/etc/teleport.yaml --pid-fil$
ExecReload=/bin/kill -HUP $MAINPID
PIDFile=/run/teleport.pid

[Install]
WantedBy=multi-user.target

The unit file on the node I am trying to add is the same, except that the --config argument specifying the teleport.yaml file is not there.

Here is the teleport.yaml file from the server machine:

# m1.dnets.com Teleport configuration file.
teleport:
  nodename: m1.dnets.com
  data_dir: /var/lib/teleport
  pid_file: /var/run/teleport.pid
  auth_token: cluster-join-token
  auth_servers:
  - 0.0.0.0:3025
  connection_limits:
    max_connections: 15000
    max_users: 250
  log:
    output: stderr
    severity: INFO
  ca_pin: ""
auth_service:
  enabled: "yes"
  listen_addr: 0.0.0.0:3025
  tokens:
  - proxy,node:cluster-join-token
  session_recording: ""
  client_idle_timeout: 0s
  disconnect_expired_cert: false
  keep_alive_count_max: 0
  public_addr: m1.dnets.com:3025
ssh_service:
  enabled: "yes"
  labels:
    db_role: master
    db_type: postgres
  commands:
  - name: hostname
    command: [/usr/bin/hostname]
    period: 1m0s
  - name: arch
    command: [/usr/bin/uname, -p]
    period: 1h0m0s
  public_addr: m1.dnets.com:3022
proxy_service:
  enabled: "yes"
  listen_addr: 0.0.0.0:3023
  web_listen_addr: 0.0.0.0:3080
  tunnel_listen_addr: 0.0.0.0:3024
  https_key_file: /etc/letsencrypt/live/m1.dnets.com/privkey.pem
  https_cert_file: /etc/letsencrypt/live/m1.dnets.com/fullchain.pem

As you can see, there are LetsEncrypt certificates installed on the proxy-auth-node machine. From a web browser, https://m1.dnets.com is served up cleanly.

I ran this command on the server:

sudo tctl nodes add --ttl=305m --roles=node --token=secret-value

And then this on the node I am trying to add:

teleport start --roles=node --token=secret-value --ca-pin=sha256:e24812f94fdbe440df5b5f9117bb77ee7bd619b6c80439cf9f2c23183e91e0ce --auth-server=m1.dnets.com:3025

I had never tried to add this node to this cluster before. Just in case, I stopped the Teleport service on the node, deleted /var/lib/teleport, restarted the service, and tried to add the node again, but it made no difference.

The command

sudo journalctl -fu teleport

gives me this:

Jun 02 17:40:41 m1.dnets.com teleport[3829]: ERRO [NODE] fork/exec /usr/bin/hostname: no such file or directory regular/sshserver.go:695

Here are the contents of /etc/hostname:

m1.dnets.com

And /etc/hosts:

18.224.81.251 m1.dnets.com
172.31.35.11 m1.dnets.com
3.128.7.1 m2.dnets.com
172.31.34.68 m2.dnets.com

127.0.0.1 localhost

# The following lines are desirable for IPv6 capable hosts

::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

(m2.dnets.com is the node I am trying to add)

Adding a user (teleport) works fine on the server.

I have created sets of test machines many times and am always stopped by not being able to add a node. I believe I have followed the instructions in the Admin Guide on this, but I’m clearly missing something simple.

Thanks

Is /etc/teleport.yaml present on the node machine? You can’t both start Teleport using systemd and the teleport start command - it has to be one or the other. There are two ways to fix this:

  1. Try stopping teleport on the node with systemctl stop teleport, deleting the contents of /var/lib/teleport (to clear out any cached data) and then running the teleport start command that you pasted - you should find that it works.

If you start the service via systemd again, it will write data to /var/lib/teleport which will prevent teleport start from working.

  2. The other alternative, for a more permanent setup, is to modify the /etc/teleport.yaml file on your node to use the same credentials as the teleport start command. systemctl start teleport will then work as you expected:

teleport:
  nodename: name-you-want-your-node-to-use
  auth_token: secret-value
  auth_servers:
  - m1.dnets.com:3025
  ca_pin: "sha256:e24812f94fdbe440df5b5f9117bb77ee7bd619b6c80439cf9f2c23183e91e0ce"
auth_service:
  enabled: no
ssh_service:
  enabled: yes
proxy_service:
  enabled: no
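
For reference, the first option above can be sketched as a short shell script. The token, CA pin, and auth server address are the ones from the question. The run helper and the DRY_RUN variable are hypothetical additions so that nothing is deleted by accident, and the use of sudo for teleport start is an assumption (Teleport needs to write to /var/lib/teleport).

```shell
#!/bin/sh
# Sketch of option 1: join the node by hand instead of through systemd.
# DRY_RUN defaults to 1, so commands are only printed; set DRY_RUN=0 to run them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Stop the systemd-managed instance so it cannot rewrite the data directory.
run sudo systemctl stop teleport

# Clear cached state. find is used instead of a shell glob because the glob
# would not expand if the current user cannot read /var/lib/teleport.
run sudo find /var/lib/teleport -mindepth 1 -delete

# Start the node in the foreground with the token and CA pin from the question.
run sudo teleport start --roles=node --token=secret-value \
    --ca-pin=sha256:e24812f94fdbe440df5b5f9117bb77ee7bd619b6c80439cf9f2c23183e91e0ce \
    --auth-server=m1.dnets.com:3025
```

If the join succeeds, running sudo tctl nodes ls on m1.dnets.com should list the new node.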

The ERRO [NODE] fork/exec /usr/bin/hostname: no such file or directory regular/sshserver.go:695 error just means that the command used to set one of the labels in your auth server config file is failing because /usr/bin/hostname doesn’t exist. It’s unrelated to the join problem, but you can likely fix it by changing the path in the config file to /bin/hostname.
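
A quick way to confirm the right path before editing the config is to ask the shell where each binary lives. This is only a sketch. It checks uname as well, on the assumption that the arch label command in the same config could have the same path problem.

```shell
# Print the real location of each binary that the label commands call.
for cmd in hostname uname; do
    command -v "$cmd" || echo "$cmd: not found in PATH"
done
```

Whatever paths this prints are the ones to use in the command: entries under ssh_service.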

Thank you, Gus, this has been very helpful. It got me past a place where I had been stuck for a while.


You’re welcome, glad to hear it! Sorry it took so long for the initial reply.

If you have any other issues, please feel free to make a new topic and we’ll try to assist.