Failed to add node to cluster

Hi Team, sorry about this really newbie question. I don't understand what the dependencies are for my current problem with Teleport OSS.

My aim is to create a single global Cluster with multiple Nodes from public and private end-points.

If there is any documentation that clearly discusses this, please point me to it.

The setup is a vanilla, non-SSL connection between two nodes:

  1. Node-1 in AWS public network
  2. Node-2 in private network

I started the Teleport service on Node-1 and can work with it locally.
I am trying to add Node-2 to Node-1's cluster, and it is failing.

Steps I did:

  1. Create a node token on Node-1:

     tctl nodes add

     Returns:

     teleport start \
        --roles=node \
        --token=xxxxx24afebb36f67a770xxxxxxxxxx \
        --ca-pin=sha256:xxxxxxxxb6c91ce8a8312b186adaf0a9f206efce293e45403xxxxxxxxxxxxx \
        --auth-server=<Node-1>:3025 --insecure

  2. Run the above command on Node-2.

     Error:

ERRO [AUTH:1]    "Failed to retrieve client pool. Client cluster XXXX-01, target cluster YYYY-02, error:  \nERROR REPORT:\nOriginal Error: *trace.NotFoundError key /authorities/host/XXXX-01 is [..snip]
(*conn).serve\n\t/opt/go/src/runtime/asm_amd64.s:1358 runtime.goexit\nUser Message: key /authorities/host/XXXX-01 is not found\n." auth/middleware.go:170
ERRO [PROC:1]    "Node failed to establish connection to cluster: 404 page not found\n." time/sleep.go:149

Config File for Node-2

teleport:
  nodename: YYY-02
  data_dir: /var/lib/teleport
  auth_token: xxxxxxxx1da1c7b76f6e243957c3e18739c6d1c4a3xxxxxxxxx
  #  auth_servers:
  #- 127.0.0.1:3025
  log:
    output: stderr
    severity: INFO
  ca_pin: sha256:ca-pin-hash-goes-here
  #auth_service:
  #enabled: "yes"
  #listen_addr: 0.0.0.0:3025
  #tokens:
  #- proxy,node:5024e61da1c7b76f6e243957c3e18739c6d1c4a3f2ad88ed
  #license_file: /path/to/license-if-using-teleport-enterprise.pem
ssh_service:
  enabled: "yes"
  labels:
    db_role: master
    db_type: postgres
  commands:
  - name: hostname
    command: [/usr/bin/hostname]
    period: 1m0s
  - name: arch
    command: [/usr/bin/uname, -p]
    period: 1h0m0s
    #proxy_service:
    #enabled: "yes"
    #listen_addr: 0.0.0.0:3023
    #web_listen_addr: 0.0.0.0:3080
    #tunnel_listen_addr: 0.0.0.0:3024

The fix was as simple as deleting the ‘/var/lib/teleport’ directory. If a previous Teleport instance was ever started on the machine, the new one tries to reuse that old data and fails.

This is resolved now.
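
For anyone landing here with the same error, the cleanup described above can be sketched roughly as follows. This is only a sketch under assumptions: the function name `reset_teleport_data_dir` is made up for illustration, and it moves the directory aside rather than deleting it, so the old state can still be inspected or restored. The default path is the `data_dir` from the config posted above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Teleport): move a stale data directory
# aside so the next "teleport start" begins with a clean state.
# Stop any running teleport process before calling this.
reset_teleport_data_dir() {
    # Default matches data_dir in the config posted above.
    data_dir="${1:-/var/lib/teleport}"

    if [ ! -d "$data_dir" ]; then
        echo "nothing to reset: $data_dir does not exist"
        return 0
    fi

    # Keep a timestamped backup instead of deleting outright.
    backup="${data_dir}.bak.$(date +%Y%m%d%H%M%S)"
    mv "$data_dir" "$backup" && echo "moved $data_dir to $backup"
}
```

After running `reset_teleport_data_dir /var/lib/teleport` (as a user allowed to write there), generate a fresh token with `tctl nodes add` and rerun the `teleport start` command it prints. Once the node has joined, the backup directory can be removed.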

Yes, this is a known issue if you regenerate the cert authority for your cluster (by reinstalling the auth server, for example).

We have a GitHub issue here tracking a fix for this. If it’s something you’d like to see implemented, please give it a +1 and comment.