Adding a node to the cluster

Hi everyone,
I was trying to add a new node to the cluster. The server is running the Teleport services, and the auth server is running and listening on port 3025. I added the new node using the token and the CA pin, but it still gives me an error:

ERRO [PROC:1] "Node failed to establish connection to cluster: 404 page not found\n." service/connect.go:65

I tried changing the auth server's configuration to another port, and the same error still occurs.

Any help is appreciated!

Can you show the command you are using to add the new node? Thanks!

Thanks for your response. I used the following command as root:

tctl nodes add

and the output was:

The invite token: 051aa8cfc5ebe8b660108fa14967fd40
This token will expire in 30 minutes

Run this on the new node to join the cluster:

> teleport start \
   --roles=node \
   --token=051aa8cfc5ebe8b660108fa14967fd40 \
   --ca-pin=sha256:45e6e512c42a370993aa6a28372f41027d970b037238d9b00fa3d1ae8d6daf28 \
   --auth-server=192.168.93.4:3025

Please note:

  - This invitation token will expire in 30 minutes
  - 192.168.93.4:3025 must be reachable from the new node

Of course the server was running:

[AUTH]    Auth service is starting on 0.0.0.0:3025.
[NODE]    Service is starting on 0.0.0.0:3022.
[PROXY]   Reverse tunnel service is starting on 0.0.0.0:3024.
[PROXY]   Web proxy service is starting on 0.0.0.0:3080.
[PROXY]   SSH proxy service is starting on 0.0.0.0:3023.

On the target node I entered the command:

teleport start --roles=node --token=051aa8cfc5ebe8b660108fa14967fd40 --ca-pin=sha256:45e6e512c42a370993aa6a28372f41027d970b037238d9b00fa3d1ae8d6daf28 --auth-server=192.168.93.4:3025

And got the error mentioned earlier in the thread:

ERRO [PROC:1] "Node failed to establish connection to cluster: 404 page not found\n." service/connect.go:65

Thanks for your help.

I have the same error.
If I check with curl -k https://localhost:3025, I get "page not found".
Output of teleport start:

INFO [AUTH]      Updating cluster configuration: StaticTokens([ProvisionToken(Token=cluster-join-token,Roles=Proxy,Node,Expires=never) ProvisionToken(Token=AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA,Roles=trusted_cluster,Expires=never)]). auth/init.go:243
INFO [AUTH]      Updating cluster configuration: AuthPreference(Type="local",SecondFactor="otp"). auth/init.go:252
INFO [AUTH]      Created namespace: "default". auth/init.go:259
INFO [AUTH]      Auth server is running periodic operations. auth/init.go:398
INFO [PROC:1]    Service auth is creating new listener on 0.0.0.0:3025. service/signals.go:218
INFO [AUTH:1]    Starting Auth service with PROXY protocol support. service/service.go:977
INFO [AUTH]      Auth service is starting on 0.0.0.0:3025. utils/cli.go:147
[AUTH]    Auth service is starting on 0.0.0.0:3025.
INFO [PROC]      Connecting to the cluster gauss with TLS client certificate. service/connect.go:59
INFO [PROC]      Connecting to the cluster gauss with TLS client certificate. service/connect.go:59
INFO [PROC:1]    Service proxy:web is creating new listener on 0.0.0.0:3080. service/signals.go:218
INFO [PROC:1]    Service proxy:tunnel is creating new listener on 0.0.0.0:3024. service/signals.go:218
INFO [PROC:1]    Service node is creating new listener on 0.0.0.0:3022. service/signals.go:218
INFO [NODE:1]    Service is starting on 0.0.0.0:3022 cache that will expire after connection to database is lost after 20h0m0s, will cache frequently accessed items for 2s. service/service.go:1261
INFO [NODE]      Service is starting on 0.0.0.0:3022. utils/cli.go:147
[NODE]    Service is starting on 0.0.0.0:3022.
INFO [PROXY]     Reverse tunnel service is starting on 0.0.0.0:3024. utils/cli.go:147
[PROXY]   Reverse tunnel service is starting on 0.0.0.0:3024.
INFO [PROXY:SER] Starting on 0.0.0.0:3024 using cache that will expire after connection to database is lost after 20h0m0s, will cache frequently accessed items for 2s service/service.go:1726
INFO [PROXY:SER] Using TLS cert /etc/letsencrypt/live/gauss.DOMAIN/fullchain.pem, key /etc/letsencrypt/live/gauss.DOMAIN/privkey.pem service/service.go:1777
INFO [PROC:1]    Service proxy:ssh is creating new listener on 0.0.0.0:3023. service/signals.go:218
INFO [AUDIT:1]   Creating directory /var/lib/teleport/log. service/service.go:1342
INFO [AUDIT:1]   Creating directory /var/lib/teleport/log/upload. service/service.go:1342
INFO [AUDIT:1]   Creating directory /var/lib/teleport/log/upload/sessions. service/service.go:1342
INFO [AUDIT:1]   Creating directory /var/lib/teleport/log/upload/sessions/default. service/service.go:1342
INFO [PROXY]     Web proxy service is starting on 0.0.0.0:3080. utils/cli.go:147
[PROXY]   Web proxy service is starting on 0.0.0.0:3080.
INFO [PROXY:SER] Web proxy service is starting on 0.0.0.0:3080. service/service.go:1789
INFO [PROXY]     SSH proxy service is starting on 0.0.0.0:3023. utils/cli.go:147
[PROXY]   SSH proxy service is starting on 0.0.0.0:3023.
INFO [PROXY:SER] SSH proxy service is starting on 0.0.0.0:3023 service/service.go:1830
INFO [PROC:1]    The new service has started successfully. Starting syncing rotation status with period 10s. service/connect.go:341
INFO [PROXY:AGE] Starting reverse tunnel agent pool. service/service.go:1841
2019/07/06 20:10:52 http: TLS handshake error from XXX.XXX.XXX.XXX:54658: remote error: tls: bad certificate

Could you both (@Otosamaa @Wikunia) please post:

  1. The output of teleport version
  2. The contents of your Teleport config file (/etc/teleport.yaml by default), redacting any secrets or tokens

Thank you

I have the same problem.
teleport version: 4.0.2
I'm sure the config file (/etc/teleport.yaml) is fine. Can you help me?

I have the same problem as well.
Tested on all 4.x versions (4.0.0, 4.0.1, 4.0.2 and 4.0.3).
Unable to add a node to the cluster.

Partial logs below.

On the server side:
WARN [PROXY:1:C] Re-init the cache on error: all SubConns are in TransientFailure, latest connection error: . logrus/entry.go:188
DEBU [PROXY:1:C] Reloading Linear(attempt=38, duration=10s). cache/cache.go:346
WARN [PROXY:1] Re-init the watcher on error: all SubConns are in TransientFailure, latest connection error: . services/proxywatcher.go:180
DEBU [PROXY:1] Reloading Linear(attempt=38, duration=10s). services/proxywatcher.go:184
WARN [REVERSE:D] Re-init the cache on error: all SubConns are in TransientFailure, latest connection error: . logrus/entry.go:188
DEBU [REVERSE:D] Reloading Linear(attempt=38, duration=10s). cache/cache.go:346
WARN [PROXY:1:C] Re-init the cache on error: all SubConns are in TransientFailure, latest connection error: . logrus/entry.go:188
DEBU [PROXY:1:C] Reloading Linear(attempt=39, duration=10s). cache/cache.go:346
WARN [PROXY:1] Re-init the watcher on error: all SubConns are in TransientFailure, latest connection error: . services/proxywatcher.go:180
DEBU [PROXY:1] Reloading Linear(attempt=39, duration=10s). services/proxywatcher.go:184
WARN [REVERSE:D] Re-init the cache on error: all SubConns are in TransientFailure, latest connection error: . logrus/entry.go:188
DEBU [REVERSE:D] Reloading Linear(attempt=39, duration=10s). cache/cache.go:346
WARN [PROXY:1:C] Re-init the cache on error: all SubConns are in TransientFailure, latest connection error: . logrus/entry.go:188
DEBU [PROXY:1:C] Reloading Linear(attempt=40, duration=10s). cache/cache.go:346
WARN [PROXY:1] Re-init the watcher on error: all SubConns are in TransientFailure, latest connection error: . services/proxywatcher.go:180
DEBU [PROXY:1] Reloading Linear(attempt=40, duration=10s). services/proxywatcher.go:184
WARN [REVERSE:D] Re-init the cache on error: all SubConns are in TransientFailure, latest connection error: . logrus/entry.go:188
DEBU [REVERSE:D] Reloading Linear(attempt=40, duration=10s). cache/cache.go:346

On the client side:
WARN [NODE:BEAT] Heartbeat failed transport is closing. logrus/entry.go:188
WARN [PROC:1] Sync rotation state cycle failed: watcher has disconnected, going to retry after 10s. logrus/entry.go:188
WARN [NODE:BEAT] Heartbeat failed all SubConns are in TransientFailure, latest connection error: . logrus/entry.go:188
WARN [NODE:BEAT] Heartbeat failed transport is closing. logrus/entry.go:188
WARN [PROC:1] Sync rotation state cycle failed: watcher has disconnected, going to retry after 10s. logrus/entry.go:188

Looking forward to any suggestions.
Regards,

P.

Anyone who has this problem - please follow the instructions in my previous post for further assistance.

Version:
Teleport v4.0.0 git:v4.0.0-0-gc7f55ac3 go1.12.1

teleport.yaml:
#
# Sample Teleport configuration file.
#
teleport:
    nodename: proxy
    data_dir: /var/lib/teleport
    pid_file: /var/run/teleport.pid
    auth_token: cluster-join-token
    auth_servers:
        - 0.0.0.0:3025
    connection_limits:
        max_connections: 15000
        max_users: 250
    log:
        # output: stdout
        # severity: info
    ca_pin: ""

auth_service:
    # IMPORTANT: this line enables the proxy recording mode:
    session_recording: "proxy" # can also be "off" and "on" (default)

    # For better security it's recommended to enable host checking as well,
    # this is when the Teleport proxy will verify the identity of the
    # nodes. Teleport documentation covers how to issue host certificates,
    # but for simplicity of this tutorial we are disabling strict host
    # checking here
    proxy_checks_host_keys: no

    # turn 2FA off to make the tutorial easier to follow
    authentication:
        type: local
        second_factor: off

    listen_addr: 0.0.0.0:3025
    enabled: "yes"
    #listen_addr: 0.0.0.0:3025

    tokens:
        - proxy,node:cluster-join-token
    session_recording: ""
    client_idle_timeout: 0s
    disconnect_expired_cert: false
    keep_alive_count_max: 0

ssh_service:
    enabled: "yes"
    labels:
        db_role: master
        db_type: postgres
    commands:
        - name: hostname
          command: [/usr/bin/hostname]
          period: 1m0s
        - name: arch
          command: [/usr/bin/uname, -p]
          period: 1h0m0s

proxy_service:
    enabled: "yes"
    listen_addr: 0.0.0.0:3023
    web_listen_addr: 0.0.0.0:3080
    tunnel_listen_addr: 0.0.0.0:3024
    https_key_file: /var/lib/teleport/webproxy_key.pem
    https_cert_file: /var/lib/teleport/webproxy_cert.pem

Thanks for your help.

The auth_servers value in your config (- 0.0.0.0:3025) is incorrect: 0.0.0.0 is a bind address, not an address other services can connect to. If you provide this value, it needs to be the actual address of the auth server.

Try changing it to 127.0.0.1:3025 or, given that you’re running all the services on one node, just remove the auth_servers section entirely - the proxy will know that it can communicate locally with the auth server.
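
On a single-host setup, the change boils down to something like this (a rough sketch only - the sed one-liner, the config path and the systemd unit name are assumptions about your environment):

    # Point auth_servers at a reachable address instead of the bind address
    sudo sed -i 's|- 0.0.0.0:3025|- 127.0.0.1:3025|' /etc/teleport.yaml

    # Restart so the change takes effect, then issue a fresh join token
    sudo systemctl restart teleport
    sudo tctl nodes add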

Here you go:

  • The output of teleport version - Teleport v4.0.3 git:v4.0.3-0-g69c4868b go1.12.1
  • The contents of my Teleport config file (sensitive data replaced with <> placeholders)
    # By default, this file should be stored in /etc/teleport.yaml
    
    # This section of the configuration file applies to all teleport
    # services.
    teleport:
        # nodename allows to assign an alternative name this node can be reached by.
        # by default it's equal to hostname
        nodename: <nodename>
    
        # Data directory where Teleport daemon keeps its data.
        # See "Filesystem Layout" section above for more details.
        data_dir: /var/lib/teleport
    
        # Invitation token used to join a cluster. it is not used on
        # subsequent starts
        auth_token: xxxx-token-xxxx
    
        # Optional CA pin of the auth server. This enables more secure way of adding new
        # nodes to a cluster. See "Adding Nodes" section above.
        ca_pin: "sha256:7e12c17c20d9cb504bbcb3f0236be3f446861f1396dcbb44425fe28ec1c108f1"
    
        # When running in multi-homed or NATed environments Teleport nodes need
        # to know which IP it will be reachable at by other nodes
        #
        # This value can be specified as FQDN e.g. host.example.com
        advertise_ip: <FQDN of main teleport server>
    
        # list of auth servers in a cluster. you will have more than one auth server
        # if you configure teleport auth to run in HA configuration.
        # If adding a node located behind NAT, use the Proxy URL. e.g. 
        #  auth_servers:
        #     - teleport-proxy.example.com:3080
        auth_servers:
            - 0.0.0.0:3025
    
        # Teleport throttles all connections to avoid abuse. These settings allow
        # you to adjust the default limits
        connection_limits:
            max_connections: 1000
            max_users: 250
    
        # Logging configuration. Possible output values are 'stdout', 'stderr' and
        # 'syslog'. Possible severity values are INFO, WARN and ERROR (default).
        log:
            output: stderr
            severity: ERROR
    
        # Configuration for the storage back-end used for the cluster state and the
        # audit log. Several back-end types are supported. See "High Availability"
        # section of this Admin Manual below to learn how to configure DynamoDB, 
        # S3, etcd and other highly available back-ends.
        storage:
            # By default teleport uses the `data_dir` directory on a local filesystem
            type: dir
    
            # Array of locations where the audit log events will be stored. by
            # default they are stored in `/var/lib/teleport/log`
    #        audit_events_uri: ['file:///var/lib/teleport/log', 'dynamodb://events_table_name']
    
            # Use this setting to configure teleport to store the recorded sessions in
            # an AWS S3 bucket. see "Using Amazon S3" chapter for more information.
    #        audit_sessions_uri: 's3://example.com/path/to/bucket?region=us-east-1'
    
        # Cipher algorithms that the server supports. This section only needs to be
        # set if you want to override the defaults.
        ciphers:
          - aes128-ctr
          - aes192-ctr
          - aes256-ctr
          - aes128-gcm@openssh.com
          - chacha20-poly1305@openssh.com
    
        # Key exchange algorithms that the server supports. This section only needs
        # to be set if you want to override the defaults.
        kex_algos:
          - curve25519-sha256@libssh.org
          - ecdh-sha2-nistp256
          - ecdh-sha2-nistp384
          - ecdh-sha2-nistp521
    
        # Message authentication code (MAC) algorithms that the server supports.
        # This section only needs to be set if you want to override the defaults.
        mac_algos:
          - hmac-sha2-256-etm@openssh.com
          - hmac-sha2-256
    
        # List of the supported ciphersuites. If this section is not specified,
        # only the default ciphersuites are enabled.
        ciphersuites:
           - tls-rsa-with-aes-128-gcm-sha256
           - tls-rsa-with-aes-256-gcm-sha384
           - tls-ecdhe-rsa-with-aes-128-gcm-sha256
           - tls-ecdhe-ecdsa-with-aes-128-gcm-sha256
           - tls-ecdhe-rsa-with-aes-256-gcm-sha384
           - tls-ecdhe-ecdsa-with-aes-256-gcm-sha384
           - tls-ecdhe-rsa-with-chacha20-poly1305
           - tls-ecdhe-ecdsa-with-chacha20-poly1305
    
    
    # This section configures the 'auth service':
    auth_service:
        # Turns 'auth' role on. Default is 'yes'
        enabled: yes
    
        # A cluster name is used as part of a signature in certificates
        # generated by this CA.
        #
        # We strongly recommend to explicitly set it to something meaningful as it
        # becomes important when configuring trust between multiple clusters.
        #
        # By default an automatically generated name is used (not recommended)
        #
        # IMPORTANT: if you change cluster_name, it will invalidate all generated
        # certificates and keys (may need to wipe out /var/lib/teleport directory)
        cluster_name: <cluster name>
    
        authentication:
            # default authentication type. possible values are 'local', 'oidc' and 'saml'
            # only local authentication (Teleport's own user DB) is supported in the open
            # source version
            type: local
            # second_factor can be off, otp, or u2f
            second_factor: otp
            # this section is used if second_factor is set to 'u2f'
            u2f:
                # app_id must point to the URL of the Teleport Web UI (proxy) accessible
                # by the end users
                app_id: https://<FQDN of main teleport server>:3080
                # facets must list all proxy servers if there are more than one deployed
                facets:
                - https://<FQDN of main teleport server>:3080
    
        # IP and the port to bind to. Other Teleport nodes will be connecting to
        # this port (AKA "Auth API" or "Cluster API") to validate client
        # certificates
        listen_addr: 0.0.0.0:3025
    
        # The optional DNS name the auth server if located behind a load balancer.
        # (see public_addr section below)
        public_addr: <FQDN of main teleport server>:3025
    
        # Pre-defined tokens for adding new nodes to a cluster. Each token specifies
        # the role a new node will be allowed to assume. The more secure way to
        # add nodes is to use the `tctl nodes add --ttl` command to generate auto-expiring
        # tokens.
        #
        # We recommend to use tools like `pwgen` to generate sufficiently random
        # tokens of 32+ byte length.
        tokens:
            - "proxy,node:xxxxx"
            - "auth:yyyy"
            - trusted_cluster:<token>
    
        # Optional setting for configuring session recording. Possible values are:
        #    "node"  : sessions will be recorded on the node level  (the default)
        #    "proxy" : recording on the proxy level, see "recording proxy mode" section.
        #    "off"   : session recording is turned off
        session_recording: "node"
    
        # This setting determines if a Teleport proxy performs strict host key checks.
        # Only applicable if session_recording=proxy, see "recording proxy mode" for details.
        proxy_checks_host_keys: yes
    
        # Determines if SSH sessions to cluster nodes are forcefully terminated
        # after no activity from a client (idle client).
        # Examples: "30m", "1h" or "1h30m"
        client_idle_timeout: never
    
        # Determines if the clients will be forcefully disconnected when their
        # certificates expire in the middle of an active SSH session. (default is 'no')
        disconnect_expired_cert: no
    
        # License file to start auth server with. Note that this setting is ignored
        # in open-source Teleport and is required only for Teleport Pro, Business
        # and Enterprise subscription plans.
        #
        # The path can be either absolute or relative to the configured `data_dir`
        # and should point to the license file obtained from Teleport Download Portal.
        #
        # If not set, by default Teleport will look for the `license.pem` file in
        # the configured `data_dir`.
        license_file: /var/lib/teleport/license.pem
    
    #    # DEPRECATED in Teleport 3.2 (moved to proxy_service section)
    #    kubeconfig_file: /path/to/kubeconfig
    
    # This section configures the 'node service':
    ssh_service:
        # Turns 'ssh' role on. Default is 'yes'
        enabled: yes
    
        # IP and the port for SSH service to bind to.
        listen_addr: 0.0.0.0:3022
    
        # The optional public address the SSH service. This is useful if administrators
        # want to allow users to connect to nodes directly, bypassing a Teleport proxy
        # (see public_addr section below)
        public_addr: <FQDN of main teleport server>:3022
    
        # See explanation of labels in "Labeling Nodes" section below
        labels:
            role: master
            type: postgres
    
        # List of the commands to periodically execute. Their output will be used as node labels.
        # See "Labeling Nodes" section below for more information and more examples.
        commands:
        # this command will add a label 'arch=x86_64' to a node
        - name: arch
          command: ['/bin/uname', '-p']
          period: 1h0m0s
    
        # enables reading ~/.tsh/environment before creating a session. by default
        # set to false, can be set true here or as a command line flag.
        permit_user_env: false
    
        # configures PAM integration. see below for more details.
        pam:
            enabled: no
            service_name: teleport
    
    # This section configures the 'proxy service'
    proxy_service:
        # Turns 'proxy' role on. Default is 'yes'
        enabled: yes
    
        # SSH forwarding/proxy address. Command line (CLI) clients always begin their
        # SSH sessions by connecting to this port
        listen_addr: 0.0.0.0:3023
    
        # Reverse tunnel listening address. An auth server (CA) can establish an
        # outbound (from behind the firewall) connection to this address.
        # This will allow users of the outside CA to connect to behind-the-firewall
        # nodes.
        tunnel_listen_addr: 0.0.0.0:3024
    
        # The HTTPS listen address to serve the Web UI and also to authenticate the
        # command line (CLI) users via password+HOTP
        web_listen_addr: 0.0.0.0:3080
    
        # The DNS name the proxy HTTPS endpoint as accessible by cluster users.
        # Defaults to the proxy's hostname if not specified. If running multiple
        # proxies behind a load balancer, this name must point to the load balancer
        # (see public_addr section below)
        public_addr: <FQDN of main teleport server>:3080
    
        # The DNS name of the proxy SSH endpoint as accessible by cluster clients.
        # Defaults to the proxy's hostname if not specified. If running multiple proxies 
        # behind a load balancer, this name must point to the load balancer. 
        # Use a TCP load balancer because this port uses SSH protocol.
        ssh_public_addr: <FQDN of main teleport server>:3023
    
        # TLS certificate for the HTTPS connection. Configuring these properly is
        # critical for Teleport security.
    #    https_key_file: /var/lib/teleport/webproxy_key.pem
    #    https_cert_file: /var/lib/teleport/webproxy_cert.pem
        https_key_file: /etc/teleport/<private>.key
        https_cert_file: /etc/teleport/<private>.crt
    
        # This section configures the Kubernetes proxy service
        kubernetes:
            # Turns 'kubernetes' proxy on. Default is 'no'
            enabled: no
    
            # Kubernetes proxy listen address.
            listen_addr: 0.0.0.0:3026
    
            # The DNS name of the Kubernetes proxy server that is accessible by cluster clients.
            # If running multiple proxies behind  a load balancer, this name must point to the 
            # load balancer.
            public_addr: ['<FQDN of main teleport server>:3026']
    
            # This setting is not required if the Teleport proxy service is 
            # deployed inside a Kubernetes cluster. Otherwise, Teleport proxy 
            # will use the credentials from this file:
            kubeconfig_file: /path/to/kube/config
    

Thank you so much @gus, your answer was right to the point. But I would like to mention that this is the default generated teleport.yaml. I hope this bug can be fixed in the generator.

@Otosamaa Interesting - you’re absolutely right. I’ll look at whether we can get that changed in future. Glad you’ve got it fixed for now.

@gus How do I mark this answer as the solution?

@Otosamaa I don’t think we have that plugin installed!

Edit: figured it out and marked that answer as the solution 🙂

Edit 2: I’ve raised https://github.com/gravitational/teleport/issues/2891 to address this issue.

@piotrjaksa I think you have the same problem - you need to set your auth_servers to an actual hostname of the server rather than the bind address (0.0.0.0).

You can either delete the auth_servers entry entirely if you’re running the auth server on the same host, or set it to 127.0.0.1:3025.
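
Once you’ve made the change and restarted, a quick sanity check on the auth host (assuming you run tctl as root there, so it can talk to the local auth server directly):

    # Should print the cluster name, Teleport version and CA pins
    sudo tctl status

    # After the node joins successfully, it should show up here
    sudo tctl nodes ls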

@sakura See the accepted answer in this thread - it may well solve your problem.

Thanks so much for your effort, and thanks for raising the ticket. The system is running great in the staging environment and should move to production soon. 😃

Thanks for your suggestion. Unfortunately this fix made no difference - still the same issues.

@piotrjaksa

  1. Did you delete the auth_servers entry completely or did you change it? If you changed it, what did you change it to?

  2. Have you restarted Teleport since making this change?

  3. Can you please try stopping Teleport, deleting the contents of /var/lib/teleport, then restarting Teleport and adding a node, as sketched below? This will wipe out your cluster database completely, but I’m presuming this is a staging/test cluster so it shouldn’t matter.
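
For reference, step 3 would look something like this on a systemd-managed install (an illustrative sketch assuming the default data_dir of /var/lib/teleport):

    sudo systemctl stop teleport
    # WARNING: this removes all cluster state - CAs, host certificates and the node database
    sudo rm -rf /var/lib/teleport/*
    sudo systemctl start teleport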