Installer hanging at pre-flight checks

We’re seeing the gravity install hang during the pre-flight checks phase and can’t determine the cause. The gravity-system logs don’t show anything obviously wrong, but compared with a successful install, no PeerJoin event is logged after “Monitoring peers”.

2019-08-09T20:41:32Z DEBU [OPS]       create operation_install domain:logrocket opid:fd50cdc9-0163-46f7-9b17-72f61ed4024e install/hook.go:56
2019-08-09T20:41:32Z DEBU [OPS]       selected subnets: &{10.244.0.0/16 10.100.0.0/16} domain:logrocket opid:fd50cdc9-0163-46f7-9b17-72f61ed4024e install/hook.go:56
2019-08-09T20:41:32Z INFO [OPS]       Operation has been created domain:logrocket opid:fd50cdc9-0163-46f7-9b17-72f61ed4024e opstate:in_progress optype:operation_install progress:0 install/hook.go:56
2019-08-09T20:41:32Z DEBU             got operation: &ops.SiteOperationKey{AccountID:"00000000-0000-0000-0000-000000000001", SiteDomain:"logrocket", OperationID:"fd50cdc9-0163-46f7-9b17-72f61ed4024e"} install/hook.go:56
2019-08-09T20:41:32Z DEBU [INSTALLER] Got operation key: {00000000-0000-0000-0000-000000000001 logrocket fd50cdc9-0163-46f7-9b17-72f61ed4024e}. install/hook.go:56
2019-08-09T20:41:32Z DEBU [INSTALLER] Got operation state: &storage.InstallExpandOperationState{Profiles:map[string]storage.ServerProfile{"single-node":storage.ServerProfile{Description:"Generic Linux node", Labels:map[string]string{"logrocket.io/worker-node":"true", "role":"single-node", "gravitational.io/k8s-role":"", "logrocket.io/data-node":"true"}, ServiceRole:"", Request:storage.ServerProfileRequest{InstanceType:"", Count:1}}}, Servers:storage.Servers(nil), Agents:map[string]storage.AgentProfile{"single-node":storage.AgentProfile{Instructions:"\ncurl -s --tlsv1.2 -0 -k \"https://10.97.52.10:61009/t/.../single-node\" | sudo bash\n", AgentURL:"agent://10.97.52.10:61009/single-node?access_token=...", Token:"..."}}, Subnets:storage.Subnets{Overlay:"10.244.0.0/16", Service:"10.100.0.0/16"}, Vars:storage.OperationVariables{System:storage.SystemVariables{ClusterName:"logrocket", OpsURL:"https://10.97.52.10:61009/t", Devmode:false, Token:"...", TeleportProxyAddress:"", Docker:storage.DockerConfig{StorageDriver:"", Args:[]string(nil)}}, OnPrem:storage.OnPremVariables{PodCIDR:"10.244.0.0/16", ServiceCIDR:"10.100.0.0/16", VxlanPort:8472}, AWS:storage.AWSVariables{AMI:"", Region:"", AccessKey:"", SecretKey:"", SessionToken:"", VPCID:"", VPCCIDR:"", SubnetID:"", SubnetCIDR:"", InternetGatewayID:"", KeyPair:""}}, Package:loc.Locator{Repository:"gravitational.io", Name:"logrocket", Version:"9.663.1"}}. install/hook.go:56
2019-08-09T20:41:32Z DEBU [INSTALLER] Starting agent: agent://10.97.52.10:61009/single-node?access_token=.... install/hook.go:56
2019-08-09T20:41:32Z INFO [PEERS]     Monitoring. monitored:10.97.52.10:61009 install/hook.go:56
2019-08-09T20:41:32Z INFO [INSTALLER] Listening. addr:10.97.52.10:3012 install/hook.go:56
2019-08-09T20:41:32Z INFO [PEERS]     Reconnecting reconnected:peer(addr=10.97.52.10:3012->server=10.97.52.10:61009) install/hook.go:56
2019-08-09T20:41:32Z INFO [PEERS]     Monitoring peers. health.checker:peer(addr=10.97.52.10:61009) install/hook.go:56
2019-08-09T20:41:32Z DEBU [PROCESS]   Added group. key:{00000000-0000-0000-0000-000000000001 logrocket fd50cdc9-0163-46f7-9b17-72f61ed4024e} mode:installer process:10.97.52.10_61009 install/hook.go:56
2019-08-09T20:41:32Z INFO [PEERS]     Monitoring peers. health.checker: install/hook.go:56
2019-08-09T20:42:26Z DEBU [KEYGEN]    generated user key for [root] with expiry on (1565419346) 2019-08-10 06:42:26.039735689 +0000 UTC m=+36064.348023126 install/hook.go:56
2019-08-09T20:42:26Z INFO [CA]        Generating TLS certificate {0x5b90660 0xc0010aa470 CN=opscenter@gravitational.io,O=@teleadmin+O=default-implicit-role,L=root 2019-08-10 06:42:26.044406707 +0000 UTC []}. common_name:opscenter@gravitational.io dns_names:[] locality:[root] not_after:2019-08-10 06:42:26.044406707 +0000 UTC org:[@teleadmin default-implicit-role] org_unit:[] install/hook.go:56
2019-08-09T20:42:26Z DEBU [TELEPROXY] Renewed certificate for opscenter@gravitational.io. install/hook.go:56
2019-08-09T20:43:26Z DEBU [KEYGEN]    generated user key for [root] with expiry on (1565419406) 2019-08-10 06:43:26.038068574 +0000 UTC m=+36124.346356937 install/hook.go:56
2019-08-09T20:43:26Z INFO [CA]        Generating TLS certificate {0x5b90660 0xc000b25720 CN=opscenter@gravitational.io,O=@teleadmin+O=default-implicit-role,L=root 2019-08-10 06:43:26.042080813 +0000 UTC []}. common_name:opscenter@gravitational.io dns_names:[] locality:[root] not_after:2019-08-10 06:43:26.042080813 +0000 UTC org:[@teleadmin default-implicit-role] org_unit:[] install/hook.go:56
2019-08-09T20:43:26Z DEBU [TELEPROXY] Renewed certificate for opscenter@gravitational.io. install/hook.go:56
2019-08-09T20:43:30Z INFO [INSTALLER] Received "interrupt" signal. install/hook.go:56
2019-08-09T20:43:30Z INFO [PEERS]     Health checker loop closing. health.checker: install/hook.go:56

I canceled this instance of the install after 2 minutes, but previously we let it hang around for about 10 minutes with no progress. In the successful install logs (on another server) the Monitoring/Peer phase looks like this:

2019-06-04T09:29:07-04:00 DEBU [OPS]       create operation_install domain:logrocket opid:360acde5-7d55-4f65-85e4-38bb8125e36c install/hook.go:56
2019-06-04T09:29:07-04:00 DEBU [OPS]       selected subnets: &{10.244.0.0/16 10.100.0.0/16} domain:logrocket opid:360acde5-7d55-4f65-85e4-38bb8125e36c install/hook.go:56
2019-06-04T09:29:07-04:00 INFO [OPS]       Operation has been created domain:logrocket opid:360acde5-7d55-4f65-85e4-38bb8125e36c opstate:in_progress optype:operation_install progress:0 install/hook.go:56
2019-06-04T09:29:07-04:00 DEBU             got operation: &ops.SiteOperationKey{AccountID:"00000000-0000-0000-0000-000000000001", SiteDomain:"logrocket", OperationID:"360acde5-7d55-4f65-85e4-38bb8125e36c"} install/hook.go:56
2019-06-04T09:29:07-04:00 DEBU [INSTALLER] Got operation key: {00000000-0000-0000-0000-000000000001 logrocket 360acde5-7d55-4f65-85e4-38bb8125e36c}. install/hook.go:56
2019-06-04T09:29:07-04:00 DEBU [INSTALLER] Got operation state: &storage.InstallExpandOperationState{Profiles:map[string]storage.ServerProfile{"single-node":storage.ServerProfile{Description:"Generic Linux node", Labels:map[string]string{"gravitational.io/k8s-role":"", "logrocket.io/data-node":"true", "logrocket.io/worker-node":"true", "role":"single-node"}, ServiceRole:"", Request:storage.ServerProfileRequest{InstanceType:"", Count:1}}}, Servers:storage.Servers(nil), Agents:map[string]storage.AgentProfile{"single-node":storage.AgentProfile{Instructions:"\ncurl -s --tlsv1.2 -0 -k \"https://10.237.114.65:61009/t/.../single-node\" | sudo bash\n", AgentURL:"agent://10.237.114.65:61009/single-node?access_token=...", Token:"..."}}, Subnets:storage.Subnets{Overlay:"10.244.0.0/16", Service:"10.100.0.0/16"}, Vars:storage.OperationVariables{System:storage.SystemVariables{ClusterName:"logrocket", OpsURL:"https://10.237.114.65:61009/t", Devmode:false, Token:"...", TeleportProxyAddress:"", Docker:storage.DockerConfig{StorageDriver:"", Args:[]string(nil)}}, OnPrem:storage.OnPremVariables{PodCIDR:"10.244.0.0/16", ServiceCIDR:"10.100.0.0/16", VxlanPort:8472}, AWS:storage.AWSVariables{AMI:"", Region:"", AccessKey:"", SecretKey:"", SessionToken:"", VPCID:"", VPCCIDR:"", SubnetID:"", SubnetCIDR:"", InternetGatewayID:"", KeyPair:""}}, Package:loc.Locator{Repository:"gravitational.io", Name:"logrocket", Version:"9.580.0"}}. install/hook.go:56
2019-06-04T09:29:07-04:00 DEBU [INSTALLER] Starting agent: agent://10.237.114.65:61009/single-node?access_token=.... install/hook.go:56
2019-06-04T09:29:07-04:00 INFO [INSTALLER] Listening. addr:10.237.114.65:3012 install/hook.go:56
2019-06-04T09:29:07-04:00 INFO [PEERS]     Monitoring. monitored:10.237.114.65:61009 install/hook.go:56
2019-06-04T09:29:07-04:00 INFO [PEERS]     Reconnecting reconnected:peer(addr=10.237.114.65:3012->server=10.237.114.65:61009) install/hook.go:56
2019-06-04T09:29:07-04:00 INFO [PEERS]     Monitoring peers. health.checker:peer(addr=10.237.114.65:61009) install/hook.go:56
2019-06-04T09:29:07-04:00 DEBU             "PeerJoin((*proto.PeerJoinRequest)(addr:\"10.237.114.65:3012\" config:<role:\"single-node\" token:\"...\" > system_info:\"TRIMMED\" )\n)." install/hook.go:56
2019-06-04T09:29:07-04:00 INFO [PROCESS]   NewPeer(10.237.114.65:3012). mode:installer process:10.237.114.65_61009 install/hook.go:56
2019-06-04T09:29:07-04:00 DEBU [PROCESS]   Added group. key:{00000000-0000-0000-0000-000000000001 logrocket 360acde5-7d55-4f65-85e4-38bb8125e36c} mode:installer process:10.237.114.65_61009 install/hook.go:56
2019-06-04T09:29:07-04:00 INFO [PEERS]     Monitoring peers. health.checker: install/hook.go:56
2019-06-04T09:29:07-04:00 INFO [TELEPROXY] GetServers(logrocket, map[]) install/hook.go:56
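
The difference between the two excerpts can be checked mechanically. Below is a minimal Python sketch, assuming the gravity-system log has been saved as text. The function name is hypothetical, and the two marker strings are simply the ones that appear in the excerpts above, not a documented log format.

```python
def peer_join_after_monitoring(log_text: str) -> bool:
    """Return True if a PeerJoin entry is logged after a 'Monitoring peers.' line.

    The marker strings are copied from the log excerpts in this thread;
    they are an assumption about what to look for, not a stable interface.
    """
    seen_monitoring = False
    for line in log_text.splitlines():
        if "Monitoring peers." in line:
            # The installer has started watching for agents to connect.
            seen_monitoring = True
        elif seen_monitoring and "PeerJoin(" in line:
            # An agent connected back to the installer.
            return True
    return False
```

Run against the failing excerpt this returns False, and against the successful one it returns True, since only the latter contains a PeerJoin line after "Monitoring peers."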

This is on a bog-standard CentOS box. Any information or assistance you could provide in debugging this would be great.

Figured this out: a proxy was interfering with the connection request.
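
For anyone hitting the same symptom, here is a rough Python sketch of a check that could be run on the node before installing. It assumes the proxy is configured through the usual `http_proxy` / `https_proxy` / `no_proxy` environment variables, which this thread does not confirm, and it uses a simplified `no_proxy` match (exact host, `*`, CIDR, or domain suffix) that may differ from what a given tool actually implements. The function names are hypothetical.

```python
import ipaddress
import os

PROXY_VARS = ("http_proxy", "https_proxy", "HTTP_PROXY", "HTTPS_PROXY")


def is_bypassed(host, entries):
    """Simplified no_proxy match: '*', exact host, CIDR range, or domain suffix."""
    try:
        host_ip = ipaddress.ip_address(host)
    except ValueError:
        host_ip = None  # host is a name, not an IP address
    for entry in entries:
        if entry == "*" or entry == host:
            return True
        if host_ip is not None:
            try:
                if host_ip in ipaddress.ip_network(entry, strict=False):
                    return True
            except ValueError:
                pass  # entry is not an IP or CIDR range
        elif host.endswith("." + entry.lstrip(".")):
            return True
    return False


def proxy_report(host, env=None):
    """Report which proxy variables are set and whether host is exempt from them."""
    env = os.environ if env is None else env
    proxies = {name: env[name] for name in PROXY_VARS if env.get(name)}
    no_proxy = env.get("no_proxy") or env.get("NO_PROXY") or ""
    entries = [e.strip() for e in no_proxy.split(",") if e.strip()]
    return {"proxies": proxies, "bypassed": is_bypassed(host, entries)}


if __name__ == "__main__":
    # 10.97.52.10 is the node address from the failing log above.
    print(proxy_report("10.97.52.10"))
```

If the report shows a proxy set and `bypassed` is false for the node's own address, the agent's connection back to the installer on the same host may be going through the proxy. Adding the node address to `no_proxy`, or unsetting the proxy variables for the install, would be the thing to try.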