Gravity status reports overlay network issue with removed node

Description

Some recent Gravity releases are affected by an issue where a cluster may enter a “degraded” state after a node is removed, reporting the following error that mentions the removed node:

overlay packet loss for node <node-ip> is higher than the allowed threshold of 20.00%: 100.00%

This happens if the overlay checker detects a networking issue while the node is being removed. The warning then persists even after the node has been removed.

The following GitHub ticket describes the issue in more detail: https://github.com/gravitational/gravity/issues/1403.

Affected versions

The following versions may experience this issue:

  • 5.5.40-5.5.41
  • 6.1.21-6.1.22
  • 6.3.10-6.3.13
  • 7.0.1-7.0.3
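
As a quick way to compare a cluster's version against the ranges above, a minimal shell sketch follows. The function name is_affected is hypothetical, and it assumes the version is supplied as a plain MAJOR.MINOR.PATCH string; pre-release or build suffixes are treated as not affected rather than parsed.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) if the given version falls within
# one of the affected ranges listed above, fails (exit 1) otherwise.
is_affected() {
    version=$1
    # Split "MAJOR.MINOR.PATCH" into its components.
    major=${version%%.*}
    rest=${version#*.}
    minor=${rest%%.*}
    patch=${rest#*.}
    # Reject anything that is not a plain numeric patch level.
    case "$patch" in
        ''|*[!0-9]*) return 1 ;;
    esac
    # Pick the affected patch range for this release line.
    case "$major.$minor" in
        5.5) lo=40; hi=41 ;;
        6.1) lo=21; hi=22 ;;
        6.3) lo=10; hi=13 ;;
        7.0) lo=1;  hi=3  ;;
        *)   return 1 ;;
    esac
    [ "$patch" -ge "$lo" ] && [ "$patch" -le "$hi" ]
}
```

For example, is_affected 6.3.12 && echo "affected" prints the message, while a version outside the listed ranges prints nothing.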

Workaround

Recreating the “nethealth” pods in the monitoring namespace clears the bogus warnings:

kubectl -n monitoring delete pods -l k8s-app=nethealth
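
To confirm the workaround took effect, the delete can be followed by a look at the replacement pods and the cluster status. The sketch below wraps these steps in a function; the name clear_nethealth_warning is hypothetical, and it assumes kubectl and gravity are both available on the node where it runs. The status may need a short while to reflect the new pods, so the last step may have to be repeated.

```shell
#!/bin/sh
# Hypothetical helper: recreate the nethealth pods, list the replacements,
# then print the cluster status so the warning can be checked by eye.
clear_nethealth_warning() {
    # Delete the existing pods so that they are recreated.
    kubectl -n monitoring delete pods -l k8s-app=nethealth || return 1
    # List the pods with the same label to see the replacements come up.
    kubectl -n monitoring get pods -l k8s-app=nethealth || return 1
    # Show the cluster status; the packet loss warning should no longer appear.
    gravity status
}
```

Calling clear_nethealth_warning runs the three steps in order and stops early if either kubectl command fails.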