flannel migration: document removing leftover flannel iptables rules #2768
Open
stitrace wants to merge 1 commit into
The live-migration controller removes the flannel daemonset and deletes the flannel network devices, but it does not remove the iptables chains flannel programs (FLANNEL-POSTRTG in nat, FLANNEL-FWD in filter). These survive the migration and the FLANNEL-POSTRTG masquerade rule keeps SNAT-ing cross-node pod-to-pod traffic to the node tunnel IP, which silently breaks NetworkPolicy after migration. Add a cleanup step (with a reboot alternative) so operators can remove the leftover rules.

Description
The live-migration guide tells you to delete the migration controller once migration completes, but it does not mention that flannel leaves iptables rules behind on every node.
The migration controller removes the flannel daemonset and deletes the flannel network devices (flannel.<vni>, cni0), but it does not remove the iptables chains flannel programs: FLANNEL-POSTRTG (nat) and FLANNEL-FWD (filter). These survive the migration.
The masquerade rule in FLANNEL-POSTRTG keeps SNAT-ing cross-node pod-to-pod traffic to the node's tunnel IP. This is invisible until you use NetworkPolicy: the SNAT'd source no longer matches pod-selector rules, so Calico's default-deny drops the traffic. Symptom after an otherwise successful migration: cross-node connections to policy-selected pods silently time out, while same-node traffic keeps working.
This PR adds a cleanup step after "Delete the migration controller" with an idempotent flush command (legacy + nft backends) and a reboot alternative.
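For reviewers who want to see the shape of such a cleanup, here is a minimal sketch. It is not the exact command this PR adds to the guide. It assumes it runs as root on each node, that the jump rules live in the built-in POSTROUTING (nat) and FORWARD (filter) chains, and that GNU xargs is available. The `IPT` variable and the `remove_chain` function are hypothetical names introduced for illustration; on hosts that ship both backends, `IPT` would need to point at `iptables-legacy` or `iptables-nft` as appropriate.

```shell
#!/bin/sh
# Sketch only: remove flannel's leftover chains from one node.
# IPT is a hypothetical override for choosing the backend binary,
# e.g. IPT=iptables-legacy or IPT=iptables-nft.
IPT="${IPT:-iptables}"

# remove_chain TABLE PARENT CHAIN
# Deletes every rule in PARENT that jumps to CHAIN, then flushes and
# deletes CHAIN. Re-running is harmless: a missing chain is skipped.
remove_chain() {
    table=$1
    parent=$2
    chain=$3

    # Nothing to do if the chain is already gone (or cannot be listed).
    "$IPT" -t "$table" -S "$chain" >/dev/null 2>&1 || return 0

    # Rewrite each "-A PARENT ... -j CHAIN" rule spec into a "-D" command.
    # xargs honours the double quotes that -S puts around rule comments.
    "$IPT" -t "$table" -S "$parent" \
        | grep -- "-j $chain\$" \
        | sed 's/^-A /-D /' \
        | xargs -r -L1 "$IPT" -t "$table"

    "$IPT" -t "$table" -F "$chain"   # flush the chain's own rules
    "$IPT" -t "$table" -X "$chain"   # delete the now-empty chain
}

remove_chain nat    POSTROUTING FLANNEL-POSTRTG
remove_chain filter FORWARD     FLANNEL-FWD
```

Deleting the jump rules by their full spec, rather than by a bare `-j` match, is deliberate: a jump rule that carries extra matches such as a comment would not be matched by a shorter `-D` spec.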
Notes
calico/...); maintainers may want to backport to the versioned snapshots.
Reproduction
After migrating, on any node:
nft list chain ip nat POSTROUTING still shows jump FLANNEL-POSTRTG with non-zero counters; a cross-node listener sees the client's source as the sender node's tunnel IP (<block>.0) rather than the pod IP, and NetworkPolicy-selected pods become unreachable cross-node.
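The check above can be widened to the whole ruleset with a small filter. This is a sketch under stated assumptions: `leftover_flannel_chains` is a hypothetical helper name, and it only looks for the two chain names mentioned in this PR. It reads a ruleset dump on stdin, so it should work equally on `nft list ruleset` output, on `iptables-save` output (likely needed on nodes using the legacy backend), or on a saved file.

```shell
#!/bin/sh
# Sketch only: list which flannel chain names still appear in a
# ruleset dump supplied on stdin. Prints nothing when none are found.
leftover_flannel_chains() {
    grep -oE 'FLANNEL-(POSTRTG|FWD)' | sort -u
}

# On a node, as root. If nft is missing or not permitted, this prints
# nothing, which must not be read as "clean".
nft list ruleset 2>/dev/null | leftover_flannel_chains
```

An empty result after the cleanup step (or a reboot) would suggest the chains are gone; any printed name means the corresponding chain or a jump to it is still present.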