initial
This commit is contained in:
commit
66db9ee20c
127
README.md
Normal file
127
README.md
Normal file
@ -0,0 +1,127 @@
|
||||
# Carrier-grade NAT demo (work in progress)
|
||||
|
||||
> **Current state**: cross-VRF routing is working, but NAT breaks it.
|
||||
>
|
||||
> conntrack log shows state is immediately destroyed after it gets created,
|
||||
> and the packet is "lost" between `up` and `muplink`.
|
||||
|
||||
The basic idea of `100.64.0.0/10` seems to be that a CGN-Router should be able to handle multiple interfaces using `100.64.0.0/10` (including an uplink), but keeping them separated.
|
||||
|
||||
Now theoretically it should work moving each interface (apart from the uplink) into a different network namespace, connect all network namespaces with `veth` pairs to the main one (using some other IP addresses...), and enable SNAT when forwarding packets to the main namespace, and SNAT again when forwarding to the uplink.
|
||||
|
||||
This demo tries to use VRFs; hopefully this results in having to NAT only once (and doesn't need additional local IP addresses).
|
||||
|
||||
To test yourself run `./cgnat-demo.sh` as root (doesn't need network, so feel free to use some isolated container/VM/...):
|
||||
- spawns `tmux` with multiple windows after setup is done (`ip vrf/netns exec ...` and others)
|
||||
- `tmux` is configured to use `ctrl-a` prefix (like screen)
|
||||
- `tmux` shouldn't be detached; default detach keybind (`ctrl-a d`) is replaced to prompt for session destroy
|
||||
|
||||
Dependencies:
|
||||
|
||||
- `nftables` for NAT / trace
|
||||
- `conntrack` to show conntrack events
|
||||
- `tmux` to open shells in various contexts
|
||||
|
||||
## Example pings
|
||||
|
||||
- Working in `blue_c2`:
|
||||
- `ping -I 192.0.2.2 192.0.2.1` - ping `uplink` "public" IP
|
||||
- `ping 100.64.0.1` - ping `blue_c1`
|
||||
- `ping 2001:db8:b:10::1` - ping `blue_c1`
|
||||
- `ping 100.127.255.254` - ping gateway
|
||||
- `ping 2001:db8:b:10::ffff` - ping gateway
|
||||
- `ping 2001:db8:b:20::1` - ping `red_c1`
|
||||
- `ping 2001:db8:a::ffff` - ping `uplink`
|
||||
- `ping 2001:db8:a::1` - ping `main` (i.e. `up:muplink`)
|
||||
- Broken everywhere but `uplink`:
|
||||
- `ping 192.0.2.1`
|
||||
- Broken in `up`:
|
||||
- `ping 100.127.255.254` (works as soon NAT gets disabled)
|
||||
|
||||
## Basic design
|
||||
|
||||
- Run everything in a separate network+mount+UTS namespace
|
||||
- Explicit VRFs for everything, including the uplink
|
||||
- Uplink VRF (`up`) with `muplink` interface
|
||||
- Two client VRFs (`blue` and `red`), each with a brigde to connect clients to
|
||||
- Simulate an uplink with one client (in namespace `uplink`)
|
||||
- Simulate two clients in VRF `blue` (namespaces `blue_c1` and `blue_c2`)
|
||||
- Simulate one clients in VRF `red` (namespace `red_c1`)
|
||||
- IPv4: NAT from client VRFs (`blue` and `red`) to uplink `up`
|
||||
- IPv6: no NAT, proper routing
|
||||
- Route `192.0.2.2` from uplink all the way through to `blue_c2` (test IPv4 cross-VRF connectivity without NAT)
|
||||
|
||||
Topology:
|
||||
|
||||
```
|
||||
+--------------------+ +-----------------------+ +--------------------+
|
||||
| uplink: | | main: | | blue_c1: |
|
||||
| lo | | lo | | lo |
|
||||
| | | up (vrf) | +--=---> cuplink (veth) |
|
||||
| client1 (veth) <-=-=--> muplink (veth) | | +--------------------+
|
||||
+--------------------+ | blue (vrf) | |
|
||||
| br-blue (bridge) | | +--------------------+
|
||||
| blue_c1 (veth) <-=--+ | blue_c2: |
|
||||
| blue_c2 (veth) <-=--+ | lo |
|
||||
| red (vrf) | +--=---> cuplink (veth) |
|
||||
| br-red (bridge) | +--------------------+
|
||||
| red_c1 (veth) <--=--+
|
||||
+-----------------------+ | +--------------------+
|
||||
| | red_c1: |
|
||||
| | lo |
|
||||
+--=---> cuplink (veth) |
|
||||
+--------------------+
|
||||
```
|
||||
|
||||
## Basic VRF setup
|
||||
|
||||
Proper VRF `ip rule` setup with unreachables if VRF table didn't succeed:
|
||||
|
||||
```
|
||||
1000: from all lookup [l3mdev-table]
|
||||
2000: from all lookup [l3mdev-table] unreachable
|
||||
32765: from all lookup local
|
||||
32766: from all lookup main
|
||||
```
|
||||
|
||||
(+ `lookup default` in IPv4)
|
||||
|
||||
## `uplink` configuration
|
||||
|
||||
- Address `192.0.2.1/32` on `lo`
|
||||
- Addresses `100.127.255.254/10` and `2001:db8:a::ffff/64` on `client1`
|
||||
- Route `2001:db8:b::/48 via 2001:db8:a::1 dev client1`
|
||||
- Route `192.0.2.2 via 100.64.0.1 dev client1`
|
||||
|
||||
## `main:up` configuration
|
||||
|
||||
- Addresses `100.64.0.1/10` and `2001:db8:a::1/64` on `muplink`
|
||||
- Route `default via 100.127.255.254 dev muplink` and `default via 2001:db8:a::ffff dev muplink`
|
||||
- Route `2001:db8:b:10::/64 dev blue` (forward to VRF `blue`)
|
||||
- Route `2001:db8:b:20::/64 dev red` (forward to VRF `red`)
|
||||
- Route `192.0.2.2 dev blue` (forward to VRF `blue`)
|
||||
|
||||
## `main:blue` configuration
|
||||
|
||||
- Addresses `100.127.255.254/10` and `2001:db8:b:10::ffff/64` on `br-blue`
|
||||
- Route `default dev up` (IPv4 + IPv6) - forward to VRF `up`
|
||||
- Route `192.0.2.2 dev br-blue` (connected in `blue_c2`)
|
||||
|
||||
## `main:red` configuration
|
||||
|
||||
- Addresses `100.127.255.254/10` and `2001:db8:b:20::ffff/64` on `br-red`
|
||||
- Route `default dev up` (IPv4 + IPv6) - forward to VRF `up`
|
||||
|
||||
## client configuration
|
||||
|
||||
- Addresses on `cuplink`:
|
||||
- `blue_c1`: `100.64.0.1/10` and `2001:db8:b:10::1/64`
|
||||
- `blue_c2`: `100.64.0.2/10` and `2001:db8:b:10::2/64`, also `192.0.2.2/32`
|
||||
- `red_c1`: `100.64.0.1/10` and `2001:db8:b:20::1/64`
|
||||
- Route `default via 100.127.255.254 dev cuplink`
|
||||
- Route `default via 2001:db8:b:$$$$::ffff dev cuplink` (depending on `blue`/`red`)
|
||||
|
||||
## TODO
|
||||
|
||||
- get NAT working
|
||||
- test whether one can route to `lo` instead of VRF `up` (and drop VRF `up`), or whether there are other ways for for cross-VRF routing
|
152
cgnat-demo.sh
Executable file
152
cgnat-demo.sh
Executable file
@ -0,0 +1,152 @@
|
||||
#!/bin/bash
|
||||
|
||||
set -e
|
||||
|
||||
if [ "$1" != "--inner" ]; then
|
||||
if [ ! -d "/run/netns" ]; then
|
||||
mkdir /run/netns
|
||||
chmod 0755 /run/netns
|
||||
fi
|
||||
|
||||
export tmpdir=$(mktemp -p /run -d netns-cgnat-demo-XXXXXXX)
|
||||
trap 'rm -rf "${tmpdir}"' EXIT
|
||||
export NAMESPACEDIR="${tmpdir}/netns"
|
||||
mkdir "${NAMESPACEDIR}"
|
||||
chmod 0755 "${NAMESPACEDIR}"
|
||||
# Run actuall demo in network+mount+UTS namespaces
|
||||
unshare -m -n -u -- "$0" --inner
|
||||
echo "Cleaning up"
|
||||
# cleanup afterwards
|
||||
exit 0
|
||||
fi
|
||||
|
||||
show_failed_command() {
|
||||
local rc=$?
|
||||
if [ "${rc}" -ne 0 ]; then
|
||||
printf 'Failed command: %s\n' "${BASH_COMMAND}"
|
||||
fi
|
||||
exit $rc
|
||||
}
|
||||
|
||||
trap show_failed_command EXIT
|
||||
|
||||
cd "$(dirname "$(readlink -f "$0")")"
|
||||
|
||||
cp tmux_base.conf "${tmpdir}/tmux.conf"
|
||||
printf >>"${tmpdir}/tmux.conf" 'new-session -n main -s cgnat-demo "%s"\n' "${SHELL} -i"
|
||||
printf >>"${tmpdir}/tmux.conf" 'new-window -d -n trace "nft monitor trace"\n'
|
||||
printf >>"${tmpdir}/tmux.conf" 'new-window -d -n conntrack "conntrack -E -o timestamp"\n'
|
||||
|
||||
# setup local ip-netns "namespace" (so ip-netns names don't conflict with other stuff)
|
||||
mount -o bind "${NAMESPACEDIR}" /run/netns
|
||||
mount --make-private /run/netns
|
||||
|
||||
# gonna do routing
|
||||
sysctl -q net.ipv4.ip_forward=1
|
||||
sysctl -q net.ipv6.conf.default.forwarding=1
|
||||
sysctl -q net.ipv6.conf.all.forwarding=1
|
||||
|
||||
# basic setup of our main network namespace
|
||||
ip link set dev lo up
|
||||
./fix-vrf-rules.sh
|
||||
|
||||
netns() {
|
||||
local name="$1"
|
||||
shift
|
||||
ip netns exec "${name}" "$@"
|
||||
}
|
||||
|
||||
create_netns() {
|
||||
local name="$1"
|
||||
|
||||
ip netns add "${name}"
|
||||
# basic setup
|
||||
ip -n "${name}" link set dev lo up
|
||||
netns "${name}" ./fix-vrf-rules.sh
|
||||
}
|
||||
|
||||
# build explicit VRF to uplink (and route others through)
|
||||
ip link add name "up" type vrf table "1"
|
||||
ip link set dev "up" up
|
||||
printf >>"${tmpdir}/tmux.conf" 'new-window -d -n up -e debian_chroot=up "%s"\n' "ip vrf exec up ${SHELL} -i"
|
||||
|
||||
export UPLINK="100.127.255.254" # last usable ip in 100.64.0.0/10
|
||||
export UPLINK6="2001:db8:a::ffff"
|
||||
export PUBLIC="192.0.2.1"
|
||||
|
||||
# build "uplink": uplink has one client: the "main" netns
|
||||
create_netns "uplink"
|
||||
ip link add name muplink type veth peer client1
|
||||
ip link set dev client1 netns "uplink"
|
||||
ip -n "uplink" address add "${PUBLIC}/32" dev lo
|
||||
ip -n "uplink" link set dev client1 up
|
||||
ip -n "uplink" address add "${UPLINK}/10" dev client1
|
||||
ip -n "uplink" address add "${UPLINK6}/64" dev client1
|
||||
ip -n "uplink" route add "2001:db8:b::/48" via 2001:db8:a::1 dev client1
|
||||
ip link set dev muplink vrf "up" up
|
||||
ip address add 100.64.0.1/10 dev muplink
|
||||
ip address add 2001:db8:a::1/64 dev muplink
|
||||
ip route add default vrf "up" via "${UPLINK}" dev muplink
|
||||
ip -6 route add default vrf "up" via "${UPLINK6}" dev muplink
|
||||
printf >>"${tmpdir}/tmux.conf" 'new-window -d -n uplink -e debian_chroot=uplink "%s"\n' "ip netns exec uplink ${SHELL} -i"
|
||||
|
||||
declare -A VRFIDS
|
||||
|
||||
create_client_vrf() {
|
||||
local vrfname="$1"
|
||||
local vrfid=$2
|
||||
VRFIDS[${vrfname}]=${vrfid}
|
||||
ip link add name "${vrfname}" type vrf table "${vrfid}"
|
||||
ip link add name "br-${vrfname}" type bridge
|
||||
ip link set dev "br-${vrfname}" master "${vrfname}" up
|
||||
ip link set dev "${vrfname}" up
|
||||
ip address add "${UPLINK}/10" dev "br-${vrfname}"
|
||||
ip address add "2001:db8:b:${vrfid}::ffff/64" dev "br-${vrfname}"
|
||||
ip route add "2001:db8:b:${vrfid}::/64" vrf "up" dev "${vrfname}" # route-leak IPv6 clients
|
||||
ip route add default vrf "${vrfname}" dev "up" # route-leak uplink
|
||||
ip -6 route add default vrf "${vrfname}" dev "up" # route-leak uplink
|
||||
|
||||
printf >>"${tmpdir}/tmux.conf" 'new-window -d -n %s -e debian_chroot=%s "%s"\n' "${vrfname}" "${vrfname}" "ip vrf exec ${vrfname} ${SHELL} -i"
|
||||
}
|
||||
|
||||
create_client() {
|
||||
local vrfname="$1"
|
||||
local name="$2"
|
||||
local id="$3"
|
||||
local vrfid=${VRFIDS[$vrfname]}
|
||||
local ip="100.64.0.${id}/10"
|
||||
local ipv6="2001:db8:b:${vrfid}::${id}/64"
|
||||
|
||||
create_netns "${name}"
|
||||
ip link add name "${name}" type veth peer cuplink
|
||||
ip link set dev cuplink netns "${name}"
|
||||
ip -n "${name}" address add "${ip}" dev cuplink
|
||||
ip -n "${name}" address add "${ipv6}" dev cuplink
|
||||
ip -n "${name}" link set dev cuplink up
|
||||
ip -n "${name}" route add default via "${UPLINK}" dev cuplink
|
||||
ip -n "${name}" route add default via "2001:db8:b:${vrfid}::ffff" dev cuplink
|
||||
sysctl -q "net.ipv6.conf.${name}.disable_ipv6=1" # disable ipv6 on bridge slave
|
||||
ip link set dev "${name}" master "br-${vrfname}" up
|
||||
|
||||
printf >>"${tmpdir}/tmux.conf" 'new-window -d -n %s -e debian_chroot=%s "%s"\n' "${name}" "${name}" "ip netns exec ${name} ${SHELL} -i"
|
||||
}
|
||||
|
||||
# setup firewall / NAT
|
||||
/usr/sbin/nft -f nft.conf
|
||||
|
||||
create_client_vrf "blue" 10
|
||||
create_client "blue" "blue_c1" 1
|
||||
create_client "blue" "blue_c2" 2
|
||||
create_client_vrf "red" 20
|
||||
create_client "red" "red_c1" 1
|
||||
|
||||
# without NAT ipv4 seems to be working:
|
||||
ip -n "blue_c2" address add "192.0.2.2/32" dev cuplink
|
||||
ip route add "192.0.2.2/32" vrf "blue" dev br-blue # on bridge in vrf blue
|
||||
ip route add "192.0.2.2/32" vrf "up" dev blue # leak to vrf up
|
||||
ip -n "uplink" route add "192.0.2.2/32" via "100.64.0.1" dev client1 # static route in uplink
|
||||
|
||||
echo
|
||||
echo "--- Have fun checking it out yourself (exit the shell to close the experiment)."
|
||||
export debian_chroot=cgnat-demo
|
||||
exec tmux -L "cgnat-demo-$$" -f "${tmpdir}/tmux.conf" attach
|
61
fix-vrf-rules.sh
Executable file
61
fix-vrf-rules.sh
Executable file
@ -0,0 +1,61 @@
|
||||
#!/bin/sh
|
||||
|
||||
# Creating a VRF on linux (like `ip link add vrf_foobar type vrf table 10`) automatically inserts a
|
||||
# `l3mdev` rule (both IPv4 and IPv6) with preference 1000 by default.
|
||||
#
|
||||
# Sadly this means that the `lookup local` with preference 0 (the table `local` containing your
|
||||
# addresses in the "default VRF") is queried before that, which breaks routing of packets from a
|
||||
# VRF to your non-VRF addresses.
|
||||
#
|
||||
# So you actually want the `l3mdev` rule before the `lookup local` rule, and this script helps with
|
||||
# that.
|
||||
#
|
||||
# Your VRF routing table usually is contained completely in the table you specified when creating
|
||||
# the VRF; this script also creates an "pref 2000 l3mdev unreachable" rule to make sure within VRFs
|
||||
# no routes "outside" the VRF are used. (As an alternative you could add `unreachable default
|
||||
# metric 4278198272` routes in both IPv4 and IPv6 VRF tables).
|
||||
#
|
||||
# This should still leave enough room to add policy-based routing rules if you need them.
|
||||
#
|
||||
# Also see `vrf_prepare()` and `vrf_create()` in linux kernel
|
||||
# source:tools/testing/selftests/net/forwarding/lib.sh
|
||||
|
||||
set -e
|
||||
|
||||
has_rule() {
|
||||
if [ -n "$(ip $family rule list "$@")" ]; then
|
||||
# echo "Have: ip $family rule $*"
|
||||
return 0
|
||||
else
|
||||
# echo "Have not: ip $family rule $*"
|
||||
return 1
|
||||
fi
|
||||
}
|
||||
|
||||
rule() {
|
||||
# echo "Running: ip $family rule $*"
|
||||
ip $family rule "$@"
|
||||
}
|
||||
|
||||
run() {
|
||||
# move lookup local to pref 32765 (from 0)
|
||||
if ! has_rule pref 32765 lookup local; then
|
||||
rule add pref 32765 lookup local
|
||||
fi
|
||||
if has_rule pref 0 lookup local; then
|
||||
rule del pref 0 lookup local
|
||||
fi
|
||||
# make sure that in VRFs after failed lookup in the VRF specific table nothing else is reached
|
||||
if ! has_rule pref 1000 l3mdev; then
|
||||
# this should be added by the kernel when a VRF is created; add it here for completeness
|
||||
rule add pref 1000 l3mdev protocol kernel
|
||||
fi
|
||||
if ! has_rule pref 2000 l3mdev; then # can't search for actions; so can't make sure this is actually using "unreachable"
|
||||
rule add pref 2000 l3mdev unreachable
|
||||
fi
|
||||
}
|
||||
|
||||
family=-4
|
||||
run
|
||||
family=-6
|
||||
run
|
66
nft.conf
Normal file
66
nft.conf
Normal file
@ -0,0 +1,66 @@
|
||||
#!/usr/sbin/nft -f
|
||||
|
||||
flush ruleset
|
||||
|
||||
# Counting IPv4 packets in `inet` tables:
|
||||
# meta nfproto ipv4 counter accept
|
||||
|
||||
# NAT when routing packets from some VRF to "up" VRF
|
||||
table inet nat {
|
||||
chain postrouting {
|
||||
type nat hook postrouting priority srcnat; policy accept;
|
||||
ip saddr 100.64.0.0/10 oif "up" counter masquerade
|
||||
# 192.0.2.2 is statically routed: stops working as soon as NAT is enabled
|
||||
# ip saddr 192.0.2.2/32 oif "up" counter masquerade
|
||||
accept # less noise in trace
|
||||
}
|
||||
|
||||
# pre kernel 4.18 needs this:
|
||||
chain prerouting {
|
||||
type nat hook prerouting priority -100; policy accept;
|
||||
accept # less noise in trace
|
||||
}
|
||||
}
|
||||
|
||||
# Trace all IPv4:
|
||||
# define filter hooks so we see packets tracing through them
|
||||
|
||||
table inet main {
|
||||
chain prerouting {
|
||||
type filter hook prerouting priority filter; policy accept;
|
||||
accept # less noise in trace
|
||||
}
|
||||
|
||||
chain input {
|
||||
type filter hook input priority filter; policy accept;
|
||||
accept # less noise in trace
|
||||
}
|
||||
|
||||
chain forward {
|
||||
type filter hook forward priority filter; policy accept;
|
||||
accept # less noise in trace
|
||||
}
|
||||
|
||||
chain output {
|
||||
type filter hook output priority filter; policy accept;
|
||||
accept # less noise in trace
|
||||
}
|
||||
|
||||
chain postrouting {
|
||||
type filter hook postrouting priority filter; policy accept;
|
||||
accept # less noise in trace
|
||||
}
|
||||
}
|
||||
|
||||
# enable tracing for all IPv4 packets (either start in prerouting or output)
|
||||
table ip traceall {
|
||||
chain prerouting {
|
||||
type filter hook prerouting priority -350; policy accept;
|
||||
meta nftrace set 1 accept
|
||||
}
|
||||
|
||||
chain output {
|
||||
type filter hook output priority -350; policy accept;
|
||||
meta nftrace set 1 accept
|
||||
}
|
||||
}
|
44
tmux_base.conf
Normal file
44
tmux_base.conf
Normal file
@ -0,0 +1,44 @@
|
||||
# screen like prefix
|
||||
set-option -g prefix C-a
|
||||
unbind-key C-b
|
||||
bind-key a send-prefix
|
||||
bind-key C-a last-window
|
||||
|
||||
# Ctrl-N for next window
|
||||
# bind-key -T root ^N next-window
|
||||
bind-key -n ^N next-window
|
||||
# Ctrl-P for previous window
|
||||
# bind-key -T root ^P previous-window
|
||||
bind-key -n ^P previous-window
|
||||
|
||||
# ctrl-arrow keys
|
||||
set-window-option -g xterm-keys on
|
||||
|
||||
# layout/colours
|
||||
set-option -g status-bg black
|
||||
set-option -g status-fg colour45
|
||||
set-option -g status-justify centre
|
||||
set-option -g status-keys vi
|
||||
set-option -g status-left "#[fg=green][ #H ]#[fg=red] [ #W ]"
|
||||
set-option -g status-left-length 40
|
||||
set-option -g status-right "#[fg=colour5][ %H:%M %d-%b-%y ]"
|
||||
#set-option -g status-utf8 on
|
||||
|
||||
set-window-option -g monitor-activity on
|
||||
set-window-option -g window-status-current-style bold
|
||||
set-window-option -g window-status-current-format "#[fg=colour196](#[fg=default]#I#F #W#[fg=colour196])"
|
||||
set-window-option -g window-status-format "[#I#F #W]"
|
||||
#set-window-option -g window-status-alert-fg color226
|
||||
|
||||
set-option -g set-titles on
|
||||
set-window-option -g automatic-rename off
|
||||
set-window-option -g allow-rename on
|
||||
|
||||
# destroy instead of detach
|
||||
bind-key d confirm-before -p "kill session #S? (y/n)" kill-session
|
||||
# vim style :quit / :q
|
||||
set-option -s command-alias[200] quit='confirm-before -p "kill session #S? (y/n)" kill-session'
|
||||
set-option -s command-alias[201] q='confirm-before -p "kill session #S? (y/n)" kill-session'
|
||||
|
||||
# new -n bash "exec /bin/bash"
|
||||
# ...
|
Loading…
x
Reference in New Issue
Block a user