2021-04-25 13:22:38 +00:00
# Carrier-grade NAT demo (work in progress)
> **Current state**: cross-VRF routing works, but NAT breaks it.
>
> The conntrack log shows that the state is destroyed immediately after it is created,
> and the packet gets "lost" between `up` and `muplink`.

The basic idea of `100.64.0.0/10` seems to be that a CGN router should be able to serve multiple interfaces that all use `100.64.0.0/10` (including the uplink) while keeping them separated.

In theory this could be done by moving each interface (except the uplink) into its own network namespace, connecting all namespaces to the main one with `veth` pairs (using yet another set of IP addresses...), and enabling SNAT both when forwarding packets into the main namespace and again when forwarding to the uplink.

This demo tries to use VRFs instead, hoping that this requires NAT only once (and no additional local IP addresses).

To test it yourself, run `./cgnat-demo.sh` as root (it doesn't need network access, so feel free to use an isolated container/VM/...):
- spawns `tmux` with multiple windows once setup is done (`ip vrf/netns exec ...` and others)
- `tmux` is configured to use the `ctrl-a` prefix (like screen)
- `tmux` shouldn't be detached; the default detach keybinding (`ctrl-a d`) is replaced by a prompt to destroy the session
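
The keybindings described above could be expressed roughly as follows. This is a sketch assuming the bindings are applied with `tmux` commands against the demo's running server; the script may use a config file instead, and the prompt text is made up:

```shell
# Sketch: apply to the running tmux server (in a tmux.conf, drop the leading "tmux").
# Use ctrl-a as prefix, like screen.
tmux set-option -g prefix C-a
tmux unbind-key C-b
# ctrl-a ctrl-a sends a literal ctrl-a to the program in the pane.
tmux bind-key C-a send-prefix
# Replace "prefix d" (detach) with a confirmation prompt that kills the session.
tmux bind-key d confirm-before -p "kill-session? (y/n)" kill-session
```
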

Dependencies:
- `nftables` for NAT / trace
- `conntrack` to show conntrack events
- `tmux` to open shells in various contexts
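
A minimal sketch of how these tools can be used to watch the NAT problem. The `inet trace_demo` table and its chain names are hypothetical (not taken from the script), and only ICMP/ICMPv6 is traced so the example pings stay readable:

```shell
# Window 1: print conntrack events (NEW, UPDATE, DESTROY) as they happen.
conntrack -E

# Window 2: mark ICMP/ICMPv6 packets for tracing early in prerouting and output.
# Table and chain names are hypothetical.
nft add table inet trace_demo
nft add chain inet trace_demo pre '{ type filter hook prerouting priority -350; }'
nft add rule inet trace_demo pre meta l4proto '{ icmp, ipv6-icmp }' meta nftrace set 1
nft add chain inet trace_demo out '{ type filter hook output priority -350; }'
nft add rule inet trace_demo out meta l4proto '{ icmp, ipv6-icmp }' meta nftrace set 1
# Print the per-rule trace of every marked packet.
nft monitor trace
```
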
## Example pings
- Working in `blue_c2`:
  - `ping -I 192.0.2.2 192.0.2.1` - ping the `uplink` "public" IP
  - `ping 100.64.0.1` - ping `blue_c1`
  - `ping 2001:db8:b:10::1` - ping `blue_c1`
  - `ping 100.127.255.254` - ping gateway
  - `ping 2001:db8:b:10::ffff` - ping gateway
  - `ping 2001:db8:b:20::1` - ping `red_c1`
  - `ping 2001:db8:a::ffff` - ping `uplink`
  - `ping 2001:db8:a::1` - ping `main` (i.e. `up:muplink`)
- Broken everywhere but `uplink`:
  - `ping 192.0.2.1`
- Broken in `up`:
  - `ping 100.127.255.254` (works as soon as NAT is disabled)
## Basic design
- Run everything in a separate network+mount+UTS namespace
- Explicit VRFs for everything, including the uplink
- Uplink VRF (`up`) with `muplink` interface
- Two client VRFs (`blue` and `red`), each with a bridge to connect clients to
- Simulate an uplink with one client (in namespace `uplink`)
- Simulate two clients in VRF `blue` (namespaces `blue_c1` and `blue_c2`)
- Simulate one client in VRF `red` (namespace `red_c1`)
- IPv4: NAT from client VRFs (`blue` and `red`) to uplink `up`
- IPv6: no NAT, proper routing
- Route `192.0.2.2` from uplink all the way through to `blue_c2` (test IPv4 cross-VRF connectivity without NAT)

Topology:
```
+--------------------+   +------------------------+     +--------------------+
| uplink:            |   | main:                  |     | blue_c1:           |
|   lo               |   |   lo                   |     |   lo               |
|                    |   |   up (vrf)             |  +--=-> cuplink (veth)   |
|   client1 (veth) <-=---=---> muplink (veth)     |  |  +--------------------+
+--------------------+   |   blue (vrf)           |  |
                         |     br-blue (bridge)   |  |  +--------------------+
                         |       blue_c1 (veth) <-=--+  | blue_c2:           |
                         |       blue_c2 (veth) <-=--+  |   lo               |
                         |   red (vrf)            |  +--=-> cuplink (veth)   |
                         |     br-red (bridge)    |     +--------------------+
                         |       red_c1 (veth) <--=--+
                         +------------------------+  |  +--------------------+
                                                     |  | red_c1:            |
                                                     |  |   lo               |
                                                     +--=-> cuplink (veth)   |
                                                        +--------------------+
```
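
The topology could be rebuilt by hand roughly like this. It is a sketch using plain iproute2 commands, not an excerpt of `cgnat-demo.sh`; the VRF table IDs 10/20/30 are assumptions:

```shell
#!/bin/sh
# Sketch only: rebuilds the topology above. VRF table IDs are assumptions.
# Run as root inside a throwaway namespace, e.g. `unshare --net --mount --uts sh`.
set -e

ip link set dev lo up

# Namespaces for the simulated uplink and the clients.
for ns in uplink blue_c1 blue_c2 red_c1; do
    ip netns add "$ns"
    ip -n "$ns" link set dev lo up
done

# VRFs in the main namespace ("name"/"dev" spelled out, see note below).
ip link add name up type vrf table 10
ip link add name blue type vrf table 20
ip link add name red type vrf table 30
for vrf in up blue red; do
    ip link set dev "$vrf" up
done

# Uplink: muplink (VRF up) <-> client1 (namespace uplink).
ip link add name muplink type veth peer name client1 netns uplink
ip link set dev muplink master up
ip link set dev muplink up
ip -n uplink link set dev client1 up

# One bridge per client VRF.
for vrf in blue red; do
    ip link add name "br-$vrf" type bridge
    ip link set dev "br-$vrf" master "$vrf"
    ip link set dev "br-$vrf" up
done

# Clients: <client> (bridge port in main) <-> cuplink (inside the client namespace).
for client in blue_c1 blue_c2 red_c1; do
    case "$client" in
        blue_*) bridge=br-blue ;;
        red_*)  bridge=br-red ;;
    esac
    ip link add name "$client" type veth peer name cuplink netns "$client"
    ip link set dev "$client" master "$bridge"
    ip link set dev "$client" up
    ip -n "$client" link set dev cuplink up
done
```

The explicit `name`/`dev` keywords matter with a VRF called `up`: `ip link set up up` would be parsed as two `up` flags and no device.
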
## Basic VRF setup
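Proper VRF `ip rule` setup, with an `unreachable` rule in case the VRF table lookup doesn't find a route (so VRF traffic never falls through to the `local` and `main` tables):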
```
1000: from all lookup [l3mdev-table]
2000: from all lookup [l3mdev-table] unreachable
32765: from all lookup local
32766: from all lookup main
```
(plus `32767: from all lookup default` for IPv4)
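
A sketch of commands that produce this rule list, following the common iproute2 VRF recipe (assumed, not copied from the script). The kernel adds `1000: from all lookup [l3mdev-table]` by itself when the first VRF device is created:

```shell
for family in -4 -6; do
    # Stop VRF lookups that found no route in the VRF table.
    ip "$family" rule add pref 2000 l3mdev unreachable
    # Move the local table behind the VRF rules: add the new rule before
    # deleting the default one at preference 0.
    ip "$family" rule add pref 32765 table local
    ip "$family" rule del pref 0
done
ip -4 rule show
ip -6 rule show
```

Moving `local` behind the l3mdev rules keeps addresses that belong to other VRFs from being treated as local for VRF traffic.
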
## `uplink` configuration
- Address `192.0.2.1/32` on `lo`
- Addresses `100.127.255.254/10` and `2001:db8:a::ffff/64` on `client1`
- Route `2001:db8:b::/48 via 2001:db8:a::1 dev client1`
- Route `192.0.2.2 via 100.64.0.1 dev client1`
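
The same configuration as `ip` commands, assuming `client1` already exists in namespace `uplink` and is up (a sketch, not taken from the script):

```shell
# Sketch: assumes client1 was already moved into namespace "uplink" and set up.
ip -n uplink addr add 192.0.2.1/32 dev lo
ip -n uplink addr add 100.127.255.254/10 dev client1
ip -n uplink addr add 2001:db8:a::ffff/64 dev client1
# Return routes into the CGN router (muplink in VRF "up").
ip -n uplink route add 2001:db8:b::/48 via 2001:db8:a::1 dev client1
ip -n uplink route add 192.0.2.2/32 via 100.64.0.1 dev client1
```
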
## `main:up` configuration
- Addresses `100.64.0.1/10` and `2001:db8:a::1/64` on `muplink`
- Route `default via 100.127.255.254 dev muplink` and `default via 2001:db8:a::ffff dev muplink`
- Route `2001:db8:b:10::/64 dev blue` (forward to VRF `blue`)
- Route `2001:db8:b:20::/64 dev red` (forward to VRF `red`)
- Route `192.0.2.2 dev blue` (forward to VRF `blue`)
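
Expressed with iproute2 as a sketch, assuming `muplink` is already enslaved to VRF `up`; `ip route add vrf NAME` selects the routing table that belongs to that VRF:

```shell
# Sketch: assumes muplink is already enslaved to VRF "up" and set up.
ip addr add 100.64.0.1/10 dev muplink
ip addr add 2001:db8:a::1/64 dev muplink
ip -4 route add vrf up default via 100.127.255.254 dev muplink
ip -6 route add vrf up default via 2001:db8:a::ffff dev muplink
# Cross-VRF routes: the VRF device itself is the outgoing interface.
ip -6 route add vrf up 2001:db8:b:10::/64 dev blue
ip -6 route add vrf up 2001:db8:b:20::/64 dev red
ip -4 route add vrf up 192.0.2.2/32 dev blue
```
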
## `main:blue` configuration
- Addresses `100.127.255.254/10` and `2001:db8:b:10::ffff/64` on `br-blue`
- Route `default dev up` (IPv4 + IPv6) - forward to VRF `up`
- Route `192.0.2.2 dev br-blue` (the address is configured on `blue_c2`)
## `main:red` configuration
- Addresses `100.127.255.254/10` and `2001:db8:b:20::ffff/64` on `br-red`
- Route `default dev up` (IPv4 + IPv6) - forward to VRF `up`
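
Both client VRFs (`blue` and `red`) as a sketch, assuming the bridges are already enslaved to their VRFs. They differ only in the bridge, the IPv6 prefix and the extra `192.0.2.2` route in `blue`:

```shell
# Sketch: assumes br-blue/br-red are already enslaved to VRFs blue/red.
for vrf in blue red; do
    case "$vrf" in
        blue) prefix=2001:db8:b:10 ;;
        red)  prefix=2001:db8:b:20 ;;
    esac
    # Both bridges use the same gateway address; the VRFs keep them apart.
    ip addr add 100.127.255.254/10 dev "br-$vrf"
    ip addr add "${prefix}::ffff/64" dev "br-$vrf"
    # Default routes point at the VRF device "up" (cross-VRF forwarding).
    ip -4 route add vrf "$vrf" default dev up
    ip -6 route add vrf "$vrf" default dev up
done
# Only blue carries the non-NATed test address of blue_c2.
ip -4 route add vrf blue 192.0.2.2/32 dev br-blue
```
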
## Client configuration
- Addresses on `cuplink`:
  - `blue_c1`: `100.64.0.1/10` and `2001:db8:b:10::1/64`
  - `blue_c2`: `100.64.0.2/10` and `2001:db8:b:10::2/64`, plus `192.0.2.2/32`
  - `red_c1`: `100.64.0.1/10` and `2001:db8:b:20::1/64`
- Route `default via 100.127.255.254 dev cuplink`
- Route `default via 2001:db8:b:10::ffff dev cuplink` (`blue`) or `default via 2001:db8:b:20::ffff dev cuplink` (`red`)
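
A sketch of the client side, to run after the interfaces exist and are up. The helper `setup_client` is hypothetical and just bundles the per-client values listed above:

```shell
# Sketch: setup_client is a hypothetical helper; its arguments are
# namespace, IPv4 address, IPv6 address and IPv6 gateway.
setup_client() {
    ip -n "$1" addr add "$2" dev cuplink
    ip -n "$1" addr add "$3" dev cuplink
    ip -4 -n "$1" route add default via 100.127.255.254 dev cuplink
    ip -6 -n "$1" route add default via "$4" dev cuplink
}

setup_client blue_c1 100.64.0.1/10 2001:db8:b:10::1/64 2001:db8:b:10::ffff
setup_client blue_c2 100.64.0.2/10 2001:db8:b:10::2/64 2001:db8:b:10::ffff
setup_client red_c1  100.64.0.1/10 2001:db8:b:20::1/64 2001:db8:b:20::ffff

# blue_c2 also holds the "public" test address that is routed without NAT.
ip -n blue_c2 addr add 192.0.2.2/32 dev cuplink
```
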
## TODO
- get NAT working
- test whether one can route to `lo` instead of VRF `up` (and drop VRF `up`), or whether there are other ways to do cross-VRF routing