# Carrier-grade NAT demo (work in progress)

> **Current state**: cross-VRF routing is working, but NAT breaks it.
>
> The conntrack log shows that state is destroyed immediately after it is created,
> and the packet is "lost" between `up` and `muplink`.

The idea behind `100.64.0.0/10` (the shared address space from RFC 6598) seems to be that a CGN router can serve multiple interfaces that all use `100.64.0.0/10` (including the uplink) while keeping them separated.

In theory, it should work to move each interface (apart from the uplink) into its own network namespace, connect every namespace to the main one with `veth` pairs (using some other IP addresses), apply SNAT when forwarding packets into the main namespace, and apply SNAT again when forwarding to the uplink.

This demo tries to use VRFs instead; hopefully this means NAT is needed only once (and no additional local IP addresses are required).

To test it yourself, run `./cgnat-demo.sh` as root (it doesn't need network access, so feel free to use an isolated container or VM):
- spawns `tmux` with multiple windows after setup is done (`ip vrf/netns exec ...` and others)
- `tmux` is configured to use the `ctrl-a` prefix (like screen)
- `tmux` shouldn't be detached; the default detach keybind (`ctrl-a d`) is replaced with a prompt to destroy the session

Dependencies:

- `nftables` for NAT / trace
- `conntrack` to show conntrack events
- `tmux` to open shells in various contexts

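A quick pre-flight check for these tools could look like the sketch below. The `check_deps` helper is hypothetical and not part of `cgnat-demo.sh`; the binary names (`ip` from iproute2, `nft`, `conntrack`, `tmux`) are assumptions about how the packages are installed on your system.

```sh
#!/bin/sh
# Hypothetical helper: report which of the given commands are missing from PATH.
check_deps() {
    missing=""
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || missing="$missing $cmd"
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "all dependencies found"
}

# `ip` is included because the demo relies on `ip vrf` / `ip netns`.
check_deps ip nft conntrack tmux || echo "install the missing tools before running ./cgnat-demo.sh"
```
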
## Example pings

- Working in `blue_c2`:
  - `ping -I 192.0.2.2 192.0.2.1` - ping `uplink` "public" IP
  - `ping 100.64.0.1` - ping `blue_c1`
  - `ping 2001:db8:b:10::1` - ping `blue_c1`
  - `ping 100.127.255.254` - ping gateway
  - `ping 2001:db8:b:10::ffff` - ping gateway
  - `ping 2001:db8:b:20::1` - ping `red_c1`
  - `ping 2001:db8:a::ffff` - ping `uplink`
  - `ping 2001:db8:a::1` - ping `main` (i.e. `up:muplink`)
- Broken everywhere but `uplink`:
  - `ping 192.0.2.1`
- Broken in `up`:
  - `ping 100.127.255.254` (works as soon as NAT is disabled)

## Basic design

- Run everything in a separate network+mount+UTS namespace
- Explicit VRFs for everything, including the uplink
  - Uplink VRF (`up`) with `muplink` interface
  - Two client VRFs (`blue` and `red`), each with a bridge to connect clients to
- Simulate an uplink with one client (in namespace `uplink`)
- Simulate two clients in VRF `blue` (namespaces `blue_c1` and `blue_c2`)
- Simulate one client in VRF `red` (namespace `red_c1`)
- IPv4: NAT from client VRFs (`blue` and `red`) to uplink `up`
- IPv6: no NAT, proper routing
- Route `192.0.2.2` from uplink all the way through to `blue_c2` (test IPv4 cross-VRF connectivity without NAT)

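For the IPv4 NAT item, a ruleset of roughly the following shape is what one would typically try. This is a sketch only and not the ruleset used by `cgnat-demo.sh`: the table name `cgnat_sketch`, the priority value, and the choice of `masquerade` on `muplink` are all assumptions, and NAT is currently reported as breaking forwarding in this setup, so do not expect this to work as is.

```sh
#!/bin/sh
# Sketch only: prints an nftables ruleset; it is NOT the ruleset used by cgnat-demo.sh.
# Assumption: masquerade IPv4 traffic from the shared range when it leaves via `muplink`.
nat_ruleset() {
    cat <<'EOF'
table ip cgnat_sketch {
    chain postrouting {
        type nat hook postrouting priority 100; policy accept;
        oifname "muplink" ip saddr 100.64.0.0/10 masquerade
    }
}
EOF
}

# Dry run by default; set DRY_RUN=0 (as root) to load the ruleset with `nft -f -`.
if [ "${DRY_RUN:-1}" = 1 ]; then
    nat_ruleset
else
    nat_ruleset | nft -f -
fi
```

Because `192.0.2.2` lies outside `100.64.0.0/10`, a rule matched on the source range would leave that address untranslated, which fits the "without NAT" test route.
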
Topology:

```
+--------------------+ +-----------------------+     +--------------------+
| uplink:            | | main:                 |     | blue_c1:           |
|   lo               | |  lo                   |     |   lo               |
|                    | |  up (vrf)             |  +--=-> cuplink (veth)   |
|   client1 (veth) <-=-=--> muplink (veth)     |  |  +--------------------+
+--------------------+ |  blue (vrf)           |  |
                       |    br-blue (bridge)   |  |  +--------------------+
                       |      blue_c1 (veth) <-=--+  | blue_c2:           |
                       |      blue_c2 (veth) <-=--+  |   lo               |
                       |  red (vrf)            |  +--=-> cuplink (veth)   |
                       |    br-red (bridge)    |     +--------------------+
                       |      red_c1 (veth) <--=--+
                       +-----------------------+  |  +--------------------+
                                                  |  | red_c1:            |
                                                  |  |   lo               |
                                                  +--=-> cuplink (veth)   |
                                                     +--------------------+
```

## Basic VRF setup

Proper VRF `ip rule` setup, with an `unreachable` rule as a fallback when the lookup in the VRF table doesn't find a route:

```
1000: from all lookup [l3mdev-table]
2000: from all lookup [l3mdev-table] unreachable
32765: from all lookup local
32766: from all lookup main
```

(plus `lookup default` for IPv4)

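Rules like the ones listed above could be created along these lines. This is a sketch, not an excerpt from `cgnat-demo.sh`: the `run` and `setup_vrf_rules` helpers are hypothetical, and the sketch assumes the `local` rule still sits at its default preference 0. The kernel normally installs the preference 1000 `l3mdev` rule itself when the first VRF device is created, in which case that line may be redundant or fail as a duplicate.

```sh
#!/bin/sh
# Hypothetical helper: print commands unless DRY_RUN=0 is set (then run them, as root).
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

setup_vrf_rules() {
    for fam in -4 -6; do
        # Look up the table of the VRF (l3mdev) the packet belongs to.
        run ip "$fam" rule add pref 1000 l3mdev
        # If that lookup found no route, stop instead of falling through to `main`.
        run ip "$fam" rule add pref 2000 l3mdev unreachable
        # Move the `local` table lookup behind the VRF rules.
        run ip "$fam" rule add pref 32765 lookup local
        run ip "$fam" rule del pref 0
    done
}

setup_vrf_rules
```
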
## `uplink` configuration

- Address `192.0.2.1/32` on `lo`
- Addresses `100.127.255.254/10` and `2001:db8:a::ffff/64` on `client1`
- Route `2001:db8:b::/48 via 2001:db8:a::1 dev client1`
- Route `192.0.2.2 via 100.64.0.1 dev client1`

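Expressed as `ip` commands, the `uplink` side might be set up roughly as follows. This is a sketch under assumptions: the `run` and `setup_uplink` helpers are hypothetical, and creating the namespace with `ip netns add` and the `muplink`/`client1` pair as a single `veth` is inferred from the topology, not copied from the script.

```sh
#!/bin/sh
# Hypothetical helper: print commands unless DRY_RUN=0 is set (then run them, as root).
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

setup_uplink() {
    run ip netns add uplink
    # veth pair: `muplink` stays in the main namespace, `client1` goes into `uplink`.
    run ip link add dev muplink type veth peer name client1 netns uplink
    run ip -n uplink link set dev lo up
    run ip -n uplink link set dev client1 up
    run ip -n uplink addr add 192.0.2.1/32 dev lo
    run ip -n uplink addr add 100.127.255.254/10 dev client1
    run ip -n uplink addr add 2001:db8:a::ffff/64 dev client1
    # Client prefixes and the un-NATed test address are reached via `muplink`.
    run ip -n uplink route add 2001:db8:b::/48 via 2001:db8:a::1 dev client1
    run ip -n uplink route add 192.0.2.2/32 via 100.64.0.1 dev client1
}

setup_uplink
```
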
## `main:up` configuration

- Addresses `100.64.0.1/10` and `2001:db8:a::1/64` on `muplink`
- Route `default via 100.127.255.254 dev muplink` and `default via 2001:db8:a::ffff dev muplink`
- Route `2001:db8:b:10::/64 dev blue` (forward to VRF `blue`)
- Route `2001:db8:b:20::/64 dev red` (forward to VRF `red`)
- Route `192.0.2.2 dev blue` (forward to VRF `blue`)

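A sketch of how the `up` VRF could be configured with `ip`. The helpers `run` and `setup_vrf_up` and the table ID `10` are hypothetical choices for illustration. The explicit `dev` keyword matters here: because the VRF is literally named `up`, a bare `ip link set up up` would be parsed as the `up` flag and not as a device name. The routes pointing at `dev blue` and `dev red` assume those VRF devices already exist.

```sh
#!/bin/sh
# Hypothetical helper: print commands unless DRY_RUN=0 is set (then run them, as root).
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

setup_vrf_up() {
    # Table ID 10 is an arbitrary example value.
    run ip link add dev up type vrf table 10
    run ip link set dev muplink master up
    run ip addr add 100.64.0.1/10 dev muplink
    run ip addr add 2001:db8:a::1/64 dev muplink
    run ip link set dev up up
    run ip link set dev muplink up
    # Default routes towards the simulated uplink.
    run ip -4 route add default via 100.127.255.254 dev muplink vrf up
    run ip -6 route add default via 2001:db8:a::ffff dev muplink vrf up
    # Cross-VRF routes: hand traffic for the clients to the client VRF devices.
    run ip -6 route add 2001:db8:b:10::/64 dev blue vrf up
    run ip -6 route add 2001:db8:b:20::/64 dev red vrf up
    run ip -4 route add 192.0.2.2/32 dev blue vrf up
}

setup_vrf_up
```
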
## `main:blue` configuration

- Addresses `100.127.255.254/10` and `2001:db8:b:10::ffff/64` on `br-blue`
- Route `default dev up` (IPv4 + IPv6) - forward to VRF `up`
- Route `192.0.2.2 dev br-blue` (connected in `blue_c2`)

## `main:red` configuration

- Addresses `100.127.255.254/10` and `2001:db8:b:20::ffff/64` on `br-red`
- Route `default dev up` (IPv4 + IPv6) - forward to VRF `up`

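The `blue` and `red` VRFs differ only in their IPv6 prefix (plus the extra `192.0.2.2` route in `blue`), so both can be sketched with one function. The helpers `run` and `setup_client_vrf` and the table IDs `20` and `30` are hypothetical; the default routes assume the `up` VRF device already exists.

```sh
#!/bin/sh
# Hypothetical helper: print commands unless DRY_RUN=0 is set (then run them, as root).
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Usage: setup_client_vrf NAME TABLE_ID IPV6_PREFIX (first four groups of the /64)
setup_client_vrf() {
    vrf=$1
    table=$2
    prefix6=$3
    run ip link add dev "$vrf" type vrf table "$table"
    run ip link add dev "br-$vrf" type bridge
    run ip link set dev "br-$vrf" master "$vrf"
    run ip addr add 100.127.255.254/10 dev "br-$vrf"
    run ip addr add "${prefix6}::ffff/64" dev "br-$vrf"
    run ip link set dev "$vrf" up
    run ip link set dev "br-$vrf" up
    # Everything not local to this VRF is forwarded to VRF `up`.
    run ip -4 route add default dev up vrf "$vrf"
    run ip -6 route add default dev up vrf "$vrf"
}

# Table IDs 20 and 30 are arbitrary example values.
setup_client_vrf blue 20 2001:db8:b:10
setup_client_vrf red 30 2001:db8:b:20
# Only `blue` carries the un-NATed test address (configured in `blue_c2`).
run ip -4 route add 192.0.2.2/32 dev br-blue vrf blue
```
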
## client configuration

- Addresses on `cuplink`:
  - `blue_c1`: `100.64.0.1/10` and `2001:db8:b:10::1/64`
  - `blue_c2`: `100.64.0.2/10` and `2001:db8:b:10::2/64`, also `192.0.2.2/32`
  - `red_c1`: `100.64.0.1/10` and `2001:db8:b:20::1/64`
- Route `default via 100.127.255.254 dev cuplink`
- Route `default via 2001:db8:b:$$$$::ffff dev cuplink` (`$$$$` is `10` for `blue` and `20` for `red`)

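The three client namespaces follow the same pattern, which could be sketched like this. The helpers `run` and `setup_client` are hypothetical, and naming the main-side `veth` end after the namespace is inferred from the topology diagram, not taken from the script. Note that `blue_c1` and `red_c1` deliberately share `100.64.0.1/10`.

```sh
#!/bin/sh
# Hypothetical helper: print commands unless DRY_RUN=0 is set (then run them, as root).
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Usage: setup_client NAMESPACE BRIDGE IPV4/LEN IPV6/LEN IPV6_GATEWAY
setup_client() {
    ns=$1
    bridge=$2
    addr4=$3
    addr6=$4
    gw6=$5
    run ip netns add "$ns"
    # Main-side end is named like the namespace, client-side end is `cuplink`.
    run ip link add dev "$ns" type veth peer name cuplink netns "$ns"
    run ip link set dev "$ns" master "$bridge"
    run ip link set dev "$ns" up
    run ip -n "$ns" link set dev lo up
    run ip -n "$ns" link set dev cuplink up
    run ip -n "$ns" addr add "$addr4" dev cuplink
    run ip -n "$ns" addr add "$addr6" dev cuplink
    run ip -n "$ns" route add default via 100.127.255.254 dev cuplink
    run ip -n "$ns" -6 route add default via "$gw6" dev cuplink
}

setup_client blue_c1 br-blue 100.64.0.1/10 2001:db8:b:10::1/64 2001:db8:b:10::ffff
setup_client blue_c2 br-blue 100.64.0.2/10 2001:db8:b:10::2/64 2001:db8:b:10::ffff
setup_client red_c1 br-red 100.64.0.1/10 2001:db8:b:20::1/64 2001:db8:b:20::ffff
# Extra "public" test address on `blue_c2`.
run ip -n blue_c2 addr add 192.0.2.2/32 dev cuplink
```
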
## TODO

- get NAT working
- test whether one can route to `lo` instead of VRF `up` (and drop VRF `up`), or whether there are other ways to do cross-VRF routing