Using policy routing to loopback over external interfaces

Wed 01 April 2015

Filed under Howto

Tags Linux Networking

Sometimes there is good reason to talk to yourself. You might be doing a sound check, for example.

Likewise, it can be useful to route IP packets between two interfaces on the same machine using an external path. One reason to do this is to test other network devices like routers or switches.

For many years this was much trickier to do than it might seem, until Patrick McHardy introduced the 'accept_local' setting in December 2009.

A little more than a year after that, Kirill Smelkov replied with a working example script that shows how to achieve full bidirectional loopback of IP traffic over physical interfaces.

Even so, there are numerous red herring posts on StackOverflow and elsewhere that don't really help.

So, how does it work? Policy Routing, that's how!

Basic preparation:

If you have not already done so, step 0 is to tell NetworkManager to leave the interfaces alone!
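One way to do this, assuming a NetworkManager version that reads drop-in files from /etc/NetworkManager/conf.d, is an 'unmanaged-devices' entry in the [keyfile] section. The sketch below only prints such a fragment; the function name nm_unmanaged_conf is made up for this example.

```shell
# Print a NetworkManager drop-in that marks the given interfaces as unmanaged.
# Nothing is written to disk; redirect the output yourself if it looks right.
nm_unmanaged_conf() {
    printf '[keyfile]\n'
    printf 'unmanaged-devices='
    sep=''
    for ifname in "$@"; do
        printf '%sinterface-name:%s' "$sep" "$ifname"
        sep=';'
    done
    printf '\n'
}

nm_unmanaged_conf p3p1 p3p4
```

As root, the output could be redirected to a file such as /etc/NetworkManager/conf.d/90-loopback-test.conf (a hypothetical name), followed by a reload or restart of NetworkManager.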

Select the interface names and addresses:

IFACE_A=p3p1
IFACE_B=p3p4

IFACE_A_IP="10.10.3.1"
IFACE_B_IP="10.10.3.4"

Assign addresses:

ip addr replace $IFACE_A_IP dev $IFACE_A
ip addr replace $IFACE_B_IP dev $IFACE_B

Ensure the links are up:

ip link set dev $IFACE_A up
ip link set dev $IFACE_B up

Policy Routing

Allow packets with a local source address to be received on the external interfaces (see the 'accept_local' entry in Documentation/networking/ip-sysctl.txt in the kernel source tree):

sysctl -w net.ipv4.conf.$IFACE_A.accept_local=1
sysctl -w net.ipv4.conf.$IFACE_B.accept_local=1

Decrease priority of the 'local' routing table (change preference from 0 to 1000):

ip rule del pref    0 lookup local
ip rule add pref 1000 lookup local

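Prevent a routing loop by adding preference rules that route packets arriving on either interface using the 'local' table. Without these rules, a packet arriving for a local address would match the destination rules below and be sent straight back out.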

ip rule add pref 100 iif $IFACE_A lookup local
ip rule add pref 101 iif $IFACE_B lookup local

Create a routing table for each interface, each holding a default route out of that interface. This is the basis for destination policy routing:

ip route add default dev $IFACE_A table 200
ip route add default dev $IFACE_B table 201

Note: There's no need to create the tables first. Adding a route entry is sufficient.

Add preference rules to route each destination using the corresponding table:

ip rule add pref 200 to ${IFACE_B_IP} lookup 200 # IFACE_A to IFACE_B
ip rule add pref 201 to ${IFACE_A_IP} lookup 201 # IFACE_B to IFACE_A

Note: Do not be confused by the numbers. 'pref 200' refers to the preference/priority of the rule, whereas 'lookup 200' refers to table number 200. The kernel only cares about the number, but 'ip' will use the optional number-to-name mapping defined in /etc/iproute2/rt_tables.
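If you would rather see names than numbers, each line in /etc/iproute2/rt_tables is a table number followed by a name. The sketch below only prints two such lines; the 'via_<iface>' names and the function name rt_table_entries are made up for this example.

```shell
# Print rt_tables entries (number, then name) for the two tables used here.
# The "via_<iface>" names are hypothetical; any unused names would do.
rt_table_entries() {
    printf '200\tvia_%s\n' "$1"
    printf '201\tvia_%s\n' "$2"
}

rt_table_entries p3p1 p3p4
```

Once lines like these are appended to /etc/iproute2/rt_tables (as root), 'lookup via_p3p1' can be used in place of 'lookup 200', and 'ip rule show' should display the names.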

Finally, flush the route cache to ensure the tables are reread:

ip route flush cache
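The steps so far can be collected into one script. What follows is a sketch, not a hardened tool: the names run, setup_loopback and DRY_RUN are made up here, and it defaults to a dry run that only prints the commands so you can review them first.

```shell
#!/bin/sh
# Sketch: the setup steps in one place.
# DRY_RUN=1 (the default) prints each command; DRY_RUN=0 executes it (needs root).
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: setup_loopback IFACE_A IP_A IFACE_B IP_B
setup_loopback() {
    iface_a=$1
    ip_a=$2
    iface_b=$3
    ip_b=$4

    # Addresses and link state
    run ip addr replace "$ip_a" dev "$iface_a"
    run ip addr replace "$ip_b" dev "$iface_b"
    run ip link set dev "$iface_a" up
    run ip link set dev "$iface_b" up

    # Accept packets that carry one of our own addresses as source
    run sysctl -w "net.ipv4.conf.$iface_a.accept_local=1"
    run sysctl -w "net.ipv4.conf.$iface_b.accept_local=1"

    # Demote the 'local' table from preference 0 to 1000
    run ip rule del pref 0 lookup local
    run ip rule add pref 1000 lookup local

    # Packets arriving on either interface are still delivered locally
    run ip rule add pref 100 iif "$iface_a" lookup local
    run ip rule add pref 101 iif "$iface_b" lookup local

    # One table per interface, selected by destination address
    run ip route add default dev "$iface_a" table 200
    run ip route add default dev "$iface_b" table 201
    run ip rule add pref 200 to "$ip_b" lookup 200
    run ip rule add pref 201 to "$ip_a" lookup 201

    run ip route flush cache
}

setup_loopback p3p1 10.10.3.1 p3p4 10.10.3.4
```

The script does not check for errors and is not idempotent: running it twice with DRY_RUN=0 will attempt to delete and add the same rules again.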

Check the routing rules:

ip rule show
100:    from all iif p3p1 lookup local 
101:    from all iif p3p4 lookup local 
200:    from all to 10.10.3.4 lookup 200 
201:    from all to 10.10.3.1 lookup 201 
1000:   from all lookup local 
32766:  from all lookup main 
32767:  from all lookup default
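Another quick check, not part of the original recipe, is to ask the kernel which device it would choose for each destination with 'ip route get'. The helpers below are a sketch: route_dev and check_route are made-up names, and route_dev simply picks the word that follows 'dev' in whatever 'ip route get' prints.

```shell
# Extract the outgoing device from 'ip route get' output read on stdin.
route_dev() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Usage: check_route DEST_IP EXPECTED_IFACE
# Reports whether traffic to DEST_IP would leave via EXPECTED_IFACE.
check_route() {
    dst=$1
    expected=$2
    actual=$(ip route get "$dst" 2>/dev/null | route_dev)
    if [ "$actual" = "$expected" ]; then
        echo "ok: $dst leaves via $expected"
    else
        echo "unexpected: $dst leaves via '${actual:-none}', wanted $expected" >&2
        return 1
    fi
}
```

With the rules above in place, 'check_route "$IFACE_B_IP" "$IFACE_A"' and 'check_route "$IFACE_A_IP" "$IFACE_B"' should both report ok. If either names 'lo' instead, the traffic is still being short-circuited through the 'local' table.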

Test that the traffic is routed via the external interface using 'tcpdump' and 'ping':

tcpdump:

# tcpdump -nlq -i p3p1 -c 6
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on p3p1, link-type EN10MB (Ethernet), capture size 262144 bytes
15:51:00.965010 IP 10.10.3.1 > 10.10.3.4: ICMP echo request, id 27352, seq 1, length 64
15:51:00.965217 IP 10.10.3.4 > 10.10.3.1: ICMP echo reply, id 27352, seq 1, length 64
15:51:01.964694 IP 10.10.3.1 > 10.10.3.4: ICMP echo request, id 27352, seq 2, length 64
15:51:01.964825 IP 10.10.3.4 > 10.10.3.1: ICMP echo reply, id 27352, seq 2, length 64
15:51:02.964785 IP 10.10.3.1 > 10.10.3.4: ICMP echo request, id 27352, seq 3, length 64
15:51:02.964987 IP 10.10.3.4 > 10.10.3.1: ICMP echo reply, id 27352, seq 3, length 64
6 packets captured
6 packets received by filter
0 packets dropped by kernel

ping:

$ ping -c 3 10.10.3.4
PING 10.10.3.4 (10.10.3.4) 56(84) bytes of data.
64 bytes from 10.10.3.4: icmp_seq=1 ttl=64 time=0.247 ms
64 bytes from 10.10.3.4: icmp_seq=2 ttl=64 time=0.183 ms
64 bytes from 10.10.3.4: icmp_seq=3 ttl=64 time=0.243 ms

--- 10.10.3.4 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.183/0.224/0.247/0.031 ms

That's it! There's no need to mess around with ARP or reverse path filtering!
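To put things back afterwards, something like the following should work. It is a sketch that assumes preferences 100, 101, 200, 201 and 1000 and tables 200 and 201 were unused before the setup, and that accept_local was at its default of 0. The names run, teardown_loopback and DRY_RUN are made up here, and the default is a dry run that only prints the commands.

```shell
#!/bin/sh
# Sketch: undo the policy routing setup.
# DRY_RUN=1 (the default) prints each command; DRY_RUN=0 executes it (needs root).
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: teardown_loopback IFACE_A IFACE_B
teardown_loopback() {
    iface_a=$1
    iface_b=$2

    # Remove the destination rules and empty the per-interface tables
    run ip rule del pref 200
    run ip rule del pref 201
    run ip route flush table 200
    run ip route flush table 201

    # Remove the incoming-interface rules
    run ip rule del pref 100
    run ip rule del pref 101

    # Restore the 'local' table to preference 0 before dropping the copy at 1000
    run ip rule add pref 0 lookup local
    run ip rule del pref 1000

    # Back to the default: reject packets with a local source address
    run sysctl -w "net.ipv4.conf.$iface_a.accept_local=0"
    run sysctl -w "net.ipv4.conf.$iface_b.accept_local=0"
}

teardown_loopback p3p1 p3p4
```

The addresses and link state are left alone; remove them with 'ip addr del' and 'ip link set ... down' if they are no longer wanted.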



rationali.st © Andrew Cooks