Looking into DSCP and IEEE 802.1p (VLAN priorities).

Wed 17 September 2014

Filed under Linux Networking

I recently discovered a flaw in the VLAN implementation I did at work. It seemed that the normal TCP traffic had the correct VLAN priorities applied, but audio streaming UDP traffic did not.

This was due to DSCP being applied to the streaming audio and the fact that the VLAN device's egress-qos-map was incorrect.

I had assumed, incorrectly, that VLAN priorities are applied to all traffic as long as we're not using fancy queuing disciplines (qdiscs). After all, 802.1Q is strictly a layer 2 thing.

Problem: egress-qos-map

Anonymous Coward uses an application that sends voice data and sets the DSCP value to 20.

Anonymous Coward wants the voice traffic to use a VLAN interface with priority 5, which is appropriate for voice data.

To achieve this, the VLAN device must have the egress qos map set correctly, but setting the egress qos map requires knowing the sk_priority used in the kernel for the particular traffic.

The translation looks like this:

User's DSCP value -> user's app -> kernel      -> user's skb to VLAN prio map

or alternatively:

DSCP              -> IP_TOS     -> sk_priority -> 802.1p

So, how do we know what the sk_priority is for DSCP value of, say, 20?

DSCP priority levels

DSCP is ugly and complicated. It cannot simply be reduced to a range-based mapping to priority levels. I don't think I can say anything constructive about it.

802.1p priority levels

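Wikipedia has a simple table of VLAN priority levels. I've reproduced a similar one below. Note that the PCP value written into the VLAN tag is not quite in priority order: the default PCP 0 (Best Effort) ranks above PCP 1 (Background).

PCP  Priority      Traffic Types
1    0 (lowest)    Background
0    1 (default)   Best Effort
2    2             Excellent Effort
3    3             Critical Applications
4    4             Video, < 100 ms latency and jitter
5    5             Voice, < 10 ms latency and jitter
6    6             Internetwork Control
7    7 (highest)   Network Control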
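
For context, the priority discussed here is the 3-bit Priority Code Point (PCP) carried in the 802.1Q tag, next to the VLAN ID. Here is a rough sketch of how the 16-bit Tag Control Information field is packed. `make_tci` is a hypothetical helper name used for illustration, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pack the 802.1Q Tag Control Information field.
 * Layout, most significant bit first:
 *   PCP (3 bits) | DEI (1 bit) | VLAN ID (12 bits) */
static inline uint16_t make_tci(unsigned pcp, unsigned dei, unsigned vid)
{
    assert(pcp <= 7 && dei <= 1 && vid <= 0x0FFF);
    return (uint16_t)((pcp << 13) | (dei << 12) | vid);
}
```

With the values used in this post, `make_tci(5, 0, 42)` gives 0xA02A: priority 5 on VLAN 42.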

Creating a VLAN device

$ ip link add link eth1 name vlan7 type vlan id 42 egress-qos-map 6:5

This creates a new VLAN interface that:

* is called vlan7,
* uses eth1 as the parent interface,
* uses 42 as the VLAN ID, and
* maps sk_priority 6 to VLAN priority 5.

Show the parameters for the new VLAN device

$ ip -d link show vlan7
12: vlan7@eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT 
    link/ether 00:01:c0:0b:4c:f5 brd ff:ff:ff:ff:ff:ff promiscuity 0 
    vlan id 42 <REORDER_HDR> 
      egress-qos-map { 6:5 }

But why 6? I'm so glad you asked. That's what we're trying to determine.

Setting DSCP in the user application:

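/* Needs <sys/socket.h> and <netinet/in.h>. DSCP is the 6-bit code point, e.g. 20. */
int sock = socket(AF_INET, SOCK_DGRAM, 0);
int val = DSCP << 2;    /* DSCP sits in the upper six bits of the TOS byte */
setsockopt(sock, IPPROTO_IP, IP_TOS, &val, sizeof(val));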

The bit-shift is needed, because the two least significant bits are used for Explicit Congestion Notification (ECN).
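
To make the shift concrete, here is a minimal sketch of how the TOS byte splits into DSCP and ECN. The helper names `dscp_to_tos`, `tos_to_dscp` and `tos_to_ecn` are made up for illustration; they are not part of any kernel or libc API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, for illustration only. */

/* Place a 6-bit DSCP value in the upper six bits of the TOS byte,
 * leaving the two ECN bits at zero. */
static inline uint8_t dscp_to_tos(uint8_t dscp)
{
    assert(dscp <= 63);             /* DSCP is a 6-bit field */
    return (uint8_t)(dscp << 2);
}

/* Recover the DSCP from a TOS byte. */
static inline uint8_t tos_to_dscp(uint8_t tos)
{
    return tos >> 2;
}

/* Recover the two ECN bits from a TOS byte. */
static inline uint8_t tos_to_ecn(uint8_t tos)
{
    return tos & 0x03;
}
```

For the DSCP value of 20 used in this post, `dscp_to_tos(20)` is 0x50, which is the value that ends up being passed as IP_TOS.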

Setting sk_priority in the user application:

int val = 6;            /* sk_priority; values above 6 need CAP_NET_ADMIN */
setsockopt(sock, SOL_SOCKET, SO_PRIORITY, &val, sizeof(val));

But, why 6?!

Read on!

Diving into some kernel code

net/ipv4/ip_sockglue.c:

static int do_ip_setsockopt(struct sock *sk, int level,
                            int optname, char __user *optval, unsigned int optlen)
{
        struct inet_sock *inet = inet_sk(sk);
        int val = 0, err;

        switch (optname) {

        ...

        case IP_TOS:    /* This sets both TOS and Precedence */
                if (sk->sk_type == SOCK_STREAM) {
                        val &= ~INET_ECN_MASK;
                        val |= inet->tos & INET_ECN_MASK;
                }
                if (inet->tos != val) {
                        inet->tos = val;
                        sk->sk_priority = rt_tos2priority(val);
                        sk_dst_reset(sk);
                }
                break;

        ...

This tells us that using setsockopt to set the IP_TOS option at the IPPROTO_IP level also sets sk_priority, and that the value is derived from the TOS byte by the rt_tos2priority() function.

net/core/sock.c:

int sock_setsockopt(struct socket *sock, int level, int optname,
                    char __user *optval, unsigned int optlen)
{
        struct sock *sk = sock->sk;
        int val;
        int valbool;
        struct linger ling;
        int ret = 0;
        ...
        switch (optname) {
        ...
        case SO_PRIORITY:
                if ((val >= 0 && val <= 6) ||
                    ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
                        sk->sk_priority = val;
                else
                        ret = -EPERM;
                break;
        ...

This tells us that using setsockopt to set the SO_PRIORITY option at the SOL_SOCKET level sets sk_priority directly, and that the value must be in the range [0..6] unless the process has the CAP_NET_ADMIN capability.

The rest of the pieces are listed below. I won't describe how to interpret them, because if you're reading this, I'm sure you can work it out. ;)

usr/include/linux/ip.h:

#define IPTOS_TOS_MASK          0x1E
#define IPTOS_TOS(tos)          ((tos)&IPTOS_TOS_MASK)
#define IPTOS_LOWDELAY          0x10
#define IPTOS_THROUGHPUT        0x08
#define IPTOS_RELIABILITY       0x04
#define IPTOS_MINCOST           0x02

#define IPTOS_PREC_MASK         0xE0
#define IPTOS_PREC(tos)         ((tos)&IPTOS_PREC_MASK)
#define IPTOS_PREC_NETCONTROL           0xe0
#define IPTOS_PREC_INTERNETCONTROL      0xc0
#define IPTOS_PREC_CRITIC_ECP           0xa0
#define IPTOS_PREC_FLASHOVERRIDE        0x80
#define IPTOS_PREC_FLASH                0x60
#define IPTOS_PREC_IMMEDIATE            0x40
#define IPTOS_PREC_PRIORITY             0x20
#define IPTOS_PREC_ROUTINE              0x00

include/net/route.h:

static inline char rt_tos2priority(u8 tos)
{
    return ip_tos2prio[IPTOS_TOS(tos)>>1];
}

net/ipv4/route.c:

#define ECN_OR_COST(class)      TC_PRIO_##class

const __u8 ip_tos2prio[16] = {
        TC_PRIO_BESTEFFORT,
        ECN_OR_COST(BESTEFFORT),
        TC_PRIO_BESTEFFORT,
        ECN_OR_COST(BESTEFFORT),
        TC_PRIO_BULK,
        ECN_OR_COST(BULK),
        TC_PRIO_BULK,
        ECN_OR_COST(BULK),
        TC_PRIO_INTERACTIVE,
        ECN_OR_COST(INTERACTIVE),
        TC_PRIO_INTERACTIVE,
        ECN_OR_COST(INTERACTIVE),
        TC_PRIO_INTERACTIVE_BULK,
        ECN_OR_COST(INTERACTIVE_BULK),
        TC_PRIO_INTERACTIVE_BULK,
        ECN_OR_COST(INTERACTIVE_BULK)
};
EXPORT_SYMBOL(ip_tos2prio);

include/uapi/linux/pkt_sched.h:

#define TC_PRIO_BESTEFFORT              0
#define TC_PRIO_FILLER                  1
#define TC_PRIO_BULK                    2
#define TC_PRIO_INTERACTIVE_BULK        4
#define TC_PRIO_INTERACTIVE             6
#define TC_PRIO_CONTROL                 7

Worked example

We want to know which values to plug into the egress-qos-map, so we need to see how to get from DSCP to sk_priority, like this:

DSCP -> IP_TOS -> sk_priority -> 802.1p

DSCP, base 10:

= 20

DSCP -> IP_TOS:

= (20 << 2)
= 80
= 0x50

IP_TOS -> sk_priority:

= rt_tos2priority(0x50)
= ip_tos2prio[IPTOS_TOS(0x50)>>1]
= ip_tos2prio[(0x50 & 0x1E)>>1]
= ip_tos2prio[0x10>>1]
= ip_tos2prio[8]
= TC_PRIO_INTERACTIVE
= 6

Now we know why we need to map sk_priority 6 to VLAN priority 5 if we use a DSCP value of 20!
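
The same lookup can be reproduced outside the kernel. What follows is a user-space sketch, not kernel code: the mask and table values are copied from the excerpts above, and the names `tos_to_sk_priority` and `dscp_to_sk_priority` are invented here for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the kernel excerpts quoted in this post. */
#define TOS_MASK 0x1E               /* same value as IPTOS_TOS_MASK */
enum { BESTEFFORT = 0, BULK = 2, INTERACTIVE_BULK = 4, INTERACTIVE = 6 };

/* User-space copy of ip_tos2prio[]. */
static const uint8_t tos2prio[16] = {
    BESTEFFORT,       BESTEFFORT,       BESTEFFORT,       BESTEFFORT,
    BULK,             BULK,             BULK,             BULK,
    INTERACTIVE,      INTERACTIVE,      INTERACTIVE,      INTERACTIVE,
    INTERACTIVE_BULK, INTERACTIVE_BULK, INTERACTIVE_BULK, INTERACTIVE_BULK,
};

/* Hypothetical stand-in for rt_tos2priority(). */
static inline int tos_to_sk_priority(uint8_t tos)
{
    return tos2prio[(tos & TOS_MASK) >> 1];
}

/* Convenience wrapper for the whole chain: DSCP -> IP_TOS -> sk_priority. */
static inline int dscp_to_sk_priority(uint8_t dscp)
{
    assert(dscp <= 63);             /* DSCP is a 6-bit field */
    return tos_to_sk_priority((uint8_t)(dscp << 2));
}
```

Under these assumptions `dscp_to_sk_priority(20)` evaluates to 6, matching the worked example, while DSCP 46 lands on 4. Because the mask only covers TOS bits 1 to 4, only the three low bits of the DSCP take part in the lookup, which is one reason the result cannot be read off as a simple range.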

Conclusion

If DSCP is specified (using IP_TOS), then sk_priority will be set appropriately by the kernel, but unless SO_PRIORITY is set as well, there is no simple way to know exactly what the sk_priority value is in order to set the VLAN egress-qos-map correctly.

If you want to use DSCP and you don't want to, or can't, specify SO_PRIORITY in the application, then you'll probably want to configure the egress-qos-map to map the whole set of possible sk_priority values to the chosen VLAN priority.
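
As a sketch of that last option, the helper below builds the egress-qos-map argument for a chosen VLAN priority. It assumes the only sk_priority values of interest are the four that appear in the ip_tos2prio table quoted earlier (0, 2, 4 and 6); an application that sets SO_PRIORITY directly could use others. `build_egress_qos_map` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* sk_priority values reachable through IP_TOS, per the ip_tos2prio table
 * quoted in this post (assumption: nothing overrides them via SO_PRIORITY). */
static const int tos_priorities[] = { 0, 2, 4, 6 };

/* Hypothetical helper: write "0:P 2:P 4:P 6:P" into buf, where P is the
 * chosen VLAN priority. Returns the string length, or -1 if buf is too small. */
static inline int build_egress_qos_map(char *buf, size_t len, int vlan_prio)
{
    size_t used = 0;
    size_t n = sizeof(tos_priorities) / sizeof(tos_priorities[0]);

    assert(vlan_prio >= 0 && vlan_prio <= 7);   /* 802.1p priority is 3 bits */
    for (size_t i = 0; i < n; i++) {
        int w = snprintf(buf + used, len - used, "%s%d:%d",
                         i ? " " : "", tos_priorities[i], vlan_prio);
        if (w < 0 || (size_t)w >= len - used)
            return -1;                          /* error or truncated output */
        used += (size_t)w;
    }
    return (int)used;
}
```

For VLAN priority 5 the result is `0:5 2:5 4:5 6:5`, which can be passed after `egress-qos-map` in the `ip link add` command shown earlier.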



rationali.st © Andrew Cooks