When Packets Can't Wait: Comparing Protocols for Delay-Sensitive Data

Mapping the landscape of protocols for media, industrial, and real-time systems

In Diagnosing Video Stuttering Over TCP, we built a diagnostic framework—identifying zero-window events (receiver overwhelmed) versus retransmits (network problems). In The TCP Jitter Cliff, we discovered that throughput collapses unpredictably when jitter exceeds ~20% of RTT, and the chaos zone makes diagnosis treacherous.

The conclusion from both posts is clear: TCP is inappropriate for delay-sensitive streaming. Its guaranteed-delivery model creates unbounded latency during loss recovery. When a packet is lost, TCP will wait—potentially forever—rather than skip ahead. For a live video frame or audio sample, arriving late is the same as not arriving at all.

But “don’t use TCP” isn’t a complete answer. What should you use? The protocol landscape for delay-sensitive data is vast—spanning media streaming, industrial automation, robotics, financial messaging, and IoT. Each protocol answers the fundamental question differently.


The Fundamental Question

Every protocol for delay-sensitive data answers the same question: “What should happen when a packet is lost?”

TCP’s answer: “Retransmit until delivered.” Latency grows unboundedly, but everything arrives.

UDP’s answer: “Nothing—that’s the application’s problem.” No latency penalty, but no help either.

Between these extremes lie many other protocols, each with a different answer. Some retransmit within a time budget, then drop what’s too late. Some push reliability to the link layer and accept occasional glitches. Some let you configure the behavior per-message.

You often can’t see a protocol’s loss-handling logic in a packet capture—encrypted headers hide the flags, and application-layer framing obscures the recovery mechanism. But you can see the result in packet timing. A “guaranteed delivery” protocol produces distinct stalling patterns during loss: packets queue up, then burst when recovery completes. A “bounded-time” protocol maintains steady cadence at the cost of occasional gaps. Tools like JitterTrap measure Inter-Arrival Time (IAT) distributions, making these patterns visible.
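To make the idea concrete, here is a minimal sketch (not JitterTrap's actual implementation) of how inter-arrival times expose the "stall, then burst" signature of guaranteed-delivery recovery. The threshold values are illustrative assumptions:

```python
from statistics import mean

def inter_arrival_times(timestamps):
    """Compute inter-arrival times (seconds) from sorted arrival timestamps."""
    return [b - a for a, b in zip(timestamps, timestamps[1:])]

def looks_like_recovery_burst(iats, stall_factor=5.0):
    """Heuristic: a guaranteed-delivery stall shows up as one IAT much larger
    than the mean, followed by near-zero IATs as the queued packets burst out.
    The 5x / 0.2x factors are illustrative, not calibrated thresholds."""
    avg = mean(iats)
    for i, gap in enumerate(iats[:-2]):
        if gap > stall_factor * avg and iats[i + 1] < 0.2 * avg and iats[i + 2] < 0.2 * avg:
            return True
    return False

# Steady 20ms cadence, then a ~300ms stall followed by a burst of arrivals.
steady = [i * 0.020 for i in range(10)]
stalled = steady + [0.500, 0.501, 0.502, 0.503]
print(looks_like_recovery_burst(inter_arrival_times(stalled)))  # True
print(looks_like_recovery_burst(inter_arrival_times(steady)))   # False
```

A bounded-time protocol under the same loss would instead show a steady cadence with a single missing slot: no stall, no burst.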

The answer a protocol gives reveals its design philosophy—and determines whether it’s suitable for your application. This post maps the landscape of delay-sensitive transport protocols across media, industrial, and systems domains, identifying what makes each unique and where to focus deeper investigation.


The Landscape

Here are the protocols we’ll examine. They span four decades, from UDP (1980) to emerging standards like MoQ—but this list is illustrative, not exhaustive. See the Protocol Reference for details on each.

Transport Layer Protocols (kernel-provided)

| Protocol | Over | Recovery |
|----------|------|----------|
| UDP | IP | Best-effort |
| TCP | IP | Guaranteed delivery |
| RDP | IP | Guaranteed delivery (experimental) |
| SCTP | IP | Guaranteed / Configurable (PR-SCTP) |
| DCCP | IP | Best-effort + congestion control (deprecated) |

Application Layer Protocols (userspace, over UDP or TCP)

| Protocol | Over | Recovery | Delivery | Topology | Multiplexing |
|----------|------|----------|----------|----------|--------------|
| Media (video, interactive/contribution) | | | | | |
| RTP | UDP | Best-effort | Push | P2P | SSRC |
| RTMP | TCP | Guaranteed | Push | P2P | Chunks |
| WebRTC | UDP | Bounded-time | Push | P2P | Tracks |
| SRT | UDP | Bounded-time | Push | P2P | Streams |
| RIST | UDP | Bounded-time | Push | P2P | Streams |
| Media (video, buffered) | | | | | |
| HLS | TCP (HTTP) | Guaranteed | Pull | CDN | Segments |
| DASH | TCP (HTTP) | Guaranteed | Pull | CDN | Segments |
| Media (audio) | | | | | |
| AES67 | UDP | Network-designed | Push | P2P | SSRC |
| RAVENNA | UDP | Network-designed | Push | P2P | SSRC |
| Industrial | | | | | |
| GVSP | UDP | Bounded-time | Push | P2P | Frames |
| EtherNet/IP | TCP+UDP | Hybrid | Push | P2P | Connections |
| Modbus TCP | TCP | Guaranteed | Poll | Client-Server | Registers |
| OPC UA | TCP+UDP | Configurable | Both | Configurable | Nodes |
| IoT / Constrained Devices | | | | | |
| MQTT | TCP | Configurable | Push | Broker | Topics |
| CoAP | UDP | Configurable | Pull | Client-Server | Resources |
| Robotics / Autonomous Systems | | | | | |
| RTPS/DDS | UDP | Configurable | Push | Decentralized | Topics |
| zenoh | UDP/TCP | Configurable | Push | Decentralized | Keys |
| Gaming / Real-time Applications | | | | | |
| KCP | UDP | Fast ARQ | Push | P2P | Streams |
| ENet | UDP | Configurable | Push | P2P | Channels |
| Financial Systems | | | | | |
| Aeron | UDP/IPC | Configurable (default: reliable) | Push | P2P | Streams |
| ZeroMQ | TCP/IPC | Best-effort | Push | Configurable | Messages |
| Web Infrastructure | | | | | |
| QUIC | UDP | Guaranteed | Push | P2P | Streams |
| MoQ | QUIC | Bounded-time | Hybrid | Relay | Objects |

Reading the table: Recovery describes how protocols handle packet loss—the focus of this post. Delivery indicates whether data flows from sender (Push), is requested by receiver (Pull), or both (Hybrid). Topology describes the network structure: direct peer-to-peer (P2P), through a central broker, via relay infrastructure, or with decentralized discovery. Multiplexing shows how protocols organize concurrent data flows—streams, topics, channels, or discrete objects. See the Protocol Reference for detailed per-protocol information.

Transport Layer vs Application Layer

The distinction matters. Transport-layer protocols (TCP, UDP, SCTP, DCCP) are implemented in the kernel, providing socket APIs that applications use directly. Application-layer protocols (RTP, SRT, WebRTC, QUIC) run in userspace, implementing their own reliability and timing mechanisms—typically over UDP.

Protocol transport hierarchy showing what runs over what

Application-layer protocols have an advantage for domain-specific optimization: they can be tailored to the exact needs of video, audio, or industrial data without waiting for kernel changes. The downside is that each application must integrate a protocol library.

Integration complexity varies enormously among application-layer protocols. Library protocols like ENet (single C file), KCP (header-only), SRT, and ZeroMQ can be embedded in any application with minimal dependencies. Platform stacks like WebRTC are browser engines—hundreds of megabytes of code that dictate your entire architecture, build system, and update cadence. Choose accordingly: if you control both endpoints and don’t need browser compatibility, a library protocol is often simpler.

The Link Layer (Layer 2) sits below the Network Layer (Layer 3) in the OSI model. IP is ubiquitous at Layer 3 (all the Application Layer protocols we consider here depend on it), but many Application Layer protocols were designed to depend on characteristics of Ethernet as the Link Layer. Reality turned out messier than the OSI model’s clean layered architecture suggests.

Ethernet (802.3) is not the only link layer that matters. WiFi (802.11ax, 802.11ah) carries increasing amounts of delay-sensitive traffic, with its own QoS mechanisms and timing challenges. But Ethernet dominates in professional and industrial installations where timing guarantees matter most, so we’ll focus there.

Standard Ethernet (802.3) provides best-effort frame delivery. Switches forward frames as fast as they can, but offer no timing guarantees. Under congestion, frames queue up or get dropped.

802.1Q VLANs add traffic isolation and priority tagging. The Priority Code Point (PCP) field allows 8 priority levels, enabling switches to give real-time traffic preferential treatment. This is the minimum requirement for protocols like AES67 and RAVENNA.

AVB (Audio Video Bridging) introduced bandwidth reservation and traffic shaping:

  • 802.1Qav (Credit-Based Shaper): Guarantees bandwidth for time-sensitive streams by shaping traffic at each hop
  • 802.1AS (gPTP): Generalized Precision Time Protocol for switch-aware clock synchronization

TSN (Time-Sensitive Networking) extends AVB with stricter guarantees:

  • 802.1Qbv (Time-Aware Shaper): Time-division multiplexing—specific time slots reserved for specific traffic classes
  • 802.1Qbu/802.3br (Frame Preemption): High-priority frames can interrupt low-priority frame transmission
  • 802.1CB (Seamless Redundancy): Duplicate frames on redundant paths for zero-loss failover

PTPv2 (IEEE 1588) provides sub-microsecond clock synchronization across the network. Required by AES67, RAVENNA, and TSN-based industrial protocols for sample-accurate timing.
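At the heart of PTP is a four-timestamp exchange from which offset and path delay both fall out. A toy calculation of the standard IEEE 1588 arithmetic (real implementations add filtering and a clock servo; the timestamps below are made up):

```python
def ptp_offset_and_delay(t1, t2, t3, t4):
    """IEEE 1588 two-step exchange:
    t1: master sends Sync        t2: slave receives Sync
    t3: slave sends Delay_Req    t4: master receives Delay_Req
    Assumes a symmetric path; path asymmetry shows up directly as offset error."""
    offset = ((t2 - t1) - (t4 - t3)) / 2   # slave clock minus master clock
    delay = ((t2 - t1) + (t4 - t3)) / 2    # one-way path delay
    return offset, delay

# Illustrative values: slave clock runs 1.5us ahead, true one-way delay 10us.
offset, delay = ptp_offset_and_delay(t1=0.0, t2=11.5e-6, t3=50.0e-6, t4=58.5e-6)
print(offset, delay)  # ~1.5us offset, ~10us delay
```

The symmetry assumption is why wireless and routed paths (see below) make PTP hard: if the two directions have different delays, the error lands directly in the computed offset.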

The evolution follows a pattern: as applications demanded tighter timing guarantees, the link layer grew more sophisticated. Pro audio (AES67, RAVENNA) requires AVB-class infrastructure. Industrial automation increasingly requires TSN. The tradeoff is always the same: tighter guarantees require more specialized (and expensive) infrastructure.

Link layer evolution from Ethernet to TSN

Protocols that depend on link-layer reliability—AES67, RAVENNA, EtherCAT, PROFINET IRT—work beautifully in purpose-built facilities with managed switches. They fail the moment traffic crosses an uncontrolled network boundary. 802.11ax and earlier can’t provide AVB/TSN guarantees at all; WiFi 7 (802.11be) adds TSN support, but the wireless medium’s inherent challenges—unreliable links, asymmetric path delay, interference—make sub-microsecond synchronization difficult. This is why SRT and RIST exist: to provide reliability at the application layer, where it works over any IP network.

The Link Layer Trap: Layer 2 protocols cannot traverse Layer 3 routers. Period. AES67, TSN, and EtherCAT depend on Ethernet frame timing that routers destroy—buffering, reordering, and variable delay are fundamental to IP routing. High bandwidth doesn’t help; a 10 Gbps WAN link with 5ms of jitter is useless for protocols expecting microsecond precision. If you need to cross network boundaries, use an application-layer protocol designed for it (SRT, RIST, WebRTC) or accept that you’re building a private L2 network.


Recovery Philosophies

Every protocol in our list falls into one of several categories based on how it answers the packet-loss question:

| Domain | Guaranteed (retransmit until delivered) | Bounded-time (retry if time remains) | Best-effort (accept loss) | Configurable (policy selects) | Fast ARQ (aggressive retry) |
|--------|------------|--------------|-------------|--------------|----------|
| Media | RTMP, HLS, DASH | WebRTC, SRT, RIST | RTP, AES67, RAVENNA | | |
| Industrial | Modbus TCP | GVSP | | OPC UA, EtherNet/IP | |
| IoT | | | | MQTT, CoAP | |
| Robotics | | | | RTPS/DDS, zenoh | |
| Gaming | | | | ENet | KCP |
| Financial | | | ZeroMQ | Aeron | |
| Web | QUIC | MoQ | | | |
| General | TCP, SCTP | | UDP | PR-SCTP | |

The difference between these philosophies becomes concrete when we trace what happens during a packet loss event:

Loss response comparison between TCP, SRT, and RTP

Guaranteed Delivery

TCP, SCTP, and QUIC guarantee that every byte reaches the destination, in order. They achieve this through acknowledgments and retransmission—if a packet is lost, they keep trying until it arrives.

The cost is unbounded latency—though what users observe first is throughput collapse. As The TCP Jitter Cliff demonstrated, when loss or jitter occurs, TCP’s congestion control throttles the sending rate and head-of-line blocking stalls delivery until missing packets are recovered. For a continuous stream, the playout buffer drains while waiting. Latency grows without bound because TCP will retry until it succeeds. For file transfers, this is fine—the file arrives intact. For live video, it’s fatal: a frame that arrives 10 seconds late is worthless.

QUIC improves on TCP by providing independent streams—packet loss in one stream doesn’t block others. But each stream still guarantees delivery, so the fundamental problem remains for latency-sensitive data.

Why UDP-based protocols avoid the chaos zone: TCP couples reliability with congestion control—when packets are lost, TCP CUBIC (the default algorithm) assumes congestion and aggressively backs off, sometimes collapsing bandwidth by 50% or more. The “chaos zone” emerges because this backoff is often excessive and recovery is slow. UDP-based protocols like SRT decouple these concerns: they implement their own reliability (bounded retries) and their own congestion control (often less aggressive, or sender-controlled). This lets them maintain throughput through loss events that would collapse TCP, at the cost of occasionally dropping data that’s too late to be useful.

Bounded-Time Recovery

SRT, RIST, WebRTC, and GVSP take a different approach: they try to recover lost packets, but only within a time budget. If a packet can’t arrive in time to be useful, they drop it rather than delaying everything behind it.

SRT makes this explicit with its latency setting. Configure 200ms latency, and SRT will attempt retransmission for packets that might still arrive within that budget. Packets that can’t make the deadline are dropped via TLPKTDROP (Too-Late Packet Drop). The receiver gets bounded latency at the cost of occasional loss.
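The decision logic is simple enough to sketch. This is a toy model of the bounded-time philosophy, not SRT's implementation; the function name and parameters are invented for illustration:

```python
def schedule_packet(seq, send_time, now, latency_budget, arrived):
    """Decide what a bounded-time receiver does about sequence number `seq`.
    Toy model of Too-Late Packet Drop: a missing packet is only worth
    requesting again while it could still arrive inside the latency budget."""
    deadline = send_time + latency_budget
    if arrived:
        return "deliver"
    if now < deadline:
        return "request-retransmit"   # time remains in the budget
    return "drop"                     # too late to be useful: skip ahead

# 200ms budget: a packet lost at t=0 is still worth chasing at t=0.05 ...
print(schedule_packet(7, 0.0, 0.05, 0.200, arrived=False))  # request-retransmit
# ... but is abandoned at t=0.25, so later packets aren't blocked behind it.
print(schedule_packet(7, 0.0, 0.25, 0.200, arrived=False))  # drop
```

The contrast with TCP is the third branch: TCP has no "drop" outcome, which is exactly why its latency is unbounded.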

WebRTC adapts dynamically. When RTT is low (<100ms round-trip), it uses NACK-based retransmission—fast enough to recover packets without noticeable delay. When RTT is high, it switches to FEC (Forward Error Correction), adding redundant data that lets the receiver reconstruct losses without waiting for retransmission.

GVSP operates per-frame. Each image has a block ID; the receiver can request retransmission of missing packets via PACKETRESEND. But there’s a timeout—if the frame can’t be completed in time, it’s dropped and the system moves to the next frame.
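The per-frame variant can be sketched the same way. This is a toy model of the frame-assembly logic described above, not the GigE Vision API; the class and its fields are invented for illustration:

```python
class FrameAssembler:
    """Toy model of GVSP-style per-frame recovery: collect packets for one
    block ID, request resends for gaps while time remains, and abandon the
    frame at the deadline so the next frame isn't delayed."""
    def __init__(self, block_id, expected, deadline):
        self.block_id = block_id
        self.expected = expected          # packet IDs that make up the frame
        self.deadline = deadline          # seconds, relative to frame start
        self.received = set()

    def on_packet(self, packet_id):
        self.received.add(packet_id)

    def resolve(self, now):
        missing = self.expected - self.received
        if not missing:
            return ("complete", set())
        if now < self.deadline:
            return ("resend-request", missing)   # GVCP PACKETRESEND analogue
        return ("dropped", missing)              # move on to the next frame

f = FrameAssembler(block_id=42, expected={1, 2, 3, 4}, deadline=0.010)
for p in (1, 2, 4):
    f.on_packet(p)
print(f.resolve(now=0.004))   # ('resend-request', {3})
print(f.resolve(now=0.012))   # ('dropped', {3})
```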

The latency bounds differ by an order of magnitude depending on the audience. SRT and WebRTC are designed with human perception in mind—ITU research shows conversational quality degrades rapidly above 150ms one-way delay, with 200ms being the upper bound for natural interaction. GVSP targets machine-to-machine interfaces where vision systems must react in single-digit milliseconds. Same philosophy; very different timescales.

Bounded-time recovery timeline showing TLPKTDROP mechanism

MoQ (Media over QUIC): An emerging IETF standard applying bounded-time thinking to web-scale delivery. Unlike HLS/DASH (client pulls segments over HTTP, 10-60s latency), MoQ uses pub/sub: subscribers receive objects pushed through relay infrastructure as they’re published. Unlike WebRTC (P2P, hard to scale), MoQ’s relay architecture enables CDN-like fan-out. The bounded-time mechanism: objects (frames, samples) carry expiry metadata and can be dropped by relays before delivery—sub-second latency without head-of-line blocking. FETCH operations support late-join and seeking. Still maturing as of 2025.

Best-Effort

UDP provides no reliability at all. RTP adds sequence numbers for reordering and timestamps for synchronization, but base RTP doesn’t retransmit—if a packet is lost, it’s gone.

This isn’t a flaw; it’s a feature. It gives applications maximum control. A video codec might handle loss through error concealment; an audio application might interpolate. The protocol doesn’t impose a recovery strategy.

RTP can be extended with reliability mechanisms (RFC 4585 for NACK feedback, RFC 4588 for retransmission format, SMPTE-2022 for FEC), but these are optional layers.
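The fixed header that carries those sequence numbers and timestamps is only 12 bytes (RFC 3550). A minimal parser, ignoring CSRC entries and extension headers, shows how little machinery base RTP actually provides:

```python
import struct

def parse_rtp_header(data):
    """Parse the 12-byte fixed RTP header (RFC 3550, big-endian)."""
    if len(data) < 12:
        raise ValueError("truncated RTP header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", data[:12])
    return {
        "version": b0 >> 6,
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": b0 & 0x0F,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence": seq,        # lets the receiver detect loss and reordering
        "timestamp": ts,        # media clock, drives the playout buffer
        "ssrc": ssrc,           # identifies the stream for multiplexing
    }

# Version 2, payload type 96, seq 1000, timestamp 160000, SSRC 0xDEADBEEF.
pkt = struct.pack("!BBHII", 0x80, 96, 1000, 160000, 0xDEADBEEF)
hdr = parse_rtp_header(pkt)
print(hdr["version"], hdr["sequence"], hdr["payload_type"])  # 2 1000 96
```

Everything else—loss detection, concealment, retransmission—lives in the application or in the optional extensions above.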

ZeroMQ is also best-effort in practice—while it provides reliable delivery within a session, there’s no persistence across crashes and no guaranteed delivery if the receiver isn’t connected. Messages can be dropped under backpressure depending on socket type and high-water mark settings.

AES67 and RAVENNA are best-effort at the application layer—they have no retransmission mechanism. What distinguishes them from raw RTP is their explicit dependency on link-layer reliability technologies.

These protocols assume you’ve selected an appropriate link layer: AVB (802.1Qav) for bandwidth reservation, TSN (802.1Qbv) for time-aware scheduling, or at minimum managed switches with QoS (DiffServ EF class), dedicated VLANs, and traffic shaping. PTPv2 (IEEE 1588) provides clock synchronization. The link layer is expected to prevent loss.

But link layers fail too. Equipment reboots, cables get disconnected, switches get misconfigured. When this happens, AES67/RAVENNA have no recovery mechanism—they glitch or go silent, exactly like any best-effort protocol would under total loss. The application must still handle connectivity events, and while the pattern differs (link failure causes sustained loss, congestion causes transient loss), the protocol provides no way to recover from either.

Industrial protocols have the same pattern: EtherCAT, PROFINET IRT, and TSN-based systems all push reliability to Layer 2. The philosophy is consistent: best-effort at L4-L7, with reliability expected from L2.

The tradeoff: simpler protocols, but the reliability problem has been moved rather than solved. This works in controlled facilities with redundant, managed infrastructure—and fails the moment traffic crosses an uncontrolled network boundary.

Configurable

Some protocols let you choose the recovery behavior.

RTPS/DDS offers independent QoS policies: Reliability (RELIABLE or BEST_EFFORT), Durability (VOLATILE, TRANSIENT_LOCAL, TRANSIENT, PERSISTENT), History (KEEP_LAST or KEEP_ALL), plus timing constraints like DEADLINE and LIFESPAN. These policies combine orthogonally—you can have reliable-volatile, best-effort-transient, or any other combination that makes sense for your data.

zenoh offers similar configurability—reliability and congestion control policies are selectable per-resource. Unlike DDS, zenoh can also store data and answer queries, blurring the line between messaging and database.

MQTT provides per-message QoS levels: 0 (at-most-once, fire-and-forget), 1 (at-least-once, may duplicate), or 2 (exactly-once, four-step handshake). Separately, session persistence controls whether the broker queues QoS 1/2 messages for disconnected clients. Retained messages store the last value per topic for late joiners. These mechanisms are broker-mediated—the publisher and subscriber never communicate directly, which adds latency but enables massive fan-out and disconnected operation.

CoAP offers per-message reliability through message types: Confirmable (CON) messages are acknowledged and retransmitted until delivery succeeds, while Non-confirmable (NON) messages are fire-and-forget. The application chooses per-request based on importance.

ENet provides per-packet reliability via flags: Reliable (acknowledged, retransmitted), Unsequenced (unreliable, out-of-order allowed), or default (unreliable but sequenced). Multiple channels provide independent sequencing—so reliable packets on one channel don’t block unreliable packets on another. This lets games send critical events (player death) reliably while streaming position updates unreliably, without head-of-line blocking between them.

PR-SCTP (Partial Reliability SCTP) is particularly interesting: you can specify reliability policy per-message. “Timed reliability” retransmits for a duration then gives up. “Limited retransmissions” tries N times then drops. This lets you mix reliable control messages with unreliable video data in the same connection.
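Both PR-SCTP policies reduce to one question the sender asks before each retransmission. A sketch of that check using RFC 3758's concepts (this is not an SCTP socket API; the function and policy tuples are invented for illustration):

```python
def should_retransmit(policy, elapsed, attempts):
    """Toy PR-SCTP sender-side policy check.
    policy: ("timed", lifetime_seconds) or ("limited", max_retransmissions)
    Returns True while the message is still worth retransmitting; once the
    policy expires the sender abandons the message and moves on."""
    kind, limit = policy
    if kind == "timed":
        return elapsed < limit        # "timed reliability": retry for a duration
    if kind == "limited":
        return attempts < limit       # "limited retransmissions": retry N times
    raise ValueError(f"unknown policy {kind!r}")

# Video frames get a 150ms lifetime; control messages get unlimited retries
# simply by using ordinary reliable SCTP delivery instead.
print(should_retransmit(("timed", 0.150), elapsed=0.100, attempts=3))  # True
print(should_retransmit(("timed", 0.150), elapsed=0.200, attempts=1))  # False
print(should_retransmit(("limited", 4), elapsed=9.0, attempts=4))      # False
```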


Historic and Deprecated Protocols

We can also set aside two protocols that are no longer actively deployed.

DCCP (Datagram Congestion Control Protocol)

DCCP, specified in RFC 4340 (2006), solved a real problem: congestion control without reliability. It let applications get UDP-like unreliable delivery while still being “good citizens” that back off during congestion.

Linux supported DCCP from kernel 2.6.14, but it was deprecated in 6.4 and removed entirely in 6.16 (2025). Why did it fail?

  • No ecosystem: Few applications implemented it
  • Alternatives emerged: Application-layer protocols like SRT handle congestion control themselves
  • Kernel adoption barrier: Transport-layer protocols require OS support; application-layer protocols just need a library

DCCP is a cautionary tale: good design isn’t enough. Ecosystem momentum matters.

RDP (Reliable Data Protocol)

RFC 908 (1984) defined the Reliable Data Protocol for remote loading and debugging. RFC 1151 (1990) updated it, but the protocol remained “experimental” and never achieved significant deployment. Its ideas influenced later protocols like RUDP and SCTP.


Protocol Properties for Delay-Sensitive Applications

What makes a protocol suitable for delay-sensitive applications? Three properties matter:

  1. Bounded latency: Maximum delay is configurable or guaranteed
  2. Drop semantics: Protocol can discard data that’s “too old”
  3. Real-time feedback: Sender learns about loss quickly

A caveat about feedback: many protocols rely on companion protocols for feedback rather than building it in. RTP uses RTCP for receiver reports and loss notifications. GVSP uses GVCP for control and packet resend requests. SRT and WebRTC integrate feedback into the main protocol. The distinction matters for implementation but not for the capability—what matters is whether the sender can learn about loss in time to react.

A protocol needs at least two of these to be genuinely delay-sensitive.

| Protocol | Bounded Latency | Drop Semantics | RT Feedback |
|----------|-----------------|----------------|-------------|
| Delay-Sensitive (sub-second, all three properties) | | | |
| Aeron | Yes (<100μs) | Optional (reliable=false) | Yes |
| AES67 | Yes (<10ms) | Yes | Yes (RTCP) |
| RAVENNA | Yes (<10ms) | Yes | Yes (RTCP) |
| GVSP | Yes (<10ms) | Yes | Yes (GVCP) |
| WebRTC | Yes (<500ms) | Yes | Yes |
| SRT | Yes (<500ms) | Yes | Yes |
| RIST | Yes (<500ms) | Yes | Yes |
| MoQ | Yes (<500ms) | Yes | Yes |
| Buffered (all three properties, but wrong timescale) | | | |
| HLS | Yes (10-30s) | Yes | Yes |
| DASH | Yes (10-30s) | Yes | Yes |
| Partial (missing one or more properties, or configurable) | | | |
| RTP | No | Application | Yes (RTCP) |
| RTPS/DDS | Configurable | Configurable | Yes |
| zenoh | Configurable | Configurable | Yes |
| OPC UA | Configurable (TSN) | Configurable | Yes |
| EtherNet/IP | Yes (CIP Motion) | No | Yes |
| SCTP | No | PR-SCTP only | Yes |
| KCP | Lower than TCP | No | Yes |
| ENet | Lower than TCP | Per-packet | Yes |
| UDP | N/A | N/A | No |
| Not Delay-Sensitive (missing critical properties) | | | |
| TCP | No | No | Yes |
| QUIC | No | No | Yes |
| RTMP | No | No | Yes |
| MQTT | No | No | Yes |
| CoAP | No | CON/NON | Yes |
| ZeroMQ | No | No | Yes |
| Modbus TCP | No | No | Yes |

Not Delay-Sensitive

TCP: The baseline for comparison, but not delay-sensitive itself.

UDP: Context only—the substrate that application-layer protocols build on.

HLS/DASH (Adaptive Bitrate Streaming): When you click play on Netflix or YouTube, video starts within seconds—but this isn’t delay-sensitive streaming. These protocols work differently:

  • Video is pre-segmented into 2-10 second chunks on the server
  • The player downloads chunks over HTTP/TCP, buffering 10-30+ seconds ahead
  • Quality adapts dynamically based on available bandwidth
  • Latency is typically 10-60 seconds from live (the “short delay” is just initial buffering)

This is buffered streaming—designed for on-demand or near-live content where reliability and quality adaptation matter more than real-time delivery. Perfect for watching a movie; unusable for a video call. Low-Latency HLS (LL-HLS) and Low-Latency DASH can reduce latency to 2-5 seconds, but still far from the sub-second requirements of truly delay-sensitive applications.

QUIC: Optimized for web performance (0-RTT connection establishment, independent streams), but still guarantees delivery. Used by YouTube for buffered streaming, not live.

RTMP: TCP-based video ingest. Guarantees delivery but with unbounded latency—being replaced by SRT for professional contribution.

MQTT: Event-driven pub/sub for IoT telemetry. Not delay-sensitive for streams, but absolutely delay-sensitive for events—turning on a light switch needs <200ms response, and MQTT QoS 0 typically achieves this over reliable LANs. The distinction matters: stream latency cares about jitter (variation kills playout buffers); event latency cares about tail latency (the 99th percentile command must still be fast). MQTT is designed for discrete commands and sensor readings, not continuous media. The broker-mediated architecture adds a network hop but enables massive fan-out and store-and-forward.

CoAP: Like MQTT, designed for discrete request/response interactions rather than continuous streams. The Observe extension adds notifications, but it’s still event-driven telemetry, not media.

ZeroMQ: High-throughput messaging, but no bounded latency guarantees. Best-effort delivery with potential drops under backpressure. Optimized for throughput and flexibility, not for meeting timing deadlines.

Protocol Landscape

This visualization plots protocols by latency bound (X-axis) and recovery strategy (Y-axis), revealing why delay-sensitive protocols cluster in the top-left.

Protocol landscape combining latency and recovery strategy

The scale of latency differences across protocols is striking—five orders of magnitude separate ultra-low-latency messaging from buffered streaming:

Latency comparison across protocols on log scale

Human Perception: The Biological Constraints

The green “Human perception” zone on the diagram isn’t arbitrary—it reflects fundamental biological limits that constrain protocol design.

Temporal resolution (10-20ms): The human auditory system can detect gaps and distinguish separate events down to about 10-20 milliseconds. Below this threshold, sounds blur together; above it, we perceive distinct events. Hirsh’s research identified three temporal ranges: 1-20ms for phase perception, 20-100ms for pattern recognition, and >100ms for perceiving separate events. This 10ms floor explains why professional audio protocols (AES67, RAVENNA) target single-digit millisecond latency—anything higher and musicians hear the delay between their input and the monitored output.

Conversational quality (< 150ms): ITU-T G.114 establishes thresholds for interactive voice based on extensive user studies. Below 150ms one-way delay, conversation quality is unaffected—users don’t notice the latency. Between 150-400ms, quality degrades noticeably: users become aware of delay, and natural conversation rhythm breaks down as people start talking over each other. Above 400ms, the delay becomes unacceptable for interactive use.

Why interactive protocols exceed the zone: WebRTC, SRT, and RIST target 100-500ms—extending well beyond the 150ms conversational threshold. This isn’t a design flaw; it’s a practical tradeoff. ARQ-based recovery needs headroom: detecting loss, requesting retransmission, and receiving the retry takes at least 2-4× RTT. On a 50ms RTT link, that’s 100-200ms just for recovery. These protocols are designed for unreliable public internet where you can’t guarantee low RTT—so they budget extra latency to enable recovery. For contribution workflows (field-to-studio), one-way delay is acceptable because there’s no real-time conversation. The latency budget is the price of reliability over unpredictable networks.
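That "2-4× RTT" budget is simple arithmetic worth making explicit. A sketch with an invented helper (this is a rule-of-thumb estimate, not any protocol's actual formula): detecting the gap costs up to one RTT, and each retransmission attempt costs roughly another.

```python
def min_latency_budget(rtt, retries=2, margin=1.0):
    """Rough minimum receive buffer (seconds) for NAK-based recovery:
    one RTT to detect the gap plus one RTT per retransmission attempt,
    times an optional safety margin. Illustrative only."""
    return (1 + retries) * rtt * margin

# 50ms RTT with up to two retransmission attempts: a ~150ms floor,
# which is why a 120ms default latency only suits lower-RTT paths.
print(round(min_latency_budget(rtt=0.050), 3))            # 0.15
print(round(min_latency_budget(rtt=0.050, retries=3), 3)) # 0.2
```

The same arithmetic explains why FEC becomes attractive at high RTT: redundancy costs bandwidth but no round trips.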

Note that configurable protocols like RTPS/DDS, zenoh, OPC UA, EtherNet/IP, and ENet don’t appear on this diagram—their latency bounds depend entirely on configuration. This flexibility is powerful but creates risk: these protocols are complex and prone to misconfiguration. A system intended to meet a 10ms deadline might silently fail to do so if QoS policies, network infrastructure, or timing parameters aren’t correctly aligned.

This is precisely where diagnostic tools like JitterTrap can add value—not just for troubleshooting failures, but for validating that a configured system actually achieves its timing requirements. We’ll explore this in more detail in a future article on industrial protocol diagnostics.


Application Domains

The delay-sensitive protocols cluster into several domains:

  1. Media Streaming — Video and audio delivery, from interactive calls to broadcast distribution
  2. Industrial / Embedded — Machine-to-machine communication with deterministic timing requirements
  3. IoT / Constrained Devices — Telemetry and control for resource-limited sensors and actuators
  4. Robotics / Autonomous Systems — Sensor fusion and coordination for vehicles and robots
  5. Gaming / Real-time Applications — Game state synchronization and interactive multiplayer
  6. Financial Systems — Trading infrastructure, market data distribution, risk calculation

Let’s examine each domain, starting with Media Streaming—which itself spans four distinct sub-domains with latency requirements differing by 1000×.

Media Streaming

Media streaming sub-domains with latency ranges

Interactive (<150ms)

Two-way communication where latency directly affects conversation quality. ITU research shows <150ms is acceptable for natural interaction; above 200ms, people start talking over each other.

WebRTC: The dominant protocol for browser-based interactive media. Combines RTP/RTCP with NACK-based retransmission, FEC, and GCC (Google Congestion Control). Adapts strategy to network conditions: NACK when RTT is low enough for retransmission to help, FEC when it isn’t.

RTP (Real-time Transport Protocol): The foundation that WebRTC builds on. RFC 3550 defines sequence numbers for reordering, timestamps for synchronization, SSRC identifiers for multiplexing. By itself, RTP provides no reliability—extensions add it:

  • RFC 4585 (AVPF): Early RTCP feedback for faster loss detection
  • RFC 4588: Retransmission payload format

Contribution (120-500ms)

One-way transmission from field to studio over unreliable networks. Latency budget must be at least ~4×RTT to allow for loss detection and retransmission.

SRT (Secure Reliable Transport): Configurable latency budget (default 120ms), NAK-based retransmission, and TLPKTDROP—packets that can’t arrive within budget are dropped rather than causing unbounded delay. Built-in AES encryption. Dominant for contribution over public internet.

RIST (Reliable Internet Stream Transport): Similar goals to SRT, built on RTP/SMPTE-2022 heritage. Supports connection bonding (multiple network paths), multicast distribution, and DTLS certificate authentication. Preferred for broadcast distribution networks.

Professional Facility (<10ms)

Studio and venue environments with controlled network infrastructure. Latency requirements are sub-10ms, often sub-millisecond. These protocols assume managed networks with QoS, VLANs, and often TSN/AVB.

AES67: Interoperability standard for professional audio-over-IP. Built on RTP with PTPv2 synchronization. No retransmission—relies on network engineering rather than recovery mechanisms. Enables Dante, Livewire, RAVENNA, and other systems to exchange audio.

RAVENNA: Superset of AES67. Adds stream discovery, device management, higher sample rates (up to 192kHz). Same philosophy—expects properly engineered infrastructure.

RTP + SMPTE-2022: Professional broadcast video. SMPTE-2022-1 adds FEC for loss recovery without retransmission latency. Used in broadcast facilities where network infrastructure is controlled.

Common Challenges

Across all media streaming use cases, the challenges are similar: loss during I-frames is catastrophic while loss during P-frames may be concealable; jitter must be absorbed without visible stutter; and congestion control must avoid oscillation that causes quality flickering.

JitterTrap relevance: High. Network-induced jitter directly causes visible artifacts. Distinguishing network problems from encoder/decoder problems is valuable. SRT’s explicit latency budget creates clear diagnostic thresholds.

Industrial / Embedded

Machine-to-machine communication with deterministic timing requirements. Typically LAN-only with controlled infrastructure. Cycle times range from microseconds (motion control) to hundreds of milliseconds (process monitoring).

Unlike Media Streaming—where the use case determines latency requirements (human perception drives the targets)—Industrial protocols are differentiated by their architecture. The network approach determines what cycle times are even possible. You can’t configure Modbus TCP to achieve 100µs cycles; the TCP/IP stack fundamentally prevents it. Achieving faster cycle times requires progressively more specialized infrastructure.

IEC 61784-2 defines performance classes for industrial Ethernet. Rather than using the overloaded term “real-time,” we can describe protocols by what they actually deliver:

  • Polling / On-demand (10-100ms): Request-response communication with no timing guarantees. OS scheduling and TCP congestion control introduce variable latency.
  • Cyclic (1-10ms): Periodic updates at software-scheduled intervals. Bounded jitter through kernel bypass and managed switches.
  • Synchronized (100µs-1ms): Time-coordinated communication using PTP/TSN. Deterministic scheduling reserves network capacity.
  • Isochronous (<100µs): Hardware-timed, equidistant intervals with sub-microsecond jitter. Requires specialized NICs and network topology.

Industrial protocols: purpose determines choice

The protocols we’ve encountered map to these tiers based on their underlying architecture:

Polling / On-demand (10-100ms)

Modbus TCP: Simple request/response protocol for PLCs and sensors. TCP congestion control introduces variable latency—this protocol was never designed for deterministic timing. Cycle times are tens of milliseconds (vs hundreds of ms for serial RTU). ADU limited to 260 bytes (253 data + 7 header). Polling-based with no event mechanism, consuming bandwidth on busy networks. No built-in security. Still ubiquitous due to simplicity and universal support.
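The wire format is simple enough to build by hand, which is much of why Modbus TCP persists. A sketch of a Read Holding Registers (function 0x03) request, showing the 7-byte MBAP header and the PDU that together form the ADU (standard Modbus TCP framing; the helper function name is ours):

```python
import struct

def modbus_read_holding_registers(transaction_id: int, unit_id: int,
                                  start_addr: int, count: int) -> bytes:
    """Build a Modbus TCP Read Holding Registers (function 0x03) request ADU."""
    # PDU: function code + starting address + quantity of registers (big-endian)
    pdu = struct.pack(">BHH", 0x03, start_addr, count)
    # MBAP header: transaction ID, protocol ID (always 0), length (unit ID + PDU), unit ID
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu

# Read 3 registers starting at 0x006B from unit 0x11
frame = modbus_read_holding_registers(1, 0x11, 0x006B, 3)
# 7-byte MBAP header + 5-byte PDU = 12 bytes, far below the 260-byte ADU ceiling
```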

OPC UA (Client/Server): Unified Architecture for industrial interoperability. Client/server mode uses TCP for browsing, subscriptions, and method calls. Semantic data modeling enables machine-readable information exchange, but TCP transport places it in the polling tier.

Cyclic (1-10ms)

GVSP (GigE Vision Stream Protocol): Transports images from machine vision cameras over UDP. Frame-oriented: each image has a block ID, and the receiver can request retransmission of specific packets via the GVCP PACKETRESEND command. When the driver detects an out-of-sequence packet, it buffers subsequent arrivals and sends a resend request after a configurable timeout. Filter drivers bypass the OS network stack to reduce latency. Typical latency: single-digit milliseconds on well-configured LANs.
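The resend flow amounts to receiver-side bookkeeping: hold out-of-order packets, and if the gap persists past a timeout, ask for the missing range. A hypothetical sketch (real GigE Vision drivers implement this in kernel filter drivers; the class and field names here are invented):

```python
class ResendTracker:
    """GVSP-style receive bookkeeping (hypothetical sketch; real drivers do
    this inside kernel filter drivers, and these names are invented)."""

    def __init__(self, resend_timeout: float = 0.005):
        self.expected = 0           # next in-order packet ID within the current block
        self.buffered = {}          # out-of-order packets held back behind the gap
        self.gap_since = None       # when the current gap was first observed
        self.resend_timeout = resend_timeout
        self.resend_requests = []   # (first_missing, last_missing) ranges for GVCP PACKETRESEND

    def on_packet(self, packet_id: int, payload: bytes, now: float):
        if packet_id == self.expected:
            self.expected += 1
            self.gap_since = None
            while self.expected in self.buffered:   # gap filled: drain in order
                self.buffered.pop(self.expected)
                self.expected += 1
        elif packet_id > self.expected:
            self.buffered[packet_id] = payload
            if self.gap_since is None:
                self.gap_since = now                # start the resend timer
            elif now - self.gap_since >= self.resend_timeout:
                # gap persisted: request the missing range, then re-arm the timer
                self.resend_requests.append((self.expected, min(self.buffered) - 1))
                self.gap_since = None
```

Feeding it packets 0, 1, 3, 4 produces a resend request for packet 2 once the timeout elapses; when 2 arrives, in-order delivery resumes.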

EtherNet/IP (Implicit I/O): Common Industrial Protocol (CIP) over standard Ethernet. Base protocol uses UDP for implicit (cyclic) I/O—achieving ~10ms cycle times with managed switches and QoS. TCP is used separately for explicit messaging (configuration, diagnostics). Widely deployed in North American manufacturing.

Synchronized (100µs-1ms)

EtherNet/IP + CIP Sync: CIP Sync adds IEEE 1588 time synchronization to CIP; CIP Motion builds on it to deliver 1ms motion task updates with <100ns jitter across 100+ axes. TSN integration is in development.

OPC UA PubSub + TSN: PubSub mode over UDP enables deterministic data distribution—with TSN time-aware shaping, RTT of ~800µs is achievable for typical payloads. Without TSN, jitter is ±250µs. Full TSN integration (OPC UA FX) is still being standardized.

Isochronous (<100µs)

EtherCAT and PROFINET IRT achieve cycle times below 100µs (31.25µs for PROFINET IRT) through specialized hardware and modified Ethernet framing. EtherCAT processes data on-the-fly as frames pass through each device. PROFINET IRT reserves dedicated time slots in the Ethernet frame schedule. Critical for servo control and high-speed automation, but these operate at timescales where software-based measurement tools have limited visibility.

Challenges

Hard deadlines (microseconds to milliseconds), frame-complete delivery requirements (partial images are useless), integration with PLC scan cycles and servo loops, and TSN/AVB for deterministic scheduling on standard Ethernet.

JitterTrap relevance: Potentially high for Polling and Cyclic tiers, where software-based measurement is feasible. The configurable nature of protocols like OPC UA and EtherNet/IP creates opportunity—systems intended to meet timing requirements may silently fail to do so if QoS policies, network infrastructure, or timing parameters aren’t correctly aligned. JitterTrap could help validate that configured systems actually achieve their timing goals. The challenge is that industrial systems are more complex, more regulated, less publicly documented, and have significant barriers to entry for outside tooling. This is an area worth exploring in future work.

IoT / Constrained Devices

Telemetry and control for resource-limited sensors and actuators. These protocols optimize for devices with limited memory, processing power, and energy—often battery-powered and connected over constrained networks.

MQTT: Event-driven pub/sub for telemetry. Broker-mediated architecture—all messages flow through a central broker, enabling massive fan-out, store-and-forward for disconnected clients, and simple device onboarding. Three QoS levels: 0 (fire-and-forget), 1 (at-least-once), 2 (exactly-once). Designed for discrete sensor readings and commands arriving irregularly—not continuous streams. The hub-and-spoke model adds latency but simplifies connectivity for thousands of devices.
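QoS 1's at-least-once guarantee comes from sender-side state: every PUBLISH is held until the broker's PUBACK arrives, and anything unacknowledged is re-sent (with the DUP flag) after a reconnect. A minimal sketch of that bookkeeping (illustrative only; real clients such as paho-mqtt manage this internally):

```python
class QoS1Publisher:
    """Sender-side state behind MQTT QoS 1, at-least-once delivery.
    Hypothetical sketch; real clients such as paho-mqtt manage this internally."""

    def __init__(self):
        self.next_packet_id = 1
        self.unacked = {}   # packet ID -> (topic, payload), held until PUBACK

    def publish(self, topic: str, payload: bytes) -> int:
        pid = self.next_packet_id
        self.next_packet_id = pid % 65535 + 1   # packet IDs are nonzero 16-bit values
        self.unacked[pid] = (topic, payload)
        return pid

    def on_puback(self, packet_id: int):
        self.unacked.pop(packet_id, None)       # broker confirmed delivery

    def pending_retransmits(self):
        # after a reconnect, every unacknowledged PUBLISH is re-sent with the DUP flag
        return list(self.unacked)
```

Note the latency implication: at-least-once means a message can arrive seconds late after a reconnect, which is fine for telemetry and unacceptable for a media stream.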

CoAP: RESTful protocol for constrained devices. UDP-based (unlike MQTT’s TCP), with a compact binary format—4-byte base header vs HTTP’s verbose text. Request/response model with GET/PUT/POST/DELETE on resources. Two message types: Confirmable (acknowledged, retransmitted) and Non-confirmable (fire-and-forget). Observe extension enables pub/sub-like notifications without polling. Designed for devices too constrained for TCP—tiny microcontrollers, battery sensors, mesh networks.
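The compactness is concrete: the entire base header fits in four bytes (RFC 7252). A sketch that builds a Confirmable GET header:

```python
import struct

CON, NON, ACK, RST = 0, 1, 2, 3   # CoAP message types
GET = 0x01                        # method code 0.01

def coap_header(msg_type: int, code: int, message_id: int, token: bytes = b"") -> bytes:
    """Build the 4-byte CoAP base header (plus optional token), per RFC 7252:
    Ver=1 (2 bits) | Type (2 bits) | Token Length (4 bits), then Code, then Message ID."""
    first = (1 << 6) | (msg_type << 4) | len(token)
    return struct.pack(">BBH", first, code, message_id) + token

hdr = coap_header(CON, GET, 0x1234)   # a complete Confirmable GET header in 4 bytes
```

Compare that with the dozens of bytes an equivalent HTTP request line and headers would cost on a battery-powered radio link.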

Challenges

Power efficiency (radio duty cycling), unreliable connectivity (cellular, LoRa), massive scale (millions of devices), and security on constrained hardware.

JitterTrap relevance: Low for typical IoT—discrete events don’t have continuous timing requirements. Higher relevance for control loops (actuator commands) where latency affects physical outcomes.

Robotics / Autonomous Systems

Sensor fusion and coordination for vehicles and robots. These protocols carry typed data samples at fixed intervals—IMU orientation at 100Hz, lidar scans at 10Hz, joint positions at 1kHz—not unlike video frames, but with rich semantics rather than compressed pixels.

RTPS/DDS: Data-centric pub/sub middleware. Key insight: data has a type and lifecycle, not just bytes. Decentralized architecture—participants discover each other then exchange data peer-to-peer with no broker in the data path. Independent QoS policies control reliability (RELIABLE/BEST_EFFORT), durability (VOLATILE through PERSISTENT), history depth, timing (DEADLINE), and sample lifetime (LIFESPAN). Used for sensor fusion in autonomous vehicles, telemetry in aerospace, and robot coordination in ROS2.
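The DEADLINE policy is a contract that can be validated externally: if the gap between successive samples on a topic exceeds the declared period, the contract is broken. A sketch of that check (hypothetical monitor logic, not a DDS API; DDS implementations report this through deadline-missed listener callbacks):

```python
class DeadlineMonitor:
    """Externally validate a DDS-style DEADLINE contract: flag any inter-sample
    gap longer than the declared period. (Hypothetical monitor logic, not a DDS
    API; DDS implementations report this via deadline-missed listener callbacks.)"""

    def __init__(self, deadline_s: float):
        self.deadline = deadline_s
        self.last = None
        self.misses = []   # (previous_sample_time, late_sample_time) pairs

    def on_sample(self, t: float):
        if self.last is not None and t - self.last > self.deadline:
            self.misses.append((self.last, t))
        self.last = t

# A nominal 100 Hz topic, deadline set slightly above the 10 ms period to
# tolerate jitter; the 35 ms gap is the one contract violation
mon = DeadlineMonitor(deadline_s=0.012)
for t in [0.000, 0.010, 0.020, 0.055, 0.065]:
    mon.on_sample(t)
```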

zenoh: Modern alternative designed to fix DDS scalability issues. DDS’s verbose discovery and fully-connected participant mesh don’t scale to large deployments or constrained networks (like multi-robot mesh). zenoh provides decentralized pub/sub with optional routing infrastructure, unifying data in motion (pub/sub), data at rest (storage), and data in use (queries). Selected as official ROS 2 alternative middleware in 2024. Research shows lower delay and overhead than DDS over mesh networks with dynamic topology.

Challenges

Discovery in dynamic environments, handling network partitions gracefully, meeting timing deadlines across heterogeneous compute (GPU, CPU, microcontroller), and bridging between different middleware in multi-vendor systems.

JitterTrap relevance: High. Continuous sampling has real-time constraints similar to video. Missed sensor readings can cause control instability. Validating that a ROS2 system actually meets its DEADLINE QoS requirements is valuable diagnostic capability.

Gaming / Real-time Applications

Game state synchronization and interactive multiplayer. These protocols optimize for low latency over unreliable consumer internet, trading bandwidth for reduced delay.

KCP: Fast ARQ protocol popular in gaming and VPN tunneling. Key optimizations: faster retransmit (no delayed ACK), selective repeat, aggressive RTO calculation, no congestion window during recovery. Trades 10-20% extra bandwidth for 30-40% latency reduction vs TCP. Pure algorithm library—bring your own UDP transport. Popular in Asia for game servers and tools like kcptun for accelerating connections over lossy links.
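KCP's "aggressive RTO" is a small change to standard Jacobson/Karels smoothing: the same SRTT/RTTVAR update, but a low RTO floor and a tighter timeout formula than TCP's conservative timers. A sketch modeled on ikcp.c's ACK-update path (the constants here are illustrative, not the library's exact defaults):

```python
class RttEstimator:
    """KCP-flavored RTO calculation, modeled on ikcp.c's ACK-update path:
    standard SRTT/RTTVAR smoothing, but RTO = SRTT + max(interval, 4*RTTVAR)
    clamped to a low floor. (Constants are illustrative, not the library's
    exact defaults.)"""

    def __init__(self, interval_ms: int = 10, minrto_ms: int = 30):
        self.srtt = 0        # smoothed round-trip time
        self.rttvar = 0      # round-trip time variance
        self.interval = interval_ms
        self.minrto = minrto_ms
        self.rto = 200       # initial retransmission timeout

    def update(self, rtt_ms: int):
        if self.srtt == 0:                               # first sample seeds the estimator
            self.srtt = rtt_ms
            self.rttvar = rtt_ms // 2
        else:
            delta = abs(rtt_ms - self.srtt)
            self.rttvar = (3 * self.rttvar + delta) // 4
            self.srtt = (7 * self.srtt + rtt_ms) // 8
        self.rto = max(self.minrto, self.srtt + max(self.interval, 4 * self.rttvar))
```

With a steady 50 ms RTT the timeout converges toward the RTT plus a small variance margin, instead of the multi-hundred-millisecond minimums typical of TCP stacks.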

ENet: Reliable UDP library developed for the Cube FPS. Key features: optional per-channel reliability and ordering (so unreliable position updates don’t block reliable chat messages), connection management with RTT/loss monitoring, packet fragmentation. Multiple independent channels avoid head-of-line blocking. Progressive retry timeouts adapt to network turbulence. Widely used in game engines—has official Unreal Engine plugin. BSD-licensed, cross-platform, no royalties.
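The per-channel design is what prevents head-of-line blocking: each channel sequences independently, and only reliable channels stall on gaps. A toy illustration (the class is invented, not ENet's API):

```python
class Channel:
    """ENet-style per-channel delivery (toy illustration; this class is
    invented, not ENet's API). A reliable channel delivers in order and
    stalls on gaps; an unreliable sequenced channel drops stale packets
    and never stalls."""

    def __init__(self, reliable: bool):
        self.reliable = reliable
        self.next_seq = 0
        self.held = {}        # out-of-order packets awaiting earlier sequence numbers
        self.delivered = []

    def receive(self, seq: int, data):
        if self.reliable:
            self.held[seq] = data
            while self.next_seq in self.held:   # in-order delivery; a gap stalls this channel only
                self.delivered.append(self.held.pop(self.next_seq))
                self.next_seq += 1
        elif seq >= self.next_seq:              # unreliable sequenced: newest wins, stale dropped
            self.delivered.append(data)
            self.next_seq = seq + 1
```

Because each channel tracks its own sequence space, a lost reliable chat message stalls only the chat channel; position updates on the unreliable channel keep flowing.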

Challenges

Variable player latency (10ms LAN vs 200ms intercontinental), client-side prediction and server reconciliation, cheat prevention (authoritative servers vs responsive clients), and graceful degradation during packet loss spikes.

JitterTrap relevance: Medium. Game developers typically build custom latency monitoring into their netcode. External tools are useful for network-level diagnosis—distinguishing ISP problems from server issues—but less useful for game-specific logic.

Financial Systems

Trading infrastructure, market data distribution, and risk calculation. These protocols optimize for microsecond latency and millions of messages per second—where being slower than competitors means losing money.

Aeron: High-throughput messaging for capital markets. Achieves <100μs latency in cloud, <1μs over IPC, with throughput exceeding 350k messages/second (open-source) to 20M+ messages/second (Premium). Supports UDP unicast, multicast, and multi-destination-cast (MDC) for cloud environments where multicast isn’t available. Lock-free, zero-allocation design. Primary use cases: market data distribution, trade execution, risk workflows. Recovery model: Reliable by default (NAK-based, recovers all messages like TCP but faster). Optional reliable=false mode gap-fills losses with padding instead of retransmitting—useful for multicast where some subscribers may accept loss for lower latency. This is the opposite of SRT, which drops late packets by default (TLPKTDROP). Aeron assumes a high-quality network; SRT assumes a hostile one.
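The two recovery modes reduce to a one-line policy decision at the subscriber when a sequence gap appears (hypothetical sketch, not Aeron's API):

```python
def handle_gap(expected: int, received: int, reliable: bool):
    """The subscriber's choice when sequence numbers jump from `expected` to
    `received` (hypothetical sketch, not Aeron's API)."""
    missing = list(range(expected, received))
    if reliable:
        # ask the sender to retransmit: no loss, but latency varies with recovery
        return ("nak", missing)
    # pad the gap and move on: bounded latency, accepted loss
    return ("gap_fill", [b"\x00"] * len(missing))
```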

ZeroMQ: Brokerless message queue library with socket abstraction. Multiple patterns: REQ/REP (request-reply), PUB/SUB (publish-subscribe), PUSH/PULL (pipeline), DEALER/ROUTER (async). Lock-free queues for inter-thread, IPC for same-machine, TCP/PGM for network. No broker means no double network hop—critical for low latency. Financial industry heritage (iMatix, original AMQP designers, left AMQP for ZeroMQ citing simplicity and speed). Not a message broker—no persistence, no guaranteed delivery across crashes.

Challenges

Tail latency (99.9th percentile matters more than median), exactly-once semantics without adding latency, multicast in cloud environments (often unavailable), and coordination between geographically distributed systems.

JitterTrap relevance: Medium-High. Aeron’s high packet rates (350k+ msg/sec) mean packets-per-second measurements at millisecond resolution can detect throughput anomalies—a drop in market data rate is actionable even if individual packet latency is below measurement threshold. Network jitter is a known cause of trading system problems.


Inspecting Delay-Sensitive Traffic on Linux

Standard Linux tools have limits when inspecting delay-sensitive protocols:

  • ss -u -a shows UDP socket stats, but only for kernel sockets. Application-layer protocols like SRT and WebRTC maintain their own statistics internally—ss sees UDP packets, not the ARQ state machine.
  • Kernel-bypass protocols (DPDK, some industrial drivers, Aeron’s embedded media driver) don’t show up at all—traffic flows directly between NIC and userspace, invisible to standard tools.
  • Encrypted protocols (SRT with AES, WebRTC with DTLS-SRTP) hide their control flags from packet captures. You see bytes, not retransmission requests.

This is why external measurement matters. Tools like JitterTrap measure at the network interface—before encryption, before kernel bypass, before the application’s view of the world. Packet timing patterns reveal protocol behavior that socket stats and packet captures can’t show: the regular cadence of a healthy stream, the burst-and-gap signature of retransmission, the throughput collapse when loss exceeds recovery capacity.
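The timing signatures described here can be sketched directly from inter-arrival times (the thresholds are illustrative, not JitterTrap's actual classifier):

```python
import statistics

def iat_signature(timestamps, nominal_iat):
    """Classify a packet-timing pattern from inter-arrival times.
    Thresholds are illustrative, not JitterTrap's actual classifier."""
    iats = [b - a for a, b in zip(timestamps, timestamps[1:])]
    stalls = sum(1 for d in iats if d > 3 * nominal_iat)     # long gap: recovery in progress
    bursts = sum(1 for d in iats if d < nominal_iat / 3)     # back-to-back: queue flushed
    if stalls and bursts:
        return "burst-and-gap (guaranteed-delivery recovery)"
    if statistics.pstdev(iats) < 0.2 * nominal_iat:
        return "steady cadence (healthy, or bounded-time drops)"
    return "jittery"

# 20 ms cadence with a 100 ms stall, then three packets flushed back-to-back
stalled = [0.00, 0.02, 0.04, 0.14, 0.141, 0.142, 0.16]
```

The stall-then-burst pair is the fingerprint of a guaranteed-delivery protocol recovering from loss; a bounded-time protocol shows the gap without the burst.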


Where to Go From Here

This survey maps the landscape. Future posts will dive deeper into specific areas:

Video Streaming Protocols: A detailed comparison of RTP, WebRTC, SRT, RIST, and GVSP—including an SRT deep dive covering TLPKTDROP, latency budgets, and why SRT is displacing RTMP for contribution. This is where most JitterTrap users will find immediate value.

Industrial Protocols: GVSP packet retransmission, EtherNet/IP cyclic I/O, and diagnosing timing issues in factory automation.

Each domain deserves its own investigation. The goal isn’t to pick a winner, but to understand what problems JitterTrap can help solve in each space—and to build the diagnostic tools that make delay-sensitive networking less mysterious.


This is part of a series on delay-sensitive networking. Previous: Diagnosing Video Stuttering Over TCP and The TCP Jitter Cliff. Next: Video Streaming Protocols (coming soon).

This work is part of Project Pathological Porcupines—an ongoing exploration of delay-sensitive networking problems and how JitterTrap can help diagnose them.