The Jitter Cliff: When TCP Goes Chaotic
Part 2: Why throughput collapses and what to do about it
In Part 1, we used “video over TCP” as a stress test for TCP’s behavior—examining how zero-window events, retransmits, and the masking effect reveal what’s happening inside a struggling connection.
But during those experiments, I discovered that TCP throughput degraded rapidly as jitter worsened. I knew packet loss would destroy TCP throughput, but I hadn’t quite expected the jitter-induced cliff.
At a certain jitter threshold, throughput collapses so severely that measurements become unreliable. Single tests can vary by over 100%. This “chaos zone” makes diagnosis treacherous: the same network conditions can produce wildly different results depending on when you measure.
This post explores TCP’s behavior under jitter and loss, comparing CUBIC and BBR. It’s common knowledge that TCP is a poor fit for delay-sensitive streaming data; this post demonstrates how and why.
Experimental Setup
The findings in this post come from controlled experiments using Linux network namespaces with tc/netem to inject precise network impairments (delay, jitter, loss).
Why these constraints? We wanted to study a single TCP stream carrying streaming data—video, audio, or telemetry—where the application produces data at a relatively consistent rate rather than as fast as possible. This differs from bulk file transfers, where the goal is maximum throughput and applications use large buffers with TCP autotuning (up to 32 MB on Linux). By using a modest receive buffer (256 KB) and a fixed send rate (80 Mbps), we model a constrained streaming scenario where the jitter cliff behavior becomes visible. The absolute throughput numbers in this post reflect this setup; the ratios between algorithms and conditions are what generalize.
Key parameters:
| Parameter | Value | Notes |
|---|---|---|
| Target send rate | 10 MB/s (80 Mbps) | Sender paced at this rate |
| Receive buffer | 256 KB | Sized to avoid zero-window events |
| RTT range | 24-100ms | Controlled via netem delay |
| Jitter range | 0-24ms | Controlled via netem delay jitter and distribution options |
| Loss range | 0-5% | Controlled via netem loss |
| Duration | 10 seconds per test | Multiple iterations for statistical validity |
| Topology | veth pairs | Between Linux network namespaces |
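The exact harness behind these runs isn’t reproduced here, but a minimal sketch of this kind of setup, assuming namespaces named ns-sender and ns-receiver, an illustrative veth0 interface and 10.0.0.2 receiver address, and placeholder values, looks roughly like this:
# Sketch only: impair one direction of the veth pair with netem
sudo ip netns exec ns-sender tc qdisc add dev veth0 root netem delay 50ms 10ms distribution normal loss 0.5%
# Receiver in the peer namespace, then one paced stream with a constrained socket buffer
sudo ip netns exec ns-receiver iperf3 -s -D
sudo ip netns exec ns-sender iperf3 -c 10.0.0.2 -t 10 -b 80M -w 256K -C cubic
# Inspect or clear the impairment
sudo ip netns exec ns-sender tc qdisc show dev veth0
sudo ip netns exec ns-sender tc qdisc del dev veth0 root
The netem delay, jitter, distribution, and loss options and iperf3’s pacing (-b), buffer (-w), and congestion control (-C) flags map onto the parameters in the table; treat this as an approximation of the setup, not the actual test scripts.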
The receive buffer was set to 256 KB via SO_RCVBUF. Linux doubles this value internally for bookkeeping overhead (see socket(7))—verified by getsockopt() returning 512 KB. Packet capture analysis showed:
- Requested: 256 KB (setsockopt)
- Allocated: 512 KB (getsockopt)
- Maximum advertised window: 313 KB
- RTT: 50.2ms (from TCP timestamps)
This gives a theoretical maximum of 313 KB / 0.050s ≈ 50 Mbps. The measured baseline of ~42 Mbps is 85% of this theoretical maximum—the gap reflects TCP overhead from slow start, congestion window probing, and ACK timing. All figures in this post are relative to this ~42 Mbps baseline.
Why baseline throughput is ~42 Mbps, not 80 Mbps: TCP throughput is limited by the bandwidth-delay product (BDP)—the amount of data that can be “in flight” (sent but not yet acknowledged) at any moment. With 50ms RTT, the sender can only have ~50ms worth of data outstanding before it must wait for acknowledgments. Even in these benign conditions, throughput is significantly below the target rate of 80 Mbps.
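The arithmetic is easy to sanity-check on the command line. A quick sketch using the figures from the capture above (treating 313 KB as 313,000 bytes):
# Window-limited ceiling ≈ advertised window / RTT
echo "scale=2; 313000 * 8 / 0.050 / 1000000" | bc    # 50.08, i.e. ~50 Mbps
# Data in flight needed to sustain the 80 Mbps target at 50 ms RTT
echo "scale=0; 80000000 * 0.050 / 8 / 1024" | bc     # ~488 KiB, well above the 313 KB window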
The experiments compared CUBIC (Linux default) and BBR congestion control algorithms.
Results were obtained from more than 1,500 experimental runs, sweeping jitter, delay, loss, and congestion control algorithm, using Linux kernel 6.12 with BBRv3.
The Jitter Cliff: RTT-Relative Collapse
A moderate amount of jitter (variation in packet delay) is inevitable on real networks. Routers queue packets, wireless links have variable latency (transmission rates shift with the modulation and coding scheme in use), and congested paths add unpredictable delays. TCP is designed to handle this, within bounds.
This investigation is about learning where the boundaries are and what happens near them. Part 1 showed there is a rapid decay in throughput. Let’s characterise it: The cliff occurs when jitter reaches roughly 15-30% of the RTT, depending on the congestion control algorithm and conditions.
The plot shows throughput retention (percentage of maximum achievable throughput) versus jitter expressed as a percentage of RTT. Above 50% retention is functional; below is degraded or unusable.
| RTT | CUBIC Cliff | BBR Cliff | Notes |
|---|---|---|---|
| 24ms | ±8ms (33%) | ±4ms (17%) | BBR more sensitive at low RTT |
| 50ms | ±12ms (24%) | ±8ms (16%) | BBR collapses earlier |
| 100ms | ±16ms (16%) | ±16ms (16%) | Both similar at high RTT |
Notably, BBR’s cliff occurs earlier than CUBIC’s at lower RTTs. At 50ms RTT, BBR starts degrading at 16% jitter while CUBIC holds until 24%. BBR’s pacing model is more sensitive to timing disruptions from jitter, even though it handles packet loss better (as we’ll see later).
These thresholds matter because real-world networks operate in this range. Starlink, for example, has baseline jitter of ~7ms with ~27ms RTT—a 26% ratio, right at the CUBIC cliff. We’ll examine this in detail later.
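A quick way to see where a path of your own sits relative to these thresholds is to compare ping’s mdev (a rough jitter proxy, not the same statistic as netem’s ±jitter) against its average RTT. A sketch, with a placeholder destination:
# Rough jitter/RTT ratio from ping's summary line (destination is illustrative)
ping -c 50 -i 0.2 192.0.2.1 | tail -1 |
  awk -F'[/ ]+' '{printf "RTT %.1f ms, jitter %.1f ms, ratio %.0f%%\n", $8, $10, 100*$10/$8}'
Treat the result loosely; it only tells you whether the path is nowhere near, near, or past the 15-30% band.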
The Counter-Intuitive Implication
This leads to a surprising conclusion: higher RTT means more jitter tolerance.
At ±12ms of jitter:
- 50ms RTT: CUBIC 2.4 Mbps, BBR 4.7 Mbps (collapsed—jitter is 24% of RTT)
- 100ms RTT: CUBIC 16 Mbps (functional—jitter is only 12% of RTT)
That’s nearly a 7x difference in CUBIC throughput for the same absolute jitter, despite the higher RTT.
This matters for network paths that include a satellite hop: what determines the damage is the jitter-to-RTT ratio, not the absolute jitter. If the roughly-20% threshold holds, a geostationary satellite link with 600ms RTT could theoretically handle ±120ms of jitter while a 50ms LEO hop would collapse at ±10ms. (Note: GEO latencies were not simulated or tested—this is an extrapolation.)
The Chaos Zone: Why Single Measurements Lie
Near the jitter cliff, results become highly variable. Run the same test five times and you might get five dramatically different answers.
| Region | CUBIC CV | BBR CV | Interpretation |
|---|---|---|---|
| Below cliff | 1-5% | 1-5% | Stable, predictable |
| At cliff (chaos zone) | 21-66% | 14-31% | CUBIC unstable, BBR more predictable |
| Above cliff | 5-19% | 3-5% | Both collapsed, BBR still more stable |
In the chaos zone at 50ms RTT with 10ms jitter, CUBIC showed coefficient of variation up to 66%—meaning throughput varied wildly between runs. BBR’s CV stayed below 20% even in the worst conditions.
Practical implication: If your network operates near the jitter cliff (jitter 10-30% of RTT), don’t trust single measurements. Run at least five tests and look at the distribution.
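A minimal sketch of that advice, assuming iperf3 and jq are available and reusing the illustrative receiver address from earlier: run the same single-stream test several times and report the mean and coefficient of variation instead of a single figure.
# Repeat a 10 s single-stream test and summarise the spread (host and run count are illustrative)
for i in $(seq 1 5); do
  iperf3 -c 10.0.0.2 -t 10 -C cubic --json | jq '.end.sum_received.bits_per_second / 1e6'
done | awk '{s+=$1; q+=$1*$1; n++} END {m=s/n; printf "mean %.1f Mbps, CV %.0f%%\n", m, 100*sqrt(q/n - m*m)/m}'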
Why CUBIC Becomes Unpredictable
What causes CUBIC’s high variance in the chaos zone? The data shows the pattern clearly, even if the mechanism is complex.
At 10ms jitter (20% of RTT):
| Algorithm | Range | Mean | CV |
|---|---|---|---|
| CUBIC | 2-33 Mbps | 12 Mbps | 66% |
| BBR | 4-9 Mbps | 7 Mbps | 18% |
CUBIC’s distribution has a long right tail—occasional “lucky” runs achieved 3-4x the median throughput. BBR clusters tightly around its mean. This isn’t bimodal behavior; CUBIC seems to be highly sensitive to initial conditions while BBR remains consistent.
Packet capture analysis revealed the mechanism. The key metric is inter-packet gap—how long the sender waits between packets:
| Condition | CUBIC Gap | BBR Gap |
|---|---|---|
| Low jitter (stable) | ~0.5ms | ~0.5ms |
| High jitter (collapsed) | 5-8ms | 2-3ms |
When jitter crosses the threshold, CUBIC’s gap jumps 10-15x. BBR’s grows only 4-6x. Longer gaps mean lower throughput.
The hypothesis: High jitter causes spurious loss detection—packets arriving out of order or delayed beyond the retransmit timeout get mistaken for lost packets. Each “loss” triggers a cwnd reduction. But recovery is slow: during congestion avoidance, cwnd grows by roughly one segment per RTT. Above the threshold, reductions compound faster than recovery, and cwnd cascades toward its minimum. BBR survives better because its pacing-based approach doesn’t reduce sending rate aggressively on loss.
Caveat: We measured inter-packet gaps, not cwnd directly. This mechanism is plausible but not proven.
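For reference, inter-packet gaps of the kind quoted above can be extracted from a packet capture with tshark; a sketch, where the capture file name and the sender’s port are placeholders:
# Mean gap between the sender's data packets (file name and port are illustrative)
tshark -r run.pcap -Y 'tcp.srcport == 5201 && tcp.len > 0' -T fields -e frame.time_delta_displayed |
  awk '{s+=$1; n++} END {printf "mean inter-packet gap: %.2f ms over %d packets\n", 1000*s/n, n}'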
BBR vs CUBIC
Linux defaults to CUBIC for congestion control. BBR (Bottleneck Bandwidth and Round-trip propagation time) is an alternative developed by Google. The question everyone asks: which is better?
The answer: it depends on the amount of jitter and loss you expect on your path. For low amounts of jitter and loss, CUBIC still has a role to play.
The heatmap shows the BBR/CUBIC throughput ratio across different RTT and jitter combinations. Green means BBR is better, red means CUBIC is better, and yellow means they’re roughly equal.
When BBR Wins
| Condition | BBR Advantage |
|---|---|
| Jitter > 30% of RTT | 3-5x better |
| Post-cliff (both algorithms collapsed) | 3-5x better |
| Any significant packet loss | 2-17x better |
After the cliff, CUBIC flatlines at 0.7-0.9 Mbps while BBR maintains 3-5 Mbps. BBR degrades gracefully; CUBIC collapses sharply.
When CUBIC Wins (or Ties)
| Condition | Result |
|---|---|
| Jitter < 10% of RTT | Similar or CUBIC slightly better |
| Clean network, no loss | Similar performance |
The Surprising Middle Ground
In the chaos zone (jitter 10-30% of RTT), BBR can actually be worse than CUBIC in terms of mean throughput. At 50ms RTT with ±8ms jitter (16% of RTT):
- CUBIC: 29 Mbps mean (but CV = 21%)
- BBR: 14 Mbps mean (CV = 31%)
- BBR/CUBIC ratio: 0.49x
BBR achieves roughly half the throughput of CUBIC at this operating point. However, CUBIC’s higher mean masks significant run-to-run variance. In the worst part of the chaos zone (±10ms jitter), CUBIC’s CV reaches 66%—meaning some runs achieve good throughput while others collapse entirely.
This caught me off guard. BBR’s model-based approach sometimes makes suboptimal decisions in conditions where CUBIC’s loss-based reactions happen to work better. However, BBR’s throughput, while lower, is predictable—and for video streaming, predictable throughput often matters more than maximum throughput.
Loss Tolerance: Where BBR Dominates
Jitter creates complexity for TCP, but packet loss is where CUBIC truly struggles.
| Loss Rate | CUBIC | BBR | BBR Advantage |
|---|---|---|---|
| 0% | 43 Mbps | 43 Mbps | 1.0x |
| 0.1% | 14 Mbps | 39 Mbps | 2.8x |
| 0.25% | 8 Mbps | 34 Mbps | 4.2x |
| 0.5% | 5 Mbps | 29 Mbps | 6.1x |
| 1% | 3 Mbps | 25 Mbps | 7.8x |
| 2% | 2 Mbps | 21 Mbps | 10.6x |
| 5% | 1 Mbps | 17 Mbps | 17.6x |
(Tested at 50ms RTT, 0% jitter, 5 iterations per condition)
Even 0.1% packet loss—one packet in a thousand—causes CUBIC throughput to drop by 67%. BBR maintains 91% efficiency at the same loss rate.
At 5% loss, the difference is staggering: BBR provides 17.6x the throughput of CUBIC.
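For readers reproducing this, a sketch of the loss sweep, reusing the illustrative interface and address from the setup sketch (namespace prefixes omitted) and assuming the netem qdisc is already installed:
# Sweep random loss at fixed 50 ms delay, several runs per point (names and counts illustrative)
for loss in 0 0.1 0.25 0.5 1 2 5; do
  sudo tc qdisc change dev veth0 root netem delay 50ms loss ${loss}%
  for run in 1 2 3 4 5; do
    iperf3 -c 10.0.0.2 -t 10 -C bbr --json | jq '.end.sum_received.bits_per_second / 1e6'
  done
done
# Repeat with -C cubic for the comparison column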
Why the Asymmetry?
CUBIC interprets every lost packet as a congestion signal and aggressively backs off. This is appropriate when loss is caused by buffer overflow—it means the network is genuinely oversaturated.
But loss can have other causes: wireless interference, cable faults, router bugs, or even ECN-incapable middleboxes dropping marked packets. In these cases, backing off doesn’t help—the path capacity hasn’t changed, just some packets got corrupted.
BBR uses a model-based pacing approach. It estimates the bottleneck bandwidth and RTT independently of loss, and paces packets accordingly. Random loss doesn’t cause BBR to dramatically reduce its sending rate.
Recommendation
On any network with adverse jitter or packet loss, use BBR. The advantage is significant.
Real-World Application: Starlink
The experiments so far used synthetic conditions. How do these findings apply to real networks? Starlink provides a good test case—it has measurable jitter, occasional packet loss, and millions of users trying to stream video over it.
Realistic Starlink Characteristics
Based on published measurements:
| Metric | Typical Range | Source |
|---|---|---|
| RTT | 25.7ms median (US), 30-80ms range | Starlink official, APNIC |
| Jitter | 6.7ms average, 30-50ms at handover | APNIC |
| Packet Loss | 0.13% baseline, ~1% overall | WirelessMoves, APNIC |
| Handover | Every 15 seconds | APNIC |
Critically, Starlink’s loss is not congestion-related—it’s caused by satellite handovers and radio impairments. CUBIC’s loss-based congestion control misinterprets these as congestion signals, causing unnecessary throughput reduction.
Now we can connect these characteristics to the jitter cliff thresholds from earlier:
| Profile | Jitter | RTT | Jitter/RTT | Cliff Zone |
|---|---|---|---|---|
| Baseline | ±7ms | 27ms | 26% | At CUBIC cliff |
| Handover | ±40ms | 60ms | 67% | Past both cliffs |
| Degraded | ±15ms | 80ms | 19% | Chaos zone |
Starlink’s baseline operation—not degraded, not during handover, just normal—sits right at the CUBIC cliff threshold. This explains why TCP performance over Starlink is so sensitive to congestion control algorithm choice.
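To tie the table back to the testbed: a plausible netem approximation of the baseline profile (the exact invocation used in the experiments is not shown here; the interface name is illustrative and namespace prefixes are omitted) is a single line:
# Approximate the Starlink baseline profile from the table above
sudo tc qdisc change dev veth0 root netem delay 27ms 7ms distribution normal loss 0.125%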
Experimental Results
| Profile | RTT | Jitter | Loss | CUBIC | BBR | BBR Advantage |
|---|---|---|---|---|---|---|
| Baseline | 27ms | ±7ms | 0.125% | 2.3 Mbps | 7.3 Mbps | 3.1x |
| Handover | 60ms | ±40ms | 1.0% | 0.5 Mbps | 2.2 Mbps | 4.2x |
| Degraded | 80ms | ±15ms | 1.5% | 1.3 Mbps | 3.7 Mbps | 2.9x |
(200 iterations per profile, parameters matched to cited Starlink characteristics)
Even at baseline conditions with only 0.125% loss, BBR provides 3.1x the throughput of CUBIC.
A note on absolute vs. relative throughput: The absolute Mbps values above reflect our constrained test setup (256 KB buffer, single stream). Real-world Starlink achieves higher absolute throughput because applications use larger buffers and speed tests use multiple parallel connections. However, the ratios we measured match real-world observations remarkably well.
Real-World Validation
Independent testing on actual Starlink connections confirms the pattern we observed in simulation:
| Source | CUBIC | BBR | BBR Advantage |
|---|---|---|---|
| Our simulation | 2.3 Mbps | 7.3 Mbps | 3.1x |
| WirelessMoves (2023) | ~20 Mbps | >100 Mbps | ~5x |
The WirelessMoves testing used single-connection iperf3 tests over real Starlink hardware. Despite the 10x difference in absolute throughput (due to larger buffers and different conditions), the BBR advantage ratio is consistent: 3-5x better than CUBIC.
This also explains why Starlink users don’t universally complain about poor TCP performance:
- Speed tests use multiple parallel connections. Speedtest.net and similar tools aggregate many TCP streams, masking single-connection limitations. A user might see “150 Mbps” on Speedtest while a single video stream struggles at 20 Mbps with CUBIC.
- Many major services use BBR. Google, Netflix, and Cloudflare have deployed BBR on their servers. Users streaming from these services get BBR’s benefits without changing anything on their end.
- Adaptive bitrate masks the problem. Video services like YouTube and Netflix adjust quality based on available throughput. Users see “480p” instead of “buffering,” which feels like a content choice rather than a network failure.
Video Quality Mapping
What does this mean for actual video quality?
| Condition | With CUBIC | With BBR | Recommendation |
|---|---|---|---|
| Baseline (2.3 vs 7.3 Mbps) | 360p choppy | 720p smooth | BBR strongly recommended |
| Handover (0.5 vs 2.2 Mbps) | Unusable | 360p barely | BBR + buffer for handovers |
| Degraded (1.3 vs 3.7 Mbps) | 360p choppy | 480p usable | BBR essential |
Video quality estimates assume ~2.5 Mbps for 480p, ~5 Mbps for 720p (H.264). These represent relative performance differences between algorithms under simulated Starlink conditions.
The Handover Problem
Starlink satellites hand off every 15 seconds. During handover (per APNIC measurements):
- RTT spikes by 30-50ms (e.g., 30ms → 80ms)
- Jitter increases significantly
- Packet loss spikes occur
These are brief disruptions, not 15-second outages. Video applications need enough buffer to absorb the throughput dip during handover—likely a few seconds, not 15.
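One crude way to emulate these brief disruptions in the same testbed is to alternate netem profiles on a timer; a sketch only, with illustrative values and a hard switch where real handovers are gentler:
# Roughly 1 s of handover-like conditions every 15 s (illustrative values)
while true; do
  sudo tc qdisc change dev veth0 root netem delay 30ms 7ms distribution normal loss 0.13%
  sleep 14
  sudo tc qdisc change dev veth0 root netem delay 80ms 40ms distribution normal loss 1%
  sleep 1
done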
Practical Recommendations
Decision Guide
- Stable networks (jitter < 10% of RTT, loss < 0.1%): Use CUBIC (Linux default)
- Moderate jitter (10-30% of RTT): Test both algorithms—results vary
- High jitter (> 30% of RTT): Use BBR
- Any significant packet loss (> 0.1%): Use BBR
- Satellite links (Starlink, LEO, GEO): Use BBR
How to Switch Congestion Control
# Enable BBR (requires root, kernel 4.9+)
sudo modprobe tcp_bbr
sudo sysctl -w net.ipv4.tcp_congestion_control=bbr
To make permanent, add tcp_bbr to /etc/modules-load.d/bbr.conf and net.ipv4.tcp_congestion_control=bbr to /etc/sysctl.conf.
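To confirm the change took effect, check the available and active algorithms, and use ss to see what established connections are actually using:
# Verify BBR is loaded and selected
sysctl net.ipv4.tcp_available_congestion_control
sysctl net.ipv4.tcp_congestion_control
# Rough per-connection check: count established sockets reporting bbr
ss -ti state established | grep -c bbr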
Implications for JitterTrap
This investigative work was done to support JitterTrap. When users report video stuttering or unexplained throughput problems, the tool needs to help them understand why—not just show that something is wrong. The jitter cliff and chaos zone findings directly inform what JitterTrap should measure and how it should present that information.
Planned improvements based on this work:
- Jitter/RTT ratio indicator: Show whether the network is below, at, or above the cliff threshold
- Chaos zone warning: Alert when measurements may be unreliable (jitter 10-30% of RTT)
- Congestion control guidance: Recommend BBR vs CUBIC based on observed conditions
- Stability indicator: Display coefficient of variation to distinguish consistent problems from chaotic ones
The Bigger Picture
The jitter cliff is a real problem for video over TCP. Throughput can collapse by 90% or more, and near the cliff, behavior becomes unpredictable. But understanding why this happens points to a deeper issue.
TCP’s flow control relies on back-pressure: when the network is impaired, TCP slows down and signals the sender to wait. This works for file transfers, database queries, or web requests—applications that can pause. But video can’t pause:
- Applications that cannot slow down: A live video encoder produces frames at a fixed rate regardless of network conditions. When TCP’s send buffer fills, frames queue up, latency grows unboundedly, and eventually data drops catastrophically. TCP signals back-pressure, but the encoder has no mechanism to respond—it can’t “skip this frame” or “reduce bitrate” based on socket buffer state.
- Disconnected back-pressure: Consider UDP video inside a TCP VPN tunnel. When TCP throughput drops—due to the jitter cliff, loss, or BDP limits—the video source keeps sending at its configured rate while the tunnel delivers at a fraction of that. Latency climbs as data queues. TCP retransmits packets the receiver may no longer care about. The back-pressure never reaches the encoder.
The jitter cliff isn’t a bug in TCP—it’s TCP doing what it was designed to do. The failure is architectural: TCP guarantees “deliver everything, in order, eventually,” but video needs “deliver what you can now, drop what’s stale, and tell me to adapt.”
Summary
- The jitter cliff: Throughput collapses when jitter exceeds roughly 15-30% of RTT. Higher RTT = more tolerance.
- The chaos zone: Near the cliff, CUBIC varies 21-66%; BBR stays at 14-31%. Don’t trust single tests.
- BBR vs CUBIC: BBR wins under loss (up to 17.6x better) and post-cliff. CUBIC can win in the chaos zone but is less predictable.
- Practical: Use BBR on lossy or satellite networks; CUBIC on stable networks; test both near the cliff.
- Limitations: Lab simulation (1,500+ runs, Linux 6.12/BBRv3) validated against real-world Starlink measurements showing consistent ratios.
What’s Next
Part 3 will explore SRT (Secure Reliable Transport)—a protocol designed specifically for live video that borrows from TCP but fixes what makes TCP unsuitable:
- Bounded latency: SRT enforces a maximum latency; packets that arrive too late are dropped, not delivered
- Sender feedback: The receiver reports packet loss and timing back to the sender, enabling adaptive bitrate
- Selective retransmission: Only retransmit packets that can still arrive in time
- Application-layer control: The video encoder can respond to network conditions
SRT asks the right question: “What can I deliver within this latency budget?” rather than TCP’s “How do I eventually deliver everything?”
This research is part of Project Pathological Porcupines—an ongoing systematic exploration into the kinds of issues that delay-sensitive networking applications encounter, and how JitterTrap can help us understand these problems and improve our applications. Both the research and JitterTrap itself are works in progress.