

Open Issues in Buffer Sizing

Amogh Dhamdhere

Constantine Dovrolis

College of Computing

Georgia Tech

Outline

Motivation and previous work

The Stanford model for buffer sizing

Important issues in buffer sizing

Simulation results for the Stanford model

Buffer sizing for bounded loss rate (Infocom’05)

Motivation

Router buffers are crucial elements of packet networks

Absorb rate variations of incoming traffic

Prevent packet losses during traffic bursts

Increasing the router buffer size:

Can increase link utilization (especially with TCP traffic)

Can decrease packet loss rate

Can also increase queuing delays

Common operational practices

Major router vendor recommends 500ms of buffering

Implication: buffer size increases proportionally to link capacity

Why 500ms?

Bandwidth Delay Product (BDP) rule:

Buffer size B = link capacity C x typical RTT T (B = CxT)

What does “typical RTT” mean?

Measurement studies showed that RTTs vary from 1ms to 10sec!

How do different types of flows (TCP elephants vs mice) affect buffer requirement?

Poor performance is often due to buffer size:

Under-buffered switches: high loss rate and poor utilization

Over-buffered DSL modems: excessive queuing delay for interactive apps

Previous work

Approaches based on queuing theory (e.g. M|M|1|B)

Assume a certain input traffic model, service model and buffer size

Loss probability for M|M|1|B system is given by

p = (1 − ρ) ρ^B / (1 − ρ^(B+1))

TCP is not open-loop; TCP flows react to congestion

There is no universally accepted Internet traffic model
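As a quick numeric illustration of the M|M|1|B loss formula above, here is a minimal Python sketch; the offered loads and buffer sizes in the example are arbitrary values chosen for illustration, not numbers from the slides.

```python
def mm1b_loss(rho, B):
    """Loss probability of an M|M|1|B queue with offered load rho and room for B packets."""
    if rho == 1.0:
        return 1.0 / (B + 1)          # limiting case of the formula as rho -> 1
    return (1 - rho) * rho**B / (1 - rho**(B + 1))

if __name__ == "__main__":
    for rho in (0.8, 0.95, 1.05):      # example offered loads (illustrative only)
        for B in (10, 100, 1000):      # example buffer sizes in packets
            print(f"rho={rho:4.2f}  B={B:4d}  loss={mm1b_loss(rho, B):.2e}")
```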

Morris’ Flow Proportional Queuing (Infocom ’00)

Proposed a buffer size proportional to the number of active TCP flows (B = 6*N)

Did not specify which flows to count in N

Objective: limit loss rate

High loss rate causes unfairness and poor application performance

TCP window dynamics for long flows

TCP-aware buffer sizing must take into account TCP dynamics

Saw-tooth behavior

Window increases until packet loss

Single loss results in cwnd reduction by factor of two

Square-root TCP model

TCP throughput can be approximated by

R = 0.87 / (T √p)

Valid when loss rate p is small (less than 2-5%)

Average window size is independent of RTT
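A minimal sketch of the square-root throughput model above; the RTT, loss rate, and 1500-byte packet size in the example are my own illustrative assumptions.

```python
def tcp_throughput_pps(rtt_s, loss_rate):
    """Square-root model: throughput in packets per second, R = 0.87 / (T * sqrt(p))."""
    return 0.87 / (rtt_s * loss_rate ** 0.5)

# Example: 100 ms RTT and 1% loss -> roughly 87 packets/s (about 1 Mbps at 1500 B/packet).
if __name__ == "__main__":
    r = tcp_throughput_pps(0.1, 0.01)
    print(f"{r:.1f} pkts/s, {r * 1500 * 8 / 1e6:.2f} Mbps at 1500 B/packet")
```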

Origin of BDP rule

Consider a single flow with RTT T

Window follows TCP’s saw-tooth behavior

Maximum window size = CT + B

At this point packet loss occurs

Window size after packet loss = (CT + B)/2

Key step: Even when window size is minimum, link should be fully utilized

(CT + B)/2 ≥ CT which means B ≥ CT

Known as the bandwidth delay product rule

Same result for N homogeneous TCP connections
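A small sketch that walks through the saw-tooth argument numerically: for a hypothetical link it finds the smallest buffer at which the post-loss window (CT + B)/2 still covers the pipe CT. The link rate and RTT are illustrative assumptions.

```python
def min_buffer_for_full_utilization(C_pps, T_s):
    """Smallest B (packets) with (C*T + B)/2 >= C*T, i.e. the BDP rule B >= C*T."""
    bdp = C_pps * T_s                      # bandwidth-delay product in packets
    for B in range(0, int(4 * bdp) + 1):
        if (bdp + B) / 2 >= bdp:           # window after a halving still fills the pipe
            return B
    return None

# Example: 10 Mbps of 1500-byte packets (~833 pkts/s) and a 100 ms RTT.
if __name__ == "__main__":
    C = 10e6 / (1500 * 8)
    print(min_buffer_for_full_utilization(C, 0.1), "packets  (BDP =", round(C * 0.1), "packets)")
```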

Outline

Motivation and previous work

The Stanford model for buffer sizing

Important issues in buffer sizing

Simulation results for the Stanford model

Buffer sizing for bounded loss rate (BSCL)

Stanford Model - Appenzeller et al.

Objective: Find the minimum buffer size to achieve full utilization of target link

Assumption: Most traffic is from TCP flows

If N is large, flows are independent and unsynchronized

Aggregate window size distribution tends to normal

Queue size distribution also tends to normal

Flows in congestion avoidance (linear increase of window between successive packet drops)

Buffer for full utilization is given by

B = CT / √N

N is the number of “long” flows at the link

CT: Bandwidth delay product
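A minimal sketch of the Stanford sizing rule B = CT/√N. The two example calls reproduce the numbers quoted elsewhere in the slides (CT = 250 packets with N = 200 flows, and CT = 200 packets with N = 400 flows).

```python
import math

def stanford_buffer(bdp_pkts, n_long_flows):
    """Stanford model: B = C*T / sqrt(N), rounded up to whole packets."""
    return math.ceil(bdp_pkts / math.sqrt(n_long_flows))

if __name__ == "__main__":
    print(stanford_buffer(250, 200))   # -> 18 packets, as used in the simulation section
    print(stanford_buffer(200, 400))   # -> 10 packets, the earlier example
```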

Stanford Model (cont’)

If link has only short flows, buffer size depends only on offered load and average flow size

Flow size determines the size of bursts during slow start

For a mix of short and long flows, buffer size is determined by number of long flows

Small flows do not have a significant impact on buffer sizing

Resulting buffer can achieve full utilization of target link

Loss rate at target link is not taken into account

Outline

Motivation and previous work

The Stanford model for buffer sizing

Important issues in buffer sizing

Simulation results for the Stanford model

Buffer sizing for bounded loss rate (BSCL)

What are the objectives ?

Network layer vs. application layer objectives

Network’s perspective: Utilization, loss rate, queuing delay

User’s perspective: Per-flow throughput, fairness etc.

Stanford Model: Focus on utilization & queueing delay

Can lead to high loss rate (> 10% in some cases)

BSCL: Both utilization and loss rate

Can lead to large queuing delay

Buffer sizing scheme that bounds queuing delay

Can lead to high loss rate and low utilization

A certain buffer size cannot meet all objectives

Which problem should we try to solve?

Saturable/congestible links

A link is saturable when offered load is sufficient to fully utilize it, given large enough buffer

A link may not be saturable at all times

Some links may never be saturable

Advertised-window limitation, other bottlenecks, size-limited transfers

Small buffers are sufficient for non-saturable links

Only needed to absorb short term traffic bursts

Stanford model applicable: when N is large

Backbone links are usually not saturable due to overprovisioning

Edge links are more likely to be saturable

But N may not be large for such links

Which flows to count ?

N: Number of “long” flows at the link

“Long” flows show TCP’s saw-tooth behavior

“Short” flows do not exit slow start

Does size matter?

Size does not indicate slow start or congestion avoidance behavior

If no congestion, even large flows do not exit slow start

If highly congested, small flows can enter congestion avoidance

Should the following flows be included in N ?

Flows limited by congestion at other links

Flows limited by sender/receiver socket buffer size

N varies with time. Which value should we use ?

Min ? Max ? Time average ?

Which traffic model to use ?

Traffic model has major implications on buffer sizing

Early work considered traffic as exogenous process

Not realistic. The offered load due to TCP flows depends on network conditions

Stanford model considers mostly persistent connections

No ambiguity about number of “long” flows (N)

N is time-invariant

In practice, TCP connections have finite size and duration, and N varies with time

Open-loop vs closed-loop flow arrivals

Traffic model (cont’)

Open-loop TCP traffic:

Flows arrive randomly with average size S̄ and average arrival rate λ

Offered load λ·S̄, link capacity C

Offered load is independent of system state (delay, loss)

The system is unstable if λ·S̄ > C

Closed-loop TCP traffic:

Each user starts a new transfer only after the completion of previous transfer

Random think time between consecutive transfers

Offered load depends on system state

The system can never be unstable
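To make the distinction concrete, here is a rough Python illustration of how flow arrivals would be generated under the two models. This is my own sketch, not the ns-2 setup used in the simulations; the arrival rate, number of users, think time, and fixed transfer time are all arbitrary assumptions.

```python
import random

def open_loop_arrivals(rate_flows_per_s, horizon_s):
    """Open loop: flows arrive as a Poisson process, independent of network state."""
    t, arrivals = 0.0, []
    while t < horizon_s:
        t += random.expovariate(rate_flows_per_s)
        arrivals.append(t)
    return arrivals

def closed_loop_arrivals(n_users, mean_think_s, transfer_time, horizon_s):
    """Closed loop: each user starts a new transfer only after the previous one completes,
    separated by an exponentially distributed think time."""
    arrivals = []
    for _ in range(n_users):
        t = 0.0
        while t < horizon_s:
            t += random.expovariate(1.0 / mean_think_s)   # think time
            arrivals.append(t)
            t += transfer_time(t)                          # in reality depends on system state
    return sorted(arrivals)

if __name__ == "__main__":
    print(len(open_loop_arrivals(50.0, 10.0)), "open-loop arrivals in 10 s")
    # Fixed 0.5 s transfer time used only to keep the sketch self-contained.
    print(len(closed_loop_arrivals(20, 5.0, lambda t: 0.5, 10.0)), "closed-loop arrivals in 10 s")
```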

Outline

Motivation and previous work

The Stanford model for buffer sizing

Important issues in buffer sizing

Simulation results for the Stanford model

Buffer sizing for bounded loss rate (BSCL)

Why worry about loss rate?

The Stanford model gives very small buffer if N is large

E.g., CT=200 packets, N=400 flows: B=10 packets

What is the loss rate with such a small buffer size?

Per-flow throughput and transfer latency?

Compare with BDP-based buffer sizing

Distinguish between large and small flows

Small flows that do not see losses: limited only by RTT

Flow size: k segments

R ≈ k / (T · log₂ k)

Large flows depend on both losses & RTT:

R = 0.87 / (T √p)
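A sketch combining the two approximations above to estimate transfer latency. Treating a flow as "small" whenever it sees no loss is a simplification, and the flow sizes, RTT, and loss rate in the example are illustrative assumptions.

```python
import math

def latency_small_flow(k_segments, rtt_s):
    """Loss-free slow start: roughly log2(k) round trips to deliver k segments."""
    return math.log2(max(k_segments, 2)) * rtt_s

def latency_large_flow(k_segments, rtt_s, loss_rate):
    """Square-root model: k segments at throughput R = 0.87 / (T * sqrt(p))."""
    return k_segments / (0.87 / (rtt_s * math.sqrt(loss_rate)))

if __name__ == "__main__":
    rtt = 0.1                                    # 100 ms RTT (example)
    print(f"20-segment flow, no loss:   {latency_small_flow(20, rtt):.2f} s")
    print(f"2000-segment flow, 1% loss: {latency_large_flow(2000, rtt, 0.01):.1f} s")
```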

Simulation setup

Use ns-2 simulations to study the effect of buffer size on loss rate for different traffic models

Heterogeneous RTTs (20ms to 530ms)

TCP NewReno with SACK option

BDP = 250 packets (1500 B)

Model-1: persistent flows + mice

200 “infinite” connections – active for whole simulation duration

Mice flows: 5% of capacity, size between 3 and 25 packets, exponential inter-arrivals

Simulation setup (cont’)

Flow size distribution for finite size flows:

Sum of 3 exponential distributions: small files (avg. 15 packets), medium files (avg. 50 packets), and large files (avg. 200 packets)

70% of total bytes come from the largest 30% of flows

Model-2: Closed-loop traffic

675 source agents

Think time exponentially distributed with average 5 s

Time average of 200 flows in congestion avoidance

Model-3: Open-loop traffic

Exponentially distributed flow inter-arrival times

Offered load is 95% of link capacity

Time average of 200 flows in congestion avoidance

Simulation results – Loss rate

CT=250 packets, N=200 for all traffic types

Stanford model gives a buffer of 18 packets

High loss rate with Stanford buffer

Greater than 10% for open loop traffic

7-8% for persistent and closed loop traffic

Increasing buffer to BDP or small multiple of BDP can significantly decrease loss rate


Per-flow throughput

Transfer latency = flow-size / flow-throughput

Flow throughput depends on both loss rate and queuing delay

Loss rate decreases with buffer size ( good )

Queuing delay increases with buffer size ( bad )

Major tradeoff: Should we have low loss rate or low queuing delay ?

Answer depends on various factors

Which flows are considered: Long or short ?

Which traffic model is considered?

Persistent connections and mice

Application layer throughput for B=18 (Stanford buffer) and larger buffer B=500

Two flow categories: large (>100KB) and small (<100KB)

Majority of large flows get better throughput with large buffer

Large difference in loss rates

Smaller variability of per-flow throughput with larger buffer

Majority of short flows get better throughput with small buffer

Lower RTT and smaller difference in loss rates

Closed-loop traffic

Per-flow throughput for large flows is slightly better with larger buffer

Majority of small flows see better throughput with smaller buffer

Similar to persistent case

Not a significant difference in per-flow loss rate

Reason: Loss rate decreases slowly with buffer size

Open-loop traffic

Both large and small flows get much better throughput with large buffer

Significantly smaller per-flow loss rate with larger buffer

Reason: Loss rate decreases very quickly with buffer size

Outline

Motivation and previous work

The Stanford model for buffer sizing

Important issues in buffer sizing

Simulation results for the Stanford model

Buffer sizing for bounded loss rate (BSCL)

Our buffer sizing objectives

Full utilization:

The average utilization of the target link should be at least a specified target (close to 100%) when the offered load is sufficiently high

Bounded loss rate:

The loss rate p should not exceed a target p̂, typically 1-2%, for a saturated link

Minimum queuing delays and buffer requirement, given previous two objectives:

Large queuing delay causes higher transfer latencies and jitter

Large buffer size increases router cost and power consumption

So, we aim to determine the minimum buffer size that meets the given utilization and loss rate constraints

Why limit the loss rate?

End-user perceived performance is very poor when loss rate is more than 5-10%

Particularly true for short and interactive flows

High loss rate is also detrimental for large TCP flows

High variability in per-flow throughput

Some “unlucky” flows suffer repeated losses and timeouts

We aim to bound the packet loss rate to p̂ = 1-2%

Traffic classes

Locally Bottlenecked Persistent (LBP) TCP flows

Large TCP flows limited by losses at target link

Loss rate p is equal to loss rate at target link

Remotely Bottlenecked Persistent (RBP) TCP flows

Large TCP flows limited by losses at other links

Loss rate is greater than loss rate at target link

Window Limited Persistent TCP flows

Large TCP flows limited by the advertised window, instead of the congestion window

Short TCP flows and non-TCP traffic

Scope of our model

Key assumption:

LBP flows account for most of the traffic at the target link (80-90%)

Reason: we ignore buffer requirement of non-LBP traffic

Scope of our model:

Congested links that mostly carry large TCP flows, bottlenecked at target link

Minimum buffer requirement for full utilization: homogeneous flows

Consider a single LBP flow with RTT T

Window follows TCP’s saw-tooth behavior

Maximum window size = CT + B

At this point packet loss occurs

Window size after packet loss = (CT + B)/2

Key step: Even when window size is minimum, link should be fully utilized

(CT + B)/2 ≥ CT which means B ≥ CT

Known as the bandwidth delay product rule

Same result for N homogeneous TCP connections

Minimum buffer requirement for full utilization: heterogeneous flows

N_b heterogeneous LBP flows with RTTs {T_i}

Initially, assume Global Loss Synchronization

All flows decrease windows simultaneously in response to single congestion event

We derive that:

B = C · N_b / Σ_{i=1..N_b} (1/T_i)

As a bandwidth-delay product: B = C · T_e

T_e: "effective RTT", the harmonic mean of the flows' RTTs

T_e = N_b / Σ_{i=1..N_b} (1/T_i)

Practical implication:

A few connections with very large RTTs cannot significantly affect the effective RTT (and hence the buffer requirement)
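A minimal sketch of the global-synchronization result above: compute the effective (harmonic-mean) RTT and the corresponding buffer C·T_e. The RTT list and link rate are illustrative; the 20-530 ms spread follows the simulation setup described later.

```python
def effective_rtt(rtts_s):
    """Harmonic mean of the flows' RTTs: T_e = N_b / sum(1/T_i)."""
    return len(rtts_s) / sum(1.0 / t for t in rtts_s)

def buffer_global_sync(capacity_pps, rtts_s):
    """Global loss synchronization: B = C * T_e packets."""
    return capacity_pps * effective_rtt(rtts_s)

if __name__ == "__main__":
    rtts = [0.02, 0.05, 0.1, 0.2, 0.53]    # example RTTs between 20 ms and 530 ms
    C = 10e6 / (1500 * 8)                  # example: 10 Mbps link, 1500-byte packets
    print(f"T_e = {effective_rtt(rtts)*1000:.1f} ms, B = {buffer_global_sync(C, rtts):.0f} packets")
```

Note how adding a flow with a very large RTT barely changes T_e, since it contributes little to the sum of 1/T_i.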

Minimum buffer requirement for full utilization (cont’)

More realistic model: partial loss synchronization

Loss burst length L(N_b): number of packets lost by the N_b flows during a single congestion event

Assumption: loss burst length increases almost linearly with N_b, i.e., L(N_b) = α · N_b

α: synchronization factor (around 0.5-0.6 in our simulations)

Minimum buffer size requirement:

B = [ q(N_b) · C · T_e − 2 · M · N_b · (1 − q(N_b)) ] / (2 − q(N_b))

q(N_b): fraction of flows that see losses in a congestion event

M: average segment size

Partial loss synchronization reduces the buffer requirement
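A sketch of the partial-synchronization requirement as reconstructed above, with everything expressed in packets (so the average segment size M becomes one packet). The synchronization fraction q(N_b) is supplied directly rather than estimated, and the CT_e, N_b, and q values are illustrative.

```python
def buffer_partial_sync_pkts(q, bdp_pkts, n_b):
    """Partial synchronization: B = [q*C*T_e - 2*N_b*(1 - q)] / (2 - q), in packets (M = 1 packet)."""
    b = (q * bdp_pkts - 2 * n_b * (1 - q)) / (2 - q)
    return max(0.0, b)    # a negative value just means no buffer is needed for full utilization

if __name__ == "__main__":
    bdp, n_b = 250, 200                    # example: C*T_e = 250 packets, 200 LBP flows
    for q in (1.0, 0.8, 0.6, 0.4):         # q = 1 recovers the global-synchronization result B = C*T_e
        print(f"q={q:.1f}  B={buffer_partial_sync_pkts(q, bdp, n_b):6.1f} packets")
```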

Validation (ns2 simulations)

Heterogeneous flows (RTTs vary between 20ms & 530ms)

Partial synchronization model: accurate

Global synchronization (deterministic) model overestimates the buffer requirement by a factor of 3-5

Relation between loss rate and N

N_b homogeneous LBP flows at the target link

Link capacity: C, flows’ RTT: T

If flows saturate target link, then flow throughput is given by

R = 0.87 / (T √p)

Loss rate is proportional to the square of N_b:

p = N_b² · (0.87 / (C·T))²

Hence, to keep the loss rate below p̂ we must limit the number of flows:

N_b ≤ C·T·√p̂ / 0.87

But this would require admission control (not deployed)
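A tiny sketch of the admission-control implication above: the largest number of locally bottlenecked flows that keeps the loss rate under p̂, from N_b ≤ C·T·√p̂/0.87. The CT value of 250 packets matches the BDP used in the simulations; the loss targets are the 1-2% range discussed in the slides.

```python
import math

def max_flows_for_loss_bound(bdp_pkts, loss_bound):
    """Largest N_b with (0.87 * N_b / (C*T))^2 <= p_hat, i.e. N_b <= C*T*sqrt(p_hat)/0.87."""
    return math.floor(bdp_pkts * math.sqrt(loss_bound) / 0.87)

if __name__ == "__main__":
    print(max_flows_for_loss_bound(250, 0.01))   # 1% loss target with C*T = 250 packets
    print(max_flows_for_loss_bound(250, 0.02))   # 2% loss target
```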

Flow Proportional Queuing (FPQ)

First proposed by Morris (Infocom’00)

Bound the loss rate by increasing the RTT proportionally to the number of flows:

p = N_b² · (0.87 / (C·T))²

Solving for T gives:

T = (0.87 / √p̂) · N_b / C = K_p · N_b / C

where K_p = 0.87 / √p̂

Set T = T_p + T_q, where T_p is the RTT's propagation delay and T_q = B/C is the queuing delay

This gives the buffer requirement: B = K_p · N_b − C·T_p

The window of each flow should be K_p packets, consisting of:

Packets in the target link buffer (B term)

Packets "on the wire" (C·T_p term)

K_p = 6 packets for a 2% loss rate, and K_p = 9 packets for a 1% loss rate
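A sketch of the FPQ sizing: K_p = 0.87/√p̂ packets per flow, and B = K_p·N_b − C·T_p. The flow count and propagation BDP in the example are illustrative; the K_p values of roughly 6 and 9 packets match the slide.

```python
import math

def fpq_buffer_pkts(n_b, bdp_prop_pkts, loss_bound):
    """Flow Proportional Queuing: K_p = 0.87 / sqrt(p_hat); B = K_p * N_b - C * T_p (packets)."""
    k_p = 0.87 / math.sqrt(loss_bound)
    return k_p, max(0.0, k_p * n_b - bdp_prop_pkts)

if __name__ == "__main__":
    for p_hat in (0.02, 0.01):
        k_p, B = fpq_buffer_pkts(n_b=200, bdp_prop_pkts=250, loss_bound=p_hat)
        print(f"p_hat={p_hat:.2f}  K_p={k_p:.1f} pkts/flow  B={B:.0f} packets")
```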

Buffer size requirement for both full utilization and bounded loss rate

We previously showed separate results for full utilization and bounded loss rate

To meet both goals, provide enough buffering to satisfy the most stringent of the two requirements

Buffer requirement:

Decreases with N_b (full utilization objective)

Increases with N_b (loss rate objective)

Crossover point: N̂_b

B = [ q(N_b) · C · T_e − 2 · M · N_b · (1 − q(N_b)) ] / (2 − q(N_b))   if N_b ≤ N̂_b

B = K_p · N_b − C · T_e   if N_b > N̂_b

This result is referred to as the BSCL formula
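Putting the two constraints together, a sketch of the BSCL rule as reconstructed above: take the binding one of the full-utilization (partial-synchronization) requirement and the loss-rate (FPQ-style) requirement. All inputs are examples, the average segment size M is taken as one packet, and q(N_b) is supplied directly rather than estimated from a trace.

```python
import math

def bscl_buffer_pkts(n_b, bdp_eff_pkts, q, loss_bound):
    """BSCL: the larger of
       B_util = [q*C*T_e - 2*N_b*(1 - q)] / (2 - q)     (full utilization, packets, M = 1 packet)
       B_loss = (0.87 / sqrt(p_hat)) * N_b - C*T_e      (bounded loss rate)
    """
    b_util = (q * bdp_eff_pkts - 2 * n_b * (1 - q)) / (2 - q)
    b_loss = (0.87 / math.sqrt(loss_bound)) * n_b - bdp_eff_pkts
    return max(0.0, b_util, b_loss)

if __name__ == "__main__":
    # Example: C*T_e = 250 packets, 1% loss target, synchronization fraction q = 0.6.
    for n_b in (20, 50, 100, 200, 400):
        print(f"N_b={n_b:4d}  B={bscl_buffer_pkts(n_b, 250, 0.6, 0.01):7.1f} packets")
```

The example output illustrates the crossover: for small N_b the utilization term dominates and the required buffer shrinks as flows are added, while beyond N̂_b the loss-rate term takes over and the buffer grows with N_b.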

Model validation

Heterogeneous flows

Utilization constraint

Loss rate constraint


Parameter estimation

Number of LBP flows:

With LBP flows, all rate reductions occur due to packet losses at target link

RBP flows: some rate reductions due to losses elsewhere

Effective RTT:

Jiang et al. (2002): simple algorithms to measure TCP RTT from packet traces

Loss burst lengths or loss synchronization factor:

Measure loss burst lengths from the packet loss trace, or use the approximation L(N_b) = α · N_b
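A rough sketch of how the synchronization factor might be estimated from a loss trace under the approximation L(N_b) ≈ α·N_b. The trace format (a list of per-congestion-event loss burst lengths) and the numbers in the example are my own hypothetical assumptions.

```python
def estimate_alpha(loss_burst_lengths, n_b):
    """Estimate alpha from observed loss burst lengths, assuming L(N_b) ~= alpha * N_b
    (average number of packets lost per congestion event)."""
    if not loss_burst_lengths:
        return None
    avg_burst = sum(loss_burst_lengths) / len(loss_burst_lengths)
    return avg_burst / n_b

if __name__ == "__main__":
    # Hypothetical trace: packets lost in each of 6 congestion events, with N_b = 200 LBP flows.
    bursts = [105, 130, 98, 122, 110, 117]
    print(f"alpha ~= {estimate_alpha(bursts, 200):.2f}")   # in the 0.5-0.6 range seen in the simulations
```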

Results: Bound loss rate to 1%

Per-flow throughput with BSCL

BSCL can achieve network layer objectives of full utilization and bounded loss rate

Can lead to large queuing delay due to larger buffer

How does this affect application throughput ?

BSCL loss rate target set to 1%

BSCL buffer size is 1550 packets

Compare with the buffer of 500 packets

BSCL is able to bound the loss rate to 1% target for all traffic models

Persistent connections and mice

BSCL buffer gives better throughput for large flows

Also reduces variability of per-flow throughputs

Loss rate decrease favors large flows in spite of larger queuing delay

All smaller flows get worse throughput with the BSCL buffer

Increase in queuing delay harms small flows

Closed-loop traffic

Similar to persistent traffic case

BSCL buffer improves throughput for large flows

Also reduces variability of perflow throughputs

Loss rate decrease favors large flows in spite of larger queuing delay

All smaller flows get worse throughput with the BSCL buffer

Increase in queuing delay harms small flows

Open-loop traffic

No significant difference between B=500 and B=1550

Reason: Loss rate for open-loop traffic decreases quickly with buffer size

Loss rate for B=500 is already less than 1%

Further increase in buffer reduces loss rate to ≈ 0

Large buffer does not increase queuing delays significantly

Summary

We derived a buffer sizing formula (BSCL) for congested links that mostly carry TCP traffic

Objectives:

Full utilization

Bounded loss rate

Minimum queuing delay, given previous two objectives

BSCL formula is applicable for links with more than 80-90% of traffic coming from large and locally bottlenecked TCP flows

BSCL accounts for the effects of heterogeneous RTTs and partial loss synchronization

Validated BSCL through simulations
