
Experiences With
Internet Traffic Measurement
and Analysis
Vern Paxson
ICSI Center for Internet Research
International Computer Science Institute
and
Lawrence Berkeley National Laboratory
vern@icir.org
March 5th, 2004
Outline
• The 1990s: How is the Internet used?
– Growth and diversity
– Fractal traffic, “heavy tails”
– End-to-end dynamics
– Difficulties with measurement & analysis
• The 2000s: How is the Internet abused?
– Prevalence of misuse
– Detecting attacks
– Worms
The 1990s
How is the Internet Used?
Internet Growth: Exponential
[Figure: Internet host counts over time, showing 80% growth/year. Data courtesy of Rick Adams.]
• Growth of 80%/year
• Sustained for at least ten years …
• … before the Web even existed.
⇒ Internet is always changing. You do not have a lot of time to understand it.
Characterizing Site Traffic
• Methodology: passively record traffic in/out of a site
• Danzig et al (1992)
– 3 sites, 24 hrs, all packet headers
• Paxson (1994)
– TCP SYN/FIN/RST control packets
• Gives hosts, sizes, start time, duration, application
• Large filtering win (≈ 10-100:1 packets, 1000s:1 bytes)
– 7 month-long traces at Lawrence Berkeley Natl. Laboratory
– 8 day-long traces from 6 other sites
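To make the SYN/FIN methodology concrete, here is a minimal Python sketch (not the study’s actual tooling; the input tuples and field names are hypothetical) of how control packets alone yield per-connection summaries. The byte count exploits TCP sequence numbers: SYN and FIN each consume one sequence number, so FIN seq − SYN seq − 1 approximates the bytes sent in that direction.

    import pprint

    def summarize(control_pkts):
        """Derive connection summaries from SYN/FIN packets of one direction."""
        conns = {}
        for ts, src, dst, flag, seq in control_pkts:
            key = (src, dst)
            if flag == "SYN":
                conns[key] = {"start": ts, "isn": seq}
            elif flag == "FIN" and key in conns:
                c = conns[key]
                c["duration"] = ts - c["start"]
                # SYN and FIN each occupy one sequence number; mod 2^32 wrap
                c["bytes"] = (seq - c["isn"] - 1) % 2**32
        return conns

    pkts = [
        (0.00, "hostA", "hostB", "SYN", 1000),   # connection opens
        (4.25, "hostA", "hostB", "FIN", 53001),  # ~52 KB sent A -> B
    ]
    pprint.pprint(summarize(pkts))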
Findings from Site Studies
• Traffic mix (which protocols are used; how many
connections/bytes they contribute) varies widely from
site to site.
• Mix also varies at the same site over time.
• Most connections have much heavier traffic in one
direction than the other:
– Even interactive login sessions (20:1)
Findings from Site Studies, con’t
• Many random variables associated with connection
characteristics (sizes, durations) are best described
with log-normal distributions
– But often these are not particularly good fits
– And often their parameters vary significantly between
datasets
• The largest connections in bulk transfers are very large
– Tail behavior is unpredictable
⇒ Many of these findings differ from assumptions used in 1990s traffic modeling
Theory vs. Measured Reality
Scaling behavior in Internet Traffic
Burstiness
• Long-established framework: Poisson modeling
• Central idea: network events (packet arrivals,
connection arrivals) are well-modeled as independent
• In simplest form, there’s just a rate parameter, λ
• It then follows that the time between “calls” (events)
is exponentially distributed, # of calls ~ Poisson
• Implications (if assumptions correct):
– Aggregated traffic will smooth out quickly
– Correlations are fleeting, bursts are limited
Burstiness:
Theory vs. Measurement
• For Internet traffic, Poisson models have fundamental
problem: they greatly underestimate burstiness
• Consider an arrival process: Xk gives # packets
arriving during kth interval of length T.
– Take 1-hour trace of Internet traffic (1995)
– Generate (batch) Poisson arrivals with same mean and
variance
[Figure: packet counts per interval for the measured trace vs. the batch-Poisson surrogate, re-aggregated over successively longer time scales (10, 100, 600 units); each panel marks the previous, finer-scale region. The measured traffic stays bursty; the Poisson surrogate smooths out.]
Burstiness Over
Many Time Scales
• Real traffic has strong, long-range correlations
• Power spectrum:
– Flat for Poisson processes
– For measured traffic, diverges to ∞ as frequency → 0
• To build Poisson-based models that capture this
characteristic takes many parameters
• But due to great variation in Internet traffic, we are
desperate for parsimonious models (few parameters)
Describing Traffic with Fractals
• Landmark 1993 paper by Leland et al proposed
capturing such characteristics (in Ethernet traffic)
using self-similarity, a form of fractal-based modeling:
– Parameterized by mean, variance, and Hurst parameter
• Models predict burstiness on all time scales
⇒ Queueing delays / drop probabilities much higher than predicted by Poisson-based models
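For concreteness, a sketch of one standard way to fit the Hurst parameter, the aggregated-variance method (a standard technique, not something specific to this talk): for a self-similar series, Var(X^(m)) ~ m^(2H−2), so the slope of log-variance vs. log-aggregation-level estimates 2H − 2. Shown on white noise (expected H ≈ 0.5); a real per-interval packet-count series would go in x.

    import math, random, statistics

    def hurst_agg_var(x, levels=(1, 2, 4, 8, 16, 32)):
        """Slope of log Var(X^(m)) vs. log m is 2H - 2 for self-similar X."""
        pts = []
        for m in levels:
            agg = [statistics.mean(x[i:i + m]) for i in range(0, len(x) - m + 1, m)]
            pts.append((math.log(m), math.log(statistics.variance(agg))))
        mx = statistics.mean(p[0] for p in pts)
        my = statistics.mean(p[1] for p in pts)
        slope = (sum((px - mx) * (py - my) for px, py in pts)
                 / sum((px - mx) ** 2 for px, py in pts))
        return 1 + slope / 2          # slope = 2H - 2

    random.seed(2)
    noise = [random.gauss(0, 1) for _ in range(4096)]
    print(f"estimated H = {hurst_agg_var(noise):.2f}   (uncorrelated noise: H ~ 0.5)")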
Heavy Tails
• Key prediction from fractal modeling:
One way fractal traffic can arise in aggregate is if
individual connections have activity periods (durations,
sizes) whose distribution has infinite variance.
• Infinite variance manifests in distribution’s upper tail
• Consider the Pareto distribution: P[X > x] = (x/a)^-α (for x ≥ a)
– If α < 2, the distribution has infinite variance
– Can test for Pareto fit by plotting log P[X > x] vs. log x
⇒ Straight line = Pareto distribution, slope estimates -α
[Figure: log-log CCDF of Web connection sizes (226,386 observations); the upper tail (28,000 observations) is well fit by a straight line with α = 1.3 ⇒ infinite variance]
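A sketch of this tail test in Python, run on synthetic Pareto samples (α = 1.3) standing in for the measured connection sizes; a straight upper tail with slope ≈ −α is the signature just described.

    import math, random

    random.seed(3)
    n = 100_000
    sizes = sorted(1000 * random.paretovariate(1.3) for _ in range(n))

    # Empirical CCDF points over the upper tail, on log-log axes
    tail = [(math.log(x), math.log((n - i) / n))
            for i, x in enumerate(sizes) if 0.9 * n <= i < n - 100]
    (x0, y0), (x1, y1) = tail[0], tail[-1]   # crude endpoint slope estimate
    print(f"tail slope ~ {(y1 - y0) / (x1 - x0):.2f}  "
          f"(Pareto alpha = 1.3 -> expect ~ -1.3)")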
Self-Similarity & Heavy Tails, con’t
• We find heavy-tailed sizes in many types of network
traffic. Just a few extreme connections dominate the
entire volume.
• Theorems then give us that this traffic aggregates to
self-similar behavior.
• While self-similar models are parsimonious, they are
not (alas) “simple”.
• You can have self-similar correlations for which magnitude of variations is small ⇒ still possible to have a statistical multiplexing gain, especially at very high aggregation
• Smaller time scales behave quite differently.
– When very highly aggregated, they can appear Poisson!
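The aggregation result can be illustrated directly. Below, 50 synthetic on/off sources with Pareto (α = 1.5) on-periods are summed; the variance of window-averaged rates then decays much more slowly than the 1/m expected of Poisson-like traffic. That slow decay is the long-range dependence the theorems predict. (A sketch, not a calibrated model.)

    import random, statistics

    def onoff_series(seconds, alpha=1.5):
        """Per-second send rate of one source: Pareto 'on', exponential 'off'."""
        rate, t, on = [0] * seconds, 0.0, True
        while t < seconds:
            dur = random.paretovariate(alpha) if on else random.expovariate(1.0)
            if on:  # coarsely mark each second the on-period overlaps
                for s in range(int(t), min(int(t + dur) + 1, seconds)):
                    rate[s] += 1
            t, on = t + dur, not on
        return rate

    random.seed(4)
    agg = [sum(col) for col in zip(*(onoff_series(4096) for _ in range(50)))]
    for m in (1, 4, 16, 64):   # Poisson-like traffic would decay ~ 1/m
        means = [statistics.mean(agg[i:i + m]) for i in range(0, len(agg) - m + 1, m)]
        print(f"window m={m:3d}s  variance of window means = "
              f"{statistics.variance(means):7.2f}")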
End-to-End Internet Dynamics
Routing & Packets
End-to-End Dynamics
• Ultimately what the user cares about is not what’s
happening on a given link, but the concatenation of
behaviors along all of the hops in an end-to-end path.
• Measurement methodology: deploy measurement
servers at numerous Internet sites, measure the
paths between them
• Exhibits N² scaling: as # sites grows, # paths between them grows rapidly (N sites yield up to N·(N-1) ordered paths).
“Measurement Infrastructure” sites
1994-1995 End-to-End Dynamics Study
Paths in the Study: N² Scaling Effect
End-to-End Routing Dynamics
• Analysis of 40,000 “traceroute” measurements
between 37 sites, 900+ end-to-end paths.
• Route prevalence:
– most end-to-end paths through the Internet dominated by a
single route.
• Route persistence:
– 2/3s of routes remain unchanged for days/weeks
– 1/3 of routes change on time scales of seconds to hours
• Route symmetry:
– More than half of all routes visited at least one different city in each direction
⇒ Very important for tracking connection state inside network!
End-to-End Packet Dynamics
• Analysis of 20,000 TCP bulk transfers of 100 KB
between 36 sites
• Each traced at both ends using tcpdump
• Benefits of using TCP:
– Real-world traffic
– Can probe fine-grained time scales while still using congestion control
• Drawbacks to using TCP:
– Endpoint TCP behavior a major analysis headache
– TCP’s loading of the transfer path also complicates analysis
End-to-End Packet Dynamics:
Unusual Behavior
• Out-of-order delivery:
– Not uncommon. 0.6%-2% of all packets.
– Strongly site-specific.
– Generally little impact on performance.
• Replicated packets:
– Very rare, but does occur (e.g., 1 packet in, 22 out)
• Corrupted packets (bad checksum):
– Overall, 1 in 5,000 (!)
– Stone/Partridge (2000): between 1 in 1,100 and 1 in 32,000
• Undetected: between 1 in 16 million and 1 in 10 billion
End-to-End Packet Dynamics: Loss
• Half of all 100 KB transfers experienced no loss
– 2/3s of paths within U.S.
• The other half experienced significant loss:
– Average 4-9%, but with wide variation
• TCP loss is not well described as independent
• Losses dominated by a few long-lived outages
• (Keep in mind: this is 1994-1995!)
• Subsequent studies:
– Loss rates have gotten much better
– Loss episodes well described as independent
– Same holds for regions of stable delay, throughput
– Time scales of constancy ≈ minutes or more
Issues / Difficulties for
Analyzing Internet Traffic
Measurement, Simulation & Analysis
There is No Such Thing as “Typical”
• Heterogeneity in:
– Traffic mix
– Range of network capabilities
• Bottleneck bandwidth (orders of magnitude)
• Round-trip time (orders of magnitude)
– Dynamic range of network conditions
• Congestion / degree of multiplexing / available bandwidth
• Proportion of traffic that is adaptive/rigid/attack
• Immense size & growth
– Rare events will occur
• New applications explode on the scene
[Figure: one new application’s traffic doubling every 7-8 weeks for 2 years]
There is No Such Thing as “Typical”, con’t
• New applications explode on the scene
– Not just the Web, but: Mbone, Napster, KaZaA etc., IM
• Even robust statistics fail.
– E.g., median size of FTP data transfer at LBL
• Oct. 1992: 4.5 KB
• Mar. 1993: 2.1 KB
• Mar. 1998: 10.9 KB
• Dec. 1998: 5.6 KB
• Dec. 1999: 10.9 KB
• Jun. 2000: 62 KB
• Nov. 2000: 10 KB
(60,000 samples)
• Danger: if you wrongly assume that something is “typical”, nothing tells you that you are wrong!
The Search for Invariants
• In the face of such diversity, identifying things
that don’t change has immense utility
• Some Internet traffic invariants:
– Daily and weekly patterns
– Self-similarity on time scales of 100s of msec and above
– Heavy tails
• both in activity periods and elsewhere, e.g., topology
– Poisson user session arrivals
– Log-normal sizes (excluding tails)
– Keystroke interarrival times follow a Pareto distribution
The Danger of Mental Models
[Figure: a distribution that looks like “exponential plus a constant offset” on linear axes is, on log-log axes, not exponential but Pareto: a heavy tail with α ≈ 1.0]
Versus the Power of Modeling to
Open Our Eyes
• Fowler & Leland, 1991:
Traffic ‘spikes’ (which cause actual losses) ride on longer-term ‘ripples’, that in turn ride on still longer-term ‘swells’
• Lacked vocabulary that came from self-similar modeling (1993)
• Similarly, 1993 self-similarity paper:
We did so without first studying and modeling the
behavior of individual Ethernet users (sources)
• Modeling led to suggestion to investigate
heavy tails
Measurement Soundness
• How well-founded is a given Internet
measurement?
• We can often use additional information to help
calibrate.
• One source: protocol structure
– E.g., was a packet dropped by the network …
… or by the measurement device?
• For TCP, can check: did receiver acknowledge it?
– If Yes, then dropped by measurement device
– If No, then dropped by network
• Can also calibrate using additional information
Calibration Using Additional
Information: Packet Timings
[Figure: packet timing plots with two anomalies: a level shift suggesting a routing change, and a jump caused by a clock adjustment]
Reproducibility of Results
(or lack thereof)
• It is rare, though sometimes occurs, that raw
measurements are made available to other
researchers for further analysis or for confirmation.
• It is more rare that analysis tools and scripts are
made available, particularly in a coherent form that
others can actually get to work.
• It is even rarer that measurement glitches, “outliers,”
analysis fudge factors, etc., are detailed.
• In fact, often researchers cannot reproduce their own
results.
Towards Reproducible Results
• Need to ensure a systematic approach to data
reduction and analysis
– I.e., a “paper trail” for how analysis was conducted,
particularly when bugs are fixed
• A methodology to do this:
– Enforce discipline of using a single (master) script that builds
all analysis results from the raw data
– Maintain all intermediary/reduced forms of the data as
explicitly ephemeral
– Maintain a notebook of what was done and to what effect.
– Use version control for scripts & notebook.
– But also really need: ways to visualize what's changed in
analysis results after a re-run.
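A toy illustration of the master-script discipline (every path and processing step below is a hypothetical placeholder): one entry point rebuilds everything from raw data, intermediates live in an explicitly disposable directory, and each run appends to a notebook.

    import datetime, pathlib

    RAW = pathlib.Path("raw/trace.pcap")   # immutable raw data (hypothetical name)
    TMP = pathlib.Path("tmp")              # reduced forms: explicitly disposable
    OUT = pathlib.Path("results")

    def reduce_raw(src, dst):              # placeholder data-reduction step
        dst.write_text(f"reduced form of {src}\n")

    def analyze(src, dst):                 # placeholder analysis step
        dst.write_text(f"summary built from {src.read_text()!r}\n")

    if __name__ == "__main__":
        for d in (TMP, OUT):
            d.mkdir(exist_ok=True)
        reduced = TMP / "connections.tsv"
        reduce_raw(RAW, reduced)               # step 1: raw -> reduced
        analyze(reduced, OUT / "summary.txt")  # step 2: reduced -> results
        with (OUT / "NOTEBOOK").open("a") as log:  # the "paper trail"
            stamp = datetime.datetime.now().isoformat(timespec="seconds")
            log.write(f"{stamp} rebuilt all results from {RAW}\n")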
The 2000s
How is the Internet Abused?
Magnitude of Internet Attacks
• As seen at Lawrence Berkeley National
Laboratory, on a typical day in 2004:
– > 70% of Internet connections (20 million out of
28 million) reflect clear attacks.
– 60 different remote hosts scan one of LBL’s two blocks of 65,536 addresses in its entirety
– More than 10,000 remote hosts engage in
scanning activity
• Much of this activity reflects “worms”
• Much of the rest reflects automated scan-and-exploit tools
How is the Internet Abused?
Detecting Network Attacks
Design Goals for the “Bro”
Intrusion Detection System
• Monitor traffic in a very high
performance environment
• Real-time detection and response
• Separation of mechanism from policy
• Ready extensibility of both mechanism
and policy
• Resistant to evasion
How Bro Works
[Diagram, built up across several slides: Network → Packet Stream → libpcap → Tcpdump Filter → Filtered Packet Stream → Event Engine → Event Stream (with Event Control flowing back down) → Policy Script Interpreter ← Policy Script; outputs: Real-time Notification, Record To Disk]
• Taps GigEther fiber link passively, sends up a copy of all network traffic.
• Kernel filters down high-volume stream via standard libpcap packet capture library.
• “Event engine” distills filtered stream into high-level, policy-neutral events reflecting underlying network activity
– E.g., connection_attempt, http_reply, user_logged_in
• “Policy script” processes event stream, incorporates:
– Context from past events
– Site’s particular policies
• … and takes action:
– Records to disk
– Generates alerts via syslog, paging
– Executes programs as a form of response
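The mechanism/policy split can be sketched in a few lines of Python (Bro’s actual policy language is its own, richer language; only the event names below come from the slide): a policy-neutral engine emits events, and site policy is just a set of handlers registered against event names.

    handlers = {}

    def on(event_name):                    # register a policy handler
        def register(fn):
            handlers.setdefault(event_name, []).append(fn)
            return fn
        return register

    def dispatch(event_name, **fields):    # event engine -> policy scripts
        for fn in handlers.get(event_name, []):
            fn(**fields)

    # --- "policy script": site-specific logic, kept out of the engine ---
    @on("connection_attempt")
    def check_attempt(src, dst, port):
        if port == 23:                     # e.g., this site forbids telnet
            print(f"ALERT: telnet attempt {src} -> {dst}")

    # --- "event engine" side: distills packets into policy-neutral events ---
    dispatch("connection_attempt", src="10.0.0.1", dst="server", port=23)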
Experiences with Bro
• Exciting research because used operationally
(24x7) at several open sites (LBL, UCB, TUM)
• Key enabler: sites’ threat model
– Occasional break-ins are tolerable
– “Jewels” are additionally protected (e.g., firewalls)
• Significant real-world concern: policy
management
• Dynamic blocking critical to success
– Currently, 100-200 blocks/day
The Problem of Evasion
• Fundamental problem passively measuring
traffic on a link: Network traffic is inherently
ambiguous
• Generally not a significant issue for traffic
characterization
• But is in the presence of an adversary:
Attackers can craft traffic to confuse/fool
monitor
Evading Detection Via
Ambiguous TCP Retransmission
The Problem of “Crud”
• There are many such ambiguities attackers can
leverage.
• Unfortunately, they occur in benign traffic, too:
– Legitimate tiny fragments, overlapping fragments
– Receivers that acknowledge data they did not receive
– Senders that retransmit different data than originally
• In a diverse traffic stream, you will see these
• Approaches for defending against evasion:
– Traffic “normalizers” that actively remove ambiguities
– “Mapping” of local hosts to determine their behaviors
– Active participation by local hosts in intrusion detection
How is the Internet Abused?
The Threat of Internet Worms
What is a Worm?
• Self-replicating/self-propagating code.
• Spreads across a network by exploiting flaws
in open services.
– As opposed to viruses, which require user action
to quicken/spread.
• Not new: Morris Worm, Nov. 1988
– 6-10% of all Internet hosts infected
• Many more since, but none on that scale … until …
Code Red
• Initial version released July 13, 2001.
• Exploited known bug in Microsoft IIS Web
servers.
• 1st through 20th of each month: spread.
20th through end of each month: attack.
• Payload: web site defacement.
• Spread: via random scanning of 32-bit
IP address space.
• But: failure to seed random number generator ⇒ linear growth.
Code Red, con’t
• Revision released July 19, 2001.
• Payload: flooding attack on
www.whitehouse.gov.
• Bug led to it dying for date ≥ 20th of the month.
• But: this time random number generator
correctly seeded. Bingo!
Network Telescopes
• Idea: monitor a cross-section of the IP
address space to measure network
traffic involving random addresses
(flooding “backscatter”; worm scanning)
• LBL’s cross-section: 1/32,768 of Internet.
• UCSD’s cross-section: 1/256.
Spread of Code Red
• Network telescopes give lower bound on #
infected hosts: 360K.
• Course of infection fits classic logistic.
• Note: the larger the vulnerable population, the faster the worm spreads.
• That night (→ 20th), worm dies …
… except for hosts with inaccurate clocks!
• It just takes one of these to restart the worm
on August 1st …
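The logistic dynamic, and why a larger vulnerable population speeds infection, in a few lines (illustrative parameters: 10 scans/sec per infectee over a 2^32 address space, not Code Red’s measured values): each infectee’s rate of finding new victims is proportional to the vulnerable population N, so doubling N roughly halves the takeover time.

    def time_to_95pct(N, scans_per_sec=10.0, addr_space=2**32, dt=1.0):
        """Euler-integrate dI/dt = beta * I * (1 - I/N), beta from scan rate."""
        beta = scans_per_sec * N / addr_space   # rate each infectee finds victims
        t, i = 0.0, 1.0                         # start from a single infected host
        while i < 0.95 * N:
            i += dt * beta * i * (1 - i / N)
            t += dt
        return t

    for N in (100_000, 200_000, 400_000):
        print(f"N={N:7,d}: 95% infected after ~{time_to_95pct(N)/3600:.1f} hours")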
Striving for Greater Virulence:
Code Red 2
• Released August 4, 2001.
• Comment in code: “Code Red 2.”
• But in fact completely different code base.
• Payload: a root backdoor, resilient to reboots.
• Bug: crashes NT, only works on Windows 2000.
• Localized scanning: prefers nearby addresses.
• Kills Code Red I.
• Safety valve: programmed to die Oct 1, 2001.
Striving for Greater Virulence:
Nimda
• Released September 18, 2001.
• Multi-mode spreading:
– attack IIS servers via infected clients
– email itself to address book as a virus
– copy itself across open network shares
– modify Web pages on infected servers w/ client exploit
– scan for Code Red II backdoors (!)
⇒ worms form an ecosystem!
• Leaped across firewalls.
Life Just Before Slammer
Life Just After Slammer
A Lesson in Economy
• Slammer exploits a connectionless UDP
service, rather than connection-oriented TCP.
• Entire worm fits in a single packet!
⇒ When scanning, worm can “fire and forget”.
• Worm infects 75,000+ hosts in 10 minutes
(despite broken random number generator).
• Progress limited by the Internet’s carrying
capacity!
The Usual Logistic Growth
Slammer’s Bandwidth-Limited Growth
Blaster
• Released August 11, 2003.
• Exploits flaw in RPC service ubiquitous
across Windows.
• Payload: attack Microsoft Windows Update.
• Despite flawed scanning and secondary
infection strategy, rapidly propagates to
100K’s of hosts.
• Actually, bulk of infections are really Nachia, a
Blaster counter-worm.
• Key paradigm shift: firewalls don’t help.
What if Spreading Were
Well-Designed?
• Observation (Weaver): Much of a worm’s
scanning is redundant.
• Idea: coordinated scanning
– Construct permutation of address space
– Each new worm starts at a random point
– Worm instance that “encounters” another instance re-randomizes.
⇒ Greatly accelerates worm in later stages.
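One simple way to realize such a shared permutation, sketched below: an affine map a·x + b mod 2^32 with an odd multiplier is a bijection on the address space, so every instance walks the same pseudo-random order from its own random start. (The constants are arbitrary illustrations, and a real worm would also implement the re-randomize-on-collision rule.)

    import random

    A, B, M = 0x9E3779B1, 0xDEADBEEF, 2**32   # odd A => bijection mod 2^32

    def permuted_addr(index):
        """Map a scan index to a unique 32-bit address."""
        return (A * index + B) % M

    start = random.randrange(M)     # each instance begins at a random point
    for k in range(5):              # first few targets for this instance
        print(f"scan #{k}: target {permuted_addr((start + k) % M):#010x}")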
What if Spreading Were
Well-Designed?, con’t
• Observation (Weaver): Accelerate initial phase using a precomputed hit-list of, say, 1% of the vulnerable hosts.
⇒ At 100 scans/worm/sec, can infect huge population in a few minutes.
• Observation (Staniford): Compute hit-list of entire vulnerable population, propagate via divide & conquer.
⇒ At 10 scans/worm/sec, infect in 10s of sec!
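The divide-and-conquer arithmetic, as a worked example (the ~1 s per generation for probe plus code transfer is an illustrative assumption, not a figure from the slide): each infection hands half of its remaining list to the new victim, so the whole population is covered in about log2(V) generations.

    import math

    V = 1_000_000            # vulnerable hosts on the precomputed hit-list
    per_gen = 1.0            # assumed seconds per generation (probe + transfer)

    gens = math.ceil(math.log2(V))   # each infection halves its remaining list
    print(f"{gens} generations x ~{per_gen:.0f} s each "
          f"-> ~{gens * per_gen:.0f} s to cover all {V:,} hosts")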
Defenses
• Detect via honeyfarms: collections of
“honeypots” fed by a network telescope.
– Any outbound connection from honeyfarm = worm.
– Distill signature from inbound/outbound traffic.
– If telescope covers N addresses, expect detection
when worm has infected 1/N of population.
• Thwart via scan suppressors: network
elements that block traffic from hosts that
make failed connection attempts to too many
other hosts.
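A scan suppressor reduces to a few lines of state, sketched here (the threshold of 20 distinct failed destinations is an arbitrary illustrative policy, and what counts as a “failure”, e.g. no SYN-ACK or an RST, is likewise a policy choice):

    from collections import defaultdict

    THRESHOLD = 20
    failed = defaultdict(set)      # src -> set of distinct dsts that refused it
    blocked = set()

    def on_failed_attempt(src, dst):
        if src in blocked:
            return
        failed[src].add(dst)
        if len(failed[src]) > THRESHOLD:
            blocked.add(src)
            print(f"blocking scanner {src} after {len(failed[src])} failures")

    for i in range(25):            # a host probing many distinct addresses
        on_failed_attempt("evil-host", f"10.0.0.{i}")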
Defenses?
• Observation:
worms don’t need to randomly scan
• Meta-server worm: ask server for hosts to
infect. E.g., query Google for “index.html”.
• Topological worm: fuel spread with local information from infected hosts (web server logs, email address books, config files, SSH “known hosts”)
⇒ No scanning signature; with rich interconnection topology, potentially very fast.
Defenses??
• Contagion worm: propagate parasitically
along with normally initiated communication.
• E.g., using 2 exploits (Web browser & Web server): infect any vulnerable servers visited
by browser, then any vulnerable browsers
that come to those servers.
• E.g., using 1 KaZaA exploit, glide along immense peer-to-peer network in days/hours.
⇒ No unusual connection activity at all! :-(
Some Observations
• Today’s worms have significant real-world
impact:
– Code Red disrupted routing
– Slammer disrupted elections, ATMs, airline
schedules, operations at an off-line nuclear power
plant …
– Blaster possibly contributed to North American
Blackout of Aug. 2003
• But today’s worms are amateurish
– Frequent bugs, algorithm/attack botches
– Unimaginative payloads
Next-Generation Worm Authors
• Potential for major damage with more nasty
payloads :-(.
• Military (“cyberwarfare”)
• Criminals:
– Denial-of-service, spamming for hire
– Access for Sale: A New Class of Worm
(Schecter/Smith, ACM CCS WORM 2003)
• Money on the table ⇒ Arms race
Summary
• Internet measurement is deeply challenging:
– Immense diversity
– Internet never ceases to be a moving target
– Our mental models can betray us: the Internet is full of surprises!
⇒ Seek invariants
• Many of the last decade’s measurement questions (“What are the basic characteristics and properties of Internet traffic?”) have returned …
• … but now regarding Internet attacks
• What on Earth will the next decade hold??