Computer Genomics: Towards Self- Change and Configuration Management Yi-Min Wang

(http://research.microsoft.com/sn/strider)
Yi-Min Wang
Senior Researcher & Group Manager
Systems Management Research Group
(http://research.microsoft.com/sm/)
OUTLINE
• Change & Configuration Management
• Genomics & Computer Genomics
• What We’ve Learned From The Analogy
• Systems Management
– Configuration Troubleshooting
– Patch Impact Analysis
– Spyware Management
• Towards Self-Management
Change & Configuration Management
• Problem Scope
– Setting changes through Control Panel,
program executions, etc.
– Software installations, updates, and patching
– Drive-by downloads of spyware
[Figure: Setting Change, Spyware Download, Patching, … act through O(10^1) to O(10^2) processes on O(10^5) Registry entries and files.]
Configuration Errors
• Persistent: cannot be solved by restart / reboot
• A major contributor to Internet service
unavailability and computer user frustration
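[Figure: layers of machine state (Executable Files, Persistent Configuration Settings, and Volatile Process State subject to Aging) alongside the recovery actions that reach them: Patching, App Reinstallation, System Restore, OS Re-imaging, App Restart, Rejuvenation, and Machine Reboot.]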
Genomics & Computer Genomics
• “A”, “C”, “G”, and “T” are the four DNA letters
of the genetic alphabet
– “1” and “0” are the binary letters of the computer
genetic alphabet
• 3 billion base pairs arranged into 24 distinct
chromosomes
– The Windows Registry is typically 50 MB (or 400
megabits), arranged into several hives
• Gene: a stretch of sequence in a specific
position on a DNA strand
– Computer gene: a Registry entry (a stretch of bit
sequence) in a specific position of a hive identified
by a hierarchical path name
• Gene carries the instructions for making a
particular protein through gene expression
– Registry entry carries the instructions for
configuring a particular process instantiation
• Less than 2 percent of the human genome is
made up of protein-coding sequences
• The rest has been labeled ‘junk’ DNA
– A lot of Registry entries are not configuration
settings, but rather “operational states” such as
usage counts, most recently used files, etc.
– They can be labeled as ‘junk’ entries as far as
configuration management is concerned
• Any two people’s genomes are >99.9% identical
– Registry snapshots from two different days on the
same machine typically have about 99% of the
entries identical between them
• Even between mouse and human genes, the
similarities range from 70% to 90%
– Even across different machines, there is a high
degree of similarity
• The majority of variations in the genome sequence
simply create diversity
– The majority of variations in the Registry simply reflect
diversity in hardware/software installations and user
preferences
• But some genetic differences are responsible
for causing diseases: the gene for Huntington’s
disease was found at the tip of the short arm of
Chromosome 4
– Some differences in Registry data are responsible
for configuration problems.
– For example, the gene for the “Short-cuts-do-not-work” problem was found at the following Registry
location: HKEY_CLASSES_ROOT\CLSID\{00021401-0000-0000-C000-000000000046}\shellex\MayChangeDefaultMenu
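The size figures in the analogy can be put side by side with a little arithmetic. This is a back-of-the-envelope sketch that assumes 2 bits per base pair (four letters need log2(4) = 2 bits) and decimal megabytes. It compares raw storage only and says nothing about how hard either body of data is to analyze.

```python
import math

# Figures quoted on the slides; bits-per-base and decimal MB are assumptions of this sketch.
BASE_PAIRS = 3_000_000_000          # human genome: 3 billion base pairs
BITS_PER_BASE = 2                   # A, C, G, T -> log2(4) = 2 bits per base
REGISTRY_BYTES = 50 * 1_000_000     # "typically 50MB", taken as decimal megabytes

genome_bits = BASE_PAIRS * BITS_PER_BASE   # 6.0e9 bits
registry_bits = REGISTRY_BYTES * 8         # 4.0e8 bits, i.e. 400 megabits
ratio = genome_bits / registry_bits        # how many Registries fit in one genome

print(f"genome:   {genome_bits:.1e} bits")
print(f"registry: {registry_bits:.1e} bits")
print(f"ratio:    {ratio:.0f}x, about 10^{math.log10(ratio):.1f}")
```

A ratio of 15 is roughly one order of magnitude in raw size.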
[Figure: Huntington’s Gene & Human Chromosomes (image: http://www.hdsa-wi.org/chromosomes.gif)]
[Figure: Short-cuts-do-not-work’s Gene]
• Most diseases involve the interaction of
several genes
• Studies have shown irrefutable evidence of
the role environment plays in gene
expression
– Studies of Registry problems reveal that the
“healthy” or “sick” values of many entries are not
absolute on their own and very often depend on
the environment of individual machines
• Gene therapy can potentially treat diseases
by using normal genes to replace a defective
gene
• But some failed experiments have shown the
risk of unexpected side effects, including the
creation of new diseases
– The equivalent of gene therapy can be easily
performed with a Registry or file editor
– But direct modifications to this low-level state
information can potentially cause inconsistencies
and lead to more serious problems
What We’ve Learned From The Analogy
• Configuration problems are solvable
– One order of magnitude easier than the genomics problem
• Techniques for complexity reduction
– Noise filtering through “junk” labeling
– Diff can be very powerful: two orders of magnitude reduction
– Attack the Mess with the Mass: statistical analysis across
multiple machines
• Computer Genomics Database for problem detection
and repair
– Problems with known root causes: which gene causes which
problem and how to fix it
– Problems with unknown root causes: which action should be
tried to provide safe gene therapy
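A lookup for problems with known root causes can be sketched as a table that maps genes (Registry paths) to problems and repairs. The table is queried with the entries that changed on a sick machine. The record layout, `KnownProblem`, `diagnose`, and the repair text are hypothetical. Only the Short-cuts-do-not-work path comes from the talk.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class KnownProblem:
    """One record in a hypothetical computer genomics database."""
    symptom: str   # problem as the user describes it
    gene: str      # Registry path identified as the root cause
    repair: str    # how to fix it (placeholder text in this sketch)

SHORTCUT_GENE = (r"HKEY_CLASSES_ROOT\CLSID\{00021401-0000-0000-C000-000000000046}"
                 r"\shellex\MayChangeDefaultMenu")

# Hypothetical database contents: only the path above is taken from the talk.
GENOMICS_DB = [
    KnownProblem("Short-cuts-do-not-work", SHORTCUT_GENE,
                 "hypothetical repair: restore entry from a known-good snapshot"),
]

def diagnose(changed_paths):
    """Return known problems whose gene is among the changed Registry paths."""
    changed = {p.lower() for p in changed_paths}   # Registry key names are case-insensitive
    return [rec for rec in GENOMICS_DB if rec.gene.lower() in changed]

# Example: the diff of a "sick" machine contains the known gene plus an unrelated entry.
hits = diagnose([SHORTCUT_GENE, r"HKEY_CURRENT_USER\Software\Example\MRUList"])
for rec in hits:
    print(rec.symptom, "->", rec.repair)
```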
No.1: Configuration Troubleshooting
• “It worked yesterday, but not today.”
• “It worked for that user, but not this
user.”
• “It worked on that machine, but not this
machine.”
• “I restarted the application, rebooted the
machine, but still can’t fix the problem!”
Strider Process for Configuration Troubleshooting
[Figure: flowchart with a Context Information Gathering phase and a Complexity Reduction phase. The user’s report (“It was working. Now it doesn’t work. The program keeps failing.”) and context sources (Config Action, UI, App Info, Doc, Support Articles) feed the Tool. Tracing and State Diff results are intersected, then passed through Noise Filtering and State Ranking (using the PC Genomics Database), Support Database Lookup, and Ownership Mapping, yielding a Filtered & Ranked Candidate Set.]
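[Figure: Cross-Restore-Point Results. Log-scale chart (1 to 1,000,000) for six cases (PowerPoint, Instant Messenger, Word Install, JPG Send To, System Restore UI, IE Passwords), showing the number of candidate Registry entries at each stage: average Registry size, after state diff, after diff & trace intersection, after noise filtering, and root-cause order-ranking. Annotations mark a reduction of two orders of magnitude, followed by another two orders of magnitude.]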
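The four complexity-reduction steps for troubleshooting can be sketched over snapshots modeled as dictionaries from Registry path to value. The following are assumptions made for illustration, not Strider’s actual implementation: the paths, the trace and noise sets, and the ranking rule, which treats entries that change rarely across restore points as more suspicious.

```python
# Hypothetical Registry paths used only for illustration.
TOOLBAR = r"HKCU\Software\ExampleApp\Toolbar"
WINDOW_POS = r"HKCU\Software\ExampleApp\WindowPos"
MRU_FILE = r"HKCU\Software\ExampleApp\MRU\File1"
OTHER_APP = r"HKLM\Software\OtherApp\Setting"

def state_diff(good, bad):
    """Paths added, removed, or modified between two snapshots (path -> value)."""
    return {p for p in good.keys() | bad.keys() if good.get(p) != bad.get(p)}

def troubleshoot(good, bad, trace, noise, change_counts):
    suspects = state_diff(good, bad)   # 1. diff a working snapshot against a failing one
    suspects &= trace                  # 2. keep only entries the failing run touched
    suspects -= noise                  # 3. drop 'junk' entries (MRU lists, usage counts)
    # 4. rank: entries that changed less often across restore points come first (assumed heuristic)
    return sorted(suspects, key=lambda p: (change_counts.get(p, 0), p))

good = {TOOLBAR: "1", WINDOW_POS: "10,10", MRU_FILE: "a.doc", OTHER_APP: "x"}
bad = {TOOLBAR: "0", WINDOW_POS: "40,80", MRU_FILE: "b.doc", OTHER_APP: "y"}
trace = {TOOLBAR, WINDOW_POS, MRU_FILE}      # entries read while reproducing the failure
noise = {MRU_FILE}                           # known operational state
change_counts = {TOOLBAR: 1, WINDOW_POS: 12, MRU_FILE: 40}

print(troubleshoot(good, bad, trace, noise, change_counts))
```

Here all four entries differ. The trace drops the unrelated application’s setting, and noise filtering drops the MRU entry. The rarely changing toolbar setting is ranked ahead of the window position.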
No.2: Patch Impact Analysis
• “If I apply this security patch, which one
of the 3,000 applications in my company
is going to be affected?”
Strider Process for Patch Impact Analysis
[Figure: flowchart with a Context Information Gathering phase and a Complexity Reduction phase. The Tool traces all program executions and diffs state before and after patching. The results are intersected, then passed through Noise Filtering (System Processes), Process-to-Application Mapping, and State Ranking (Process Criticality), using the PC Genomics Database. The output is a Filtered & Ranked Candidate Set: the Applications Requiring High-Priority Testing, reported to the user.]
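The patch-impact flow can be sketched the same way. The steps are: intersect the files a patch changes with the files each traced process loaded, filter out system processes, map processes to applications, and rank by process criticality. The process names, file names, criticality scores, and the `patch_impact` function are all hypothetical.

```python
def patch_impact(executions, patched_files, system_processes, proc_to_app, criticality):
    """Rank applications whose traced processes loaded files changed by a patch."""
    impact = {}
    for proc, loaded in executions.items():
        if proc in system_processes:            # noise filtering: skip system processes
            continue
        if loaded & patched_files:              # intersection with the before/after diff
            app = proc_to_app.get(proc, proc)   # process-to-application mapping
            impact[app] = max(impact.get(app, 0), criticality.get(proc, 0))
    # state ranking: highest process criticality first, ties broken by name
    return sorted(impact, key=lambda app: (-impact[app], app))

# Hypothetical trace of program executions: process -> files it loaded (lower-cased).
executions = {
    "winword.exe": {"mso.dll", "gdiplus.dll"},
    "outlook.exe": {"mso.dll"},
    "svchost.exe": {"gdiplus.dll"},
    "payroll.exe": {"gdiplus.dll"},
}
patched_files = {"gdiplus.dll"}                 # files the patch replaces (state diff)
ranked = patch_impact(
    executions,
    patched_files,
    system_processes={"svchost.exe"},
    proc_to_app={"winword.exe": "Word", "payroll.exe": "Payroll"},
    criticality={"payroll.exe": 10, "winword.exe": 5},
)
print(ranked)   # applications to test first
```

Only processes that actually loaded a patched file are reported, so the untouched `outlook.exe` never reaches the test list.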
No.3: Spyware Management
• “I’m getting lots of pop-ups and my
browser is crashing a lot. What software
got installed on my machine?”
Strider Process for Spyware Management
[Figure: flowchart with a Context Information Gathering phase and a Complexity Reduction phase. The Tool diffs state before and after spyware infection and traces a machine reboot and IE launch. The results are intersected, then passed through Noise Filtering (Known Goods), Known-* Database Lookup, and State Ranking (Behavior Criticality), using the PC Genomics Database. The output is a Filtered & Ranked Candidate Set, presented to the user with Objective Criteria Evaluation, Bundle Information, & Support Articles.]
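For spyware, the same steps can operate on auto-start hooks rather than on arbitrary Registry state. In this sketch each hook is a (location, program) pair. The two locations are real Windows auto-start keys. The hooked programs, the known-good list, and the behavior weights are hypothetical. The `loaded` set stands in for the trace collected while rebooting the machine and launching IE.

```python
# Two real auto-start locations; the hooked programs below are hypothetical.
RUN = r"HKLM\Software\Microsoft\Windows\CurrentVersion\Run"
BHO = r"HKLM\Software\Microsoft\Windows\CurrentVersion\Explorer\Browser Helper Objects"

def spyware_candidates(before, after, loaded, known_good, behavior_weight):
    """Hooks (location, program) added by the infection, filtered and ranked."""
    hooks = after - before                                  # state diff of auto-start hooks
    hooks = {h for h in hooks if h[1] in loaded}            # intersect with reboot + IE trace
    hooks = {h for h in hooks if h[1] not in known_good}    # noise filtering: known goods
    # state ranking: hooks at higher-weight locations first (weights are assumed)
    return sorted(hooks, key=lambda h: (-behavior_weight.get(h[0], 0), h))

before = {(RUN, "antivirus.exe")}
after = before | {(RUN, "updater.exe"), (RUN, "popupads.exe"),
                  (RUN, "leftover.exe"), (BHO, "{hypothetical-bho-clsid}")}
loaded = {"antivirus.exe", "updater.exe", "popupads.exe", "{hypothetical-bho-clsid}"}
result = spyware_candidates(before, after, loaded,
                            known_good={"updater.exe"},
                            behavior_weight={BHO: 3, RUN: 2})
for location, program in result:
    print(program, "hooked at", location)
```

The dormant `leftover.exe` drops out at the trace intersection and the known-good updater drops out at noise filtering. The browser helper object is ranked first only because this sketch gives its location a higher weight.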
Towards Self-Management
• Flight Data Recorder (FDR)
– Always-on tracing, diff’ing, intersection, noise
filtering, and state ranking
– Automatic genomic lookup for known problems
• “Self-healing”, “known-bad”, and “wait for user complaint”
– Automatic PeerPressure analysis for anomaly
detection
– Automatic generation of black-box application
dependency database
– Automatic trace analysis for new ASEP hooks
• ASEP = Auto-Start Extensibility Point
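The intuition behind PeerPressure analysis (“Attack the Mess with the Mass”) can be illustrated with a plain frequency count. A suspect entry whose value on the sick machine is rare among peer machines is more likely to be the culprit. Comparing against peers also reflects the point that “healthy” values depend on each machine’s environment. This is a simplified sketch, not the published PeerPressure algorithm, which is more elaborate. The paths and values are hypothetical.

```python
# Hypothetical Registry paths used only for illustration.
FEATURE = r"HKCU\Software\ExampleApp\EnableFeature"
VERSION = r"HKCU\Software\ExampleApp\PluginVersion"
THEME = r"HKCU\Software\ExampleApp\Theme"

def peer_rank(sick, peers, suspects):
    """Rank suspects by the fraction of peers sharing the sick machine's value (rarest first)."""
    scores = {}
    for path in suspects:
        observed = [peer[path] for peer in peers if path in peer]
        if not observed:
            continue                                   # no peer data: cannot judge this entry
        agree = sum(1 for value in observed if value == sick.get(path))
        scores[path] = agree / len(observed)
    return sorted(scores.items(), key=lambda item: (item[1], item[0]))

sick = {FEATURE: "0", VERSION: "1.5", THEME: "dark"}
peers = [
    {FEATURE: "1", VERSION: "1.5", THEME: "dark"},
    {FEATURE: "1", VERSION: "2.0", THEME: "dark"},
    {FEATURE: "1", VERSION: "1.5"},
    {FEATURE: "0", VERSION: "3.1", THEME: "light"},
]
for path, agreement in peer_rank(sick, peers, [FEATURE, VERSION, THEME]):
    print(f"{agreement:.2f}  {path}")
```

Only one of four peers shares the sick machine’s `EnableFeature` value, so that entry is ranked as the most suspicious.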
Summary
• The Strider Process for Handling Persistent-State Complexity
1. Diff
2. Trace
3. Intersection
4. Noise Filtering
5. State Ranking
6. Look-up
For More Information
Google “MSR Strider” or http://research.microsoft.com/sn/strider/
• Configuration Management
– Strider Troubleshooting: DSN’03, LISA’04, DSN’04,
LISA’05
– Glean: ICAC’04
– Flight Data Recorder (FDR): LISA’05
– Friends Troubleshooting Network (FTN): IPTPS’04
– PeerPressure: SIGMETRICS’04 (poster)
• Patch Management
– ICAC’04
• Spyware Management
– LISA’05
Thank You!
• International Conference on
Autonomic Computing (ICAC’05)
– Tentative: May 2005 in Seattle