PYTHON
DICTIONARIES
CHAPTER 11
FROM
THINK PYTHON
HOW TO THINK LIKE A COMPUTER SCIENTIST
WHAT IS A REAL DICTIONARY?
It’s a book or file that contains the definitions of words.
In particular there is a pairing between a word and its definition.
You look up the word and get the definition. Can you look up
the definition and get the word?
de·moc·ra·cy : a system of government by the whole population
or all the eligible members of a state, typically through elected
representatives.
word : definition is the pairing
In Python we can define a variable to contain a dictionary.
>>> d1 = {1: 'one', 2: 'two', 3: 'three', 4: 'four'}
>>> print d1[2]      # what is the value that goes with key 2?
two
KEY:VALUE PAIRING
Let's look at an English-to-German translation table.
>>> eng2Ger = {'one': 'eins', 'two': 'zwei', 'three': 'drei', 'four': 'vier'}
>>> print eng2Ger['three']
drei
The keys can be any immutable value, such as a number or a string. How about
>>> decToBinary = {0:0,1:1,2:10,3:11,4:100,5:101,6:110}
>>> print decToBinary[3]
11
NOTE: Dictionaries are mutable!
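Because dictionaries are mutable, items can be added, changed, and removed after the dictionary has been created. The following is a minimal sketch that reuses the d1 dictionary from the earlier slide. It is written in Python 3 syntax (print with parentheses), unlike the Python 2 snippets on the slides.

```python
d1 = {1: 'one', 2: 'two', 3: 'three', 4: 'four'}

d1[5] = 'five'      # add a new key:value pair
d1[2] = 'TWO'       # replace the value stored under an existing key
del d1[1]           # remove a key together with its value

print(d1[2])        # TWO
print(len(d1))      # 4
```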
THE VALUES() METHOD
To see whether something appears as a value in a dictionary,
you can use the method values(), which returns the values as a
list, and then use the in operator:
>>> vals = eng2Ger.values()
>>> print vals
['eins', 'zwei', 'drei', 'vier']
so you can do things like
if 'sieben' in vals:
    do_something
# len returns the number of key:value pairs in the dictionary
>>> print len(eng2Ger)
4
NOTE: If you look up a key that is not there, you get an error!
i.e. eng2Ger['sieben'] raises a KeyError exception!
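The difference between testing keys, testing values, and looking up a missing key can be sketched as follows. This is an illustration in Python 3 syntax, not part of the original slides. The key 'seven' is just an example of a key that is absent, and the get method is one way to avoid the exception.

```python
eng2Ger = {'one': 'eins', 'two': 'zwei', 'three': 'drei', 'four': 'vier'}

print('two' in eng2Ger)              # True: 'two' is a key
print('zwei' in eng2Ger)             # False: in checks the keys, not the values
print('zwei' in eng2Ger.values())    # True: 'zwei' is a value

# get returns a default instead of raising KeyError for a missing key
print(eng2Ger.get('seven', 'unknown'))   # unknown

try:
    eng2Ger['seven']
except KeyError:
    print('no such key')
```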
DICTIONARY ACCESS IS VERY FAST!
Recall that we can use the in operator with lists, sets, and
dictionaries. For a dictionary, in checks whether something
appears as a key.
If we have a dictionary that contains 1000000 key:value
pairs and a list that has 1000000 elements in it,
the test
key in dictionary
is much faster than
element in list
This is because dictionaries are implemented under the hood
as hash tables. See Exercise 10.11.
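One way to see the difference is to time both membership tests with the standard timeit module. The sketch below is only an illustration: the size n is an assumption, chosen smaller than the 1000000 mentioned above so that it runs quickly, and the measured times depend on the machine.

```python
import timeit

n = 100000   # assumed size, smaller than 1000000 so the sketch runs quickly
as_list = list(range(n))
as_dict = dict.fromkeys(as_list)   # the same numbers as keys; every value is None

target = n - 1   # the last element, which is the worst case for the list search

# Run each membership test 100 times and report the total time in seconds
list_time = timeit.timeit(lambda: target in as_list, number=100)
dict_time = timeit.timeit(lambda: target in as_dict, number=100)

print('list:', list_time)
print('dict:', dict_time)
```

The list test has to compare the target against the elements one after another, while the dictionary test goes almost directly to the right place.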
DICTIONARY AS A SET OF COUNTERS
Suppose you are given a string and you want to count how many
times each letter appears.
There are several ways you could do it:
1. You could create 26 variables, one for each letter of the alphabet.
Then you could traverse the string and, for each character, increment
the corresponding counter, probably using a chained conditional.
2. You could create a list with 26 elements. Then you could convert
each character to a number (using the built-in function ord), use the
number as an index into the list, and increment the appropriate
counter.
3. You could create a dictionary with characters as keys and counters
as the corresponding values. The first time you see a character, you
would add an item to the dictionary. After that you would increment
the value of the existing item.
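For comparison, option 2 could be sketched as follows. The function name letter_counts is made up for this illustration, and the sketch assumes that only the lowercase letters a to z are counted.

```python
def letter_counts(s):
    """Count the lowercase letters a-z using a list of 26 counters."""
    counts = [0] * 26
    for c in s:
        if 'a' <= c <= 'z':                   # skip anything that is not a-z
            counts[ord(c) - ord('a')] += 1    # 'a' -> index 0, 'b' -> index 1, ...
    return counts

counts = letter_counts('brontosaurus')
print(counts[ord('o') - ord('a')])   # 2
print(counts[ord('z') - ord('a')])   # 0
```

The list always holds 26 counters, even for letters that never appear. A dictionary only needs entries for the characters that actually occur.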
LET'S USE DICTIONARIES
def histogram(s):
    d = dict()
    for c in s:
        if c not in d:
            d[c] = 1
        else:
            d[c] += 1
    return d
Add the key c to the dictionary and set its value to 1 if it is not found;
if it is already there then just increment the value.
>>> h = histogram('brontosaurus')
>>> print h
{'a': 1, 'b': 1, 'o': 2, 'n': 1, 's': 2, 'r': 2, 'u': 2, 't': 1}
The histogram indicates that the letters 'a' and 'b'
appear once; 'o' appears twice, and so on.
How about doing this for an entire book, or a DNA string?
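As a sketch of the DNA case, the same counting idea can be applied to a short DNA string. This variant is an illustration in Python 3 syntax: it uses the dictionary method get to supply 0 for a character that has not been seen yet, which replaces the if/else of the version on the slide.

```python
def histogram(s):
    d = dict()
    for c in s:
        d[c] = d.get(c, 0) + 1   # 0 is the default when c is not yet a key
    return d

h = histogram('ACTGTAAGCCGTACA')
print(h['A'], h['C'], h['G'], h['T'])   # 5 4 3 3
```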
LOOPING OVER A DICTIONARY
def print_hist(h):
    for c in h:
        print c, h[c]
Here’s what the output looks like:
>>> h = histogram('parrot')
>>> print_hist(h)
a 1
p 1
r 2
t 1
o 1
You can format the output any way you choose.
Dictionaries have a method called keys that returns the keys of the
dictionary, in no particular order, as a list.
How would you do this so they were in alphabetical order? Remember you can
sort a list. Let's do it in class!
ANSWER
def histogram(s):
    d = dict()
    for c in s:
        if c not in d:
            d[c] = 1
        else:
            d[c] += 1
    return d
def print_hist(h):
    keylist = sorted(h.keys())   # a list of the keys in alphabetical order
    for c in keylist:
        print c, h[c]
h = histogram('bothriolepus')
print_hist(h)
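In Python 3 the keys method returns a view that has no sort method, so the built-in sorted function is the usual route there. The sketch below is one possible variant: the helper name sorted_hist is made up for this illustration, and it returns the pairs instead of printing them so the result can be reused.

```python
def sorted_hist(h):
    """Return the (letter, count) pairs of h as a list sorted by letter."""
    return sorted(h.items())

h = {'p': 1, 'a': 1, 'r': 2, 'o': 1, 't': 1}   # the histogram of 'parrot'
for letter, count in sorted_hist(h):
    print(letter, count)
```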
REVERSE LOOKUP
Given a dictionary d and a key k, it is easy to find the
corresponding value v = d[k]. This operation is called a
lookup.
But what if you have v and you want to find k? You have two
problems: first, there might be more than one key that maps
to the value v. Depending on the application, you might be
able to pick one, or you might have to make a list that
contains all of them. Second, there is no simple syntax to do
a reverse lookup; you have to search.
SEARCH THE DICT
def reverse_lookup(d, v):
    for k in d:
        if d[k] == v:
            return k
    raise ValueError   # no k found such that k:v exists
This function is yet another example of the search
pattern, but it uses a feature we haven’t seen before,
raise. The raise statement causes an exception; in this
case it causes a ValueError, which generally indicates
that there is something wrong with the value of a
parameter. Note: a reverse lookup is much slower than a
forward lookup, because it has to search through the keys
one at a time.
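A caller can handle the failed search with try and except. The sketch below repeats the function so that it runs on its own, in Python 3 syntax. The message passed to ValueError and the sample dictionary are additions for this illustration.

```python
def reverse_lookup(d, v):
    for k in d:
        if d[k] == v:
            return k
    raise ValueError('value does not appear in the dictionary')

h = {'p': 1, 'a': 1, 'r': 2, 'o': 1, 't': 1}   # the histogram of 'parrot'

print(reverse_lookup(h, 2))   # r

try:
    reverse_lookup(h, 3)
except ValueError as err:
    print('lookup failed:', err)
```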
RETURN A LIST OF MATCHING CASES
# Returns a list of the keys that give v. If no key gives v then
# return the empty list []
def reverse_lookup(d, v):
    r = []
    for k in d:
        if d[k] == v:
            r.append(k)
    return r
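The same search can also be written as a list comprehension. This is a sketch of an alternative spelling, not the version from the slides, and the name reverse_lookup_all is made up here to keep it apart from the single-key function.

```python
def reverse_lookup_all(d, v):
    """Return a list of every key in d whose value is v."""
    return [k for k in d if d[k] == v]

h = {'p': 1, 'a': 1, 'r': 2, 'o': 1, 't': 1}   # the histogram of 'parrot'
print(sorted(reverse_lookup_all(h, 1)))   # ['a', 'o', 'p', 't']
print(reverse_lookup_all(h, 3))           # []
```

Returning an empty list for a value that is not present means the caller does not need to catch an exception.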
RNA AMINO ACID
TRANSLATION TABLE
DNA_codon = {
'ATA':'I', 'ATC':'I', 'ATT':'I', 'ATG':'M', 'ACA':'T', 'ACC':'T', 'ACG':'T', 'ACT':'T',
'AAC':'N', 'AAT':'N', 'AAA':'K', 'AAG':'K', 'AGC':'S', 'AGT':'S', 'AGA':'R', 'AGG':'R',
'CTA':'L', 'CTC':'L', 'CTG':'L', 'CTT':'L', 'CCA':'P', 'CCC':'P', 'CCG':'P', 'CCT':'P',
'CAC':'H', 'CAT':'H', 'CAA':'Q', 'CAG':'Q', 'CGA':'R', 'CGC':'R', 'CGG':'R', 'CGT':'R',
'GTA':'V', 'GTC':'V', 'GTG':'V', 'GTT':'V', 'GCA':'A', 'GCC':'A', 'GCG':'A', 'GCT':'A',
'GAC':'D', 'GAT':'D', 'GAA':'E', 'GAG':'E', 'GGA':'G', 'GGC':'G', 'GGG':'G', 'GGT':'G',
'TCA':'S', 'TCC':'S', 'TCG':'S', 'TCT':'S', 'TTC':'F', 'TTT':'F', 'TTA':'L', 'TTG':'L',
'TAC':'Y', 'TAT':'Y', 'TAA':'_', 'TAG':'_', 'TGC':'C', 'TGT':'C', 'TGA':'_', 'TGG':'W'
}
# A tricky translation for those of you who love this stuff.
def translate(sequence):
    """Return the translated protein from 'sequence' assuming +1 reading frame"""
    return ''.join([DNA_codon.get(sequence[3*i:3*i+3], 'X') for i in range(len(sequence)//3)])
ANOTHER WAY (MORE
UNDERSTANDABLE)
def translate(sequence):
    s = ''                                # initialize to empty string
    numcodons = len(sequence) // 3
    pos = 0
    for i in range(numcodons):
        s = s + DNA_codon[sequence[pos:pos+3]]
        pos += 3                          # goes to every third char
    return s
sequence = 'ACTGTAAGCCGTACA'
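To make the codon-by-codon stepping visible, the translation can be split into two small steps. The sketch below is an illustration in Python 3 syntax. To stay short it uses a hypothetical subset of the DNA_codon table that holds only the codons in the example sequence, and the helper name codons is made up here.

```python
# Hypothetical subset of the DNA_codon table: only the codons used below.
DNA_codon = {'ACT': 'T', 'GTA': 'V', 'AGC': 'S', 'CGT': 'R', 'ACA': 'T'}

def codons(sequence):
    """Split sequence into consecutive three-letter codons, dropping any leftover."""
    return [sequence[i:i + 3] for i in range(0, len(sequence) - 2, 3)]

def translate(sequence):
    # get returns 'X' for any codon that is missing from the table
    return ''.join(DNA_codon.get(codon, 'X') for codon in codons(sequence))

sequence = 'ACTGTAAGCCGTACA'
print(codons(sequence))      # ['ACT', 'GTA', 'AGC', 'CGT', 'ACA']
print(translate(sequence))   # TVSRT
print(translate('ACTNNN'))   # TX
```

Using get with a default, as the one-line version does, keeps an unknown codon from raising a KeyError. Indexing with square brackets, as in the loop version, stops with an exception instead.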