>> Dan Fay: My name is Dan Fay, and I'm here to introduce and welcome Michael
Nielsen, who is joining us today as part of the Microsoft Research Visiting Speakers
Series. So Michael's here to talk about his book "Reinventing Discovery: The New Era of
Networked Science," and about how the Internet is transforming the nature of our
collective intelligence and how we understand the world.
So scientists, as we know from the groups we work with, are using the Internet to
dramatically expand their problem-solving ability; the online world is revolutionizing
scientific discovery, and the revolution is just beginning.
Michael is also one of the pioneers of quantum computing and an author of more than
50 scientific papers, including contributions to Nature and Scientific American. He's an
essayist, speaker and advocate of open science.
So please join me in welcoming him to Microsoft. Thank you. [applause]
>> Michael Nielsen: Thank you, Dan. Thank you very much for coming along today,
everybody. It's, I think, 13 years since I was last at Microsoft; I visited the Theory Group
back in '98 or '99, and it's really great to be back.
So I'm going to be talking about networked science, or what you might call e-science or
e-research, and that's the subject of my book. I want to start out with just a little story that
I think is very striking, very interesting, recent example of networked science.
So it starts with a mathematician named Tim Gowers. He's a mathematician at
Cambridge University, one of the world's leading mathematicians. He's a recipient of
the Fields Medal, the Nobel Prize of mathematics, among many other honors. And
Gowers is also a blogger. Not that uncommon, actually, amongst leading mathematicians:
of the 42 living Fields medalists, four have started blogs. Two have since abandoned
them. So that's perhaps fairly typical of the blogger demographic.
In 2009, Gowers wrote this very strikingly titled post: "Is massively collaborative
mathematics possible?" What he was proposing in this post was to use his blog as a
medium to attack a difficult, unsolved mathematical problem, a problem which he said he
would love to solve. Entirely in the open, using his blog to post his ideas and his partial
progress. And what's more, he issued an open invitation: anybody in the world who
thought they had an idea to contribute could post the idea in the comment section of
the blog.
His hope was that by combining the ideas of many minds in this way it would be
possible to make easy work of this hard mathematical problem. He called the experiment
the polymath project.
Things actually got off to a slow start. For the first seven hours after he opened his blog
up to comments, not a single person made any suggestions.
But then a mathematician at the University of British Columbia named Jozsef Solymosi
made a suggestion, basically a simpler variation on the original problem.
And 15 minutes after that, a high school teacher from Arizona made a comment. And
just three minutes after that, another mathematician, Terence Tao from UCLA, also a
Fields medalist, actually, made a suggestion. And things were kind of off and running at
this point.
I mean, basically things exploded, in fact. Over the next 37 days, 27 different people
would make 800 substantive mathematical comments containing roughly 170,000 words.
I wasn't a substantive contributor, but I was following along very closely right from the
start. First of all, it was very hard to keep up just with the rate at which people were
posting ideas, but it was also very interesting to see how quickly people would post a
fairly half-baked idea and it would be rapidly developed and improved and sometimes
discarded, sometimes incorporated into the canon of real knowledge about the problem.
Gowers commented that the process was to normal research as driving is to pushing a
car. And at the end of the 37 days, he posted again to his blog to announce that the
problem had most probably been solved; in fact, a generalization of the original problem.
They had to go back and check a whole bunch of details to make sure they hadn't made
some bad mistake, but everything did check out, and they wrote a couple of papers as a
result of this first iteration of the polymath project. There have been several subsequent
iterations since that early experiment.
The reason I'm talking about this today is not so much because it solved a particular
mathematical problem, no matter how interesting that problem might have been. It's
rather because of the suggestion that some of these tools can in some sense be used as
cognitive tools. By that I just mean that they can help speed up the solution of very
hard problems.
This vogue maybe dates to the book The Wisdom of Crowds some years ago. People
are often applying these kinds of ideas to relatively trivial problems: counting jelly beans
in a jar, that kind of thing. What's interesting about this problem is that it's really near the
limit of human ingenuity to solve. It's a problem that challenges certainly some of the
brightest mathematicians in the world. Not only that, but potentially some of these
techniques can be applied broadly across many fields, not just in mathematics. There are
obviously some considerable similarities to ideas from open source software, which I'll
return to in a little bit.
There are also some considerable differences. In particular, what I'm going to focus on
today is some of the real challenges in getting scientists, particularly people doing basic
science, to adopt some of these tools and use them to their full potential. So as I say,
my focus today is going to be particularly on basic science.
I'm going to talk about a very different example from a very different area of science, one
that has nothing whatsoever to do with mathematics, and just focus in a little bit more on
this question of how exactly these tools are being useful.
So this is a story that begins with a young woman named Nita Umashankar. In 2003 she
finished up her undergraduate studies at the University of Arizona, and she went to India
for a year, where she worked with a not-for-profit organization helping young Indian
women escape from prostitution. Depending on whose numbers you believe, there are
anywhere between several hundred thousand and several million women involved in
prostitution in India.
But what she found was I guess disturbing and really very discouraging. What she found
was that many of these young women had too few skills to hold down a job outside
prostitution.
So she went back to the United States at the end of the year, and she decided after
some reflection that she would start a new foundation, now called the Asset India
Foundation, which would address what she thought was the core problem:
By opening technology training centers in India, and training these young women in
technology, and then helping them find placement with some of India's big technology
companies.
They've actually opened training centers since then in five large Indian cities. Hundreds
of young women have completed their training courses, and they claim that they've
helped many of those young women find placement.
So that's a nice story. There's, of course, a caveat. The caveat is this: What they'd like
to do is expand their program into some smaller Indian cities. And one of the problems
that they've run into in doing the planning for this is a lot of those cities don't have very
good electrical infrastructure.
They don't have reliable electricity. If you want to run a technology training center
obviously that's something of a challenge. This causes all kinds of problems for them.
One of the things they were concerned about, particularly concerned about, was wireless
routers, how are they going to run their wireless routers.
They did a bunch of looking around. They looked for commercial, off-the-shelf,
solar-powered wireless routers, but nothing they found was suitable for their local needs.
The way they addressed their problem was to go to a company on the other side of the
world, in Waltham, just outside Boston, named InnoCentive. How many people are
familiar with InnoCentive? A couple of people. It's a marketplace for scientific problems.
It's a bit like eBay or Craigslist, but instead of posting a description of your old furniture,
you post a scientific problem that you'd like to see solved, together with a prize for its
solution.
InnoCentive is actually a spin-off of Eli Lilly, and the typical kind of organization that posts
there is Eli Lilly or another big pharma company, that kind of sector mostly. They have
quite a few companies signed up; I've forgotten how many, I think around 100.
This is probably their best known prize. It's a little hard to read. But it's a $1 million prize
to find a biomarker for ALS. It's actually a few years old now. A little bit unusual. Most of
their prizes are in the 10, 30, 40, $50,000 kind of a range.
So what this has got to do with Asset, is that Asset got together with the Rockefeller
Foundation, who put up a $20,000 prize for an InnoCentive challenge to design a low
cost reliable solar-powered wireless router that could be made with components that
were easily accessible in India.
That was the challenge. So InnoCentive broadcast it out to their network of solvers all
over the world. They claimed there's several hundred thousand people in this network. I
have no idea how many are truly active. Maybe the relevant metric here is that 400
people downloaded the detailed description of the challenge. That's not 400 people who
just saw the abstract; many more saw the general description. This is the really long,
detailed document, with all the stuff about IP and a whole bunch of other details.
Downloading it indicates some relatively serious level of interest. And 27 of those people
submitted
solutions. The winner was a software engineer from Texas named Zacary Brown. There
were a few interesting things about Mr. Brown.
In his day job, he worked as a software engineer with open source software, very heavily
with Linux in particular, which was kind of helpful.
But he had two more hobbies that were particularly useful. Hobby number one: he
worked at home building homemade wireless radio networks, and he was working
towards making contact with every country in the world.
Hobby number two. He told me actually in e-mail that when he was growing up he'd
been watching television one day and he saw solar panels being installed at the Carter
White House. He had no idea what they were, he asked his parents what they were. He
was enthralled, that was his word, when they described how you could convert sun light
into electricity.
So as an adult he was working on converting his entire home office, including his wireless
radio networks, so that they could operate off of solar power.
So if you wanted to build a reliable, low cost, solar-powered wireless router, well,
certainly Zacary Brown would be one of the guys you'd want to call about this particular
problem, I suspect.
InnoCentive just provided a way of making this match. Now, you might say, well, maybe
I've just cherry-picked this example. Actually, Karim Lakhani at Harvard has done
systematic studies to see what separates out successful InnoCentive solvers, and in fact
this pattern is not that uncommon.
Very often what's going on is that the person who wins the prize is a person who is
unusually well adapted to the problem at hand. One of the most frequent comments that
he gets from successful solvers is that if they look at a challenge for 20 minutes to half an
hour and they don't know for sure that they can solve it easily, they give up.
In other words, they're looking to see whether they already have the expertise necessary
to solve the problem. So what the polymath project and the InnoCentive challenge have
in common is that humanity as a whole already had the expertise necessary to solve
these problems. But the expertise was latent. And so what the tools are doing is really
activating the latent expertise by connecting the right expert, Zacary Brown, for example,
to the right problem at the right time. That was very obviously true for InnoCentive, as
I've described it, and it was also true in the case of the polymath project as well.
If you look in the archives, which are, of course, all still there, you see the same pattern
emerge over and over. Somebody proposes a half-baked idea; they can't go any further,
necessarily. Somebody else comes in and says, that makes me think of this. Somebody
else builds on that, and boom, boom, boom, you keep going. It's very similar to what
happens in any creative conversation that you can have in a room, with a few relevant
differences. One is that it was carried out at much larger scale. Two is that it would have
been hard to quickly assemble these people in a room; a number of the experts involved
had never met one another in the past.
So they got very easy, very lightweight access to this expertise. In some sense you
could say what the tools were doing was restructuring expert attention. What I mean by
that is that instead of Zacary Brown or the polymaths sitting at home, they were using
their expertise in much higher-leverage ways. They weren't just sitting there working on
their wireless radio networks; they were doing something that would benefit a large group
of people on the other side of the world.
So one of the questions that I talk about at a lot of length in the book, and which I'm not
really going to touch on much more here, though I think it's a very interesting question, is:
how can we design tools which allocate all the different types of expertise optimally?
This problem can be split up in a number of ways. There's kind of a technical problem,
which is allocating attention where people, individuals, have maximal comparative
advantage.
Certainly Zacary Brown was making much better use of his time by working on the
InnoCentive problem. But there's not just this technical problem, which is a matching
problem, matching people to the right problem at the right time for their particular set of
expertise; there's also an interesting incentive problem. Even if you can actually solve
that sort of technical matching problem, it doesn't necessarily mean that people are going
to want to work on that problem.
So there's an interesting incentive problem: make it so people are rewarded for allocating
attention in these ways. And if you think of a lot of examples, a lot of different tools, not
just polymath and InnoCentive but a couple of examples which are widely used in the
open source community, GitHub, IRC channels, that sort of thing, in some sense they're
all working on this problem.
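As a loose illustration of the technical matching problem just described, here is a toy sketch in Python: assign each problem to the expert whose skills best cover it. Everything here, the names, the skill sets, and the overlap score, is a hypothetical illustration, not a description of how InnoCentive or any real system actually works.

```python
# Toy sketch of the expertise-matching problem: route each problem to the
# available expert whose declared skills overlap it the most.
# All names, skill sets, and the scoring rule are hypothetical.

def match_experts(problems, experts):
    """Greedily assign each problem to the expert with maximal skill overlap."""
    assignments = {}
    for problem, needed in problems.items():
        # Score each expert by how many of the needed skills they have.
        best = max(experts, key=lambda e: len(needed & experts[e]))
        assignments[problem] = best
    return assignments

problems = {
    "solar router": {"solar power", "wireless radio", "linux"},
    "density theorem": {"combinatorics", "ergodic theory"},
}
experts = {
    "Brown": {"solar power", "wireless radio", "linux"},
    "Tao": {"combinatorics", "ergodic theory", "analysis"},
}

print(match_experts(problems, experts))
# → {'solar router': 'Brown', 'density theorem': 'Tao'}
```

A real allocation system would of course also have to handle the incentive side the talk goes on to describe; this sketch only captures the matching half.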
They're all different attempts to match the right person's attention to the right problem at
the right time. I'm going to switch track now for the next little bit. I want to get back to
basic science and talk about the ways in which scientists are or are not adopting some of
these tools. I think it's fair to say, and I'll try to defend this statement, that scientists have
in fact been tremendously inhibited in adopting many of these tools. If you look broadly
across the sciences I think that's true. It's not necessarily true in certain parts of science;
there are parts which are better than others.
But broadly I think this is true. I want to illustrate that statement with a few examples.
This is a website that was started in 2005 by a grad student at Caltech named John
Stockton, called the Qwiki, which stands for quantum wiki. It was a very simple idea: to
develop a research-level Wikipedia for quantum computing.
The idea was that it would be almost a super-textbook for the field: very rapidly evolving,
constantly updated with news about the latest breakthroughs in the field, descriptions of
the big open problems, people's speculations about how to solve the problems,
descriptions of what was going on in labs, all these kinds of things.
I happened to be present at the workshop at Caltech where this was announced. It was
very interesting to chat with people about what they thought.
A sizable contingent of people were quite hostile to the idea. They said: what a waste of
time, why would anybody ever do that?
A larger group of people, and it's certainly not a random sample, but a larger group
amongst the people I chatted to about the idea, were very excited. You'd have these
interesting conversations: you'd talk for five minutes and they'd say, you could use it to
do this, you could use it to do that, and so on. And you'd get to the end of the five
minutes and ask, well, what are you planning to contribute?
Oh, no, no, no, I don't have time. Gee, I hope somebody else does. Right? It would be
great if such a resource existed, but I don't personally have time to contribute.
And, of course, if enough people repeat that little story, well, it's inevitable that it's not
going to do so well. So it's essentially failed. Well, there are a number of caveats to that;
it's failed in a particular way. It hasn't recruited a large number of people to
collaboratively construct this kind of knowledge base.
Actually, as measured by the number of downloads it's done pretty well. A lot of people
want to get information, but they're not necessarily willing to take any time at all to
contribute. And you can repeat that story across many, many similar attempts to
construct science wikis. A couple more examples: the Knot Atlas, the String Theory Wiki,
and many more.
There have also been a large number of attempts to construct so-called scientific social
networks, kind of the Facebook-for-scientists idea: to connect scientists to other scientists
with complementary interests so they can share data, share code, share ideas.
And a lot of money has been poured into these. There are dozens of them, in fact; here
are just a few. In principle it seems like a very good idea, and some of the sites are quite
nicely implemented.
In practice, if you create an account on such a site, at least on all the ones I've created
an account on, you log in, you look around. It's a virtual ghost town.
Some of the sites actually claim that they have hundreds of thousands or even millions of
members. I don't know what these members are doing. In a couple of cases I know of,
what's happened is that the people developing the site have gone to one of the big
scientific societies, paid for the membership database, run a Perl script over it, and
created accounts. Those members are in fact ghosts. What's going on in both these
cases, and many more I could describe, is obvious, but it's worth spelling out. I'm going
to move away from the slides a bit. Suppose you're a young scientist and, let's say, you
want to get a research job at a major research university.
Even if you think that the Qwiki is the best idea since sliced bread, or one of these
scientific social networks is, the career calculus is not good. Should you spend two or
three hundred hours writing a couple of mediocre scientific papers nobody's ever going to
read, or spend that time making a slew of brilliant contributions to the Qwiki?
No matter how enthused you might be, of course you understand that from the point of
view of your tenure committee you'd be insane to spend the time on the Qwiki; it's not
going to matter.
That would be wasting your time, a phrase I've heard many people use in reference to
these things. Despite the fact that on its scientific merits you might believe that's the
better way to go.
So if you want to get adoption of these kinds of things, it's a very tall order. You need to
actually change the culture of science, change the incentives and change the reward
system in some significant way.
That seems like a very hard problem. I'm going to talk a little bit about how to address
that problem in a second. Before I do, I want to come back to InnoCentive and the
polymath project and talk about how they fit into this picture. Don't they contradict it?
No. They fit in perfectly. What was the polymath project doing?
They were working in an unconventional way towards a conventional end. At the end of
the day, they have written a series of papers, right? And this was discussed right at the
outset; there were discussions about authorship before they had even gotten started.
Very much, people saw this as an unconventional means to a conventional end. Of
course, InnoCentive has an even more conventional end: it's cash.
And that's great. Those projects are terrific. But it does mean that things like the Qwiki,
which are ends in themselves, people are not exploring. So I'll talk about an example
where in fact the culture and incentives have changed in a really dramatic way, and that
actually involves the Human Genome Project. Are there any biologists in the room?
Anybody familiar with the Bermuda principles? Excuse me.
Let me go back to the early 1990s. It's becoming clear that the human genome is going
to be sequenced sooner rather than later. And there's a problem, however, which is,
again: if you're a young molecular biologist, why are you going to take your data and
share it with others? It's kind of the same as the Qwiki. You're not going to take the time
necessary to take your data and upload it to GenBank or one of the big online databases,
because it's not something that is going to be recognized by your peers. You're not going
to list uploads to GenBank on your CV at that particular point in time. Of course,
everybody in the community could see that it would be best if the data in the genome
project was shared. That was obvious. It didn't mean that people wanted to unilaterally
go ahead and do it themselves first. So there was a lot of discussion. I'm going to skip
over some big parts of the story, but a really crucial moment occurred in 1996, when the
Wellcome Trust organized a meeting in Bermuda. Leading representatives from the
Human Genome Project were there. Craig Venter, who would lead the private effort to
sequence the genome, was present. Representatives from the Wellcome Trust were
there. Representatives from the U.S. NIH were there.
And they talked the problem over for several days. They drafted what are now called the
Bermuda principles. What these principles state, essentially, is that if you were working
on the genome and you took some genetic data, you would upload it to GenBank within
24 hours, and the data would go into the public domain. It wasn't a toothless agreement.
The reason why is because the representatives from the grant agencies went back to
those grant agencies and they baked those principles into policy within 12 months.
What that meant was that if you wanted to work, if you wanted to get funding to work on
the genome, you needed to agree to abide by the Bermuda principles.
And so there was a big shift, immediately. Everybody now needed to be playing that
game if they wanted to get cash. And that's a big part of the reason why the human
genome is available today. As I've said, I've oversimplified the story; there were a couple
of other big moments. But that was certainly a big part of it. That's a nice story, but, of
course, the human genetic data, as important as it is, is a tiny fraction of human
knowledge.
Even if you just look in other parts of biology, it's really spotty; depending on what
species you're talking about, the situation may be different. One biologist I was chatting
with commented that he had been sitting on a genome for an entire species for more
than a year.
And that's a whole species of life whose genome is sitting there, kind of rotting away on
his hard disk. And of course this situation is not uncommon. This is a person who is
certainly well known in the open source community for his contributions there.
So I think that's unfortunate. When I give talks, certainly at universities, I frequently ask
people to raise their hand to say whether or not they engage in systematic data sharing.
Not whether, if somebody e-mails you, you'll respond with an Excel spreadsheet or
whatever, but whether you actually engage in systematic data sharing.
Excepting a few particular subfields, the standard response is that maybe 5 percent of
people make some systematic attempt to share their data. Where I'm going with this, of
course, is that it's not just data which is significant. Inside a lot of laboratories there's all
sorts of stuff which is locked up which could be very potentially useful if shared: all kinds
of ideas and questions and scientific code that could potentially be shared which at the
moment is not.
I said that I'd give you two examples where the culture of science has really changed in
this regard. The second example is actually much bigger than the human genome
project. But I need to go all the way back to the dawn of modern science to explain this.
And so, well, I'll start with Galileo. Galileo builds his first astronomical telescope in
December 1609. For whatever reason he doesn't point it at Saturn for a whole seven
months. But on July 25th, 1610, early in the morning, he points it at Saturn for the first
time, as far as we know. What he's expecting to see is a little disk; this is what he's been
seeing as he's pointed it at the other planets. But straightaway, immediately, he sees it's
not as he expected. It's a small disk with little bumps on either side of it.
And what he's seeing is the first-ever hint of the rings of Saturn. His telescope was
actually not quite good enough to resolve the rings; that would have to wait for Huygens
some years later. But straightaway, he knows this is a huge discovery. It's hard to
appreciate now, but at the time our image of the heavens had been almost unchanged
since prehistoric times.
So it was a big discovery at the time. And does Galileo announce this to the world?
No. What he does is he writes down a description of the discovery in his private notes.
And then he scrambles the letters in that description into an anagram. And he writes
letters to four astronomer colleagues, including Kepler, within 24 hours of making the
discovery, each letter including the anagram. What this means is that if, say, Kepler later
announces the same discovery, Galileo can reveal the anagram and get the credit.
But in the meantime, he hasn't revealed anything. All right. I mean, imagine the human
genome had been released as an anagram.
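Galileo's anagram works, in effect, like what we would now call a commitment scheme: publish something that reveals nothing today, but later proves you had the claim all along. Here is a minimal modern sketch using a hash digest in place of scrambled letters; a real scheme would also mix in a random nonce so that short claims can't be brute-forced. The sample sentence is a common English rendering of Galileo's Saturn anagram.

```python
# Galileo's anagram as a commitment scheme: publish a digest that reveals
# nothing now, but later proves you had the discovery all along.
import hashlib

def commit(claim: str) -> str:
    """Publish this digest now; it reveals nothing readable about the claim."""
    return hashlib.sha256(claim.encode("utf-8")).hexdigest()

def verify(claim: str, digest: str) -> bool:
    """Later, reveal the claim; anyone can check it matches the old digest."""
    return commit(claim) == digest

# Galileo could have "mailed" this digest to Kepler in 1610...
digest = commit("I have observed the highest planet in triple form")

# ...and revealed the sentence only once priority was in dispute.
print(verify("I have observed the highest planet in triple form", digest))  # True
print(verify("a rival's different claim", digest))                          # False
```

The point of the analogy is that the anagram, like the digest, lets a scientist claim priority without disclosing anything useful, which is exactly the behavior the journal system later had to overcome.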
Now, this makes it sound like Galileo is a bad guy, but actually it was a sensible response
to the incentives at the time. Leonardo did the same kind of thing. Newton did the same
kind of thing. And Huygens and Hooke: Robert Hooke, of Hooke's law fame, familiar from
high school for most people, revealed Hooke's law as an anagram. It was very common
at the time, because there was no incentive at all to reveal a discovery.
The modern scientist's point of view is to say, well, we solved that problem, we solved it
with the scientific journal system. That's kind of a one-sentence summary of a process
that actually took close to 100 years.
Because scientists were not terribly interested in publishing in journals initially. There
was no link between publication and career success back then. That was something that
had to be constructed in the culture over a period of many decades. The thing that we
take for granted today was actually not at all obvious that that was going to happen.
So let me give you a couple of sample quotes to illustrate this. This is a passage I've
adapted from Marie Boas Hall, who was the biographer of Henry Oldenburg, the editor of
the scientific journal Philosophical Transactions of the Royal Society. He would beg for
information, sometimes writing simultaneously to two competing scientists, on the
grounds that it would be best to tell A what B was doing and vice versa, in the hope of
stimulating both men to more work and more openness.
She has these descriptions of Oldenburg bouncing information backwards and forwards,
insinuating that each person was further ahead than they really were, trying to get the
other one to reveal the gap in information.
And Oldenburg would publish distillations of these letters in the Philosophical
Transactions. This is a guy working really hard to get disclosure.
Another quote. This is from one of the great scholars of the printing press, Elizabeth
Eisenstein. She's got a great chapter in her book where she talks about the use of the
printing press by the early modern scientists. She's utterly amazed and befuddled by the
fact that they didn't want to use it. Here's what she writes: exploitation of the new mass
media, meaning books, was more common among pseudo-scientists than scientists, who
often withheld their work from the press. This is 220 years after Gutenberg. It's not next
week or next year. It's not even the next century. It's 220 years later and scientists don't
want to use it.
So kind of the short answer is to say, well, what caused the transition to the modern
system was somehow establishing this link between publication and career success. But
how did that actually happen? What was the motive force there?
Well, a few things. First of all, the transition actually took decades; some historians have
labeled it the open science revolution. A historian of economics at Stanford named Paul
David has written a lovely 120-page paper where he discusses the reasons for it, and to
boil his 120-page paper down to two words: patron pressure. This is his answer. That's
what caused this link to be established.
So let me break that down for you and give you an example. This is my example,
actually, not his, but it's in the spirit of his paper. It's the example of the moons of Jupiter,
the Galilean moons. This was Galileo's big discovery, made before the rings of Saturn,
and he acted very differently in this case. He didn't send off anagrams or do anything
silly like that. No, he published. Very quickly, actually. And he did so for a very
interesting reason. This is the pamphlet that he published, or the cover of the pamphlet:
Sidereus Nuncius, The Starry Messenger. And if you look, what are the largest letters
here? It's not Galileo. It's Medicea Sidera, the Medicean stars. Why is that? Well,
Galileo was not happy with his living situation at the time, and he immediately wrote to
several potential patrons, including the Medici, and said: I will name these moons after
you publicly if you agree to become my patrons. And the Medici wrote back and said,
sure, why not.
That's how he got the patronage of the Medici. And of course the point here is that
funders often have very different incentives than scientists do. In fact, they have more
incentive for openness than scientists do.
I should say, by the way, that before publishing this he engaged in, I think, about a
six-week-long negotiation, and he printed it at considerable personal expense to himself.
This was very early days. Anyway, funders often have more incentive for openness than
scientists; this was essentially what David believes was the reason for the open science
revolution back in the 1600s.
Step forward to today, and there's of course a real parallel between that story and the
story of the Human Genome Project. At least in part, the human genome is open data
because of funder pressure.
At least, that's not completely the story. There was a lot of will on the part of the scientific
community there. But the actual enforcement mechanism at the end of the day was, to a
great extent, those policies implementing the Bermuda principles. So I would say that
today we should certainly look to the grant agencies to work towards much stronger open
data policies. There are open data policies at the grant agencies, and the Wellcome
Trust has been very good, but we can expand and strengthen those open data policies
so that they apply to a broader range of data, much earlier in the discovery process, and
not just data but many other kinds of scientific knowledge that's presently locked up.
They can work towards open code policies, and not just that, but also help legitimize new
tools by encouraging scientists to submit nonstandard evidence of impact.
So if somebody uploads, you know, a contribution to a site like a wiki, why can't it be
used as evidence of impact if it's of high scientific quality? Or if they upload, for example,
maybe a video to YouTube showing in detail how some scientific protocol is implemented
in the laboratory. Often it's tremendously difficult to replicate science just because of the
lack of that kind of information.
Certainly, talking to many chemists, that's true. I guess I would go so far as to say that
really, as a broad principle, publicly funded science should in fact be open science.
There are obviously some, a number of exceptions to this. There should be exceptions for
confidential and proprietary knowledge. But as a broad general principle, I'd say that
publicly funded science should be open science and that there's a huge amount that can
be done to work towards that world.
The reason why I wrote this book in large part was to help make open science a public
issue. There are two reasons for that. Reason number one is that internal to the
scientific community, I like to see a serious discussion take place about what types of
contribution are valued. It's not just putting papers on your CV but other things that are
valued as well.
Alongside that discussion I think there needs to be a public discussion about what type of
scientific culture we want to support by public money. There's about $100 billion being
spent each year to support publicly funded research around the world, about 39 billion in
the U.S.
And, well, I think the public deserves the best possible system for its money. I'm going to
skip over -- I'm going to skip over just a couple of bits and finish with a little description of
some organizations which are doing some very interesting work in this space. I've
certainly worked with some of these organizations.
The first one is an organization called The Alliance for Taxpayer Access. This is an
organization that's done a great deal to lobby for open data and open access policies.
Mostly in the United States. Probably the biggest success so far in terms of a policy shift
is perhaps the NIH open access policy. So this was a policy that came in 2008.
Basically if you receive money from the NIH, it means that any papers which you produce
with that grant money must go into PubMed within 12 months of being published. So
they become openly accessible within 12 months of being published. The NIH is a
34-billion-dollar-a-year agency. So pretty soon, when people do searches and things
like this, we'll start to see a lot of the scientific literature showing up openly
accessible.
One of the things that they're working on at the moment is the Federal Research Public
Access Act. This is an act that's been kind of hanging around for about five years now. It
came onto the floor of Congress in 2006, when it was sent back to committee. It came
back in 2010 and was sent back to committee again, which is where it's currently languishing.
What the act would do is extend that policy, the NIH open access policy, to all U.S.
federal agencies with budgets of over $100 million a year.
So basically it would mean any federally funded U.S. research would become open
access. They've also done good work with open data and various other things.
If there's one piece of legislation I could wave a magic wand and get passed at the
moment, it would be the Federal Research Public Access Act. So they're doing really good
things. Then there's Creative Commons, which many of you are familiar with. They're best known
for their work with general culture, their licenses, but actually they've done some
interesting work with science as well. One of the things they've done a lot of work doing
is basically going to often large companies and convincing them that actually some of the
data they hold could usefully be made open data. It's pre-competitive; it's not actually in
their pipeline of products they're going to develop in any way. So they've done some
good work convincing them to make that data publicly accessible. And there are many
other organizations which are doing all kinds of interesting work. I've just mentioned a
few here. I won't go into details.
Anyway, with all that said, thank you all very much for your attention. And I'm happy to
take questions. Thank you. [applause]
>>: Earlier on you asked two questions: how to allocate attention when people are at
maximal comparative advantage, and how to reward them for it. Those questions hit all
the keywords to make you think, ah, free market. But then you came to the conclusion
where you said the answer is taxpayer money and public funding. Felt like the two are
in tension.
>> Michael Nielsen: That's an interesting point. So, of course -- it's a complicated
question to give a comprehensive answer to. I will make one observation rather than
give a really comprehensive answer. The observation is just this, that in order for a free
market to function effectively of course there's a need for stable governance structures.
For example, the ability to enforce contract law.
That's one common observation that economists make: if you don't have that ability, the
free market typically will not function. And so one way you can view some of the policy
I talked a little bit about is as a way of starting to create
or expand kind of the reputation economy in science. So it's not just that people are able
to build their reputations by publishing papers, but actually they're able to build it in other
ways as well. Perhaps by publishing code, for example, as a first-class research object
in its own right.
So that's kind of, if you like, a basic infrastructural thing about science. It's a little bit like
the establishment of contract law, to make a somewhat loose
analogy. Hopefully you can see where I'm going with that kind of answer.
So it's a really interesting question, though.
>>: So you talked about discovery. How about technology? For example, in development,
people compete on whose results are better, and government
agencies actually often sponsor a certain kind of challenge, a common database on which
everybody evaluates their technique, so this kind of innovation can be
done in the market, or in a whole organization?
>> Michael Nielsen: Sure. Of course the whole second half of my talk is about -- let me
repeat the question. It's asking: where else can you do these kinds of things? That's, I
think, the short version of your question.
And the second half of my talk is concerned explicitly with a very specific kind of set of
incentives and institutions. It's the institutions that have grown up around basic science.
And so none of that applies in other contexts. Right? Any institutional context you care
to talk about -- so I've talked at quite a few different large companies. Of course, each
institution has its own internal reward system and its own culture.
And so the problem is sort of separate inside each of those companies. I wouldn't want
to engage in sort of broad general -- I was going to say I wouldn't want to engage in
broad generalizations and immediately I want to make one. There has been, of course,
some interesting work done, Creative Commons is an example of kind of legal licenses. I
guess the GPL is the most famous example of this, which can be used to promote open
culture of a particular kind and which seems like -- the copyleft provision in the GPL is a
surprisingly powerful -- I mean, I'm sure you know, it becomes harder and harder over
time to avoid the pull: as GPLed software gets more and more powerful,
ultimately you want to start using it more and more and then you fall under the spell of the
GPL.
It's an interesting kind of a general tool. People certainly often ask me: Is there some
analog of that in open science? And except in really obvious ways like, for example,
when you're talking about code, no, not as far as I can see. There's no similarly powerful
idea.
>>: Can I go back to the first half of the talk and concentrate more on the crowd piece?
I'm not sure how relevant it is, although I think it may be interesting. Do you happen to
know if Zacary Brown, the person with exactly the right skill set, actually hangs out on
InnoCentive, just looking to see if something falls out of the sky or is there a degree of
separation?
>> Michael Nielsen: Interesting question.
>>: They go, oh, pass it along and check it out?
>> Michael Nielsen: I wouldn't absolutely swear to this, but my memory of my e-mail
conversation with him was, no, he's the kind of person who kind of glanced -- basically
InnoCentive will send out an e-mail each week with challenges in areas relevant to you,
and he glanced at them. He didn't spend a huge amount of time. It's two or three years
since I had that e-mail conversation, so I could be misremembering, but I don't think it
was a referral thing, to answer your question.
Interesting question. There's kind of a secondary market in the case of InnoCentive. Actually
it turns out there are other companies who are getting together small groups of solvers, and
they kind of act as organizational glue to attack some of these InnoCentive challenges, and I
bet they're doing what you're describing. They're actually almost certainly systematically
looking for the right experts. It's an interesting question.
>>: So are you seeing a generational difference in sharing? What are the incentives for the
younger generation for sharing?
>> Michael Nielsen: That's an interesting question. So -- I guess the Open
Society Institute, actually the Soros Foundation, got me to go around earlier this year
and I gave 35 talks at different academic institutions.
So I got a bit of a feel at least for how different people will respond. And a really
interesting pattern was very senior scientists were often extremely enthusiastic. Very
junior scientists were often extremely enthusiastic. To the extent that I had people get
very upset with me, which I had a couple, they were often post-docs or untenured faculty.
These are people who are completely subject to the system but have no power
whatsoever to control it.
So I talked to somebody senior in a grant agency, for example, and have a great
conversation because they feel like, yeah, for them policy is a malleable object whereas
for a post-doc, that's the reality they live in and they feel like they have no control over it.
Talking to undergraduates and high school students is very interesting, though, and I
presume this is kind of where your question was going.
They don't know what's going to fail. And it's a fantastic thing for them. There's a very
nice project, Drew Endy now at Stanford started when he was at MIT. Basically they're
trying to build a biological commons. So they have this MIT registry of standard
biological parts. The way they're building this is by running an undergraduate
competition, actually an undergraduate and high school competition. They kind of mail off
these kits to kids all over the world and get them to genetically engineer organisms. They
come together at MIT. The last competition, I think, had 1200 people come to MIT from
all over the world. And they show off these little organisms that they've generated.
All kinds of cool stuff. They have little motors and they build little things that fluoresce in
weird ways and whatnot. It's kind of cool. But the interesting thing they're doing is
they're encouraging those students to contribute the genetic sequences that they're
modifying back into this registry of biological parts so other students in future years can
download them and use them as the basis for their designs.
So there's actually quite a lot of part reuse from year to year. They're gradually getting
more complex. So there the students, they're not responding to the standard incentives
at all. They're just doing what seems cool. And they do some pretty interesting stuff.
>>: So in response to some of the things that you talked about earlier, I presume
[inaudible] because they have some pressure or [inaudible]. Is there any way you can
think of tying that kind of innovation, through this kind of exercise, through
some incentive system -- maybe they care very much about [inaudible] -- to tie them
all together and pick out the best things? Does that help?
>> Michael Nielsen: There's lots of things people can do. The simplest suggestion of all is
for people who are senior in, say, a department, to pin one sentence to a job description
which says: we encourage applicants to submit nontraditional evidence of impact. And
that one sentence would make a big difference. It would make a big difference as well if the
grant agencies did the same thing. So that's kind of a one-sentence answer. It's not a
complete answer at all. But it makes a big difference.
The other thing which, of course, can be done is anything which is used in the evaluation
process at all can be modified. So there's an interesting thing that Google Scholar has
recently started doing, just not very systematically. I don't quite understand the mechanism,
but you'll find blog posts occasionally show up in Google Scholar results. People certainly
use Google Scholar when they're evaluating academic candidates. It would be an
interesting thing if those kinds of results started to actually contribute to people's tenure
cases and things like this.
So I don't think -- there's a conversation to be had. If somebody's blog posts are
contributing to their h-index, some committees are going to be very unhappy about this
and they're going to say we should ignore those results. But it's still an interesting
conversation to have.
Part of what's driven the adoption of preprint culture in physics is the fact that very early
on -- so there's this wonderful site, the preprint server in physics. If you're a physicist,
you start your day by going there to see what new preprints have been uploaded
overnight. Part of what drove that is the fact that very early on, Stanford ran a service called
SPIRES, basically a citation tracking service for the high energy physics community. SPIRES
made an interesting decision: they weren't going to track only published papers in high
energy physics, they were going to include the preprints as well and aggregate the two. If
you had a preprint version and the published version, they'd combine the two sets of
numbers. I've literally sat in hiring meetings where everybody has their laptops out and
they're looking through the SPIRES citation numbers, and you can see what impact the
latest preprint from somebody has had.
It certainly hasn't given preprints the same status as published papers in some respects, but
it gives them some status. That's another example. Any tool which is used to measure, if
you tweak it in those kinds of ways, has an impact. A small impact, but an impact.
>>: One of the other challenges we find with the open data portion on the science
side is: where does the data live, and who pays for it? There's still this challenge
about who's paying for the storage and electricity; in the case of NIH, they're paying for
those. As you start to develop these other disciplines that aren't health-related, some of
these other ones won't have that contribution of funding, so environmental areas
and things like that. It always comes back to, yeah, the data might be available on
somebody's Web server that might be here today and gone tomorrow.
>> Michael Nielsen: Well, actually, that alone is a significant issue. You may be able to pay
for it this year, but does that mean you can pay for it in 10 years or 20 years? Actually, the
preprint server I just mentioned is a good example of this: they struggle with funding. Are all
those preprints going to go away?
Of course, I guess the right thing is to step back and say, irrespective of the political
difficulties you might have in raising the money to contribute to, all right, let's say the
continued upkeep of the Sloan Digital Sky Survey server, which is something that Microsoft
has done a huge amount for, step back and say, well, is it in the best interests of the
scientific community as a whole that that data be made available? Well, clearly: there are
4,000 papers that have been written citing the Sloan data. It's got to be one of the great
kind of contributions in the whole history of astronomy. I talked about Galileo
before, and Sloan is kind of up in the same kind of ballpark. It's a bargain at the current
price that they're spending at Johns Hopkins to do that kind of maintenance. And that's
just a question of having the conversation. I don't think the scientific merits are in any
doubt. They're getting real good value for money.
>>: The other one that also comes up sometimes is -- you mentioned the U.S.
essentially putting in policy as a way of formalizing this. And I wonder if you ran across
the idea of nationalism, where, oh yeah, it's great if we do it in maybe the UK, but then
you have other countries that may not follow suit.
>> Michael Nielsen: That's a funny thing. Some of the scientific social networks have
been built as scientific social networks for people from country X, for this reason. Which,
really, you know -- fortunately, I don't think it's really that serious a problem over the long
run. People are smart. They're not -- there are a few stupid people there, but they're not
that stupid, fortunately. Except for briefly.
Okay. Thank you very much again.
[applause]