EIGHTH INTERNET GOVERNANCE FORUM
BUILDING BRIDGES-ENHANCING MULTI-STAKEHOLDER COOPERATION FOR GROWTH AND SUSTAINABLE DEVELOPMENT
25 OCTOBER 2013
MEASURING INTERNET FREEDOM
GOOGLE OPEN FORUM
This text is being provided in a rough draft format. Communication Access Realtime Translation (CART) is provided in order to facilitate communication accessibility and may not be a totally verbatim record of the proceedings.
>> MEREDITH WHITTAKER: Hey everyone. We're going to start in just a second. But first, a point of clarity: this session is titled Google Open Forum, and that's an odd Internet Governance Forum naming convention that came post hoc, after we had submitted this presentation. This is actually a joint panel hosted by Citizen Lab and Google Research, looking at measuring Internet rights and openness. That's fascinating, and I hope you are pleasantly surprised if you came for the Google Open Forum.
We're ready to go?
Cool. So thank you everyone for being here. I am delighted to see this turnout and I'm really honoured to be able to present this panel. These are some of the experts who are thinking about this topic, and Citizen Lab especially has really led the way in the type of research that combines the sociological and political aspects of human rights reporting with measurement and research, with understanding the technological components of how networks work and how the Internet functions in our lives. So this is really exciting for me.
I am Meredith Whittaker. I am a programme manager at Google Research and my focus is measurements and open data. So that can be translated into trying to understand what is happening and trying to communicate it in a way that can be verified by others. So you're not taking my word for it. You're looking at what the facts are and you're able to contest my claims and make your own conclusions.
When we're talking about measuring Internet rights and openness, you can think of that in terms of the traditional human rights model. Reporting has always been a part of that: gathering evidence, gathering accounts of human rights abuses, gathering photographic evidence. And as we moved into a time in which a lot of things began to happen online, in which a lot of abuses may have been facilitated by things that happened with networks, the question really became: how do we know? How do we gather the evidence we need to figure out what really happened, to make the case for these types of abuses?
And in that context you can think of measuring, you can think of measurements as really evidence gathering, as a way of looking at these networks, looking at the bits on the wire and deducing from that what was happening and what was the impact on real humans in real times in the case of, you know, rights and openness.
So this is connecting Internet research with human rights reporting, and I have on the panel a number of people who are thinking about this from a number of different angles.
So I want to start with Tim Maurer, who is a policy analyst at the Open Technology Institute, and who can make a clear connection between this type of research, which may seem very arcane to people, and the real policy implications. So Tim?
>> TIM: Thank you, Meredith, and good morning, everybody. My name is Tim, and I'm here to represent the Open Technology Institute, which is part of this collaborative effort.
Just in terms of background information, OTI is a nonpartisan public policy think tank in Washington, D.C., in the United States, and I'm part of the Open Technology Institute's team there.
So the reason why I am on this panel is because I can speak to the importance of open data and measuring the network for the policy work. That is my day-to-day work.
I've noticed since I've been involved in this space that empirical data is essential to inform policy development. As some of you who have been in Washington might know, there is a lot of policy debate that is not necessarily grounded in empirical research, but this is an area where the more data you have, the better your policy recommendations.
To make that more specific, let me give you two examples from the work my colleagues and I have been engaged in over the last year or two. The first is broadband policy. This kind of data can be used to verify whether consumers are actually getting what they pay for: by measuring the network, you can find out whether the service they pay companies for is actually the service they receive. That has direct implications for our policy work, specifically domestic policy in the United States, and it's been a critical tool.
The second: I work specifically on export controls, looking into whether existing export control regulations might need to be updated for the digital age. As you know, a lot of what we have seen coming out of certain countries after the Arab Spring, and a lot of the research the Citizen Lab has done, has shed light on new technologies that are used for surveillance and censorship. One thing the efforts represented on this panel can help with is identifying where those technologies are being used, to then inform the policy recommendations and the analysis of what kind of changes might be necessary. And once those changes have taken place, how do you maintain a continuous research effort that helps policymakers decide what kind of technology they should be looking at? Because the spectrum of technology is so wide, regulations that are overly broad can end up having a negative effect on what you're trying to achieve. So having that empirical data is very important to eliminate negative, unanticipated results.
So that is why the Open Technology Institute has supported Measurement Lab and other people on the panel. M-Lab is a platform that allows totally open source measurement. It's open data. It's a global platform with an open source measurement methodology, which is why we are trying to put it out there as a tool and resource for other researchers who might not be directly linked to the collaborative platform, but who can have access to the data to do their own research. And Collin's work is a prime example of the kind of effect you can have by pursuing data that is then openly available.
>> MEREDITH WHITTAKER: Great.
>> TIM: There are currently 800 terabytes of totally open data available. Some regulators are using it. Researchers are using it. And it's an increasingly valuable resource for analysts like myself.
What is really magic is because the data is openly available, you have other people looking at it and connecting it to their own research. And Collin will talk about that in a bit, and I think that's really a critical example for anyone in the room who might be interested in using similar data to reach out to us later on to find out how we can connect our work.
>> MEREDITH WHITTAKER: Thank you so much. I think that's a great introduction. It really frames why this is important beyond simply publishing research papers, beyond doing research. I'm now going to turn it over to Dominic Hamon, who is a research scientist I work with closely. He is familiar with the Measurement Lab data and network data generally, and I think he can discuss best practices for measurement and frame this in a more technical scope before we move on.
>> DOMINIC HAMON: I'm one of the few technical resources here. Tim did a good job of introducing Measurement Lab. It started out as a way for researchers in academia to do really good global research: collect network performance data, and then access other people's performance data. And what we found in doing that was a number of side benefits we didn't expect, some of which you'll learn about later on the panel.
But I wanted to talk a bit more about some of the best practices for how we make sure people collect data, and why it is important that it is public and open. When policy changes are made in a vacuum, they can be at best misguided, at worst dangerous. When you have policy changes that are informed by data, that can be really powerful. When you have policy changes that are informed by open, public data, now you have something that is very powerful and measurable by others. And when you have policy changes that are informed by public and open data, where the analysis is done using open source tools and the data collection is done using open source software, now you have something that is powerful, measurable and verifiable by others. And that's the best kind of policy change you want to make.
So there are three steps to responsible policy changes. The first is to collect a lot of data before making a policy change. This establishes a baseline for where the world is. Then make the policy change. And then collect data using the same methods and compare it to your baseline. This will tell you whether or not your policy change did what you expected it to do, or whether there were any unexpected side effects.
Now, you can take that methodology and imagine that you're collecting data in a sovereignty that is external to yours. You collect data for a long time, you establish a baseline. And then over time you notice a change in the data compared to the baseline. From that, you can infer that there was a policy change in that sovereignty, even if it wasn't made public. That policy change might be one of censorship, one of throttling, it could be some other human rights violation. That is the power of having open data, open measurement and open analysis.
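[Editor's note: the baseline-then-compare methodology described above can be sketched roughly as follows. This is a minimal illustration with invented numbers, not M-Lab's actual analysis pipeline; the function name and threshold are assumptions for the sketch.]

```python
from statistics import mean, stdev

def detect_shift(baseline, observed, threshold=3.0):
    """Flag a change when the observed mean deviates from the baseline
    mean by more than `threshold` baseline standard deviations."""
    mu, sigma = mean(baseline), stdev(baseline)
    deviation = abs(mean(observed) - mu)
    return deviation > threshold * sigma

# Hypothetical daily throughput samples (Mbps) before and after an
# unannounced policy change in some network.
baseline = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7]
observed = [3.1, 2.9, 3.0, 3.2, 2.8]

print(detect_shift(baseline, observed))  # large drop -> True
```

Real longitudinal studies of course use far more careful statistics (seasonality, sample sizes, per-ISP breakdowns), but the shape of the inference is the same: establish a baseline, then look for deviations from it.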
So a bit more about what I do specifically. I do a number of different projects in network research, but I've spent most of my time with the measurement lab. And a lot of my work is making sure that the data collected is internally consistent, which allows for these kind of longitudinal studies, right? It allows for the long-time baseline studies to happen. Making sure that the platform is broad and stable so that we have global coverage. Establishing mechanisms for easy access to this data. This is 800 terabytes of data. That is an awful lot of data. And it's not always obvious how to start analyzing it, how to start getting at it, how to find the needle in the haystack.
But one of the benefits is that we often don't know what some of the signals in there are, and we rely on other people having access to that data to find those signals. You'll learn about that from Collin specifically.
And the other is to establish good practices for responsible collection and publishing of this data. And what do I mean by that? Well, this is where some tension comes in. Because we want to make sure that this data is comprehensive. We want to make sure that this data is public, and we want to make it open. But when you start collecting data about people's network practices, or their connections from their smartphones, it's very tempting to start collecting data about things like their location. And you might want to assign every user a unique identifier, so that you can track individual devices through the network performance data. This can be very powerful. But it immediately starts infringing on those people's privacy. So we have a constant tension between wanting to get as much data and as good data as possible, but also making sure we're collecting it responsibly, such that we don't expose people to danger.
I think that's what I wanted to say for now.
>> MEREDITH WHITTAKER: Before I pass it to Collin, can you tell me how many iPods is 800 terabytes?
>> A lot. Five thousand.
>> MEREDITH WHITTAKER: That is a lot of iPods.
We are going to move on, and I want to pass it now to Collin Anderson.
And we're narrowing the scope now. We introduced the principles of open measurement. We talked about best practices and why this is so powerful in decision-making. And now Collin is going to talk about some research he did using this data, which, as Dominic emphasized, was initially collected to show network performance: to show researchers who were interested in how to tune and understand the global network how to do that. This was technical data for technical people. And by collecting it, by creating that baseline, it gave Collin the opportunity to do some really stunning research. So I'll pass it to you.
>> COLLIN ANDERSON: So I wanted to start off -- there is another slide that might do more framing than just this article.
>> MEREDITH WHITTAKER: I have only this one.
>> COLLIN ANDERSON: So what is interesting is that States are increasingly sophisticated in the ways that they control networks, compared to the blunter mechanisms you'd see from States in 2009. I generally focus on Iran, so I'll use Iran as a specific example, because Iran has, I think, a strong tradition of being more sophisticated, more aggressive, and in some ways more creative in the ways that it deals with the Internet.
So in 2009 the Iranian government struggled to deal with the popular uprising stemming from allegations of electoral fraud. The immediate response of the Government was to shut down parts of the communications networks. Mobile text messaging, for example, was down for 45 days within the first couple of months. And what you saw over time was that at specific protest moments, networks were shut down. They would be more heavily censored. They would be subject to greater degrees of interference.
In the case of Iran, this became increasingly sophisticated: narrower but more aggressive. So whereas you had that total network shutdown in 2009, in 2010 you would see interference with SSL. You would see interference with the Tor network. You would see DNS hijacking: DNS hijacked to lead to a phishing site for Gmail, using a compromised certificate. This was increasingly sophisticated, and in a sort of morbid way kind of impressive.
And along this narrative, what you also saw, tied to political moments, was this idea that the Internet was slow. I think this is interesting, and pernicious, because it's a poor story, right? It's not tangible. We all have experiences where the Internet at our house is slow. But when you look at this mechanism of speed throttling, it's actually an incredibly aggressive, targeted move. This isn't simply Comcast slowing down my BitTorrent. This is: my morning routine is I wake up and turn on my antifiltering tool. Maybe by the time the tea is done I get connected. Maybe as I make toast I am able to log into Facebook, and by the time my breakfast is fully prepared maybe I can get my news feed up. And so this became the story associated with preplanned protest moments.
So just to back up: we have seen what seems like an accelerating trend of disconnecting the Internet around political events. But I would argue that what happened in Sudan last month, and what happened in Syria a couple of months ago, is going to be a decreasing phenomenon. What you get out of throttling is something that traditionally has been difficult to measure externally. It's a very boring story. And on top of that, it achieves the same purpose, right?
It acknowledges, in this pragmatic way, the modern media landscape: a state can't censor news of violence against protesters or against indigenous populations, but today that violence doesn't exist unless there are media, pictures, videos, and similar things. And all of those things take much more bandwidth to convey than exists under a throttling regime.
And the beautiful thing about Measurement Lab is largely the Network Diagnostic Test, NDT, which is bundled with a number of applications. A very large portion of those 5,000 iPods is NDT data. Because everyone loves BitTorrent, and NDT is bundled with a popular BitTorrent client, you have measurement nodes across the world. In Iran there are something like 60 to 100 tests a day. And in places that a researcher like myself could never go, you find thousands of tests, even on a daily basis.
So what we have now is a mechanism of accountability that was built for relatively nuanced issues of public policy.
And so we can take this measurement data, which initially seemed to be conceived for FCC and European Union type purposes, and we can apply it. And so now we have a daily mechanism to test and detect for throttling.
And the beautiful thing is, this is more than three and a half years of data. So we have a very large portion of the post-2009, post-Green Movement period. We have demonstrable evidence: we can go back and say that from this date, throttling occurred; it was an X percentage decrease; it lasted for this long. And on top of that we can start taking a look at who the privileged people were.
And to take an example, and I'll conclude with this: like I said, Iran actually has a very predictable censorship mechanism. Important dates, popular moments of contention, and, included in this, elections. Especially this election that happened on June 14, which was the first presidential election since the Green Movement.
So you know that something is going to happen. And based off of that, we can sit with this data and we can measure the Iranian Government instituting throttling the day the official candidate list was released, and not relenting until the day after the election results were announced and stability returned.
We can also take this from an advocacy point of view and turn it into real, public, tangible results. This graph, for example, is the electoral period. And what you see is an over 70 percent decrease in aggregate bandwidth. This doesn't begin to show some of the other forms of censorship that occurred, but it is very demonstrable evidence of the control that happened.
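[Editor's note: the kind of calculation behind a graph like the one described can be sketched as below. The numbers and function name are invented for illustration; the published analyses use far more careful statistics over the full NDT dataset.]

```python
from statistics import median

def throttle_drop(pre_window, during_window):
    """Percentage drop in median throughput during a suspected
    throttling window, relative to a pre-event baseline window."""
    pre = median(pre_window)
    during = median(during_window)
    return 100.0 * (pre - during) / pre

# Hypothetical daily median download speeds (kbit/s) for one country,
# before and during an electoral period.
pre_election = [512, 498, 520, 505, 515]
election_period = [140, 150, 145, 160, 138]

print(round(throttle_drop(pre_election, election_period)))  # -> 72
```

Using the median rather than the mean makes the estimate robust to the occasional unthrottled or failed test, which matters when you only have 60 to 100 measurements a day from a country.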
And based off of this, based off of peer-reviewed, cited methodologies, we can take this graph -- and this graph appears, in an alternative form, in the report of the UN Special Rapporteur on human rights in Iran that was released a couple of days ago.
There are something like three or four instances of Internet censorship included in that report. And two are built off of open data sources, with open methodologies, that did exactly what we're talking about.
And on top of this -- last point, sorry. The Iranian Government reacts to every one of these reports very vehemently, because it wants to create doubt in the methodology, in the data collection. In this case, what you can say is: you can run the tests; here is the code; here are the calculations. How can you disprove this data? It's difficult. And this is what my previous two colleagues were speaking of. This is the power of that sort of openness.
>> MEREDITH WHITTAKER: Great Collin, thank you so much. I think that really brings it home and that graph is impressive.
So now I want to move on to a different level of data, another type of data collection. And I'm really happy to introduce Marco Hogewoning from RIPE Atlas (inaudible). He can talk about the way the data that RIPE collected led to accidental revelations about connectivity in Pakistan, and then talk about RIPE and the RIPE Atlas measurement project generally.
I have a video here that may or may not play, and I'm going to try to put it on as a kind of background tableau while Marco talks. But that may not work. So Marco, please take it away and I'll work on this.
>> MARCO: Thank you. As Meredith said, I work for the RIPE NCC, the registry for Europe, the Middle East and parts of Central Asia. In our business we distribute and register IP addresses. Besides that core activity, we have quite a substantial research department, which came into existence out of a bit of curiosity: not only distribute IP addresses, not only register them, but try to see how they are used on the Internet. That turned into a quest of mapping the Internet at an infrastructure level, which started 20 years ago with a project to try to count the number of computers connected to the Internet: the host count.
That is impossible now, so we've got other ways of looking at the Internet. If you picture the Internet as a patchwork blanket of independent networks that all interconnect, what we're primarily looking into and measuring is how these connections are made: how the topology of the Internet is formed and how it changes over time.
So we're not really looking at performance data, and we're not really looking into censorship. That doesn't mean we don't occasionally, through serendipity, see things show up in our measurements that indicate censorship.
Now obvious examples that we see, for instance, are the recent disconnects in Egypt and Syria where we simply see the networks disappear from the Internet topology. They no longer show up in our data.
To give you a bit of background, we keep about three months of data online just to help people troubleshoot network faults, because that's our primary goal: to help people trace topology faults and find out what is happening.
Right now we have just over ten years of this data online, so we can build really long trends. And sometimes after incidents occur, we can go back and try to recreate what happened.
So just to quickly introduce this video: a few years back, the decision was made in Pakistan to block YouTube. That was done. But sometimes people make mistakes. If everything had worked out as planned, this would never have shown up in our data: YouTube would have been cut off in Pakistan and nobody else would have noticed.
Some misconfigurations, both in Pakistan and upstream of Pakistan, caused YouTube to disappear for the whole world. And that delivered interesting data when you later go back and visualize what happened. With that, we can point out where the mistakes, or where we can assume the mistakes, were happening in the configuration.
So, if Meredith will attempt to load this video.
>> MEREDITH WHITTAKER: I don't think I can. But there is a video here, and we can provide a link at the end so you can watch it. I trust Marco's narrative stylings to communicate what happened. It was really interesting.
>> MARCO: Yes, so the picture showed a snapshot of one of our visualization tools. It's kind of tiny, where every number represents a network on the Internet.
To the right, I think, is YouTube, and to the left is Pakistan Telecom. And if you could play the video, which unfortunately we can't, you would see that announcement: Pakistan tried to pretend they were YouTube. That's quite a technical way of filtering something quick and easy: you pretend that you are those IP addresses, and that means all the traffic redirects to you.
That was meant to stay inside Pakistan, but unfortunately it got out to the world. And as it got out, that message spread across the Internet. More and more networks decided: YouTube is in Pakistan, let's go there.
So all traffic that was directed at Google, at YouTube, ended up at Pakistan Telecom, which, A, couldn't handle the load and, B, obviously didn't have YouTube online, causing the rest of the world to panic and probably get bored pretty easily because YouTube was down.
It's a shame that we can't play it, because it's a really nice visualization: you slowly see the world pick up on it. We see Google respond pretty fast in trying to mitigate the error, making several other announcements to the Internet saying no, no, we're really YouTube. And we see some networks pick up on that and some networks don't.
And the end of the story is that somewhere upstream of Pakistan, somebody gets a call -- hey, you made a typo -- corrects the error, and you see Pakistan Telecom pretty much cut off and the rest of the world restored to the original plan, which was YouTube and Google.
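[Editor's note: the mechanics of the incident come down to longest-prefix matching: routers prefer the most specific route they know, so a hijacked, more-specific announcement wins. A toy illustration follows; the prefixes and AS numbers are the ones commonly reported for the 2008 incident and are used here purely for illustration.]

```python
import ipaddress

# Routes known to a router: (prefix, origin network).
routes = [
    (ipaddress.ip_network("208.65.152.0/22"), "YouTube (AS36561)"),
    (ipaddress.ip_network("208.65.153.0/24"), "Pakistan Telecom (AS17557)"),
]

def best_route(addr):
    """Pick the most specific (longest) matching prefix, as routers do
    when choosing among otherwise comparable routes."""
    ip = ipaddress.ip_address(addr)
    matches = [(net, who) for net, who in routes if ip in net]
    return max(matches, key=lambda m: m[0].prefixlen)[1]

# An address inside the more specific /24 is drawn to the hijacker,
# even though the legitimate /22 also covers it.
print(best_route("208.65.153.238"))  # -> Pakistan Telecom (AS17557)
print(best_route("208.65.152.10"))   # -> YouTube (AS36561)
```

This is why the mistake "got out": once the more specific /24 leaked beyond Pakistan, every network that accepted it preferred it over YouTube's own /22, and Google's mitigation was essentially to announce specifics of its own.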
Do you have my other slide?
>> MEREDITH WHITTAKER: I do.
>> MARCO: Okay. That was one thing: topology. Looking at that and at the health of the Internet, we came up with a new plan. This is called RIPE Atlas. There are many atlases in the world; this is ours.
And we have got, and I've brought a few, these tiny devices: little Linux computers that connect to a network, which can be in your home or in your data centre, and run really small measurements. The equipment is not smart enough or big enough to do content analysis, and we can't do performance measurements. But we can do things like ping, make a basic network connection to a Web server, or send out a DNS request.
Our goal is to have one sitting in each of the networks that make up the Internet. There are about 50,000 of them. Right now we have close to 5,000 of these little devices online.
On the map, every dot represents one of these devices. And you can already see a few red ones. Those are the ones that, at the moment we took the snapshot, were not working. It could be a network failure at home, or a power outage, or whatever.
Now, an interesting bit about this network is that while we run some basic measurements to check on the K-root service and try to map topology changes on the Internet, we built this as an open platform. People participating in the project can run their own measurements, and it can be done on an individual basis. If you host one of these machines, one of these little thingies, at home, you collect points, and with those points you can run your own measurements. Partners, people who step in and say "I'll sponsor", also get access to the measurement data; they can run their own measurements and configure their own things.
So you can look into: is this site blocked or not, or what type of DNS response do I get from a particular probe?
All the data is made public, under the condition that if you use our data, the results must be published and publicly available. That is pretty much our baseline. So, people who are interested in hosting a probe: I've got two here and I think I've got a few more back at our booth; please contact me. Other than that, for researchers in the room, have a look at this. We're not really running the measurements ourselves; we are just building the infrastructure. And Dominic and others can show what can be done with this information.
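[Editor's note: one simple thing a probe-style DNS measurement enables is flagging suspected DNS tampering by comparing the answers seen locally against answers from a trusted vantage point. A minimal sketch of that comparison logic is below; the addresses are made up, and real Atlas measurements are far richer.]

```python
def dns_consistent(local_answers, control_answers):
    """Return True if the locally observed A records overlap the
    control set; an empty intersection suggests the response was
    rewritten (for example, pointed at a block page)."""
    return bool(set(local_answers) & set(control_answers))

# Hypothetical measurement: the local resolver returns an address
# that the control vantage point never sees for the same hostname.
control = ["93.184.216.34"]
local = ["10.10.34.36"]  # a typical block-page style answer

print(dns_consistent(local, control))  # -> False
```

In practice you would also account for CDNs returning different legitimate addresses per region, which is exactly why having thousands of vantage points, rather than one control, makes the inference stronger.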
>> MEREDITH WHITTAKER: Wonderful. What I love about the Pakistan story is that it makes such a clear connection between these technical decisions, which .0001 percent of the world would understand or really care about, and their impact on people, right? To do something like blocking YouTube, you have to make one of these technical decisions, and that can be detected. It shows up when you look at the network the way RIPE looks at the network.
Now I'll stay on the topic of Pakistan, but bring it to a more local, activist point of view. This is Shahzad Ahmad, and he is with Bytes for All, as the slide will say. He has been doing measurements in Pakistan and using those measurements to facilitate human rights work on the ground. So I will let him drive the slides and explain this work.
>> SHAHZAD AHMAD: I was beginning to think the world had forgotten that YouTube fiasco. But it is still alive.
Okay. Quickly: Bytes for All is an organisation based in Pakistan. We have been working on Internet censorship issues since 2007. These are a few of the issues we work on.
We have been part of the Citizen Lab's OpenNet Initiative since then, and we have been working on all these different issues. This is how we work: we conduct research to document evidence, do advocacy for policy change, and build the capacity of citizens.
So the next screen is very interesting: how it started. It really started with a panic alarm among Internet users in Pakistan, when suddenly a lot of websites started disappearing in the country.
And then we saw a very visible increase in filtering. Particularly when they blocked Tumblr; that was when there was a sort of panic: what is happening?
So this particular image, "this website is not accessible, surf safely", was what actually triggered this next piece of research. One of the researchers at the Citizen Lab could sense that it was Netsweeper. All the efforts were then directed towards this, because he knew about Netsweeper deployments in other countries. So that is how it started.
So they blocked a few Wikipedia entries in Pakistan, and then proxies also started disappearing. In any case, we were already running a campaign against the YouTube ban in Pakistan: the Access Is My Right campaign. And within the same campaign we were talking about censorship issues as well, and about net freedom and democracy in the country.
So with this background, we started looking at what was happening in Pakistan, and only then did we start country-level activity. We developed a list of URLs which were of national interest, national interest being related to Pakistan: news websites and websites on religion. And then there was a larger list of URLs of international significance as well.
So these two lists were developed, and we then ran field tests with an ONI tool that runs over different Internet connections, on different ISPs, and helps with network measurement.
Using this system, it would actually go to each URL and assess what its status is. And then eventually the results are uploaded to the Citizen Lab's servers, where researchers can do further analysis.
So we came to know through this field testing that this IP, which you can see, is based on the Pakistan Telecommunication Limited network in Karachi. And then we could actually reach the Netsweeper admin panel over here.
So that is how looking at the network analysis and measurement helped us to pinpoint what was happening, and that is how we could assess it.
So what did we do with this report? When we were very sure that this had happened and was happening, the Citizen Lab developed a research brief that was launched in the media, including in Pakistan, with huge coverage.
We are doing public interest litigation against the Federation of Pakistan on two issues. One is Internet freedom, which includes the YouTube ban in Pakistan. YouTube has now been banned for more than a year in the country. So this is one case that we are fighting, and we have had 14 hearings on it already. Just to give you an update: on the 19th of September the court referred this case to a larger bench, because of several issues. It's not very pleasant to talk about, but let's hope that it somehow reaches a pleasant end at some point and the platforms are unblocked.
So when we submitted this particular research to the high court, where the case is being heard, the Government actually rejected the report, saying that it was fake and not acceptable. They said that this organisation does not have the capacity, so how could they develop this; they have made it up and they are maligning Pakistan. And they just ended up throwing it out.
Then they said, why not initiate a case against the petitioner, because he is bringing a bad name to the country. And that would unsettle anyone. But we had very authentic and proper research backing us, with proper data. And luckily the Judge himself had seen this report on the Citizen Lab website, so he dealt with the lawyer accordingly, as he had to. So that was a little interesting episode.
This is the poster for the campaign.
So that is the quick story. I just wanted to say that these kinds of data, research, and analysis are extremely important for effective policy advocacy. Because the situation is changing: many more countries are now heading towards controlling and filtering the Internet. And it's not only repressive regimes. Pakistan is a democratic country and we are proud of it, but still we are doing it; there are repressive regimes as well, and they will do it more.
So it is extremely important that we know of these cases and we know how to go about them. When you go with proper research and face these people -- the policymakers or the judges in court -- it makes your case very strong and it helps in various ways.
Thank you very much.
>> MEREDITH WHITTAKER: Thank you so much, that was great.
That was a -- yes, that really brought it home. And thank you for your work there.
Now we're going to come into the present and really talk about some of the work that Citizen Lab is doing. You know, has been doing for years and has been doing this week.
This is not theoretical. We're going to discuss some of the results from their measurements at the IGF in Bali, looking at network practices here, right now. What are the differences between the networks that we as IGF attendees can access and the networks that are available to people who live here on the ground?
And I'll let Masashi and Daru, who work with Citizen Lab and have been doing these measurements, talk about them, and also give an overview of some of Citizen Lab's work over time, because I think that really helps frame where we are today.
>> MASASHI: Thank you, Meredith. Just to get started, I wanted to zoom out for a second and talk about our general research area, which is trying to understand information controls. And we do this with our partners, such as Bytes For All, Daru, and other members who joined us at the IGF this year.
And just to give a definition of information controls, we consider them broadly as actions conducted in and through communications technologies that either deny, disrupt, manipulate or monitor information for a political end.
So what does that mean? Here is an example of information controls, and we will go through what the categories are. It's not comprehensive, but it shows the diversity of the issues that we're looking at around the world.
So today we have been talking a lot about information denial: Internet filtering, throttling, and other ways it can be done. There are denial of service attacks, and also nontechnical means -- broad regulation, broad use of libel and slander laws in some countries. And the objective of all of these is to deny information from reaching the user. And that can be done through a variety of different means and for a variety of different rationales.
That's not the only kind of information control. There are also controls that seek to manipulate or project information. This can be done by compromising a website, changing its appearance to carry content that might go against the message the site has. If it's the site of an activist group, for example, all of a sudden the message that is there is against one of their campaigns; it can be altered and changed. It can be done on a Government site. It can also be done through online propaganda, whether that's trying to project a message that (inaudible) or social media that is (inaudible). So again, it's not to deny the information, but to manipulate the discourse that is happening in our online sphere.
Another area that we are concerned with and do a lot of research on is information monitoring and surveillance. You can consider two kinds of monitoring. Some are passive -- trying to collect as much information as possible, and the recent NSA revelations shone a light on how that might operate in one country. Others are more directed, such as targeted malware attacks against individuals that compromise their home networks.
While there are multiple controls being exercised, there are also multiple actors that have to be considered. There are States, civil society, terrorist groups, cybercriminals and other groups out there, and also the private sector. And each of these actors has a different place in political, legal, and technical systems. Each of them is trying to assert different agendas and exerting different influences over these systems. So what is important is that to understand these different controls and these different actors, you really have to take a holistic approach and use mixed-methods techniques.
So what we are talking about today, using network measurements and other means of forensics. And what we do in the lab is we try to combine that with social science theories and methodologies and policy analyses.
So Indonesia is a really good example of why you require this extensive research approach in an effort to understand the situation here in terms of information controls.
So just to first give a sense of the technical infrastructure: Indonesia is a little bit complicated. There are over 300 Internet Service Providers here. In the diagram, the nodes in red are Indonesian autonomous systems and the other colors are foreign autonomous systems.
So the middle layer of nodes are Indonesian networks that have upstream connectivity to foreign networks, which are on the top. So the point is that this is very decentralized. In 2012, Renesys put out a blog post noting that due to this decentralization, Indonesia is likely to be extremely resilient to Internet disconnection, which is interesting.
So using the general methodology that Shazad explained -- sorry I clicked the wrong computer. We have been doing network measurements in Indonesia for a long time. This shows the summary of our measurements from 2008 to 2010.
And the takeaway here -- there is a longer paper if you're interested in the methodology, and the data is available as well -- is that just as the technical infrastructure is decentralized, so too are the techniques and means used to filter content. In Indonesia there is a general focus on the filtering of pornography and gambling content. But there are other content filters as well, and we will get into that in a minute.
So without going into the details of this graph, you can just see that across the different ISPs, there are different means of filtering. So just as the infrastructure is decentralized, so too are the controls.
So just to bring it down to where we are here at the IGF, we have been, as Meredith mentioned, running a project this week on trying to monitor information controls in and around the venue itself, looking at policy practices, and the debates that have been shaping various events, and taking this as an opportunity to explore and analyze wider issues around Internet censorship and surveillance in Indonesia.
And I'm very happy today to be joined by my colleague Daru, who will explain how the networks that you all have been using this last week work.
>> DARU: So in this venue, there is an arrangement between the host and the IGF, and an open Internet connection is available and provided.
And the primary -- sorry.
The primary wireless networks are identified as IGF 2013 and IGF 2013-A. And two other networks are IGF 2013.ID and IGF 2013@Indonesia.
And these ISPs are the two largest providers in Indonesia.
>> MASASHI: So we ran some measurements during the week to try to understand how these networks work, and to verify whether the IGF network that Daru explained was free from filtering. And it was. However, the other two, which depend on the two largest Internet service providers, have the same controls as elsewhere in the country. So if you were on the one at the bottom here, and you went to a particular website, your request would resolve to one IP address and you'd be redirected to a block page, which was the Web page for Trust Positif. They have a booth outside. And we will go into the details by talking with Daru about what that means.
So we tested a sample of 1,387 URLs and found that 197 of those URLs were blocked through DNS tampering. A variety of content was blocked: pornography, LGBT and religious content, and other things.
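A DNS-tampering check of the kind just described can be sketched as follows. The block-page address below is a made-up example, and real studies compare the local answers against resolution from an unfiltered control vantage point.

```python
# Sketch of a DNS tampering check: flag a hostname when its local
# answers hit a known block-page IP, or share nothing with answers
# from a trusted control resolver. The block-page address below is
# a hypothetical example, not a real filter address.
import socket

BLOCK_PAGE_IPS = {"203.0.113.7"}  # assumed block-page address

def resolve_all(hostname):
    """All A-record answers the local resolver returns for a name."""
    try:
        infos = socket.getaddrinfo(hostname, 80, proto=socket.IPPROTO_TCP)
        return {info[4][0] for info in infos}
    except socket.gaierror:
        return set()

def is_dns_tampered(local_answers, control_answers):
    """Heuristic: block-page hit, or zero overlap with the control."""
    if local_answers & BLOCK_PAGE_IPS:
        return True   # resolved straight to the filter's page
    return bool(local_answers) and bool(control_answers) \
        and not (local_answers & control_answers)
```

The zero-overlap heuristic can false-positive on CDN-hosted sites, which is one reason real measurement pipelines combine several signals before declaring a URL blocked.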
We will get into the context in a minute. So this is the other network, offered by Indosat. We tested the same sample of URLs and found that 197 were filtered. Again a variety of content: independent media sites, religious content and certain circumvention sites.
So this is just a breakdown of the different content that we found blocked on these networks. As a means of comparison, we also ran measurements on another network, which we did through a 3G connection tethered through a phone. As you can see, the top bar is pornography, and it goes through other categories related to social issues. And you can see a lot of overlap between the ISPs in terms of the focus on pornography.
Here you see political sites -- things dealing with political reform, women's rights, free speech -- with some overlap between these. And these are content related to Internet tools, e-content, e-mail providers, and so on and so forth.
So just to give a sense of the overlap between these: the highly decentralized environment of Indonesia means that there can be variation in how content is filtered between ISPs. However, our results, with just these three ISPs out of the 300 available in this country, do show some variation in filtering. There is a general overlap in terms of the content filtered; there is definitely a focus on pornography. But we also see the blocking of non-pornographic LGBT content. And one other notable area of difference is anonymizer and messaging tools, which we saw more heavily filtered on the (inaudible) network than the IGF one.
So that gives you a sense of the network measurements that we have done -- again, a limited sample of what we have been able to do this week. But technical measurements alone don't tell the whole story. You really have to understand the policy dimensions, the legal dimensions, and what civil society has been doing in the country.
I will turn it over to Daru to talk about the regulatory side.
>> DARU: When we talk about this Internet filtering issue, we have to go back to 2008 and the Electronic Information and Transactions Law, which limited freedom of expression and information privacy.
That law covers content inciting hostility based on race and ethnicity and also religion. And if you look at the blocked content, you will see that LGBT websites and social networks that may fall under this regulation have been blocked.
And another regulation is the anti-pornography law, which was first proposed in October 2008. There was opposition to it from some groups, and it shows the cultural differences here.
The law criminalizes the dissemination and use of pornography, and says that anyone distributing pornography could face up to 12 years in prison and fines of up to 6 billion Rupiah. The anti-pornography law was aggressively implemented by many ISPs, and that was the first stage of DNS filtering. After that came DNS Nawala, and now we have Trust Positif. But because ISPs are decentralized -- we have those two systems and maybe more now -- it's becoming fairly difficult: if you get blocked by one ISP and you want a remedy, it's quite difficult.
>> MASASHI: So by the end of today we'll be publishing reports on three issues: looking at the infrastructure and governance environment of Indonesia, analyzing content controls, and exploring surveillance. So you can find more details there. And I'd love to open the floor to discussion.
>> MEREDITH WHITTAKER: Thank you so much. I think that's a lot to take in. There are many technical terms mixed with some political narrative, so please ask questions to clarify anything. We will put up links to reach the data and watch the video. But I want to again echo Masashi and open the floor to questions after I thank the panel. This was really, really stellar. I'm really grateful to you all for being here after all the hard work that brought you here. And especially to the Citizen Lab, which has really been leading the charge on this type of research, this data-based analysis of ground truth. So thank you.
Does anyone have questions?
I see Ali.
>> ALI: Hi. I'm Ali Banja. We did similar research on Iran, and we realized that information control in Iran is very dynamic. Prior to the election they increased information control.
My question is to the group that looked into information control during the IGF: did you notice a significant change, say from a month ago to this week, or did you see a change in the implementation of controls by these ISPs?
>> MASASHI: So we were just running measurements during the week that we're here. We do have longitudinal data going back to 2010 to compare. And the important thing is just looking at the decentralized network architecture and infrastructure, and the decentralized policy and practice of filtering generally in this country. We just wanted this to be an example for people to take a deeper dive and have a greater awareness of the situation in Indonesia.
>> MEREDITH WHITTAKER: And since this room is a little odd and there is no walking mic, this mic right up here is open to people who aren't in front of the microphone, if you want to stand up over here in front of the microphone, it seems as good a solution as any.
>> Hi. I'm Ashafi from Malaysia. You mentioned that when you took the data to the court, they did not think it was admissible as supporting evidence to support the case. So how do you think the law can narrow the gap? Because maybe the Government or the legislators don't really know about these technologies. How can you use this data to state your case? How do you narrow the gap between the law and the developments in this area?
>> MEREDITH WHITTAKER: I want to actually, I think Shazad you can answer that, and then I want to direct that question to Dominick who thinks a lot about verifiable data and open data and how can you provide something that can be verified?
>> SHAZAD: It was not that there was any difficulty in court. It was actually the Government liar -- I don't know why I continue to say "liar." It's "lawyer." "Lawyer." So it was he who argued that this is not admissible and this is fake and made up.
But the court didn't have any problem with it. So it's actually very important and useful, when you make a case, to have this kind of evidence with you. It only strengthens your point, and that has happened in several cases.
>> DOMINICK HAMON: Yes, I'll just add to that. This is one of the reasons why having the data and the analysis be open and public is vital. Because if you have someone challenging the veracity of the data, if it's open, other independent parties can do similar analysis, and that can only strengthen your case. It doesn't bridge the gap in the way that I think you're looking for, but at least it makes it a little easier to back yourself up with independent parties who can verify what you're saying.
>> MEREDITH WHITTAKER: So before there are any other questions, I just want to say. There are a lot of buzz words. What does "open" mean? Does that mean you can go to a website and read about it? What is "open" versus "closed" data?
>> DOMINICK HAMON: So open data means it is freely and -- I'm trying to -- it's like a game show. I'm trying to say it without saying the word "open." It's really hard.
>> MEREDITH WHITTAKER: Baaa.
>> DOMINICK HAMON: Right. No hesitation. Yes, "public" and "open" are kind of synonymous in this sense: the data is available to anyone to access, to get hold of. I would go further and describe it as the methodologies of collection also being publicly described and available to anyone. It means no barriers to entry. It means uncontrolled. In our case at M-Lab, the Measurement Lab, we specifically do not aggregate the data. It is unaggregated, unfiltered, unprocessed, because that way there is no risk of being accused of manipulation. Again, it strengthens cases when you can show the path of the data from collection to public exposure.
>> MEREDITH WHITTAKER: So replicable science.
>> TIM: Just on the question with regard to the lawyers. I think what this panel would say, especially given the approach that the Citizen Lab has taken, is that it's important to have the data, and to have verifiable data, but you also need that intermediary step of educating people about the utility of that new data and how to interpret it. And the Citizen Lab has been doing that successfully, working with groups in those countries who know the context, to make that translation happen.
But similar to when we saw fingerprinting becoming a tool in criminal law, that will take time, and we can accelerate it by working with and reaching out to lawyers and law schools, making them aware of these new tools and this evidence that is available. But it will take time and, obviously, resources.
>> MEREDITH WHITTAKER: Shazad.
>> SHAZAD: On this note I would also like to mention that there is a lot of talk about surveillance in different countries, particularly as there is a special theme on surveillance here at the IGF as well.
If we didn't have the censorship report that was published by Citizen Lab -- that research -- we would not have been able to get into any of the public interest litigation.
Against the Government of Pakistan, we have specifically lodged a case on surveillance and how it is used. It was totally based on that one report that we could take it forward, and then the Government admit -- sorry, the court admitted it. The process is still not finished on that, and we have a second hearing to come. But that is another example of how research can be used effectively by activists in the country.
>> MEREDITH WHITTAKER: Great.
So we're hearing you mention Netsweeper, and you mentioned FinFisher. And I think this is a great way to bring this back around to some of the work Tim is doing.
What is a FinFisher? What is a Netsweeper? And how do we connect this to maybe a more complicated but more realistic ecosystem?
>> TIM: So this is work that also overlaps with the Citizen Lab's, so please chime in here.
We started looking in a more systematic manner into tools that are used for censorship and surveillance and are now spreading because of the new technology that is available. FinFisher is a good example of software that has been used to spy in countries like Bahrain, but also in countries that are under sanctions, at least in the U.S. context.
In terms of the work we are doing, we are using the work that the Citizen Lab and other groups have produced in analyzing the technology, and then tying that to our policy analysis by looking at what is already part of the export control regime. In the U.S. you have certain provisions with regard to export controls that already allow for review of such technology -- but where are the gaps? And that's where the research, the analyzing and taking apart of what we are talking about, is critically important, because it allows us to look at what kind of language in existing policy already matches the description of that technology, and to what degree there are limits. And Collin might want to add to that, because he has been doing a lot of work on it.
FinFisher was one example. With the measurement data that we talk about here, we now have the ability to go deeper and find out in what countries those technologies occur, and then work out what kind of lists of countries or what kind of indicators we need, so that for, say, these particular countries, we review whether this technology should be used there or not. And I'm happy to talk more offline.
>> MASASHI: Just to add a bit of reference to some of the research that we have done on these tools, particularly around FinFisher: it is marketed as "Governmental IT intrusion" -- that is the way it is described by the company that develops and markets it. They claim that it's only sold to law enforcement agencies and other Governmental agencies for programmes such as, quote unquote, lawful access.
However, as Tim mentioned, we found it directly targeting activists in Bahrain and Bahraini activists who live in the United States. And through some measurement studies, our colleagues have detected the presence of command and control servers -- servers that send commands to clients infected with FinSpy, which is part of the FinFisher suite -- in over 36 countries, including here in Indonesia. And we will speak about that in the blog post that will be published later today.
Importantly, the presence of a command and control server in a country does not necessarily imply that the Government of that country owns or operates FinFisher. However, it opens some very interesting questions. And as Shazad mentioned, we also found a presence in Pakistan. And this is an important aspect of this research: it has to be community driven.
So we are university-based researchers. We can create an evidence layer and help inform people about what is happening out there, but we really depend on our partners and others to take that evidence and run with it, and to ask these questions of their Governments and others: why are these devices being detected on these networks? What are the broader implications for the privacy of citizens in those countries?
I'll just quickly add that Netsweeper is software used for Internet filtering, developed by a company based in Ontario, Canada. We found it of course in Pakistan, but also being used to filter human rights content across countries in the Middle East. And we actually found an installation of Netsweeper on an ISP in Indonesia, which was interesting.
That just further goes to the point of how decentralized filtering practices and techniques are in Indonesia, why you have programs like Trust (inaudible), and why there is a lot of variation between the ISPs.
>> MEREDITH WHITTAKER: I think we might have remote questions and we will take them. But I also wanted to leave this conversation with the question of where FinFisher is coming from. We talked about Pakistan, we talked about Iran, we talked about Indonesia. But these countries are using tools that, as I think Tim and others are well aware, may not come from these countries.
I -- okay.
And then remote questions?
>> REMOTE MODERATOR: So we have Karen Wu here from Malaysia. She asked two questions. The first is for Collin.
I understand that there are 4 terabytes of data collected. My question is: how do you ensure that the data is not compromised, so that the users cannot be identified?
And the second is for Marco. I understand that you have computers that collect data from the Internet to understand more about the Internet. My question is whether the study is limited to public networks, or does it include private networks?
>> COLLIN: So, this goes to what I was saying about the responsible collection of data. We spend a lot of time with the researchers who write the measurement tools that run the experiments and provide the data that we manage at the Measurement Lab. We spend a lot of time with them to make sure that the data they are collecting is anonymous. And this is one of the things I was mentioning earlier about responsible collection. We have had researchers ask us if we would collect and process -- sorry -- and store data which has included problematic data points: geolocation, unique IDs, the -- I can't even think of the whole list. But specifically with mobile data, it's pretty tricky.
And we have had people come to us with very, very long lists, and we have had to turn them down, because we say we are not going to expose users to that level of risk. Once we have put that data out there, we can't take it back, and it's prime for mining, and we don't want to expose people.
The only way we can do it is to work very, very hard to not expose people. But that occasionally limits our ability and limits the things that we can collect, which is the tension I was talking about earlier.
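One common way to honor that constraint, sketched here with assumed field names and salt handling, is to replace raw identifiers with keyed (salted) hashes before storage, so records stay correlatable without exposing the original values.

```python
# Sketch of identifier pseudonymization before storage. The field
# names and salt handling are illustrative assumptions; a real
# pipeline would keep the salt secret and manage its rotation.
import hashlib
import hmac

def pseudonymize(value, salt):
    """Keyed hash: the same value and salt always map to the same
    token, so records can be joined without revealing the value."""
    return hmac.new(salt, value.encode(), hashlib.sha256).hexdigest()[:16]

def scrub_record(record, salt, sensitive=("client_ip", "device_id")):
    """Return a copy of the record with sensitive fields replaced."""
    out = dict(record)
    for field in sensitive:
        if field in out:
            out[field] = pseudonymize(out[field], salt)
    return out
```

Using a keyed hash rather than a plain one matters: without the secret salt, an attacker could hash every possible IP address and reverse the mapping by brute force.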
>> So I think "tension" is the operative word, because there is always a debate as to how much data you are going to collect and at what level of granularity.
Take, for example, NDT. NDT is what I used for the IP addresses. In most cases, IP addresses can lead to the deanonymization of an individual. That graph was from a paper that I had written on this trend, and I took great pains to emphasize, including around the politicization of the NDT data, that the original mechanism was not a politically inspired data point. Which is to say, this was somebody who was not participating in censorship research; this was the normal functionality of a legitimate tool.
Now, the more you get into data collection that is politicized, the more this becomes a larger question, sometimes even larger than the development of the tool. If you look, for example, at the development history of OONI Probe, which is an upcoming and slowly developing censorship assessment framework, this has been at the core of a lot of the development.
And what you start to have to do is fuzz your data to be nonspecific: to say that we're going to report the name of the network, but not necessarily the IP address; or that we might round off the time.
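That kind of fuzzing can be as simple as generalizing each published field, for example as below. The /24 prefix and the one-hour bucket are illustrative policy choices, not a fixed standard.

```python
# Sketch of generalizing fields before publication: report the
# network rather than the host IP, and round timestamps into
# coarse buckets. The prefix length and bucket size are assumed
# policy choices, not a standard.
import ipaddress

def generalize_ip(ip, prefix=24):
    """Map a host address to its containing network."""
    return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False))

def round_timestamp(unix_seconds, bucket=3600):
    """Truncate a Unix timestamp to the start of its bucket."""
    return unix_seconds - (unix_seconds % bucket)
```

The trade-off is exactly the tension discussed above: coarser prefixes and buckets protect users better but make the published measurements less precise.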
And so there is now, I think, a very large set of conversations and papers -- even, to a certain extent, a research field -- on the ethics of data collection, which I think is really interesting to go through. On top of that, I would strongly emphasize that this is a debate in which there is a strong need for challenge at every point where anyone makes a decision on what level of data to collect.
And I would invite everyone in the room to participate in that debate. Because a small number of voices leads to groupthink, which leads to bad decisions and missing things that might potentially be costly.
>> Thank you.
The question was brought up earlier this week already, I think, what is the definition of public Internet? So it's really hard to say where we look at. Let me -- the -- the things we see in our measurements are the networks that interact with other networks. And I think in general that's the public Internet. So we don't see isolated networks by themselves.
>> MEREDITH WHITTAKER: And to follow up on that, I guess something that might be interesting for the audience is to understand why RIPE started the Atlas project. Why was this data initially interesting? And then, you know, it may have had ancillary benefits, like incidentally showing what happened in Pakistan. But what was the original purpose of the data?
>> The original purpose of the data was really to see how topology changes over the Internet, to see breaks, to find and isolate faults. That was one of the goals we started it for. Gradually, it evolved as a system, and we know now that it yields a lot more interesting data. But the primary purpose for us is the technical operation of the Internet: trying to make the Internet more efficient, trying to make it more resilient. That is the core goal of our research.
>> MEREDITH WHITTAKER: Thank you.
Do we have any more questions from the audience? One and then two.
>> I'm (inaudible) from Bangladesh. I have a question. Can you please tell us about or present any future projects on data protection or censorship? Do you have any plans? Do you have any research on that issue?
>> I'm sorry. The question was, do we have plans for a project on data protection and surveillance?
>> The protection of data, or censorship.
Is this kind of process going on at the Citizen Lab, for the protection of data or on censorship?
>> Yes. We focus on protection and surveillance and other issues around data protection and the policies around that. One group I would direct you to as well is one of our partners, Privacy International, which has a network of colleagues around the world doing comparative analysis of data protection and privacy policies. And Collin might be in the room; you can connect with him later.
This is a broad research area that requires other research and areas of expertise.
>> My name is Mahamoo D. (Inaudible)
The question is: when it comes to network measurement, what sort of data do you want to have? What would be the main data set, and what are the challenges to getting the data that you would like to have? (inaudible)
>> MEREDITH WHITTAKER: I think you may have just hit on the question that emphasizes Collin's tension points. Dominick, maybe you can talk about what is the delta between dream data, data that is possible, and then what do we have now in relation to each of those?
>> DOMINICK: So in terms of studying pure network performance, which is where, you know, my focus really is, dream data is every possible bit of information about a device on the Internet and its upstream connectivity.
There are many reasons why we would never push for that, but it would be ideal, because we generally don't know what the factors are that cause network performance issues. Specifically with mobile -- when you are talking about mobile, does the battery level matter? Right?
Which radio is active probably matters. We would like to know what other applications are running at the time the network performance measurements are taken. Are they running specific applications?
And this, as I'm sure any of you can imagine, is fraught with difficulties. Because you start to be able to fingerprint, which is always a problem. But you also start to see much more information about an individual than really is relevant for network performance.
So we are a very long way away from that, which is actually a good thing. But it does limit our ability to do good, or rather excellent, network performance testing.
Was there another part of the question I missed?
>> MEREDITH WHITTAKER: I think that answers it.
We are a very long way from the ideal for that use case.
>> MEREDITH WHITTAKER: Yes. I think Marco and Collin have comments. It seems evident here that we can say a lot about networks with the data that we have. What is missing? What can't we say that we want to be able to say, irrespective of what data that would necessitate? So for the original question, Marco first and then Collin.
>> That is an interesting point and for us that is a trade-off. A lot of people come to us: Can we do this for mobile? Can we run around software? And we specifically choose to run dedicated devices to take our measurements, because that eliminates a lot of the problems that Dominick just mentioned. But it also limits us in where we can measure and what we can see.
As I said, our ultimate goal is to have one of these probes in every network that makes up the Internet.
>> So there are two things that I'd like to see. One is the proliferation of just more data based off of the measurements that we already have. If you look at that graph, you see jitter, and that's because you're stretching the most out of the data you have available. And so the more that you have devices and software applications reporting this data to the network, the more assurance you get that what you are saying is right. And even that graph becomes a little more graphically pleasing.
The second thing I would talk about is I would make reference to something that was quite unprecedented, which is a project called the Internet Census, which was conducted during 2012 using highly unethical mechanisms to basically take a snapshot of the entire Internet over the course of a couple of days.
What you got out of that was a sense of every -- yes, every device constituting the publicly facing Internet in a certain window of time. Based off of that, a lot of solid research came to bear.
For example, a collaboration with Citizen Lab, finding a very -- what I assume is a very large portion of the Blue Coat devices on the Internet, and further demonstrating how lax export controls have led to the flow of these devices to places where US sanctions should have meant they shouldn't be.
So that is one of those mechanisms where you have a large set of data leading to a very solid, very tangible Public Policy outcome, if not discussion.
The problem with the Internet Census is that these sorts of data measurements, first off, were collected unethically, and second off, it shows the fragile nature of the Internet.
And I'm guessing that, more often than for legitimate research, the Internet has been scanned and abused like this before.
Now, for example, there was a very beautiful tool that was put out. I encourage you to look at the Internet Census and also to look at something called ZMap. ZMap allows you to port scan the Internet in approximately 45 minutes. If you go rent a computer, a dedicated server, you can do it in seven hours. Considering that back in the day we used to talk about this sort of thing taking six weeks, that is a pretty good time saver.
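As a rough illustration of what a port scan does, a minimal TCP connect scan in Python might look like the sketch below. This is not ZMap's actual technique; ZMap gets its speed by sending raw SYN packets statelessly rather than completing one handshake at a time, which is exactly why it can cover the IPv4 space in minutes while a naive loop like this would take years.

```python
import socket

def port_open(host, port, timeout=0.5):
    """Try a full TCP handshake; True if something is listening on the port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def scan(host, ports):
    """Return the subset of `ports` that accept a connection."""
    return [p for p in ports if port_open(host, p)]
```

The tension the speaker describes applies even at this toy scale: the same handful of lines that lets a researcher map reachable services also hands an attacker a map of targets.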
This narrows the Internet. The Internet is a much smaller place. The problem is that, more often than not, the Internet is a much smaller place for people who want to break the Internet. And so I think that is one of the "tensions," even if I use that word with a little less nuance than it deserves.
When you show people the Internet, sometimes they want to use that for illegitimate gain.
>> I thought of another answer, which is slightly different, which is I would like to see more data from different areas of the world. One of the things that we struggle with, especially in some of the countries that we're interested in, is a lack of samples, a lack of data, because we rely on client side probes in various countries running these tests. And so for places like North America, we have excellent data. Great coverage. More data than we could ever want to see. We have pattern data down to the minute of the day.
When you are looking at samples in Iran, you are looking at maybe a couple of samples an hour, which is really not enough. Unless you see something like the throttling drop by over 70 percent, where it's fairly conclusive, it's very hard to see subtle changes. So, a lot more data in other areas of the world.
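To make that concrete: with only a couple of samples an hour, about all you can do is compare medians across time windows and flag very large drops. A hypothetical sketch of that idea, where the 70 percent threshold and the sample values are illustrative rather than the panelists' actual method:

```python
from statistics import median

def throttling_drop(before, after):
    """Fractional drop in median throughput between two sample windows."""
    m_before, m_after = median(before), median(after)
    return (m_before - m_after) / m_before

# Illustrative throughput samples in Mbit/s around a suspected event
before = [8.1, 7.9, 8.4, 8.0]
after = [2.0, 2.3, 1.9, 2.2]

drop = throttling_drop(before, after)  # roughly 0.74, i.e. about a 74% drop
is_conclusive = drop > 0.7             # with sparse data, only big drops stand out
```

With dense data you could tighten the threshold and look at sub-hour patterns; with a handful of samples, any change smaller than the natural variance between samples is simply invisible.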
Seeing Atlas spread even further, until we get better coverage of the global Internet, would be a huge thing.
>> MEREDITH WHITTAKER: I have a quick question. What is a client side probe or test?
>> Measurement Lab is a global platform, a bunch of servers running all over the world in various data centres and cupboards. And those servers are just listening for connections. What we have on the other end of that connection are clients. So that could be someone with a laptop running a programme from online. It could be someone running BitTorrent, which is, you know, the other end of a test. And what BitTorrent will do is it will find one of our servers, it will run a test, find out what the throughput is, and then report that to us.
Or there is something called Mogen torrent, which is an Android application that runs on your phone and does speed testing of your connection. So those are all client side tests that run against the server side, which is the suite of servers that we have worldwide.
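A toy version of that client/server exchange, with both ends on one machine, can show the shape of the interaction. This is a hypothetical sketch, not Measurement Lab's actual test protocol: the server streams bytes for a fixed interval, and the client counts what it received to estimate throughput, which is the number it would then report back.

```python
import socket
import threading
import time

def serve_once(sock, payload=b"x" * 65536, duration=0.2):
    """Toy measurement server: stream bytes to one client for `duration` seconds."""
    conn, _ = sock.accept()
    end = time.monotonic() + duration
    while time.monotonic() < end:
        conn.sendall(payload)
    conn.close()

def measure_throughput(host, port):
    """Toy client: download until the server closes, return bits per second."""
    received = 0
    start = time.monotonic()
    with socket.create_connection((host, port)) as conn:
        while chunk := conn.recv(65536):
            received += len(chunk)
    elapsed = time.monotonic() - start
    return received * 8 / elapsed

# Run both ends locally to illustrate the exchange
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
threading.Thread(target=serve_once, args=(srv,), daemon=True).start()
throughput_bps = measure_throughput("127.0.0.1", srv.getsockname()[1])
```

Real tests add a lot on top of this, such as server selection, parallel streams, and reporting the result to a central database, but the core exchange is the same: the client finds a server, moves bytes, and times it.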
>> MEREDITH WHITTAKER: So these are people whose devices send information to the server in different ways?
>> MASASHI: Yes. Let me just add: I think everything that my colleagues have said about rigorous technical measurements and neutral data that is out there is very exciting. But as a social scientist, my dream data set is a little different and perhaps a bit more abstract. I'd like to see a comparative data set of the bidirectional relationship between political events and information controls: one that brings in the rigorous technical measurements that were spoken about, but also looks at the legal and policy environments in which these happen, and at what the impacts of those controls were on political events such as elections and sensitive anniversaries. And looking at a longitudinal analysis of that, how can we understand how Governments reacted to things such as terrorism or dissent in their country, and what is the impact on human rights? And how can we then work with our partners to try to raise awareness around that and try to understand what the counter narratives to those responses can be? And document that with evidence, drawing in both the measurements and the legal policy analysis, and trying to understand the greater implication of this for human rights and International relations.
>> MEREDITH WHITTAKER: We have about five minutes left. I know Collin has a comment he wants to make. Are there any pressing questions from the audience? Then Collin's comment, and then we'll go through closing statements, which are optional. I see a hand over there.
>> AUDIENCE: My name is (inaudible). I'm affiliated with the Citizen Lab through the cyber network. I work on circumvention ideas and tools for the Middle East in particular.
I have come to read the news about uProxy, one of the Google Ideas projects that have come about, in which you allow US users to assist Iranian users. So this appears to be a paradigm shift in having Google more involved in helping human rights activists.
Is this a trend that will continue, and will this move on to other countries around the world?
And another question: do you find that mapping filtered websites, through perhaps a plug-in on a browser or some other solution that would give comprehensive data to many of us around the world trying to understand the phenomenon of filtering, is useful?
>> MEREDITH WHITTAKER: I can attempt to answer the Google-related question. Unfortunately, I work in a fairly narrow field of research. I don't work with Google Ideas, but I am aware of the project.
I think one of the things to emphasize here is that freedom of expression is really important to a number of people at Google. And the uProxy project wasn't actually developed at Google. Support was given to a group of people at the University of Washington, a group that was developing the Lantern project.
And this is an ecosystem of development that has existed for a long time, and it is being supported by people who saw this as an opportunity to support that project. I don't know many more details. But, you know, this was developed in collaboration with a number of people who have been thinking about it for a while.
And then, is there a browser plug-in to measure blocked sites? It could be possible. I'll let anyone who knows a more specific answer to that answer.
>> I think you do that to a certain extent, don't you, with Alaka here?
>> MEREDITH WHITTAKER: Well, it's something to hope for, right?
Great. Is there -- are there any other questions before we go to the panel and let them wrap up?
No? Let's take it away. Maybe, Shazad, do you want to start, and then, Masashi, you can wrap it up.
>> SHAZAD: A lot to say, but just that, I mean, evidence-based research helps activists in the country in the work they are doing, whether it's policy advocacy or campaigning. So it needs to be strengthened much more.
There are many organisations that are focusing on this good work.
>> MEREDITH WHITTAKER: It's optional. But I would like to hear closing statements and any last thoughts before we close out?
>> DARU: For me, at this point, working with the Citizen Lab gives perspective, and also gives collaboration and input to our network here. And we hope others enjoy working with us.
And also, it helps us, as independent researchers or activists or any NGO that is working on human rights issues, to push the Government on transparency and on issues of regulation of the Internet.
>> TIM: And I think my closing remarks focus on you, in that what we are trying to do here is build a database with data that we can then use for a variety of purposes. And use of the data really depends on what your research questions are and what you work on specifically. So if you're a researcher in the room and you have interesting questions, I encourage you all to look at the data that exists, not just the M-Lab data but other research efforts as well. If you're an Internet user, not just an individual Internet user, you can contribute to the effort by being a client that helps us collect data and feeds into those databases. And once we have a comprehensive data set, we have more of an idea of how to address those issues, and then the NGOs and advocacy groups come in as well.
So I encourage you to look at some of the reports that have translated the technical data into policy language that you can use for your work. Our work depends on how many of us come together to support this. This is definitely a long-term effort. So if any of you have further questions, come up to us.
>> MASASHI: I just want to echo what Tim is saying. I see this research effort as requiring a community, and it's a new kind of community. The research has to be diverse. We need rigorous technical measurements. We need to bring in people with technical expertise, like we are trying to do on the panel today. We also need to look at the legal regulatory framework. We need to look at prior law. We need a broad cross-section of social science. We need area studies.
We hosted a workshop last July, which Meredith and Collin and others attended, where we tried to bring together a multi-disciplinary group of researchers looking at information controls. And I think we had something like 22 different disciplines and departments there. And that was really exciting.
But it's not just a research exercise. It's not just something happening in academia or within the private sector. It really requires linkages: how can we turn research output into effective research communications that can have policy impact, as Shazad, Daru and others who are here today have done? And how can we do this together? How can we take the importance of evidence-based research and try to make changes with it? So really trying to bring everyone together in a new community is, I think, vital.
>> And to both echo and expand on that a bit. The other day in a panel, someone from the UK Parliament was saying that they weren't technical at all, so how could they begin to understand some of the technical issues around some of the policies that they were working on? And I think what is clear from this is that there is a lot of work going on by a lot of people who are very interested in this forum, in Internet Governance, who do have technical knowhow, who can provide that expertise, and who are willing to, you know, work with nontechnical people to try and increase the amount of evidence-based work that goes into policy changes and policy work.
So I just hope, if any of you out there can take that away as a message, there are technical people around and we want to see technical influence on policy.
>> Yes, echoing what the previous speaker said, the only reason this research is possible is because of the thousands of volunteers that help us collect the data. And we need more; we need more measurement points, we need more probes out there that put data into the system for these people to analyze.
>> I think that, if I'm going to define success, I should define an ask, because we have given broad sentiments. If you are a developer and you have the opportunity to develop or maintain a platform, a technical platform, this is an opportunity to contribute. And there are a lot of people on this stage who would be happy to help you.
And, secondly, if we define success, it's based off of the localization of efforts. And while this is an International crowd, the lesson that I have is that the less the next iteration of this has an American on it, (inaudible) the more successful it will be.
Having local outcomes, local advocacy, and increasingly even local use of this data defines what open data, free data is, and why it's important.
And so I would stress that anyone in here who is interested in developing a policy outcome in your home country, own this data. Because I think it's a lot more effective if you do it than any International organisation does it.
So I just really want to reiterate that localization is the most important thing for these sorts of efforts.
>> MEREDITH WHITTAKER: Yes.
Well, thank you so much, everyone, for being here. I want to echo all of those statements: Masashi's emphasis on community and the multi-disciplinary approaches to taking this data, taking all of the information we can gather, and painting a picture of what is really going on. Because that's what this is about. This is about having evidence of what the truth and the reality of the Internet and its impact on our lives is. And that, I think, may require some multi-stakeholderism. I threw that in there for you guys with the bingo cards.
And I also want to echo what Dominick and Collin and Marco were saying and turn it around a little bit. There are technical people to help the people who are working on policy. There are technical people who can help explain what the data means. But without your questions, without your requests, without the framing that you bring to the table, none of that will find its meaning.
So we really need the participation of people who are not technical, their voice, their input, their contextual information to create this whole picture.
And with that, thank you so much to everyone for being here and thank you especially to the panel.
(End of session, 10:40)