Wednesday, March 15. 2017
Zero Days and Cargo Cult Science
I've complained in the past about the lack of rigorous science in large parts of IT security. However there's no lack of reports and publications that claim to provide data about this space.
Recently RAND Corporation, a US-based think tank, published a report about zero day vulnerabilities. Many people praised it, an article on Motherboard quotes people saying that we finally have “cold hard data” and quoting people from the zero day business who came to the conclusion that this report clearly confirms what they already believed.
I read the report. I wasn't very impressed. The data is so weak that I think the conclusions are almost entirely meaningless.
The story that is spun around this report needs some context: There's a market for secret security vulnerabilities, often called zero days or 0days. These are vulnerabilities in IT products that some actors (government entities, criminals or just hackers who privately collect them) don't share with the vendor of that product or the public, so the vendor doesn't know about them and can't provide a fix.
One potential problem of this are bug collisions. Actor A may find or buy a security bug and choose to not disclose it and use it for its own purposes. If actor B finds the same bug then he might use it to attack actor A or attack someone else. If A had disclosed that bug to the vendor of the software it could've been fixed and B couldn't have used it, at least not against people who regularly update their software. Depending on who A and B are (more or less democratic nation states, nation states in conflict with each other or simply criminals) one can argue how problematic that is.
One question that arises here is how common that is. If you found a bug – how likely is it that someone else will find the same bug? The argument goes that if this rate is low then stockpiling vulnerabilities is less problematic. This is how the RAND report is framed. It tries to answer that question and comes to the conclusion that bug collisions are relatively rare. Thus many people now use it to justify that zero day stockpiling isn't so bad.
The data is hardly trustworthy
The basis of the whole report is an analysis of 207 bugs by an entity that shared this data with the authors of the report. It is incredibly vague about that source. They name their source with the hypothetical name BUSBY.
We can learn that it's a company in the zero day business and indirectly we can learn how many people work there on exploit development. Furthermore we learn: “Some BUSBY researchers have worked for nation-states (so
their skill level and methodology rival that of nation-state teams), and many of BUSBY’s products are used by nation-states.” That's about it. To summarize: We don't know where the data came from.
The authors of the study believe that this is a representative data set. But it is not really explained why they believe so. There are numerous problems with this data:
Naturally BUSBY has an interest in a certain outcome and interpretation of that data. This creates a huge conflict of interest. It is entirely possible that they only chose to share that data because they expected a certain outcome. And obviously the reverse is also true: Other companies may have decided not to share such data to avoid a certain outcome. It creates an ideal setup for publication bias, where only the data supporting a certain outcome is shared.
It is inexcusable that the problem of conflict of interest isn't even mentioned or discussed anywhere in the whole report.
A main outcome is based on a very dubious assumption
The report emphasizes two main findings. One is that the lifetime of a vulnerability is roughly seven years. With the caveat that the data is likely biased, this claim can be derived from the data available. It can reasonably be claimed that this lifetime estimate is true for the 207 analyzed bugs.
The second claim is about the bug collision rate and is much more problematic:
“For a given stockpile of zero-day vulnerabilities, after a year, approximately 5.7 percent have been discovered by an outside entity.”
Now think about this for a moment. It is absolutely impossible to know that based on the data available. This would only be possible if they had access to all the zero days discovered by all actors in that space in a certain time frame. It might be possible to extrapolate this if you'd know how many bugs there are in total on the market - but you don't.
So how does this report solve this? Well, let it speak for itself:
Ideally, we would want similar data on Red (i.e., adversaries of Blue, or other private-use groups), to examine the overlap between Blue and Red, but we could not obtain that data. Instead, we focus on the overlap between Blue and the public (i.e., the teal section in the figures above) to infer what might be a baseline for what Red has. We do this based on the assumption that what happens in the public groups is somewhat similar to what happens in other groups. We acknowledge that this is a weak assumption, given that the composition, focus, motivation, and sophistication of the public and private groups can be fairly different, but these are the only data available at this time. (page 12)
Okay, weak assumption may be the understatement of the year. Let's summarize this: They acknowledge that they can't answer the question they want to answer. So they just answer an entirely different question (bug collision rate between the 207 bugs they have data about and what is known in public) and then claim that's about the same. To their credit they recognize that this is a weak assumption, but you have to read the report to learn that. Neither the summary nor the press release nor any of the favorable blog posts and media reports mention that.
If you wonder what the Red and Blue here means, that's also quite interesting, because it gives some insights about the mode of thinking of the authors. Blue stands for the “own team”, a company or government or anyone else who has knowledge of zero day bugs. Red is “the adversary” and then there is the public. This is of course a gross oversimplification. It's like a world where there are two nation states fighting each other and no other actors that have any interest in hacking IT systems. In reality there are multiple Red, Blue and in-between actors, with various adversarial and cooperative relations between them.
Sometimes the best answer is: We don't know
The line of reasoning here is roughly: If we don't have good data to answer a question, we'll just replace it with bad data.
I can fully understand the call for making decisions based on data. That is usually a good thing. However, it may simply be that this is a scenario where getting reliable data is incredibly hard or simply impossible. In such a situation the best thing one can do is admit that and live with it. I don't think it's helpful to rely on data that's so weak that it's basically meaningless.
The core of the problem is that we're talking about an industry that wants to be secret. This secrecy is in a certain sense in direct conflict with good scientific practice. Transparency and data sharing are cornerstones of good science.
I should mention here that shortly afterwards another study was published by Trey Herr and Bruce Schneier which also tries to answer the question of bug collisions. I haven't read it yet, from a brief look it seems less bad than the RAND report. However I have my doubts about it as well. It is only based on public bug findings, which is at least something that has a chance of being verifiable by others. It has the same problem that one can hardly draw conclusions about the non-public space based on that. (My personal tie in to that is that I had a call with Trey Herr a while ago where he asked me about some of my bug findings. I told him my doubts about this.)
The bigger picture: We need better science
IT security isn't a field that's rich of rigorous scientific data.
There's a lively debate right now going on in many fields of science about the integrity of their methods. Psychologists had to learn that many theories they believed for decades were based on bad statistics and poor methodology and are likely false. Whenever someone tries to replicate other studies the replication rates are abysmal. Smart people claim that the majority scientific outcomes are not true.
I don't see this debate happening in computer science. It's certainly not happening in IT security. Almost nobody is doing replications. Meta analyses, trials registrations or registered reports are mostly unheard of.
Instead we have cargo cult science like this RAND report thrown around as “cold hard data” we should rely upon. This is ridiculous.
I obviously have my own thoughts on the zero days debate. But my opinion on the matter here isn't what this is about. What I do think is this: We need good, rigorous science to improve the state of things. We largely don't have that right now. And bad science is a poor replacement for good science.
Recently RAND Corporation, a US-based think tank, published a report about zero day vulnerabilities. Many people praised it, an article on Motherboard quotes people saying that we finally have “cold hard data” and quoting people from the zero day business who came to the conclusion that this report clearly confirms what they already believed.
I read the report. I wasn't very impressed. The data is so weak that I think the conclusions are almost entirely meaningless.
The story that is spun around this report needs some context: There's a market for secret security vulnerabilities, often called zero days or 0days. These are vulnerabilities in IT products that some actors (government entities, criminals or just hackers who privately collect them) don't share with the vendor of that product or the public, so the vendor doesn't know about them and can't provide a fix.
One potential problem of this are bug collisions. Actor A may find or buy a security bug and choose to not disclose it and use it for its own purposes. If actor B finds the same bug then he might use it to attack actor A or attack someone else. If A had disclosed that bug to the vendor of the software it could've been fixed and B couldn't have used it, at least not against people who regularly update their software. Depending on who A and B are (more or less democratic nation states, nation states in conflict with each other or simply criminals) one can argue how problematic that is.
One question that arises here is how common that is. If you found a bug – how likely is it that someone else will find the same bug? The argument goes that if this rate is low then stockpiling vulnerabilities is less problematic. This is how the RAND report is framed. It tries to answer that question and comes to the conclusion that bug collisions are relatively rare. Thus many people now use it to justify that zero day stockpiling isn't so bad.
The data is hardly trustworthy
The basis of the whole report is an analysis of 207 bugs by an entity that shared this data with the authors of the report. It is incredibly vague about that source. They name their source with the hypothetical name BUSBY.
We can learn that it's a company in the zero day business and indirectly we can learn how many people work there on exploit development. Furthermore we learn: “Some BUSBY researchers have worked for nation-states (so
their skill level and methodology rival that of nation-state teams), and many of BUSBY’s products are used by nation-states.” That's about it. To summarize: We don't know where the data came from.
The authors of the study believe that this is a representative data set. But it is not really explained why they believe so. There are numerous problems with this data:
- We don't know in which way this data has been filtered. The report states that 20-30 bugs “were removed due to operational sensitivity”. How was that done? Based on what criteria? They won't tell you. Were the 207 bugs plus the 20-30 bugs all the bugs the company had found or was this already pre-filtered? They won't tell you.
- It is plausible to assume that a certain company focuses on specific bugs, has certain skills, tools or methods that all can affect the selection of bugs and create biases.
- Oh by the way, did you expect to see the data? Like a table of all the bugs analyzed with the at least the little pieces of information BUSBY was willing to share? Because you were promised to see cold hard data? Of course not. That would mean others could reanalyze the data, and that would be unfortunate. The only thing you get are charts and tables summarizing the data.
- We don't know the conditions under which this data was shared. Did BUSBY have any influence on the report? Were they allowed to read it and comment on it before publication? Did they have veto rights to the publication? The report doesn't tell us.
Naturally BUSBY has an interest in a certain outcome and interpretation of that data. This creates a huge conflict of interest. It is entirely possible that they only chose to share that data because they expected a certain outcome. And obviously the reverse is also true: Other companies may have decided not to share such data to avoid a certain outcome. It creates an ideal setup for publication bias, where only the data supporting a certain outcome is shared.
It is inexcusable that the problem of conflict of interest isn't even mentioned or discussed anywhere in the whole report.
A main outcome is based on a very dubious assumption
The report emphasizes two main findings. One is that the lifetime of a vulnerability is roughly seven years. With the caveat that the data is likely biased, this claim can be derived from the data available. It can reasonably be claimed that this lifetime estimate is true for the 207 analyzed bugs.
The second claim is about the bug collision rate and is much more problematic:
“For a given stockpile of zero-day vulnerabilities, after a year, approximately 5.7 percent have been discovered by an outside entity.”
Now think about this for a moment. It is absolutely impossible to know that based on the data available. This would only be possible if they had access to all the zero days discovered by all actors in that space in a certain time frame. It might be possible to extrapolate this if you'd know how many bugs there are in total on the market - but you don't.
So how does this report solve this? Well, let it speak for itself:
Ideally, we would want similar data on Red (i.e., adversaries of Blue, or other private-use groups), to examine the overlap between Blue and Red, but we could not obtain that data. Instead, we focus on the overlap between Blue and the public (i.e., the teal section in the figures above) to infer what might be a baseline for what Red has. We do this based on the assumption that what happens in the public groups is somewhat similar to what happens in other groups. We acknowledge that this is a weak assumption, given that the composition, focus, motivation, and sophistication of the public and private groups can be fairly different, but these are the only data available at this time. (page 12)
Okay, weak assumption may be the understatement of the year. Let's summarize this: They acknowledge that they can't answer the question they want to answer. So they just answer an entirely different question (bug collision rate between the 207 bugs they have data about and what is known in public) and then claim that's about the same. To their credit they recognize that this is a weak assumption, but you have to read the report to learn that. Neither the summary nor the press release nor any of the favorable blog posts and media reports mention that.
If you wonder what the Red and Blue here means, that's also quite interesting, because it gives some insights about the mode of thinking of the authors. Blue stands for the “own team”, a company or government or anyone else who has knowledge of zero day bugs. Red is “the adversary” and then there is the public. This is of course a gross oversimplification. It's like a world where there are two nation states fighting each other and no other actors that have any interest in hacking IT systems. In reality there are multiple Red, Blue and in-between actors, with various adversarial and cooperative relations between them.
Sometimes the best answer is: We don't know
The line of reasoning here is roughly: If we don't have good data to answer a question, we'll just replace it with bad data.
I can fully understand the call for making decisions based on data. That is usually a good thing. However, it may simply be that this is a scenario where getting reliable data is incredibly hard or simply impossible. In such a situation the best thing one can do is admit that and live with it. I don't think it's helpful to rely on data that's so weak that it's basically meaningless.
The core of the problem is that we're talking about an industry that wants to be secret. This secrecy is in a certain sense in direct conflict with good scientific practice. Transparency and data sharing are cornerstones of good science.
I should mention here that shortly afterwards another study was published by Trey Herr and Bruce Schneier which also tries to answer the question of bug collisions. I haven't read it yet, from a brief look it seems less bad than the RAND report. However I have my doubts about it as well. It is only based on public bug findings, which is at least something that has a chance of being verifiable by others. It has the same problem that one can hardly draw conclusions about the non-public space based on that. (My personal tie in to that is that I had a call with Trey Herr a while ago where he asked me about some of my bug findings. I told him my doubts about this.)
The bigger picture: We need better science
IT security isn't a field that's rich of rigorous scientific data.
There's a lively debate right now going on in many fields of science about the integrity of their methods. Psychologists had to learn that many theories they believed for decades were based on bad statistics and poor methodology and are likely false. Whenever someone tries to replicate other studies the replication rates are abysmal. Smart people claim that the majority scientific outcomes are not true.
I don't see this debate happening in computer science. It's certainly not happening in IT security. Almost nobody is doing replications. Meta analyses, trials registrations or registered reports are mostly unheard of.
Instead we have cargo cult science like this RAND report thrown around as “cold hard data” we should rely upon. This is ridiculous.
I obviously have my own thoughts on the zero days debate. But my opinion on the matter here isn't what this is about. What I do think is this: We need good, rigorous science to improve the state of things. We largely don't have that right now. And bad science is a poor replacement for good science.
(Page 1 of 1, totaling 1 entries)