Thu, 3 Nov 2011

Squinting at Security Drivers and Perspective-based Biases

While doing some thinking on threat modelling I started examining the usual drivers of security spend and controls in an organisation. I've spent time on multiple fronts: security management (been audited, had CIOs push for priorities), security auditing (followed workpapers and audit plans), pentesting (broke in however we could) and security consulting (tried to help people fix stuff), and I've even dabbled in trying to sell some security hardware. This has given me some insight (or at least an opinion) into how people try to justify security budgets, changes and findings, or how I tried to. This is a write-up of what I believe those drivers to be (caveat: this is my opinion). It certainly isn't universal, i.e. it's possible to find unbiased, highly experienced people, but they will still have to fight the tendencies their position puts on them. What I'd want you to take away from this is that we need to move away from using these drivers in isolation, and towards more holistic risk management techniques, of which I feel threat modelling is one (although this entry isn't about threat modelling).

Auditors

The tick box monkeys themselves. They provide a useful function, and are so universally legislated and embedded in best practice that everyone has a few decades of experience being on the giving or receiving end of a financial audit. The priorities audit reports seem to drive are:

  • Vulnerabilities in financial systems. The whole audit hierarchy was created around financial controls, and so sticks close to financial systems when venturing into IT's space. Detailed and complex collusion possibilities will be discussed when approving payments, but the fact that you can reset anyone's password at the helpdesk is sometimes missed, and more advanced attacks like token hijacking are often ignored.
  • Audit house priorities. Audit houses get driven just like anyone else. While I wasn't around for Enron, the reverberations could still be felt years later when I worked at one. What's more, audit houses are increasingly finding revenue coming from consulting gigs and need to keep their smart people happy. This leads to external audit selling "add-ons" like identity management audits (sometimes, they're even incentivised to).
  • Auditor skills. The auditor you get could be an amazing business process auditor but useless when it comes to infosec, and next year it could be the other way around. It's equally possible with internal audit. Thus, the strengths of the auditor will determine where you get nailed the hardest.
  • The rotation plan. This year system X, next year system Y. It doesn't mean system X has gotten better, just that they've moved on. If you spend your year responding to the audit on system Y and ignore X, you'll miss vital stuff.
  • Known systems. External and internal auditors don't know IT's business in detail. There could be all sorts of critical systems (or pivot points) that get ignored because they weren't in the "flow of financial information" spreadsheet.
Vendors

Security vendors are the people the infosec world loves to hate. Thinking of them invokes pictures of greasy salesmen phoning your CIO to ask if your security chumps have even thought about network admission control (true story). On the other hand, if you've ever been part of a small team trying to secure a large org, you'll know you can't do it without automation, and at some point you'll need to purchase some products. Their marketing and sales people get all over the place and end up driving controls; whether it's “management by in-flight magazine”, an idea punted at a sponsored conference, or the result of a sales meeting.

But security vendors' prioritisation of controls is driven by:

  • New Problems. Security products that work eventually get deployed everywhere they're going to be deployed. They continue to bring in income, but the vendor needs a new bright shiny thing they can take to their existing market and sell. Thus, new problems become new scary things that can be used to push product. Think of the Gartner hype curve. Whatever they're selling, be it DLP, NAC, DAM, APT prevention or IPS: if your firewall works more like a switch and your passwords are all "P@55w0rd", then you've got other problems to focus on first.
  • Overinflated problems. Some problems really aren't as big as they're made out to be by vendors, but making them look big is a key part of the sell. Even vendors who don't mean to overinflate end up doing it just because they spend all day thinking of ways to justify (even legitimate) purchases.
  • Products as solutions. Installing a product designed to help with a problem isn't the same as fixing the problem, and vendors aren't great at seeing that (some are). Take patch management solutions: there are some really awesome, mature products out there, but if you can't work out where your machines are, how many there are, or get creds to them, then you've got a long way to go before that product starts solving the problem it's supposed to.
Pentesters

Every year around Black Hat Vegas/Pwn2Own/AddYourConfHere time, a flurry of media reports hits the public and some people go into panic mode. I remember the DNS bug, where all that was needed was for people to apply a patch, but which, due to the publicity around it, garnered significant interest from people it usually wouldn't have, and who probably shouldn't have cared so much. But many pentesters trade on this publicity, and some pentesting companies use it instead of a marketing budget. That's not their only, or even primary, motivation, and in the end things get fixed, new techniques get shared and the world is made a better place. The cynical view, then, is that some of the motivations of vulnerability researchers, and what they end up prioritising, are:

  • New Attacks. This is somewhat similar to the vendors optimising for "new problems", but not quite the same. When Errata introduced Hamster at ToorCon ‘07, I heard tales of people swearing at them from the back. I wasn't there, but I imagine some of the calls were because Layer 2 attacks have been around and well known for over a decade now. Many of us ignored FireSheep for the same reason, even if it motivated the biggest moves to SSL yet. But vuln researchers and the scene aren't interested in the old; it needs to be shiny, new and leet. This focus on the new, and the press it drives, has defenders running around trying to fix new problems when they haven't fixed the old ones.
  • Complex Attacks. Related to the above, a new attack can't be really basic if it's to be well received; it needs to involve considerable skill. When Mark Dowd released his highly complex Flash attack, he was rightly given much kudos. An XSS attack, on the other hand, was initially ignored by many. However, one led to a wide class of prevalent vulns, while the other requires you to be, well, Mark Dowd. This means some of the issues that should be obvious, that underpin core infrastructure, but that aren't sexy, don't get looked at.
  • Shiny Attacks. Some attacks are just really well presented and sexy. Barnaby Jack had an ATM spitting out cash and flashing "Jackpot"; that's cool, and it gets a room packed full of people to hear his talk. Hopefully it led to an improvement in the security of some of the ATMs he targeted, but the vulns he exploited were the kinds of things big banks had mostly resolved already, and how many people in the audience actually worked in ATM security? I'd be interested to see if the con budget from banks increased the year of his talk; even if it didn't, I suspect many a banker went to his talk instead of one covering a more prevalent or relevant class of vulnerabilities their organisation may experience. Something Thinkst says much better here.
Individual Experience

Unfortunately, as human beings, our decisions are coloured by a bunch of things, which cause us to make decisions either influenced or defined by factors other than the reality we are faced with. A couple of those lead us to prioritising different security motives if decision making rests solely with one person:

  • Past Experience. Human beings develop through learning and consequences. When you were a child and put your hand on a stove hot plate, you got burned and didn't do it again. It's much the same every time you get burned by a security incident, or worse, an internal political incident. There's nothing wrong with this, and it's why we value experience; people who've been burned enough times not to let mistakes happen again. However, it does mean time may be spent preventing a past wrong, rather than focusing on the most likely current wrong. For example, one company I worked with insisted on an overly burdensome set of controls to be placed between servers belonging to their security team and the rest of the company network. The reason was a previous incident years earlier, where one of these servers had been the source of a Slammer outbreak. While that network was never again a source of a virus outbreak, their network still got hit by future outbreaks from normal users, via the VPN, from business partners etc. In this instance, past experience was favoured over a comprehensive approach, addressing the symptom rather than the actual problem.
  • New Systems. Usually, the time when the most budget is available to work on a system is during its initial deployment. This is equally true of security, and the mantra is for security to be built in at the beginning. Justifying a chunk of security work on the mainframe that's been working fine for the last 10 years, on the other hand, is much harder, and usually needs to hook into an existing project. The result is that it's easier to get security built into new projects than to force an organisation to make significant “security only” changes to existing systems, and so the older systems that present the vulnerabilities pentesters know and love get fixed less frequently.
  • Individual Motives. We're complex beings with all sorts of drivers and motivations: maybe you want to get home early to spend some time with your kids, maybe you want to impress Bob from Payroll. All sorts of things can lead to a decision that isn't necessarily the right security one. More relevantly, however, security tends to operate in a fairly segmented manner; while some aspects are “common wisdom”, others seem rarely discussed. For example, the way the CISO of Car Manufacturer A and the CISO of Car Manufacturer B set up their controls and choose their focus could be completely different, but beyond general industry chit-chat, there will be little detailed discussion of how they're securing integration to their dealership network. They rely on consultants, who've seen both sides, for that. Even then, one consultant may think that monitoring is the most important control at the moment, while another could think mobile security is it.
So What?

The result of all of this is that different companies and people push vastly different agendas. To figure out a strategic approach to security in your organisation, you need some objective, risk-based measurement that will help you secure stuff in an order that mirrors the actual risk to your environment. While it's still a black art, I believe threat modelling helps a lot here: a sufficiently comprehensive methodology, one that takes into account all of your infrastructure (or at least admits the existence of risk contributed by systems outside of a “most critical” list) and includes the valid perspectives described above, tries to provide an objective version of reality that isn't as vulnerable to any single one of the biases described above.

Fri, 28 Oct 2011

Metricon 2011 Summary

[I originally wrote this blog entry on the plane returning from BlackHat, Defcon & Metricon, but forgot to publish it. I think the content is still interesting, so, sorry for the late entry :)]

I've just returned after a 31hr transit from our annual US trip. Vegas, training, Blackhat & Defcon were great; it was good to see friends we only get to see a few times a year, and to make new ones. But on the same trip, the event I most enjoyed was Metricon. It's a workshop held at the Usenix Security conference in San Francisco, run by a group of volunteers from the security metrics mailing list and originally sparked by Andrew Jaquith's seminal book Security Metrics.

There were some great talks and interactions, the kind you only get at small gatherings around a specific set of topics. It was a nice break from the offensive sec of BH & DC to listen to a group of defenders. The talks I most enjoyed (they were all recorded, bar a few private talks) were the following:

Wendy Nather — Quantifying the Unquantifiable, When Risk Gets Messy

Wendy looked at the bad metrics we often see, and provided some solid tactical advice on how to phrase (for input) and represent (for output) metrics. As part of that arc, she threw out more pithy phrases than even the people in the room tweeting could keep up with: from introducing a new unit for measuring attacker skill, "Mitnicks", to practical experience such as how a performance metric phrased as 0-100 had sysadmins aiming for 80-90, but inverting it had them aiming for 0 (her hypothesis is that school taught us that 100% is rarely achievable). Frankly, I could write a blog entry on her talk alone.

Josh Corman - "Shall we play a game?" and other questions from Joshua

Josh tried to answer the hard question of "why isn't security winning". He avoided the usual complaints and had some solid analysis that got me thinking. In particular the idea of how PCI is the "No Child Left Behind" act for security, which not only targeted those that had been negligent, but also encouraged those who hadn't to drop their standards. "We've huddled around the digital dozen, and haven't moved on." He went on to talk about how controls decay as attacks improve, but our best practice advice doesn't. "There's a half-life to our advice". He then provided a great setup for my talk "What we are doing, is very different from how people were exploited."

Jake Kouns - Cyber Liability Insurance

Jake has taken security to what we already knew it was: an insurance sale ;) Jokes aside, Jake is now a product manager for cyber-liability insurance at Markel. He provided some solid justifications for such insurance, and opened my eyes to the fact that it is now here. The current pricing is pretty reasonable (e.g. $1500 for $1 million in cover). Most of the thinking appeared to target small to medium organisations that until now have only really had "use AV & pray" as their infosec strategy, and I'd love to hear some case studies from large orgs that are using it & have claimed. He also spoke about how it could become a "moral hazard", where people choose to insure rather than implement controls, and about the difficulties the industry could face, though right now those work as incentives for us (the cost of auditing a business would be more than the insurance is worth). His conclusion, which seemed solid, is: why spend $x million on the "next big sec product" when you could spend less & get more from insurance? Lots of questions left, but it looks like it may be time to start investigating.

Allison Miller - Applied Risk Analytics

I really enjoyed Allison and Itai's talk. They looked at practical methodologies for developing risk metrics and coloured them with great examples. The process they presented was the following:

  1. Target - You need to figure out what you want to measure. Allison recommended aiming for "yes/no" questions rather than more open-ended questions such as "Are we at risk?"
  2. Find Data, Create Variables - Once you know what you're trying to look at, you need to find appropriate data, and work out what the variables are from it.
  3. Data Prep - "Massaging the data" tasks such as normalising, getting it into a computable format etc.
  4. Model Training - Pick an algorithm, send the data through it and see what comes out. She suggested using a couple, and pitting them against each other.
  5. Assessment - Check the output, what is the "Catch vs False Positive vs False Negative" rate. Even if you have FP & FNs, sometimes, weighting the output to give you one failure of a particular type could still be useful.
  6. Deployment - Building intelligence to take automated responses once the metric is stable
The example they gave was looking for account takeovers stemming from the recently released e-mail/password combo dumps. Itai took us through each step and showed us how they were eventually able to automate the decision making off the back of a solid metric.
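As a rough illustration of the assessment step (5), the snippet below computes catch, false-positive and false-negative rates from a handful of labelled outcomes; the accounts and verdicts are invented for the example, not taken from their talk.

# Toy numbers for the assessment step: labelled outcomes vs the model's verdicts.
samples = [
  { :account => "a1", :takeover => true,  :flagged => true  },  # caught
  { :account => "a2", :takeover => true,  :flagged => false },  # false negative
  { :account => "a3", :takeover => false, :flagged => true  },  # false positive
  { :account => "a4", :takeover => false, :flagged => false },  # correctly ignored
]

caught    = samples.select { |s| s[:takeover] && s[:flagged] }.size
misses    = samples.select { |s| s[:takeover] && !s[:flagged] }.size
false_pos = samples.select { |s| !s[:takeover] && s[:flagged] }.size
takeovers = samples.select { |s| s[:takeover] }.size

puts "catch rate: #{caught.to_f / takeovers}"                      # => 0.5
puts "false negatives: #{misses}, false positives: #{false_pos}"   # => 1, 1

Weighting these failure types differently (as they suggested) is then just a matter of deciding which error is cheaper for your business to absorb.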

Conclusion

I found the conference refreshing, with a lot of great advice (more than the little listed above). Too often we get stuck in the hamster wheels of pain, and it's nice to think we may be able to slowly take a step off. Hopefully we'll be back next year.

Wed, 8 Jun 2011

Threat Modeling vs Information Classification

Over the last few years there has been a popular meme talking about information-centric security as a new paradigm over vulnerability-centric security. I've long struggled with the idea of information-centricity being successful, and in replying to a post by Rob Bainbridge, I quickly jotted some of those problems down.

In pre-summary, I'm still sceptical of information-classification approaches (or information-led control implementations) as I feel they target a theoretically sensible idea, but not a practically sensible one.

Information gets stored in information containers (to borrow a phrase from Octave) such as databases or file servers. These need to inherit a classification based on the information they store. That's easy if it's a single-purpose DB, but what about a SQL cluster (used to reduce processor licenses) or even end-user machines? These get moved up the classification chain because they may store some sensitive info, even if they spend the majority of their time pushing not-very-sensitive info around. In the end, the hoped-for cost-saving-and-focus-inducing prioritisation doesn't occur and you end up having to deploy a significantly higher level of security to most systems. Potentially, you could radically re-engineer your business to segregate data into separate networks, as some PCI de-scoping approaches suggest, but, apart from being a difficult job, this tends to counter many of the business benefits of data and system integrations that led to the cross-pollination in the first place.
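As a toy illustration of that inheritance problem (my own sketch, not something from Octave or Rob's post): if a container's classification is simply the highest classification of anything it stores, a single sensitive database drags the whole shared cluster up with it.

# One possible (naive) inheritance rule: a container is classified at the
# level of the most sensitive data it holds. Levels and contents are invented.
LEVELS = { :public => 0, :internal => 1, :confidential => 2, :secret => 3 }

def container_classification(stored_data)
  stored_data.max { |a, b| LEVELS[a] <=> LEVELS[b] }
end

sql_cluster = [:public, :internal, :secret]   # one sensitive DB on a shared cluster
puts container_classification(sql_cluster)    # => secret: the whole cluster moves up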

Next up, I feel this fails to take cognisance of what we call "pivoting": the escalation of privileges by moving from one system or part of a system to another. I've seen situations where the low-criticality network monitoring box is what ends up handing out the domain administrator password. It had never been part of internal or external audit scope, none of its vulns showed up on your average scanner, it had no sensitive info etc. Rather, I think we need to look at physical, network and trust segregation between systems first, and then data. It would be nice to go data-first, but DRM isn't mature (read: simple & widespread) enough to provide us with those controls.

Lastly, I feel information-led approaches often end up missing the value of raw functionality. For example, a critical trade execution system at an investment bank could have very little sensitive data stored on it, but the functionality it provides (i.e. being able to execute trades using that bank's secret sauce) is hugely sensitive and needs to be considered in any prioritisation.

I'm not saying I have the answers, but we've spent a lot of time thinking about how to model how our analysts attack systems and whether we could "guess" the results of multiple pentests across an organisation systematically, based on the inherent design of its network, systems and authentication. The idea is to use that model to drive prioritisation, or at least a testing plan. This is probably more closely aligned with the idea of a threat-centric approach to security, and it suffers from a lack of data in this area (I've started some preliminary work on incorporating VERIS metrics).

In summary, I think information-centric security fails in three ways: by providing limited prioritisation due to the high number of shared information containers in IT environments, by not incorporating how attackers move through a network, and by ignoring business-critical functionality.

Sun, 29 May 2011

Incorporating cost into appsec metrics for organisations

A longish post, but this wasn't going to fit into 140 characters. It's an argument pertaining to security metrics, with a statement that using purely vulnerability-count-based metrics to talk about an organisation's application (in)security is insufficient, and it suggests an alternative approach. Comments welcome.

Current metrics

Metrics and statistics are certainly interesting (none of those are infosec links). Within our industry, Verizon's Data Breach Investigations Report (DBIR) makes a splash each year, and Veracode are also receiving growing recognition for their State of Software Security (SOSS). Both are interesting to read and contain much insight. The DBIR specifically examines and records metrics for breaches, a post-hoc activity that only occurs once a series of vulnerabilities have been found and exploited by ruffians, while the SOSS provides insight into the opposing end of a system's life-cycle by automatically analysing applications before they are put into production (in a perfect world... no doubt they also examine apps that are already in production). Somewhat tangentially, Dr Geer wrote recently about a different metric for measuring the overall state of Cyber Security; we're currently at a 1021.6. Oh noes!

Apart from the two bookends (SOSS and DBIR), other metrics are also published.

From a testing perspective, WhiteHat releases perhaps the most well-known set of metrics for appsec bugs, and in years gone by, Corsaire released statistics covering their customers. Also in 2008, WASC undertook a project to provide metrics with data sourced from a number of companies, however this too has not seen recent activity (last edit on the site was over a year ago). WhiteHat's metrics measure the number of serious vulnerabilities in each site (High, Critical, Urgent) and then slice and dice this based on the vulnerability's classification, the organisation's size, and the vertical within which they lie. WhiteHat is also in the fairly unique position of being able to record remediation times with a higher granularity than appsec firms that engage with customers through projects rather than service contracts. Corsaire's approach was slightly different; they recorded metrics in terms of the classification of the vulnerability, its impact and the year within which the issue was found. Their report contained similar metrics to the WhiteHat report (e.g. % of apps with XSS), but the inclusion of data from multiple years permitted them to extract trends from their data. (No doubt WhiteHat have trending data, however in the last report it was absent). Lastly, WASC's approach is very similar to WhiteHat's, in that a point in time is selected and vulnerability counts according to impact and classification are provided for that point.

Essentially, each of these approaches uses a base metric of vulnerability tallies, which are then viewed from different angles (classification, time-series, impact). While the metrics are collected per-application, they are easily aggregated into organisations.

Drawback to current approaches

Problems with just counting bugs are well known. If I ask you to rate two organisations, the Ostrogoths and the Visigoths, on their effectiveness in developing secure applications, and I tell you that the Ostrogoths have 20 critical vulnerabilities across their applications while the Visigoths only have 5, then without further data it seems that the Visigoths have the lead. However, if we introduce the fact that the Visigoths have a single application in which all 5 issues appear, while the Ostrogoths spread their 20 bugs across 10 applications, then it's not so easy to crow for the Visigoths, who average 5 bugs per application as opposed to the Ostrogoths' 2. Most reports take this into account, and report on the percentage of applications that exhibit a particular vulnerability (also seen as the probability that a randomly selected application will exhibit that issue). Unfortunately, even taking into account the number of applications is not sufficient; an organisation with 2 brochure-ware sites does not face the same risk as an organisation with 2 transaction-supporting financial applications, and this is where appsec metrics start to fray.

In the extreme edges of ideal metrics, the ability to factor in chains of vulnerabilities that individually present little risk, but combined are greater than the sum of their parts, would be fantastic. This aspect is ignored by most (including us), as a fruitful path isn't clear.

Why count in the first place?

Let's take a step back and consider why we produce metrics: with the amount of data floating around, it's quite easy to extract information and publish, thereby earning a few PR points. However, are the metrics meaningful? The quick test is to ask whether they support decision making. For example, does it matter that external attackers were present in an overwhelming number of incidents recorded in the DBIR? I suspect that this is an easy "yes", since this metric justifies shifting priorities to extend perimeter controls rather than rolling out NAC.

One could just as easily claim that absolute bug counts are irrelevant and that they need to be relative to some other scale; commonly the number of applications an organisation has. However in this case, if the metrics don't provide enough granularity to accurately position your organisation with respect to others that you actually care about, then they're worthless to you in decision making. What drives many of our customers is not where they stand in relation to every other organisation, but specifically their peers and competitors. It's slightly ironic that oftentimes the more metrics released, the less applicable they are to individual companies. As a bank, knowing you're in the top 10% of a sample of banking organisations means something; when you're in the highest 10% of a survey that includes WebGoat clones, the results are much less clear.

In Seven Myths About Information Security Metrics, Dr Hinson raises a number of interesting points about security metrics. They're mostly applicable to security awareness, however they also carry across into other security activities. At least two serve my selfish needs, so I'll quote them here:

Myth 1: Metrics must be “objective” and “tangible”

There is a subtle but important distinction between measuring subjective factors and measuring subjectively. It is relatively easy to measure “tangible” or objective things (the number of virus incidents, or the number of people trained). This normally gives a huge bias towards such metrics in most measurement systems, and a bias against measuring intangible things (such as level of security awareness). In fact, “intangible” or subjective things can be measured objectively, but we need to be reasonably smart about it (e.g., by using interviews, surveys and audits). Given the intangible nature of security awareness, it is definitely worth putting effort into the measurement of subjective factors, rather than relying entirely on easy-to-measure but largely irrelevant objective factors. [G Hinson]

and

Myth 3: We need absolute measurements

For some unfathomable reason, people often assume we need “absolute measures”—height in meters, weight in pounds, etc. This is nonsense!
If I line up the people in your department against a wall, I can easily tell who is tallest, with no rulers in sight. This yet again leads to an unnecessary bias in many measurement systems. In fact, relative values are often more useful than absolute scales, especially to drive improvement. Consider this for instance: “Tell me, on an (arbitrary) scale from one to ten, how security aware are the people in your department? OK, I'll be back next month to ask you the same question!” We need not define the scale formally, as long as the person being asked (a) has his own mental model of the processes and (b) appreciates the need to improve them. We needn't even worry about minor variations in the scoring scale from month to month, as long as our objective of promoting improvement is met. Benchmarking and best practice transfer are good examples of this kind of thinking. “I don't expect us to be perfect, but I'd like us to be at least as good as standard X or company Y.” [G Hinson]

While he writes from the view of an organisation trying to decide whether their security awareness program is yielding dividends, the core statements are applicable to organisations seeking to determine the efficacy of their software security program. I'm particularly drawn by two points: the first is that intangibles are as useful as concrete metrics, and the second is that absolute measurements aren't necessary; comparative ordering is sometimes enough.

Considering cost

It seems that one of the intangibles that currently published appsec metrics don't take into account is cost to the attacker. No doubt behind each vulnerability's single impact rating are a multitude of contributing factors, one of which may be something like "Complexity" or "Ease of Exploitation". However, measuring effort in this way is qualitative and only used as a component in the final rating. I'm suggesting that cost (interchangeable with effort) be incorporated into the base metric used when slicing datasets into views. This will allow you to understand the determination an attacker would require when facing one of your applications. Penetration testing companies are in a unique position to provide this estimate; a tester unleashed on an application project is time-bounded and throws their experience and knowledge at the app. At the end, one can start to estimate how much effort was required to produce the findings and, over time, gauge whether your testers are increasing their effort to find issues (stated differently: do they find fewer bugs in the same amount of time?). If these metrics don't move in the right direction, then one might conclude that your security practices are also not improving (providing material for decision making).

Measuring effort, or attacker cost, is not new to security but it's mostly done indirectly through the sale of exploits (e.g. iDefence, ZDI). Even here, effort is not directly related to the purchase price, which is also influenced by other factors such as the number of deployed targets etc. In any case, for custom applications that testers are mostly presented with, such public sources should be of little help (if your testers are submitting findings to ZDI, you have bigger problems). Every now and then, an exploit dev team will mention how long it took them to write an exploit for some weird Windows bug; these are always interesting data points, but are not specific enough for customers and the sample size is low.

Ideally, any measure of an attacker's cost can take into account both time and their exclusivity (or experience), however in practice this will be tough to gather from your testers. One could base it on their hourly rate, if your testing company differentiates between resources. In cases where they don't, or you're seeking to keep the metric simple, then another estimate for effort is the number of days spent on testing.

Returning to our sample companies: if the 5 vulnerabilities exposed in the Visigoths' application each required, on average, a single day to find, while the Ostrogoths' 20 bugs averaged 5 days each, then the effort required of an attacker is minimised by choosing to target the Visigoths. In other words, one might argue that the Visigoths are more at risk than the Ostrogoths.
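As a back-of-the-envelope sketch of that comparison (the Visigoth/Ostrogoth figures are invented, as above), with findings-per-day as the base metric and its inverse as attacker effort:

# Invented figures: findings and total testing days for each organisation.
orgs = {
  "Visigoths"  => { :findings => 5,  :testing_days => 5   },  # ~1 day per bug
  "Ostrogoths" => { :findings => 20, :testing_days => 100 },  # ~5 days per bug
}

orgs.each do |name, o|
  rate   = o[:findings].to_f / o[:testing_days]  # findings per testing day
  effort = 1 / rate                              # average days to find one issue
  puts "#{name}: #{rate} findings/day, ~#{effort} days of attacker effort per finding"
end
# Visigoths come out at 1.0 findings/day (cheap to attack), Ostrogoths at 0.2.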

Metricload, take 1

In our first stab at incorporating effort, we selected an estimator of findings-per-day (or finding rate) to be the base metric against which the impact, classification, time-series and vertical attributes would be measured. From this, it's apparent that, subject to some minimum, the number of assessments performed is less important than the number of days worked. I don't yet have a way to answer what the minimum number of assessments should be, but it's clear that comparing two organisations where one has engaged with us 17 times and the other once, won't yield reliable results.

With this base metric, it's then possible to capture historical assessment data and provide both internal-looking metrics for an organisation as well as comparative metrics, if the testing company is also employed by your competitors. Internal metrics are the usual kinds (impact, classification, time-series), but the comparison option is very interesting. We're in the fortunate position of working with many top companies locally, and are able to compare competitors using this metric as a base. The actual ranking formula is largely unimportant here. Naturally, data must be anonymised so as to protect names; one could provide the customer with their rank only. In this way, the customer has an independent notion of how their security activities rate against their peers', without embarrassing the peers.

Inverting the findings-per-day metric provides the average number of days to find a particular class of vulnerability, or impact level. That is, if a client averages 0.7 High or Critical findings per testing day, then on average it takes us about 1.4 days of testing to find an issue of great concern, which is an easy way of expressing the base metric.

What, me worry?

Without doubt, the findings-per-day estimator has drawbacks. For one, it doesn't take into consideration the tester's skill level (though this is also true of all published appsec metrics). It could be extended to include things like hourly rates, which indirectly measure skill. Also, the metric does not take into account the functionality exposed by the site; if an organisation has only brochure-ware sites, then it's unfair to compare them against transactional sites. This is mitigated at analysis time by comparing against peers rather than the entire sample group and also, to a degree, in the scoping of the project, as a brochure-ware site would receive minimal testing time if scoped correctly.

As mentioned above, a minimum number of assessments is needed before the metric is reliable; this hints at the deeper problem that randomly selected project days are not independent. An analyst stuck on a 4-week project is focused on a very small part of the broader organisation's application landscape. We counter this bias by including as many projects of the same type as possible.

Thought.rand()

If you can tease it out of them, finding rates could be an interesting method of comparing competing testing companies; ask "when testing companies of our size and vertical, what is your finding rate?", though there'd be little way to verify any claims. Can you foresee a day when testing companies advertise using their finding rate as the primary message? Perhaps...

This metric would also be very useful to include in each subsequent report for the customer, with every report containing an evaluation against their long-term vulnerability averages.

Field testing

Using the above findings-per-day metric as a base, we performed an historical analysis for a client on work performed over a number of years, with a focus on answering the following questions for them:
  1. On average, how long does it take to find issues at each Impact level (Critical down to Informational)?
  2. What are the trends for the various vulnerability classes? Does it take more or less time to find them year-on-year?
  3. What are the Top 10 issues they're currently facing?
  4. Where do they stand in relation to anonymised competitor data?
In preparation for the exercise, we had to capture a decent number of past reports, which was the most time-consuming part. What this highlighted for us was how paper-based reports and reporting are a serious hindrance to extracting useful data, and it has provided impetus internally for us to look into alternatives. The derived statistics were presented to the client in a workshop, with representatives from a number of the customer's teams present. We had little insight into the background of many of the projects, and it was very interesting to hear the analysis and opinions that emerged as they digested the information. For example, one set of applications exhibited particularly poor metrics from a security standpoint. Someone highlighted the fact that these were outsourced applications, which sparked a discussion within the client about the pros and cons of using third-party developers. It also suggests that many further attributes can be attached to the data that is captured: internal or third party, development lifecycle model (is agile producing better code for you than other models?), team size, platforms, languages, frameworks etc.

As mentioned above, a key test for metrics is whether they support decision making, and the feedback from the client was positive in this regard.

And now?

In summary, current security metrics, as they relate to talking about an organisation's application security, suffer from a resolution problem: they're not clear enough. Attacker effort is not modelled when discussing vulnerabilities, even though it's a significant factor when trying to get a handle on the ever-slippery notion of risk. One approximation for attacker effort is to create a base metric of the number of findings-per-day for a broad set of applications belonging to an organisation, and use it to evaluate which kinds of vulnerabilities are typically present while at the same time clarifying how much effort an attacker requires in order to exploit them.

This idea is still being fleshed out. If you're aware of previous work in this regard or have suggestions on how to improve it (even abandon it) please get in contact.

Oh, and if you've read this far and are looking for training, we're at BH in August.

Sat, 7 Aug 2010

BlackHat Write-up: go-derper and mining memcaches

[Update: Disclosure and other points discussed in a little more detail here.]

Why memcached?

At BlackHat USA last year we spoke about attacking cloud systems; while the thinking was broadly applicable, we focused on specific providers (overview). This year we continued in the same vein, except we focused on a particular piece of software used in numerous large-scale applications, including many cloud services. In the realm of "software that enables cloud services", there appears to be a handful of "go to" applications that are consistently re-used, and it's curious that a security practitioner's perspective has not yet been applied to them (disclaimer: I'm not aware of parallel work).

We chose to look at memcached, a "Free & open source, high-performance, distributed memory object caching system"1. It's not outwardly sexy from a security standpoint and it doesn't have a large and exposed codebase (total LOC is a smidge over 11k). However, what's of interest is the type of applications in which memcached is deployed. Memcached is most often used in web applications to speed up page loads. Sites that use it are almost2 always dynamic and either have many clients (i.e. require horizontal scaling) or process piles of data (and look to reduce processing time), or oftentimes both. This implies that sites using memcached contain more interesting info than simple static sites, and its presence is an indicator of a potentially interesting site. Prominent users of memcached include LiveJournal (memcached was originally written by Brad Fitzpatrick for LJ), Wikipedia, Flickr, YouTube and Twitter.

I won't go into how memcached works; suffice it to say that since data tends to be read more often than written in common use cases, the idea is to pre-render and store the finalised content inside the in-memory cache. When future requests ask for the page or data, it doesn't need to be regenerated but can simply be regurgitated from the cache. Their Wiki contains more background.
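For the unfamiliar, the usage pattern looks roughly like the sketch below, written against the memcache-client gem that go-derper also depends on; the host, key and rendering step are placeholders of my own.

require 'rubygems'
require 'memcache'

# Connect to a cache instance (placeholder address).
cache = MemCache.new('127.0.0.1:11211')

# Stand-in for the expensive work a web app would normally do.
def render_profile_page(user_id)
  "<html>profile for user #{user_id}</html>"
end

key  = "profile:42"
page = cache.get(key)              # try the cache first
if page.nil?
  page = render_profile_page(42)   # cache miss: do the expensive work once
  cache.set(key, page, 300)        # store the result for five minutes
end
puts page

The interesting consequence for an attacker is that whatever the application considers worth caching (rendered pages, session data, query results) sits in memcached in retrievable form.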

go-derper

We released go-derper, a tool for playing with memcached instances. It supports three basic modes of operations:
  1. Fingerprinting memcacheds to determine interesting servers
  2. Extracting a (user-limited) copy of the cache
  3. Writing data into the cache
The tool has minor requirements: a recent Ruby and the memcache-client gem. What follows are basic use cases.

Fingerprinting

Let's assume you've scanned a hosting provider and found 239 potential targets using a basic .nse that hunts down open memcached instances3. You need to separate the wheat from the chaff and figure out which servers are potentially interesting; one way to do that is by extracting a bunch of metrics from each cache. Start small against one cache:

insurrection:demo marco$ ruby go-derper.rb -f x.x.x.x
[i] Scanning x.x.x.x

x.x.x.x:11211
==============================
memcached 1.4.5 (1064) up 54:10:01:27, sys time Wed Aug 04 10:34:36 +0200 2010, utime=369388.17, stime=520925.98
Mem: Max 1024.00 MB, max item size = 1024.00 KB
Network: curr conn 18, bytes read 44.69 TB, bytes written 695.93 GB
Cache: get 514, set 93.41b, bytes stored 825.73 MB, curr item count 1.54m, total items 1.54m, total slabs 3
Stats capabilities: (stat) slabs settings items (set) (get)

44 terabytes read from the cache in 54 days with 1.5 million items stored? This cache is used quite frequently. There's an anomaly here in that the cache reports only 514 reads with 93 billion writes; however it's still worth exploring if only for the size.

We can run the same fingerprint scan against multiple hosts using

ruby go-derper.rb -f host1,host2,host3,...,hostn

or, if the hosts are in a file (one per line):

ruby go-derper.rb -F file_with_target_hosts

Output is either human-readable multiline (the default), or CSV. The latter helps for quickly rearranging and sorting the output to determine potential targets, and is enabled with the "-c" switch:

ruby go-derper.rb -c csv -f host1,host2,host3,...,hostn

Lastly, the monitor mode (-m) will loop forever while retrieving certain statistics and keep track of differences between iterations, in order to determine whether the cache appears to be in active use.

Mining

Once you've identified a potentially interesting target, it's time to mine that cache. The basic leach switch is "-l":

insurrection:demo marco$ ruby go-derper.rb -l -s x.x.x.x
[w] No output directory specified, defaulting to ./output
[w] No prefix supplied, using "run1"

This will extract data from the cache in the form of a key and its value, and save the value in a file under the "./output" directory by default (if this directory doesn't exist then the tool will exit so make sure it's present.) This means a separate file is created for every retrieved value. Output directories and file prefixes are adjustable with "-o" and "-r" respectively, however it's usually safe to leave these alone.

By default, go-derper fetches 10 keys per slab (see the memcached docs for a discussion on slabs; basically similar-sized entries are grouped together.) This default is intentionally low; on an actual assessment this could run into six figures. Use the "-K" switch to adjust:

ruby go-derper.rb -l -K 100 -s x.x.x.x

As mentioned, retrieved data is stored in the "./output" directory (or elsewhere if "-o" is used). Within this directory, each new run of the tool produces a set of files prefixed with "runN" in order to keep multiple runs separate. The files produced are:

  • runN-index, an index file containing metadata about each entry retrieved
  • runN-<md5>, a file containing the bytestream from a retrieved value
The mapping between key and file in which the value is stored occurs in the index file, which is useful in that potentially malicious data (keynames) aren't used when interacting with your local filesystem APIs.

At this point, there will (hopefully) be a large number of files in your output directory, which may contain useful info. Start grepping.

What we found with a bit of field experience was that mining large caches can take some time, and repeating grep gets quite boring. The tool permits you to supply your own set of regular expressions which will be applied to each retrieved value; matches are printed to the screen and this provides a scroll-by view of bits of data that may pique your interest (things like URLs, email addresses, session IDs, strings starting with "user", "pass" or "auth", cookies, IP addresses etc). The "-R" switch enables this feature and takes a file containing regexes as its sole argument:

ruby go-derper.rb -l -K 100 -R regexs.txt -s x.x.x.x
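As an example, a regexs.txt along those lines might contain patterns like the following (my own illustrative regexes, not ones shipped with the tool):

https?://\S+
[\w.+-]+@[\w-]+\.[\w.-]+
(user|pass|auth)\w*["']?\s*[=:]
(PHPSESSID|JSESSIONID|session_id)=\S+
\b\d{1,3}(\.\d{1,3}){3}\b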

Over-writing

In this blog entry I don't cover the kinds of data we discovered (it'll be subject to a separate entry), however it may come to pass that you discover an interesting cache entry that you'd like to overwrite. Recall entries were stored in "./output" by default, with a prefix of "runN". If the interesting entry was stored in "output/run1-e94aae85bd3469d929727bee5009dddd", edit the file in whatever manner you see fit and save it to your local disk. Then, tell go-derper to write the entry back into the cache with:

ruby go-derper.rb -w output/run1-e94aae85bd3469d929727bee5009dddd

This syntax is simple since go-derper will figure out the target server and key from the run's index file.

And so?

Go-derper permits basic manipulations of a memcached instance. We haven't covered finding open instances or the kinds of data one may come across; these will be the subject of followup posts. Below are the slides from the talk, click through to SlideShare for the downloadable PDF.
1 http://www.memcached.org

2 We're hedging here, but we've not come across a static memcached site.

3 If so, you may be as surprised as we were in finding this many open instances.