In our recent memcached investigations (a blog post is still in the wings) we came across numerous caches storing serialized data. The caches were not homogenous and so the data was quite varied: Java objects, ActiveRecord objects from RoR, JSON, pre-rendered HTML, .Net serialized objects and serialized Python objects. Serialized objects can be useful to an attacker from a number of standpoints: such objects could expose data where naive developers make use of the objects to hold secrets and rely on the user to proxy the objects to various parts of an application. In addition, altering serialized objects could impact on the deserialization process, leading to compromise of the system on which the deserialization takes place.
In all the caches we examined, the most common data format found (apart from HTML snippets) was serialized Python and this prompted a brief investigation into the possible attacks against serialized Python objects. We've put together a couple of posts explaining how one might go about exploiting Pickle strings; the obvious vector is memcached however anytime Pickle strings are passed to an untrusted party the attacks described here become useful.
Python implements a default serialization technique called Pickle. Now I don't pretend to be a Pickle expert; Python is not my script language of choice for starters and serialization is not particularly interesting subject to me, however seeing the following in any docs is cause for further digging:
A little further down the same page, we find a trivial example of how to execute code from a Pickle stream and a quick Google leads to a blog post in which Pickle insecurities are fleshed out in more detail. Both are worthwhile reads.
From these sources emerge the following factoids:
This last point was particularly intriguing; developers purposely removed any semblance of security from the depickling mechanism and exhort users to never deserialize untrusted data. However, the memcached work showed that if one could find memcached instances, it was possible to overwrite data within the cache trivially. If data inside a cache was comprised of Pickle strings, then by overwriting them an attacker is able to inject untrusted Pickle objects into a deserialization operation.
We've had a bit of fun with this seeing how far it can be pushed and over the coming days, I'll post three more entries on this topic. In the mean time, here's some background and a few simple examples to get things going.
In order to understand the Pickle objects below, you'll need to follow a few basic opcodes and their arguments:
Testing out Pickle objects is pretty simple:
An important instruction is the MARK opcode "(", which is used to signify frames on the stack. It is normally used in conjunction with opcodes that have to pop multiple objects off the stack, for example opcodes that build lists, tuples or dicts. The two examples below show how a list and a tuple are produced:
The canonical example given in a number of places including the official Python docs as to why unpickling untrusted data is bad is:
(S'echo hello world'
The intent is clear however the interesting bit is twofold: decoding the instructions used and realizing that for an attacker, "hello world" isn't all that useful. In the next post I'll introduce the basics behind calling functions and see whether we can extend the canonical example into something a little more evil.