A cloud storage service such as Microsoft SkyDrive requires building data centers as well as operational and maintenance costs. An alternative approach is based on distributed computing model which utilizes portion of the storage and processing resources of consumer level computers and SME NAS devices to form a peer to peer storage system. The members contribute some of their local storage space to the system and in return receive "online backup and data sharing" service. Providing data confidentiality, integrity and availability in such de-centerlized storage system is a big challenge to be addressed. As the cost of data storage devices declines, there is a debate that whether the P2P storage could really be cost saving or not. I leave this debate to the critics and instead I will look into a peer to peer storage system and study its security measures and possible issues. An overview of this system's architecture is shown in the following picture:
Each node in the storage cloud receives an amount of free online storage space which can be increased by the control server if the node agrees to "contribute" some of its local hard drive space to the system. File synchronisation and contribution agents that are running on every node interact with the cloud control server and other nodes as shown in the above picture. Folder/File synchronisation is performed in the following steps:

1) The node authenticates itself to the control server and sends file upload request with file meta data including SHA1 hash value, size, number of fragments and file name over HTTPS connection.
2) The control server replies with the AES encryption key for the relevant file/folder, a [IP Address]:[Port number] list of contributing nodes called "endpoints list" and a file ID.
3) The file is split into blocks each of which is encrypted with the above AES encryption key. The blocks are further split into 64 fragments and redundancy information also gets added to them.
4) The node then connects to the contribution agent on each endpoint address that was received in step 2 and uploads one fragment to each of them
Since the system nodes are not under full control of the control server, they fall offline any time or the stored file fragments may become damaged/modified intentionally. As such, the control server needs to monitor node and fragment health regularly so that it may move lost/damaged fragments to alternate nodes if need be. For this purpose, the contribution agent on each node maintains an HTTPS connection to the control server on which it receives the following "tasks":
a) Adjust settings : instructs the node to modify its upload/download limits , contribution size and etc
b) Block check : asks the node to connect to another contribution node and verify a fragment existence and hash value
c) Block Recovery : Assist the control server to recover a number of fragments
By delegating the above task, the control system has placed some degree of "trust" or at least "assumptions" about the availability and integrity of the agent software running on the storage cloud nodes. However, those agents can be manipulated by malicious nodes in order to disrupt cloud operations, attack other nodes or even gain unauthorised access to the distributed data. I limited the scope of my research to the synchronisation and contribution agent software of two storage nodes under my control - one of which was acting as a contribution node. I didn't include the analysis of the encryption or redundancy of the system in my preliminary research because it could affect the live system and should only be performed on a test environment which was not possible to set up, as the target system's control server was not publicly available. Within the contribution agent alone, I identified that not only did I have unauthorised access file storage (and download) on the cloud's nodes, but I had unauthorised access to the folder encryption keys as well.
a) Unauthorised file storage and download
The contribution agent created a TCP network listener that processed commands from the control server as well as requests from other nodes. The agent communicated over HTTP(s) with the control server and other nodes in the cloud. An example file fragment upload request from a remote node is shown below:
Uploading fragments with similar format to the above path name resulted in the "bad request" error from the agent. This indicated that the fragment name should be related to its content and this condition is checked by the contribution agent before accepting the PUT request. By decompiling the agent software code, it was found that the fragment name must have the following format to pass this validation:
<SHA1(uploaded content)>.<Fragment number>.<Global Folder Id>
I used the above file fragment format to upload notepad.exe to the remote node successfully as you can see in the following figure:
The download request (GET request) was also successful regardless of the validity of "Global Folder Id" and "Fragment Number". The uploaded file was accessible for about 24 hours, until it was purged automatically by the contribution agent, probably because it won't receive any "Block Check" requests for the control server for this fragment. Twenty four hours still is enough time for malicious users to abuse storage cloud nodes bandwidth and storage to serve their contents over the internet without victim's knowledge.
b) Unauthorised access to folder encryption keys
The network listener responded to GET requests from any remote node as mentioned above. This was intended to serve "Block Check" commands from the control server which instructs a node to fetch a number of fragments from other nodes (referred to as "endpoints") and verify their integrity but re-calculating the SHA1 hash and reporting back to the control server. This could be part of the cloud "health check" process to ensure that the distributed file fragments are accessible and not tampered with. The agent could also process "File Recovery" tasks from the control server but I didn't observe any such command from the control server during the dynamic analysis of the contribution agent, so I searched the decompiled code for clues on the file recovery process and found the following code snippet which could suggest that the agent is cable of retrieving encryption keys from the control server. This was something odd, considering that each node should only have access to its own folders encryption keys and it stores encrypted file fragments of other nodes.

Conclusion:
While peer to peer storage systems have lower setup/maintenance costs, they face security threats from the storage nodes that are not under direct physical/remote control of the cloud controller system. Examples of such threats relate to the cloud's client agent software and the cloud server's authorisation control, as demonstrated in this post. While analysis of the data encryption and redundancy in the peer to peer storage system would be an interesting future research topic, we hope that the findings from this research can be used to improve the security of various distributed storage systems.
As we grow and operate on a number of continents, so does our dependence on a rock-solid IT infrastructure. We are expanding our repertoire to include a greater collection of Linux/Open Source/Windows and OS X products. With this, we are on the look-out for a rock star to wrangle control of our internal networks, external cloud infrastructure and help us us utilise technology in a way to make us even better.
Job Title: IT Network Packet Wrangling Penguin Master
Salary Range: Industry standard, commensurate with experience
Location: Johannesburg/Pretoria, South Africa
Real Responsibilities:
When performing spear phishing attacks, the more information you have at your disposal, the better. One tactic we thought useful was this Skype security flaw disclosed in the early days of 2012 (discovered by one of the Skype engineers much earlier).
For those who haven't heard of it - this vulnerability allows an attacker to passively disclose victims external, as well as internal, IP addresses in a matter of seconds, by viewing the victims VCard through an 'Add Contact' form.
Why is this useful?
1. Verifying the identity and the location of the target contact. Great when performing geo-targeted phishing attacks.
2. Checking whether your Skype account has not been used elsewhere :)
3. Spear phishing enumeration while Pen Testing.
4. Just out of plain curiosity.
To get this working, following these basic steps:
1. Download and install the patched version of Skype 5.5 from here (the patch enables the Skype client to save the logs in non obfuscated form)
2. Save the lines below as a Skype_log_patch.reg reg file:
Once saved, run it to enable the Skype Debug Log File.Windows Registry Editor Version 5.00[HKEY_CURRENT_USER\Software\Skype\Phone\UI\General]"LastLanguage"="en""Logging"="SkypeDebug2003""Logging2"="on"
4. Start Skype.
5. Search for any Skype contact and click on the 'Add a Skype Contact' button, but do not send the request, rather click on the user to view their VCard.
4. Open the log file (it should appear in the same folder as Skype executable e.g. debug-20121003-0150)
5. Look for the PresenceManager line - you should see something similar to this - >
In the above image you can spot my Skype name, external as well as internal IP addresses.The log will include similar credentilas for everyone listed as a "contact" under your Skype account, as well as many other fresh, genuine and useful information received directly from your local Skype tracker.
[Update: Disclosure and other points discussed in a little more detail here.]
We choose to look at memcached, a "Free & open source, high-performance, distributed memory object caching system" 1. It's not outwardly sexy from a security standpoint and it doesn't have a large and exposed codebase (total LOC is a smidge over 11k). However, what's of interest is the type of applications in which memcached is deployed. Memcached is most often used in web application to speed up page loads. Sites are almost2 always dynamic and either have many clients (i.e. require horizontal scaling) or process piles of data (look to reduce processing time), or oftentimes both. This implies that the sites that use memcached contain more interesting info than simple static sites, and are an indicator of a potentially interesting site. Prominent users of memcached include LiveJournal (memcached was originally written by Brad Fitzpatrick for LJ), Wikipedia, Flickr, YouTube and Twitter.
I won't go into how memcached works, suffice it to say that since data tends to be read more often than written in common use cases the idea is to pre-render and store the finalised content inside the in-memory cache. When future requests ask for the page or data, it doesn't need to be regenerated but can be simply regurgitated from the cache. Their Wiki contains more background.
insurrection:demo marco$ ruby go-derper.rb -f x.x.x.x
[i] Scanning x.x.x.x
x.x.x.x:11211
==============================
memcached 1.4.5 (1064) up 54:10:01:27, sys time Wed Aug 04 10:34:36 +0200 2010, utime=369388.17, stime=520925.98
Mem: Max 1024.00 MB, max item size = 1024.00 KB
Network: curr conn 18, bytes read 44.69 TB, bytes written 695.93 GB
Cache: get 514, set 93.41b, bytes stored 825.73 MB, curr item count 1.54m, total items 1.54m, total slabs 3
Stats capabilities: (stat) slabs settings items (set) (get)
44 terabytes read from the cache in 54 days with 1.5 million items stored? This cache is used quite frequently. There's an anomaly here in that the cache reports only 514 reads with 93 billion writes; however it's still worth exploring if only for the size.
We can run the same fingerprint scan against multiple hosts using
ruby go-derper.rb -f host1,host2,host3,...,hostn
or, if the hosts are in a file (one per line):
ruby go-derper.rb -F file_with_target_hosts
Output is either human-readable multiline (the default), or CSV. The latter helps for quickly rearranging and sorting the output to determine potential targets, and is enabled with the "-c" switch:
ruby go-derper.rb -c csv -f host1,host2,host3,...,hostn
Lastly, the monitor mode (-m) will loop forever while retrieving certain statistics and keep track of differences between iterations, in order to determine whether the cache appears to be in active use.
insurrection:demo marco$ ruby go-derper.rb -l -s x.x.x.x
[w] No output directory specified, defaulting to ./output
[w] No prefix supplied, using "run1"
This will extract data from the cache in the form of a key and its value, and save the value in a file under the "./output" directory by default (if this directory doesn't exist then the tool will exit so make sure it's present.) This means a separate file is created for every retrieved value. Output directories and file prefixes are adjustable with "-o" and "-r" respectively, however it's usually safe to leave these alone.
By default, go-derper fetches 10 keys per slab (see the memcached docs for a discussion on slabs; basically similar-sized entries are grouped together.) This default is intentionally low; on an actual assessment this could run into six figures. Use the "-K" switch to adjust:
ruby go-derper.rb -l -K 100 -s x.x.x.x
As mentioned, retrieved data is stored in the "./ouput" directory (or elsewhere if "-o" is used). Within this directory, each new run of the tool produces a set of files prefixed with "runN" in order to keep multiple runs separate. The files produced are:
At this point, there will (hopefully) be a large number of files in your output directory, which may contain useful info. Start grepping.
What we found with a bit of field experience was that mining large caches can take some time, and repeating grep gets quite boring. The tool permits you to supply your own set of regular expressions which will be applied to each retrieved value; matches are printed to the screen and this provides a scroll-by view of bits of data that may pique your interest (things like URLs, email addresses, session IDs, strings starting with "user", "pass" or "auth", cookies, IP addresses etc). The "-R" switch enables this feature and takes a file containing regexes as its sole argument:
ruby go-derper.rb -l -K 100 -R regexs.txt -s x.x.x.x
ruby go-derper.rb -w output/run1-e94aae85bd3469d929727bee5009dddd
This syntax is simple since go-derper will figure out the target server and key from the run's index file.
2 We're hedging here, but we've not come across a static memcached site.
3 If so, you may be as surprised as we were in finding this many open instances.
Wow. At some point our talk hit HackerNews and then SlashDot after swirling around the Twitters for a few days. The attention is quite astounding given the relative lack of technical sexiness to this; explanations for the interest are welcome!
We wanted to highlight a few points that didn't make the slides but were mentioned in the talk:
The potential risk assigned to exposed memcacheds hasn't as yet been publicly demonstrated so it's unsurprising that you'll find memcacheds around. I imagine this issue will flare and be hunted into extinction, at least on the public interwebs.
Lastly, the major interest seems to be on mining data from exposed caches. An equally disturbing issue is overwriting entries in the cache and this shouldn't be underestimated.