Wed, 21 Nov 2007

Google as an MD5 Cracker..

Slashdot picked up on the blog post from Light Blue Touchpaper commenting on the fact that a researcher was surprised to discover that simply putting an MD5 hash into Google returned a hit mapping it back to the original word.

This is an interesting concept. A while back, we decided to fiddle with the idea of using Google's indexing and spidering as a new take on the time/space trade-off for password cracking.

What we did:

A simple CGI script that accepts a single parameter. We then use URL rewriting to make the script look less scripty and more crawler friendly.
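As a rough illustration of the rewritten-URL idea, here is a minimal Python sketch of how a script might pull its single parameter out of a crawler-friendly path. The original CGI and its rewrite rules are not shown in this post, so the pattern, the `seed_from_path` name and the default seed are all assumptions.

```python
import re

# Hypothetical pattern: the seed is 1 to 5 letters directly under /sp-hash/.
SEED_PATH = re.compile(r"^/sp-hash/([A-Za-z]{1,5})$")


def seed_from_path(path, default="a"):
    """Return the seed embedded in a rewritten path, or `default` if none."""
    match = SEED_PATH.match(path)
    return match.group(1) if match else default
```

To the crawler, `/sp-hash/pnt` then looks like an ordinary static page, with no query string giving the script away.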

A quick check on the internet shows that Google indexes about 100k into a document. Our CGI sits around doing nothing until it's first visited.

Once it is, it generates all the strings from a to ZZZZZ and prints them along with their MD5 hashes:

So if you hit https://secure.sensepost.com/sp-hash/a, you would get:

[Screenshot: hash1.PNG]

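The generation step can be sketched in a few lines of Python. This is not the original CGI, just a minimal sketch: the alphabet order (lowercase first, then uppercase), the 5-character maximum and all the function names are assumptions based on the "a..ZZZZZ" range mentioned above.

```python
import hashlib
from itertools import islice

# Assumed ordering: a..z, then A..Z, shortest strings first.
ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
MAX_LEN = 5


def successor(word):
    """Return the candidate that follows `word`, or None after the last one."""
    chars = list(word)
    i = len(chars) - 1
    while i >= 0:
        pos = ALPHABET.index(chars[i])
        if pos + 1 < len(ALPHABET):
            chars[i] = ALPHABET[pos + 1]
            return "".join(chars)
        chars[i] = ALPHABET[0]  # roll this position over and carry left
        i -= 1
    # Every position rolled over, so move on to the next length.
    if len(chars) < MAX_LEN:
        return ALPHABET[0] * (len(chars) + 1)
    return None


def candidates(seed):
    """Yield `seed` and every candidate after it, in order."""
    word = seed
    while word is not None:
        yield word
        word = successor(word)


def hash_line(word):
    """Format one output line: the word followed by its MD5 hex digest."""
    return f"{word} {hashlib.md5(word.encode('ascii')).hexdigest()}"


# Print the first few lines a visit to /sp-hash/a might produce.
for word in islice(candidates("a"), 3):
    print(hash_line(word))
```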
Now, since Google only indexes up to a certain point in the document, it's useless filling this page with all of the hashes. So at 100k we stop, and if the string at that point is abc, the CGI creates a link to itself with abc as the parameter (in our picture it stops at pnt).

[Screenshot: hash2.PNG]

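The cut-off-and-link trick might look something like the following Python sketch. The real script's output format is only visible in the screenshots, so the line layout, the `build_page` name and the `/sp-hash/` link prefix (borrowed from the example URL) are assumptions. The word source is passed in as any iterable, so the sketch stays independent of how the strings are enumerated.

```python
import hashlib
from itertools import product
from string import ascii_lowercase

PAGE_LIMIT = 100 * 1024  # roughly the 100k the crawler is assumed to index
BASE_URL = "/sp-hash/"   # assumed link prefix, taken from the example URL


def build_page(words, limit=PAGE_LIMIT):
    """Return (html, next_seed); next_seed is None if the words ran out."""
    lines = []
    size = 0
    next_seed = None
    for word in words:
        digest = hashlib.md5(word.encode("ascii")).hexdigest()
        line = f"{word} {digest}<br>\n"
        if size + len(line) > limit:
            next_seed = word  # first word that did not fit on this page
            break
        lines.append(line)
        size += len(line)
    if next_seed is not None:
        # Link back to the same script so the crawler continues from here.
        lines.append(f'<a href="{BASE_URL}{next_seed}">{next_seed}</a>\n')
    return "".join(lines), next_seed


# Tiny demo: three-letter lowercase words with an artificially small limit.
demo_words = ("".join(p) for p in product(ascii_lowercase, repeat=3))
html, next_seed = build_page(demo_words, limit=200)
print(html)
```

Each page ends with exactly one outbound link, so the crawler only ever has one place to go next.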
The crawler hits that link, effectively hitting and seeding the same CGI, which then keeps going ad infinitum.

This can be tested: a quick Google search for site:secure.sensepost.com + adog will return:

[Screenshot: goog-hash.PNG]

(You can also use Google Webmaster Tools to pre-seed the spider.)

Unfortunately I never got back to it, but I noticed that while Google did index the full charset a..zzzzz, at some point some hits disappeared. I'm not sure if this is due to filtering on some of the words that emerged or simply not enough link credibility.

I suspect that if the problem is the latter, it could be fixed by more people picking up seeds. In this plan, multiple people would run the CGI and a type of delegation could be set up, so that while Google is indexing me from a..zzz it's indexing someone else from zzz..ZZZ, etc. At just the cost of bandwidth, this would give useful results.
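One way the delegation could work is to carve the keyspace into equal slices and hand each participant a starting seed. This was never built, so the Python below is purely a sketch: the alphabet order, the 5-character maximum and the `start_seeds` helper are all assumptions.

```python
# Assumed ordering: a..z, then A..Z, shortest strings first.
ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
MAX_LEN = 5


def total_words(max_len=MAX_LEN):
    """Count every string of length 1..max_len over ALPHABET."""
    return sum(len(ALPHABET) ** n for n in range(1, max_len + 1))


def index_to_word(index):
    """Map 0 -> 'a', 51 -> 'Z', 52 -> 'aa', and so on (bijective base 52)."""
    n = index + 1
    chars = []
    while n > 0:
        n, rem = divmod(n - 1, len(ALPHABET))
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def start_seeds(hosts):
    """Return the first word each of `hosts` participants should serve."""
    total = total_words()
    return [index_to_word(i * total // hosts) for i in range(hosts)]


# Example: split the keyspace between four people running the CGI.
print(start_seeds(4))
```

Each participant would serve pages from their seed up to the next participant's seed, so no range gets crawled twice.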

Ah well..
