[google-chrome] How can I read Chrome Cache files?

A forum I frequent was down today, and upon restoration, I discovered that the last two days of forum posting had been rolled back completely.

Needless to say, I'd like to get back what data I can from the forum loss, and I am hoping I have at least some of it stored in the cache files that Chrome created.

I face two problems -- the cache files have no filetype, and I'm unsure how to read them in an intelligent manner (trying to open them in Chrome itself seems to "redownload" them in a .gz format), and there are a ton of cache files.

Any suggestions on how to read and sort these files? (A simple string search should fit my needs)

This question is related to google-chrome browser-cache

The answer is


EDIT: The below answer no longer works see here


In Chrome or Opera, open a new tab and navigate to chrome://view-http-cache/

Click on whichever file you want to view. You should then see a page with a bunch of text and numbers. Copy all the text on that page. Paste it in the text box below.

Press "Go". The cached data will appear in the Results section below.


The JPEXS Free Flash Decompiler has Java code to do this at in the source tree for both Chrome and Firefox (no support for Firefox's more recent cache2 though).


I had some luck with this open-source Python project, seemingly inactive: https://github.com/JRBANCEL/Chromagnon

I ran:

python2 Chromagnon/chromagnonCache.py path/to/Chrome/Cache -o browsable_cache/

And I got a locally-browsable extract of all my open tabs cache.


EDIT: The below answer no longer works see here


Chrome stores the cache as a hex dump. OSX comes with xxd installed, which is a command line tool for converting hex dumps. I managed to recover a jpg from my Chrome's HTTP cache on OSX using these steps:

  1. Goto: chrome://cache
  2. Find the file you want to recover and click on it's link.
  3. Copy the 4th section to your clipboard. This is the content of the file.
  4. Follow the steps on this gist to pipe your clipboard into the python script which in turn pipes to xxd to rebuild the file from the hex dump: https://gist.github.com/andychase/6513075

Your final command should look like:

pbpaste | python chrome_xxd.py | xxd -r - image.jpg

If you're unsure what section of Chrome's cache output is the content hex dump take a look at this page for a good guide: http://www.sparxeng.com/blog/wp-content/uploads/2013/03/chrome_cache_html_report.png

Image source: http://www.sparxeng.com/blog/software/recovering-images-from-google-chrome-browser-cache

More info on XXD: http://linuxcommand.org/man_pages/xxd1.html

Thanks to Mathias Bynens above for sending me in the right direction.


The Google Chrome cache directory $HOME/.cache/google-chrome/Default/Cache on Linux contains one file per cache entry named <16 char hex>_0 in "simple entry format":

  • 20 Byte SimpleFileHeader
  • key (i.e. the URI)
  • payload (the raw file content i.e. the PDF in our case)
  • SimpleFileEOF record
  • HTTP headers
  • SHA256 of the key (optional)
  • SimpleFileEOF record

If you know the URI of the file you're looking for it should be easy to find. If not, a substring like the domain name, should help narrow it down. Search for URI in your cache like this:

fgrep -Rl '<URI>' $HOME/.cache/google-chrome/Default/Cache

Note: If you're not using the default Chrome profile, replace Default with the profile name, e.g. Profile 1.


Note: The below answer is out of date since the Chrome disk cache format has changed.


Joachim Metz provides some documentation of the Chrome cache file format with references to further information.

For my use case, I only needed a list of cached URLs and their respective timestamps. I wrote a Python script to get these by parsing the data_* files under C:\Users\me\AppData\Local\Google\Chrome\User Data\Default\Cache\:

import datetime
with open('data_1', 'rb') as datafile:
    data = datafile.read()

for ptr in range(len(data)):
    fourBytes = data[ptr : ptr + 4]
    if fourBytes == b'http':

        # Found the string 'http'. Hopefully this is a Cache Entry
        endUrl = data.index(b'\x00', ptr)
        urlBytes = data[ptr : endUrl]
        try:
            url = urlBytes.decode('utf-8')
        except:
            continue

        # Extract the corresponding timestamp
        try:
            timeBytes = data[ptr - 72 : ptr - 64]
            timeInt = int.from_bytes(timeBytes, byteorder='little')
            secondsSince1601 = timeInt / 1000000
            jan1601 = datetime.datetime(1601, 1, 1, 0, 0, 0)
            timeStamp = jan1601 + datetime.timedelta(seconds=secondsSince1601)
        except:
            continue

        print('{} {}'.format(str(timeStamp)[:19], url))

I've made short stupid script which extracts JPG and PNG files:

#!/usr/bin/php
<?php
 $dir="/home/user/.cache/chromium/Default/Cache/";//Chrome or chromium cache folder. 
 $ppl="/home/user/Desktop/temporary/"; // Place for extracted files 

 $list=scandir($dir);
 foreach ($list as $filename)
 {

 if (is_file($dir.$filename))
    {
        $cont=file_get_contents($dir.$filename);
        if  (strstr($cont,'JFIF'))
        {
            echo ($filename."  JPEG \n");
            $start=(strpos($cont,"JFIF",0)-6);
            $end=strpos($cont,"HTTP/1.1 200 OK",0);
            $cont=substr($cont,$start,$end-6);
            $wholename=$ppl.$filename.".jpg";
            file_put_contents($wholename,$cont);
            echo("Saving :".$wholename." \n" );


                }
        elseif  (strstr($cont,"\211PNG"))
        {
            echo ($filename."  PNG \n");
            $start=(strpos($cont,"PNG",0)-1);
            $end=strpos($cont,"HTTP/1.1 200 OK",0);
            $cont=substr($cont,$start,$end-1);
            $wholename=$ppl.$filename.".png";
            file_put_contents($wholename,$cont);
            echo("Saving :".$wholename." \n" );


                }
        else
        {
            echo ($filename."  UNKNOWN \n");
        }
    }
 }
?>

Note: The flag show-saved-copy has been removed and the below answer will not work


You can read cached files using Chrome alone.

Chrome has a feature called Show Saved Copy Button:

Show Saved Copy Button Mac, Windows, Linux, Chrome OS, Android

When a page fails to load, if a stale copy of the page exists in the browser cache, a button will be presented to allow the user to load that stale copy. The primary enabling choice puts the button in the most salient position on the error page; the secondary enabling choice puts it secondary to the reload button. #show-saved-copy

First disconnect from the Internet to make sure that browser doesn't overwrite cache entry. Then navigate to chrome://flags/#show-saved-copy and set flag value to Enable: Primary. After you restart browser Show Saved Copy Button will be enabled. Now insert cached file URI into browser's address bar and hit enter. Chrome will display There is no Internet connection page alongside with Show saved copy button: enter image description here

After you hit the button browser will display cached file.


EDIT: The below answer no longer works see here


Google Chrome cache file format description.

Cache files list, see URLs (copy and paste to your browser address bar):

  • chrome://cache/
  • chrome://view-http-cache/

Cache folder in Linux: $~/.cache/google-chrome/Default/Cache

Let's determine in file GZIP encoding:

$ head f84358af102b1064_0 | hexdump -C | grep --before-context=100 --after-context=5 "1f 8b 08"

Extract Chrome cache file by one line on PHP (without header, CRC32 and ISIZE block):

$ php -r "echo gzinflate(substr(strchr(file_get_contents('f84358af102b1064_0'), \"\x1f\x8b\x08\"), 10,
-8));"

It was removed on purpose and it won't be coming back.

Both chrome://cache and chrome://view-http-cache have been removed starting chrome 66. They work in version 65.

Workaround

You can check the chrome://chrome-urls/ for complete list of internal Chrome URLs.

The only workaround that comes into my mind is to use menu/more tools/developer tools and having a Network tab selected.

The reason why it was removed is this bug:

The discussion:


EDIT: The below answer no longer works see here


If the file you try to recover has Content-Encoding: gzip in the header section, and you are using linux (or as in my case, you have Cygwin installed) you can do the following:

  1. visit chrome://view-http-cache/ and click the page you want to recover
  2. copy the last (fourth) section of the page verbatim to a text file (say: a.txt)
  3. xxd -r a.txt| gzip -d

Note that other answers suggest passing -p option to xxd - I had troubles with that presumably because the fourth section of the cache is not in the "postscript plain hexdump style" but in a "default style".

It also does not seem necessary to replace double spaces with a single space, as chrome_xxd.py is doing (in case it is necessary you can use sed 's/ / /g' for that).


Examples related to google-chrome

SessionNotCreatedException: Message: session not created: This version of ChromeDriver only supports Chrome version 81 SameSite warning Chrome 77 What's the net::ERR_HTTP2_PROTOCOL_ERROR about? session not created: This version of ChromeDriver only supports Chrome version 74 error with ChromeDriver Chrome using Selenium Jupyter Notebook not saving: '_xsrf' argument missing from post How to fix 'Unchecked runtime.lastError: The message port closed before a response was received' chrome issue? Selenium: WebDriverException:Chrome failed to start: crashed as google-chrome is no longer running so ChromeDriver is assuming that Chrome has crashed WebDriverException: unknown error: DevToolsActivePort file doesn't exist while trying to initiate Chrome Browser How to make audio autoplay on chrome How to handle "Uncaught (in promise) DOMException: play() failed because the user didn't interact with the document first." on Desktop with Chrome 66?

Examples related to browser-cache

How to force reloading a page when using browser back button? How to prevent Browser cache on Angular 2 site? Form/JavaScript not working on IE 11 with error DOM7011 AngularJS disable partial caching on dev machine How to prevent Browser cache for php site What is Cache-Control: private? Stylesheet not updating clear cache of browser by command line How do you cache an image in Javascript Leverage browser caching, how on apache or .htaccess?