[html] How to find when a web page was last updated

Is there a way to find out how much time has passed since a web page was changed?

For example, I have a page hosted at: www.mywebsitenotupdated.com

Is there a way to find out when this HTML page was uploaded to the server?

I have no access to server; just a link to the webpage.

This question is related to html upload

The answer is


For checking the Last Modified header, you can use httpie (docs).

Installation

pip install httpie --user

Usage

$ http -h https://martin-thoma.com/author/martin-thoma/ | grep 'Last-Modified\|Date'
Date: Fri, 06 Jan 2017 10:06:43 GMT
Last-Modified: Fri, 06 Jan 2017 07:42:34 GMT

The Date is important as this reports the server time, not your local time. Also, not every server sends Last-Modified (e.g. superuser seems not to do it).


There is another way to find the page update which could be useful for some occasions (if works:).

If the page has been indexed by Google, or by Wayback Machine you can try to find out what date(s) was(were) saved by them (these methods do not work for any page, and have some limitations, which are extensively investigated in this webmasters.stackexchange question's answers. But in many cases they can help you to find out the page update date(s):

  1. Google way: Go by link https://www.google.com.ua/search?q=site%3Awww.example.com&biw=1855&bih=916&source=lnt&tbs=cdr%3A1%2Ccd_min%3A1%2F1%2F2000%2Ccd_max%3A&tbm=
    • You can change text in search field by any page URL you want.
    • For example, the current stackoverflow question page search gives us as a result May 14, 2014 - which is the question creation date: enter image description here
  2. Wayback machine way: Go by link https://web.archive.org/web/*/www.example.com
    • for this stackoverflow page wayback machine gives us more results: Saved 6 times between June 7, 2014 and November 23, 2016., and you can view all saved copies for each date

Open your browsers console(?) and enter the following:

javascript:alert(document.lastModified)

For me it was the

article:modified_time

in the page source.

View Page Source


This is a Pythonic way to do it:

import httplib
import yaml
c = httplib.HTTPConnection(address)
c.request('GET', url_path)
r = c.getresponse()
# get the date into a datetime object
lmd = r.getheader('last-modified')
if lmd != None:
   cur_data = { url: datetime.strptime(lmd, '%a, %d %b %Y %H:%M:%S %Z') }
else:
   print "Hmmm, no last-modified data was returned from the URL."
   print "Returned header:"
   print yaml.dump(dict(r.getheaders()), default_flow_style=False)

The rest of the script includes an example of archiving a page and checking for changes against the new version, and alerting someone by email.