Can Javascript read the source of any web page?

Question

I am working on screen scraping and want to retrieve the source code of a particular page. How can I achieve this with JavaScript? Please help me.

User · Answer

Simple way to start: try jQuery.

    $("#links").load("/Main_Page #jq-p-Getting-Started li");

More at jQuery Docs.

Another way to do screen scraping in a much more structured way is to use YQL, or Yahoo Query Language. It will return the scraped data structured as JSON or XML.

E.g., let's scrape stackoverflow.com:

    select * from html where url="http://stackoverflow.com"

will give you a JSON array (I chose that option) like this:

    "results": {
      "body": {
        "noscript": [
          { "div": { "id": "noscript-padding" } },
          { "div": { "id": "noscript-warning", "p": "Stack Overflow works best with JavaScript enabled" } }
        ],
        "div": [
          { "id": "notify-container" },
          { "div": [
            { "id": "header",
              "div": [
                { "id": "hlogo",
                  "a": {
                    "href": "/",
                    "img": {
                      "alt": "logo homepage",
                      "height": "70",
                      "src": "http://i.stackoverflow.com/Content/Img/stackoverflow-logo-250.png",
                      "width": "250"
                    }
    ...

The beauty of this is that you can do projections and where clauses, which ultimately get you the scraped data structured, and only the data you need (much less bandwidth over the wire, ultimately). E.g.,

    select * from html where url="http://stackoverflow.com" and
          xpath='//div/h3/a'

will get you

    "results": {
      "a": [
        {
          "href": "/questions/414690/iphone-simulator-port-for-windows-closed",
          "title": "Duplicate: Is any Windows simulator available to test iPhone application? as a hobbyist who cannot afford a mac, i set up a toolchain kit locally on cygwin to compile objecti ...",
          "content": "iphone\n                simulator port for windows [closed]"
        },
        {
          "href": "/questions/680867/how-to-redirect-the-web-page-in-flex-application",
          "title": "I have a button control ... i need another web page to be redirected while clicking that button ... how to do that? Thanks",
          "content": "How\n                to redirect the web page in flex application"
        },
    ...

Now, to get only the questions, we do a

    select title from html where url="http://stackoverflow.com" and
          xpath='//div/h3/a'

Note the title in projections.

    "results": {
      "a": [
        { "title": "I don't want the function to be entered simultaneously by multiple threads, neither do I want it to be entered again when it has not returned yet. Is there any approach to achieve ..." },
        { "title": "I'm certain I'm doing something really obviously stupid, but I've been trying to figure it out for a few hours now and nothing is jumping out at me. I'm using a ModelForm so I can ..." },
        { "title": "when i am going through my project in IE only its showing errors A runtime error has occurred Do you wish to debug? Line 768 Error: Expected ... Is this is regarding any script er ..." },
        { "title": "I have a java batch file consisting of 4 execution steps written for analyzing any Java application. In one of the steps, I'm adding few libs in classpath that are needed for my co ..." },
    ...

Once you write your query, it generates a URL for you:

    http://query.yahooapis.com/v1/public/yql?q=select%20title%20from%20html%20where%20url%3D%22http%3A%2F%2Fstackoverflow.com%22%20and%0A%20%20%20%20%20%20xpath%3D'%2F%2Fdiv%2Fh3%2Fa'%0A%20%20%20%20&format=json&callback=cbfunc

in our case.

So ultimately you end up doing something like this:

    var titleList = $.getJSON(theAboveUrl);

and play with it.

Beautiful, isn't it?
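The generated URL in the answer above is simply the query text, percent-encoded and appended to the public endpoint. A minimal sketch of that encoding step follows. The helper name buildYqlUrl is made up for illustration; the endpoint and the q, format and callback parameters are copied from the URL shown in the answer, and the sketch makes no claim that the service is still reachable.

```javascript
// Hypothetical helper: percent-encode a YQL statement into a request URL.
// The endpoint and the q/format/callback parameters come from the URL in the answer above.
function buildYqlUrl(query, callbackName) {
  var endpoint = "http://query.yahooapis.com/v1/public/yql";
  var url = endpoint + "?q=" + encodeURIComponent(query) + "&format=json";
  if (callbackName) {
    // Only needed for JSONP-style responses.
    url += "&callback=" + encodeURIComponent(callbackName);
  }
  return url;
}

var query = 'select title from html where url="http://stackoverflow.com" and xpath=\'//div/h3/a\'';
console.log(buildYqlUrl(query, "cbfunc"));
```

Building the URL from the readable query keeps the projection and the XPath filter easy to change without hand-editing escape sequences.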

User · Answer

On Linux, download SlimerJS (slimerjs.org) and Firefox version 59, then add this environment variable (the path to the Firefox 59 binary):

    export SLIMERJSLAUNCHER=/home/.../firefox59/firefox/firefox

In the SlimerJS download directory, run this .js program with ./slimerjs program.js:

    var page = require('webpage').create();
    page.open('http://www.google.com/search?q=...', function () {
        page.render('goo2.pdf');
        phantom.exit();
    });

Use pdftotext to get the text on the page.

User · Answer

Using jQuery:

    <html>
    <head>
    <script src="http://jqueryjs.googlecode.com/files/jquery-1.3.2.js"></script>
    </head>
    <body>
    <script>
    $.get("www.google.com", function (response) { alert(response); });
    </script>
    </body>
    </html>

User · Answer

Despite many comments to the contrary, I believe that it is possible to overcome the same-origin requirement with simple JavaScript.

I am not claiming that the following is original, because I believe I saw something similar elsewhere a while ago. I have only tested this with Safari on a Mac.

The following demonstration fetches the page in the base tag and moves its innerHTML to a new window. My script adds html tags, but with most modern browsers this could be avoided by using outerHTML.

    <html>
    <head>
    <base href='http://apod.nasa.gov/apod/'>
    <title>test</title>
    <style>
    body { margin: 0 }
    textarea { outline: none; padding: 2em; width: 100%; height: 100% }
    </style>
    </head>
    <body onload="w=window.open('#'); x=document.getElementById('t'); a='<html>\n'; b='\n</html>'; setTimeout('x.innerHTML=a+w.document.documentElement.innerHTML+b; w.close()',2000)">
    <textarea id=t></textarea>
    </body>
    </html>

User · Answer

You can create an XMLHttpRequest, request the page, and then read the responseText property to get the content.
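As a rough sketch of that approach, the request can be wrapped in a Promise so the page source arrives asynchronously. The function name getPageSource and the injectable XhrCtor parameter are assumptions made for this illustration (the second parameter only exists so the function can be exercised outside a browser); in a page, the built-in XMLHttpRequest is used, and cross-origin targets are still subject to the browser's same-origin rules.

```javascript
// Sketch: fetch a page's source with an asynchronous XMLHttpRequest.
// XhrCtor is a hypothetical injection point; it defaults to the browser's XMLHttpRequest.
function getPageSource(url, XhrCtor) {
  var Ctor = XhrCtor || XMLHttpRequest;
  return new Promise(function (resolve, reject) {
    var xhr = new Ctor();
    xhr.open("GET", url, true); // true = asynchronous
    xhr.onload = function () {
      if (xhr.status >= 200 && xhr.status < 300) {
        resolve(xhr.responseText); // the page source as a string
      } else {
        reject(new Error("HTTP status " + xhr.status));
      }
    };
    xhr.onerror = function () {
      reject(new Error("Network error or blocked cross-origin request"));
    };
    xhr.send();
  });
}
```

In a browser, a call would look like getPageSource("/some/page.html").then(function (html) { console.log(html); }), where the path is a placeholder for a same-origin page.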

User · Answer

jQuery is not the way of doing things. Do it in pure JavaScript:

    var r = new XMLHttpRequest();
    r.open('GET', 'yahoo.com', false);
    r.send(null);
    if (r.status == 200) alert(r.responseText);

User · Answer

    <script>
    $.getJSON('http://www.whateverorigin.org/get?url=' + encodeURIComponent('https://example.com/') + '&callback=?', function (data) {
        alert(data.contents);
    });
    </script>

Include jQuery and use this code to get the HTML of another website. Replace example.com with your website.

This method involves an external server fetching the site's HTML and sending it to you.

User · Answer

If you absolutely need to use JavaScript, you could load the page source with an Ajax request.

Note that with JavaScript, you can only retrieve pages that are located under the same domain as the requesting page.
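To make the restriction concrete: browsers compare origins, meaning the scheme, host and port together, rather than the domain name alone. A small sketch of that comparison using the standard URL class is below; the function name isSameOrigin and the example.com URLs are placeholders chosen for illustration.

```javascript
// Sketch: check whether a target URL has the same origin
// (scheme + host + port) as the requesting page.
function isSameOrigin(pageUrl, targetUrl) {
  var page = new URL(pageUrl);
  // Relative targets are resolved against the page URL.
  var target = new URL(targetUrl, pageUrl);
  return page.origin === target.origin;
}

console.log(isSameOrigin("https://example.com/a.html", "/b.html"));              // true
console.log(isSameOrigin("https://example.com/a.html", "https://example.org/")); // false
console.log(isSameOrigin("https://example.com/a.html", "http://example.com/"));  // false (different scheme)
```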

User · Answer

You can use fetch:

    const URL = 'https://www.sap.com/belgique/index.html';
    fetch(URL)
      .then(res => res.text())
      .then(text => {
        console.log(text);
      })
      .catch(err => console.log(err));
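One detail worth adding to the fetch approach: the returned promise rejects only on network-level failures, so an HTTP error such as 404 still reaches the first then. A hedged variant that checks res.ok is sketched below. The name fetchPageSource and the optional fetchImpl parameter are assumptions for this illustration (the parameter only allows the function to be tried without a network); cross-origin requests from a page still depend on the target server allowing them.

```javascript
// Sketch: fetch a page's source and treat HTTP error statuses as failures.
// fetchImpl is a hypothetical injection point; it defaults to the global fetch.
async function fetchPageSource(url, fetchImpl) {
  var doFetch = fetchImpl || fetch;
  var res = await doFetch(url);
  if (!res.ok) {
    // res.ok is false for statuses outside the 200-299 range.
    throw new Error("Request failed with status " + res.status);
  }
  return res.text(); // resolves to the response body as a string
}
```

Usage mirrors the answer above: fetchPageSource(URL).then(text => console.log(text)).catch(err => console.log(err)).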

User · Answer

You can bypass the same-origin policy by either creating a browser extension or saving the file as .hta in Windows (HTML Application).

User · Answer

You could simply use XmlHttp (AJAX) to hit the required URL, and the HTML response from the URL will be available in the responseText property. If it's not the same domain, your users will receive a browser alert saying something like "This page is trying to access a different domain. Do you want to allow this?"

User · Answer

    javascript:alert("Inspect Element On");javascript:document.body.contentEditable = 'true'; document.designMode='on'; void 0;javascript:alert(document.documentElement.innerHTML);

Highlight this and drag it to your bookmarks bar, then click it when you want to edit and view the current site's source code.

User · Answer

You can use the FileReader API to get a file, and when selecting a file, put the URL of your web page into the selection box. Use this code:

    function readFile() {
        var f = document.getElementById("yourfileinput").files[0];
        if (f) {
            var r = new FileReader();
            r.onload = function (e) {
                alert(r.result);
            };
            r.readAsText(f);
        } else {
            alert("file could not be found");
        }
    }

User · Answer

JavaScript can be used, as long as you grab whatever page you're after via a proxy on your domain:

    <html>
    <head>
    <script src="/js/jquery-1.3.2.js"></script>
    </head>
    <body>
    <script>
    $.get("www.mydomain.com/?url=www.google.com", function (response) {
        alert(response);
    });
    </script>
    </body>
    </html>
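The answer leaves the proxy itself unspecified. As a sketch of one piece of it, the server side needs to read the url parameter and decide whether to fetch it. The helper below assumes a JavaScript (Node-style) server and a hypothetical /proxy?url=... endpoint; the name parseProxyTarget and the rule of accepting only absolute http(s) URLs are choices made for this illustration, not something the answer prescribes.

```javascript
// Hypothetical helper for a same-domain proxy endpoint such as /proxy?url=...
// It extracts the requested target and accepts only absolute http(s) URLs.
function parseProxyTarget(requestUrl) {
  // The base is only needed so a path-only request URL can be parsed.
  var parsed = new URL(requestUrl, "http://localhost");
  var target = parsed.searchParams.get("url");
  if (!target) return null;
  try {
    var targetUrl = new URL(target);
    if (targetUrl.protocol !== "http:" && targetUrl.protocol !== "https:") {
      return null; // refuse file:, javascript:, and similar schemes
    }
    return targetUrl.href;
  } catch (e) {
    return null; // not an absolute URL (for example, missing the scheme)
  }
}

console.log(parseProxyTarget("/proxy?url=" + encodeURIComponent("http://www.google.com/")));
```

The proxy would then fetch the returned URL server-side, where the same-origin policy does not apply, and send the body back to the page.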

User · Answer

As a security measure, JavaScript can't read files from different domains. Though there might be some strange workaround for it, I'd consider a different language for this task.

User · Answer

I used ImportIO. They let you request the HTML from any website if you set up an account with them (which is free). They let you make up to 50k requests per year. I didn't take the time to find an alternative, but I'm sure there are some.

In your JavaScript, you'll basically just make a GET request like this:

    var request = new XMLHttpRequest();

    request.onreadystatechange = function () {
      jsontext = request.responseText;

      alert(jsontext);
    };

    request.open("GET", "https://extraction.import.io/query/extractor/THE_PUBLIC_LINK_THEY_GIVE_YOU?apikey=YOUR_KEY&url=YOUR_URL", true);

    request.send();

Sidenote: I found this question while researching what I felt was the same question, so others might find my solution helpful.

UPDATE: I created a new one, which they allowed me to use for less than 48 hours before they said I had to pay for the service. It seems that they shut down your project pretty quickly now if you aren't paying. I made my own similar service with NodeJS and a library called NightmareJS. You can see their tutorial here and create your own web scraping tool. It's relatively easy. I haven't tried to set it up as an API that I could make requests to or anything.
