[wikipedia-api] How to get Wikipedia content using Wikipedia's API?

I want to get the first paragraph of a Wikipedia article.

What is the API query to do so?


See this section of the MediaWiki docs.

These are the key parameters.

prop=revisions&rvprop=content&rvsection=0

rvsection=0 specifies that only the lead section should be returned.

See this example.

http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&rvsection=0&titles=pizza

To get the HTML instead, you can similarly use action=parse: http://en.wikipedia.org/w/api.php?action=parse&section=0&prop=text&page=pizza

Note that you'll have to strip out any templates or infoboxes.
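For example, here is a minimal sketch using fetch (assuming a browser or Node 18+) that pulls the lead-section wikitext; format=json and origin=* (the CORS opt-in) are added on top of the parameters above:

const url = 'https://en.wikipedia.org/w/api.php' +
  '?action=query&prop=revisions&rvprop=content&rvsection=0' +
  '&titles=pizza&format=json&origin=*';

fetch(url)
  .then(res => res.json())
  .then(data => {
    // data.query.pages is keyed by page id, so take the first entry.
    const page = Object.values(data.query.pages)[0];
    console.log(page.revisions[0]['*']); // wikitext of the lead section
  });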


You can use the extract_html field of the summary REST endpoint for this: e.g. https://en.wikipedia.org/api/rest_v1/page/summary/Cat.

Note: this aims to simplify the content a bit by removing most of the pronunciations, which mainly appear in parentheses in some cases.
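A minimal fetch sketch of that endpoint; extract_html holds the intro as HTML and extract holds the same text as plain text:

fetch('https://en.wikipedia.org/api/rest_v1/page/summary/Cat')
  .then(res => res.json())
  .then(data => {
    console.log(data.extract_html); // intro paragraph as HTML
    console.log(data.extract);      // the same intro as plain text
  });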


$keyword = "Batman"; // Term you want to search

// Ask action=parse for the HTML of section 0 (the lead section).
$url = 'http://en.wikipedia.org/w/api.php?action=parse&page='.urlencode($keyword).'&format=json&prop=text&section=0';
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, 'Infeeds Sniper');
$c = curl_exec($ch);
curl_close($ch);
$json = json_decode($c);
if($json !== null && isset($json->{'parse'})){
   $title = $json->{'parse'}->{'title'};
   $content = $json->{'parse'}->{'text'}->{'*'};
   // Grab the first <p>...</p> of the lead section.
   $pattern = '#<p>(.*)</p>#Us';
   if(preg_match($pattern, $content, $matches)){
      if($matches[1] != ''){
         // Strip footnote markers like [1].
         $con = preg_replace('/\[[^\]]+\]/', '', $matches[1]);
         echo '<h2>'.$title.'</h2><p>'.strip_tags($con).'</p><p>Source: <a href="https://en.wikipedia.org/wiki/'.urlencode($keyword).'" target="_blank">Wikipedia</a></p>';
      }
   }
}

Wiki Summary Scraper with PHP

A wiki scraper gist to get the summary from either the Wikipedia or DBpedia API with PHP. Hope it helps.


I do it this way:

https://en.wikipedia.org/w/api.php?action=opensearch&search=bee&limit=1&format=json

The response you get is an array with the data, easy to parse:

[
  "bee",
  [
    "Bee"
  ],
  [
    "Bees are flying insects closely related to wasps and ants, known for their role in pollination and, in the case of the best-known bee species, the European honey bee, for producing honey and beeswax."
  ],
  [
    "https://en.wikipedia.org/wiki/Bee"
  ]
]

To get just the first result (with its summary), limit=1 is what you need.
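A minimal fetch sketch of parsing that response (origin=* is added so the call also works cross-origin from a browser):

const url = 'https://en.wikipedia.org/w/api.php' +
  '?action=opensearch&search=bee&limit=1&format=json&origin=*';

fetch(url)
  .then(res => res.json())
  .then(([term, titles, descriptions, urls]) => {
    // The response is [searchTerm, [titles], [descriptions], [urls]].
    console.log(titles[0]);       // "Bee"
    console.log(descriptions[0]); // the summary sentence
    console.log(urls[0]);         // link to the article
  });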


You can use jQuery to do that. First, create the URL with the appropriate parameters. Check this link to understand what the parameters mean. Then use the $.ajax() method to retrieve the article. Note that Wikipedia does not allow cross-origin requests; that is why we use dataType: 'jsonp' in the request.

var wikiURL = "https://en.wikipedia.org/w/api.php";
wikiURL += '?' + $.param({
    'action' : 'opensearch',
    'search' : 'your_search_term',
    'format' : 'json',
    'limit'  : 10 // number of search results to return
});

$.ajax({
    url: wikiURL,
    dataType: 'jsonp', // JSONP works around the same-origin policy
    success: function (data) {
        console.log(data);
    }
});

You can download the Wikipedia database directly and parse all pages to XML with Wiki Parser, which is a standalone application. The first paragraph is a separate node in the resulting XML.

Alternatively, you can extract the first paragraph from its plain-text output.
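As a sketch of that second approach (a Node.js example, assuming Wiki Parser has written an article's plain text to article.txt, a hypothetical filename), the first paragraph is simply the text up to the first blank line:

const fs = require('fs');

const text = fs.readFileSync('article.txt', 'utf8');
const firstParagraph = text.split(/\n\s*\n/)[0]; // text up to the first blank line
console.log(firstParagraph);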


<script>
    function dowiki(place) {
        // prop=extracts with exintro and explaintext returns the
        // plain-text introduction of the article.
        var URL = 'https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro=&explaintext=';
        URL += '&titles=' + encodeURIComponent(place);
        URL += '&callback=?'; // JSONP, to work around the same-origin policy
        $.getJSON(URL, function (data) {
            // data.query.pages is keyed by page id, so take the first key.
            var obj = data.query.pages;
            var ob = Object.keys(obj)[0];
            console.log(obj[ob]["extract"]);
            try {
                document.getElementById('Label11').textContent = obj[ob]["extract"];
            }
            catch (err) {
                document.getElementById('Label11').textContent = err.message;
            }
        });
    }
</script>

If you need to do this for a large number of articles, then instead of querying the website directly, consider downloading a Wikipedia database dump and then accessing it through an API such as JWPL.


To GET the first paragraph of an article:

https://en.wikipedia.org/w/api.php?action=query&titles=Belgrade&prop=extracts&format=json&exintro=1

I have created short Wikipedia API docs for my own needs. There are working examples of how to get articles, images, and the like.


See Is there a clean wikipedia API just for retrieve content summary? for other proposed solutions. Here is one that I suggested:

There is actually a very nice prop called extracts that can be used with queries designed specifically for this purpose. Extracts allow you to get article extracts (truncated article text). There is a parameter called exintro that can be used to retrieve only the text of the zeroth section (with no additional assets like images or infoboxes). You can also retrieve extracts at a finer granularity, such as by a certain number of characters (exchars) or by a certain number of sentences (exsentences).

Here is a sample query http://en.wikipedia.org/w/api.php?action=query&prop=extracts&format=json&exintro=&titles=Stack%20Overflow and the API sandbox http://en.wikipedia.org/wiki/Special:ApiSandbox#action=query&prop=extracts&format=json&exintro=&titles=Stack%20Overflow to experiment more with this query.

Please note that if you want the first paragraph specifically, you still need to extract the first <p> tag. However, in this API call there are no additional assets, such as images, to parse. If you are satisfied with this intro summary, you can retrieve the plain text by running a function like PHP's strip_tags to remove the HTML tags.
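For example, a minimal browser sketch that fetches the exintro HTML and pulls just the first <p> out of the extract field with DOMParser (origin=* is added for CORS):

const url = 'https://en.wikipedia.org/w/api.php' +
  '?action=query&prop=extracts&format=json&exintro=&origin=*' +
  '&titles=Stack%20Overflow';

fetch(url)
  .then(res => res.json())
  .then(data => {
    const page = Object.values(data.query.pages)[0];
    // page.extract is HTML here; parse it and keep only the first <p>.
    const doc = new DOMParser().parseFromString(page.extract, 'text/html');
    const first = doc.querySelector('p');
    console.log(first ? first.textContent : '');
  });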


You can get the introduction of an article on Wikipedia by querying pages such as https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro=&explaintext=&titles=java. You just need to parse the JSON; the result is plain text that has already been cleaned, including removing links and references.
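A minimal fetch sketch of that query (origin=* added for CORS); with explaintext set, the extract field is already plain text:

const url = 'https://en.wikipedia.org/w/api.php' +
  '?format=json&action=query&prop=extracts&exintro=&explaintext=' +
  '&titles=java&origin=*';

fetch(url)
  .then(res => res.json())
  .then(data => {
    const page = Object.values(data.query.pages)[0];
    console.log(page.extract); // the cleaned, plain-text introduction
  });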