[api] Is there a Wikipedia API just for retrieving a content summary?

I just need to retrieve the first paragraph of a Wikipedia page. The content must be HTML formatted, ready to be displayed on my website (so no BBCode or special Wikipedia markup!)

Tags: api, wikipedia, wikipedia-api

The answers are below.


Since 2017, Wikipedia has provided a REST API with better caching. In the documentation you can find the following endpoint, which perfectly fits your use case (it is used by the new Page Previews feature).

https://en.wikipedia.org/api/rest_v1/page/summary/Stack_Overflow returns the following data, which can be used to display a summary with a small thumbnail:

{
  "type": "standard",
  "title": "Stack Overflow",
  "displaytitle": "Stack Overflow",
  "extract": "Stack Overflow is a question and answer site for professional and enthusiast programmers. It is a privately held website, the flagship site of the Stack Exchange Network, created in 2008 by Jeff Atwood and Joel Spolsky. It features questions and answers on a wide range of topics in computer programming. It was created to be a more open alternative to earlier question and answer sites such as Experts-Exchange. The name for the website was chosen by voting in April 2008 by readers of Coding Horror, Atwood's popular programming blog.",
  "extract_html": "<p><b>Stack Overflow</b> is a question and answer site for professional and enthusiast programmers. It is a privately held website, the flagship site of the Stack Exchange Network, created in 2008 by Jeff Atwood and Joel Spolsky. It features questions and answers on a wide range of topics in computer programming. It was created to be a more open alternative to earlier question and answer sites such as Experts-Exchange. The name for the website was chosen by voting in April 2008 by readers of <i>Coding Horror</i>, Atwood's popular programming blog.</p>",
  "namespace": {
    "id": 0,
    "text": ""
  },
  "wikibase_item": "Q549037",
  "titles": {
    "canonical": "Stack_Overflow",
    "normalized": "Stack Overflow",
    "display": "Stack Overflow"
  },
  "pageid": 21721040,
  "thumbnail": {
    "source": "https://upload.wikimedia.org/wikipedia/en/thumb/f/fa/Stack_Overflow_homepage%2C_Feb_2017.png/320px-Stack_Overflow_homepage%2C_Feb_2017.png",
    "width": 320,
    "height": 149
  },
  "originalimage": {
    "source": "https://upload.wikimedia.org/wikipedia/en/f/fa/Stack_Overflow_homepage%2C_Feb_2017.png",
    "width": 462,
    "height": 215
  },
  "lang": "en",
  "dir": "ltr",
  "revision": "902900099",
  "tid": "1a9cdbc0-949b-11e9-bf92-7cc0de1b4f72",
  "timestamp": "2019-06-22T03:09:01Z",
  "description": "website hosting questions and answers on a wide range of topics in computer programming",
  "content_urls": {
    "desktop": {
      "page": "https://en.wikipedia.org/wiki/Stack_Overflow",
      "revisions": "https://en.wikipedia.org/wiki/Stack_Overflow?action=history",
      "edit": "https://en.wikipedia.org/wiki/Stack_Overflow?action=edit",
      "talk": "https://en.wikipedia.org/wiki/Talk:Stack_Overflow"
    },
    "mobile": {
      "page": "https://en.m.wikipedia.org/wiki/Stack_Overflow",
      "revisions": "https://en.m.wikipedia.org/wiki/Special:History/Stack_Overflow",
      "edit": "https://en.m.wikipedia.org/wiki/Stack_Overflow?action=edit",
      "talk": "https://en.m.wikipedia.org/wiki/Talk:Stack_Overflow"
    }
  },
  "api_urls": {
    "summary": "https://en.wikipedia.org/api/rest_v1/page/summary/Stack_Overflow",
    "metadata": "https://en.wikipedia.org/api/rest_v1/page/metadata/Stack_Overflow",
    "references": "https://en.wikipedia.org/api/rest_v1/page/references/Stack_Overflow",
    "media": "https://en.wikipedia.org/api/rest_v1/page/media/Stack_Overflow",
    "edit_html": "https://en.wikipedia.org/api/rest_v1/page/html/Stack_Overflow",
    "talk_page_html": "https://en.wikipedia.org/api/rest_v1/page/html/Talk:Stack_Overflow"
  }
}

By default, the API follows redirects (so that /api/rest_v1/page/summary/StackOverflow also works), but this can be disabled with ?redirect=false.

If you need to access the API from another domain, you can set the CORS header with &origin= (e.g. &origin=*).

Update 2019: The API seems to return more useful information about the page.
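
For example, here is a minimal client-side sketch using fetch that renders the extract_html field shown in the response above (the element id "summary" is just a hypothetical placeholder for wherever you want the output to go):

// Minimal sketch: fetch the REST summary and render the ready-made HTML.
// Assumes a browser with fetch(); "summary" is a hypothetical element id.
var title = "Stack_Overflow";

fetch("https://en.wikipedia.org/api/rest_v1/page/summary/" + encodeURIComponent(title))
    .then(function (response) {
        if (!response.ok) throw new Error("HTTP " + response.status);
        return response.json();
    })
    .then(function (summary) {
        // extract_html is already clean HTML, ready to be displayed
        document.getElementById("summary").innerHTML = summary.extract_html;
    })
    .catch(function (err) {
        console.error("Summary request failed:", err);
    });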


This code allows you to retrieve the content of the first paragraph of the page in plain text.

Parts of this answer are adapted from other answers. See the MediaWiki API documentation for more information.

// action=parse: get parsed text
// page=Baseball: from the page Baseball
// format=json: in JSON format
// prop=text: return the text content of the article
// section=0: only the top (lead) section of the page

$url = 'https://en.wikipedia.org/w/api.php?format=json&action=parse&page=Baseball&prop=text&section=0';
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, "TestScript"); // required by the wikipedia.org servers; use YOUR user agent with YOUR contact information (otherwise your IP might get blocked)
$c = curl_exec($ch);
curl_close($ch);

$json = json_decode($c);

$content = $json->{'parse'}->{'text'}->{'*'}; // the main text content of the query (parsed HTML)

// pattern for the first match of a paragraph
$pattern = '#<p>(.*)</p>#Us'; // http://www.phpbuilder.com/board/showthread.php?t=10352690
if (preg_match($pattern, $content, $matches)) {
    // print $matches[0]; // content of the first paragraph (including the wrapping <p> tag)
    print strip_tags($matches[1]); // content of the first paragraph without the HTML tags
}

This URL will return the summary in XML format.

http://lookup.dbpedia.org/api/search.asmx/KeywordSearch?QueryString=Agra&MaxHits=1

I have created a function to fetch the description of a keyword from Wikipedia (via the DBpedia lookup service).

function getDescription($keyword){
    $url = 'http://lookup.dbpedia.org/api/search.asmx/KeywordSearch?QueryString=' . urlencode($keyword) . '&MaxHits=1';
    $xml = simplexml_load_file($url);
    if ($xml === false) {
        return ''; // request or parse failure
    }
    return (string) $xml->Result->Description;
}
echo getDescription('agra');

This is the code I'm using right now for a website I'm making that needs to get the leading paragraphs / summary / section 0 of Wikipedia articles, and it's all done within the browser (client-side JavaScript) thanks to the magic of JSONP: http://jsfiddle.net/gautamadude/HMJJg/1/

It uses the Wikipedia API to get the leading paragraphs (called section 0) as HTML, like so: http://en.wikipedia.org/w/api.php?format=json&action=parse&page=Stack_Overflow&prop=text&section=0&callback=?

It then strips the HTML and other undesired data, giving you a clean string of the article summary. With a little tweaking you could keep a "p" HTML tag around each leading paragraph, but right now there is just a newline character between them.

Code:

var url = "https://en.wikipedia.org/wiki/Stack_Overflow";
var title = url.split("/").slice(4).join("/");

//Get leading paragraphs (section 0)
$.getJSON("https://en.wikipedia.org/w/api.php?format=json&action=parse&page=" + title + "&prop=text&section=0&callback=?", function (data) {
    for (var key in data.parse.text) {
        var text = data.parse.text[key].split("<p>");
        var pText = "";

        for (var p in text) {
            //Remove HTML comments
            text[p] = text[p].split("<!--");
            if (text[p].length > 1) {
                text[p][0] = text[p][0].split(/\r\n|\r|\n/);
                text[p][0] = text[p][0][0];
                text[p][0] += "</p> ";
            }
            text[p] = text[p][0];

            //Construct a string from the paragraphs
            if (text[p].indexOf("</p>") == text[p].length - 5) {
                var htmlStrip = text[p].replace(/<(?:.|\n)*?>/gm, ''); //Remove HTML tags
                var splitNewline = htmlStrip.split(/\r\n|\r|\n/); //Split on newlines
                for (var newline in splitNewline) {
                    if (splitNewline[newline].substring(0, 11) != "Cite error:") {
                        pText += splitNewline[newline];
                        pText += "\n";
                    }
                }
            }
        }
        pText = pText.substring(0, pText.length - 2); //Remove the trailing space and newline
        pText = pText.replace(/\[\d+\]/g, ""); //Remove reference tags (e.g. [1], [4], etc.)
        document.getElementById('textarea').value = pText;
        document.getElementById('div_text').textContent = pText;
    }
});

My approach was as follows (in PHP):

$term = "whatever_you_need"; // the search term

$html = file_get_contents('https://en.wikipedia.org/w/api.php?action=opensearch&search=' . urlencode($term));
$utf8html = html_entity_decode(preg_replace("/U\+([0-9A-F]{4})/", "&#x\\1;", $html), ENT_NOQUOTES, 'UTF-8');

$utf8html might need further cleaning, but that's basically it.


The abstract.xml.gz dump (from the Wikipedia database dumps) sounds like the one you want.


Yes, there is. For example, if you want to get the content of the first section of the article Stack Overflow, use a query like this:

http://en.wikipedia.org/w/api.php?format=xml&action=query&prop=revisions&titles=Stack%20Overflow&rvprop=content&rvsection=0&rvparse

The parts mean this:

  • format=xml: Return the result formatted as XML. Other options (like JSON) are available. This does not affect the format of the page content itself, only the enclosing data format.

  • action=query&prop=revisions: Get information about the revisions of the page. Since we don't specify which revision, the latest one is used.

  • titles=Stack%20Overflow: Get information about the page Stack Overflow. It's possible to get the text of more pages in one go, if you separate their names by |.

  • rvprop=content: Return the content (or text) of the revision.

  • rvsection=0: Return only content from section 0.

  • rvparse: Return the content parsed as HTML.

Keep in mind that this returns the whole first section, including things like hatnotes (“For other uses …”), infoboxes, or images.

There are several libraries available for various languages that make working with the API easier; it may be better for you to use one of them.
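
If you want to call it directly from the browser instead, here is a rough sketch assuming format=json rather than the xml shown above. It relies on the legacy JSON layout, where the pages object is keyed by page id and the parsed HTML sits under revisions[0]["*"]; origin=* is needed for cross-domain requests:

// Sketch: fetch the parsed HTML of section 0 via prop=revisions.
// Assumes format=json and the legacy response layout
// (query.pages -> <pageid> -> revisions[0]["*"]).
var api = "https://en.wikipedia.org/w/api.php" +
          "?format=json&origin=*&action=query&prop=revisions" +
          "&titles=Stack%20Overflow&rvprop=content&rvsection=0&rvparse";

fetch(api)
    .then(function (response) { return response.json(); })
    .then(function (data) {
        var pages = data.query.pages;
        var firstPageId = Object.keys(pages)[0]; // only one title was requested
        var html = pages[firstPageId].revisions[0]["*"]; // parsed HTML of section 0
        console.log(html);
    });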


There is actually a very nice prop called extracts that can be used with queries designed specifically for this purpose. Extracts allow you to get article extracts (truncated article text). There is a parameter called exintro that can be used to retrieve the text of the zeroth section (with no additional assets like images or infoboxes). You can also retrieve extracts at finer granularity, such as by a certain number of characters (exchars) or by a certain number of sentences (exsentences).

Here is a sample query: http://en.wikipedia.org/w/api.php?action=query&prop=extracts&format=json&exintro=&titles=Stack%20Overflow and the API sandbox: http://en.wikipedia.org/wiki/Special:ApiSandbox#action=query&prop=extracts&format=json&exintro=&titles=Stack%20Overflow to experiment more with this query.

Please note that if you want the first paragraph specifically, you still need to do some additional parsing, as suggested in the chosen answer. The difference here is that the response returned by this query is shorter than with some of the other API queries suggested, because there are no additional assets such as images in the API response to parse.
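
A sketch of that additional step, assuming explaintext= is added so the extract comes back as plain text (splitting on the first newline as a crude first-paragraph cut is an assumption about how the extract separates paragraphs):

// Sketch: get the intro extract as plain text via prop=extracts.
// explaintext= is added here so no HTML parsing is needed.
var api = "https://en.wikipedia.org/w/api.php" +
          "?action=query&prop=extracts&format=json&origin=*" +
          "&exintro=&explaintext=&titles=Stack%20Overflow";

fetch(api)
    .then(function (response) { return response.json(); })
    .then(function (data) {
        var pages = data.query.pages;
        var extract = pages[Object.keys(pages)[0]].extract;
        var firstParagraph = extract.split("\n")[0]; // crude first-paragraph cut
        console.log(firstParagraph);
    });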


If you are just looking for the text, which you can then split up yourself, but don't want to use the API, take a look at en.wikipedia.org/w/index.php?title=Elephant&action=raw (note that this returns raw wikitext, not HTML).


You can also get content such as the first paragraph via DBpedia, which takes Wikipedia content and creates structured information from it (RDF) and makes this available via an API. The DBpedia API is a SPARQL one (RDF-based), but it outputs JSON and is pretty easy to wrap.

As an example, here's a super simple JS library named WikipediaJS that can extract structured content, including a summary first paragraph: http://okfnlabs.org/wikipediajs/

You can read more about it in this blog post: http://okfnlabs.org/blog/2012/09/10/wikipediajs-a-javascript-library-for-accessing-wikipedia-article-information.html

The JS library code can be found here: https://github.com/okfn/wikipediajs/blob/master/wikipedia.js
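
If you'd rather skip the library, here is a hedged sketch of querying DBpedia's public SPARQL endpoint directly for an article's abstract. The dbo:abstract property and the endpoint URL follow DBpedia's conventions, but the public endpoint's availability and exact output may vary:

// Sketch: ask DBpedia's SPARQL endpoint for the English abstract of a
// resource. The JSON results layout (results.bindings) is the standard
// SPARQL results format.
var query = [
    "SELECT ?abstract WHERE {",
    "  <http://dbpedia.org/resource/Stack_Overflow> <http://dbpedia.org/ontology/abstract> ?abstract .",
    "  FILTER (lang(?abstract) = 'en')",
    "} LIMIT 1"
].join("\n");

var endpoint = "https://dbpedia.org/sparql?format=application%2Fsparql-results%2Bjson&query=" +
               encodeURIComponent(query);

fetch(endpoint)
    .then(function (response) { return response.json(); })
    .then(function (data) {
        var bindings = data.results.bindings;
        if (bindings.length > 0) console.log(bindings[0].abstract.value);
    });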


I tried @Michael Rapadas' and @Krinkle's solutions, but in my case I had trouble finding some articles depending on the capitalization. For example, here:

https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro=&exsentences=1&explaintext=&titles=Led%20zeppelin

Note that I truncated the response with exsentences=1.

Apparently "title normalization" was not working correctly:

Title normalization converts page titles to their canonical form. This means capitalizing the first character, replacing underscores with spaces, and changing namespace to the localized form defined for that wiki. Title normalization is done automatically, regardless of which query modules are used. However, any trailing line breaks in page titles (\n) will cause odd behavior and they should be stripped out first.

I know I could have sorted out the capitalization issue easily, but there was also the inconvenience of having to cast the object to an array.

So, because I just really wanted the very first paragraph of a well-known and well-defined search (no risk of fetching info from other articles), I did it like this:

https://en.wikipedia.org/w/api.php?action=opensearch&search=led%20zeppelin&limit=1&format=json

Note that in this case I did the truncation with limit=1.

This way:

  1. I can access the response data very easily.
  2. The response is quite small.

But we still have to be careful with the capitalization of our search.
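
For reference, the opensearch response is a four-element array: the query string, matched titles, descriptions, and URLs. A small sketch of reading it (note that some wikis no longer populate the descriptions element, so check it before relying on it):

// Sketch: opensearch returns [query, [titles], [descriptions], [urls]].
// With limit=1 each inner array has at most one entry.
var api = "https://en.wikipedia.org/w/api.php" +
          "?action=opensearch&search=led%20zeppelin&limit=1&format=json&origin=*";

fetch(api)
    .then(function (response) { return response.json(); })
    .then(function (data) {
        var title = data[1][0];
        var description = data[2][0]; // may be undefined/empty on some wikis
        console.log(title, description);
    });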

More info: https://www.mediawiki.org/wiki/API:Opensearch