Parser API

Access the web's most powerful content parser through our simple Parser API.

Note: The Parser API is freely available on a limited basis. If you'd like to use the Parser API for commercial use, get in touch with us at licensing@readability.com to learn about our licensing options.

Authentication

Requests to the Parser API are not signed like an OAuth request. The Parser token is simply passed as a POST or GET parameter, depending on the request type. Be careful not to reveal this token: requests to the Parser API should not be made directly from the client device, but proxied through your own server to keep the API token secure.

Quick Start

Here's how to pull an article's content from the Readability Parser API:

Request

GET /api/content/v1/parser?url=http://blog.readability.com/2011/02/step-up-be-heard-readability-ideas/&token=1b830931777ac7c2ac954e9f0d67df437175e66e

Response

HTTP/1.0 200 OK
{
    "content": "<div class=\"article-text\">\n<p>I'm idling outside Diamante's, [snip] ...</p></div>",
    "domain": "www.gq.com",
    "author": "Rafi Kohan",
    "url": "http://www.gq.com/sports/profiles/201202/david-diamante-interview-cigar-lounge-brooklyn-new-jersey-nets?currentPage=all",
    "short_url": "http://rdd.me/g3jcb1sr",
    "title": "Blowing Smoke with Boxing's Big Voice",
    "excerpt": "I'm idling outside Diamante's, a cigar lounge in Fort Greene, waiting for David Diamante, and soon I smell him coming. It's late January but warm. A motorcycle growls down the Brooklyn side street,&hellip;",
    "direction": "ltr",
    "word_count": 2892,
    "total_pages": 1,
    "date_published": null,
    "dek": "Announcer <strong>David Diamante</strong>, the new voice of the New Jersey (soon Brooklyn) Nets, has been calling boxing matches for years. On the side, he owns a cigar lounge in the heart of Brooklyn. We talk with Diamante about his new gig and the fine art of cigars",
    "lead_image_url": "http://www.gq.com/images/entertainment/2012/02/david-diamante/diamante-628.jpg",
    "next_page_id": null,
    "rendered_pages": 1
}
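
The request above could be issued from Python roughly as follows. This is a minimal sketch using only the standard library. The host in BASE_URL is an assumption (this page gives only the path), and build_parser_url and fetch_article are illustrative names, not part of the API.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the documentation shows only the path, so this host is a guess.
BASE_URL = "https://www.readability.com"
PARSER_PATH = "/api/content/v1/parser"


def build_parser_url(article_url: str, token: str, base_url: str = BASE_URL) -> str:
    """Return the GET URL for the parser, with url and token percent-encoded."""
    query = urllib.parse.urlencode({"url": article_url, "token": token})
    return f"{base_url}{PARSER_PATH}?{query}"


def fetch_article(article_url: str, token: str) -> dict:
    """Fetch and decode the JSON article representation (makes a network call)."""
    with urllib.request.urlopen(build_parser_url(article_url, token)) as response:
        return json.loads(response.read().decode("utf-8"))
```

The returned dictionary would carry the fields shown in the response above, such as "title", "content" and "word_count". Because the token travels in the query string, a call like this belongs on a server rather than in client code.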

Data Formats

All responses are, by default, returned as JSON. You may also pass "?format=xml" in the URL to receive the response as XML instead.
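
One way to switch an existing request URL to XML is to rewrite its query string, as in this small sketch. The helper name with_format is hypothetical; only the "format" parameter itself comes from this page.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_format(request_url: str, fmt: str = "xml") -> str:
    """Return request_url with its format query parameter set to fmt."""
    parts = urlsplit(request_url)
    # Drop any existing format parameter so it is not sent twice.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "format"
    ]
    params.append(("format", fmt))
    return urlunsplit(parts._replace(query=urlencode(params)))
```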

Resources, Representations & Errors

Resources

/

Methods

GET

Retrieve the base API URI - information about subresources.
request header parameters

parameter        value         description
Authorization    (required)

available response representations:

/parser?token&url&id&max_pages

Methods

GET

Parse an article
request query parameters

parameter    value                description
token        string (required)
url          string               The URL of an article to return the content for.
id           string               The ID of an article to return the content for.
max_pages    integer              The maximum number of pages to parse and combine. Default is 25.

available response representations:

potential faults:
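
The query parameters for GET /parser can be assembled and checked before sending, as sketched here. Two assumptions go beyond this page: that an article must be identified by either url or id (neither is marked required), and that max_pages must be at least 1. The function name parser_query is illustrative.

```python
from typing import Optional
from urllib.parse import urlencode


def parser_query(token: str, url: Optional[str] = None,
                 article_id: Optional[str] = None,
                 max_pages: Optional[int] = None) -> str:
    """Build the query string for GET /parser from the documented parameters."""
    if not token:
        raise ValueError("token is required")
    # Assumption: one of url or id is needed to identify the article.
    if url is None and article_id is None:
        raise ValueError("pass url or article_id")
    params = {"token": token}
    if url is not None:
        params["url"] = url
    if article_id is not None:
        params["id"] = article_id
    if max_pages is not None:
        # Assumption: a page limit below 1 makes no sense; the docs only give the default (25).
        if max_pages < 1:
            raise ValueError("max_pages must be a positive integer")
        params["max_pages"] = str(max_pages)
    return urlencode(params)
```

Leaving max_pages unset omits the parameter, so the server-side default applies.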

HEAD

Retrieve the Content Status of an article.

request query parameters

parameter    value                description
token        string (required)
url          string               The URL of an article to check.
id           string               The ID of an article to check.
response header parameters

parameter           value     description
X-Article-Id        string    The ID of the article within Readability.
X-Article-Status    string    The status of the content in Readability. One of:

INVALID
We were unable to parse this URL for some reason. Recommendation: Fail.
UNRETRIEVED
We know of this article, but have not yet retrieved its content, or the cache has expired.
PROVIDED_BY_USER
We have retrieved the content for this URL from at least one user.
VALIDATED_BY_USERS
We have retrieved the content for this URL from multiple users, and have validated it. Recommendation: GET the content from us.
FETCHED
We fetched the content for this URL manually, and it has been cached. Recommendation: GET the content from us.

potential faults:
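
A client might use the HEAD status to decide whether a full GET is worthwhile. The sketch below maps each X-Article-Status value to the recommendation listed for it. Statuses with no listed recommendation map to None, since this page does not say what to do for them. The names recommended_action and head_status are illustrative.

```python
import urllib.request
from typing import Optional, Tuple

# Recommendations as listed for X-Article-Status; None means none is documented.
RECOMMENDATIONS = {
    "INVALID": "fail",
    "UNRETRIEVED": None,
    "PROVIDED_BY_USER": None,
    "VALIDATED_BY_USERS": "get",
    "FETCHED": "get",
}


def recommended_action(status: str) -> Optional[str]:
    """Return "fail", "get", or None for a documented X-Article-Status value."""
    if status not in RECOMMENDATIONS:
        raise ValueError(f"unknown X-Article-Status: {status!r}")
    return RECOMMENDATIONS[status]


def head_status(request_url: str) -> Tuple[Optional[str], Optional[str]]:
    """Send a HEAD request and return (X-Article-Id, X-Article-Status)."""
    request = urllib.request.Request(request_url, method="HEAD")
    with urllib.request.urlopen(request) as response:  # network call
        headers = response.headers
        return headers.get("X-Article-Id"), headers.get("X-Article-Status")
```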

/confidence?url&callback

Methods

GET

Detect the confidence with which Readability could parse a given URL. Does not require a token.
request query parameters

parameter    value                description
url          string (required)    The URL of an article to return the confidence for.
callback     string               The JSONP callback function name.

available response representations:

potential faults:
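
A confidence check could be sketched as below. The full endpoint URL is an assumption (this page lists only "/confidence" next to "/parser"), and the 0.5 threshold is an arbitrary illustration, since no cut-off is documented. Both function names are hypothetical.

```python
import urllib.parse
from typing import Optional

# Assumption: /confidence lives beside /parser under the same (guessed) host and prefix.
CONFIDENCE_ENDPOINT = "https://www.readability.com/api/content/v1/confidence"


def confidence_url(article_url: str, callback: Optional[str] = None,
                   endpoint: str = CONFIDENCE_ENDPOINT) -> str:
    """Build the GET URL for the confidence resource; no token is needed."""
    params = {"url": article_url}
    if callback is not None:
        params["callback"] = callback  # ask for a JSONP response
    return f"{endpoint}?{urllib.parse.urlencode(params)}"


def is_probably_parseable(payload: dict, threshold: float = 0.5) -> bool:
    """Compare the reported confidence with a caller-chosen threshold."""
    return float(payload["confidence"]) >= threshold
```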

Representations

Example root representation. (application/json)

{
    "resources": {
        "parser": {
            "description": "The Content Parser Resource",
            "href": "/api/content/v1/parser"
        }
    }
}
            

Example article representation. (application/json)

{
    "content": "<div class=\"article-text\">\n<p>I'm idling outside Diamante's, [snip] ...</p></div>",
    "domain": "www.gq.com",
    "author": "Rafi Kohan",
    "url": "http://www.gq.com/sports/profiles/201202/david-diamante-interview-cigar-lounge-brooklyn-new-jersey-nets?currentPage=all",
    "short_url": "http://rdd.me/g3jcb1sr",
    "title": "Blowing Smoke with Boxing's Big Voice",
    "excerpt": "I'm idling outside Diamante's, a cigar lounge in Fort Greene, waiting for David Diamante, and soon I smell him coming. It's late January but warm. A motorcycle growls down the Brooklyn side street,&hellip;",
    "direction": "ltr",
    "word_count": 2892,
    "total_pages": 1,
    "date_published": null,
    "dek": "Announcer <strong>David Diamante</strong>, the new voice of the New Jersey (soon Brooklyn) Nets, has been calling boxing matches for years. On the side, he owns a cigar lounge in the heart of Brooklyn. We talk with Diamante about his new gig and the fine art of cigars",
    "lead_image_url": "http://www.gq.com/images/entertainment/2012/02/david-diamante/diamante-628.jpg",
    "next_page_id": null,
    "rendered_pages": 1
}

Example confidence representation. (application/json)

{
    "url": "http://www.gq.com/article/12",
    "confidence": 0.7
}

Example confidence representation as jsonp. (application/json)

callback({
    "url": "http://www.gq.com/article/12",
    "confidence": 0.7
});
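
JSONP is normally consumed by a script tag in the browser, but a non-browser client can also strip the wrapper and decode the payload. The following is a minimal sketch, assuming the text inside the parentheses is strict JSON (which requires a leading zero on numbers such as 0.7). The helper name unwrap_jsonp is hypothetical.

```python
import json
import re
from typing import Any, Tuple

# Matches name( ... ) with an optional trailing semicolon.
_JSONP = re.compile(r"^\s*([A-Za-z_$][\w$.]*)\s*\((.*)\)\s*;?\s*$", re.DOTALL)


def unwrap_jsonp(body: str) -> Tuple[str, Any]:
    """Split a JSONP body into (callback name, decoded JSON payload)."""
    match = _JSONP.match(body)
    if match is None:
        raise ValueError("body is not a JSONP call")
    return match.group(1), json.loads(match.group(2))
```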

Errors

400 Bad Request (application/json)

The server could not understand your request. Verify that request parameters (and content, if any) are valid.

401 Authorization Required (application/json)

Authentication failed or was not provided. Verify that you have sent valid ixDirectory credentials via HTTP Basic.

A 'Www-Authenticate' challenge header will be sent with this type of error response.

500 Internal Server Error (application/json)

An unknown error has occurred.

404 Not Found (application/json)

The resource that you requested does not exist.
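
Client code can translate these status codes into readable messages. The sketch below pairs each documented code with a short summary of the text above; describe_error and explain are illustrative names, and any other code is reported as unexpected.

```python
import urllib.error

# Short summaries of the error descriptions listed above.
ERROR_MEANINGS = {
    400: "Bad Request: verify that the request parameters are valid.",
    401: "Authorization Required: authentication failed or was not provided.",
    404: "Not Found: the requested resource does not exist.",
    500: "Internal Server Error: an unknown error has occurred.",
}


def describe_error(status_code: int) -> str:
    """Return a readable message for a Parser API HTTP status code."""
    return ERROR_MEANINGS.get(status_code, f"Unexpected HTTP status {status_code}")


def explain(error: urllib.error.HTTPError) -> str:
    """Describe an HTTPError raised by urllib.request.urlopen."""
    return describe_error(error.code)
```

With urllib, a failed request raises HTTPError, so explain would typically be called from an except block around the request.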