API Documentation

This documentation provides detailed information on how to use the API for efficient document processing and text extraction.

Document Text Extraction API

The StructHub API endpoint for text extraction from documents:

POST https://api.structhub.io/extract

Request Parameters

Headers

  • API-KEY: Your subscription API key for authentication.

Form Data

  • file: Upload the document file. Ensure the correct file path is provided.
  • ocr: (Optional) Controls Optical Character Recognition (OCR). Set to “auto” (default), true, or false.
  • lang: (Optional) Language of the uploaded document. Languages are auto-detected, but you can set this parameter explicitly if the results are not optimal. For example, use “eng+fra” for documents containing both English and French text.
  • out_format: (Optional) Set the output format to text, xml, json, or html.

Example Curl Request

curl --location 'https://api.structhub.io/extract' \
--header 'API-KEY: YOUR_API_KEY' \
--form 'file=@"/path/to/your/document.docx"' \
--form 'ocr="auto"' \
--form 'lang="eng+fra"' \
--form 'out_format="text"'

Response

[
{
"page": 1,
"text": "<text output>"
},
...
]
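
The same request can be issued from code. The following is a minimal Python sketch using only the standard library. It assumes the endpoint accepts ordinary multipart/form-data, as produced by curl's --form option, and that the response is the JSON list of pages shown above. The helper names build_extract_request, send, and join_pages are illustrative and not part of the StructHub API.

```python
import json
import urllib.request
import uuid

EXTRACT_URL = "https://api.structhub.io/extract"


def build_extract_request(file_bytes, filename, api_key,
                          ocr="auto", lang=None, out_format="text"):
    """Build (but do not send) a multipart/form-data POST for /extract."""
    boundary = uuid.uuid4().hex
    fields = {"ocr": ocr, "out_format": out_format}
    if lang:  # lang is optional; omit it to rely on auto-detection
        fields["lang"] = lang

    parts = []
    for name, value in fields.items():
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode("utf-8")
        )
    # The document itself goes in the "file" form field.
    parts.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode("utf-8")
    )
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))

    return urllib.request.Request(
        EXTRACT_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            "API-KEY": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def send(request):
    """Send the request and decode the JSON body.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read().decode("utf-8"))


def join_pages(pages):
    """Concatenate per-page text in page order (assumes the 'page'/'text' keys shown above)."""
    ordered = sorted(pages, key=lambda p: p["page"])
    return "\n\n".join(p["text"] for p in ordered)
```

To use it, read the document with open(path, "rb").read(), pass the bytes to build_extract_request, then call send on the result and join_pages on the returned list.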

Request Parameters Details

| Parameter | Default Value | Required | Description |
| --- | --- | --- | --- |
| file | - | Yes | Use the file parameter to upload the document file. Only one file can be uploaded at a time. Ensure the correct file path is provided. |
| ocr | “auto” | No | By default, OCR is set to “auto”. The API detects if OCR is required (e.g., for scanned documents or documents with images). Set ocr to true or false as needed. Enabling OCR can significantly slow down processing. |
| lang | - | No | Use the lang parameter to explicitly set the language of the uploaded document. While the API detects some languages, explicitly setting “hin” can enhance processing for Hindi documents. |
| out_format | “text” | No | Set the out_format parameter to define the output format. Choose from text, xml, json, or html. Each page’s extracted data will be returned in the specified format. |

Knowledge Base As a Service API

The StructHub API endpoint for searching the knowledge base:

POST https://api.structhub.io/search

Request Parameters

Headers

  • API-KEY: Your subscription API key for authentication.

Request Body

  • q: Query string to search the knowledge base for.
  • topk: Number of top-ranked results to return.

Example Curl Request

curl --location 'https://api.structhub.io/search' \
--header 'API-KEY: YOUR_API_KEY' \
--data-raw '{"query":"dd","topk":10}'

Response

{
"count": 9,
"data": [
{
"source": "sample-pdf.pdf",
"page": 2.0,
"text": "matched document text chunk"
},
...
]
}
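
A search call can be sketched in Python in the same way. This is a minimal sketch using only the standard library. The JSON keys query and topk follow the curl example above (the parameter list names the query field q, so confirm which one your deployment expects). The Content-Type: application/json header is an assumption, since the curl example does not set one. The function names are illustrative.

```python
import json
import urllib.request

SEARCH_URL = "https://api.structhub.io/search"


def build_search_request(query, api_key, topk=10):
    """Build (but do not send) a JSON POST for /search."""
    # Key names mirror the curl example; they are not otherwise confirmed.
    payload = json.dumps({"query": query, "topk": topk}).encode("utf-8")
    return urllib.request.Request(
        SEARCH_URL,
        data=payload,
        method="POST",
        headers={
            "API-KEY": api_key,
            "Content-Type": "application/json",  # assumption, see lead-in
        },
    )


def summarize_hits(response):
    """Turn the 'data' list into (source, page, text) tuples.

    The sample response reports page as a float (2.0), so it is cast to int.
    """
    return [
        (hit["source"], int(hit["page"]), hit["text"])
        for hit in response.get("data", [])
    ]
```

The request can be sent with urllib.request.urlopen and the decoded JSON body passed to summarize_hits.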

Rate Limit

Each subscription comes with a per-minute rate limit. The rate limit is calculated within a moving 1-minute window. If the rate limit is exceeded, the API will respond with a 429 error. Ensure your application adheres to the rate limits to avoid disruptions.
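
One way to stay under a moving-window limit is to track request timestamps on the client side. The following Python sketch is one possible approach, not something the API provides. The max_calls value is hypothetical, because the actual limit depends on your subscription.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side limiter for a moving window (default 60 seconds)."""

    def __init__(self, max_calls, window_seconds=60.0, clock=time.monotonic):
        self.max_calls = max_calls      # hypothetical: use your plan's limit
        self.window = window_seconds
        self.clock = clock              # injectable for testing
        self.calls = deque()            # timestamps of recent requests

    def wait_time(self):
        """Seconds to wait before the next request is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have left the moving window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        # The oldest call must age out before another request fits.
        return self.window - (now - self.calls[0])

    def record(self):
        """Register that a request was just sent."""
        self.calls.append(self.clock())
```

Before each request, sleep for limiter.wait_time() seconds if it is non-zero, then call limiter.record() once the request is sent.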

Response Codes

  • 200 OK: Successful operation.
  • 401 Unauthorized: Invalid API key or no API key provided.
  • 429 Too Many Requests: Rate limit exceeded.

Remember to replace YOUR_API_KEY in the example Curl requests with your actual subscription API key.
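
The response codes in this section suggest a simple client-side policy: return on 200, fail immediately on 401, and retry on 429. The Python sketch below illustrates that policy. The exponential backoff schedule, the AuthError class, and the send callable (any function returning a (status, body) pair) are assumptions made for illustration, not documented API behavior.

```python
import time


class AuthError(Exception):
    """Raised when the API answers 401 (invalid or missing API key)."""


def call_with_retry(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it succeeds, backing off after each 429."""
    for attempt in range(max_attempts):
        status, body = send()
        if status == 200:
            return body
        if status == 401:
            # Retrying cannot fix a bad key, so fail right away.
            raise AuthError("invalid or missing API key")
        if status == 429 and attempt < max_attempts - 1:
            # Rate limit exceeded: wait 1x, 2x, 4x ... base_delay, then retry.
            sleep(base_delay * 2 ** attempt)
            continue
        raise RuntimeError(f"request failed with status {status}")
    raise RuntimeError("no attempts were made")
```

Because the limit uses a one-minute moving window, a longer base_delay may be needed in practice.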

For any questions or assistance, feel free to contact our support team.