Purchase Order API: Extract Purchase Order Data to JSON

Send a purchase order PDF, scan, or photo to one REST endpoint and get back clean JSON: PO number, supplier, dates, terms, and the full line-item table. No template per supplier, no manual keying. Upload a real PO below to see the exact structured output the API returns.

Supported file formats: PDF, JPG, PNG, BMP, HEIC, TIFF

  • REST API, JSON response
  • PO number, supplier, and line items
  • Reads PDF, scanned, and photo POs
  • Free to try

Purchase Order Data Is Locked in Documents Your Code Cannot Read

A purchase order PDF is a picture of a table, not data. Your application cannot post a supplier order to an ERP, run a two-way or three-way match, or update inventory until every field is structured. Building that extraction in-house means wiring up OCR, layout parsing, and per-supplier rules, then maintaining them as vendors change their templates.

Raw OCR Is Not Structured Data

Open-source OCR returns a wall of text with no field labels. You still have to figure out which number is the PO number, which block is the supplier, and where each line item starts. That parsing layer is the hard part, and it breaks on the next new layout.
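To make the problem concrete, here is a minimal sketch of the parsing layer you end up writing on top of raw OCR. The sample text and both helper functions are invented for illustration; they are not output from any particular OCR engine.

```python
import re
from typing import Optional

# Made-up text of the kind a generic OCR engine might return for a PO.
RAW_OCR_TEXT = """ACME SUPPLY CO  Tel 555 0134
Purchase Order  Date 2024-03-02
Ref 88120   PO No. 4500017
10 EA  Widget A  4.50  45.00
"""


def naive_po_number(text: str) -> Optional[str]:
    """Guess the PO number as the first run of 5+ digits (brittle on purpose)."""
    match = re.search(r"\b\d{5,}\b", text)
    return match.group(0) if match else None


def labelled_po_number(text: str) -> Optional[str]:
    """Look for digits that follow an explicit 'PO No.' label."""
    match = re.search(r"PO\s*No\.?\s*(\d+)", text, flags=re.IGNORECASE)
    return match.group(1) if match else None


print(naive_po_number(RAW_OCR_TEXT))     # picks the supplier's reference number
print(labelled_po_number(RAW_OCR_TEXT))  # picks the number after the label
```

The labelled version works on this layout only. A supplier that prints "Order #" or puts the number on the line below the label needs another rule, which is the maintenance cost described above.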

Every Supplier Sends a Different Format

Template-based extraction needs a rule set per vendor. The moment a supplier moves a field, or a new supplier arrives, the template fails and a PO drops to manual review. Maintaining hundreds of templates is a permanent engineering cost.

Line-Item Tables Are the Hardest Part

Header fields are one thing; a multi-page line-item table with wrapped descriptions, merged cells, and per-row quantities and prices is where most parsers fail. Missing a row throws off matching and spend totals downstream.

Scans and Photos Have No Text Layer

A faxed or photographed PO is pure pixels. A PDF text parser returns nothing on it, so any solution that skips real OCR silently drops the messiest documents, which are exactly the ones a team wants automated.

Teams reach for a purchase order API when the volume of inbound supplier orders passes the point where a person can key them, and the data has to land in another system programmatically. The requirement is almost always the same: submit a file, receive validated JSON that maps to an ERP or database schema, and never build a per-vendor template. The remaining work is turning the document into that JSON.

One REST Endpoint, Structured JSON Back

The PurchaseOrders API reads each purchase order with AI and returns the fields your application needs as JSON. Post a file, poll for the result, and get a consistent object with header fields and a line-item array, whatever supplier sent it and whether it is a native PDF, a scan, or a phone photo.

Simple REST Flow

Upload the file to the documents upload endpoint, start extraction with the extract endpoint, then fetch the result from the extraction endpoint by its hash. Standard JSON in, JSON out.
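The order of the three calls can be sketched as follows. This is an illustration under assumptions, not the documented client: the three paths, the `file` and `hash` keys, and the `send` callable are hypothetical placeholders, and a fake transport stands in for the real HTTP client so the sketch runs offline.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical paths: the page names the three steps but not their URLs.
UPLOAD_PATH = "/documents/upload"
EXTRACT_PATH = "/extract"
EXTRACTION_PATH = "/extraction/{hash}"

# send(method, path, payload) -> parsed JSON response
Transport = Callable[[str, str, Dict[str, Any]], Dict[str, Any]]


def extract_purchase_order(send: Transport, file_name: str, file_bytes: bytes) -> Dict[str, Any]:
    """Run upload -> extract -> fetch through whatever HTTP client `send` wraps."""
    uploaded = send("POST", UPLOAD_PATH, {"file_name": file_name, "file_bytes": file_bytes})
    started = send("POST", EXTRACT_PATH, {"file": uploaded["file"]})
    return send("GET", EXTRACTION_PATH.format(hash=started["hash"]), {})


# A fake transport with invented responses, used only to show the call order.
calls: List[Tuple[str, str]] = []


def fake_send(method: str, path: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    calls.append((method, path))
    if path == UPLOAD_PATH:
        return {"file": "file-123"}
    if path == EXTRACT_PATH:
        return {"hash": "abc123"}
    return {"po_number": "4500017", "line_items": []}


result = extract_purchase_order(fake_send, "po.pdf", b"%PDF-1.7")
print(calls)
```

Passing the transport in keeps the flow testable and leaves the choice of HTTP library to your codebase. Check the API documentation for the real paths and response keys.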

Token Authentication

Authenticate with a bearer token tied to your account. Tokens need the extract permission, which is granted on the Pro plan, and you decide which tokens can pull data.
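A minimal sketch of attaching the token, assuming the API follows the common `Authorization: Bearer <token>` header convention. The environment variable name `PO_API_TOKEN` and the `api.example.com` URL are placeholders, not real values.

```python
import os
import urllib.request


def authorized_request(url: str, token: str, method: str = "GET") -> urllib.request.Request:
    """Build a request that carries the token in a Bearer Authorization header."""
    if not token:
        raise ValueError("missing API token")
    return urllib.request.Request(
        url,
        method=method,
        headers={"Authorization": f"Bearer {token}"},
    )


# PO_API_TOKEN is a hypothetical variable name; the fallback keeps the sketch runnable.
token = os.environ.get("PO_API_TOKEN") or "example-token"
request = authorized_request("https://api.example.com/extract", token, method="POST")
print(request.get_method(), request.full_url)
```

Reading the token from the environment keeps it out of source control, and building the request in one helper means a rotated token only has to change in one place.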

Full Line-Item Array

The response includes each line item with SKU, description, quantity, unit of measure, unit price, and line total, not just the header, so your code can match and total at the row level.
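A sketch of consuming that array at the row level. The JSON key names (`line_items`, `sku`, `unit_of_measure`, and so on) are assumptions chosen to mirror the fields listed above, and the sample payload is invented; the actual names are in the API documentation.

```python
import json
from dataclasses import dataclass
from decimal import Decimal
from typing import List


@dataclass(frozen=True)
class LineItem:
    sku: str
    description: str
    quantity: Decimal
    unit_of_measure: str
    unit_price: Decimal
    line_total: Decimal


# Invented sample payload with assumed key names.
SAMPLE = """{"line_items": [
  {"sku": "WID-A", "description": "Widget A", "quantity": "10",
   "unit_of_measure": "EA", "unit_price": "4.50", "line_total": "45.00"},
  {"sku": "WID-B", "description": "Widget B", "quantity": "3",
   "unit_of_measure": "EA", "unit_price": "12.00", "line_total": "36.00"}
]}"""


def parse_line_items(payload: str) -> List[LineItem]:
    """Turn the line-item array into typed rows, using Decimal for money."""
    rows = json.loads(payload)["line_items"]
    return [
        LineItem(
            sku=row["sku"],
            description=row["description"],
            quantity=Decimal(str(row["quantity"])),
            unit_of_measure=row["unit_of_measure"],
            unit_price=Decimal(str(row["unit_price"])),
            line_total=Decimal(str(row["line_total"])),
        )
        for row in rows
    ]


items = parse_line_items(SAMPLE)
order_total = sum((item.line_total for item in items), Decimal("0"))
print(order_total)
```

`Decimal` is used instead of `float` so that row totals add up exactly when they are compared with the document's own totals.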

Maps to Your Schema

Consistent field names across every supplier mean the JSON drops into your ERP, database, or accounting import without a cleanup pass per vendor.
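Because the field names do not change per supplier, one mapping can serve every vendor. The sketch below flattens a PO into import rows; both the API-side keys (`po_number`, `supplier`, ...) and the ERP column names (`PO_NUM`, `VENDOR_NAME`, ...) are hypothetical.

```python
from typing import Any, Dict, List

# Hypothetical mapping: assumed API field name -> column in your own import table.
HEADER_MAP = {
    "po_number": "PO_NUM",
    "supplier": "VENDOR_NAME",
    "order_date": "ORDER_DT",
    "payment_terms": "TERMS",
}


def to_erp_rows(po: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten one PO into one row per line item, repeating the header columns."""
    header = {column: po.get(field) for field, column in HEADER_MAP.items()}
    return [
        {**header, "LINE_NO": number, "SKU": item["sku"], "QTY": item["quantity"]}
        for number, item in enumerate(po.get("line_items", []), start=1)
    ]


# Invented example document.
example_po = {
    "po_number": "4500017",
    "supplier": "Acme Supply Co",
    "order_date": "2024-03-02",
    "payment_terms": "Net 30",
    "line_items": [
        {"sku": "WID-A", "quantity": "10"},
        {"sku": "WID-B", "quantity": "3"},
    ],
}

rows = to_erp_rows(example_po)
print(rows[0])
```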

The API is the programmatic version of the same engine behind the on-site tools. For the full field breakdown, purchase order line item extraction covers how the table is captured, and purchase order OCR vs AI extraction explains why parsing fields by meaning beats templates. For a no-code path, the PO PDF to CSV converter and bulk purchase order processing produce files instead of JSON. Teams automating the whole flow use the API to automate purchase order data entry end to end and to import purchase orders into an ERP. Full request and response details are in the API documentation.

Why Choose PurchaseOrders?

  • No per-supplier templates to build or maintain
  • One flow handles native PDFs, scanned PDFs, and photos
  • Consistent JSON schema across every vendor layout
  • Per-document pricing that scales with the number of documents you process

Ways to Turn Purchase Order Documents into JSON

Three approaches to getting structured PO data into your application programmatically.

What you need                      | Build it on raw OCR   | Generic OCR API         | PurchaseOrders API
Structured PO fields, not raw text | You write the parser  | Partial, generic fields | PO fields mapped for you
Handles any supplier layout        | Template per vendor   | Varies                  | No templates needed
Full line-item table capture       | Hard to build         | Often header only       | Full line-item array
Reads scanned and photo POs        | Add your own OCR      | Sometimes               | Yes, built in
Setup and maintenance              | Ongoing engineering   | Some mapping work       | Call the endpoint
Cost model                         | Your build plus infra | Per call or seat        | Per document processed

PurchaseOrders is a data-extraction API: it returns structured PO data from documents you already have. It does not create, approve, match, or track purchase orders the way a procurement suite does; validation and matching stay in your system.
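Since validation stays in your system, a first check you might run on the returned rows is simple arithmetic: does quantity times unit price equal the line total? The sketch below assumes hypothetical key names (`quantity`, `unit_price`, `line_total`) and string-encoded numbers.

```python
from decimal import Decimal
from typing import Dict, List


def find_total_mismatches(
    line_items: List[Dict[str, str]],
    tolerance: Decimal = Decimal("0.01"),
) -> List[int]:
    """Return 1-based row numbers where quantity * unit_price differs from line_total."""
    mismatched = []
    for number, row in enumerate(line_items, start=1):
        expected = Decimal(row["quantity"]) * Decimal(row["unit_price"])
        if abs(expected - Decimal(row["line_total"])) > tolerance:
            mismatched.append(number)
    return mismatched


# Invented rows: the second one has its total digits transposed (63.00 instead of 36.00).
sample_rows = [
    {"quantity": "10", "unit_price": "4.50", "line_total": "45.00"},
    {"quantity": "3", "unit_price": "12.00", "line_total": "63.00"},
]

print(find_total_mismatches(sample_rows))
```

A flagged row is a prompt for review, not proof of an extraction error: line-level discounts or taxes on the original document can make the printed total differ legitimately.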

Extract a Purchase Order in 3 API Calls

Upload, extract, retrieve. Standard REST, JSON responses.

1. Upload the File

POST the PO file to the upload endpoint with your bearer token. Native PDFs, scanned PDFs, and images are all accepted in the same call.

Tip: Multi-page purchase orders are handled in one upload.
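One common way to send a file over REST is a `multipart/form-data` body. Whether this API expects multipart, and under which field name, is an assumption here (the field name `file` is a placeholder); confirm the upload format in the API documentation. The encoder itself uses only the standard library.

```python
import uuid
from typing import Optional, Tuple


def encode_multipart(
    field: str,
    file_name: str,
    content: bytes,
    content_type: str,
    boundary: Optional[str] = None,
) -> Tuple[bytes, str]:
    """Encode a single file as a multipart/form-data body.

    Returns the body bytes and the matching Content-Type header value.
    """
    boundary = boundary or uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{file_name}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


# "file" is a hypothetical field name; the bytes stand in for a real PDF.
body, content_type_header = encode_multipart(
    "file", "po.pdf", b"%PDF-1.7 example", "application/pdf", boundary="demo-boundary"
)
print(content_type_header)
```

The same body works for a scanned PDF or an image; only `content_type` changes (for example `image/jpeg`).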

2. Start Extraction

Call the extract endpoint with the returned file reference. The AI reads the document and pulls the header fields and the full line-item table.

Tip: A token needs the extract permission, granted on the Pro plan.

3. Retrieve the JSON

Fetch the result by its hash and receive a consistent JSON object with PO number, supplier, dates, terms, and every line item, ready to write to your ERP or database.
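Extraction takes time, so the result is polled for. A minimal polling sketch with exponential backoff follows; the `status` key and its `"processing"` value are assumptions about the response shape, and `fetch` stands for whatever function performs the GET by hash in your client.

```python
import time
from typing import Any, Callable, Dict


def wait_for_result(
    fetch: Callable[[str], Dict[str, Any]],
    extraction_hash: str,
    attempts: int = 5,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll fetch(hash) until the status is no longer 'processing'.

    The delay doubles after each unfinished attempt; TimeoutError is raised
    if the result is still pending after the last attempt.
    """
    for attempt in range(attempts):
        result = fetch(extraction_hash)
        if result.get("status") != "processing":
            return result
        if attempt < attempts - 1:
            sleep(delay)
            delay *= 2
    raise TimeoutError(f"extraction {extraction_hash} still processing after {attempts} attempts")


# Offline demonstration with invented responses and a recorded (not real) sleep.
responses = iter([
    {"status": "processing"},
    {"status": "processing"},
    {"status": "done", "po_number": "4500017"},
])
waits = []
final = wait_for_result(lambda _hash: next(responses), "abc123", sleep=waits.append)
print(final["po_number"], waits)
```

Injecting `sleep` keeps the function testable without real waiting; in production the default `time.sleep` applies.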

Why Developers Use the PurchaseOrders API

  • 3 REST calls to structured JSON
  • Any supplier format or layout
  • 0 templates to maintain

Security & Privacy

  • Bearer token authentication
  • TLS encryption in transit
  • Files auto-deleted after processing
  • US-based cloud infrastructure

Purchase Order API: Common Questions

Upload the PO file to the API, start extraction, then retrieve the result as JSON. With PurchaseOrders you POST the file to the upload endpoint, call the extract endpoint, and fetch the structured response by its hash. The JSON contains the PO number, supplier, dates, terms, and every line item, so your application can write it straight to an ERP or database.

The API returns a structured JSON object, not raw text. It includes header fields such as PO number, supplier, order and delivery dates, and payment terms, plus a line-item array where each row carries the SKU, description, quantity, unit of measure, unit price, and line total. Field names stay consistent across suppliers so the output maps to your schema.

The API reads scanned and photographed purchase orders as well as native PDFs. It runs AI-based OCR, so all three go through the same endpoint. A scan or photo has no text layer, which is why a plain PDF text parser returns nothing on it; the API reads the pixels and rebuilds the fields and line-item table.

Authentication uses a bearer token tied to your account, sent in the request header. The token needs the extract-data permission, which is available on the Pro plan. You can issue and manage tokens from your account, so you control which integrations are allowed to pull purchase order data.

The API extracts line items; line-item capture is the core of the extraction. The response includes the full line-item table as an array, with quantity, unit price, and total per row, even when the table spans multiple pages or descriptions wrap. That row-level detail is what makes downstream matching and spend analysis accurate.

The API does not match or approve purchase orders. It is an extraction service: it turns purchase order documents into structured JSON. It does not perform two-way or three-way matching, route approvals, or track PO status. Those steps stay in your ERP or procurement system, which consumes the clean data the API provides.
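For teams that run matching in their own code, a two-way match over the extracted rows can be sketched as below. It compares ordered and invoiced quantities per SKU. The key names are hypothetical, and the sketch assumes each SKU appears on at most one row per document; repeated SKUs would need to be summed first.

```python
from typing import Dict, List


def two_way_match(
    po_lines: List[Dict[str, str]],
    invoice_lines: List[Dict[str, str]],
) -> Dict[str, str]:
    """Classify each SKU as match, quantity_mismatch, not_on_po, or not_invoiced."""
    ordered = {row["sku"]: row["quantity"] for row in po_lines}
    billed = {row["sku"]: row["quantity"] for row in invoice_lines}
    report = {}
    for sku in sorted(set(ordered) | set(billed)):
        if sku not in ordered:
            report[sku] = "not_on_po"
        elif sku not in billed:
            report[sku] = "not_invoiced"
        elif ordered[sku] == billed[sku]:
            report[sku] = "match"
        else:
            report[sku] = "quantity_mismatch"
    return report


# Invented rows: PO lines as extracted, invoice lines from your own system.
po_rows = [{"sku": "WID-A", "quantity": "10"}, {"sku": "WID-B", "quantity": "3"}]
invoice_rows = [{"sku": "WID-A", "quantity": "10"}, {"sku": "WID-B", "quantity": "5"},
                {"sku": "WID-C", "quantity": "1"}]

match_report = two_way_match(po_rows, invoice_rows)
print(match_report)
```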