Tuesday, 16 April 2024

unstructured.io API

docker pull quay.io/unstructured-io/unstructured-api
The image is about 20 GB. After the pull, inspect it to find its digest:
docker image inspect --format '{{json .}}' "fd79888e10ea" | jq -r '. | {Id: .Id, Digest: .Digest, RepoDigests: .RepoDigests, Labels: .Config.Labels}'
To pin the image by digest (instead of a moving tag) in docker-compose:
  unstructured:
    image: quay.io/unstructured-io/unstructured-api@sha256:612d85e7a8d4816b1c71119a285238fc3bbb822f78cb00d8b47e32ef08c08031
    ports:
      - "8123:8000"
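
The digest lookup can also be scripted. This is a minimal sketch, assuming `docker image inspect` prints a JSON array whose entries carry a `RepoDigests` list; the helper names `inspect_image` and `pinned_reference` are made up for illustration.

```python
import json
import subprocess

REPO = "quay.io/unstructured-io/unstructured-api"


def inspect_image(image: str) -> str:
    """Return the raw JSON printed by `docker image inspect` (needs a local Docker CLI)."""
    result = subprocess.run(
        ["docker", "image", "inspect", image],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def pinned_reference(inspect_output: str, repo: str = REPO) -> str:
    """Pick the first `repo@sha256:...` entry from the RepoDigests of the inspected images."""
    for image in json.loads(inspect_output):
        for ref in image.get("RepoDigests") or []:
            if ref.startswith(repo + "@"):
                return ref
    raise LookupError(f"no digest recorded for {repo}")
```

Calling `pinned_reference(inspect_image(REPO))` should give a string that can be pasted as the `image:` value in the compose file.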

Then call the API:
curl -X 'POST' \
  'http://localhost:8123/general/v0/general' \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -F 'files=@example-docs/english-and-korean.png' \
  -F 'strategy=ocr_only' \
  -F 'ocr_languages=eng'  \
  -F 'ocr_languages=kor'  \
  | jq -C . | less -R
Image from here: https://github.com/Unstructured-IO/unstructured/blob/main/example-docs/english-and-korean.png (you may want to use the example-docs dir).
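
The same request can be sent from Python. This is a rough sketch using only the standard library, assuming the container is reachable on port 8123 as in the compose snippet; `encode_multipart` and `partition` are illustrative helper names, not part of any unstructured client.

```python
import json
import mimetypes
import uuid
from pathlib import Path
from urllib import request

API_URL = "http://localhost:8123/general/v0/general"


def encode_multipart(fields, file_field, filename, payload):
    """Build a multipart/form-data body; repeated field names are allowed."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields:
        parts += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            value.encode(),
        ]
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts += [
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"'.encode(),
        f"Content-Type: {content_type}".encode(),
        b"",
        payload,
        f"--{boundary}--".encode(),
        b"",
    ]
    return b"\r\n".join(parts), f"multipart/form-data; boundary={boundary}"


def partition(path, strategy="ocr_only", languages=("eng", "kor")):
    """POST one file with the same form fields as the curl call and return the parsed JSON."""
    file_path = Path(path)
    fields = [("strategy", strategy)]
    fields += [("ocr_languages", lang) for lang in languages]
    body, content_type = encode_multipart(
        fields, "files", file_path.name, file_path.read_bytes()
    )
    req = request.Request(
        API_URL,
        data=body,
        headers={"Accept": "application/json", "Content-Type": content_type},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```

With the container running, `partition("example-docs/english-and-korean.png")` should return the same JSON that the curl pipeline prints.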

Parse Wikipedia dump

""" This module processes Wikipedia dump files by extracting individual articles and parsing them into a structured format, ...