Image OCR for Text Extraction using LLMs

Usage

llm_image_extract_text(
  llm_model = "qwen2.5vl",
  image = system.file("img/text_img.jpg", package = "kuzco"),
  backend = "ellmer",
  additional_prompt = "",
  provider = "ollama",
  ...
)

Arguments

llm_model: a local LLM model pulled from ollama
image: a local image path that has a jpeg, jpg, or png
backend: either 'ellmer' or 'ollamar', note that 'ollamar' suggests structured outputs while 'ellmer' enforces structured outputs
additional_prompt: text to append to the image prompt
provider: for backend = 'ollamar', provider is ignored. for backend = 'ellmer', provider refers to the ellmer::chat_* providers and can be used to switch from "ollama" to other providers such as "perplexity"
...: a pass through for other generate args and model args like temperature. set the temperature to 0 for more deterministic output

Value

a df with text and a confidence score