A GenAI Tale of a Burr'iful State Based Receipt Scanner
A GenAI tale of a burr’iful state based receipt scanner
In this 2 part post, i will go through the steps for creating a stateful receipt scanner using gemini multimodal capbilities to extract transaction information and burr to manage the processes involved.
The major Objectives are
- Utilize GenAI for data extraction from scanned receipts.
- Create a Jupyter notebook with the entire process.
- Create a burr application to handle the entire process.
This post will also touch on the following python libraries
pydantic
(for structured JSON extraction)instructor
(for translating pydantic to llm request)burr
(for state machine modeling and simulation)google.generativeai
(LLM for generative AI)PIL
(for processing images)rich
(for pretty printing)
Part 1 - Scanning Receipts
Why scan receipts with GenAI?
I spend my morning walking to a nearby grocery store to pick up locally sourced ingredients to make healthy home cooked meals. This started off as a way to distract myself from my laptop but eventually became sort of a lifestyle.
However, i noticed that my grocery bill gradually become more expensive due to subtle changes to various items. I decided to build a spreadsheet with every receipt i got.
I don’t love data entry, who does? so obviously i ended up with a pile of papers which was unscalable for a laptop guy with no time.
Besides, papers?. Where tech is going we don’t need papers.
GenAI Solution, But first…
I struggle with the LLM framework offerings. They are plagued with way too much abstractions which take away control and introspection from it’s users. This makes it difficult to dig deep into what exactly the framework is doing with all those sweet tokens
. There are tools you can use but they are backed by subscriptions et al.
Let’s take a simple receipt scanner for managing my daily transactions. I chose to use a library that gives me control, structure, introspection and a simple UI, especially in its application to machine learning.
Introducing burr
, a library which brings functional programming to state management and manages to slide in a bit of visualization and state storage abstraction.
I will talk more about burr much later but For now, watch this walkthrough to learn more.
Receipt scanning as a series of states
Before diving into the main code the receipt scanners states need to be identified. I identified the following basic states of the scanner.
- Fetch Receipt Image
- Extract Receipt Information in JSON
- Validate Extracted JSON
- Step missing
- High 5!
Ok, so step 5 is not really a step but when did High 5’s stop being cool.
Nonetheles, these steps give a starting point for the functions we need to build our scanner so lets dive into each step.
But first a little intermission to install dependencies
|
|
|
|
Step 1 | Fetch Receipt Image
The expressexpense receipt dataset is a good source of various types of scanned receipt images that we can play with.
Lets go get it…
|
|
|
|
Step 2 | Extract Receipt Information in JSON Format
Lets setup the google generativeai library with the API KEY obtained from google AI studio.
An API KEY
is required To use gemini 1.5 and you can get one from ai.google.dev.
|
|
The phrase "I'm bringing sexy back" is a lyric from the song **"SexyBack"** by **Justin Timberlake**. The song was released in 2006 and became a massive hit, making the phrase popular and synonymous with Timberlake himself. There's no particular place where "sexy" went, as it's a concept that evolves with fashion, trends, and cultural shifts. The phrase was meant to be playful and ironic, suggesting that Timberlake was bringing back a certain style of sex appeal.
Ok google, sheesh 🙄
That was a bit wordy, let’s try to constrain gemini’s response so as to save some tokens.
|
|
Justin Timberlake said "I'm bringing sexy back." Sexy never left.
GenAI’s tend to be very talktaive since it’s very lonely in those GPU farms hence the note to simmahdownnow
.
Now i am going to work with the receipt images. The following receipt seems crumpled enough to use.
|
|
Now we prompt…
|
|
El Valle Mexican Restaurant 305 W. John St. Matthews, NC 28105 (704) 845-1417 Server: GEDVANNY Station: 10 Order #: 403593 Table: 9 Dine In Guests: 1 1 GUACAMOLE DIP 3.99 1 DIET COKE 2.15 1 LUNCH FAJITAS 7.75 1 TORTILLAS (3) 1.25 1 SOUR CREAM 1.25 SUB TOTAL: 16.39 Tax 1: 1.36 TOTAL: $17.75 >> Ticket #: 7 << 12/2/2017 11:48:28 AM THANK YOU!
Gemini 1.5 seems to have a good handle on images. However, this is not really helpful if the plan is to use this receipt data for something actionable.
The response needs to be formatted to JSON This can be achieved with the wonderful instructor model which translates pydantic models to gemini functions. A schema will be defined with enough description of the data that will be extracted from the receipt image.
|
|
The instructor library works by monkey patching
whatever llm is being used. It enhances the original prompt with a system message that instructs
the llm to generate a valid json based on a schema. It further adds a tool to the llm with the specification for the schema.
|
|
ReceiptDetail( currency='$', business_name='El Valle Mexican Restaurant', created=datetime.datetime(2017, 12, 22, 11, 48, 28), address=['305 W. John St.', 'Matthews, NC 28105', '(704) 845-1417'], phone='(704) 845-1417', teller='GEDVANNY', order_no='403593', invoice_items=[ InvoiceItem(name='GUACAMOLE DIP', quantity=1, price=3.99, category='Appetizer'), InvoiceItem(name='DIET COKE', quantity=1, price=2.15, category='Drink'), InvoiceItem(name='LUNCH FAJITAS', quantity=1, price=7.75, category='Entree'), InvoiceItem(name='TORTILLAS (3)', quantity=1, price=1.25, category='Side'), InvoiceItem(name='SOUR CREAM', quantity=1, price=1.25, category='Side') ], subtotal=16.39, taxes=[1.36], total=17.75 )
Now that looks caliente 🔥. It accurratel picked out all the relevant information that was defined in the schema.
Step 3 | Validate Extracted JSON
This is more of a note than a step.
Pydantic handles validation using the json schema defined by the model, however, thanks to the word Generative in GenAI, LLM’s tend to create or hallucinate content that may pass validation.
For this reason, you may need to have some other way of validating that the generated receipt details are correct.
Step 4 | You know where this is going…
👇
Step 5 | High 5
Congrats, you sat through the whole post and probalbly learnt something. High 5’s for you.
Next steps
Get the colab book
Part 2
In part 2, i will take this simple receipt scanner and turn it into a state aware pipeline for handling receipt scanning requests using burr
.
Stay tuned.
Let’s connect
Getting in touch with me every where you see my name.
x.com/osiloke | linkedin.com/in/osiloke | github.com/osiloke | facebook.com/osiloke | instagram.com/osiloke