

I’m just presenting that as a “is this what you mean”. If it is, then perhaps a FOSS or self-hostable version exists, or the community might be interested in one existing.
It’s not self-hostable, but do you mean something like this? https://calendarbudget.com/
I played a bit with the basic concept of identifying and categorizing merchants by importing a transaction CSV into Google Sheets and writing a custom function that called the OpenAI API, basically just passing the raw merchant string along with “What category of business is this?”. It did well. The next step would have been to check the answer against a predefined list of possible categories. I didn’t compare any models or other platforms though. This was last year, so I might play with it again.
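Roughly what I had in mind for that next step, as a sketch only: the category list and the function names are made up for illustration, and the actual API call is left out. You would send the output of `buildPrompt` to whatever model you use and pass the reply to `toCategory`.

```typescript
// Hypothetical category list; swap in whatever buckets your budget uses.
const CATEGORIES = ["Groceries", "Dining", "Fuel", "Shopping", "Utilities", "Other"] as const;
type Category = (typeof CATEGORIES)[number];

// Build the question sent to the model, constrained to the predefined list.
function buildPrompt(rawMerchant: string): string {
  return (
    `What category of business is this? Merchant: "${rawMerchant}". ` +
    `Answer with exactly one of: ${CATEGORIES.join(", ")}.`
  );
}

// Map a free-text reply onto the predefined list.
// The first category name found in the reply wins; anything else becomes "Other".
function toCategory(reply: string): Category {
  const cleaned = reply.toLowerCase();
  for (const category of CATEGORIES) {
    if (cleaned.indexOf(category.toLowerCase()) !== -1) return category;
  }
  return "Other";
}
```

The point of `toCategory` is that the model can answer however it likes and the sheet still only ever sees one of the known labels.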
I found this which is overkill for personal use but does a good job of laying out this sort of application: https://midday.ai/updates/automatic-reconciliation-engine/
“Instead of just comparing text strings, we use 768-dimensional vector embeddings to capture the semantic meaning of transactions and receipts.
// Generate embeddings for transaction data
const transactionText = prepareTransactionText({
  name: transaction.name,
  counterpartyName: transaction.counterpartyName,
  merchantName: transaction.merchantName,
  description: transaction.description
});
const embedding = await generateEmbeddings([transactionText]);
These embeddings allow our system to understand that “AMZN MKTP” and “Amazon Marketplace Purchase” refer to the same thing, even though the text strings are completely different. The system learns patterns like:
You’re missing the point. That would require sitting down and manually doing that for every conceivable payee. Walmart is just an example. The value of any sort of “intelligent” component would be for this to happen automatically and seamlessly for the user. Hell, the AI layer could just be “write regex for all the possible similar payees across these documents”.
Yep, that’s exactly the sort of thing I’m thinking about here. And it doesn’t even need to be a full-on chat-style LLM, just some decent NLP that can recognize that WALMART, WAL-MART, and WMART are all the same thing and label them.
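To show how little it takes for the simple cases, here is a minimal sketch using plain edit distance instead of any real NLP. The function names and the 0.7 threshold are my own assumptions, not from any library, and it will miss things like abbreviations that share few letters with the full name.

```typescript
// Uppercase and drop everything that is not a letter: "Wal-Mart #1234" -> "WALMART".
function normalize(raw: string): string {
  return raw.toUpperCase().replace(/[^A-Z]/g, "");
}

// Classic dynamic-programming Levenshtein edit distance, one row at a time.
function levenshtein(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

// 1.0 means identical, 0.0 means nothing in common.
function similarity(a: string, b: string): number {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : 1 - levenshtein(a, b) / longest;
}

interface PayeeGroup {
  label: string;     // normalized name of the first payee seen in this group
  members: string[]; // raw payee strings assigned to the group
}

// Greedy grouping: each payee joins the first group whose label is similar
// enough, otherwise it starts a new group. The threshold is an arbitrary pick.
function groupPayees(rawPayees: string[], threshold = 0.7): PayeeGroup[] {
  const groups: PayeeGroup[] = [];
  for (const raw of rawPayees) {
    const key = normalize(raw);
    let placed = false;
    for (const group of groups) {
      if (similarity(group.label, key) >= threshold) {
        group.members.push(raw);
        placed = true;
        break;
      }
    }
    if (!placed) groups.push({ label: key, members: [raw] });
  }
  return groups;
}
```

Calling `groupPayees(["WALMART", "WAL-MART #1234", "WMART", "TARGET 0042"])` puts the three Walmart variants in one group and Target in its own, with no payee list maintained by hand.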
But for some reason this question brings out all the people who make assumptions and want to give financial advice or talk about the AI image they saw last year with 6 fingers.
I just started using them and I like it. It’s a good balance of easy and secure for me. I just added the container to my stack and then use their UI to point a subdomain at the internal port. Security can go pretty extreme if you set up their whole zero trust thing.
An alternative similar option is Pangolin. I’ve seen a lot of people like it to avoid Cloudflare, but I haven’t used it myself. There still has to be an endpoint running it, so you’ll need an external VPS, which then adds a cost to the equation but at least you control it.