This is the second part of the mini series on writing Telegram bots to read my gas meter using Claude.
This post belongs to a mini series (I’m aiming for three posts in a row), where I try prompting Claude with different levels of verbosity and compare the results:
- In Claude writing a Telegram Bot, 1st try I’m using an elaborate, step-by-step prompting strategy with Haiku model
- In Claude writing a Telegram Bot, 2nd try I aim for a detailed one-shot prompt
- In Claude writing a Telegram Bot, 3rd try I try a rather short one-shot prompt
- In Other LLMs writing a Telegram Bot I quickly try DeepSeek and some more LLMs
Like in the first round, I will first show the full dialog I’ve used to let it generate the implementation. Then following up with a discussion of the code and some final thoughts.
This time, I aim to provide a more concise approach by offering a single prompt that encompasses all the necessary steps to achieve our goal.
Create a Telegram chatbot in JavaScript. The bot receives an image that is sent to it, shall then scale it to have 1000 pixels or less on the longer edge. Then upload the image to OpenAPI gpt-4o model along with the prompt “Wie lautet der Zählerstand auf dem Foto des Gaszählers. Die Zahl hat fünf Ziffern vor dem Komma und drei Nachkommastellen. Gib nur die Zahl aus, keinen Text davor oder danach”. Take the resulting number, store it to a database table named gas_meter, along with the current timestamp. Furthermore return the extracted number within the telegram communication, so the chat contact has some feedback.
I’ll help you create a Telegram chatbot in JavaScript for processing gas meter images. I’ll break this down into key components.
This implementation requires:
- Telegram bot token
- OpenAI API key
- PostgreSQL database connection
- npm packages: telegraf, axios, sharp, pg, openai
Database setup:
CREATE TABLE gas_meter ( id SERIAL PRIMARY KEY, reading NUMERIC(8,3), timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP );
Recommended environment variables:
- TELEGRAM_BOT_TOKEN
- OPENAI_API_KEY
- DATABASE_URL
Would you like me to elaborate on any specific part of the implementation?
Looking at the code, I immediately noticed, that I forgot to mention that I want support for .env
files and also that it must expect the LLM to reply with German number formatting (using a comma to separate the decimal places), … quite like I had with the first try.
So let’s fix this:
consider that the reading, as returned by the LLM, might use a comma as decimal separator (due to the German prompt). Also read environment variables from an .env file, if it exists
Key changes:
- Added
dotenv
to load.env
file- Modified
extractMeterReading()
to replace comma with dot- Converted reading to
parseFloat()
for correct number parsingRequires npm install dotenv in addition to previous dependencies.
Running it for the first time I noticed it’s calling the LLM wrongly
the code causes the following openai error:
Error processing image: BadRequestError: 400 Invalid value: ‘image’. Supported values are: ’text’, ‘image_url’, ‘input_audio’, ‘refusal’, ‘audio’, and ‘file’.
figure out what’s wrong please
Key changes:
- Use
image_url
instead of image- Write image to temporary file
- Convert image to base64 using file-based method
- Ensure cleanup of temporary file
This approach resolves the OpenAI API image submission issue by using the correct
image_url
format.
I’ve uploaded the generated JavaScript code to GitHub.
Code Review
Compared to the results of the first iteration I like that this time it’s using axios
for all http requests. Be it downloading from Telegram image storage, be it the call to the LLM.
When scaling the image, I think it’s interesting how it calculates the target image size (considering the aspect ratio) manually this time, where it simply delegated that stuff to sharp
with the first implementation.
What I really find weird is the way how it’s converting the ArrayBuffer
from sharp
to a base64-encoded Buffer
, which it’s passing on to the LLM. To do so it first writes the ArrayBuffer
to disk, just to re-read it immediately after, like this:
const tempFilePath = `/tmp/gas_meter_${Date.now()}.jpg`;
// Write buffer to temporary file
fs.writeFileSync(tempFilePath, imageBuffer);
// Read file and convert to base64
const fileBuffer = fs.readFileSync(tempFilePath);
const base64Image = fileBuffer.toString('base64');
… would this be an actual code review situation, I very likely would address this. Especially since the guessable filename might pose a security risk.
Instead this should just do the trick in one go:
const base64Image = Buffer.from(imageBuffer).toString('base64');
Furthermore, the way it’s (not) handling errors from the LLM invokation, it would not remove the temporary file, if the LLM invocation fails.
As mentioned earlier, the above code was produced by Claude 3.5 Haiku. I’ve later on tried the same prompt with Sonnet, and that just directly converted the ArrayBuffer
as shown by me.
Final thoughts
All in all another nice solution to the problem, that just works fine :-)
This demonstrates that it’s not always essential to be overly wordy when giving guidance, unlike I initially did. However, breaking it down into individual steps, also aids personal understanding. Moreover, since I haven’t created a Telegram bot before and it’s been a while since I’ve scaled an image using Node.js, examining code samples in detail has helped me gain insights.
Observing the variations in how the sharp
library is used with the .resize(...)
function, it brings to mind the differing approaches that two human developers might take when writing code. Similar to Claude’s example, it is possible for two individuals to reach different conclusions, with one opting to manually calculate the aspect ratio and the other choosing to utilize the library’s built-in features.