Image OCR Analysis


In the name of Allah, the Most Gracious, the Most Merciful

Analyzing product images using Node.js, Google Gemini, and Puppeteer.

I recently discovered that Google Gemini’s API accepts multimedia input (images, video, etc.). I used the Puppeteer library to take a screenshot of a page from the Panda product catalog, uploaded the image to Gemini, and got back the product names, prices, and categories in JSON format.
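The flow above can be sketched roughly as follows. This is not the original code, only a minimal illustration under some assumptions: the model name, the prompt wording, and the helper names (captureScreenshot, buildGeminiRequest, parseProducts, extractProducts) are all made up for the example, Gemini is called through its REST endpoint rather than an SDK, and Node.js 18+ is assumed for the built-in fetch.

```javascript
// Sketch only: model name, prompt, and helper names are assumptions.
const GEMINI_MODEL = "gemini-2.0-flash"; // placeholder; use any image-capable model

const PROMPT =
  "List every product visible in this catalog page as a JSON array of " +
  'objects with the keys "name", "price", and "category". Return only JSON.';

// Capture the page as a base64-encoded PNG using Puppeteer.
async function captureScreenshot(url) {
  const { default: puppeteer } = await import("puppeteer"); // loaded lazily
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: "networkidle2" });
    return await page.screenshot({ type: "png", encoding: "base64" });
  } finally {
    await browser.close();
  }
}

// Build the request body: one text part plus one inline image part.
function buildGeminiRequest(imageBase64, prompt = PROMPT) {
  return {
    contents: [
      {
        parts: [
          { text: prompt },
          { inline_data: { mime_type: "image/png", data: imageBase64 } },
        ],
      },
    ],
  };
}

// The reply may wrap the JSON in extra text, so keep only the outermost array.
function parseProducts(replyText) {
  const start = replyText.indexOf("[");
  const end = replyText.lastIndexOf("]");
  if (start === -1 || end < start) {
    throw new Error("No JSON array found in the model reply");
  }
  return JSON.parse(replyText.slice(start, end + 1));
}

// Screenshot the page, send it to Gemini, and return the parsed products.
async function extractProducts(url, apiKey) {
  const image = await captureScreenshot(url);
  const endpoint =
    "https://generativelanguage.googleapis.com/v1beta/models/" +
    GEMINI_MODEL + ":generateContent";
  const res = await fetch(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json", "x-goog-api-key": apiKey },
    body: JSON.stringify(buildGeminiRequest(image)),
  });
  if (!res.ok) throw new Error("Gemini request failed: " + res.status);
  const data = await res.json();
  return parseProducts(data.candidates[0].content.parts[0].text);
}
```

Calling extractProducts with the catalog page URL and an API key (for example one read from an environment variable) should resolve to an array of objects with name, price, and category, provided the model follows the prompt.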

There are many potential improvements to make this application more practical:

  • Product names could be read more accurately by preprocessing the images to make the text clearer, using a library such as Sharp.
  • Screenshots could be taken of every page in the catalog (the code currently captures only the first page), and the output could be stored in a Google Sheets document or an AWS S3 bucket for later AI data processing.
  • The application could be hosted in a serverless environment such as AWS Lambda or Cloudflare Workers and run automatically on a schedule.
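Two of these improvements might look something like the sketch below: cleaning up a screenshot with Sharp before sending it, and looping over every catalog page. The function names, the resize width, and the choice of filters are assumptions to experiment with, not tested settings. extractPage stands for whatever function turns one page URL into a list of products.

```javascript
// Sketch only: helper names and image settings are assumptions to tune.

// Upscale, convert to grayscale, stretch contrast, and sharpen a PNG buffer
// so that small catalog text is easier for the model to read.
async function enhanceForOcr(pngBuffer) {
  const { default: sharp } = await import("sharp"); // loaded lazily
  return sharp(pngBuffer)
    .resize({ width: 2000 })
    .grayscale()
    .normalize()
    .sharpen()
    .png()
    .toBuffer();
}

// Run a per-page extractor over every catalog page and merge the results.
// extractPage is a hypothetical async function: (url) => array of products.
async function extractCatalog(pageUrls, extractPage) {
  const all = [];
  for (const [index, url] of pageUrls.entries()) {
    try {
      const products = await extractPage(url);
      // Tag each product with the page it came from.
      for (const product of products) all.push({ ...product, page: index + 1 });
    } catch (err) {
      // One unreadable page should not discard the rest of the catalog.
      console.error("Page " + (index + 1) + " failed: " + err.message);
    }
  }
  return all;
}
```

Pages are processed one at a time here to keep the number of browser sessions and API requests low. The merged array could then be written to Google Sheets or S3 in a separate step.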

What are the applications of this idea?

  • Monitoring product prices and getting automated alerts.
  • Price comparison between companies.
  • Market research analysis.
  • And more…
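
As an illustration of the price-monitoring idea, here is a small sketch that compares two runs of the extracted product list and reports which prices moved. The function name and the record shape ({ name, price } with a numeric price) are assumptions. It also assumes product names are read identically on both runs, which OCR output will not always guarantee.

```javascript
// Sketch only: assumes each product is { name, price } with a numeric price.
// Compare two snapshots of extracted products and report price changes.
function findPriceChanges(previous, current) {
  const oldPrices = new Map(previous.map((p) => [p.name, p.price]));
  const changes = [];
  for (const { name, price } of current) {
    const before = oldPrices.get(name);
    // Skip products that are new or whose price did not move.
    if (before !== undefined && before !== price) {
      changes.push({ name, before, after: price, delta: price - before });
    }
  }
  return changes;
}

const yesterday = [{ name: "Milk", price: 5.5 }, { name: "Rice", price: 20 }];
const today = [{ name: "Milk", price: 6 }, { name: "Rice", price: 20 }];
console.log(findPriceChanges(yesterday, today));
// → [ { name: 'Milk', before: 5.5, after: 6, delta: 0.5 } ]
```

Any non-empty result could trigger an alert, for example an email or a chat message.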