Shobhit Sharma
Posted on: December 6, 2023 at 07:29 AM

Simgen SSG: A minimal vector similarity generator for your static sites

I recently came across this article by Simon Willison, in which he describes how he used OpenAI’s embeddings to overhaul the related content section of his TIL blog. I was intrigued by the idea and wanted to build a pluggable version of it.

I have a few static sites that I maintain, and I wanted to add a related content section to them. I wanted something minimal that could easily be integrated with my existing static site generators, without depending on the OpenAI API. So I thought I’d give Qdrant a try, specifically Qdrant combined with Fastembed to generate vector embeddings on the fly.

Simgen SSG

Without further ado, I present to you Simgen SSG, a simple vector similarity generator for your static sites. It’s a command-line tool that takes a directory of Markdown and text files and manages their vector embeddings. It exposes them through a simple HTTP API that you can query during the build process to retrieve related content.
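The input side boils down to a directory walk. As a rough sketch, assuming content lives in `.md`, `.markdown`, or `.txt` files and that each file is identified by its path relative to the content root (IDs like `blog/first-post.md`), collecting the files could look like this. `listContentFiles` is a hypothetical helper, not part of simgen.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Extensions treated as content; an assumption for this sketch.
const CONTENT_EXTENSIONS = new Set([".md", ".markdown", ".txt"]);

// Hypothetical helper: recursively collect content files under `root`,
// returning paths relative to `root` with forward slashes.
function listContentFiles(root: string): string[] {
  const found: string[] = [];
  const walk = (dir: string): void => {
    for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
      const full = path.join(dir, entry.name);
      if (entry.isDirectory()) {
        walk(full);
      } else if (
        entry.isFile() &&
        CONTENT_EXTENSIONS.has(path.extname(entry.name).toLowerCase())
      ) {
        // Normalize OS separators so IDs always look like "collection/file".
        found.push(path.relative(root, full).split(path.sep).join("/"));
      }
    }
  };
  walk(root);
  return found.sort();
}
```

Sorting the result keeps the order deterministic across platforms, which makes rebuilds reproducible.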

How it works

Under the hood, it uses Fastembed to generate vector embeddings for chunks of your content. It then stores these embeddings in Qdrant and uses Qdrant to retrieve the embeddings most similar to a given one.
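To make the retrieval step concrete, here is a minimal in-memory sketch of the question Qdrant answers: given one stored vector, which others are closest? It assumes cosine similarity (Qdrant supports several distance metrics, and the one simgen configures is an assumption here), uses plain number arrays in place of Fastembed output, and scans every vector instead of using Qdrant’s HNSW index. `Point`, `cosine`, and `recommend` are hypothetical names.

```typescript
// A stored point: an ID (e.g. "blog/first-post.md") and its embedding.
interface Point {
  id: string;
  vector: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 for zero vectors.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Brute-force top-k search over all points, excluding the query itself.
// Qdrant answers the same question with an index instead of a full scan.
function recommend(
  points: Point[],
  id: string,
  k: number
): { id: string; score: number }[] {
  const query = points.find((p) => p.id === id);
  if (!query) throw new Error(`unknown id: ${id}`);
  const qv = query.vector;
  return points
    .filter((p) => p.id !== id)
    .map((p) => ({ id: p.id, score: cosine(qv, p.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

A linear scan is fine for a few hundred posts; a vector database earns its keep once the collection grows or you need persistence between builds.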

Simgen SSG parses your content files, generates an embedding for each chunk in every file, and stores these embeddings in Qdrant. The HTTP API then serves similarity queries against that store.
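How a file is split into chunks affects what the embeddings capture. The sketch below shows one common approach, assuming paragraph-based chunking under a character budget and stripping YAML front matter first; simgen’s actual splitting rules may differ, and `stripFrontMatter` and `chunkDocument` are hypothetical names.

```typescript
// Strip a leading YAML front-matter block ("---" ... "---"), common in
// Astro/Hugo Markdown files, so metadata isn't embedded as content.
function stripFrontMatter(text: string): string {
  return text.replace(/^---\r?\n[\s\S]*?\r?\n---\r?\n?/, "");
}

// Split a document into chunks of at most `maxChars` characters, packing
// whole paragraphs together and hard-splitting any oversized paragraph.
function chunkDocument(text: string, maxChars = 1000): string[] {
  const paragraphs = stripFrontMatter(text)
    .split(/\r?\n\s*\r?\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  const chunks: string[] = [];
  let current = "";
  for (const para of paragraphs) {
    // Break paragraphs longer than maxChars into maxChars-sized pieces.
    const pieces: string[] = [];
    for (let i = 0; i < para.length; i += maxChars) {
      pieces.push(para.slice(i, i + maxChars));
    }
    for (const piece of pieces) {
      const candidate = current ? `${current}\n\n${piece}` : piece;
      if (candidate.length <= maxChars) {
        current = candidate; // still fits: keep packing
      } else {
        chunks.push(current); // flush the full chunk and start a new one
        current = piece;
      }
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Each chunk would then be embedded separately; splitting on paragraph boundaries keeps related sentences in the same vector.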

Usage

I use Astro to build my static sites, so I’ll use it as an example, but you can use any static site generator of your choice.

You can query the HTTP API for documents similar to a given one, identified by its collection name and content ID, and then map the results back to your Astro content entries:

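import { getCollection } from "astro:content";

// collectionName and contentId identify the entry of the page being rendered.
const rec = await fetch(
  `http://simgen:8000/recommend/?id=${collectionName}/${contentId}&collection=${collectionName}`
);
// Assumes the response is a JSON array of { collection, file_path } objects.
let data = await rec.json();

// Resolve each recommendation to its Astro content entry.
data = await Promise.all(
  data.map(async (instance: { collection: string; file_path: string }) => {
    const collection = await getCollection(instance.collection);
    return collection.find(
      post_instance =>
        `${instance.collection}/${post_instance.id}` === instance.file_path
    );
  })
);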
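The snippet above interpolates IDs straight into the query string and keeps `undefined` for recommendations that no longer match an entry. A slightly more defensive sketch follows; `buildRecommendUrl`, `matchRecommendations`, and the `Recommendation` type are hypothetical, and the response shape is inferred only from the fields the snippet reads (`collection`, `file_path`).

```typescript
// Inferred shape of one recommendation; the real response may carry more.
interface Recommendation {
  collection: string;
  file_path: string;
}

// Minimal stand-in for an Astro content entry: only `id` is needed here.
interface ContentEntry {
  id: string;
}

// Build the /recommend/ URL with proper query-string encoding instead of
// raw string interpolation. `base` is e.g. "http://simgen:8000".
function buildRecommendUrl(base: string, collection: string, contentId: string): string {
  const url = new URL("/recommend/", base);
  url.searchParams.set("id", `${collection}/${contentId}`);
  url.searchParams.set("collection", collection);
  return url.toString();
}

// Map recommendations back to entries, skipping results with no matching
// entry. Entries arrive pre-grouped by collection, so the caller loads
// each collection once rather than once per result.
function matchRecommendations<T extends ContentEntry>(
  results: Recommendation[],
  entriesByCollection: Map<string, T[]>
): T[] {
  const matched: T[] = [];
  for (const r of results) {
    const entries = entriesByCollection.get(r.collection) ?? [];
    const hit = entries.find((e) => `${r.collection}/${e.id}` === r.file_path);
    if (hit) matched.push(hit);
  }
  return matched;
}
```

In an Astro page you could build the map with `new Map([[collectionName, await getCollection(collectionName)]])`, fetch `buildRecommendUrl("http://simgen:8000", collectionName, contentId)`, and pass the parsed JSON to `matchRecommendations`. The `/` in `id` is sent as `%2F`, which should be transparent because HTTP frameworks decode query strings before the application sees them.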

Who is it for?

  • Bloggers: Offer ‘you might also like’ sections for blog posts, keeping readers engaged and on your site longer.
  • Knowledge Bases: Implement ‘similar articles’ suggestions in FAQs or documentation pages, helping users find relevant information quickly.
  • Content Publishing Platforms: Enhance content discovery by suggesting related articles or posts, keeping users engaged and extending their browsing time.
  • Portfolio Websites: Showcase similar projects or works, letting visitors explore related content that matches their interests.
  • Educational Websites: Suggest related courses or resources, helping learners discover supplementary material.
  • Community Forums: Surface related threads or discussions, encouraging deeper engagement among users.
  • Event or Conference Websites: Recommend similar events or sessions based on their descriptions, helping attendees plan their participation.
  • Recipe or Lifestyle Blogs: Suggest similar recipes or articles to readers with specific tastes or interests.
  • Job Boards: Show similar job postings alongside each listing, helping candidates find suitable positions.
  • Travel Websites: Recommend similar destinations, accommodations, or activities, making trip planning more engaging.

Installation and usage

Check out the Installation guide to set up simgen-ssg on your system, then follow the Usage guide to get started.

The GitHub Actions workflow

GitHub Actions doesn’t let service containers mount the repository as a volume, so I worked around this by building and deploying the site with Docker Compose. I used the following GitHub Actions workflow steps:

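- name: Prepare the environment
  env:
    VERCEL_TOKEN: ${{ secrets.VERCEL_TOKEN }}
    VERCEL_PROJECT_ID: ${{ secrets.VERCEL_PROJECT_ID }}
    VERCEL_ORG_ID: ${{ secrets.VERCEL_ORG_ID }}
  run: |
    # Pull the service images and build the site image.
    # Uses Compose v2 ("docker compose"); newer runners no longer ship docker-compose v1.
    docker compose -f docker-compose-prod.yml pull
    docker compose -f docker-compose-prod.yml build
- name: Build code and deploy to Vercel
  env:
    VERCEL_TOKEN: ${{ secrets.VERCEL_TOKEN }}
    VERCEL_PROJECT_ID: ${{ secrets.VERCEL_PROJECT_ID }}
    VERCEL_ORG_ID: ${{ secrets.VERCEL_ORG_ID }}
  run: |
    # docker-compose-prod.yml must forward the VERCEL_* variables into the
    # web container; `run` also starts the services that web depends on.
    docker compose -f docker-compose-prod.yml run web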

The web service in docker-compose-prod.yml uses an entrypoint.sh script that essentially runs the following commands:

vercel pull --yes --environment=production --token="$VERCEL_TOKEN" && \
  vercel build --prod --token="$VERCEL_TOKEN" && \
  vercel deploy --prod --prebuilt --token="$VERCEL_TOKEN"

Thanks for reading! Do check out the repository and let me know what you think about it.