Idea Factory

Published 21 March 2023 by Kevin Nørby Andersen

Ideas, Generative AI, Tools

I’m curious how we get AI tools to work for us while we’re not working.
So I built a factory to work on my ideas when I’m not working.

Link: idea factory prototype

walkthrough of what the idea factory looks like

What do you (want to) spend your time on?

While I get tremendous inspiration and productivity out of tools like (Chat)GPT and Stable Diffusion, the limitation is that I constantly have to breathe life into them, and give them my full attention while I do.

Just like we now spend time on our phones that we used to spend on other things (do we even remember anymore?), we spend time with AI tools that didn’t exist just a year or two ago. Is that the best use of our time? Besides, it feels wasteful that we aren’t taking advantage of the fact that these tools can work while we don’t.

Asynchronous AI

But what if we could have it all, and get AI to work when we’re not working? The inspiration for the prototype is the concept of ‘asynchronous AI’: using generative AI tools without taking too much time and attention out of our day.

This first iteration is really just an image factory, since it’s limited to text-to-image generation, but I’m hoping to extend the factory over time.

diagram of how idea factory works
The factory takes in a prompt and keeps generating images every night

How it works

Idea Factory consists of two parts: a bot, for image generation and uploading, and a website, for displaying projects, prompts, and images.

Between the two, a shared database and cloud storage are used to store and retrieve project data and images.

In the database, there are three collections: projects, images and prompts. On the website, they’re displayed in a messaging-style interface.
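To make the data model concrete, here is a rough sketch of what documents in the three collections might look like. The field names are my own illustration, not the actual schema:

```js
// Hypothetical document shapes: illustrative only, the real schema may differ.
const project = {
  _id: "proj_123",
  name: "Sonic Jewelry",
  active: true, // only active projects get new images each night
};

const prompt = {
  _id: "prompt_456",
  projectId: "proj_123", // which project this prompt belongs to
  text: "a piece of electronic jewelry",
  createdAt: new Date("2023-03-20T18:00:00Z"),
};

const image = {
  _id: "img_789",
  projectId: "proj_123",
  promptId: "prompt_456", // the prompt that generated this image
  url: "https://storage.example.com/img_789.png", // object in the shared cloud storage
  createdAt: new Date("2023-03-20T18:30:00Z"),
};
```

Keeping prompts and images in separate collections means the website can interleave them chronologically, which is what makes the messaging-style layout possible.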

screenshot of an idea factory project
Prompts and images are laid out in a messaging-style interface.

Image Bot

Image bot is a Node.js script that runs on a schedule: between 6pm and midnight, it is triggered every 30 minutes. Each time it runs, it scans the project database and, for each active project, generates an image from the latest project prompt using OpenAI’s DALL-E-3 image generation model.
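Here is a minimal sketch of what such a bot could look like, assuming node-cron for scheduling and the official OpenAI Node SDK. The in-memory `db` object stands in for the shared database described above:

```js
// idea-factory-bot.js: a hedged sketch, not the actual implementation.
import cron from "node-cron";
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Stand-in for the shared database; the real bot reads and writes
// a shared database and cloud storage instead.
const db = {
  projects: [{ _id: "proj_123", name: "Sonic Jewelry", active: true }],
  prompts: [
    { _id: "prompt_456", projectId: "proj_123", text: "a piece of electronic jewelry" },
  ],
  images: [],
};

// Run every 30 minutes between 6pm and midnight (18:00, 18:30, ..., 23:30).
cron.schedule("*/30 18-23 * * *", async () => {
  for (const project of db.projects.filter((p) => p.active)) {
    // Use the latest prompt for this project.
    const prompt = db.prompts.findLast((p) => p.projectId === project._id);
    if (!prompt) continue;
    const result = await openai.images.generate({
      model: "dall-e-3",
      prompt: prompt.text,
      n: 1, // DALL-E-3 generates one image per request
      size: "1024x1024",
    });
    // Persist the generated image URL alongside the project and prompt.
    db.images.push({
      projectId: project._id,
      promptId: prompt._id,
      url: result.data[0].url,
      createdAt: new Date(),
    });
  }
});
```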

Website

The factory website is a SvelteKit app that rebuilds every day at 7am. It fetches the latest images from the shared cloud storage and displays them in a grid; for each project, images are grouped by day and shown in an Instagram-style carousel.
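As a sketch of how the site could load and group the images, here is a hypothetical SvelteKit `load` function; `listImages` is a placeholder for whatever cloud-storage listing the real site uses:

```js
// src/routes/+page.server.js: hedged sketch of the site's data loading.

// Placeholder: the real site lists objects in the shared cloud storage.
async function listImages() {
  return [
    {
      projectId: "proj_123",
      url: "https://storage.example.com/img_789.png",
      createdAt: new Date("2023-03-20T18:30:00Z"),
    },
  ];
}

export async function load() {
  const images = await listImages();
  // Group each project's images by the day they were generated,
  // so each day can be rendered as one carousel.
  const groups = {};
  for (const img of images) {
    const day = img.createdAt.toISOString().slice(0, 10); // "YYYY-MM-DD"
    const key = `${img.projectId}/${day}`;
    (groups[key] ??= []).push(img);
  }
  return { groups };
}
```

The daily 7am rebuild itself would live outside the app code, for example as a scheduled job on the hosting platform or in CI.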

Not quite there yet, but it’s a start

My experience using the prototype has been mixed. On one hand, it was great to wake up every morning to new work; it felt like coming into a morning meeting and having a team present what they had made. It also felt like I had less to lose, since all I needed to do was load up a website and look at the results.

The quality of the images varies a lot. Some are useful and inspiring, a lot feel repetitive, and some even feel ridiculous.

screenshot of an idea factory project
example images generated for the Sonic Jewelry project

One of the highlights was the Sonic Jewelry project. The project is about a piece of jewelry that uses machine learning to listen to your environment, record, and maybe react to certain words like “hi” or “thanks”, in some way becoming an auditory diary (I will write more about this project later). My own sketches have been simple, small devices that would hang from a regular earring, but the images above inspired me to think about other ways the device could sit in the ear, or maybe even have some sort of frame around it.

But overall, I was hoping for more from this prototype. Things that felt limiting:

  • No “inpainting”, meaning it is not possible to paint over the generated images to make specific edits like “change that shape”
  • The DALL-E-3 API doesn’t return a unique ID for the generated image that could be referenced later on, e.g. “Generate an image like ID but in the style of line art”
  • The API does not take an image as input, so all prompts have to be written as text

Some of these limitations might be overcome by switching to other models like Stable Diffusion or Midjourney, while others would need to be built.

Future Work

I think a tool like this could become powerful if there were better ways to interact with the image generation and more clever ways of using text generation to improve the prompts. Here are some ideas:

Images as moodboards

It would be great to have the ability to render multiple images on a larger canvas in the style of moodboards. Think Pinterest boards.

Synthetic prompts

It should be possible to chain the image generation with text generation. For example, “a piece of electronic jewelry” as a prompt could be deconstructed into nouns and verbs and extended with different adjectives that would then turn into new prompts.
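As a rough sketch of the idea, assuming OpenAI’s chat completions API as the text model (the extension strategy here is just one possible approach):

```js
// Hedged sketch: expand one seed prompt into several new image prompts.
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function synthesizePrompts(seed, count = 5) {
  const completion = await openai.chat.completions.create({
    model: "gpt-3.5-turbo",
    messages: [
      {
        role: "user",
        content:
          `Take this image prompt: "${seed}". ` +
          `Write ${count} variations that extend it with different adjectives and styles. ` +
          `Return one prompt per line, with no numbering.`,
      },
    ],
  });
  return completion.choices[0].message.content.split("\n").filter(Boolean);
}

// "a piece of electronic jewelry" might become, say,
// "a minimalist piece of electronic jewelry, studio lighting"; each
// variation could then be fed to the image bot as a new prompt.
const variations = await synthesizePrompts("a piece of electronic jewelry");
console.log(variations);
```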

Nudging

Add the ability to select images and mark them “more like this”/“less like that”, and use that as input for the next round of image generation.

Image-to-image

Rather than prompting with text, use an image as input. This technique is already available in many image generation models.

Seed ID

Every image should have a unique ID that can be used to reference it in the future. Example: “Remember 9808458973? Generate something like that, but with a nice sunset behind it”

Live painting

Use something like TLDraw as an editor inside the factory interface to enable painting, instead of having to manipulate images in outside tools.


If you liked this post, please consider sharing it. Here is the link.
