Her Is Here: The Power of Voice Memos and AI

Learn how to transform your spoken ideas into actionable tasks using Zapier, OpenAI, and Claude.

Jun 21, 2024

After shaky warm-up rounds with Siri, Alexa, and Google Assistant, voice looks to become the way we’ll interact with computers. ChatGPT is rolling out a real-time conversational interface, and you can brief Anthropic’s latest model reliably with voice memos — the focus of this article.

I’m going to show you how to set up an automated process that allows you to:

Drop a voice memo in a folder.
Have a powerful AI assistant analyze and execute it automatically.
See the outcome appear in your note-taking or todo app.

Configuring this workflow takes some effort, but it’s magical once you have it set up.

Why Use Voice Memos to Interact With AI?

ChatGPT’s voice options are good for short back and forts. Recording longer thoughts and assignments works better with voice memos.

When my thinking isn’t organized, a voice memo allows me to ramble a bit, leave silences, get interrupted, revisit an earlier thought, go on for as long as I want — and the AI will still make sense of it all.

For example, a thing I do all the time now is recording voice memos while driving. I spew thoughts for ten minutes, save the file in Dropbox when I’ve parked, and by the time I get to my computer it’s processed, polished, and ready to use.

You can do the same when you go for a walk, an idea strikes in the supermarket, or you’re just tired of typing.

The Problem With Using Voice Memos With AI

Before we get to the good stuff, let’s get the bad news out of the way: the process I’m going to show you is a hassle to set up, especially if you’re unfamiliar with Zapier and the API accounts of OpenAI and Anthropic.

You don’t need coding skills, and I’ll show you every step of the way, but you need to create accounts, get used to new interfaces, etc. — as I said, a hassle.

Is It Worth Your Time?

I always go back to the diagram below to check if improving or automating a process is worth the effort. It shows at which point you’re spending more time optimizing than saving on the actual work.

The two most extreme examples from this table:

If do a task 50 times a day (horizontal axis) and you can save 1 second each time (vertical axis) through optimization or automation, you can spend one day on it.
If you do a task once a year but the efficiency gain is an entire day, you can spend five days on it. 1

I record at least one voice memo daily. By not having to run the transcription manually, copy that into the AI, and process the output, I save at least five minutes every time. According to this chart, that means I can spend six days on this workflow!

Even if you only go through this process once a week, you can spend four hours on setting it up.

How to Record Effective Voice Memos for an AI

Record your voice memos as if you’re asking a friend or coworker to do something for you. Here are some pointers on how to do that:

Talk to the AI in the second person. It might feel a bit weird at first, but you can address the AI as “you.”
Include all essential information. Your voice memo is going straight into the AI, so you’ll not be able to add attachments, send a follow-up prompt, or reference information elsewhere. This means you might have to repeat basic information in many memos (e.g., “My name is X, I work as a Y at company Z.”)
More context is usually better. More context allows the AI to come up with better solutions. Don’t just tell it what you want, but why you want it. And explicitly give the AI permission to give feedback and suggestions.
Don’t forget your ask. Make sure to include instructions for what you want the AI to do with your voice memo. (E.g., summarize this, act as my coach, give feedback, add more ideas, prepare as an email for my coworker, etc.).

You don’t have to provide all these details in a structured way. For example, if you start your memo asking for a summary of what you’re going to say, but at the end decide you actually only want some feedback, you can just mention that and change your request. The AI will get that.

To give you an impression of the kinds of voice memos you can record, here’s a breakdown of some of my recent ones:

Put together a brief for a coworker to get her involved in a marketing campaign I’m working on.
Give pros and cons on a proposal I’m working on.
Structure a workflow I’ve been thinking about and give suggestions for further improvements.
Help prepare an interview with someone.
Act as a coach and give advice on how to train for an upcoming trail run race.
Get a second opinion on how I handled an awkward social situation.
Make an overview of a new marketing idea and outline pros and cons.

In all these cases, the meat of the voice memo is me outlining or thinking through the brief, the proposal, or the workflow. No attachments are involved; it’s just me talking through these things, with one or more specific asks for the AI on what to do with that information.

Creating the Workflow: 🎙️ → 🤖 → ✅

The backbone of this process is Zapier, an automation platform that connects apps and automates workflows without coding.

Here’s how it works:

🎙️ Record your voice memo.
📂 Save it in a designated cloud folder (e.g., Dropbox, Google Drive).
🤖 Transcribe the voice memo with OpenAI’s Whisper model.
🤖 Process the transcription with Claude Opus. [3]
✅ Receive the output as a task on your todo list (e.g., Notion, Asana).

Zapier is the glue between all these steps. It monitors the cloud folder, sends any new file you place there into the Whisper model for transcription, and so on.

Now for the tough part — the one-time setup.

1. Configuring OpenAI, Anthropic, and Zapier accounts

We’ll need a bunch of stuff for this to work:

An OpenAI API account with some 💰️ for access to their Whisper transcription model.
An Anthropic API account with some 💰️ for access to their Claude Opus API for processing the voice memo transcript.
A paid Zapier account and a Zap (Zapier’s term for a workflow).

To not waste too much space here on detailed walkthroughs of all these steps, I’ve created a separate page (with a new Perplexity feature) that shows you exactly how to configure each one:

👉️ OpenAI account setup and API key generation.
👉️ Anthropic account setup and API key generation.
👉️ Zapier account registration.

2. Setting up your Zap

Once you have those accounts set up, you need to create a Zap that executes the steps I outlined earlier:

Here’s a template of this Zap. Import it to your Zapier account2, and then configure the details of each step as I explain below.

1. Voice memo saved in Dropbox folder

This step monitors a designated Dropbox folder where you’ll drop the voice memo audio file. Here’s how to set this up:

Connect your Dropbox account in the “Account” tab in the right panel.
Select the Dropbox Space and folder where you’ll save your voice memos in the “Trigger” tab.
In that same tab, make sure that “Include file contents?” is set to “Yes.”
Save a voice memo in the folder you’ve selected and run a test in the “Test” tab to make sure everything works correctly.

2. OpenAI Whisper transcription

This step sends the voice memo file into OpenAI’s Whisper model for transcription:

Connect your OpenAI account in the “Account” tab. You’ll be asked to enter your API key which you should have created earlier (instructions here).
Enter the following settings in the “Action” tab:
1. File: Select the input field and an “Insert Data…” dropdown will show up. Select the option labeled “File”. (See screenshot.)
2. Prompt: Add this text:
  
  “Please transcribe this voice memo into English. Remove filler words. Highlight words you're unsure about in bold.”
3. Response Format: Choose “Text”.
4. Language of the Audio: Set it to “en”.
You can run a test again in the “Test” tab to see if everything works.

3. Send message in Anthropic (Claude)

This step sends the voice memo transcription into the Claude model to execute whatever you’ve asked it to do:

Connect your Anthropic account in the “Account” tab. You’ll be asked to enter your API key which you should have created earlier (instructions here).
In the “Action” tab, set the transcription of your voice memo as the user message. This is the output from the previous step, the transcription created by OpenAI’s Whisper model.
Change the “Show Advanced Options” field to “True”.
New options appear when advanced options are turned on. Scroll down to select “Claude 3 Opus” as the model.
Run a test in the “Test” tab to see if everything works.

4. Create Database Item in Notion

The last step is to create a task with the AI’s output in Notion:

Connect to your Notion account in the “Account” tab.
Select the Notion database in which you want to create your item.
Use the File Name (of the original audio file) from Step 1 as the title for your task.
Insert the “Request Context Text” from the previous step (Claude’s output) into the Content of your task.
Optionally, you can also include a copy of your original voice memo transcription by inserting the output from step 2 into the Content field of your task (see screenshot above).
Test this final step to make sure the output indeed shows up in Notion.

Hit “Publish” if everything works correctly. You can now drop voice memos in your folder and their AI-processed results appear in Notion a few minutes later.

🕺🕺🕺🕺🕺🕺🕺

Customizing your Zap

You can customize most parts of this workflow:

Change Dropbox for another cloud storage. Zapier also supports Google Drive, OneDrive, and Box, so you can use those for step 1 instead.
Change Claude’s model. Claude 3 has three different models: Haiku, Sonnet, and Opus. Haiku is the fastest and cheapest, and Opus is the most intelligent and expensive. Even with Opus, I find the costs for processing these voice memos negligible, so I’ve set that as the default model in this Zap.

Some folks say Sonnet is more creative, so if you want to try changing the model to see what results you get, you can do so in step 3 (“Send Message in Anthropic”) of the Zap in the Model field of the Advanced Options section.
Change Notion for another task manager. In step 4, you can swap Notion for many of the other task managers Zapier supports, like Todoist, Things, TickTick, Any.do, Asana, Trello, or ClickUp.

The Voice Memo Workflow as Inspiration for Other Processes

This was a pretty intense process. But you now have a magical workflow to get things done with just your voice and likely lots of other ideas for how you can use AI and Zapier.

Zapier has been around for a while (2011), but it has become exponentially more powerful through the integration of AI models. You can route data from over 5,000+ apps into a model, sprinkle some AI magic on it, and send the output wherever you want to use it.

So even though this might seem daunting at first, I recommend you play around with the voice memo workflow, even if it’s just to get more familiar with Zapier. I’m happy to help and answer any questions you might have; just hit Reply!

Until next time.

Tim

There’s one caveat. The data in the table assumes a five-year period to reach a positive RoA (“Return on Automation”). For example, if you spend five days on an annual task to save yourself one day, the assumption is that you reach "break-even" after five years.

For this use case, that “pay-off” horizon might be too long. The probability is high that OpenAI and Anthropic will add the ability to upload large audio files straight into their models within the next 12 to 24 months.

For an accurate reflection of whether this voice memo workflow creation is worth our time, we should adjust the timeline to a 12-month payoff period:

Instructions on how to import Zap templates.

John

Fantastic article Tim,

My mind is spinning a thousand miles an hour thinking about the possibilities for this. Some practical questions though:

You mention feedback on proposals. How can I get feedback on a proposal that I wrote if I cannot upload an attachment with the actual proposal?

To set this up, do I first need to have paid subscriptions on several platforms such as Zapier? Or are there free plans on all needed platforms?

We Eat Robots

Discussion about this post