Generative AI is a technology that generates new content based on your input. For simplicity’s sake, let’s split it into two types for now: image generative AI (it generates images) and text generative AI (it generates text).
An image AI generates images based on text input. You give it a string of text, called a prompt, and it generates a few images from that. The tools we will talk about generate 4 images, with the option to upscale any or all of them if you like the results, giving you a higher-resolution image.
We will focus most of our efforts on Midjourney, since for now it’s simply the best AI that you can use for free (on a trial basis). Here’s a quick comparison between the three main tools (Midjourney, Stable Diffusion, and DALL-E) so you can understand why. We will use the same prompts across all three: one for photorealistic images, one for art styles, one for concepts.
You might say it doesn’t look like that big of a difference. That’s mainly because I have not used any parameters; we’ll go over them in a second and see how they make the picture better and more closely aligned with what you are looking for.
One thing that you need to understand is that these tools are close to limitless in their ability to create any type of visual art. If you want to see breathtaking examples, check out our Instagram feed, where we post our own experiments and community examples, with the prompt so you can replicate the results. Or look at the multitude of examples you might already have found online (remember the Pope wearing a puffy jacket?).
Now let’s talk about Midjourney.
How do I use it?
Midjourney is the trickiest of the three to use. While Stable Diffusion and DALL-E have easy-to-use web interfaces, Midjourney works on Discord alone. There are a few reasons for this, and while it does seem complicated at first, it is by far the best way to unlock your imagination.
What do I mean by that? Well, interviews with the Midjourney team are very rare, and that’s because their mission statement is quite different from other companies’. Their sole purpose is simply to help people be more creative. And this is where Discord comes into play.
Discord is a chat room, where everyone can talk with everyone, but when using Midjourney, everyone talks to the Midjourney bot. It looks like this:
In one interview, the creator of Midjourney said:
Here’s a machine, it will picture anything you want, anything you can imagine, what do you want? And they go “dog”. And it will show them a photo of a dog. And they go “ok”. “Come on, a little bit more than that.” “Big dog. Big fluffy dog.” And at the end of it they are so uninterested and think “why would I care about this?”. But then you throw these people in the same environment, with complete strangers, and someone else goes “space dog”, “space dog with lasers”, “space dog with angel wings”, and all of a sudden their mind goes “OH MY GOD” and they’ve been put into an imaginative environment that changes their beliefs about themselves and what they can do. All of a sudden it’s a creative and imaginative environment that makes people more imaginative.

Check out this video for a more in-depth look: https://www.youtube.com/watch?v=dp9ViOIG9j0
And I can attest to this fully. When I started using it, my mind was blank. I had no idea what I wanted to generate; I was thinking “I’ll never have any crazy ideas, this is not my personality – to come up with crazy cool ideas that the world will love”. But then you spend a few minutes looking at what other people are generating, and you get going. Here’s the first picture I made and one of the latest:
“Oh, I didn’t know it can do that!”
Yes, it is a paradigm shift. Because your perception goes from “I don’t have any ideas” to “how do I get it to do exactly this by using the right words”. I’ll talk more in another article about what this means in terms of personal growth, the skillsets people are developing today using image AIs, and how this benefits you in your professional life, whether as a business owner or as part of a company.
Let’s start at the beginning: how to use it. First, you go to the Midjourney website, which will redirect you to Discord; once you link your Discord account, you’re good to go. There are trial and paid accounts. The trial account sometimes goes on and off because of high workloads; when it’s available, it gives you 25 image generations.
There are currently 6 versions of Midjourney, v1 to v5.1. If you use a free account, you’ll join 3 random channels where people generate what they want. If you type /settings you get this:
Some versions are not available for trial accounts; we’ll assume you have access to all of them for the sake of this article.
What are the versions? Do I need to think about them?
Versions 1 through 3 are the earliest and the most rugged. Here is a sample image that shows how the technology evolved (in around 12 months).
Version 4 is the more opinionated one, meaning it carries a more distinct artistic style. If you are interested in replicating art images, this is the one to use.
Versions 5 and 5.1 make photography much more realistic. Version 4 had problems generating hands, teeth, and some other aspects of human anatomy realistically. Versions 5 and 5.1 are much better at this, to the point that there are very few, if any, mistakes.
--v 1 use for an abstract painterly vibe without coherency, good for breaking archetypes
--v 2 use for a painterly abstract look with minor coherency, good for breaking archetypes
--v 3 use for a painterly abstract look with some coherency, good for breaking archetypes
--v 4 strives to deliver your prompt with good coherency, but may be weak with photographic quality
--v 5 gives the most power to your prompt, good for power users
--test use for digitally painted art style with excellent coherency
--testp use for a photographic style with very strict coherency (does not like to break archetypes)
--niji use for any style, just like --v 4, but with an anime/manga data set in play
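If you don’t want to change your defaults in /settings, you can switch models per prompt by appending the version flag from the list above. The flags are real; the subject here is made up for illustration:

```text
/imagine prompt: a lighthouse on a cliff at sunset, oil painting --v 4
/imagine prompt: a lighthouse on a cliff at sunset, cinematic photo --v 5.1
/imagine prompt: a lighthouse on a cliff at sunset --niji
```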
Prompting – do I need to become a prompt engineer?
Surely you’ve heard of new jobs that pay $300k-$500k for prompt engineers. While that is great, I’ll cover this topic in another article to explain how technology evolves and the skillsets associated with it. The short answer is no: by the time this technology becomes mainstream, the UI/UX will be so polished that everything will be simplified. You won’t need any special skills to use it.
The nuanced answer is – if you want to take advantage of the technology today, you might need to remember a parameter or two. This is where Discord helps, since you can see what others are doing and just reuse some of that. You can learn from thousands of people every day. But here’s a short intro to help you get started.
This is what a Midjourney prompt looks like:
cheetah natgeo professional realistic photo, wild world, dynamic scene, motion blur, low angle, surrealistic colors --ar 3:2 --stylize 1000 --upbeta --q 2 --v 5
And it will deliver something like this:
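Breaking that prompt apart (the annotations are my own, not official terminology):

```text
cheetah natgeo professional realistic photo                              <- subject and photo style
wild world, dynamic scene, motion blur, low angle, surrealistic colors   <- scene and composition details
--ar 3:2         <- aspect ratio of 3:2
--stylize 1000   <- very high stylization
--upbeta         <- use the beta upscaler
--q 2            <- quality setting (more GPU time, more detail)
--v 5            <- model version 5
```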
The U1 to U4 buttons allow you to Upscale an image, producing a larger version of the one you like. In this example, I upscaled the 3rd picture. This one:
The V1 to V4 buttons will ask the AI to make variations of that particular picture. I’ll take the first image, select V1, and see what we get.
If you like in-depth guides, the “prompt-faqs” channel on the Midjourney Discord, which everyone has access to, contains a long list of instructions.
The main rule exists because Midjourney employs a bit of NLP (Natural Language Processing), which means your prompts will perform better if you follow these guidelines:
Write in sentence fragments. Avoid word-list prompts: “sleeping, dog, green, forest, 2d, illustration”. Use grammatical prompts: “A green dog sleeps in the forest, 2d illustration”.
These are the official recommendations, though I found that version 4 responded quite well to word lists. This just shows that the more the technology advances, the less prompting you’ll have to learn, since it will overlap heavily with normal grammatical sentences.
I highly recommend going to the official website to read what all the parameters do, but I can explain some of the most common and easiest to use for anyone just starting out.
These parameters affect ONLY the initial image generation.
– Aspect Ratio
Default aspect ratio is usually a square, so 1:1. Version 4 allows anything from 1:2 to 2:1. This means you can still do 4:3 or 3:4 or 16:9. Version 5 allows any aspect ratio in theory, in practice there are some limitations. But I’ve seen people generate really long infographics that are 1:60.
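As a hypothetical example, here is the same subject framed as a widescreen wallpaper versus a portrait poster:

```text
/imagine prompt: misty mountain range at dawn --ar 16:9 --v 5
/imagine prompt: misty mountain range at dawn --ar 2:3 --v 5
```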
– Seed
By using the same seed, you’ll get similar images. In theory, you can find a character you like and generate them in different settings.
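A sketch of how that could look; the seed value is arbitrary and the subject is made up:

```text
/imagine prompt: a red-haired knight, portrait --seed 1234 --v 5
/imagine prompt: a red-haired knight riding through a desert --seed 1234 --v 5
```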
– Stylize
Version 4 has 4 stylize levels: low (--stylize 50), med (--stylize 100), which is the default, high (--stylize 250), and very high (--stylize 750).
Version 5 goes up to 6 levels: 0, 50, 100 (default), 250, 750, 1000.
This changes the amount of detail the AI will add to the image it is generating. If you want something simple, like a logo, it’s best to go lower on the stylize number.
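For example, a low stylize value for a clean logo versus a high one for a detailed scene (subjects are illustrative):

```text
/imagine prompt: minimalist fox logo, flat design --stylize 50 --v 5
/imagine prompt: a fox in an enchanted forest at night --stylize 1000 --v 5
```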
– Quality
Accepts values of .25, .5 or 1. Higher quality means the image will take longer to generate and will have more details. Paid accounts pay for GPU processing time, so you might think “I always want more details”, but the trade-off is that you’ll be able to generate fewer images in total.
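For instance, a cheap draft versus a full-quality render (subjects are illustrative):

```text
/imagine prompt: rough concept sketch of a castle --q .25 --v 5
/imagine prompt: detailed castle throne room --q 1 --v 5
```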
– Tile
This will generate images that can be tiled seamlessly to create patterns for fabrics, ceramics, posters and so on.
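A hypothetical example; note that --tile is not supported in every model version:

```text
/imagine prompt: seamless watercolor leaves pattern --tile
```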
– Niji
Anime-style images.