Inspired by the images of the universe recently released by NASA, the first prompt I entered into the Midjourney research lab’s artificial intelligence (AI) tool was “a spaceship surrounded by galaxies”. The result, pictured below, was an image of a ship floating in space that appears to reflect the cosmos around it – pretty much faithful to the prompt.
For Midjourney founder David Holz, an important aspect of generative AI is its “ability to merge with language,” whereby we “can use language as a tool to create things.” Simply put, generative AI takes text commands from the user and creates novel images based on what it has learned from its training data. The rise of text-to-image generation has also raised philosophical questions about the definition of an “artist”.
British mathematician Marcus du Sautoy argues in his 2019 book The Creativity Code: Art and Innovation in the Age of AI: “Art is ultimately an expression of human free will, and until computers have their own version of it, art made by a computer will always be traced back to a human creative need.” He says that if we created a “ghost” in a machine, it might give a glimpse into its thoughts. “But we’re still a long way from creating conscious code,” concludes du Sautoy.
Similarly, Holz notes, “It’s important that we don’t think of this as an AI ‘artist.’ We see it more like using AI to expand our imaginations. It’s not necessarily about art, it’s about imagination. We ask, ‘What if?’ In a way, AI increases the power of our imaginations.”
Midjourney allows its users to enter their prompts on its Discord server, and then generates four images that match the text. The user can choose to explore more variations and upscale their preferred image to a higher resolution. The bot entered open beta last month, giving users a set number of free trials to bring their imaginations to life. The generated images can also be minted as NFTs, for which Midjourney until recently required royalty payments.
“It’s a massive community of almost a million people all making pictures together, dreaming and riffing off each other. All the prompts are public and everyone can see each other’s pictures…that’s pretty unique,” says Holz.
Holz co-founded Leap Motion, a hand-tracking motion capture user interface company, in 2010 and was named to the Forbes 30 Under 30 list in 2014. Today he runs Midjourney, a small, self-funded research and design lab where he and 10 colleagues explore a number of different projects, including the AI visualization tool.
Regarding the feedback on the AI bot, Holz says: “Many people are very happy and find the use of the product deeply emotional. People use it for everything from a project to art therapy. There are people who always had something in their head but couldn’t express it before. Some people have conditions like aphantasia, where the mind cannot visualize things, and they are now using the bot to visualize for the first time in their life. Lots of nice things are happening.”
Midjourney also takes steps to prevent the platform from being abused to generate offensive images. The Community Guidelines urge users not to use prompts that are “inherently disrespectful, aggressive, or otherwise offensive” or that generate “adult or gore content.” Midjourney also employs moderators who monitor the server and warn or ban people who violate the policy. In addition, automated content moderation bans certain words on the server. The AI also learns from user data, explains Holz. “If people don’t like something, it creates less of it.”
I came across the Midjourney bot during a cursory glance through my Twitter feed, where I saw a user’s psychedelic interpretations of a somewhat post-apocalyptic Delhi.
Having previously dabbled with AI bots like Disco Diffusion and Craiyon, I found it interesting to investigate how different AIs would react to the same text. The images below show the results generated from the same prompt, “City During Monsoon Rain”, by Midjourney; Disco Diffusion, a free-to-use AI tool hosted on Google Colab; and Craiyon, formerly known as DALL-E mini.
While Craiyon produces relatively realistic images and Disco Diffusion produces surreal, impressionistic results, Midjourney falls somewhere between the two.
According to Holz, Midjourney can be understood as a “playful, imaginative sandbox”. “The goal is to give everyone access to this sandbox so everyone can understand what is possible and where we are as a civilization. What can we do? What does that mean for the future?”
Holz dismisses fears that AI is here to ‘replace’ humans or their jobs. “When computer graphics was invented there were similar questions – will this replace artists? And it hasn’t. If anything, computer graphics make artists more powerful,” he says.
Adds Holz: “Whenever we see something new, we’re tempted to find out if it’s dangerous and we treat it like a tiger. AI is not a tiger. It’s actually more like a big river of water. A tiger is dangerous in a completely different way than water. Water is something for which you can build a boat, learn to swim, or build dams that generate electricity. It doesn’t try to eat us, it doesn’t get mad at us. It has no emotions or feelings or thoughts. It’s like a powerful force. It’s an opportunity.”