Can Artificial Intelligence Systems like DALL-E or Midjourney Perform Creative Tasks?
We are witnessing a major shift in how images are generated. The recent growth of machine learning and artificial intelligence raises questions about how creative processes evolve and develop through technology. Systems like DALL-E, DALL-E 2 and Midjourney are AI programs trained on datasets of text-image pairs to generate images from text descriptions. Their diverse capabilities include creating anthropomorphized versions of animals and objects, combining unrelated concepts in plausible ways, and applying transformations to existing images.
DALL-E and similar systems are able to create plausible images for a great variety of sentences that explore the compositional structure of language. DALL-E has some of the capabilities of a 3D rendering engine, but the difference lies in the nature of inputs. For 3D rendering, the input must be specified in complete detail, while DALL-E is often able to “fill in the blanks”. It can also independently control the attributes of a small number of objects.
One of the most exciting features is the ability to combine unrelated concepts. This ability could have implications for the fields of architecture and design, as it allows architecture and product design to take inspiration from seemingly unrelated concepts. Generative AI models encourage designers to explore a greater number of design possibilities from a new perspective, as they shorten the time between intention and execution. They offer an accessible way to play with data and generate imaginative variations of solutions to creative problems.
Some researchers call these systems “artificial serendipity”: tools that maximize the opportunity for serendipity, opening up the range of creative capabilities beyond classical methods. Architects are already experimenting with these tools to explore complex issues like urban planning and the possibilities of existing spaces. Others are combining architectural keywords with contemporary design clichés, references to pop culture and various art styles to design buildings or simply explore the nature of design trends and technology.
While these models have limitations, the field is evolving at an unprecedented rate. Apple recently released Gaudi, a “neural architect” that takes this process one step further by creating 3D scenes from text prompts like “go upstairs” or “go through the hallway”. It is hard to predict where these developments will take us, but their impact can already be felt. In the fields of architecture and design, these systems can be understood as powerful tools to explore, optimize, and test creative designs rapidly.