Help me solve the mystery: which one of us is slacking off?
Suffice it to say that I’ve made my fair share of art. For proof, I offer up my BFA in Visual Art, acquired a couple of decades ago, even though the process of attaining that degree unexpectedly turned out to be more of a deconstruction of my own artmaking process than anything else, and would see me turn away from making much of anything for a long while afterward.
I never wanted a job making art (I never wanted a job, period, but that’s a story for another time). I ended up teaching myself web development because
- I didn’t think I could hack a design gig involving constant requests for revision to art I’d created,
- I was a fine artist, never a proper designer anyway, and
- I initially found it less stressful to work with computers than with co-workers.
I always expected to return to visual art eventually, but “eventually” didn’t seem in any hurry to get here. I tried “Make Something 365” at the start of 2013, and if I remember correctly, I burned out before February.
When I first heard of DALL-E in the summer of 2022 and entered my first few experimental/jokey prompts, it seemed like yet another of the many pleasant distractions that the internet has become so adept at providing. As I kept hearing about it, its impending upgrade to version 2, and its healthy competition (Midjourney and Stable Diffusion), I signed up for the DALL-E 2 waitlist and promptly forgot that I had ever done so.
Still, the buzz around text-to-image AIs grew steadily, until at some point in late summer or early fall of 2022, I spent most of a weekend attempting to install a local instance of the Python-powered Stable Diffusion on my MacBook, only to discover, quite maddeningly, toward the very end of that process, that my 2019 Intel-chipped machine was already too old and insufficient to serve this purpose. I joined the Midjourney Discord and soon enough remembered that I had signed up for DALL-E 2’s waitlist months earlier, then sifted through my inevitably spam-littered inbox to find the neglected invitation they’d sent in response.
Now armed with a generous stack of free generations, I still found myself staring at the prompt input blankly more often than not, uncertain what I should try to generate next. I tried describing a few of the sculptures and installations I’d created (or hadn’t had the chance to create) in college to see what it would make of them… But these experiments did not prove particularly fruitful, except to reveal that DALL-E seemed to have at least an occasional/limited grasp on human hands and feet, whereas Midjourney fudged every attempted hand and didn’t seem to have any concept of what feet were at all.
Things began to come further into focus when I became fascinated by the generation of realistic human faces. I had seen sites before that would generate random photorealistic faces on request, but never had I been given such power to be so specific with the kinds of faces I’d like to see created, not since I’d been doodling imaginary girls with #2 pencils in the margins of my high school notebooks.
Pieces seemed to fall into place more quickly now, in rapid succession. I discovered the GFPGAN Face Restoration Algorithm and learned from a YouTube video how to run it on my DALL-E faces using Google Colab. Then I would take the restored faces back into DALL-E to outpaint bodies onto them and environments around them, and then into Photoshop for finishing touches. It very quickly became a somewhat firm yet malleable system for cranking out images in a matter of hours that would have taken me weeks to make in the past with Photoshop alone.
Quite unexpectedly, by delving into the shallow depths of a novel technological tool that was already infamous for alienating professional artists worldwide by nature of its very existence, I had stumbled into the first satisfying/compelling artmaking process I’d been able to start and sustain for decades. It quickly, if improbably, became The Most Important Thing I Could Be Doing, which was admittedly a relatively easy threshold to attain, considering the sense of congested malaise and general lack of direction that had characterized 2022 for me up to that point.
At this early stage, communicating with DALL-E was a freakish, fun time. I remembered trying to talk to ELIZA, the AI therapist from way back in the day, who fell ever so woefully short of passing a Turing test… I thought of how, every night, my head hits the pillow and I say “Hey, Siri… Turn off the bedroom,” meaning the lights, and she does… But this was definitely another level. It was the first time I was anywhere near convinced that there was actual Intelligence behind the “I” in AI.
Still, I was not one to take the idea of “prompt engineering” all that seriously. Aside from sounding like a buzzword that felt relatively obsolete from the moment it had been coined, “engineering” one’s prompts seemed more suited to working with Midjourney and Stable Diffusion, both of which have a slew of actual command-line-like parameters that one can include in their prompts to tweak the output and reduce or increase randomness (as well as having spawned no shortage of prompt-building helper apps). DALL-E supposedly responds to many of the same tactics, but notoriously mentions none of them in its official documentation (or at least none that I’ve so far been able to find, and I’d like to think that I’ve looked).
I found rather quickly that lengthy, overly-specific prompts seemed to curtail DALL-E’s creativity rather than enhance it. I had to aim carefully, to strike the right verbal balance between specifics and vagueness, leading the AI closer to where I was hoping it would go, while still leaving enough room for it to improvise… And quite consistently, it would come up with some excellent ideas (many of which seemed to have only a tenuous relationship to my prompts), which would often inspire me in turn to try to get more on DALL-E’s wavelength and adjust my text input before the next attempted generation. It felt like nothing less than collaborative creation.
Over the course of September and October 2022, I created 50 of these images using my system, more if you count the multiple “levels” of outpainting I tended to do, allowing me to feature a close-up of the face and then have it get progressively smaller in the center of the frame as more world was added around it. I very much enjoyed the pace and process of outpainting on DALL-E’s site: a brilliantly simple, responsive UI/UX that I have yet to see replicated in any instance of Midjourney or Stable Diffusion. I enjoyed moving the generation frame around and seeing what would show up next… It’s probably the one thing I spent the most time doing in the latter half of 2022, apart from Photoshopping in general. I was much less interested in the “one-and-done” nature of what most others seemed to be doing with these AIs; I liked to spend more time on each image, and feel like I was actually working on it for a while, even if DALL-E was handling the bulk of the grunt work. To be perfectly honest, I don’t think I’d ever produced visual art with such consistency and determination before in my life. And naturally, I wanted to keep the flow going…
I was generally generating (and by necessity, rejecting the majority of) hundreds of one-megapixel images in order to compile/compose/produce a single final image, and as my ambitions grew, spending more and more hours in Photoshop. But the “partnership” seemed solid. Sure, I got/made a handful of images that didn’t quite work out, that showed initial promise but ended up heading in some unfavorable and/or unrecoverable direction, but a failure rate of less than 10%, working at this speed, seemed entirely acceptable. And even using all these 13-cent generations, each girl still cost only about $4 to $9 to make, also totally reasonable, at least in my estimation, for a chance to wrangle the bleeding edge of technology (especially in comparison to the prohibitive cost of a brand new computer capable of locally running Stable Diffusion).
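For anyone who wants to check the math on that cost, here is a back-of-the-envelope sketch in Python. It assumes a flat 13 cents per paid generation (the price quoted above) and ignores any free credits; the function name is purely illustrative.

```python
PRICE_PER_GENERATION_CENTS = 13  # the per-generation price quoted above


def generations_for_budget(dollars: int) -> int:
    """Return how many whole paid generations a given spend would cover."""
    # Work in whole cents to avoid floating-point rounding surprises.
    return (dollars * 100) // PRICE_PER_GENERATION_CENTS


low = generations_for_budget(4)
high = generations_for_budget(9)
print(f"$4 to $9 per image works out to roughly {low} to {high} paid generations")
```

In other words, under that pricing assumption each finished image represents somewhere between a few dozen and about seventy paid generations.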
In early October, I created a Tumblr called @inhumantouch, with the intent to post a new girl every day for as long as I could keep it up, the first time in my life when posting new creative content I was proud of every single day even remotely seemed like a possibility. A week later, getting absolutely no love on Tumblr, I shifted the primary focus to an identically-named account on Instagram, and my posts there cascaded over to my fairly inactive personal Facebook profile, where they began to garner more attention, as well as no shortage of questions as to the nature of the images and the process involved in making them, all of which I was happy to answer at great length, encouraging others to try out the AIs for themselves and see what would happen…
Then came Halloween. By this point, I’d already begun to butt up against the edges of what DALL-E considered permissible. I almost never intentionally attempted to circumvent their expansive content policy, as I had no idea how many times I would even be given the pop-up warning before OpenAI potentially cut me off from generating anything else. But I was frequently annoyed by it, as I could not intentionally generate, say, fishnet stockings, which would be utterly appropriate attire for a completely clothed goth girl, or, seemingly, any kind of convincing horror-related element, which would have been nice to be able to post around Halloween. I was able to work around some of these limitations using multiple fallbacks: combining images from disparate DALL-E generations, importing elements generated by Midjourney, occasionally resorting to extracting items from stock photos or my own photo library, even several times drawing prohibited blood into an image by hand, all using Photoshop to take up DALL-E’s slack.
My habit had become to work on a girl/image and outpaint her for as long as I could while DALL-E remained cooperative. And if/when DALL-E decided to stop cooperating, I would move on to another face and try again. But whereas doing this throughout September and October had generated 50 images I absolutely loved and only 4 or 5 that I hid away, by the end of November I found myself with only a few new images I considered successful and nearly SEVENTY more in states of incompleteness or, if “complete,” that I felt were insufficient and paled in comparison to the earlier work. I watched in despair as my healthy buffer of completed/upcoming daily posts dwindled from two weeks, to one, to mere days…
So here is where I start to lose it. From an abundance of practice over the years, I’d become quite used to coping with my own often-flickering, faltering lantern of creative motivation, but here I was, suddenly jazzed as hell and ready to go, go, go, and keep going, and my AI partner wanted, what, more space? I haven’t changed my “system” much at all since the beginning… So what, exactly, is different now? Why is DALL-E no longer giving me anything, well, interesting? Why, on the worst days, does it seem to be actively mocking me?
I’m still able to generate realistic faces I like, but that effectively only adds to the frustration, because I can’t consistently generate decent/matching bodies/outfits or environments for/around them. DALL-E has decided to default, in response to the vast majority of my prompts, to a kind of indistinct, blurry, hazy background, be it indoors or out, day or night. Its grasp on body morphology seems to have largely reverted to a grammar school drawing level. Though I became hooked on DALL-E’s ability to render photorealism, and every single one of my prompts begins with “photo of” and ends with “in 4k ultra high resolution,” generated frames now tend to shift swiftly into a weird sort of pseudo-illustration style that no one asked for, even when the image I’m outpainting from is clearly photorealistic to begin with.
The end result of this madness is, as detailed, a profusion of unfinished images, and more generations and more hours in Photoshop required to finish what few images I can manage to salvage. I write to DALL-E’s help chat and get no response. I ask questions on their Discord and, after getting briefly banned twice (due to my not being very accustomed to using Discord), finally get connected to an admin who starts a private chat with me, but then almost immediately stops responding. I’m beginning to wonder, not only “is this me or is this DALL-E?” but if I’m being subjected to some kind of perverse, live A/B test (which would be particularly perverse, considering I’m still paying full price for every generation).
My prompts are not generally very long or grammatically/syntactically complex and have barely changed since September. Paranoia creeps on in: does the possibility exist that I am actually being personally targeted because the content of my images verges on being less than entirely family-friendly? And/or because some of my non-existent girls come out looking like they might be, well, not quite of legal age? OpenAI has yet to accept any of my multiple invitations to be “a collaborator” on any of the @inhumantouch Instagram posts… But does that mean anything, other than that they must be constantly bombarded with hundreds of thousands of such invitations these days? WTF, if anything, is really going on?
So, as you can see, I’m in a state. I found this wholly unexpected, amazing new tool that re-ignited my long-smoldering passion to create visual art, quickly developed an intense yet enjoyable system to produce as many images as possible for an increasingly cohesive, genre-spanning series of imaginary portraits, then had the most crucial component of that system suddenly, inexplicably crap out on me and never completely recover. I’ve managed to complete fewer than 15 new pieces across November and December 2022, while arguably spending even more time overall taking stabs at these images than I did in September and October, so that previously mentioned “incomplete” statistic of 70 has now grown to, as of this writing, 106. Yep, after years spent idling on the creative sidelines, I am extremely reluctant to throw in the towel here… We had something, DALL-E! And you just THREW IT AWAY! I… think…?
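To put rough numbers on that drop-off, here is a small Python sketch using only the counts given in this story. It rests on a few assumptions: every incomplete image is counted as a miss, all 106 incompletes are attributed to November and December, “4 or 5” hidden images is taken as 5, and “fewer than 15” finished pieces is taken as 14. The function name is illustrative.

```python
def completion_rate(finished: int, abandoned: int) -> float:
    """Share of attempted images that ended up finished."""
    return finished / (finished + abandoned)


# September-October: 50 keepers, 4 or 5 hidden away (5 used as the worse case).
early = completion_rate(50, 5)
# November-December: fewer than 15 finished (14 used as the best case), 106 incomplete.
late = completion_rate(14, 106)

print(f"Sep-Oct: {early:.0%} finished, Nov-Dec: {late:.0%} finished")
```

Even with the most generous reading of the later months, that is a slide from about nine finished images in ten to about one in eight.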
No one in the short list of people I’ve pestered about this so far seems willing or able to help me even figure out what’s happened. I’m not allowed to illuminate these issues by posting links to my images (or even the link to this story) in Discord servers or related subreddits, because such posts are inevitably perceived as prohibited “self-promotion.” Hence, not knowing where else to turn, in increasing desperation to get back to that all-too-short sweet spot, it was time to write this, and provide a framework within which to post a few of the more egregiously offensive unfinished post-Halloween images (that I’d really rather no one ever had to see as they now stand, but whaddayagonnado?).
Does anyone reading this have any clue what’s going on? Or, at the very least, some idea when the next version of DALL-E (that I’m praying will be advanced enough to resolve these issues) will be released to the public?
It’s that sudden 180-degree flip around the beginning of November that makes it practically impossible for me to just let go of this. Sure, I can’t entirely dismiss the idea that the entire course of our perceived productive “relationship,” its honeymoon period and subsequent stormy patches, could all just be in my head, doing what it normally does, automatically personifying and anthropomorphizing as we humans are so apt to do, granting abilities and intentions to something that’s extremely unlikely to have evolved nearly far enough to possess them yet, but still… Looking at the sheer statistics, this total turnaround just seems incredibly improbable in general, let alone to be entirely the fault of my only real avenue of communication with the other party involved: my text prompts, which, as I’ve explained, were never all that complex to begin with, and have remained largely the same in structure this whole time: “photo of a (pastel) (goth) girl (with some color hair) wearing [whatever] in [whatever setting], in 4k ultra high resolution.” How could an AI have gotten tired of creating variations on this before I have?
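For the programmers in the audience, that prompt structure is regular enough to sketch as a tiny template function in Python. This is an illustration of the pattern, not a tool from the actual workflow: the function name, its parameters, and the example outfit, setting, and hair color are all hypothetical.

```python
from typing import Optional


def build_prompt(outfit: str, setting: str,
                 style: Optional[str] = None,
                 hair: Optional[str] = None) -> str:
    """Assemble a prompt following the fixed structure quoted above."""
    # Optional style words ("pastel", "goth") go in front of "girl".
    subject = " ".join(part for part in (style, "girl") if part)
    # The hair color is optional too.
    if hair:
        subject += f" with {hair} hair"
    return (f"photo of a {subject} wearing {outfit} in {setting}, "
            "in 4k ultra high resolution")


# Hypothetical example values, chosen only to show the shape of the output.
print(build_prompt("a velvet dress", "a foggy cemetery",
                   style="pastel goth", hair="lavender"))
```

Only the bracketed parts ever vary; the opening “photo of” and the closing “in 4k ultra high resolution” stay fixed, which is the whole point of the complaint.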