I was in a cab listening to music on my AirPods, and just as we were pulling up, I switched to Transparency Mode and heard a song playing over the car’s radio that sounded kinda familiar. I knew it was a remix of some tune I wanted to know, and managed to Shazam it before getting out.
Looking into it later, I realized the melody was what I’d been trying to figure out about Charli XCX’s White Mercedes for over a year. Why does the one line she sings, literally the line “like a white Mercedes”, sound like some other song I can’t name? It turns out it’s a song I would have absorbed from the world around me but never intentionally listened to: One Direction’s Night Changes from 2014. Ahhh, it’s so good to have that itch scratched! And there are so many more like this I’ve yet to solve.
Let me say it again for the search engines: Charli XCX’s White Mercedes sounds like, samples, or contains an interpolation from One Direction’s Night Changes.
Another similar thing happened as I was playing WarioWare Inc. (GBA, 2003) again for the first time in years. The background music in one stage awoke some long dormant memory and I needed to know what pop song from my younger days it sounded like. After a lot of humming aloud and trying to Shazam it and searching online… I concluded that the song it reminded me of was… itself. It’s called Drifting Away, and I must have really loved it back when I was playing the game for the first time.
Speaking of retro games, I lasted a full week. The Anbernic RG35XX I said I wouldn’t buy since I already have a Retroid Pocket Flip is now on the way to me from China. There are some reports of shoddy QA and long-term durability issues, but for S$90 I think that’s to be expected.
Another week, another bunch of water-cooler conversations about AI. Specifically how it relates to our work in design: as accelerator, collaborator, ambiguous combatant, amoral replacement. I don’t just mean the making pictures and writing words part, but analyzing messy human interactions (it’s just unstructured data) and presenting them in new ways.
I ran one experiment with ChatGPT on Sunday afternoon, just for kicks, and it sort of blew my mind. From a handful of behavioral traits and demographic details I supplied, it was able to inhabit a fictional personality that I could speak to and pitch various products to. So far so par for the course. But then it reacted to a hypothetical KFC offering called “The Colonel’s Colossal Combo” in a way I didn’t expect, citing a conflict with values and dietary preferences that I did not specify. When asked where they came from, it argued that although they were not specified, they could be reasonably expected from the “Frank” persona I’d created, because of some other background that I DID provide. It sounded a lot like intelligent reasoning to me, and regardless of how it works, I was happy to accept the inference the same as if a colleague were making it.
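For anyone curious about recreating the experiment, the setup boils down to assembling a system prompt from a persona profile and letting the model role-play it. Here is a minimal sketch; the trait names, demographic fields, and the `build_persona_prompt` helper are all hypothetical (my actual session was plain back-and-forth conversation with ChatGPT, not API code).

```python
def build_persona_prompt(name, traits, demographics):
    """Assemble a system-style prompt asking a model to role-play a persona."""
    trait_lines = "\n".join(f"- {t}" for t in traits)
    demo_lines = "\n".join(f"- {k}: {v}" for k, v in demographics.items())
    return (
        f"You are role-playing a fictional consumer named {name}.\n"
        f"Behavioral traits:\n{trait_lines}\n"
        f"Demographics:\n{demo_lines}\n"
        "Stay in character. When pitched a product, react as this person "
        "would, and feel free to infer values the profile implies but "
        "does not state outright."
    )

# Example profile (details invented for illustration, not the real "Frank")
prompt = build_persona_prompt(
    "Frank",
    traits=["price-sensitive", "skeptical of advertising"],
    demographics={"age": 58, "city": "Singapore"},
)
print(prompt)
```

The interesting part, as I found, is that last instruction: given license to infer, the model fills in unstated preferences from the background you did provide.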
Like with all advances in automation, it’s inevitable that we’ll now be able to (have to) do more in less time, with fewer people. Until things go “too far” and need to be reined in, it’s not even a question of whether we should: every industry is incentivized to discover what can be done before it gets done to them. I think there are some exciting opportunities for designers, and a slew of unknown consequences for society. And just like that, we’re back in a new “fuck around” phase of the tech cycle.
A couple of weeks ago I made a bunch of fashion-style athleisure photos with Midjourney v5 but somehow forgot to post them. The photorealistic ones are quite incredible, and the few illustrations I got were really strong too.
This week, v5.1 dropped, promising more opinionated outputs and sharper details, so I tried the same prompt again. Many of the results were as broken as these bodies.
They probably fixed something quietly because it’s been more reliable in the days since. I thought it would be interesting to do a comparison of versions 1 through 5.1 with the same prompt. It’s crazy how far it’s come in just over a year.
photograph of Queen Elizabeth II in a dim video arcade, sitting at a street fighter 2 arcade cabinet, intense concentration playing game, side view, screen glow reflected on her face, atmospheric dramatic lighting --ar 3:2
If you saw Midjourney a year ago, you were probably impressed by how it and Dall-E 2 could turn quite natural text descriptions into imagery, even if the results were still quite hallucinatory, like DeepDream’s outputs circa 2015. I don’t think you would have expected to see the pace of improvement be this quick.
It’s not just rendering improvements from distorted pastiches to photorealistic scenes with internal logic (global light affecting nearby objects realistically, fabrics folding, leather seat covers stretching under buttocks), but also how it’s evolved through feedback and training to understand intent: the idea of a “side view” started working from v4. None of the earlier re-generations got me the camera angle I was going for. The tools that promise to do this for video are probably going to get good faster than you expect.