Tag: Humane

    Week 15.24

    It was Hari Raya Puasa here on Wednesday, which, along with the city’s oppressively hot and humid weather, left those of us who don’t celebrate the holiday feeling somewhat unsatisfied upon returning to work on Thursday. More than one person slipped and called it a Monday, or asked how the weekend was. So instead of a four-day workweek, it felt like two weeks in one.

    Perhaps the depressed mood was justified. Earlier in the week, tragedy struck a colleague who lost their father to a heart attack — a feeling all too familiar within our team, as the same thing happened to another young designer just over a year ago. And you may recall that just nine weeks ago, another friend lost their dad too. At the same time, my thoughts have been occupied by a family friend, virtually family, currently recovering from surgery with an as-yet-unquantified cancer running loose in her body.

    I’m tired, but feeling better about the recent decision to make room for more important things than my current work. I came across this poem about mortality that captures the suddenness of loss and how we take everything for granted: If You Knew, by Ellen Bass. I was also reminded of this Zen concept that a glass always exists in two states, whole and broken, while reading responses to a tweet asking for “sentences that will change your life immediately upon reading”.

    Hitting the books

    Speaking of reading, I picked up Isle McElroy’s People Collide again after months of sipping its beautiful phrases through a tiny time straw, finishing it quickly. It’s the best thing I’ve read in many months; a profound questioning of what it means to be a particular person in a specific body, and how much of you makes up who you are to everyone else. At its core it’s a Freaky Friday body swap story. I don’t know if it’s because McElroy is trans that these perspectives and insights are so tangible, but I felt them. Even though the story didn’t go where I wanted at all, I gave it five stars on Goodreads because the final page is a triumph. I had to fight back tears of admiration while reading it on the bus.

    Right after that, the book train was rolling again and I read After Steve: How Apple Became a Trillion-Dollar Company and Lost Its Soul by Tripp Mickle, which had some inside stories and gossip I’d not heard before and dug into how Jony Ive “neglected” his design leadership role in the later years, a story I’d long been curious to hear. Still, it’s one of those non-fiction narratives that dramatizes and assumes a lot about what its subjects did and felt at key moments, things nobody can know for certain.

    Here it comes, the AI part

    Meanwhile, the Apple Design Team alums who decamped to Humane launched their first product, the “Ai Pin”, to largely middling reviews from tech outlets like The Verge. Quick recap: this is a camera-equipped, voice-enabled wearable you attach to your clothing, letting you access a generative AI assistant so you can ask general questions and take various actions without getting your phone out. In theory.

    Most of its faults seem to stem from issues intrinsic to OpenAI’s GPT models and online services, on which the Pin is completely dependent. It’s a bit tragic for Humane’s clearly talented startup team. I’m inclined to see the hardware as beautiful and an engineering accomplishment, and the parts of the user experience they could customize (the laser projector, the prompt design) are probably pretty good. But none of that changes the fact that the Pin’s brains are borrowed. A company with financial independence and the ability to make its own hardware, software, and AI services would have a better chance. Hmm… is there anyone like that?

    Meanwhile, a new AI music generation tool called Udio launched in public beta this week and I spent some time with it. I’d previously only played with AI models that do text, images, and video, never audio. It’s currently free while in beta and lets you make a generous number of samples, so there’s no reason not to take a look.

    Basically, you describe the song you want with a text prompt, and it spits out a 33-second clip. From there, you can remix or extend the clip by adding more 33-second chunks. It generates everything from the melodies to the lyrics (you can provide some if you want), including all the instruments and voices you hear. Is it any good? It’s very impressive, although not every song is a banger yet. Listening to the hip-hop instrumentals featured on the home page, I thought to ask for a couple of conscious rap songs, and they came out well, with convincing-sounding vocals. I then asked it to write a jazzy number about blogging on a weekly basis; you can judge for yourself if the future is here.

    At present, I see this as a fun toy for the not-so-musically inclined like myself, and as an inspiration faucet for amateur songwriters who work faster with a starting point. So, pretty much like what ChatGPT is for everything else. And like ChatGPT, I can see a future where this threatens human livelihoods by being good enough, at the very least disrupting the background music industry.

    Comfort sounds

    One musical suite that stands as a symbol of human ingenuity’s irreplaceability, though, is what I’ve been playing in the background on my HomePods all week while reading and writing: the soundtrack to Animal Crossing: New Horizons. Because Nintendo hasn’t made the official tracks available for streaming, I’ve been playing this fantastic album of jazz piano covers by Shin Giwon Piano on Apple Music. It takes me right back to those quiet, cozy house-bound days of the pandemic. Could an AI ever take the place of composers like Kazumi Totaka? I remain hopeful that it never will.

    Maggie Rogers released her third album, Don’t Forget Me. I put it on for a walk around the neighborhood on Saturday evening and found it’s the kind of country-inflected folk rock album I tend to love. One song in particular, If Now Was Then, triggered my musical pattern recognition and I realized a significant bit sounds very much like the part in Counting Crows’ Sullivan Street where Adam Duritz goes ā€œI’m almost drowning in her seaā€. It’s a lovely bit of borrowing that I enjoyed; putting copyright aside, experiencing a nostalgic callback to another song inside a new song is always cool. It’s one of the best things about hip-hop! But why is it okay when a human does it but not when it’s generative AI? I guess we’re back to Buddhism: Everything hangs on intention.

    ===

    Miscellanea

    • I watched more Jujutsu Kaisen despite not really being blown away by it. Mostly I’ve been keen to see the full scene from a clip I saw posted on Twitter, where the fight animation looked more kinetic and inventive than you’d normally expect. I decided it must have come from Jujutsu Kaisen 0: The Movie, because movies have bigger budgets and the animation in season 1 looked nothing like it. And I had to finish season 1 in order to watch and understand the movie.
    • Well, I saw the movie and it was alright, but it didn’t have that fight scene. So where is it?? That got me watching more episodes of the TV anime, and I don’t think I’ve ever seen a jump in quality like this between two seasons of a show. It seems a new director came on board (maybe more money too), and suddenly the art is cleaner, the camera angles are more striking and unconventional, and everything else went up a notch. I guess I’m watching another 20+ episodes of this then.
    • I finished Netflix’s eight-episode adaptation of 3 Body Problem. I’m not invested enough to say I’d definitely watch a second season, assuming they pick it up at all.
    • On that topic, Utada Hikaru released a greatest hits compilation called Science Fiction, with three “new” songs and 23 other classics re-recorded, remixed, and/or remastered in Dolby Atmos. I don’t really know these songs, in the sense that I have no idea what many of them are actually about, but I’ve heard them so much over the last 25 years that I probably know them more deeply than most.
    Week 45.23: AI on the brain

    This week in artificial intelligence was a big one: Humane unveiled their highly anticipated wearable, while OpenAI made strides with ChatGPT enhancements.

    The Humane Ai Pin

    A lot has already been said about what a letdown the Humane reveal was, mostly by people confused by the presentation style of the two ex-Apple employees who founded the company.

    If you’ve seen Apple events and Humane’s 10-minute launch video, you’ll note the contrast in delivery and positioning. Apple tries to couch features and designs in real-life use cases and to show authentic enthusiasm for what they do to improve customers’ lives (Steve was unmatched at this). Humane kicked off with all the warmth of a freezer aisle, missing the chance to sell us on why their Ai Pin wasn’t just another tech trinket in an already cluttered drawer. They puzzlingly started with how there are three colors available and how it’ll come with extra batteries you can swap out, before even saying what the thing does! The rules of storytelling are quite well established, and why they chose to ignore them is a mystery.

    A lot was also said about how two key facts in the video presentation, provided by the AI assistant so central to their product, turned out to be inaccurate. One was about the upcoming solar eclipse in 2024 (and Humane’s logo is an eclipse! How do you get this wrong?), and the other was an estimate of how much protein a handful of nuts has. It’s a stunning lack of attention to detail that this was not fact-checked in a prerecorded video.

    Personally, I had been waiting for the past five years to see what this stealth startup was going to launch, and as the rumors and leaks came out, I was extremely excited to see an alternative vision for how we interact with computers and personal technology. What they showed did not actually stray from what we knew. An intelligent computer that sees what you see, is controlled by natural language, and is able to synthesize the world’s knowledge and project it onto your hand in response to queries is amazing!

    The hardware looks good, channeling the iPhone 5’s design language to my eyes, and I’ll bet they had to pioneer new ideas in miniaturization and engineering to get it down to that size. I expected it to cost as much as an iPhone, but it’s only $699 USD, which feels astoundingly low. That’s not much more than what we used to pay for a large-storage iPod.

    The disappointment is in their strategy. By positioning it as a replacement for your phone rather than an accessory, they’ve reduced the total addressable market to a few curious early adopters and people trying to address a tech or screen addiction: the kind who intentionally buy feature phones in 2023. I think their anti-screen stance is interesting, but it doesn’t win over the critical mass necessary to scale and challenge norms.

    The Ai Pin comes with its own phone line for messages and calls (for $24/mo), so it’s not going to be convenient to use this alongside your phone, and I would not give up my phone while this is still half-baked — I say this kindly, because even the iPhone launched half-baked in many ways. For many things that we have become accustomed to in life, there is no substitute for a high-definition Retina display capable of showing images, video, and detailed or private information when necessary.

    Do I believe that Apple can one day get Siri to the level of competence that OpenAI has? I have to hope, because the Apple Watch is probably a better place for an AI assistant to live than in a magnetically attached square on my T-shirt. In any case, Humane seem to have taken a leaf out of their old employer’s playbook and will be releasing this first version only in the US, so whether or not I would buy one is a moot point.

    OpenAI and GPTs

    Speaking of OpenAI, it would seem that they’re still the team to beat when it comes to foundation models. The playing field is full of open-source alternatives now, including Kai-Fu Lee’s 01.ai and their Yi-series models, but as a do-it-all company offering dependable access to dependable AI, OpenAI seems unassailable.

    They announced enhancements to their models, increasing context windows and speeds while halving prices for developers, and launched a new consumer-friendly product: customized instances of ChatGPT that work like dedicated apps, which they call “GPTs”. In effect, these are a version of Custom Instructions, which were introduced earlier this year as a way to tell ChatGPT how to behave across all chats. But sometimes you’re a researcher at work and sometimes you just want some dumb fun; a single set of instructions for every chat can’t cover both, which may be why I’m not sure they caught on.

    So now GPTs let you specify (pre-prompt?) different contexts and neatly turn them into separate tools for different purposes. Importantly, you can now also upload knowledge in the form of files and documents for the agents to reference when generating replies. This makes them more powerful and app-like, and normal people like me with no coding ability can create them by telling a bot what they want (in natural language, of course) or by writing prompts directly. I recommend the latter, because chatting with the “Create” front-end tends to oversimplify your instructions over time and you risk losing a lot of detail about how you want it to work and interact with users.
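
    For the technically inclined: the GPT builder itself lives in ChatGPT’s no-code interface, but the Assistants API that OpenAI launched the same week exposes the same two ingredients, an instruction pre-prompt plus uploaded knowledge files, programmatically. Below is a minimal sketch in Python using the official openai SDK; the assistant name, file, and instructions are made-up examples, and since the API is in beta, the exact parameters may change.

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Upload a knowledge file for the assistant to consult when replying
    # ("style_guide.pdf" is a hypothetical example)
    knowledge = client.files.create(
        file=open("style_guide.pdf", "rb"),
        purpose="assistants",
    )

    # "instructions" plays the role of a GPT's pre-prompt; the retrieval
    # tool lets the assistant search the uploaded file when answering
    assistant = client.beta.assistants.create(
        name="House Style Editor",  # hypothetical
        instructions=(
            "You are a patient editor. Rewrite whatever the user sends to "
            "match the attached style guide, and briefly explain each change."
        ),
        model="gpt-4-1106-preview",
        tools=[{"type": "retrieval"}],
        file_ids=[knowledge.id],
    )
    print(assistant.id)  # conversations then run in threads tied to this id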

    So what does the launch of these GPTs mean? Well, for many of the developers who were riding the OpenAI wave and only used their APIs to build simplistic wrapper apps, it’s a sudden shift in the tide and they’re now forced to build things that aren’t reducible to mere prompts.

    What we’ll soon see is a GPT gold rush. Brace yourself for a stampede of AI prospectors, each hunting for their piece of OpenAI’s bonanza — the company will be curating and offering GPTs in a “Store” and sharing revenue with creators. That’s a different model from their APIs, where developers pay OpenAI for compute and charge users in turn. Here, users all pay OpenAI a flat fee for ChatGPT Plus and can use community-made GPTs all they want (within the rate limits).

    Hear everyone talking about a viral GPT that makes it so easy to do X? When you want to try it out, you’ll see a call-to-action to sign up for ChatGPT Plus. This signals to me that launching GPTs is a strategy to drive paid account conversion, which begins the lock-in that OpenAI needs in order to make ChatGPT the new OS for services, not unlike how WeChat is the base layer that runs China, regardless of whether you use iOS or Android. Eventually you won’t even need to know about or choose the GPTs you use; the master ChatGPT system will call them as necessary. We may not be headed for a screen-less future, but we’ll probably see an app-less one.

    My GPT projects

    Of course I’m playing with this and making some of my own! Did you think I wouldn’t, given the ability to create AI things without coding?

    I’ve got a list of ideas to work on, and so far I’ve acted on three of them, which are explained on this blog in separate posts.

    ✨ PixelGenius was my first, and contains the most complex prompt I’ve ever written. It started out as a tool to generate photo editing presets/filters that you can use on your own in any sufficiently advanced photo editing app with curves, H/S/L controls, and color grading options. You can just say “I want to achieve the look of Fujifilm Astia slide film” and it’ll tell you how to do that. But it now does more than just make presets; you can find more details and examples in the blog post here.

    😓 SleepyTales was the second, and I’m still amazed at how good it is. It’s designed for Voice Conversations mode (currently only in the mobile app), so you can get a realistic human voice reading you original (and also interactive, if desired) bedtime stories. These are never-ending, long, and absolutely boring tales with no real point, in drama-free settings, told in a cozy and peaceful manner. It’s the storytelling equivalent of watching paint dry, yet oddly mesmerizing. More on this and the next one here.

    🄱 SleepyKills šŸ”Ŗ was born from a hilarious misread — I told Cien about it and ‘mundane’ became ‘murder’. So if your bedtime stories of choice are usually true crime podcasts, then you’re in luck. This GPT agent will create an infinite number of dreary murder stories, but stripped of all suspense, mystery, and excitement. They’re about as exciting as real police work, not the flashy TV investigating sort. Again, I still can’t believe how cool it is to hear these being written and read in real time.

    People have said the Voice Conversations feature is a game-changer for ChatGPT, but I didn’t really get it at first when using it for general queries. IMO, the killer app for it is storytelling. I’ve been using the voice called Sky for both the above bedtime stories apps, and it works well.

    Films

    • I watched David Fincher’s new film The Killer in bed on my iPad, just like he would want me to. Even so, it was spectacular, a cinematic victory lap for both him and Michael Fassbender. It plays with genre conventions and expectations, and riffs off his own body of work. There are some great moments and a fantastic performance by Tilda Swinton. 4.5 stars.
    • Speaking of performances by English actors, I also watched Guy Ritchie’s Operation Fortune: Ruse de Guerre, which is both a terrible name and a terrible attempt at creating a new globetrotting spy/special-ops team franchise. But he has a certain touch even when making shit, and the film is a hell of a lot of fun, bringing out the best in Jason Statham (who tried to hold up The Expendables 4 and failed), as well as a villainous turn from Hugh Grant that — I shit you not — is easily a Top 10 career highlight for him. Jason Statham in the right hands is a very different animal than when he’s doing B material; I don’t know how to explain it. I actually gave it 4 stars on Letterboxd and won’t take it back.

    Album of the week

    R.E.M.’s Up received a 25th Anniversary Edition, with some tracks seemingly remastered and a whole second “disc” of an unreleased live performance they recorded on the set of the TV show Party of Five?! Sadly it is not a track-for-track live performance of the album, which would have been great. There’s no Dolby Atmos here either, so I’m just taking this as an opportunity to revisit the album.

    I can still feel the gut punch from the day Bill Berry bowed out, post-aneurysm. I was afraid they might break up, and R.E.M. was absolutely my favorite band back then (maybe still), so when Up came out, I was hopeful for a new and long-lived chapter to begin. And yeah, it was a weird album, playing with new sounds and using drum machines — not unlike The Smashing Pumpkins’ Adore album after Jimmy Chamberlin left. But many songs were great, some even recognizably R.E.M. The band kept going for a few more albums, each a new spin on an evolving sound. And in true style, they dropped the mic at just the right moment.

    Week 16.23

    I usually look through my camera roll to recall events as I start writing these posts. It’s telling me nothing much happened this week.

    That’s not true; it’s just that a lot of it was spent online. You might have noticed the excitement and fast pace of advancements in AI recently, and it seems I’m spending a correspondingly larger amount of time playing with, reading about, and discussing its impact on our work and lives. It’s enough to make one consider taking a gap quarter or year off work to focus on this stuff.

    One catalyst was a colleague being invited to do an interview on what it means for design, and so we had a conversation about the trends beforehand. Unsurprisingly, the media is still thinking about both design and AI simplistically: will image generation mean fewer jobs for illustrators, and that sort of thing. I find it hard to be optimistic in the short term, in that AI is lighting a fire under our asses and it’s going to cause a lot of pain. But the potential for us as a discipline to evolve under pressure into something greater is undeniable.

    It didn’t help that the next thing I saw was The AI Dilemma, a talk by the creators of the documentary The Social Dilemma, wherein they say the problems unleashed on society by social media were just the prequel to what AI is on track to do if we don’t prepare. And let’s just admit we don’t have a great track record of preparing for things we know are going to hit us later. It’s about an hour long, but I’d file it under essential viewing just for awareness of what’s building up.

    The above talk was given at The Center for Humane Technology, and coincidentally this was the week we finally got a look at what Humane, the secretive product company founded by a load of ex-Apple designers and engineers, has been building and teasing.

    I’ve been anticipating their debut for a long time and had a pretty good idea of the core concept from their leaked pitch deck and patents. Essentially, a device achieves AR by projecting a digital interface on the world around you the old-fashioned way, using rays of light pointed outwards, rather than on the inside of glasses. At some point along the way they started mentioning AI a lot, and it looks like the secret ingredient that turns a nothing-new wearable camera + laser projector into a real alternative to smartphones. In other words, an intelligent assistant that isn’t primarily screen based, so we can be less distracted from “real life”.

    It’s probably best to withhold judgment until we see more at some sort of unveiling event, with more demos, a name, a price, a positioning. But it’s worth remembering that when the iPhone came out, it was a phone good enough to replace whatever you were using at the time. Humane’s device is said to be standalone and not an accessory to be paired with a smartphone. It’s also shown taking calls. The bar for replacing your telephone is now much higher after some 16 years of iPhones.

    An intelligent assistant that lets you do things quicker with less fiddling was always my hope for the Apple Watch from its very first version: that Siri would be the heart of the experience, and that the UI wouldn’t be a mess of tiny app icons and widgets but a flexible, dynamic stream of intelligently surfaced info and prompts. We all know Siri (as a catch-all brand/name for Apple AI) wasn’t up to the task at the time, but I keep hoping the day is right around the corner. Fingers crossed for the rumored watchOS revamp at WWDC this year.

    There’s now also a rumor that iOS 17 will add a new journaling app, and my expectations are already very high. They say it’ll be private, but tap into on-device data like Health and your contacts and calendars. That goes beyond what Day One does. I’m imagining the ultimate lifelogging app that automatically records where you go, who you met, what you did, how tired you were, what music you were listening to, and your personal reflections, all in one searchable place. I’ve tried a bunch of these before, like Moves and Momento, but nothing lasted. If Apple does do this, I may finally be able to ditch Foursquare/Swarm, which I still reluctantly use to have a record of where I’ve been. Its social network aspect is nice but not essential since hardly anyone else uses it now.

    I remember there was a Twitter-like app called Jaiku on Nokia smartphones over 15 years ago that had a feature where, using Bluetooth, it could tell if you met up with a fellow user and post to your other friends about it. I was excited by it but had few friends, and even fewer on Jaiku. Just like with AirTags and Find My, tapping into Apple’s giant user base could finally make this concept viable. As long as Apple isn’t trying to do a social network again.

    ===

    Oh right, back to AI. What have I been doing? Some of it was playing games with ChatGPT, essentially asking it to be a dungeon master using the following superprompt (which I did not create btw!):

    I want you to act like you are simulating a Multi-User Dungeon (MUD). Subsequent commands should be interpreted as being sent to the MUD. The MUD should allow me to navigate the world, interact with the world, observe the world, and interact with both NPCs and (simulated) player characters. I should be able to pick up objects, use objects, carry an inventory, and also say arbitrary things to any other players. You should simulate the occasional player character coming through, as though this was a person connected online. There should be a goal and a purpose to the MUD. The storyline of the MUD should be affected by my actions but can also progress on its own in between commands. I can also type “.” if I just want the simulated MUD to progress further without any actions. The MUD should offer a list of commands that can be viewed via “help”. Before we begin, please just acknowledge you understand the request and then I will send one more message describing the environment for the MUD (the context, plot, character I am playing, etc.). After that, please respond by simulating the spawn-in event in the MUD for the player.

    Try it! I even had success asking it (in a separate chat) to come up with novel scenarios for an SF text adventure game, which I then fed back into this prompt. I can’t emphasize enough how fun this is: you can take virtually any interesting, dramatic scenario and immediately play it out as an interactive story.

    Here’s an example where I played the role of a time traveler who has to stop a future AI from destroying humanity by going back in time to prevent the invention of certain things, starting with the Great Pyramid of Giza, which will purportedly become a power source for the AI.

    And here are a couple of new products made possible by GPT. There are so many, all asking for about $10/mo. Most won’t survive as this stuff becomes commoditized, but for the moment they are all amazing because these things weren’t possible before.

    • Tome: It’s a sort of PowerPoint that can create entire decks on its own from a short brief you give it. For example, ask for a sales deck and it’ll set up a working narrative arc over multiple slides, not filled with placeholder text and images, mind you, but with actual generated copy and original pictures on every one of them. Of course, it will use common storytelling structures — the portfolio introduction I made as a test looked like 90% of the applications that we see, using very familiar language for describing one’s experience, design philosophy, values, and skills. This is fine, of course. You can edit it, or use it for as long as “what went before” continues to have currency in this society. When quality is everywhere, quality becomes meaningless. Fire under buttocks.
    • Rationale AI: Describe a decision you’re trying to make, and it’ll tell you the pros and cons, generate a SWOT analysis, or work out the causal chain of the path you’re on. For many people this sort of reasoning is not hard to do, but perhaps it’s a game changer for those who find it difficult. If you’re in an emotionally distressing situation and cool logic is elusive, for example, it could help to show the bigger picture. I tested it with such a scenario and it gave some solid insights (be careful with advice from an AI, of course). But that this thing works at all is a marvel! “Should I become a full-time influencer?” is not a question a machine could have understood in the past, and it certainly could not have forecast that failing down the road might put stress on your finances and lead to harmful self-doubt and regret over quitting your job.
    • Summarize.tech: I found this by accident when someone shared a two-hour YouTube video essay in a group chat and everyone said “I ain’t got time for that”. I remarked that it sure would be great if an AI could watch that and write a tl;dr for us. And then I thought… surely that exists. And it does.

    ===

    It was also my birthday, and I saw John Wick 4 and ate a lot of Taiwanese hot pot. Also binged all of the new Netflix show, The Diplomat, and it was actually good. Life’s alright when that happens.