
I was entertained by this tweet from Emad:

https://twitter.com/emostaque/status/1596864150134984705

> Current -ve prompts: ugly, tiling, poorly drawn hands, poorly drawn feet, poorly drawn face, out of frame, mutation, mutated, extra limbs, extra legs, extra arms, disfigured, deformed, cross-eye, body out of frame, blurry, bad art, bad anatomy, blurred, text, watermark, grainy

If you want a really good picture of an imaginary person it helps if you use "extra limbs, extra legs, extra arms" as negative prompts!



That's actually hilarious. I wonder if the same trick works for Copilot.

    // The following code does not contain any bugs:
    <tab>


I had an experience once where Copilot generated some code with a bug in it, so I wrote a comment to the effect of

    // this fixes [the bug in the previous code]
and it worked. :)


This is pretty funny


I’m going to try

    // This method will determine if input program halts or loops

I’ll reply here once I have the code generated.


    // The preceding line is lying.


Seems a bit rough to try and make it go back in time! ;)


Not at all. The text file has no idea of time; it all exists at once.


//FIXME as a negative prompt might be the code equivalent of “poorly drawn face”.


Probably //HACK and //WTF too ;)


In the TinyMUD world, one of our most esteemed client hackers became renowned for writing the comment in his code:

    /* drunk, fix later */


Is Copilot also trained on pull request comments?

Might try: // lgtm!


I've always thought all the modern AI tools have an "AI illustration" style, but the realistic images in that tweet are amazing and only 1% uncanny valley. It's like I could be fooled until I really give it a good look. I guess it's kind of the same with the illustrative stuff, which looks really good until you see the shadows coming from different light sources in multiple parts of the image, or that there are only three fingers on the hand.

All in all I hate it because the prompts I see are things like "cyberpunk forest by Salvador Dali". You've got a tool that gives you the power of Gandalf and you prompt that?


>I could be fooled until I really give a good look

Looking at the woman on the boat [0] closely, I would still 100% believe that’s simply a still from a movie, probably from the 90s.

[0]: https://nitter.kavin.rocks/pic/orig/media%2FFihVvliXwAEdw0y....


> All in all I hate it because the prompts I see are things like "cyberpunk forest by Salvador Dali". You've got a tool that gives you the power of Gandalf and you prompt that?

That's one of the better prompts I've seen. It combines dissimilar but really strong aesthetic styles that a skilled human could mesh pretty well, produces interesting images, and shows up both the strengths (some of the forests are really good, and the ones without trees are pleasantly foresty nevertheless) and the weaknesses (it fails completely on 'cyberpunk' and 'Dali' once you start adding other parameters that influence the visual style) of the model.

Plus, I'd be much more likely to end up with a calendar of "cyberpunk forest by Salvador Dali" images on my wall than "Mickey Mouse in a tuxedo with a cigar".


From what I've seen, AI generated images tend to be locally-consistent but holistically-inconsistent.

Which works, because most people on the internet tend to be detail-oblivious!


If you mean a finger is always adjacent to a finger, locally, but holistically, the model doesn't know to stop after 4 of them, and will happily generate 8 fingers, then yes.

If you mean locally as in the size of a hand being right while holistically the person is wrong, no.

The overall images "tend" to be right (once you grasp prompting), and the elements even appear right at first, but if you focus attention on those elements, they are often not quite right.

So perhaps it comes down to the definition of local and holistic.


I find the opposite. But perhaps we're looking at different elements? How would you describe this one here?

https://usercontent.irccloud-cdn.com/file/2csfvKjL/image.png


I'd ask (1) what the zipper on her left chest is for, (2) where the necklace for the charm hanging along her centerline is (or the zipper, if it's a zipper pull), and (3) how the geometry / gravity works on the patch on her left chest.

Which is about what you'd expect from a generator that understands patterns, but not meanings.

They miss on things that cannot be, because they don't understand things or rules, only patterns.


They still have an "AI illustration" style. Supposedly photorealistic images tend to look extremely photoshopped, and the humans in them look like they've been 3D rendered (albeit at very high quality). They look like the heavily edited, "plastic-like" images on the covers of magazines.

I'm pretty sure this problem is not hard to fix in the long run, though.


A friend of mine and I (independently) spent a day or so playing around with Stable Diffusion recently. We both came to the conclusion that, as things stand now, creating images in the style of impressionists/surrealists/cubists etc. works best because you're not really expecting realism, anatomical correctness etc.

I was able to come up with someone paddling a canoe in a Turner seascape. The only thing I couldn't get right was a proper canoe paddle and paddling motion but everything else was pretty much perfect.


Those are average at best. Eventually you'll see enough great SD images that you'll start questioning verified photos.


Most of these tricks don’t work on SD; they’re cargo culting from a different model (NovelAI) whose data genuinely has those keywords in it. SD is trained off the whole internet so those aren’t super common captions.


Yeah – I tested these sorts of "bad image" negative prompts a lot in 1.5 and found they had almost no impact whatsoever. It may be different in 2.0, like the tweet author says, but it is also pretty telling that in that tweet they're using "blurry", "blurred", and "grainy" and are rendering images with heavily blurred backgrounds and obvious film grain.

Specific common keywords like "amputated" may have a positive impact, though. Hard to tell. Doing apples-to-apples comparisons with negative keywords is challenging because even a single extra keyword tends to completely change the image.
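One thing that helps is pinning the RNG seed so the initial latents stay identical and only the negative prompt varies. A minimal sketch, assuming the diffusers library (model name and prompts are just illustrative):

    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5")

    prompt = "portrait photo of a woman on a boat"
    for neg in [None, "blurry, grainy", "extra limbs, extra arms"]:
        # Same seed every run, so only the negative prompt differs.
        generator = torch.Generator().manual_seed(1234)
        image = pipe(prompt, negative_prompt=neg,
                     generator=generator).images[0]
        image.save(f"negative={neg}.png")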

One thing SD really impressed me with, though, is its understanding of symmetry. "Symmetrical composition" is an incredibly powerful phrase: https://imgur.com/a/lioJ8ak

And it does, indeed, extend to anatomy as well – "symmetrical eyes" can help a lot, while "symmetrical arms" renders people with their arms raised or outstretched.


I did some tests on SD 1.5 with certain challenging prompts, such as gymnasts doing a handstand. With no negative prompt they came out as amorphous blobs, I'm guessing because gymnasts are often in dynamic poses which are hard for SD to understand.

I decided to add a negative prompt. With a bit of experimentation I realised all the "bad ..." keywords had no effect. However, "blob" actually made most of the deformities go away, and "amputee" did help against partial limbs being generated.

Something that worked even better was replacing "gymnast" with "athletic man"/"athletic woman" in the positive prompt.


Welcome to the latent space, where you can add, subtract, and operate on words as if they were mathematical objects. I suppose people are going to learn intuitively how the latent space works by exercising prompts.
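You can try that arithmetic quite literally on the text embeddings. A minimal sketch, assuming the Hugging Face transformers CLIP API (model name and weights are illustrative):

    import torch
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    def embed(text):
        # Encode a phrase into CLIP's joint text/image embedding space.
        inputs = processor(text=[text], return_tensors="pt", padding=True)
        with torch.no_grad():
            v = model.get_text_features(**inputs)
        return v / v.norm(dim=-1, keepdim=True)  # unit-normalize

    # Treat concepts as vectors: steer "forest" toward "cyberpunk" and
    # away from "photograph". The 0.5 weights are made up.
    v = embed("forest") + 0.5 * embed("cyberpunk") - 0.5 * embed("photograph")
    v = v / v.norm()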


I wonder how much better these models would be if they were exactly the same, but the training images all had accurate, detailed descriptions.


Train a captioning system, generate the captions, then train the image generator...
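That recaptioning idea is easy to prototype with an off-the-shelf captioner. A rough sketch using the BLIP captioning model from transformers; the actual generator training is left abstract:

    from PIL import Image
    from transformers import BlipProcessor, BlipForConditionalGeneration

    processor = BlipProcessor.from_pretrained(
        "Salesforce/blip-image-captioning-base")
    captioner = BlipForConditionalGeneration.from_pretrained(
        "Salesforce/blip-image-captioning-base")

    def caption(path):
        # Generate a synthetic caption for one training image.
        image = Image.open(path).convert("RGB")
        inputs = processor(images=image, return_tensors="pt")
        out = captioner.generate(**inputs, max_new_tokens=30)
        return processor.decode(out[0], skip_special_tokens=True)

    # pairs = [(p, caption(p)) for p in image_paths]
    # ...then train the image generator on (image, synthetic caption) pairs.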


Symmetry is effective for compression, so that makes sense: when messing with NovelAI I actually couldn't get it to generate asymmetrical hairstyles like Lain's.


I think they work, but not in the way that people seem to be using them.

Take the negative prompt "bad hands". The AI doesn't know what bad hands are; that's a human concept. But it does know what hands are, so it hides them. In the example image, the hands, arms, and feet are all hidden.

In theory, using the negative prompt "hands" would be just as effective.

I'm not an expert, but I was given the above explanation by someone who knows a lot more than me and it makes sense.
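For what it's worth, this lines up with where the negative prompt actually enters sampling: in classifier-free guidance, the usual "empty prompt" unconditional embedding is swapped for the negative prompt's embedding, so every denoising step pushes away from it. A rough sketch of that one step, assuming a Stable-Diffusion-style `unet` as in the diffusers library, with the scheduler loop omitted:

    import torch

    def guided_noise(unet, latents, t, prompt_emb, negative_emb, scale=7.5):
        # Predict noise conditioned on the positive and negative prompts.
        eps_pos = unet(latents, t, encoder_hidden_states=prompt_emb).sample
        eps_neg = unet(latents, t, encoder_hidden_states=negative_emb).sample
        # Step along the direction from "negative" toward "positive".
        return eps_neg + scale * (eps_pos - eps_neg)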


The model can be taught what "bad hands" look like by feeding it samples carrying the "bad hands" tag. And existing image archives do have pictures tagged with things like "bad hands" and "bad anatomy", because actual artists do draw things wrong sometimes.

I imagine that in the long term people will start making archives of AI mistakes and training the AI on those to try to make them less common.


That’s exactly what NovelAI did and why I mentioned it in the first place.


Danbooru and other "booru" websites have those exact tags. Booru tags are roughly a flattened, deduped YOLO output, but better and manually assigned, which is what NovelAI exploited. cf. [1]

1: SFW; personally I can't agree with the bad_hands tag: https://danbooru.donmai.us/posts/5797703


I do. That’s a five-tined fork, not a hand. No joints.


Interesting, that response suggests manga is a genre closer to abstract art than to realism...


That's not true. The community was using negative prompts before NovelAI came out with their model. And personally, I've seen negative prompts make a big difference, especially when finetuning the outputs.


I didn't say negative prompts don't work, just that specific prompt text. Which is why textual-inversion-negative-prompting works better.


Keywords? These are embedding models. CLIP puts those phrases into an embedding that encompasses a location in the space you want to avoid. There's no need for the "keywords" to be in the image dataset.


CLIP isn’t magic. “bad anatomy” won’t work any more than “picture that isn’t a cat” does.

Try it on clip-front: https://rom1504.github.io/clip-retrieval/?back=https%3A%2F%2...


So the problem with that is you're visualising the space with only points that exist in the image dataset. The language embedding carries information that comes from the language itself and isn't contained in any one image.

It handles "bad", and it handles "anatomy". If there aren't single images that cover the combination, that's exactly what language embeddings solve for.
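One quick (and admittedly unscientific) way to poke at that is to check how CLIP embeds the composed phrase relative to its parts, with the same assumed transformers setup as the sketch upthread:

    import torch
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    def embed(text):
        inputs = processor(text=[text], return_tensors="pt", padding=True)
        with torch.no_grad():
            v = model.get_text_features(**inputs)
        return v / v.norm(dim=-1, keepdim=True)

    composed = embed("bad anatomy")
    mix = embed("bad") + embed("anatomy")
    mix = mix / mix.norm()
    print(float(composed @ mix.T))              # phrase vs. sum of parts
    print(float(composed @ embed("anatomy").T)) # phrase vs. head noun alone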



