Show HN: RNN that generates Game of Thrones text (github.com/zackthoutt)
143 points by zthoutt on Aug 22, 2017 | 38 comments


You need to take down the data folder now and rebuild the repository. You're going to get hit with a DMCA notice. It's not a matter of IF, it's a matter of WHEN. Publishers of popular book series don't take these matters lightly.


This brings up the interesting question of whether it's ok to publish the trained model and not the data?

I'd imagine it's sufficiently derivative as to be allowed? What if you wrote a model that produced summaries of the book - would that be allowed?


I believe it would still be a derivative work. https://www.copyright.gov/circs/circ14.pdf


What would happen if many (i.e., 10,000) people forked that repo? Would GitHub get the DMCA notice?


Why do the copyright mafia's work for them? Let them pay their lawyers first, THEN take the data down. Money is the only language these people understand, so let's force them to spend as much of it as we can.


> Why do the copyright mafia's work for them?

Because it's the right thing to do in this case? If you define what you want to defend purely based on what a law you don't agree with can be used to do, as opposed to the merits of the actual instance, you're bound to pick a bunch of really bad battles.


The first of those books was published less than 28 years ago and the author is continuing to actively build upon those ideas.

A world in which authors of creative works don't have the ability to have a monopoly on those works for limited terms is a world in which people who focus full-time on writing those creative works cannot pay their grocery bills.

It is also a world in which only banks, large corporations, and highly educated folks can get software written for their interests.


This guy is distributing entire books. You think that's ok?


Sure. What harm is it causing?


Really? You don't think theft is an appreciable and measurable societal harm? You honestly have no problem whatsoever with outright ripping off these works? I'd love to hear your defense of that position.

It's not like we're talking about situations like cable where I have to pay $100 / month to watch one or maybe two shows I enjoy and have no other way to get access at a reasonable price (and even that is debatable). You can pick up a copy of GRRM's work for < $10 without hassle and contribute back to someone whose work gives you joy and the publishers who got that work into your hands.


It's not theft, it's copyright violation, and no, I don't think it's a societal harm. Nobody is being deprived of anything, because digital goods are non-rivalrous.

The author of this project is riffing on an older creative work to make a new creative work, exactly as artists did for all the millennia of human civilization, until lawyers got involved in the 20th century and fucked it all up.


You're right; theft is a legal term, and it's not technically theft, it's a copyright violation, as you say. The rest is your opinion. If I intended to buy the book and instead downloaded it for free, I am absolutely depriving the copyright holder of income. I'm far from a hard-liner on this, but in this instance I believe it's wrong to download these books and read them. If you want the book, go ahead and pay for it.

As far as the project goes, the author should not be bundling the books with everything else (common sense, plus debatable moral issues). I have no issue with him using the texts; it's the distribution that's a problem.


It should be noted that, per the included Jupyter Notebook, the network took 24 hours to train, and the resulting model (a 3-layer, 512-cell LSTM) would be hundreds of MB (too large to share on GitHub anyway).

The repo also uses a word-based approach instead of a character-based one. Although a word-level approach avoids typos and nonsense words, it makes each prediction much more difficult since the selection space (the full vocabulary) is far larger; per the notebook, the error is high as well. There are lots of trade-offs in this field.

Training these networks is not for the faint of heart, which is one of the reasons I developed a tool to train text generators using a pretrained RNN (https://github.com/minimaxir/textgenrnn) on a much simpler network architecture that trains quickly, although it probably would not capture the depth of all of Game of Thrones.
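One knob that matters a lot in any of these generators is sampling temperature: dividing the logits before the softmax trades coherence against variety. A minimal, generic sketch of the idea (my own illustration, not textgenrnn's actual API):

```python
import math
import random

def sample_with_temperature(logits, temperature=1.0, seed=0):
    """Sample an index from logits; low temperature is greedier, high is more random."""
    random.seed(seed)
    scaled = [l / temperature for l in logits]
    m = max(scaled)                              # subtract max for numeric stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = random.random()
    acc = 0.0
    for i, p in enumerate(probs):                # inverse-CDF sampling
        acc += p
        if r <= acc:
            return i
    return len(probs) - 1

logits = [2.0, 1.0, 0.1]
# Near-zero temperature behaves almost greedily (picks the argmax).
print(sample_with_temperature(logits, temperature=0.01))
```

At temperature 1.0 this is plain softmax sampling; pushing it toward 0 collapses onto the most likely token, which is often the difference between readable output and word salad.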


It does seem like a lot of work to get output that looks like something from a simple Markov chain.


Phrase generation doesn't work well without any constraints. With constraints, for example in translation, a similar system can generate decent text.


This is hilarious! We did a similar project last year using the scripts from the show. http://docs.pachyderm.io/en/latest/examples/tensor_flow/read...

Ours was not nearly as comprehensible, but still funny at points. Some example output:

{{ Stannis runs away and Melisandre walks through the lift door , no smiles) . }}

[ TYRION ]: I wanted him to be a spoiled Greyjoy again , that 's your niece . And you 're a terrible man . My little Give me A_VOICE . .

{{ The HIGH SPARROW sits down on the bucket , DORAN sentences to have waiting to the shoulder , wearing this building . She wargs off the woods and approaches The sound of the riders . BLACK WALDER reaches over his horse beside MARGAERY . }} .

[ EDMURE ]: We 're among The Khaleesi . .

[ JORAH ]: For far soon . .

[ SANSA_STARK ]: He 's travelling a true vow to command until his father 's death.

[ DAVOS ]: I don 't mind , Your Grace . I got to build your place and see rolled women in all of their fanatics And so I getting another conversation , watcher ships in the matters of the dead as gods of Westeros .


Haha, that's awesome! I was thinking about trying to combine the scripts and book text so that I can train on more data.


If you're using TF 1.2.1 or above, you want to replace this line:

cell = tf.contrib.rnn.MultiRNNCell([drop_cell] * num_layers)

with:

cell = tf.contrib.rnn.MultiRNNCell([create_cell() for _ in range(num_layers)])

otherwise you'll be using the same weights for each LSTM layer and your model is less expressive.
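The underlying pitfall is plain Python list semantics rather than anything TensorFlow-specific: `[drop_cell] * num_layers` repeats a single cell object, so every layer aliases the same weights. A minimal sketch of the difference, using a hypothetical `Cell` stand-in class (no TensorFlow required):

```python
class Cell:
    """Stand-in for an LSTM cell; each instance would own its own weights."""
    pass

num_layers = 3

shared = [Cell()] * num_layers                   # one object, referenced 3 times
separate = [Cell() for _ in range(num_layers)]   # three distinct objects

# Every entry of `shared` is the very same cell (shared weights);
# every entry of `separate` is its own cell (independent weights).
assert all(c is shared[0] for c in shared)
assert len({id(c) for c in separate}) == num_layers
```

The list-comprehension form is why the replacement line above calls `create_cell()` inside the loop: each call builds a fresh cell.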


Yep, I was using TF 1.0.0. I will probably upgrade in the future and will keep this in mind.


The RNN seems to agree with the most popular fan theory about what will happen with Jaime and Cersei. So, it's not entirely crazy.

I don't think GRR Martin has to worry about competition from RNN Martin anytime soon, but this is still quite fun, and surprisingly almost cogent at times.


That data folder seems ripe for a DMCA takedown request.


You really should not upload other people's complete works while under copyright.

Edit: Nice License

> Copyright (c) 2017 Zack Thoutt


I would like to see a TV comedy, perhaps in the vein of Whose Line Is It Anyway?, where actors act out these NN scripts.


I guess this is close: https://www.youtube.com/watch?v=LY7x2Ihqjmc. It's a script written by an RNN acted out as a short sci-fi film, and Thomas Middleditch is in it.

Don't expect too much, though. I was reading an article on it and was excited to watch it, but it's too incoherent, in the usual fashion of today's RNNs.


I wonder what an RNN could generate for an episode of Twin Peaks.


Game of Thrones is the TV series, A Song of Ice and Fire is the book series.


A message to all of the vigilantes on HN: I'm sorry that the GOT text ended up in the repo. I actually worked on this project about a month ago and after showing it to a few friends, they convinced me to share it. I didn't think about what I had committed to the repository when I made it public and shared it. This is the first project I have ever shared on HN, so I didn't expect so many people to see it.

I actually tried to remove the text within the first few hours of posting (I was not even home or with my computer), but I removed the text in a separate branch, and when I merged it into master it created duplicates of every commit instead of removing the text from the git history. Someone opened an issue on GitHub later that night, and I fixed my mistake early the next morning.

I own all of the ASOIAF books both in paperback and ebook and pay for an HBO subscription for the sole purpose of watching GOT for 6-10 weeks per year.


With Markov Chains:

"Raved on fire, he clambered up her pallet in a swatch of blood and crept toward him shudder. She was too well Ghost back behind their crews could not lack for the warg could stand guard. The singers was a fire with our beds they could have won that make yo, tell Joffrey awarded to make repairs and threatening to my sons had entered the flesh. A life and a dim his promises of Eddark Stark, he announced the foundation and set sail cracking their wounded and bucket swayed back room to go go. Har Tormund Well the Bastard s home When the Others followed him a fine rule the better look stupid. She could see the same only empty sockets weeping child he ran with that morning. His nose Wife pleased to the man with winter s notion. My brother will never lifted a scare me wise where."


So, what are the odds that we'll have an AI that writes at GoT quality before George finishes the last book?


Lower than the odds that George never finishes his last book.


I wonder if there are any forums/fan fiction websites where you could find big chunks of prose to add to the training data. Might help bring the loss down a bit if you mix some in.


fanfiction.net undoubtedly has a ton, but I wouldn't rely on the quality or style of writing being similar.

Maybe not yet a concern for this project.


This is amazing. I wonder if it'd be possible to actually get a coherent plot out of it while retaining the humorous aspects of it.


Ya, I'm going to continue to tune it and try to get a more coherent plot. I think that I will have to (at a minimum) make the sequence length really long, which will make training time even slower.


Nice! I did a similar thing with a character-based rnn a few years ago, but its output was less coherent. I liked when it started spitting out character descriptions though: "NOTHAN, a boy of nine, cupped and dismal,"

https://twitter.com/asoiaf_ebooks


I'd like to see a trailer video generated for the next series.


Hah, I built something similar a while ago (with a pretty front-end): https://asoiaf.now.sh/


You definitely need to add an attention mechanism if you want better-than-Markov-chain results.

Found a project ;)
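For anyone curious what that would involve: at its core, an attention mechanism is a softmax-weighted average of value vectors, scored by query-key similarity. A toy sketch in plain Python (illustrative only, not tied to this repo's architecture):

```python
import math

def softmax(xs):
    m = max(xs)                      # subtract max for numeric stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(query, keys, values):
    """Weight each value vector by how well its key matches the query."""
    scores = [dot(query, k) for k in keys]
    weights = softmax(scores)
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

# The query matches the first key most strongly, so the output
# leans toward the first value vector.
q = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
print(attend(q, keys, values))
```

In a generator, this lets each output step look back at relevant parts of the context instead of relying solely on the recurrent state, which is what separates attention-based models from Markov-chain-like local statistics.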



