You need to take down the data folder now and rebuild the repository. You're going to get hit with a DMCA notice. It's not a matter of IF, it's a matter of WHEN. Publishers of popular book series don't take these matters lightly.
Why do the copyright mafia's work for them? Let them pay their lawyers first, THEN take the data down. Money is the only language these people understand, so let's force them to spend as much of it as we can.
Because it's the right thing to do in this case? If you define what you want to defend purely based on what a law you don't agree with can be used to do, as opposed to the merits of the actual instance, you're bound to pick a bunch of really bad battles.
The first of those books was published less than 28 years ago and the author is continuing to actively build upon those ideas.
A world in which authors of creative works don't have the ability to have a monopoly on those works for limited terms is a world in which people who focus full-time on writing those creative works cannot pay their grocery bills.
It is also a world in which only banks, large corporations, and highly educated folks can get software written for their interests.
Really? You don't think theft is an appreciable and measurable societal harm? You honestly have no problem whatsoever with outright ripping off these works? I'd love to hear your defense of that position.
It's not like we're talking about situations like cable where I have to pay $100 / month to watch one or maybe two shows I enjoy and have no other way to get access at a reasonable price (and even that is debatable). You can pick up a copy of GRRM's work for < $10 without hassle and contribute back to someone whose work gives you joy and the publishers who got that work into your hands.
It's not theft, it's copyright violation, and no, I don't think it's a societal harm. Nobody is being deprived of anything, because digital goods are non-rivalrous.
The author of this project is riffing on an older creative work to make a new creative work, exactly as artists did for all the millennia of human civilization, until lawyers got involved in the 20th century and fucked it all up.
You're right; theft is a legal term, and it's not technically theft, it's a copyright violation as you say. The rest is your opinion. If I intended on buying the book and instead downloaded it for free, I am absolutely depriving the copyright holder of income. I'm far from a hard-liner on this but, in this instance, I believe it's wrong to download these books and read them. If you want the book, go ahead and pay for it.
As far as the project goes, the author should not be bundling the books with everything else (common sense, plus debatable moral issues). I have no issue with him using the texts; it's the distribution that's a problem.
It should be noted that, per the included Jupyter Notebook, the network took 24 hours to train, and the resulting model (a 3-layer, 512-cell LSTM) would be hundreds of MB (too much to share on GitHub anyway).
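For a rough sense of where "hundreds of MB" can come from, here is a back-of-the-envelope parameter count. Only the 3 layers and 512 cells come from the comment above; the vocabulary size, the embedding width, and the function names are assumptions made up for illustration, so treat the result as an order-of-magnitude sketch rather than the repo's actual numbers.

```python
def lstm_layer_params(input_dim: int, hidden: int) -> int:
    # Four gates, each with input weights, recurrent weights, and a bias.
    return 4 * (input_dim * hidden + hidden * hidden + hidden)


def word_lstm_params(vocab: int, embed: int, hidden: int, layers: int) -> int:
    total = vocab * embed  # embedding table
    in_dim = embed
    for _ in range(layers):
        total += lstm_layer_params(in_dim, hidden)
        in_dim = hidden
    total += hidden * vocab + vocab  # softmax output layer over the vocabulary
    return total


# ASSUMED values (not from the repo): 30,000-word vocabulary, 512-wide embeddings.
params = word_lstm_params(vocab=30_000, embed=512, hidden=512, layers=3)
size_mb = params * 4 / 1e6  # float32 weights only, no optimizer state
print(f"{params:,} parameters, about {size_mb:.0f} MB as float32")
```

Under these assumptions most of the weights sit in the embedding and output layers rather than in the LSTM stack itself, and a checkpoint that also stores optimizer state would be larger still.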
The repo also uses a word-based approach instead of a character-based one. Although a word-based approach avoids typos and nonsense words, it makes prediction much more difficult since the selection space is much larger (per the notebook, the error is high as well). There are lots of trade-offs in this field.
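To make the word-vs-character trade-off concrete, here is a minimal sketch of the two tokenizations. The sample sentence and the regex tokenizer are stand-ins picked for illustration, not what the repo does. Note that a sample this tiny is too short to show the size gap: the character vocabulary is capped by the character set, while the word vocabulary keeps growing as the corpus grows.

```python
import re

# Hypothetical sample text; a real run would read the training corpus instead.
text = "Winter is coming. The night is dark and full of terrors. Winter is here."

# Character-level: every character is a token, so the model spells words itself.
char_tokens = list(text)
char_vocab = sorted(set(char_tokens))

# Word-level: words and punctuation marks are tokens, so no invented spellings,
# but every distinct word is one more class the model has to choose between.
word_tokens = re.findall(r"\w+|[^\w\s]", text)
word_vocab = sorted(set(word_tokens))

print(f"characters: {len(char_tokens)} tokens, {len(char_vocab)} distinct")
print(f"words:      {len(word_tokens)} tokens, {len(word_vocab)} distinct")
```

The character model sees many more tokens per sentence but picks from a small fixed set at each step; the word model sees fewer tokens but has to rank the whole vocabulary every time.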
Training these networks is not for the faint of heart, which is one of the reasons I developed a tool to train text generators using a pretrained RNN (https://github.com/minimaxir/textgenrnn). It uses a much simpler network architecture that trains quickly, although it probably would not hit the depth of all of Game of Thrones.
Ours was not nearly as comprehensible, but still funny at points. Some example output:
{{ Stannis runs away and Melisandre walks through the lift door , no smiles) . }}
[ TYRION ]: I wanted him to be a spoiled Greyjoy again , that 's your niece . And you 're a terrible man . My little Give me A_VOICE . .
{{ The HIGH SPARROW sits down on the bucket , DORAN sentences to have waiting to the shoulder , wearing this building . She wargs off the woods and approaches The sound of the riders . BLACK WALDER reaches over his horse beside MARGAERY . }} .
[ EDMURE ]: We 're among The Khaleesi . .
[ JORAH ]: For far soon . .
[ SANSA_STARK ]: He 's travelling a true vow to command until his father 's death.
[ DAVOS ]: I don 't mind , Your Grace . I got to build your place and see rolled women in all of their fanatics And so I getting another conversation , watcher ships in the matters of the dead as gods of Westeros .
The RNN seems to agree with the most popular fan theory about what will happen with Jaime and Cersei. So, it's not entirely crazy.
Nonetheless, I don't think GRR Martin has to worry about competition from RNN Martin anytime soon, but this is still quite fun, and surprisingly almost cogent at times.
Don't expect too much, though. I was reading an article on it and was excited to watch it, but it's too incoherent, in the usual fashion of today's RNNs.
A message to all of the vigilantes on HN: I'm sorry that the GOT text ended up in the repo. I actually worked on this project about a month ago and after showing it to a few friends, they convinced me to share it. I didn't think about what I had committed to the repository when I made it public and shared it. This is the first project I have ever shared on HN, so I didn't expect so many people to see it.
I actually tried to remove the text within the first few hours of posting (I was not even home or with my computer), but I removed the text in a separate branch, and when I merged it into master it created duplicates of every commit instead of removing the text from the git history. Someone put up an issue on GitHub later that night and I fixed my mistake early the next morning.
I own all of the ASOIAF books both in paperback and ebook and pay for an HBO subscription for the sole purpose of watching GOT for 6-10 weeks per year.
"Raved on fire, he clambered up her pallet in a swatch of blood and crept toward him shudder. She was too well Ghost back behind their crews could not lack for the warg could stand guard. The singers was a fire with our beds they could have won that make yo, tell Joffrey awarded to make repairs and threatening to my sons had entered the flesh. A life and a dim his promises of Eddark Stark, he announced the foundation and set sail cracking their wounded and bucket swayed back room to go go. Har Tormund Well the Bastard s home When the Others followed him a fine rule the better look stupid. She could see the same only empty sockets weeping child he ran with that morning. His nose Wife pleased to the man with winter s notion. My brother will never lifted a scare me wise where."
I wonder if there are any forums/fan fiction websites where you could find big chunks of prose to add to the training data. Might help bring the loss down a bit if you mix some in.
Ya, I'm going to continue to tune it and try to get a more coherent plot. I think that I will have to (at a minimum) make the sequence length really long, which will make training time even slower.
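A rough sketch of why a longer sequence length costs more, assuming the training samples are built with a stride-1 sliding window (an assumption; the repo may build them differently). The helper name and the integer stand-in tokens are hypothetical.

```python
def make_windows(tokens, seq_len):
    """Return (input_window, next_token) pairs for next-token prediction."""
    return [
        (tokens[i : i + seq_len], tokens[i + seq_len])
        for i in range(len(tokens) - seq_len)
    ]


# Stand-in corpus: integer ids in place of real word tokens.
tokens = list(range(1000))

for seq_len in (10, 50, 100):
    pairs = make_windows(tokens, seq_len)
    # Each sample is unrolled for seq_len timesteps in the recurrent layers.
    steps = len(pairs) * seq_len
    print(f"seq_len={seq_len}: {len(pairs)} samples, {steps} timesteps per epoch")
```

When the corpus is much longer than the window, the number of samples barely changes, so the timesteps per epoch grow roughly in proportion to the sequence length.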
Nice! I did a similar thing with a character-based RNN a few years ago, but its output was less coherent. I liked when it started spitting out character descriptions though: "NOTHAN, a boy of nine, cupped and dismal,"