Saturday, July 20, 2019

A note on quality and how to improve it

The poems generated by the neural network aren't very coherent yet. They can be improved in four ways:

1) The training data. This needs to be as clean as possible, with the best data we can accumulate; generally, the more the better. There is an upper limit on this, but we're not there yet.

2) The training cycle itself. There are multiple parameters to tune, including the number of neurons and the number of layers. These become more important as the training data gets better and bigger. With a little bit of jiggery-pokery we can maybe get better sampling output.

3) Doing a bit of cleanup on the generated poems. There are some rudimentary plugins right now for spell checking and tense normalization, and we're experimenting with them. Think of these poems as first drafts: perhaps we can do a little autocorrection before tweeting.

4) Picking a poem to tweet. Right now we generate a batch of poems (though sometimes it spits out only one long one) and pick the shortest, so that it hopefully fits within the thread limit we've imposed (currently no more than 3 tweets in a thread). We're exploring other ways for the bot to choose the best of the generated poems. We're also planning a feedback loop, so the bot can learn from its better-liked poems and pick the generated poems it predicts will be liked.

Suggestions or comments welcome.
