First, on a Linux machine (usually Ubuntu; I use Mint), install the dependencies:
sudo apt-get install ruby-full; sudo gem install gabbler; sudo gem install espeak-ruby
One note: the espeak-ruby gem drives the espeak program under the hood, so if espeak is not already on your system, install it too (sudo apt-get install espeak).
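If you want to sanity-check the install, a quick throwaway script like this (just a check, not part of the project) should load both gems without errors:

require "gabbler"
require "espeak"

puts "gems loaded ok"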
Now open a text editor, set the formatting (syntax) to Ruby, and create a simple gabbler file.
def gabble_with_examples
  require "gabbler"
  require "espeak"

  gabbler = Gabbler.new

  # Learn from the generated training examples
  data = File.read("data/examples.txt")
  gabbler.learn(data)

  # Build a new sentence and speak it aloud
  phrase = gabbler.sentence
  speech = ESpeak::Speech.new(phrase)
  speech.speak
end
Note that I place gabbler inside a method; the reason is that you will call another method to generate the training examples. That generator method should be placed above your gabble_with_examples method.
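One practical point first: both methods read and write files under data/ and identity/, and Ruby will not create those folders for you. A one-time setup sketch, where "Eve" is only a placeholder for whatever name you want in the greetings:

require "fileutils"

FileUtils.mkdir_p("data")
FileUtils.mkdir_p("identity")

# Seed the name file if it doesn't exist yet ("Eve" is a placeholder)
File.write("identity/config_name.txt", "Eve") unless File.exist?("identity/config_name.txt")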
Now to create the grammar-generator method:
def grammar_generator
  # Pick a random greeting word
  def greeting
    hi = "Hi "
    hey = "Hey "
    hello = "Hello "
    heyo = "Heyo "
    ola = "Ola "
    greeting_index = [
      hi, hey, heyo, hello, ola
    ]
    $do_greeting = greeting_index.sample
  end

  # Read the name to greet from the config file
  def agent
    your_name = File.read("identity/config_name.txt").strip
    $do_agent = your_name
  end

  # Pick a random follow-up phrase
  def prompt
    good_morning = ", good morning!\n"
    how_are_you = ", how are you?\n"
    you_doing_well = ", you doing well?\n"
    prompt_index = [
      good_morning, how_are_you, you_doing_well
    ]
    $do_prompt = prompt_index.sample
  end

  # Write nine randomly assembled example lines to the training file
  open("data/examples.txt", "w") { |f|
    9.times do
      greeting
      agent
      prompt
      f.print $do_greeting
      f.print $do_agent
      f.print $do_prompt
    end
  }
end
Now simply call your methods; remember to call the grammar generator first, so that data/examples.txt exists before gabbler tries to read it:
grammar_generator; gabble_with_examples
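After these two calls, data/examples.txt holds nine generated lines of the form greeting + name + prompt. The exact contents vary because .sample is random, but with "Eve" as the placeholder name, one run might look like:

Hey Eve, how are you?
Ola Eve, good morning!
Hi Eve, you doing well?

gabbler then learns from those lines, and espeak speaks one freshly generated sentence out loud.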
So an algorithm that can generate its own training examples is not actually that far off in the future; it's more a question of how complex you want your code to be.
I usually use multiple machine learning algorithms: for example, decision trees are usually best as an input-learning algorithm, while gabbler is usually better as a data-sampling algorithm.
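To illustrate the decision-tree half of that pairing, here is a minimal sketch using the decisiontree gem (sudo gem install decisiontree) and its ID3Tree API. This is not code from the post, and the training rows and labels are invented; it only shows the kind of small input classifier that could sit in front of a gabbler-driven reply:

require "decisiontree"

# Toy classifier: guess which greeting style fits the current hour
attributes = ["hour"]
training = [
  [7,  "morning"],
  [9,  "morning"],
  [13, "afternoon"],
  [16, "afternoon"],
  [20, "evening"],
  [22, "evening"]
]

tree = DecisionTree::ID3Tree.new(attributes, training, "afternoon", :continuous)
tree.train

sample = [8, "?"]          # the label slot is ignored when predicting
puts tree.predict(sample)  # most likely "morning" on this toy data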
Comments
July 6, 2019 06:56
Also, Plume in general seems to affect the alignment, so remember to put the proper spacing back in.
July 15, 2019 18:55
I may not have understood you correctly, but I've got the feeling that what you describe is actually called "self-training" in machine learning. It has been researched for many years, but there is a fundamental flaw in this method: when you generate training samples with model A in order to retrain model A, you do not infuse any new information into your training data! So the next version of model A, after being trained on the samples it generated itself, will not contain any new information, and so will hardly be better than before. Of course, what I say is not 100% true, because it really depends on how you "generate" data. It may be that you actually capture real data and merely generate "labels" on it (this is what self-training properly refers to), and in that case you may capture some new information, since it is real data from which you can learn, in the form of priors for instance. But even in this favorable case, the additional amount of information is quite small, and self-training quickly leads to a plateau in terms of performance.
To conclude, I think nothing replaces real data, and self-generated data are not a panacea... :-)