First, on a Linux machine (usually Ubuntu; I use Mint), install the dependencies:
sudo apt-get install ruby-full; sudo gem install gabbler; sudo gem install espeak-ruby
One note: the espeak-ruby gem drives the espeak program under the hood, so if espeak is not already on your system, install it too (sudo apt-get install espeak).
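If you want to sanity-check the install, a quick throwaway script like this (just a check, not part of the project) should load both gems without errors:

require "gabbler"
require "espeak"

puts "gems loaded ok"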
Now open a text editor, set the formatting (syntax) to Ruby, and create a simple gabbler file.
def gabble_with_examples
  require "gabbler"
  require "espeak"

  gabbler = Gabbler.new

  # Learn from the generated training examples
  data = File.read("data/examples.txt")
  gabbler.learn(data)

  # Build a new sentence and speak it aloud
  phrase = gabbler.sentence
  speech = ESpeak::Speech.new(phrase)
  speech.speak
end
Note that I place gabbler inside a method; the reason is that you will call another method to generate the training examples. That generator method should be placed above your gabble_with_examples method.
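One practical point first: both methods read and write files under data/ and identity/, and Ruby will not create those folders for you. A one-time setup sketch, where "Eve" is only a placeholder for whatever name you want in the greetings:

require "fileutils"

FileUtils.mkdir_p("data")
FileUtils.mkdir_p("identity")

# Seed the name file if it doesn't exist yet ("Eve" is a placeholder)
File.write("identity/config_name.txt", "Eve") unless File.exist?("identity/config_name.txt")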
Now to create the grammar-generator method:
def grammar_generator
  # Pick a random greeting word
  def greeting
    hi = "Hi "
    hey = "Hey "
    hello = "Hello "
    heyo = "Heyo "
    ola = "Ola "
    greeting_index = [
      hi, hey, heyo, hello, ola
    ]
    $do_greeting = greeting_index.sample
  end

  # Read the name to greet from the config file
  def agent
    your_name = File.read("identity/config_name.txt").strip
    $do_agent = your_name
  end

  # Pick a random follow-up phrase
  def prompt
    good_morning = ", good morning!\n"
    how_are_you = ", how are you?\n"
    you_doing_well = ", you doing well?\n"
    prompt_index = [
      good_morning, how_are_you, you_doing_well
    ]
    $do_prompt = prompt_index.sample
  end

  # Write nine randomly assembled example lines to the training file
  open("data/examples.txt", "w") { |f|
    9.times do
      greeting
      agent
      prompt
      f.print $do_greeting
      f.print $do_agent
      f.print $do_prompt
    end
  }
end
Now simply call your methods; remember to call the grammar generator first, so that data/examples.txt exists before gabbler tries to read it:
grammar_generator; gabble_with_examples
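After these two calls, data/examples.txt holds nine generated lines of the form greeting + name + prompt. The exact contents vary because .sample is random, but with "Eve" as the placeholder name, one run might look like:

Hey Eve, how are you?
Ola Eve, good morning!
Hi Eve, you doing well?

gabbler then learns from those lines, and espeak speaks one freshly generated sentence out loud.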
So an algorithm that can generate its own training examples is not actually that far off in the future; it's more a question of how complex you want your code to be.
I usually use multiple machine learning algorithms: for example, decision trees are usually best as an input-learning algorithm, while gabbler is usually better as a data-sampling algorithm.
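To illustrate the decision-tree half of that pairing, here is a minimal sketch using the decisiontree gem (sudo gem install decisiontree) and its ID3Tree API. This is not code from the post, and the training rows and labels are invented; it only shows the kind of small input classifier that could sit in front of a gabbler-driven reply:

require "decisiontree"

# Toy classifier: guess which greeting style fits the current hour
attributes = ["hour"]
training = [
  [7,  "morning"],
  [9,  "morning"],
  [13, "afternoon"],
  [16, "afternoon"],
  [20, "evening"],
  [22, "evening"]
]

tree = DecisionTree::ID3Tree.new(attributes, training, "afternoon", :continuous)
tree.train

sample = [8, "?"]          # the label slot is ignored when predicting
puts tree.predict(sample)  # most likely "morning" on this toy data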
Comments
July 6, 2019 06:56
Also, Plume in general seems to affect the alignment, so remember to put the proper spacing back in.
July 15, 2019 18:55
I may not have understood you correctly, but I've got the feeling that what you describe is actually called "self-training" in machine learning. It has been researched for many years, but there is a fundamental flaw in this method: when you generate training samples with model A in order to retrain model A, you do not infuse any new information into your training data! So the next version of model A, after being trained on the samples it generated itself, will not contain any new information, and so will hardly be better than before. Of course, what I say is not 100% true, because it really depends on how you "generate" data. It may be that you actually capture real data and merely generate "labels" on it (this is what self-training properly refers to), and in that case you may capture some new information, since it is real data from which you can learn, in the form of priors for instance. But even in this favorable case, the additional amount of information is quite small, and self-training quickly leads to a plateau in terms of performance.
To conclude, I think nothing replaces real data, and self-generated data are not a panacea... :-)