
Can you make a voice synthesizer in AutoIt? Here's the answer...


Mateocedillo


I don't know if anyone has ventured to do something like this: voice synthesis (TTS). It is a utility, and more than that, it lets you turn written text into spoken voice; clear examples are the Microsoft SAPI5 voices and third-party voice engines such as Loquendo, Vocalizer, Ivona, etc. As a person with a disability I use this resource a lot; one example is the screen reader, which reads aloud the text that appears on the screen or under the mouse, since it is more than clear that screen readers use text to speech.
A few months ago I began to investigate the idea of creating a voice synthesizer of my own. I have been learning AutoIt for two years now, and it seemed like the language to accomplish everything I wanted, although it was difficult at first. So, what now? Two months ago I started developing a mini text-to-speech engine in AutoIt. I know that for many this may seem incredible, but beyond automation you can get the rest of AutoIt's potential out of it. Just take a look at the code; I would like you to comment on it. By the way, the voice synthesizer is in Spanish, but it can be adapted to another language such as English.
https://github.com/rmcpantoja/SUCSpeech
I am sharing the synthesizer's help file below:

## Introduction:

SucSpeech is a free text-to-speech synthesizer with two synthesis modes: simple (letters) and advanced (syllables). Both modes use the unit selection method, concatenating audio files that correspond to letters or syllables. It is written in [AutoIt](http://autoitscript.com/), which means it supports only Windows. The program [Blind Text](https://github.com/rmcpantoja/Blind-Text) has clear examples of how this synthesizer is used.
The Synthesizer-comaudio.au3 file is the main engine, or base, of the synthesizer. You can explore it and see how it is made; don't worry, the necessary lines are commented.
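
To give a rough idea of what letter-based unit selection looks like (this is only an illustration, not the code of Synthesizer-comaudio.au3), the sketch below splits a text into characters, picks one of the three recorded variants of each letter and plays the clips one after another. The folder layout and the use of AutoIt's built-in SoundPlay are assumptions made for the example.

```autoit
; Minimal illustration of letter-based unit selection.
; Assumes a voicepack folder containing files such as a1.wav, a2.wav, a3.wav, b1.wav, ...
Func _SpeakLetters($sVoiceFolder, $sText)
    Local $aChars = StringSplit(StringLower($sText), "") ; one array entry per character
    For $i = 1 To $aChars[0]
        ; Pick one of the three recorded variants so the result sounds less monotonous.
        Local $sUnit = $sVoiceFolder & "\" & $aChars[$i] & Random(1, 3, 1) & ".wav"
        If FileExists($sUnit) Then SoundPlay($sUnit, 1) ; wait for each clip to finish
    Next
EndFunc

_SpeakLetters(@ScriptDir & "\voicepacks_source\Es_default", "hola")
```

The real engine also handles signs, phonemes and punctuation, and, as described in the update further down this thread, it now joins the clips before playing them instead of playing them one by one.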

## Instructions for creating voices:
1. [Download](https://github.com/rmcpantoja/SUCSpeech) or clone the SucSpeech repository, using git clone or the URL in your browser.
2. Download [AutoIt](https://www.autoitscript.com/cgi-bin/getfile.pl?autoit3/autoit-v3-setup.exe) to run it.
3. To create a voice in simple synthesis mode, keep in mind that we must record the phonemes and letters as they sound (you can record words that contain the sound of a specific letter or vowel and cut it out with an audio editor).
4. In the case of the advanced synthesis mode (syllables), we must record longer sentences or phrases and cut out each syllable of the sentence; there must be no clicks left at the cut points, and we must cut very carefully so that the voice comes out as correctly as possible.
There are examples in the voicepacks_source folder, where each voice has 120 .wav files with the recorded sounds and phonemes. For each letter, vowel or sign, three audio files are needed. For example, if we want to record the sound of the vowel a, we have to record a1.wav, a2.wav and a3.wav, the three with different pitches; otherwise the voice sounds monotonous when processing text. The same applies to the Voicepacks_source_pro folder, but there the units are syllables. (A small sketch for checking this structure follows this list.)
Note: There should be no silence in the files.
Simple mode:
To start making the voice, we simply create a new folder in voicepacks_source with the name of the language, an underscore, and the name of the voice, for example Es_Fulano: "Es" for Español (Spanish), "Fulano" the name of the voice. In that folder we make our recordings. We can build on the structure of the voices already included in the repository: basically there are three sounds for each letter from a to z, signs such as dot, comma, percent, plus and minus, and phonemes such as ch, sh, etc.
Advanced mode:
To create a more advanced, higher-quality voice, we likewise create a subfolder in Voicepacks_source_pro with the language name, an underscore, the voice name, another underscore, and hq, for example Es_carla_hq. Note that this mode is still in beta, but recordings can follow the structure of the es_default_hq voice, which is a few steps away from being completed.
5. Once we have the voice, we have to package it. The first thing is to enter your voice's folder, select all 120 files and create a .zip without compression with your favourite program.
5.1. One last thing: once the zip has been created, we must run encrypter.au3, located in the root folder of the source code. When it starts, an open dialog appears where you select the .zip file you just created; once selected, another explorer window appears that lets you save the encrypted zip file. This gives more security because, unlike a plain zip, the encryptor protects the file with a password. The encrypted file must be saved in voicepacks, and its extension must be .dat.
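
Before packaging, it can help to check that every unit of the voicepack really has its three variants. The sketch below is only a hypothetical helper (its unit list is deliberately partial; a real voicepack also contains signs and phonemes, 120 files in total); the packaging and encryption themselves are still done with your archiver and encrypter.au3 as described in steps 5 and 5.1.

```autoit
; Hypothetical check: does every unit have its three pitch variants (a1.wav, a2.wav, a3.wav, ...)?
Func _CheckVoicepack($sVoiceFolder)
    Local $aUnits = StringSplit("a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z", ",")
    Local $iMissing = 0
    For $i = 1 To $aUnits[0]
        For $iVariant = 1 To 3
            Local $sFile = $sVoiceFolder & "\" & $aUnits[$i] & $iVariant & ".wav"
            If Not FileExists($sFile) Then
                ConsoleWrite("Missing: " & $sFile & @CRLF)
                $iMissing += 1
            EndIf
        Next
    Next
    Return $iMissing = 0
EndFunc

If _CheckVoicepack(@ScriptDir & "\voicepacks_source\Es_Fulano") Then
    ConsoleWrite("All variants found, ready to zip." & @CRLF)
EndIf
```

For the uncompressed .zip in step 5, any archiver with a "store" mode works; for example, with 7-Zip installed, `7z a -tzip -mx0 Es_Fulano.zip *.wav` creates a zip without compression.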
Then we can write a small script to see how our voice turned out, or to test an existing voice. You must include the include\synthesizer-comaudio.au3 file.

### Example:

hablarenletras("Es_default (wisper)", "Este es un ejemplo de síntesis de voz. Me llamo susurro y te voy a contar un secreto: El día de ayer fui a la tienda y me compré diez manzanas.", 1, 0.75)

### Explanation:

The function is HablarEnLetras (SpeakInLetters), followed by its parameters. The first is the name of the voice ("Es_default (wisper)"), the second is the string or text to speak, the third is the volume (1) and the fourth is the speed (0.75).
In this way, when we run the script, we hear the test with our voice, or with any of the available ones.
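
Putting the include and the call together, a complete test script could look like the sketch below. It assumes the script is saved in the root of the repository so that the relative include path resolves; the voice name and parameters are the ones from the example above.

```autoit
; Minimal SucSpeech test script, saved in the repository root.
#include "include\synthesizer-comaudio.au3"

; Voice name, text to speak, volume (1 = full) and speed (0.75 = slightly slower).
hablarenletras("Es_default (wisper)", "Este es un ejemplo de síntesis de voz.", 1, 0.75)
```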

## Collaboration

If you have any suggestions that help improve this project, do not hesitate to make a pull request. Your help and suggestions are welcome!


  • 4 months later...

Hi guys,
I have made an update to this speech synthesizer. Many bugs have been fixed, the sound library has been changed, the code has been optimized and some functions have been changed; there are many changes, and the list is below. It is recommended to download this code if you want to test it, because the previous one had errors.
The alignment in this sound system is still not correct, because many clicks are produced, so I am thinking of writing a new version with an even more advanced synthesis mode than the two existing in this one.
I hope you like it; any suggestion is welcome.
Changes:
1. More signs are taken into account, according to an established punctuation scheme.
2. The code has been optimized and the folders restructured. All the synthesizer's data, such as the voices and dictionaries, now lives in the "Sucspeech" folder inside the program.
3. The sound system has been changed to BASS.
4. When synthesizing text, it is no longer concatenated in real time; instead, all the text is concatenated and synthesized first and then the joined result is played. This has undoubtedly been a great change: it reduces CPU usage, makes the synthesizer work on CPUs of any performance level, and definitely removes the gaps in the sound. (A rough sketch of this idea follows the list.)
5. Added support for saving the output of a speech synthesis to a sound file.
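
To illustrate change 4, here is a rough sketch of the "concatenate first, play once" idea. It is not the project's real code: SucSpeech now uses the BASS library, while this sketch uses AutoIt's built-in SoundPlay and assumes standard PCM .wav units that all share the same format and a 44-byte header.

```autoit
; Join several .wav units into one file, then play the result once.
Func _ConcatWavs($aFiles, $sOutFile)
    Local $hOut = FileOpen($sOutFile, 2 + 16) ; write (erase) + binary
    Local $iDataBytes = 0
    For $i = 0 To UBound($aFiles) - 1
        Local $hIn = FileOpen($aFiles[$i], 16) ; read, binary
        Local $bWav = FileRead($hIn)
        FileClose($hIn)
        If $i = 0 Then FileWrite($hOut, BinaryMid($bWav, 1, 44)) ; keep the first header
        Local $bData = BinaryMid($bWav, 45) ; samples after the 44-byte header
        FileWrite($hOut, $bData)
        $iDataBytes += BinaryLen($bData)
    Next
    ; Patch the RIFF and data chunk sizes in the header (little-endian 32-bit values).
    FileSetPos($hOut, 4, 0)
    FileWrite($hOut, Binary(Int($iDataBytes + 36, 1))) ; ChunkSize
    FileSetPos($hOut, 40, 0)
    FileWrite($hOut, Binary(Int($iDataBytes, 1)))      ; data chunk size
    FileClose($hOut)
EndFunc

; Collect all the unit files for the whole text first, join them, play once.
Local $aUnits[3] = ["h1.wav", "o1.wav", "la2.wav"] ; placeholder unit files
_ConcatWavs($aUnits, @TempDir & "\sucspeech_out.wav")
SoundPlay(@TempDir & "\sucspeech_out.wav", 1)
```

Playing a single joined file like this means the CPU-heavy work happens before playback starts, which is what removes the gaps between units.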
Follow the repository and support it:
https://github.com/rmcpantoja/SUCSpeech/

