Txt2Pho - Readme

TXT2PHO - a TTS front end for the German inventories of the MBROLA project.

Version 0.9.0

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Important since 0.8.2:
Command line options have changed!
.txt2phorc file will be read!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

LICENSE:

This program and object code is being provided to "you", the licensee, by Thomas Portele, the "author", under the following license, which applies to any program, object code or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this license. The "program", below, refers to any such program, object code or work.

By obtaining, using and/or copying this program, you agree that you have read, understood, and will comply with these terms and conditions:

Terms and conditions for the distribution of the program

This program may not be sold or incorporated into any product which is sold without prior permission from the author.

When no charge is made, this program may be copied and distributed freely, provided that this notice is copied and distributed with it. Each time you redistribute the program (or any work based on the program), the recipient automatically receives a license from the original licensor to copy or distribute the program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties to this License.

If you wish to incorporate the program into other free programs whose distribution conditions are different, write to the author to ask for permission.

If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this license, they do not excuse you from the conditions of this license. If you cannot distribute so as to satisfy simultaneously your obligations under this license and any other pertinent obligations, then as a consequence you may not distribute the program at all. For example, if a patent license would not permit royalty-free redistribution of the program by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this license would be to refrain entirely from distribution of the program.

Terms and conditions on the use of the program

Permission is granted to use this software for non-commercial, non-military purposes, with and only with the lexicon and prosody files made available by the author from the HADIFIX for MBROLA project (HADIFIXforMBROLA).

Disclaimer

THIS SOFTWARE CARRIES NO WARRANTY, EXPRESSED OR IMPLIED. THE USER ASSUMES ALL RISKS, KNOWN OR UNKNOWN, DIRECT OR INDIRECT, WHICH INVOLVE THIS SOFTWARE IN ANY WAY. IN PARTICULAR, THE AUTHOR DOES NOT TAKE ANY COMMITMENT IN VIEW OF ANY POSSIBLE THIRD PARTY RIGHTS.

GOAL

This software is made public in order to support the MBROLA project with a free German text-to-phoneme conversion, which is essentially the linguistic and prosodic part of the synthesis system (tpo@ikp.uni-bonn.de). I will supply frequent updates of the lexica as bug fixes.

DISTRIBUTION:

This distribution contains the executables txt2pho and pipefilt, two file sets with prosodic information for male and female voices (*.int, *.dur, *.prt, *.amp), the lexica (hadifix.sce, hadifix.scf, hadifix.scr), files with missing diphone lists (de1.mdp and de2.mdp), two sets of parameters for the neural net duration control (kons.par, vokal.par), a few demo files (*.txt), and this file.

It requires the MBROLA synthesizer with at least one German database, obtainable from

http://tcts.fpms.ac.be/synthesis

This program is tested only tentatively - it is likely that certain phoneme combinations are not supported by the current German voices. In this case, let me know, and I will try to fix it.

INSTALLATION AND USAGE:

After the extraction of the file txt2pho.tgz a subdirectory txt2pho_data with all files necessary to run txt2pho is created.

A sample file .txt2phorc is also included - copy it to your home directory and edit it.

txt2pho now has proper command line options. Try "txt2pho -h" for a summary. Sorry, no backward compatibility...

The program reads text from stdin and writes the MBROLA commands to stdout. In its current version every NEWLINE is treated as a sentence-final marker. When synthesizing continuous text this leads to unwanted breaks within an utterance. The program pipefilt corrects this in a very simple fashion: it inserts a NEWLINE after each of the characters '.:;!?' when it is followed by whitespace, and changes all other NEWLINEs into blanks. There are of course much better and more application-oriented methods for this.
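
The pipefilt behaviour described above can be approximated in a few lines of Python. This is only a sketch of the stated rule, not the actual pipefilt source: the function name pipefilt_sketch is hypothetical, and it is an assumption that the whitespace following the punctuation mark is replaced by the NEWLINE rather than kept.

```python
import re


def pipefilt_sketch(text: str) -> str:
    """Approximate the newline handling described for pipefilt."""
    # Step 1: turn every existing NEWLINE into a blank.
    flat = text.replace("\n", " ")
    # Step 2: end the line after '.', ':', ';', '!' or '?' when whitespace
    # follows. Assumption: the whitespace itself is replaced by the NEWLINE.
    return re.sub(r"([.:;!?])\s+", r"\1\n", flat)


if __name__ == "__main__":
    sample = "Das ist ein\nTest. Noch ein Satz.\n"
    print(pipefilt_sketch(sample), end="")
```

Like the rule it follows, the sketch also breaks after abbreviations such as "z. B.", so it shares the limitations mentioned above.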

In its current stage txt2pho does not do much preprocessing. Abbreviations, numbers, symbols etc. are not handled.

 

IMPORTANT:
A program written by Michael Binder that allows reading of numbers can be obtained from ftp.gwdg.de/pub/linux/misc/text_to_speech/numfilt-0.1.tar.gz

Another program that does text preprocessing can be obtained from Preproc.html

German ä can be entered as ä, "a, or ae (but that's not unambiguous). TeX notation is safe. Is it? I'm not sure about this :-(

The file with missing diphones belonging to the current database must be copied to imppairs.xxx in the txt2pho_data directory prior to the start of txt2pho.

WHAT DOES IT DO, AND HOW:

The conversion from graphemes to phonemes is done using a lexicon with more than 50000 entries and inflection tables. The system tries to find compound words (which sometimes leads to interesting results). A description of the linguistic processing is given in (Portele & Kraemer, 1995 - ftp://ftp.asl1.ikp.uni-bonn.de/pub/vm41/tpwolf95.ps.gz - in German). During this process word classes are derived which are converted to prominence values (Widera, Portele & Wolters, 1997 - not yet on the Internet). These parameters drive a prominence-based synthesis system (Portele & Heuft, 1996 - ftp://asl1.ikp.uni-bonn.de/pub/vm41/tpspeak2.ps.gz). A decision tree (*.prt) converts the prominence values into prosodic parameters (Heuft & Portele, 1996 - ftp://asl1.ikp.uni-bonn.de/pub/vm41/bhics962.ps.gz). Here, speaker-dependent parameters (*.int for f0, *.dur for duration) are used.

From version 0.7.0 the sound durations can be controlled by a number of neural networks. The use of the networks is controlled by a line in the file "hadifix.cfg"
USENET=<X>
with 0=Do not use NN
     1=Use one NN for vowels and one for consonants
     2=Use several NN for different phonemes

From version 0.7.2 a new decision tree for intonation contour generation "newtree.prt" is included which uses a phonological description of German intonation peaks. The use of this tree is controlled by a line in the file "hadifix.cfg"
NEWTREE=<tree_name>

The reduction level can be defaulted by a line in the file "hadifix.cfg"
REDUCTION=<level>
with level 0 to 6. The higher the number, the more symbolic reductions will be simulated.

From version 0.7.7 the speech rate can be controlled by a line in the file "hadifix.cfg"
SPEECHRATE=<rate>
1.0 is normal, 1.5 is slow, 0.5 is fast

From version 0.8.4 the way internal syllable prominences are computed can be controlled by a line in the file "hadifix.cfg"
PROMCOMP=<X>
with 0=Use Rules
     1=Use CART tree
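
The "hadifix.cfg" settings described above (USENET, NEWTREE, REDUCTION, SPEECHRATE, PROMCOMP) can be generated and range-checked with a small helper. This is a sketch only: the function name render_hadifix_cfg is hypothetical, the value "newtree.prt" is used merely as a placeholder for <tree_name>, and it is an assumption that the file holds one KEY=value line per setting in any order.

```python
def render_hadifix_cfg(usenet, newtree, reduction, speechrate, promcomp):
    """Return hadifix.cfg lines for the settings documented in this README."""
    if usenet not in (0, 1, 2):
        raise ValueError("USENET must be 0, 1 or 2")
    if not 0 <= reduction <= 6:
        raise ValueError("REDUCTION must be between 0 and 6")
    if speechrate <= 0:
        # Assumption: only positive rates make sense (1.0 is normal).
        raise ValueError("SPEECHRATE must be positive")
    if promcomp not in (0, 1):
        raise ValueError("PROMCOMP must be 0 or 1")
    lines = [
        f"USENET={usenet}",
        f"NEWTREE={newtree}",
        f"REDUCTION={reduction}",
        f"SPEECHRATE={speechrate}",
        f"PROMCOMP={promcomp}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example values are illustrative, not recommended defaults.
    print(render_hadifix_cfg(2, "newtree.prt", 3, 1.5, 1), end="")
```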

From version 0.8.7 a comment can be added to the input by {Comment:XXX}, and the output will contain a line
; XXX

MANIPULATIONS:

Manipulations of the lexicon files are not allowed. Manipulations of the decision tree are possible but dangerous. In the files *.dur the first parameter denotes the default speaking rate. In the files *.int pitch height is controlled, and these numbers can be adapted to personal needs.

A number of control sequences (Portele, 1996 - ftp://asl1.ikp.uni-bonn.de/pub/vm41/tpdaga962.ps.gz) can be included in the orthographic input when placed between braces. A space character may not appear in a command. Commands must be separated from other words or commands by a space character. Commands contain a keyword followed by a colon and the command parameters. Among the commands are

Voice

The command value is the name of the inventory to be used from now on. Its files must be in the directory of the current inventory. However, this only changes the prosodic output and has no effect on the MBROLA synthesizer. Possible names are "male" and "female".

Duration

accepts positive and negative values. The speech rate is changed accordingly.

Pitch

accepts positive and negative values in Hz. The average pitch changes accordingly.

Pause

inserts a pause. The command parameter specifies the length in milliseconds.

WordProminence

controls the prominence of the following word and can be set between 0 (not prominent) and 31 (very prominent).

WordRhythm

modifies the rhythm of a word. For each syllable one of the characters _ (not accented), ' (primary accent), or , (secondary accent) must appear in the command parameter string.

Transcription

gives the transcription for the following word. Syllable boundaries must be indicated by a | and the transcription must be in SAMPA. Primary ' and secondary , accent marks can be included after the syllable boundary sign.

BoundaryStrength

determines the number of boundaries between words. The lower this value is, the more boundaries will be inserted. 0 produces a boundary after every word. Default is 2.

Accent

mimics in a very crude way some local varieties of Standard German. Up to now, only Rheinisch is a valid parameter.

Singing

allows the synthesis of a song. For every syllable the length and the value of the pertinent note must be specified. The length may be one of 4, 8, 16, whereby 16 is the shortest. The value can be one of c, cis, d, es, e, f, fis, g, as, a, b, h. If the value is followed by a ' it is assumed to be in the next octave. Length and value are separated by a comma, notes for different syllables by a semicolon. A command might therefore look like: {Singing:8,f;8,g;4,e}. Multiple singing commands can appear in one utterance.
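
A Singing command string can be assembled from a list of notes as follows. This is a sketch based only on the format described above; the function name singing_command and the validation are assumptions and not part of txt2pho.

```python
NOTE_LENGTHS = {4, 8, 16}
NOTE_NAMES = {"c", "cis", "d", "es", "e", "f", "fis", "g", "as", "a", "b", "h"}


def singing_command(notes):
    """Build a {Singing:...} command from (length, value) pairs, one per syllable."""
    parts = []
    for length, value in notes:
        if length not in NOTE_LENGTHS:
            raise ValueError(f"unsupported note length: {length}")
        # A trailing ' marks the next octave; strip it before checking the name.
        base = value[:-1] if value.endswith("'") else value
        if base not in NOTE_NAMES:
            raise ValueError(f"unsupported note value: {value}")
        parts.append(f"{length},{value}")
    # Notes for different syllables are joined by semicolons, without spaces.
    return "{Singing:" + ";".join(parts) + "}"


if __name__ == "__main__":
    print(singing_command([(8, "f"), (8, "g"), (4, "e")]))
```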

SetSingingTempo

defines the tempo of the song from now on. Valid parameter values are a number for beats per minute or one of the following strings:
Largo, Larghetto, Adagio, Andante, Alegretto, Allegro, Presto.

SetOctave

defines the height of the singing voice. Valid parameters are C, c, c', c'', c'''.

UseNet

controls the use of the Neural Networks for duration control. It overrides the values set in the file "hadifix.cfg"

Comment

passes a string of characters through to the output.

Examples can be found in the files with the extension .txt in the txt2pho_data directory
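
The command syntax described in this section (keyword, colon, parameters, all between braces, no spaces inside, separated from neighbouring words by spaces) can be checked before text is handed to txt2pho. The following is a sketch, not the parser used by txt2pho: the names COMMAND_RE and split_input are hypothetical, and braces inside command parameters are assumed not to occur.

```python
import re

# One command token: {Keyword:parameters} with no whitespace inside.
COMMAND_RE = re.compile(r"\{([A-Za-z]+):([^\s{}]*)\}")


def split_input(text):
    """Split input into ("command", keyword, params) and ("word", token) tuples."""
    items = []
    for token in text.split():
        match = COMMAND_RE.fullmatch(token)
        if match:
            items.append(("command", match.group(1), match.group(2)))
        elif "{" in token or "}" in token:
            # Typical causes: a space inside a command, or a command
            # that is not separated from the next word by a space.
            raise ValueError(f"malformed command or missing space: {token!r}")
        else:
            items.append(("word", token))
    return items


if __name__ == "__main__":
    for item in split_input("{Pause:500} Guten {WordProminence:31} Morgen"):
        print(item)
```

An input such as "{Pause:500}Guten" raises an error here, because the command is not separated from the following word.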

PROBLEMS:

  • Because txt2pho was made for an inventory more complex than diphones, in which every phoneme combination is synthesizable, the system might demand combinations that are not in the diphone inventory. In such a case a substitute is provided; however, the speech quality might be degraded.
  • Some reductions inherent in the original inventory cannot be synthesized adequately with the diphone databases, and the speech may sometimes sound overarticulated (e.g. non-syllabic vowels in "Bosnien" or syllabic consonants in "haben").
  • A few words are wrongly accented.
  • A number of irrelevant error messages may appear in the log file.

 

HISTORY:

22.11.1997 French nasal vowels and English /EI/ remapped
26.11.1997 New version of the lexicon
28.11.1997 Fix for the "missing diphone" problem
13.12.1997 Command line argument for data directory / new version of the lexicon
3.1.1998 new lexicon version
6.1.1998 bug in syllable border placement caused wrong accentuation - fixed; different boundary strengths can be adjusted, new default value; new version of the lexicon
22.1.1998 bug in preprocessing, new version of the lexicon
24.2.1998 new version of the lexicon, correction in spelling routine
6.3.1998 new version of the lexicon
4.5.1998 minor revisions
3.7.1998 duration network included, 'R@R' error fixed, new version of the lexicon
4.7.1998 severe bug in compound analysis fixed
14.7.1998 new intonation model, reduction level externalized
15.7.1998 version number in program
14.8.1998 fixed all memory leaks, tested by processing 1.2M words
18.8.1998 switched to standard transcription SAMPA, standard input ANSI
30.8.1998 fixed bug in rule transcription, new version of lexicon
10.9.1998 fixed a bug in duration network, enabled SPEECHRATE in config file
15.9.1998 included decision tree duration control, fixed bug in word prosody module
20.9.1998 bug in lexicon lookup fixed, new lexicon
21.11.1998 some bugs, new lexicon, better number support
8.1.1999 a large number of bugfixes, new lexicon
24.1.1999 other bugfixes, proper command line support, CART prominence computation
27.1.1999 rc file included
12.3.1999 several fixes, small changes, {Comment}-feature
19.5.1999 some minor fixes
1.6.1999 fixed some severe errors

Acknowledgements:

I got a lot of help from a number of people:

Barbara Heuft
Horst Meyer
Florian Höfer
Monika Rauth
Gerit Sonntag
Jürgen Krämer
Stefan Breuer
Jörg Bröggelwirth
Felicitas Haas
Volker Strom
Karlheinz Stöber
Anja Elsner
Wolfgang Hess
Dieter Stock
Dietmar Lance
Bernhard Schröder
...

and my family (as usual).

Thomas Portele
email: thomas@portele.de
