Format Conversion
The CHAT class can convert CHAT data to other annotation formats.
CHAT to ELAN
to_elan(): Convert to an ELAN object.
to_elan_files(): Write ELAN (.eaf) files to a directory.
to_elan_strs(): Return EAF XML strings, one per file.
to_elan() converts CHAT data to an ELAN object.
Each CHAT file produces one ELAN file.
Tier mapping:
Each CHAT participant (e.g., *CHI:, *MOT:) becomes an alignable (time-aligned, parent) tier in ELAN, with the tier ID set to the participant code (e.g., CHI, MOT).
Each CHAT dependent tier (e.g., %mor, %gra, %gpx) becomes a reference annotation (child) tier in ELAN, with the tier ID {tier}@{participant} (e.g., mor@CHI, gra@MOT).
If the CHAT file has an @Media header, an ELAN MEDIA_DESCRIPTOR element is included.
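The tier naming scheme can be sketched in plain Python. This is only an illustration of the ID pattern, not pylangacq's implementation: the helper name elan_tier_ids and its input shape (a dict from participant code to dependent tier names) are hypothetical, and treating the participant tier as the parent of each dependent tier is an assumption read off the parent/child wording above.

```python
def elan_tier_ids(dependent_tiers_by_participant):
    """Return (tier_id, parent_id) pairs following the naming scheme.

    Hypothetical helper for illustration only.
    parent_id is None for participant (alignable) tiers.
    """
    pairs = []
    for participant, dependent_tiers in dependent_tiers_by_participant.items():
        # Participant tier: the tier ID is the participant code itself.
        pairs.append((participant, None))
        for tier in dependent_tiers:
            # Dependent tier: {tier}@{participant}, without the leading "%".
            name = tier.lstrip("%")
            pairs.append((f"{name}@{participant}", participant))
    return pairs


example = {"CHI": ["%mor", "%gra"], "MOT": ["%mor"]}
for tier_id, parent in elan_tier_ids(example):
    print(tier_id, "<-", parent)
```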
Example:
import pylangacq
chat = pylangacq.read_chat("path/to/your/data.cha")
# Convert to an ELAN object
elan = chat.to_elan()
# Write .eaf files to a directory
chat.to_elan_files("output_dir/")
# With custom filenames
chat.to_elan_files("output_dir/", filenames=["alice.eaf", "bob.eaf"])
To get EAF XML strings in memory (e.g., for inspection or further processing),
use to_elan_strs():
eaf_strings = chat.to_elan_strs()
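As one way to inspect such a string, the sketch below lists tier IDs using only the standard library. It relies on the EAF convention that tiers are TIER elements carrying a TIER_ID attribute. The helper name list_tier_ids is hypothetical, and sample_eaf is a hand-written stand-in rather than actual pylangacq output.

```python
import xml.etree.ElementTree as ET


def list_tier_ids(eaf_xml):
    """Return the TIER_ID attribute of every TIER element in an EAF string."""
    root = ET.fromstring(eaf_xml)
    return [tier.get("TIER_ID") for tier in root.iter("TIER")]


# Hand-written, heavily abbreviated stand-in for one element of
# chat.to_elan_strs(); real EAF documents contain many more elements.
sample_eaf = """<ANNOTATION_DOCUMENT>
  <TIER TIER_ID="CHI"/>
  <TIER TIER_ID="mor@CHI" PARENT_REF="CHI"/>
</ANNOTATION_DOCUMENT>"""

print(list_tier_ids(sample_eaf))
```

With real data, the same helper could be applied to each string in eaf_strings.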
The resulting ELAN object (or .eaf files) can be opened in ELAN or further processed with rustling.elan.ELAN.
CHAT to TextGrid
to_textgrid(): Convert to a TextGrid object.
to_textgrid_files(): Write TextGrid (.TextGrid) files to a directory.
to_textgrid_strs(): Return TextGrid format strings, one per file.
to_textgrid() converts CHAT data to a TextGrid object.
Each CHAT file produces one TextGrid file.
Mapping:
Each CHAT participant becomes an IntervalTier (tier name = participant code).
Utterances without time marks are skipped.
Times are converted from milliseconds to seconds.
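The three mapping rules can be sketched in plain Python as follows. This is a minimal illustration under assumed inputs, not the library's code: the function to_intervals and the (participant, text, time_marks) record shape are hypothetical.

```python
def to_intervals(utterances):
    """Group timed utterances into per-participant interval lists.

    Each record is (participant, text, time_marks), where time_marks is
    (start_ms, end_ms) or None. Hypothetical input shape for illustration.
    """
    tiers = {}
    for participant, text, time_marks in utterances:
        if time_marks is None:
            # Utterances without time marks are skipped.
            continue
        start_ms, end_ms = time_marks
        # Milliseconds are converted to seconds.
        interval = (start_ms / 1000, end_ms / 1000, text)
        # One tier per participant, keyed by participant code.
        tiers.setdefault(participant, []).append(interval)
    return tiers


records = [
    ("CHI", "more cookie .", (0, 1500)),
    ("MOT", "okay .", None),
    ("MOT", "here you go .", (1500, 2250)),
]
print(to_intervals(records))
```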
Participant selection:
By default, all participants are included.
To select specific participants, pass the participants keyword argument:
import pylangacq
chat = pylangacq.read_chat("path/to/your/data.cha")
# Convert to a TextGrid object
textgrid = chat.to_textgrid()
# Only include specific participants
textgrid = chat.to_textgrid(participants=["CHI"])
# Write .TextGrid files to a directory
chat.to_textgrid_files("output_dir/")
# With custom filenames
chat.to_textgrid_files("output_dir/", filenames=["child.TextGrid"])
To get TextGrid strings in memory, use to_textgrid_strs():
textgrid_strings = chat.to_textgrid_strs()
The resulting TextGrid object (or .TextGrid files)
can be opened in Praat.
CHAT to CoNLL-U
to_conllu(): Convert to a CoNLL-U object.
to_conllu_files(): Write CoNLL-U (.conllu) files to a directory.
to_conllu_strs(): Return CoNLL-U format strings, one per file.
to_conllu() converts CHAT data to a CoNLLU object.
Each CHAT file produces one CoNLL-U file.
Mapping:
Each CHAT utterance becomes one CoNLL-U sentence.
Token.word maps to FORM.
Token.pos (from %mor) maps to UPOS.
Token.mor (from %mor) maps to LEMMA.
Token.gra (from %gra) maps to HEAD and DEPREL.
Fields without a direct mapping (XPOS, FEATS, DEPS, MISC) are set to _.
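To show how this mapping lines up with the ten tab-separated CoNLL-U columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), here is a small sketch. SketchToken is a hypothetical stand-in and not pylangacq's Token class. Representing gra as a (head, relation) pair and writing _ for missing values are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SketchToken:
    """Hypothetical stand-in for a parsed CHAT token."""
    word: str
    pos: Optional[str] = None
    mor: Optional[str] = None
    gra: Optional[Tuple[int, str]] = None  # assumed shape: (head, relation)


def conllu_line(index, token):
    """Format one token as a tab-separated CoNLL-U line."""
    head, deprel = token.gra if token.gra else ("_", "_")
    fields = [
        str(index),        # ID
        token.word,        # FORM
        token.mor or "_",  # LEMMA
        token.pos or "_",  # UPOS
        "_",               # XPOS (no direct mapping)
        "_",               # FEATS (no direct mapping)
        str(head),         # HEAD
        deprel,            # DEPREL
        "_",               # DEPS (no direct mapping)
        "_",               # MISC (no direct mapping)
    ]
    return "\t".join(fields)


print(conllu_line(1, SketchToken("cookie", pos="n", mor="cookie", gra=(2, "OBJ"))))
```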
Example:
import pylangacq
chat = pylangacq.read_chat("path/to/your/data.cha")
# Convert to a CoNLL-U object
conllu = chat.to_conllu()
# Write .conllu files to a directory
chat.to_conllu_files("output_dir/")
# With custom filenames
chat.to_conllu_files("output_dir/", filenames=["output.conllu"])
To get CoNLL-U strings in memory, use to_conllu_strs():
conllu_strings = chat.to_conllu_strs()
The resulting CoNLLU object (or .conllu files)
can be used with Universal Dependencies tools.
CHAT to SRT
to_srt(): Convert to an SRT object.
to_srt_files(): Write SRT (.srt) files to a directory.
to_srt_strs(): Return SRT format strings, one per file.
to_srt() converts CHAT data to an SRT object.
Each CHAT file produces one SRT file.
Mapping:
Each CHAT utterance with time marks becomes one subtitle block.
Utterances without time marks are skipped (SRT requires time ranges).
When multiple participants are present, the subtitle text is prefixed with the participant code (e.g., "CHI: more cookie ."). For a single participant, no prefix is added.
Participant selection:
By default, all participants are included.
To select specific participants, pass the participants keyword argument:
import pylangacq
chat = pylangacq.read_chat("path/to/your/data.cha")
# Convert to an SRT object
srt = chat.to_srt()
# Only include specific participants
srt = chat.to_srt(participants=["CHI"])
# Write .srt files to a directory
chat.to_srt_files("output_dir/")
# With custom filenames
chat.to_srt_files("output_dir/", filenames=["child.srt"])
To get SRT strings in memory (e.g., for inspection or further processing),
use to_srt_strs():
srt_strings = chat.to_srt_strs()
The resulting SRT object (or .srt files)
can be opened in any media player or subtitle editor.
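For readers unfamiliar with the SRT layout, the sketch below shows how the mapping rules for this format could be expressed in plain Python. It is an illustration, not pylangacq's implementation: the function names and the (participant, text, time_marks) record shape are hypothetical, and deciding on the prefix from the set of participants in the input records is an assumption.

```python
def srt_timestamp(ms):
    """Format milliseconds as an SRT timestamp, HH:MM:SS,mmm."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt_text(utterances):
    """Build SRT text from (participant, text, time_marks) records.

    time_marks is (start_ms, end_ms) or None. Hypothetical input shape.
    """
    # Assumption: prefix when more than one participant appears in the input.
    use_prefix = len({participant for participant, _, _ in utterances}) > 1
    blocks = []
    for participant, text, time_marks in utterances:
        if time_marks is None:
            continue  # SRT requires a time range, so untimed utterances are skipped
        start, end = (srt_timestamp(ms) for ms in time_marks)
        body = f"{participant}: {text}" if use_prefix else text
        # Blocks are numbered from 1 in order of appearance.
        blocks.append(f"{len(blocks) + 1}\n{start} --> {end}\n{body}\n")
    return "\n".join(blocks)


print(to_srt_text([
    ("CHI", "more cookie .", (0, 1500)),
    ("MOT", "okay .", None),
]))
```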