API Reference¶
- pylangacq.read_chat(path: str, *, filter_files: str | Sequence[str] | None = None, filter_participants: str | Sequence[str] | None = None, cls: type = <class 'builtins.CHAT'>, strict: bool = True) CHAT¶
Read CHAT data.
- Parameters:
path – Path to a .zip file, a local directory containing .cha files, or a single .cha file.
filter_files – Filename(s) to keep. Regular expression matching is supported. If None, all files are included.
filter_participants – Participant code(s) to keep. Regular expression matching is supported. If None, all participants are included.
cls – The class used to create the reader. Must be CHAT or a subclass of it.
strict – If True, enforce strict parsing of the CHAT data.
- Returns:
A CHAT instance filtered by the specified files and participants.
- Raises:
TypeError – If cls is not CHAT or a subclass of it.
ValueError – If path does not point to a .zip file, a directory, or a .cha file.
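The path and filter rules above can be pictured with a small standalone sketch. It does not call pylangacq; `classify_source` and `keep_matching` are hypothetical helpers, and both the case-insensitive extension check and the use of `re.search` (a match anywhere in the name) are assumptions, since the reference does not spell out those details.

```python
import re
import tempfile
from pathlib import Path
from typing import List, Optional, Sequence, Union


def classify_source(path: str) -> str:
    """Hypothetical helper: label a path as 'dir', 'zip', or 'file'."""
    p = Path(path)
    if p.is_dir():
        return "dir"
    suffix = p.suffix.lower()  # assumption: extensions compared case-insensitively
    if suffix == ".zip":
        return "zip"
    if suffix == ".cha":
        return "file"
    # Mirrors the documented ValueError for any other kind of path.
    raise ValueError(f"not a .zip file, a directory, or a .cha file: {path!r}")


def keep_matching(
    names: Sequence[str],
    patterns: Optional[Union[str, Sequence[str]]] = None,
) -> List[str]:
    """Hypothetical helper: keep names matched by any pattern (None keeps all)."""
    if patterns is None:
        return list(names)
    if isinstance(patterns, str):
        patterns = [patterns]
    # Assumption: a pattern may match anywhere in the name (re.search).
    return [n for n in names if any(re.search(p, n) for p in patterns)]


with tempfile.TemporaryDirectory() as tmp:
    kinds = [
        classify_source(tmp),
        classify_source("corpus.zip"),
        classify_source("010600.cha"),
    ]

names = ["eve/010600a.cha", "eve/010700a.cha", "adam/020304.cha"]
kept = keep_matching(names, r"eve/0106")
```

The same single-string-or-sequence handling would apply to participant codes under filter_participants.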
- class pylangacq.Age¶
Age in the CHAT format: years;months.days (e.g., “2;10.05”).
- in_months()¶
Return the age in total months as a float.
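As a rough illustration of what converting a years;months.days age to months involves, here is a standalone sketch. `chat_age_to_months` is a hypothetical function, not the library's implementation, and the 30-days-per-month convention is an assumption that in_months() may or may not share.

```python
def chat_age_to_months(age: str, days_per_month: float = 30.0) -> float:
    """Convert a CHAT age string such as "2;10.05" to total months.

    Hypothetical helper; days_per_month=30 is an assumed convention.
    """
    years_part, _, rest = age.partition(";")
    months_part, _, days_part = rest.partition(".")
    years = int(years_part)
    # Months and days may be omitted in a CHAT age (e.g. "3;" or "1;06").
    months = int(months_part) if months_part else 0
    days = int(days_part) if days_part else 0
    return years * 12 + months + days / days_per_month


example = chat_age_to_months("2;10.05")  # 2 years, 10 months, 5 days
```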
- class pylangacq.CHAT¶
CHAT data reader for CHILDES/TalkBank transcripts.
- ages()¶
Return the age of the target child (CHI) in each file.
- append(other)¶
Append data from another CHAT reader.
- append_left(other)¶
Left-append data from another CHAT reader, preserving order.
- clear()¶
Remove all data from this reader.
- extend(others)¶
Extend data from multiple CHAT readers.
- extend_left(others)¶
Left-extend data from multiple CHAT readers, preserving order.
- file_paths¶
Return the list of file paths.
- filter(*, files=None, participants=None)¶
Return a new CHAT filtered by file path and/or participant regex.
- classmethod from_dir(path, match=Ellipsis, extension='.cha', parallel=True, strict=True)¶
Recursively load CHAT data from a directory.
- classmethod from_files(paths, parallel=True, strict=True)¶
Load CHAT data from file paths.
- classmethod from_strs(strs, ids=None, parallel=True, strict=True)¶
Parse CHAT data from in-memory strings.
- classmethod from_zip(path, match=Ellipsis, extension='.cha', parallel=True, strict=True)¶
Load CHAT data from a ZIP archive.
- head(n=5)¶
Return the first n utterances with a formatted display.
- headers()¶
Return file-level headers.
- info(*, verbose=False)¶
Print a summary of this reader’s data.
- ipsyn(*, participant='CHI', n=Ellipsis)¶
Index of Productive Syntax, one value per file.
- languages(*, by_file=False)¶
Return languages, optionally grouped by file.
- mlu(*, participant='CHI', n=Ellipsis)¶
Mean length of utterance in morphemes, one value per file.
Alias for mlum().
- mlum(*, participant='CHI', n=Ellipsis)¶
Mean length of utterance in morphemes, one value per file.
- mluw(*, participant='CHI', n=Ellipsis)¶
Mean length of utterance in words, one value per file.
- n_files¶
Return the number of files.
- participants(*, by_file=False)¶
Return participants, optionally grouped by file.
- pop()¶
Remove and return the last file as a new CHAT reader.
- pop_left()¶
Remove and return the first file as a new CHAT reader.
- tail(n=5)¶
Return the last n utterances with a formatted display.
- to_chat(path, *, is_dir=False, filenames=None)¶
Write CHAT data to disk.
- to_strs()¶
Return CHAT data strings, one per file.
- tokens(*, by_utterance=False, by_file=False)¶
Return tokens, optionally grouped by utterance and/or file.
- ttr(*, participant='CHI', n=Ellipsis)¶
Type-token ratio, one value per file.
- utterances(*, by_file=False)¶
Return utterances, optionally grouped by file.
- word_ngrams(n)¶
Return an Ngrams counter of word n-grams across all utterances.
N-grams do not cross utterance boundaries.
- Parameters:
n – The n-gram order (1 for unigrams, 2 for bigrams, etc.).
- words(*, by_utterance=False, by_file=False)¶
Return words, optionally grouped by utterance and/or file.
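The per-file measures listed for this class, such as ttr() and mluw(), reduce to simple ratios over words grouped by utterance and file. The sketch below shows those ratios in plain Python on made-up data. The helper names are hypothetical, the handling of empty input is an assumption, and the participant and n parameters are not modelled, so results from the library itself may differ.

```python
from typing import List, Sequence


def type_token_ratio(words: Sequence[str]) -> float:
    """Distinct word types divided by total word tokens."""
    # Assumption: return 0.0 for empty input rather than raising.
    return len(set(words)) / len(words) if words else 0.0


def mean_length_in_words(utterances: Sequence[Sequence[str]]) -> float:
    """Total word count divided by the number of utterances."""
    if not utterances:
        return 0.0  # assumption, as above
    return sum(len(u) for u in utterances) / len(utterances)


# Made-up data shaped like words grouped by file, then by utterance.
files: List[List[List[str]]] = [
    [["more", "cookie"], ["more", "juice", "please"]],
    [["doggie"], ["doggie", "go"], ["go", "go", "go"]],
]

# One value per file, matching the shape described for ttr() and mluw().
ttr_per_file = [type_token_ratio([w for u in f for w in u]) for f in files]
mluw_per_file = [mean_length_in_words(f) for f in files]
```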
- class pylangacq.ChangeableHeader¶
A changeable header that can appear mid-file in CHAT transcripts.
- class pylangacq.Gra(dep, head, rel)¶
A grammatical relation from the %gra tier.
- class pylangacq.Headers¶
All file-level (non-changeable) headers from a CHAT file.
- class pylangacq.Ngrams(n, *, min_n=None)¶
An n-gram counter for counting n-gram frequencies.
Accumulates n-gram counts from sequences of elements. N-grams do not cross sequence boundaries.
- clear()¶
Clear all counts.
- count(seq)¶
Count n-grams from a single sequence.
Extracts all n-grams of the configured order from the sequence and increments their counts. N-grams do not cross sequence boundaries.
- count_seqs(seqs)¶
Count n-grams from multiple sequences.
Each sequence is treated independently (n-grams do not cross boundaries).
- get(ngram)¶
Return the count for a specific n-gram.
Returns 0 if the n-gram has not been observed.
- items(*, order=None)¶
Return all (n-gram, count) pairs.
- min_n¶
The minimum n-gram order.
- most_common(n=None, *, order=None)¶
Return the n most common n-grams with their counts.
If n is None, returns all n-grams sorted by count (descending).
- n¶
The n-gram order.
- to_counter(*, order=None)¶
Convert to a Python collections.Counter.
Returns a Counter mapping n-gram tuples to their counts.
- total(*, order=None)¶
Return the total number of n-gram tokens counted.
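The rule that n-grams do not cross sequence boundaries can be seen in a short stand-in built on collections.Counter. This is a sketch of the counting behaviour described above, not the Ngrams class itself; `count_ngrams` is a hypothetical function, and the min_n and order options are left out because their semantics are not described here.

```python
from collections import Counter
from typing import Iterable, Sequence, Tuple


def count_ngrams(seqs: Iterable[Sequence[str]], n: int) -> Counter:
    """Count n-grams of order n, treating each sequence independently."""
    counts: Counter = Counter()
    for seq in seqs:
        # Windows are taken within one sequence only, so no n-gram
        # spans the end of one sequence and the start of the next.
        for i in range(len(seq) - n + 1):
            gram: Tuple[str, ...] = tuple(seq[i : i + n])
            counts[gram] += 1
    return counts


utterances = [["more", "juice"], ["more", "juice", "please"]]
bigrams = count_ngrams(utterances, 2)

top = bigrams.most_common(1)          # the most frequent bigram and its count
unseen = bigrams[("juice", "more")]   # Counter gives 0 for unobserved keys
total = sum(bigrams.values())         # total number of bigram tokens
```

The bigram ("juice", "more") is never counted even though "juice" ends the first utterance and "more" starts the second.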
- class pylangacq.Participant¶
A single participant, merged from the @Participants and @ID header fields.
- class pylangacq.Token(word, pos=None, mor=None, gra=None)¶
A token with word, POS, morphology, and grammatical relation.
- class pylangacq.Utterance(*, participant=None, tokens=None, time_marks=None, tiers=None, changeable_header=None)¶
A single utterance from a CHAT transcript.
For changeable headers (e.g., @Comment, @New Episode), only changeable_header is set; all other fields are None.
- raw¶
Raw transcript of this utterance, or None for headers.
- to_str()¶
Return a plain text tabular representation of this utterance.
- class pylangacq.Utterances¶
A sequence of utterances with a formatted display for terminal/notebook use.
Returned by CHAT.head() and CHAT.tail().