segram.grammar.doc module

class segram.grammar.doc.Doc(doc: Doc | Doc, smap: Mapping[tuple[int, int], Sent] | None = None)

Bases: DocElement

Grammar document class.

All grammar classes must be defined as slots classes. This is necessary to ensure a low memory footprint and better computational efficiency. Even classes with no new slots need to declare __slots__ = (). This requirement is checked during class construction. Other class-specific requirements of this sort, as well as their related validation checks, may be implemented on specialized grammar classes using the standard __init_subclass__ interface. This allows abstract base classes further down the inheritance chain to check for more complex requirements and to apply dynamic class customizations.

This is the grammar equivalent of an NLP document.
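The slots requirement described above can be illustrated with a minimal, self-contained sketch. It is not segram's actual implementation: the class names SlotsChecked, Element, Alias and Broken are hypothetical, and the check shown here is only one plausible way to enforce the rule through __init_subclass__.

```python
class SlotsChecked:
    """Hypothetical base class (not part of segram) enforcing '__slots__'."""

    __slots__ = ()

    def __init_subclass__(cls, **kwds):
        super().__init_subclass__(**kwds)
        # A class body that declares '__slots__' leaves the name
        # in the class's own namespace, so it can be checked here.
        if "__slots__" not in cls.__dict__:
            raise TypeError(f"'{cls.__name__}' must declare '__slots__'")


class Element(SlotsChecked):
    __slots__ = ("doc",)

    def __init__(self, doc):
        self.doc = doc


class Alias(Element):
    # No new slots, but the declaration is still required.
    __slots__ = ()


# A subclass without '__slots__' is rejected at class construction time.
try:
    class Broken(Element):
        pass
except TypeError as exc:
    error = str(exc)
```

Because every class in the chain declares __slots__, instances carry no per-instance __dict__, which is where the memory savings come from.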

doc: Underlying NLP document.

smap: Mapping from sentence ids to sentences.

property sents: DataTuple[Sent]: Sentences in the document.

property phrases: DataIterator[Phrase]: Phrases in the document grouped by sentences and conjunct groups.

property components: DataIterator[Component]: Unique components by sentences.

property tokens: DataTuple[Token]: Token sequence of the element.

property has_vectors: bool: Check if the document is equipped with word vectors.

property vector: ndarray[tuple[int], floating]: Word vector.

is_comparable_with(other: Any) → bool: Check whether self and other are comparable.

to_str(**kwds: Any) → str: Represent as string.

to_data() → dict[str, Any]

Dump to data dictionary.

Parameters: grammar – Whether grammar data should be serialized too.

copy() → Self: Copy self and modify attributes with **kwds.

classmethod from_data(doc: Doc, data: dict[str, Any]) → Self: Construct from NLP document and data dictionary.

Similarity: alias of DocSimilarity

classmethod from_doc(doc: Doc, *args: Any, **kwds: Any) → Self: Construct from NLP document object.