segram.grammar.doc module
- class segram.grammar.doc.Doc(doc: Doc | Doc, smap: Mapping[tuple[int, int], Sent] | None = None)
Bases: DocElement
Grammar document class.
All grammar classes must be defined as slots classes. This is necessary for ensuring a low memory footprint and better computational efficiency. Even classes with no new slots need to declare __slots__ = (). This requirement is checked during class construction. Other class-specific requirements of this sort, as well as their related validation checks, may be implemented on specialized grammar classes using the standard __init_subclass__ interface. This allows abstract base classes further down the inheritance chain to check for more complex requirements as well as apply dynamic class customizations.
This is the grammar equivalent of NLP documents.
- doc
Underlying NLP document.
- smap
Mapping from sentence ids to sentences.
- property phrases: DataIterator[Phrase]
Phrases in the document grouped by sentences and conjunct groups.
- property components: DataIterator[Component]
Unique components by sentences.
- property has_vectors: bool
Check if the document is equipped with word vectors.
- property vector: ndarray[tuple[int], floating]
Word vector.
- to_data() → dict[str, Any]
Dump to data dictionary.
- Parameters:
grammar – Whether grammar data should be serialized too.
- classmethod from_data(doc: Doc, data: dict[str, Any]) → Self
Construct from NLP document and data dictionary.
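The to_data() / from_data() pair suggests a round-trip pattern in which the NLP document is supplied separately and only the grammar-side data travels in the dictionary. The sketch below illustrates that pattern with a hypothetical stand-in class (ToyDoc, with a plain string in place of the NLP document). It does not use segram, and the dictionary layout is an assumption, since the actual serialization format is not described in this reference.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToyDoc:
    """Hypothetical stand-in for a grammar document; NOT segram's Doc."""

    doc: str  # stands in for the underlying NLP document
    smap: dict[tuple[int, int], str] = field(default_factory=dict)

    def to_data(self) -> dict[str, Any]:
        # Tuple keys are not valid JSON object keys, so store key/value pairs.
        return {"smap": [[list(key), sent] for key, sent in self.smap.items()]}

    @classmethod
    def from_data(cls, doc: str, data: dict[str, Any]) -> "ToyDoc":
        # The document itself is passed in; only the mapping comes from `data`.
        smap = {tuple(key): sent for key, sent in data["smap"]}
        return cls(doc, smap)


text = "Hello world. Bye."
original = ToyDoc(text, {(0, 12): "Hello world.", (13, 17): "Bye."})

# Round trip through JSON to show that the data dictionary is self-sufficient
# once the document is available again.
payload = json.dumps(original.to_data())
restored = ToyDoc.from_data(text, json.loads(payload))
```

Keeping the document out of the dictionary mirrors the from_data(doc, data) signature above: the caller is responsible for supplying the NLP document it wants the restored object to wrap.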
- Similarity
alias of DocSimilarity
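The slots requirement described in the class documentation above can be enforced at class-construction time through __init_subclass__. The following is a minimal sketch of that idea, assuming a hypothetical base class named SlotsChecked. It is not segram's implementation, which may perform additional checks.

```python
class SlotsChecked:
    """Hypothetical base class (not part of segram) that requires __slots__."""

    __slots__ = ()

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # A class body that omits __slots__ leaves no "__slots__" entry
        # in the class's own namespace, even if its parents declare one.
        if "__slots__" not in cls.__dict__:
            raise TypeError(f"'{cls.__name__}' must declare '__slots__'")


class Element(SlotsChecked):
    __slots__ = ("doc",)

    def __init__(self, doc):
        self.doc = doc


class SubElement(Element):
    __slots__ = ()  # no new slots, but the declaration is still required


try:
    # The check runs while the class statement executes, so the error is
    # raised before any instance could be created.
    class Broken(Element):
        pass
except TypeError as exc:
    error = str(exc)
```

Because every class in the chain declares __slots__, instances carry no per-instance __dict__, which is where the memory savings come from.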