segram.nlp.pipeline.base module
Base Segram module.
It implements the _Segram_ pipe component, which provides the main semantic grammar transformations and related auxiliary methods.
- class segram.nlp.pipeline.base.Segram(nlp: Language, name: str, *, grammar: str, preprocess: Sequence[str], alias: str = 'segram', vectors: str | Language | None = None, store_data: bool = True)[source]
Bases: Pipe
Semantic grammar pipeline component.
It extends spacy token classes with semantic grammar methods and related functionalities such as custom preprocessing (e.g. merging and corrected lemmatization).
- nlp
Language model object.
- name
Name of the component.
- extensions
Module defining custom spacy extensions.
- grammar
Label of grammar implementation.
- meta
Metadata dictionary with details on spacy and segram models being used.
- __init__(nlp: Language, name: str, *, grammar: str, preprocess: Sequence[str], alias: str = 'segram', vectors: str | Language | None = None, store_data: bool = True) None [source]
Initialization method.
- Parameters:
preprocess – List of segram pipeline components to use for preprocessing documents before applying the main segram pipe. If None, then all available preprocessing components are used.
alias – Set spacy_alias in the global settings. It is used for namespacing extension attributes added by segram in order to avoid collisions with other packages.
vectors – Vector table to use instead of the vectors provided by the main model. Must be provided as the name of a model or the model object itself, so that it is possible to keep track of the model name.
store_data – Should document data be stored automatically at the time of parsing.
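A minimal usage sketch follows, assuming the component is registered with spaCy under the factory name "segram", that the factory supplies defaults for the grammar and preprocess settings, and that the English models named below are installed; the specific config values are illustrative placeholders, not documented defaults.

```python
# Minimal usage sketch (assumptions: factory name "segram", English models
# installed, and factory defaults available for grammar/preprocess).
import spacy

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe(
    "segram",
    config={
        "alias": "segram",            # namespace prefix for segram extension attributes
        "vectors": "en_core_web_lg",  # assumed: separate vector table, given by model name
        "store_data": True,           # store document data at parse time
    },
)

doc = nlp("A short example sentence for the semantic grammar pipeline.")
```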
- property id: int
Hash id of the component.
- static set_docattrs(doc: Doc, alias: str, meta: dict[str, Any]) None [source]
Set document attributes.
- static import_extensions(grammar: str, lang: str, alias: str) SpacyExtensions [source]
Import NLP module from grammar label and language code.
- Returns:
SpacyExtensions instance.
- Return type:
SpacyExtensions
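The static helper can also be called directly, as in the sketch below; the grammar label "rulebased" and the language code "en" are illustrative assumptions rather than values documented here.

```python
# Hypothetical direct call; the grammar label value is an assumption.
from segram.nlp.pipeline.base import Segram

exts = Segram.import_extensions(grammar="rulebased", lang="en", alias="segram")
print(type(exts).__name__)  # expected to be a SpacyExtensions instance
```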