segram.nlp.pipeline.base module
Base Segram module.
It implements the _Segram_ pipe component providing all main semantic grammar transformations and related auxiliary methods.
- class segram.nlp.pipeline.base.Segram(nlp: Language, name: str, *, grammar: str, preprocess: Sequence[str], alias: str = 'segram', vectors: str | Language | None = None, store_data: bool = True)
Bases: Pipe
Semantic grammar pipeline component.
It extends spacy token classes with semantic grammar methods and related functionality such as custom preprocessing (e.g. merging and corrected lemmatization).
- nlp
Language model object.
- name
Name of the component.
- extensions
Module defining custom spacy extensions.
- grammar
Label of grammar implementation.
- meta
Metadata dictionary with details on the spacy and segram models being used.
- __init__(nlp: Language, name: str, *, grammar: str, preprocess: Sequence[str], alias: str = 'segram', vectors: str | Language | None = None, store_data: bool = True) -> None
Initialization method.
- Parameters:
preprocess – List of segram pipeline components to use for preprocessing documents before applying the main segram pipe. If None then all available preprocessing components are used.
alias – Set spacy_alias in the global settings. It is used for namespacing extension attributes added by segram in order to avoid collisions with other packages.
vectors – Vector table to use instead of the vectors provided by the main model. Must be given as the name of a model or as the model object itself, so that it is possible to keep track of the model name.
store_data – Whether document data should be stored automatically at the time of parsing.
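The keyword-only parameters above can be collected into a plain configuration dictionary before the component is added to a pipeline. The following is a minimal sketch, not part of segram itself: the helper `segram_config`, the grammar label `"rulebased"`, the preprocessing names, and the vectors model name are all illustrative assumptions, as is the factory name `"segram"` in the commented usage.

```python
from __future__ import annotations

from typing import Any, Sequence


def segram_config(
    *,
    grammar: str,
    preprocess: Sequence[str],
    alias: str = "segram",
    vectors: str | None = None,
    store_data: bool = True,
) -> dict[str, Any]:
    """Hypothetical helper mirroring the documented keyword-only parameters.

    Only the string form of ``vectors`` is handled here, because a plain
    config dictionary is meant to hold simple serializable values.
    """
    return {
        "grammar": grammar,
        "preprocess": list(preprocess),
        "alias": alias,
        "vectors": vectors,
        "store_data": store_data,
    }


# All values below are placeholders chosen for illustration only.
config = segram_config(
    grammar="rulebased",
    preprocess=["lemmatizer", "merger"],
    vectors="en_core_web_lg",
)

# Assuming spacy and segram are installed and the component is registered
# under the factory name "segram", the dictionary could be used like this:
#   import spacy
#   nlp = spacy.load("en_core_web_sm")
#   nlp.add_pipe("segram", config=config)
```

Keeping the configuration in one dictionary makes it easy to see which defaults (`alias`, `store_data`) are left untouched and which values are overridden.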
- property id: int
Hash id of the component.
- static set_docattrs(doc: Doc, alias: str, meta: dict[str, Any]) -> None
Set document attributes.
- static import_extensions(grammar: str, lang: str, alias: str) -> SpacyExtensions
Import NLP module from grammar label and language code.
- Returns:
SpacyExtensions instance.
- Return type:
extensions
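Importing a module "from grammar label and language code" is a common dynamic-import pattern. The sketch below shows the general idea with `importlib`. It is an assumption about the technique, not segram's actual implementation: the function `import_by_label` and the dotted-path layout `<package>.<grammar>.<lang>` are hypothetical, and a standard-library package stands in for the real one so that the example runs without segram installed.

```python
from importlib import import_module
from types import ModuleType


def import_by_label(package: str, grammar: str, lang: str) -> ModuleType:
    """Import the module '<package>.<grammar>.<lang>' (hypothetical layout)."""
    return import_module(f"{package}.{grammar}.{lang}")


# Stand-in call using a standard-library module path; with segram the
# arguments would be its own package name, a grammar label and a language code.
mod = import_by_label("email", "mime", "text")
```

Resolving the module at call time lets one component serve several grammar and language combinations without importing all of them up front.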