segram.grammar.phrases module

class segram.grammar.phrases.Phrase(*args: Any, **kwds: Any)

Bases: TokenElement

Sentence phrase class.

All grammar classes must be defined as slots classes. This is necessary to ensure a low memory footprint and better computational efficiency. Even classes with no new slots need to declare __slots__ = (). This requirement is checked during class construction. Other class-specific requirements of this sort, as well as their validation checks, may be implemented on specialized grammar classes using the standard __init_subclass__ interface. This allows abstract base classes further down the inheritance chain to check for more complex requirements and to apply dynamic class customizations.

dep

Dependency relative to the (main) parent.

sconj

Subordinating conjunction token.
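The slots requirement and its check at class-construction time can be illustrated with a minimal, generic sketch of the __init_subclass__ technique. The class names SlotsEnforced, Element and Marker are hypothetical and not part of segram, and segram's actual check may differ in detail.

```python
from typing import Any


class SlotsEnforced:
    """Hypothetical base class that rejects subclasses without their own __slots__."""

    __slots__ = ()

    def __init_subclass__(cls, **kwds: Any) -> None:
        super().__init_subclass__(**kwds)
        # Look only at the class's own namespace: an inherited __slots__
        # does not count, so every subclass must declare one explicitly.
        if "__slots__" not in cls.__dict__:
            raise TypeError(f"{cls.__name__} must declare __slots__ (use () if empty)")


class Element(SlotsEnforced):
    __slots__ = ("dep", "sconj")


class Marker(Element):
    __slots__ = ()  # no new slots, but the declaration is still required


try:
    class Broken(Element):  # no __slots__, so it is rejected at class creation
        pass
except TypeError as exc:
    print(exc)  # Broken must declare __slots__ (use () if empty)
```

Because every class in the chain declares __slots__, instances of Marker carry no per-instance __dict__, which is where the memory savings come from.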

__init__(tok: Token, *, dep: Dep = Dep.misc, sconj: Token | None = None, lead: int | None = None) → None

Initialization method.

Parameters:

lead – Index of the lead phrase in a conjunct group.

property idx: int

Index of the head token.

property head: Component

Head component of the phrase.

property lead: Self

Lead phrase.

property is_lead: bool

Whether the phrase is a lead phrase.

property tokens: tuple[Token, ...]

Sequence of tokens of the element.

property data: dict[str, Any]

Dictionary mapping names and values for main slots.

property children: PhraseGroup[Phrase]

Child phrases.

property parents: PhraseGroup[Phrase]

Parent phrases.

property subdag: PhraseGroup[Phrase]

Phrasal proper subdag.

property supdag: PhraseGroup[Phrase]

Phrasal proper superdag.

property depth: int

Depth of the phrase within the phrasal tree of the sentence.

property conjuncts: Conjuncts

Conjoined phrases.

property group: Conjuncts

Group of self and its conjoined phrases.
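The relationship between lead, is_lead, conjuncts and group can be sketched as follows, assuming (as the description of the lead parameter of __init__ suggests) that a phrase stores the index of the lead phrase within its conjunct group. ConjPhrase and conjoin are hypothetical helpers, and plain tuples stand in for PhraseGroup and Conjuncts.

```python
from __future__ import annotations


class ConjPhrase:
    """Hypothetical stand-in for a phrase that belongs to a conjunct group."""

    __slots__ = ("text", "members", "lead_idx")

    def __init__(self, text: str) -> None:
        self.text = text
        self.members = (self,)  # a lone phrase forms a group of one
        self.lead_idx = 0

    @property
    def group(self) -> tuple[ConjPhrase, ...]:
        # Self together with its conjoined phrases.
        return self.members

    @property
    def lead(self) -> ConjPhrase:
        return self.members[self.lead_idx]

    @property
    def is_lead(self) -> bool:
        return self.lead is self

    @property
    def conjuncts(self) -> tuple[ConjPhrase, ...]:
        # Conjoined phrases, excluding self.
        return tuple(p for p in self.members if p is not self)


def conjoin(*phrases: ConjPhrase, lead: int = 0) -> None:
    """Link phrases into one conjunct group with the given lead index."""
    for p in phrases:
        p.members = phrases
        p.lead_idx = lead


cats, dogs = ConjPhrase("cats"), ConjPhrase("dogs")
conjoin(cats, dogs)
print(dogs.lead.text, dogs.is_lead)  # cats False
```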

property verb: PhraseGroup[Phrase]

Self (as a group) if the phrase is a verb phrase, or an empty group otherwise.

property subj: PhraseGroup[Phrase]

Subject phrases.

property dobj: PhraseGroup[Phrase]

Direct object phrases.

property iobj: PhraseGroup[Phrase]

Indirect object phrases.

property desc: PhraseGroup[Phrase]

Description phrases.

property cdesc: PhraseGroup[Phrase]

Clausal descriptions.

property adesc: PhraseGroup[Phrase]

Adjectival complement descriptions.

property prep: PhraseGroup[Phrase]

Prepositions.

property pobj: PhraseGroup[Phrase]

Prepositional objects.

property subcl: PhraseGroup[Phrase]

Subclauses.

property relcl: PhraseGroup[Phrase]

Relative clauses.

property xcomp: PhraseGroup[Phrase]

Open clausal complements.

property appos: PhraseGroup[Phrase]

Appositional modifiers.

property nmod: PhraseGroup[Phrase]

Nominal modifiers.

property vector: ndarray[tuple[int], floating]

Word vector.

abstract classmethod governs(comp: Component) → bool

Check whether this phrase class may govern comp.

iter_subdag(*, skip: int = 0) → DataIterator[Self]

Iterate over the phrasal subtree, omitting the first skip items.

Each phrase is emitted only the first time it is reached during the depth-first search.

iter_supdag(*, skip: int = 0) → DataIterator[Self]

Iterate over the phrasal supertree, omitting the first skip items.

Each phrase is emitted only the first time it is reached during the depth-first search.
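The traversal described for iter_subdag and iter_supdag (depth-first, each node emitted only on first visit, first skip items dropped) can be sketched generically over any DAG. This is a minimal sketch, not segram's implementation: iter_dag is a hypothetical helper, the neighbors callable plays the role of "children" (subdag) or "parents" (supdag), and it assumes that skip=0 includes the starting phrase itself, so that skip=1 yields the proper subdag or supdag.

```python
from __future__ import annotations

from collections.abc import Callable, Hashable, Iterator, Sequence
from typing import TypeVar

T = TypeVar("T", bound=Hashable)


def iter_dag(
    start: T,
    neighbors: Callable[[T], Sequence[T]],
    *,
    skip: int = 0,
) -> Iterator[T]:
    """Pre-order depth-first traversal emitting each node only once.

    In a DAG a node may be reachable along several paths; it is yielded only
    the first time it is reached. The first ``skip`` visited nodes are dropped.
    """
    seen: set[T] = set()
    stack: list[T] = [start]
    visited = 0
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # already reached via another path
        seen.add(node)
        if visited >= skip:
            yield node
        visited += 1
        # Push neighbors in reverse so that the first one is visited first.
        stack.extend(reversed(neighbors(node)))


# Diamond-shaped DAG: a -> b -> d and a -> c -> d.
children = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
parents = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}

print(list(iter_dag("a", children.__getitem__)))          # ['a', 'b', 'd', 'c']
print(list(iter_dag("a", children.__getitem__, skip=1)))  # ['b', 'd', 'c']
print(list(iter_dag("d", parents.__getitem__, skip=1)))   # ['b', 'a', 'c']
```

Note that d is emitted once even though it is reachable from both b and c.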

dfs(subdag: bool = True) → DataTuple[DataTuple[Self]]

Depth-first search.

Parameters:

subdag – Whether the search should be performed in the subgraph direction (i.e. through the children).

similarity(*args: Any, **kwds: Any) → float

Structured similarity with respect to another phrase or a sentence.

All methods defined here are designed to ensure that:

  • Similarity of a phrase with respect to itself is 1.

  • Similarity is symmetric: x ~ y == y ~ x.

In some cases the above may hold only approximately due to accumulated floating-point imprecision.

Parameters:
  • element – Grammar phrase to compare.

  • spec – Specification against which the phrase is compared. It can be another phrase, a string or an iterable of strings, which should be single words. A single string is split at whitespace into multiple words. An averaged word vector over all words is then computed. Alternatively, a specification can be a dictionary mapping names of phrase parts or components (see segram.grammar.phrases.Phrase.part_names and segram.grammar.phrases.Phrase.component_names) to either strings or iterables of strings convertible to word vectors (as above) or to other phrases. Importantly, phrases can also be compared against segram.grammar.Sent and segram.grammar.Doc objects as long as they consist of a single sentence. See SentSimilarity for details.

  • method

    Method for calculating similarity between phrases:

    components

    Components are grouped in buckets by type (verbs, nouns, prepositions and descriptions), and averaged vectors are compared within each type. Finally, a weighted average (with weights defined by the weights parameter) is taken and rescaled by a factor shared / union, where shared is the number of types present in both elements and union is the total number of unique types among both of them. Thus, the final result is akin to a fuzzy Jaccard similarity:

    \[J = \frac{|A \cap B|}{|A \cup B|}\]

    phrases

    As above, but based on phrase parts and phrase head components. See segram.grammar.Phrase.part_names for a full list.

    both

    As above, but the buckets of the components and phrases methods are used together.

    average

    Simple averaged vectors computed over all component head tokens are compared. In this case, weights are ignored.

    recursive

    Note: currently not implemented. First, head components are compared between the two phrases, and then the same rule is applied recursively to all parts (subjects, direct objects, etc.), where for each type the elements of the two phrases are matched in pairs to maximize similarity. As previously, weights can be applied to different types and a Jaccard-like rescaling is applied. Additionally, the importance of nested phrases may be discounted using the decay_rate parameter by rescaling each weight by a factor of decay_rate**depth, where depth is calculated relative to the depth of self.phrase.

  • weights – Dictionary mapping phrase part or component names to arbitrary positive weights. The weights do not have to be normalized to sum to one.

  • decay_rate – Additional parameter used when method="recursive", which controls the rate at which contributions coming from nested subphrases are discounted.

  • only – List of part or component names to use exclusively. Cannot be combined with ignore.

  • ignore – List of part or component names to leave out. Cannot be combined with only.

Raises:

RuntimeError – If word vectors are not available.
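The "components" method and the string form of spec can be sketched with plain Python lists as vectors. This is a minimal illustration of the described scheme, not segram's PhraseSimilarity: the helper names (mean_vector, cosine, spec_vector, bucket_similarity) are hypothetical, and cosine similarity is an assumed choice of vector comparison, since the description above does not name one. Each element is represented as a mapping from bucket type to the vectors in that bucket.

```python
from __future__ import annotations

import math
from collections.abc import Iterable, Mapping, Sequence

Vector = Sequence[float]


def mean_vector(vectors: Iterable[Vector]) -> list[float]:
    """Element-wise average of equally sized vectors."""
    vecs = list(vectors)
    return [sum(xs) / len(vecs) for xs in zip(*vecs)]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def spec_vector(spec: str | Iterable[str], lookup: Mapping[str, Vector]) -> list[float]:
    """Average word vector of a string (split at whitespace) or of an iterable of words."""
    words = spec.split() if isinstance(spec, str) else list(spec)
    return mean_vector(lookup[w] for w in words)


def bucket_similarity(
    a: Mapping[str, Sequence[Vector]],
    b: Mapping[str, Sequence[Vector]],
    weights: Mapping[str, float] | None = None,
    *,
    only: Iterable[str] | None = None,
    ignore: Iterable[str] | None = None,
) -> float:
    """Weighted per-type similarity rescaled by shared / union (fuzzy Jaccard)."""
    if only is not None and ignore is not None:
        raise ValueError("'only' and 'ignore' cannot be used at the same time")
    weights = weights or {}
    keys_a = {k for k, vs in a.items() if vs}  # non-empty buckets only
    keys_b = {k for k, vs in b.items() if vs}
    if only is not None:
        keys_a &= set(only)
        keys_b &= set(only)
    if ignore is not None:
        keys_a -= set(ignore)
        keys_b -= set(ignore)
    shared, union = keys_a & keys_b, keys_a | keys_b
    if not shared:
        return 0.0
    total = sum(weights.get(k, 1.0) for k in shared)
    avg = sum(
        weights.get(k, 1.0) * cosine(mean_vector(a[k]), mean_vector(b[k]))
        for k in shared
    ) / total
    # Types present in only one element lower the score via shared / union.
    return avg * len(shared) / len(union)


x = {"nouns": [[1.0, 0.0]], "verbs": [[0.0, 1.0]]}
print(bucket_similarity(x, x))                      # 1.0
print(bucket_similarity(x, {"nouns": [[1.0, 0.0]]}))  # 0.5
```

The second call shows the rescaling: the noun buckets match perfectly, but only one of the two types is shared, so the score is halved.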

is_comparable_with(other: Any) → bool

Check whether self and other are comparable.

to_str(*, color: bool = False, only_head: bool = False, **kwds: Any) → str

Represent as a string.

iter_token_roles(*, bg: bool = False) → Iterable[tuple[Token, Role | None]]

Iterate over token-role pairs.

Parameters:

bg – Whether tokens should be marked as background tokens (e.g. as part of a subclause). This is used for graying out subclauses when printing.

classmethod from_component(comp: Component, **kwds: Any) → Self

Construct from a grammar component.

to_data() → dict[str, Any]

Serialize to a data dictionary.

classmethod from_data(doc: Doc, data: dict[str, Any]) → Self

Construct from a document and a data dictionary.
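Since from_data takes a Doc alongside the data dictionary, one plausible design is that serialized data refers to tokens by position and is resolved against the document on loading. A minimal round-trip sketch under that assumption follows; Tok and SerializablePhrase are hypothetical, a plain list stands in for Doc, and a string stands in for Dep.

```python
from __future__ import annotations

from typing import Any, NamedTuple


class Tok(NamedTuple):
    """Hypothetical token with its position in the document."""
    i: int
    text: str


class SerializablePhrase:
    """Hypothetical phrase that stores tokens but serializes only their indices."""

    __slots__ = ("tok", "dep")

    def __init__(self, tok: Tok, *, dep: str = "misc") -> None:
        self.tok = tok
        self.dep = dep

    def to_data(self) -> dict[str, Any]:
        # Replace the token object with its index so the dict is JSON-friendly.
        return {"tok": self.tok.i, "dep": self.dep}

    @classmethod
    def from_data(cls, doc: list[Tok], data: dict[str, Any]) -> SerializablePhrase:
        # Resolve the stored index back to a token of the given document.
        return cls(doc[data["tok"]], dep=data["dep"])


doc = [Tok(0, "Dogs"), Tok(1, "bark")]
data = SerializablePhrase(doc[1], dep="root").to_data()
print(data)  # {'tok': 1, 'dep': 'root'}
```

Storing indices rather than token objects keeps the dictionary small and serializable, at the cost of requiring the original document when deserializing.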

Similarity

alias of PhraseSimilarity

class segram.grammar.phrases.VerbPhrase(*args: Any, **kwds: Any)

Bases: Phrase

Abstract base class for verb phrases.

dep

Dependency relative to the (main) parent.

sconj

Subordinating conjunction token.

classmethod governs(comp: Component) → bool

Check whether this phrase class may govern comp.

class segram.grammar.phrases.NounPhrase(*args: Any, **kwds: Any)

Bases: Phrase

Abstract base class for noun phrases.

dep

Dependency relative to the (main) parent.

sconj

Subordinating conjunction token.

classmethod governs(comp: Component) → bool

Check whether this phrase class may govern comp.

class segram.grammar.phrases.DescPhrase(*args: Any, **kwds: Any)

Bases: Phrase

Abstract base class for descriptive phrases.

dep

Dependency relative to the (main) parent.

sconj

Subordinating conjunction token.

classmethod governs(comp: Component) → bool

Check whether this phrase class may govern comp.

class segram.grammar.phrases.PrepPhrase(*args: Any, **kwds: Any)

Bases: Phrase

Abstract base class for prepositional phrases.

dep

Dependency relative to the (main) parent.

sconj

Subordinating conjunction token.

classmethod governs(comp: Component) → bool

Check whether this phrase class may govern comp.
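Each phrase subclass overrides the abstract governs classmethod declared on Phrase. One plausible use of such a hook is dispatching construction to the first subclass whose governs returns True. The sketch below shows that pattern with hypothetical names (BasePhrase, Verbal, Nominal, Comp) and spaCy-style part-of-speech tags as an assumed criterion; it is not necessarily how segram's from_component works.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import NamedTuple


class Comp(NamedTuple):
    """Hypothetical stand-in for a grammar component."""
    text: str
    pos: str


class BasePhrase(ABC):
    __slots__ = ("head",)

    def __init__(self, head: Comp) -> None:
        self.head = head

    @classmethod
    @abstractmethod
    def governs(cls, comp: Comp) -> bool:
        """Check whether this phrase class may govern ``comp``."""

    @classmethod
    def from_component(cls, comp: Comp) -> BasePhrase:
        # Instantiate the first direct subclass willing to govern the component.
        for sub in cls.__subclasses__():
            if sub.governs(comp):
                return sub(comp)
        raise ValueError(f"no phrase class governs {comp!r}")


class Verbal(BasePhrase):
    __slots__ = ()

    @classmethod
    def governs(cls, comp: Comp) -> bool:
        return comp.pos in {"VERB", "AUX"}


class Nominal(BasePhrase):
    __slots__ = ()

    @classmethod
    def governs(cls, comp: Comp) -> bool:
        return comp.pos in {"NOUN", "PROPN", "PRON"}


phrase = BasePhrase.from_component(Comp("runs", "VERB"))
print(type(phrase).__name__)  # Verbal
```

Stacking classmethod over abstractmethod keeps BasePhrase itself non-instantiable while letting each subclass answer governs without creating an instance first.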