segram.utils.misc module

segram.utils.misc.cosine_similarity(X: ndarray[tuple[int] | tuple[int, int], floating], Y: ndarray[tuple[int] | tuple[int, int], floating], *, aligned: bool = False, nans_as_zeros: bool = True) → float | ndarray[tuple[int, ...], floating]

Compute cosine similarity between two vectors or between two sets of vectors.

When 2D arrays are passed, the vectors are assumed to be arranged in rows.

Parameters:
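  • X – A single vector (1D array) or an array of vectors (2D array, one vector per row).

  • Y – A single vector (1D array) or an array of vectors (2D array, one vector per row).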

  • aligned – If True, X and Y must be 2D arrays of the same shape, and similarities are computed row by row (row i of X with row i of Y).

  • nans_as_zeros – Whether NaN values arising from zero-norm vectors should be interpreted as zero similarities.
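The behaviour described above can be illustrated with a minimal NumPy sketch. It is not segram's implementation: the function name cosine_similarity_sketch is hypothetical, and the choices that when aligned is False all row pairs are compared and that the axis of a 1D input is dropped from the result are assumptions made for illustration.

```python
import numpy as np


def cosine_similarity_sketch(X, Y, *, aligned=False, nans_as_zeros=True):
    """Hypothetical sketch of cosine similarity with the documented options."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if aligned:
        # Row by row: similarity of X[i] with Y[i] only.
        if X.ndim != 2 or X.shape != Y.shape:
            raise ValueError("aligned=True requires 2D arrays of the same shape")
        num = np.einsum("ij,ij->i", X, Y)
        den = np.linalg.norm(X, axis=1) * np.linalg.norm(Y, axis=1)
    else:
        # All pairs (assumption): a 1D input is treated as a single row.
        X2, Y2 = np.atleast_2d(X), np.atleast_2d(Y)
        num = X2 @ Y2.T
        den = np.outer(np.linalg.norm(X2, axis=1), np.linalg.norm(Y2, axis=1))
    # A zero-norm vector gives 0 / 0 = NaN; silence the warning, handle it below.
    with np.errstate(invalid="ignore", divide="ignore"):
        sim = num / den
    if nans_as_zeros:
        sim = np.where(np.isnan(sim), 0.0, sim)
    if not aligned:
        # Drop the axis that was added for a 1D input.
        if X.ndim == 1:
            sim = sim[0]
        if Y.ndim == 1:
            sim = sim[..., 0]
    # Two 1D vectors give a plain float; otherwise an array is returned.
    return float(sim) if np.ndim(sim) == 0 else sim
```

For example, cosine_similarity_sketch(np.eye(2), np.ones((3, 2))) returns a 2 × 3 array of pairwise similarities, while passing two 1D vectors returns a single float.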

segram.utils.misc.stringify(obj: Any, **kwds: Any) → str

Convert obj to string.

If obj exposes a to_str() method, it is called with the keyword arguments passed in **kwds. Otherwise, the plain __repr__() is used.
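A minimal sketch of this dispatch rule follows. The names stringify_sketch and Phrase are hypothetical, and the sketch assumes that **kwds are simply ignored when falling back to repr().

```python
from typing import Any


def stringify_sketch(obj: Any, **kwds: Any) -> str:
    """Hypothetical sketch: prefer obj.to_str(**kwds), else fall back to repr()."""
    to_str = getattr(obj, "to_str", None)
    if callable(to_str):
        return to_str(**kwds)
    # Assumption: keyword arguments are ignored in the repr() fallback.
    return repr(obj)


class Phrase:
    """Hypothetical class exposing a to_str() method with an option."""

    def __init__(self, text: str) -> None:
        self.text = text

    def to_str(self, *, upper: bool = False) -> str:
        return self.text.upper() if upper else self.text
```

With these definitions, stringify_sketch(Phrase("big dog"), upper=True) gives "BIG DOG", while stringify_sketch([1, 2]) gives "[1, 2]" via repr().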

segram.utils.misc.ensure_cpu_vectors(vocab: Vocab | Any) → None

Ensure that word vectors are stored on CPU.

Parameters:

vocab – Vocabulary object. If an arbitrary object is passed, an attempt is made to retrieve its .vocab attribute.
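One way such a helper could work, assuming a spaCy v3 Vocab, is to resolve the vocabulary from the passed object and hand the vectors thinc's NumPy (CPU) ops through spaCy's Vectors.to_ops(). This is a sketch, not segram's code: the names resolve_vocab and ensure_cpu_vectors_sketch and the optional ops argument (added so the logic can be exercised without thinc) are hypothetical.

```python
from typing import Any, Optional


def resolve_vocab(obj: Any) -> Any:
    """Hypothetical helper: use obj.vocab if it exists, else treat obj as the vocab."""
    return getattr(obj, "vocab", obj)


def ensure_cpu_vectors_sketch(vocab: Any, ops: Optional[Any] = None) -> None:
    """Hypothetical sketch: move the word vectors to CPU (NumPy) ops."""
    if ops is None:
        # thinc ships with spaCy; NumpyOps keeps arrays in host (CPU) memory.
        from thinc.api import NumpyOps

        ops = NumpyOps()
    # spaCy v3 Vectors.to_ops() converts the embedding table to the given ops.
    resolve_vocab(vocab).vectors.to_ops(ops)
```

With spaCy installed, ensure_cpu_vectors_sketch(nlp) and ensure_cpu_vectors_sketch(nlp.vocab) would act on the same vectors table, since a Language object exposes .vocab.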

segram.utils.misc.prefer_gpu_vectors(vocab: Vocab | Any, device_id: int | None = None) → bool

Store word vectors on GPU if possible.

Parameters:
  • vocab – Vocabulary object. If an arbitrary object is passed, an attempt is made to retrieve its .vocab attribute.

  • device_id – GPU device id. If None, the default device is used (typically the one with id 0).

Returns:

Whether the vectors were successfully moved to GPU.

Return type:

bool
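A possible shape for this behaviour, assuming a spaCy v3 Vocab, is sketched below with thinc's prefer_gpu() and get_current_ops() and spaCy's Vectors.to_ops(). It is not segram's implementation: the names are hypothetical, mapping device_id=None to 0 is a simplification, and note that thinc's prefer_gpu() switches the process-wide current ops, which the real function may avoid.

```python
from typing import Any, Optional


def resolve_vocab(obj: Any) -> Any:
    """Hypothetical helper: use obj.vocab if it exists, else treat obj as the vocab."""
    return getattr(obj, "vocab", obj)


def prefer_gpu_vectors_sketch(vocab: Any, device_id: Optional[int] = None) -> bool:
    """Hypothetical sketch: move word vectors to GPU ops if a GPU is usable."""
    try:
        from thinc.api import get_current_ops, prefer_gpu
    except ImportError:
        return False  # thinc (and hence GPU ops) unavailable
    # Simplification: treat None as device 0. On success prefer_gpu() also
    # changes thinc's global current ops, not only the ops of these vectors.
    if not prefer_gpu(0 if device_id is None else device_id):
        return False  # no usable GPU (e.g. CuPy missing)
    resolve_vocab(vocab).vectors.to_ops(get_current_ops())
    return True
```

On a machine without a supported GPU the sketch leaves the vectors untouched and returns False, matching the documented bool return value.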