Namespace Matrix

namespace matrix

Functions

void findBestNOccurrences(const af::array &q, const af::array &t, long n, af::array &distances, af::array &indexes)

Calculates the N best matches of several queries in several time series.

The result has the following structure:

  • 1st dimension corresponds to the nth best match.
  • 2nd dimension corresponds to the number of queries.
  • 3rd dimension corresponds to the number of time series.

For example, the distance in the position (1, 2, 3) corresponds to the second best distance of the third query in the fourth time series. The index in the position (1, 2, 3) is the is the index of the subsequence which leads to the second best distance of the third query in the fourth time series.

Parameters
  • q: Array whose first dimension is the length of the query time series and the second dimension is the number of queries.
  • t: Array whose first dimension is the length of the time series and the second dimension is the number of time series.
  • n: Number of matches to return.
  • distances: Resulting distances.
  • indexes: Resulting indexes.

void mass(const af::array &q, const af::array &t, af::array &distances)

Mueen’s Algorithm for Similarity Search.

The result has the following structure:

  • 1st dimension corresponds to the index of the subsequence in the time series.
  • 2nd dimension corresponds to the number of queries.
  • 3rd dimension corresponds to the number of time series.

For example, the distance in the position (1, 2, 3) correspond to the distance of the third query to the fourth time series for the second subsequence in the time series.

[1] Chin-Chia Michael Yeh, Yan Zhu, Liudmila Ulanova, Nurjahan Begum, Yifei Ding, Hoang Anh Dau, Diego Furtado Silva, Abdullah Mueen, Eamonn Keogh (2016). Matrix Profile I: All Pairs Similarity Joins for Time Series: A Unifying View that Includes Motifs, Discords and Shapelets. IEEE ICDM 2016.

Parameters
  • q: Array whose first dimension is the length of the query time series and the second dimension is the number of queries.
  • t: Array whose first dimension is the length of the time series and the second dimension is the number of time series.
  • distances: Resulting distances.

void findBestNMotifs(const af::array &profile, const af::array &index, long m, long n, af::array &motifs, af::array &motifsIndices, af::array &subsequenceIndices, bool selfJoin = false)

This function extracts the best N motifs from a previously calculated matrix profile.

Parameters
  • profile: The matrix profile containing the minimum distance of each subsequence.
  • index: The matrix profile index containing where each minimum occurs.
  • m: Subsequence length value used to calculate the input matrix profile.
  • n: Number of motifs to extract.
  • motifs: The distance of the best N motifs.
  • motifsIndices: The indices of the best N motifs.
  • subsequenceIndices: The indices of the query sequences that produced the minimum reported in the motifs output array.
  • selfJoin: Indicates whether the input profile comes from a self join operation or not. It determines whether the mirror similar region is included in the output or not.

void findBestNDiscords(const af::array &profile, const af::array &index, long m, long n, af::array &discords, af::array &discordsIndices, af::array &subsequenceIndices, bool selfJoin = false)

This function extracts the best N discords from a previously calculated matrix profile.

Parameters
  • profile: The matrix profile containing the minimum distance of each subsequence.
  • index: The matrix profile index containing where each minimum occurs.
  • m: Subsequence length value used to calculate the input matrix profile.
  • n: Number of discords to extract.
  • discords: The distance of the best N discords.
  • discordsIndices: The indices of the best N discords.
  • subsequenceIndices: The indices of the query sequences that produced the discords reported in the discords output array.
  • selfJoin: Indicates whether the input profile comes from a self join operation or not. It determines whether the mirror similar region is included in the output or not.

void stomp(const af::array &ta, const af::array &tb, long m, af::array &profile, af::array &index)

STOMP algorithm to calculate the matrix profile between ‘ta’ and ‘tb’ using a subsequence length of ‘m’.

[1] Yan Zhu, Zachary Zimmerman, Nader Shakibay Senobari, Chin-Chia Michael Yeh, Gareth Funning, Abdullah Mueen, Philip Brisk and Eamonn Keogh (2016). Matrix Profile II: Exploiting a Novel Algorithm and GPUs to break the one Hundred Million Barrier for Time Series Motifs and Joins. IEEE ICDM 2016.

Parameters
  • ta: Query time series.
  • tb: Reference time series.
  • m: Subsequence length.
  • profile: The matrix profile, which reflects the distance to the closer element of the subsequence from ‘ta’ in ‘tb’.
  • index: The matrix profile index, which points to where the aforementioned minimum is located.

void stomp(const af::array &t, long m, af::array &profile, af::array &index)

STOMP algorithm to calculate the matrix profile between ‘t’ and itself using a subsequence length of ‘m’. This method filters the trivial matches.

[1] Yan Zhu, Zachary Zimmerman, Nader Shakibay Senobari, Chin-Chia Michael Yeh, Gareth Funning, Abdullah Mueen, Philip Brisk and Eamonn Keogh (2016). Matrix Profile II: Exploiting a Novel Algorithm and GPUs to break the one Hundred Million Barrier for Time Series Motifs and Joins. IEEE ICDM 2016.

Parameters
  • t: Query and reference time series.
  • m: Subsequence length.
  • profile: The matrix profile, which reflects the distance to the closer element of the subsequence from ‘t’ in a different location of itself.
  • index: The matrix profile index, which points to where the aforementioned minimum is located.

void matrixProfile(const af::array &tss, long m, af::array &profile, af::array &index)

Calculates the matrix profile between ‘t’ and itself using a subsequence length of ‘m’. This method filters the trivial matches.

[1] Yan Zhu, Zachary Zimmerman, Nader Shakibay Senobari, Chin-Chia Michael Yeh, Gareth Funning, Abdullah Mueen, Philip Brisk and Eamonn Keogh (2016). Matrix Profile II: Exploiting a Novel Algorithm and GPUs to break the one Hundred Million Barrier for Time Series Motifs and Joins. IEEE ICDM 2016.

Parameters
  • tss: Query time series.
  • m: Subsequence length.
  • profile: The matrix profile, which reflects the distance to the closer element of the subsequence from ‘ta’ in ‘tb’.
  • index: The matrix profile index, which points to where the aforementioned minimum is located.

void matrixProfile(const af::array &ta, const af::array &tb, long m, af::array &profile, af::array &index)

Calculates the matrix profile between ‘ta’ and ‘tb’ using a subsequence length of ‘m’.

[1] Yan Zhu, Zachary Zimmerman, Nader Shakibay Senobari, Chin-Chia Michael Yeh, Gareth Funning, Abdullah Mueen, Philip Brisk and Eamonn Keogh (2016). Matrix Profile II: Exploiting a Novel Algorithm and GPUs to break the one Hundred Million Barrier for Time Series Motifs and Joins. IEEE ICDM 2016.

Parameters
  • ta: Query and reference time series.
  • tb: Query and reference time series.
  • m: Subsequence length.
  • profile: The matrix profile, which reflects the distance to the closer element of the subsequence from ‘t’ in a different location of itself.
  • index: The matrix profile index, which points to where the aforementioned minimum is located.

void matrixProfileLR(const af::array &tss, long m, af::array &profileLeft, af::array &indexLeft, af::array &profileRight, af::array &indexRight)

Calculates the matrix profile to the left and to the right between ‘t’ and using a subsequence length of ‘m’.

[1] Yan Zhu, Makoto Imamura, Daniel Nikovski, and Eamonn Keogh. Matrix Profile VII: Time Series Chains: A New Primitive for Time Series Data Mining. IEEE ICDM 2017

Notice that when there is no match the subsequence index is the length of tss.

Parameters
  • tss: Time series to compute the matrix profile.
  • m: Subsequence length.
  • profileLeft: The matrix profile distance to the left.
  • indexLeft: The subsequence index of the matrix profile to the left.
  • profileRight: The matrix profile distance to the right.
  • indexRight: The subsequence index of the matrix profile to the right.

void getChains(const af::array &tss, long m, af::array &chains)

Calculates all the chains within ‘tss’ using a subsequence length of ‘m’.

[1] Yan Zhu, Makoto Imamura, Daniel Nikovski, and Eamonn Keogh. Matrix Profile VII: Time Series Chains: A New Primitive for Time Series Data Mining. IEEE ICDM 2017

Notice that the size of the first dimension is the maximum possible size which is n - m + 1. If the number of values belonging to a chain is lower than the maximum, the remaining values and indexes are 0. It implies that 0 is an invalid chain index.

Parameters
  • tss: Time series to compute the chains within them.
  • m: Subsequence length.
  • chains: The calculated chains with the following topology:
    • 1st dimension corresponds to the chains indexes flattened.
    • 2nd dimension:
      • [0] corresponds to all the indexes in the chains flattened
      • [1] corresponds to the index of the chain that the value in [0] belongs to.
    • 3rd dimension corresponds to the number of time series.

namespace internal

Typedefs

using khiva::matrix::internal::DistancesVector = typedef std::vector<double>
using khiva::matrix::internal::IndexesVector = typedef std::vector<unsigned int>
using khiva::matrix::internal::MatrixProfilePair = typedef std::pair<DistancesVector, IndexesVector>
using khiva::matrix::internal::LeftRightProfilePair = typedef std::pair<MatrixProfilePair, MatrixProfilePair>
using khiva::matrix::internal::Chain = typedef std::vector<unsigned int>
using khiva::matrix::internal::ChainVector = typedef std::vector<Chain>

Functions

af::array slidingDotProduct(const af::array &q, const af::array &t)

Calculates the sliding dot product of the time series ‘q’ against t.

Return
array Returns an array with as many elements as ‘t’ in the first dimension and as many elements as the last dimension of ‘q’ in the last dimension.
Parameters
  • q: Array whose first dimension is the length of the query time series and the last dimension is the number of time series to calculate.
  • t: Array with the second time series in the first dimension.

void meanStdev(const af::array &t, af::array &a, long m, af::array &mean, af::array &stdev)

Calculates the moving average and standard deviation of the time series ‘t’.

Parameters
  • t: Input time series. Multiple time series.
  • a: Auxiliary array to be used in the function calculateDistanceProfile. Use the overloaded method without this parameter.
  • m: Window size.
  • mean: Output array containing the moving average.
  • stdev: Output array containing the moving standard deviation.

void meanStdev(const af::array &t, long m, af::array &mean, af::array &stdev)

Calculates the moving average and standard deviation of the time series ‘t’.

Parameters
  • t: Input time series. Multiple time series.
  • m: Window size.
  • mean: Output array containing the moving average.
  • stdev: Output array containing the moving standard deviation.

void calculateDistances(const af::array &qt, const af::array &a, const af::array &sum_q, const af::array &sum_q2, const af::array &mean_t, const af::array &sigma_t, const af::array &mask, af::array &distances)

Calculates the distance between ‘q’ and the time series ‘t’, which produced the sliding. Multiple queries can be computed simultaneously in the last dimension of ‘q’.

Parameters
  • qt: The sliding dot product of ‘q’ and ‘t’.
  • a: Auxiliary array computed using the meanStdev function. This array contains a precomputed fixed value to speed up the distance calculation.
  • sum_q: Sum of the values contained in ‘q’.
  • sum_q2: Sum of squaring the values contained in ‘q’.
  • mean_t: Moving average of ‘t’ using a window size equal to the number of elements in ‘q’.
  • sigma_t: Moving standard deviation of ‘t’ using a window size equal to the number of elements in ‘q’.
  • mask: Mask band matrix to filter the trivial match of a subsequence with itself.
  • distances: Resulting distances.

void calculateDistances(const af::array &qt, const af::array &a, const af::array &sum_q, const af::array &sum_q2, const af::array &mean_t, const af::array &sigma_t, af::array &distances)

Calculates the distance between ‘q’ and the time series ‘t’, which produced the sliding. Multiple queries can be computed simultaneously in the last dimension of ‘q’.

Parameters
  • qt: The sliding dot product of ‘q’ and ‘t’.
  • a: Auxiliary array computed using the meanStdev function. This array contains a precomputed fixed value to speed up the distance calculation.
  • sum_q: Sum of the values contained in ‘q’.
  • sum_q2: Sum of squaring the values contained in ‘q’.
  • mean_t: Moving average of ‘t’ using a window size equal to the number of elements in ‘q’.
  • sigma_t: Moving standard deviation of ‘t’ using a window size equal to the number of elements in ‘q’.
  • distances: Resulting distances.

bool tileIsFarFromDiagonal(long bandSize, long numRows, long row, long numColumns, long column)

Given a tile indices and sizes it returns true when tile would not be affected by a identity band matrix.

Return
If it is far or not.
Parameters
  • bandSize: The band size.
  • numRows: Number of rows of the tile.
  • row: Starting row of the tile.
  • numColumns: Number of columns of the tile.
  • column: Starting column of the tile.

af::array generateMask(long m, long numRows, long row, long numColumns, long column, long nTimeSeries = 1)

Generate an identity band matrix for a given tile indices.

Return
The mask.
Parameters
  • m: The query size.
  • numRows: Number of rows of the tile.
  • row: Starting row of the tile.
  • numColumns: Number of columns of the tile.
  • column: Starting column of the tile.
  • nTimeSeries: Number of time series.

void massWithMask(af::array q, const af::array &t, const af::array &a, const af::array &mean_t, const af::array &sigma_t, const af::array &mask, af::array &distances)

Calculates the Mueen distance.

[1] Chin-Chia Michael Yeh, Yan Zhu, Liudmila Ulanova, Nurjahan Begum, Yifei Ding, Hoang Anh Dau, Diego Furtado Silva, Abdullah Mueen, Eamonn Keogh (2016). Matrix Profile I: All Pairs Similarity Joins for Time Series: A Unifying View that Includes Motifs, Discords and Shapelets. IEEE ICDM 2016.

Parameters
  • q: Array whose first dimension is the length of the query time series and the last dimension is the number of time series to calculate.
  • t: Array with the second time series in the first dimension.
  • a: Auxiliary array computed using the meanStdev function. This array contains a precomputed fixed value to speed up the distance calculation.
  • mean_t: Moving average of ‘t’ using a window size equal to the number of elements in ‘q’.
  • sigma_t: Moving standard deviation of ‘t’ using a window size equal to the number of elements in ‘q’.
  • mask: Specifies the elements that should not be considered in the computation.
  • distances: Resulting distances.

void mass(af::array q, const af::array &t, const af::array &a, const af::array &mean_t, const af::array &sigma_t, af::array &distances)

Mueen’s Algorithm for Similarity Search.

Parameters
  • q: Array whose first dimension is the length of the query time series and the last dimension is the number of time series to calculate.
  • t: Array with the second time series in the first dimension.
  • a: Auxiliary array computed using the meanStdev function. This array contains a precomputed fixed value to speed up the distance calculation.
  • mean_t: Moving average of ‘t’ using a window size equal to the number of elements in ‘q’.
  • sigma_t: Moving standard deviation of ‘t’ using a window size equal to the number of elements in ‘q’.
  • distances: Resulting distances.

void stomp_batched(const af::array &ta, af::array tb, long m, long batch_size, af::array &profile, af::array &index)
void stomp_batched_two_levels(af::array ta, af::array tb, long m, long batch_size_b, long batch_size_a, af::array &profile, af::array &index)
void stomp_parallel(const af::array &ta, af::array tb, long m, af::array &profile, af::array &index)
void stomp_batched_two_levels(af::array t, long m, long batch_size_b, long batch_size_a, af::array &profile, af::array &index)
void stomp_parallel(af::array t, long m, af::array &profile, af::array &index)
void findBestN(const af::array &profile, const af::array &index, long m, long n, af::array &distance, af::array &indices, af::array &subsequenceIndices, bool selfJoin, bool lookForMotifs)
void scamp(af::array tss, long m, af::array &profile, af::array &index)
void scamp(af::array ta, af::array tb, long m, af::array &profile, af::array &index)
void getChains(af::array tss, long m, af::array &chains)
ChainVector extractAllChains(const IndexesVector &profileLeft, const IndexesVector &profileRight)
LeftRightProfilePair scampLR(std::vector<double> &&ta, long m)
void scampLR(af::array tss, long m, af::array &profileLeft, af::array &indexLeft, af::array &profileRight, af::array &indexRight)