Namespace Dimensionality

namespace dimensionality

Typedefs

using khiva::dimensionality::Point = std::pair<float, float>
using khiva::dimensionality::Segment = std::pair<int, int>

Functions

std::vector<Point> PAA(const std::vector<Point> &points, int bins)

Piecewise Aggregate Approximation (PAA) reduces a time series \(X\) of length \(n\) to a vector \(\bar{X}=(\bar{x}_{1},…,\bar{x}_{M})\) of arbitrary length \(M \leq n\), where each \(\bar{x}_{i}\) is calculated as follows:

\[ \bar{x}_{i} = \frac{M}{n} \sum_{j=\frac{n}{M}(i-1)+1}^{\frac{n}{M}i} x_{j}. \]
In other words, to reduce the dimensionality from \(n\) to \(M\), the original time series is first divided into \(M\) equally sized frames, and then the mean value of each frame is computed. The sequence assembled from the mean values is the PAA approximation (i.e., transform) of the original time series.

Return
result A vector of Points with the reduced dimensionality.
Parameters
  • points: Set of points.
  • bins: Sets the total number of divisions.
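The frame averaging described above can be sketched in a few lines of standalone C++. This is a minimal illustration, not Khiva's implementation: the name `paaSketch` is hypothetical, the input is a plain vector of values rather than a vector of Points, and the sketch assumes that the series length is an exact multiple of `bins`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative sketch only (hypothetical name, not part of Khiva).
// Reduces `x` (length n) to `bins` frame means (length M).
// Assumption: n is an exact multiple of bins.
std::vector<float> paaSketch(const std::vector<float> &x, int bins) {
    assert(bins > 0);
    const std::size_t m = static_cast<std::size_t>(bins);
    assert(!x.empty() && x.size() % m == 0);

    const std::size_t frame = x.size() / m;  // n / M samples per frame
    std::vector<float> out;
    out.reserve(m);
    for (std::size_t i = 0; i < m; ++i) {
        float sum = 0.0f;
        for (std::size_t j = i * frame; j < (i + 1) * frame; ++j) {
            sum += x[j];
        }
        // Dividing by n / M is the same as multiplying by M / n.
        out.push_back(sum / static_cast<float>(frame));
    }
    return out;
}
```

For instance, `paaSketch({1, 2, 3, 4, 5, 6}, 3)` yields the frame means 1.5, 3.5 and 5.5.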

af::array PAA(const af::array &a, int bins)

Piecewise Aggregate Approximation (PAA) reduces a time series \(X\) of length \(n\) to a vector \(\bar{X}=(\bar{x}_{1},…,\bar{x}_{M})\) of arbitrary length \(M \leq n\), where each \(\bar{x}_{i}\) is calculated as follows:

\[ \bar{x}_{i} = \frac{M}{n} \sum_{j=\frac{n}{M}(i-1)+1}^{\frac{n}{M}i} x_{j}. \]
In other words, to reduce the dimensionality from \(n\) to \(M\), the original time series is first divided into \(M\) equally sized frames, and then the mean value of each frame is computed. The sequence assembled from the mean values is the PAA approximation (i.e., transform) of the original time series.

Return
af::array An array of points with the reduced dimensionality.
Parameters
  • a: Set of points.
  • bins: Sets the total number of divisions.

af::array PIP(const af::array &ts, int numberIPs)

Calculates the numberIPs most Perceptually Important Points (PIP) in the time series.

[1] Fu TC, Chung FL, Luk R, and Ng CM. Representing financial time series based on data point importance. Engineering Applications of Artificial Intelligence, 21(2):277-300, 2008.

Return
af::array Array with the numberIPs most Perceptually Important Points.
Parameters
  • ts: Expects an input array whose dimension zero is the length of the time series.
  • numberIPs: The number of points to be returned.

std::vector<Point> PLABottomUp(const std::vector<Point> &ts, float maxError)

Applies the Piecewise Linear Approximation (PLA Bottom-Up) to the time series.

[1] Zhu Y, Wu D, Li Sh (2007). A Piecewise Linear Representation Method of Time Series Based on Feature Points. Knowledge-Based Intelligent Information and Engineering Systems 4693:1066-1072.

Return
std::vector Vector with the reduced number of points.
Parameters
  • ts: Expects an input vector containing the set of points to be reduced.
  • maxError: The maximum approximation error allowed.

af::array PLABottomUp(const af::array &ts, float maxError)

Applies the Piecewise Linear Approximation (PLA Bottom-Up) to the time series.

[1] Zhu Y, Wu D, Li Sh (2007). A Piecewise Linear Representation Method of Time Series Based on Feature Points. Knowledge-Based Intelligent Information and Engineering Systems 4693:1066-1072.

Return
af::array with the reduced number of points.
Parameters
  • ts: Expects an af::array containing the set of points to be reduced. The first column holds the first component of each point and the second column holds the second component.
  • maxError: The maximum approximation error allowed.

std::vector<Point> PLASlidingWindow(const std::vector<Point> &ts, float maxError)

Applies the Piecewise Linear Approximation (PLA Sliding Window) to the time series.

[1] Zhu Y, Wu D, Li Sh (2007). A Piecewise Linear Representation Method of Time Series Based on Feature Points. Knowledge-Based Intelligent Information and Engineering Systems 4693:1066-1072.

Return
std::vector Vector with the reduced number of points.
Parameters
  • ts: Expects an input vector containing the set of points to be reduced.
  • maxError: The maximum approximation error allowed.

af::array PLASlidingWindow(const af::array &ts, float maxError)

Applies the Piecewise Linear Approximation (PLA Sliding Window) to the time series.

[1] Zhu Y, Wu D, Li Sh (2007). A Piecewise Linear Representation Method of Time Series Based on Feature Points. Knowledge-Based Intelligent Information and Engineering Systems 4693:1066-1072.

Return
af::array with the reduced number of points.
Parameters
  • ts: Expects an af::array containing the set of points to be reduced. The first column holds the first component of each point and the second column holds the second component.
  • maxError: The maximum approximation error allowed.

std::vector<Point> ramerDouglasPeucker(const std::vector<Point> &pointList, double epsilon)

The Ramer–Douglas–Peucker algorithm (RDP) reduces the number of points in a curve that is approximated by a series of points. It discards points according to their perpendicular distance and epsilon: the greater epsilon is, the more points are removed.

[1] Urs Ramer, “An iterative procedure for the polygonal approximation of plane curves”, Computer Graphics and Image Processing, 1(3), 244–256 (1972) doi:10.1016/S0146-664X(72)80017-0.

[2] David Douglas & Thomas Peucker, “Algorithms for the reduction of the number of points required to represent a digitized line or its caricature”, The Canadian Cartographer 10(2), 112–122 (1973) doi:10.3138/FM57-6770-U75U-7727

Return
std::vector<khiva::dimensionality::Point> with the selected points.
Parameters
  • pointList: Set of input points.
  • epsilon: It acts as the threshold value to decide which points should be considered meaningful or not.
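The role of epsilon can be illustrated with a compact recursive sketch of the textbook Ramer–Douglas–Peucker procedure. It is not Khiva's implementation: `rdpSketch`, `rdpRange` and `perpendicularDistance` are hypothetical names, and the `Point` alias simply mirrors the typedef documented above. The sketch always keeps the first and last points, then keeps the farthest intermediate point of each range whenever its distance to the chord exceeds epsilon.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Mirrors the Point typedef documented above; all function names are hypothetical.
using Point = std::pair<float, float>;

// Perpendicular distance from p to the line through a and b.
double perpendicularDistance(const Point &p, const Point &a, const Point &b) {
    const double dx = static_cast<double>(b.first) - a.first;
    const double dy = static_cast<double>(b.second) - a.second;
    const double px = static_cast<double>(p.first) - a.first;
    const double py = static_cast<double>(p.second) - a.second;
    const double len = std::hypot(dx, dy);
    if (len == 0.0) {
        return std::hypot(px, py);  // a and b coincide
    }
    return std::fabs(dy * px - dx * py) / len;
}

// Marks the points to keep between indices first and last (both already kept).
void rdpRange(const std::vector<Point> &pts, std::size_t first, std::size_t last,
              double epsilon, std::vector<bool> &keep) {
    double maxDist = 0.0;
    std::size_t index = first;
    for (std::size_t i = first + 1; i < last; ++i) {
        const double d = perpendicularDistance(pts[i], pts[first], pts[last]);
        if (d > maxDist) {
            maxDist = d;
            index = i;
        }
    }
    if (maxDist > epsilon) {
        keep[index] = true;  // farthest point is meaningful: split and recurse
        rdpRange(pts, first, index, epsilon, keep);
        rdpRange(pts, index, last, epsilon, keep);
    }
}

std::vector<Point> rdpSketch(const std::vector<Point> &pts, double epsilon) {
    assert(epsilon >= 0.0);
    if (pts.size() < 3) {
        return pts;
    }
    std::vector<bool> keep(pts.size(), false);
    keep[0] = true;
    keep[pts.size() - 1] = true;
    rdpRange(pts, 0, pts.size() - 1, epsilon, keep);

    std::vector<Point> out;
    for (std::size_t i = 0; i < pts.size(); ++i) {
        if (keep[i]) {
            out.push_back(pts[i]);
        }
    }
    return out;
}
```

With the input (0,0), (1,5), (2,0), an epsilon of 1 keeps all three points because the middle point lies 5 units from the chord, whereas an epsilon of 10 keeps only the two endpoints.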

af::array ramerDouglasPeucker(const af::array &pointList, double epsilon)

The Ramer–Douglas–Peucker algorithm (RDP) reduces the number of points in a curve that is approximated by a series of points. It discards points according to their perpendicular distance and epsilon: the greater epsilon is, the more points are removed.

[1] Urs Ramer, “An iterative procedure for the polygonal approximation of plane curves”, Computer Graphics and Image Processing, 1(3), 244–256 (1972) doi:10.1016/S0146-664X(72)80017-0.

[2] David Douglas & Thomas Peucker, “Algorithms for the reduction of the number of points required to represent a digitized line or its caricature”, The Canadian Cartographer 10(2), 112–122 (1973) doi:10.3138/FM57-6770-U75U-7727

Return
af::array with the selected points.
Parameters
  • pointList: Set of input points.
  • epsilon: It acts as the threshold value to decide which points should be considered meaningful or not.

af::array SAX(const af::array &a, int alphabetSize)

Symbolic Aggregate approXimation (SAX) transforms a numeric time series into a time series of symbols with the same size. The algorithm was proposed by Lin et al. [1] and extends the PAA-based approach, inheriting the original algorithm's simplicity and low computational complexity while providing satisfactory sensitivity and selectivity in range query processing. Moreover, the symbolic representation opens the door to the existing wealth of data structures and string-manipulation algorithms in computer science, such as hashing, regular expressions, pattern matching, suffix trees, and grammatical inference.

[1] Lin, J., Keogh, E., Lonardi, S. & Chiu, B. (2003) A Symbolic Representation of Time Series, with Implications for Streaming Algorithms. In proceedings of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery. San Diego, CA. June 13.

Return
result An array of symbols.
Parameters
  • a: Array with the input time series.
  • alphabetSize: Number of elements within the alphabet.
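The symbol assignment can be sketched as follows. This is a rough illustration under several assumptions, not Khiva's implementation: the name `saxSketch` is hypothetical, the series is z-normalized before discretization (as in the formulation of Lin et al.), only alphabet sizes 2 to 5 are supported, and the breakpoints are the usual equiprobable Gaussian cut points rounded to two decimals. Symbols are returned as integers in the range 0 to alphabetSize - 1.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative sketch only (hypothetical name, not part of Khiva).
// Assumptions: z-normalization is applied first; alphabetSize is in [2, 5].
std::vector<int> saxSketch(const std::vector<float> &x, int alphabetSize) {
    assert(!x.empty());
    assert(alphabetSize >= 2 && alphabetSize <= 5);

    // Equiprobable breakpoints of the standard normal distribution (rounded).
    static const std::vector<std::vector<double>> breakpoints = {
        {0.0},
        {-0.43, 0.43},
        {-0.67, 0.0, 0.67},
        {-0.84, -0.25, 0.25, 0.84}};
    const std::vector<double> &cuts =
        breakpoints[static_cast<std::size_t>(alphabetSize - 2)];

    // Mean and (population) standard deviation of the series.
    double mean = 0.0;
    for (float v : x) {
        mean += v;
    }
    mean /= static_cast<double>(x.size());
    double var = 0.0;
    for (float v : x) {
        var += (v - mean) * (v - mean);
    }
    var /= static_cast<double>(x.size());
    const double sd = std::sqrt(var);

    std::vector<int> symbols;
    symbols.reserve(x.size());
    for (float v : x) {
        const double z = sd > 0.0 ? (v - mean) / sd : 0.0;
        // The symbol is the number of breakpoints that z reaches or exceeds.
        std::size_t s = 0;
        while (s < cuts.size() && z >= cuts[s]) {
            ++s;
        }
        symbols.push_back(static_cast<int>(s));
    }
    return symbols;
}
```

Under these assumptions, `saxSketch({-2, -1, 0, 1, 2}, 3)` maps the series to the symbols 0, 0, 1, 2, 2.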

std::vector<Point> visvalingam(const std::vector<Point> &pointList, int64_t numPoints, int64_t scale = 1000000000)

Reduces a set of points by applying the Visvalingam method (minimum triangle area) until the number of points is reduced to numPoints.

[1] M. Visvalingam and J. D. Whyatt, Line generalisation by repeated elimination of points, The Cartographic Journal, 1993.

Return
std::vector<khiva::dimensionality::Point> where the number of points has been reduced to numPoints.
Parameters
  • pointList: Expects an input vector of points.
  • numPoints: Sets the number of points returned after the execution of the method.
  • scale: Sets the precision used to compute the areas of the triangles; the larger the value, the more accurate the computation.
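The minimum-triangle-area elimination can be sketched with a simple quadratic-time loop. This is a minimal illustration, not Khiva's implementation: `visvalingamSketch` and `triangleArea` are hypothetical names, the `Point` alias mirrors the typedef documented above, areas are computed directly in double precision (so no `scale` parameter is needed), and the endpoints are never removed.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Mirrors the Point typedef documented above; all function names are hypothetical.
using Point = std::pair<float, float>;

// Area of the triangle formed by three points.
double triangleArea(const Point &a, const Point &b, const Point &c) {
    const double ax = a.first, ay = a.second;
    const double bx = b.first, by = b.second;
    const double cx = c.first, cy = c.second;
    return 0.5 * std::fabs(ax * (by - cy) + bx * (cy - ay) + cx * (ay - by));
}

// Repeatedly removes the interior point whose triangle with its two
// neighbours has the smallest area, until numPoints points remain.
std::vector<Point> visvalingamSketch(std::vector<Point> pts, std::size_t numPoints) {
    assert(numPoints >= 2);
    while (pts.size() > numPoints) {
        std::size_t minIndex = 1;
        double minArea = triangleArea(pts[0], pts[1], pts[2]);
        for (std::size_t i = 2; i + 1 < pts.size(); ++i) {
            const double area = triangleArea(pts[i - 1], pts[i], pts[i + 1]);
            if (area < minArea) {
                minArea = area;
                minIndex = i;
            }
        }
        pts.erase(pts.begin() + static_cast<std::ptrdiff_t>(minIndex));
    }
    return pts;
}
```

Rescanning all triangles after each removal keeps the sketch short; a production implementation would more likely keep the areas in a priority queue and update only the neighbours of the removed point.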

af::array visvalingam(const af::array &pointList, int numPoints)

Reduces a set of points by applying the Visvalingam method (minimum triangle area) until the number of points is reduced to numPoints.

[1] M. Visvalingam and J. D. Whyatt, Line generalisation by repeated elimination of points, The Cartographic Journal, 1993.

Return
af::array where the number of points has been reduced to numPoints.
Parameters
  • pointList: Expects an input array formed by two columns, where the first column is interpreted as the x coordinate of a point and the second column as the y coordinate.
  • numPoints: Sets the number of points returned after the execution of the method.