![icon](../../assets/method_icons/discord.png "icon")
# Discord-based methods

## Matrix Profile

Matrix Profile [Yeh et al. 2016, Zhu et al. 2016] is a discord-based method that represents time series as a matrix of closest neighbor distances. Compared to its predecessor, Matrix Profile proposed a new metadata time series computed effectively, capable of providing various valuable details about the examined time series, such as discords.

The MatrixProfile is computed using Mueen’s ultra-fast Algorithm for Similarity Search (MASS) [Mueen et al. 2017] that requires just O(nlog(n)) time by exploiting the Fast Fourier Transform (FFT) to calculate the dot products between the query and all the sub-sequences of the time series. Once these metadata are generated, retrieving the Top-k discord is possible by considering the maximum value of the Matrix Profile and ordering it, excluding the trivial matches (overlapping sub-sequences). Retrieving the sub-sequences with the shortest distance to their nearest neighbor (called motifs) is also possible. These sub-sequences correspond to a recurrent motif in the time series and can be useful in the anomaly search. 

The TSB-UAD implementation of MatrixProfile is wrapper of [Stumpy](https://stumpy.readthedocs.io/en/latest/index.html) implementation.

```{eval-rst}  
.. autoclass:: TSB_UAD.models.matrix_profile.MatrixProfile
    :members:

```

### Example

```python
import os
import numpy as np
import pandas as pd
from TSB_UAD.utils.visualisation import plotFig
from TSB_UAD.models.damp import DAMP
from TSB_UAD.models.feature import Window
from TSB_UAD.utils.slidingWindows import find_length
from TSB_UAD.vus.metrics import get_metrics

#Read data
filepath = 'PATH_TO_TSB_UAD/ECG/MBA_ECG805_data.out'
df = pd.read_csv(filepath, header=None).dropna().to_numpy()
name = filepath.split('/')[-1]

data = df[:,0].astype(float)
label = df[:,1].astype(int)

#Pre-processing
slidingWindow = find_length(data)

# Run MatrixProfile
modelName='MatrixProfile'
clf = MatrixProfile(window = slidingWindow)
clf.fit(data)
score = clf.decision_scores_

#Post-processing
score = MinMaxScaler(feature_range=(0,1)).fit_transform(score.reshape(-1,1)).ravel()
score = np.array([score[0]]*math.ceil((slidingWindow-1)/2) + list(score) + [score[-1]]*((slidingWindow-1)//2))

#Plot result
plotFig(data, label, score, slidingWindow, fileName=name, modelName=modelName)

#Print accuracy
results = get_metrics(score, label, metric="all", slidingWindow=slidingWindow)
for metric in results.keys():
    print(metric, ':', results[metric])
```
```
AUC_ROC : 0.7968186887782313
AUC_PR : 0.09205761752802392
Precision : 0.058823529411764705
Recall : 0.0297029702970297
F : 0.039473684210526314
Precision_at_k : 0.0297029702970297
Rprecision : 0.125
Rrecall : 0.09090909090909093
RF : 0.10526315789473685
R_AUC_ROC : 0.9531611224056705
R_AUC_PR : 0.4926688922361494
VUS_ROC : 0.9186620929224953
VUS_PR : 0.39033909329157723
Affiliation_Precision : 0.9015749833720904
Affiliation_Recall : 0.9720951147963328
```
![Result](../../assets/method_results/MP.png "MatrixProfile Result")

### References

* [Yeh et al. 2016] C. Yeh, Y. Zhu, L. Ulanova, N. Begum, Y. Ding, H. Dau, D. Silva, A. Mueen, and E. Keogh. 2016a. Matrix profile I: all pairs similarity joins for time series: A unifying view that includes motifs, discords and shapelets. In ICDM.

* [Zhu et al. 2016] Y. Zhu, Z. Zimmerman, N. S. Senobari, C.-C. M. Yeh, G. Funning, A. Mueen, P. Brisk, and E. Keogh. 2016a. Matrix profile ii: Exploiting a novel algorithm and gpus to break the one hundred million barrier for time series motifs and joins. In 2016 IEEE 16th international conference on data mining (ICDM), pp. 739–748. IEEE.

* [Mueen et al. 2017] A. Mueen, Y. Zhu, M. Yeh, K. Kamgar, K. Viswanathan, C. Gupta, and E. Keogh, August 2017. The fastest similarity search algorithm for time series subsequences under euclidean distance.

## DAMP

DAMP [Lu et al. 2022] is a discord-based method, and scalable matrix Profile-based approach proposed to solves the problem of multiple similar anomalies. Moreover, is able to work on online settings, and scale to fast-arriving streams. 

The TSB-UAD implementation of the DAMP algorithm follows the descripition in the original paper [Lu et al. 2022](https://www.cs.ucr.edu/~eamonn/DAMP_long_version.pdf).
The TSB-UAD implementation is adapted from [TimeEval](https://github.com/HPI-Information-Systems/TimeEval-algorithms/blob/main/damp/damp/damp.py).


```{eval-rst}  
.. autoclass:: TSB_UAD.models.damp.DAMP
    :members:

```

### Example

```python
import os
import numpy as np
import pandas as pd
from TSB_UAD.utils.visualisation import plotFig
from TSB_UAD.models.damp import DAMP
from TSB_UAD.models.feature import Window
from TSB_UAD.utils.slidingWindows import find_length
from TSB_UAD.vus.metrics import get_metrics

#Read data
filepath = 'PATH_TO_TSB_UAD/ECG/MBA_ECG805_data.out'
df = pd.read_csv(filepath, header=None).dropna().to_numpy()
name = filepath.split('/')[-1]

data = df[:,0].astype(float)
label = df[:,1].astype(int)

#Pre-processing
slidingWindow = find_length(data)

# Run DAMP
modelName='DAMP'
clf = DAMP(m = slidingWindow,sp_index=slidingWindow+1)
clf.fit(data)
score = clf.decision_scores_

#Post-processing
score = MinMaxScaler(feature_range=(0,1)).fit_transform(score.reshape(-1,1)).ravel()
score = np.array([score[0]]*math.ceil((slidingWindow-1)/2) + list(score) + [score[-1]]*((slidingWindow-1)//2))

#Plot result
plotFig(data, label, score, slidingWindow, fileName=name, modelName=modelName)

#Print accuracy
results = get_metrics(score, label, metric="all", slidingWindow=slidingWindow)
for metric in results.keys():
    print(metric, ':', results[metric])
```
```
AUC_ROC : 0.9796517653209067
AUC_PR : 0.5354674121425284
Precision : 1.0
Recall : 0.0462046204620462
F : 0.08832807570977919
Precision_at_k : 0.0462046204620462
Rprecision : 1.0
Rrecall : 0.1427450980392157
RF : 0.24982841455044613
R_AUC_ROC : 0.9861962693093778
R_AUC_PR : 0.6140113439366928
VUS_ROC : 0.9813282886141234
VUS_PR : 0.5943507237860649
Affiliation_Precision : 0.6162807136520358
Affiliation_Recall : 0.9999402806808003
```
![Result](../../assets/method_results/DAMP.png "DAMP Result")

### References

* [Lu et al. 2022] Y. Lu, R. Wu, A. Mueen, M. A. Zuluaga, and E. Keogh. 2022. Matrix profile xxiv: scaling time series anomaly detection to trillions of datapoints and ultra-fast arriving data streams. In SIGKDD, pp. 1173–1182.