DSWoK — Data Science Well of Knowledge

#algorithm

17 notes · co-occurs with 12 tags · last updated Jun 22, 2026

Co-tags: #nlp (8) · #tabular-ml (7) · #supervised (5) · #tree-based (3) · #unsupervised (3) · #embeddings (3) · #ensemble (2) · #architecture (2) · #topic-modeling (2) · #clustering (1) · #concept (1) · #transformer (1)

Notes tagged #algorithm

A Decision Tree is a supervised learning algorithm used for both classification and regression tasks.

Gradient boosting

Gradient Boosting is an ensemble machine learning technique (boosting) that sequentially combines weak learners (typically shallow decision trees) to create a strong predictive model.

K-Nearest Neighbors

K-Nearest Neighbors (KNN) is a simple, non-parametric algorithm used for both classification and regression tasks.

K-means clustering

K-means is an unsupervised machine learning algorithm used for partitioning a dataset into K distinct, non-overlapping subgroups (clusters).

Linear Regression

Linear regression is a supervised algorithm or statistical method that learns to model a dependent variable (target) as a function of some independent variables (features) by finding a line (or surface) that best “fits” the data.

Logistic regression

Logistic regression is a statistical method used for binary classification problems, modeling the probability of an instance belonging to a particular class.

Multi-armed bandits

A multi-armed bandit is a sequential decision problem where a learner repeatedly chooses among k actions (arms), observes a stochastic reward for the chosen arm only, and adapts future choices to balance exploration (sampling under-tested arms to learn their value) against exploitation (sampling the...
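The explore/exploit loop described above can be illustrated with epsilon-greedy, one common strategy that the note itself does not name. This is a minimal sketch under stated assumptions: rewards are Bernoulli, and `true_means`, `steps`, `epsilon`, and `seed` are hypothetical parameters chosen for illustration.

```python
import random


def epsilon_greedy(true_means, steps=2000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy learner on Bernoulli arms (illustrative only)."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k        # number of pulls per arm
    estimates = [0.0] * k   # running mean reward per arm

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(k)
        else:
            # Exploit: pick the arm with the highest estimated value so far.
            arm = max(range(k), key=lambda a: estimates[a])

        # Only the chosen arm's stochastic reward is observed.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0

        # Incremental update of the running mean for the chosen arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return counts, estimates


counts, estimates = epsilon_greedy([0.2, 0.5, 0.8])
print(counts, estimates)
```

A larger `epsilon` spends more pulls on exploration; a smaller one commits sooner to whichever arm currently looks best.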

Random Forest is an ensemble learning method (bagging) that constructs a multitude of decision trees at training time and outputs the class that is the mode of the classes (classification) or mean prediction (regression) of the individual trees.
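The aggregation step described above (mode for classification, mean for regression) can be sketched on its own, without training any trees. The function names and the sample per-tree predictions are hypothetical, made up for illustration.

```python
from collections import Counter
from statistics import mean


def aggregate_classification(tree_votes):
    """Return the most common class label among the per-tree predictions."""
    return Counter(tree_votes).most_common(1)[0][0]


def aggregate_regression(tree_outputs):
    """Return the average of the per-tree numeric predictions."""
    return mean(tree_outputs)


# Hypothetical predictions from individual trees for a single input.
votes = ["spam", "ham", "spam", "spam", "ham"]
outputs = [2.0, 3.0, 4.0]

print(aggregate_classification(votes))
print(aggregate_regression(outputs))
```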

A Support Vector Machine (SVM) is a supervised learning algorithm used for classification, regression, and outlier detection.

Most of the information is available in the BERT paper. Key details: Multi-head attention. Transformer encoder.

NLP

BERTopic is a modular topic modeling pipeline.

NLP

GloVe (Global Vectors for Word Representation) is a word embedding technique developed in 2014.

NLP

Latent Dirichlet Allocation (Blei, Ng, Jordan, 2003) is the canonical probabilistic topic model.

NLP

Recurrent Neural Networks are a type of neural network designed for processing sequential data.

NLP

Term Frequency-Inverse Document Frequency

Term Frequency-Inverse Document Frequency (TF-IDF) is a numerical statistic used in NLP to show how important a word (term) is to a document in a corpus.

NLP
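As a rough illustration of the TF-IDF statistic, the sketch below uses one common variant: term frequency normalized by document length, multiplied by log(N / document frequency). Libraries often apply different smoothing, so treat this as an assumption rather than the definition the note uses. The function name `tf_idf` and the toy corpus are hypothetical, and documents are assumed to be non-empty token lists.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute per-document TF-IDF weights for a list of tokenized documents."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))

    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {term: (count / len(doc)) * math.log(n / df[term])
             for term, count in tf.items()}
        )
    return weights


docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "barked"],
    ["the", "cat", "purred"],
]
for doc_weights in tf_idf(docs):
    print(doc_weights)
```

A term that occurs in every document (here "the") gets weight zero, while a term unique to one document gets the highest weight in it.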

Word2Vec is an approach for learning word embeddings.

NLP

fastText is a library by Facebook for efficient text classification and word representation learning.

NLP
