diff --git a/public/og/en/what-is-a-linear-model.png b/public/og/en/what-is-a-linear-model.png new file mode 100644 index 0000000..e1d994c Binary files /dev/null and b/public/og/en/what-is-a-linear-model.png differ diff --git a/public/og/en/what-is-a-linear-regression.png b/public/og/en/what-is-a-linear-regression.png new file mode 100644 index 0000000..8d18bcc Binary files /dev/null and b/public/og/en/what-is-a-linear-regression.png differ diff --git a/public/og/en/what-is-supervised-learning.png b/public/og/en/what-is-supervised-learning.png new file mode 100644 index 0000000..ff14639 Binary files /dev/null and b/public/og/en/what-is-supervised-learning.png differ diff --git a/src/content/articles/en/what-is-a-linear-model.mdx b/src/content/articles/en/what-is-a-linear-model.mdx new file mode 100644 index 0000000..b9fc10f --- /dev/null +++ b/src/content/articles/en/what-is-a-linear-model.mdx @@ -0,0 +1,272 @@ +--- +lang: 'en' +slug: 'what-is-a-linear-model' +title: 'What is a Linear Model?' +excerpt: 'A linear model is a fundamental supervised machine learning algorithm used to mathematically model the linear relationship between input and output variables in labeled datasets.' +category: 'machine-learning' +tags: ['linear-regression', 'supervised-learning', 'teori'] +author: '@omerfdmrl' +date: '2026-03-29' +views: 0 +order: 3 +status: 'Published' +--- + +import HyperplaneScene from '../../../components/content/HyperplaneScene.astro' +import GradientVectorScene from '../../../components/content/GradientVectorScene.astro' + +The first step into the world of machine learning and statistics usually begins with **Linear Models**. +This approach, which has been the most fundamental building block of data science for decades, allows us to express seemingly complex relationships in data with a simple and linear equation. 
+At its core, what it does is very clear: it tries to predict the output value $Y$ given the input vector $X^\top = (X_1, X_2, \ldots, X_p)$: + +$$ +\hat{Y} = \hat{\beta}_0 + \sum_{j=1}^p X_j \hat{\beta}_j +$$ + +$\hat{\beta}_0$ is the **bias (intercept)** value. If we include a constant 1 in $X$ so that $\hat{\beta}_0$ becomes part of $\hat{\beta}$, the formula can be written in vector form: + +$$ +\hat{Y} = X^{\top} \hat{\beta} +$$ + +## Understanding Dot Product + +Let's break down the formula step by step to better understand it: + +$$ +\hat{Y}=\hat{\beta}_0 + X_1\hat{\beta}_1 + X_2\hat{\beta}_2 + \cdots + X_p\hat{\beta}_p +$$ + +Here, each $X_j$ is a single feature value and is a scalar: + +$$ +X_j \in \mathbb{R} +$$ + +Similarly, each coefficient is also a scalar: + +$$ +\hat{\beta}_j \in \mathbb{R} +$$ + +Therefore, the product of two real numbers yields a single real number: + +$$ +X_j \hat{\beta}_j \in \mathbb{R} +$$ + +For example, if: + +$$ +X_1=3,\quad \hat{\beta}_1=4 +$$ + +then: + +$$ +X_1\hat{\beta}_1 = 12 +$$ + +This 12 is a **scalar**; it is not a vector or matrix, but just a single number. + +The same applies to all terms: + +$$ +X_2\hat{\beta}_2,\; X_3\hat{\beta}_3,\; \ldots,\; X_p\hat{\beta}_p +$$ + +each produces a scalar value individually. + +As a result, their sum is also a single number: + +$$ +\hat{Y}\in \mathbb{R} +$$ + +Therefore, the output of a linear model is a single scalar prediction. +This sum of element-wise products is called the **dot product (inner product)** of $X$ and $\hat{\beta}$. + +### Including the Bias + +Instead of writing the bias separately, we can extend the feature vector: + +$$ +X = +\begin{bmatrix} +1 \\ X_1 \\ X_2 \\ \vdots \\ X_p +\end{bmatrix}, +\quad +\hat{\beta} = +\begin{bmatrix} +\hat{\beta}_0 \\ \hat{\beta}_1 \\ \hat{\beta}_2 \\ \vdots \\ \hat{\beta}_p +\end{bmatrix} +$$ + +The leading 1 multiplies $\hat{\beta}_0$, so the bias is included in the dot product as well. + +### Why is Transpose Needed? + +We cannot directly multiply two column vectors. 
+ +$$ +X = +\begin{bmatrix} +1 \\ X_1 \\ X_2 \\ \vdots \\ X_p +\end{bmatrix} += +(p + 1)\times1, +\quad +\hat{\beta} = +\begin{bmatrix} +\hat{\beta}_0 \\ \hat{\beta}_1 \\ \hat{\beta}_2 \\ \vdots \\ \hat{\beta}_p +\end{bmatrix} = +(p + 1)\times1 +$$ + +In matrix multiplication, the inner dimensions must match. Therefore, we need to write one of them as a row, which is exactly what the transpose does. + +$$ +((p + 1)\times1) \times ((p + 1)\times1) \rightarrow (1\times(p + 1)) \times ((p + 1)\times1) +$$ + +$$ += 1\times1 +$$ + +So now: + +$$ +X^\top = +\begin{bmatrix} +1 & X_1 & X_2 & \ldots & X_p +\end{bmatrix} += +1\times(p + 1), +\quad +\hat{\beta} = +\begin{bmatrix} +\hat{\beta}_0 \\ \hat{\beta}_1 \\ \hat{\beta}_2 \\ \vdots \\ \hat{\beta}_p +\end{bmatrix} = +(p + 1)\times1 +$$ + +### Example Solution + +Let there be two features, $X_1 = 3$ and $X_2 = 5$, with the leading 1 added for the bias: + +$$ +X= +\begin{bmatrix} +1\\ +3\\ +5 +\end{bmatrix} +,\quad +\hat{\beta}= +\begin{bmatrix} +2\\ +4\\ +6 +\end{bmatrix} +$$ + +Then: + +$$ +X^\top \hat{\beta} += +\begin{bmatrix} +1 & 3 & 5 +\end{bmatrix} +\begin{bmatrix} +2\\ +4\\ +6 +\end{bmatrix} +$$ + +Result: + +$$ +=1\cdot2 + 3\cdot4 + 5\cdot6 +$$ + +$$ +=44 +$$ + +## Hyperplane + +The geometric meaning of a linear model is a **hyperplane**. Let's think with two features ($X_1, X_2$) and one output ($Y$). +The predictions produced by our model ($\hat{Y}$) form a flat surface, i.e., a plane (or a hyperplane in higher dimensions), extending along the $X_1$ and $X_2$ axes. + +In the visualization below, the blue plane represents the hyperplane created by our linear model in space, the points represent actual data points, and the red dashed lines represent the distances of the actual data to our model (plane): + + + +Each point is projected onto the plane along the $Y$ axis, so each dashed line measures a vertical distance rather than a perpendicular one. The difference between the actual value ($Y$) and the model's prediction on the plane ($\hat{Y}$) is called the **residual**: +$e_i = Y_i - \hat{Y}_i$. 
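
The prediction and the residual can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function names `predict` and `residual` are hypothetical, the coefficients are the ones from the worked example, and the observed value `50.0` is made up for the demonstration.

```python
def predict(features, beta):
    """Return X^T beta, where X is the feature vector with a leading 1."""
    x = [1.0] + list(features)  # prepend 1 so that beta[0] acts as the bias
    if len(x) != len(beta):
        raise ValueError("beta needs one coefficient per feature plus a bias")
    # Dot product: sum of element-wise products, a single scalar.
    return sum(x_j * b_j for x_j, b_j in zip(x, beta))


def residual(y_actual, features, beta):
    """Return e = Y - Y_hat for a single observation."""
    return y_actual - predict(features, beta)


beta_hat = [2.0, 4.0, 6.0]                 # coefficients from the worked example
y_hat = predict([3.0, 5.0], beta_hat)      # 1*2 + 3*4 + 5*6 = 44.0
e = residual(50.0, [3.0, 5.0], beta_hat)   # 50.0 is a made-up observed value
print(y_hat, e)                            # 44.0 6.0
```

Prepending the 1 inside `predict` keeps the bias handling in one place, so callers only pass the actual feature values.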
+ +> **Note:** How this plane is placed in space and how these residual errors are minimized is the subject of optimization methods. The linear model is simply the mathematical definition of this plane. + +### Coefficients and the Gradient Vector + +When we think of the linear model's equation as a function, $f(X) = X^\top \beta$, what determines the model's slope in space is the $\beta$ coefficients. Mathematically, the **gradient** of a function indicates the direction of steepest ascent at that point. + +For a linear model, this gradient is directly equal to the coefficient vector: + +$$ +f'(X) = \nabla f(X) = \beta +$$ + +So the $\beta$ vector indicates the direction in which the model's output ($\hat{Y}$) increases fastest in the input space. + + + +In the visualization above, the green arrow represents the $\nabla f$ (i.e., $\beta$) vector, and the orange dot represents the current position $X$. +Since the slope of the plane is constant in a linear model, this direction is the same everywhere. The algorithms we use when training models (e.g., Gradient Descent) use these vectorial properties to bring the model to the most accurate position. diff --git a/src/content/articles/en/what-is-a-linear-regression.mdx b/src/content/articles/en/what-is-a-linear-regression.mdx new file mode 100644 index 0000000..1bbda7a --- /dev/null +++ b/src/content/articles/en/what-is-a-linear-regression.mdx @@ -0,0 +1,218 @@ +--- +lang: 'en' +slug: 'what-is-a-linear-regression' +title: 'What is Linear Regression?' 
+excerpt: 'Linear regression is a supervised machine learning algorithm used to model linear relationships from labeled datasets' +category: 'machine-learning' +tags: ['linear-regression', 'supervised-learning', 'teori'] +author: '@omerfdmrl' +date: '2026-03-22' +views: 0 +order: 2 +status: 'Published' +--- + +import ScatterChart3D from '../../../components/content/ScatterChart3D.astro' +import ScatterChart from '../../../components/content/ScatterChart.astro' +import CsvViewer from '../../../components/content/CsvViewer.astro' +import JsonViewer from '../../../components/content/JsonViewer.astro' +import Grid from '../../../components/content/Grid.astro' + +Linear regression is a supervised machine learning algorithm used to model linear relationships from labeled datasets. +The fitted model allows us to make predictions on a different dataset. +To use it, there must be a linear relationship between input and output, meaning the output must change at a constant rate as the input changes. +In that case, a straight line can describe the data well. + + + + + + + + +In the first graph, a straight line fits the dataset because the relationship is linear; in the second example, no straight line can fit the $y = x^2 + 4$ parabola. + +## Variables and Terminology + +In the statistics literature, input values are called predictors or, more commonly, independent variables. +Outputs are called responses or dependent variables. + +**For example:** Let's say we have a dataset that records how many hours students studied and their exam scores. + + + +From the data, we can see that as study hours increase, the score also rises. +In this example, the exam score is predicted from study hours. + +- **Independent Variable:** Study hours, because we can control and observe it. +- **Dependent Variable:** Exam score, because it depends on how many hours were studied. + +We use the independent variable (input) to predict the dependent variable (output). 
+If there is one dependent and one independent variable as in this example, it is called **simple linear regression**. + + + + + + + + +In our first example, $y = a + bx$ is a simple regression example. Because our independent variable is only `x`. +In the second example, in the formula $y ≈ 5 + 4x + 2z$, `x` and `z` are independent variables, so there are 2 independent variables in total. +Therefore, the first example is simple, and the second is **multiple linear regression**. + +If there are multiple independent variables (input) and we are trying to predict multiple dependent (output) variables, it is called **Multivariate Linear Regression**. +For example, let's say we have the following dataset: + + + +Here we are trying to predict math and physics exam results based on study hours and the number of practice tests taken. + +- $$y_1 = \beta_{01} + \beta_{11}x_1 + \beta_{21}x_2 + \epsilon_1$$ +- $$y_2 = \beta_{02} + \beta_{12}x_1 + \beta_{22}x_2 + \epsilon_2$$ + +This gives us two different equations, and this is called multivariate linear regression. + +## How It Works + +In Linear Regression, relationships are modeled using **linear predictor functions** from known data, as we have shown in previous examples. +The model answers the question "What is the expected output under these conditions?" by looking at the inputs we have, rather than the probability distribution of all variables. + +Linear regression is historically the first type of regression analysis that has been rigorously studied in statistics and widely used in practical applications. +The main reason for this is that models that are linearly dependent on their unknown parameters are much easier to mathematically solve and fit compared to non-linear models. + +## Its Place in Machine Learning + +Linear regression is not just a classic statistical method, but also a fundamental **machine learning** algorithm. More specifically, it falls under the category of **supervised learning**. 
+ +As in our student score examples, we provide the model with both the inputs (study hours) and the correct answers, i.e., labels (exam scores). The algorithm learns from this labeled dataset and maps the data points to the most optimized linear function that can be used to make predictions on new, previously unseen data. + +## Key Use Cases + +The practical applications of linear regression are generally divided into two main categories: + +- **Prediction and Forecasting:** If our goal is to predict a future or unavailable value by minimizing error, linear regression is an excellent tool. A predictive model is trained with existing data. Then, for a new situation where we only have input values (e.g., study hours), the output (exam score) is predicted. +- **Explaining Relationships and Understanding Variance:** Sometimes the purpose is not just to make predictions, but to numerically measure the strength of the relationship between variables. For example, it is used to find how much change in independent variables leads to change in the dependent variable, to prove that some variables have no relevance to the result, or to identify which variables contain redundant/duplicate information with each other. diff --git a/src/content/articles/en/what-is-supervised-learning.mdx b/src/content/articles/en/what-is-supervised-learning.mdx new file mode 100644 index 0000000..e97be23 --- /dev/null +++ b/src/content/articles/en/what-is-supervised-learning.mdx @@ -0,0 +1,119 @@ +--- +lang: 'en' +slug: 'what-is-supervised-learning' +title: 'What is Supervised Learning?' 
+excerpt: 'Supervised Learning is a machine learning technique that understands the relationship between input and output data using a labeled dataset' +category: 'machine-learning' +tags: ['supervised-learning', 'teori'] +author: '@omerfdmrl' +date: '2026-03-21' +views: 0 +order: 1 +status: 'Published' +--- + +import CsvViewer from '../../../components/content/CsvViewer.astro' + +Supervised Learning is a machine learning technique that learns the relationship between input and output data using a labeled dataset. +The input data and its corresponding output are prepared in advance, and the model attempts to predict the result for unseen data. + +During training, the model's algorithm processes the training dataset to discover patterns that link the input data to the output data. +Then the model is evaluated on a separate test dataset that it did not see during training, to measure its success and determine whether it has been trained effectively. Cross-validation repeats this evaluation over several different splits of the data. + +A sample dataset looks like the following. + + + +In the table above, we can see the relationship between study hours (input) and exam score (output). Our model examines this training data to learn the mathematical relationship between them. Then, when we ask the model about a case in the test data (e.g., studying for 7 hours), it produces a predicted exam score (e.g., a value close to 82) based on what it has learned. + +## Types of Supervised Learning Based on Output + +In supervised learning, the method we use also changes depending on the output we are trying to predict. There are fundamentally two main problem types: + +### Regression + +If the output we are trying to predict is quantitative, meaning a continuous numerical value, this process is called regression. + +**For example:** Predicting the exam score (a value like 82.5, 90.1) by looking at study hours in the table above is a regression problem. +Predicting tomorrow's ozone level from atmospheric measurements is also a regression problem, because the output is a quantitative measurement. 
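
A regression workflow of this kind can be sketched in plain Python: fit on training data, then check the predictions on held-out test data. The numbers below are hypothetical and chosen to lie exactly on a line for simplicity; they are not the values from the table. The helper names (`fit_line`, `predict`, `mean_absolute_error`) are also made up for this sketch, which uses the standard closed-form least-squares fit for one input.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x for a single input; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept
    return a, b


def predict(a, b, x):
    """Predicted output for input x."""
    return a + b * x


def mean_absolute_error(a, b, xs, ys):
    """Average absolute gap between actual and predicted outputs."""
    return sum(abs(y - predict(a, b, x)) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical training data: study hours and exam scores.
train_hours = [1, 2, 3, 4, 5, 6, 8, 9]
train_scores = [52, 57, 62, 67, 72, 77, 87, 92]

# Hypothetical held-out test data the model never saw while fitting.
test_hours = [7, 10]
test_scores = [83, 96]

a, b = fit_line(train_hours, train_scores)
print(predict(a, b, 7))                                    # 82.0
print(mean_absolute_error(a, b, test_hours, test_scores))  # 1.0
```

Keeping the test data out of `fit_line` is the point of the split: the error reported at the end reflects how the model behaves on inputs it has not seen.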
+ +### Classification + +If the output we are trying to predict is qualitative, meaning one of specific classes or categories, this is called classification. +There is no mathematical magnitude relationship between the classes. + +**For example:** The famous _MNIST_ dataset, where you look at a handwritten digit image and determine which digit (0,1...8,9) it is. + +To show an example data: + + + +In the table above, the Sepal and Petal lengths, which are the physical measurements of a flower, are our inputs (X). The Species (Setosa, Versicolor, Virginica) that we are trying to predict based on these inputs is our categorical output (Y). +By learning the pattern between these measurements and flower species, the model can classify which species a newly found flower in nature belongs to when we measure its petals. + +### Ensemble Learning + +We don't always have to rely on a single model when solving regression or classification problems. This is where Ensemble Learning comes into play. This approach involves training multiple models for the same task and combining their results. By aggregating the predictions of all models in the pool (e.g., by averaging or voting), the strongest overall result is obtained to solve the problem. + +Each individual algorithm working within this large ensemble structure is called a weak learner or base model. + +So why do we need multiple weak learners? + +Some weak learners may have high bias (they oversimplify the data). + +Others may have high variance (they overfit the training data and fail on new data). + +Theoretically, Ensemble Learning alleviates this famous bias-variance tradeoff by bringing together the best aspects of each base model. It's like getting a collective decision from a council of doctors with different specializations (a concilium) instead of trusting a single doctor's opinion for diagnosing a difficult disease. Even if one is wrong, the majority's decision usually yields the closest result to the truth. 
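
The two aggregation strategies mentioned above, averaging and voting, can be sketched as follows. The base-model outputs are hard-coded, hypothetical values standing in for real trained models, and the function names are made up for this illustration.

```python
from collections import Counter
from statistics import mean


def average_prediction(predictions):
    """Combine regression outputs from several base models by averaging."""
    return mean(predictions)


def majority_vote(labels):
    """Combine class predictions from several base models by voting.

    On a tie, the label that appears first in the list wins.
    """
    return Counter(labels).most_common(1)[0][0]


# Hypothetical outputs of three base models for the same input.
regression_outputs = [80.0, 84.0, 82.0]
class_outputs = ["Setosa", "Versicolor", "Setosa"]

print(average_prediction(regression_outputs))  # 82.0
print(majority_vote(class_outputs))            # Setosa
```

Here one base model votes for a different species, yet the combined answer follows the majority, which mirrors the council-of-doctors analogy.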
+ +## Notation + +While researching in literature or different sources, you will see that these input and output values are always expressed with specific mathematical symbols: + +- X (Input): Represents the features we provide to the model. The observed values are usually denoted with lowercase `x`, while matrices (the entire dataset) are denoted with uppercase bold letters. +- Y (Quantitative Output) / G (Qualitative Output): These are the actual results we are targeting. For categorical (Group) outputs, the letter `G` is also commonly used. +- $\hat{Y}$ (Prediction): Represents the predicted value produced by our model. Our main goal is to find the rules that will make the $\hat{Y}$ value as close as possible to the actual $Y$ value using the training data we have. + +## Encoding Categorical Data for Computers + +So what if our output value is not a number but textual (qualitative) data like "Success/Failure" or "Cat/Dog"? How does a computer understand this? + +Since models work with mathematical equations, we need to convert (encode) these categories into numbers: + +- Binary Classes: This is the simplest method. Binary values are assigned, such as `1` for "Survived" and `0` for "Died" (or `-1` and `1`). The model can use the `.5` threshold as a basis when making predictions (e.g., if $\hat{Y} > 0.5$, it belongs to class 1). +- Multiple Classes (Dummy Variables): If there are more than two categories (e.g., Red, Blue, Green), a method called Dummy Variables is used. A separate column is opened for each color, and vectors are created where the column corresponding to the current data's color is `1` (on) and the others are `0` (off). + + + +If you notice, "Vehicle A" only received a value of 1 in the Color_Red column because it is red. Now our model is ready to work not with text, but entirely with mathematical matrices consisting of 1s and 0s! 
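
Dummy-variable encoding can be sketched in plain Python as below. The color list is hypothetical, the column prefix `Color_` follows the naming used in the table, and the function name `one_hot` is made up for this sketch.

```python
def one_hot(values, categories):
    """Encode each value as a row of 0s with a single 1 in its category's column."""
    rows = []
    for value in values:
        if value not in categories:
            raise ValueError(f"unknown category: {value!r}")
        rows.append([1 if value == category else 0 for category in categories])
    return rows


categories = ["Red", "Blue", "Green"]
columns = [f"Color_{c}" for c in categories]  # Color_Red, Color_Blue, Color_Green

# Hypothetical colors of four vehicles.
colors = ["Red", "Blue", "Green", "Red"]
encoded = one_hot(colors, categories)

print(columns)
for row in encoded:
    print(row)
# The first row is [1, 0, 0]: only the Color_Red column is switched on.
```

Passing the category list explicitly fixes the column order, so the same encoding can be reused for new data later.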
diff --git a/src/data/en/articles.json b/src/data/en/articles.json index fe51488..6a297aa 100644 --- a/src/data/en/articles.json +++ b/src/data/en/articles.json @@ -1 +1,20 @@ -[] +[ + { + "title": "What is a Linear Model?", + "slug": "what-is-a-linear-model", + "category": "machine-learning", + "order": 3 + }, + { + "title": "What is Linear Regression?", + "slug": "what-is-a-linear-regression", + "category": "machine-learning", + "order": 2 + }, + { + "title": "What is Supervised Learning?", + "slug": "what-is-supervised-learning", + "category": "machine-learning", + "order": 1 + } +] diff --git a/src/data/en/categories.json b/src/data/en/categories.json index fe51488..3a9b4ba 100644 --- a/src/data/en/categories.json +++ b/src/data/en/categories.json @@ -1 +1,10 @@ -[] +[ + { + "id": "1", + "name": "machine-learning", + "slug": "machine-learning", + "description": "", + "color": "", + "articleCount": 3 + } +] diff --git a/src/data/en/tags.json b/src/data/en/tags.json index fe51488..55b0efd 100644 --- a/src/data/en/tags.json +++ b/src/data/en/tags.json @@ -1 +1,23 @@ -[] +[ + { + "id": "1", + "name": "linear-regression", + "slug": "linear-regression", + "articleCount": 2, + "description": "" + }, + { + "id": "2", + "name": "supervised-learning", + "slug": "supervised-learning", + "articleCount": 3, + "description": "" + }, + { + "id": "3", + "name": "teori", + "slug": "teori", + "articleCount": 3, + "description": "" + } +]