Descriptive Statistics Methods & Examples

Data science has various prerequisites, one of which is a knowledge of statistics. The most prominent part of that knowledge is descriptive statistics, which we discuss in this article. If you have started learning the basics of data science, stay with us until the end of this article to learn one of the requirements of this skill.

What is Descriptive Statistics?

Descriptive statistics is a term that refers to data analysis that helps to describe, display, or summarize data in a meaningful way. Descriptive statistics are very important because if we simply present our raw data, it is difficult to see what the data shows. Descriptive statistics therefore let us present data in a more meaningful way and make the data easier to interpret.

Descriptive Statistics Methods

The following methods are used in descriptive statistics to organize and summarize the given data.

Formation of Frequency Distribution Table

Frequency distribution is the organization of data or observations into classes along with the frequency of each class. To form a frequency distribution table, the range of the data, the number of classes, and the class width should be calculated with the relevant formulas.

Then the distribution table is written in two columns: X (the classes) and F (the frequency of each class). After this stage, if desired or necessary, the researcher can calculate other indicators such as the relative frequency and the percentage frequency. Forming a frequency distribution table is an economical yet easy way to represent large amounts of unorganized data.
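
As a minimal sketch of these steps, the Python code below builds a frequency distribution table with equal-width classes. The function name frequency_table, the sample data, and the choice of five classes are illustrative assumptions, not part of any standard library. The class width is taken as the range divided by the number of classes, which is one common convention among several.

```python
def frequency_table(data, num_classes):
    """Group numeric data into equal-width classes and count each class."""
    low, high = min(data), max(data)
    data_range = high - low                  # range of the data
    if data_range == 0:
        raise ValueError("all values are equal, so no classes can be formed")
    width = data_range / num_classes         # class width
    counts = [0] * num_classes
    for x in data:
        # The maximum value goes into the last class instead of a new one.
        index = min(int((x - low) / width), num_classes - 1)
        counts[index] += 1
    total = len(data)
    table = []
    for i, f in enumerate(counts):
        lower = low + i * width
        upper = lower + width
        # Columns: class lower bound, class upper bound, F, relative frequency
        table.append((lower, upper, f, f / total))
    return table


sample = [1, 2, 2, 3, 5, 6, 7, 8, 9, 11]     # illustrative data
for lower, upper, f, rel in frequency_table(sample, 5):
    print(f"{lower:5.1f} to {upper:5.1f}   F = {f}   relative = {rel:.2f}")
```

Multiplying the relative frequency by 100 gives the percentage frequency of each class.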

Draw a Diagram

One of the weaknesses of displaying data in the form of a table is that the information in the table cannot be grasped quickly. Charts are a good tool for a visual representation of information. There are different types of charts, including histograms, column charts, frequency polygons, pie charts, time series charts, etc.

Calculate the Correlation

In some research, the researcher wants to determine the relationship between two variables and uses correlation methods for this purpose. The choice of correlation method depends on the measurement scale of the variables, and the methods are generally divided into parametric and non-parametric categories.
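
To make the two categories concrete, here is a minimal sketch in Python. It assumes the Pearson coefficient as the parametric example and the Spearman rank coefficient as the non-parametric example; the text above does not name specific methods, so these choices and the function names pearson, ranks, and spearman are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation: a parametric measure of linear association."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(ranks(x), ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]        # increases with x, but not along a straight line
print(pearson(x, y), spearman(x, y))
```

In this example the rank-based coefficient reaches 1 because y always increases with x, while the Pearson coefficient stays below 1 because the relationship is not linear.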

Regression and Prediction

Regression is a method to study the contribution of one or more independent variables in predicting the dependent variable. It can be used in both descriptive (non-experimental) and experimental research.

Depending on the type of research and its variables, there are various methods for regression analysis, some of which are: linear regression (with three strategies: simultaneous, stepwise, and hierarchical), curvilinear regression, logistic regression, and analysis of covariance.
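
The simplest of these cases, linear regression with a single independent variable, can be sketched in a few lines of Python. This is only an illustration under the assumption of an ordinary least-squares fit; the function names fit_line and predict and the sample values are hypothetical.

```python
def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x_new, slope, intercept):
    """Predict the dependent variable for a new value of the predictor."""
    return intercept + slope * x_new


x = [1, 2, 3, 4]             # independent variable (illustrative)
y = [3, 5, 7, 9]             # dependent variable (illustrative)
slope, intercept = fit_line(x, y)
print(slope, intercept, predict(5, slope, intercept))
```

The other methods listed above need more machinery and are usually handled with a statistics package rather than written by hand.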

Covariance Matrix Data Analysis

One type of correlation analysis is the analysis of the covariance matrix or correlation matrix. The two best-known analyses of this type are:

The factor analysis model, which finds the underlying variables of a phenomenon and comes in two kinds, exploratory and confirmatory.
The structural equation model, which investigates the causal relationships between variables.

How to Explain Descriptive Statistics?

In the real world, we have huge amounts of data, and we need to characterize that data by certain features. But what are these features? Below is a brief explanation of each of them.

Minimum

In a data set, the smallest number will be the minimum value or min.

Maximum

The largest number in a data set will be the maximum value or max of that set.

Summation

The sum of all the numbers in a set is called the sum.

Length

The count of values in a set or list is called the length of the data set.

Mean

The result of dividing the sum of the data by the number of data points (the length) is called the mean of the data set.
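
The features described so far map directly onto Python built-ins. The list below is made-up sample data, used only to illustrate the definitions.

```python
data = [4, 8, 15, 16, 23, 42]   # illustrative data set

minimum = min(data)             # smallest value
maximum = max(data)             # largest value
total = sum(data)               # sum of all values
length = len(data)              # number of values
mean = total / length           # sum divided by length

print(minimum, maximum, total, length, mean)
```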

Sorted List

When we sort the numbers in a list or set from small to large, we have actually turned that list into a sorted list.

Median

To find the median, the first and most important requirement is that the list is sorted. The middle value of the sorted list is the median. For example, if a list has 11 values, we set aside the first 5 values as one list and the last 5 values as another list. The remaining value, the 6th, is the median.

If the number of values in the list is even, for example 10, we take the first 5 values as one list and the last 5 values as a separate list. The average of the last value of the first list and the first value of the second list is the median.
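
Both cases can be sketched in a single Python function. The name median is an illustrative choice; Python's standard statistics module offers a function of the same name that can be used instead.

```python
def median(values):
    """Return the middle value of the sorted data (odd and even lengths)."""
    ordered = sorted(values)          # the data must be sorted first
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd length: exactly one middle value.
        return ordered[middle]
    # Even length: average the last value of the first half
    # and the first value of the second half.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median(range(1, 12)))           # 11 values
print(median(range(1, 11)))           # 10 values
```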

Mode

The mode is the value that is repeated most often in the list.
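
A short Python sketch of this definition follows. Because several values can share the highest count, the hypothetical function modes returns a list of all of them rather than a single value.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)              # value -> number of repetitions
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


print(modes([5, 1, 5, 2]))
print(modes([1, 2, 2, 3, 3]))
```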

Variance

Variance measures how far the data are spread around the mean: it is the average of the squared deviations from the mean. To calculate the variance, we act as follows:

First, find the mean of the data set by adding all the data together and then dividing by the number of data points.
Now subtract the mean from each data point and square the result.
Add together the squared values from the previous step.
Finally, divide the value obtained in the previous step by the total number of data points to obtain the variance.
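
The four steps above can be sketched in Python as follows. The function name population_variance and the sample data are illustrative. The sketch divides by the number of data points, as the steps describe; when the data are a sample from a larger population, dividing by the number of data points minus one is the usual alternative.

```python
import statistics


def population_variance(values):
    """Variance computed with the four steps described in the text."""
    n = len(values)
    mean = sum(values) / n                          # step 1: the mean
    squared = [(x - mean) ** 2 for x in values]     # step 2: squared deviations
    total = sum(squared)                            # step 3: add them together
    return total / n                                # step 4: divide by the count


data = [2, 4, 4, 4, 5, 5, 7, 9]                     # illustrative data
print(population_variance(data))
print(statistics.pvariance(data))                   # standard-library cross-check
```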

Standard Deviation

The square root of the variance is the standard deviation.

Difference Between Statistics and Mathematics

The difference between statistics and mathematics is that mathematics, through probability, builds models of real phenomena, whereas statistics measures real phenomena through data. Statistics tries to understand how the models and mechanisms behind real phenomena can be quantified from observations.

In general, it can be said that the goal of statistics is to provide an estimate of a population using a smaller sample. We have two types of statistics: descriptive statistics and inferential statistics.

In descriptive statistics, the goal is to describe the data at hand, whereas in inferential statistics, the goal is to draw conclusions that go beyond the data. In the following, we will discuss descriptive statistics, and in later sections we will also touch on inferential statistics.

Data Types in Descriptive Statistics

Before entering the concepts of descriptive statistics, it is necessary to know the types of data, because in data science our main task is to work with data. Data can be qualitative (categorical) or quantitative (numerical).

Qualitative data are divided into two categories: nominal and ordinal. The difference between the two is that nominal data cannot be compared or sorted, while ordinal data can be compared and sorted. For example, if we divide people based on income, we can have low income, middle income, and high income, and these can be compared or sorted.

Quantitative data is also divided into two branches: discrete data and continuous data. Discrete data can only take specific values. For example, if we roll a die, we will only get a whole number from 1 to 6, and a value like 1.5 is meaningless. Continuous data, on the other hand, can take any value within a range, such as people’s weight.

Concepts of Descriptive Statistics

Now that we are familiar with the types of data, we will enter the discussion of descriptive statistics. In general, in this type of statistics, our goal is either the numerical description of the data or their graphic description.

First, we will look at the numerical description, which includes indicators of central tendency such as the mean, median, mode, and quantiles; indicators of dispersion such as the variance, standard deviation, and coefficient of variation; and indicators of the shape of the distribution such as skewness and kurtosis.

Average

Average has different types:

Arithmetic Mean

We add the data together and divide by the number of data points.

Geometric Mean

We multiply the data together and take the n-th root of the product, where n is the number of data points.

Harmonic Mean

We divide the number of data points by the sum of the reciprocals of the data.
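
The three averages can be compared side by side with a small Python sketch. The function names and sample data are illustrative, and the geometric and harmonic means here assume that every value is positive.

```python
import math


def arithmetic_mean(values):
    """Sum of the data divided by the number of data points."""
    return sum(values) / len(values)


def geometric_mean(values):
    """n-th root of the product of the data (assumes positive values)."""
    return math.prod(values) ** (1 / len(values))


def harmonic_mean(values):
    """Number of data points divided by the sum of their reciprocals."""
    return len(values) / sum(1 / x for x in values)


data = [1, 2, 4]                      # illustrative positive data
print(arithmetic_mean(data), geometric_mean(data), harmonic_mean(data))
```

For positive data, the harmonic mean is never larger than the geometric mean, which in turn is never larger than the arithmetic mean.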

Median

The median is the middle value of the sorted list. If the number of values in the list is even, for example 12, we take the first 6 values as one list and the last 6 values as a separate list. The average of the last value of the first list and the first value of the second list is the median.

Mode

The mode is the value that is repeated most often in the list.

Quantile

A quantile can be a quartile, a decile, or a percentile. The first percentile is the number that 1% of the data is smaller than and 99% of the data is larger than. In the same way, this can be generalized to quartiles (quarters of the data) and deciles (tenths of the data).
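
Python's standard statistics module provides a quantiles function that returns the cut points dividing the data into n groups of equal probability. The sketch below uses made-up data. Note that several conventions exist for interpolating between data points, so other tools may return slightly different values for the same data.

```python
import statistics

data = list(range(1, 101))            # illustrative data: the integers 1 to 100

quartiles = statistics.quantiles(data, n=4)       # 3 cut points, 4 groups
deciles = statistics.quantiles(data, n=10)        # 9 cut points, 10 groups
percentiles = statistics.quantiles(data, n=100)   # 99 cut points, 100 groups

first_percentile = percentiles[0]     # about 1% of the data lies below it
print(quartiles)
print(first_percentile)
```

The second quartile, the fifth decile, and the fiftieth percentile all coincide with the median.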

Variance

Variance measures the deviation of the data from the mean. To calculate the variance, we act as follows:

We find the mean of the data set by adding all the data together and then dividing by the number of data points.
We subtract the mean from each data point and square the result.
We add together the squared values from the previous step.
Finally, we divide the value obtained in the previous step by the total number of data points to obtain the variance.

Standard Deviation

The square root of the variance is the standard deviation.

Coefficient of Variation

The ratio of the standard deviation to the mean is called the coefficient of variation.
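
A minimal Python sketch of the last two indicators is shown below. The function names are illustrative, the variance divides by the number of data points as described earlier, and the coefficient of variation assumes a non-zero mean.

```python
import math


def standard_deviation(values):
    """Square root of the variance (dividing by the number of data points)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n
    return math.sqrt(variance)


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (assumes a non-zero mean)."""
    mean = sum(values) / len(values)
    return standard_deviation(values) / mean


data = [2, 4, 4, 4, 5, 5, 7, 9]       # illustrative data
print(standard_deviation(data), coefficient_of_variation(data))
```

Because the coefficient of variation has no unit, it can be used to compare the spread of data sets measured on different scales.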

Descriptive Statistics Vs Inferential Statistics

Descriptive Statistics

It describes the characteristics of populations or samples.
It organizes and presents data in a purely factual manner.
It presents the final results using tables and graphs.
It draws conclusions based on known data.
It uses measures such as central tendency, dispersion, and variance.

Inferential Statistics

It uses samples to make generalizations about larger populations.
It helps us to estimate and predict future results.
It presents the final results as probabilities.
It shows a conclusion that goes beyond the available data.
It uses techniques such as hypothesis testing, confidence intervals, and regression and correlation analysis.

Final Words

We hope you now understand the topic. If you still have any questions, write to us in the comment section and we will answer you very soon. Do share this article with your friends if you liked it. Thanks.