Data science has various prerequisites, one of which is knowledge of statistics, and one of the most important parts of statistics is descriptive statistics, which we discuss in this article. If you have started learning the basics of data science, stay with us until the end of this article to learn one of the core requirements of this skill.

Descriptive statistics refers to data analysis that describes, displays, or summarizes data in a meaningful way. It is important because raw data on its own makes it hard to see what the data shows. Descriptive statistics let us present the data more meaningfully and interpret it more easily.

The following methods are used in descriptive statistics to organize and summarize the given data.

Frequency distribution is the organization of data or observations into classes, along with the frequency of each class. To build a frequency distribution table, first calculate the range of the data, the number of classes, and the class width using the relevant formulas.

The table is then written in two columns: X (the classes) and F (the frequency of each class). After this, if desired or necessary, the researcher can calculate further indicators such as cumulative frequency and cumulative percentage. A frequency distribution table is an economical and simple way to summarize large amounts of unorganized data.
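The steps above can be sketched in Python. The text does not spell out its formulas, so this sketch assumes Sturges' rule for the number of classes (k = ceil(1 + log2 n)) and a class width of range / k rounded up to a whole number, which suits integer data. The function name `frequency_table` and the scores are hypothetical; each class is half-open [start, end) except the last, which is closed so the maximum is counted.

```python
import math

def frequency_table(data):
    """Group data into classes; return (class bounds, F, cumulative F, cumulative %) per class."""
    n = len(data)
    lo, hi = min(data), max(data)
    data_range = hi - lo                         # range of the data
    k = math.ceil(1 + math.log2(n))              # number of classes (Sturges' rule, an assumption)
    width = math.ceil(data_range / k) or 1       # class width, rounded up to a whole number
    rows = []
    cumulative = 0
    for i in range(k):
        start = lo + i * width
        end = start + width
        if i == k - 1:
            # last class is closed so that the maximum value is counted
            freq = sum(1 for x in data if start <= x <= end)
        else:
            freq = sum(1 for x in data if start <= x < end)
        cumulative += freq
        rows.append(((start, end), freq, cumulative, 100 * cumulative / n))
    return rows

# Hypothetical test scores (16 values, so Sturges' rule gives 5 classes of width 4)
scores = [10, 12, 13, 15, 16, 17, 18, 19, 20, 21, 22, 24, 25, 27, 28, 30]
for (start, end), f, cf, cp in frequency_table(scores):
    print(f"X = {start}-{end}  F = {f}  cumulative F = {cf}  cumulative % = {cp:.2f}")
```

Other rules for choosing the number of classes exist; swapping one in only changes the line that computes `k`.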

One weakness of presenting data in a table is that the information cannot be grasped at a glance. Charts are a good tool for representing information visually. There are different types of charts, including histograms, column (bar) charts, frequency polygons, pie charts, and time series charts.

In some studies, the researcher wants to determine the relationship between two variables and uses correlation methods for this purpose. The appropriate correlation method depends on the measurement scale of the variables, and these methods are generally divided into parametric and non-parametric categories.
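As an illustration, the sketch below computes one common parametric coefficient (Pearson's r) and one common non-parametric coefficient (Spearman's rank correlation, which is Pearson's r applied to the ranks, with tied values given their average rank). These two coefficients are examples chosen here, not methods named in the text, and the data are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson's r (parametric): covariance divided by the product of the standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """Rank values 1..n; tied values receive the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of equal values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average position of the run, as a 1-based rank
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's rank correlation (non-parametric): Pearson's r computed on the ranks."""
    return pearson(ranks(x), ranks(y))

hours = [1, 2, 3, 4, 5]            # hypothetical study hours
scores = [52, 55, 61, 70, 90]      # hypothetical exam scores
print(pearson(hours, scores), spearman(hours, scores))
```

Because the scores rise with every extra hour but not along a straight line, Spearman's coefficient is 1 while Pearson's is somewhat lower.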

Regression is a method to study the contribution of one or more independent variables in predicting the dependent variable. It can be used in both descriptive (non-experimental) and experimental research.

Depending on the type of research and its variables, there are various methods of regression analysis, including linear regression (with simultaneous, stepwise, or hierarchical entry of variables), curvilinear regression, logistic regression, and analysis of covariance.
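A minimal sketch of the simplest case, linear regression with a single independent variable fitted by ordinary least squares: the slope is the sum of cross-deviations divided by the sum of squared deviations of x, and the intercept makes the line pass through the point of means. The function name `fit_line` and the data are hypothetical; the simultaneous, stepwise, and hierarchical strategies only matter when there are several predictors and are not shown here.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = a + b * x for one independent variable."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))   # sum of cross-deviations
    sxx = sum((a - mx) ** 2 for a in x)                    # sum of squared deviations of x
    slope = sxy / sxx
    intercept = my - slope * mx                            # line passes through (mean x, mean y)
    return intercept, slope

hours = [1, 2, 3, 4, 5]         # hypothetical independent variable
scores = [50, 55, 61, 64, 70]   # hypothetical dependent variable
a, b = fit_line(hours, scores)
print(f"predicted score = {a:.2f} + {b:.2f} * hours")
```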

One group of correlation analyses works on the covariance matrix or the correlation matrix. The two best-known of these analyses are:

- Factor analysis, which identifies the underlying variables of a phenomenon and comes in two forms, exploratory and confirmatory.
- Structural equation modeling, which investigates the causal relationships between variables.
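Factor analysis and structural equation modeling are beyond a short example, but the matrices they start from are easy to compute. This sketch assumes the sample covariance (dividing by n - 1) and then rescales it into a correlation matrix; the function and variable names and the values are hypothetical.

```python
import math

def covariance_matrix(columns):
    """Sample covariance matrix (dividing by n - 1) for a list of equally long variables."""
    n = len(columns[0])
    k = len(columns)
    means = [sum(c) / n for c in columns]
    return [[sum((columns[i][t] - means[i]) * (columns[j][t] - means[j]) for t in range(n)) / (n - 1)
             for j in range(k)]
            for i in range(k)]

def correlation_matrix(cov):
    """Rescale a covariance matrix: r_ij = cov_ij / (sd_i * sd_j)."""
    sd = [math.sqrt(cov[i][i]) for i in range(len(cov))]   # standard deviations on the diagonal
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(len(cov))] for i in range(len(cov))]

x = [1, 2, 3, 4]   # hypothetical variable 1
y = [2, 4, 6, 8]   # hypothetical variable 2 (rises with x)
z = [4, 3, 2, 1]   # hypothetical variable 3 (falls as x rises)
cov = covariance_matrix([x, y, z])
corr = correlation_matrix(cov)
for row in corr:
    print([round(v, 3) for v in row])
```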

In the real world, we work with huge amounts of data, and we need to summarize them using certain features. The following paragraphs briefly explain each of these features.

The smallest value in a data set is its minimum (min).

The largest value in a data set is its maximum (max).

Adding all the values in a set together gives their sum.

The number of values in a set or list is called the length of the data set.

The sum of the values divided by their number (the length) is the mean of the data set.

Arranging the values of a list from smallest to largest turns it into a sorted list.
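The features above map directly onto Python built-ins. A minimal sketch with a hypothetical data set of 11 values:

```python
data = [7, 3, 9, 3, 5, 8, 3, 6, 9, 4, 2]   # hypothetical data set

minimum = min(data)          # smallest value
maximum = max(data)          # largest value
total = sum(data)            # sum of all values
length = len(data)           # number of values
mean = total / length        # sum divided by length
sorted_data = sorted(data)   # values arranged from small to large

print(minimum, maximum, total, length, mean, sorted_data)
```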

To find the median, the data must first be sorted. The median is the middle value of the sorted list. For example, with 11 values, the median is the 6th value: 5 values come before it and 5 come after it.

If the number of values is even, for example 10, there is no single middle value. Split the sorted list into the first 5 values and the last 5 values; the median is the average of the last value of the first half and the first value of the second half (that is, of the 5th and 6th values).
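A minimal median function following the two cases above. The function name and the data are illustrative; Python's standard library also provides `statistics.median`.

```python
def median(values):
    """Middle value of the sorted list, or the average of the two middle values."""
    s = sorted(values)                  # the data must be sorted first
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                   # odd count: 11 values -> the 6th value (index 5)
    return (s[mid - 1] + s[mid]) / 2    # even count: 10 values -> average of the 5th and 6th

odd = [7, 3, 9, 3, 5, 8, 3, 6, 9, 4, 2]   # 11 hypothetical values
even = [7, 3, 9, 3, 5, 8, 3, 6, 9, 4]     # 10 hypothetical values
print(median(odd), median(even))
```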

The mode is the value that is repeated most often in the list.
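A small sketch of the mode using `collections.Counter`. Because several values can tie for the highest count, this version returns a list of all of them; that choice is an assumption, since the text does not say how ties are handled.

```python
from collections import Counter

def modes(values):
    """Return every value that has the highest number of repetitions."""
    counts = Counter(values)          # value -> number of repetitions
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]

print(modes([7, 3, 9, 3, 5, 8, 3, 6, 9, 4, 2]))   # one mode
print(modes([1, 1, 2, 2, 3]))                     # two values tie
```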

Variance measures how widely the data are spread around the mean: it is the average of the squared deviations from the mean. To calculate the variance, we proceed as follows:

- First, find the mean of the data set by adding all the values together and dividing by their number.
- Subtract the mean from each value and square the result.
- Add together the squared values from the previous step.
- Divide the sum from the previous step by the total number of values to obtain the variance. (When the data are a sample used to estimate a population, the sum is often divided by n - 1 instead.)

The square root of the variance is the standard deviation.
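The four steps can be written out one by one. This sketch divides by n, as the steps above do (the population variance); the function names and data are hypothetical.

```python
import math

def variance(values):
    """Population variance computed step by step."""
    n = len(values)
    mean = sum(values) / n                         # step 1: the mean
    squared = [(x - mean) ** 2 for x in values]    # step 2: squared deviation of each value
    total = sum(squared)                           # step 3: add them together
    return total / n                               # step 4: divide by the number of values

def standard_deviation(values):
    """Square root of the variance."""
    return math.sqrt(variance(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data with mean 5
print(variance(data), standard_deviation(data))
```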

Statistics differs from **mathematics**, and in particular from probability, in that probability builds models of real phenomena, while **statistics** measures real phenomena. Statistics tries to understand how the models and mechanisms behind real phenomena can be measured.

In general, the goal of statistics is to estimate properties of a population using a smaller sample. There are two types of statistics: descriptive statistics and inferential statistics.

In **descriptive statistics**, the goal is to describe the data at hand, while in inferential statistics, the goal is to draw conclusions from the data about a larger population. Below we discuss descriptive statistics; inferential statistics will be covered in later sections.

Before getting into the concepts of descriptive statistics, we need to know the types of data, because in data science our main task is working with data. Data can be qualitative (categorical) or quantitative (numerical).

Qualitative data are divided into two categories: **nominal and ordinal.** The difference between them is that **nominal data** (such as blood type) cannot be compared or sorted, while **ordinal data** can. For example, if we group people by income into low, middle, and high income, these groups can be compared and sorted.

Quantitative data are also divided into two branches: **discrete data and continuous data**. **Discrete data** can take only specific values. For example, rolling a die can only give a whole number from 1 to 6, so a value like 1.5 is meaningless. **Continuous data**, such as people’s weight, can take any value within a range.

Now that we are familiar with the types of data, we can turn to descriptive statistics. In general, this branch of statistics describes data either numerically or graphically.

We start with numerical description, which uses measures of central tendency and position such as the mean, median, mode, and quantiles; measures of dispersion such as variance, standard deviation, and coefficient of variation; and measures of the shape of the distribution such as skewness and kurtosis.

There are several types of mean:

- Arithmetic mean: add the values together and divide by their number.
- Geometric mean: multiply the values together and take the nth root, where n is the number of values.
- Harmonic mean: divide the number of values by the sum of the reciprocals of the values.
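The three definitions can be sketched as follows. The function names and data are hypothetical, and the geometric and harmonic means are only applied here to positive values. For positive data, the harmonic mean never exceeds the geometric mean, which never exceeds the arithmetic mean; the example illustrates this.

```python
import math

def arithmetic_mean(values):
    return sum(values) / len(values)                  # add the values, divide by the count

def geometric_mean(values):
    # nth root of the product (positive values assumed; math.prod needs Python 3.8+)
    return math.prod(values) ** (1 / len(values))

def harmonic_mean(values):
    # count divided by the sum of the reciprocals (positive values assumed)
    return len(values) / sum(1 / x for x in values)

speeds = [2, 8]   # hypothetical values
print(arithmetic_mean(speeds), geometric_mean(speeds), harmonic_mean(speeds))
```

For long lists the product can become very large, so some implementations average logarithms instead; the direct product is kept here for readability.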

A quantile can be a quartile, a decile, or a percentile. The first percentile is the value that 1% of the data are smaller than and 99% are larger than. Quartiles and deciles are defined in the same way, dividing the data into quarters and tenths.
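Several conventions exist for computing quantiles, and they can give slightly different values on small data sets. This sketch assumes linear interpolation between neighboring sorted values (the same rule as NumPy's default for `numpy.percentile`); quartiles are the 25th, 50th, and 75th percentiles, and deciles are the 10th, 20th, and so on up to the 90th. The function name and data are hypothetical.

```python
import math

def percentile(values, p):
    """Value below which about p% of the data fall, interpolating between sorted values."""
    s = sorted(values)
    pos = (len(s) - 1) * p / 100       # fractional position in the sorted list
    lower = math.floor(pos)
    upper = math.ceil(pos)
    fraction = pos - lower
    return s[lower] + (s[upper] - s[lower]) * fraction

data = list(range(1, 12))   # hypothetical data: 1, 2, ..., 11
quartiles = [percentile(data, q) for q in (25, 50, 75)]
deciles = [percentile(data, d) for d in range(10, 100, 10)]
print(quartiles, deciles)
```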

The ratio of the standard deviation to the mean is called the coefficient of variation.
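Using the standard library's `statistics.pstdev` (population standard deviation) and `statistics.mean`, the coefficient of variation takes one line. It is only meaningful when the mean is not zero and is often reported as a percentage; the function name and data are hypothetical.

```python
import statistics

def coefficient_of_variation(values):
    """Ratio of the population standard deviation to the mean (mean must be non-zero)."""
    return statistics.pstdev(values) / statistics.mean(values)

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data: mean 5, standard deviation 2
cv = coefficient_of_variation(data)
print(f"CV = {cv:.2f} ({cv * 100:.0f}%)")
```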

**Descriptive Statistics**

- It describes the characteristics of populations or samples.
- It organizes and presents the data factually, as they are.
- It presents the final results using tables and graphs.
- It draws conclusions based only on the known data.
- It uses measures such as central tendency, distribution, and variance.

**Inferential Statistics**

- It uses samples to make generalizations about larger populations.
- It helps us to estimate and predict future results.
- It presents the final results as probabilities.
- It draws conclusions that go beyond the available data.
- It uses techniques such as hypothesis testing, confidence intervals, and regression and correlation analysis.

We hope you now understand the topic. If you still have any questions, write to us in the comment section and we will answer you soon. If you liked this article, please share it with your friends. Thanks.