How To Find Degrees of Freedom In Statistics

Last Updated on January 17, 2023

What You Need to Know About Inferential Statistics to Boost Your Career in Data Science

Definition

The degrees of freedom in a statistical calculation represent how many values involved in the calculation are free to vary. The degrees of freedom can be calculated to help ensure the statistical validity of chi-square tests, t-tests, and even the more advanced F-tests. These tests are commonly used to compare observed data with the data that would be expected under a specific hypothesis.

For example, let’s suppose a drug trial is conducted on a group of patients and it is hypothesized that the patients receiving the drug will show increased heart rates compared to those who did not receive the drug. The results of the test could then be analyzed to determine whether the difference in heart rates is significant, and degrees of freedom are part of those calculations.

Because degrees of freedom calculations identify how many values in the final calculation are allowed to vary, they contribute to the validity of an outcome. These calculations depend on the sample size (the number of observations) and the number of parameters to be estimated; in general, in statistics, degrees of freedom equal the number of observations minus the number of parameters. This means there are more degrees of freedom with a larger sample size.

Formula for Degrees of Freedom

The statistical formula to determine degrees of freedom is quite simple. It states that degrees of freedom equal the number of values in a data set minus 1, and looks like this:

df = N-1

Where N is the number of values in the data set (the sample size). Take a look at the sample computation below.

Suppose there is a data set of four values (N = 4).

Call the data set X and list its values.

For this example, data set X includes: 15, 30, 25, 10

This data set has a mean, or average, of 20. Calculate the mean by adding the values and dividing by N:

(15+30+25+10)/4= 20

Using the formula, the degrees of freedom would be calculated as df = N-1:

In this example, df = 4 – 1 = 3

This indicates that, in this data set, three numbers have the freedom to vary as long as the mean remains 20.
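The calculation above, and the idea of values being free to vary, can be sketched in a few lines of Python (the three "free" values are made-up numbers for illustration):

```python
# Example data set X from the text
X = [15, 30, 25, 10]

N = len(X)            # sample size
mean = sum(X) / N     # (15 + 30 + 25 + 10) / 4 = 20
df = N - 1            # degrees of freedom

print(N, mean, df)    # 4 20.0 3

# "Freedom to vary": once the mean is fixed at 20, any three values
# determine the fourth. Pick three values freely...
free_values = [12, 40, 7]
# ...and the last one is forced, because the four values must sum to mean * N:
last = mean * N - sum(free_values)
print(last)           # 21.0 -- check: (12 + 40 + 7 + 21) / 4 = 20
```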

Critical Values

Knowing the degrees of freedom for a population or for a sample does not give us much useful information by itself. After we calculate the degrees of freedom (the number of values in a calculation that are free to vary), we still need to look up the critical value for our test in a critical value table. These tables can be found in textbooks or by searching online. The values found in the table determine the statistical significance of the results.

Examples of statistical calculations that involve degrees of freedom are the chi-square tests and t-tests. There are several t-tests and chi-square tests, and they can be differentiated by their degrees of freedom.

(Figure: chi-square distribution for different numbers of degrees of freedom)

Standard Normal Distribution

Procedures involving the standard normal distribution are listed for completeness and to clear up some misconceptions. These procedures do not require us to find a number of degrees of freedom, because there is a single standard normal distribution. They include procedures involving a population mean when the population standard deviation is already known, as well as procedures concerning population proportions.

One Sample T Procedures

Sometimes statistical practice requires us to use Student’s t-distribution. For these procedures, such as those dealing with a population mean with unknown population standard deviation, the number of degrees of freedom is one less than the sample size. Thus if the sample size is n, then there are n – 1 degrees of freedom.

T Procedures With Paired Data

Often it makes sense to treat data as paired, typically because of a connection between the first and second value in each pair; before-and-after measurements are a common example. Our sample of paired data is not independent; however, the differences between the pairs are independent. Thus if the sample has a total of n pairs of data points (2n values in all), then there are n – 1 degrees of freedom.
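A minimal sketch of the paired case in plain Python (the before/after weights below are invented for illustration):

```python
import statistics

# Hypothetical before/after measurements on the same five subjects
before = [72, 75, 71, 78, 74]
after  = [70, 72, 70, 75, 72]

# The pairs themselves are not independent, but the differences are
diffs = [b - a for b, a in zip(before, after)]   # [2, 3, 1, 3, 2]

n = len(diffs)      # 5 pairs (10 values in total)
df = n - 1          # degrees of freedom = 4

d_bar = statistics.mean(diffs)     # mean of the differences
s_d = statistics.stdev(diffs)      # sample std dev of the differences
t = d_bar / (s_d / n ** 0.5)       # paired t statistic

print(df, round(t, 2))             # 4 5.88
```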

T Procedures for Two Independent Populations

For these types of problems, we are still using a t-distribution. This time there is a sample from each of our populations. Although it is preferable to have these two samples be of the same size, this is not necessary for our statistical procedures. Thus we can have two samples of size n1 and n2. There are two ways to determine the number of degrees of freedom. The more accurate method is to use Welch’s formula, a computationally cumbersome formula involving the sample sizes and sample standard deviations. Another approach, referred to as the conservative approximation, can be used to quickly estimate the degrees of freedom. This is simply the smaller of the two numbers n1 – 1 and n2 – 1.
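The two approaches can be compared with a short Python sketch; the sample standard deviations and sizes below are invented, and the first function is the standard Welch–Satterthwaite approximation:

```python
def welch_df(s1, n1, s2, n2):
    """Welch-Satterthwaite degrees of freedom for two independent samples
    with sample standard deviations s1, s2 and sizes n1, n2."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

def conservative_df(n1, n2):
    """Quick conservative approximation: the smaller of n1 - 1 and n2 - 1."""
    return min(n1 - 1, n2 - 1)

# Hypothetical samples: s1 = 4.2 with n1 = 12, s2 = 6.9 with n2 = 8
print(round(welch_df(4.2, 12, 6.9, 8), 1))   # 10.5
print(conservative_df(12, 8))                # 7
```

The conservative value is smaller, so using it makes the test slightly harder to pass; Welch's formula gives a larger (more accurate) df at the cost of the extra computation.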

Chi-Square for Independence

One use of the chi-square test is to see if two categorical variables, each with several levels, exhibit independence. The information about these variables is logged in a two-way table with r rows and c columns. The number of degrees of freedom is the product (r – 1)(c – 1).
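For example, the row-and-column count can be turned into df with a tiny Python sketch (the table counts are hypothetical):

```python
# Hypothetical 3x4 contingency table: 3 levels of one categorical
# variable (rows) and 4 levels of the other (columns)
table = [
    [12, 7, 9, 4],
    [ 8, 5, 6, 3],
    [10, 6, 8, 5],
]

r = len(table)          # number of rows
c = len(table[0])       # number of columns
df = (r - 1) * (c - 1)
print(df)               # (3 - 1) * (4 - 1) = 6
```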

Chi-Square Goodness of Fit

Chi-square goodness of fit starts with a single categorical variable with a total of n levels. We test the hypothesis that this variable matches a predetermined model. The number of degrees of freedom is one less than the number of levels. In other words, there are n – 1 degrees of freedom.

One Factor ANOVA

One factor analysis of variance (ANOVA) allows us to make comparisons between several groups, eliminating the need for multiple pairwise hypothesis tests. Since the test requires us to measure both the variation between the groups and the variation within each group, we end up with two values for degrees of freedom. The F-statistic, which is used for one factor ANOVA, is a fraction, and the numerator and denominator each have their own degrees of freedom. Let c be the number of groups and n be the total number of data values. The number of degrees of freedom for the numerator is one less than the number of groups, or c – 1. The number of degrees of freedom for the denominator is the total number of data values minus the number of groups, or n – c.
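Both ANOVA degrees of freedom can be sketched in Python (the group sizes are made up for illustration):

```python
# Hypothetical one-factor ANOVA: 3 groups with 8, 10, and 7 observations
group_sizes = [8, 10, 7]

c = len(group_sizes)          # number of groups
n = sum(group_sizes)          # total number of data values

df_numerator = c - 1          # between-group df: 3 - 1 = 2
df_denominator = n - c        # within-group df: 25 - 3 = 22
print(df_numerator, df_denominator)   # 2 22
```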

Clearly, we must be careful to know which inference procedure we are working with, since that determines the correct number of degrees of freedom to use.

Why Do Critical Values Decrease While DF Increase?

Thanks to Mohammed Gezmu for this question.

t-score

Let’s take a look at the t-score formula in a hypothesis test:

t = (x̄ – μ) / (s/√n)

When n increases, the t-score goes up. This is because of the square root in the denominator: as n gets larger, the fraction s/√n gets smaller, so the t-score (the result of another fraction) gets bigger. Since the degrees of freedom are defined above as n – 1, you might think that the t critical value should get bigger too, but it doesn’t: it gets smaller. This seems counter-intuitive.

However, think about what a t-test is actually for. You’re using the t-test because you don’t know the standard deviation of your population and therefore you don’t know the shape of your graph. It could have short, fat tails. It could have long, skinny tails. You just have no idea. The degrees of freedom affect the shape of the graph in the t-distribution; as the df get larger, the area in the tails of the distribution gets smaller. As df approaches infinity, the t-distribution looks like a normal distribution. When this happens, you can be certain of your standard deviation (which is 1 on a standard normal distribution).

Let’s say you took repeated samples of weights from four people, drawn from a population with an unknown standard deviation. You measure their weights, calculate the mean difference between the sample pairs, and repeat the process over and over. The tiny sample size of 4 will result in a t-distribution with fat tails. The fat tails tell you that you’re more likely to have extreme values in your sample. You test your hypothesis at an alpha level of 5%, which cuts off the last 5% of your distribution. The graph below shows the t-distribution with a 5% cut-off. This gives a critical value of 2.6. (Note: I’m using a hypothetical t-distribution here as an example; the CV is not exact.)

(Figure: sample size and t-distribution shape)


Now look at the normal distribution. We have less chance of extreme values with the normal distribution. Our 5% alpha level cuts off at a CV of about 2.

Back to the original question “Why Do Critical Values Decrease While DF Increases?” Here’s the short answer:

Degrees of freedom are related to sample size (df = n – 1). If the df increases, the sample size is also increasing; the graph of the t-distribution will have skinnier tails, pushing the critical value toward the mean.
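You can check this shrinking directly, assuming SciPy is installed (`scipy.stats.t.ppf` is the inverse CDF of the t-distribution):

```python
from scipy.stats import t, norm

# Two-tailed 5% test: the critical value is the 97.5th percentile
for df in [3, 10, 30, 100, 1000]:
    print(df, round(t.ppf(0.975, df), 3))

# As df grows, the values approach the standard normal critical value:
print(round(norm.ppf(0.975), 3))   # 1.96
```

Each successive line prints a smaller critical value, always staying just above the normal distribution's 1.96.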
