Chi-Square Test

Written by Prerit Jain

Updated on: 03 Feb 2025

Introduction

A chi-square test is a statistical test used to compare observed results with expected results. It is the analysis of the data based on the observations of a random set of variables and is used to compare sets of statistical data. 

Chi-square tests are used in hypothesis testing, where an assumption about the data is stated first and then tested. The tests measure the discrepancy between the actual (observed) results and the expected results, taking into account the number of variables and the size of the sample.

The test estimates how likely the observed results would be if the null hypothesis were true.

The null hypothesis states that the sample observations result purely from chance. It is a statement about a population parameter, and its purpose is to test the validity of the given experimental data.

Understanding the Chi-Square

  •  Chi-square distribution

Considering the null hypothesis to be true, the sampling distribution of the test statistic is called the chi-square distribution.

The test is used to determine if there is a significant difference between the observed frequencies and the expected frequencies. It gives the probability of seeing a difference at least that large if the null hypothesis is true.

  • The degree of freedom

It is calculated by the formula:

$df = (r - 1)(c - 1)$

Where:

$df$ is the degree of freedom.

$r$ is the number of rows.

$c$ is the number of columns.

  • The P-value in statistics

P is the probability value, and the chi-square statistic is used to calculate it.

It is the probability, assuming the null hypothesis is true, of getting a result that is the same as or more extreme than the one actually observed. A small P-value means the observed data would be unlikely if the null hypothesis were true.

The relation between the value of P and the interpretation of the hypothesis, for a chosen significance level $\alpha$ (commonly 0.05), is as follows:

If $P \le \alpha$ then the null hypothesis is rejected.

If $P > \alpha$ then the null hypothesis is not rejected (it is accepted).
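
As a rough illustration of how a P-value can be obtained from a chi-square statistic, the sketch below computes the upper-tail probability of the chi-square distribution for whole-number degrees of freedom, using only Python's standard library. The function name `chi2_p_value` and the significance level of 0.05 are assumptions made for this example.

```python
import math

def chi2_p_value(statistic, df):
    """Upper-tail probability P(X >= statistic) for a chi-square
    distribution with a positive whole number of degrees of freedom."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if statistic <= 0:
        return 1.0
    y = statistic / 2.0
    if df % 2 == 0:
        a = 1.0
        tail = math.exp(-y)              # closed form for df = 2
    else:
        a = 0.5
        tail = math.erfc(math.sqrt(y))   # closed form for df = 1
    # Step up two degrees of freedom at a time:
    # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < df / 2.0:
        tail += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return tail

alpha = 0.05                              # assumed significance level
p = chi2_p_value(2.584, df=1)             # statistic taken from Example 4
print(round(p, 3), "reject H0" if p <= alpha else "fail to reject H0")
```

Comparing the P-value with $\alpha$ leads to the same decision as comparing the chi-square statistic with the critical value from the table.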

Applications of the Chi-Square test

The chi-square test, which is also called the $\chi^2$ test, is used mainly in three different types of statistical circumstances. The following will help to assess which one is the appropriate inference procedure for a given set of categorical data.

  • Homogeneity: To determine if two populations with unknown distributions share the same distribution, use the homogeneity test. In this instance, the same qualitative survey question or experiment is given to two distinct populations.

The null and alternate hypotheses are:

$H_0$: the populations follow the same distribution.

$H_a$: the populations follow different distributions.

  • Goodness-of-fit: Use the goodness-of-fit test to ascertain if a population with an unknown distribution matches a known distribution. In this case, there will only be one qualitative survey question or one experiment's findings from one population. The goodness-of-fit test is widely employed to evaluate if a population is uniform (all outcomes occur with the same frequency), normal, or identical to another population with a known distribution.

The null and alternate hypotheses are:

$H_0$: the population fits the given distribution.

$H_a$: the population does not fit the given distribution.

  • Independence: To determine if two variables (factors) are independent or dependent, use the independence test. In this instance, a contingency table is built from two qualitative survey questions or experiments. The objective is to determine if the two variables are unrelated (independent) or related (dependent).

The null and alternate hypotheses are:

$H_0$: the two variables are independent.

$H_a$: the two variables are dependent.

Properties 

Some of the major properties of the chi-square test are:

  • The chi-square distribution curve approaches the normal distribution as the degree of freedom increases.
  • The mean of the distribution is equal to the number of degrees of freedom.
  • The variance of the distribution is equal to two times the number of degrees of freedom.
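
The mean and variance properties can be checked numerically. The sketch below relies on a standard fact not stated above: a chi-square variable with $k$ degrees of freedom is the sum of $k$ squared independent standard normal variables. The function name, the sample size and the seed are arbitrary choices for illustration.

```python
import random
import statistics

def simulate_chi_square(df, n_samples, seed=0):
    """Draw chi-square variates as sums of df squared standard normals."""
    rng = random.Random(seed)
    return [
        sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))
        for _ in range(n_samples)
    ]

df = 4
samples = simulate_chi_square(df, n_samples=20000)

# The sample mean should be close to df and the sample variance close to 2 * df.
print("sample mean    :", round(statistics.fmean(samples), 2), "theory:", df)
print("sample variance:", round(statistics.variance(samples), 2), "theory:", 2 * df)
```

Because this is a simulation, the printed values only approximate the theoretical ones and vary with the seed and sample size.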

Formula 

The chi-square formula measures the difference between the observed values and the expected values.

The formula is:

$\chi^2 = \sum \dfrac{(O_i - E_i)^2}{E_i}$

Where,

$O_i$ is the observed value,

$E_i$ is the expected value.
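
A minimal sketch of this formula in Python is shown below. The function name `chi_square_statistic` is chosen for this illustration, and the numbers are the observed and expected counts from Example 1 in the Solved Examples section.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Counts for one, two and three cars (Example 1)
observed = [25, 13, 7]
expected = [20.4, 11, 5.9]

chi2 = chi_square_statistic(observed, expected)
print(round(chi2, 2))  # 1.61
```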

Steps to use Chi-Square Test

  • Define the hypotheses, i.e., state $H_0$ and $H_a$ for the data.
  • Calculate the expected frequency of each cell using the formula: $E = \dfrac{\text{row total} \times \text{column total}}{\text{grand total}}$
  • Calculate $\dfrac{(O - E)^2}{E}$ for each cell.
  • $\chi^2$ is the sum of the calculated $\dfrac{(O - E)^2}{E}$ of each cell.
  • Determine the critical value: compute the degree of freedom with $df = (r - 1)(c - 1)$, choose an appropriate alpha level (commonly $\alpha = 0.05$), and look up the critical value for that $df$ in the table of critical values of the chi-square distribution.

The table is as follows:

[Table: critical values of the chi-square distribution]

If the obtained value of $\chi^2$ is higher than the critical value, we reject the null hypothesis. If it is lower, we fail to reject (accept) the null hypothesis.
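
The steps above can be sketched in Python for a test of independence. The function names and the small lookup of critical values at $\alpha = 0.05$ are assumptions of this illustration; the observed counts are the prom queen votes from Example 4.

```python
def expected_frequencies(table):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square_independence(table):
    """Return the chi-square statistic and the degree of freedom."""
    expected = expected_frequencies(table)
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

# Standard critical values at alpha = 0.05 for small degrees of freedom
CRITICAL_0_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

# Rows: male, female; columns: prom queen 1, prom queen 2 (Example 4)
observed = [[10, 12], [14, 6]]

statistic, df = chi_square_independence(observed)
critical = CRITICAL_0_05[df]
decision = "reject H0" if statistic > critical else "fail to reject H0"
print(round(statistic, 3), df, critical, decision)
```

Without rounding the expected counts, the statistic comes out at about 2.577. A hand calculation that rounds intermediate values gives a slightly different figure, but the decision is the same.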

Limitations of the Chi-Square test

  • The chi-square test can only indicate whether a relationship exists between two variables. It does not establish that there is a causal connection between them.
  • The chi-square test is very sensitive to sample size. When a big enough sample is employed, even small differences and weak associations may appear statistically significant.
  • The chi-square statistic is only applicable to frequency (count) data. It is not applicable to data expressed as percentages, proportions, means, or other derived statistics.
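
The sensitivity to sample size can be seen in a small sketch. The counts below are hypothetical: the observed split stays at 52% versus 48% against an expected 50/50 split, and only the sample size changes.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical data: same proportions, growing sample size
base_observed = [52, 48]
base_expected = [50, 50]

results = {}
for scale in (1, 10, 100):
    observed = [o * scale for o in base_observed]
    expected = [e * scale for e in base_expected]
    results[scale] = chi_square_statistic(observed, expected)
    print("sample size", sum(observed), "chi-square", round(results[scale], 2))
```

The statistic grows in proportion to the sample size (roughly 0.16, 1.6 and 16 here), so the same 52/48 split falls well below the critical value of 3.841 for one degree of freedom with 100 observations and far above it with 10,000.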

Conclusion

The chi-square test is a very important topic in statistical analysis of random data sets and is used in day-to-day analysis of expected values. We learnt how the chi-squared distribution works and how to find the related values. We also learnt how the chi-square value and the critical value are related.

Solved Examples

Example 1: Calculate the chi-square value for the number of cars owned by families in an area, using the data given in the table below.

| Number of cars | Observed ($O$) | Expected ($E$) |
|---|---|---|
| One car | 25 | 20.4 |
| Two cars | 13 | 11 |
| Three cars | 7 | 5.9 |
| Total | 45 | |

Solution 1:

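To find the chi-square let us use the formula $\chi^2 = \sum \dfrac{(O - E)^2}{E}$.

Finding $\dfrac{(O - E)^2}{E}$ for each category we get

| Number of cars | $\dfrac{(O - E)^2}{E}$ |
|---|---|
| One car | 1.04 |
| Two cars | 0.36 |
| Three cars | 0.21 |

Hence $\chi^2 = 1.04 + 0.36 + 0.21 = 1.61$

The chi-square value is 1.61.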

Example 2: The number of corn dogs sold during a carnival to men, women and children, and the expected percentage of the total corn dogs bought by each group, are as follows. Find the chi-square value.

| Category | Observed ($O$) | Percentage (%) |
|---|---|---|
| Men | 64 | 40 |
| Women | 45 | 20 |
| Children | 50 | 20 |
| Total | 159 | |

Solution 2:

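First, we need to find the expected value for each category, which is the given percentage of the total (159).

| Category | Expected ($E$) |
|---|---|
| Men | $0.40 \times 159 = 63.6$ |
| Women | $0.20 \times 159 = 31.8$ |
| Children | $0.20 \times 159 = 31.8$ |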

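Now let us use the formula to calculate the value of $\dfrac{(O - E)^2}{E}$ for each category, with expected values of 63.6 for men and 31.8 for women and children.

| Category | $\dfrac{(O - E)^2}{E}$ |
|---|---|
| Men | 0.003 |
| Women | 5.479 |
| Children | 10.416 |

Now we find the sum of the calculated values to get $\chi^2$.

Hence, $\chi^2 = 0.003 + 5.479 + 10.416 = 15.898$.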
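
The calculation in this example can be reproduced with a short sketch that converts the given percentages into expected counts before applying the formula. The variable names are chosen for illustration. Note that the percentages in the table add up to 80%, so the expected counts do not add up to 159; the sketch simply follows the numbers of the example.

```python
observed = {"Men": 64, "Women": 45, "Children": 50}
percentages = {"Men": 40, "Women": 20, "Children": 20}

total = sum(observed.values())  # 159

# Expected count = percentage of the total number sold
expected = {name: total * pct / 100 for name, pct in percentages.items()}

# Contribution of each category: (O - E)^2 / E
contributions = {
    name: (observed[name] - expected[name]) ** 2 / expected[name]
    for name in observed
}
chi2 = sum(contributions.values())

for name in observed:
    print(name, round(expected[name], 1), round(contributions[name], 3))
print("chi-square:", round(chi2, 3))  # approximately 15.898
```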

Example 3: The number of times (in millions) the songs by different artists have been streamed is as follows. Find the chi-square value.

| Artists | Observed ($O$) | Expected ($E$) |
|---|---|---|
| Drake | 13 | 11 |
| Travis Scott | 25 | 22 |

Solution 3:

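To find the chi-square let us use the formula $\chi^2 = \sum \dfrac{(O - E)^2}{E}$.

Finding $\dfrac{(O - E)^2}{E}$ for each artist we get:

| Artists | $\dfrac{(O - E)^2}{E}$ |
|---|---|
| Drake | 0.36 |
| Travis Scott | 0.41 |

Hence $\chi^2 = 0.36 + 0.41 = 0.77$

The chi-square value is 0.77.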

Example 4: The votes for prom queen from a sample of students are given below. Prom queen 1 and Prom queen 2 are the two girls who were nominated.

| | Prom queen 1 | Prom queen 2 | Total |
|---|---|---|---|
| Male | 10 | 12 | 22 |
| Female | 14 | 6 | 20 |
| Total | 24 | 18 | 42 |

Find out if gender has anything to do with the prom queen preference.

Solution 4: 

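We first define the hypotheses.

$H_0$: There is no link between gender and prom queen preference.

$H_a$: There is a link between gender and prom queen preference.

Now let's calculate the expected values for each cell using $E = \dfrac{\text{row total} \times \text{column total}}{\text{grand total}}$.

| | Prom queen 1 | Prom queen 2 | Total |
|---|---|---|---|
| Male | 12.57 | 9.42 | 22 |
| Female | 11.42 | 8.57 | 20 |
| Total | 24 | 18 | 42 |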

Now we need to calculate $\dfrac{(O - E)^2}{E}$ for each cell.

| | Prom queen 1 | Prom queen 2 |
|---|---|---|
| Male | 0.525 | 0.706 |
| Female | 0.583 | 0.770 |

Now to calculate $\chi^2$ we need to add all the calculated values in the previous table.

Hence, the value of $\chi^2$ is 2.584.

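Let us find the value of $df$, which is: $df = (2 - 1)(2 - 1) = 1$

We use the $df$ value to determine the critical value using $\alpha = 0.05$ from the table.

We get the critical value to be 3.841.

We can see that our $\chi^2$ value (2.584) is less than the critical value (3.841).

Hence, we fail to reject (accept) our null hypothesis: the data show no significant link between gender and prom queen preference.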

Example 5: In a survey of cars, the numbers of Audi and Jeep cars sampled in two cities, City1 and City2, are as follows. Find the chi-square value and see if there is a link between the city and the type of car used.

| | City1 | City2 | Total |
|---|---|---|---|
| Audi | 45 | 60 | 105 |
| Jeep | 50 | 40 | 90 |
| Total | 95 | 100 | 195 |

Solution 5: 

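We first define the hypotheses.

$H_0$: There is no link between the city and the type of car used.

$H_a$: There is a link between the city and the type of car used.

Now let's calculate the expected values for each cell using $E = \dfrac{\text{row total} \times \text{column total}}{\text{grand total}}$.

| | City1 | City2 | Total |
|---|---|---|---|
| Audi | 51.15 | 53.85 | 105 |
| Jeep | 43.85 | 46.15 | 90 |
| Total | 95 | 100 | 195 |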

Now we need to calculate $\dfrac{(O - E)^2}{E}$ for each cell.

| | City1 | City2 |
|---|---|---|
| Audi | 0.739 | 0.702 |
| Jeep | 0.863 | 0.820 |

Now to calculate $\chi^2$ we need to add all the calculated values in the previous table.

Hence, the value of $\chi^2$ is 3.124.

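Let us find the value of $df$, which is: $df = (2 - 1)(2 - 1) = 1$

We use the $df$ value to determine the critical value using $\alpha = 0.05$ from the table.

We get the critical value to be 3.841.

We can see that our $\chi^2$ value (3.124) is less than the critical value (3.841).

Hence, we fail to reject (accept) our null hypothesis: the data show no significant link between the city and the type of car used.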

FAQs

1. What is a preferred and advisable Chi-square value?

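Ans: There is no single chi-square value that is good in general; the statistic is always judged against the critical value for the given degree of freedom and alpha level. The figure of five refers to the expected frequencies: each expected frequency should be at least five for the chi-square method to be reliable.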

2. What are critical values in statistics?

Ans: Critical value in statistics is a cut-off value that is compared with a test statistic in hypothesis testing to check whether the null hypothesis should be rejected or not.

3. What does it mean when the calculated chi-value is close to the critical value?

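Ans: The result is borderline, so the decision is sensitive to small changes in the data and the hypothesis needs more attention, for example by collecting a larger sample.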

4. Where else is the degree of freedom used in statistics?

Ans: It is an essential idea that appears in many contexts throughout statistics including hypothesis tests, probability distributions, and linear regression.

5. What kind of statistical data is used for chi-square calculation?

Ans: The data used in calculating a chi-square statistic must be random, raw, mutually exclusive, drawn from independent variables, and drawn from a large enough sample. 
