Chi-Square Test

A chi-square test is a statistical test that compares observed results with the results expected under a hypothesis. It is applied to counts (frequencies) of categorical data collected from a random sample.

Chi-square tests are used in hypothesis testing: a hypothesis is stated first and then tested against the data. The test statistic measures the discrepancy between the observed results and the expected results, taking into account the number of categories and the size of the sample.

The test estimates how likely the observed data would be if the null hypothesis were true.

The null hypothesis states that any difference between the observed and the expected results is due to chance alone. It is a statement about the population against which the experimental data are tested.

Understanding the chi-square test

  1. Chi-square distribution
    • When the null hypothesis is true, the sampling distribution of the test statistic is approximately a chi-square distribution.
    • The test is used to determine if there is a significant difference between the observed frequencies and the expected frequencies. In a test of independence, it assesses whether two categorical variables are independent.
  2. The degrees of freedom
    • For a contingency table, it is calculated by the formula:

          \[d=(r-1)(c-1)\]

      Where:
      d is the number of degrees of freedom.
      r is the number of rows.
      c is the number of columns.
  3. The P-value in statistics
    • P stands for probability; the chi-square statistic is used to calculate it.
    • It is the probability, assuming the null hypothesis is true, of getting a result that is the same as or more extreme than the one actually observed.

At the conventional 0.05 significance level, the P-value relates to the decision about the null hypothesis as follows:

If P\le 0.05, the null hypothesis is rejected.
If P>0.05, the null hypothesis is not rejected.
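
The degrees-of-freedom formula and the decision rule can be sketched in Python. This is a minimal illustration rather than a full implementation: the function names are hypothetical, 0.05 is the conventional significance level, and the P-value helper covers only the case d = 1, where the upper-tail probability of the chi-square distribution has the closed form erfc(sqrt(x/2)).

```python
import math


def degrees_of_freedom(rows: int, cols: int) -> int:
    """Degrees of freedom of an r x c contingency table: d = (r - 1)(c - 1)."""
    return (rows - 1) * (cols - 1)


def p_value_one_df(chi_square: float) -> float:
    """Upper-tail probability of a chi-square statistic, valid for d = 1 only."""
    return math.erfc(math.sqrt(chi_square / 2.0))


def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the decision rule: reject H0 when P <= alpha."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"


d = degrees_of_freedom(2, 2)  # a 2 x 2 table has d = 1
p = p_value_one_df(3.0)       # 3.0 is a hypothetical statistic, for illustration
print(d, round(p, 3), decide(p))
```

For tables with more than one degree of freedom, the P-value would normally be read from a chi-square table or computed with a statistics library.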

Applications of the chi-square test

The chi-square test, also called the {{X}^{2}} test, is mainly used in three types of statistical situations. The descriptions below help to decide which one is the appropriate inference procedure for a given set of categorical data.

  • Homogeneity: To determine if two populations with unknown distributions share the same distribution, use the homogeneity test. In this case, the same qualitative survey question or experiment is given to two distinct populations.

The alternate and null hypotheses are:

{{H}_{0}}: the populations follow the same distribution.

{{H}_{a}}: the populations have different distributions.

  • Goodness-of-fit: Use the goodness-of-fit test to determine if a population with an unknown distribution matches a known distribution. In this case, there is only one qualitative survey question or one experiment, with results from a single population. The goodness-of-fit test is widely used to evaluate whether a population is uniform (all outcomes occur with the same frequency), normal, or identical to another population with a known distribution.

The alternate and null hypotheses are:

{{H}_{0}}: the population fits the given distribution.

{{H}_{a}}: the population does not fit the given distribution.

  • Independence: To determine if two variables (factors) are independent or dependent, use the independence test. In this case, a contingency table is built from two qualitative survey questions or experiments. The objective is to determine if the two variables are unrelated (independent) or related (dependent).

The alternate and null hypotheses are:

{{H}_{0}}: the two variables are independent.

{{H}_{a}}: the two variables are dependent.

Properties

Some of the major properties of the chi-square test are:

  • The chi-square distribution curve approaches the normal distribution as the number of degrees of freedom increases.
  • The mean of the distribution is equal to the number of degrees of freedom.
  • The variance of the distribution is equal to two times the number of degrees of freedom.
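
The mean and variance properties can be checked empirically. The sketch below relies on a standard fact that is not stated in the text: a chi-square variable with d degrees of freedom is the sum of d squared independent standard normal variables. The function name, seed, and sample size are arbitrary choices made for illustration.

```python
import random
import statistics


def simulate_chi_square(df: int, n_samples: int, seed: int = 0) -> list:
    """Draw chi-square samples as sums of df squared standard normal draws."""
    rng = random.Random(seed)
    return [
        sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))
        for _ in range(n_samples)
    ]


samples = simulate_chi_square(df=4, n_samples=20000)
# The sample mean should be close to df, the sample variance close to 2 * df.
print(round(statistics.fmean(samples), 2), round(statistics.variance(samples), 2))
```

With d = 4, the sample mean should land near 4 and the sample variance near 8, up to simulation noise.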

Formula

The chi-square statistic measures the difference between the observed values and the expected values.

The formula is:

    \[{{X}^{2}}=\sum{\frac{{{({{O}_{i}}-{{E}_{i}})}^{2}}}{{{E}_{i}}}}\]

Where,

{{O}_{i}} is the observed value,

{{E}_{i}} is the expected value.
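
As a minimal sketch, the statistic can be written as a small Python function. The function name is hypothetical; the sample numbers are the observed and expected values from Example 3 in the solved examples.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Observed and expected values from Example 3.
x2 = chi_square_statistic([13, 25], [11, 22])
print(round(x2, 2))  # 0.77
```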

How to use the chi-square test?

  • Define the hypotheses, i.e., state {{H}_{0}} and {{H}_{a}} for the data.
  • Calculate the expected frequency of each cell using the formula:

    \[Expected\ value=\frac{(row\ total)\times (column\ total)}{total\ number\ of\ observations}\]

  • Calculate \frac{{{({{O}_{i}}-{{E}_{i}})}^{2}}}{{{E}_{i}}} for each cell.
  • {{X}^{2}} is the sum of the calculated \frac{{{({{O}_{i}}-{{E}_{i}})}^{2}}}{{{E}_{i}}} over all cells.
  • Determine the critical value: compute the degrees of freedom d=(r-1)(c-1), choose an appropriate alpha level, and look up the critical value for d in the table of critical values of the chi-square distribution.

The table is as follows:

[Image: table of critical values of the chi-square distribution]

If the obtained value of {{X}^{2}} is higher than the critical value, we reject the null hypothesis; if it is lower, we do not reject the null hypothesis.
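
The procedure can be sketched end to end in Python. This is an illustration under stated assumptions: the function name is hypothetical, the critical values are the standard 0.05-level entries for 1 to 4 degrees of freedom, and the sample table is the observed data from Example 4 in the solved examples. Because the code keeps full precision for the expected frequencies, its statistic (about 2.577) differs slightly from the 2.584 obtained in Example 4 with rounded intermediate values.

```python
# Critical values of the chi-square distribution at alpha = 0.05,
# keyed by degrees of freedom (standard table entries).
CRITICAL_0_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def independence_test(table):
    """Chi-square test of independence for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected value = (row total * column total) / total observations
            expected = row_totals[i] * col_totals[j] / grand_total
            chi_square += (observed - expected) ** 2 / expected

    d = (len(row_totals) - 1) * (len(col_totals) - 1)
    critical = CRITICAL_0_05[d]
    # Reject H0 only when the statistic exceeds the critical value.
    return chi_square, d, critical, chi_square > critical


# Observed counts from Example 4 (rows: male, female).
chi_square, d, critical, reject = independence_test([[10, 12], [14, 6]])
print(round(chi_square, 3), d, critical, reject)
```

The lookup table only covers up to four degrees of freedom; a larger contingency table would need more entries or a statistics library.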

Limitations of the chi-square test

  • The chi-square test can only show whether two variables are associated. It does not follow that there is a causal connection between them.
  • The chi-square test is very sensitive to sample size. With a large enough sample, even small differences and weak associations can appear statistically significant.
  • The chi-square statistic is only applicable to frequency counts. It cannot be applied to data expressed as percentages, proportions, means, or other derived statistics.

Solved examples

Example 1: Calculate the chi-square value for the number of cars owned by each family in an area, using the data given in the table below.

| Number of cars | {{O}_{i}} | {{E}_{i}} |
| --- | --- | --- |
| One car | 25 | 20.4 |
| Two cars | 13 | 11 |
| Three cars | 7 | 5.9 |
| Total | 45 | |

Solution 1:

To find the chi-square let us use the formula:

    \[{{X}^{2}}=\sum{\frac{{{({{O}_{i}}-{{E}_{i}})}^{2}}}{{{E}_{i}}}}\]

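Finding \frac{{{({{O}_{i}}-{{E}_{i}})}^{2}}}{{{E}_{i}}} for each category we get:

| Number of cars | \frac{{{({{O}_{i}}-{{E}_{i}})}^{2}}}{{{E}_{i}}} |
| --- | --- |
| One car | 1.04 |
| Two cars | 0.36 |
| Three cars | 0.21 |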
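
Written out term by term, each of these values comes from substituting the observed and expected values into the formula:

    \[\frac{(25-20.4)^{2}}{20.4}=\frac{21.16}{20.4}\approx 1.04\]

    \[\frac{(13-11)^{2}}{11}=\frac{4}{11}\approx 0.36\]

    \[\frac{(7-5.9)^{2}}{5.9}=\frac{1.21}{5.9}\approx 0.21\]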

Hence {{X}^{2}}=1.04+0.36+0.21=1.61

The chi-square value is 1.61.

Example 2: The number of corn dogs sold during a carnival to men, women, and children, and the expected percentage of the total bought by each group, are as follows. Find the chi-square value.

| Category | {{O}_{i}} | Percentage |
| --- | --- | --- |
| Men | 64 | 40 |
| Women | 45 | 20 |
| Children | 50 | 20 |
| Total | 159 | |

Solution 2:

First, we need to find the expected value for each category.

| Category | {{E}_{i}} |
| --- | --- |
| Men | 159\times 0.4=63.6 |
| Women | 159\times 0.2=31.8 |
| Children | 159\times 0.2=31.8 |

Now let us use the formula to calculate the value of \frac{{{({{O}_{i}}-{{E}_{i}})}^{2}}}{{{E}_{i}}} for each category.

| Category | \frac{{{({{O}_{i}}-{{E}_{i}})}^{2}}}{{{E}_{i}}} |
| --- | --- |
| Men | 0.003 |
| Women | 5.479 |
| Children | 10.416 |

Now we find the sum of the calculated values to get {{X}^{2}}.

Hence, {{X}^{2}}=0.003+5.479+10.416=15.898.
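
The same calculation can be sketched in Python, deriving the expected counts from the percentages. The function name is hypothetical. Evaluated at full precision, the three terms come to about 0.003, 5.479, and 10.416, for a total of about 15.898.

```python
def goodness_of_fit(observed, proportions):
    """Chi-square statistic with expected counts = total * proportion."""
    total = sum(observed)
    expected = [total * p for p in proportions]
    terms = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
    return expected, terms, sum(terms)


# Men, women, children from Example 2 (percentages written as proportions).
expected, terms, x2 = goodness_of_fit([64, 45, 50], [0.4, 0.2, 0.2])
print([round(e, 1) for e in expected])
print([round(t, 3) for t in terms], round(x2, 3))
```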

Example 3: The number of times (in millions) that songs by different artists have been streamed is as follows. Find the chi-square value.

| Artists | {{O}_{i}} | {{E}_{i}} |
| --- | --- | --- |
| Drake | 13 | 11 |
| Travis Scott | 25 | 22 |
Solution 3:

To find the chi-square let us use the formula:

    \[{{X}^{2}}=\sum{\frac{{{({{O}_{i}}-{{E}_{i}})}^{2}}}{{{E}_{i}}}}\]

Finding \frac{{{({{O}_{i}}-{{E}_{i}})}^{2}}}{{{E}_{i}}} for each artist we get:

| Artists | \frac{{{({{O}_{i}}-{{E}_{i}})}^{2}}}{{{E}_{i}}} |
| --- | --- |
| Drake | 0.36 |
| Travis Scott | 0.41 |

Hence {{X}^{2}}=0.36+0.41=0.77

The chi-square value is 0.77.

Example 4: A sample of votes for prom queen is given below. Prom queen 1 and prom queen 2 are the two nominees.

| | Prom queen 1 | Prom queen 2 | Total |
| --- | --- | --- | --- |
| Male | 10 | 12 | 22 |
| Female | 14 | 6 | 20 |
| Total | 24 | 18 | 42 |

Find out if gender has anything to do with the prom queen preference.

Solution 4:

We first define a hypothesis.

{{H}_{0}}: There is no link between gender and prom queen preference.

{{H}_{a}}: There is a link between gender and prom queen preference.

Now let’s calculate the expected values for each cell using:

    \[Expected\ value=\frac{(row\ total)\times (column\ total)}{total\ number\ of\ observations}\]

| | Prom queen 1 | Prom queen 2 | Total |
| --- | --- | --- | --- |
| Male | 12.57 | 9.42 | 22 |
| Female | 11.42 | 8.57 | 20 |
| Total | 24 | 18 | 42 |

Now we need to calculate \frac{{{({{O}_{i}}-{{E}_{i}})}^{2}}}{{{E}_{i}}} for each cell.

| | Prom queen 1 | Prom queen 2 |
| --- | --- | --- |
| Male | 0.525 | 0.706 |
| Female | 0.583 | 0.770 |

Now to calculate {{X}^{2}} we need to add all the calculated values in the previous table.

Hence, the value of {{X}^{2}} is 0.525+0.706+0.583+0.770=2.584.

Let us find the value of d, which is:

    \[d=(r-1)(c-1)\]

    \[d=(2-1)(2-1)=1\]

We use the d value to determine the critical value using \alpha =0.05 from the table.

We get the critical value to be 3.841.

We can see that our value (2.584) is less than the critical value (3.841).

Hence, we do not reject the null hypothesis: the sample gives no significant evidence of a link between gender and prom queen preference.

Example 5: A survey sampled the number of Audi and Jeep cars in two cities, City1 and City2, with the results below. Find the chi-square value and determine whether there is a link between the city and the type of car used.

| | City1 | City2 | Total |
| --- | --- | --- | --- |
| Audi | 45 | 60 | 105 |
| Jeep | 50 | 40 | 90 |
| Total | 95 | 100 | 195 |

Solution 5:

We first define a hypothesis.

{{H}_{0}}: There is no link between the city and the type of car used.

{{H}_{a}}: There is a link between the city and the type of car used.

Now let’s calculate the expected values for each cell using:

    \[Expected\ value=\frac{(row\ total)\times (Column\ total)}{total\ number\ of\ observations}\]

| | City1 | City2 | Total |
| --- | --- | --- | --- |
| Audi | 51.15 | 53.85 | 105 |
| Jeep | 43.85 | 46.15 | 90 |
| Total | 95 | 100 | 195 |
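
For instance, each expected value in this table is obtained by substituting the corresponding row and column totals into the formula:

    \[E_{Audi,\ City1}=\frac{105\times 95}{195}\approx 51.15\]

    \[E_{Audi,\ City2}=\frac{105\times 100}{195}\approx 53.85\]

    \[E_{Jeep,\ City1}=\frac{90\times 95}{195}\approx 43.85\]

    \[E_{Jeep,\ City2}=\frac{90\times 100}{195}\approx 46.15\]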

Now we need to calculate \frac{{{({{O}_{i}}-{{E}_{i}})}^{2}}}{{{E}_{i}}} for each cell.

| | City1 | City2 |
| --- | --- | --- |
| Audi | 0.739 | 0.702 |
| Jeep | 0.863 | 0.820 |

Now to calculate {{X}^{2}} we need to add all the calculated values in the previous table.

Hence, the value of {{X}^{2}} is 0.739+0.702+0.863+0.820=3.124.

Let us find the value of d, which is:

    \[d=(r-1)(c-1)\]

    \[d=(2-1)(2-1)=1\]

We use the d value to determine the critical value using \alpha =0.05 from the table.

We get the critical value to be 3.841.

We can see that our value (3.124) is less than the critical value (3.841).

Hence, we do not reject the null hypothesis: the sample gives no significant evidence of a link between the city and the type of car used.

Conclusion

The chi-square test is an important tool in the statistical analysis of categorical data and is routinely used to compare observed counts with expected values. We learned how the chi-square distribution works and how to find the related values. We also learned how the chi-square value and the critical value are related.

Frequently asked questions (FAQs)

What is a preferred and advisable Chi-square value?

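There is no single preferred chi-square value: whether a value is significant depends on the degrees of freedom and the chosen significance level. The figure of five refers to a common rule of thumb for the expected frequencies, which should be at least five in each cell for the chi-square method to be reliable.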

What are critical values in statistics?

Critical value in statistics is a cut-off value that is compared with a test statistic in hypothesis testing to check whether the null hypothesis should be rejected or not.

What does it mean when the calculated chi-value is close to the critical value?

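The result is borderline: the evidence against the null hypothesis is close to the chosen significance threshold, so the conclusion should be interpreted with caution and the hypothesis needs more attention, for example by collecting more data.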

Where else is the degree of freedom used in statistics?

It is an essential idea that appears in many contexts throughout statistics including hypothesis tests, probability distributions, and linear regression.

What kind of statistical data is used for chi-square calculation?

The data used in calculating a chi-square statistic must be random, raw, mutually exclusive, drawn from independent variables, and drawn from a large enough sample. 
