Advanced Statistics for Business

Non-Parametric Methods

Module 1

Core Concepts

  • Statistical tests are classified into parametric and non-parametric based on assumptions about the underlying population distribution.
  • Parametric tests assume data follows a specific distribution (e.g., normal) and estimate population parameters (mean, variance).
  • Non-parametric tests are distribution-free, making them robust alternatives when parametric assumptions are violated (e.g., non-normality, outliers, small samples).
  • Non-parametric methods often utilize data ranks rather than actual values.
  • The choice between them depends on the data: parametric tests yield parameter estimates (and greater power when their assumptions hold), while non-parametric tests remain reliable when those assumptions are violated.
  • Both test types aid in summarizing data, identifying relationships, and testing hypotheses.

Definitions

  • Parametric Tests: Statistical methods assuming data originates from a specific probability distribution (e.g., normal) and involving the estimation of population parameters.
  • Non-Parametric Tests: Statistical methods that do not rely on assumptions about the population distribution; often called distribution-free tests.
  • Population Parameters: Numerical characteristics of a population (e.g., mean (μ), standard deviation (σ), proportion (p)).
  • Ranks: The relative position or order of data points when sorted.

Chi-Square Test of Independence

Chi-Square Test of Independence - Definition

A non-parametric method used to determine if a statistically significant association exists between two categorical variables.

Chi-Square Test of Independence - Key Insights

  • Tests the null hypothesis (H0) that the two variables are independent against the alternative hypothesis (H1) that they are dependent.
  • Compares observed frequencies (from sample data in a contingency table) with expected frequencies (calculated assuming independence).
  • The test statistic quantifies the discrepancy between observed and expected counts.
  • Decision (reject/fail to reject H0) is based on comparing the test statistic to the chi-square distribution with (rows-1) * (columns-1) degrees of freedom.
  • Continuous variables must be discretized (grouped into categories) before applying this test.
  • Independence implies that the occurrence of one variable's category does not influence the probability of the other variable's category occurring.

Chi-Square Test of Independence - Examples

  • Examining if airline ticket class (economy, business, first) is associated with travel type (domestic, international).
  • Assessing whether salary band (salaries grouped into categories) is associated with company department.
  • Testing if employee satisfaction level is independent of categorized average monthly hours worked.

Chi-Square Test of Independence - Formula

  • Chi-Square Statistic (χ²): ∑ [(Oᵢⱼ - Eᵢⱼ)² / Eᵢⱼ]
    • Oᵢⱼ = Observed frequency in cell (row i, column j)
    • Eᵢⱼ = Expected frequency in cell (row i, column j)
  • Expected Frequency (Eᵢⱼ): (Row i Total * Column j Total) / Grand Total
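
To make the arithmetic concrete, the sketch below runs the test in Python on a hypothetical ticket-class vs. travel-type contingency table (counts are illustrative, not course data); scipy.stats.chi2_contingency returns the χ² statistic, p-value, degrees of freedom, and the table of expected frequencies computed from the row, column, and grand totals.

    import numpy as np
    from scipy.stats import chi2_contingency

    # Rows: travel type (domestic, international); columns: ticket class
    # (economy, business, first). Counts are hypothetical.
    observed = np.array([
        [120, 30, 10],   # domestic
        [ 80, 60, 20],   # international
    ])

    chi2, p_value, dof, expected = chi2_contingency(observed)
    print(f"chi2 = {chi2:.3f}, dof = {dof}, p-value = {p_value:.4f}")
    print("Expected counts (row total * column total / grand total):")
    print(expected)

    alpha = 0.05
    if p_value < alpha:
        print("Reject H0: ticket class and travel type appear to be associated.")
    else:
        print("Fail to reject H0: no evidence of an association.")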

Goodness of Fit Test

Goodness of Fit Test - Definition

A statistical test to assess how well observed sample data fits a hypothesized theoretical probability distribution (e.g., uniform, Poisson, normal).

Goodness of Fit Test - Key Insights

  • Compares the observed frequency distribution from the sample against the expected frequency distribution derived from the hypothesized theoretical distribution (H0).
  • The alternative hypothesis (H1) states the data does not follow the specified distribution.
  • Requires grouping continuous data into intervals/categories to obtain observed frequencies for comparison.
  • Expected frequencies are calculated based on the probability of each category under the assumed distribution.
  • Uses a chi-square test statistic to measure the discrepancy between observed and expected frequencies.

Goodness of Fit Test - Examples

  • Testing if employee satisfaction scores are uniformly distributed between 0 and 1.
  • Determining if the number of projects assigned per employee follows a Poisson distribution.
  • Validating if historical sales data conforms to a specific distribution to inform forecasting models.

Goodness of Fit Test - Formula

  • Chi-Square Statistic (χ²): ∑ [(Observed Frequency - Expected Frequency)² / Expected Frequency] (Summation is across all categories/intervals)
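
As a minimal sketch of the uniformity example above (using simulated scores rather than course data), scipy.stats.chisquare can compare observed interval counts against the equal expected counts implied by a uniform distribution on [0, 1]:

    import numpy as np
    from scipy.stats import chisquare

    rng = np.random.default_rng(42)
    scores = rng.uniform(0, 1, size=200)      # hypothetical satisfaction scores in [0, 1]

    # Group the continuous scores into 5 equal-width intervals (observed frequencies)
    observed, _ = np.histogram(scores, bins=5, range=(0, 1))

    # Under H0 (uniform on [0, 1]), each interval has probability 1/5
    expected = np.full(5, scores.size / 5)

    chi2, p_value = chisquare(f_obs=observed, f_exp=expected)
    print(f"chi2 = {chi2:.3f}, p-value = {p_value:.4f}")
    # A small p-value would lead us to reject H0 that the scores are uniform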

Wilcoxon Signed Rank Test

Wilcoxon Signed Rank Test - Definition

A non-parametric test used for one-sample or paired two-sample scenarios to assess hypotheses about the median.

  • One-Sample: Tests if the sample median differs significantly from a specific hypothesized median value.
  • Paired Two-Sample: Tests if the median of the differences between paired observations is significantly different from zero.

Wilcoxon Signed Rank Test - Key Insights

  • Does not assume data follows a normal distribution; suitable for non-normal data or small sample sizes.
  • Based on ranking the absolute values of the differences (observation vs. hypothesized median, or paired differences).
  • Uses the signs (+/-) of the original differences, applied to the ranks.
  • The test statistic (W) is typically the sum of positive ranks (W+) or negative ranks (W-).
  • If H0 (no difference in medians) is true, W+ and W- are expected to be similar.
  • For large samples (n > 30), the distribution of the test statistic (W+) is approximately normal, allowing the use of a Z-score.

Steps for Calculation

  1. Compute differences: (Observation - Hypothesized Median) or (Observation Pair 1 - Observation Pair 2).
  2. Discard zero differences.
  3. Rank the absolute values of the non-zero differences, assigning average ranks for ties.
  4. Apply the original sign (+ or -) of the difference to its corresponding rank.
  5. Sum the ranks with positive signs (W+) and/or the ranks with negative signs (W-).
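
A minimal sketch of these five steps for the one-sample case is shown below; the sample values and hypothesized median are hypothetical, and scipy.stats.rankdata handles the average-rank treatment of ties.

    import numpy as np
    from scipy.stats import rankdata

    x = np.array([5.1, 4.8, 6.2, 5.9, 4.5, 5.0, 6.8, 5.4])   # hypothetical sample
    hypothesized_median = 5.0

    diffs = x - hypothesized_median              # Step 1: differences
    diffs = diffs[diffs != 0]                    # Step 2: discard zero differences
    ranks = rankdata(np.abs(diffs))              # Step 3: rank |differences| (ties -> average rank)
    signed_ranks = np.sign(diffs) * ranks        # Step 4: re-attach the original signs

    w_plus = signed_ranks[signed_ranks > 0].sum()     # Step 5: sum of positive ranks (W+)
    w_minus = -signed_ranks[signed_ranks < 0].sum()   #         sum of negative ranks (W-)
    print(f"W+ = {w_plus}, W- = {w_minus}")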

Formula (Large Sample Approximation, n > 30)

  • Mean of W+: μ(W+) = n(n + 1) / 4
  • Standard Error of W+ (standard no-ties form): σ(W+) = √[n(n + 1)(2n + 1) / 24]
  • Z-Score: Z = (W+ - μ(W+)) / σ(W+) (when tied ranks are present, a small correction reduces the standard error)
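
Assuming no ties and a sample large enough for the approximation (n > 30), the Z-score can be computed as sketched below with simulated data; scipy.stats.wilcoxon performs the same test directly and can be used to cross-check the p-value.

    import numpy as np
    from scipy.stats import norm, rankdata

    rng = np.random.default_rng(0)
    x = rng.normal(loc=5.3, scale=1.0, size=40)   # hypothetical sample, n = 40 (> 30)
    hypothesized_median = 5.0

    diffs = x - hypothesized_median
    diffs = diffs[diffs != 0]
    n = diffs.size
    ranks = rankdata(np.abs(diffs))
    w_plus = ranks[diffs > 0].sum()               # sum of positive ranks (W+)

    mu_w = n * (n + 1) / 4                                 # mean of W+ under H0
    sigma_w = np.sqrt(n * (n + 1) * (2 * n + 1) / 24)      # standard error (no-ties form)
    z = (w_plus - mu_w) / sigma_w
    p_value = 2 * norm.sf(abs(z))                          # two-sided p-value
    print(f"W+ = {w_plus:.1f}, Z = {z:.3f}, p-value = {p_value:.4f}")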

Conclusion

Parametric and non-parametric tests offer distinct approaches to hypothesis testing, differentiated primarily by their assumptions regarding data distribution. While parametric tests estimate population parameters assuming specific distributions, non-parametric methods like the Chi-Square tests (for independence and goodness of fit) and the Wilcoxon Signed Rank test provide robust, distribution-free alternatives. These non-parametric techniques are particularly valuable for analyzing categorical data, assessing distributional fit, or comparing medians when parametric assumptions are untenable, offering flexible tools for data analysis in diverse real-world scenarios.
