Statistics Formulas

Mean

The mean (average) is calculated by summing up all the values in a dataset and then dividing the sum by the total number of values. It represents the central tendency of the data.

Formula: Mean = (Σx) / n

Where:

Mean is the average
Σx is the sum of all values in the dataset
n is the total number of values in the dataset
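As a minimal sketch of the formula above, the function below divides the sum of the values by their count. The function name `mean` and the sample `data` list are illustrative assumptions, not part of the original formulas.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data
print(mean(data))  # 40 / 8 = 5.0
```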

Median

The median is the middle value in a dataset when the values are arranged in ascending order.

If there is an even number of values, the median is the average of the two middle values.

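Formula (Odd number of values): Median = Value at position (n + 1) / 2

Formula (Even number of values): Median = (Value at position n/2 + Value at position (n/2 + 1)) / 2

Where:

Positions are counted from 1 in the sorted dataset
n is the total number of values in the dataset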
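The odd and even cases can be sketched in a few lines. This is one possible implementation; the name `median` and the example lists are assumptions for illustration. Note that the formulas count positions from 1, while Python lists are indexed from 0.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one value")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: 1-based position (n + 1) / 2 is 0-based index n // 2.
        return ordered[mid]
    # Even n: 1-based positions n/2 and n/2 + 1 are 0-based indices mid - 1 and mid.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 1, 3]))     # sorted: 1, 3, 7 -> 3
print(median([7, 1, 3, 5]))  # sorted: 1, 3, 5, 7 -> (3 + 5) / 2 = 4.0
```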

Minimum

The minimum is the smallest value in a dataset.

Formula: Minimum = Smallest Value

Maximum

The maximum is the largest value in a dataset.

Formula: Maximum = Largest Value

Range

The range is the difference between the maximum and minimum values in a dataset. It provides a measure of the spread or variability in the data.

Formula: Range = Maximum - Minimum

Midrange

The midrange is the average of the maximum and minimum values in a dataset.

Formula: Midrange = (Maximum + Minimum) / 2
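The minimum, maximum, range, and midrange can be computed together, since the last two depend only on the first two. A short sketch using Python's built-in `min` and `max`, with an illustrative `data` list:

```python
data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data

minimum = min(data)                  # smallest value: 2
maximum = max(data)                  # largest value: 9
data_range = maximum - minimum       # 9 - 2 = 7
midrange = (maximum + minimum) / 2   # (9 + 2) / 2 = 5.5

print(minimum, maximum, data_range, midrange)
```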

Count

The count represents the total number of values in a dataset.

Sum

The sum is the total of all values in a dataset.

Formula: Sum = Σx

Where:

Σx is the sum of all values in the dataset
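In Python, the count and the sum map directly onto the built-ins `len` and `sum`. The sketch below also shows `math.fsum`, which is an optional alternative for floating-point data because it limits accumulated rounding error; the `data` list is illustrative.

```python
import math

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data

count = len(data)  # total number of values: 8
total = sum(data)  # Σx: 40

# For floats, math.fsum keeps track of partial sums to avoid rounding drift.
precise_total = math.fsum([0.1] * 10)

print(count, total, precise_total)
```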

Percentile

A percentile represents the value below which a given percentage of the data falls. It is often used to identify specific data points in a distribution.
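Several conventions exist for computing percentiles, and they can give slightly different answers on small datasets. The sketch below assumes the nearest-rank convention, which always returns a value that is actually in the dataset; the function name and the example data are illustrative.

```python
import math


def percentile_nearest_rank(values, p):
    """Smallest value with at least p percent of the data at or below it."""
    if not values:
        raise ValueError("percentile requires at least one value")
    if not 0 < p <= 100:
        raise ValueError("p must be in the interval (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank in the sorted data
    return ordered[rank - 1]


data = [15, 20, 35, 40, 50]  # illustrative data
print(percentile_nearest_rank(data, 30))   # rank = ceil(1.5) = 2 -> 20
print(percentile_nearest_rank(data, 50))   # rank = ceil(2.5) = 3 -> 35
print(percentile_nearest_rank(data, 100))  # rank = 5 -> 50
```

Interpolating conventions, such as the ones used by many spreadsheet and statistics packages, may return values that fall between two data points.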

Quartile

Quartiles are the three values (Q1, Q2, and Q3) that divide a sorted dataset into four equal parts, with each part containing 25% of the data. Q2 is the median. Quartiles are often used to assess the spread of data.

Sum of Squares

The sum of squares is the sum of the squares of the differences between each data point and the mean. It is a key component in calculating variance and standard deviation.

Formula: Sum of Squares = Σ(x - Mean)²

Where:

Σ represents the summation symbol
x is each data point
Mean is the mean (average) of the dataset
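A minimal sketch of the sum of squares: compute the mean, then add up the squared deviation of each data point from it. The function name and `data` list are illustrative assumptions.

```python
def sum_of_squares(values):
    """Sum of squared deviations of each value from the mean."""
    if not values:
        raise ValueError("sum_of_squares requires at least one value")
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean is 5
# Squared deviations: 9, 1, 1, 1, 0, 0, 4, 16
print(sum_of_squares(data))  # 32.0
```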

Standard Deviation

The standard deviation measures the amount of variation or dispersion in a dataset. It indicates how spread out the data points are from the mean.

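Formula (Sample Standard Deviation): Standard Deviation = √(Σ(x - Mean)² / (n - 1))

Where:

√ represents the square root
Σ represents the summation symbol
x is each data point
Mean is the mean (average) of the dataset
n is the total number of values in the sample

Note: This is the sample form, which divides by (n - 1). For an entire population, divide by N, the total number of values in the population, instead.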
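The sketch below computes the standard deviation by hand with the (n - 1) divisor and compares it against `statistics.stdev` from Python's standard library, which uses the same divisor. The `data` list is illustrative.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data

n = len(data)
mean = sum(data) / n
squared_deviations = sum((x - mean) ** 2 for x in data)  # 32.0

manual_std = math.sqrt(squared_deviations / (n - 1))  # sqrt(32 / 7)
library_std = statistics.stdev(data)                  # also divides by n - 1

print(round(manual_std, 4))  # 2.1381
print(math.isclose(manual_std, library_std))
```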

Variance

The variance is a measure of the spread or dispersion of a dataset. It is the average of the squared differences between each data point and the mean.

Formula (Population Variance): Variance (σ²) = Σ(x - Mean)² / N

Where:

Σ represents the summation symbol
x is each data point
Mean is the mean (average) of the dataset
N is the total number of values in the population

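Note: When working with a sample of data, use the sample variance formula, which divides by (N - 1) instead of N, where N is then the number of values in the sample. This adjustment, known as Bessel's correction, compensates for the fact that deviations measured from the sample mean tend to underestimate the population variance.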
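The only difference between the population and sample forms is the divisor. A minimal sketch with a hypothetical `sample` flag to switch between them (the function name, the flag, and the data are assumptions for illustration):

```python
def variance(values, sample=False):
    """Population variance by default; sample variance when sample=True."""
    n = len(values)
    if n < (2 if sample else 1):
        raise ValueError("not enough values to compute the variance")
    m = sum(values) / n
    squared_deviations = sum((x - m) ** 2 for x in values)
    divisor = n - 1 if sample else n
    return squared_deviations / divisor


data = [2, 4, 4, 4, 5, 5, 7, 9]  # sum of squared deviations is 32
print(variance(data))               # 32 / 8 = 4.0
print(variance(data, sample=True))  # 32 / 7, roughly 4.5714
```

The standard library offers the same pair as `statistics.pvariance` and `statistics.variance`.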

Z-Score

The Z-score measures how many standard deviations a data point lies above or below the mean. It is used to standardize data and assess a value's position relative to the mean. If the data is normally distributed, the resulting Z-scores follow a standard normal distribution.

Formula: Z-Score = (x - Mean) / Standard Deviation

Where:

x is the data point
Mean is the mean (average) of the dataset
Standard Deviation is the standard deviation of the dataset
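A short sketch of the Z-score calculation. The function name is illustrative, and the example assumes a dataset with a mean of 5 and a standard deviation of 2.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if std_dev == 0:
        raise ValueError("standard deviation must be non-zero")
    return (x - mean) / std_dev


# Assumed summary statistics: mean = 5, standard deviation = 2
print(z_score(9, 5, 2))  # (9 - 5) / 2 = 2.0
print(z_score(2, 5, 2))  # (2 - 5) / 2 = -1.5
print(z_score(5, 5, 2))  # a value equal to the mean scores 0.0
```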

Interquartile Range (IQR)

The interquartile range is the range between the first quartile (Q1, the 25th percentile) and the third quartile (Q3, the 75th percentile) in a dataset. It provides a measure of the spread of the middle 50% of the data.

Formula: IQR = Q3 - Q1

Where:

Q1 is the first quartile (25th percentile)
Q3 is the third quartile (75th percentile)
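One way to obtain Q1 and Q3 in Python is `statistics.quantiles` (available since Python 3.8) with `n=4`, which returns the three quartile cut points. The sketch relies on that function's default "exclusive" interpolation method; other quartile conventions can give different values for the same data, and the `data` list is illustrative.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8]  # illustrative data

# n=4 asks for the three cut points that split the data into four parts.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(q1, q2, q3)  # 2.25 4.5 6.75
print(iqr)         # 6.75 - 2.25 = 4.5
```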

Coefficient of Variation (CV)

The coefficient of variation is a relative measure of variability and is expressed as a percentage. It is used to compare the standard deviation of data to its mean, making it useful for assessing relative variability between datasets with different means.

Formula: CV = (Standard Deviation / Mean) * 100%
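The sketch below illustrates why the coefficient of variation is useful: two datasets with the same standard deviation but different means get very different CV values. It assumes the sample standard deviation (`statistics.stdev`); the function name and both datasets are illustrative.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    m = statistics.mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / m * 100


small_values = [10, 20, 30]     # mean 20, standard deviation 10
large_values = [110, 120, 130]  # mean 120, standard deviation 10

print(coefficient_of_variation(small_values))  # 10 / 20 * 100, about 50%
print(coefficient_of_variation(large_values))  # 10 / 120 * 100, about 8.33%
```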

Skewness

Skewness measures the asymmetry of the probability distribution of a real-valued random variable. It indicates whether the data is skewed to the right or left.

A positive skew indicates that the tail of the distribution extends further to the right (right-skewed), meaning the more extreme values lie on the right side of the distribution.

A negative skew indicates that the tail of the distribution extends further to the left (left-skewed), meaning the more extreme values lie on the left side of the distribution.

Kurtosis

Kurtosis measures the "tailedness" of the probability distribution of a real-valued random variable. It indicates the presence and degree of outliers in the data.

Kurtosis is usually reported as excess kurtosis, which is the kurtosis minus 3, the kurtosis of a normal distribution.

A positive excess kurtosis (leptokurtic) indicates heavier tails than a normal distribution, meaning the data has more extreme values.

A negative excess kurtosis (platykurtic) indicates lighter tails than a normal distribution, meaning the data has fewer extreme values.
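No formulas are given here for skewness or kurtosis, so the sketch below assumes one common choice: the moment-based population estimators, where skewness is the third central moment divided by the second raised to the power 1.5, and excess kurtosis is the fourth central moment divided by the square of the second, minus 3. Statistical packages often apply additional sample-size corrections, so their results may differ. All names and datasets are illustrative.

```python
def central_moment(values, k):
    """Average of the k-th powers of the deviations from the mean."""
    n = len(values)
    m = sum(values) / n
    return sum((x - m) ** k for x in values) / n


def skewness(values):
    """Moment-based skewness: m3 / m2^1.5 (assumed convention)."""
    m2 = central_moment(values, 2)
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return central_moment(values, 3) / m2 ** 1.5


def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2^2 - 3 (assumed convention)."""
    m2 = central_moment(values, 2)
    if m2 == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    return central_moment(values, 4) / m2 ** 2 - 3


symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 2, 10]
left_tailed = [-x for x in right_tailed]

print(skewness(symmetric))         # 0.0, no asymmetry
print(skewness(right_tailed) > 0)  # True, long right tail
print(skewness(left_tailed) < 0)   # True, long left tail
print(excess_kurtosis(symmetric))  # about -1.3, lighter tails than a normal distribution
```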

Covariance

Covariance measures the degree to which two variables change together. It indicates whether the variables have a positive or negative linear relationship.

Formula (Sample Covariance): Cov(X, Y) = Σ((X - Mean(X)) * (Y - Mean(Y))) / (n - 1)

Where:

Σ represents the summation symbol
X and Y are the paired values of the two variables
Mean(X) and Mean(Y) are the means of X and Y, respectively
n is the total number of paired observations

If the covariance is positive, it indicates a positive relationship (X tends to increase when Y increases).

If the covariance is negative, it indicates a negative relationship (X tends to decrease when Y increases).
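A minimal sketch of the covariance formula with the (n - 1) divisor. The function name and the paired `hours` and `scores` lists are hypothetical values chosen for illustration.

```python
def covariance(xs, ys):
    """Sample covariance of two equally long sequences of paired values."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("covariance requires at least two pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return products / (n - 1)


hours = [1, 2, 3, 4, 5]    # hypothetical X values, mean 3
scores = [2, 4, 5, 4, 5]   # hypothetical Y values, mean 4

print(covariance(hours, scores))        # 6 / 4 = 1.5, positive relationship
print(covariance(hours, scores[::-1]))  # -6 / 4 = -1.5, negative relationship
```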

Correlation Coefficient (Pearson's r)

The correlation coefficient measures the strength and direction of the linear relationship between two variables. It is a normalized version of covariance that ranges from -1 to 1.

Formula: r = Cov(X, Y) / (Standard Deviation(X) * Standard Deviation(Y))

Where:

Cov(X, Y) is the covariance between X and Y
Standard Deviation(X) and Standard Deviation(Y) are the standard deviations of X and Y, respectively

If |r| is close to 1, it indicates a strong linear relationship, with positive r indicating a positive correlation and negative r indicating a negative correlation. If |r| is close to 0, it indicates a weak or no linear relationship.
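The sketch below follows the formula step by step: it computes the sample covariance and the two sample standard deviations, then divides. The function name and the example data are illustrative assumptions.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: sample covariance divided by the product of the sample standard deviations."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("correlation requires at least two pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    if std_x == 0 or std_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / (std_x * std_y)


xs = [1, 2, 3, 4, 5]
noisy = [2, 4, 5, 4, 5]           # loosely increasing with xs
exact = [2 * x + 1 for x in xs]   # perfectly linear in xs

print(round(pearson_r(xs, noisy), 4))  # 0.7746
print(round(pearson_r(xs, exact), 4))  # 1.0
```

Because the same (n - 1) divisor appears in the covariance and in both standard deviations, it cancels out, so r comes out the same whether the sample or population forms are used consistently.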

Last Updated : 03 October, 2024


Sandeep Bhandari

Sandeep Bhandari holds a Bachelor of Engineering in Computers from Thapar University (2006). He has 20 years of experience in the technology field. He has a keen interest in various technical fields, including database systems, computer networks, and programming. You can read more about him on his bio page.
