Key Terminology for Statistics
Introduction
Statistics is the science of collecting, analyzing and interpreting numerical data in order to draw conclusions about events of interest. It is a very useful tool in business. For example, businesses use statistics to find out where their potential customers are, and therefore where advertisements should be placed. In addition, because the interests and preferences of consumers change continuously, businesses can use up-to-date statistical results to revise their strategies and increase their profits.
-
Addition rule- This is the formula used to find the probability that at least one of two events under study will happen. If the events are A and B, then P(A ∪ B) = P(A) + P(B) - P(A ∩ B), where P(A) is the probability that A occurs, P(B) is the probability that B occurs, P(A ∩ B) is the probability that both occur and P(A ∪ B) is the probability that A or B (or both) occurs. If A and B are mutually exclusive, P(A ∩ B) = 0 and the rule reduces to P(A ∪ B) = P(A) + P(B).
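As a minimal sketch, assuming a single roll of a fair six-sided die with equally likely outcomes, the general form of the rule, P(A ∪ B) = P(A) + P(B) - P(A ∩ B), can be checked by counting outcomes directly. The event names `A` and `B` and the helper `prob` are illustrative only; the simpler form without the overlap term holds only when the events are mutually exclusive.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die (equally likely outcomes assumed).
sample_space = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event (a subset of the sample space) by counting outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

A = {2, 4, 6}  # "the roll is even"
B = {4, 5, 6}  # "the roll is greater than 3"

# General addition rule: subtract the overlap so it is not counted twice.
p_union = prob(A) + prob(B) - prob(A & B)

# Counting the union directly gives the same answer.
assert p_union == prob(A | B)
print(p_union)  # 2/3
```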
-
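Alternative hypothesis- This is the hypothesis that is accepted if the null hypothesis is rejected. It usually states that there is an effect or a difference, whereas the null hypothesis states that there is none. A hypothesis test uses sample data to decide whether there is enough evidence to reject the null hypothesis in favor of the alternative hypothesis.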
-
Bar chart: This is a means of representing the results obtained from an experiment by means of rectangles (bars). It has two axes: one shows the categories, and the other (usually the vertical axis) shows the frequency or count of each category. The bars all have the same width and are separated by equal gaps.
-
Bayes’ theorem- This is written thus: P(A|B) = P(B|A) · P(A) / P(B), where P(B|A) is the conditional probability of B given A, P(A|B) is the conditional probability of A given B, P(A) is the probability of A without regard to B (the prior probability) and P(B) is the probability of B without regard to A, with P(B) > 0.
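A minimal sketch of Bayes’ theorem applied to a hypothetical screening test. All three input probabilities are made-up illustration values, not data from any real test. P(B), the overall probability of a positive result, is found by adding the two ways a positive result can happen (with or without the disease).

```python
def bayes(p_b_given_a, p_a, p_b):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# Hypothetical numbers chosen only for illustration.
p_disease = 0.01            # P(A): prior probability of having the disease
p_pos_given_disease = 0.99  # P(B|A): probability of a positive test given the disease
p_pos_given_healthy = 0.05  # P(B|not A): false-positive rate

# P(B): total probability of a positive test.
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

posterior = bayes(p_pos_given_disease, p_disease, p_pos)
print(round(posterior, 4))  # 0.1667
```

Even with a fairly accurate test, a positive result here gives only about a 1 in 6 chance of disease, because the disease is rare.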
-
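Box and whisker plot- This is a graphical representation that summarizes the data collected from an experiment. It shows the median, the lower and upper quartiles (the ends of the box, whose length is the interquartile range) and the minimum and maximum values (the ends of the whiskers, unless outliers are plotted separately). It is normally used to represent the 5-Number summary.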
-
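Bias- This is a systematic error that pushes results away from the true value. An estimator is biased if its average value over many repeated samples differs from the true value of the parameter it estimates, and a sampling method is biased if it systematically favors some members of the population over others.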
-
Categorical data- This is data whose values are categories or groups rather than numerical measurements. For example, if the boys at Harvard University are classified by year of study, from first year up to final year, the year of study is categorical, and the data can be summarized as the number of boys in each year.
-
Cluster sampling- This is a sampling technique used when the population is large or spread out. The population is divided into smaller groups (clusters), such as schools or towns, a random selection of whole clusters is made, and then all or some of the members of the selected clusters are studied.
-
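Composite hypothesis- This is a hypothesis that does not completely specify the distribution of the population but allows a range of values for a parameter. For example, H: μ > 50 is composite because it covers many possible values of the mean.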
-
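Coefficient of variation- This is the standard deviation expressed as a percentage of the mean: CV = (standard deviation / mean) × 100%. Because it has no units, it can be used to compare the spread of data sets that have different means or are measured in different units.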
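A minimal sketch of the coefficient of variation, assuming the sample standard deviation (dividing by n - 1) is used; some texts use the population standard deviation instead. The two data sets are hypothetical and have the same spread but different means, which is exactly the case the CV is designed to compare.

```python
import statistics

def coefficient_of_variation(data):
    """CV as a percentage: sample standard deviation divided by the mean, times 100."""
    mean = statistics.mean(data)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100 * statistics.stdev(data) / mean

# Hypothetical data: same standard deviation (2), different means.
small = [8, 10, 12]     # mean 10
large = [98, 100, 102]  # mean 100
print(coefficient_of_variation(small))  # 20.0
print(coefficient_of_variation(large))  # 2.0
```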
-
Confidence interval- This is a range of values, calculated from sample data, that is likely to contain the value of an unknown population parameter. It is always stated together with a confidence level, such as 95%.
-
Conditional probability- This is the probability that an event A occurs given that another event B has occurred. It is written P(A|B) and equals P(A ∩ B) / P(B), provided P(B) > 0.
-
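Confidence interval for mean- This is a confidence interval for the mean of a population, usually computed as the sample mean plus or minus a critical value multiplied by the standard error of the mean. Its width shows how precisely the sample mean estimates the population mean: the narrower the interval, the more precise the estimate.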
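A minimal sketch of a confidence interval for the mean, computed as sample mean ± critical value × standard error. As a simplifying assumption it uses a normal (z) critical value from the standard library's `NormalDist`; for small samples like the hypothetical one below, a t critical value (which gives a somewhat wider interval) is normally preferred.

```python
import math
import statistics
from statistics import NormalDist

def mean_confidence_interval(data, confidence=0.95):
    """Approximate CI for the population mean using a z critical value (assumption)."""
    n = len(data)
    mean = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(n)           # standard error of the mean
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return mean - z * se, mean + z * se

# Hypothetical measurements.
measurements = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]
low, high = mean_confidence_interval(measurements)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```

Raising the confidence level (for example to 99%) makes the interval wider, because a larger critical value is needed.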
-
Confidence interval for the difference between two means- This is a confidence interval for the difference between the means of two different populations, computed from a sample drawn from each population.
-
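Confidence level- This is the percentage of confidence intervals, constructed by the same method from many repeated samples, that would contain the true value of the parameter. For example, a 95% confidence level means that about 95 out of every 100 such intervals would contain the true value.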
-
Confidence limits- These are the upper and lower limits of a confidence interval.
-
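Critical value- This is the value of the test statistic that marks the boundary of the rejection region in a hypothesis test. If the test statistic is more extreme than the critical value, the null hypothesis is rejected at the chosen significance level. Critical values are also used to construct confidence intervals; for example, 1.96 is the critical value of the standard normal distribution for a 95% interval.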
-
Correlation coefficient- This is a number that measures the strength and direction of the linear relationship between two variables. It always lies between -1 and +1: values near +1 indicate a strong positive linear relationship, and values near -1 a strong negative one. If the value is zero, there is no linear relationship between the two variables, although a non-linear relationship may still exist.
-
Continuous data- This is data obtained by measurement that can take any value within a range, rather than only separate, countable values. An example is the time taken for half of a sample of radioactive uranium to decay (its half-life).
-
Discrete data: This is data that can take only distinct, separate values, usually whole numbers obtained by counting. For example, the number of girls and boys in a classroom is discrete because it can only be a whole number such as 12 or 13, never 12.5.
-
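Dispersion- This is the extent to which the values in a set of data are spread out. It is commonly measured by the range, the interquartile range, the variance or the standard deviation.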
-
Dot plot- This is a chart in which each observation is shown as a dot placed above its value or category on an axis. For nominal or ordinal data, it looks like a bar chart in which the rectangles have been replaced by stacks of dots, while for continuous data the dots cluster closely together, making the graph look like a histogram.
-
Estimate- This is the particular value of an estimator calculated from a given sample, used as an approximation of an unknown population parameter. An estimate is not exact, because a different sample would usually give a different value.
-
Estimation- This is the process of using sample data to find an approximate value (a point estimate) or a range of likely values (an interval estimate) for an unknown population parameter.
-
Estimator- This is a rule or formula, applied to sample data, that gives an estimate of a parameter of the entire population. For example, the sample mean is an estimator of the population mean.
-
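Event- This is a set of one or more outcomes from the sample space of an experiment. For example, when a die is rolled, "an even number is rolled" is the event {2, 4, 6}.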
-
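Expected value- This is the long-run average value of a random variable, obtained by weighting each possible value by its probability: E(X) = Σ x · P(X = x) for a discrete random variable. It describes the center of the distribution.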
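A minimal sketch of the expected value of a discrete random variable, assuming the distribution is given as a mapping from each value to its probability (a representation chosen here for illustration). The example uses a fair six-sided die.

```python
from fractions import Fraction

def expected_value(distribution):
    """E(X) = sum of x * P(X = x) over all values x of a discrete random variable."""
    return sum(x * p for x, p in distribution.items())

# Fair six-sided die: each face has probability 1/6.
die = {x: Fraction(1, 6) for x in range(1, 7)}
print(expected_value(die))  # 7/2
```

The expected value, 3.5, is not itself a possible roll; it is the average that the rolls approach over many repetitions.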
-
Experiment- This is the activity carried out in collecting a set of data.
-
Experimental (or sampling) unit- This is the individual person or object on which the researcher makes a measurement or applies a treatment. For example, in a study of how many boys in Nigeria speak English fluently, each boy whose fluency in English is tested is an experimental unit.
-
Frequency table- This is a table listing each category or class of values together with the number of times (the frequency) it occurs in the data.
-
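Histogram- This is a means of representing the distribution of numerical (usually continuous) data from an experiment by means of rectangles. The horizontal axis is divided into class intervals, and the vertical axis shows the frequency (or frequency density) of each interval. Unlike in a bar chart, the bars are placed together, each touching its neighbor. When the class intervals have equal widths, the bars have equal widths and their heights are proportional to the frequencies.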
-
Independent events- These are events for which the occurrence of one does not change the probability of the other. For independent events A and B, P(A ∩ B) = P(A) · P(B).
-
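Independent samples- These are samples selected so that the observations in one sample are unrelated to the observations in the other, for example samples drawn separately from two different populations.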
-
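Interval scale- This is a scale of measurement on which equal differences between values represent equal differences in the quantity measured, but there is no true zero point. For example, temperature in degrees Celsius is on an interval scale: the difference between 10 °C and 20 °C equals the difference between 20 °C and 30 °C, but 20 °C is not twice as hot as 10 °C.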
-
Interquartile range- This is the difference between the upper and the lower quartile. It measures the spread of the middle 50% of the data and, unlike the range, is not affected by extreme values.
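A minimal sketch of the interquartile range using Python's `statistics.quantiles`. Several conventions exist for computing quartiles and they can give slightly different values; the `"inclusive"` method is an assumption made here. The data are hypothetical, and the last step applies the common 1.5 × IQR rule of thumb for flagging outliers.

```python
import statistics

data = [1, 3, 4, 6, 7, 8, 10, 12, 15, 40]  # hypothetical data

# "inclusive" is one of several quartile conventions (assumption).
q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
print(q1, q3, iqr)  # 4.5 11.5 7.0

# Rule of thumb: values more than 1.5 * IQR beyond the quartiles are flagged as outliers.
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in data if x < low_fence or x > high_fence]
print(outliers)  # [40]
```

The extreme value 40 inflates the range to 39, while the IQR of 7 still describes the spread of the bulk of the data.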
-
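Law of total probability- This states that if events B1, B2, ..., Bn are mutually exclusive and together cover the whole sample space, then for any event A, P(A) = P(A|B1) · P(B1) + P(A|B2) · P(B2) + ... + P(A|Bn) · P(Bn). Equivalently, the prior probability of an event equals the expected value of its probability given the added information.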
-
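Least squares- This is the method used to fit a straight line (or curve) to a set of data by choosing the line that minimizes the sum of the squared vertical distances (residuals) between the observed values and the values predicted by the line.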
-
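Matched samples- These are samples in which each observation in one sample is paired with a specific observation in the other. The pairing arises either because the same subject is measured more than once (for example, before and after a treatment) or because two closely related subjects are matched on a characteristic they share (for example, twins).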
-
Median- This is the middle value of a given set of data when the values are arranged in increasing or decreasing order. If there is an even number of values, the median is the mean of the two middle values.
-
Mode- This is the value that occurs most frequently in a set of data. A set of data can have more than one mode.
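A short illustration of the median and mode using Python's standard `statistics` module on small hypothetical data sets. It shows the even-count case of the median (the mean of the two middle values) and a data set with two modes.

```python
import statistics

odd = [7, 1, 5, 3, 9]   # sorted: 1, 3, 5, 7, 9
even = [7, 1, 5, 3]     # sorted: 1, 3, 5, 7
print(statistics.median(odd))   # 5
print(statistics.median(even))  # 4.0 (mean of 3 and 5)

scores = [2, 3, 3, 5, 5, 5, 8]
print(statistics.mode(scores))                # 5
print(statistics.multimode([1, 1, 2, 2, 3]))  # [1, 2] (two modes)
```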
-
Multiple regression- This is a method used to model the relationship between one dependent variable and two or more independent variables at the same time.
-
Multiplication rule- This is the rule used to find the probability that two events both occur: P(A ∩ B) = P(A) · P(B|A). If A and B are independent, this reduces to P(A ∩ B) = P(A) · P(B).
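A minimal sketch of the multiplication rule using exact fractions. The first example assumes two cards are drawn without replacement from a standard 52-card deck, so the second probability is conditional on the first draw; the second example assumes two independent tosses of a fair coin.

```python
from fractions import Fraction

# Dependent events: two aces in a row, drawing without replacement.
p_first_ace = Fraction(4, 52)
p_second_ace_given_first = Fraction(3, 51)  # one ace has already been removed
p_two_aces = p_first_ace * p_second_ace_given_first
print(p_two_aces)  # 1/221

# Independent events: two heads from two tosses of a fair coin.
p_two_heads = Fraction(1, 2) * Fraction(1, 2)
print(p_two_heads)  # 1/4
```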
-
Mutually exclusive events- These are events that cannot occur at the same time, so P(A ∩ B) = 0. For example, when a single ball is drawn from a bag of single-colored balls, the events "the ball is red" and "the ball is black" are mutually exclusive.
-
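Nominal data- This is data consisting of names or labels that have no natural order and cannot be measured, such as colors or nationalities. It is often coded with letters or numbers, but the codes have no numerical meaning. Nominal data can, however, be counted.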
-
Non linear regression- This is a method used to find the curve that best fits a set of data when the model relating the variables is not a linear function of its parameters, usually by minimizing the sum of squared residuals.
-
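Null hypothesis- This is a statement, usually of no effect or no difference, that is assumed to be true unless the sample data provide sufficient evidence against it. Like a theory, it is put to a test to decide whether it should be rejected.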
-
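Ordinal data- This is data whose categories have a natural order and can be counted, but the differences between categories cannot be measured. For example, rankings such as first, second and third are ordinal.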
-
Outcome- This is a single possible result of an experiment.
-
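Outlier- This is a value in a set of data that is markedly different from the other values in the set. A common rule of thumb treats as an outlier any value more than 1.5 times the interquartile range below the lower quartile or above the upper quartile.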
-
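Parameter- This is a numerical characteristic of a population, such as the population mean or standard deviation. Its value is usually unknown and is estimated from a sample.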
-
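Pearson’s product moment correlation coefficient- This is the most commonly used correlation coefficient. It measures the strength and direction of the linear relationship between two quantitative variables and is calculated as the covariance of the two variables divided by the product of their standard deviations. Because it has no units, its value does not change if the measurements are converted, for example from grams to kilograms or from centimeters to meters.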
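A minimal sketch of Pearson’s correlation coefficient computed from sums of deviations (the n - 1 factors in the covariance and standard deviations cancel, so they are omitted). The heights and weights are hypothetical, and the last step checks that converting units leaves r unchanged.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: covariance of x and y divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: heights in centimeters and weights in kilograms.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [52, 58, 66, 71, 80]
r = pearson_r(heights_cm, weights_kg)  # close to +1: strong positive linear relationship

# Converting units (cm to m, kg to g) does not change r.
r_converted = pearson_r([h / 100 for h in heights_cm], [w * 1000 for w in weights_kg])
print(math.isclose(r, r_converted))  # True
```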
-
Percentile- These are the values that divide an ordered set of data into one hundred equal parts. The kth percentile is the value below which about k% of the data lie.
-
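Pie chart- This is a circular chart divided into sectors, one for each category. The angle of each sector is proportional to the frequency of its category: angle = (frequency / total frequency) × 360°, so the larger the angle, the higher the frequency.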
-
Point of averages- This is the point on a scatter plot of two variables whose coordinates are the means of the two variables. The least squares regression line always passes through this point.
-
Population- This is the entire group of individuals or objects about which conclusions are to be drawn.
-
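Precision- This is a measure of how close repeated estimates are to one another, that is, how little an estimator varies from sample to sample. It differs from accuracy, which is the closeness of an estimate to the actual value of the parameter.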
-
Principle of insufficient reason- This states that if there is no reason to believe that the possible outcomes of an experiment differ in likelihood, they should be assigned equal probabilities. This principle has been criticized as a kind of fallacy called appeal to ignorance.
-
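Probability- This is a number between 0 and 1 that measures how likely an event is to occur. A probability of 0 means the event is impossible, and a probability of 1 means it is certain.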
-
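Probability distribution- This is a description of all the possible values of a random variable together with their probabilities. For a discrete random variable it lists the probability of each value, and these probabilities add up to 1.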
-
Qualitative variable- This is a variable whose values are qualities or categories rather than numbers. Examples are color (red, black, white) and size (small, large). In an analysis, one can simply refer to the reds, the blacks or the whites when describing balls of different colors.
-
Quantile- These are the values that divide an ordered set of data (or a probability distribution) into equal-sized parts for a better analysis. Quartiles, quintiles and percentiles are special cases.
-
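Quartile- These are the three values that divide an ordered set of data into four equal parts. The lower quartile (Q1) has 25% of the data below it, the second quartile (Q2) is the median, and the upper quartile (Q3) has 75% of the data below it.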
-
Quintile- These are the four values that divide an ordered set of data into five equal parts.
-
Quota sampling- This is a non-random method in which the population is divided into groups and the researcher selects individuals from each group, not by chance, until a fixed number (the quota) for that group is reached. For example, in selecting one million people who live in Africa, if five hundred thousand of them must come from Nigeria, then five hundred thousand is the quota for Nigeria.
-
Random sampling- This is the selection of a sample from a population by a chance mechanism, in which every member has a known, non-zero chance of being selected. The chances need not be equal for all members.
-
Random variable- This is a variable whose value is a number determined by the outcome of a random experiment; it assigns a numerical value to each outcome. It is especially useful when the outcomes themselves are not numbers. For example, for a bouncing ball, if the random variable is the number of times the ball's height exceeds three meters, and this happens five times, then the random variable takes the value five (5).
-
Range- This is the difference between the highest and the lowest results obtained from an experiment.
-
Regression equation- This is an algebraic representation of the relationship between two or more variables.
-
Regression line- This is the straight line obtained by plotting the regression equation computed for a given set of data, usually drawn on a scatter plot of the data.
-
Relative frequency- When the number of times an event occurs is divided by the number of times an experiment was conducted, the result is what is called the relative frequency.
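A minimal sketch of relative frequency, assuming a hypothetical record of 20 coin tosses written as a string of `H` (heads) and `T` (tails). Each count is divided by the total number of tosses, so the relative frequencies add up to 1.

```python
from collections import Counter
from fractions import Fraction

tosses = "HTHHTTHTHHHTTHTHHTHT"  # hypothetical record of 20 tosses
counts = Counter(tosses)
relative = {outcome: Fraction(count, len(tosses)) for outcome, count in counts.items()}

print({outcome: float(f) for outcome, f in relative.items()})  # {'H': 0.55, 'T': 0.45}
assert sum(relative.values()) == 1
```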
-
Sample- This is a group selected from an entire population for study.
-
Sample mean- This is the average of the values in a sample: the sum of the values divided by the number of values. It is used as an estimator of the mean of the population from which the sample was drawn.
-
Sample space- This is the set of all possible outcomes of an experiment. For example, the sample space for a single roll of a die is {1, 2, 3, 4, 5, 6}.
-
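Sample variance- This is the variance of a sample drawn from a population, calculated as the sum of the squared deviations from the sample mean divided by n - 1, where n is the sample size. Dividing by n - 1 rather than n makes it an unbiased estimator of the variance of the population.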
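A minimal sketch of the sample variance on a hypothetical sample, computed by hand with the n - 1 divisor and compared with Python's `statistics.variance`, which uses the same divisor. Dividing by n instead gives a smaller value that underestimates the population variance on average.

```python
import statistics

sample = [4, 8, 6, 5, 7]  # hypothetical sample
n = len(sample)
mean = sum(sample) / n                     # 6.0
ss = sum((x - mean) ** 2 for x in sample)  # sum of squared deviations: 10.0
sample_var = ss / (n - 1)                  # 2.5, divides by n - 1
divide_by_n = ss / n                       # 2.0, biased downward on average

# statistics.variance also divides by n - 1.
assert sample_var == statistics.variance(sample)
print(sample_var)  # 2.5
```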
-
Sample size- This is the number of elements that make up a given sample.
-
Sampling distribution- This is the probability distribution of a statistic of interest (the mean, for example) over all possible samples of the same size from a population. For the mean, it describes how the sample mean would vary if many samples were drawn and the mean of each were calculated, and it is used to draw conclusions about the population mean.
-
Sampling variability- This is the variation in the value of a statistic from one sample to another drawn from the same population.
-
Scatter plot- This is the representation of two variables with points or dots to show how the values of the variables relate.
-
Simple hypothesis- This is a hypothesis that completely specifies the distribution of the population, for example H0: μ = 50 for a normal population with known standard deviation.
-
Simple linear regression- This is the method of finding the straight line y = a + bx that describes the relationship between a response variable y and a single predictor variable x, with a and b calculated from the data by the least squares method.
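A minimal sketch of simple linear regression by least squares, using the standard closed-form formulas: slope b = Sxy / Sxx and intercept a = ȳ - b·x̄. The hours-studied and test-score data are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (intercept a, slope b) of the least squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x  # the line passes through the point (mean_x, mean_y)
    return a, b

# Hypothetical data: hours studied (x) and test score (y).
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]
a, b = least_squares_line(hours, scores)
print(f"y = {a:.2f} + {b:.2f}x")  # y = 47.70 + 4.10x
```

Because the intercept is computed from the two means, the fitted line always passes through the point of averages.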
-
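Simple random sampling- This is the selection by chance of a sample from a population in such a way that every possible sample of the same size has an equal chance of being chosen. As a result, every member of the population has an equal chance of selection.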
-
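Simpson’s paradox- This is the situation in which a relationship or trend that appears in several groups of data disappears or reverses when the groups are combined. For example, one treatment can have a higher success rate than another in every subgroup of patients and yet a lower success rate overall.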
-
Sound argument- A sound argument is one whose logic is valid and whose premises are all true.
-
Spatial sampling- This is sampling in which the units are locations in a geographic area or space, used to study quantities that vary from place to place, such as soil quality or rainfall.
-
Standard error- This is the standard deviation of the sampling distribution of a statistic. For the mean of a sample of size n, the standard error is estimated as the standard deviation of the sample divided by the square root of n.
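A minimal sketch of the standard error of the mean for a hypothetical sample: the sample standard deviation divided by the square root of the sample size. Because of the square root, quadrupling the sample size (with the same standard deviation) only halves the standard error.

```python
import math
import statistics

def standard_error_of_mean(sample):
    """Standard error of the mean: sample standard deviation divided by sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

sample = [4, 8, 6, 5, 7]  # hypothetical sample (sample variance 2.5, n = 5)
se = standard_error_of_mean(sample)
print(round(se, 4))  # 0.7071
```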
-
Standard deviation- This is a statistic that shows how a given set of data is spread around the mean, calculated as the square root of the variance. The larger its value, the larger the spread; the smaller its value, the smaller the spread.
-
Statistic- This is a numerical quantity calculated from a sample drawn from a population, such as the mean or median of the sample. A statistic is often used to estimate the corresponding parameter of the population.
-
Statistical inference- This is the process of using results obtained from a sample to draw conclusions about the population from which it was drawn.
-
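Stem and leaf plot- This is a display in which each value is split into a stem (the leading digit or digits) and a leaf (the last digit). It gives a quick summary of the shape of the distribution while keeping the individual data values.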
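A minimal sketch of a text stem and leaf plot, assuming non-negative integer data where the stem is the tens digit and the leaf is the units digit (other splits are possible). Stems with no leaves are still printed so gaps in the data stay visible. The scores are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Return a stem-and-leaf plot for non-negative integers (stem = tens, leaf = units)."""
    plot = defaultdict(list)
    for value in sorted(data):
        plot[value // 10].append(value % 10)
    rows = []
    for stem in range(min(plot), max(plot) + 1):  # include empty stems
        leaves = " ".join(str(leaf) for leaf in plot.get(stem, []))
        rows.append(f"{stem} | {leaves}".rstrip())
    return "\n".join(rows)

scores = [23, 41, 12, 38, 23, 49, 15, 21, 47]  # hypothetical scores
print(stem_and_leaf(scores))
```

For these scores the output rows are `1 | 2 5`, `2 | 1 3 3`, `3 | 8` and `4 | 1 7 9`.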
-
Stratified sampling- This is the method of sampling in which a large population is divided into non-overlapping groups (strata) whose members share a characteristic, and then a random sample is chosen from each stratum.
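A minimal sketch of stratified sampling with the standard `random` module. It assumes equal allocation (the same number drawn from every stratum); proportional allocation, where each stratum's share matches its share of the population, is also common. The student population and the helper `stratified_sample` are hypothetical, and a fixed seed makes the draw repeatable.

```python
import random
from collections import Counter

def stratified_sample(population, stratum_of, per_stratum, seed=None):
    """Draw a simple random sample of `per_stratum` members from each stratum."""
    rng = random.Random(seed)
    strata = {}
    for member in population:
        strata.setdefault(stratum_of(member), []).append(member)
    sample = []
    for name in sorted(strata):
        sample.extend(rng.sample(strata[name], per_stratum))
    return sample

# Hypothetical population: 100 (student_id, year_of_study) pairs, 25 per year.
students = [(i, year) for i, year in enumerate([1, 2, 3, 4] * 25)]
chosen = stratified_sample(students, stratum_of=lambda s: s[1], per_stratum=5, seed=42)
print(Counter(year for _, year in chosen))  # 5 students from each year
```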
-
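Skewness- This is a measure of the lack of symmetry of a distribution about its center. In positively skewed data the longer tail is on the right (above the middle), and in negatively skewed data it is on the left (below the middle).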
-
Subjective probability- This is a probability assigned on the basis of personal judgment or belief about the likely outcome of an event, rather than on an experiment.
-
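Symmetry- This is the property of a distribution whose shape on one side of the center is a mirror image of its shape on the other side. For a symmetric distribution the mean and the median are equal.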
-
Target population- This is the entire group about which the researcher wants to draw results and conclusions for a particular phenomenon of interest.
-
Test statistic- This is a value calculated from sample data in a hypothesis test. It is compared with a critical value (or used to find a p-value) to decide whether the null hypothesis should be rejected.
-
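Transformation to normality- This is the application of a function, such as the logarithm or the square root, to a set of data so that its distribution becomes closer to a normal distribution. It is often used to correct positively or negatively skewed data.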
-
Type I error- This is the wrong rejection of a null hypothesis when it is actually true. Its probability is the significance level of the test, α.
-
Type II error- This is the failure to reject (the wrong acceptance of) a null hypothesis when it is actually false. Its probability is denoted β.
-
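Variance- This is a non-negative number that measures the spread of the values of a random variable or a set of data. It is the average of the squared deviations from the mean; for a random variable X with mean μ, Var(X) = E[(X - μ)²].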
-
Venn diagram- This is a diagram of overlapping circles, usually drawn inside a rectangle representing the whole population or sample space, that shows how sets, events or categories within a population relate to one another.
-
5-Number summary- This is the set of five values that summarize a set of data: the minimum, the lower quartile, the median, the upper quartile and the maximum.
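A minimal sketch of the 5-number summary, the five values that a box and whisker plot draws. It assumes the `"inclusive"` quartile method of Python's `statistics.quantiles`; other quartile conventions can give slightly different quartiles. The data are hypothetical and do not need to be sorted first.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)

data = [10, 2, 7, 13, 4, 12, 5, 9, 4]  # hypothetical data
print(five_number_summary(data))  # (2, 4.0, 7.0, 10.0, 13)
```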
