Geography 370 Assignment 2
Part I. Hand Calculations of Data
Defining and calculating: range, mean, median, mode, kurtosis, skewness, and standard deviation.
Calculate each statistic for both cycling teams: Team ASTANA times vs. Team TOBLER times.
Which team should be invested in, based on these statistics? Why?
Definitions:
Range: The range is the spread between the extreme values in a dataset. It is found by subtracting the smallest observation from the largest.
Mean: The mean, also known as the average, is the central point of the dataset with every observation taken into account. It is a standard calculation but can be greatly affected by outliers, which are values that do not fit the trend of the rest of the data yet still count toward the mean. To find the mean, add all of the values together and divide by the number of values.
Median: The median is the middle value of a dataset when the observations are listed from smallest to largest. If there is an even number of values, add the two middle values together and divide by two to find the median.
Mode: The mode of a dataset is the value that appears most often. A dataset can have more than one mode if several values tie for the most occurrences. If every value occurs only once, there is no mode.
Kurtosis: Kurtosis describes how peaked or flat the distribution curve of a dataset is compared to a normal curve. A fairly flat curve is platykurtic and gives a negative kurtosis value. A sharply peaked curve is leptokurtic and gives a positive value. A normal curve is mesokurtic, with a value near zero. These signs apply to excess kurtosis, which subtracts 3 so that a normal curve scores zero.
Skewness: Skewness measures how asymmetric the distribution curve is around the mean, and it is greatly affected by outliers in a dataset. When the tail of the curve is elongated towards the left of the graph, the skewness is negative. When the tail elongates towards the right, the skewness is positive. If the curve is perfectly symmetric, as in a normal distribution, there is no skewness. Usually there is a slight skewness, and a value between -1 and 1 is considered acceptable.
Standard Deviation: The standard deviation summarizes in a single number how tightly the observations group around the mean of a dataset. The lower the number, the more closely the observations are clustered around the mean. It measures spread only; the shape of the peak is described separately by kurtosis.
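The skewness and kurtosis definitions above can be sketched in code. This is a minimal sketch, assuming the bias-adjusted sample formulas that spreadsheet functions such as Excel's SKEW and KURT use; the assignment does not say which formulas were used, and the function names are hypothetical. The times are the ones listed in the problem set.

```python
from math import sqrt

# Team times copied from the problem set (Racer A through Racer O).
astana = [2284, 2290, 2280, 2276, 2285, 2280, 2272, 2250,
          2272, 2255, 2315, 2278, 2281, 2245, 2287]
tobler = [2295, 2293, 2289, 2289, 2290, 2264, 2291, 2281,
          2290, 2289, 2276, 2284, 2279, 2286, 2286]


def _mean_and_sample_sd(values):
    """Return the mean and the sample standard deviation (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return mean, sd


def sample_skewness(values):
    """Bias-adjusted skewness: negative = left tail, positive = right tail."""
    n = len(values)
    mean, sd = _mean_and_sample_sd(values)
    z3 = sum(((x - mean) / sd) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * z3


def sample_excess_kurtosis(values):
    """Bias-adjusted excess kurtosis: a normal curve scores about zero."""
    n = len(values)
    mean, sd = _mean_and_sample_sd(values)
    z4 = sum(((x - mean) / sd) ** 4 for x in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * z4 - correction


for name, times in (("Astana", astana), ("Tobler", tobler)):
    print(name, sample_skewness(times), sample_excess_kurtosis(times))
```

Under these assumptions the functions agree, to within rounding, with the kurtosis and skewness values reported in the results.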
Problem set and Calculations:
Racer      Team ASTANA   Team TOBLER
Racer A    2284          2295
Racer B    2290          2293
Racer C    2280          2289
Racer D    2276          2289
Racer E    2285          2290
Racer F    2280          2264
Racer G    2272          2291
Racer H    2250          2281
Racer I    2272          2290
Racer J    2255          2289
Racer K    2315          2276
Racer L    2278          2284
Racer M    2281          2279
Racer N    2245          2286
Racer O    2287          2286
Range
Astana = 70, Tobler = 31
Mean
Astana = 2276.67, Tobler = 2285.47
Median
Astana = 2280, Tobler = 2289
Mode
Astana = two modes of 2272 & 2280, Tobler = 2289
Kurtosis
Astana = 1.17, Tobler = 2.93
Skewness
Astana = -0.003, Tobler = -1.564
Standard Deviation
Astana = 16.63, Tobler = 7.62
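The hand-calculated range, mean, median, mode, and standard deviation can be cross-checked with a short script. This is a sketch using Python's standard `statistics` module; the `describe` helper is a hypothetical name. It assumes the population standard deviation (`pstdev`, which divides by N), since that is the version that reproduces the values reported above; the sample version (`stdev`) would give larger numbers.

```python
import statistics

# Team times copied from the problem set (Racer A through Racer O).
astana = [2284, 2290, 2280, 2276, 2285, 2280, 2272, 2250,
          2272, 2255, 2315, 2278, 2281, 2245, 2287]
tobler = [2295, 2293, 2289, 2289, 2290, 2264, 2291, 2281,
          2290, 2289, 2276, 2284, 2279, 2286, 2286]


def describe(times):
    """Return the basic descriptive statistics for one team's times."""
    return {
        "range": max(times) - min(times),          # largest minus smallest
        "mean": statistics.mean(times),
        "median": statistics.median(times),
        "modes": sorted(statistics.multimode(times)),  # every tied mode
        "std_dev": statistics.pstdev(times),       # population formula
    }


for name, times in (("Astana", astana), ("Tobler", tobler)):
    print(name, describe(times))
```

`multimode` is used rather than `mode` so that a team with two tied modes, as Astana has, reports both.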
Calculations for standard deviation:
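As a sketch of the hand calculation, assuming the population formula (dividing by N = 15 racers), which is the version that reproduces the reported values: the squared deviations from each team's mean are summed, divided by N, and the square root is taken.

```latex
\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N}}

\sigma_{\text{Astana}} = \sqrt{\frac{4147.33}{15}} = \sqrt{276.49} \approx 16.63

\sigma_{\text{Tobler}} = \sqrt{\frac{871.73}{15}} = \sqrt{58.12} \approx 7.62
```

Here 4147.33 and 871.73 are the sums of squared deviations of the listed times from the means 2276.67 and 2285.47.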

Conclusion: Looking at these statistics, I would go with the Astana team if I were investing in one of these teams. The Tobler team has a lower standard deviation (7.62 vs. 16.63) and a smaller range (31 vs. 70), meaning its riders finish with very similar times, but Astana is the better choice on the other measurements. Astana's mean finishing time is lower (2276.67 vs. 2285.47 minutes), and its median (2280 vs. 2289) and modes (2272 and 2280 vs. 2289) are lower as well, meaning its typical and most frequent times are faster. This is a positive when comparing how many minutes each team takes to finish a race. Astana's kurtosis (1.17 vs. 2.93) and skewness (-0.003 vs. -1.564) are also closer to zero, meaning its times sit closer to a normal distribution and are relatively mesokurtic, while Tobler's skewness falls outside the acceptable range of -1 to 1. It is for these reasons, with the statistics backing me, that I would choose the Astana team when betting on the better cycling team.
Part II. Calculating Mean Centers and Weighted Mean Centers
Mean Center: The mean center of a dataset is the average location of a set of points, calculated by averaging their x coordinates and their y coordinates separately. In this case, the x and y values are essentially longitude and latitude coordinates. The resulting point is then plotted on a Cartesian plane. Below is a graph showing the mean center of Wisconsin's population.
Weighted Mean Center: Unlike the mean center, a weighted mean center weights each point by an attribute such as its population, so its placement shifts toward wherever the heavily weighted clusters of data are.
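The difference between the two centers can be illustrated with a small sketch. The three points and their population weights below are hypothetical values chosen for illustration, not the Wisconsin data, and the function names are assumptions as well.

```python
def mean_center(points):
    """Average the x and y coordinates separately; every point counts equally."""
    n = len(points)
    x = sum(px for px, _ in points) / n
    y = sum(py for _, py in points) / n
    return x, y


def weighted_mean_center(points, weights):
    """Average the coordinates with each point scaled by its weight."""
    total = sum(weights)
    x = sum(w * px for (px, _), w in zip(points, weights)) / total
    y = sum(w * py for (_, py), w in zip(points, weights)) / total
    return x, y


# Hypothetical points as (x = longitude, y = latitude) with made-up populations.
points = [(-90.0, 43.0), (-88.0, 43.0), (-89.0, 45.0)]
populations = [100, 300, 100]

print(mean_center(points))                        # unweighted center
print(weighted_mean_center(points, populations))  # pulled toward the heavy point
```

In this example the second point carries three times the weight of the others, so the weighted center moves east and south toward it, while the unweighted mean center stays at the plain average of the three locations.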