
Theoretical and experimental comparison using Python

The main goal of this article is to theoretically and experimentally compare the two resampling methods and then draw some conclusions on when each one should be used, in the scenarios where technically there is room for using either one. What prompted me to write this article is what I perceived to be a lack of freely available, clear guidelines on this subject at the beginner level. In some learning materials I even encountered the statement that we can use either one, without any further explanation.

As an introduction to bootstrapping, we will calculate confidence intervals and test some hypotheses analytically, and then solve the same problems non-parametrically using the bootstrap method. As an introduction to permutation testing (also called significance testing), we will test a hypothesis using a permutation test on the same data as in Section 1. We will then develop a slightly more elaborate example, design a couple of hypothesis tests, and compare the bootstrap distributions and the permutation distributions of replicated statistics side by side for each test.

We invited some of our best friends over: NumPy, SciPy and Matplotlib… Pandas will take a day off, since the data sets that we will use are rather simple.

import numpy as np
from scipy import stats
from matplotlib import pyplot as plt

1. CI Calculation and HT - Parametrically and via Bootstrapping

For the first part we will be using the 2008 US presidential election results from the "swing states" of PA and OH, specifically, the % of voters who voted for the Democrats within each county in a given state (source: ). For this section and for the next section of the note we will be using the same data and the same statistic for consistency of examples - the difference between the PA and OH mean voting percentages.

dem_share_PA = 
dem_share_OH = 

Let's look at the mean, median, SD and sample size for each state, as well as the histograms:

print("Mean:\t\t", round(np.mean(dem_share_PA), 2))
print("Median:\t\t", round(np.median(dem_share_PA), 2))
print("SD:\t\t", round(np.std(dem_share_PA), 2))
print("Sample Size:\t", len(dem_share_PA))
plt.hist(dem_share_PA, bins=30, alpha=0.25)
plt.grid()
plt.show()

OH Dem voting % by county (image by Gene Mishchenko)

Conclusion: the distributions don't look perfectly Normal, but they are probably close enough for testing the parametric (analytical) solutions, and they are slightly positively skewed (the means are larger than the medians, and the tails are longer on the right/positive side).

1.2 CONFIDENCE INTERVAL CALCULATION

Confidence level = 90%

1.2.1 CI Calculation via the Parametric Method (assuming Normal distributions of the populations)

First, let's find the lower and the upper critical values (LCV and UCV) using the Normal sampling distribution (the z-scores with alpha/2 to the left and alpha/2 to the right, respectively). In Python this can be done by using the () method, which takes the AUC to the left (i.e. the percentile rank) as input and returns the Z-score corresponding to that percentile rank.
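The method name is elided in the text above; as a minimal sketch, assuming the inverse-CDF lookup meant here is SciPy's `stats.norm.ppf` (which maps an AUC-to-the-left value to the corresponding z-score), the critical values for a 90% confidence level could be found like this:

```python
from scipy import stats

# Assumption: stats.norm.ppf is the inverse-CDF method referred to in the text.
# Confidence level = 90%  ->  alpha = 0.10, split evenly between the two tails.
alpha = 0.10
lcv = stats.norm.ppf(alpha / 2)      # z-score with alpha/2 AUC to the left
ucv = stats.norm.ppf(1 - alpha / 2)  # z-score with alpha/2 AUC to the right
print(round(lcv, 3), round(ucv, 3))  # → -1.645 1.645
```

By symmetry of the Normal distribution, the LCV and UCV are equal in magnitude and opposite in sign.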

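Putting the pieces together, here is a sketch of the parametric 90% CI for the chosen statistic (the difference between the PA and OH mean voting percentages). The county arrays below are synthetic stand-ins, since the real data is elided above (only the array sizes, 67 PA counties and 88 OH counties, are real), and the two-sample z-interval shown is the standard textbook construction, not necessarily the author's exact code:

```python
import numpy as np
from scipy import stats

# Synthetic stand-ins for the elided county-level data (values are made up;
# only the sample sizes match the real county counts).
rng = np.random.default_rng(42)
dem_share_PA = rng.normal(45, 10, 67)  # PA has 67 counties
dem_share_OH = rng.normal(44, 10, 88)  # OH has 88 counties

# The statistic: difference between the two sample means
diff_means = np.mean(dem_share_PA) - np.mean(dem_share_OH)

# Standard error of the difference between two independent sample means
se = np.sqrt(np.var(dem_share_PA) / len(dem_share_PA)
             + np.var(dem_share_OH) / len(dem_share_OH))

# Critical values for a 90% confidence level
alpha = 0.10
lcv = stats.norm.ppf(alpha / 2)
ucv = stats.norm.ppf(1 - alpha / 2)

ci = (diff_means + lcv * se, diff_means + ucv * se)
print(ci)
```

The same interval is what the bootstrap approach later approximates non-parametrically, by resampling instead of assuming Normal populations.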