Whilst the definition of Quartiles (see Wikipedia) is basically agreed upon, the calculation of the 1st and 3rd Quartiles is not – and it is tough to find even two Stats Packages that compute them the same way 🙁 Some, like R, give 6 (if I recall correctly) different ways – thus at least giving the user some control.
I have in an ADUG Article discussed how we implemented Quartile calculations in ESBPCS for VCL. I note there that our approach is different to that of Microsoft Excel but that the methodology being used was how I was taught (and how I taught it to my students) and quite a few Stats Texts did it that way as well.
So this week I, and my son Luke, were working on the Descriptive Statistics side of things for our new ESBDevLib for VCL/FMX and we were using Microsoft Excel to generate Test Data. He had recently done some Descriptive Stats at School and so going over the Delphi implementation was a good exercise for both of us 🙂 Then we got to Quartiles – and a couple of days of programming disappeared…
Since my ADUG Article, Microsoft Excel had changed from just offering the QUARTILE Function, to now offering 2 functions: QUARTILE.INC and QUARTILE.EXC (well they actually offer 3 functions, since the old QUARTILE is still available for backward compatibility but it is really the same as QUARTILE.INC). So this warranted further investigation – since over the last several years we have been adding more methods that have a similar layout to the Microsoft Excel ones (and that give similar results). More importantly, we have quite a few customers who have been keenly encouraging us to add more. However, the Microsoft Help and Website were not overly forthcoming on how these were being calculated – so lots of Google Searching.
After reading dozens of articles – and trying quite a few different approaches – most that worked with some but not all the test data, I came upon this Article on Bacon Bits – so we know offer 3 Quartile Calculations (using a TEDLQuartileType to choose, and defaulting to qtDefault). The Default one is the same way we do things in ESBPCS for VCL and also how things are done on Wolfram | Alpha (which is a great tool for generating and checking test data). We also offer qtExcelInc and qtExcelExc which generate following the approach of the two Microsoft Excel Functions.
We have also added QuartileInc and QuartileExc functions that work similar to the Microsoft ones.
Luke also pointed out to me that we should have the Outlier Fences – since given Quartile 1 and Quartile 3, we had the IQR (Inter-Quartile Range) – so we also compute the Lower and Upper Fences for Outliers.
So what was meant to be a simple port from ESBPCS for VCL to ESBDevLib for VCL/FMX that covered a couple of hours of training with my son, turned into about ten hours – but now we have quite a Robust Quartile offering, more “MS Excel like” functions, and a pile of DUnit Tests to check that we got it all working (including with null data as mentioned in the previous post).
Now I need to get Skew and Kurtosis ported – I wonder what “surprises” await 🙂