Gustav Brock
Gustav at cactus.dk
Tue Feb 28 07:41:07 CST 2006
Hi all Anyone having a need to calculate quartiles? The few methods I've found assumes calculation on one full table which I think doesn't mimic real life except if you create a temp table. Thus I have made the function below to take a filter argument which makes it much more versatile. I cannot find a way to do this fast as you need a function to browse the recordset. You can test this out with the Northwind demo base. First, create a query, qdyCustomerYearLineQuantity SELECT Country, Year([OrderDate]) AS OrderYear, Quantity FROM (Customers INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID) INNER JOIN [Order Details] ON Orders.OrderID = [Order Details].OrderID; And another, qdyCustomerOrderYear: SELECT Country, Year([OrderDate]) AS OrderYear FROM Customers INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID GROUP BY Country, Year([OrderDate]); Now create a query to calculate quartiles on quantity per customer's country: SELECT Country, OrderYear, Quartile5("qdyCustomerYearLineQuantity","Quantity",1,"Country='" & [Country] & "' And OrderYear=" & [OrderYear] & "") AS Quartile1, Quartile5("qdyCustomerYearLineQuantity","Quantity",2,"Country='" & [Country] & "' And OrderYear=" & [OrderYear] & "") AS Quartile2, Quartile5("qdyCustomerYearLineQuantity","Quantity",3,"Country='" & [Country] & "' And OrderYear=" & [OrderYear] & "") AS Quartile3, FROM qdyCustomerOrderYear; Country OrderYear Quartile1 Quartile2 Quartile3 Canada 1994 11,5 22 37 Canada 1995 15 20 42 Canada 1996 9 18 25 Denmark 1994 8,25 12 17,25 Denmark 1995 18 25 35 Denmark 1996 15 25 30 The function looks like this: <code> Public Function Quartile5( _ ByVal strTable As String, _ ByVal strField As String, _ ByVal bytQuartile As Byte, _ Optional ByVal strFilter As String) _ As Double ' strTable : Name of the table/query to analyze. ' strField : Name of the field to analyze. ' bytQuartile: Which min/max or median/quartile to calculate. ' strFilter: Optional filter expression. ' ' Data must be in ascending order by strField. ' Constant values and return values mimic those of ' Excel's Quartile() function. ' ' Returns: ' Minimum, maximum, median or upper/lower quartile ' of strField of strTable filtered on strFilter. ' ' Reset function (clear Static dbs) by the call: ' Call Quartile5("", "", 0) ' ' 2006-02-27. Cactus Data ApS, CPH. ' Find median. Const cbytQuartMedian As Byte = 2 ' Find lower (first) quartile. Const cbytQuartLow As Byte = 1 ' Find upper (third) quartile. Const cbytQuartHigh As Byte = 3 ' Find minimum value. Const cbytQuartMinimum As Byte = 0 ' Find maximum value. Const cbytQuartMaximum As Byte = 4 ' Define default operation. Const cbytQuartDefault = cbytQuartMedian Static dbs As DAO.Database Dim rst As DAO.Recordset Dim strSQL As String Dim lngNumber As Long Dim dblPosition As Double Dim lngPosition As Long Dim dblInterpol As Double Dim dblValueOne As Double Dim dblValueTwo As Double Dim dblQuartile As Double ' Use default calculation if choice of calculation is outside range. If bytQuartile > 4 Then bytQuartile = cbytQuartDefault End If If dbs Is Nothing Then Set dbs = CurrentDb() End If If Len(strTable) > 0 And Len(strField) > 0 Then strSQL = "SELECT [" & strField & "] FROM [" & strTable & "] " strSQL = strSQL & "WHERE ([" & strField & "] Is Not Null) " If Len(strFilter) > 0 Then strSQL = strSQL & "AND (" & strFilter & ") " End If strSQL = strSQL & "ORDER BY [" & strField & "];" Set rst = dbs.OpenRecordset(strSQL) With rst If Not .EOF = True Then If bytQuartile = cbytQuartMinimum Then ' No need to count records. lngNumber = 1 Else ' Count records. .MoveLast lngNumber = .RecordCount End If Select Case bytQuartile Case cbytQuartMinimum ' Current record is first record. ' Read value of this record. Case cbytQuartMaximum ' Current record is last record. ' Read value of this record. Case cbytQuartMedian ' Locate position of median. dblPosition = (lngNumber + 1) / 2 Case cbytQuartLow dblPosition = (lngNumber + 3) / 4 Case cbytQuartHigh dblPosition = (3 * lngNumber + 1) / 4 End Select Select Case bytQuartile Case cbytQuartMinimum, cbytQuartMaximum ' Read current row. Case Else .MoveFirst ' Find position of first observation to retrieve. ' If lngPosition is 0, then upper position is first record. ' If lngPosition is not 0 and position is not an integer, then ' read the next observation too. lngPosition = Fix(dblPosition) dblInterpol = dblPosition - lngPosition If lngNumber = 1 Then ' Nowhere else to move. ElseIf lngPosition > 1 Then ' Move to record to read. .Move lngPosition - 1 End If End Select ' Retrieve value from first observation. dblValueOne = .Fields(0).Value Select Case bytQuartile Case cbytQuartMinimum, cbytQuartMaximum dblQuartile = dblValueOne Case Else If dblInterpol = 0 Then ' Only one observation to read. dblQuartile = dblValueOne Else .MoveNext ' Retrieve value from second observation. dblValueTwo = .Fields(0).Value ' For positive values interpolate between 0 and dblValueOne. ' For negative values interpolate between 2 * dblValueOne and dblValueOne. ' Calculate quartile using linear interpolation. dblQuartile = dblValueOne + dblInterpol * CDec(dblValueTwo - dblValueOne) End If End Select End If .Close End With Else ' Reset. Set dbs = Nothing End If Set rst = Nothing Quartile5 = dblQuartile End Function </code> The function can easily be converted to ADO but that runs even slower. Comments are most welcome. /gustav