Gustav Brock
Gustav at cactus.dk
Tue Feb 28 07:41:07 CST 2006
Hi all
Anyone having a need to calculate quartiles?
The few methods I've found assumes calculation on one full table which I think doesn't mimic real life except if you create a temp table. Thus I have made the function below to take a filter argument which makes it much more versatile.
I cannot find a way to do this fast as you need a function to browse the recordset.
You can test this out with the Northwind demo base.
First, create a query, qdyCustomerYearLineQuantity
SELECT
Country,
Year([OrderDate]) AS OrderYear,
Quantity
FROM
(Customers
INNER JOIN
Orders
ON Customers.CustomerID = Orders.CustomerID)
INNER JOIN
[Order Details]
ON Orders.OrderID = [Order Details].OrderID;
And another, qdyCustomerOrderYear:
SELECT
Country,
Year([OrderDate]) AS OrderYear
FROM
Customers
INNER JOIN
Orders
ON Customers.CustomerID = Orders.CustomerID
GROUP BY
Country,
Year([OrderDate]);
Now create a query to calculate quartiles on quantity per customer's country:
SELECT
Country,
OrderYear,
Quartile5("qdyCustomerYearLineQuantity","Quantity",1,"Country='" & [Country] & "' And OrderYear=" & [OrderYear] & "") AS Quartile1,
Quartile5("qdyCustomerYearLineQuantity","Quantity",2,"Country='" & [Country] & "' And OrderYear=" & [OrderYear] & "") AS Quartile2,
Quartile5("qdyCustomerYearLineQuantity","Quantity",3,"Country='" & [Country] & "' And OrderYear=" & [OrderYear] & "") AS Quartile3,
FROM
qdyCustomerOrderYear;
Country OrderYear Quartile1 Quartile2 Quartile3
Canada 1994 11,5 22 37
Canada 1995 15 20 42
Canada 1996 9 18 25
Denmark 1994 8,25 12 17,25
Denmark 1995 18 25 35
Denmark 1996 15 25 30
The function looks like this:
<code>
Public Function Quartile5( _
ByVal strTable As String, _
ByVal strField As String, _
ByVal bytQuartile As Byte, _
Optional ByVal strFilter As String) _
As Double
' strTable : Name of the table/query to analyze.
' strField : Name of the field to analyze.
' bytQuartile: Which min/max or median/quartile to calculate.
' strFilter: Optional filter expression.
'
' Data must be in ascending order by strField.
' Constant values and return values mimic those of
' Excel's Quartile() function.
'
' Returns:
' Minimum, maximum, median or upper/lower quartile
' of strField of strTable filtered on strFilter.
'
' Reset function (clear Static dbs) by the call:
' Call Quartile5("", "", 0)
'
' 2006-02-27. Cactus Data ApS, CPH.
' Find median.
Const cbytQuartMedian As Byte = 2
' Find lower (first) quartile.
Const cbytQuartLow As Byte = 1
' Find upper (third) quartile.
Const cbytQuartHigh As Byte = 3
' Find minimum value.
Const cbytQuartMinimum As Byte = 0
' Find maximum value.
Const cbytQuartMaximum As Byte = 4
' Define default operation.
Const cbytQuartDefault = cbytQuartMedian
Static dbs As DAO.Database
Dim rst As DAO.Recordset
Dim strSQL As String
Dim lngNumber As Long
Dim dblPosition As Double
Dim lngPosition As Long
Dim dblInterpol As Double
Dim dblValueOne As Double
Dim dblValueTwo As Double
Dim dblQuartile As Double
' Use default calculation if choice of calculation is outside range.
If bytQuartile > 4 Then
bytQuartile = cbytQuartDefault
End If
If dbs Is Nothing Then
Set dbs = CurrentDb()
End If
If Len(strTable) > 0 And Len(strField) > 0 Then
strSQL = "SELECT [" & strField & "] FROM [" & strTable & "] "
strSQL = strSQL & "WHERE ([" & strField & "] Is Not Null) "
If Len(strFilter) > 0 Then
strSQL = strSQL & "AND (" & strFilter & ") "
End If
strSQL = strSQL & "ORDER BY [" & strField & "];"
Set rst = dbs.OpenRecordset(strSQL)
With rst
If Not .EOF = True Then
If bytQuartile = cbytQuartMinimum Then
' No need to count records.
lngNumber = 1
Else
' Count records.
.MoveLast
lngNumber = .RecordCount
End If
Select Case bytQuartile
Case cbytQuartMinimum
' Current record is first record.
' Read value of this record.
Case cbytQuartMaximum
' Current record is last record.
' Read value of this record.
Case cbytQuartMedian
' Locate position of median.
dblPosition = (lngNumber + 1) / 2
Case cbytQuartLow
dblPosition = (lngNumber + 3) / 4
Case cbytQuartHigh
dblPosition = (3 * lngNumber + 1) / 4
End Select
Select Case bytQuartile
Case cbytQuartMinimum, cbytQuartMaximum
' Read current row.
Case Else
.MoveFirst
' Find position of first observation to retrieve.
' If lngPosition is 0, then upper position is first record.
' If lngPosition is not 0 and position is not an integer, then
' read the next observation too.
lngPosition = Fix(dblPosition)
dblInterpol = dblPosition - lngPosition
If lngNumber = 1 Then
' Nowhere else to move.
ElseIf lngPosition > 1 Then
' Move to record to read.
.Move lngPosition - 1
End If
End Select
' Retrieve value from first observation.
dblValueOne = .Fields(0).Value
Select Case bytQuartile
Case cbytQuartMinimum, cbytQuartMaximum
dblQuartile = dblValueOne
Case Else
If dblInterpol = 0 Then
' Only one observation to read.
dblQuartile = dblValueOne
Else
.MoveNext
' Retrieve value from second observation.
dblValueTwo = .Fields(0).Value
' For positive values interpolate between 0 and dblValueOne.
' For negative values interpolate between 2 * dblValueOne and dblValueOne.
' Calculate quartile using linear interpolation.
dblQuartile = dblValueOne + dblInterpol * CDec(dblValueTwo - dblValueOne)
End If
End Select
End If
.Close
End With
Else
' Reset.
Set dbs = Nothing
End If
Set rst = Nothing
Quartile5 = dblQuartile
End Function
</code>
The function can easily be converted to ADO but that runs even slower.
Comments are most welcome.
/gustav