[AccessD] Progressive SQL Update Query

Paul Wolstenholme Paul.W at industrialcontrol.co.nz
Wed Feb 22 15:33:52 CST 2023


Ryan,

I've inherited responsibility for a database that includes the processing
of lab results.  The way it works now reflects the novice developer's
familiarity with spreadsheets, and I hate the underlying complexity and lack
of commenting.  For years I've been pondering how it could have been done
better.  Unfortunately the existing system isn't broken so there is no
budget to start over.

As in your explanation, I also see some data points deemed untrustworthy
and a repeat set of tests being scheduled (occasionally more than once).
Each retest provides another group of raw measurements that are
mathematically combined into a set of meaningful result data points.  Any
of those data points may or may not be deemed more or less trustworthy than
in the previous test.

To my mind, it would make sense to import the entire retest results into
the database.  That enables results to be visualised in the database and an
assessment made of data trustworthiness at that point.  A simplistic view
would suggest that an entire set of results be deemed the most trustworthy
for a particular sample.  As you describe, however, lab technicians would
prefer to choose some data points from one result set and other data points
from another result set, perhaps avoiding the need to redo the entire set of
tests yet again.  This is probably the day the equipment is performing
poorly and they are falling behind schedule.

What I've never figured out is the criteria for the lab technician deeming
a result to be trustworthy or untrustworthy.  If the result simply didn't
meet the customer's specification then retesting until a result passed
would amount to cooking the data.  Any attempt to automate a decision on
data point trustworthiness is surely going to encourage the selection of
bad data points that erroneously represent the product as meeting the
required specification when the truth is that it failed to meet the spec.
IMHO it might be best to let the lab technician decide what data he trusts
- his reputation is on the line.

As far as marking the records is concerned, you need to know which result
set for a sample has the best chromium result, which has the best nickel
result, etc.  Adding a trustworthiness-rank column for each data point
(with ranks unique within each sample) would work - you then choose the
most trustworthy data point for each test for each sample.  That looks
like a lot of work.
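
If it helps, here's a minimal T-SQL sketch of that rank-column approach.
The table and column names (Results, SampleID, Analyte, ResultValue,
TrustRank) are hypothetical since I don't know your schema:

```sql
-- Hypothetical Results table: one row per analyte per result set,
-- with a technician-maintained TrustRank (1 = most trustworthy,
-- unique within each SampleID/Analyte pair).
SELECT r.SampleID
      ,r.Analyte
      ,r.ResultValue
FROM   Results AS r
WHERE  r.TrustRank = (SELECT MIN(r2.TrustRank)
                      FROM   Results AS r2
                      WHERE  r2.SampleID = r.SampleID
                             AND r2.Analyte = r.Analyte);
```

The burden is keeping those ranks unique and current every time a retest
is imported - which is the "lot of work" part.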

Another thought I've had relies on there being few samples with duplicate
test results.  You only need to mark duplicate results with trustworthiness
so perhaps have a separate trustworthiness table that links to the results
table.  Entries in your trustworthiness table could be used to determine
which results are bad and/or which are best.  Presumably the most recent
linked trustworthiness record should take precedence over any older entries
that might be in conflict.  That might be the simplest data structure to
handle the complex situation.  Perhaps the downside is a complex query
to extract the most trustworthy data points for each sample.
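
A minimal T-SQL sketch of that structure, again with hypothetical names
(a Results table keyed by ResultID, plus the separate Trustworthiness
table) - adjust to suit your actual schema:

```sql
-- Hypothetical Trustworthiness table: rows exist only for results the
-- technician has passed judgement on; everything else defaults to trusted.
CREATE TABLE Trustworthiness (
    TrustID   INT IDENTITY(1,1) PRIMARY KEY,
    ResultID  INT NOT NULL,                      -- links to Results
    IsTrusted BIT NOT NULL,
    EnteredAt DATETIME NOT NULL DEFAULT GETDATE()
);

-- The most recent verdict wins; untagged results default to trusted.
SELECT r.ResultID
      ,r.SampleID
      ,r.Analyte
      ,COALESCE(t.IsTrusted, 1) AS IsTrusted
FROM   Results AS r
       OUTER APPLY (SELECT TOP 1 tw.IsTrusted
                    FROM   Trustworthiness AS tw
                    WHERE  tw.ResultID = r.ResultID
                    ORDER BY tw.EnteredAt DESC) AS t;
```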

Perhaps you wanted a simple answer.  I doubt we are asking ourselves a
simple question!

Paul Wolstenholme


On Thu, 23 Feb 2023 at 04:03, Ryan W <wrwehler at gmail.com> wrote:

> Well, I can't share all the code there...
>
> So we take the data off the instrument (in a csv or txt file) depending on
> the instrument, our Access FE parses it and sends it up.
>
> Every time a sample is imported from the flat file, every analyte that the
> client requests is ON by default; it's up to the analyst to discern whether
> the data is usable, hence why some things get run multiple times.
>
> The goal is to have a progressive/cascading effect. Sample 1 comes in,
> Chromium and Nickel get turned off, Sample 1 (#2) comes in, user hits
> toggle and Chromium and Nickel are all that are on.  User realises Chromium
> needs another look, so Sample 1 (#3) comes in; user hits toggle and ideally
> only Chromium gets turned on for #3.  As stated, the current query resets
> the second run; there could be a third and fourth, but I think the way the
> query is currently written anything past run 2 is basically dead without
> hand-holding from the analyst.
>
> Here's the code that basically aggregates the SEL list for the samples:
>
> ;WITH Analytes AS (
>    SELECT MAX(CAST(arsr.Rpt AS INT))   AS MOA --Rpt is a bit field; the value is either 1 or 0
>          ,arsr.Analyte
>          ,ars.TestNum --the TestNum that contains the master list of requested analytes
>          ,tars.SeqNo
>    FROM   tmpAnalRunSeq                AS tars --staging table for the data the user is working on, to pare down the dataset
>           INNER JOIN AnalRunSeq        AS ars
>                ON  ars.TestNum = tars.TestNum
>           INNER JOIN AnalRunSeqResult  AS arsr
>                ON  ars.SeqNo = arsr.SeqNo
>    WHERE  tars.SampType = 'SAMP'
>           AND tars.WSID = @MyWSID
>           AND tars.AnalDate > ars.AnalDate
>           AND arsr.SEL = 1
>           AND arsr.AnalyteType IN ('A', 'C')
>    GROUP BY
>           arsr.Analyte
>          ,ars.TestNum
>          ,tars.SeqNo
> )
> UPDATE arsr
> SET    arsr.Rpt = 'False'
> FROM   AnalRunSeqResult AS arsr
>        INNER JOIN tmpAnalRunSeq AS tars
>             ON  tars.SeqNo = arsr.SeqNo
>        INNER JOIN Analytes
>             ON  Analytes.Analyte = arsr.Analyte
>             AND Analytes.SeqNo = arsr.SeqNo
> WHERE  arsr.AnalyteType IN ('A', 'C')
>        AND arsr.SEL = 'True'
>        AND arsr.Rpt = 'True'
>        AND tars.Validated = 'False'
>        AND tars.WSID = @MyWSID
>        AND Analytes.MOA = 1 --MOA is an INT, so compare to 1 rather than 'True'
>
>
>
> The way this query is written, it turns OFF what's already ON (from a
> previous run), since everything is ON by default.
>
> I was using a correlated subquery in the update statement to find
> arsr.Analyte IN (SELECT DISTINCT Analyte FROM AnalRunSeqResult where
> TestNum = tars.TestNum and tars.WSID = @MyWSID and arsr.Rpt = 'True') but I
> re-wrote it to use a CTE because it felt a little easier to read (to my
> eyes)
>
>
>
> On Wed, Feb 22, 2023 at 8:50 AM James Button via AccessD <
> accessd at databaseadvisors.com> wrote:
>
> > Maybe post the SQL that sets up the entry set for the sampling.
> > Then the SQL that manipulates the first sampling's needs,
> > then the SQL that manipulates the second's,
> > and then the SQL that manipulates the third (and final?) sampling's needs.
> >
> > As well as whatever script is used to generate the sampling process
> > request from the data.
> >
> > Without that sort of detail, your post is pretty much a statement along
> > the lines of:
> >
> > "Somebody has made at least one error in the design and/or scripting."
> >
> > JimB
> >
> >
> > -----Original Message-----
> > From: AccessD
> > <accessd-bounces+jamesbutton=blueyonder.co.uk at databaseadvisors.com> On
> > Behalf Of
> > Ryan W
> > Sent: Wednesday, February 22, 2023 2:36 PM
> > To: Access Developers discussion and problem solving
> > <accessd at databaseadvisors.com>
> > Subject: Re: [AccessD] Progressive SQL Update Query
> >
> > https://i.imgur.com/34g0Srj.png
> >
> >
> > Here is an image of an example I was working on yesterday:
> >
> > Sample 1 had Chromium and Nickel off
> >
> > So hitting "toggle" turned Chromium and Nickel ON on #2 (accurate)
> >
> > And subsequently everything went OFF for Sample #3 (also accurate based
> > on current query logic)
> >
> > The user then toggled Chromium off on Sample #2 and hit Toggle and it
> > turned his Chromium back on for #2, and #3 was still all off.
> >
> > I guess the gist of it is: if the list is hand-manipulated, it resets the
> > list, and anything past run #2 gets entirely turned off.
> >
> >
> >
> >
> >
> > On Wed, Feb 22, 2023 at 8:09 AM Ryan W <wrwehler at gmail.com> wrote:
> >
> > > I'm not sure how to explain this without explaining poorly or not
> > > getting the message right.
> > >
> > > We have samples we get from clients, and they ask us to analyze them
> > > for trace metals; a typical client list wants to know about 10 trace
> > > metals out of 30.
> > >
> > > How it currently works is we run the analysis and import the data, our
> > > software pulls it in and matches the requested list to the analysis and
> > > turns off all trace metals not requested.
> > >
> > > Sometimes one (or more) of the metals requires re-analysis for whatever
> > > reason. So the user toggles the 'bad' hits out of the sequence (say they
> > > turn off Lead and Copper).
> > >
> > > So we re-run it and import it and if the data is good, they turn off
> > > everything BUT Lead and Copper.
> > >
> > > In some cases, we have a third analysis (sometimes we have to dilute the
> > > sample to get a good reading)...
> > >
> > > So say in that second analysis, Copper was not good. So they turn it
> > > off... the third analysis gets Copper ONLY turned on for reporting.
> > >
> > >
> > > Right now the toggling is all done by the user.
> > >
> > > I want to write a progressive query that looks at the aggregated
> > > analysis for anything requested (SEL) that's OFF and turns it ON (or
> > > vice versa) on the following analysis (this works).
> > >
> > > However, if there's a third analysis in the sequence everything ends up
> > > turned off because Analysis 1 and 2 meet the requested analysis.
> > >
> > > So in this example:
> > >
> > > Sample 1's list contains all trace metals, but lead and copper are off.
> > > Sample 2's list after hitting the "toggle" button contains lead and
> > > copper, and everything else is off.
> > > Sample 3 is completely off because the aggregated list of required trace
> > > metals is satisfied by 1 and 2.
> > >
> > > However, Sample 2 needs user intervention and the user turns off
> > > Copper.  I want them to be able to hit "Toggle" again and turn ON
> > > Copper for Sample 3, but leave Copper OFF on Sample 2, even though
> > > Sample 1 doesn't satisfy the copper requirement.
> > >
> > > My query logic works fine if there's only 1 rerun, but when there's
> > > more than one, that's where the problem lies.
> > >
> > >
> > > I don't know if I need to use a cursor or a recordset and a looping
> > > mechanism to make this work instead of a straight up batch query?
> > >
> > >
> > >
>

