[AccessD] Progressive SQL Update Query

Ryan W wrwehler at gmail.com
Wed Feb 22 15:46:43 CST 2023


Hey Paul!

The 'trustworthiness' of the data point IS selected by the Analyst. What
I'm trying to do is have a button that takes the data points already trusted
in Sample 1 Run 1 and turns them off in Sample 1 Run 2, leaving only the
yet-to-review / potentially trusted analytes, so that Sample 1 Run 3, if
required, carries only the analytes still untrusted in aggregate across the
parent and grandparent runs.

It's pretty usual to see 1 rerun (so 2 sets of analytes aggregated) and
occasionally 2 reruns (3 sets of analytes aggregated).

This is all in an effort to spare the Analyst from having to mentally
manage their "toggle list" for a sample, or for a sequence of samples and
subsequent reruns.  Sometimes this all happens in one day and it's easy to
review, but when the reruns span multiple days, managing the toggle list
means bouncing between Runs (containers of sequences).

In short: we are turning off what's already trusted and progressively
moving toward an aggregate of trusted compounds over an unknown number of
re-analyses.
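A minimal sketch of that rule, using hypothetical table and field names and SQLite standing in for the real backend: an analyte stays ON in the newest run only if no earlier run of the same sample has already reported it as trusted.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Result (Sample TEXT, Run INTEGER, Analyte TEXT, Rpt INTEGER);
-- Run 1: Chromium and Nickel were untrusted (toggled off); Lead was trusted.
-- Run 2 starts with everything ON by default.
INSERT INTO Result VALUES
 ('S1', 1, 'Lead', 1), ('S1', 1, 'Chromium', 0), ('S1', 1, 'Nickel', 0),
 ('S1', 2, 'Lead', 1), ('S1', 2, 'Chromium', 1), ('S1', 2, 'Nickel', 1);
""")
# The toggle: turn OFF in run 2 any analyte already reported by an earlier run.
con.execute("""
UPDATE Result SET Rpt = 0
WHERE Run = 2
  AND EXISTS (SELECT 1 FROM Result AS prev
              WHERE prev.Sample  = Result.Sample
                AND prev.Analyte = Result.Analyte
                AND prev.Run     < Result.Run
                AND prev.Rpt     = 1)
""")
run2 = dict(con.execute("SELECT Analyte, Rpt FROM Result WHERE Run = 2"))
print(run2)  # {'Lead': 0, 'Chromium': 1, 'Nickel': 1}
```

Only Lead, the analyte run 1 already covered, gets switched off; the still-untrusted Chromium and Nickel stay on for review.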



On Wed, Feb 22, 2023 at 3:34 PM Paul Wolstenholme <
Paul.W at industrialcontrol.co.nz> wrote:

> Ryan,
>
> I've inherited responsibility for a database that includes the processing
> of lab results.  The way it works now reflects the novice developer's
> familiarity with spreadsheets and I hate the underlying complexity and lack
> of commenting.  For years I've been pondering how it could have been done
> better.  Unfortunately the existing system isn't broken so there is no
> budget to start over.
>
> As in your explanation, I also see some data points deemed to be untrusted
> and a repeat set of tests being scheduled (occasionally more than once).
> Each retest provides another group of raw measurements that are
> mathematically combined into a set of meaningful result data points.  Any
> of those data points may or may not be deemed more or less trustworthy than
> in the previous test.
> To my mind, it would make sense to import the entire retest results into
> the database.  That enables results to be visualised in the database and an
> assessment made of data trustworthiness at that point.  A simplistic view
> would suggest that an entire set of results be deemed the most trustworthy
> for a particular sample.  As you describe, however, lab technicians would
> prefer to choose some data points from one result set and other data points
> from another data set, perhaps avoiding the need to redo the entire set of
> tests yet again.  This is probably the day the equipment is performing
> poorly and they are falling behind schedule.
>
> What I've never figured out is the criteria for the lab technician deeming
> a result to be trustworthy or untrustworthy.  If the result simply didn't
> meet the customer's specification then retesting until a result passed
> would amount to cooking the data.  Any attempt to automate a decision on
> data point trustworthiness is surely going to encourage the selection of
> bad data points that erroneously represent the product as meeting the
> required specification when the truth is that it failed to meet the spec.
> IMHO it might be best to let the lab technician decide what data he trusts
> - his reputation is on the line.
>
> As far as marking the records is concerned, you need to know which result
> set for a sample has the best chromium result, which has the best nickel
> result, etc.  Adding a column for each data point that gives it a
> trustworthiness rank (which should be unique per sample) would work - you
> want to choose the most trustworthy data point for each test for each
> sample.  That looks like a lot of work.
>
> Another thought I've had relies on there being few samples with duplicate
> test results.  You only need to mark duplicate results with trustworthiness
> so perhaps have a separate trustworthiness table that links to the results
> table.  Entries in your trustworthiness table could be used to determine
> which results are bad and/or which are best.  Presumably the most recent
> linked trustworthiness record should take precedence over any older entries
> that might be in conflict.  That might be the simplest data structure to
> handle the complex situation.  Perhaps the downside is a complex query
> to extract the most trustworthy data points for each sample.
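Paul's link-table idea can be sketched as follows (a hypothetical schema, with SQLite standing in for the real backend): only duplicated results get rows in a separate trustworthiness table, and the most recent row per result takes precedence over older, conflicting entries.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Result (ResultID INTEGER PRIMARY KEY, Sample TEXT,
                     Analyte TEXT, Value REAL);
CREATE TABLE Trust  (TrustID  INTEGER PRIMARY KEY,   -- higher ID = more recent
                     ResultID INTEGER REFERENCES Result(ResultID),
                     Trusted  INTEGER);
INSERT INTO Result VALUES (1, 'S1', 'Nickel', 0.42), (2, 'S1', 'Nickel', 0.44);
-- Result 1 first flagged untrusted, result 2 trusted,
-- then result 1 later re-flagged as trusted:
INSERT INTO Trust  VALUES (1, 1, 0), (2, 2, 1), (3, 1, 1);
""")
# Effective trust = the most recent Trust row per result; results with no
# Trust row (the common, unduplicated case) default to trusted.
best = con.execute("""
SELECT r.ResultID,
       COALESCE((SELECT t.Trusted FROM Trust AS t
                 WHERE t.ResultID = r.ResultID
                 ORDER BY t.TrustID DESC LIMIT 1), 1) AS Trusted
FROM Result AS r
ORDER BY r.ResultID
""").fetchall()
print(best)  # [(1, 1), (2, 1)] -- the re-flag on result 1 wins
```

The payoff is exactly what Paul anticipates: the base data never needs a rank column, at the cost of a correlated (or windowed) lookup when extracting the best data points.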
>
> Perhaps you wanted a simple answer.  I doubt we are asking ourselves a
> simple question!
>
> Paul Wolstenholme
>
>
> On Thu, 23 Feb 2023 at 04:03, Ryan W <wrwehler at gmail.com> wrote:
>
> > Well, I can't share all the code there...
> >
> > So we take the data off the instrument (in a csv or txt file) depending on
> > the instrument, our Access FE parses it and sends it up.
> >
> > Every time a sample is imported from the flat file, every analyte that the
> > client requests is ON by default; it's up to the analyst to discern whether
> > the data is usable, hence why some things get run multiple times.
> >
> > The goal is to have a progressive/cascading effect. Sample 1 comes in,
> > Chromium and Nickel get turned off. Sample 1 (#2) comes in, user hits
> > toggle and Chromium and Nickel are all that are on.  User realises Chromium
> > needs another look, so Sample 1 (#3) comes in, user hits toggle and ideally
> > only Chromium gets turned on for #3.  As stated, the current query resets
> > the second run; there could be a third and fourth, but the way the query is
> > currently written, anything past run 2 is basically dead without
> > hand-holding from the analyst.
> >
> > Here's the code that basically aggregates the SEL list for the samples:
> >
> > ;WITH Analytes AS (
> >    SELECT MAX(CAST(arsr.Rpt AS INT)) AS MOA -- Rpt is a bit field; the value is either 1 or 0
> >          ,arsr.Analyte
> >          ,ars.TestNum -- the TestNum that contains the master list of requested analytes
> >          ,tars.SeqNo
> >    FROM   tmpAnalRunSeq AS tars -- staging table for data the user is working on, to pare down the dataset
> >           INNER JOIN AnalRunSeq AS ars
> >                ON  ars.TestNum = tars.TestNum
> >           INNER JOIN AnalRunSeqResult AS arsr
> >                ON  ars.SeqNo = arsr.SeqNo
> >    WHERE  tars.SampType = 'SAMP'
> >           AND tars.WSID = @MyWSID
> >           AND tars.AnalDate > ars.AnalDate
> >           AND arsr.SEL = 1
> >           AND arsr.AnalyteType IN ('A', 'C')
> >    GROUP BY
> >           arsr.Analyte
> >          ,ars.TestNum
> >          ,tars.SeqNo
> > )
> >
> > UPDATE arsr
> > SET    arsr.Rpt = 'False'
> > FROM   AnalRunSeqResult AS arsr
> >        INNER JOIN tmpAnalRunSeq AS tars
> >             ON  tars.SeqNo = arsr.SeqNo
> >        INNER JOIN Analytes
> >             ON  Analytes.Analyte = arsr.Analyte
> >             AND Analytes.SeqNo = arsr.SeqNo
> > WHERE  arsr.AnalyteType IN ('A', 'C')
> >        AND arsr.SEL = 'True'
> >        AND arsr.Rpt = 'True'
> >        AND tars.Validated = 'False'
> >        AND tars.WSID = @MyWSID
> >        AND Analytes.MOA = 'True'
> >
> >
> >
> > The way this query is written, it's turning OFF what's already ON (from
> > previous runs), since everything is ON by default.
> >
> > I was using a correlated subquery in the update statement to find
> > arsr.Analyte IN (SELECT DISTINCT Analyte FROM AnalRunSeqResult WHERE
> > TestNum = tars.TestNum AND tars.WSID = @MyWSID AND arsr.Rpt = 'True'),
> > but I re-wrote it to use a CTE because it felt a little easier to read
> > (to my eyes).
> >
> >
> >
> > On Wed, Feb 22, 2023 at 8:50 AM James Button via AccessD <
> > accessd at databaseadvisors.com> wrote:
> >
> > > Maybe post the SQL that sets up the entry set for the sampling.
> > > And then the SQL that handles the first sampling's needs.
> > > And then the SQL that handles the second sampling's needs.
> > > And then the SQL that handles the third (and final?) sampling's needs.
> > >
> > > As well as whatever script is used to generate the sampling process
> > > request from the data.
> > >
> > > Without that sort of detail your post is pretty much a statement along
> > > the lines of:
> > >
> > > "Somebody has made at least one error in the design and/or scripting."
> > >
> > > JimB
> > >
> > >
> > > -----Original Message-----
> > > From: AccessD
> > > <accessd-bounces+jamesbutton=blueyonder.co.uk at databaseadvisors.com> On
> > > Behalf Of
> > > Ryan W
> > > Sent: Wednesday, February 22, 2023 2:36 PM
> > > To: Access Developers discussion and problem solving
> > > <accessd at databaseadvisors.com>
> > > Subject: Re: [AccessD] Progressive SQL Update Query
> > >
> > > Here is an image of an example I was working on yesterday:
> > >
> > > https://i.imgur.com/34g0Srj.png
> > >
> > > Sample 1 had Chromium and Nickel off
> > >
> > > So hitting "toggle" turned Chromium and Nickel ON on #2 (accurate)
> > >
> > > And subsequently everything went OFF for Sample #3 (also accurate based
> > > on current query logic)
> > >
> > > The user then toggled Chromium off on Sample #2 and hit Toggle and it
> > > turned his Chromium back on for #2, and #3 was still all off.
> > >
> > > I guess the gist of it is: if the list is hand-manipulated, it resets
> > > the list and anything past run #2 gets entirely turned off.
> > >
> > >
> > >
> > >
> > >
> > > On Wed, Feb 22, 2023 at 8:09 AM Ryan W <wrwehler at gmail.com> wrote:
> > >
> > > > I'm not sure how to explain this without explaining poorly or not
> > > > getting the message right.
> > > >
> > > > We have data we get from clients, and they ask us to analyze it for
> > > > trace metals; the client's list might want to know about 10 trace
> > > > metals out of 30.
> > > >
> > > > How it currently works is we run the analysis and import the data, our
> > > > software pulls it in and matches the requested list to the analysis and
> > > > turns off all trace metals not requested.
> > > >
> > > > Sometimes one (or more) of the metals requires re-analysis for
> > > > whatever reason, so the user toggles the 'bad' hits out of the
> > > > sequence (say they turn off Lead and Copper).
> > > >
> > > > So we re-run it and import it and if the data is good, they turn off
> > > > everything BUT Lead and Copper.
> > > >
> > > > In some cases, we have a third analysis (sometimes we have to dilute
> > > > the sample to get a good reading)...
> > > >
> > > > So say in that second analysis, Copper was not good, so they turn it
> > > > off; the third analysis gets Copper ONLY turned on for reporting.
> > > >
> > > >
> > > > Right now the toggling is all done by the user.
> > > >
> > > > I want to write a progressive query that looks at the aggregated
> > > > analysis for anything requested (SEL) that's OFF and turns it ON (or
> > > > vice versa) on the following analysis (this works).
> > > >
> > > > However, if there's a third analysis in the sequence, everything ends
> > > > up turned off because Analyses 1 and 2 meet the requested analysis.
> > > >
> > > > So in this example:
> > > >
> > > > Sample 1's list contains all trace metals, but Lead and Copper are
> > > > off.
> > > > Sample 2's list after hitting the "toggle" button contains Lead and
> > > > Copper, and everything else is off.
> > > > Sample 3 is completely off because the aggregated list of required
> > > > trace metals is satisfied by 1 and 2.
> > > >
> > > > However, Sample 2 needs user intervention and Copper gets turned off.
> > > > I want them to be able to hit "Toggle" again and turn ON Copper for
> > > > Sample 3, but leave Copper OFF on Sample 2, even though Sample 1
> > > > doesn't satisfy the Copper requirement.
> > > >
> > > > My query logic works fine if there's only 1 rerun, but the problem
> > > > lies with more than one.
> > > >
> > > >
> > > > I don't know if I need to use a cursor, or a recordset and a looping
> > > > mechanism, to make this work instead of a straight-up batch query?
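One set-based sketch of the fix, with a hypothetical simplified schema and SQLite in place of SQL Server: rather than a cursor, make "reported (Rpt = 1) in ANY earlier run of the same sample" the turn-off criterion. That survives hand edits and any number of reruns, because an analyte the analyst hand-toggled off in run 2 was never reported, so it correctly stays on in run 3.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Result (Sample TEXT, Run INTEGER, Analyte TEXT,
                     SEL INTEGER,  -- requested by the client
                     Rpt INTEGER); -- ON for reporting
-- Run 1: only Chromium came back good; Lead and Copper were toggled off.
-- Run 2: toggle turned Lead and Copper on, then the analyst hand-toggled
--        Copper back off, so only Lead ended up reported.
-- Run 3: everything starts ON by default.
INSERT INTO Result VALUES
 ('S1', 1, 'Chromium', 1, 1), ('S1', 1, 'Lead', 1, 0), ('S1', 1, 'Copper', 1, 0),
 ('S1', 2, 'Chromium', 1, 0), ('S1', 2, 'Lead', 1, 1), ('S1', 2, 'Copper', 1, 0),
 ('S1', 3, 'Chromium', 1, 1), ('S1', 3, 'Lead', 1, 1), ('S1', 3, 'Copper', 1, 1);
""")
# One batch UPDATE handles any rerun depth: turn OFF whatever some earlier
# run of the sample has already reported.
con.execute("""
UPDATE Result SET Rpt = 0
WHERE Run = 3 AND SEL = 1
  AND EXISTS (SELECT 1 FROM Result AS prev
              WHERE prev.Sample  = Result.Sample
                AND prev.Analyte = Result.Analyte
                AND prev.Run     < Result.Run
                AND prev.SEL = 1 AND prev.Rpt = 1)
""")
run3 = dict(con.execute("SELECT Analyte, Rpt FROM Result WHERE Run = 3"))
print(run3)  # {'Chromium': 0, 'Lead': 0, 'Copper': 1}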
> > > >
> > > >
> > > >
> >
> --
> AccessD mailing list
> AccessD at databaseadvisors.com
> https://databaseadvisors.com/mailman/listinfo/accessd
> Website: http://www.databaseadvisors.com
>
