Clarkson University Criminal Justice System Software Testing

Papers

When Trusted Black Boxes Don't Agree: Incentivizing Iterative Improvement and Accountability in Critical Software Systems: J. Matthews, G. Northup, I. Grasso, S. Lorenz, M. Babaeianjelodar, H. Bashaw, S. Mondal, A. Matthews, M. Njie, J. Goldthwaite
Proceedings of the 2020 AAAI/ACM Conference on Artificial Intelligence, Ethics and Society (AIES), New York, New York, USA, February 7-8, 2020.
PDF (Paper), Slides
The Right To Confront Your Accusers: Opening the Black Box of Forensic DNA Software: J. Matthews, S. Lorenz, M. Babaeianjelodar, A. Matthews, M. Njie, N. Adams, D. Krane, J. Goldthwaite, C. Hughes
Proceedings of the 2019 AAAI/ACM Conference on Artificial Intelligence, Ethics and Society (AIES), Honolulu, Hawaii, January 27-28, 2019.
PDF

Video/Podcast

Exhibit A, Netflix, Episode 4 on Touch DNA
Opting in: Privacy in the Digital Age: Jeanna Matthews, Ariel Silverstone
SPIT podcast with Baratunde Thurston, iHeartRadio, July 10, 2019.

Talks

Decoding Probabilistic Genotyping Software: J. Matthews, N. Adams, J. Goldthwaite
Questioning Forensics 2020: 22 and You: Fighting for Privacy & Justice in an Age of Genetic Surveillance, Brooklyn Law School, January 15, 2020.
Slides
Introduction to Decision Making Algorithms: J. Matthews
NYU Conference on Trade Secrets and Algorithmic Systems, November 16, 2018.
Slides
Opening the Black Box: Confronting Software-Based Evidence: J. Matthews
Questioning Forensics 2018: Lawyers, Damn Lawyers, and Statistics, New York City Bar Association, November 2, 2018.
Slides
You're just complaining because you're guilty: A DEF CON Guide to Adversarial Testing of Software Used In the Criminal Justice System: J. Matthews, N. Adams, J. Greco
DEF CON 26, Las Vegas, August 9-12, 2018.
Video (English)
Schedule, Slides.
You're Just Complaining Because You're Guilty: Algorithmic Accountability and Transparency in Criminal Justice Software: J. Matthews
Data and Society, New York, USA, June 27, 2018.
Video (English with Sign Language).
Full Databite No. 112 with Darakhshan Mir and Taeyoon Choi.

Documentation

FST Documentation: Contains most of the information needed to run OCME's Forensic Statistical Tool (FST), including the setup steps required to reproduce the results of our data-generating runs.
Starting and Running FST on Prometheus: In addition to our VM connection details, this document explains how to perform manual test runs that submit data to FST. It also includes tables comparing the input file formats of FST and LRmix, and links to other data sources.
Manifest of Changes to FST Source for the LRmix Comparison: Describes all of the database-level and source-level changes made to FST so that it runs as closely to LRmix as possible.

Data

fst.csv: FST's results over its own validation study.
fst_nodrop.csv: FST's results over its own validation study, but with the suspicious CheckFrequencyForRemoval function disabled.
fst_noncont.csv: FST run (with the standard configuration, including the suspicious function) over a database of known non-contributors.
fst_as_lrmix_cont.csv: FST run with the standard configuration, but with LRmix-like parameters, over the validation study.
lrmix_default.csv: LRmix run on the validation study, using its own defaults.
lrmix_default_ncont.csv: LRmix run on the validation study with known non-contributors, using its own defaults.
lrmix_as_fst.csv: LRmix run on the validation study, using a statistical estimate of FST parameters over three dimensions (large file).
lrmix_as_fst_ncont.csv: LRmix run on the validation study with known non-contributors, using a statistical estimate of FST parameters over three dimensions (very large file). This dataset has a known bug: the AGG Dropout dimension was run only over contributors. The next dataset corrects this.
lrmix_as_fst_ncont_agg.csv: Same as lrmix_as_fst_ncont.csv, but containing only the AGG Dropout dimension, and actually run on non-contributors.
lrmix_default_theta.csv: LRmix, with default parameters, except for varying θ (the "theta correction"), over contributors.
euroformix_default_ncont.csv: EuroForMix run on the validation study with known non-contributors, using its default settings.
euroformix_as_fst.csv: EuroForMix run on the validation study, with parameters tuned to be closer to FST.
euroformix_as_fst_ncont.csv: EuroForMix run on the validation study with known non-contributors, with parameters tuned to be closer to FST.
euroformix-default-known.csv: EuroForMix run on the validation study with known non-contributors, with default parameters.
euroformix_default.csv: Same data as above in a slightly different format.

Software

Conversion Scripts: A collection of scripts to convert between the input formats of several probabilistic genotyping (PG) software packages.
pgtk: A collection of tools to manipulate the data from the research study, written mostly in Python 3. (This does not include any tools for running the genotyping software, only for preparing, collecting, and analyzing the output.)
LRmix.jar: Our custom-built LRmix, which uses a command-line interface for batch runs.
LRmix Studio: Source code for the above.
fstint: A small interface for using NYC OCME's Forensic Statistical Tool.
cjs3: A project for EuroForMix automation and simulation.