golden data files - naming convention
For everyone's sanity, we're trying for a single naming convention for golden data files. Put them in a single folder called CI_files, with nothing else in it. There should be at least one file for each table (see below), for both L0 and L1 tables. Name them golden_tableName.csv
If and only if you have a small test dataset CI should start with, and then a larger dataset to run through the code once it's been written, you can have multiple files per table. Number them golden_tableName1.csv, golden_tableName2.csv, etc., and CI will start with 1.
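A quick way to sanity-check a CI_files folder against this convention (the checker below is illustrative, not part of any NEON tooling):

```python
import re

# Verifies that files follow golden_tableName.csv or golden_tableNameN.csv,
# and that any numbered set starts at 1, since CI starts there.
GOLDEN_RE = re.compile(r"^golden_([A-Za-z0-9_]+?)(\d*)\.csv$")

def check_golden_names(filenames):
    """Return a list of problems found in a CI_files listing."""
    problems = []
    numbered = {}  # table name -> set of numbers seen
    for name in filenames:
        m = GOLDEN_RE.match(name)
        if not m:
            problems.append(f"{name}: does not match golden_tableName[N].csv")
            continue
        table, num = m.groups()
        if num:
            numbered.setdefault(table, set()).add(int(num))
    for table, nums in numbered.items():
        if 1 not in nums:
            problems.append(f"{table}: numbered files must start at 1")
    return problems
```

Run it on the folder listing before you commit; an empty return means the names conform.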
source & inputs columns
For the source and inputs columns in pub workbooks:
source should be the name of the table in the ingest workbook that the pub field is derived from. inputs should be the field name that the pub field is derived from. In the case of spatial data (domainID, siteID, etc.), source should still be the table name from the ingest workbook, but inputs should be namedLocation (as in, that exact text). This is because CI doesn't look for spatial data in L0 fields; it looks it up in its spatial data tables, based on the named location associated with the data.
These changes have been made for records in AirTable. Note that for spatial fields, ingestInput should indicate the namedLocation field (e.g. plotID).
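As a concrete sketch, here's how a few pub-workbook rows would look under this rule (table and field names are made-up examples):

```python
# A field derived from an L0 field keeps its ingest table and field name;
# a spatial field keeps the ingest table but uses the literal text
# "namedLocation" in inputs, since CI resolves spatial values from its own
# location tables rather than from L0 fields.
pub_rows = [
    {"fieldName": "trapStatus", "source": "mam_pertrapnight_in",
     "inputs": "trapStatus"},
    {"fieldName": "siteID", "source": "mam_pertrapnight_in",
     "inputs": "namedLocation"},
    {"fieldName": "domainID", "source": "mam_pertrapnight_in",
     "inputs": "namedLocation"},
]

# The spatial fields are exactly the ones whose inputs say "namedLocation".
spatial_fields = [r["fieldName"] for r in pub_rows
                  if r["inputs"] == "namedLocation"]
```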
new rules for sampleGroups
New guidance from Ross on sampleGroups: we need to provide a value in sampleGroups for any field that goes to the sample management system. We had been populating sampleGroups only for sample IDs, fates, and barcodes, which is fine if that's all you're sending to SMS. But if, say, processedDate is going to SMS as well, populate sampleGroups to indicate which sample in the table it's associated with.
Figuring out which fields should go to SMS is an entirely separate problem, of course. Use your best judgement for now, and in the new system, updating a workbook to populate sampleGroups and smsFieldName for a field where they were NA before shouldn't be a huge deal.
Ingest prep checklist and transposed template have been updated with this change.
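A sketch of what this means in workbook terms (field names, smsFieldName values, and sampleGroups values below are invented for illustration):

```python
# Every field headed to the sample management system carries a sampleGroups
# value naming the sample it belongs to -- not just IDs, fates, and barcodes.
fields = [
    {"fieldName": "subsampleID",   "smsFieldName": "sampleTag", "sampleGroups": "1"},
    {"fieldName": "subsampleCode", "smsFieldName": "barcode",   "sampleGroups": "1"},
    {"fieldName": "processedDate", "smsFieldName": "startDate", "sampleGroups": "1"},
    {"fieldName": "remarks",       "smsFieldName": "NA",        "sampleGroups": "NA"},
]

# Consistency check: anything sent to SMS must carry a sampleGroups value.
missing = [f["fieldName"] for f in fields
           if f["smsFieldName"] != "NA" and f["sampleGroups"] == "NA"]
```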
Another step when updating previous ATBD template
You do not need to create separate ‘error’ vs. ‘no error’ output files for the pub. Just output one pub file that looks exactly how you expect the pub to look after running the code on your golden input (e.g. it contains rows with and without quality flags and different outcomes for various steps of your ATBD; try to ensure that multiple cases are covered). Remove code from the ATBD template that creates more than just your ‘golden’ pub output.
Test datasets
ATBD authors: Some wisdom on test datasets
The current plan for testing in the new CI pipeline anticipates that your test dataset will be ingested into a CI test DB. This means it must look EXACTLY like the data going in via Fulcrum and/or the spreadsheet, and the outputs must look EXACTLY as they would coming out.
To this end, you will need to make the namedLocations in your golden input look like real CI named locations (e.g. UKFS_010.mammalGrid.mam, NOT UKFS_010; this is also needed to join with the spatial data via the API - see below). You will also need to make your dateTimes look as they would coming in via spreadsheet or via Fulcrum (YYYY-MM-DD or YYYY-MM-DDTHH:MM). Please bear this in mind when making test sets and ATBDs going forward.
You'll also need to make your golden datasets contain spatial data that matches the expected output from CI. The only way to ensure this actually happens is to use the same spatial data. With the release of CI's new API, I wrote some code to pull spatial data directly from the CI servers, which should help with the alignment of test vs. real outputs in the algorithm testing phase. It is in devTOS->atbdlibrary->get_localityInfo. Given a set of named locations (and yes, they must be real named locations, not plotIDs), it will return the lat/long/elev and a bunch of other stuff CI has stored. Please use this going forward rather than faking and/or taking a snapshot of the spatial data. Remember - to use the new functions you'll need to reinstall the library. Note that geodeticDatum doesn't seem to be available via the API, so you may need to 'fake' that one in your test datasets. If you want some example code/workflow, look in the rpt ATBD - it's working there.
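If it helps, here is a rough Python sketch of the format checks implied above; the patterns are inferred from the examples given, not an official CI specification:

```python
import re
from datetime import datetime

# e.g. UKFS_010.mammalGrid.mam -- plot-level named locations; other location
# types will have different shapes, so treat this pattern as an assumption.
NAMED_LOC_RE = re.compile(r"^[A-Z]{4}_\d{3}\.\w+\.\w+$")

def valid_named_location(loc):
    return bool(NAMED_LOC_RE.match(loc))

def valid_ingest_datetime(value):
    """Accept YYYY-MM-DD or YYYY-MM-DDTHH:MM, as data would arrive via
    spreadsheet or Fulcrum."""
    for fmt in ("%Y-%m-%d", "%Y-%m-%dT%H:%M"):
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            pass
    return False
```

Running checks like these over golden_in before handing it to CI catches most of the "doesn't look like real ingest data" problems early.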
publication "usage" update
Small update to the publication workbook: we’ve dropped “transition” as an option in the “usage” column. It’s redundant with usage=“both” and it’s not terribly useful. “both” and “publication” are the options for usage, and should be applied at the table level, i.e. usage shouldn’t have different values for different fields within a table.
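A minimal check for this rule might look like the following (row structure is illustrative):

```python
# "usage" takes only "both" or "publication", and must not vary across the
# fields of a single table.
ALLOWED_USAGE = {"both", "publication"}

def usage_problems(rows):
    """rows: dicts with 'table' and 'usage' keys; returns problem strings."""
    problems = []
    seen = {}
    for r in rows:
        if r["usage"] not in ALLOWED_USAGE:
            problems.append(f"{r['table']}: invalid usage '{r['usage']}'")
        prior = seen.setdefault(r["table"], r["usage"])
        if prior != r["usage"]:
            problems.append(f"{r['table']}: mixed usage values within one table")
    return problems
```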
How to update your existing ATBD in 8... or maybe 17... easy steps.
If you are starting an ATBD anew, just pull and clone the ATBD library and ignore this.  If you started an ATBD and have noticed changes in the template, here’s what you need to do to be Agile compliant without starting over.  Mostly it’s careful copy and pasting.  
1. Replace your logo.png with the new one, which is in ~devTOS\atbdLibrary\inst\rmarkdown\templates\atbdTemplate\skeleton\logo.png (the new logo won’t have the trademark in it, if you want to check you did it right)
2. Replace your first section (starting with --- fontsize: 11pt THROUGH word_document: default ---) with the corresponding section in: ~devTOS\atbdLibrary\inst\rmarkdown\templates\atbdTemplate\skeleton\skeleton.Rmd
3. Replace your section (starting with [//]: TEMPLATE SECTION 1 THROUGH [//]: TEMPLATE SECTION 2) with the corresponding section in: ~devTOS\atbdLibrary\inst\rmarkdown\templates\atbdTemplate\skeleton\skeleton.Rmd
4. Search for ‘Remove the next three lines for ATBDs’, and delete the next 3 lines
5. Replace your section (starting with ## PURPOSE THROUGH ## SCOPE) with the corresponding section in: ~devTOS\atbdLibrary\inst\rmarkdown\templates\atbdTemplate\skeleton\skeleton.Rmd
6. Delete the variable reported table
7. Add a final sentence to the variables reported section: Some variables described in this document may be for NEON internal use only and will not appear in downloaded data. These are indicated with **downloadPkg** = "none" in `r pubName` (`r ADlist["pub","ref"]`). You may need to adjust the reference in the above sentence to whatever reference you use for your pubwb tables, depending on how you set up that reference.
8. Copy in the new data constraints and validation sections (copy and paste from the skeleton.Rmd to replace your existing text; but look before you paste over, if you want to retain any notes to your Fulcrum buddy in RED). You should now have sentences about ## User Interface Specifications: all forms, not things about webUIs vs MDRs. Your new section should end with ‘1. All date fields can be entered as dates or dateTimes, the parser will interpret whether time is included based on the formatting.’
9. (updated 10/5/2016) smsOnly fields can occur in your example data. You may want to remove them when simulating the parser steps (e.g. before you start implementing your algorithm), since these fields will be ignored in any de-duping, etc., as they will not be available in PDR.
More steps added 10/3/2016
10. Adjust your code so it writes out the namedLocation in the L1 goldenData
11. Reformat dateTime fields as necessary to match the preferred CI formatting
12. Make sure your namedLocations are REAL ones that exist, and that you are using the API to populate things looked up from the spatial data table.
13. Make sure your Equals-type samples EXIST, if specified by the workflow
14. Samples -> make sure you are passing both the barcode and the id (but not the fate)
15. For any calculations/logic done on sampleIDs, paste in the example syntax from the skeleton into your algorithm implementation (‘In every instance in the algorithm in which a sample tag (generally corresponding to a fieldName of the form xxxSampleID) is used to look up data records, the lookup should be first attempted via the sample barcode. If the sample barcode is not populated, proceed using the sample tag.’)
16. Add text (Populate the location description values…) and code from the skeleton to populate the publication location-y things (domainID, plotID, locationID, etc.). Copy and paste the sentence from the template that begins ‘The named location for each’
17. Make sure your de-duping says whether to treat NULL values as different, or resolve, and that the code and language match.
Updates 10/10/2016
18. Delete the section on sample creation rules, which formerly started with ‘## Sample creation rules’
19. It is not necessary to include a list of fields that are NOT passed from L0 to L1 (though if you have one you can keep it; it can be hard to keep up to date)
20. Add transitionID to the golden L0 and L1
21. Make sure column headers on golden_in match entryLabelIfDifferentFromFieldName
22. Specify whether you want fields that are NOT passed L0-> L1 in the dedupe check
23. Put your testing files in CI_files subdirectory and name them correctly and clean out any extra bonus files on there so there’s no confusion
24. If you have taxon fuzzing, copy in the new syntax with namedLocation instead of dXX, and where the redaction is folded in. If you are copying and pasting from the template, the sentence starts with ‘For each record *p* of `r pTable["id"]` where **targetTaxaPresent** is ‘Yes’’
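For step 11, here is a sketch of the kind of dateTime normalization intended, assuming a few common source formats (the input formats listed are guesses about what your golden data might contain):

```python
from datetime import datetime

def to_ci_datetime(value):
    """Normalize assorted date strings to the CI ingest formats:
    YYYY-MM-DD, or YYYY-MM-DDTHH:MM when a time is present."""
    for fmt in ("%Y-%m-%d %H:%M:%S", "%m/%d/%Y %H:%M", "%m/%d/%Y", "%Y-%m-%d"):
        try:
            dt = datetime.strptime(value, fmt)
        except ValueError:
            continue
        # Keep the time component only if the source format had one.
        if "%H" in fmt:
            return dt.strftime("%Y-%m-%dT%H:%M")
        return dt.strftime("%Y-%m-%d")
    raise ValueError(f"unrecognized date format: {value!r}")
```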
ingest workbook update
The column for the Fulcrum path is back, to give us space for both the Fulcrum name and the Fulcrum JSON path in the ingest workbook. Every ingest workbook should end in a column called parserPath. ATBD authors don't need to fill it in, the Fulcrum developers will.
UID for Fulcrum tables
Tables ingested via Fulcrum should include a uid field, contra earlier guidance. Its parserToCreate should read [CREATE_UID], just like in tables ingested by spreadsheet.
named location validation
New update to ingest workbook template - see mosquito example:
All named locations must have a [NAMEDLOCATIONTYPE()] validation, even if they are populated via a DERIVE_FROM_SAMPLE_TREE() in the parserToCreate field. For mosquitoes, this means the data associated with field samples gets a location validation of [NAMEDLOCATIONTYPE(OS Plot - mos)], and the data associated with sample mixtures gets a location validation of [NAMEDLOCATIONTYPE(SITE)], because plots are mixed within sites and so the smallest common location for the mixtures is the site. This extra validation means the parser will reject data if the lab attempts to send back data after mixing samples across sites - because then the smallest common location would be domain or realm, neither of which is a valid location type for these data.
If you don't have any sample mixtures, the named location type of the derived location will be the same as the type on the original sample. If you do have sample mixtures, be careful to include all possible location types that will result.
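To make the "smallest common location" idea concrete, here's a toy model; the hierarchy table below is invented for illustration and is much simpler than CI's real location tree:

```python
# Hypothetical domain -> site -> plot paths for three named locations.
# (UKFS and KONZ are both D06 sites; the plot names are made up.)
hierarchies = {
    "UKFS_010.mosPlot.mos": ["D06", "UKFS", "UKFS_010.mosPlot.mos"],
    "UKFS_011.mosPlot.mos": ["D06", "UKFS", "UKFS_011.mosPlot.mos"],
    "KONZ_025.mosPlot.mos": ["D06", "KONZ", "KONZ_025.mosPlot.mos"],
}

def smallest_common_location(locs):
    """Walk down the hierarchy while every location still agrees; the last
    level they share is the smallest common location."""
    paths = [hierarchies[loc] for loc in locs]
    common = None
    for level in zip(*paths):
        if len(set(level)) == 1:
            common = level[0]
        else:
            break
    return common
```

Mixing plots within one site resolves to the site (which passes a [NAMEDLOCATIONTYPE(SITE)] validation), while mixing across sites resolves to the domain, which is not a valid location type for these data - exactly the rejection described above.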
Cleaning up text formatting - will be done on ingest, no longer necessary to include in your ATBD
Hi All,
In the wonderful new world of The Parser, Team Parser has agreed to take on the stripping of extra whitespace, conversion of double quotes to single quotes, etc. during the INGEST process.
What this means for you:
1. Use the new ATBD template so it's documented
2. Remember to include the [ASCII] function in the form or parser validation on free-text entry string fields to ensure that your data ENTERs the system without non-ASCII characters (if desired)
3. DELETE (if you had included it) the algorithm in the ATBD to remove special characters.  
We want the ATBD coding to be as efficient as humanly possible, so no need to put this in two places!
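For reference, a rough approximation of the cleanup the parser now does, plus a plain is-ASCII test standing in for the [ASCII] validation (both are sketches, not the parser's actual code):

```python
def parser_clean(value):
    """Approximate the ingest-time cleanup: strip surrounding whitespace
    and convert double quotes to single quotes."""
    return value.strip().replace('"', "'")

def is_ascii(value):
    """Rough stand-in for the [ASCII] validation on free-text fields."""
    return all(ord(ch) < 128 for ch in value)
```

Since this happens on ingest, any equivalent cleanup step in your ATBD algorithm is now dead code and should come out.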
modified new ingest rules
At Ross' request, we're changing one of the rules for sampleClass in the ingest workbooks. If you reference the same sample multiple times (i.e., sampleInfo contains Equals), put the sample class in sampleClass every time, not just on the row/in the workbook where the sample is created. We'll be updating the examples and instructions accordingly.
Fixes to readme_template
The readme_template in how-to-make-a-data-product -> Publication Workbook OS has been updated to reflect the following changes:
1. Link to documents changed from neoninc to neonscience
2. Long dashes (non-UTF-8 text) replaced with regular dashes
3. Non-UTF-8 quotes replaced with regular quotes.
Please use this template going forward.  I updated the existing readmes on the portal this AM, and I can manually fix anyone else’s who has already made one with the old template, if you tell me which ones to fix.
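The character fixes in items 2 and 3 amount to substitutions like these (the exact mapping in the template may differ):

```python
# En/em dashes and curly quotes -> their plain ASCII equivalents.
SUBS = {
    "\u2013": "-", "\u2014": "-",   # en dash, em dash
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
}

def normalize_readme_text(text):
    for bad, good in SUBS.items():
        text = text.replace(bad, good)
    return text
```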
LOVs in pub workbooks
An lovName column is now required in pub workbooks. The Excel workbook template has been updated, and existing workbooks can also be updated with a bit of cutting and pasting. See more details here: https://github.com/NEONInc/devTOS/issues/63
Additionally, the “pubFormat” value for fields populated from LOVs should be “LOV”. The exception is numeric LOVs (if, say, your LOV allows the values 3, 5, 8, or something like that). Then the pubFormat code should specify the rounding like it does for other numeric fields.