learnings
REPO: basic interaction with db: run scripts, retrieve query results: SCRIPT, QUERY, CONNECTION
Model: a WHOLE table
breaks into:
ModelInstance: individual records: UPDATE, SAVE
DirScan: scan a DIRECTORY: COLLECTION, SAMPLE, how to process FILES
SasScanner: for sas database files
Find one associated record by type and name
row count, column count, date created
variable: label, type, length
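A rough sketch of how the scanned SAS metadata above could be held in memory. Only the class name SasFileProperties comes from these notes; the field names, the Variable class, and the sample values are assumptions made up for illustration, and no real SAS file is read here.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class Variable:
    """One variable (column) of a SAS file: label, type, length."""
    name: str
    label: str
    type: str      # e.g. "numeric" or "character" (assumed vocabulary)
    length: int


@dataclass
class SasFileProperties:
    """File-level metadata: row count, date created, and the variables."""
    row_count: int
    date_created: datetime
    variables: List[Variable] = field(default_factory=list)

    @property
    def column_count(self) -> int:
        # The column count is simply the number of variables.
        return len(self.variables)


# Illustrative values only, not taken from any real dataset.
props = SasFileProperties(
    row_count=100,
    date_created=datetime(2008, 1, 1),
    variables=[
        Variable("state", "State code", "character", 2),
        Variable("income", "Monthly income", "numeric", 8),
    ],
)
print(props.column_count, [v.label for v in props.variables])
```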
Periodicity: the SAME data gets updated on a cycle (data updated, two months go by, SAME data updated, two months go by, SAME data updated)
Seeding a database with data, seed.yml file etc.
DETERMINE COLLECTION
agency name: hhs
collection name: tanf
sample name: 2008
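One way the agency, collection, and sample might be read off a file's location. This is only a sketch: the agency/collection/sample directory layout, the function name determine_collection, and the example file name are all assumptions, not something these notes confirm.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class CollectionInfo(NamedTuple):
    agency: str      # e.g. "hhs"
    collection: str  # e.g. "tanf"
    sample: str      # e.g. "2008"


def determine_collection(path: str) -> CollectionInfo:
    """Take agency, collection and sample from the last three directory names.

    Assumes a hypothetical <agency>/<collection>/<sample>/<file> layout.
    """
    parts = PurePosixPath(path).parent.parts
    if len(parts) < 3:
        raise ValueError(f"expected agency/collection/sample in {path!r}")
    agency, collection, sample = parts[-3:]
    return CollectionInfo(agency.lower(), collection.lower(), sample)


# The file name below is made up for the example.
info = determine_collection("data/hhs/tanf/2008/households.sas7bdat")
print(info)
```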
a fulltext index is faster than a LIKE query: a LIKE pattern that starts with a wildcard has to scan the full table, while the fulltext index looks up the words (pieces of the text) directly
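A toy illustration of why the index lookup wins. It is not how any real database implements full-text search: it just contrasts checking every row with looking a word up in a prebuilt word-to-rows map. The row texts and function names are invented for the example.

```python
from collections import defaultdict

# Invented sample rows: id -> text.
rows = {
    1: "temporary assistance for needy families",
    2: "child care and development fund",
    3: "assistance for veterans and families",
}


def like_scan(rows, term):
    """LIKE '%term%' style: test every row's text for the substring."""
    return sorted(rid for rid, text in rows.items() if term in text)


def build_word_index(rows):
    """Full-text style: map each word to the ids of the rows containing it."""
    index = defaultdict(set)
    for rid, text in rows.items():
        for word in text.split():
            index[word].add(rid)
    return index


def index_lookup(index, term):
    """One dictionary lookup instead of a pass over all rows."""
    return sorted(index.get(term, set()))


index = build_word_index(rows)
print(like_scan(rows, "families"), index_lookup(index, "families"))
```

The trade-off: the word index only matches whole words, while the scan matches any substring, so the two are not interchangeable for every query.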
SQLServerDataSource: for MICROSOFT SQL Server. To list the base tables in the cmr database:
SELECT TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE' AND TABLE_CATALOG='cmr';
Model: a "picture" of what the data structure looks like in the db
directory names ~ agencies (the directory names roughly correspond to agencies)
create_repository is just a SQL schema
geography_level is important (some are state samples, potential exists)
SIPP record type is the person_month record, which is the series of all of the months
of data for a specific person
records are like our rec_types
record layout is the thing the variables are attached to
if we have 10 yrs of HUD pik data with the same layout (record layout), reuse that layout to save time; if the layouts are different, we have to use a new one
samples vs. records vs record layouts
HUD, veterans and wives, 12 yrs, and 5 yrs have this skeleton structure (because they might want to compare which kind of record layout each one uses)
minimize repeating
the seeds include agencies and dataCollections, e.g. HHS has TANF and CCDF
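A minimal sketch of seeding, using an in-memory SQLite database so it runs anywhere. The SEED dict stands in for the parsed contents of seed.yml, whose real structure these notes do not show; the table and column names are also assumptions.

```python
import sqlite3

# Stand-in for the parsed seed.yml (structure assumed for illustration).
SEED = {
    "agencies": [
        {"name": "hhs", "data_collections": ["tanf", "ccdf"]},
    ],
}


def seed_database(conn, seed):
    """Create the two tables if needed and insert any missing seed rows."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS agencies (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE IF NOT EXISTS data_collections (
            id INTEGER PRIMARY KEY,
            agency_id INTEGER NOT NULL REFERENCES agencies(id),
            name TEXT NOT NULL,
            UNIQUE (agency_id, name)
        );
    """)
    for agency in seed["agencies"]:
        # INSERT OR IGNORE keeps the seed safe to run more than once.
        conn.execute(
            "INSERT OR IGNORE INTO agencies (name) VALUES (?)",
            (agency["name"],),
        )
        agency_id = conn.execute(
            "SELECT id FROM agencies WHERE name = ?", (agency["name"],)
        ).fetchone()[0]
        for name in agency["data_collections"]:
            conn.execute(
                "INSERT OR IGNORE INTO data_collections (agency_id, name) "
                "VALUES (?, ?)",
                (agency_id, name),
            )
    conn.commit()


conn = sqlite3.connect(":memory:")
seed_database(conn, SEED)
seed_database(conn, SEED)  # second run adds no duplicate rows

seeded = conn.execute(
    "SELECT a.name, d.name FROM data_collections d "
    "JOIN agencies a ON a.id = d.agency_id ORDER BY d.name"
).fetchall()
print(seeded)
```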
Application: gets command options (a combination of what came in on the command line and properties: use the properties value as the default, otherwise the command-line value),
using the COMMONS CLI library for the options stuff
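The precedence rule can be sketched like this. The real application uses Commons CLI; this stand-in uses Python's argparse instead, and the option names --directory and --database are hypothetical.

```python
import argparse


def load_options(argv, properties):
    """Merge options: properties are the defaults, command line overrides."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--directory")  # hypothetical option
    parser.add_argument("--database")   # hypothetical option
    args = parser.parse_args(argv)

    options = dict(properties)  # start from the properties defaults
    for key, value in vars(args).items():
        if value is not None:   # only options actually given on the command line
            options[key] = value
    return options


properties = {"directory": "/tmp", "database": "cmr"}
options = load_options(["--directory", "/data/hhs"], properties)
print(options)
```

Leaving the argparse defaults as None is what makes it possible to tell "not given on the command line" apart from "given", so the properties value survives only when the flag is absent.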
A record has a sample_id
A sample has a datacollection_id and name
on Model there are three ways to get an instance: initialize, find by name, find by id
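A small sketch of the Model / ModelInstance split with the three ways to get an instance, run against in-memory SQLite. The method names, the samples table, and the "new"/"update" return values from save() are assumptions for illustration, not the project's actual API. Table and column names are interpolated into the SQL, so they must come from trusted code, never from user input.

```python
import sqlite3


class ModelInstance:
    """One record of a table; save() inserts it or updates it."""

    def __init__(self, model, values, row_id=None):
        self.model = model
        self.values = dict(values)
        self.id = row_id

    def save(self):
        """Write the record and report whether it was new or an update."""
        cols = list(self.values)
        params = [self.values[c] for c in cols]
        if self.id is None:
            col_list = ", ".join(cols)
            marks = ", ".join("?" for _ in cols)
            cur = self.model.conn.execute(
                f"INSERT INTO {self.model.table} ({col_list}) VALUES ({marks})",
                params,
            )
            self.id = cur.lastrowid
            return "new"
        assignments = ", ".join(f"{c} = ?" for c in cols)
        self.model.conn.execute(
            f"UPDATE {self.model.table} SET {assignments} WHERE id = ?",
            params + [self.id],
        )
        return "update"


class Model:
    """A whole table; hands out ModelInstance objects for single records."""

    def __init__(self, conn, table, columns):
        self.conn = conn
        self.table = table
        self.columns = list(columns)

    def initialize(self, **values):
        # Way 1: a fresh, unsaved instance.
        return ModelInstance(self, values)

    def find_by_id(self, row_id):
        # Way 2: look up by primary key.
        return self._find("id", row_id)

    def find_by_name(self, name):
        # Way 3: look up by name.
        return self._find("name", name)

    def _find(self, column, value):
        col_list = ", ".join(["id"] + self.columns)
        row = self.conn.execute(
            f"SELECT {col_list} FROM {self.table} WHERE {column} = ?", (value,)
        ).fetchone()
        if row is None:
            return None
        return ModelInstance(self, dict(zip(self.columns, row[1:])), row_id=row[0])


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE samples (id INTEGER PRIMARY KEY, datacollection_id INTEGER, name TEXT)"
)
samples = Model(conn, "samples", ["datacollection_id", "name"])

sample = samples.initialize(datacollection_id=1, name="2008")
first = sample.save()                      # inserts the row
found = samples.find_by_name("2008")
found.values["name"] = "2008 revised"
second = found.save()                      # updates the same row
reloaded = samples.find_by_id(sample.id)
print(first, second, reloaded.values)
```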
SasFileProperties
"Was it a new one, was it an UPDATE"
get rest of db working and thing parsing (SAS guy)
HELP
-making seed complete with datasets
-see if I can reproduce it, so we can work on different datasets
