FCAR (facilitated cleanup and remapping) recipes
This page documents common patterns of client FCAR using kiba-extend’s iterative cleanup functionality.
Each recipe has three components:
- Prep/setup job - the structure of the data required as input for the FCAR process, and any transforms that exist to streamline achieving this structure
- FCAR configuration - a commented version of the configuration module to include in your project to activate this FCAR
- Merge job - patterns for merging the FCAR back into the rest of your project
Review and correction of programmatic value splitting
The prep and merge sections below use :init__prep as the job output from which we peel off this FCAR process, and into which we merge its results.
Prep/setup job
To ease the merge process, it’s recommended you break this into two jobs:
- normalize: normalizes the values to be included in the worksheet
- prep: deduplicates on normalized values and finalizes prep for the split FCAR
Normalization job example
In this example, we are pulling just the location field values out of a single migrating table and applying the normalization described in the worksheet instructions to them.
# frozen_string_literal: true

module Project
  module Jobs
    module ValueSplit
      module FcarNorm
        module_function

        def job
          Kiba::Extend::Jobs::Job.new(
            files: {
              source: :init__prep,
              destination: :value_split__fcar_norm
            },
            transformer: xforms
          )
        end

        def xforms
          Kiba.job_segment do
            transform Delete::FieldsExcept,
              fields: %i[location]
            transform FilterRows::FieldPopulated,
              action: :keep,
              field: :location
            transform Deduplicate::Table,
              field: :location

            # Adjust the normalization in a way that makes sense for the data.
            # We want to be as aggressive as we can in normalizing, without
            # starting to over-lump things that should be kept discrete
            transform Normalize::FieldValues,
              fields: :location,
              targets: :norm,
              xforms: [:lower],
              replacements: {
                / +/ => " ",
                /^ / => "",
                / $/ => ""
              }
            transform Replace::NormWithMostFrequentlyUsedForm,
              normfield: :norm,
              nonnormfield: :location,
              target: :normloc
            transform Delete::Fields,
              fields: :norm
          end
        end
      end
    end
  end
end
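To check which raw values the normalization settings above would lump together before running the job, the same lowercasing and replacements can be approximated in plain Ruby. This is a minimal sketch that does not call kiba-extend: the `normalize` and `lumped` helper names and the sample values are assumptions for illustration, and the real transform may differ in its details.

```ruby
# Plain-Ruby approximation of the normalization settings; not kiba-extend code.
REPLACEMENTS = {
  / +/ => " ", # collapse runs of spaces
  /^ / => "",  # drop a leading space
  / $/ => ""   # drop a trailing space
}.freeze

# Hypothetical helper: lowercase, then apply each replacement in order.
def normalize(value)
  REPLACEMENTS.reduce(value.downcase) do |val, (pattern, replacement)|
    val.gsub(pattern, replacement)
  end
end

# Hypothetical helper: group raw values by their normalized form.
def lumped(values)
  values.group_by { |val| normalize(val) }
end

# Made-up sample data
raw = ["Room 1; Shelf A", " room 1;  shelf a ", "Room 1 & Attic"]
groups = lumped(raw)
groups.each { |norm, forms| puts "#{norm.inspect} <= #{forms.inspect}" }
```

If values that should stay distinct land in the same group, the normalization is too aggressive. If obvious variants land in separate groups, it could be more aggressive.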
Prep example
# frozen_string_literal: true

module Project
  module Jobs
    module ValueSplit
      module FcarPrep
        module_function

        def job
          Kiba::Extend::Jobs::Job.new(
            files: {
              source: :value_split__fcar_norm,
              destination: :value_split__fcar_prep
            },
            transformer: xforms
          )
        end

        def xforms
          Kiba.job_segment do
            transform Deduplicate::Table,
              field: :normloc,
              include_occs: true,
              compile_uniq_fieldvals: true
            transform Rename::Fields, fieldmap: {
              location: :unnormalizedlocations
            }

            # Set up the splitters you need here
            transform Fcar::SplitPrep,
              orig: :normloc,
              splitters: {
                / *; */ => :semicolon,
                / and /i => :and,
                / & / => :ampersand
              }
            transform Sort::ByFieldValue,
              field: :sort,
              mode: :string
          end
        end
      end
    end
  end
end
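When choosing splitters, it can help to preview how each pattern would divide a few real values. The following is a minimal plain-Ruby sketch under that assumption. It does not reproduce the worksheet columns that Fcar::SplitPrep generates, and `preview_splits` and the sample values are hypothetical.

```ruby
# Same patterns and labels as in the prep job above
SPLITTERS = {
  / *; */ => :semicolon,
  / and /i => :and,
  / & / => :ampersand
}.freeze

# Hypothetical helper: for each splitter that matches, show the
# pieces the value would be divided into.
def preview_splits(value)
  SPLITTERS.each_with_object({}) do |(pattern, label), result|
    parts = value.split(pattern)
    result[label] = parts if parts.length > 1
  end
end

p preview_splits("Room 1; Shelf A and B")
p preview_splits("Smith and Sons Storage")
p preview_splits("Attic")
```

The second value shows why client review is needed: splitting on " and " would break a single name into "Smith" and "Sons Storage".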
FCAR configuration
# frozen_string_literal: true

module Project
  module ValueSplit
    module_function

    # Most of these settings/variables are documented in:
    # https://lyrasis.github.io/kiba-extend/Kiba/Extend/Mixins/IterativeCleanup.html

    # Job key of the prep job to be used as input for the FCAR. Change this
    # to whatever you have named the job in your project. Here it matches
    # the destination of the prep example above.
    def base_job = :value_split__fcar_prep

    # Don't change this without good reason. The values used to uniquely
    # identify a corrected worksheet row
    def fingerprint_fields = %i[split_val orig]

    extend Kiba::Extend::Mixins::IterativeCleanup

    def orig_values_identifier = :prepped_row_fingerprint

    # Edit this to work with the tags in your project
    def job_tags = %i[value_type split cleanup]

    # Edit this if your worksheet data includes other headers you wish to
    # include in the ordering
    def worksheet_field_order = %i[split_val orig split to_review
      sort]

    # Delete this if you aren't including an occurrences field or other
    # field that should be collated. These include any fields that indicate
    # in what field(s) a term was used; the unnormalized forms of name that
    # may have been normalized to create the "orig" value for the FCAR
    # process, etc.
    def collate_fields = %i[occurrences]

    # Delete this if you aren't including a numeric occurrences collated field
    # that needs to be summed.
    def cleaned_uniq_post_xforms
      bind = binding

      Kiba.job_segment do
        mod = bind.receiver

        transform Kiba::Extend::Transforms::Fcar::Helpers::SumCollatedOccurrences,
          field: :occurrences,
          delim: mod.collation_delim
      end
    end

    def final_post_xforms
      Kiba.job_segment do
        # Get rid of worksheet fields required for merging back into project
        # that could have been modified by client, and the helper
        # `autosplit` column
        transform Delete::Fields,
          fields: %i[orig sort autosplit]

        # Reconstitute the original values of fields critical for merging from
        # the prepped row fingerprint, and delete the fingerprint field, as
        # it has served its purpose
        transform Fingerprint::Decode,
          fingerprint: :prepped_row_fingerprint,
          source_fields: %i[orig split_val sort],
          delete_fp: true
        transform Rename::Fields, fieldmap: {
          fp_orig: :orig,
          fp_sort: :sort
        }

        # We don't need the uncorrected `split_val` values from the
        # fingerprint
        transform Delete::Fields,
          fields: :fp_split_val

        # Drop rows where client has deleted values from `split_val`
        transform FilterRows::FieldPopulated,
          action: :keep,
          field: :split_val

        # This and the following Deduplicate::Table step exist to
        # prevent duplicate values being merged into the project
        # if/when client has entered corrected split on all
        # rows for the original data
        transform CombineValues::FromFieldsWithDelimiter,
          sources: %i[orig split_val],
          target: :combined,
          delete_sources: false,
          delim: " "
        transform Deduplicate::Table,
          field: :combined,
          delete_field: true

        # Set up so merging will keep values in their original order
        transform Sort::ByFieldValue,
          field: :sort,
          mode: :string
      end
    end

    def final_lookup_on_field = :orig
  end
end
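The last three steps of `final_post_xforms` (drop emptied rows, deduplicate on the combined `orig` and `split_val`, then sort) can be pictured with plain Ruby arrays and hashes. This is only a sketch of the intent described in the comments above, not the kiba-extend implementation. The `finalize` helper, the sample rows, and the `sort` values are hypothetical, and it assumes deduplication keeps the first row seen.

```ruby
# Made-up worksheet rows after the fingerprint has been decoded
rows = [
  {orig: "room 1; shelf a", split_val: "shelf a", sort: "002"},
  {orig: "room 1; shelf a", split_val: "room 1", sort: "001"},
  # Client entered the same corrected value on a second row
  {orig: "room 1; shelf a", split_val: "room 1", sort: "003"},
  # Client cleared the value, so this row should be dropped
  {orig: "attic", split_val: "", sort: "004"}
]

# Hypothetical helper mirroring the filter, deduplicate, and sort steps
def finalize(rows)
  rows
    .reject { |row| row[:split_val].to_s.strip.empty? }
    .uniq { |row| "#{row[:orig]} #{row[:split_val]}" }
    .sort_by { |row| row[:sort] }
end

final = finalize(rows)
final.each { |row| puts row.inspect }
```

Only one "room 1" row survives, and the remaining rows come back in `sort` order, which is what lets the merge job keep values in their original sequence.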
Merge job
This job replaces the location values in the original :init__prep output with the correctly and unambiguously delimited values from the FCAR worksheet.
# frozen_string_literal: true

module Project
  module Jobs
    module ValueSplit
      module FcarMerge
        module_function

        def job
          Kiba::Extend::Jobs::Job.new(
            files: {
              source: :init__prep,
              destination: :value_split__fcar_merge,
              lookup: [
                {jobkey: :value_split__fcar_norm, lookup_on: :location},
                {jobkey: :value_split__final, lookup_on: :orig}
              ]
            },
            transformer: xforms
          )
        end

        def xforms
          Kiba.job_segment do
            # First, merge the normalized form of each location into the table.
            # The lookup name matches the jobkey given in the job's lookup list.
            transform Merge::MultiRowLookup,
              lookup: value_split__fcar_norm,
              keycolumn: :location,
              fieldmap: {normloc: :normloc}

            # Delete the old location field once normalized forms are merged in, since
            # we are replacing with correct forms in a minute
            transform Delete::Fields, fields: :location

            # Merge in corrected values, matching on the normalized locations we just
            # merged in, and the normalized locations in the "orig" column of the FCAR.
            # Use your project's delimiter setting here if it differs from the
            # kiba-extend default.
            transform Merge::MultiRowLookup,
              lookup: value_split__final,
              keycolumn: :normloc,
              fieldmap: {location: :split_val},
              delim: Kiba::Extend.delim

            # We don't need to keep the normalized location now that we've matched on it
            transform Delete::Fields,
              fields: :normloc
          end
        end
      end
    end
  end
end
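The two-step lookup in the merge job (raw location to normalized location, then normalized location to corrected split values) can be sketched in plain Ruby as follows. This is an illustration under stated assumptions rather than kiba-extend code: the `merge_location` helper, the hash-based lookups, the sample data, and the "|" delimiter are all hypothetical.

```ruby
DELIM = "|" # assumed multi-value delimiter

# Stand-in for the normalization job output: raw location => normloc
norm_lookup = {
  "Room 1; Shelf A" => "Room 1; Shelf A",
  "room 1;  shelf a" => "Room 1; Shelf A"
}

# Stand-in for the final FCAR output: orig => corrected split values, in order
final_lookup = {
  "Room 1; Shelf A" => ["Room 1", "Shelf A"]
}

# Hypothetical helper: replace a row's location with its corrected,
# delimited split values. Location becomes nil when nothing matches.
def merge_location(row, norm_lookup, final_lookup, delim)
  normloc = norm_lookup[row[:location]]
  split_vals = final_lookup.fetch(normloc, [])
  row.merge(location: split_vals.empty? ? nil : split_vals.join(delim))
end

row = {id: "1", location: "room 1;  shelf a"}
merged = merge_location(row, norm_lookup, final_lookup, DELIM)
puts merged.inspect
```

Because every raw variant maps to one normalized form, a correction made once in the worksheet reaches every row that used any of those variants.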