Class: Kiba::Extend::Transforms::Fcar::SplitPrep

Inherits:
Object
Defined in:
lib/kiba/extend/transforms/fcar/split_prep.rb

Overview

Prepares the data in the field given as orig for input to the split FCAR process. Applies the indicated splits to that field's value and writes the results to the :split_val field. The unsplit value is kept in the :orig field, and the :sort, :autosplit, and :prepped_row_fingerprint fields are added.

Examples:

With Array of splitters

# Used in pipeline as:
# transform Fcar::SplitPrep,
#   splitters: [",", ";"],
#   orig: :val

xform = Fcar::SplitPrep.new(
  splitters: [",", ";"],
  orig: :val
)
input = [
  {val: "a"},
  {val: "b, c"},
  {val: "d;e"},
  {val: "f,g;h"}
]
result = Kiba::StreamingRunner.transform_stream(input, xform)
  .map{ |row| row }
expected = [
  {split_val: "a", orig: "a", autosplit: "n", sort: "a 000",
    prepped_row_fingerprint: "YeKQn2HikJ9hIDAwMA=="},
  {split_val: "b", orig: "b, c", autosplit: "y", sort: "b, c 000",
    prepped_row_fingerprint: "YiwgY+KQn2LikJ9iLCBjIDAwMA=="},
  {split_val: "c", orig: "b, c", autosplit: "y", sort: "b, c 001",
    prepped_row_fingerprint: "YiwgY+KQn2PikJ9iLCBjIDAwMQ=="},
  {split_val: "d", orig: "d;e", autosplit: "y", sort: "d;e 000",
    prepped_row_fingerprint: "ZDtl4pCfZOKQn2Q7ZSAwMDA="},
  {split_val: "e", orig: "d;e", autosplit: "y", sort: "d;e 001",
    prepped_row_fingerprint: "ZDtl4pCfZeKQn2Q7ZSAwMDE="},
  {split_val: "f", orig: "f,g;h", autosplit: "y", sort: "f,g;h 000",
    prepped_row_fingerprint: "ZixnO2jikJ9m4pCfZixnO2ggMDAw"},
  {split_val: "g", orig: "f,g;h", autosplit: "y", sort: "f,g;h 001",
    prepped_row_fingerprint: "ZixnO2jikJ9n4pCfZixnO2ggMDAx"},
  {split_val: "h", orig: "f,g;h", autosplit: "y", sort: "f,g;h 002",
    prepped_row_fingerprint: "ZixnO2jikJ9o4pCfZixnO2ggMDAy"}
]
expect(result).to eq(expected)

With Hash of splitters

# Used in pipeline as:
# transform Fcar::SplitPrep,
#   splitters: {"," => "comma", ";" => "semicolon"},
#   orig: :val

xform = Fcar::SplitPrep.new(
  splitters: {"," => "comma", ";" => "semicolon"},
  orig: :val
)
input = [
  {val: "a"},
  {val: "b, c"},
  {val: "d;e"},
  {val: "f,g;h"}
]
result = Kiba::StreamingRunner.transform_stream(input, xform)
  .map{ |row| row }
expected = [
  {split_val: "a", orig: "a", autosplit: nil, sort: "a 000",
    prepped_row_fingerprint: "YeKQn2HikJ9hIDAwMA=="},
  {split_val: "b", orig: "b, c", autosplit: "comma", sort: "b, c 000",
    prepped_row_fingerprint: "YiwgY+KQn2LikJ9iLCBjIDAwMA=="},
  {split_val: "c", orig: "b, c", autosplit: "comma", sort: "b, c 001",
    prepped_row_fingerprint: "YiwgY+KQn2PikJ9iLCBjIDAwMQ=="},
  {split_val: "d", orig: "d;e", autosplit: "semicolon", sort: "d;e 000",
    prepped_row_fingerprint: "ZDtl4pCfZOKQn2Q7ZSAwMDA="},
  {split_val: "e", orig: "d;e", autosplit: "semicolon", sort: "d;e 001",
    prepped_row_fingerprint: "ZDtl4pCfZeKQn2Q7ZSAwMDE="},
  {split_val: "f", orig: "f,g;h", autosplit: "comma|semicolon", sort: "f,g;h 000",
    prepped_row_fingerprint: "ZixnO2jikJ9m4pCfZixnO2ggMDAw"},
  {split_val: "g", orig: "f,g;h", autosplit: "comma|semicolon", sort: "f,g;h 001",
    prepped_row_fingerprint: "ZixnO2jikJ9n4pCfZixnO2ggMDAx"},
  {split_val: "h", orig: "f,g;h", autosplit: "comma|semicolon", sort: "f,g;h 002",
    prepped_row_fingerprint: "ZixnO2jikJ9o4pCfZixnO2ggMDAy"}
]
expect(result).to eq(expected)

Constant Summary

UNIT_SEP =

Used internally to mark the places where the value needs to be split, before splitting is actually applied. The Symbol for Unit Separator character (U+241F, UTF-8 bytes E2 90 9F) is used to avoid clashes with other common delimiter strings that may be present in values.

Returns:

  • (String)

    U+241F / E2 90 9F / Symbol for Unit Separator

"␟"
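
The mark-then-split idea described for UNIT_SEP can be sketched roughly as follows. This is a minimal illustration, not the library's implementation: `SKETCH_UNIT_SEP` and `mark_and_split` are hypothetical names, and stripping whitespace from each piece is an assumption based on the "b, c" example above.

```ruby
# Hypothetical constant for this sketch: U+241F, Symbol for Unit Separator.
SKETCH_UNIT_SEP = "\u241F"

# Hypothetical helper, not part of kiba-extend.
# Step 1: replace every splitter match with the unit separator.
# Step 2: split once on the unit separator and tidy each piece.
def mark_and_split(value, splitters)
  marked = splitters.reduce(value) do |str, splitter|
    str.gsub(splitter, SKETCH_UNIT_SEP)
  end
  marked.split(SKETCH_UNIT_SEP).map(&:strip).reject(&:empty?)
end

mark_and_split("f,g;h", [",", ";"]) # => ["f", "g", "h"]
mark_and_split("b, c", [",", ";"])  # => ["b", "c"]
```

Marking first and splitting once means several different splitters can be applied without the output of one split having to be re-split by the next.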

Instance Method Summary

Constructor Details

#initialize(splitters:, orig:, target: :split_val, sort: :sort, indicator: :autosplit, fingerprint: :prepped_row_fingerprint) ⇒ SplitPrep

Returns a new instance of SplitPrep.

Parameters:

  • splitters (Array<String, Regexp>, Hash{String, Regexp => String, Symbol})

    Values on which to split the :orig column value. If given an Array, the :autosplit column will be populated with "y" (at least one splitter matched) or "n" (none matched). If given a Hash, the Hash keys are used as the splitters, and the :autosplit column will be populated with the Hash values of the splitters that were applied, joined with "|" (or nil if none were applied).

  • orig (Symbol)

    field containing values that will be programmatically split and reviewed by client in FCAR process

  • target (Symbol) (defaults to: :split_val)

    field in which the results of programmatic splitting will be written, and client can make corrections

  • sort (Symbol) (defaults to: :sort)

    field in which the sort values will be written

  • indicator (Symbol) (defaults to: :autosplit)

    field in which an indication of whether splitting was applied to target values will be written

  • fingerprint (Symbol) (defaults to: :prepped_row_fingerprint)

    field in which prepped row identifying fingerprint will be written
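
The two modes of the splitters parameter can be illustrated with a small stand-alone sketch. `autosplit_value` is a hypothetical helper written for this illustration only. The "|" join and the nil for unsplit values in Hash mode are taken from the example output above, not from the library source.

```ruby
# Hypothetical helper, not part of kiba-extend: derives the indicator value
# for one original value, following the documented Array and Hash behavior.
def autosplit_value(value, splitters)
  keys = splitters.is_a?(Hash) ? splitters.keys : splitters
  matched = keys.select { |splitter| value.match?(splitter) }

  if splitters.is_a?(Hash)
    # Hash mode: report the labels of the splitters that matched.
    matched.empty? ? nil : matched.map { |key| splitters[key] }.join("|")
  else
    # Array mode: report only whether any splitter matched.
    matched.empty? ? "n" : "y"
  end
end

labels = {"," => "comma", ";" => "semicolon"}
autosplit_value("f,g;h", labels)   # => "comma|semicolon"
autosplit_value("a", labels)       # => nil
autosplit_value("b, c", [",", ";"]) # => "y"
```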



# File 'lib/kiba/extend/transforms/fcar/split_prep.rb', line 115

def initialize(splitters:, orig:, target: :split_val,
  sort: :sort, indicator: :autosplit,
  fingerprint: :prepped_row_fingerprint)
  @splitters = splitters.is_a?(Hash) ? splitters.keys : splitters
  @orig = orig
  @target = target
  @sort = sort
  @indicator = indicator
  @fingerprint = Fingerprint::Add.new(
    fields: [:orig, target, sort],
    target: fingerprint
  )
  @splitinds = splitters.is_a?(Hash) ? splitters : nil
  @rows = []
end
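
The fingerprint values in the examples above decode as the :orig, :split_val, and :sort values joined with U+241F and then Base64-encoded. The sketch below reproduces them under that observation. It is not the Fingerprint::Add implementation, and `sketch_fingerprint` and `FP_SEP` are hypothetical names.

```ruby
# Hypothetical separator constant for this sketch (U+241F).
FP_SEP = "\u241F"

# Hypothetical helper: joins the given field values with the separator and
# Base64-encodes the result ("m0" is strict Base64 with no line breaks).
def sketch_fingerprint(row, fields)
  [fields.map { |field| row[field] }.join(FP_SEP)].pack("m0")
end

row = {orig: "a", split_val: "a", sort: "a 000"}
sketch_fingerprint(row, [:orig, :split_val, :sort])
# => "YeKQn2HikJ9hIDAwMA=="
```

Because the sort value is one of the fingerprinted fields, each row produced from the same original value gets a distinct fingerprint.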

Instance Method Details

#close ⇒ Object



# File 'lib/kiba/extend/transforms/fcar/split_prep.rb', line 144

def close = rows.flatten
  .each { |row| yield fingerprint.process(row) }

#process(row) ⇒ Object

Parameters:

  • row (Hash{ Symbol => String, nil })


# File 'lib/kiba/extend/transforms/fcar/split_prep.rb', line 132

def process(row)
  val = row[orig]
  fail(BlankFcarOrigFieldError) if val.blank?

  row[:orig] = val
  row.delete(orig)
  matchers = splitters.select { |splitter| val.match?(splitter) }
  rows << prep_rows(row, val, matchers)

  nil
end
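
The #process and #close methods above follow a buffering pattern: #process stores rows and returns nil, so nothing is passed downstream at that point, and #close yields the stored rows once all input has been seen. A minimal stand-alone sketch of that pattern, using a hypothetical `BufferingSketch` class and no Kiba dependency:

```ruby
# Hypothetical stand-in, not the real SplitPrep: shows only the
# buffer-in-process, emit-in-close shape of the transform.
class BufferingSketch
  def initialize
    @rows = []
  end

  # Store the row; returning nil means no row is passed on yet.
  def process(row)
    @rows << row
    nil
  end

  # Emit every buffered row to the caller's block.
  def close
    @rows.each { |row| yield row }
  end
end

xform = BufferingSketch.new
passed_on = [{val: "a"}, {val: "b"}].map { |row| xform.process(row) }
emitted = []
xform.close { |row| emitted << row }
# passed_on => [nil, nil]
# emitted   => [{val: "a"}, {val: "b"}]
```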