Class: Kiba::Extend::Transforms::Normalize::FieldValues

Inherits:
Object
Defined in:
lib/kiba/extend/transforms/normalize/field_values.rb

Overview

Note:

The basic functionality of Kiba::Extend::Utils::StringNormalizer is described and tested in that class.

Applies Kiba::Extend::Utils::StringNormalizer to the values in the indicated fields.

Examples:

Single values

# Used in pipeline as:
# transform Normalize::FieldValues,
#   fields: %i[animal name],
#   replacements: {
#     "e" => "E",
#     /a$/ => "aaaa"
#   },
#   xforms: [:blank, ->(str) { str.reverse }]

xform = Normalize::FieldValues.new(
  fields: %i[animal name],
  replacements: {
    "e" => "E",
    /a$/ => "aaaa"
  },
  xforms: [:blank, ->(str) { str.reverse }]
)
input = [
  {animal: "guinea", name: "Napo"},
  {animal: "", name: nil}
]
result = input.map{ |row| xform.process(row) }
expected = [
  {animal: "aaaaEniug", name: "opaN"},
  {animal: "", name: nil}
]
expect(result).to eq(expected)

Multi-values

# Used in pipeline as:
# transform Normalize::FieldValues,
#   fields: %i[animal name],
#   delim: "|",
#   replacements: {
#     "e" => "E",
#     /a$/ => "aaaa"
#   },
#   xforms: [:blank, ->(str) { str.reverse }]

xform = Normalize::FieldValues.new(
  fields: %i[animal name],
  delim: "|",
  replacements: {
    "e" => "E",
    /a$/ => "aaaa"
  },
  xforms: [:blank, ->(str) { str.reverse }]
)
input = [
  {animal: "guinea", name: "Napo|Earhart"},
  {animal: "", name: nil}
]
result = input.map{ |row| xform.process(row) }
expected = [
  {animal: "aaaaEniug", name: "opaN|trahraE"},
  {animal: "", name: nil}
]
expect(result).to eq(expected)

Targets

# Used in pipeline as:
# transform Normalize::FieldValues,
#   fields: %i[animal name],
#   targets: %i[a b],
#   replacements: {
#     "e" => "E",
#     /a$/ => "aaaa"
#   },
#   xforms: [:blank, ->(str) { str.reverse }]

xform = Normalize::FieldValues.new(
  fields: %i[animal name],
  targets: %i[a b],
  replacements: {
    "e" => "E",
    /a$/ => "aaaa"
  },
  xforms: [:blank, ->(str) { str.reverse }]
)
input = [
  {animal: "guinea", name: "Napo"},
  {animal: "", name: nil}
]
result = input.map{ |row| xform.process(row) }
expected = [
  {animal: "guinea", name: "Napo", a: "aaaaEniug", b: "opaN"},
  {animal: "", name: nil, a: "", b: nil}
]
expect(result).to eq(expected)

Instance Method Summary

Constructor Details

#initialize(fields:, targets: nil, delim: nil, mode: nil, replacements: {}, xforms: []) ⇒ FieldValues

Defined xforms

  • :nfkc - ON BY DEFAULT: applies Unicode compatibility decomposition, followed by canonical composition; see https://unicode.org/reports/tr15/ for more details than you want.
  • :replace - ON BY DEFAULT: performs the find-and-replace operations specified in the replacements parameter
  • :blank - deletes all spaces and tabs, using the Ruby /\p{Blank}/ regexp
  • :lower - downcases the string
  • :nonword - removes ALL characters that are not letters, numbers, or underscores
  • :punct - removes all characters matching the Ruby /\p{Punct}/ regexp
  • :to_ascii - replaces non-ASCII characters with an ASCII approximation or, if none exists, with a replacement character, which defaults to “?”.
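
As a rough illustration of what several of these xforms do to a string, the following sketch uses only core Ruby String methods. It is an approximation for orientation, not the StringNormalizer implementation: approx_xforms and approx_normalize are hypothetical names, :replace and :to_ascii are omitted, and \W here treats only ASCII letters, digits, and underscores as word characters.

```ruby
# Hypothetical approximations of some documented xforms, core Ruby only.
# These are NOT taken from Kiba::Extend::Utils::StringNormalizer.
def approx_xforms
  {
    nfkc: ->(str) { str.unicode_normalize(:nfkc) },
    blank: ->(str) { str.gsub(/\p{Blank}/, "") },
    lower: ->(str) { str.downcase },
    nonword: ->(str) { str.gsub(/\W/, "") },
    punct: ->(str) { str.gsub(/\p{Punct}/, "") }
  }
end

# Applies the named approximations (or arbitrary Procs) left to right.
def approx_normalize(str, *xforms)
  xforms.reduce(str) do |result, xform|
    callable = xform.is_a?(Proc) ? xform : approx_xforms.fetch(xform)
    callable.call(result)
  end
end

approx_normalize("Guinea \tPig", :blank, :lower) # => "guineapig"
approx_normalize("Hello, world!", :punct)        # => "Hello world"
approx_normalize("\uFB01sh", :nfkc)              # => "fish" (ligature expanded)
```

Symbols and Procs can be mixed in one call, mirroring the xforms parameter in the examples above.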

Defined modes

  • :cspaceid - replaces characters that do not convert to ASCII properly, then applies the :to_ascii, :nonword, and :lower xforms

Parameters:

  • mode (:cspaceid) (defaults to: nil)

    Use an established set of xforms and replacement settings

  • replacements (Hash{String, Regexp => String}) (defaults to: {})

    simple gsub find/replaces to be applied, in order, to the string being normalized; each key is the find/match value (a String or a Regexp, as in the examples above); each value is the replacement string

  • xforms (Array<Symbol, Proc>) (defaults to: [])

    Symbol must match one of the defined transforms; A Proc that takes one String arg and returns a String may also be passed to apply uncommon normalization logic

  • fields (Array<Symbol>, Symbol)

    field name or list of field names whose values will be normalized

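  • targets (nil, Array<Symbol>, Symbol) (defaults to: nil)

    field name or list of field names in which to write normalized values; must have the same number of elements as fields, otherwise Kiba::Extend::UnbalancedFieldsTargetsError is raised. When nil, the normalized values overwrite the source fields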

  • delim (nil, String) (defaults to: nil)

    when non-nil, each value will be split into multi-values using this string prior to normalization
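
Because the replacements are described as gsub find/replaces applied in order, the order of the Hash entries can change the result. A minimal sketch of that behavior, assuming plain String#gsub semantics; apply_replacements is a hypothetical helper, not part of Kiba::Extend:

```ruby
# Hypothetical helper: applies each find/replace pair with gsub, in Hash order.
# Illustrates the documented behavior; not the StringNormalizer code.
def apply_replacements(str, replacements)
  replacements.reduce(str) { |result, (find, repl)| result.gsub(find, repl) }
end

# Same pairs as in the examples above (before the :blank and reverse xforms).
apply_replacements("guinea", {"e" => "E", /a$/ => "aaaa"}) # => "guinEaaaa"

# Reordering the pairs changes the outcome.
apply_replacements("guinea", {/a$/ => "e", "e" => "E"})    # => "guinEE"
apply_replacements("guinea", {"e" => "E", /a$/ => "e"})    # => "guinEe"
```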



# File 'lib/kiba/extend/transforms/normalize/field_values.rb', line 111

def initialize(fields:, targets: nil, delim: nil, mode: nil,
  replacements: {}, xforms: [])
  @fields = [fields].flatten
  @targets = if targets
    targetarr = [targets].flatten
    unless @fields.length == targetarr.length
      fail(Kiba::Extend::UnbalancedFieldsTargetsError)
    end
    targetarr
  end

  @delim = delim
  @normalizer = Kiba::Extend::Utils::StringNormalizer.new(
    mode: mode,
    replacements: replacements,
    xforms: xforms
  )
end
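
The fields/targets handling in the constructor can be tried in isolation. The sketch below reproduces only that logic as a standalone method; balance_fields_and_targets is a hypothetical name, and ArgumentError stands in for Kiba::Extend::UnbalancedFieldsTargetsError so the snippet runs without the gem.

```ruby
# Standalone sketch of the constructor's fields/targets handling.
# ArgumentError is a stand-in for Kiba::Extend::UnbalancedFieldsTargetsError.
def balance_fields_and_targets(fields, targets = nil)
  # Wrapping and flattening accepts either a single Symbol or an Array.
  field_list = [fields].flatten
  return [field_list, nil] unless targets

  target_list = [targets].flatten
  unless field_list.length == target_list.length
    raise ArgumentError, "fields and targets differ in length"
  end
  [field_list, target_list]
end

balance_fields_and_targets(:name)                    # => [[:name], nil]
balance_fields_and_targets(%i[animal name], %i[a b]) # => [[:animal, :name], [:a, :b]]
# balance_fields_and_targets(%i[animal name], :a)    # raises ArgumentError
```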

Instance Method Details

#process(row) ⇒ Object

Parameters:

  • row (Hash{ Symbol => String, nil })


# File 'lib/kiba/extend/transforms/normalize/field_values.rb', line 131

def process(row)
  fields.each_with_index do |field, idx|
    normalize_field_value(row, field, idx)
  end

  row
end
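
The normalize_field_value helper called above is not shown on this page. The following is a hypothetical sketch of per-row behavior that is consistent with the documented examples (blank and nil values left as-is, optional splitting on delim, optional targets); normalize_value and process_row are illustrative names, and the block passed in stands in for the StringNormalizer.

```ruby
# Hypothetical stand-in for the unseen normalize_field_value logic.
# Nil and empty values are returned unchanged, matching the examples above.
def normalize_value(value, delim: nil, &normalizer)
  return value if value.nil? || value.empty?
  return normalizer.call(value) unless delim

  # Normalize each part of a multi-value string, then rejoin with delim.
  value.split(delim, -1).map { |part| normalizer.call(part) }.join(delim)
end

# Writes each normalized value to its target, or back to the source field.
def process_row(row, fields:, targets: nil, delim: nil, &normalizer)
  fields.each_with_index do |field, idx|
    target = targets ? targets[idx] : field
    row[target] = normalize_value(row[field], delim: delim, &normalizer)
  end
  row
end

# Stand-in normalizer equivalent to the replacements and xforms in the examples.
normalizer = ->(str) { str.gsub("e", "E").gsub(/a$/, "aaaa").reverse }

process_row({animal: "guinea", name: "Napo|Earhart"},
  fields: %i[animal name], delim: "|", &normalizer)
# => {animal: "aaaaEniug", name: "opaN|trahraE"}
```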