Class: Kiba::Extend::Transforms::Replace::NormWithMostFrequentlyUsedForm

Inherits:
Object
Defined in:
lib/kiba/extend/transforms/replace/norm_with_most_frequently_used_form.rb

Overview

For each normalized value, provides the literal (non-normalized) form that is used most frequently among the rows sharing that normalized value.

The examples here will discuss names, but this transform can be applied to any kind of values.

REQUIRES:

  • one field containing the original name values (including all minor variants that are removed by normalization)
  • one field having the normalized name values

The transform does not care which normalization algorithm was applied to derive the normalized values.
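Because the normalized value serves as a grouping key here, any normalizer that maps variant spellings to one shared key should work. The sketch below shows one such normalizer. `normalize` is a hypothetical helper, not part of kiba-extend, and "lowercase, then strip everything except ASCII letters and digits" is only an assumed algorithm that happens to reproduce the `norm` values used in the examples.

```ruby
# Hypothetical normalizer, not part of kiba-extend: lowercase the value, then
# drop every character that is not an ASCII letter or digit. Non-ASCII letters
# (e.g. "é") are dropped too, so real data may need a Unicode-aware variant.
def normalize(value)
  value.downcase.gsub(/[^a-z0-9]/, "")
end

names = ["Smith, R. J.", "Smith, R.J.", "Smith, RJ", "Fields, J.T.", "Fields, J. T."]
rows = names.map { |name| {name: name, norm: normalize(name)} }
rows.each { |row| puts "#{row[:name]} -> #{row[:norm]}" }
# First line printed: Smith, R. J. -> smithrj
```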

Notes on examples below:

  • In the “With defaults” example, the normalized form is replaced by the most frequently used form
  • In the “With target” example, the most frequently used form is put in the specified target field, leaving the original normalized value in place
  • When there’s a tie among the most frequently used forms, the form encountered first in the input is used
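The tie-breaking rule in the last note can be illustrated with a small sketch. It assumes that "first-encountered" means first seen in input order; `most_frequent_form` is a hypothetical helper, not the transform's own code. It relies on `Array#tally` returning a Hash, and Ruby hashes preserve insertion order, so forms are visited in first-seen order.

```ruby
# Hypothetical helper, not kiba-extend's code: return the most frequently used
# form; on a tie, keep the form that was seen first. Returns nil for no forms.
def most_frequent_form(forms)
  best = nil
  best_count = 0
  # tally => {form => count}, keys in first-seen order
  forms.tally.each do |form, count|
    # Strictly greater: a later form with an equal count never displaces an earlier one.
    if count > best_count
      best = form
      best_count = count
    end
  end
  best
end

most_frequent_form(["Smith, R. J.", "Smith, R. J.", "Smith, R.J.", "Smith, RJ"])
# => "Smith, R. J."
most_frequent_form(["Fields, J.T.", "Fields, J. T."])
# => "Fields, J.T." (a 1-1 tie, so the first-seen form wins)
```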

Examples:

With defaults

# Used in pipeline as:
# transform Replace::NormWithMostFrequentlyUsedForm,
#   normfield: :norm,
#   nonnormfield: :name
xform = Replace::NormWithMostFrequentlyUsedForm.new(
  normfield: :norm,
  nonnormfield: :name
)
input = [
  {name: "Smith, R. J.", norm: "smithrj"},
  {name: "Smith, R. J.", norm: "smithrj"},
  {name: "Smith, R.J.", norm: "smithrj"},
  {name: "Smith, RJ", norm: "smithrj"},
  {name: "Fields, J.T.", norm: "fieldsjt"},
  {name: "Fields, J. T.", norm: "fieldsjt"},
]
result = Kiba::StreamingRunner.transform_stream(input, xform)
  .map{ |row| row }
expected = [
  {name: "Smith, R. J.", norm: "Smith, R. J."},
  {name: "Smith, R. J.", norm: "Smith, R. J."},
  {name: "Smith, R.J.", norm: "Smith, R. J."},
  {name: "Smith, RJ", norm: "Smith, R. J."},
  {name: "Fields, J.T.", norm: "Fields, J.T."},
  {name: "Fields, J. T.", norm: "Fields, J.T."},
]
expect(result).to eq(expected)

With target

# Used in pipeline as:
# transform Replace::NormWithMostFrequentlyUsedForm,
#   normfield: :norm,
#   nonnormfield: :name,
#   target: :pref
xform = Replace::NormWithMostFrequentlyUsedForm.new(
  normfield: :norm,
  nonnormfield: :name,
  target: :pref
)
input = [
  {name: "Smith, R. J.", norm: "smithrj"},
  {name: "Smith, R. J.", norm: "smithrj"},
  {name: "Smith, R.J.", norm: "smithrj"},
  {name: "Smith, RJ", norm: "smithrj"},
  {name: "Fields, J.T.", norm: "fieldsjt"},
  {name: "Fields, J. T.", norm: "fieldsjt"},
]
result = Kiba::StreamingRunner.transform_stream(input, xform)
  .map{ |row| row }
expected = [
  {name: "Smith, R. J.", norm: "smithrj", pref: "Smith, R. J."},
  {name: "Smith, R. J.", norm: "smithrj", pref: "Smith, R. J."},
  {name: "Smith, R.J.", norm: "smithrj", pref: "Smith, R. J."},
  {name: "Smith, RJ", norm: "smithrj", pref: "Smith, R. J."},
  {name: "Fields, J.T.", norm: "fieldsjt", pref: "Fields, J.T."},
  {name: "Fields, J. T.", norm: "fieldsjt", pref: "Fields, J.T."},
]
expect(result).to eq(expected)
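To check both expected results without a Kiba pipeline, here is a plain-Ruby sketch that works on an in-memory array of hashes. It assumes the transform's logic amounts to "count the original forms per normalized value, pick the most frequent (first-seen on a tie), then write it to the target field". `norm_with_most_frequent` is a hypothetical helper, not the kiba-extend source; unlike the real transform, it returns new hashes rather than modifying the input rows.

```ruby
# Hypothetical plain-Ruby helper, not part of kiba-extend.
def norm_with_most_frequent(rows, normfield:, nonnormfield:, target: nil)
  # As in the documented constructor, a nil target means "overwrite normfield".
  target ||= normfield

  # Pass 1: per normalized value, count each original form. Inner hashes keep
  # first-seen insertion order, which the tie-break below relies on.
  counts = Hash.new { |hash, norm| hash[norm] = Hash.new(0) }
  rows.each { |row| counts[row[normfield]][row[nonnormfield]] += 1 }

  # Choose the winner: only a strictly higher count replaces the current best,
  # so ties go to the form encountered first.
  lookup = counts.transform_values do |forms|
    forms.reduce { |best, pair| pair[1] > best[1] ? pair : best }.first
  end

  # Pass 2: write the winner into the target field (new hashes; input untouched).
  rows.map { |row| row.merge(target => lookup[row[normfield]]) }
end

input = [
  {name: "Smith, R. J.", norm: "smithrj"},
  {name: "Smith, R. J.", norm: "smithrj"},
  {name: "Smith, R.J.", norm: "smithrj"},
  {name: "Smith, RJ", norm: "smithrj"},
  {name: "Fields, J.T.", norm: "fieldsjt"},
  {name: "Fields, J. T.", norm: "fieldsjt"}
]

replaced = norm_with_most_frequent(input, normfield: :norm, nonnormfield: :name)
with_target = norm_with_most_frequent(input, normfield: :norm, nonnormfield: :name, target: :pref)
```

Building the whole lookup before writing any row is what makes the result depend on the full input rather than on the rows seen so far.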

Since:

  • 4.0.0

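Instance Method Summary

  • #close ⇒ Object
  • #initialize(normfield:, nonnormfield:, target: nil) ⇒ NormWithMostFrequentlyUsedForm (constructor): Returns a new instance of NormWithMostFrequentlyUsedForm.
  • #process(row) ⇒ Object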

Constructor Details

#initialize(normfield:, nonnormfield:, target: nil) ⇒ NormWithMostFrequentlyUsedForm

Returns a new instance of NormWithMostFrequentlyUsedForm.

Parameters:

  • normfield (Symbol)

    Field in which the normalized form is initially found. Its value is replaced with the most frequently used form, unless :target is given.

  • nonnormfield (Symbol)

    Field in which the non-normalized form of the name is found.

  • target (nil, Symbol) (defaults to: nil)

    Field into which the most frequently used form of the normalized value is written. When nil, the value in normfield is replaced instead.

Since:

  • 4.0.0



# File 'lib/kiba/extend/transforms/replace/norm_with_most_frequently_used_form.rb', line 101

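def initialize(normfield:, nonnormfield:, target: nil)
  @normfield = normfield
  @nonnormfield = nonnormfield
  # With no target, the chosen form overwrites the value in normfield
  @target = target || normfield
  @data = {}
  # Every row is buffered here by #process and emitted by #close
  @rows = []
  @lookup = {}
end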

Instance Method Details

#close ⇒ Object

Since:

  • 4.0.0



# File 'lib/kiba/extend/transforms/replace/norm_with_most_frequently_used_form.rb', line 116

def close
  # Runs after the source is exhausted: build the lookup, then emit every buffered row
  populate_lookup
  rows.each do |row|
    finalize(row)
    yield row
  end
end

#process(row) ⇒ Object

Since:

  • 4.0.0



# File 'lib/kiba/extend/transforms/replace/norm_with_most_frequently_used_form.rb', line 110

def process(row)
  populate_data(row)
  # Buffer the row; returning nil passes nothing downstream until #close
  rows << row
  nil
end
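The split between #process and #close is what lets the transform see every row before choosing a form. #process buffers each row and returns nil, so Kiba passes nothing on during the main pass, and #close yields the finished rows once the source is exhausted. The sketch below imitates that shape in plain Ruby, driving the methods by hand. The class name is hypothetical, and the bodies given for the private helpers populate_data, populate_lookup, and finalize (whose source is not shown here) are assumptions inferred from their names.

```ruby
# Hypothetical stand-in for NormWithMostFrequentlyUsedForm; the helper bodies
# are guesses, not the kiba-extend implementation.
class MostFrequentFormSketch
  def initialize(normfield:, nonnormfield:, target: nil)
    @normfield = normfield
    @nonnormfield = nonnormfield
    @target = target || normfield
    @data = Hash.new { |hash, norm| hash[norm] = Hash.new(0) }
    @rows = []
    @lookup = {}
  end

  # Main pass: record counts and buffer the row; nil means "emit nothing now".
  def process(row)
    populate_data(row)
    @rows << row
    nil
  end

  # End of stream: build the lookup, then yield every buffered row.
  def close
    populate_lookup
    @rows.each do |row|
      finalize(row)
      yield row
    end
  end

  private

  # Assumed: count each original form under its normalized value.
  def populate_data(row)
    @data[row[@normfield]][row[@nonnormfield]] += 1
  end

  # Assumed: a strictly higher count is needed to displace the current best,
  # so ties keep the first-seen form.
  def populate_lookup
    @data.each do |norm, forms|
      @lookup[norm] = forms.reduce { |best, pair| pair[1] > best[1] ? pair : best }.first
    end
  end

  # Assumed: write the chosen form into the target field (mutates the row).
  def finalize(row)
    row[@target] = @lookup[row[@normfield]]
  end
end

xform = MostFrequentFormSketch.new(normfield: :norm, nonnormfield: :name)
input = [
  {name: "Fields, J.T.", norm: "fieldsjt"},
  {name: "Fields, J. T.", norm: "fieldsjt"},
  {name: "Fields, J. T.", norm: "fieldsjt"}
]
during_pass = input.map { |row| xform.process(row) } # every entry is nil
output = []
xform.close { |row| output << row }
output.each { |row| puts "#{row[:name]} -> #{row[:norm]}" }
```

Here "Fields, J. T." occurs twice and "Fields, J.T." once, so every emitted row ends up with "Fields, J. T." in :norm. The cost of this design is that all rows are held in memory until the end of the stream.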