Class: Kiba::Extend::Transforms::Cspace::NormalizeForID

Inherits:
Object
Includes:
MultivalPlusDelimDeprecatable
Defined in:
lib/kiba/extend/transforms/cspace/normalize_for_id.rb

Overview

Note:

This class makes use of Utils::StringNormalizer with mode: :cspaceid. See that class for more details and fuller tests of string normalization.

Normalizes a string value (typically one that will become a CSpace authority termdisplayname value) using the same algorithm, or as close to it as possible, that the CSpace application uses to generate the shortid field in authority records.

This is useful for identifying values that are not exact string matches, but that CSpace may see/treat as duplicates under the hood where it uses the shortid (which is embedded in refName URNs). In preparing data for CSpace migrations, this can prevent creation of terms that cause problems during ingest, or that will later cause warnings/errors if you try to load Objects or Procedures containing those terms.
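
The duplicate check described above can be sketched in plain Ruby. This is an illustration only: `approx_shortid_norm` and `potential_duplicates` are hypothetical helpers, and the normalization here (decompose, strip combining marks, lowercase, drop non-alphanumerics) merely reproduces the examples on this page. It is not the actual Utils::StringNormalizer implementation and may differ from it for other input.

```ruby
# Hypothetical stand-in for mode: :cspaceid normalization.
# ASSUMPTION: this approximates, but is not, the library's algorithm.
def approx_shortid_norm(str)
  # Split accented letters into base letter + combining mark
  decomposed = str.unicode_normalize(:nfd)
  # Drop the combining marks, leaving the base letters
  without_marks = decomposed.gsub(/\p{Mn}/, "")
  # Lowercase, then drop spaces, punctuation, and anything else
  without_marks.downcase.gsub(/[^a-z0-9]/, "")
end

# Hypothetical helper: group distinct terms by normalized form and keep
# only the groups where more than one term shares that form.
def potential_duplicates(terms)
  terms.uniq
    .group_by { |term| approx_shortid_norm(term) }
    .select { |_norm, group| group.length > 1 }
end

terms = ["Table, café", "Table Cafe", "table-café", "Chair"]
dupes = potential_duplicates(terms)
dupes.each do |norm, group|
  puts "#{norm}: #{group.join(" / ")}"
end
```

In a real migration the normalized value would come from the transform's target field rather than from a hand-rolled helper like this one.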

Examples:

With defaults

# Used in pipeline as:
# transform Cspace::NormalizeForID,
#   source: :place,
#   target: :norm
xform = Cspace::NormalizeForID.new(
  source: :place,
  target: :norm
)
input = [
  {place: 'Table, café'},
  {place: 'Oświęcim (Poland)|Iași, Romania'}
]
result = input.map{ |row| xform.process(row) }
expected = [
  {place: 'Table, café', norm: 'tablecafe'},
  {place: 'Oświęcim (Poland)|Iași, Romania',
   norm: 'oswiecimpolandiasiromania'}
]
expect(result).to eq(expected)

With delim

# Used in pipeline as:
# transform Cspace::NormalizeForID,
#   source: :place,
#   target: :norm,
#   delim: '|'
xform = Cspace::NormalizeForID.new(
  source: :place,
  target: :norm,
  delim: '|'
)
input = [
  {place: 'Table, café'},
  {place: 'Oświęcim (Poland)|Iași, Romania'}
]
result = input.map{ |row| xform.process(row) }
expected = [
  {place: 'Table, café', norm: 'tablecafe'},
  {place: 'Oświęcim (Poland)|Iași, Romania',
   norm: 'oswiecimpoland|iasiromania'}
]
expect(result).to eq(expected)

Instance Method Summary

Methods included from MultivalPlusDelimDeprecatable

#set_multival

Constructor Details

#initialize(source:, target:, multival: omitted = true, delim: nil) ⇒ NormalizeForID

Returns a new instance of NormalizeForID.

Parameters:

  • source (Symbol)

    field whose value will be normalized

  • target (Symbol)

    field to populate with normalized value

  • multival (Boolean) (defaults to: omitted = true)

    DEPRECATED - Do not use

  • delim (nil, String) (defaults to: nil)

    if given, triggers treatment as multivalued and is used to split and join string values



# File 'lib/kiba/extend/transforms/cspace/normalize_for_id.rb', line 76

def initialize(source:, target:, multival: omitted = true, delim: nil)
  @source = source
  @target = target
  @multival = set_multival(multival, omitted, self)
  @delim = delim
  @normalizer = Kiba::Extend::Utils::StringNormalizer.new(
    mode: :cspaceid
  )
end
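
The `multival: omitted = true` default in the signature above is a Ruby idiom for detecting whether a caller passed a keyword at all. The following sketch shows the idiom in isolation. `keyword_state` is a hypothetical method, and what `set_multival` actually does with these values is not shown on this page.

```ruby
# Hypothetical illustration of the `omitted = true` default-value idiom.
def keyword_state(multival: omitted = true)
  # The default expression only runs when the caller leaves out multival:,
  # so `omitted` is true in that case and nil otherwise.
  {multival: multival, passed_by_caller: !omitted}
end

p keyword_state                  # caller did not pass the keyword
p keyword_state(multival: true)  # caller passed it explicitly
p keyword_state(multival: false)
```

This lets a method tell "the caller explicitly passed `true`" apart from "the caller passed nothing", which is presumably how the deprecated parameter is detected.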

Instance Method Details

#process(row) ⇒ Object

Parameters:

  • row (Hash{ Symbol => String, nil })


# File 'lib/kiba/extend/transforms/cspace/normalize_for_id.rb', line 87

def process(row)
  row[target] = nil
  val = row.fetch(source, nil)
  return row if val.blank?

  row[target] = values(val).map { |val| normalize(val) }.join(delim)
  row
end
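
The private `values` and `normalize` helpers called by `process` are not shown on this page. The following dependency-free sketch imitates the split, normalize, and join flow under stated assumptions: `NormalizeSketch` is a hypothetical class, `values` is assumed to split only when a delimiter was given, and `normalize` is a rough approximation of mode: :cspaceid rather than the real Utils::StringNormalizer.

```ruby
# Hypothetical imitation of the transform; NOT the library's code.
class NormalizeSketch
  def initialize(source:, target:, delim: nil)
    @source = source
    @target = target
    @delim = delim
  end

  def process(row)
    row[@target] = nil
    val = row[@source]
    # Plain-Ruby substitute for ActiveSupport's blank?
    return row if val.nil? || val.strip.empty?

    row[@target] = values(val).map { |v| normalize(v) }.join(@delim)
    row
  end

  private

  # ASSUMPTION: split only when a delimiter was given
  def values(val)
    @delim ? val.split(@delim) : [val]
  end

  # ASSUMPTION: approximation that matches the examples on this page
  def normalize(val)
    val.unicode_normalize(:nfd)
      .gsub(/\p{Mn}/, "")
      .downcase
      .gsub(/[^a-z0-9]/, "")
  end
end

multi = NormalizeSketch.new(source: :place, target: :norm, delim: "|")
single = NormalizeSketch.new(source: :place, target: :norm)
with_delim = multi.process({place: "Oświęcim (Poland)|Iași, Romania"})
without_delim = single.process({place: "Oświęcim (Poland)|Iași, Romania"})
```

Without a delimiter the whole string is normalized as one value, so the `|` is stripped along with other punctuation. With a delimiter each segment is normalized separately and the delimiter is restored on join.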