Class: Kiba::Extend::Utils::StringNormalizer

Inherits:
Object
Defined in:
lib/kiba/extend/utils/string_normalizer.rb

Overview

Normalizes the given string according to the given parameters.

Can be used in two ways. The preferred way, for a transform or any other context where the same normalization settings apply to many strings, is to initialize an instance once and reuse it:

  # first initialize an instance of the class as an instance variable in
  #   your context
  @normalizer = StringNormalizer.new(downcased: false)

  # for the repetitive part:
  vals.each{ |val| @normalizer.call(val) }

For one-off usage, or where the normalization settings vary per normalized value, you can do:

  StringNormalizer.call(downcased: false, str: 'Table, café')
  # => 'Tablecafe'

The second way is much slower over many calls, because each call initializes a new instance of the class.
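The difference between the two usage patterns can be illustrated with a small stand-in class. This is only a sketch: `SketchNormalizer` and its `instances` counter are hypothetical names invented for illustration, and the stand-in only strips non-word characters rather than performing the full normalization. It mirrors the documented structure, in which the class-level `call` builds a new instance and delegates to the instance-level `call`.

```ruby
# Hypothetical stand-in (not the real StringNormalizer): it copies only the
# shape of the API so that instance creation can be counted.
class SketchNormalizer
  @instances = 0

  class << self
    attr_accessor :instances

    # One-off form: builds a fresh instance on every call.
    def call(str:, downcased: true)
      new(downcased: downcased).call(str)
    end
  end

  def initialize(downcased: true)
    self.class.instances += 1
    @downcased = downcased
  end

  # Simplified normalization: strip non-word characters, optionally downcase.
  def call(val)
    stripped = val.gsub(/\W/, "")
    @downcased ? stripped.downcase : stripped
  end
end

vals = ["Table, cafe", "1,001 Arabian Nights", "foo\n\nbar"]

# Preferred pattern: one instance reused for every value.
normalizer = SketchNormalizer.new(downcased: false)
reused = vals.map { |val| normalizer.call(val) }
reused_count = SketchNormalizer.instances

# One-off pattern: one new instance per value.
SketchNormalizer.instances = 0
one_off = vals.map { |val| SketchNormalizer.call(str: val, downcased: false) }
one_off_count = SketchNormalizer.instances
```

Both patterns return the same values, but the reused instance is initialized once while the one-off form initializes one instance per string. In the real class the constructor also prepares the substitutions for the selected mode, so the extra work per call is larger than in this sketch.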

Examples:

Default settings

util = Kiba::Extend::Utils::StringNormalizer.new
input = [
  'Oświęcim (Poland)',
  'Oswiecim, Poland',
  'Iași, Romania',
  'Iasi, Romania',
  'Table, café',
  '1,001 Arabian Nights',
  "foo\n\nbar"
]
expected = [
 'oswiecimpoland',
 'oswiecimpoland',
 'iairomania',
 'iasiromania',
 'tablecafe',
 '1001arabiannights',
 'foobar'
]
results = input.map{ |str| util.call(str) }
expect(results).to eq(expected)

downcased = false

util = Kiba::Extend::Utils::StringNormalizer.new(downcased: false)
input = [
  'Oświęcim (Poland)',
  'Oswiecim, Poland',
  'Iași, Romania',
  'Iasi, Romania',
  'Table, café',
  '1,001 Arabian Nights',
  "foo\n\nbar"
]
expected = [
 'OswiecimPoland',
 'OswiecimPoland',
 'IaiRomania',
 'IasiRomania',
 'Tablecafe',
 '1001ArabianNights',
 'foobar'
]
results = input.map{ |str| util.call(str) }
expect(results).to eq(expected)

:cspaceid mode

util = Kiba::Extend::Utils::StringNormalizer.new(mode: :cspaceid)
input = [
  'Oświęcim (Poland)',
  'Oswiecim, Poland',
  'Iași, Romania',
  'Iasi, Romania'
]
expected = [
 'oswiecimpoland',
 'oswiecimpoland',
 'iasiromania',
 'iasiromania'
]
results = input.map{ |str| util.call(str) }
expect(results).to eq(expected)

Since:

  • 3.3.0

Constructor Details

#initialize(mode: :plain, downcased: true) ⇒ StringNormalizer

Returns a new instance of StringNormalizer.

Parameters:

  • mode (:plain, :cspaceid) (defaults to: :plain)

    :plain performs no find/replace before transliterating. :cspaceid performs find/replace first, because of characters it is known to handle weirdly internally

  • downcased (Boolean) (defaults to: true)

    whether to downcase result

Since:

  • 3.3.0



# File 'lib/kiba/extend/utils/string_normalizer.rb', line 110

def initialize(mode: :plain, downcased: true)
  @mode = mode
  @downcased = downcased
  @subs = set_subs
end

Class Method Details

.call(str:, mode: :plain, downcased: true) ⇒ Object

Initializes a new instance with the given settings and calls it on str.

Parameters:

  • mode (:plain, :cspaceid) (defaults to: :plain)

    :plain performs no find/replace before transliterating. :cspaceid performs find/replace first, because of characters it is known to handle weirdly internally

  • downcased (Boolean) (defaults to: true)

    whether to downcase result

  • str (String)

    the string to normalize

Since:

  • 3.3.0



# File 'lib/kiba/extend/utils/string_normalizer.rb', line 101

def call(str:, mode: :plain, downcased: true)
  new(mode: mode, downcased: downcased).call(str)
end

Instance Method Details

#call(val) ⇒ Object

Normalizes val: converts it to Unicode NFKC form if it is not already in that form, applies the substitutions for the configured mode, transliterates the result, removes non-word characters, and downcases if so configured.

Since:

  • 3.3.0



# File 'lib/kiba/extend/utils/string_normalizer.rb', line 116

def call(val)
  unless val.unicode_normalized?(:nfkc)
    val = val.unicode_normalize(:nfkc)
  end
  subs.each { |old, new| val = val.gsub(old, new) }

  val = ActiveSupport::Inflector.transliterate(val).gsub(/\W/, "")

  downcased ? val.downcase : val
end
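
Two of the steps in the method above can be tried in isolation with plain Ruby. The sketch below is not part of the library and deliberately omits the mode-specific substitutions and the `ActiveSupport::Inflector.transliterate` call, so it shows only the NFKC normalization and the final `/\W/` removal. All variable names are illustrative.

```ruby
# NFKC normalization (String#unicode_normalize is part of core Ruby).
decomposed = "cafe\u0301" # "e" followed by a combining acute accent
already_nfkc = decomposed.unicode_normalized?(:nfkc) # false: not yet composed
composed = decomposed.unicode_normalize(:nfkc)       # "caf\u00E9" (4 characters)
ligature = "\uFB01ne".unicode_normalize(:nfkc)       # "fi" ligature becomes "fine"

# Non-word removal: in Ruby, /\W/ matches anything outside [a-zA-Z0-9_],
# so punctuation, whitespace, and newlines are all dropped.
title = "1,001 Arabian Nights".gsub(/\W/, "") # "1001ArabianNights"
multiline = "foo\n\nbar".gsub(/\W/, "")       # "foobar"

# Without transliteration first, an accented letter is itself a non-word
# character and is removed rather than converted to ASCII.
untransliterated = composed.gsub(/\W/, "") # "caf"
```

The last line suggests why the order of steps matters: any character that is not ASCII by the time `/\W/` is applied is dropped, which is consistent with the plain-mode result 'iairomania' for 'Iași, Romania' in the examples above.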