Class: Kiba::Extend::Transforms::Split::PublicationStatement

Inherits:
Object
Defined in:
lib/kiba/extend/transforms/split/publication_statement.rb

Overview

Splits string value of given field into new :pubplace, :publisher, :pubdate, :manplace, :manufacturer, and :mandate fields. Fieldnames can be overridden.

Splitting is based on expected ISBD punctuation used in the MARC 260 field.

This transform does the best it can, but depends on fiddly punctuation standards that are not always followed, and which are sometimes ambiguous when MARC subfield coding is not present. It is intended for use in preparing data for client review and cleanup.

Algorithm/assumptions:

  • The terminal period is removed from the field value before processing
  • If the field value starts with a digit, the whole field value is treated as a date
  • If the field value contains a : or ; that is later followed by a comma, any non-comma characters, and one or more digits, everything from that comma pattern onward is treated as the date value
  • If the field value contains : or ;, the first segment of the value is treated as place
  • If the field value does not contain : or ;, and does not begin with a digit, the whole field value is treated as publisher
  • Any part of the field value in parentheses is extracted separately and checked against the above patterns. If it matches, it is processed as manufacturing data, and the non-parenthetical data is processed as publication data. If the parenthetical data does not match any of the patterns, it is kept as part of the publication place, name, or date field
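
The non-parenthetical rules above can be sketched in a few lines of plain Ruby. This is an illustration only, not the transform's actual implementation: `sketch_split` and `SKETCH_DELIM` are hypothetical names, the regular expressions are assumptions about how the punctuation could be matched, and the sketch ignores parenthetical (manufacturing) data and repeated place/publisher groups.

```ruby
# Illustrative sketch only; not the transform's actual implementation.
# `sketch_split` and SKETCH_DELIM are hypothetical names.
SKETCH_DELIM = /\s*[:;]\s*/

def sketch_split(value)
  result = {pubplace: nil, publisher: nil, pubdate: nil}
  val = value.strip.sub(/\.\z/, "") # remove the terminal period first

  # Leading digit: the whole value is treated as a date.
  if val.match?(/\A\d/)
    result[:pubdate] = val
    return result
  end

  # No colon or semicolon: the whole value is treated as publisher.
  unless val.match?(SKETCH_DELIM)
    result[:publisher] = val
    return result
  end

  # The first segment is the place; the remainder holds the name and,
  # possibly, the date.
  place, rest = val.split(SKETCH_DELIM, 2)

  # The date starts at the first comma that is followed by non-comma
  # characters and at least one digit.
  name, date = rest.to_s.split(/\s*,\s*(?=[^,]*\d)/, 2)

  result[:pubplace] = place
  result[:publisher] = name
  result[:pubdate] = date
  result
end
```

Under these assumptions, `sketch_split('Chicago : New Voice Press, ©1898.')` yields `Chicago` as place, `New Voice Press` as publisher, and `©1898` as date, and `sketch_split('1908-1924.')` yields only a date of `1908-1924`.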

Usage in jobs

transform Split::PublicationStatement, source: :pubstmt
transform Split::PublicationStatement,
  source: :pubstmt,
  fieldname_overrides: {manufacturer: :printer, mandate: :printdate},
  delim: '%'

Examples:

Default fieldnames, demonstrating parsing/splitting behavior

input = [
  {ps: 'Belfast [i.e. Dublin : s.n.], 1946 [reprinted 1965]'},
  {ps: 'Harmondsworth : Penguin, 1949 (1963 printing)'},
  {ps: 'Wash, D.C. (16 K St., N., Wash 20006) : Wider , 1979 printing, c1975.'},
  {ps: 'American Issue Publishing Company'},
  {ps: 'Chicago : New Voice Press, ©1898.'},
  {ps: 'New York ; Berlin : Springer Verlag, 1977.'},
  {ps: 'Columbus : The League'},
  {ps: '1908-1924.'},
  {ps: 'Paris : Rue ; London : Press, 1955'},
  {ps: 'Chicago, etc. : Time Inc.'},
  {ps: 'Paris : Impr. Vincent, 1798 [i.e. Bruxelles : Moens, 1883]'},
  {ps: 'London : Council, 1976 (Twickenham : CTD Printers, 1974)'}
]
expected = [
  {ps: 'Belfast [i.e. Dublin : s.n.], 1946 [reprinted 1965]',
   pubplace: 'Belfast [i.e. Dublin', publisher: 's.n.]',
   pubdate: '1946 [reprinted 1965]', manplace: nil,
   manufacturer: nil, mandate: nil},
  {ps: 'Harmondsworth : Penguin, 1949 (1963 printing)',
   pubplace: 'Harmondsworth', publisher: 'Penguin',
   pubdate: '1949', manplace: nil, manufacturer: nil,
   mandate: '1963 printing'},
  {ps: 'Wash, D.C. (16 K St., N., Wash 20006) : Wider , 1979 printing, c1975.',
   pubplace: 'Wash, D.C. (16 K St., N., Wash 20006)',
   publisher: 'Wider', pubdate: '1979 printing, c1975',
   manplace: nil, manufacturer: nil,
   mandate: nil},
  {ps: 'American Issue Publishing Company',
   pubplace: nil, publisher: 'American Issue Publishing Company',
   pubdate: nil, manplace: nil, manufacturer: nil, mandate: nil},
  {ps: 'Chicago : New Voice Press, ©1898.',
   pubplace: 'Chicago', publisher: 'New Voice Press',
   pubdate: '©1898', manplace: nil, manufacturer: nil,
   mandate: nil},
  {ps: 'New York ; Berlin : Springer Verlag, 1977.',
   pubplace: 'New York|Berlin', publisher: 'Springer Verlag',
   pubdate: '1977', manplace: nil, manufacturer: nil,
   mandate: nil},
  {ps: 'Columbus : The League',
   pubplace: 'Columbus', publisher: 'The League',
   pubdate: nil, manplace: nil, manufacturer: nil,
   mandate: nil},
  {ps: '1908-1924.',
   pubplace: nil, publisher: nil, pubdate: '1908-1924',
   manplace: nil, manufacturer: nil, mandate: nil},
  {ps: 'Paris : Rue ; London : Press, 1955',
   pubplace: 'Paris|London', publisher: 'Rue|Press', pubdate: '1955',
   manplace: nil, manufacturer: nil, mandate: nil},
 # NOTE: Terminal period from publisher name removed, but we don't
 #   typically expect abbreviations on the end of this field
  {ps: 'Chicago, etc. : Time Inc.',
   pubplace: 'Chicago, etc.', publisher: 'Time Inc', pubdate: nil,
   manplace: nil, manufacturer: nil, mandate: nil},
 # NOTE: This is handled as best we can do without MARC subfields
  {ps: 'Paris : Impr. Vincent, 1798 [i.e. Bruxelles : Moens, 1883]',
   pubplace: 'Paris', publisher: 'Impr. Vincent',
   pubdate: '1798 [i.e. Bruxelles : Moens, 1883]',
   manplace: nil, manufacturer: nil, mandate: nil},
  {ps: 'London : Council, 1976 (Twickenham : CTD Printers, 1974)',
   pubplace: 'London', publisher: 'Council', pubdate: '1976',
   manplace: 'Twickenham' , manufacturer: 'CTD Printers',
   mandate: '1974'}
]
xform = Split::PublicationStatement.new(source: :ps)
result = input.map{ |row| xform.process(row) }
expect(result).to eq(expected)

Overriding fieldnames

input = [
  {ps: 'London : Council, 1976 (Twickenham : CTD Printers, 1974)'}
]
expected = [
  {ps: 'London : Council, 1976 (Twickenham : CTD Printers, 1974)',
   pubplace: 'London', publisher: 'Council', pubdate: '1976',
   prtplace: 'Twickenham' , printer: 'CTD Printers',
   prtdate: '1974'}
]
xform = Split::PublicationStatement.new(
  source: :ps,
  fieldname_overrides: {manplace: :prtplace, manufacturer: :printer,
    mandate: :prtdate}
)
result = input.map{ |row| xform.process(row) }
expect(result).to eq(expected)

Since:

  • 4.0.0

Constant Summary

DEFAULT_FIELDNAMES =
  %i[pubplace publisher pubdate
    manplace manufacturer mandate]

Since:

  • 4.0.0

Instance Method Summary

Constructor Details

#initialize(source:, fieldname_overrides: nil, delim: Kiba::Extend.delim) ⇒ PublicationStatement

Returns a new instance of PublicationStatement.

Parameters:

  • source (Symbol)

    field containing publication statement to split

  • fieldname_overrides (nil, Hash<Symbol=>Symbol>) (defaults to: nil)

    with default field name to override as key, and new field name as value

  • delim (String) (defaults to: Kiba::Extend.delim)

    for joining multiple values in a given target field

Since:

  • 4.0.0



# File 'lib/kiba/extend/transforms/split/publication_statement.rb', line 152

def initialize(source:,
  fieldname_overrides: nil,
  delim: Kiba::Extend.delim)
  @source = source
  @fieldnames = setup_fieldnames(fieldname_overrides)
  @delim = delim
end
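
The body of `setup_fieldnames` is not shown here. One plausible way to combine the defaults with `fieldname_overrides` is a simple hash merge, sketched below. `sketch_fieldnames` and `SKETCH_DEFAULT_FIELDNAMES` are hypothetical names used only for illustration, and the real method may validate keys or store the result differently.

```ruby
# Hypothetical stand-in for setup_fieldnames; the real method body is not
# shown in this documentation.
SKETCH_DEFAULT_FIELDNAMES = %i[pubplace publisher pubdate
  manplace manufacturer mandate].freeze

def sketch_fieldnames(overrides = nil)
  # Map every default field name to itself.
  base = SKETCH_DEFAULT_FIELDNAMES.to_h { |name| [name, name] }
  return base unless overrides

  # Overridden keys point at the new target names; this sketch does not
  # check for unknown keys.
  base.merge(overrides)
end
```

With the overrides from the "Overriding fieldnames" example, `sketch_fieldnames(manplace: :prtplace, manufacturer: :printer, mandate: :prtdate)` leaves the three publication fields unchanged and maps the three manufacturing fields to `:prtplace`, `:printer`, and `:prtdate`.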

Instance Method Details

#process(row) ⇒ Hash{ Symbol => String, nil }

Parameters:

  • row (Hash{ Symbol => String, nil })

Returns:

  • (Hash{ Symbol => String, nil })

Since:

  • 4.0.0



# File 'lib/kiba/extend/transforms/split/publication_statement.rb', line 162

def process(row)
  add_all_fields(row)
  val = row[source]
  return row if val.blank?

  extract_handler(row, scanner: initial_clean(val))
end
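
The method adds the target fields before it checks the source value, which is consistent with the nil values in the examples above. The standalone sketch below illustrates that guard under two assumptions: that `add_all_fields` sets each target field to nil (its body is not shown here), and that a plain nil-or-whitespace check is a close enough stand-in for `blank?`. `sketch_process` is a hypothetical name, and the block stands in for the actual splitting step.

```ruby
# Hypothetical illustration of the guard in #process; not the real method.
def sketch_process(row, source:, fieldnames:)
  # Assumed behavior of add_all_fields: every target field starts as nil.
  fieldnames.each { |field| row[field] = nil }

  val = row[source]
  # Stand-in for `val.blank?`, which comes from ActiveSupport.
  return row if val.nil? || val.to_s.strip.empty?

  # The splitting step is supplied by the caller in this sketch.
  yield(row, val) if block_given?
  row
end
```

A row whose source field is nil or empty comes back with all target fields present and set to nil, so downstream transforms can rely on the fields existing.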