Class: Kiba::Extend::Transforms::Split::IntoMultipleColumns

Inherits:
Object
Defined in:
lib/kiba/extend/transforms/split/into_multiple_columns.rb

Overview

Splits a field into multiple fields, based on sep. The new columns use the original field name with a zero-based number appended (:field0, :field1, etc.).

Example 1

Input table:

| summary    |
|------------|
| a:b        |
| c          |
| :d         |

Used in pipeline as:

transform Split::IntoMultipleColumns, field: :summary, sep: ':', max_segments: 2

Results in:

| summary0 | summary1   |
|----------|------------|
| a        | b          |
| c        | nil        |
|          | d          |

Example 2

Input table:

| summary    |
|------------|
| a:b:c:d:e  |
| f:g        |
|            |
| nil        |

Used in pipeline as:

transform Split::IntoMultipleColumns, field: :summary, sep: ':', max_segments: 3,
  collapse_on: :left, warnfield: :warnme

Results in:

| summary0 | summary1 | summary2 | warnme                                                |
|----------|----------|----------|-------------------------------------------------------|
| a:b:c    | d        | e        | max_segments less than total number of split segments |
| f        | g        | nil      | nil                                                   |
|          | nil      | nil      | nil                                                   |
| nil      | nil      | nil      | nil                                                   |

Used in pipeline as:

transform Split::IntoMultipleColumns, field: :summary, sep: ':', max_segments: 3,
  collapse_on: :right

Results in:

| summary0 | summary1 | summary2 |
|----------|----------|----------|
| a        | b        | c:d:e    |
| f        | g        | nil      |
|          | nil      | nil      |
| nil      | nil      | nil      |
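
The two collapse results in Example 2 can be reproduced in plain Ruby. The sketch below is only an illustration: collapse_segments is a hypothetical helper, not part of kiba-extend, and it assumes the transform joins surplus segments with the same separator it split on, as the tables above suggest.

```ruby
# Hypothetical helper (not part of kiba-extend): mimics the documented
# collapse behaviour using only core Ruby.
def collapse_segments(value, sep:, max_segments:, collapse_on: :right)
  # A limit of -1 keeps trailing empty segments instead of dropping them.
  segments = value.split(sep, -1)
  return segments if segments.length <= max_segments

  case collapse_on
  when :right
    # Keep the leading segments, join the surplus into the last slot.
    segments.first(max_segments - 1) + [segments.drop(max_segments - 1).join(sep)]
  when :left
    # Join the surplus into the first slot, keep the trailing segments.
    extra = segments.length - max_segments + 1
    [segments.first(extra).join(sep)] + segments.drop(extra)
  else
    raise ArgumentError, "collapse_on must be :right or :left"
  end
end

left = collapse_segments("a:b:c:d:e", sep: ":", max_segments: 3, collapse_on: :left)
right = collapse_segments("a:b:c:d:e", sep: ":", max_segments: 3, collapse_on: :right)
# left  is ["a:b:c", "d", "e"]
# right is ["a", "b", "c:d:e"]
```

A value with max_segments or fewer segments is returned unchanged by this helper, which corresponds to the rows above where the unused columns are nil.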

Instance Method Summary

Constructor Details

#initialize(field:, sep:, max_segments:, delete_source: true, collapse_on: :right, warnfield: nil) ⇒ IntoMultipleColumns

Note:

Since 2.0.0, the max_segments parameter is required because Kiba processes data row by row. When processing a row that splits into 2 columns, the transform has no way of knowing that another row in the source splits into 10 columns, so without a fixed maximum it would create rows with different numbers of fields.
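
The problem described in this note can be seen without Kiba at all. In the sketch below, split_unbounded and split_fixed are hypothetical helpers written for illustration (neither exists in kiba-extend), and rows are modeled as plain hashes.

```ruby
# Hypothetical helpers for illustration only; neither is part of kiba-extend.

# Splits into however many segments the value happens to contain,
# so the resulting keys vary from row to row.
def split_unbounded(row, field, sep)
  row[field].split(sep).each_with_index.map { |seg, i| [:"#{field}#{i}", seg] }.to_h
end

# Always emits exactly max_segments keys, padding short rows with nil.
# String#split with a positive limit leaves the surplus in the last segment.
def split_fixed(row, field, sep, max_segments)
  segments = row[field].split(sep, max_segments)
  (0...max_segments).map { |i| [:"#{field}#{i}", segments[i]] }.to_h
end

rows = [{ summary: "a:b" }, { summary: "c:d:e:f" }]

unbounded_keys = rows.map { |r| split_unbounded(r, :summary, ":").keys }
fixed_keys = rows.map { |r| split_fixed(r, :summary, ":", 3).keys }

puts unbounded_keys.uniq.length # 2: the rows disagree on their columns
puts fixed_keys.uniq.length     # 1: every row has summary0..summary2
```

With a fixed maximum, each row can be processed in isolation and still end up with the same set of fields as every other row.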

Returns a new instance of IntoMultipleColumns.

Parameters:

  • field (Symbol)

    Name of field to split

  • sep (String)

    Character(s) on which to split the field value

  • delete_source (Boolean) (defaults to: true)

    Whether to delete field after splitting it into new columns

  • max_segments (Integer)

    Maximum number of segments to split the field value into (i.e. the maximum number of columns to create from this one column).

  • collapse_on (:right, :left) (defaults to: :right)

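    Which end of the split array the surplus segments are joined on when the value splits into more than max_segments segments. :right joins them into the last new column; :left joins them into the first.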

  • warnfield (Symbol) (defaults to: nil)

    Name of field in which to put any warning/error(s) for a row



# File 'lib/kiba/extend/transforms/split/into_multiple_columns.rb', line 99

def initialize(field:, sep:, max_segments:, delete_source: true, collapse_on: :right,
  warnfield: nil)
  @field = field
  @sep = sep
  @del = delete_source
  @max = max_segments
  @collapser = method("process_#{collapse_on}_collapse")
  @warn = !warnfield.blank?
  @warnfield = warnfield || :warning
  @warnvalue = "max_segments less than total number of split segments"
end

Instance Method Details

#process(row) ⇒ Object

Parameters:

  • row (Hash{ Symbol => String, nil })


# File 'lib/kiba/extend/transforms/split/into_multiple_columns.rb', line 112

def process(row)
  add_new_fields(row)
  do_split(row)
  clean_up_fields(row)
  row
end
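
The three helpers called from #process are not shown on this page. As a rough, standalone approximation of the add, split, and clean-up sequence, the sketch below uses a hypothetical SplitSketch class whose helper bodies are assumptions chosen to match the example tables above. It is not the kiba-extend source, and it supports only right-hand collapse.

```ruby
# Hypothetical stand-in, not the kiba-extend source. The private helper
# bodies are assumptions made to match the documented example tables.
class SplitSketch
  WARNING = "max_segments less than total number of split segments"

  def initialize(field:, sep:, max_segments:, delete_source: true, warnfield: nil)
    @field = field
    @sep = sep
    @max = max_segments
    @del = delete_source
    @warnfield = warnfield
    @new_fields = (0...@max).map { |i| :"#{field}#{i}" }
  end

  def process(row)
    add_new_fields(row)
    do_split(row)
    clean_up_fields(row)
    row
  end

  private

  # Step 1: give every row the same new keys, initialised to nil.
  def add_new_fields(row)
    @new_fields.each { |f| row[f] = nil }
    row[@warnfield] = nil if @warnfield
  end

  # Step 2: fill the new keys; surplus segments stay in the last one.
  def do_split(row)
    value = row[@field]
    return if value.nil?

    row[@warnfield] = WARNING if @warnfield && value.split(@sep, -1).length > @max
    # "".split returns [], so an empty value is kept as a single empty segment.
    segments = value.empty? ? [""] : value.split(@sep, @max)
    segments.each_with_index { |seg, i| row[@new_fields[i]] = seg }
  end

  # Step 3: drop the source field unless asked to keep it.
  def clean_up_fields(row)
    row.delete(@field) if @del
  end
end

xform = SplitSketch.new(field: :summary, sep: ":", max_segments: 3, warnfield: :warnme)
result = xform.process({ summary: "a:b:c:d:e" })
# result has summary0 "a", summary1 "b", summary2 "c:d:e", and the warning
# text in :warnme; the original :summary key is gone.
```

Because the row is mutated in place and then returned, the sketch follows the same shape as the #process method shown above.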