Class: Kiba::Extend::Transforms::Reshape::SimplePivot

Inherits:
Object
  • Object
Defined in:
lib/kiba/extend/transforms/reshape/simple_pivot.rb

Overview

Note:

This transformation holds all of its data in memory until the source is exhausted, so it may bog down or crash on extremely large data sources.

Note:

This transformation makes strong assumptions and has limitations that can be quite destructive (for example, every field not involved in the pivot is discarded), so examine the example below carefully.

Examples

Input table:

| authority | norm    | term        | unrelated |
|-----------+---------+-------------+-----------|
| person    | fred    | Fred Q.     | foo       |
| org       | fred    | Fred, Inc.  | bar       |
| location  | unknown | Unknown     | baz       |
| person    | unknown | Unknown     | fuz       |
| org       | unknown | Unknown     | aaa       |
| work      | book    | Book        | eee       |
| location  | book    |             | zee       |
|           | book    | Book        | squeee    |
| nil       | ghost   | Ghost       | boo       |
| location  |         | Ghost       | zoo       |
| location  | ghost   | nil         | poo       |
| org       | fred    | Fred, Corp. | bar       |
| issues    | nil     | nil         | bah       |

Used in pipeline as:

transform Reshape::SimplePivot,
  field_to_columns: :authority,
  field_to_rows: :norm,
  field_to_col_vals: :term

Results in:

| norm    | person  | org         | location | work | issues |
|---------+---------+-------------+----------+------+--------|
| fred    | Fred Q. | Fred, Corp. | nil      | nil  | nil    |
| unknown | Unknown | Unknown     | Unknown  | nil  | nil    |
| book    | nil     | nil         | nil      | Book | nil    |

NOTE

  • A new column has been created for each unique value in the field_to_columns field
  • A single row has been generated for each unique value in the field_to_rows field
  • The value from the field_to_col_vals field is in the appropriate column
  • When more than one row has the same values for field_to_columns and field_to_rows, the field_to_col_vals value of the last such row processed is used (we get Fred, Corp. instead of Fred, Inc.)
  • Only data from the three involved fields is kept! Note that the unrelated field from the input has been lost
  • Rows lacking a value for any of the three fields are skipped when populating the dynamically created columns (see the Ghost examples)
  • However, a dynamically created column is still created even if it receives no data (see the issues example)
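
The rules above can be sketched in plain Ruby, without Kiba. This is a minimal illustration only, not the library's implementation: the method name simple_pivot_sketch and the blank-checking lambda are hypothetical, and the sketch assumes that both nil and empty strings count as "lacking a value".

```ruby
# Hypothetical stand-in for the pivot rules described above.
# It is NOT Kiba::Extend's code; names and the blank check are assumptions.
def simple_pivot_sketch(rows, field_to_columns:, field_to_rows:, field_to_col_vals:)
  columns = {} # every column name seen, in first-seen order
  pivoted = {} # row key => { column name => value }

  # Assumption: nil and empty/whitespace-only strings both count as "no value"
  blank = ->(val) { val.nil? || val.to_s.strip.empty? }

  rows.each do |row|
    col = row[field_to_columns]
    next if blank.call(col)
    columns[col.to_sym] = nil # the column exists even if it never gets data

    key = row[field_to_rows]
    val = row[field_to_col_vals]
    next if blank.call(key) || blank.call(val)

    (pivoted[key] ||= {})[col.to_sym] = val # a later row overwrites an earlier one
  end

  # One output row per unique key; columns with no data are filled with nil
  pivoted.map do |key, data|
    {field_to_rows => key}.merge(columns.keys.to_h { |c| [c, data[c]] })
  end
end

input = [
  {authority: "person", norm: "fred", term: "Fred Q.", unrelated: "foo"},
  {authority: "org", norm: "fred", term: "Fred, Inc.", unrelated: "bar"},
  {authority: "location", norm: "unknown", term: "Unknown", unrelated: "baz"},
  {authority: "person", norm: "unknown", term: "Unknown", unrelated: "fuz"},
  {authority: "org", norm: "unknown", term: "Unknown", unrelated: "aaa"},
  {authority: "work", norm: "book", term: "Book", unrelated: "eee"},
  {authority: "location", norm: "book", term: "", unrelated: "zee"},
  {authority: "", norm: "book", term: "Book", unrelated: "squeee"},
  {authority: nil, norm: "ghost", term: "Ghost", unrelated: "boo"},
  {authority: "location", norm: "", term: "Ghost", unrelated: "zoo"},
  {authority: "location", norm: "ghost", term: nil, unrelated: "poo"},
  {authority: "org", norm: "fred", term: "Fred, Corp.", unrelated: "bar"},
  {authority: "issues", norm: nil, term: nil, unrelated: "bah"}
]

result = simple_pivot_sketch(
  input,
  field_to_columns: :authority,
  field_to_rows: :norm,
  field_to_col_vals: :term
)
result.each { |row| puts row.inspect }
```

Under those assumptions the sketch yields three rows (fred, unknown, book) with the columns person, org, location, work and issues, matching the result table above. No ghost row appears, and the unrelated field is gone.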

Instance Method Summary

Constructor Details

#initialize(field_to_columns:, field_to_rows:, field_to_col_vals:) ⇒ SimplePivot

Returns a new instance of SimplePivot.

Parameters:

  • field_to_columns (Symbol)

    field whose values will generate new columns

  • field_to_rows (Symbol)

    field whose values will generate the result rows (data is collapsed on values in this field)

  • field_to_col_vals (Symbol)

    field whose values get put in the new columns per row



# File 'lib/kiba/extend/transforms/reshape/simple_pivot.rb', line 76

def initialize(field_to_columns:, field_to_rows:, field_to_col_vals:)
  @col_field = field_to_columns
  @row_field = field_to_rows
  @col_val_field = field_to_col_vals
  @rows = {}
  @columns = {}
end

Instance Method Details

#close ⇒ Object



# File 'lib/kiba/extend/transforms/reshape/simple_pivot.rb', line 90

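def close
  # Emit one output row per unique value of the row field
  @rows.each do |fieldval, data|
    row = {@row_field => fieldval}
    row = row.merge(data)
    row_fields = row.keys.freeze
    # Make sure every generated column is present, filling gaps with nil
    @columns.keys.each do |field|
      row[field] = nil unless row_fields.any?(field)
    end
    yield row
  end
end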

#process(row) ⇒ Object

Parameters:

  • row (Hash{ Symbol => String, nil })


# File 'lib/kiba/extend/transforms/reshape/simple_pivot.rb', line 85

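def process(row)
  # Record this row's data for the pivot (helper not shown on this page)
  gather_column_field(row)
  # Return nil so the original row is dropped; output rows come from #close
  nil
end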
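
The two methods above work as a pair: #process swallows each incoming row by returning nil, and #close yields the accumulated rows once the source is exhausted. The sketch below illustrates that contract with a hand-rolled driver. Both CountByKey and run_transform are hypothetical names invented for this illustration; a real Kiba runner does considerably more, and this assumes only that it calls process per row and then close with a block.

```ruby
# Hypothetical buffering transform with the same process/close shape
# as SimplePivot. It counts rows per value of one field.
class CountByKey
  def initialize(field:)
    @field = field
    @counts = Hash.new(0)
  end

  # Buffer the information and return nil, so no row passes through yet
  def process(row)
    @counts[row[@field]] += 1
    nil
  end

  # Yield the buffered results once all input rows have been seen
  def close
    @counts.each { |val, count| yield({@field => val, count: count}) }
  end
end

# Hypothetical, simplified stand-in for a runner (not Kiba's actual code)
def run_transform(transform, rows)
  out = []
  rows.each do |row|
    result = transform.process(row)
    out << result if result # nil results are dropped from the stream
  end
  transform.close { |row| out << row } if transform.respond_to?(:close)
  out
end

emitted = run_transform(
  CountByKey.new(field: :norm),
  [{norm: "fred"}, {norm: "fred"}, {norm: "book"}]
)
emitted.each { |row| puts row.inspect }
```

Because nothing is emitted until close runs, any transform built this way keeps its whole working set in memory, which is the reason for the warning in the Overview.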