Class: Kiba::Extend::Transforms::Deduplicate::FlagAll

Inherits:
Object
Defined in:
lib/kiba/extend/transforms/deduplicate/flag_all.rb

Overview

Adds a field (specified as in_field) containing ‘y’ or ‘n’, indicating whether the value of on_field is a duplicate.

In contrast with Flag, where the first instance of a value in on_field is always marked ‘n’, FlagAll marks every row whose on_field value occurs more than once with ‘y’, including the first such row.
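The difference between the two rules can be sketched on a plain list of values. Both helpers below (`flag_repeats` and `flag_all`) are hypothetical illustrations, not part of Kiba::Extend; `flag_all` relies on `Array#tally`, which needs Ruby 2.7 or later.

```ruby
# Hypothetical helpers contrasting the two flagging rules on a plain list
# of values; neither is part of Kiba::Extend.

# Flag-style rule: the first occurrence of a value is "n", later repeats are "y".
def flag_repeats(values)
  seen = {}
  values.map do |val|
    flag = seen.key?(val) ? "y" : "n"
    seen[val] = true
    flag
  end
end

# FlagAll-style rule: every occurrence of a value that appears more than once is "y".
def flag_all(values)
  counts = values.tally # value => number of occurrences
  values.map { |val| counts[val] > 1 ? "y" : "n" }
end

values = ["a b", "c d", "c e", "c d"]
p flag_repeats(values) # => ["n", "n", "n", "y"]
p flag_all(values)     # => ["n", "y", "n", "y"]
```

The FlagAll-style rule needs the complete count before it can flag any value, which is why it cannot decide row by row as values stream past.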

Input table:

| foo | bar | combined |
|-----|-----|----------|
| a   | b   | a b      |
| c   | d   | c d      |
| c   | e   | c e      |
| c   | d   | c d      |

Used in pipeline as:

  transform Deduplicate::FlagAll, on_field: :combined, in_field: :duplicate

Results in:

| foo | bar | combined | duplicate |
|-----|-----|----------|-----------|
| a   | b   | a b      | n         |
| c   | d   | c d      | y         |
| c   | e   | c e      | n         |
| c   | d   | c d      | y         |
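The example tables can be reproduced without Kiba using a small sketch. `flag_all_rows` is a hypothetical helper that applies the same rule to an array of row hashes in one call; unlike the transform, which sets in_field on the incoming rows, it returns new hashes via `Hash#merge` and leaves the input untouched. It also accepts an explicit_no flag mirroring the constructor option. Requires Ruby 2.7+ for `Array#tally`.

```ruby
# A plain-Ruby sketch (no Kiba required) of the FlagAll rule applied to the
# example table. `flag_all_rows` is a hypothetical helper, not library API.
def flag_all_rows(rows, on_field:, in_field:, explicit_no: true)
  no_val = explicit_no ? "n" : ""
  # Count how often each on_field value occurs across all rows.
  counts = rows.map { |row| row[on_field] }.tally
  # Return copies of the rows with the flag field added.
  rows.map do |row|
    row.merge(in_field => (counts[row[on_field]] > 1 ? "y" : no_val))
  end
end

input = [
  {foo: "a", bar: "b", combined: "a b"},
  {foo: "c", bar: "d", combined: "c d"},
  {foo: "c", bar: "e", combined: "c e"},
  {foo: "c", bar: "d", combined: "c d"}
]

result = flag_all_rows(input, on_field: :combined, in_field: :duplicate)
result.each { |row| puts row.values.join(" | ") }
# a | b | a b | n
# c | d | c d | y
# c | e | c e | n
# c | d | c d | y
```

Passing `explicit_no: false` to the sketch leaves the non-duplicate rows with an empty string instead of ‘n’, giving `["", "y", "", "y"]` for the duplicate column.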

Since:

  • 2.9.0

Instance Method Summary

  • #close ⇒ Object
  • #process(row) ⇒ Object

Constructor Details

#initialize(on_field:, in_field:, explicit_no: true) ⇒ FlagAll

Returns a new instance of FlagAll.

Parameters:

  • on_field (Symbol)

    Field on which to deduplicate

  • in_field (Symbol)

    New field in which to add ‘y’ or ‘n’

  • explicit_no (Boolean) (defaults to: true)

    If false, the in_field value for non-duplicate rows is left blank (an empty string) instead of ‘n’.

Since:

  • 2.9.0

# File 'lib/kiba/extend/transforms/deduplicate/flag_all.rb', line 52

def initialize(on_field:, in_field:, explicit_no: true)
  @on = on_field
  @in_field = in_field
  @deduper = {}
  @no_val = explicit_no ? "n" : ""
  @rows = []
end

Instance Method Details

#close ⇒ Object

Since:

  • 2.9.0

# File 'lib/kiba/extend/transforms/deduplicate/flag_all.rb', line 68

def close
  @rows.each do |row|
    val = row[on]
    row[in_field] = (deduper[val] > 1) ? "y" : no_val
    yield row
  end
end
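Because a value's status depends on every row, FlagAll cannot emit rows from #process: it counts each value, buffers the row, and returns nil, then flags and yields all buffered rows from #close. This also means the entire input is held in memory until the end of the run. The sketch below illustrates that process/close contract. `run_transform` is a hypothetical stand-in for a Kiba runner (it only approximates the behaviour of Kiba's streaming runner, where a transform's close may yield rows), and `FlagAllSketch` copies the source shown on this page with the private readers it relies on written out, which is an assumption since they are not shown here.

```ruby
# Hypothetical stand-in for Kiba's runner: feed every row through #process,
# keep any non-nil return values, then let #close yield buffered rows.
# This approximates the contract only; it is not Kiba's implementation.
def run_transform(transform, rows)
  output = []
  rows.each do |row|
    result = transform.process(row)
    output << result unless result.nil?
  end
  transform.close { |row| output << row } if transform.respond_to?(:close)
  output
end

# Mirrors the FlagAll source on this page; the private readers are assumed.
class FlagAllSketch
  def initialize(on_field:, in_field:, explicit_no: true)
    @on = on_field
    @in_field = in_field
    @deduper = {}
    @no_val = explicit_no ? "n" : ""
    @rows = []
  end

  # Counts the value and buffers the row; returning nil emits nothing yet.
  def process(row)
    val = row[on]
    deduper.key?(val) ? deduper[val] += 1 : deduper[val] = 1
    rows << row
    nil
  end

  # With all counts known, flags and yields every buffered row in input order.
  def close
    rows.each do |row|
      val = row[on]
      row[in_field] = (deduper[val] > 1) ? "y" : no_val
      yield row
    end
  end

  private

  attr_reader :on, :in_field, :deduper, :no_val, :rows
end

input = [{combined: "a b"}, {combined: "c d"}, {combined: "c e"}, {combined: "c d"}]
out = run_transform(FlagAllSketch.new(on_field: :combined, in_field: :duplicate), input)
p out.map { |row| row[:duplicate] } # => ["n", "y", "n", "y"]
```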

#process(row) ⇒ Object

Parameters:

  • row (Hash{ Symbol => String, nil })

Since:

  • 2.9.0

# File 'lib/kiba/extend/transforms/deduplicate/flag_all.rb', line 61

def process(row)
  val = row[on]
  deduper.key?(val) ? deduper[val] += 1 : deduper[val] = 1
  rows << row
  nil
end
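Since #process uses `row[on]` directly as a hash key, rows whose on_field value is nil, or whose on_field key is missing altogether, are all counted under the key nil. If two or more such rows occur, each of them is flagged ‘y’. The snippet below replays the counting line from #process on a small hypothetical dataset to show this; whether that behaviour suits your data (for example, whether blank values should count as duplicates) is worth checking before relying on the flag.

```ruby
# Replays the counting step from #process on hypothetical rows that
# include nil values. Rows with a nil or missing :combined share the key nil.
rows = [
  {combined: "a b"},
  {combined: nil},
  {foo: "x"},        # no :combined key, so row[:combined] is nil
  {combined: "c d"}
]

deduper = {}
rows.each do |row|
  val = row[:combined]
  deduper.key?(val) ? deduper[val] += 1 : deduper[val] = 1
end

# Same test #close applies: a count above 1 means "y".
flags = rows.map { |row| deduper[row[:combined]] > 1 ? "y" : "n" }
p flags # => ["n", "y", "y", "n"]
```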