Class: Kiba::Extend::Transforms::Deduplicate::Flag

Inherits:
Object
Defined in:
lib/kiba/extend/transforms/deduplicate/flag.rb

Overview

Adds a field (in_field) containing ‘y’ or ‘n’, indicating whether the value of on_field is a duplicate of a value seen in an earlier row.

The first instance of a value in on_field is always marked ‘n’ (or left blank when explicit_no is false). Subsequent rows containing the same value are marked ‘y’.
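
The flagging rule can be sketched in plain Ruby, outside a Kiba pipeline. This is a minimal illustration, not the library's code: `flag_duplicates` is a hypothetical helper that applies the same seen-value check to an array of Hash rows, here using the example data from this page.

```ruby
# Hypothetical helper (not part of kiba-extend): flags each row whose
# on_field value has already been seen earlier in the array.
def flag_duplicates(rows, on_field:, in_field:)
  seen = {}
  rows.map do |row|
    val = row.fetch(on_field)
    flagged = row.dup          # leave the input rows untouched
    if seen.key?(val)
      flagged[in_field] = "y"  # value seen before: duplicate
    else
      seen[val] = nil          # first occurrence: remember the value
      flagged[in_field] = "n"
    end
    flagged
  end
end

rows = [
  {foo: "a", bar: "b", combined: "a b"},
  {foo: "c", bar: "d", combined: "c d"},
  {foo: "c", bar: "e", combined: "c e"},
  {foo: "c", bar: "d", combined: "c d"}
]

flags = flag_duplicates(rows, on_field: :combined, in_field: :duplicate).map { |r| r[:duplicate] }
p flags # => ["n", "n", "n", "y"]
```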

Use this transform if you need to retain/report on what will be treated as a duplicate. Use FilterRows::FieldEqualTo to extract only the duplicate rows and/or to keep only the non-duplicate rows.
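
Once the flag field exists, separating duplicates from first occurrences is a filter on its value. The sketch below uses plain `Array#partition` on rows that already carry a `:duplicate` field; it only stands in for the `FilterRows::FieldEqualTo` transform and does not reproduce that transform's API.

```ruby
# Rows as they look after the flag has been added (see the Results table).
flagged = [
  {combined: "a b", duplicate: "n"},
  {combined: "c d", duplicate: "n"},
  {combined: "c e", duplicate: "n"},
  {combined: "c d", duplicate: "y"}
]

# Split into duplicates (flagged "y") and rows to keep (everything else).
dupes, keep = flagged.partition { |row| row[:duplicate] == "y" }

puts "kept: #{keep.map { |r| r[:combined] }.inspect}"
puts "duplicates: #{dupes.map { |r| r[:combined] }.inspect}"
```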

Use FlagAll if you need every row containing a duplicated value flagged ‘y’, including the first occurrence.
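
To flag the first occurrence too, a transform has to know whether the value occurs again later, so it cannot decide row by row the way Flag does. The following is a hypothetical two-pass sketch of that behavior, assuming all rows fit in memory; `flag_all_duplicates` is an illustrative name, and this is not FlagAll's actual implementation.

```ruby
# Hypothetical sketch: flag every row whose value occurs more than once.
def flag_all_duplicates(rows, on_field:, in_field:)
  # First pass: count occurrences of each value.
  counts = Hash.new(0)
  rows.each { |row| counts[row.fetch(on_field)] += 1 }
  # Second pass: mark rows whose value appears more than once.
  rows.map do |row|
    row.merge(in_field => counts[row.fetch(on_field)] > 1 ? "y" : "n")
  end
end

rows = [{combined: "a b"}, {combined: "c d"}, {combined: "c e"}, {combined: "c d"}]
flags = flag_all_duplicates(rows, on_field: :combined, in_field: :duplicate).map { |r| r[:duplicate] }
p flags # => ["n", "y", "n", "y"]
```

Compared with the single pass Flag uses, this needs the whole table up front.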

To delete duplicates all in one step, use Deduplicate::Table.

Input table:

| foo | bar | combined  |
|-----------------------|
| a   | b   | a b       |
| c   | d   | c d       |
| c   | e   | c e       |
| c   | d   | c d       |

Used in pipeline as:

  @deduper = {}
  transform Deduplicate::Flag, on_field: :combined, in_field: :duplicate, using: @deduper

Results in:

| foo | bar | combined | duplicate |
|----------------------------------|
| a   | b   | a b      | n         |
| c   | d   | c d      | n         |
| c   | e   | c e      | n         |
| c   | d   | c d      | y         |
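
Because `using` is a Hash supplied by the job rather than created inside the transform, any Flag instances given the same Hash share one record of seen values. The sketch below illustrates that consequence with `FlagSketch`, a hypothetical stand-in class that copies the logic of the `process` source on this page; it does not load kiba-extend.

```ruby
# Hypothetical stand-in mirroring Deduplicate::Flag (not the real class).
class FlagSketch
  def initialize(on_field:, in_field:, using:, explicit_no: true)
    @on = on_field
    @in_field = in_field
    @using = using
    @no_val = explicit_no ? "n" : ""
  end

  def process(row)
    val = row.fetch(@on)
    if @using.key?(val)
      row[@in_field] = "y"
    else
      @using[val] = nil  # record first occurrence in the shared Hash
      row[@in_field] = @no_val
    end
    row
  end
end

deduper = {} # plays the role of @deduper in the job definition
first = FlagSketch.new(on_field: :combined, in_field: :duplicate, using: deduper)
second = FlagSketch.new(on_field: :combined, in_field: :duplicate, using: deduper)

a = first.process({combined: "c d"})
b = second.process({combined: "c d"}) # same value, different instance
p [a[:duplicate], b[:duplicate]]      # => ["n", "y"]
p deduper.keys                        # => ["c d"]
```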

Defined Under Namespace

Classes: NoUsingValueError


Constructor Details

#initialize(on_field:, in_field:, using:, explicit_no: true) ⇒ Flag

Returns a new instance of Flag.

Parameters:

  • on_field (Symbol)

    Field on which to deduplicate

  • in_field (Symbol)

    New field in which to add ‘y’ or ‘n’

  • using (Hash)

    An empty Hash, set as an instance variable in your job definition before you use this transform

  • explicit_no (Boolean) (defaults to: true)

    If false, the in_field value for non-duplicates is left blank
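
With `explicit_no: false`, first occurrences get an empty string instead of ‘n’, so only duplicates carry a visible flag. A minimal sketch of just that choice; `flag_value` is a hypothetical helper mirroring the `@no_val` logic in the constructor source.

```ruby
# Hypothetical helper mirroring how Flag chooses the in_field value.
def flag_value(seen, val, explicit_no: true)
  no_val = explicit_no ? "n" : "" # same choice the constructor makes
  return "y" if seen.key?(val)
  seen[val] = nil
  no_val
end

seen = {}
values = ["a b", "c d", "c d"].map { |v| flag_value(seen, v, explicit_no: false) }
p values # => ["", "", "y"]
```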

# File 'lib/kiba/extend/transforms/deduplicate/flag.rb', line 59

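def initialize(on_field:, in_field:, using:, explicit_no: true)
  @on = on_field
  @in_field = in_field
  @using = using
  # Any Hash (even an empty one) is truthy, so this only catches nil/false,
  # e.g. an instance variable that was never set in the job definition
  unless @using
    raise NoUsingValueError,
      "#{self.class.name} `using` hash does not exist"
  end
  # Value written to in_field for first occurrences
  @no_val = explicit_no ? "n" : ""
end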
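
The guard in `initialize` rejects only a missing (`nil` or `false`) `using` value. In Ruby every Hash, including an empty one, is truthy, which is why `@deduper = {}` passes, while an instance variable that was never assigned evaluates to `nil` and triggers the error. A sketch with a hypothetical `GuardSketch` class that reproduces just this check:

```ruby
# Hypothetical copy of the constructor's guard; not the kiba-extend class.
class GuardSketch
  class NoUsingValueError < StandardError; end

  def initialize(using:)
    unless using
      raise NoUsingValueError, "#{self.class.name} `using` hash does not exist"
    end
    @using = using
  end
end

GuardSketch.new(using: {}) # empty Hash is truthy: no error

begin
  # An instance variable that was never assigned evaluates to nil.
  GuardSketch.new(using: @never_set)
rescue GuardSketch::NoUsingValueError => e
  puts e.message # => GuardSketch `using` hash does not exist
end
```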

Instance Method Details

#process(row) ⇒ Object

Parameters:

  • row (Hash{ Symbol => String, nil })

# File 'lib/kiba/extend/transforms/deduplicate/flag.rb', line 71

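def process(row)
  val = row.fetch(on)
  if using.key?(val)
    # Value already recorded: this row is a duplicate
    row[in_field] = "y"
  else
    # First occurrence: record the value in the shared Hash
    using[val] = nil
    row[in_field] = no_val
  end
  row
end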
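
Two consequences follow from `process` using `row.fetch(on)` and `using.key?`: a row that lacks on_field entirely raises `KeyError` rather than being skipped, and a `nil` value is tracked like any other, so a second row with a `nil` on_field is flagged ‘y’. A minimal sketch with a hypothetical `process_sketch` method copying that logic:

```ruby
# Hypothetical copy of the process logic, parameterized for illustration.
def process_sketch(row, on:, in_field:, using:, no_val: "n")
  val = row.fetch(on) # raises KeyError if on is not a key of row
  if using.key?(val)
    row[in_field] = "y"
  else
    using[val] = nil
    row[in_field] = no_val
  end
  row
end

seen = {}
r1 = process_sketch({combined: nil}, on: :combined, in_field: :duplicate, using: seen)
r2 = process_sketch({combined: nil}, on: :combined, in_field: :duplicate, using: seen)
p [r1[:duplicate], r2[:duplicate]] # => ["n", "y"]

begin
  process_sketch({foo: "a"}, on: :combined, in_field: :duplicate, using: {})
rescue KeyError => e
  puts "KeyError: #{e.message}"
end
```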