Switching to using an instance of the SmarterCSV::Reader class #279

Merged
merged 17 commits into from
Jul 8, 2024
10 changes: 8 additions & 2 deletions .rubocop.yml
@@ -25,7 +25,7 @@ Metrics/BlockNesting:
Metrics/ClassLength:
Enabled: false

-Metrics/CyclomaticComplexity: # BS rule
+Metrics/CyclomaticComplexity:
Enabled: false

Metrics/MethodLength:
@@ -34,7 +34,7 @@ Metrics/MethodLength:
Metrics/ModuleLength:
Enabled: false

-Metrics/PerceivedComplexity: # BS rule
+Metrics/PerceivedComplexity:
Enabled: false

Naming/PredicateName:
@@ -46,6 +46,9 @@ Naming/VariableName:
Naming/VariableNumber:
Enabled: false

Style/AccessorGrouping: # not needed
Enabled: false

Style/ClassEqualityComparison:
Enabled: false

@@ -88,6 +91,9 @@ Style/IfInsideElse:
Style/IfUnlessModifier:
Enabled: false

Style/InverseMethods:
Enabled: false

Style/NestedTernaryOperator:
Enabled: false

26 changes: 26 additions & 0 deletions CHANGELOG.md
@@ -1,6 +1,32 @@

# SmarterCSV 1.x Change Log

## 1.12.0 (2024-07-04)
* added SmarterCSV::Reader to process CSV files ([issue #277](https://github.com/tilo/smarter_csv/pull/277))

* POTENTIAL BREAKING CHANGE:

This version replaces `SmarterCSV.process(file_or_input, options, &block)` with:
```
reader = SmarterCSV::Reader.new(file_or_input, options)

# either simple one-liner:
data = reader.process

# or block format:
data = reader.process do
# do something here
end
```
There is some backwards-compatibility support for calling `SmarterCSV.process`,
but it no longer provides access to the internal state, e.g. `raw_headers`.

Please update your code to create an instance of `SmarterCSV::Reader`.

`SmarterCSV.raw_headers` -> `reader.raw_headers`
`SmarterCSV.headers` -> `reader.headers`
...
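
To make the motivation for the instance-based API concrete, here is a minimal stand-alone sketch. `TinyReader` is a hypothetical stand-in written for this note, not the gem's `SmarterCSV::Reader`; it only mimics the `new(input, options)` / `process` / `headers` shape described above and does no quoting, type conversion, or auto-detection. The point it shows is that each reader keeps its own `raw_headers` and `headers`, which module-level state cannot do for two inputs at once.

```ruby
require "stringio"

# Hypothetical stand-in for illustration only; NOT the real SmarterCSV::Reader.
class TinyReader
  attr_reader :raw_headers, :headers

  def initialize(input, options = {})
    @input = input
    @col_sep = options.fetch(:col_sep, ",")
  end

  # Returns an array of hashes; yields each hash when a block is given.
  def process
    lines = @input.each_line.map(&:chomp).reject(&:empty?)
    @raw_headers = lines.shift.to_s
    @headers = @raw_headers.split(@col_sep).map { |h| h.strip.downcase.to_sym }
    lines.map do |line|
      row = @headers.zip(line.split(@col_sep).map(&:strip)).to_h
      yield row if block_given?
      row
    end
  end
end

first  = TinyReader.new(StringIO.new("Name,Age\nAnn,41\n"))
second = TinyReader.new(StringIO.new("City;Zip\nOslo;0150\n"), col_sep: ";")

first_rows  = first.process
second_rows = second.process

# Each instance still knows its own headers after both have run.
puts first.headers.inspect
puts second.headers.inspect
```

With the gem itself, the corresponding calls would be the ones listed in this changelog entry: `SmarterCSV::Reader.new(file_or_input, options)`, `reader.process`, `reader.headers`, and `reader.raw_headers`.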

## 1.11.0 (2024-07-02)
* added SmarterCSV::Writer to output CSV files ([issue #44](https://github.com/tilo/smarter_csv/issues/44))

25 changes: 25 additions & 0 deletions README.md
@@ -13,6 +13,31 @@ When writing CSV data to file, it similarly takes arrays of hashes, and converts

#### BREAKING CHANGES

* Version 1.12.0 has BREAKING CHANGES:

POTENTIAL BREAKING CHANGE:

This version replaces `SmarterCSV.process(file_or_input, options, &block)` with:
```
reader = SmarterCSV::Reader.new(file_or_input, options)

# either simple one-liner:
data = reader.process

# or block format:
data = reader.process do
# do something here
end
```
There is some backwards-compatibility support for calling `SmarterCSV.process`,
but it no longer provides access to the internal state, e.g. `raw_headers`.

Please update your code to create an instance of `SmarterCSV::Reader`.

`SmarterCSV.raw_headers` -> `reader.raw_headers`
`SmarterCSV.headers` -> `reader.headers`
...

* Version 1.10.0 had BREAKING CHANGES:

Changed behavior:
6 changes: 4 additions & 2 deletions ext/smarter_csv/smarter_csv.c
@@ -87,9 +87,11 @@ static VALUE rb_parse_csv_line(VALUE self, VALUE line, VALUE col_sep, VALUE quot
}

VALUE SmarterCSV = Qnil;
+VALUE Parser = Qnil;

void Init_smarter_csv(void) {
-VALUE SmarterCSV = rb_define_module("SmarterCSV");
+SmarterCSV = rb_define_module("SmarterCSV");
+Parser = rb_define_module_under(SmarterCSV, "Parser");

-rb_define_module_function(SmarterCSV, "parse_csv_line_c", rb_parse_csv_line, 4);
+rb_define_module_function(Parser, "parse_csv_line_c", rb_parse_csv_line, 4);
}
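
For readers less familiar with the C API: `rb_define_module_function` behaves like Ruby's `module_function`, so after this change the function is callable on `SmarterCSV::Parser` itself and is also available as a private instance method to anything that includes that module. The sketch below shows the same arrangement in plain Ruby. `Demo::Parser`, `Demo::Reader`, and `split_line` are hypothetical names, and the splitting logic is a naive placeholder, not the gem's parser.

```ruby
module Demo
  module Parser
    module_function

    # Naive stand-in for the C function: no quote handling at all.
    def split_line(line, col_sep)
      line.chomp.split(col_sep)
    end
  end

  class Reader
    include Parser

    def first_field(line)
      # Resolves to the private instance method copied in from Parser.
      split_line(line, ",").first
    end
  end
end

direct     = Demo::Parser.split_line("a,b,c\n", ",")
via_reader = Demo::Reader.new.first_field("x,y\n")
is_private = Demo::Reader.private_instance_methods.include?(:split_line)

puts direct.inspect
puts via_reader
puts is_private
```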
45 changes: 41 additions & 4 deletions lib/smarter_csv.rb
@@ -1,16 +1,19 @@
# frozen_string_literal: true

require "smarter_csv/version"
require "smarter_csv/errors"

require "smarter_csv/file_io"
require "smarter_csv/options_processing"
require "smarter_csv/options"
require "smarter_csv/auto_detection"
require "smarter_csv/variables"
require 'smarter_csv/header_transformations'
require 'smarter_csv/header_validations'
require "smarter_csv/headers"
require "smarter_csv/hash_transformations"
require "smarter_csv/parse"

require "smarter_csv/parser"
require "smarter_csv/writer"
require "smarter_csv/reader"

# load the C-extension:
case RUBY_ENGINE
@@ -49,4 +52,38 @@
BLOCK_COMMENT
end
# :nocov:
require "smarter_csv/smarter_csv"

module SmarterCSV
# For backwards compatibility only
# while `SmarterCSV.process` works for simple cases, you can't get access to the internal state any longer.
# e.g. you need the instance of the Reader to access the original headers
#
# Please use this instead:
#
# reader = SmarterCSV::Reader.new(input, options)
# reader.process # with or without block
#
def self.process(input, given_options = {}, &block)
reader = Reader.new(input, given_options)
reader.process(&block)
end

# SmarterCSV.generate(filename, options) do |csv_writer|
# MyModel.find_in_batches(batch_size: 100) do |batch|
# batch.pluck(:name, :description, :instructor).each do |record|
# csv_writer << record
# end
# end
# end
#
# rubocop:disable Lint/UnusedMethodArgument
def self.generate(filename, options = {}, &block)
raise unless block_given?

writer = Writer.new(filename, options)
yield writer
ensure
writer&.finalize # writer is still nil if the block check above raised
end
# rubocop:enable Lint/UnusedMethodArgument
end
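
A note on the `ensure` in `generate`: when no block is given, the guard raises before `writer` is assigned, so the `ensure` clause runs with `writer` set to `nil`. The sketch below isolates that shape. `TinyWriter` and the top-level `generate` are hypothetical stand-ins (the real `SmarterCSV::Writer` is not used here), and the `ArgumentError` with a message is a suggestion of this sketch, not what the PR raises. Safe navigation in `ensure` keeps the original error from being replaced by a `NoMethodError` on `nil`.

```ruby
require "stringio"

# Hypothetical writer used only for this sketch.
class TinyWriter
  attr_reader :finalized

  def initialize(io)
    @io = io
    @finalized = false
  end

  def <<(row)
    @io.puts(row.join(","))
    self
  end

  def finalize
    @finalized = true
  end
end

def generate(io)
  raise ArgumentError, "a block is required" unless block_given?

  writer = TinyWriter.new(io)
  yield writer
ensure
  # writer is nil when the guard above raised, so use safe navigation
  writer&.finalize
end

out = StringIO.new
generate(out) { |csv| csv << %w[a b] << %w[c d] }

# Without a block, the caller sees the ArgumentError, not a NoMethodError.
error = begin
  generate(StringIO.new)
rescue ArgumentError => e
  e
end

puts out.string
puts error.class
```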
2 changes: 1 addition & 1 deletion lib/smarter_csv/auto_detection.rb
@@ -1,7 +1,7 @@
# frozen_string_literal: true

module SmarterCSV
-class << self
+module AutoDetection
protected

# If file has headers, then guesses column separator from headers.
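
The rest of this method is collapsed in the diff view. As a rough illustration of what "guessing the column separator from the header line" can look like, here is a sketch that counts candidate delimiters and picks the most frequent one. The candidate list, the method name `guess_col_sep`, and the `ArgumentError` are assumptions of this sketch; the gem's own implementation and its `NoColSepDetected` error are not reproduced here, and quoted fields are ignored.

```ruby
# Hypothetical candidate list for this sketch.
CANDIDATES = [",", "\t", ";", ":", "|"].freeze

def guess_col_sep(header_line, candidates = CANDIDATES)
  # Count how often each candidate appears in the header line.
  counts = candidates.to_h { |sep| [sep, header_line.count(sep)] }
  best, hits = counts.max_by { |_sep, n| n }
  raise ArgumentError, "no column separator detected" if hits.zero?

  best
end

semicolon = guess_col_sep("id;name;email")
tab       = guess_col_sep("id\tname")

undetected = begin
  guess_col_sep("singlecolumn")
  false
rescue ArgumentError
  true
end

puts semicolon.inspect
puts tab.inspect
puts undetected
```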
16 changes: 16 additions & 0 deletions lib/smarter_csv/errors.rb
@@ -0,0 +1,16 @@
# frozen_string_literal: true

module SmarterCSV
class Error < StandardError; end # new code should rescue this instead
# Reader:
class SmarterCSVException < Error; end # for backwards compatibility
class HeaderSizeMismatch < SmarterCSVException; end
class IncorrectOption < SmarterCSVException; end
class ValidationError < SmarterCSVException; end
class DuplicateHeaders < SmarterCSVException; end
class MissingKeys < SmarterCSVException; end # previously known as MissingHeaders
class NoColSepDetected < SmarterCSVException; end
class KeyMappingError < SmarterCSVException; end
# Writer:
class InvalidInputData < SmarterCSVException; end
end
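
The practical effect of this hierarchy is that a single `rescue` of the new base class catches every library error, while existing code that rescues the older class keeps working. The sketch below reproduces the shape with hypothetical names under `Demo` (with `LegacyException` playing the role of `SmarterCSVException`), so it runs without the gem.

```ruby
module Demo
  class Error < StandardError; end
  class LegacyException < Error; end # plays the role of SmarterCSVException
  class MissingKeys < LegacyException; end
  class InvalidInputData < LegacyException; end
end

def classify
  yield
  :ok
rescue Demo::MissingKeys
  :missing_keys          # specific handling first
rescue Demo::Error
  :other_library_error   # base class catches everything else
end

results = [
  classify { raise Demo::MissingKeys, "missing: id" },
  classify { raise Demo::InvalidInputData, "bad row" },
  classify { 1 + 1 },
]

puts results.inspect
```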
2 changes: 1 addition & 1 deletion lib/smarter_csv/file_io.rb
@@ -1,7 +1,7 @@
# frozen_string_literal: true

module SmarterCSV
-class << self
+module FileIO
protected

def readline_with_counts(filehandle, options)
2 changes: 1 addition & 1 deletion lib/smarter_csv/hash_transformations.rb
@@ -1,7 +1,7 @@
# frozen_string_literal: true

module SmarterCSV
-class << self
+module HashTransformations
def hash_transformations(hash, options)
# there may be unmapped keys, or keys purposely mapped to nil or an empty key.
# make sure we delete any key/value pairs from the hash, which the user wanted to delete:
2 changes: 1 addition & 1 deletion lib/smarter_csv/header_transformations.rb
@@ -1,7 +1,7 @@
# frozen_string_literal: true

module SmarterCSV
-class << self
+module HeaderTransformations
# transform the headers that were in the file:
def header_transformations(header_array, options)
header_array.map!{|x| x.gsub(%r/#{options[:quote_char]}/, '')}
4 changes: 2 additions & 2 deletions lib/smarter_csv/header_validations.rb
@@ -1,7 +1,7 @@
# frozen_string_literal: true

module SmarterCSV
-class << self
+module HeaderValidations
def header_validations(headers, options)
check_duplicate_headers(headers, options)
check_required_headers(headers, options)
@@ -26,7 +26,7 @@ def check_required_headers(headers, options)
missing_keys = options[:required_keys].select { |k| !headers_set.include?(k) }

unless missing_keys.empty?
-raise SmarterCSV::MissingKeys, "ERROR: missing attributes: #{missing_keys.join(',')}. Check `SmarterCSV.headers` for original headers."
+raise SmarterCSV::MissingKeys, "ERROR: missing attributes: #{missing_keys.join(',')}. Check `reader.headers` for original headers."
end
end
end
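
The required-keys check in this hunk can be tried in isolation with the sketch below. `MissingKeysError` and `check_required_keys` are hypothetical names for this note, and the message is shortened. It uses `reject` where the diff uses `select` with a negated condition, which selects the same elements.

```ruby
require "set"

class MissingKeysError < StandardError; end # hypothetical, for this sketch only

def check_required_keys(headers, required_keys)
  return if required_keys.nil? || required_keys.empty?

  headers_set = headers.to_set
  missing = required_keys.reject { |k| headers_set.include?(k) }
  return if missing.empty?

  raise MissingKeysError, "missing attributes: #{missing.join(',')}"
end

# All required keys present: returns nil without raising.
ok = check_required_keys(%i[id name email], %i[id email])

# Some keys absent: the error message names exactly the missing ones.
message = begin
  check_required_keys(%i[id name], %i[id email zip])
rescue MissingKeysError => e
  e.message
end

puts ok.inspect
puts message
```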
2 changes: 1 addition & 1 deletion lib/smarter_csv/headers.rb
@@ -1,7 +1,7 @@
# frozen_string_literal: true

module SmarterCSV
-class << self
+module Headers
def process_headers(filehandle, options)
@raw_header = nil # header as it appears in the file
@headers = nil # the processed headers
@@ -1,43 +1,51 @@
# frozen_string_literal: true

module SmarterCSV
DEFAULT_OPTIONS = {
acceleration: true,
auto_row_sep_chars: 500,
chunk_size: nil,
col_sep: :auto, # was: ',',
comment_regexp: nil, # was: /\A#/,
convert_values_to_numeric: true,
downcase_header: true,
duplicate_header_suffix: '', # was: nil,
file_encoding: 'utf-8',
force_simple_split: false,
force_utf8: false,
headers_in_file: true,
invalid_byte_sequence: '',
keep_original_headers: false,
key_mapping: nil,
quote_char: '"',
remove_empty_hashes: true,
remove_empty_values: true,
remove_unmapped_keys: false,
remove_values_matching: nil,
remove_zero_values: false,
required_headers: nil,
required_keys: nil,
row_sep: :auto, # was: $/,
silence_missing_keys: false,
skip_lines: nil,
strings_as_keys: false,
strip_chars_from_headers: nil,
strip_whitespace: true,
user_provided_headers: nil,
value_converters: nil,
verbose: false,
with_line_numbers: false,
}.freeze
#
# NOTE: this is not called when "parse" methods are tested by themselves
#
# ONLY FOR BACKWARDS-COMPATIBILITY
def self.default_options
Options::DEFAULT_OPTIONS
end

module Options
DEFAULT_OPTIONS = {
acceleration: true, # whether the user wants to use acceleration
auto_row_sep_chars: 500,
chunk_size: nil,
col_sep: :auto, # was: ',',
comment_regexp: nil, # was: /\A#/,
convert_values_to_numeric: true,
downcase_header: true,
duplicate_header_suffix: '', # was: nil,
file_encoding: 'utf-8',
force_simple_split: false,
force_utf8: false,
headers_in_file: true,
invalid_byte_sequence: '',
keep_original_headers: false,
key_mapping: nil,
quote_char: '"',
remove_empty_hashes: true,
remove_empty_values: true,
remove_unmapped_keys: false,
remove_values_matching: nil,
remove_zero_values: false,
required_headers: nil,
required_keys: nil,
row_sep: :auto, # was: $/,
silence_missing_keys: false,
skip_lines: nil,
strings_as_keys: false,
strip_chars_from_headers: nil,
strip_whitespace: true,
user_provided_headers: nil,
value_converters: nil,
verbose: false,
with_line_numbers: false,
}.freeze

class << self
# NOTE: this is not called when "parse" methods are tested by themselves
def process_options(given_options = {})
puts "User provided options:\n#{pp(given_options)}\n" if given_options[:verbose]
@@ -53,13 +61,6 @@ def process_options(given_options = {})
@options
end

# NOTE: this is not called when "parse" methods are tested by themselves
#
# ONLY FOR BACKWARDS-COMPATIBILITY
def default_options
DEFAULT_OPTIONS
end

private

def validate_options!(options)
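
The body of `process_options` is collapsed in this view, so the following is only a sketch of the general pattern the file suggests: a frozen defaults hash, user options applied on top, then validation. It assumes the merge is a plain `Hash#merge`. `DEMO_DEFAULTS`, `process_demo_options`, and the `chunk_size` rule are hypothetical and do not reflect the gem's actual validations.

```ruby
DEMO_DEFAULTS = {
  col_sep: :auto,
  chunk_size: nil,
  verbose: false,
}.freeze

# Hypothetical rule: chunk_size must be nil or a positive Integer.
def validate_demo_options!(options)
  size = options[:chunk_size]
  return if size.nil? || (size.is_a?(Integer) && size.positive?)

  raise ArgumentError, "chunk_size must be nil or a positive Integer, got #{size.inspect}"
end

def process_demo_options(given_options = {})
  # merge returns a new, unfrozen Hash; the frozen defaults stay untouched
  options = DEMO_DEFAULTS.merge(given_options)
  validate_demo_options!(options)
  options
end

merged = process_demo_options({ col_sep: ";" })

rejected = begin
  process_demo_options({ chunk_size: 0 })
  false
rescue ArgumentError
  true
end

puts merged.inspect
puts rejected
```

Keeping the defaults frozen and merging into a fresh hash means one reader's options cannot leak into another's, which fits the move from module-level state to per-instance state in this PR.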
4 changes: 2 additions & 2 deletions lib/smarter_csv/parse.rb → lib/smarter_csv/parser.rb
@@ -1,7 +1,7 @@
# frozen_string_literal: true

module SmarterCSV
-class << self
+module Parser
protected

###
@@ -10,7 +10,7 @@ class << self
def parse(line, options, header_size = nil)
# puts "SmarterCSV.parse OPTIONS: #{options[:acceleration]}" if options[:verbose]

-if options[:acceleration] && has_acceleration?
+if options[:acceleration] && has_acceleration
# :nocov:
has_quotes = line =~ /#{options[:quote_char]}/
elements = parse_csv_line_c(line, options[:col_sep], options[:quote_char], header_size)
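
The condition above takes the C path only when the user has not disabled `:acceleration` and the extension is actually available. How `has_acceleration` is computed is not visible in this hunk, so the sketch below substitutes a `respond_to?` check as an assumption. `DemoParser`, `parse_native`, and `parse_ruby` are hypothetical names, and the "native" method simply upcases its output so the chosen path is observable.

```ruby
module DemoParser
  # Assumption for this sketch: the native path exists if the method is defined.
  def self.native_available?
    respond_to?(:parse_native)
  end

  # Pure-Ruby fallback; the -1 limit keeps trailing empty fields.
  def self.parse_ruby(line, col_sep)
    line.chomp.split(col_sep, -1)
  end

  def self.parse(line, options)
    if options[:acceleration] && native_available?
      parse_native(line, options[:col_sep])
    else
      parse_ruby(line, options[:col_sep])
    end
  end
end

# No native implementation defined yet: falls back to Ruby.
fallback = DemoParser.parse("a,,c\n", { acceleration: true, col_sep: "," })

# Simulate loading an extension by defining the native method afterwards.
def DemoParser.parse_native(line, col_sep)
  parse_ruby(line, col_sep).map(&:upcase)
end

accelerated = DemoParser.parse("a,,c\n", { acceleration: true, col_sep: "," })
opted_out   = DemoParser.parse("a,,c\n", { acceleration: false, col_sep: "," })

puts fallback.inspect
puts accelerated.inspect
puts opted_out.inspect
```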