Skip to content

Commit

Permalink
Merge pull request #408 from gjtorikian/why-double
Browse files Browse the repository at this point in the history
Remove superfluous sanitization
  • Loading branch information
gjtorikian authored Jul 16, 2024
2 parents 93b8a78 + a6d7c0a commit 9dd46be
Show file tree
Hide file tree
Showing 9 changed files with 26 additions and 21 deletions.
6 changes: 3 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -171,9 +171,9 @@ The `ConvertFilter` takes text and turns it into HTML. `@text`, `@config`, and `

### Sanitization

Because the web can be a scary place, HTML is automatically sanitized after the `ConvertFilter` runs and before the `NodeFilter`s are processed. This is to prevent malicious or unexpected input from entering the pipeline.
Because the web can be a scary place, **HTML is automatically sanitized** after the `ConvertFilter` runs and before the `NodeFilter`s are processed. This is to prevent malicious or unexpected input from entering the pipeline.

The sanitization process takes a hash configuration of settings. See the [Selma](https://www.github.com/gjtorikian/selma) documentation for more information on how to configure these settings.
The sanitization process takes a hash configuration of settings. See the [Selma](https://www.github.com/gjtorikian/selma) documentation for more information on how to configure these settings. Note that users must correctly configure the sanitization configuration if they expect to use it correctly in conjunction with handlers which manipulate HTML.

A default sanitization config is provided by this library (`HTMLPipeline::SanitizationFilter::DEFAULT_CONFIG`). A sample custom sanitization allowlist might look like this:

Expand Down Expand Up @@ -224,7 +224,7 @@ For more examples of customizing the sanitization process to include the tags yo

`NodeFilters`s can operate either on HTML elements or text nodes using CSS selectors. Each `NodeFilter` must define a method named `selector` which provides an instance of `Selma::Selector`. If elements are being manipulated, `handle_element` must be defined, taking one argument, `element`; if text nodes are being manipulated, `handle_text_chunk` must be defined, taking one argument, `text_chunk`. `@config`, and `@result` are available to use, and any changes made to these ivars are passed on to the next filter.

`NodeFilter` also has an optional method, `after_initialize`, which is run after the filter initializes. This can be useful in setting up a custom state for `result` to take advantage of.
`NodeFilter` also has an optional method, `after_initialize`, which is run after the filter initializes. This can be useful in setting up a fresh custom state for `result` to start from each time the pipeline is called.

Here's an example `NodeFilter` that adds a base url to images that are root relative:

Expand Down
2 changes: 1 addition & 1 deletion html-pipeline.gemspec
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,7 @@ Gem::Specification.new do |gem|
"rubygems_mfa_required" => "true",
}

gem.add_dependency("selma", "~> 0.1")
gem.add_dependency("selma", "~> 0.4")
gem.add_dependency("zeitwerk", "~> 2.5")

gem.post_install_message = <<~MSG
Expand Down
19 changes: 12 additions & 7 deletions lib/html_pipeline.rb
Original file line number Diff line number Diff line change
Expand Up @@ -175,11 +175,20 @@ def call(text, context: {}, result: {})
end
end

unless @node_filters.empty?
rewriter_options = {
memory: {
max_allowed_memory_usage: 5242880, # arbitrary limit of 5MB
},
}

if @node_filters.empty?
instrument("sanitization.html_pipeline", payload) do
result[:output] = Selma::Rewriter.new(sanitizer: @sanitization_config, handlers: @node_filters, options: rewriter_options).rewrite(html)
end unless @convert_filter.nil? # no html, so no sanitization
else
instrument("call_node_filters.html_pipeline", payload) do
@node_filters.each { |filter| filter.context = (filter.context || {}).merge(context) }
result[:output] = Selma::Rewriter.new(sanitizer: @sanitization_config, handlers: @node_filters).rewrite(html)
html = result[:output]
result[:output] = Selma::Rewriter.new(sanitizer: @sanitization_config, handlers: @node_filters, options: rewriter_options).rewrite(html)
payload = default_payload({
node_filters: @node_filters.map { |f| f.class.name },
context: context,
Expand All @@ -188,10 +197,6 @@ def call(text, context: {}, result: {})
end
end

instrument("sanitization.html_pipeline", payload) do
result[:output] = Selma::Rewriter.new(sanitizer: @sanitization_config, handlers: @node_filters).rewrite(html)
end

result = result.merge(@node_filters.collect(&:result).reduce({}, :merge))
@node_filters.each(&:reset!)

Expand Down
2 changes: 1 addition & 1 deletion lib/html_pipeline/convert_filter.rb
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@ class ConvertFilter < Filter
attr_reader :text, :html

def initialize(context: {}, result: {})
super(context: context, result: result)
super
end

class << self
Expand Down
2 changes: 1 addition & 1 deletion lib/html_pipeline/convert_filter/markdown_filter.rb
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@ class ConvertFilter < Filter
# :markdown[:extensions] => Commonmarker extensions options
class MarkdownFilter < ConvertFilter
def initialize(context: {}, result: {})
super(context: context, result: result)
super
end

# Convert Commonmark to HTML using the best available implementation.
Expand Down
2 changes: 1 addition & 1 deletion lib/html_pipeline/node_filter/syntax_highlight_filter.rb
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@ class NodeFilter
# This filter does not write any additional information to the context hash.
class SyntaxHighlightFilter < NodeFilter
def initialize(context: {}, result: {})
super(context: context, result: result)
super
# TODO: test the optionality of this
@formatter = context[:formatter] || Rouge::Formatters::HTML.new
end
Expand Down
2 changes: 1 addition & 1 deletion lib/html_pipeline/text_filter.rb
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@ class TextFilter < Filter
attr_reader :text

def initialize(context: {}, result: {})
super(context: context, result: result)
super
end

class << self
Expand Down
2 changes: 1 addition & 1 deletion lib/html_pipeline/version.rb
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
# frozen_string_literal: true

class HTMLPipeline
VERSION = "3.2.0"
VERSION = "3.2.1"
end
10 changes: 5 additions & 5 deletions test/html_pipeline_test.rb
Original file line number Diff line number Diff line change
Expand Up @@ -131,7 +131,7 @@ def test_context_is_carried_over_in_call
# - yeH is bolded
# - strikethroughs are rendered
# - mentions are not linked
assert_equal("<p><strong>yeH</strong>! I <em>think</em> <a href=\"/gjtorikian\">@gjtorikian</a> is <del>great</del>!</p>", result)
assert_equal("<p><strong>yeH</strong>! I <em>think</em> <a href=\"/gjtorikian\" class=\"user-mention\">@gjtorikian</a> is <del>great</del>!</p>", result)

context = {
bolded: false,
Expand All @@ -144,7 +144,7 @@ def test_context_is_carried_over_in_call
# - yeH is not bolded
# - strikethroughs are not rendered
# - mentions are linked
assert_equal("<p>yeH! I <em>think</em> <a href=\"http://your-domain.com/gjtorikian\">@gjtorikian</a> is ~great~!</p>", result_with_context)
assert_equal("<p>yeH! I <em>think</em> <a href=\"http://your-domain.com/gjtorikian\" class=\"user-mention\">@gjtorikian</a> is ~great~!</p>", result_with_context)
end

def test_text_filter_instance_context_is_carried_over_in_call
Expand All @@ -160,7 +160,7 @@ def test_text_filter_instance_context_is_carried_over_in_call

# note:
# - yeH is not bolded due to previous context
assert_equal("<p>yeH! I <em>think</em> <a href=\"/gjtorikian\">@gjtorikian</a> is <del>great</del>!</p>", result)
assert_equal("<p>yeH! I <em>think</em> <a href=\"/gjtorikian\" class=\"user-mention\">@gjtorikian</a> is <del>great</del>!</p>", result)
end

def test_convert_filter_instance_context_is_carried_over_in_call
Expand All @@ -176,7 +176,7 @@ def test_convert_filter_instance_context_is_carried_over_in_call

# note:
# - strikethroughs are not rendered due to previous context
assert_equal("<p><strong>yeH</strong>! I <em>think</em> <a href=\"/gjtorikian\">@gjtorikian</a> is <del>great</del>!</p>", result)
assert_equal("<p><strong>yeH</strong>! I <em>think</em> <a href=\"/gjtorikian\" class=\"user-mention\">@gjtorikian</a> is <del>great</del>!</p>", result)
end

def test_node_filter_instance_context_is_carried_over_in_call
Expand All @@ -192,6 +192,6 @@ def test_node_filter_instance_context_is_carried_over_in_call

# note:
# - mentions are linked
assert_equal("<p><strong>yeH</strong>! I <em>think</em> <a href=\"http://your-domain.com/gjtorikian\">@gjtorikian</a> is <del>great</del>!</p>", result)
assert_equal("<p><strong>yeH</strong>! I <em>think</em> <a href=\"http://your-domain.com/gjtorikian\" class=\"user-mention\">@gjtorikian</a> is <del>great</del>!</p>", result)
end
end

0 comments on commit 9dd46be

Please sign in to comment.