LazyString's `html` is an incomplete implementation of MarkupSafe #224

jace · 2023-03-28T08:04:46Z

MarkupSafe's Markup class, as used in Flask, is very careful about mixing escaped and unescaped strings. For instance, Markup(...).format(...) will ensure all format variables are escaped before being interpolated into the string, and the resulting string is fully escaped.

LazyString does not do this, and the presence of a __html__ method (as previously raised in #121) creates a situation where there is no way to tell whether a string is properly escaped or not:

>>> from flask_babel import lazy_gettext
>>> from markupsafe import Markup
>>> l = lazy_gettext("This is a <em>string</em> with a {var}")
>>> l
l'This is a <em>string</em> with a {var}'

>>> Markup(l)
Markup('This is a <em>string</em> with a {var}')

>>> Markup(l.format(var="variable & more"))
Markup('This is a <em>string</em> with a variable & more')

>>> Markup(l).format(var="variable & more")
Markup('This is a <em>string</em> with a variable &amp; more')

When a lazy string has format variables, it must be wrapped in Markup() before calling .format() to make it continue to behave as a HTML string. However, this is dangerous to do in a function that receives the string as parameter. Markup wrapping must happen at source, but that is also not possible in a lazy context as it causes a string evaluation.

Here is a test case showing how gettext, lazy_gettext and Markup all behave differently. As a result, neither translator nor programmer has any indication on whether any given string is plain text or HTML, and every string will need a full integration test to confirm markup and escaping are handled appropriately across translations.

Possible mitigations:

LazyString needs two implementations with and without __html__, for use ~~by the Jinja extension and lazy_gettext functions~~ in HTML and non-HTML contexts respectively. This solves for the programmer but not the translator, who still gets no context on whether it is safe to use common punctuation characters like &, < and >. This may be solved by separating HTML and non-HTML strings into different domains.
LazyString must derive from Markup, or reproduce its functionality to ensure appropriate escaping in all scenarios, and gettext must wrap strings in Markup before returning. This solves for both parties in a consistent way, but commits to all i18n strings being HTML, requiring unescaping before use in non-HTML contexts (JSON APIs, Markdown, etc).

from flask import Flask, Markup
from flask_babel import Babel, gettext, lazy_gettext

import pytest


@pytest.fixture(scope='session')
def app():
    return Flask(__name__)


@pytest.fixture(scope='session')
def babel(app):
    return Babel(app)


@pytest.fixture()
def ctx(app, babel):
    with app.test_request_context() as context:
        yield context


raw_string = "This is a <em>string</em> with a {var}"

get_texts = [
    pytest.param(lambda: gettext(raw_string), id='str'),
    pytest.param(lambda: lazy_gettext(raw_string), id='lazy'),
    pytest.param(lambda: Markup(raw_string), id='markup'),
]


@pytest.mark.usefixtures('ctx')
@pytest.mark.parametrize('get_text', get_texts)
def test_gettext_type(get_text):
    text = get_text().format(var="variable & more")
    assert isinstance(text, str)


@pytest.mark.usefixtures('ctx')
@pytest.mark.parametrize('get_text', get_texts)
def test_gettext_value(get_text):
    text = get_text().format(var="variable & more")
    assert text == "This is a <em>string</em> with a variable &amp; more"


@pytest.mark.usefixtures('ctx')
@pytest.mark.parametrize('get_text', get_texts)
def test_gettext_html(get_text):
    text = get_text().format(var="variable & more")
    assert '__html__' in text

Output (with errors interpolated):

FAILED lazystr_test.py::test_gettext_value[str] - AssertionError: assert equals failed
E         'This is a <em>string</em> with a variable & more'      'This is a <em>string</em> with a variable &amp; more'
FAILED lazystr_test.py::test_gettext_value[lazy] - AssertionError: assert equals failed
E         'This is a <em>string</em> with a variable & more'      'This is a <em>string</em> with a variable &amp; more'
FAILED lazystr_test.py::test_gettext_html[str] - AssertionError: assert '__html__' in 'This is a <em>string</em> with a variable & more'
E       AssertionError: assert '__html__' in 'This is a <em>string</em> with a variable & more'
FAILED lazystr_test.py::test_gettext_html[lazy] - AssertionError: assert '__html__' in 'This is a <em>string</em> with a variable & more'
E       AssertionError: assert '__html__' in 'This is a <em>string</em> with a variable & more'
FAILED lazystr_test.py::test_gettext_html[markup] - AssertionError: assert '__html__' in Markup('This is a <em>string</em> with a variable &amp; more')
E       AssertionError: assert '__html__' in Markup('This is a <em>string</em> with a variable &amp; more')
=== 5 failed, 4 passed ===

The text was updated successfully, but these errors were encountered:

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LazyString's `html` is an incomplete implementation of MarkupSafe #224

LazyString's `html` is an incomplete implementation of MarkupSafe #224

jace commented Mar 28, 2023 •

edited

Loading

LazyString's __html__ is an incomplete implementation of MarkupSafe #224

LazyString's __html__ is an incomplete implementation of MarkupSafe #224

Comments

jace commented Mar 28, 2023 • edited Loading

LazyString's `html` is an incomplete implementation of MarkupSafe #224

LazyString's `html` is an incomplete implementation of MarkupSafe #224

jace commented Mar 28, 2023 •

edited

Loading