MarkupSafe's `Markup` class, as used in Flask, is very careful about mixing escaped and unescaped strings. For instance, `Markup(...).format(...)` escapes every format argument (unless it already provides `__html__`) before interpolating it into the string, so the resulting string is fully escaped.
`LazyString` does not do this, and the presence of a `__html__` method (as previously raised in #121) means there is no way to tell whether a string is properly escaped or not:
```pycon
>>> from flask_babel import lazy_gettext
>>> from markupsafe import Markup
>>> l = lazy_gettext("This is a <em>string</em> with a {var}")
>>> l
l'This is a <em>string</em> with a {var}'
>>> Markup(l)
Markup('This is a <em>string</em> with a {var}')
>>> Markup(l.format(var="variable & more"))
Markup('This is a <em>string</em> with a variable & more')
>>> Markup(l).format(var="variable & more")
Markup('This is a <em>string</em> with a variable &amp; more')
```
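Both `Markup()` and `escape()` rely on the `__html__` protocol: any object that defines `__html__` is treated as already-safe markup and is not escaped. That is why a `LazyString`, which exposes `__html__`, is trusted whatever its translation contains. The stdlib-only sketch below mimics that rule; `sketch_escape` and `FakeLazyString` are hypothetical stand-ins, not MarkupSafe or Flask-Babel code.

```python
import html


def sketch_escape(value):
    """Hypothetical mimic of the escaping rule: trust __html__, escape the rest."""
    if hasattr(value, "__html__"):
        return value.__html__()
    return html.escape(str(value))


class FakeLazyString:
    """Hypothetical stand-in for a lazy translation that exposes __html__."""

    def __init__(self, text):
        self._text = text

    def __str__(self):
        return self._text

    def __html__(self):
        # Returns the raw translation: nothing here escapes "&", "<" or ">".
        return str(self)


plain = "Fish & Chips"
lazy = FakeLazyString("Fish & Chips")
print(sketch_escape(plain))  # Fish &amp; Chips
print(sketch_escape(lazy))   # Fish & Chips
```

The same text is escaped as a plain `str` but passed through verbatim once it claims to be HTML.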
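When a lazy string has format variables, it must be wrapped in `Markup()` before calling `.format()` for the result to remain an HTML string. However, doing this inside a function that receives the string as a parameter is dangerous: the function cannot tell whether its argument is trusted markup or plain text, and wrapping plain text in `Markup()` marks it as safe without escaping it. The `Markup()` wrapping must therefore happen at the source, but that is not possible in a lazy context either, because it forces the string to be evaluated.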
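The evaluation problem can be illustrated without Flask. In this sketch, `LazyText`, `CATALOG` and `current_locale` are hypothetical stand-ins for a lazy translation, a message catalog and the per-request locale. `str(label)` plays the role of `Markup(label)`, which likewise converts its argument to a string as soon as it is constructed, so wrapping at the point of definition freezes the text in whatever locale is active at that moment.

```python
import contextvars

# Hypothetical per-request locale and message catalog, for illustration only.
current_locale = contextvars.ContextVar("current_locale", default="en")
CATALOG = {
    "en": {"Fish & Chips": "Fish & Chips"},
    "fr": {"Fish & Chips": "Poisson & frites"},
}


class LazyText:
    """Hypothetical lazy string: looks up its translation only when str() is called."""

    def __init__(self, msgid):
        self.msgid = msgid

    def __str__(self):
        return CATALOG[current_locale.get()].get(self.msgid, self.msgid)


# Defined at import time, before any request has selected a locale.
label = LazyText("Fish & Chips")

# Wrapping "at source" has to evaluate now, in the default locale.
wrapped_at_source = str(label)

# Later, inside a French request:
token = current_locale.set("fr")
print(str(label))         # Poisson & frites  (lazy: evaluated per request)
print(wrapped_at_source)  # Fish & Chips      (eager: frozen at import time)
current_locale.reset(token)
```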
The test case below shows how `gettext`, `lazy_gettext` and `Markup` all behave differently. As a result, neither translator nor programmer has any indication of whether a given string is plain text or HTML, and every string needs a full integration test to confirm that markup and escaping are handled correctly across translations.
Possible mitigations:

1. `LazyString` needs two implementations, with and without `__html__`, for use by the Jinja extension and `lazy_gettext` functions in HTML and non-HTML contexts respectively. This solves the problem for the programmer but not for the translator, who still gets no indication of whether it is safe to use common punctuation characters like `&`, `<` and `>`. That could be addressed by separating HTML and non-HTML strings into different domains.
2. `LazyString` must derive from `Markup`, or reproduce its functionality, to ensure appropriate escaping in all scenarios, and `gettext` must wrap strings in `Markup` before returning them. This solves the problem for both parties consistently, but commits to all i18n strings being HTML, requiring unescaping before use in non-HTML contexts (JSON APIs, Markdown, etc.).
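A minimal sketch of the first mitigation, assuming a translation callable is passed in. `LazyText`, `LazyHTML` and `HTMLResult` are hypothetical names, not Flask-Babel API: the plain-text type has no `__html__`, while the HTML type declares itself as markup and escapes format arguments in the spirit of `Markup.format`.

```python
import html


def _escape(value):
    """Escape a value unless it declares itself as HTML via __html__."""
    if hasattr(value, "__html__"):
        return value.__html__()
    return html.escape(str(value))


class HTMLResult(str):
    """Hypothetical: a formatted string that still declares itself as HTML."""

    def __html__(self):
        return str(self)


class LazyText:
    """Hypothetical lazy plain-text string: deliberately has no __html__."""

    def __init__(self, translate, msgid):
        self._translate = translate  # gettext-like callable, called on demand
        self._msgid = msgid

    def __str__(self):
        return self._translate(self._msgid)

    def format(self, *args, **kwargs):
        # Plain text in, plain text out: arguments are not escaped.
        return str(self).format(*args, **kwargs)


class LazyHTML(LazyText):
    """Hypothetical lazy HTML string: the translation is trusted markup."""

    def __html__(self):
        return str(self)

    def format(self, *args, **kwargs):
        # Escape each argument first, and return a result still marked as HTML.
        args = [_escape(a) for a in args]
        kwargs = {k: _escape(v) for k, v in kwargs.items()}
        return HTMLResult(str(self).format(*args, **kwargs))


def identity(msgid):
    """Hypothetical stand-in for a real translation lookup."""
    return msgid


text = LazyText(identity, "Tom & {who}")
markup = LazyHTML(identity, "<em>{who}</em>")
print(text.format(who="Jerry"))    # Tom & Jerry
print(markup.format(who="a & b"))  # <em>a &amp; b</em>
```

Escaping arguments before formatting keeps the sketch short, but it turns them into strings, so numeric format specs such as `{n:.2f}` would fail; MarkupSafe escapes at the field level instead. Under the second mitigation, a non-HTML consumer would instead need something like `html.unescape()` before using the text.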
```python
from flask import Flask
from flask_babel import Babel, gettext, lazy_gettext
from markupsafe import Markup  # flask.Markup was removed in Flask 3.0
import pytest

@pytest.fixture(scope='session')
def app():
    return Flask(__name__)

@pytest.fixture(scope='session')
def babel(app):
    return Babel(app)

@pytest.fixture()
def ctx(app, babel):
    with app.test_request_context() as context:
        yield context

raw_string = "This is a <em>string</em> with a {var}"

get_texts = [
    pytest.param(lambda: gettext(raw_string), id='str'),
    pytest.param(lambda: lazy_gettext(raw_string), id='lazy'),
    pytest.param(lambda: Markup(raw_string), id='markup'),
]

@pytest.mark.usefixtures('ctx')
@pytest.mark.parametrize('get_text', get_texts)
def test_gettext_type(get_text):
    text = get_text().format(var="variable & more")
    assert isinstance(text, str)

@pytest.mark.usefixtures('ctx')
@pytest.mark.parametrize('get_text', get_texts)
def test_gettext_value(get_text):
    text = get_text().format(var="variable & more")
    assert text == "This is a <em>string</em> with a variable &amp; more"

@pytest.mark.usefixtures('ctx')
@pytest.mark.parametrize('get_text', get_texts)
def test_gettext_html(get_text):
    text = get_text().format(var="variable & more")
    assert '__html__' in text
```
Output (with errors interpolated):

```
FAILED lazystr_test.py::test_gettext_value[str] - AssertionError: assert equals failed
E 'This is a <em>string</em> with a variable & more' 'This is a <em>string</em> with a variable &amp; more'
FAILED lazystr_test.py::test_gettext_value[lazy] - AssertionError: assert equals failed
E 'This is a <em>string</em> with a variable & more' 'This is a <em>string</em> with a variable &amp; more'
FAILED lazystr_test.py::test_gettext_html[str] - AssertionError: assert '__html__' in 'This is a <em>string</em> with a variable & more'
E AssertionError: assert '__html__' in 'This is a <em>string</em> with a variable & more'
FAILED lazystr_test.py::test_gettext_html[lazy] - AssertionError: assert '__html__' in 'This is a <em>string</em> with a variable & more'
E AssertionError: assert '__html__' in 'This is a <em>string</em> with a variable & more'
FAILED lazystr_test.py::test_gettext_html[markup] - AssertionError: assert '__html__' in Markup('This is a <em>string</em> with a variable &amp; more')
E AssertionError: assert '__html__' in Markup('This is a <em>string</em> with a variable &amp; more')
=== 5 failed, 4 passed ===
```
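In `test_gettext_html`, `'__html__' in text` is a substring test on the string's contents rather than a check for the `__html__` method, which is why the `markup` case fails as well. A protocol check would normally use `hasattr`, as in this stdlib-only sketch (`SafeHTML` is a hypothetical stand-in for `Markup`, not MarkupSafe code):

```python
class SafeHTML(str):
    """Hypothetical minimal HTML-marked string, standing in for Markup."""

    def __html__(self):
        return self


value = SafeHTML("This is a <em>string</em>")

# `in` on a str searches the text, so this is False unless the text contains "__html__".
print("__html__" in value)               # False
# hasattr asks whether the object implements the __html__ protocol.
print(hasattr(value, "__html__"))        # True
print(hasattr("plain str", "__html__"))  # False
```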