Automatically choose preposition-article contraction in Portuguese #294

Open: wants to merge 2 commits into base: master
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -5,6 +5,7 @@ All notable changes to this project will be documented in this file. For change

- Update Japanese localization, add named intersections. [#290](https://github.com/Project-OSRM/osrm-text-instructions/pull/290)
- Corrected various Portuguese translations. [#283](https://github.com/Project-OSRM/osrm-text-instructions/pull/283)
- Added Portuguese abbreviations and grammar to choose the right preposition-article contraction before certain names. [#294](https://github.com/Project-OSRM/osrm-text-instructions/pull/294)

## 0.13.4 2019-09-16

6 changes: 6 additions & 0 deletions languages.js
@@ -35,6 +35,7 @@ var instructionsZhHans = require('./languages/translations/zh-Hans.json');
var grammarDa = require('./languages/grammar/da.json');
var grammarFr = require('./languages/grammar/fr.json');
var grammarHu = require('./languages/grammar/hu.json');
var grammarPt = require('./languages/grammar/pt.json');
var grammarRu = require('./languages/grammar/ru.json');

// Load all abbreviations files
@@ -50,6 +51,7 @@ var abbreviationsHu = require('./languages/abbreviations/hu.json');
var abbreviationsLt = require('./languages/abbreviations/lt.json');
var abbreviationsNl = require('./languages/abbreviations/nl.json');
var abbreviationsRu = require('./languages/abbreviations/ru.json');
var abbreviationsPt = require('./languages/abbreviations/pt.json');
var abbreviationsSl = require('./languages/abbreviations/sl.json');
var abbreviationsSv = require('./languages/abbreviations/sv.json');
var abbreviationsUk = require('./languages/abbreviations/uk.json');
@@ -94,6 +96,8 @@ var grammars = {
'da': grammarDa,
'fr': grammarFr,
'hu': grammarHu,
'pt-BR': grammarPt,
'pt-PT': grammarPt,
'ru': grammarRu
};

@@ -110,6 +114,8 @@ var abbreviations = {
'hu': abbreviationsHu,
'lt': abbreviationsLt,
'nl': abbreviationsNl,
'pt-BR': abbreviationsPt,
Contributor

Maybe also add an entry for "generic" Portuguese, 'pt': abbreviationsPt,?

Member Author

This PR copies the abbreviations object to two keys for consistency with the translation files, but the client is expected to perform some sort of locale matching. Even if we were to set pt, the environment’s locale may be something else like pt-AO or even pt-US. There are plenty of libraries that can perform locale matching, such as locale-utils.

I’m not necessarily opposed to setting the language-only locale, but I think we’d want to do so consistently for all languages and resource types, and I’m not sure that would be feasible for a situation like zh.

Contributor

Actually, I don't see a problem with using this grammar for all Portuguese dialects: the grammar expressions will match only Portuguese street names, even if pt-US is used in the US with English names. Or is there a difference in how articles are used between pt-BR and pt-US?

Member Author

Right, I think abbreviations and grammar rules should be available regardless of the specific Portuguese locale in use, but it should be up to the client to choose a country to default to. For now the two locales share the same abbreviations and grammar rules, but that wouldn’t necessarily be the case in the future for all languages, so I’d be hesitant to create an expectation that clients can look up grammars without performing locale matching first, which they have to do when getting a translated instruction.

If it’s a major inconvenience for clients to perform locale matching themselves, then we could have this library depend on locale-utils, but there are larger libraries with more robust locale matching and I wouldn’t want to force clients to use the more rudimentary logic in locale-utils.

'pt-PT': abbreviationsPt,
'ru': abbreviationsRu,
'sl': abbreviationsSl,
'sv': abbreviationsSv,
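The locale matching that the review thread above leaves to clients can be sketched roughly as follows. This is only an illustration and not part of the PR: `matchLocale` is a hypothetical helper, and its fallback policy (exact match first, then the first supported locale sharing the language subtag) is an assumption. A real client would more likely rely on a dedicated locale-matching library.

```javascript
// Hypothetical helper, not part of osrm-text-instructions.
// Picks a supported locale for a requested one: exact match first,
// then the first supported locale with the same language subtag.
function matchLocale(requested, supported) {
    if (supported.indexOf(requested) !== -1) {
        return requested;
    }
    var language = requested.split('-')[0].toLowerCase();
    for (var i = 0; i < supported.length; i++) {
        if (supported[i].split('-')[0].toLowerCase() === language) {
            return supported[i];
        }
    }
    return null;
}

// Keys as registered in the grammars object in this PR.
var grammarLocales = ['da', 'fr', 'hu', 'pt-BR', 'pt-PT', 'ru'];

console.log(matchLocale('pt-PT', grammarLocales)); // exact match: pt-PT
console.log(matchLocale('pt-AO', grammarLocales)); // falls back to pt-BR (first pt-* key)
console.log(matchLocale('en', grammarLocales)); // null, no grammar registered
```

Which country a bare `pt` or an unlisted `pt-*` locale should default to is a client decision; this sketch simply takes the first matching key.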
31 changes: 31 additions & 0 deletions languages/abbreviations/pt.json
@@ -0,0 +1,31 @@
{
"abbreviations": {
"quadra": "Qd",
"vila": "Vil"
},
"classifications": {
"avenida": "Av",
"caminho": "Cam",
"estrada": "Estr",
"rua": "R",
"travessa": "Tv"
},
"directions": {
"lés-nordeste": "ENE",
"nordeste": "NE",
"oeste": "O",
"sudeste": "SE",
"lés-sudeste": "ESE",
"nor-nordeste": "NNE",
"sul": "S",
"nor-noroeste": "NNO",
"noroeste": "NO",
"norte": "N",
"oés-sudoeste": "OSO",
"oés-noroeste": "ONO",
"sudoeste": "SO",
"sul-sudeste": "SSE",
"su-sudoeste": "SSO",
"leste": "L"
}
}
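As a rough illustration of how a client might consume this file, the sketch below abbreviates a street name word by word. The PR does not show how abbreviations are applied, so `abbreviate` and `lookup` are hypothetical helpers, and whole-word, case-insensitive lookup is an assumption.

```javascript
// Subset of the entries from languages/abbreviations/pt.json above.
var abbreviationsPt = {
    abbreviations: {quadra: 'Qd', vila: 'Vil'},
    classifications: {avenida: 'Av', rua: 'R', travessa: 'Tv'},
    directions: {norte: 'N', nordeste: 'NE', sul: 'S'}
};

// Hypothetical helper: own-property lookup so that words such as
// "constructor" do not pick up inherited object properties.
function lookup(table, key) {
    return Object.prototype.hasOwnProperty.call(table, key) ? table[key] : null;
}

// Hypothetical helper: replaces each space-separated word that has an
// entry in any of the three tables, matching case-insensitively.
function abbreviate(name, tables) {
    return name.split(' ').map(function (word) {
        var key = word.toLowerCase();
        return lookup(tables.abbreviations, key) ||
            lookup(tables.classifications, key) ||
            lookup(tables.directions, key) ||
            word;
    }).join(' ');
}

console.log(abbreviate('Avenida Brasil', abbreviationsPt)); // Av Brasil
console.log(abbreviate('Travessa do Sul', abbreviationsPt)); // Tv do S
```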
48 changes: 48 additions & 0 deletions languages/grammar/pt.json
@@ -0,0 +1,48 @@
{
"meta": {
"regExpFlags": "gi"
},
"v5": {
"em": [
["^ A(s?) ", " na$1 "],
["^ O(s?) ", " no$1 "],

["^ (Marginal|Parte|Passagem|Ponte|Radial|Rede|Servid[aã]o) ", " na $1 "],
["^ (Anel|Bosque|Boulevard|Cais|Canal|Caracol|Casal|Catete|Coronel|Duque|Fim|General|Impasse|Jardim|Juiz|Lote|Marechal|Mirante|Oriente|Parque|Professor|Setor|Tenente|Terminal|T[uú]nel) ", " no $1 "],

["^ ([^\\- ]+dade)(s?) ", " na$2 $1$2 "],
["^ ([^\\- ]+z) ", " na $1 "],
["^ ([^\\- ]+z)es ", " nas $1es "],
["^ ([^\\- ]+[cçz][aã]o) ", " na $1 "],
["^ ([^\\- ]+[cçz][oõ]es) ", " nas $1 "],

["^ ([^\\- ]+dor) ", " no $1 "],
["^ ([^\\- ]+dor)es ", " nos $1es "],

["^ (?!n[ao]s? )([^\\- ]+)([ao]s?) ", " n$2 $1$2 "],

["^ (?!n[ao]s? )(\\S)", " em $1"]
],
"a": [
["^ A(s?) ", " à$1 "],
["^ O(s?) ", " ao$1 "],

["^ (Marginal|Parte|Passagem|Ponte|Radial|Rede|Servid[aã]o) ", " à $1 "],
["^ (Anel|Bosque|Boulevard|Cais|Canal|Caracol|Casal|Catete|Coronel|Duque|Fim|General|Impasse|Jardim|Juiz|Lote|Marechal|Mirante|Oriente|Parque|Professor|Setor|Tenente|Terminal|T[uú]nel) ", " ao $1 "],

["^ ([^\\- ]+dade)(s?) ", " à$2 $1$2 "],
["^ ([^\\- ]+z) ", " à $1 "],
["^ ([^\\- ]+z)es ", " às $1es "],
["^ ([^\\- ]+[cçz][aã]o) ", " à $1 "],
["^ ([^\\- ]+[cçz][oõ]es) ", " às $1 "],

["^ ([^\\- ]+dor) ", " ao $1 "],
["^ ([^\\- ]+dor)es ", " aos $1es "],

["^ ([^\\- ]+)a(s?) ", " à$2 $1a$2 "],
["^ (?!aos? )([^\\- ]+)o(s?) ", " ao$2 $1o$2 "],

["^ (?!aos? |às? )(\\S)", " a $1"]
]
}
}
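To see how the rule order above plays out, here is a minimal sketch that applies a handful of the "em" rules in sequence. `applyGrammar` is a hypothetical stand-in for the library's grammarize step. It assumes the name is padded with one space on each side, each rule is compiled with `meta.regExpFlags`, and the result is trimmed; only a subset of the rules is copied here.

```javascript
// A few of the "em" rules from languages/grammar/pt.json above, in the same order.
var grammarPt = {
    meta: {regExpFlags: 'gi'},
    v5: {
        em: [
            ['^ A(s?) ', ' na$1 '],
            ['^ O(s?) ', ' no$1 '],
            ['^ (Marginal|Parte|Passagem|Ponte|Radial|Rede|Servid[aã]o) ', ' na $1 '],
            ['^ ([^\\- ]+dade)(s?) ', ' na$2 $1$2 '],
            ['^ (?!n[ao]s? )([^\\- ]+)([ao]s?) ', ' n$2 $1$2 '],
            ['^ (?!n[ao]s? )(\\S)', ' em $1']
        ]
    }
};

// Hypothetical helper: pads the name, runs every rule in order, trims.
// The negative lookaheads stop later rules from firing once an earlier
// rule has already prefixed "na"/"no".
function applyGrammar(name, grammar, option) {
    var rules = grammar.v5[option];
    if (!rules) {
        return name;
    }
    var padded = ' ' + name + ' ';
    rules.forEach(function (rule) {
        padded = padded.replace(new RegExp(rule[0], grammar.meta.regExpFlags), rule[1]);
    });
    return padded.trim();
}

console.log(applyGrammar('Ponte Antiga', grammarPt, 'em')); // na Ponte Antiga
console.log(applyGrammar('Comunidade do Quiabo', grammarPt, 'em')); // na Comunidade do Quiabo
console.log(applyGrammar('São Paulo', grammarPt, 'em')); // no São Paulo
console.log(applyGrammar('Circular Norte do Bairro da Encarnação', grammarPt, 'em')); // em Circular Norte do Bairro da Encarnação
```

The explicit word lists come before the suffix heuristics because names such as "Ponte" or "Parque" end in letters that say nothing about gender; the final rule is the fallback to a bare "em" when nothing else matched.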
34 changes: 34 additions & 0 deletions languages/overrides/pt.js
@@ -0,0 +1,34 @@
// Add grammar option to {way_name} depending on phrase context

var replaces = [
// Capture group 1 is the optional hard-coded preposition; group 2 is the placeholder name.
[' (n[ao] +)?\{(destination|junction_name|way_name|waypoint_name)\}', ' {$2:em}'], // eslint-disable-line no-useless-escape
[' (ao +|à +)?\{(destination|junction_name|way_name|waypoint_name)\}', ' {$2:a}'] // eslint-disable-line no-useless-escape
];

function optionize(phrase) {
var result = phrase;
replaces.forEach(function(pattern) {
var re = new RegExp(pattern[0], 'gi');
result = result.replace(re, pattern[1]);
});

return result;
}

function iterate(values) {
Object.keys(values).forEach(function (key) {
var value = values[key];
if (typeof value === 'string') {
values[key] = optionize(value);
} else if (typeof value === 'object') {
iterate(value);
}
});
}

module.exports = function(content) {
// Iterate all content string values recursively
iterate(content.v5);

return content;
};
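The effect of this override on a translation object can be sketched as below. It is a self-contained illustration, not the PR's code: it assumes the replacement is meant to reference the second capture group (the placeholder name) and to drop any hard-coded "na"/"no" in front of it. The phrases in `sample` are invented for the example and are not taken from the real translation files.

```javascript
// Hypothetical pattern modeled on the first entry of `replaces` above.
// Group 1: optional hard-coded "na"/"no"; group 2: the placeholder name.
var pattern = / (n[ao] +)?\{(destination|junction_name|way_name|waypoint_name)\}/gi;

function tagWithEm(phrase) {
    return phrase.replace(pattern, ' {$2:em}');
}

// Same kind of recursive walk as iterate() above, on a made-up fragment.
function tagAll(values) {
    Object.keys(values).forEach(function (key) {
        if (typeof values[key] === 'string') {
            values[key] = tagWithEm(values[key]);
        } else if (values[key] && typeof values[key] === 'object') {
            tagAll(values[key]);
        }
    });
    return values;
}

var sample = {
    'continue': {'default': {name: 'Continue na {way_name}'}},
    depart: {'default': {name: 'Siga {way_name}'}}
};

console.log(JSON.stringify(tagAll(sample)));
// {"continue":{"default":{"name":"Continue {way_name:em}"}},"depart":{"default":{"name":"Siga {way_name:em}"}}}
```

Because the preposition group is optional, a placeholder with no preposition in front of it is tagged as well, so the order of the patterns matters when several grammar options share the same placeholder names.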
6 changes: 6 additions & 0 deletions scripts/transifex.js
@@ -36,6 +36,12 @@ languages.supportedCodes.forEach((code) => {
var override = `${__dirname}/../languages/overrides/${code}.js`;
if (fs.existsSync(override)) {
content = require(override)(content);
} else {
var language = code.split('-')[0];
override = `${__dirname}/../languages/overrides/${language}.js`;
if (fs.existsSync(override)) {
content = require(override)(content);
}
}

// Write language file
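The lookup order introduced in this hunk (full locale code first, then the bare language subtag) can be isolated into a small pure function. `overrideCandidates` and `findOverride` are hypothetical names used only for this sketch, and the existence check is passed in as a callback so the example runs without touching the file system; in the script it would be something like `fs.existsSync` on the full path.

```javascript
// Hypothetical helper mirroring the lookup order in the hunk above:
// the full locale code first, then the bare language subtag.
function overrideCandidates(code) {
    var candidates = [code + '.js'];
    var language = code.split('-')[0];
    if (language !== code) {
        candidates.push(language + '.js');
    }
    return candidates;
}

// Returns the first candidate for which exists(fileName) is truthy, or null.
function findOverride(code, exists) {
    var found = overrideCandidates(code).filter(exists);
    return found.length ? found[0] : null;
}

// Pretend only languages/overrides/pt.js and ru.js are present.
var present = ['pt.js', 'ru.js'];
function exists(fileName) {
    return present.indexOf(fileName) !== -1;
}

console.log(findOverride('pt-BR', exists)); // pt.js
console.log(findOverride('pt-PT', exists)); // pt.js
console.log(findOverride('de', exists)); // null
```

This is why a single languages/overrides/pt.js is enough for both pt-BR and pt-PT.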
88 changes: 88 additions & 0 deletions test/grammar_test.js
@@ -472,6 +472,94 @@ const grammarTests = {
['Приморское шоссе', 'dative', 'Приморскому шоссе'],
['Приморское шоссе', 'genitive', 'Приморского шоссе'],
['Приморское шоссе', 'prepositional', 'Приморском шоссе']
],
'pt-PT': [
['Anel Viário Prefeito Pedro Ernesto', 'a', 'ao Anel Viário Prefeito Pedro Ernesto'],
['Anel Viário Prefeito Pedro Ernesto', 'em', 'no Anel Viário Prefeito Pedro Ernesto'],
['Boqueirao da Ponte da Lama', 'a', 'ao Boqueirao da Ponte da Lama'],
['Boqueirao da Ponte da Lama', 'em', 'no Boqueirao da Ponte da Lama'],
['Boqueirão da Ponte da Lama', 'a', 'ao Boqueirão da Ponte da Lama'],
['Boqueirão da Ponte da Lama', 'em', 'no Boqueirão da Ponte da Lama'],
['Bosque Marapendi Expresso', 'a', 'ao Bosque Marapendi Expresso'],
['Bosque Marapendi Expresso', 'em', 'no Bosque Marapendi Expresso'],
['Boulevard 28 de Setembro', 'a', 'ao Boulevard 28 de Setembro'],
['Boulevard 28 de Setembro', 'em', 'no Boulevard 28 de Setembro'],
['Cais de Costas', 'a', 'ao Cais de Costas'],
['Cais de Costas', 'em', 'no Cais de Costas'],
['Canal do Cunha', 'a', 'ao Canal do Cunha'],
['Canal do Cunha', 'em', 'no Canal do Cunha'],
['Circular Norte do Bairro da Encarnação', 'a', 'a Circular Norte do Bairro da Encarnação'],
['Circular Norte do Bairro da Encarnação', 'em', 'em Circular Norte do Bairro da Encarnação'],
['Comunidade do Quiabo', 'a', 'à Comunidade do Quiabo'],
['Comunidade do Quiabo', 'em', 'na Comunidade do Quiabo'],
['Coronel Correa Lima', 'a', 'ao Coronel Correa Lima'],
['Coronel Correa Lima', 'em', 'no Coronel Correa Lima'],
['Corredor Verde Monsanto-Parque Eduardo VII', 'a', 'ao Corredor Verde Monsanto-Parque Eduardo VII'],
['Corredor Verde Monsanto-Parque Eduardo VII', 'em', 'no Corredor Verde Monsanto-Parque Eduardo VII'],
['Cruz das Oliveiras', 'a', 'à Cruz das Oliveiras'],
['Cruz das Oliveiras', 'em', 'na Cruz das Oliveiras'],
['Cruzes da Sé', 'a', 'às Cruzes da Sé'],
['Cruzes da Sé', 'em', 'nas Cruzes da Sé'],
['Duque de Loulé', 'a', 'ao Duque de Loulé'],
['Duque de Loulé', 'em', 'no Duque de Loulé'],
['Elevador de Santa Justa', 'a', 'ao Elevador de Santa Justa'],
['Elevador de Santa Justa', 'em', 'no Elevador de Santa Justa'],
['Fim da Trilha do Pico da Tijuca', 'a', 'ao Fim da Trilha do Pico da Tijuca'],
['Fim da Trilha do Pico da Tijuca', 'em', 'no Fim da Trilha do Pico da Tijuca'],
['Impasse Rua Padre Américo', 'a', 'ao Impasse Rua Padre Américo'],
['Impasse Rua Padre Américo', 'em', 'no Impasse Rua Padre Américo'],
['Jardim de Alah', 'a', 'ao Jardim de Alah'],
['Jardim de Alah', 'em', 'no Jardim de Alah'],
['Juiz de Fora', 'a', 'ao Juiz de Fora'],
['Juiz de Fora', 'em', 'no Juiz de Fora'],
['Ligacao Vista Chinesa Mesa do Imperador', 'a', 'à Ligacao Vista Chinesa Mesa do Imperador'],
['Ligacao Vista Chinesa Mesa do Imperador', 'em', 'na Ligacao Vista Chinesa Mesa do Imperador'],
['Ligacão Vista Chinesa Mesa do Imperador', 'a', 'à Ligacão Vista Chinesa Mesa do Imperador'],
['Ligacão Vista Chinesa Mesa do Imperador', 'em', 'na Ligacão Vista Chinesa Mesa do Imperador'],
['Ligaçao Vista Chinesa Mesa do Imperador', 'a', 'à Ligaçao Vista Chinesa Mesa do Imperador'],
['Ligaçao Vista Chinesa Mesa do Imperador', 'em', 'na Ligaçao Vista Chinesa Mesa do Imperador'],
['Ligação Vista Chinesa Mesa do Imperador', 'a', 'à Ligação Vista Chinesa Mesa do Imperador'],
['Ligação Vista Chinesa Mesa do Imperador', 'em', 'na Ligação Vista Chinesa Mesa do Imperador'],
['Marechal Hermes', 'a', 'ao Marechal Hermes'],
['Marechal Hermes', 'em', 'no Marechal Hermes'],
['Marginal Avenida Brasil', 'a', 'à Marginal Avenida Brasil'],
['Marginal Avenida Brasil', 'em', 'na Marginal Avenida Brasil'],
['Mirante do Urubu', 'a', 'ao Mirante do Urubu'],
['Mirante do Urubu', 'em', 'no Mirante do Urubu'],
['Oriente Estacionamento', 'a', 'ao Oriente Estacionamento'],
['Oriente Estacionamento', 'em', 'no Oriente Estacionamento'],
['Parque Civil', 'a', 'ao Parque Civil'],
['Parque Civil', 'em', 'no Parque Civil'],
['Parte do Circuito da Pedreira', 'a', 'à Parte do Circuito da Pedreira'],
['Parte do Circuito da Pedreira', 'em', 'na Parte do Circuito da Pedreira'],
['Passagem Paulo Roberto Lopes Lima', 'a', 'à Passagem Paulo Roberto Lopes Lima'],
['Passagem Paulo Roberto Lopes Lima', 'em', 'na Passagem Paulo Roberto Lopes Lima'],
['Ponte Antiga', 'a', 'à Ponte Antiga'],
['Ponte Antiga', 'em', 'na Ponte Antiga'],
['Professor Honório Silvestre', 'a', 'ao Professor Honório Silvestre'],
['Professor Honório Silvestre', 'em', 'no Professor Honório Silvestre'],
['Radial de Benfica', 'a', 'à Radial de Benfica'],
['Radial de Benfica', 'em', 'na Radial de Benfica'],
['Rede Sarah', 'a', 'à Rede Sarah'],
['Rede Sarah', 'em', 'na Rede Sarah'],
['Sao Paulo', 'a', 'ao Sao Paulo'],
['Sao Paulo', 'em', 'no Sao Paulo'],
['Servidao Sete de Setembro', 'a', 'à Servidao Sete de Setembro'],
['Servidao Sete de Setembro', 'em', 'na Servidao Sete de Setembro'],
['Servidão Sete de Setembro', 'a', 'à Servidão Sete de Setembro'],
['Servidão Sete de Setembro', 'em', 'na Servidão Sete de Setembro'],
['Setor Alemanha', 'a', 'ao Setor Alemanha'],
['Setor Alemanha', 'em', 'no Setor Alemanha'],
['São Paulo', 'a', 'ao São Paulo'],
['São Paulo', 'em', 'no São Paulo'],
['Tenente Cerqueira Leite', 'a', 'ao Tenente Cerqueira Leite'],
['Tenente Cerqueira Leite', 'em', 'no Tenente Cerqueira Leite'],
['Terminal Aroldo Melodia', 'a', 'ao Terminal Aroldo Melodia'],
['Terminal Aroldo Melodia', 'em', 'no Terminal Aroldo Melodia'],
['Tunel Acústico Rafael Mascarenhas', 'a', 'ao Tunel Acústico Rafael Mascarenhas'],
['Tunel Acústico Rafael Mascarenhas', 'em', 'no Tunel Acústico Rafael Mascarenhas'],
['Túnel Acústico Rafael Mascarenhas', 'a', 'ao Túnel Acústico Rafael Mascarenhas'],
['Túnel Acústico Rafael Mascarenhas', 'em', 'no Túnel Acústico Rafael Mascarenhas']
]
// TODO add your language grammar tests to call grammarize() and check result
};