Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issues with parsing CoRE Link Format #2274

Open
v-po opened this issue Aug 25, 2024 · 4 comments
Open

Issues with parsing CoRE Link Format #2274

v-po opened this issue Aug 25, 2024 · 4 comments

Comments

@v-po
Copy link

v-po commented Aug 25, 2024

I have encountered some issues with the LinkFormat.parse() method when parsing certain payloads. I realize the current implementation is very simple. And maybe the following points are not even considered bugs, in which case feel free to close this issue. I'm also hoping that I have not completely misunderstood the main RFC 6690, or any of the linked ones 😅.

The current parser seems to have the following problems:

1. Incorrect handling of certain characters in Attribute Values

Parsing stops when encountering certain characters that are valid in relation-types. Specifically, it fails when handling characters such as dots (.), hyphens (-), and colons (:).

var parsed = LinkFormat.parse("</some/uri>;rt=some.resource-type;if=some.interface-description:v0.13");
System.out.println(parsed);

Expected output:

[</some/uri>
    rt: some.resource-type
    if: some.interface-description:v0.13]

Actual output:

[</some/uri>
    rt: some]

It appears that the issue is caused by the regex pattern used for WORD.

public static final Pattern WORD = Pattern.compile("\\w+");

However, I'm not sure if simply extending this regex to include those characters will fix it. It might even make it worse :)

2. Incorrect handling of quoted strings for non-relation-type attributes

The current parser splits all quoted strings into separate values when encountering spaces, regardless of the attribute type. If I understood correctly, only specific attributes (such as rel, rev, rt, and if) should have their quoted strings split into space-separated values. All other attributes should treat quoted strings as single entities, even if they contain spaces.

var parsed = LinkFormat.parse("</some/uri>;b=\"abc def\"");
System.out.println(parsed);

Expected output:

[</some/uri>
    b: abc def]

Actual output:

[</some/uri>
    b: [abc, def]]

3. (Maybe issue) Handling of attributes without values (e.g., observability flags)

I decided to represent them as the empty list, instead of the list containing the empty string as californium treats them, since an attribute like a="" seemed distinct than just a

I'm sure there are other issues with the parser (for example I didn't see any handling of ext-value) as the whole specification is (imo) a giant mess of interlinking RFCs. These are just the ones that I came across.

My own solution

I don't know if this is of value to anyone else, but this is the code that I'm currently using to using to get around these issues. It's nowhere near "RFC-compliant" and also not very efficient with the String allocations.

Core Links parser in Kotlin
typealias AttributeName = String
typealias AttributeValues = List<String>

data class CoreLinks(val links: Map<URI, CoreLink>)
data class CoreLink(val uri: URI, val attributes: Map<AttributeName, AttributeValues>)

fun parseCoreLinkFormat(input: String): CoreLinks {
    val linksMap = mutableMapOf<URI, CoreLink>()
    val entries = splitLinkValueList(input)

    for (entry in entries) {
        val uriReference = extractUriReference(entry) ?: continue
        val attributes = extractLinkParams(entry.substringAfter(">").trimStart())
        linksMap[uriReference] = CoreLink(uriReference, attributes)
    }

    return CoreLinks(linksMap)
}

private fun splitLinkValueList(input: String): List<String> {
    val entries = mutableListOf<String>()
    val currentEntry = StringBuilder()
    var insideUri = false
    var insideQuote = false

    for (char in input) {
        when (char) {
            ',' -> {
                if (!insideUri && !insideQuote) {
                    // If we encounter a comma and are not inside a URI or quoted string,
                    // treat it as a delimiter for splitting link-value entries
                    entries.add(currentEntry.toString().trim()) // Add the current entry to the list
                    currentEntry.clear() // Clear the StringBuilder for the next entry
                } else {
                    // If inside URI or quoted string, treat the comma as part of the entry
                    currentEntry.append(char)
                }
            }
            '<' -> {
                // Mark the start of a URI-Reference (inside <...>)
                insideUri = true
                currentEntry.append(char)
            }
            '>' -> {
                // Mark the end of a URI-Reference
                insideUri = false
                currentEntry.append(char)
            }
            '"' -> {
                // Toggle the insideQuote flag when encountering a quote character
                insideQuote = !insideQuote
                currentEntry.append(char)
            }
            else -> {
                // For all other characters, just append them to the current entry
                currentEntry.append(char)
            }
        }
    }

    if (currentEntry.isNotEmpty()) {
        entries.add(currentEntry.toString().trim())
    }

    return entries
}

private fun extractUriReference(entry: String): URI? {
    val start = entry.indexOf('<')
    val end = entry.indexOf('>')
    if (start == -1 || end == -1 || end < start) {
        println("URI-Reference incorrectly formatted: $entry")
        return null
    }
    return try {
        URI(entry.substring(start + 1, end))
    } catch (e: Exception) {
        println("Error constructing URI: ${e.message}")
        null // Handle invalid URIs
    }
}

private fun extractLinkParams(paramsPart: String): Map<String, List<String>> {
    val linkParams = mutableMapOf<AttributeName, AttributeValues>()
    val parts = paramsPart.split(";").map { it.trim() }
    if (parts.isEmpty()) return linkParams

    // We iterate parts from index 1 because paramsPart starts with a ";", so index 0 will be an empty string after split
    for (i in 1 until parts.size) {
        val attrParts = parts[i].split("=", limit = 2).map { it.trim() }
        if (attrParts.size == 2) {
            val key = attrParts[0]
            val value = attrParts[1]

            val values = when {
                value.startsWith("\"") && value.endsWith("\"") -> {
                    // Handle "quoted strings" for relation-types
                    if (key == RELATION || key == REVERSE_RELATION || key == RESOURCE_TYPE || key == INTERFACE_DESCRIPTION) {
                        value.removeSurrounding("\"").split(" ")
                    } else {
                        listOf(value.removeSurrounding("\""))
                    }
                }
                else -> listOf(value)
            }

            linkParams[key] = values
        } else if (attrParts.size == 1) {
            // Handle attributes without values (e.g., observability flags)
            linkParams[attrParts[0]] = listOf()
        }
    }
    return linkParams
}
@boaks
Copy link
Contributor

boaks commented Aug 26, 2024

To be frank, I'm no expert for RFC 6690.
Over the years not too many seems to use that extensively.
What's is your use-case, which is affected by the misbehavior?

@v-po
Copy link
Author

v-po commented Aug 26, 2024

@boaks Not entirely sure what you mean by use-case, but we are using parts of californium in an Android app and in a Resource Directory backend. And we have IoT devices that send data in the formats outlined above (when querying "/.well-known/core") which caused problems when we tried using the LinkFormat parser. For us it's not a very big issue, as we don't use the californium CoapClient, so we can get by with our own Core Links parser. I guess it will be a problem for someone who uses CoapClient.discover() and receives data like the one specified above.

@boaks
Copy link
Contributor

boaks commented Aug 26, 2024

Not entirely sure what you mean by use-case

"Resource Directory" is the kind of answer I was looking for.

For us it's not a very big issue

OK.

If you're interested to get answers for your points and interpretations from the IETF core-group,
the easiest way is to copy the issue into corrclar issue.

We can alternatively convert your solution into a PR and see, if someone has issues with it.

(I'm personally on the way to ramp down my engagement to a hobby/free time level. Therefore the handling of the clarification on my side will take some time.)

@sbernard31
Copy link
Contributor

On my side, I never used Californium classes to parse Link Format.
But I faced this RFC 6690 with LWM2M and I also feel this is a not so easy to implement RFC. (especially because of extensibility ...)

In Leshan we try to implement a CoRE Link Parser, all classes are in leshan-lwm2m-core@org/eclipse/leshan/core/link/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants