Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support multipart/form-data and file uploads #609

Closed
wants to merge 41 commits into from

Conversation

ahirner
Copy link
Contributor

@ahirner ahirner commented Oct 22, 2023

Fixes #518 and #402.

This approach factors out binary form data as reqwest::multipart::Part parameter. This has the following advantages:

  • Clients gains the ability to stream large files
  • Bodies remain serde::Serialize which they couldn't if a Part was made from an async stream

The disadvantage is leaking a reqwest type. However, we could probably wrap and imple Into for an internal type if required.

An internal media heavy API compiled well. I'm looking for guidance and tips. Todos:

  • decide and finalize how to typify binary formatted strings
  • some work on cli
  • some work on httpmock
  • tests

* own OperationParamaterType
* delare replaced PartMeta
Todo:
* some work on cli
* some work on httpmock
* generate form in method body
except: create form from body
adds:
use reqwest::multipart::Part;
adds super::Part import
approach has two flaws:
* form params optional cannot be recovered from typify or ergonomically from schema
* fails for bodies with nested compound types, e.g. Map, Vec
the main benefit is correct treatment of
optionals and a baseline support for
nested compound parameters
@ahirner ahirner marked this pull request as ready for review October 22, 2023 20:21
@ahl
Copy link
Collaborator

ahl commented Oct 23, 2023

I didn't take a close look at this as it appears to still be a work in progress, but I did want to say:

  1. Thank you! great to see folks participating in this project
  2. I've found that one of the best ways to demonstrate new features such as this one is to add an end-to-end test by adding an operation in an existing OpenAPI json file or by adding a new file; this will make the output clear in the updated fixture data.

@ahirner ahirner marked this pull request as draft October 27, 2023 00:37
@ahirner ahirner marked this pull request as ready for review October 28, 2023 23:17
@ahirner
Copy link
Contributor Author

ahirner commented Oct 28, 2023

I've found that one of the best ways to demonstrate new features such as this one is to add an end-to-end test

Check out progenitor-impl/tests/output/openai-openapi-*.out. This API features multiple files in one operation and optionals as well.

For an integrated example see: https://github.com/ahirner/openai-progenitor

decide and finalize how to typify

The outputs are fully functional AFAIC. My last gripe is that multipart/form-data body structs are emitted twice. Once with file fields factored out and once from the original ref. My idea would be to inline those definitions before adding them:

self.type_space.add_ref_types(schemas)?;

LMK what you think!
Btw, I think there is also the opportunity to DRY the token generating methods.

progenitor-client/src/progenitor_client.rs Show resolved Hide resolved
progenitor-impl/src/method.rs Outdated Show resolved Hide resolved
progenitor-impl/src/method.rs Outdated Show resolved Hide resolved
progenitor-impl/src/method.rs Show resolved Hide resolved
progenitor-impl/src/method.rs Outdated Show resolved Hide resolved
progenitor-impl/src/method.rs Outdated Show resolved Hide resolved
progenitor-impl/src/method.rs Outdated Show resolved Hide resolved
progenitor-impl/src/method.rs Outdated Show resolved Hide resolved
progenitor-impl/src/method.rs Show resolved Hide resolved
progenitor-impl/src/lib.rs Outdated Show resolved Hide resolved
ReferenceOrExt already does everything we need, amazing
.. because a cloned and pruned body type is created separately.
We probalby want to get rid of redundant body structs iff. they
are referenced in multipart/form-data contents only
@ahirner
Copy link
Contributor Author

ahirner commented Oct 30, 2023

PR is complete and all comments so far adressed.

@ahl
Copy link
Collaborator

ahl commented Nov 13, 2023

PR is complete and all comments so far adressed.

Sounds good. I'll try to take a first pass this week.

@ahirner
Copy link
Contributor Author

ahirner commented Dec 11, 2023

Hi @ahl, is there maybe something I can do in the meantime?

Copy link
Collaborator

@ahl ahl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not exhaustive, but what I could get through. I'd encourage you to generalize from these comments and go through the PR yourself again. Thank you for the submission and my apologies for the delay reviewing it.

Comment on lines +14 to +15
The others should not be changed other than updating them from their source repository.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revert?

Comment on lines +146 to +151
/// Binary or text parts of a multipart/form-data body.
/// The bool signifies if it is required.
// TODO any other body content ought to become optional,
// after which flag would be redundant
// (see comment in OperationParameterKind)
FormData(bool),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Binary or text parts of a multipart/form-data body.
/// The bool signifies if it is required.
// TODO any other body content ought to become optional,
// after which flag would be redundant
// (see comment in OperationParameterKind)
FormData(bool),
/// Binary or text parts of a multipart/form-data body.
FormData,

It doesn't appear that this is needed for the test cases provided and if we're going to support optional bodies, let's do it properly in OpationalParameterKind

Comment on lines +1617 to +1619
if let OperationParameterKind::Body(
BodyContentType::FormData(required),
) = param.kind
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe if let .. else { unreachable!() };

}
}

pub fn generate_tokens(&mut self, spec: &OpenAPI) -> Result<TokenStream> {
fn add_ref_types(&mut self, spec: &OpenAPI) -> Result<()> {
validate_openapi(spec)?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this has to do with what I infer to be the purpose of this function.

// Take unquoted String Value
Some(v) => Ok(v.to_string()),
// otherwise convert to quoted json
None => serde_json::to_string(value).map_err(|_| {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't this make serde_json into a mandatory dependency for generated crates?

@@ -308,15 +337,24 @@ impl Generator {
};

let version_str = &spec.info.version;
let (to_form_string, pub_part) = if self.uses_form_parts {
(
Some(quote! { ,to_form_string }),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use it unconditionally (note #[allow(unused_imports)])

Comment on lines +219 to +231
// collect all component names in request body media references
// that are only used for multipart/form-data operations.
let media_conents = spec
.paths
.iter()
.flat_map(|(_path, ref_or_item)| {
// Exclude externally defined path items.
let item = ref_or_item.as_item().unwrap();
item.iter().map(move |(_method, operation)| operation)
})
.flat_map(|operation| operation.request_body.iter())
.filter_map(|b| b.as_item().map(|b| &b.content))
.flat_map(|body_content| body_content.iter());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we're trying to exclude these so we don't generate types that aren't used?

validate_openapi(spec)?;

// collect all component names in request body media references
// that are only used for multipart/form-data operations.
let media_conents = spec
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo

.iter()
.flat_map(|(_path, ref_or_item)| {
// Exclude externally defined path items.
let item = ref_or_item.as_item().unwrap();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this valid?

/// Return all component names where schema references are keyed
/// by multipart/form-data but nowhere else. The last part of a media
/// schema's json path reference is considered the component name.
fn get_exclusive_formdata_refs<'a>(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these checks seem insufficient to validate if referenced types are used elsewhere e.g. what if the type is used as a body parameter or response parameter. I may be misunderstanding what's going on here, but I don't think we should do any of this.

Copy link
Contributor Author

@ahirner ahirner Dec 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if the type is used as a body parameter or response parameter

That's what this function should find out. Alas, it's only partially validated in tests, not provably.

what's going on here

It's to avoid that multipart/form-data body structs are emitted twice. Once with file fields factored out and once from the original ref.

should do any of this

I think so too, but haven't found a cooler way. My initial idea was way to inline types earlier, but I think the current typify API didn't let me. Maybe I'll find a better approach still.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest we just not do anything clever here: generating orphaned types is fine--we already do it!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok... good to know! Then this PR might come to a conclusion.
It's only that without f8baafa, the number of orphaned types was very, very large for our internal uses. This simply went against my sense of aesthetics. Technically, I don't think it added compile times really.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's probably a bigger, more holistic look one might take: start from all operations, see the types they touch. We could have special handling for these multi-part types (i.e. use in those cases wouldn't count as real use)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. OTOH, the more I thought about that, the larger the PR grew in my mind as well ;)
Intuitively, every operation could probably inferred from a generalized query into a graph of schema units.

@ahirner ahirner marked this pull request as draft December 13, 2023 18:59
@ahirner
Copy link
Contributor Author

ahirner commented Mar 10, 2024

Obviously this PR stalled due to my lack of effort.
Overall I suggest to revisit it after #384. #177 would make the generated code leaner also.

@ahirner ahirner closed this Mar 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

content multipart/form-data is not supported
3 participants