Align information in relation to new architecture #270
Conversation
"source_type": "local", | ||
"source_data": "var/catalog_product.csv" | ||
} | ||
"format": { |
I think we should have a format_type such as csv, xml, etc.; I do understand we will currently only support CSV.
Instead of calling it "csv_separator", we could use "delimiter", which is a commonly understood term, and also drop the "csv_" prefix.
https://www.php.net/manual/en/function.fgetcsv.php
We can also look at some of the optional parameters in the link above, for instance "escape", "enclosure", and "length"; some of them are very commonly used in file-upload scenarios.
About "csv_" - agree here, as we have type defined in URL, we can use just params as "delimiter", "enclosure" and I will add "escape", make sense.
Length I guess its not really required param in this case. Cause we are basically defining how we will read file, which parts of file, etc. I never had a real usage of "length" param in csv import workflows.
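For clarity, here is a minimal sketch of how the "format" block could look after this renaming; the sample values are placeholders and this is an assumption based on the thread rather than the final spec:

```
"format": {
    "delimiter": ",",
    "enclosure": "\"",
    "escape": "\\"
}
```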
@@ -23,16 +26,15 @@ Path is relative from Magento Root folder

```
{
Do we think this can be used for multi-select or multi-file upload?
Not really. Again, this was built based on practical experience. A single file transfer can already be quite resource-consuming; multiple files would be risky.
If a customer wants to upload multiple files, the Bulk API is available for that case.
OK, I was just thinking from a UI perspective; sometimes we enable a multi-select option for small files. But in this case it could be risky and may overcomplicate things.
Yep, I would not include multiple files in the MVP in any case, especially since we have the Bulk API on board. But this can be discussed in future steps if we get requests from users.
Align format fields
@@ -207,70 +160,88 @@ Will return list of Source that was uploaded before.
}
```

## Start File Import Endpoint
### Base CSV Parser
Usually in a streaming-based file import process, we take records from the data stream and process them as we fetch, parse, and push them. Is there a specific customer requirement to separate upload and import from each other? This design will inherently make the process slower, since you have to upload the file first, and only when that is fully complete do you start the actual import. From the customer's perspective, they want to import the data into Magento as quickly as possible.
On the other hand, since uploading involves network latency and disk reads (I/O latency), it is usually a time-consuming step, so it is generally advisable to use a streaming-based approach end to end: because a lot of time is spent waiting on I/O, the CPU is better off continuing to process what it already has in hand, so as you receive records you can parse and push them simultaneously.
Since the final import will be based on queue processing, this should not cause deadlock issues.
We had multiple long discussions on this topic.
The main idea of splitting the process is to separate actions that can be independent from each other.
We are talking about decomposition of processes, and this approach has multiple benefits:
- We have our own set of imported data, so we can always see which data was uploaded, when, and by whom.
- Each file can be processed independently, so you can restart the import for a file as many times as you want, define different mappings for the same file, or fix errors in your mapping or processing rules without having to re-upload the whole file again and again.
- It is possible, and there are cases where, the same file can be used for different import types: for example, stock and price can be two different imports but be located in the same file.
- Source is a separate module, which means you can simply remove it. For Import it does not matter where the file came from: you can use a third-party extension, or create your own module that places the file in the correct location and creates the required entity in the database, and it will still work. The same applies in the other direction: you can upload a file and then create any custom script that processes it, and in that case you still use all the standard Magento security features for file validation and upload.
I hope it makes sense :-)
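To illustrate the decomposition, a hypothetical two-step flow could look roughly like this; the endpoint paths and the "source_uuid" field are assumptions for illustration only, while the other fields come from the proposal. First the source is uploaded and registered:

```
POST /V1/import/source/csv
{
    "source_type": "local",
    "source_data": "var/catalog_product.csv",
    "format": {
        "delimiter": ",",
        "enclosure": "\""
    }
}
```

Then one or more imports can be started against the same registered source, each with its own mapping or rules, without re-uploading the file:

```
POST /V1/import/start
{
    "source_uuid": "uuid-returned-or-supplied-earlier",
    "import_type": "catalog_product",
    "validation_strategy": "string",
    "allowed_error_count": 0
}
```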
I believe we can still achieve all these points using a streaming-based approach. You can allow re-importing a file that has already been imported by updating only the import settings or format; nothing prevents that. I have been involved in numerous data migration / ETL projects in the past and have seen these issues many times.
Maybe we can improve this process after the MVP, but I just want to make it clear that this will not be performant for customers in many scenarios, and it will not use the potential of a streaming-based approach.
But I am absolutely okay if you want to move forward with this approach and test the waters; we can improve it later on.
Agree!! One more point I forgot to mention is the complexity of such a request. We already have the issue that each file format needs its own endpoint and its own interfaces because each has a custom format. If we also add the mapping information to the body, the body becomes even more complex and heavy.
Since we are providing an API for this, I believe that anyone integrating it with their own system can also build it as a single request.
In any case, I agree with you: let's try it and see how it works and what the user experience is like. So far we have had good experience with this on our project. Let's see.
Sure, I am okay with that. For large data sets or files, this sequential implementation has never performed at the desired rate for me in the past, but we can always improve on it; this is not something fundamentally right or wrong. It comes down to needs: an end-to-end streaming approach is much more efficient in terms of memory footprint and CPU processing, which is why all modern systems use it.
"csv_delimiter": "string", | ||
"multiple_value_separator": "string" | ||
} | ||
"uuid": "UUID", |
Will this UUID be returned by the system/API itself rather than supplied by the user? If so, we can remove it from the request, and we can also name it simply an ID rather than a UUID, since UUID is only one strategy for generating IDs and could change in the future. We should not bake the underlying ID-generation strategy into the name of the entity ID. During import, I believe we use the name source_id; we should be consistent in naming as well.
The UUID can be defined by the client, but if it is not set, we generate it on the Magento side; that is why this parameter is here.
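As a small sketch, reusing field names from this proposal, a client can supply its own identifier:

```
{
    "uuid": "123e4567-e89b-12d3-a456-426614174000",
    "source_type": "local",
    "source_data": "var/catalog_product.csv"
}
```

If "uuid" is omitted from the request, Magento generates one and returns it in the response.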
I understand your point about ID naming.
It is the correct idea if we are talking about a system that follows this pattern everywhere.
Unfortunately that is not the case in Magento, where _id fields throughout the system are database IDs.
At the same time, you are right that the user should not need to be aware of how the ID is generated.
In our implementation it is explicitly a UUID, because that is how it currently works at the code level.
May I ask that we leave it as it is for now and have another discussion about it?
By the way, you could raise this topic at the next architectural meeting, so there will be more people to discuss it, and it may define a pattern for a future Magento standard for ID naming.
OK, I will raise this at the architecture meeting.
}
"import_images_file_dir": "string"
},
"convertingRules": [
1. I am not sure I understand how to call this API; the intent of the parameters is not clear, to be honest.
2. What is the use case for sorting while importing? Usually sorting of any kind doesn't happen during data importing. Is this sorting of rules, i.e. the sequence of rule execution?
That's fine, it's not so easy at first glance :-)
convertingRules define how values can be changed during import to achieve our goals.
At the same time, it is not a required parameter: out of the box, without this parameter, we support the standard Magento CSV format.
But imagine the real world, where each system exports its own format. This means we have to provide the possibility to configure the import source so that the customer can import any CSV data structure they want.
This is where convertingRules come in.
In the code we have a sub-module, RulesConverter.
The module takes all the data we give it (in this case, read from the source file).
The converter then applies the rules to this data.
Example: if the import file has Status == Yes/No, we have a rule that converts this to 0/1, and so on for all required non-standard Magento fields.
It then returns the changed input data, ready for import into Magento.
Sort order is required because you may have multiple manipulations of the same field, so the order of execution has to be correct.
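As a hedged sketch of the Yes/No example, the rule name "replace_value" and the parameter keys below are illustrative assumptions; only the "convertingRules" array and the idea of an execution order come from the proposal:

```
"convertingRules": [
    {
        "name": "replace_value",
        "sort_order": 1,
        "parameters": { "field": "status", "from": "Yes", "to": "1" }
    },
    {
        "name": "replace_value",
        "sort_order": 2,
        "parameters": { "field": "status", "from": "No", "to": "0" }
    }
]
```

Because both rules manipulate the same "status" field, "sort_order" makes the execution sequence explicit.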
Thanks for the clarification.
"delimiter": "string", | ||
"enclosure": "string", | ||
"escape": "string", | ||
"multiple_value_separator": "string" |
What is the purpose of this "multiple_value_separator"?
For example, a multiselect attribute value:
"color": "Red|Green|White"
In this case | is the multiple value separator.
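In other words, a small sketch assuming the format block from this spec:

```
"format": {
    "delimiter": ",",
    "multiple_value_separator": "|"
}
```

With this configuration, a CSV cell containing Red|Green|White is split into the three values Red, Green, and White for the multiselect attribute.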
Okay, so should it be multiple_value_delimiter, to match the convention above?
Makes sense, aligned.
"import_type": "catalog_product", | ||
"validation_strategy": "string", | ||
"allowed_error_count": 0, | ||
"import_image_archive": "string", |
Are we planning to import images as well? If yes, I'm not sure where they will land; it could be a CDN, etc.
Yes, we are planning to, and currently they will be imported into Magento.
Out of the box, Magento saves all images under pub/media/.
At the same time, the Magento API accepts images as base64-encoded content.
So on our side we don't really care where the images are stored; we just take what the customer gives us and forward it to Magento in base64 form.
The customer provides images by uploading them to the Magento filesystem, and they will be deleted after import.
Ideally we should also support CDN storage, and Magento already has some support for it, but for our module I think that is currently out of scope for the MVP.
It is an interesting point for the future; we can groom where the user can place images for import.
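A hedged sketch of how the image-related parameters from this proposal might be filled in; the paths are illustrative assumptions, while the field names come from the spec above. Images are placed on the Magento filesystem, referenced from the import request, forwarded to Magento as base64-encoded content, and removed after the import:

```
{
    "import_type": "catalog_product",
    "import_image_archive": "var/import/images.zip",
    "import_images_file_dir": "var/import/images"
}
```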
Update conversion rules
This one can be closed. A new one will come.
Description
I aligned the information about the architectural vision of Asynchronous Import.
Reviewers
@naydav @paliarush