Herd Uploader #370

Open
tinshuksingh opened this issue Mar 2, 2018 · 6 comments

@tinshuksingh

tinshuksingh commented Mar 2, 2018

Hi Team,

We created a business object definition and are now trying to upload a file to an S3 bucket using herd-uploader-0.63.0.jar from an EC2 instance.

  • We followed the wiki steps for uploading a file to S3.

  • We are able to pre-register the business object with the registration server successfully, but after that we get a NullPointerException while the uploader reads the directory path.
    manifest.json
    {
      "namespace": "S3UploadNamespace",
      "businessObjectDefinitionName": "S3BusinessDefination",
      "businessObjectFormatUsage": "PRC",
      "businessObjectFormatFileType": "TXT",
      "businessObjectFormatVersion": "0",
      "partitionKey": "PROCESS_DATE",
      "partitionValue": "2014-04-01",
      "storageName": "S3StorageUnit",
      "subPartitionValues": [
        "2014-04-01"
      ],
      "manifestFiles": [
        {
          "fileName": "testFile1.gz",
          "rowCount": 0
        },
        {
          "fileName": "testFile2.gz",
          "rowCount": 0
        }
      ]
    }

    image

  • We then tried to pass the directory path in the manifest.json file, but that failed with an UnrecognizedPropertyException.
    manifest.json
    {
      "namespace": "S3UploadNamespace",
      "businessObjectDefinitionName": "S3BusinessDefination",
      "businessObjectFormatUsage": "PRC",
      "businessObjectFormatFileType": "TXT",
      "businessObjectFormatVersion": "0",
      "partitionKey": "PROCESS_DATE",
      "partitionValue": "2014-04-01",
      "storageUnits": [
        {
          "storageName": "S3StorageUnit",
          "storageDirectory": {
            "directoryPath": "Herd_poc_bucket"
          },
          "storageFiles": [
            {
              "filePath": "testFile1.txt",
              "fileSizeBytes": 0,
              "rowCount": 0
            }
          ],
          "discoverStorageFiles": true
        }
      ],
      "subPartitionValues": [
        "2014-04-01"
      ],
      "manifestFiles": [
        {
          "fileName": "testFile1.gz",
          "rowCount": 0
        },
        {
          "fileName": "testFile2.gz",
          "rowCount": 0
        }
      ]
    }

    image

  • Please let us know if we are missing anything.

Thanks,
Tinshuk
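
For anyone comparing the two manifests above, a rough way to spot the field that likely triggers the UnrecognizedPropertyException is to diff the top-level keys. The sketch below assumes the key set of the first manifest (the one that pre-registered successfully) as the "known" set. That set is taken from this thread only and is not the uploader's official schema; `unexpected_keys` and `KNOWN_KEYS` are hypothetical names.

```python
def unexpected_keys(manifest: dict, known_keys: set) -> list:
    """Return the top-level manifest keys that are not in known_keys, sorted."""
    return sorted(key for key in manifest if key not in known_keys)


# Assumption: keys copied from the manifest in this thread that parsed without error.
KNOWN_KEYS = {
    "namespace",
    "businessObjectDefinitionName",
    "businessObjectFormatUsage",
    "businessObjectFormatFileType",
    "businessObjectFormatVersion",
    "partitionKey",
    "partitionValue",
    "storageName",
    "subPartitionValues",
    "manifestFiles",
}

# Abbreviated copy of the second manifest (the one that was rejected).
candidate = {
    "namespace": "S3UploadNamespace",
    "businessObjectDefinitionName": "S3BusinessDefination",
    "partitionKey": "PROCESS_DATE",
    "partitionValue": "2014-04-01",
    "storageUnits": [{"storageName": "S3StorageUnit"}],
    "subPartitionValues": ["2014-04-01"],
    "manifestFiles": [{"fileName": "testFile1.gz", "rowCount": 0}],
}

if __name__ == "__main__":
    print(unexpected_keys(candidate, KNOWN_KEYS))
```

Under these assumptions the only key flagged is `storageUnits`, which suggests the uploader manifest takes a flat `storageName` rather than a `storageUnits` block.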

@nateiam
Contributor

nateiam commented Mar 2, 2018

Hi Tinshuk -

The Uploader tool is included in our automated test suite in our environment so I believe it should not be too difficult to get working in your environment. And it's a good indication that the pre-registration worked.

I would like to collect some information. But first, I bet you already discovered the Swagger docs that ship with each release. Actually, I think the link is only in the CloudFormation output, so maybe you did not see it. The docs are at /herd-app/docs/rest/index.html, and they will help you with the Storages GET below and many other REST calls you will be making in the future!

Please send:

  • Output from Storages GET for your S3StorageUnit (probably like /herd-app/rest/storages/S3StorageUnit)
  • Full command line including all arguments you are using to call Uploader
  • Attach full logs from the Uploader execution - it's easier for us to have the full logs not just a snippet so we can see some earlier steps.

I am also tagging @kenisteward here who can help troubleshoot. Thanks Tinshuk, Keni!
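
As a small sketch of where those two paths live, the helper below builds the Swagger docs URL and the Storages GET URL from a host and port. The paths come from this comment; the `http` scheme, the host `herd.example.com`, and port `8080` are placeholders, and the function names are hypothetical. Authentication is not covered because it depends on how the Herd instance is configured.

```python
from urllib.parse import quote


def docs_url(host: str, port: int, scheme: str = "http") -> str:
    """URL of the Swagger docs that ship with each release."""
    return f"{scheme}://{host}:{port}/herd-app/docs/rest/index.html"


def storage_url(host: str, port: int, storage_name: str, scheme: str = "http") -> str:
    """URL for a Storages GET on a single storage; the name is percent-encoded."""
    return f"{scheme}://{host}:{port}/herd-app/rest/storages/{quote(storage_name, safe='')}"


if __name__ == "__main__":
    # Placeholder host and port; substitute your registration server.
    print(docs_url("herd.example.com", 8080))
    print(storage_url("herd.example.com", 8080, "S3StorageUnit"))
```

The resulting storage URL can be opened in a browser or passed to any HTTP client to capture the output requested above.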

@tinshuksingh
Author

tinshuksingh commented Mar 5, 2018

Hi @nateiam,

Please find the details you asked for:

  • Output from Storages GET for S3StorageUnit:
    {
    "name": "S3StorageUnit",
    "storagePlatformName": "S3",
    "attributes": []
    }

  • java -jar herd-uploader-0.63.0.jar -a xxx -p xxx/xxx -l /home/xxx-user/herd-uploader -m manifest.json -H ec2-xxx-xxx-xxx-xxx.compute-1.amazonaws.com -P 8080

  • herd-uploader-error.txt

@kenisteward
Contributor

@tinshuksingh

When the uploader performs the actual upload, it uses the BDATA's storage.directorypath to find the actual S3 location.

It looks like your storage doesn't have the attributes that tell Herd where your S3 path is. If you could, try doing a Storage PUT with the following attributes:

{
  "attributes": [
    {
      "name": "bucket.name",
      "value": "yourBucketName"
    }
  ]
}

If this doesn't work let us know. We think this should fix it with minimal changes but there are other knobs we can tweak.
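
A minimal sketch of preparing that payload: it takes a Storages GET response, checks whether `bucket.name` is already present, and builds the attributes body shown above. Only the JSON shape comes from this thread; the helper names are hypothetical, `yourBucketName` is the same placeholder as above, and the actual PUT endpoint should be looked up in the Swagger docs rather than taken from here.

```python
import json


def attribute_names(storage: dict) -> set:
    """Names of all attributes present on a storage response."""
    return {attr["name"] for attr in storage.get("attributes") or []}


def with_attribute(storage: dict, name: str, value: str) -> dict:
    """Return a copy of storage with the attribute set, replacing any existing value."""
    attrs = [a for a in storage.get("attributes") or [] if a.get("name") != name]
    attrs.append({"name": name, "value": value})
    return {**storage, "attributes": attrs}


# Storages GET output as posted earlier in this thread.
storage = {"name": "S3StorageUnit", "storagePlatformName": "S3", "attributes": []}

if "bucket.name" not in attribute_names(storage):
    storage = with_attribute(storage, "bucket.name", "yourBucketName")

if __name__ == "__main__":
    # Body to send with the storage update request.
    print(json.dumps({"attributes": storage["attributes"]}, indent=2))
```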

@tinshuksingh
Author

@kenisteward

I updated the storage with attributes as:

	{
	  "name": "S3StorageUnit",
	  "storagePlatformName": "S3",
	  "attributes": [
	    {
	      "name": "bucket.name",
	      "value": "bucketName"
	    }
	  ]
	}

but I am still getting the same error I mentioned earlier.

@kenisteward
Contributor

@tinshuksingh

Gotcha. Looks like you need to set the keyPrefix for the storage since you can't set the storage directory in the manifest.json. Maybe we can make that a feature of uploader? @nateiam

	{
	  "name": "S3StorageUnit",
	  "storagePlatformName": "S3",
	  "attributes": [
	    {
	      "name": "bucket.name",
	      "value": "bucketName"
	    },
	    {
	      "name": "key.prefix.velocity.template",
	      "value": "your/velocity/key/prefix"
	    }
	  ]
	}

It looks like the herd-uploader's manifest.json doesn't actually allow you to specify the storage directory. Because of this, you'll have to set up the directory path via the storage's key.prefix.velocity.template.

This can be any string. It can also contain the following replaceable variables:

S3 Key Prefix Velocity Template
Variable | Description
$environment | The environment name.
$namespace | The namespace code.
$dataProviderName | The data provider name.
$businessObjectDefinitionName | The name of the business object definition.
$businessObjectFormatUsage | The business object format usage.
$businessObjectFormatFileType | The business object format file type.
$businessObjectFormatVersion | The version of the business object format.
$businessObjectDataVersion | The version of the business object data.
$businessObjectFormatPartitionKey | The partition key which must be pre-registered as part of the business object format.
$businessObjectDataPartitionValue | The business object data primary partition value.
$businessObjectDataPartitions | The ordered map of sub-partition column names to sub-partition values.
$CollectionUtils | org.apache.commons.collections4.CollectionUtils.class

Examples:

$environment/$namespace/$businessObjectDataPartitionValue

$namespace/some/random/choices/$businessObjectFormatFileType/$businessObjectDataPartitionValue
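
To preview roughly what such a template expands to, the sketch below uses Python's `string.Template`, which shares the plain `$name` placeholder syntax. This is only an approximation: it is not Velocity, it does not support Velocity directives or `$CollectionUtils`, and Herd may normalize values (for example case or separators) differently. The environment value `dev` is a made-up placeholder; the other values come from the manifest earlier in this thread.

```python
from string import Template


def render_key_prefix(template: str, values: dict) -> str:
    """Substitute $name placeholders; raises KeyError if a variable is missing."""
    return Template(template).substitute(values)


values = {
    "environment": "dev",  # hypothetical environment name
    "namespace": "S3UploadNamespace",
    "businessObjectFormatFileType": "TXT",
    "businessObjectDataPartitionValue": "2014-04-01",
}

if __name__ == "__main__":
    print(render_key_prefix(
        "$environment/$namespace/$businessObjectDataPartitionValue", values))
    print(render_key_prefix(
        "$namespace/some/random/choices/$businessObjectFormatFileType/"
        "$businessObjectDataPartitionValue", values))
```

With these inputs the two example templates expand to `dev/S3UploadNamespace/2014-04-01` and `S3UploadNamespace/some/random/choices/TXT/2014-04-01` in this sketch; the prefix Herd actually generates should be confirmed against the registered business object data.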

@kenisteward
Contributor

@tinshuksingh Are you still having any issues?

FINRAOSS pushed a commit that referenced this issue Oct 5, 2018
* commit '1d023e3d0c9a357c5ce0474c43a0b7172cfd74d6':
  DM-10691: FactoryCQ - UI - Migrate from Http to HttpClient
  DM-10691: FactoryCQ - UI - Migrate from Http to HttpClient
  DM-10691: FactoryCQ - UI - Migrate from Http to HttpClient