Use log date/time as timestamp, not ingested time #57

Open · wants to merge 3 commits into master
58 changes: 58 additions & 0 deletions README.md
@@ -4,6 +4,15 @@

`s3-log-ingestion-lambda` is an AWS Serverless application that sends log data from an S3 bucket of your choice to New Relic.

## Changes from original

- Provide a meaningful `timestamp`
  - In the original implementation, `timestamp` is set to the time the Lambda is invoked. With this implementation, `timestamp` is set to the time the log record was generated.
- The `date` and `time` attributes are removed.
  - In the Lambda, the `date` and `time` fields of each log record are replaced with a single timestamp in ISO 8601 format (see the sketch below).
  - In New Relic, this ISO 8601 timestamp is used as `timestamp`.
  - You have to define a log parsing rule in New Relic; see the [Setup in New Relic](#setup-in-new-relic-required-for-this-repository) section below.
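
A minimal sketch of that rewrite (not the handler code itself); the log line below is a made-up, truncated CloudFront record used only for illustration:

```python
# Merge the first two tab-separated fields (`date`, `time`) into one
# ISO 8601 timestamp; CloudFront logs record times in UTC, hence the "Z".
raw = "2024-05-01\t12:34:56\tNRT57-P2\t1234\t203.0.113.10\tGET"  # made-up sample

chunks = raw.split("\t")
rewritten = "\t".join([f"{chunks[0]}T{chunks[1]}Z", *chunks[2:]])

print(rewritten)  # 2024-05-01T12:34:56Z <TAB> NRT57-P2 <TAB> 1234 ...
```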

## Requirements

To forward data to New Relic you need access to a [New Relic License Key](https://docs.newrelic.com/docs/accounts/install-new-relic/account-setup/license-key).
@@ -12,6 +21,55 @@

To install and configure the New Relic S3 log shipper Lambda, [see our documentation](https://docs.newrelic.com/docs/logs/enable-new-relic-logs/1-enable-logs/aws-lambda-sending-logs-s3).

**Note: Use the [Manual install using Serverless Framework](https://docs.newrelic.com/docs/logs/forward-logs/aws-lambda-sending-logs-s3/#serverless-install) procedure.**

Example (replace the `YOUR_XXX` placeholders with your own values):

```shell
git clone https://github.com/netmarkjp/aws_s3_log_ingestion_lambda.git
cd aws_s3_log_ingestion_lambda

# sls(serverless framework) parameters
export SERVICE_NAME=YOUR_SERVICE_NAME

# function parameters
export LICENSE_KEY=YOUR_LICENSE_KEY
export LOG_TYPE=cloudfront-web-timestamp
export DEBUG_ENABLED=false
export S3_CLOUDTRAIL_LOG_PATTERN=""
export S3_IGNORE_PATTERN=""
export BATCH_SIZE_FACTOR=""
export ADDITIONAL_ATTRIBUTES='{"aws.accountId": "YOUR_ACCOUNT_ID", "aws.region": "YOUR_REGION"}'

# event parameters
export S3_BUCKET_NAME=YOUR_LOG_BUCKET_NAME
export S3_PREFIX=""

# install serverless plugins if needed
npx serverless plugin install -n serverless-python-requirements
npx serverless plugin install -n serverless-better-credentials

# deploy
npx serverless deploy --region YOUR_REGION --config serverless.yml
```
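
After deploying, one way to exercise the function end to end is to upload a gzipped, CloudFront-style object to the configured bucket; the bucket name, object key, and log contents below are placeholders, and this assumes the stack subscribed the function to that bucket's object-created events:

```python
# Hypothetical smoke test: put a small gzipped CloudFront-style log into the
# bucket so the deployed Lambda is triggered. All values are placeholders.
import gzip

import boto3

sample = (
    "#Version: 1.0\n"
    "#Fields: date time x-edge-location sc-bytes c-ip cs-method\n"
    "2024-05-01\t12:34:56\tNRT57-P2\t1234\t203.0.113.10\tGET\n"
)

s3 = boto3.client("s3")
s3.put_object(
    Bucket="YOUR_LOG_BUCKET_NAME",
    Key="EXAMPLEDISTRIBUTION.2024-05-01-12.abcdef01.gz",
    Body=gzip.compress(sample.encode("utf-8")),
)
```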

## Setup in New Relic (required for this repository)

In your web browser, go to [Logs > Parsing](https://one.newrelic.com/logger/log-parsing) and click `Create parsing rule`.

- Name: `cloudfront-web-timestamp`
- Field to parse: `message`
- Filter logs based on NRQL: `logtype='cloudfront-web-timestamp'`
- Parsing rule: ```^%{TIMESTAMP_ISO8601:timestamp}%{SPACE}%{NOTSPACE:x_edge_location}%{SPACE}%{NOTSPACE:sc_bytes}%{SPACE}%{NOTSPACE:c_ip}%{SPACE}%{NOTSPACE:cs_method}%{SPACE}%{NOTSPACE:cs_host}%{SPACE}%{NOTSPACE:cs_uri_stem}%{SPACE}%{NOTSPACE:sc_status}%{SPACE}%{NOTSPACE:cs_referer}%{SPACE}%{NOTSPACE:cs_user_agent}%{SPACE}%{NOTSPACE:cs_uri_query}%{SPACE}%{NOTSPACE:cs_Cookie}%{SPACE}%{NOTSPACE:x_edge_result_type}%{SPACE}%{NOTSPACE:x_edge_request_id}%{SPACE}%{NOTSPACE:x_host_header}%{SPACE}%{NOTSPACE:cs_protocol}%{SPACE}%{NOTSPACE:cs_bytes}%{SPACE}%{NOTSPACE:time_taken}%{SPACE}%{NOTSPACE:x_forwarded_for}%{SPACE}%{NOTSPACE:ssl_protocol}%{SPACE}%{NOTSPACE:ssl_cipher}%{SPACE}%{NOTSPACE:x_edge_response_result_type}%{SPACE}%{NOTSPACE:cs_protocol_version}%{SPACE}%{NOTSPACE:fle_status}%{SPACE}%{NOTSPACE:fle_encrypted_fields}%{SPACE}%{NOTSPACE:c_port}%{SPACE}%{NOTSPACE:time_to_first_byte}%{SPACE}%{NOTSPACE:x_edge_detailed_result_type}%{SPACE}%{NOTSPACE:sc_content_type}%{SPACE}%{NOTSPACE:sc_content_len}%{SPACE}%{NOTSPACE:sc_range_start}%{SPACE}%{NOTSPACE:sc_range_end}```

Enter the `Parsing rule` as a single line. It is based on the built-in logtype `cloudfront-web`: https://docs.newrelic.com/docs/logs/ui-data/built-log-parsing-rules/#cloudfront

Compared to `cloudfront-web`, `^%{NOTSPACE:date}%{SPACE}%{NOTSPACE:time}` is replaced with `^%{TIMESTAMP_ISO8601:timestamp}`.

Note: Because of the restriction on the length of New Relic parsing rules, we cannot **append** a timestamp attribute, so we have to **replace** the `date` and `time` attributes.
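
As a rough local sanity check of what the rule will capture from a rewritten line, the first two captures can be approximated in plain Python; `TIMESTAMP_ISO8601` and `NOTSPACE` below are hand-written regex stand-ins for the Grok patterns, and the sample line is made up:

```python
import re

rewritten = "2024-05-01T12:34:56Z\tNRT57-P2\t1234\t203.0.113.10\tGET"  # made-up sample

# Approximations of the Grok patterns used by the parsing rule.
TIMESTAMP_ISO8601 = r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:?\d{2})?"
NOTSPACE = r"\S+"

m = re.match(rf"^({TIMESTAMP_ISO8601})\s+({NOTSPACE})", rewritten)
print(m.group(1))  # 2024-05-01T12:34:56Z -> used by New Relic as `timestamp`
print(m.group(2))  # NRT57-P2             -> x_edge_location
```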

## Support

New Relic hosts and moderates an online forum where customers can interact with New Relic employees as well as other customers to get help and share best practices. Like all official New Relic open source projects, there's a [related Community topic in the New Relic Explorers Hub](https://discuss.newrelic.com/t/aws-s3-log-ingestion-lambda/104986).
2 changes: 2 additions & 0 deletions serverless.yml
@@ -31,10 +31,12 @@ provider:

plugins:
  - serverless-python-requirements
  - serverless-better-credentials

custom:
  pythonRequirements:
    dockerizePip: non-linux
    fileName: src/requirements.txt

functions:
  NewRelic-s3-log-ingestion:
11 changes: 11 additions & 0 deletions src/handler.py
@@ -304,6 +304,17 @@ async def _fetch_data_from_s3(bucket, key, context):
        if index % 500 == 0:
            logger.debug(f"index: {index}")
            logger.debug(f"log_batch_size: {log_batch_size}")
        try:
            # https://docs.newrelic.com/docs/logs/ui-data/built-log-parsing-rules/#cloudfront
            # Replace ^%{NOTSPACE:date}%{SPACE}%{NOTSPACE:time} with ^%{TIMESTAMP_ISO8601:timestamp}:
            # merge the date and time fields of a CloudFront record into one ISO 8601 timestamp.
            logger.debug(f"log: {log}")
            if _get_log_type() and _get_log_type() == "cloudfront-web-timestamp" and not log.startswith("#"):
                chunks = str(log).split("\t")
                logger.debug(f"chunks: {chunks}")
                log = "\t".join(str(s) for s in [f"{chunks[0]}T{chunks[1]}Z", *chunks[2:]])
        except Exception as e:
            # On any parse failure, fall back to forwarding the original line untouched.
            logger.debug(e)
            pass
        log_batches.append(log)
        if log_batch_size > (MAX_BATCH_SIZE * BATCH_SIZE_FACTOR):
            logger.debug(f"sending batch: {batch_counter} log_batch_size: {log_batch_size}")