Skip to content
This repository was archived by the owner on Apr 4, 2019. It is now read-only.
This repository was archived by the owner on Apr 4, 2019. It is now read-only.

saving json data , partition by specific field (timestamp)  #58

Description

@doriwaldman

I have a question, data in kafka is in json format, in each event I have a field called"eventTimestamp" which is a long number which represents the event time , I want to save the data in s3 in hourly bucket based on that timestamp, not the time the event was added to Kafka

my settings when I used Kafka s3 connect are :

connector.class=io.confluent.connect.s3.S3SinkConnector
storage.class=io.confluent.connect.s3.storage.S3Storage
format.class=io.confluent.connect.s3.format.json.JsonFormat
schema.generator.class=io.confluent.connect.storage.hive.schema.DefaultSchemaGenerator
partitioner.class=io.confluent.connect.storage.partitioner.TimeBasedPartitioner
timestamp.extractor=RecordField
path.format='year'=YYYY/'month'=MM/'day'=dd/'hour'=HH
timestamp.field=eventTimestamp
partition.duration.ms=10
locale=en_IN
timezone=UTC

I see that streamx support TimeBasedPartitioner but if I understand it can only support to extract RecordField from parquet or avro not from json

Is it possible to do it with json ?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions