Kafka Connect HTTP Connector

Disclaimer

This project is a fork of castorm/kafka-connect-http. It aims to improve on the latest implementation and to keep the connector maintained, since support for the original project is virtually nonexistent.

This connector is for you if

Source Connector

io.github.comrada.kafka.connect.http.HttpSourceConnector


Timer: Throttling HttpRequest

Controls the rate at which HTTP requests are performed by telling the task how long remains until the next execution is due.

http.timer

public interface Timer extends Configurable {

    Long getRemainingMillis();

    default void reset(Instant lastZero) {
        // Do nothing
    }
}

Throttling HttpRequest with FixedIntervalThrottler

Throttles rate of requests based on a fixed interval.

http.timer.interval.millis

Interval in between requests
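
The fixed-interval behaviour can be sketched roughly as follows. This is a minimal illustration and not the connector's FixedIntervalThrottler: the class name FixedIntervalTimerSketch and the injected Clock are assumptions made so the example runs on its own, without the Configurable plumbing.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneOffset;

// Hypothetical stand-in, not the connector's own class.
class FixedIntervalTimerSketch {

    private final long intervalMillis; // plays the role of http.timer.interval.millis
    private final Clock clock;         // injected so the example is deterministic
    private Instant lastZero;

    FixedIntervalTimerSketch(long intervalMillis, Clock clock) {
        this.intervalMillis = intervalMillis;
        this.clock = clock;
        this.lastZero = clock.instant();
    }

    // Milliseconds left until the next request is due, never negative.
    long getRemainingMillis() {
        long elapsed = Duration.between(lastZero, clock.instant()).toMillis();
        return Math.max(0L, intervalMillis - elapsed);
    }

    // Restarts the interval from the given instant.
    void reset(Instant lastZero) {
        this.lastZero = lastZero;
    }

    public static void main(String[] args) {
        Clock fixed = Clock.fixed(Instant.parse("2020-01-01T00:00:00Z"), ZoneOffset.UTC);
        FixedIntervalTimerSketch timer = new FixedIntervalTimerSketch(60_000L, fixed);
        timer.reset(Instant.parse("2019-12-31T23:59:45Z")); // last request 15 seconds ago
        System.out.println(timer.getRemainingMillis());     // prints 45000
    }
}
```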

Throttling HttpRequests with AdaptableIntervalThrottler

Throttles rate of requests based on a fixed interval. It has, however, two modes of operation, with two different intervals:

http.timer.interval.millis

Interval in between requests when up-to-date

http.timer.catchup.interval.millis

Interval in between requests when catching up


HttpRequestFactory: Creating a HttpRequest

The first thing our connector needs to do is create a HttpRequest.

http.request.factory

public interface HttpRequestFactory extends Configurable {

    HttpRequest createRequest(Offset offset);
}

http.offset.initial

Initial offset, comma separated list of pairs.

Creating a HttpRequest with TemplateHttpRequestFactory

This HttpRequestFactory is based on template resolution.

http.request.method

Http method to use in the request.

http.request.url

Http url to use in the request.

http.request.headers

HTTP headers to use in the request, as a comma (,) separated list of colon (:) separated pairs.

http.request.params

HTTP query parameters to use in the request, as an ampersand (&) separated list of equals (=) separated pairs.

http.request.body

Http body to use in the request.
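
To make the header and query parameter formats above concrete, here is a small sketch of how such pair lists could be split. PairListSketch and its parse method are hypothetical names for illustration. The connector's actual parsing may differ, for example in how it treats whitespace or separators inside values.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the connector.
class PairListSketch {

    // Splits "k1<kv>v1<pair>k2<kv>v2" into an ordered map, trimming whitespace.
    static Map<String, String> parse(String list, String pairSeparator, String keyValueSeparator) {
        Map<String, String> pairs = new LinkedHashMap<>();
        if (list == null || list.trim().isEmpty()) {
            return pairs;
        }
        for (String pair : list.split(Pattern.quote(pairSeparator))) {
            int idx = pair.indexOf(keyValueSeparator);
            if (idx < 0) {
                throw new IllegalArgumentException("Missing '" + keyValueSeparator + "' in: " + pair);
            }
            String key = pair.substring(0, idx).trim();
            String value = pair.substring(idx + keyValueSeparator.length()).trim();
            pairs.put(key, value);
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Headers: comma between pairs, colon between name and value.
        System.out.println(parse("Accept: application/json, X-Trace: abc", ",", ":"));
        // Query parameters: ampersand between pairs, equals between name and value.
        System.out.println(parse("after=2020-01-01&limit=100", "&", "="));
    }
}
```

Note that this naive split would break on header values that themselves contain a comma.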

http.request.template.factory

public interface TemplateFactory {

    Template create(String template);
}

public interface Template {

    String apply(Offset offset);
}

Class responsible for creating the templates that will be used on every request.
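
As an illustration of the TemplateFactory and Template contract, the following sketch resolves ${offset.<name>} placeholders with a regular expression. It uses local stand-in interfaces and simplifies the Offset to a plain map, both of which are assumptions made to keep the example self-contained. It is not the FreeMarker-based implementation described in the next section.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; the nested interfaces only mirror the shape of the real ones.
class PlaceholderTemplateSketch {

    interface Template {
        String apply(Map<String, String> offset); // the real interface takes an Offset
    }

    interface TemplateFactory {
        Template create(String template);
    }

    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{offset\\.([A-Za-z0-9_]+)\\}");

    // Factory whose templates replace ${offset.<name>} with the offset value,
    // or with an empty string when the offset has no such property.
    static final TemplateFactory FACTORY = template -> offset -> {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuffer resolved = new StringBuffer();
        while (matcher.find()) {
            String value = offset.getOrDefault(matcher.group(1), "");
            matcher.appendReplacement(resolved, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(resolved);
        return resolved.toString();
    };

    public static void main(String[] args) {
        Template params = FACTORY.create("after=${offset.timestamp}");
        System.out.println(params.apply(
                java.util.Collections.singletonMap("timestamp", "2020-01-01T00:00:00Z")));
    }
}
```

The template is created once and applied on every request, so any expensive preparation belongs in create rather than in apply.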

Creating a HttpRequest with FreeMarkerTemplateFactory

FreeMarker templates will have the following data model available:

Accessing any of the above within a template can be achieved like this:

http.request.params=after=${offset.timestamp}

For an Epoch representation of the same string, FreeMarker built-ins should be used:

http.request.params=after=${offset.timestamp?datetime.iso?long}
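
For reference, applying ?long to a FreeMarker date value yields the number of milliseconds since the epoch. The plain Java sketch below shows the same conversion for an ISO-8601 UTC timestamp; the class name EpochSketch is hypothetical, and Instant.parse is generally stricter than FreeMarker's ISO parsing about the input formats it accepts.

```java
import java.time.Instant;

// Hypothetical helper showing the equivalent conversion in plain Java.
class EpochSketch {

    // Converts an ISO-8601 instant such as 2020-01-01T00:00:00Z to epoch milliseconds.
    static long toEpochMillis(String isoTimestamp) {
        return Instant.parse(isoTimestamp).toEpochMilli();
    }

    public static void main(String[] args) {
        System.out.println(toEpochMillis("2020-01-01T00:00:00Z")); // prints 1577836800000
    }
}
```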

For a complete understanding of the features provided by FreeMarker, please refer to the User Manual.


HttpClient: Executing a HttpRequest

Once our HttpRequest is ready, we have to execute it to get some results out of it. That’s the purpose of the HttpClient.

http.client

public interface HttpClient extends Configurable {

    HttpResponse execute(HttpRequest request) throws IOException;
}

Executing a HttpRequest with OkHttpClient

Uses an OkHttp client.

http.client.connection.timeout.millis

Timeout for opening a connection

http.client.read.timeout.millis

Timeout for reading a response

http.client.connection.ttl.millis

Time to live for the connection

http.client.proxy.host

Hostname of the HTTP Proxy

http.client.proxy.port

Port of the HTTP Proxy

http.client.proxy.username

Username of the HTTP Proxy

http.client.proxy.password

Password of the HTTP Proxy

HttpAuthenticator: Authenticating a HttpRequest

When executing the request, authentication might be required. The HttpAuthenticator is responsible for resolving the Authorization header to be included in the HttpRequest.

http.auth

public interface HttpAuthenticator extends Configurable {

    Optional<String> getAuthorizationHeader();
}

Authenticating with ConfigurableHttpAuthenticator

Allows selecting the authentication type via configuration property

http.auth.type

Type of authentication

Authenticating with BasicHttpAuthenticator

Authenticates using HTTP Basic authentication with the credentials below.

http.auth.user
http.auth.password
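
The Authorization header for the standard HTTP Basic scheme is "Basic " followed by the Base64 encoding of user:password. A minimal sketch is shown below. BasicAuthSketch is a hypothetical name, and returning an empty Optional when no user is configured is an assumption, not documented connector behaviour.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Optional;

// Hypothetical sketch of resolving a Basic Authorization header.
class BasicAuthSketch {

    static Optional<String> authorizationHeader(String user, String password) {
        if (user == null || user.isEmpty()) {
            return Optional.empty(); // assumption: no credentials means no header
        }
        String credentials = user + ":" + (password == null ? "" : password);
        String encoded = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return Optional.of("Basic " + encoded);
    }

    public static void main(String[] args) {
        // "user" and "pass" stand in for http.auth.user and http.auth.password.
        System.out.println(authorizationHeader("user", "pass").orElse("<none>"));
        // prints Basic dXNlcjpwYXNz
    }
}
```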

HttpResponseParser: Parsing a HttpResponse

Once our HttpRequest has been executed, as a result we’ll have to deal with a HttpResponse and translate it into the list of SourceRecords expected by Kafka Connect.

http.response.parser

public interface HttpResponseParser extends Configurable {

    List<SourceRecord> parse(HttpResponse response);
}

Parsing with PolicyHttpResponseParser

Vets the HTTP response deciding whether the response should be processed, skipped or failed. This decision is delegated to a HttpResponsePolicy. When the decision is to process the response, this processing is delegated to a secondary HttpResponseParser.

HttpResponsePolicy: Vetting a HttpResponse

http.response.policy

public interface HttpResponsePolicy extends Configurable {

    HttpResponseOutcome resolve(HttpResponse response);

    enum HttpResponseOutcome {
        PROCESS, SKIP, FAIL
    }
}

http.response.policy.parser

Vetting with StatusCodeHttpResponsePolicy

Does response vetting based on HTTP status codes in the response and the configuration below.

http.response.policy.codes.process

Comma separated list of code ranges that will result in the parser processing the response

http.response.policy.codes.skip

Comma separated list of code ranges that will result in the parser skipping the response
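
The vetting logic can be sketched as below. Several details are assumptions for illustration only: the from..to range syntax, checking the process list before the skip list, and treating every other status code as a failure. StatusCodePolicySketch is a hypothetical class, not the connector's StatusCodeHttpResponsePolicy.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of status-code based vetting.
class StatusCodePolicySketch {

    enum Outcome { PROCESS, SKIP, FAIL }

    // Parses e.g. "200..299, 404" into inclusive [from, to] pairs.
    // The ".." range syntax is an assumption of this sketch.
    static List<int[]> parseRanges(String config) {
        List<int[]> ranges = new ArrayList<>();
        for (String token : config.split(",")) {
            String range = token.trim();
            if (range.isEmpty()) {
                continue;
            }
            String[] bounds = range.split("\\.\\.");
            int from = Integer.parseInt(bounds[0].trim());
            int to = bounds.length > 1 ? Integer.parseInt(bounds[1].trim()) : from;
            ranges.add(new int[] {from, to});
        }
        return ranges;
    }

    static boolean matches(List<int[]> ranges, int code) {
        for (int[] range : ranges) {
            if (code >= range[0] && code <= range[1]) {
                return true;
            }
        }
        return false;
    }

    // processCodes and skipCodes stand in for the two properties above.
    static Outcome resolve(int statusCode, String processCodes, String skipCodes) {
        if (matches(parseRanges(processCodes), statusCode)) {
            return Outcome.PROCESS;
        }
        if (matches(parseRanges(skipCodes), statusCode)) {
            return Outcome.SKIP;
        }
        return Outcome.FAIL;
    }

    public static void main(String[] args) {
        System.out.println(resolve(200, "200..299", "300..399")); // prints PROCESS
        System.out.println(resolve(304, "200..299", "300..399")); // prints SKIP
        System.out.println(resolve(500, "200..299", "300..399")); // prints FAIL
    }
}
```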

Parsing with KvHttpResponseParser

Parses the HTTP response into a key-value SourceRecord. This process is decomposed in two steps:

http.response.record.parser

public interface KvRecordHttpResponseParser extends Configurable {

    List<KvRecord> parse(HttpResponse response);
}

http.response.record.mapper

public interface KvSourceRecordMapper extends Configurable {

    SourceRecord map(KvRecord record);
}

Parsing with JacksonKvRecordHttpResponseParser

Uses Jackson to look for the records in the response.

http.response.list.pointer

JsonPointer to the property in the response body containing an array of records

http.response.record.pointer

JsonPointer to the individual record to be used as kafka record body. Useful when the object we are interested in is under a nested structure

http.response.record.offset.pointer

Comma separated list of key=/value pairs where the key is the name of the property in the offset, and the value is the JsonPointer to the value being used as offset for future requests. This is the mechanism that enables sharing state in between HttpRequests. HttpRequestFactory implementations receive this Offset.

Special properties:

One of the roles of the offset, even if it is not required for preparing the next request, is to help de-duplicate already seen records by providing a sense of progress, assuming consistent ordering. For example, even if consecutive responses repeat some results because they share the same timestamp, anything prior to the last seen offset will be ignored. See OffsetFilterFactory.

http.response.record.timestamp.parser

Class responsible for converting the timestamp property captured above into a java.time.Instant.

http.response.record.timestamp.parser.pattern

When using DateTimeFormatterTimestampParser, a custom pattern can be specified

http.response.record.timestamp.parser.zone

Time zone of the timestamp. Accepts valid ZoneId identifiers.

http.response.record.timestamp.parser.regex

When using RegexTimestampParser, a custom regex pattern can be specified

http.response.record.timestamp.parser.regex.delegate

When using RegexTimestampParser, a delegate class that parses the timestamp extracted by the regex
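
The interplay of the pattern, zone, and regex settings can be sketched with java.time as follows. TimestampParsingSketch is a hypothetical class. It assumes the pattern carries no zone information of its own and that the first capture group of the regex holds the timestamp text; the connector's own parsers may behave differently.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of pattern-based and regex-delegating timestamp parsing.
class TimestampParsingSketch {

    // Parses a zone-less timestamp with a custom pattern, interpreted in the given zone.
    static Instant parse(String text, String pattern, String zone) {
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern(pattern);
        return LocalDateTime.parse(text, formatter).atZone(ZoneId.of(zone)).toInstant();
    }

    // Extracts the first capture group with the regex, then delegates to parse().
    static Instant parseWithRegex(String text, String regex, String pattern, String zone) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        if (!matcher.find()) {
            throw new IllegalArgumentException("No timestamp found in: " + text);
        }
        return parse(matcher.group(1), pattern, zone);
    }

    public static void main(String[] args) {
        System.out.println(parse("2020-01-01 00:00:00", "yyyy-MM-dd HH:mm:ss", "UTC"));
        System.out.println(parseWithRegex("/Date(2020-01-01 00:00:00)/", "\\((.*)\\)",
                "yyyy-MM-dd HH:mm:ss", "UTC"));
    }
}
```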


Mapping a KvRecord into SourceRecord with SimpleKvSourceRecordMapper

Once we have our KvRecord we have to translate it into what Kafka Connect is expecting: SourceRecords

Embeds the record properties into a common simple envelope to enable schema evolution. This envelope simply contains a key and a value properties with customizable field names.

This is also where we tell Kafka Connect which topic and partition we want to send our record to.

Note: there are projects out there that allow you to infer the schema from your JSON document (e.g. expandjsonsmt).

kafka.topic

Name of the topic where the record will be sent to

http.record.schema.key.property.name

Name of the key property in the key-value envelope

http.record.schema.value.property.name

Name of the value property in the key-value envelope


SourceRecordSorter: Sorting SourceRecords

Some HTTP resources that were not designed for CDC return snapshots with the most recent records first. In these cases de-duplication is especially important, as subsequent requests are likely to produce similar results. The de-duplication mechanisms offered by this connector are order-dependent, as they are usually based on timestamps.

To enable de-duplication in cases like this, we can instruct the connector to assume a specific order direction: ASC, DESC, or IMPLICIT, where IMPLICIT infers the direction from the records’ timestamps.

http.record.sorter

public interface SourceRecordSorter extends Configurable {

    List<SourceRecord> sort(List<SourceRecord> records);
}

http.response.list.order.direction

Order direction of the results in the response list.
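
One possible way to handle the three directions is sketched below, operating on bare timestamps rather than SourceRecords. Inferring the IMPLICIT direction by comparing the first and last timestamps is an assumption of this sketch, and OrderDirectionSketch is a hypothetical class; the connector may infer the direction differently.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of normalising response order to ascending.
class OrderDirectionSketch {

    enum Direction { ASC, DESC, IMPLICIT }

    // Guesses the direction from the first and last timestamps (assumption).
    static Direction infer(List<Instant> timestamps) {
        if (timestamps.size() < 2) {
            return Direction.ASC;
        }
        Instant first = timestamps.get(0);
        Instant last = timestamps.get(timestamps.size() - 1);
        return first.isAfter(last) ? Direction.DESC : Direction.ASC;
    }

    // Returns a copy in ascending order, reversing only when the input is descending.
    static List<Instant> sort(List<Instant> timestamps, Direction direction) {
        List<Instant> sorted = new ArrayList<>(timestamps);
        Direction effective = direction == Direction.IMPLICIT ? infer(timestamps) : direction;
        if (effective == Direction.DESC) {
            Collections.reverse(sorted);
        }
        return sorted;
    }

    public static void main(String[] args) {
        List<Instant> newestFirst = Arrays.asList(
                Instant.parse("2020-01-03T00:00:00Z"),
                Instant.parse("2020-01-02T00:00:00Z"),
                Instant.parse("2020-01-01T00:00:00Z"));
        System.out.println(sort(newestFirst, Direction.IMPLICIT));
    }
}
```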


SourceRecordFilterFactory: Filtering out SourceRecord

There are cases when we’ll be interested in filtering out certain records. One of these would be de-duplication.

http.record.filter.factory

public interface SourceRecordFilterFactory extends Configurable {

    Predicate<SourceRecord> create(Offset offset);
}

Filtering out SourceRecord with OffsetTimestampRecordFilterFactory

De-duplicates based on Offset’s timestamp, filtering out records with earlier or the same timestamp. Useful when timestamp is used to filter the HTTP resource, but the filter does not have full timestamp precision. Assumptions:

If the latter assumption cannot be satisfied, check OffsetRecordFilterFactory to try to prevent data loss.
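
The timestamp-based filter described above amounts to keeping only records strictly newer than the offset's timestamp. The sketch below shows that predicate on bare timestamps. OffsetTimestampFilterSketch is a hypothetical class, and the real factory works on SourceRecord and Offset instead.

```java
import java.time.Instant;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical sketch of timestamp-based de-duplication.
class OffsetTimestampFilterSketch {

    // Accepts only timestamps strictly after the last seen offset timestamp,
    // so records with an earlier or equal timestamp are filtered out.
    static Predicate<Instant> create(Instant offsetTimestamp) {
        return recordTimestamp -> recordTimestamp.isAfter(offsetTimestamp);
    }

    public static void main(String[] args) {
        Instant lastSeen = Instant.parse("2020-01-02T00:00:00Z");
        List<Instant> response = Arrays.asList(
                Instant.parse("2020-01-01T00:00:00Z"),
                Instant.parse("2020-01-02T00:00:00Z"),
                Instant.parse("2020-01-03T00:00:00Z"));
        List<Instant> fresh = response.stream()
                .filter(create(lastSeen))
                .collect(Collectors.toList());
        System.out.println(fresh); // prints [2020-01-03T00:00:00Z]
    }
}
```

Because equal timestamps are dropped, two distinct records sharing the last seen timestamp would lose the second one, which is the data-loss risk mentioned above.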

Filtering out SourceRecord with OffsetRecordFilterFactory

De-duplicates based on Offset’s timestamp, key and any other custom property present in the Offset. It filters out records with earlier timestamps and, among records with the same timestamp, only those up to the last seen Offset properties. Useful when the timestamp alone is not unique but is unique together with some other Offset property. Assumptions:


HttpResponseTransformer: Transforming a HttpResponse

Once our HttpRequest has been executed and we have a HttpResponse, we can immediately apply a transformation to it. This happens before the HttpResponseParser starts its work. It may be useful in various cases, e.g. to reshape JSON into a more usable form, to cast field types, or to select a subset of fields from a complex data structure. As the transformer receives the full HttpResponse object, it can also modify the response headers and the HTTP status code.

http.response.transformer

public interface HttpResponseTransformer extends Configurable {

    HttpResponse transform(HttpResponse response);
}

Transforming with JsltBodyTransformer

The transformer first checks that the server response has the application/json content type, and fails if it does not. It lets you flexibly convert JSON from one form to another. Examples of the syntax can be found in the official documentation.

http.response.transform.jslt

Development

Building

mvn package

Running the tests

mvn test

Releasing

Contributing

Contributions are welcome via pull requests. Pending the definition of a code of conduct, please just follow the existing conventions.

License

This project is licensed under the Apache 2.0 License. See the LICENSE.txt file for details.

Built With

Acknowledgments