backendspace - Tumblr blog

backendspace · 2 years ago

Static methods and IO

Although static methods can be mocked and tested, they are still bad for testability.

A related practice is to avoid static methods for IO calls.
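
A minimal Python sketch of the idea, using hypothetical names (`FileReader`, `ConfigLoader`, `FakeReader`) that are not from any particular codebase: the IO call sits behind an injected object instead of a static helper, so a test can pass in a fake and never touch the disk.

```python
from pathlib import Path


class FileReader:
    """Real IO: reads a file from disk."""

    def read_text(self, path: str) -> str:
        return Path(path).read_text()


class ConfigLoader:
    """Depends on an injected reader instead of calling a static IO helper."""

    def __init__(self, reader):
        self._reader = reader

    def load(self, path: str) -> dict:
        # Parse simple "key=value" lines; blank lines are skipped.
        result = {}
        for line in self._reader.read_text(path).splitlines():
            if line.strip():
                key, _, value = line.partition("=")
                result[key.strip()] = value.strip()
        return result


class FakeReader:
    """Test double: returns canned text, no disk access."""

    def __init__(self, text: str):
        self._text = text

    def read_text(self, path: str) -> str:
        return self._text


# In a test, the fake replaces the real reader without any mocking framework.
loader = ConfigLoader(FakeReader("host=localhost\nport=8080\n"))
config = loader.load("ignored.properties")
```

In production the same `ConfigLoader` would receive a `FileReader()`; only the wiring changes, not the logic under test.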

backendspace · 3 years ago

Authentication: authn

Authorization: authz

backendspace · 3 years ago

Erasure coding is an encoding applied to the data that lets HDFS keep a level of fault tolerance comparable to 3x replication while cutting the storage overhead from 200% to about 50% (for example with a Reed-Solomon scheme of 6 data cells and 3 parity cells).
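
The overhead numbers can be checked with a little arithmetic. This is only a sketch: the helper `storage_overhead` is a made-up name, and the 6 data plus 3 parity layout is assumed as the example scheme.

```python
def storage_overhead(data_units: int, redundant_units: int) -> float:
    """Extra storage as a fraction of the original data size."""
    return redundant_units / data_units


# 3x replication: 1 original copy plus 2 extra copies.
replication_overhead = storage_overhead(data_units=1, redundant_units=2)

# Reed-Solomon with 6 data cells and 3 parity cells per stripe.
erasure_overhead = storage_overhead(data_units=6, redundant_units=3)

print(f"3x replication: {replication_overhead:.0%} overhead")
print(f"RS(6,3): {erasure_overhead:.0%} overhead")
```

So 6 blocks of data cost 18 blocks on disk with 3x replication but only 9 blocks with this erasure coding layout.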

backendspace · 3 years ago

Fanout

Fanin

backendspace · 3 years ago

Circuit breaker

Hystrix, from Netflix, is a library that implements the circuit breaker pattern.
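
A minimal sketch of the pattern itself in Python. This is not the Hystrix API; the class, its parameters (`failure_threshold`, `reset_timeout`) and the `flaky` function are assumptions made for illustration.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying it."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                # Open: fail immediately instead of waiting on a sick service.
                raise CircuitOpenError("circuit is open")
            # Half-open: the timeout elapsed, so let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result


def flaky():
    """Hypothetical downstream call that always fails."""
    raise ConnectionError("downstream unavailable")


breaker = CircuitBreaker(failure_threshold=2, reset_timeout=60.0)
outcomes = []
for _ in range(3):
    try:
        breaker.call(flaky)
    except CircuitOpenError:
        outcomes.append("rejected")
    except ConnectionError:
        outcomes.append("failed")
```

After two real failures the third call is rejected immediately, without touching the downstream service at all.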

backendspace · 3 years ago

Errors in one service can cascade to the other services that depend on it.

backendspace · 3 years ago

Immediate failures are good, timeouts are bad: a caller waiting on a timeout keeps holding its thread and connection the whole time.

backendspace · 3 years ago

Spring Boot has a concept of a service thread pool (the threads that serve incoming requests), and the number of threads can be adjusted.

backendspace · 3 years ago

Service mesh

Manage the microservices without writing code to manage them. An example of a service mesh is Istio plus the Envoy proxy.

It follows the sidecar pattern: a proxy instance runs alongside each application server and handles these operations on its behalf.

It can also help with load balancing, routing (based on the user agent, say), etc.

I have to check if Kubernetes can do something similar.

backendspace · 3 years ago

Want to speed things up? Run close to the data.

backendspace · 3 years ago

If you are using a lossy Kafka cluster, these are the reasons for message loss:

Executor crash or restart

Kafka REST Proxy restart

Kafka leader failover

backendspace · 3 years ago

If you cannot solve for all, solve for the most important ones.

backendspace · 3 years ago

Data from a Kafka topic can be dumped into a Hive table; I am not sure if this is available out of the box.

backendspace · 3 years ago

Hive is also used like this:

A Spark application writes its output into a Hive table t1, then a second Spark application reads from t1 and writes to table t2, and a third application reads from t2 and writes to t3.

These stages do not run back to back as one sequence. Spark, AFAIK, is used for batch processing, so each of the applications runs on its own fixed interval.
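
A toy simulation in plain Python of what that means for latency. It is not Spark or Hive code: the dict stands in for tables t1 to t3 and `tick` stands in for a scheduler, both assumptions made for illustration. Each job only sees what its upstream table held at the previous interval.

```python
# In-memory stand-ins for the Hive tables.
tables = {"t1": [], "t2": [], "t3": []}


def job_a(batch):
    """First application: writes new source rows to t1."""
    tables["t1"].extend(batch)


def job_b():
    """Second application: copies rows from t1 that t2 does not have yet."""
    tables["t2"].extend(tables["t1"][len(tables["t2"]):])


def job_c():
    """Third application: copies rows from t2 that t3 does not have yet."""
    tables["t3"].extend(tables["t2"][len(tables["t3"]):])


def tick(new_rows):
    """One scheduling interval. Downstream jobs run first to mimic jobs that
    fire at the same time and read the previous interval's output."""
    job_c()
    job_b()
    job_a(new_rows)


tick(["row1"])  # row1 lands in t1
tick(["row2"])  # row1 reaches t2, row2 lands in t1
tick([])        # row1 reaches t3, row2 reaches t2
```

In this model a row needs three intervals to travel from the source to t3, which is the price of chaining independent batch jobs through tables.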

backendspace · 3 years ago

You need to have different SLAs for different tiers.

Just like the deadlines in Jira for different priorities.

backendspace · 3 years ago

Hive sync issues

Autofix inconsistent tables if possible

Report inconsistent tables that are not fixable

Do both of the above within the freshness SLA
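
The three points above can be sketched as one reconciliation pass. Everything here is hypothetical: `reconcile`, its hooks `is_consistent` and `try_fix`, and the toy `state` dict are illustration only, not a real Hive API.

```python
import time


def reconcile(tables, is_consistent, try_fix, sla_seconds, clock=time.monotonic):
    """Autofix what can be fixed, report the rest, and stop at the SLA deadline.

    is_consistent(table) -> bool and try_fix(table) -> bool (True if fixed)
    are hypothetical hooks supplied by the caller.
    """
    deadline = clock() + sla_seconds
    fixed, unfixable, unchecked = [], [], []
    for table in tables:
        if clock() >= deadline:
            unchecked.append(table)  # ran out of SLA budget
            continue
        if is_consistent(table):
            continue
        if try_fix(table):
            fixed.append(table)
        else:
            unfixable.append(table)
    return {"fixed": fixed, "unfixable": unfixable, "unchecked": unchecked}


# Toy state: one healthy table, one fixable, one not fixable.
state = {"orders": "ok", "users": "stale", "events": "corrupt"}


def fix_stale(table):
    """Pretend fix: only 'stale' tables can be repaired."""
    if state[table] == "stale":
        state[table] = "ok"
        return True
    return False


report = reconcile(
    tables=list(state),
    is_consistent=lambda t: state[t] == "ok",
    try_fix=fix_stale,
    sla_seconds=60,
)
```

The `unfixable` and `unchecked` lists are what would go into the report; anything left in `unchecked` means the pass did not fit inside the freshness SLA.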

backendspace · 3 years ago

Hive

YARN

HDFS

Airflow

Spark

Flink
