backendspace
Backend/Data space
89 posts
Snippets and micro blogging of interesting/nuanced backend & data related stuff
backendspace · 2 years ago
Text
Static methods and IO
Static methods can be mocked and tested, but they still hurt testability.
A related practice is to avoid static methods for IO calls.
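One way to keep IO out of static methods is to inject the thing that does the IO. A minimal sketch, assuming a hypothetical `ReportService` that reads files through an injected reader; every name here is made up for illustration.

```python
from typing import Mapping, Protocol


class Reader(Protocol):
    def read(self, path: str) -> str: ...


class DiskReader:
    """Real implementation: performs the actual file IO."""

    def read(self, path: str) -> str:
        with open(path, encoding="utf-8") as f:
            return f.read()


class FakeReader:
    """Test double: returns canned content and touches no disk."""

    def __init__(self, files: Mapping[str, str]) -> None:
        self.files = files

    def read(self, path: str) -> str:
        return self.files[path]


class ReportService:
    def __init__(self, reader: Reader) -> None:
        self.reader = reader  # injected, so a test can swap it out

    def line_count(self, path: str) -> int:
        return len(self.reader.read(path).splitlines())
```

In production the service would get a `DiskReader`; in a test it gets a `FakeReader`, and nothing has to be patched.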
0 notes
backendspace · 3 years ago
Text
Authentication: authn
Authorization: authz
0 notes
backendspace · 3 years ago
Text
Erasure coding is a way of encoding data that lets HDFS keep fault tolerance comparable to 3x replication while cutting the storage overhead from 200% to about 50% (for example with a Reed-Solomon RS(6,3) policy).
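The overhead numbers are simple arithmetic. A small sketch, assuming 3x replication and a Reed-Solomon RS(6,3) layout (6 data cells plus 3 parity cells per stripe); the helper name is made up.

```python
def storage_overhead(data_units: int, redundant_units: int) -> float:
    """Extra storage as a fraction of the original data size."""
    return redundant_units / data_units


# 3x replication: 1 original block plus 2 extra copies.
replication = storage_overhead(data_units=1, redundant_units=2)

# RS(6,3): 6 data cells plus 3 parity cells per stripe.
rs_6_3 = storage_overhead(data_units=6, redundant_units=3)

print(f"3x replication overhead: {replication:.0%}")  # 200%
print(f"RS(6,3) overhead: {rs_6_3:.0%}")              # 50%
```

3x replication survives the loss of 2 copies, while RS(6,3) survives the loss of any 3 cells in a stripe, so the cheaper layout is not the weaker one.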
0 notes
backendspace · 3 years ago
Text
Fanout
Fanin
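A tiny sketch of the two terms together, assuming a hypothetical `fetch_part` that stands in for per-shard work: fan-out spreads one request across many workers, fan-in merges their results back into one.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def fetch_part(shard: int) -> int:
    """Hypothetical per-shard work; here it just squares the shard id."""
    return shard * shard


def fan_out_fan_in(shards: List[int]) -> int:
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Fan-out: one task per shard, run concurrently.
        partials = list(pool.map(fetch_part, shards))
    # Fan-in: merge the partial results into a single answer.
    return sum(partials)
```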
0 notes
backendspace · 3 years ago
Text
Circuit breaker
Hystrix is a circuit breaker library from Netflix.
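A bare-bones sketch of the pattern itself, not of how Hystrix implements it. The class name, thresholds, and half-open behaviour are assumptions picked for illustration.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying it."""


class CircuitBreaker:
    def __init__(self, max_failures: int = 3, reset_after: float = 30.0) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after  # seconds to stay open
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                # Open: fail immediately instead of waiting on a sick dependency.
                raise CircuitOpenError("circuit is open")
            # Cool-down elapsed: half-open, let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = time.monotonic()  # (re)open the circuit
            raise
        # A success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Wrap calls to a flaky dependency as `breaker.call(fetch, url)`. Once the failure threshold is hit, callers get `CircuitOpenError` right away until the cool-down passes.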
0 notes
backendspace · 3 years ago
Text
A failure in one service can cascade into errors in the other services that depend on it.
0 notes
backendspace · 3 years ago
Text
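Immediate failures are good, timeouts are bad: a fast failure frees the caller right away, while a timeout ties up threads and connections until it expires.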
0 notes
backendspace · 3 years ago
Text
Spring Boot has a thread pool for serving requests, and the number of threads in it can be tuned.
0 notes
backendspace · 3 years ago
Text
Service mesh
Manage the microservices without writing code to manage them. An example of a service mesh is Istio with the Envoy proxy.
It follows the sidecar pattern: a proxy instance runs alongside each application server and handles these operations for it.
It can also help with load balancing, routing based on, say, the user agent, etc.
I have to check if Kubernetes can do something similar.
0 notes
backendspace · 3 years ago
Text
Want to speed things up? Run close to the data.
0 notes
backendspace · 3 years ago
Text
If using a lossy Kafka cluster, here are the reasons for message loss:
Executor crash or restart
Kafka REST proxy restart
Kafka leader failover
0 notes
backendspace · 3 years ago
Text
If you cannot solve for all, solve for the most important ones.
0 notes
backendspace · 3 years ago
Text
A Kafka topic can dump data into a Hive table. Not sure if this is available out of the box.
0 notes
backendspace · 3 years ago
Text
Hive is also used like this:
A Spark application outputs into a Hive table t1, then another Spark application pulls from t1 and dumps to table t2, and a third application pulls from t2 and dumps to t3.
The stages are not triggered one right after another: Spark, AFAIK, is used for batch processing, so each application runs at its own fixed interval.
0 notes
backendspace · 3 years ago
Text
You need to have different SLAs for different tiers,
just like the deadlines in Jira for different priorities.
0 notes
backendspace · 3 years ago
Text
Hive sync issues
Autofix inconsistent tables if possible
Report inconsistent tables that are not fixable
Do the above 2 within the freshness SLA
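The three points above can be sketched as one reconcile pass. This is only an illustration: `is_consistent` and `try_fix` are hypothetical callbacks standing in for whatever checks and repairs the real system has, and the SLA is taken as a plain number of seconds.

```python
import time
from typing import Callable, Iterable, List, Tuple


def reconcile(
    tables: Iterable[str],
    is_consistent: Callable[[str], bool],
    try_fix: Callable[[str], bool],
    sla_seconds: float,
) -> Tuple[List[str], List[str], bool]:
    """Return (fixed, unfixable, within_sla) for one pass over the tables."""
    start = time.monotonic()
    fixed: List[str] = []
    unfixable: List[str] = []
    for table in tables:
        if is_consistent(table):
            continue
        if try_fix(table):
            fixed.append(table)      # autofixed
        else:
            unfixable.append(table)  # needs to be reported
    # Did the whole pass finish inside the freshness SLA?
    within_sla = (time.monotonic() - start) <= sla_seconds
    return fixed, unfixable, within_sla
```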
0 notes
backendspace · 3 years ago
Text
Hive
Yarn
HDFS
Airflow
Spark
Flink
0 notes