Snippets and micro blogging of interesting/nuanced backend & data related stuff
Static methods and IO
Although static methods can be mocked and tested, they still hurt testability.
A related practice is to keep IO calls out of static methods, so the IO dependency can be swapped out in tests.
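A minimal sketch of the idea, assuming a small config loader as the example. The names `ConfigLoaderStatic`, `ConfigLoader`, `parse_config`, and `read_file` are hypothetical and only illustrate the contrast between IO baked into a static method and IO passed in as a dependency.

```python
from typing import Callable


def parse_config(text: str) -> dict:
    """Pure parsing logic: turn 'key=value' lines into a dict."""
    pairs = (line.split("=", 1) for line in text.splitlines() if "=" in line)
    return {key.strip(): value.strip() for key, value in pairs}


def read_file(path: str) -> str:
    """Real file IO, kept in one small replaceable function."""
    with open(path) as f:
        return f.read()


class ConfigLoaderStatic:
    """Harder to test: the file IO is baked into a static method."""

    @staticmethod
    def load(path: str) -> dict:
        with open(path) as f:  # touches the filesystem on every call
            return parse_config(f.read())


class ConfigLoader:
    """Easier to test: the IO dependency is injected."""

    def __init__(self, read_text: Callable[[str], str] = read_file):
        self._read_text = read_text

    def load(self, path: str) -> dict:
        return parse_config(self._read_text(path))


# In a test, pass an in-memory fake instead of patching a static method.
def fake_reader(path: str) -> str:
    return "host=localhost\nport=9092"


loader = ConfigLoader(read_text=fake_reader)
config = loader.load("ignored.properties")
print(config)
```

The static version can still be tested by patching `open`, but the injected version needs no patching at all, which is the testability gain the note refers to.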
Erasure coding is a way of encoding data that lets HDFS keep fault tolerance comparable to replication while cutting storage overhead: with the RS(6,3) policy the overhead is 50%, versus 200% for 3x replication.
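The overhead figures can be checked with simple arithmetic. This is a small sketch, assuming the common Reed-Solomon RS(6,3) layout of 6 data cells plus 3 parity cells per stripe; `storage_overhead` is a hypothetical helper, not an HDFS API.

```python
def storage_overhead(data_units: int, redundant_units: int) -> float:
    """Extra storage as a fraction of the original data size."""
    return redundant_units / data_units


# 3x replication: every block is stored 3 times, so 2 extra copies per block.
replication_overhead = storage_overhead(data_units=1, redundant_units=2)

# RS(6,3): 6 data cells plus 3 parity cells per stripe.
ec_overhead = storage_overhead(data_units=6, redundant_units=3)

print(f"3x replication: {replication_overhead:.0%} overhead, survives 2 lost copies")
print(f"RS(6,3):        {ec_overhead:.0%} overhead, survives 3 lost cells")
```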
Spring Boot's embedded web server serves requests from a worker thread pool whose size can be tuned, for example with the server.tomcat.threads.max property on embedded Tomcat.
Service mesh
Manage microservices without writing code to manage them. An example of a service mesh is Istio with the Envoy proxy.
It follows the sidecar pattern: a proxy instance runs alongside each application server and handles these operations on its behalf.
It can also help with load balancing and routing, for example routing based on the user agent.
I have to check whether Kubernetes can do something similar.
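To make the user-agent routing and load balancing concrete, here is a toy sketch of the decision a sidecar proxy makes for each request. It is not Istio or Envoy code: in a real mesh this is declared as configuration rather than written by hand, and the pool names and addresses below are made up for illustration.

```python
from itertools import cycle

# Hypothetical backend pools; names and addresses are illustrative only.
POOLS = {
    "mobile": ["mobile-v1:8080", "mobile-v2:8080"],
    "default": ["web-v1:8080", "web-v2:8080", "web-v3:8080"],
}
_round_robin = {name: cycle(hosts) for name, hosts in POOLS.items()}


def pick_pool(headers: dict) -> str:
    """Route on the User-Agent header, as a mesh routing rule might."""
    user_agent = headers.get("User-Agent", "").lower()
    is_mobile = "android" in user_agent or "iphone" in user_agent
    return "mobile" if is_mobile else "default"


def route(headers: dict) -> str:
    """Choose a pool by user agent, then load-balance round-robin within it."""
    return next(_round_robin[pick_pool(headers)])


iphone = {"User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0)"}
linux = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"}

first = route(iphone)
second = route(iphone)
desktop = route(linux)
print(first, second, desktop)
```

The application code never sees any of this logic, which is the point of the sidecar: routing rules change without touching the service.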
When using a lossy Kafka cluster, these are the reasons for message loss:
Executor crash or restart
Kafka REST Proxy restart
Kafka leader failover
If you cannot solve for all, solve for the most important ones.
Data from a Kafka topic can be dumped into a Hive table; I'm not sure whether this is available out of the box.
Hive is also used like this:
A Spark application writes into a Hive table t1, a second Spark application reads from t1 and writes to t2, and a third reads from t2 and writes to t3.
The stages are not triggered one after another. Assuming these are batch jobs (Spark also supports streaming), each application runs on its own fixed interval.
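A toy simulation of that chain, with plain Python lists standing in for the Hive tables and no Spark involved. The intervals (10, 15, and 30 minutes) are made-up examples; the point is only that a record reaches t3 when the independent schedules line up, not as soon as it arrives.

```python
def copy_new_rows(src: list, dst: list) -> None:
    """One batch run: append rows that are in src but not yet in dst."""
    dst.extend(src[len(dst):])


# In-memory stand-ins for the upstream source and the tables t1, t2, t3.
source, t1, t2, t3 = [], [], [], []

# (interval in minutes, input, output); intervals are illustrative only.
jobs = [
    (10, source, t1),
    (15, t1, t2),
    (30, t2, t3),
]

arrival_in_t3 = None
for minute in range(0, 121):
    if minute == 1:
        source.append("event-A")  # a record arrives just after minute 0
    for interval, src, dst in jobs:
        if minute % interval == 0:  # each job fires on its own schedule
            copy_new_rows(src, dst)
    if arrival_in_t3 is None and "event-A" in t3:
        arrival_in_t3 = minute

print(f"event-A arrived at minute 1 and reached t3 at minute {arrival_in_t3}")
```

Here the record lands in t1 at minute 10, in t2 at minute 15, and in t3 only at minute 30, so end-to-end freshness is bounded by the slowest schedule in the chain.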
You need different SLAs for different tiers,
just like Jira deadlines differ by priority.
Hive sync issues
Autofix inconsistent tables if possible
Report inconsistent tables that are not fixable
Do both of the above within the freshness SLA
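The three points above can be sketched as a small triage step. Everything here is hypothetical: the tier names, the SLA values, and the `TableStatus` fields are assumptions for illustration, and how a table is detected as inconsistent or fixable is left out.

```python
from dataclasses import dataclass

# Hypothetical freshness SLAs per tier, in minutes.
FRESHNESS_SLA_MINUTES = {"tier1": 30, "tier2": 120, "tier3": 1440}


@dataclass
class TableStatus:
    name: str
    tier: str
    consistent: bool
    fixable: bool
    minutes_since_sync: int


def triage(tables):
    """Split inconsistent tables into autofix and report lists, flagging SLA breaches."""
    to_fix, to_report, sla_breached = [], [], []
    for table in tables:
        if table.consistent:
            continue
        (to_fix if table.fixable else to_report).append(table.name)
        if table.minutes_since_sync > FRESHNESS_SLA_MINUTES[table.tier]:
            sla_breached.append(table.name)
    return to_fix, to_report, sla_breached


statuses = [
    TableStatus("orders", "tier1", consistent=False, fixable=True, minutes_since_sync=10),
    TableStatus("clicks", "tier2", consistent=False, fixable=False, minutes_since_sync=200),
    TableStatus("users", "tier1", consistent=True, fixable=False, minutes_since_sync=5),
]

to_fix, to_report, sla_breached = triage(statuses)
print("autofix:", to_fix)
print("report:", to_report)
print("past freshness SLA:", sla_breached)
```

Keeping the SLA in a per-tier table means a tier1 table is flagged after 30 minutes while a tier3 table gets a full day, which matches the idea of different SLAs for different tiers.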