naybnet-tech-blog - Tumblr blog

naybnet-tech-blog · 2 years ago

Application Security : CSRF

Cross-Site Request Forgery (CSRF) allows an attacker to perform actions or modify information on your behalf in an app you are logged in to, by exploiting the authentication cookies your browser sends automatically.

First thing to know : use HTTP methods carefully. For instance, GET should be a safe method with no side effects. Otherwise, simply opening an email or loading a page can trigger the exploit of an app vulnerability.
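To make the point about safe methods concrete, here is a minimal sketch of a server-side gate: safe methods go through, anything else must carry a CSRF token. The function names (is_safe_method, constant_time_equal, allow_request) are made up for this note and do not come from any framework; how the expected token is generated and stored is assumed to be handled elsewhere.

```cpp
#include <cstddef>
#include <string>

// Methods defined as "safe" by HTTP semantics (RFC 9110):
// they are not supposed to change server state.
bool is_safe_method(const std::string& method) {
    return method == "GET" || method == "HEAD" ||
           method == "OPTIONS" || method == "TRACE";
}

// Compare two tokens without stopping at the first differing character,
// so the comparison time does not reveal how many characters matched.
bool constant_time_equal(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) return false;
    unsigned char diff = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        diff |= static_cast<unsigned char>(a[i] ^ b[i]);
    }
    return diff == 0;
}

// Hypothetical gate: safe methods pass, state-changing methods need
// a submitted token that matches the one expected for the session.
bool allow_request(const std::string& method,
                   const std::string& submitted_token,
                   const std::string& expected_token) {
    if (is_safe_method(method)) return true;
    return !expected_token.empty() &&
           constant_time_equal(submitted_token, expected_token);
}
```

This only helps if GET handlers really have no side effects: a GET that changes state would slip through the gate untouched.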

PortSwigger has a nice set of labs to understand CSRF vulnerabilities : https://portswigger.net/web-security/csrf

Use of CSRF protections in web frameworks

Nuxt

Based on express-csurf. I am not sure about its potential vulnerabilities. The token is sent in a header, and the secret used to validate the token is stored in a cookie.

Django

#security #appsec #csrf

naybnet-tech-blog · 2 years ago

Data engineering on GCP

This is a practical path to data engineering. As a result, we will not start with the fundamentals of distributed systems and data storage, but will directly use cloud-based tools to understand how to build data pipelines.

We will mainly use Google Cloud Skills Boost as our source of material.

Learning path :

Apache Beam : motivations for this batch and streaming programming model https://youtu.be/owTuuVt6Oro

naybnet-tech-blog · 2 years ago

The Pony programming language

It looks like this deals with some concepts I am interested in learning about : actor systems, distributed scheduling. I will try to answer 2 questions :

- why was this language created

- what are actor systems

The early history of Pony

This blog post gives a fair overview of the reasons the language exists : https://www.ponylang.io/blog/2017/05/an-early-history-of-pony/

When working in low-latency, high-throughput environments, exchanging data is a challenge. You need a system able to deal with high concurrency. It needs to be very performant : compiled, but still able to share data.

How to avoid counting references to the same object (bad for performance) ? How to avoid making too many copies of the same object (same issue) ?

Is C/C++ the right language for this ? Apparently not. Why ? Data races & deadlocks, which are difficult to debug => loss of productivity for developers.

Is the actor system a solution to this ? Maybe.

Pony is...

statically typed

compiled

highly concurrent

based on the actor model

object oriented

based on composition rather than inheritance

Pony has...

a garbage collector (what are the implications for performance ?)

traits

interfaces

What is this thing ?

Primitive : like a class, but immutable and a singleton => useful as an enum value or as a collection of functions. Primitives can define the special functions _init and _final.

Behaviour : an asynchronous function on an actor. A behaviour returns None, a bit like an async function returns a Promise, since the computation has not been done yet when the call returns.

Fun stuff

Pony does not have a null.

It also does not have global variables.

Pony assignments are expressions (and not statements), so they return a value. Funnily enough, they return the old value of the variable rather than the new one.

There are no locks so no deadlocks.
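The "assignment returns the old value" behaviour has a close cousin in C++ : std::exchange. The sketch below is only an analogy written in C++, not Pony code, and the two helper names are mine. As far as I understand, the second function mirrors the swap idiom `a = b = a` that this behaviour makes possible in Pony.

```cpp
#include <utility>

// Store new_value in target and return the value that was there before,
// which is what a Pony assignment expression evaluates to.
int replace_and_get_old(int& target, int new_value) {
    return std::exchange(target, new_value);
}

// Swap two integers without a named temporary:
// b receives a, and the old value of b is then assigned to a.
void swap_via_exchange(int& a, int& b) {
    a = std::exchange(b, a);
}
```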

Actor system

Examples of actor-model languages and toolkits ? Erlang, Elixir, or Akka (a toolkit for the JVM rather than a language)

Actors communicate with each other through message passing.
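To get a feel for message passing, here is a toy, single-threaded sketch in C++ : an actor owns its state and a mailbox, sending a message only enqueues work, and a scheduler loop processes the mailbox later. Everything here (CounterActor, run_until_idle) is a hypothetical illustration; a real runtime such as Pony's schedules many actors across threads, which this sketch does not attempt.

```cpp
#include <deque>
#include <functional>
#include <utility>

// A minimal actor: private state plus a mailbox of pending messages.
class CounterActor {
public:
    CounterActor() = default;
    // Messages capture `this`, so copying the actor would be unsafe.
    CounterActor(const CounterActor&) = delete;
    CounterActor& operator=(const CounterActor&) = delete;

    // Asynchronous send: the work is queued and the call returns at once,
    // before total_ has changed (similar in spirit to a behaviour).
    void add(int amount) {
        mailbox_.push_back([this, amount] { total_ += amount; });
    }

    // Process one queued message; returns false when the mailbox is empty.
    bool step() {
        if (mailbox_.empty()) return false;
        std::function<void()> message = std::move(mailbox_.front());
        mailbox_.pop_front();
        message();
        return true;
    }

    int total() const { return total_; }

private:
    std::deque<std::function<void()>> mailbox_;
    int total_ = 0;  // only ever modified by messages run in step()
};

// Toy scheduler: keep running the actor until it has nothing left to do.
void run_until_idle(CounterActor& actor) {
    while (actor.step()) {
    }
}
```

The state is only touched while a message is being processed, one message at a time, which is the property that lets actor systems avoid data races on an actor's own fields.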

Hypothesis : learn Pony to learn Rust later ? Yay or Nay.

Question : does asynchronicity enable concurrency ? What are the other ways to have concurrency ?

#actors #concurrency #race condition #asynchronous #programming languages

naybnet-tech-blog · 2 years ago

Search for the web

https://www.elastic.co/guide/en/elasticsearch/guide/master/index.html

I want to understand the inner workings of Elasticsearch. Starting from a book looks more interesting than reading documentation : "We wrote this book because Elasticsearch needs a narrative"

What is Elasticsearch

A document-oriented database which also indexes the document content. Serialization is done in JSON.

storing a document = indexing it (an index is roughly the equivalent of a database). Where relational databases add an index on specific columns, Elasticsearch uses a structure called an inverted index.
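To picture what an inverted index is, here is a deliberately naive sketch in C++ : each term maps to the set of ids of the documents that contain it. The names (InvertedIndex, index_document, search) are mine, and the real Elasticsearch/Lucene structures are far more elaborate (analysis chain, positions, scoring, and so on).

```cpp
#include <cctype>
#include <map>
#include <set>
#include <sstream>
#include <string>

// term -> ids of the documents containing that term
using InvertedIndex = std::map<std::string, std::set<int>>;

// Split the text on whitespace, normalize each word (lowercase,
// letters and digits only) and record that doc_id contains it.
void index_document(InvertedIndex& index, int doc_id, const std::string& text) {
    std::istringstream words(text);
    std::string word;
    while (words >> word) {
        std::string term;
        for (unsigned char c : word) {
            if (std::isalnum(c)) term += static_cast<char>(std::tolower(c));
        }
        if (!term.empty()) index[term].insert(doc_id);
    }
}

// Look up a single, already lowercased term.
std::set<int> search(const InvertedIndex& index, const std::string& term) {
    auto it = index.find(term);
    if (it == index.end()) return {};
    return it->second;
}
```

Searching becomes a lookup by term instead of a scan over every document, which is why every field can be searched quickly once it is indexed.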

field ⊂ document ⊂ index ⊂ cluster

Every field in a document is indexed. Mapping types have been deprecated, so we create a field named type to signal that the document is of type employee.

POST /megacorp/_create/1
{
  "first_name" : "John",
  "type" : "employee",
  "last_name" : "Smith",
  "age" : 25,
  "about" : "I love to go rock climbing",
  "interests" : [ "sports", "music" ]
}

The concept of relevance is central to Elasticsearch.

#elasticsearch

naybnet-tech-blog · 2 years ago

Refresher on C++

Now that I am writing C++ again (the last time was at least 8 years ago), I am rediscovering long-forgotten syntax and concepts, such as :

the use of the const keyword

what does a const argument mean

what is a const method in a class

when to pass an argument by reference

what is a virtual method of a class

how to override a virtual method in a child class
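A small sketch that touches each item of the list above. The classes and functions (Shape, Square, describe, double_in_place) are invented for this note; it is one possible illustration, not a full answer to every question.

```cpp
#include <string>

class Shape {
public:
    virtual ~Shape() = default;

    // virtual: a child class may replace this method.
    // const (after the parameter list): calling it cannot modify the object.
    virtual double area() const { return 0.0; }
    virtual std::string name() const { return "shape"; }
};

class Square : public Shape {
public:
    explicit Square(double side) : side_(side) {}

    // override: the compiler checks that a matching virtual method
    // really exists in the parent class.
    double area() const override { return side_ * side_; }
    std::string name() const override { return "square"; }

private:
    double side_;
};

// const reference argument: no copy is made, and the function promises
// not to modify the caller's object. Virtual dispatch still applies.
std::string describe(const Shape& shape) {
    return shape.name() + " with area " + std::to_string(shape.area());
}

// Non-const reference argument: the function modifies the caller's variable.
void double_in_place(double& value) {
    value *= 2.0;
}
```

Passing a Square to describe calls Square::area and Square::name even though the parameter type is Shape, because the methods are virtual and the argument is taken by reference rather than by value.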

Constructors

Constructors can initialize members with a member initializer list : a colon (:) after the constructor's parameter list, followed by a comma-separated list of one or more member initializers.
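A minimal sketch of the syntax, with a hypothetical Employee class. It also shows one case where the initializer list is the only option : a const member cannot be assigned in the constructor body.

```cpp
#include <string>
#include <utility>

class Employee {
public:
    // Member initializer list: a colon, then member(initial value) pairs.
    // Members are initialized in the order they are declared in the class
    // (name_ then age_), whatever the order written in the list.
    Employee(std::string name, int age)
        : name_(std::move(name)), age_(age) {}

    const std::string& name() const { return name_; }
    int age() const { return age_; }

private:
    const std::string name_;  // const member: must be set in the initializer list
    int age_;
};
```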

Nuggets of knowledge

C++ templates used at scale can slow down compile time (https://youtu.be/rX0ItVEVjHc?t=545)

naybnet-tech-blog · 2 years ago

JavaScript interpreters

I was looking around for a simple tutorial on building a JavaScript interpreter. Not only did I find a great example (https://github.com/MarcoLizza/tiny-js) but it led me to the Espruino project (code here https://github.com/espruino).

Espruino is a very lightweight JS interpreter designed to work on microcontrollers.

naybnet-tech-blog · 2 years ago

Back to C++ with a new challenge

Thanks to a wonderful post on kipply’s blog, I’ve started implementing the Ray Tracing in One Weekend book (see here: https://raytracing.github.io/books/RayTracingInOneWeekend.html) and I am having a blast!

naybnet-tech-blog · 2 years ago

It’s been a long time

In the end, I did not go on to work in Data Science, and technologies like Hadoop have gone out of fashion, but I feel like I have a million new things to learn.

So I’m reviving this blog.

#restart

naybnet-tech-blog · 12 years ago

Hadoop and the Amazon Cloud - Lesson 2 - Introducing cloud computing

Source : BigDataUniversity

Not a new concept

Gives the masses access to the performance of a large number of servers

Illusion of an infinite number of servers

You rent the servers for the amount of time you are using them

You pay for the amount of CPU that you use

Advantages of cloud computing :

You don't need to set up a complicated and expensive IT system (capital expense -> operating expense)

You do not need to plan for the maximum of CPU you need, you can scale on demand

You can set it up in a few hours instead of a month for an IT farm

Cloud computing service models

Infrastructure as a service IaaS : you don't worry about the infrastructure, it is dealt with by the cloud service

Platform as a service PaaS : you don't worry about the middleware layer either

Software as a service SaaS : you don't worry about the software, you subscribe and pay (e.g. monthly) to use it

(Hardware as a service, Cloud as a service...)

Concerns of cloud computing

Security, privacy -> private cloud

Highly transactional workloads

Apps with complex regulations, resiliency

When to use the cloud

Proofs of concept

Development, tests

Apps with highly variable workloads

#cloud computing #lecture notes

naybnet-tech-blog · 12 years ago

Hadoop Fundamentals 1 - Lesson 1 - Introduction to Hadoop

Source : BigDataUniversity, Hadoop Fundamentals 1

Lesson Summary

Use of Hadoop :

for large amounts of data in a relational database (~10TB), for unstructured data (Facebook...)

Framework written in Java, MapReduce as foundation

Data structured or unstructured

Massively parallel processing, no immediate response -> batch

not meant for workloads that randomly or sequentially access structured data the way a relational database does

not replacement for relational DB
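Since MapReduce is the foundation, here is the classic word count written as three explicit phases, in plain single-machine C++. This is only a sketch of the programming model under my own naming (map_line, shuffle, reduce_counts, word_count); it does not use Hadoop's actual Java API, and nothing here is distributed.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using KeyValue = std::pair<std::string, int>;

// Map phase: turn one input line into a list of (word, 1) pairs.
std::vector<KeyValue> map_line(const std::string& line) {
    std::vector<KeyValue> pairs;
    std::istringstream words(line);
    std::string word;
    while (words >> word) pairs.emplace_back(word, 1);
    return pairs;
}

// Shuffle phase: group all the values emitted for the same key.
std::map<std::string, std::vector<int>> shuffle(const std::vector<KeyValue>& pairs) {
    std::map<std::string, std::vector<int>> grouped;
    for (const auto& kv : pairs) grouped[kv.first].push_back(kv.second);
    return grouped;
}

// Reduce phase: collapse the values of one key into a single result.
int reduce_counts(const std::vector<int>& counts) {
    int total = 0;
    for (int c : counts) total += c;
    return total;
}

// Driver: on a cluster, the map and reduce calls would run in parallel
// on different machines; here they simply run one after the other.
std::map<std::string, int> word_count(const std::vector<std::string>& lines) {
    std::vector<KeyValue> pairs;
    for (const auto& line : lines) {
        std::vector<KeyValue> mapped = map_line(line);
        pairs.insert(pairs.end(), mapped.begin(), mapped.end());
    }
    std::map<std::string, int> result;
    for (const auto& group : shuffle(pairs)) {
        result[group.first] = reduce_counts(group.second);
    }
    return result;
}
```

Each map call only looks at its own line and each reduce call only at its own key, which is what makes the work easy to spread over many machines.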

Other products :

Lucene : text search engine library in Java

HBase : Hadoop DB

Hive : data warehousing, querying and loading data

Pig : high level language for analysing large datasets

Jaql : query language for JSON (JavaScript Object Notation)

Zookeeper : configuration

Avro : data serialization system

UIMA : architecture for dealing with unstructured data

Limits of Hadoop, it is not good :

when there are lots of small datasets

for low-latency data access

for random access

when work cannot be parallelized

for intensive calculations with little data

Lab Summary

Comments and perspectives

#hadoop #lecture notes

naybnet-tech-blog · 14 years ago

Strata Conference

A very instructive overview of the current state of "data science" : the Strata conference, organized last January by the publisher O'Reilly.

http://bit.ly/rc1qOW

http://siliconangle.tv/channels/StrataConf
