Processing a large MongoDB collection with Apache Camel 2.16
Camel 2.16 comes with a small improvement in the MongoDB connector. The new outputType attribute lets you choose the output type when performing a findAll (listing all documents matching a query). In previous versions of Camel, the object returned by this operation was a List. This implementation has the drawback of loading all the documents found in the database into RAM. Even if you used the CamelMongoDbBatchSize header, you were not able to process the result when querying a large volume of documents. By setting outputType to DBCursor you can now process the result of any findAll query: documents are fetched from the database bucket by bucket and do not fill your RAM. With this addition you can process large collections and, for example, write a reshape treatment or enrich documents with information contained in another storage.
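A minimal sketch of such a route, in Java rather than Groovy (assuming camel-mongodb 2.16 is on the classpath and a Mongo client is registered under the name myDb; the endpoint name streamDocs and the database/collection names are made up for illustration):

```java
import com.mongodb.DBCursor;
import com.mongodb.DBObject;
import org.apache.camel.builder.RouteBuilder;

public class MongoCursorRoute extends RouteBuilder {
    @Override
    public void configure() {
        // outputType=DBCursor makes findAll hand back a cursor instead of a List,
        // so documents are streamed bucket by bucket instead of loaded into RAM
        from("direct:streamDocs")
            .to("mongodb:myDb?database=test&collection=zips&operation=findAll&outputType=DBCursor")
            .process(exchange -> {
                DBCursor cursor = exchange.getIn().getBody(DBCursor.class);
                for (DBObject doc : cursor) {
                    // reshape or enrich each document here
                    System.out.println(doc);
                }
            });
    }
}
```

The key difference from earlier versions is only the outputType=DBCursor option on the endpoint URI; the rest of the route is a plain Camel route.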
I wrote a Groovy script showing this new option in action.
Prior to launching the script, lower the maximum RAM available to Groovy by setting the maximum heap size to 64M: export JAVA_OPTS="$JAVA_OPTS -Xmx64M".
MongoDB should run locally with no security enabled for the test database; otherwise change the mongoClient initialization.
First, feed the database by running the script with the FEED parameter ( groovy CamelMongoDBCursorExample.groovy FEED ).
Try to list all documents by running the script with the LIST parameter ( groovy CamelMongoDBCursorExample.groovy LIST ). You should face an OutOfMemoryError.
Run again with the CURSOR parameter ( groovy CamelMongoDBCursorExample.groovy CURSOR ): all documents are printed to the console.
https://gist.github.com/padewitte/1e159475b0ae7a9b6b8e
Hope it will be useful.
PS: As Camel and Groovy are very flexible, there are many ways to solve a problem. Feel free to comment on the gist.
Building a multi-player quiz with socket.io and APISpark
With my dear friend Bruno, we spent a few hours in the past weeks building an online multi-player quiz. The goal was to learn how to use new tools and libraries, then share our work during the Summer of API hackathon.
Choosing our data-set
We have both lived in Nantes for a long time, yet we do not know the stories of our famous people. All of them have a park, a street or an avenue named after them. They sound familiar to us, but we do not know their stories. We decided to use the open data set containing all the streets of our city and mix it with the Wikipedia list of famous people related to Nantes to build a multi-player quiz. Very basically, you log in, you are then prompted with a question and three propositions. It is up to each player to respond as fast as possible.
Technical stack

Server stack
When you want to have fun and quickly share your idea, Node.js is the only solution, and the best cloud provider to host it is Clever Cloud. No discussion there.
User interface
We both have skills in Angular and did not see any reason to experiment with a new library. We just gave getmdl.io a try in order to have a “Material” look. If you try the quiz you will quickly see that design is not one of my skills.
Coordinating clients and server
In a multi-player quiz, coordination between all the players is the key. To achieve it we needed a fast and reliable communication protocol. In a previous experiment we tried to build a video chat with WebRTC. Despite a lot of effort we did not manage to make it work as expected, but we saw big potential in WebSockets and really wanted to show their power in our new application. That is why we decided to use socket.io, a very elegant and simple-to-set-up library. The trickiest part was to specify all the client states we needed in order to build the server the right way. Thanks to socket.io we could focus only on the quiz logic and the mapping of external APIs.
Here is the first version of the flow. This picture helped us a lot when building our app.

After defining the flow we built a state diagram and we were ready to code.
Serving the questions
Our first option was to use the Wikipedia API directly to build questions from the first sentence of each selected article. It quickly appeared that we would also need to link each question with the right answer, the propositions, and the words to hide in the Wikipedia article (it is not always the same rule: for Lamoricière we need to hide both Lamoricière and Moricière). As we really wanted to go fast and did not want to use all Wikipedia articles, we decided to store our questions in a Google sheet and use it in our application through APISpark.
Setting up an API with data stored in a Google sheet
With APISpark, serving a Google sheet through an API was very quick. First we had to create an entity store and link it with our Google account.
Once you choose the spreadsheet, by default each column name is used as a property of the entity. The first column is used as the primary key.
To expose the data as an API you then need to deploy and export it. The position of this option in the user interface is not obvious.
After that we just opened up the API to anybody by changing the settings of the generated web API.
As you can see, exposing the data was very simple. You understand now why we did not focus on automatically grabbing data from Wikipedia.
Adding more information to answers
Once we had the API and the flow with a nice technical stack, building the screens and the server was a matter of time. We also wanted to give more information when serving an answer, like showing users views of the streets named after the famous people of our city. For that we needed to find the street named after each famous person, attach GPS coordinates to it, and use the Street View API. A data set of street names exists for our city. We tried to load it into a Google spreadsheet and search for streets in it with APISpark, but currently you cannot search with a non-exact match on column values. I am pretty sure it will come one day. In order to fulfill our need we had to load the data into a MongoDB instance hosted on MongoLab (we could also have used the Clever Cloud service) and query it to find all the street names. We used MapQuest to get the GPS coordinates from street names and then build the Street View URI.
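The non-exact match we needed boils down to a regex query on MongoDB. A minimal sketch with the MongoDB Java driver (the database, collection and field names here are made up for illustration, and a local MongoDB instance is assumed):

```java
import com.mongodb.MongoClient;
import com.mongodb.client.MongoCollection;
import org.bson.Document;
import static com.mongodb.client.model.Filters.regex;

public class StreetSearch {
    public static void main(String[] args) {
        MongoClient client = new MongoClient("localhost");
        MongoCollection<Document> streets =
            client.getDatabase("test").getCollection("streets");
        // case-insensitive partial match on the street name,
        // so "moricière" matches both Lamoricière and Moricière streets
        for (Document street : streets.find(regex("name", "morici\u00e8re", "i"))) {
            System.out.println(street.getString("name"));
        }
        client.close();
    }
}
```

This is exactly the kind of query a spreadsheet-backed API could not express for us at the time.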
Playing with APIs was fun
It was very fun to play with all those APIs. You can enjoy the result at mille-sabords.cleverapps.io and browse the code at https://github.com/bbonnin/quizz-nantes .
Querying MongoDB with SQL

SQL? MongoDB? No kidding!
Do not look at me like this lovely animal. Yes, maybe one day one of your colleagues will tell you: "I know MongoDB is great, I know that flexible schemas are great, I know you can scale easily... but I want to explore your data". Maybe he is a smart guy and will learn the aggregation framework in the next days, but most probably he will ask you if he can query your database with SQL. Structured Query Language appeared in 1974 and is taught in every computer science school. Not only developers can write SQL queries. Even if the MongoDB documentation is great, learning aggregation will sometimes not be possible for some users. That is why I was looking for a way of querying my MongoDB database with SQL.
Solutions
During the last weeks, I explored various solutions:
A Hive-based solution with the MongoDB Hadoop Connector, as described by my ex-colleague Bruno
A JDBC driver wrapping the MongoDB driver
MoSQL, a solution written by Stripe that tails the oplog and imports your data into a PostgreSQL database
Each one has a drawback. Hive is not fully SQL-compliant. The existing JDBC driver is just a prototype and a lot of work remains. MoSQL copies data, which is not a solution for a huge database.
PostgreSQL "MongoDB Foreign Data Wrapper"
My last read was about the MongoDB Foreign Data Wrapper (FDW) for PostgreSQL. A FDW implements an API that allows the open-source SQL engine to read data not stored directly in the database. CitusData wrote such a FDW for MongoDB. It wraps the MongoDB C driver. With this add-on you can connect to PostgreSQL and write a query in plain SQL without moving the data. Access to the data is read-only. It comes with some limits: you have to wrap attribute names containing uppercase letters in double quotes, and attribute names are limited to 63 characters, which is a PostgreSQL limit.
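To illustrate the quoting limit, here is a sketch in the same SQL style as below (the table mg_cities and the field cityName are hypothetical, made up for illustration):

```sql
-- a foreign table with a mixed-case MongoDB field name
CREATE FOREIGN TABLE mg_cities("_id" TEXT, "cityName" TEXT)
  SERVER mongo_server OPTIONS (database 'test', collection 'cities');

-- without the double quotes, PostgreSQL would fold cityName to cityname
-- and the FDW would not find the field in the documents
SELECT "cityName" FROM mg_cities LIMIT 10;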

Installation
Installation is quite easy, assuming MongoDB and PostgreSQL are already installed and running. I tested on a Debian Docker container where the data wrapper did not work (I did not dig deeper), but it works fine on Ubuntu (even in a Docker container).
Install the PostgreSQL dev package (needed to compile a FDW)
sudo apt-get install postgresql-server-dev-9.3
Clone and compile MongoDB FDW
git clone https://github.com/citusdata/mongo_fdw.git
cd mongo_fdw/
sudo make install
Load the US zip collection into MongoDB
mongoimport -d test -c zips --drop zips.json
Connect to PostgreSQL
psql
Load the extension
CREATE EXTENSION mongo_fdw;
Describe the server connection
CREATE SERVER mongo_server FOREIGN DATA WRAPPER mongo_fdw OPTIONS (address '127.0.0.1', port '27017');
Create a foreign table
CREATE FOREIGN TABLE mg_zips("_id" TEXT, city TEXT, state TEXT, pop NUMERIC) SERVER mongo_server OPTIONS (database 'test', collection 'zips');
Search for city names present in more than 20 states
select city from mg_zips group by city having count(distinct(state)) > 20;

Docker
For those who want to try this without installing PostgreSQL and the data wrapper, I wrote a Dockerfile based on the official PostgreSQL container but with Ubuntu. To use it:
Build and launch the container
sudo docker build -t mongodbfw .
docker run --name a-mongodbfw -d mongodbfw
And then connect
docker run -it --link a-mongodbfw:postgres --rm postgres sh -c 'exec psql -h "$POSTGRES_PORT_5432_TCP_ADDR" -p "$POSTGRES_PORT_5432_TCP_PORT" -U postgres'
At this point you just have to create the extension and the foreign table before starting to query data. Don't forget that your MongoDB server is not launched in your container; you may use the Docker interface of your host (172.17.42.1) in the server description step.
Conclusion
Now you can tell your colleague: "Please use my MongoDB database. Query it with your SQL tool." A further blog post will talk about performance and possible use cases for this solution.
Really liked this reading about the story of Storm. Nathan, being a brilliant guy, used all the marketing tricks he could to reach a maximum of people.
Apache Storm recently became a top-level project , marking a huge milestone for the project and fo...
You've hit some bottleneck in MongoDB. Now what do you do? How do you figure out what part of your system is causing the problem? MongoDB offers a number of tools for diagnosing performance issues and monitoring areas of your application and infrastructure that may need additional resources or attention. These include mongostat, mongotop, various parts of MMS and mtools. In this talk we will walk through a number of performance scenarios and use these tools to diagnose problems and common (and not so common) pitfalls in your MongoDB cluster.
I loved this talk at MongoDB World. A nice intro if you want to spot the worst queries in your database. No more reason not to use [mtools](https://github.com/rueckstiess/mtools).
mongotools
Fastest way to evaluate the memory usage of a collection for your #MongoDB DB
See also the MongoDB World presentation by @alonhorev for an explanation of memory management.
Mongo-snippets - snippets of code that might be useful
Really useful Python scripts to quickly create a testing environment on your laptop. Just tried it today and already gained a lot of time. I launched a replica set in one command rather than 15. Enjoy.
Founder of several start-ups including Doctissimo, Alexandre runs DNAVision, which specializes in DNA sequencing. In this video he reveals the incredible impact of biotechnologies on humanity!
A very optimistic talk despite the gravity of the topics covered. It really changed my vision of the link between computing and medicine.
Everything you need to know about web development. Neatly packaged. Learn HTML, CSS, Javascript, Python, Rails, Node, and more in each box with a set of links.
gor - HTTP traffic replay in real-time. Replay traffic from production to staging and dev environments.
Short and excellent.
Learn Java SE 8 by example: Lambda Expressions, Default Interface Methods, Method References, Streams, Date API, Annotations and more
A mind map for getting started with Git
Here is the result of my technology watch and of my first months using Git.
http://mind42.com/public/f541dd08-2101-4807-968d-12511db511c4