#graphdb | Explore Tumblr Posts and Blogs

iamvolonbolon · 4 years

Exploring Databases

A requirement of almost any program is to persist and retrieve data. That's why we have databases. A database is usually designed to store and manage a large amount of data. It has to be accurate, with all sorts of internal checks that give integrity to the data it manages. Since databases solve a pervasive problem, it is easy to see why they have been developed since the early days of CS, and why we have different flavors for different needs. Let's review them.

Key-Value

The simplest approach is a hash map pairing keys and values. Fast and easy to use, key-value stores are popular for building caches. Because they hold data in memory, the amount of data they can hold is limited. At the same time, by avoiding round trips to slow secondary storage, they are super fast. Their interface is also limited: no fancy queries, JOINs, or anything like that. Just read and write. Let's see an example in Redis:

    # redis-cli
    127.0.0.1:6379> SET maurice_moss reynholm
    OK
    127.0.0.1:6379> GET maurice_moss
    "reynholm"

Best for: Reducing data latency. Usually deployed in front of some other database that persists the data. Popular alternatives: Redis, Memcached
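Here is a minimal Python sketch of the key-value idea. It assumes an in-process dictionary stands in for the server. It supports only set and get, plus an optional time-to-live so it can serve as a cache. The KeyValueStore class and its ttl parameter are hypothetical names for illustration. Redis offers comparable expiry through options such as SET ... EX.

```python
import time
from typing import Any, Dict, Optional, Tuple


class KeyValueStore:
    """Hypothetical in-memory key-value store: no queries, no JOINs, just read and write."""

    def __init__(self) -> None:
        # key -> (value, expiry time on the monotonic clock, or None for "never")
        self._data: Dict[str, Tuple[Any, Optional[float]]] = {}

    def set(self, key: str, value: Any, ttl: Optional[float] = None) -> None:
        # Store the value; if ttl (seconds) is given, remember when it expires.
        expires_at = time.monotonic() + ttl if ttl is not None else None
        self._data[key] = (value, expires_at)

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            # Lazily evict expired entries on read, as a simple cache would.
            del self._data[key]
            return None
        return value


store = KeyValueStore()
store.set("maurice_moss", "reynholm")
print(store.get("maurice_moss"))  # reynholm
store.set("session", "abc", ttl=0)
print(store.get("session"))  # None, because the entry expired immediately
```

Expired keys are evicted lazily on read here. Real caches typically also evict in the background and cap their memory use.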

Wide Column

We can stretch the value part of a key-value DB to store a set of ordered rows, and then we have a wide-column DB. That way we can group related data together under the same key. These databases don't have a schema, and they can easily handle unstructured data. I know, I know, Cassandra does have a schema. That's true. It is also true that it was developed schemaless; schemas were added later. We can interact with these databases through languages such as CQL, which usually resemble the popular SQL but are more limited (still no fancy operations like JOINs).

Because of their nature, they are easy to replicate and scale out. And no, the reason they are easy to replicate is not that they are NoSQL. It is because they relax the ACID requirements. You see, scaling reads is not that hard. Bottlenecks appear mainly when you introduce JOINs and similar operations, and you can opt out of those even in an RDBMS. The hard problem is scaling writes. To speed up writes, you need to relax one of three guarantees:

* Atomicity, by shortening the time tables are locked (like MongoDB).
* Consistency, which lets you scale out across a cluster of nodes (like Cassandra).
* Durability, by holding everything in memory and avoiding round trips to disk (as we already saw with Redis).

In fact, these types of databases are popular in applications where writes are much more frequent than reads. Let's imagine a system that persists readings from a vast array of weather stations:

    cqlsh> CREATE KEYSPACE IF NOT EXISTS mycassandra WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1};
    cqlsh> USE mycassandra;
    cqlsh:mycassandra> CREATE TABLE IF NOT EXISTS weather (temp float, pressure int, humidity float, location varchar, time timestamp, PRIMARY KEY(location));
    cqlsh:mycassandra> INSERT INTO weather (temp, pressure, humidity, location, time) VALUES (23, 1016, 90, 'Buenos Aires', toTimestamp(now()));
    cqlsh:mycassandra> INSERT INTO weather (temp, pressure, humidity, location, time) VALUES (18, 1030, 72, 'Lisbon', toTimestamp(now()));
    cqlsh:mycassandra> SELECT * FROM weather;

     location     | humidity | pressure | temp | time
    --------------+----------+----------+------+---------------------------------
           Lisbon |       72 |     1030 |   18 | 2020-09-24 22:25:44.563000+0000
     Buenos Aires |       90 |     1016 |   23 | 2020-09-24 22:24:36.110000+0000

    (2 rows)

Best for: Backing IoT workloads. Popular alternatives: Apache Cassandra, Apache HBase, Cloud Bigtable
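The wide-column idea, where one key groups many ordered rows, can be sketched as a toy Python model. Everything here is an illustrative assumption: the WideColumnTable class, its methods, and the second Lisbon reading are invented. It is not Cassandra's storage engine.

The sketch mirrors a CQL table declared with PRIMARY KEY ((location), time). There, location selects the partition and time orders the rows inside it. That differs from a table keyed only by location. CQL inserts are upserts, so in that layout each new reading for a city would replace the previous one.

```python
from typing import Any, Dict, List, Tuple


class WideColumnTable:
    """Hypothetical sketch of the wide-column model: a partition key maps to many
    rows, each identified by a clustering key and holding its own set of columns."""

    def __init__(self) -> None:
        # partition key -> {clustering key -> {column name -> value}}
        self._partitions: Dict[str, Dict[Any, Dict[str, Any]]] = {}

    def insert(self, partition: str, clustering: Any, **columns: Any) -> None:
        # Like a CQL INSERT, writing the same full primary key again is an upsert.
        self._partitions.setdefault(partition, {})[clustering] = columns

    def select(self, partition: str) -> List[Tuple[Any, Dict[str, Any]]]:
        # A real store keeps rows sorted on disk; this sketch sorts on read.
        rows = self._partitions.get(partition, {})
        return sorted(rows.items(), key=lambda item: item[0])


weather = WideColumnTable()
weather.insert("Lisbon", "2020-09-24T22:25:44", temp=18.0, pressure=1030, humidity=72.0)
# Hypothetical earlier reading: it shares the partition instead of overwriting.
weather.insert("Lisbon", "2020-09-24T22:20:00", temp=17.5, pressure=1031, humidity=74.0)
weather.insert("Buenos Aires", "2020-09-24T22:24:36", temp=23.0, pressure=1016, humidity=90.0)

for ts, cols in weather.select("Lisbon"):
    print(ts, cols)
```

Reading one station's history is a single partition lookup followed by an ordered scan. That is the access pattern this family of databases is optimized for.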

Document DB

They are based on documents, where each document is a container of key-value pairs. They are semi-structured and don't require a schema. Documents are grouped together in collections, and fields within collections can be indexed. Collections can be organized in hierarchies, allowing some kind of relational modeling. Still no JOINs, at least not in the SQL sense (MongoDB's $lookup aggregation stage is a limited exception). Denormalization is encouraged. Because of this, write operations can be a little slower, but, as we saw earlier, these databases relax ACID requirements to achieve better performance.

    root@7747c048549d:/# mongo
    MongoDB shell version v4.4.1
    > use reynholm_employees;
    switched to db reynholm_employees
    > db.it.save({first: "Maurice", last: "Moss"});
    WriteResult({ "nInserted" : 1 })
    > db.it.save({first: "Roy", last: "Trenneman"});
    WriteResult({ "nInserted" : 1 })
    > db.it.save({first: "Jen", last: "Barber"});
    WriteResult({ "nInserted" : 1 })
    > db.it.find({first: "Maurice"});
    { "_id" : ObjectId("5f6d2a4ced7dc6a9061ed522"), "first" : "Maurice", "last" : "Moss" }

Best for: IoT and content management. Also a good starting point when you are not yet sure how your data is structured. Popular alternatives: MongoDB, Apache CouchDB
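Here is a toy Python sketch of the document model. It assumes plain dicts as documents and a list as the collection. The Collection class and its insert and find methods are hypothetical. MongoDB's real API differs, and it generates ObjectId values rather than counters. Note how the third document carries a field the others lack, which a schemaless store accepts without complaint.

```python
import itertools
from typing import Any, Dict, List


class Collection:
    """Hypothetical schemaless collection: each document is a dict of key-value pairs."""

    def __init__(self) -> None:
        self._docs: List[Dict[str, Any]] = []
        self._ids = itertools.count(1)

    def insert(self, doc: Dict[str, Any]) -> int:
        # Documents need not share fields; each one gets a generated _id.
        stored = {"_id": next(self._ids), **doc}
        self._docs.append(stored)
        return stored["_id"]

    def find(self, query: Dict[str, Any]) -> List[Dict[str, Any]]:
        # Return documents whose fields equal every key/value pair in the query.
        # This is a full scan; a real store would use an index on the queried fields.
        return [d for d in self._docs if all(d.get(k) == v for k, v in query.items())]


it_dept = Collection()
it_dept.insert({"first": "Maurice", "last": "Moss"})
it_dept.insert({"first": "Roy", "last": "Trenneman"})
it_dept.insert({"first": "Jen", "last": "Barber", "role": "manager"})  # extra field is fine
print(it_dept.find({"first": "Maurice"}))
```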

RDBMS

Very popular, and one of the older paradigms. A relational database is a collection of data sets organized in tables, with well-defined relationships between them. Each table is a relation, and each table record (row) holds a unique data instance, with a value for each column. One or more attributes of a record can relate it to one or many other records. Organizing these functional dependencies is what normalization is about. The relationships come in four kinds:

* One to One: One table record relates to another record in another table.
* One to Many: One table record relates to many records in another table(s).
* Many to One: More than one table record relates to a record in a different table.
* Many to Many: More than one record relates to other records in different tables.

We interact with them using SQL (Structured Query Language). Normalization requires a schema, which can be tricky if the data structure is not known in advance. The flip side is that we finally get to play with JOINs:

    mysql> CREATE TABLE orders (
        -> order_id INT AUTO_INCREMENT PRIMARY KEY,
        -> timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        -> );
    Query OK, 0 rows affected (0.02 sec)

    mysql> CREATE TABLE details (
        -> product_id INT AUTO_INCREMENT PRIMARY KEY,
        -> name VARCHAR(100),
        -> qty INT,
        -> order_id INT
        -> );
    Query OK, 0 rows affected (0.02 sec)

    mysql> INSERT INTO orders (order_id) VALUES (NULL);
    Query OK, 1 row affected (0.00 sec)

    mysql> INSERT INTO orders (order_id) VALUES (NULL);
    Query OK, 1 row affected (0.00 sec)

    mysql> INSERT INTO details VALUES (NULL, 'Apricots', 4, 1);
    Query OK, 1 row affected (0.00 sec)

    mysql> INSERT INTO details VALUES (NULL, 'Bananas', 2, 1);
    Query OK, 1 row affected (0.00 sec)

    mysql> INSERT INTO details VALUES (NULL, 'Eggfruit', 1, 2);
    Query OK, 1 row affected (0.00 sec)

    mysql> INSERT INTO details VALUES (NULL, 'Blueberries', 3, 2);
    Query OK, 1 row affected (0.00 sec)

    mysql> SELECT o.order_id, o.timestamp, d.name, d.qty FROM orders o INNER JOIN details d ON o.order_id = d.order_id;
    +----------+---------------------+-------------+------+
    | order_id | timestamp           | name        | qty  |
    +----------+---------------------+-------------+------+
    |        1 | 2020-09-25 12:32:02 | Apricots    |    4 |
    |        1 | 2020-09-25 12:32:02 | Bananas     |    2 |
    |        2 | 2020-09-25 12:32:06 | Eggfruit    |    1 |
    |        2 | 2020-09-25 12:32:06 | Blueberries |    3 |
    +----------+---------------------+-------------+------+
    4 rows in set (0.00 sec)

Best for: Perhaps the most popular family of DBs, and essential when data integrity is a must (e.g. financial data). Popular alternatives: MySQL, PostgreSQL
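The same orders/details JOIN can be reproduced with Python's built-in sqlite3 module, so you can try it without a MySQL server. This sketch rests on a few assumptions:

* SQLite syntax replaces MySQL's (AUTOINCREMENT instead of AUTO_INCREMENT).
* A REFERENCES clause is added to make the one-to-many relationship explicit. SQLite only enforces it after PRAGMA foreign_keys = ON.
* The timestamp column is left out of the SELECT because its value depends on when you run the code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (
        order_id  INTEGER PRIMARY KEY AUTOINCREMENT,
        timestamp TEXT DEFAULT CURRENT_TIMESTAMP
    );
    CREATE TABLE details (
        product_id INTEGER PRIMARY KEY AUTOINCREMENT,
        name       TEXT,
        qty        INTEGER,
        order_id   INTEGER REFERENCES orders(order_id)  -- one order, many detail rows
    );
    """
)
# Inserting NULL lets SQLite assign order_id 1 and 2.
conn.executemany("INSERT INTO orders (order_id) VALUES (?)", [(None,), (None,)])
conn.executemany(
    "INSERT INTO details (name, qty, order_id) VALUES (?, ?, ?)",
    [("Apricots", 4, 1), ("Bananas", 2, 1), ("Eggfruit", 1, 2), ("Blueberries", 3, 2)],
)
rows = conn.execute(
    """
    SELECT o.order_id, d.name, d.qty
    FROM orders o
    INNER JOIN details d ON o.order_id = d.order_id
    ORDER BY d.product_id
    """
).fetchall()
for row in rows:
    print(row)
```

The JOIN is computed at query time by matching order_id values. Graph databases, discussed next, try to avoid exactly that step.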

Graph

In a graph DB, the relationships between elements are first-class citizens: they are treated exactly the same as the elements themselves. From a mathematical point of view, the elements are the nodes of a graph and the relationships are its edges. In Neo4j, every relationship is stored with a direction, although queries can ignore it. This makes traversing the data far more efficient: we can follow specific edges, or move across the entire graph. Because the relationships are already stored, there is no need to compute JOINs at query time, which can greatly improve performance for connected queries. Neo4j is perhaps the most popular graph database out there. The sandbox it provides includes a database of movies, actors, and directors. We can find the actors with a "Tom Hanks number" of two: people who never acted with Tom Hanks but acted with someone who did. We can write something like:

    MATCH (a:Person {name: "Tom Hanks"})-[:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(b:Person)
    MATCH (b)-[:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(c:Person)
    WHERE c <> a
      AND NOT (a)-[:ACTED_IN]->()<-[:ACTED_IN]-(c)
    RETURN DISTINCT c.name

Best for: Anything that can be expressed as a graph. Very popular for recommendation engines and fraud detection. Popular alternatives: Neo4j, ArangoDB, Amazon Neptune
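Under the hood, the Cypher query above is a two-hop walk over co-acting relationships. Here is a hedged Python sketch of the same idea, using breadth-first search on an adjacency map. The cast list is invented for illustration (it is not the Neo4j sandbox data), and the helper names are hypothetical. Actors already at distance one are excluded automatically, which plays the role of the NOT (...) predicate in the query.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical movie -> cast data; a graph DB would store these as ACTED_IN edges.
CASTS: Dict[str, List[str]] = {
    "Movie A": ["Tom Hanks", "Actor B"],
    "Movie B": ["Actor B", "Actor C"],
    "Movie C": ["Actor C", "Actor D"],
    "Movie D": ["Tom Hanks", "Actor E"],
    "Movie E": ["Actor E", "Actor B"],
}


def co_actor_graph(casts: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    # Person -> co-actors: two people are linked if they share at least one movie.
    graph: Dict[str, Set[str]] = {}
    for cast in casts.values():
        for person in cast:
            graph.setdefault(person, set()).update(p for p in cast if p != person)
    return graph


def actors_at_distance(graph: Dict[str, Set[str]], start: str, k: int) -> Set[str]:
    # BFS assigns each actor their shortest co-acting distance from start.
    dist = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if dist[person] == k:
            continue  # no need to expand beyond distance k
        for other in graph.get(person, ()):
            if other not in dist:
                dist[other] = dist[person] + 1
                queue.append(other)
    return {p for p, d in dist.items() if d == k}


graph = co_actor_graph(CASTS)
print(sorted(actors_at_distance(graph, "Tom Hanks", 2)))  # ['Actor C']
```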

#database #keyvalue #redis #cassandra #hbase #mongodb #mysql #graphdb #neo4j #acid #base

ms4446 · 4 years

#learning #learningneverstops #learningeveryday #continuous learning #selflearning #keeplearning #happylearning #learnsomethingnew #learningisfun #quarantine diaries #newskills #cosmosdb #azure microsoftazure neo4j graphdb ADLS ADX datalake

jacquelinejosh321-blog · 5 years

#neo4j graphdb foundations with cypher download

fesbrainlara-blog · 5 years

#neo4j graphdb foundations with cypher download

hackernewsrobot · 4 years

Build your next app with a graph database

https://dgraph.io/blog/post/graphdb-for-your-next-app/

bhrammi · 6 years

An AWS option for AWS-native fans. However, there are quite a few options for users to explore beyond Neptune.

#graphdb

ericvanderburg · 1 year

Ontotext Introduces ChatGPT Capabilities in GraphDB 10.3

http://i.securitythinkingcap.com/St8w4J

dataandco · 7 years

#tutorial #arangodb #graphdb #aql #free #training #course

eonj · 5 years

The Great Blockchain Feast

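Consider the Fourth Industrial Revolution (hereafter 4IR) and the Creative Economy (hereafter CI/CE, together with the Creative Industries). The IT industry getting worked up over buzzwords like these is hardly news. Many people have given up on understanding the advances in information and communication technology, and have refused to count them as part of general knowledge. As a result, a common tactic for publicizing the importance of something new in computing and software has been to coin terms by combining keywords already useful in other fields. At one time the worst buzzwords were Web 2.0, Ubiquitous, and Big Data. Oddly enough, all of them eventually acquired real substance. Surprisingly, Artificial Intelligence and Deep Learning, Smart Technology and the Internet of Things also developed into vague somethings that cannot be called anything else. (I will have a chance to discuss each of them separately later.)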

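4IR and CI/CE are walking a similar path, but I am not sure blockchain belongs on it. Compared with the historical buzzwords, 4IR and CI/CE have relatively clear goals and scope, and both are concepts far removed from the blockchain industry to begin with. In other words, policies that foster a blockchain industry cannot bring us closer to 4IR or CI/CE. Strangely, though, blockchain is seen as deeply connected to both, because it presupposes participatory production by individuals and guarantees creativity. It is also true that many projects pursuing 4IR or CI/CE on a blockchain foundation are springing up. What on earth is going on?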

4IR

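To properly understand this tangled situation, it helps to first understand the concepts of 4IR and CI/CE. 4IR became known relatively recently and is still a fairly unfamiliar term. However, Korean-speaking culture has long known a similar concept: the Third Wave of informatization (3W). 3W is the theory that a third revolution in human cultural history is now being realized, following the agricultural and industrial ones. Broadly, the agricultural age is regarded as labor-intensive and the industrial age as something in between. Only in a fully informatized age does everything become knowledge-intensive. In this new age, all market activity is also expected to become personalized, government services to become smarter, and individual production to be guaranteed. Up to this point the story is widely known as part of secondary-school general education in South Korea, and 3W is treated almost as established fact.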

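In fact, however, 3W theory has faced considerable criticism. In the industrial revolution, the major events that ushered in mass production are clearly identifiable. They certainly brought enormous changes, reaching as far as state systems and warfare. The information revolution, by contrast, has no small set of key actors, and its changes are too small and too varied. One could also say that 3W theory instrumentalizes and "others" the existing industrial structure on which informatization was built. Roughly, it claims that everything will eventually be overturned again through informatization.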

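Some also point out that 3W theory has ideological features well suited to intercontinental conflict and rivalry. The agricultural revolution played a decisive role in Homo sapiens leaving Africa, becoming the dominant species of the genus Homo, and sustaining human culture. The industrial revolution then handed leadership to the nations of Western Europe. If the information revolution really has as much impact as the industrial revolution, leadership of the world order will pass entirely to the United States.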

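4IR predicts a future similar to that of 3W, but it is a theoretical framework at the opposite ideological pole. 4IR theory holds that humanity today is in the fourth stage of the industrial revolution:

* The first stage was automation through power sources such as the steam engine, which is usually taken to represent the industrial revolution itself.
* The second stage was the advance of various engineering fields made possible by automation. One was high-quality mass production in materials engineering, and the resulting development of small-scale, precision machining. The other, more important one was the electric motor and the electric power industry.
* The third stage is the development of ultra-fine fabrication processes and the invention and development of computers.

In other words, 4IR places the starting premise of 3W on the continuum of the industrial revolution.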

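As the name suggests, the core of 4IR is the fourth stage of the industrial revolution. It goes beyond using computers for general office work and integrates computers into production systems themselves. This raises both the efficiency and the individuality of production, aiming for reliable processes and personalized individual products. The best-known concept here is the cyber-physical system (CPS). Others include computerized machine control and robotics, artificial intelligence, the industrial IoT, and decentralized governance systems.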

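Ultimately, it is hard to say that only one of 3W and 4IR is right and the other wrong. Several facts support the information view:

* After WW2 the United States seized global hegemony on the strength of its information-warfare (espionage, intelligence) capabilities.
* Conflicts between nations over information technology are real.
* Information asymmetries in public services and markets cause considerable social conflict.
* YouTube has completely crossed national borders.

Yet some production systems must focus on safety and accuracy rather than on gathering lots of information and setting trends. Even if markets become highly personalized, those production systems still rest on the physical foundation of the industrial revolution: mass production.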

CI/CE

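CI/CE is an analysis of the non-industrial aspects of the information society. It is somewhat older than 4IR and a rather softer theory. Here too there is a synonym that is better known than "creative economy" or "creative industries" and has long been used in policy: the culture industry. The term describes how cultural elements are being industrialized today through the involvement of big capital. Traditionally, these elements were the domain of cottage industry and artisanal craft. CI/CE includes:

* architectural engineering and architectural design
* layout and visual design
* product manufacturing and industrial design
* books, photography, music, and video (TV and film)

Advertising is also said to form part of the foundation of CI/CE, with IT and software serving as its platform.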

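CI/CE has two faces. The culture industry produces cultural content in step with the rapid growth of social capital brought by the industrial revolution. Its dark side is that this can cause cultural homogenization and a loss of individuality among the public. On the other hand, anyone can now access the means and platforms to turn cultural elements into content and distribute it. This used to be an expensive undertaking possible only at the level of the state. Now anyone can gain influence on the strength of popular support.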

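The basic assumption of CI/CE is that human creativity is the most important, ultimate resource in human society. This of course applies once material resources have become sufficiently abundant, thanks to the agricultural and industrial revolutions described above. The reverse also occurs: intellectual resources drive material abundance. The most important element here is intellectual property rights. These treat practical or original ideas as property and grant their owners exclusive rights to exercise them.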

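Setting aside trademarks and information-related rights, intellectual property rights presuppose the originality of an idea or expression and recognize an exclusive property right in it. Today, modern states act through numerous international treaties on intellectual property:

* They grant patents to things that are original and useful at the level of ideas and can be put into practice.
* They grant copyright to things that have no industrial utility but are realized as original expression, on the view that these too carry inherent industrial value.

In other words, intellectual property treats the source that generates property as property in its own right. This also connects to the unwritten law that traditionally recognized and protected the value of technical know-how and artistic activity.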

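In this light, it may seem that policy is minimizing the shadow side of CI/CE. We might appear to be becoming a society where every individual is protected as a source of ideas. Unless you are rather slow on the uptake, though, you know that reality is not like that. In the civil and commercial law of most modern states, rights to intellectual property produced by an employee belong to the employer (the business), irrevocably. Creativity is the real resource, yet capital-based businesses have a legal basis to own creativity itself, on the grounds that they provide stable capital and systems. Administrative and judicial systems also often fail to respond fairly and rigorously in intellectual property disputes between small and large capital. The biggest problem is relations between nations. Intellectual property follows the same logic of money making money, so it leads to kicking away the ladder in international relations. Tangible property or currency can at least be redistributed; intellectual property is awkward even to distribute.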

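One could also argue that the shadow side of CI/CE is not necessarily caused by recognizing intellectual property rights. Bigger than patents or copyrights, in fact, is the know-how for running stable systems, and that is never publicly filed or registered as an international patent or the like. This know-how, too, is protected on various legal grounds such as trade secrets and military secrets, and CI/CE does not endorse all of these side effects. In the end, the ideal society from a CI/CE perspective is one where anyone can spread their ideas widely, have their value recognized, and be rewarded economically. That is why the CI/CE ideal is also represented by decentralized governance systems.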

Blockchain

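By now you can see that 4IR and CI/CE share decentralized governance as a common ideal. You can also see why, strangely, these two seemingly unrelated ideas get bundled together to make blockchain so fashionable. But how much do we actually know about blockchain itself? Does blockchain really give us decentralized governance?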

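Surprisingly, blockchain really is a tool for decentralized governance. More precisely, blockchain was designed from the start to eliminate one entity from a given governance system: the privileged entity that reviews and approves decisions, in other words, the "center". Put something on a blockchain and it can run just fine without a center. Truly amazing. All the money spent so far maintaining centralized management systems was an illusion.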

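Of course, that is a lie. A blockchain achieves decentralization only if the basic rules that apply to every entity in the governance system are written to deny the existence of a center. In other words, technically a blockchain is merely a means that helps decentralization. What is more, that governance is usually extremely heavy-handed: in the vast majority of blockchains, a decision is approved by majority consent. Let's look at what exactly the problem is.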

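Economics has a concept called the impossible trinity. It applies to the open-economy IS-LM-BoP model. In that model, no country can simultaneously achieve three goals:

* a fixed exchange rate (exchange-rate stability)
* free capital movement (no capital controls)
* an independent monetary policy

There is a fairly convincing mathematical argument for this. If you pursue a stable future with a fixed exchange rate and also open your markets to the world, you have to keep an eye on other countries when setting monetary policy. In any case, at most two goals can be achieved and at least one must be given up. Of course, the model has exceptions, and, economics being economics, almost every country is a bit of an exception.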

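Computing has a closely resembling theory, called the CAP triangle. It says that when managing data that changes over time, you cannot simultaneously guarantee its consistency, availability, and partition tolerance. More precisely, when a network partition occurs, a distributed system must give up either consistency or availability. If data is simply split up and stored in several locations, it will be hard to keep consistent and hard to find a particular piece when you need it. One might hope to avoid this by replicating the data identically in several places. But then one of two costs follows. Either every lookup must check that all the copies agree before the data can be guaranteed correct, or every new write must bring all the copies back into agreement.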
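One common way to manage the replica trade-off just described is quorum replication, used for example by Dynamo-style stores such as Cassandra. This technique is brought in for illustration; the post itself does not propose it.

With N replicas, a write waits for W acknowledgements and a read consults R replicas. If R + W > N, every read overlaps at least one replica holding the latest write. The QuorumStore class, the single global version counter, and the "reachable replicas" lists are simplifying assumptions for this sketch.

```python
from typing import List, Optional, Tuple

# Each replica holds (version, value); version 0 means "never written".
Replica = Tuple[int, Optional[str]]


class QuorumStore:
    """Toy quorum-replicated register: N replicas, write quorum W, read quorum R."""

    def __init__(self, n: int, w: int, r: int) -> None:
        self.n, self.w, self.r = n, w, r
        self.replicas: List[Replica] = [(0, None)] * n
        self.version = 0

    def write(self, value: str, reachable: List[int]) -> bool:
        # A write succeeds only if at least W replicas can acknowledge it.
        if len(reachable) < self.w:
            return False  # give up availability to protect consistency
        self.version += 1
        for i in reachable[: self.w]:
            self.replicas[i] = (self.version, value)
        return True

    def read(self, reachable: List[int]) -> Optional[str]:
        # Consult R replicas and trust the newest version among them.
        if len(reachable) < self.r:
            return None
        newest = max((self.replicas[i] for i in reachable[: self.r]), key=lambda rep: rep[0])
        return newest[1]


store = QuorumStore(n=3, w=2, r=2)  # R + W = 4 > N = 3
store.write("v1", reachable=[0, 1])
print(store.read(reachable=[1, 2]))  # v1 (replica 1 overlaps the write quorum)
print(store.write("v2", reachable=[2]))  # False (a partition leaves too few replicas)
```

Choosing smaller W and R makes the system more available during a partition, at the price of possibly stale reads. That is the CAP trade-off in miniature.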

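In theory you cannot satisfy all three parts of CAP, but any sane IT company strives to satisfy all three anyway. Without C you show users strange things. Without A you are in an outage and cannot communicate with users. Without P you cannot recover when something big goes wrong with the system. The best at this is, of course, Google. It places atomic clocks in its data centers around the globe and uses satellite (GPS) signals and various other correction techniques to keep clock time as accurate as possible. That way it guarantees A and P while at least keeping C from being badly wrong for too long.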

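There are countless descriptions of blockchain's technical significance, but fundamentally blockchain solves the CAP problem in a somewhat different way. Every operation on data either modifies it or reads it. If you make modifying operations far less available, you can keep reading operations highly available while also keeping consistency and partition tolerance high. This is because state-changing operations are what fundamentally threaten consistency and partition tolerance over time.

Verification therefore becomes an essential problem: comparing the value before a change with the value the change supplies, to confirm that consistency will still hold afterward. Bitcoin, the first widely known blockchain, solved this in a decentralized way by introducing proof of work (PoW). That is why blockchain came to be widely known as a way of solving the Byzantine generals problem, and also as distributed ledger technology.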
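The mechanism described above can be sketched as a toy hash chain with proof of work in Python. The sketch makes three assumptions: SHA-256 over a JSON encoding, a fixed difficulty of three leading hex zeros, and string payloads. Real Bitcoin hashes a binary block header twice with SHA-256 and adjusts its target periodically. Treat this only as an illustration of why rewriting history means redoing the work.

```python
import hashlib
import json
from typing import Any, Dict, List

DIFFICULTY = 3  # assumed: number of leading hex zeros a valid block hash must have


def block_hash(block: Dict[str, Any]) -> str:
    # Hash a canonical JSON encoding so every node computes the same digest.
    encoded = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


def mine(prev_hash: str, data: str) -> Dict[str, Any]:
    # Proof of work: search for a nonce whose block hash starts with DIFFICULTY zeros.
    nonce = 0
    while True:
        block = {"prev": prev_hash, "data": data, "nonce": nonce}
        if block_hash(block).startswith("0" * DIFFICULTY):
            return block
        nonce += 1


def is_valid(chain: List[Dict[str, Any]]) -> bool:
    # Every block must carry valid work and point at the hash of its predecessor.
    for i, block in enumerate(chain):
        if not block_hash(block).startswith("0" * DIFFICULTY):
            return False
        if i > 0 and block["prev"] != block_hash(chain[i - 1]):
            return False
    return True


chain = [mine("0" * 64, "genesis")]
for tx in ["alice pays bob 5", "bob pays carol 2"]:
    chain.append(mine(block_hash(chain[-1]), tx))
print(is_valid(chain))  # True

chain[1]["data"] = "alice pays bob 500"  # tamper with history
# Editing block 1 changes its hash: this (almost certainly) voids its proof of work
# and always breaks block 2's link to it.
print(is_valid(chain))  # False
```

Validation is cheap and mining is expensive. That asymmetry is what makes modifications slow and costly while reads stay fast, as the paragraph above describes.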

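One thing to note: the core process of a blockchain is the decentralizable decision-making itself. Blockchain solves the CAP problem somewhat differently, but the basic assumption of partition tolerance simply remains a real problem for blockchains. So even after a modifying operation is approved, not every entity can see its result immediately. Of course, because the data is stored in a distributed fashion, it is still more or less readable right away. Blockchain defines from the outset that immediately reading such a not-yet-fully-updated state does not break consistency. This weakness is fatal for systems that require up-to-date information immediately. Also, a blockchain follows a majority-approval policy and tries to guarantee high availability. As a result, data is generally replicated to every entity, which makes blockchain fatal for systems that require confidentiality as well.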

Why Would You Do That?

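Each field of information and communication technology has evolved with several standards coexisting, leveraging their strengths, compensating for their weaknesses, and growing more alike. Databases are no exception. There is the traditional powerhouse, RDBMS/SQL, and its rival, ODBMS/NoSQL, and more recently GraphDB/GraphQL has emerged. New special-purpose standards keep branching off, and old standards keep evolving by borrowing from the new ones. When the strengths are roughly comparable, technical decisions have mostly been made by avoiding fatal weaknesses.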

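Blockchain is a new technology with clear strengths and weaknesses. As summarized above, it lacks real-time responsiveness and does not guarantee confidentiality. Of course, a well-improved blockchain can improve its real-time behavior a little, and confidentiality can be improved too. But even in such improved blockchains, real-time behavior and confidentiality remain dismal, nowhere near those of existing centralized databases.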

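Still, projects that play to blockchain's strengths are under way everywhere:

* Ethereum, Ripple, and Monero evolve the cryptocurrency blockchain. They attempt to enshrine formalized contracts as part of transactions, approving them together with the transaction history, while also improving performance.
* NEO, EOS, Stellar Lumens, and others go further and aim to become exchange platforms between cryptocurrencies.
* The Basic Attention Token, Ontology Token, and others aim to become platforms that help distribute new practical implementations or original works of expression.

If a blockchain carrying technology and content became a global common currency, we would truly have a borderless society, and the dreams of 4IR and CI/CE would come true. Of course, it would need an enormous amount of computing power, and many of these platforms are being designed around centralized control.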

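What I listed above is the hopeful scenario. Now for the despairing one:

* Once personal information circulates on a blockchain, it is preserved there forever. A blockchain is a sequence of operation records with no controller, so it cannot be fully erased.
* Smart contracts built from bug-ridden code abound, and money is vanishing into thin air.
* Fully distributed decision-making requires too much computing power. To overcome the time cost, the data ends up either insufficiently verified or verified by a central controller.

Confidentiality is compromised, consistency suffers, performance is poor, and in the end a central controller has appeared.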

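Personally, two new blockchain-based service or system ideas have appalled me most. The first, which I have not heard about lately, was a plan to release citizens' medical records as public data on a blockchain. Blockchain networks are public by default, yet an article reported a plan to put sensitive information such as medical records there to boost the medical and pharmaceutical industries. Of course it would be encrypted, but fundamentally every cipher can eventually be broken. The data must be decryptable by those allowed to read it, so it would have to use two-way (reversible) encryption. Even one-way encryption eventually falls. To extract such valuable information in plaintext, attackers would mount an all-out cryptanalysis effort.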

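The second is ongoing: banks saying they will manage financial records on a blockchain. Fortunately, the government watches finance closely, so naturally they won't use a public blockchain but a private one. But they won't make the data public either, and in the end some authoritative entity will control the database's consistency anyway. So why make a decision that raises the overall cost? This is data that ought to be centrally managed, and central management makes CAP and confidentiality easy to secure. Why on earth manage it on a blockchain?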

Heh-heh, or Rather, Sob-sob

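For this kind of data, CAP and confidentiality are matters of life and death. So why would anyone sacrifice both by not managing it centrally? Some of you know and some have guessed: not managing the data directly at the center is the whole point.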

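When data is managed directly by a central party, that authoritative entity bears responsibility and authority for every requirement on the data, such as CAP and confidentiality. In high-reliability systems like finance, an incident then means large response costs. In severe cases, it can even mean liability for compensating financial losses. With a blockchain, however, responsibility for confidentiality is pushed onto everyone. One outsourcing company handles C, another handles A, and because it is a blockchain, P is supposedly guaranteed automatically. That is how blockchain gets used to outsource responsibility.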

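There is a saying in security: just as a chain is only as strong as its weakest link, a system's security is only as good as that of its weakest subsystem. Some data requires management by a single authoritative responsible party. If you outsource its management field by field, then sooner or later, when an incident happens, it will not even be clear who is responsible. The client who proposed adopting blockchain? The bigger client above them who let it happen? Or the anonymous alien who invented blockchain in the first place? The future is unknowable.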

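Some people are investing because they think the magic word "blockchain" will get them onto the 4IR bus and the CI/CE train. At the very least, they should wake up. That is not how blockchain can contribute to 4IR or CI/CE. A blockchain adopted that way will stab you in the back before long. Someone cleverly set up the big game by pumping blockchain as a magic word, and they will never take responsibility on your behalf. I used medicine and finance as examples, but of course they are not the only ones. Today's reality is effectively a great blockchain catastrophe; no, a great blockchain feast. Raise a toast. Not getting lost at the feast is your own responsibility, and if you want to do 4IR or CI/CE, blockchain is not really the way.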

djavco · 7 years

Neo4j graphs are starting to get more elaborate. To me, one looks like a warrior holding a shield. What do you think? #djav #djlife #geekbyday #coderbyday #djbynight #graphdb #neo4j #cypher #entities #relationships #somethingiscooking #ideas #unusualapplications #stillthinking (at Pune, Maharashtra)

#neo4j #entities #unusualapplications #geekbyday #somethingiscooking #djlife #ideas #djbynight #stillthinking #cypher #relationships #coderbyday #djav #graphdb

womaneng · 6 years

Hello to my lovely friends ❣ How's your life going at the moment? As you can see, I'm lost between screens. 💻🖥📸🤗 ❕No Signal #AI #Analytics #BI #BigData #Database #DataEngineering #DataLake #DataScience #DataWarehouse #DeepLearning #GraphDB #IIoT #IoT #LinkedData #MachineLearning #NoSQL #OpenData #ORMS #PredictiveAnalytics #PrescriptiveAnalytics #SmallData #SmartData #Statistics https://www.instagram.com/p/BuGUkmmAKnw/?utm_source=ig_tumblr_share&igshid=12164bja0v9v5

aztecaread · 2 years

KirkDBorne https://twitter.com/KirkDBorne/status/1539811142922653698 https://t.co/7sJx1GkhfM June 23, 2022 at 12:22PM

Complex Network Analysis in #Python — Recognize • Construct • Visualize • Analyze • Interpret: https://t.co/7sJx1GkhfM —— #BigData #DataScientists #DataScience #AI #Semantic #MachineLearning #LinkedData #GraphDB #GraphAnalytics #NetworkScience #DataStorytelling #100DaysOfCode https://t.co/yTRFnhrjEf

— Kirk Borne (@KirkDBorne) Jun 23, 2022

#KirkDBorne

mtbuzzerseo-blog · 6 years

Graph database vs. relational database

This article compares graph databases and relational databases in detail, and also summarizes the basic information about each.

Graph database vs. relational database: Why use each?

Relational databases:

Relational databases like MySQL, PostgreSQL, and SQLite3 represent and store data in tables and rows. The structure of a relational database lets you link data from different tables using foreign keys (or indexes).

Graph database:

Typical use cases: social networks; recommendation and personalization; Customer 360, including entity resolution (correlating customer data from multiple sources); fraud detection; asset management.

Graph database vs. relational database: Popular systems

Relational databases:

The most popular have been Microsoft SQL Server, Oracle Database, MySQL, and IBM DB2.

Graph databases:

Neo4j, FlockDB, AllegroGraph, GraphDB, InfiniteGraph, OrientDB, InfoGrid, and HyperGraphDB.

Graph database vs. relational database: Design Requirements

Relational database:

A well-designed database is essential for fast data retrieval and updates. The basic steps in designing a database are:

Determine the purpose of your system, the tables the system needs, and the fields each table needs.

Graph database:

Graph database management systems (GDBs) are gaining popularity. They are used to analyze the huge graph datasets that naturally arise in many application areas to model interrelated data. Benchmarking work in this area aims to open a new topic of discussion in the benchmarking community, and to give practitioners a set of basic guidelines for benchmarking GDBs.

Graph database vs. relational database: Disadvantages

Relational database:

Cost: The expense of setting up and maintaining the database system.

Abundance of Information: Complex items such as images, numbers, patterns, and multimedia are hard to fit into a tabular model.

Graph database:

Not well suited to transactional data, such as accounting records, where the relationships between records are simpler. Harder to run summing and max queries efficiently (counting queries are not harder). You generally need to learn a new query language, such as Cypher. Fewer vendors to choose from and a smaller user base, so it is harder to get support when you run into issues.

Graph database vs. relational database: Advantages

Relational database:

Data Structure: The table format is simple and easy for database users to understand and use.

Multi-User Access: RDBMSs allow multiple database users to access a database simultaneously.

Privileges: Authorization and privilege control features in an RDBMS allow the database administrator to restrict access to authorized users.

Network Access: RDBMSs provide access to the database through a server daemon, a specialized software program that listens for requests on a network, and allows database clients to connect to and use the database.

Speed: An RDBMS may be slower than some alternatives, but its advantages, such as simplicity, make that a fair trade-off, and built-in optimizations help performance.

Relational database Maintenance: RDBMSs feature maintenance utilities that provide database administrators with tools to easily maintain, test, repair and back up the databases housed in the system.

Support of Languages: RDBMSs support a generic language called "Structured Query Language" (SQL). The SQL syntax is simple.

Graph database:

Object-Oriented Thinking: The data model maps naturally to objects and their relationships, which means very clear, explicit semantics for each query you write.

Performance: A graph is essentially an index data structure, so following a relationship does not require a join.

Real-Time Updates and Simultaneous Queries: Graph databases can apply real-time updates to big data while serving queries at the same time.

Flexible Online Schema Environment: You can add and drop vertex types at any time.

Aggregate Queries: In addition to traditional group-by queries, graph databases can run some kinds of more complex aggregate queries.

Combine and Hierarchize Multiple Dimensions: Graph databases can combine multiple dimensions to manage big data, including time series, demographic, and geographic dimensions.

AI Infrastructure: Graph databases serve as great AI infrastructure due to well-structured relational information between entities, which allows one to further infer indirect facts and knowledge.
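The claim that "a graph is essentially an index data structure" refers to what is often called index-free adjacency. Each node keeps direct references to its neighbours, so a traversal hop is a pointer dereference rather than a join. The Python sketch below contrasts that with a relational-style edge table. Both the Node class and the sample edges are illustrative assumptions. The relational version uses a full scan per hop, where a real RDBMS would use an index.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass(eq=False)  # identity-based hashing so nodes can live in sets
class Node:
    """Hypothetical graph node that stores direct references to its neighbours."""
    name: str
    out: List["Node"] = field(default_factory=list)

    def connect(self, other: "Node") -> None:
        self.out.append(other)


def reachable_within(start: Node, hops: int) -> Set[str]:
    # Graph style: each hop follows stored pointers; no lookup in a separate table.
    frontier, seen = [start], {start}
    for _ in range(hops):
        nxt = []
        for node in frontier:
            for neighbour in node.out:
                if neighbour not in seen:
                    seen.add(neighbour)
                    nxt.append(neighbour)
        frontier = nxt
    return {n.name for n in seen if n is not start}


def reachable_relational(edges: List[Tuple[str, str]], start: str, hops: int) -> Set[str]:
    # Relational style: each hop is a join against the edges table (here, a full scan).
    frontier, seen = {start}, {start}
    for _ in range(hops):
        frontier = {dst for src, dst in edges if src in frontier and dst not in seen}
        seen |= frontier
    return seen - {start}


EDGES = [("alice", "bob"), ("bob", "carol"), ("carol", "dave")]
nodes = {name: Node(name) for pair in EDGES for name in pair}
for src, dst in EDGES:
    nodes[src].connect(nodes[dst])
alice = nodes["alice"]
print(sorted(reachable_within(alice, 2)))  # ['bob', 'carol']
```

Both functions return the same answer. The difference is that the graph version's cost per hop depends only on the neighbours actually visited, not on the size of the edge table.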

Graph database vs. relational database: Limitation

Relational database:

The first limitation of an RDBMS (relational database) is rigidity, which comes from organizing data into tables and relations.

One consequence is that the schema (structure) of all records in a table must be the same.

A second consequence is that schema changes are heavyweight: if even one record needs a new field, you must add it to every record in the table.

Relational databases commonly work around this limitation by modeling such information in normalized form, with parent-child records.
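Here is a small sketch of the schema-rigidity point, using Python's built-in sqlite3. Adding a field for one record means altering the table, after which every existing row has the column, filled with NULL. How costly that is depends on the engine. SQLite only updates the schema, but the column still applies to every row. The property-graph side is mimicked with plain dicts, an assumption for illustration rather than any graph database's API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")
conn.executemany("INSERT INTO people VALUES (?)", [("Moss",), ("Roy",), ("Jen",)])

# Only Jen needs a job title, but the schema change applies to every row.
conn.execute("ALTER TABLE people ADD COLUMN title TEXT")
conn.execute("UPDATE people SET title = 'manager' WHERE name = 'Jen'")
rows = conn.execute("SELECT name, title FROM people ORDER BY rowid").fetchall()
print(rows)  # [('Moss', None), ('Roy', None), ('Jen', 'manager')]

# Property-graph style (sketched with dicts): each node carries only its own properties.
people = [{"name": "Moss"}, {"name": "Roy"}, {"name": "Jen"}]
people[2]["title"] = "manager"
print(people)
```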

Graph database:

Lack of High-Performance Concurrency: In many cases, GDBs provide multiple-reader, single-writer transactions, which limits their concurrency and, as a result, their performance.

Lack of Standard Languages: The lack of a well-established, standard declarative query language is a problem today. Neo4j proposes Cypher, and Oracle is working on a language of its own. This matters because query optimization is an essential issue, and standard languages encourage progress on this vital step.

Lack of Parallelism: Partitioning a graph is a hard problem. As a result, most GDBs do not provide shared-nothing parallel queries on large graphs.

#Graph database vs relational Database
