This blog documents my final-year university computer science project (what a mouthful). The project aims to use Twitter to gauge the public's opinion on certain topic areas.
UI integration
I have now implemented a basic Java UI, so the program no longer needs console input to run and can be launched as an application. The UI can be seen below:
In the case above, messages appear confirming what you have entered. Information is also displayed when incorrect input is entered.
#ui#java#data#big data analytics#database#data store#twitter#twitter analytics#twitter4j#tweets#tweet#nosql#mongodb#mongo db
More data
As well as getting the data of the Tweets and the date of the Tweets, I am now also storing the sentiment words that are matched in each Tweet.
An example of this can be seen below:
This allows the use of each sentiment word to be compared across all Tweets, showing the most used word, the most used combination, the least used word and so on.
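As a rough illustration of that comparison, here is a minimal sketch that tallies how often each matched sentiment word appears across a set of Tweets. The class and method names are hypothetical, and it assumes the matched words for each Tweet are already available as a list of strings; it is not the project's actual code.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: counts how often each matched sentiment word is used.
class SentimentWordCounter {

    // Each inner list holds the sentiment words matched in one Tweet.
    static Map<String, Integer> countWords(List<List<String>> matchedWordsPerTweet) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (List<String> words : matchedWordsPerTweet) {
            for (String word : words) {
                counts.merge(word, 1, Integer::sum); // add 1, or start at 1
            }
        }
        return counts;
    }

    // Returns the most frequently matched word, or null if nothing was matched.
    static String mostUsed(Map<String, Integer> counts) {
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > bestCount) {
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<List<String>> matched = Arrays.asList(
                Arrays.asList("great", "love"),
                Arrays.asList("bad"),
                Arrays.asList("great"));
        Map<String, Integer> counts = countWords(matched);
        System.out.println(counts);           // {great=2, love=1, bad=1}
        System.out.println(mostUsed(counts)); // great
    }
}
```

The least used word or the most used combination could be found the same way, by keying the map on whatever is being compared.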
#twitter#twitter analytics#twitter4j#tweets#tweet#data#big data analytics#database#data store#nosql#nosql database#mongodb#mongo db
Improved process
As mentioned before, I was experiencing slow analysis. To reiterate, this was because I was analysing the entire collection at the end of each batch of inserts. As a result, the time taken to analyse it all increased as the collection grew.
To resolve this, I first had to remove the step that analysed the entire collection on every run. Instead, I decided to analyse the Tweets one by one as they are retrieved. This meant a few things:
The analysis is a lot quicker than mass sentiment analysis.
The entire collection is no longer analysed with each iteration, which speeds up analysis.
A live analysis is carried out, so results can be seen while the program is running. This is helpful if, for example, you retrieve 2,000 Tweets: this will take a while, but as they are analysed one by one, results for individual Tweets can be seen throughout.
This method has now been implemented and is very efficient, with no Tweets being lost as the process is very quick.
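To show the idea of analysing each Tweet on arrival, here is a minimal sketch that keeps running totals and does one unit of work per Tweet, however many Tweets came before. The class name, the `onTweet` method and the toy scoring lambda are all assumptions made for illustration; the real program scores Tweets against the sentiment word lists.

```java
import java.util.function.ToIntFunction;

// Hypothetical sketch: scores each Tweet once, as it arrives, and keeps running totals.
class LiveSentimentTally {
    private final ToIntFunction<String> scorer; // any function from Tweet text to a sentiment value
    private int positive;
    private int negative;
    private int neutral;

    LiveSentimentTally(ToIntFunction<String> scorer) {
        this.scorer = scorer;
    }

    // Called once per incoming Tweet; nothing already stored is re-analysed.
    int onTweet(String text) {
        int score = scorer.applyAsInt(text);
        if (score > 0) {
            positive++;
        } else if (score < 0) {
            negative++;
        } else {
            neutral++;
        }
        return score;
    }

    // Totals so far, available at any point while Tweets are still coming in.
    String summary() {
        return "positive=" + positive + ", negative=" + negative + ", neutral=" + neutral;
    }

    public static void main(String[] args) {
        // Toy scorer used only for this demo.
        LiveSentimentTally tally = new LiveSentimentTally(
                text -> text.contains("love") ? 1 : text.contains("hate") ? -1 : 0);
        tally.onTweet("I love this");
        tally.onTweet("I hate that");
        tally.onTweet("Just a Tweet");
        System.out.println(tally.summary()); // positive=1, negative=1, neutral=1
    }
}
```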
#data#big data analytics#database#data store#sentiment analysis#sentiment analytics#twitter#twitter analytics#twitter4j#tweets#tweet
Slow analysis
Because I am inserting new Tweets and then analysing the entire collection at the end, the process becomes very slow as the collection grows.
This issue needs to be fixed, not only for speed and efficiency but also so that live feeds of Twitter data can be analysed. That way, graphs can be plotted and figures extracted as Tweets are posted. The program could then be run on a cloud instance, for example, and left running for days, producing steady information without having to wait.
I shall begin work on this now.
#java#twitter#twitter analytics#twitter4j#tweets#tweet#data#big data analytics#database#data store#nosql#mongodb#mongo db#final project#final year project
Initial Analysis
After some tweaks to the matching process, I have now completed an initial sentiment comparison using my system. The results for #Trump and #StPatricksDay (collected on Saint Patrick’s Day) can be found below respectively:
#twitter#twitter analytics#sentiment analysis#sentiment analytics#sentiment#nosql#nosql database#mongodb#mongo db#trump#stpatricksday#final project#final year project
MongoDB Cloud Atlas
After moving all my code into one class (I have yet to determine whether this was a good move), I have moved my data storage from my local machine to MongoDB’s cloud service, Atlas.
Spinning up a cluster and getting started was easy. Converting the Java code was also easy once I had updated from Java 7 to Java 8, a requirement that was not stated anywhere.
With Atlas running, I no longer need a local instance of MongoDB on my machine in order to run my code. The next step is to get my code onto a cloud instance so that my laptop doesn’t need to be running in order to collect data from Twitter via the stream.

#mongodb#mongo db#data#big data analytics#database#data store#nosql#nosql database#cloud#cloudcomputing#twitter#twitter analytics#tweets#tweet#twitter4j
Tweet Sentiment Analysis
As of posting this, the project is at the stage where Tweets are analysed and sentiment words are found. The resulting sentiment value is then recorded (as specified in a previous post) and written back to the tweet_data collection.
I am considering adding an additional key and value to the tweet_data documents for the Tweet polarity, based on the sentiment value. This way a definitive answer of positive, negative or neutral can be seen at a glance.
The next stage is to look into data analytics tools that can be integrated with MongoDB in order to carry out further analysis and to plot results.
Alternatively, the code could be refined to cut down on computation and save time. The classes are currently split by functionality and need to be linked so that multiple classes don’t have to be run manually.
#mongodb#mongo db#twitter#twitter analytics#tweets#tweet#big data analytics#sentiment analysis#sentiment analytics#analytics#data#database#data store#data storeage#nosql#nosql database#final project#university project#final year project#project#programming#java
Updated architecture.
As you can see below, I have updated my system architecture with more detail and explanations, and to include the use of MongoDB.
#mongodb#mongo db#java#twitter#twitter analytics#twitter4j#computer science#computer#tweets#tweet#sentiment analysis#sentiment analytics#database#dissertation#final project#university project#final year project#project#programming
Data manipulation
In order to carry out an accurate comparison of the sentiment words and the retrieved Tweets, some data had to be moved around.
To check each Tweet against every sentiment word, I extracted the data as follows:
The sentiment table was split into two array lists, one for the words and one for the corresponding polarity of each word. Because both lists are indexed from zero and kept in the same order, the word at index 5 of the word list, for example, has the polarity at index 5 of the polarity list.
As for the Twitter data, I split each Tweet’s main keys into separate elements of an array, i.e. ID at index 0, Tweet date at index 1, Tweet text at index 2 etc. With each iteration (once the Tweet has been compared with every sentiment word), the array is cleared and replaced with the next Tweet’s data.
This process is repeated until all Tweets have been checked against the word list and therefore the polarity list. If the Tweet contains a match for a sentiment word, that word’s polarity is checked. If it is positive the sentiment value increases by 1, and if it is negative it decreases by 1.
Once the process is complete, the sentiment value is written back to the database using the ID of the Tweet, which is held at index 0 of the Twitter data array.
If the sentiment value is greater than 0 the Tweet is considered positive, if it is less than 0 it is considered negative, and if it is 0 it is neutral.
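The process described above can be sketched roughly as follows. This is a simplified illustration rather than the project's code: the class and method names are hypothetical, the polarity labels are assumed to be the strings "positive" and "negative", and matching is assumed to be on whole lower-cased words, with each sentiment word counted at most once per Tweet.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of scoring one Tweet against two parallel lists.
class ParallelListScorer {

    // words.get(i) has the polarity polarities.get(i).
    static int score(String tweetText, List<String> words, List<String> polarities) {
        List<String> tokens = Arrays.asList(
                tweetText.toLowerCase(Locale.ROOT).split("\\W+"));
        int value = 0;
        for (int i = 0; i < words.size(); i++) {
            if (tokens.contains(words.get(i))) {
                // A positive word adds 1; any other polarity subtracts 1.
                value += "positive".equals(polarities.get(i)) ? 1 : -1;
            }
        }
        return value;
    }

    // Above 0 is positive, below 0 is negative, exactly 0 is neutral.
    static String polarity(int value) {
        if (value > 0) {
            return "positive";
        }
        if (value < 0) {
            return "negative";
        }
        return "neutral";
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("great", "terrible", "awful");
        List<String> polarities = Arrays.asList("positive", "negative", "negative");

        // Tweet data array: ID at index 0, date at index 1, text at index 2.
        String[] tweet = {"42", "2017-03-17", "Great parade but terrible weather and awful traffic"};

        int value = score(tweet[2], words, polarities);
        System.out.println(tweet[0] + " -> " + value + " (" + polarity(value) + ")");
        // 42 -> -1 (negative)
    }
}
```

Keeping the words and polarities in two lists only works while both stay in the same order; a single map from word to polarity would remove that dependency, at the cost of departing from the layout described here.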
Some handwritten notes explaining how the data is split can be seen below:

#data#big data analytics#database#data store#data storeage#data manipulation#mongodb#mongo db#twitter#twitter analytics#tweets#tweet#data analysis#data analytics#final project#university project#final year project#programming#java
Populating sentiment
The MongoDB collection that stores the sentiment words has now been populated with both positive and negative sentiment words: approximately 2,000 positive and approximately 4,000 negative.
I obtained these lists of sentiment words from an online source; the positive and negative lists respectively can be found here:
http://ptrckprry.com/course/ssd/data/positive-words.txt
http://ptrckprry.com/course/ssd/data/negative-words.txt
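For anyone wanting to load word lists like these, here is a minimal sketch of a reader that takes one word per line. The class name is hypothetical, and the rule of skipping blank lines and lines starting with ";" is an assumption about how header comments are marked in the files, so check it against the actual downloads.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reads a one-word-per-line sentiment list.
class SentimentListReader {

    // Skips blank lines and lines starting with ';' (assumed to be header comments).
    static List<String> readWords(BufferedReader reader) throws IOException {
        List<String> words = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith(";")) {
                continue;
            }
            words.add(trimmed);
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        // Inline sample standing in for a downloaded file.
        String sample = "; header comment\n\nabound\nabounds\n";
        List<String> words = readWords(new BufferedReader(new StringReader(sample)));
        System.out.println(words); // [abound, abounds]
    }
}
```

The same method works on a downloaded file by passing it a `BufferedReader` opened on that file. Each word can then be stored with its polarity in the sentiment collection.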
#mongodb#mongo db#sentiment analysis#sentiment analytics#sentiment#sentiment words#data#big data analytics#database#data storeage#data storage#twitter#twitter analytics#tweets#tweet#social media
Making Strides
A lot of time has been going into teaching myself NoSQL databases, and more specifically MongoDB. That alone has been a learning curve, and learning how to work with it from Java at the same time has made it that bit more challenging.
That said, I have been making progress. As of writing this I have managed the following (in addition to what I mentioned in my previous post):
Read in a text file of Tweet data.
Specified all variables to be used later, i.e. database, connection, collection, Tweet date, Tweet text etc.
Stripped the date of the unneeded characters “]” and “[”.
Built up a document from the text file data, ready to be inserted into the collection.
Added all fields, including fields not yet used (sentimentFound and overallSentiment), which are set to “NULL” here.
Inserted all this data into the specified MongoDB collection.
The Java code for all of this can be seen in the following screenshot:
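As a rough text sketch of the document-building steps listed above, the example below strips the brackets from a date and assembles the fields for one Tweet. It is not the code from the screenshot: the class name and the field names tweetId, tweetDate and tweetText are hypothetical (only sentimentFound and overallSentiment come from this post), and a plain `Map` stands in for the driver's document type so that the sketch runs without a database.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: builds the key/value pairs for one Tweet document.
class TweetDocumentBuilder {

    // Removes the unneeded "[" and "]" characters around the date.
    static String stripBrackets(String rawDate) {
        return rawDate.replace("[", "").replace("]", "").trim();
    }

    static Map<String, Object> build(String id, String rawDate, String text) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("tweetId", id);                       // hypothetical field name
        doc.put("tweetDate", stripBrackets(rawDate)); // hypothetical field name
        doc.put("tweetText", text);                   // hypothetical field name
        doc.put("sentimentFound", "NULL");            // not yet used, filled in later
        doc.put("overallSentiment", "NULL");          // not yet used, filled in later
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = build("1", "[2017-03-17]", "Hello");
        System.out.println(doc);
        // {tweetId=1, tweetDate=2017-03-17, tweetText=Hello, sentimentFound=NULL, overallSentiment=NULL}
    }
}
```

With the MongoDB Java driver, the same pairs would go into an `org.bson.Document`, which is then passed to the collection's `insertOne` method.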
#mongodb#mongo db#java#twitter#twitter analytics#twitter4j#tweets#nosql#nosql database#final project#final year project#computer science#computing#code#programming#university project#project#sentiment analysis#sentiment analytics#sentiment
MongoDB and Java
After deciding to use a NoSQL database (MongoDB), I have had to teach myself how it all works. NoSQL is quite different to the relational database systems (RDS) I have always used, so it has been a challenge.
Now, however, I have the terminology down and I’m moving forward. After checking out some online resources I have managed to set up MongoDB in my Java IDE (IntelliJ). First I needed the Java driver (easily found with a quick Google search), which I added via Maven.
Even though I had to convert my project to a Maven project in order to do this, it was all pretty painless. Added support came in the form of a free JetBrains plugin for IntelliJ, which allows me to access my database and collections from within the IDE. This has proven very helpful for visualising my database structure.
As an added bonus, I can connect to the MongoDB service and run queries via the shell, all from within the IDE, which is great.
#nosql#nosql database#mongodb#mongo db#data#database#java#intellij idea#intellij#programm#code#final project#final year project
Understanding of NoSQL structure
Below is a diagram of my understanding of a NoSQL database structure.
Pursuing MongoDB
After deciding to pursue MongoDB for my project I have had to brush up on my NoSQL knowledge.
I am aware of NoSQL and its very basics, but I have mainly worked with RDS. As you can imagine, it gets a bit confusing as it requires a completely different way of thinking.
To understand it, I find it helps to have it explained in relational database terminology (terrible, I know).
I found this site which helped me understand the terminology in seconds. Here you go, check it out:
http://www.w3resource.com/mongodb/databases-documents-collections.php
Using this in conjunction with the MongoDB getting started guide for Java, I have quickly got things rolling.
#mongodb#phew#nosql#nosql database#final project#getting started#java#twitter#data#database#data store
MongoDB
For MongoDB, I found this article, which talks about retrieving Tweets and storing them in MongoDB. In the example they provide, they use Python and R. I am planning to use Java with MongoDB, which is possible but may be more challenging, as I have never used MongoDB or any other NoSQL database system before, so it is a new technology to me.
MongoDB is a document-oriented NoSQL database which is becoming widely used and accepted as a strong NoSQL contender. There is ample support and it is easy to use, within reason (although the specifics for my required use have not yet been researched).
See the article below.
http://stats.seandolinar.com/collecting-twitter-data-storing-tweets-in-mongodb/
#nosql#nosql database#mongodb#mongo db#data#data storage#big data analytics#twitter#twitter analytics#tweets#computer science
Microsoft Azure and Power BI
I found this recent article, posted a few weeks ago, about carrying out sentiment analysis on Twitter using Microsoft’s solutions. I cannot use this software directly as it is only for Windows and I do my work on a Mac. I could partition my hard drive and do everything on Windows, or I could spin up a Microsoft Azure instance with a Windows VM running on it and do all my work there. However, I would rather not limit my project to Windows only, and there could be a cost to running an instance in the cloud. Seeing as it is all laid out in this article, I wonder whether it would be a viable project to follow this approach. See the link below for the article.
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-twitter-sentiment-analysis-trends
Microsoft also has another tool called Power BI, which is a report generation tool. This would be useful for carrying out my analytics and integrates easily with Microsoft Azure.
Although this application would be very useful, it has the same limitations (and workarounds) as stated above. As a result, if any other database solution is used, an additional external analytics tool would be required to analyse the data.
Different data storage
After a meeting with a member of staff at my university who specialises in data storage and databases, it was suggested that I take a different approach from the conventional relational data model and Oracle.
My proposed database structure doesn't require any relations, so RDS isn't needed. Much more recent and promising areas of technology can be used instead.
NoSQL database solutions are an ever-growing industry that rivals the conventional RDS model. A few different NoSQL solutions and new technologies were discussed in this meeting, and I shall talk about each in posts to come.
#data#big data analytics#database#data store#data storeage#nosql#nosql database#mongodb#microsoft azure#rds#twitter#twitter analytics