jul-projects · 4 years
TouchDesigner: Data Visualisation (process)
This was the meat of my project. TouchDesigner is an absolutely massive program that can do practically anything with minimal coding knowledge. That doesn’t change the fact that it is, again, a massive program, and 12 weeks of TouchDesigner tutorials and classes probably weren’t enough to scratch the surface.
Also, I forgot to document things while I was working on this, but... it was essentially days of YouTube tutorials, wiki help pages, an overheated laptop, and cursing at 3am.
When I first started messing around, I made this:
Tumblr media
What I wanted was a linear bar graph of the sentiment data which would then bend into a circle, as shown below:
Tumblr media
(^ defining position and shape of line/circle)
Tumblr media
(^ feeding the data into the render network)
The above was created with the help of this tutorial.
But I realised that this wouldn’t work with datasets larger than the 23 rows I used to test things out: if I input, say, 10k rows of data, then the circle of rectangles would become hopelessly squished. I also had no idea how to rotate each of the boxes so that they would all point towards the centre of the circle, which would have made it somewhat more bearable to look at.
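For reference, the geometry behind both problems can be sketched in plain Python. This is not TouchDesigner code and not the network from the screenshots; `circle_layout` and `arc_spacing` are made-up names, and the idea that the resulting values could drive each box's translate and rotate is an assumption.

```python
import math

def circle_layout(n_rows, radius=1.0):
    """Return (x, y, rot_z_degrees) for n_rows boxes spaced evenly on a circle."""
    layout = []
    for i in range(n_rows):
        angle = 2 * math.pi * i / n_rows  # one division of the circle per data row
        x = radius * math.cos(angle)
        y = radius * math.sin(angle)
        # Rotating a box about z by its own angle lines its local x-axis up
        # with the radius, so it points along the line through the centre.
        rot_z = math.degrees(angle)
        layout.append((x, y, rot_z))
    return layout

def arc_spacing(n_rows, radius=1.0):
    """Arc length available to each box: circumference divided by row count."""
    return 2 * math.pi * radius / n_rows

print(arc_spacing(23))      # room per box with the test data
print(arc_spacing(10_000))  # room per box with a large month
```

The second function shows why things get squished: the space per box shrinks in proportion to the number of rows unless the radius grows with it.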
So then I explored other methods:
Tumblr media Tumblr media
...spheres looked better than boxes, at least.
Tumblr media
Hm.
Tumblr media
I liked this a lot, and I almost made this into my final... but then I realised that the points in that sphere were selected arbitrarily (relatively, at least) to fill its mesh. Whereas before I was able to select the number of divisions in the circle (and map the spheres to these divisions) based on the number of rows of the input table, now I found that some data were lost or cloned again and again to fill the empty spaces of the sphere’s mesh. In short, I couldn’t reliably control the number and position of points in the sphere, so I abandoned this method.
...but it still looked the best :(
I went back to the circle method, but I dropped the line-to-circle method completely and tried something else to distribute the points more comfortably:
Tumblr media
If one circle isn’t enough, then try two - or five.
Tumblr media
The first four operators on the top left of that screenshot are the circles that I used to determine the position of each of the data points. Then I divided the data between the circles by percentage, based on the total number of rows of the input data:
Tumblr media
(36%, 24%, 20%, 16%, 4% of the data will be mapped onto each circle respectively)
...the percentages were arbitrarily chosen via trial and error, but y’know, I’m no mathematician :’)
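The split itself is simple arithmetic. A minimal sketch in Python, assuming the percentages above are applied to the row count and that rounding leftovers go to the first circle (the screenshots don't say how leftovers were handled, so that part is a guess):

```python
# Percentages from the post; handing any leftover rows to the first circle
# is an assumption made for this sketch.
SHARES_PERCENT = [36, 24, 20, 16, 4]

def rows_per_circle(total_rows, shares=SHARES_PERCENT):
    """Split total_rows across the circles according to integer percentages."""
    counts = [total_rows * p // 100 for p in shares]  # round each share down
    counts[0] += total_rows - sum(counts)             # keep every row accounted for
    return counts

print(rows_per_circle(10_000))  # [3600, 2400, 2000, 1600, 400]
print(rows_per_circle(23))      # [11, 5, 4, 3, 0]
```

With a tiny month like April, the smallest circle ends up with nothing on it at all.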
The problem now was that the circles were too large for some of the months. For example, if I input April’s data (which only had 23 rows compared to May’s... 10k something rows), then there was this big empty space with a few floating balls. Technically, it was effective in conveying just how little we spoke in April, but I figured that if the data wasn’t as obvious as the April example, then there’d be very little difference between, say, May and June’s texting data. Which wasn’t very interesting.
So I normalised all four months’ number of rows to fit between 0.2 and 1.4 (the numbers I determined would look the best).
Tumblr media Tumblr media
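That kind of normalisation is a linear rescale from the range of row counts to the range 0.2 to 1.4. A rough Python sketch, assuming plain min-max scaling (the exact method used in TouchDesigner isn't recorded here); April and May use the figures from this post, with May rounded, and the other two counts are placeholders:

```python
def rescale(value, in_min, in_max, out_min=0.2, out_max=1.4):
    """Linearly map value from [in_min, in_max] to [out_min, out_max]."""
    if in_max == in_min:
        return out_max  # a single value has no range to spread across
    t = (value - in_min) / (in_max - in_min)
    return out_min + t * (out_max - out_min)

# April and May come from the post (May is approximate);
# the last two entries are invented placeholders.
row_counts = {"April": 23, "May": 10_000, "placeholder_1": 4_000, "placeholder_2": 7_000}

lo, hi = min(row_counts.values()), max(row_counts.values())
sizes = {month: rescale(n, lo, hi) for month, n in row_counts.items()}
print(sizes)
```

The smallest month lands on 0.2 and the largest on 1.4, with everything else spread in between.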
VADER Sentiment Analysis: The Process
Before I could do anything, I needed data. To acquire data, unfortunately, I needed Python knowledge... which I didn’t have.
It was therefore an interesting mix of Python crash course tutorials and VADER tutorials for a few days. However, despite the fact that I could understand much more than I did two years ago, it was still very slow going... and I still had no clue how to export the results into a .csv file in a format that I wanted.
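For the record, the export step on its own is not much code. The following is a hedged sketch using Python's standard `csv` module, not the code from the screenshots; `export_scores` and the column layout are made up for illustration, and the scorer is passed in so the sketch doesn't depend on any one library.

```python
import csv

# Column layout chosen for this sketch: the message plus VADER's four scores.
FIELDS = ["text", "neg", "neu", "pos", "compound"]

def export_scores(messages, score_fn, path):
    """Score each message with score_fn and write one CSV row per message."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        for text in messages:
            scores = score_fn(text)  # expected: dict with neg/neu/pos/compound
            row = {"text": text}
            row.update({key: scores[key] for key in FIELDS[1:]})
            writer.writerow(row)

# With the vaderSentiment package installed, the scorer would be:
#   from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
#   score_fn = SentimentIntensityAnalyzer().polarity_scores
#   export_scores(["I love this!", "ugh"], score_fn, "scores.csv")
```

The `csv` module quotes any message that contains a comma, so commas inside messages don't spill into extra columns.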
Tumblr media Tumblr media
It was much later into this newbie rabbit hole that I found, at the bottom of the GitHub repository, an R port of VADER that I could use to run the code, which was absolutely amazing because I knew R better than Python.
Below is a screenshot of my RStudio:
Tumblr media
This way, I could freely format the dataframe into whatever I wanted and save it into .csv files:
Tumblr media
(^ and separate them into months because 225k rows of data)
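The month split is a group-by on the date column. The actual work was done in R (see the screenshot), so this Python version is only an illustration; the `date` column name, the ISO-style `YYYY-MM-DD` format, and the file naming are all assumptions.

```python
import csv
from collections import defaultdict

def split_by_month(rows, date_field="date"):
    """Group row dicts by the 'YYYY-MM' prefix of an ISO-style date string."""
    months = defaultdict(list)
    for row in rows:
        months[row[date_field][:7]].append(row)  # '2018-04-03' -> '2018-04'
    return dict(months)

def write_month_files(months, prefix="sentiment_"):
    """Write one CSV per month, e.g. sentiment_2018-04.csv (hypothetical naming)."""
    for month, rows in months.items():
        with open(f"{prefix}{month}.csv", "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)

sample = [
    {"date": "2018-04-03", "text": "hi", "compound": 0.0},
    {"date": "2018-05-01", "text": "XD", "compound": 0.5},
    {"date": "2018-05-02", "text": "ok", "compound": 0.3},
]
print({month: len(rows) for month, rows in split_by_month(sample).items()})
```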
I also found out that the R port was made in September 2020. I’m pretty sure this saved me and this project several times over.
Text Data: preprocessing
In 2018, I exported the history of my Kakaotalk chat with my friend. It was... very long. Looking at it, I didn’t want to add two more years to that pile so I decided to start with what I had.
Tumblr media Tumblr media
...though what I had was apparently not good enough.
The above are encoding errors I got when Excel converted UTF-8 to ANSI while importing the text data. The weird black box things were supposed to be apostrophes, Korean letters, and emoticons. While the Korean letters weren’t exactly useful to VADER (and I could discard them), the apostrophes and emoticons were.
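What happens here can be reproduced in a few lines of Python. This is a sketch under the assumption that "ANSI" means the Western Windows code page `cp1252`; on a Korean system it could be a different code page, and the garbage would look different.

```python
def mojibake(text, wrong_encoding="cp1252"):
    """Show what UTF-8 text turns into when its bytes are read as another encoding."""
    # errors="replace" stands in for bytes the wrong encoding cannot map at all.
    return text.encode("utf-8").decode(wrong_encoding, errors="replace")

def read_utf8(path):
    """Read a text file as UTF-8 explicitly instead of trusting the system default."""
    with open(path, encoding="utf-8") as f:
        return f.read()

print(mojibake("don’t"))  # the curly apostrophe becomes three junk characters
```

A curly apostrophe is three bytes in UTF-8, so reading it one byte at a time produces three wrong characters where there should be one. Reading the file with the encoding stated up front avoids the problem.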
Trying to import the text data into Excel so that I could convert it into csv also produced extra, unwanted columns:
Tumblr media
When I looked through the data, I found the culprits:
Tumblr media
Since I imported the text data with a comma delimiter, Excel added a column every time there was a comma... even when the comma was just punctuation inside a message.
To fix this issue, I made up a completely ridiculous delimiter to use:
Tumblr media
I don’t think I’ll ever use “@@@” consecutively in a conversation anywhere, so it was perfect.
Tumblr media
And it worked...!
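The difference between the two delimiters is easy to see in a small Python sketch. The sample lines and the three-field layout (timestamp, sender, message) are invented for illustration, not the real export format.

```python
def split_fields(line, delimiter="@@@", n_fields=3):
    """Split a line into at most n_fields parts on the given delimiter."""
    # Capping the number of splits keeps any later delimiter inside the message.
    return line.split(delimiter, n_fields - 1)

# Hypothetical lines: timestamp, sender, message.
comma_line = "2018-04-01 21:15,Jul,Well, I guess so, maybe"
safe_line = "2018-04-01 21:15@@@Jul@@@Well, I guess so, maybe"

print(comma_line.split(","))    # 5 parts: the message is torn apart
print(split_fields(safe_line))  # 3 parts: the message stays whole
```

Capping the split count would also rescue the comma version here, but only if the first fields themselves never contain a comma, which is why an unlikely delimiter is the safer choice.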
With the encoding problem:
Tumblr media Tumblr media
Those were supposed to be apostrophes, so I just replaced them all in Excel... which was easy. What was not easy were the emoticons, and just as I was about to give up entirely, I discovered something that I had overlooked completely:
Tumblr media
XD
Which solved most of my issues.
The last problem was this:
Tumblr media
There were so many errors.
Tumblr media
Which... obviously should not be a date. After some digging, I found the source of the problem:
Tumblr media
Basically, any Kakaotalk message that was not in a single line was exported as separate lines... and Excel couldn’t read them in the format I was trying to coerce it into. So I went in, ctrl + f searched for all of the above instances and fixed them manually.
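The manual fix amounts to gluing every continuation line back onto the message it belongs to. A sketch of that rule in Python, assuming each real record starts with an ISO-like date (the actual Kakaotalk export format is different, so the pattern would need adapting):

```python
import re

# Assumed shape of a record's first line: it begins with a date like 2018-04-01.
RECORD_START = re.compile(r"^\d{4}-\d{2}-\d{2} ")

def merge_multiline(lines):
    """Join lines that don't start a new record onto the previous record."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if RECORD_START.match(line) or not records:
            records.append(line)        # a new message begins here
        else:
            records[-1] += " " + line   # continuation of the previous message
    return records

raw = [
    "2018-04-01 21:15@@@Jul@@@first line of a message\n",
    "second line of the same message\n",
    "2018-04-01 21:16@@@Friend@@@ok\n",
]
for record in merge_multiline(raw):
    print(record)
```

The line breaks inside a message are replaced with spaces here, which loses the original layout but keeps one message per row.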
Tumblr media
I still haven’t found out what the other errors were indicating, but I figured that 262 against 224k rows of data was negligible enough to discard them.
And this was the end of my confusing and frustrating preprocessing attempts :’D
I also realised much later on that, if I knew enough Python or even SQL, I could have done this much more quickly. Since I am still very much a complete newbie, however, manual labour was the way to go.
Edit:
The tools that I will be using are:
VADER Sentiment Analysis: for sentiment analysis on text data
TouchDesigner: for visualisation
Initially, I wanted to make use of this amazing tutorial on a YouTube channel called Noto The Talking Ball to do my own project; however, I realised that while the end product would look ‘cool’ and would technically be driven by data, it wouldn’t be an entirely ‘true’ interpretation of the data and would also be less intuitive than other forms of visualisation. So I decided to just... dive right in and try to discover the visualisation in the process. In a way, this does feel more faithful to the original data, and it allows for a more personal involvement with the project itself.
2020-2 OMU Independent Project: Proposal
I started a project in 2018 where I wanted to create a visualisation of sentiment data extracted from my text conversations with a friend - but in the form of digital landscapes. Unfortunately, my lack of programming knowledge and a similar lack of familiarity with the necessary software meant that I was forced to abandon it very early on.
MY RATIONALE FOR THE PROJECT HERE
In short, I wanted to revive this project now that I have a little more room to work with in terms of the programming/software prerequisites.