TouchDesigner: Data Visualisation (process)
This was the meat of my project. TouchDesigner is an absolutely massive program that can do practically anything with minimal coding knowledge. However, that doesn't change the fact that it is, again, a massive program, and the 12 weeks of TouchDesigner tutorials and classes probably weren't enough to scratch the surface.
Also, I forgot to document things while I was working on this, but... it was essentially days of YouTube tutorials, wiki help pages, an overheated laptop, and cursing at 3am.
When I first started messing around, I made this:
What I wanted was a linear bar graph of the sentiment data which would then bend into a circle, as shown below:
(^ defining position and shape of line/circle)
(^ feeding the data into the render network)
The above was created with the help of this tutorial.
But I realised that this wouldn't work with datasets larger than the 23 rows I used to test things out: if I inputted, say, 10k rows of data, then the circle of rectangles would become hopelessly squished. I also had no idea how to rotate each of the boxes around so that they would all point towards the centre of the circle, which would have made it somewhat more bearable to look at.
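For illustration, here is a minimal Python sketch of the geometry behind both problems. It is plain Python, not the TouchDesigner network from the screenshots, and the function names `circle_layout` and `arc_per_box` are made up for this example. It assumes the boxes are spaced evenly around a circle.

```python
import math

def circle_layout(n_rows, radius=1.0):
    """Return (x, y, rotation_degrees) for n_rows boxes spaced evenly on a circle."""
    layout = []
    for i in range(n_rows):
        angle = 2 * math.pi * i / n_rows
        x = radius * math.cos(angle)
        y = radius * math.sin(angle)
        # Rotating a box by the same angle as its position lines its local
        # x-axis up with the radius, so it points along the line to the centre.
        layout.append((x, y, math.degrees(angle)))
    return layout

def arc_per_box(n_rows, radius=1.0):
    """Arc length available to each box; it shrinks as rows are added."""
    return 2 * math.pi * radius / n_rows

print(arc_per_box(23))      # room per box with the test data
print(arc_per_box(10_000))  # room per box with a full month
```

The second function shows why the circle gets squished: the space per box is inversely proportional to the number of rows, so going from 23 rows to 10k rows leaves each box a tiny fraction of the room it had.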
So then I explored other methods:
...spheres looked better than boxes, at least.
Hm.
I liked this a lot, and I almost made this into my final... but then I realised that the points in that sphere were selected arbitrarily (relatively, at least) to fill its mesh. Whereas before I was able to select the number of divisions in the circle (and map the spheres to these divisions) based on the number of rows of the input table, now I found that some data were lost or cloned again and again to fill the empty spaces of the sphere's mesh. In short, I couldn't reliably control the number and position of points in the sphere, so I abandoned this method.
...but it still looked the best :(
I went back to the circle method, but I dropped the line-to-circle method completely and tried something else to distribute the points more comfortably:
If one circle isn't enough, then try two - or five.
The first four operators on the top left of that screenshot are the circles that I used to determine the position of each of the data points. Then I divided the circle into certain percentages based on the total number of rows of the input data:
(36%, 24%, 20%, 16%, 4% of the data will be mapped onto each circle respectively)
...the percentages were arbitrarily chosen via trial and error, but y'know, I'm no mathematician :')
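The split above can be sketched in Python as follows. This is only an illustration of the arithmetic, not the TouchDesigner setup itself; `split_rows` is a hypothetical helper, and giving the rounding leftovers to the first circle is an assumption made here so that no rows are dropped.

```python
def split_rows(total_rows, shares=(36, 24, 20, 16, 4)):
    """Split total_rows across circles using whole-number percentage shares."""
    # Integer arithmetic avoids floating-point surprises when rounding down.
    counts = [total_rows * share // 100 for share in shares]
    # Assumption: any rows lost to rounding down go to the first circle.
    counts[0] += total_rows - sum(counts)
    return counts

print(split_rows(10_000))  # [3600, 2400, 2000, 1600, 400]
print(split_rows(23))      # [11, 5, 4, 3, 0]
```

With very small inputs, such as the 23-row test table, the 4% circle ends up empty, which is worth keeping in mind when picking the shares.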
The problem now was that the circle was too large for some of the months. For example, if I input April's data (which only had 23 rows compared to May's... 10k something rows), then there was this big empty space with a few floating balls. Technically, it was effective in conveying just how little we spoke in April, but I figured that if the difference wasn't as obvious as in the April example, then there'd be very little visible difference between, say, May's and June's texting data. Which wasn't very interesting.
So I normalised all four months' row counts to fit between 0.2 and 1.4 (the numbers I determined would look the best).
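One common way to do this kind of normalisation is linear min-max rescaling, sketched below in Python. The text does not say exactly which formula was used, so treat this as an assumption. Only April's 23 rows comes from the text; the other row counts are placeholder values for the example, and `rescale` is a hypothetical helper name.

```python
def rescale(value, lo, hi, new_lo=0.2, new_hi=1.4):
    """Linearly map value from the range [lo, hi] to [new_lo, new_hi]."""
    if hi == lo:
        # All inputs identical: fall back to the middle of the target range.
        return (new_lo + new_hi) / 2
    t = (value - lo) / (hi - lo)
    return new_lo + t * (new_hi - new_lo)

# Placeholder row counts, except April's 23 rows mentioned above.
row_counts = {"April": 23, "May": 10_000, "June": 8_000, "July": 5_000}
lo, hi = min(row_counts.values()), max(row_counts.values())
sizes = {month: rescale(n, lo, hi) for month, n in row_counts.items()}
print(sizes)
```

The smallest month lands on 0.2 and the largest on 1.4, with everything else in between, so even a nearly empty month still gets a visible circle.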
VADER Sentiment Analysis: The Process
Before I could do anything, I needed data. To acquire data, unfortunately, I needed Python knowledge... which I don't have.
It was therefore an interesting mix of Python crash course tutorials and VADER tutorials for a few days. However, despite the fact that I could understand much more than I did two years ago, it was still very slow going... and I still had no clue how to export the results into a .csv file in a format that I wanted.
It was much later into this newbie rabbit hole that I found, at the bottom of the GitHub repository, an R port for VADER that I could use to run the code, which was absolutely amazing because I knew R better than Python.
Below is a screenshot of my RStudio:
This way, I could freely format the dataframe into whatever I wanted and save them into .csv files:
(^ and separate them into months because 225k rows of data)
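For comparison, here is a rough Python sketch of the same workflow: score each message, then split the results by month and save one .csv per month. This is not the R code from the screenshots. The column names, the `YYYY-MM-DD` date format, the file prefix, and all function names are assumptions made for this example, and the scorer is passed in as `score_fn` so the sketch does not depend on any particular package.

```python
import csv
from collections import defaultdict

def score_rows(rows, score_fn):
    """rows: iterable of (date_str, text) pairs, with date_str as 'YYYY-MM-DD'."""
    scored = []
    for date_str, text in rows:
        scores = score_fn(text)  # expected to return a dict with a "compound" key
        scored.append({"date": date_str, "text": text, "compound": scores["compound"]})
    return scored

def group_by_month(scored):
    """Group scored rows by the 'YYYY-MM' prefix of their date."""
    months = defaultdict(list)
    for row in scored:
        months[row["date"][:7]].append(row)
    return dict(months)

def write_month_csvs(months, prefix="sentiment_"):
    """Write one UTF-8 CSV per month and return the file paths."""
    paths = []
    for month, rows in months.items():
        path = f"{prefix}{month}.csv"
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=["date", "text", "compound"])
            writer.writeheader()
            writer.writerows(rows)
        paths.append(path)
    return paths
```

With the `vaderSentiment` Python package installed, `score_fn` could be `SentimentIntensityAnalyzer().polarity_scores`, which returns a dict that includes a `compound` score.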
I also found out that the R port was made in September 2020. I'm pretty sure this saved me and this project several times over.
Text Data: preprocessing
In 2018, I extracted my Kakaotalk history with my friend. It was... very long. Looking at it, I didn't want to add two more years to that pile so I decided to start with what I had.
...though what I had was apparently not good enough.
The above are encoding errors I got when Excel converted UTF-8 to ANSI while importing the text data. The weird black box things were supposed to be apostrophes, Korean letters, and emoticons. While the Korean letters weren't exactly useful to VADER (and I could discard them), the apostrophes and emoticons were.
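The mechanism behind this kind of error can be reproduced in a few lines of Python. The sketch below assumes that "ANSI" means Windows-1252 (`cp1252`), which is common on Western-locale Windows but may not match the machine used here. `garble` and `repair` are hypothetical helper names.

```python
def garble(text, wrong_encoding="cp1252"):
    """Encode as UTF-8, then misread the bytes in a single-byte encoding."""
    return text.encode("utf-8").decode(wrong_encoding)

def repair(garbled, wrong_encoding="cp1252"):
    """Undo the misreading: recover the original bytes, then read them as UTF-8."""
    return garbled.encode(wrong_encoding).decode("utf-8")

broken = garble("don\u2019t")  # "don't" with a curly apostrophe
print(broken)          # donâ€™t
print(repair(broken))  # don’t
```

A curly apostrophe is three bytes in UTF-8, so a program that reads one byte per character shows three junk characters in its place. The round trip in `repair` only works if none of the bytes were dropped or replaced along the way; characters whose bytes have no meaning in the wrong encoding (which can happen with Korean text) may not survive.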
Trying to import the text data into Excel so that I could convert it into csv also produced extra, unwanted columns:
When I looked through the data, I found the culprits:
Since I imported the text data with a comma delimiter, Excel added a column every time there was a comma... even when the comma was just part of a sentence.
To fix this issue, I made up a completely ridiculous delimiter to use:
I don't think I'll ever use "@@@" consecutively in a conversation anywhere, so it was perfect.
And it worked...!
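The same idea can be sketched in Python. The three-field layout (date, sender, message) and the sample line are made up for this example, since the exact export format isn't shown here; `parse_line` is a hypothetical helper.

```python
DELIMITER = "@@@"

def parse_line(line, delimiter=DELIMITER):
    """Split one exported line into date, sender and message."""
    # maxsplit=2 keeps the message in one piece even if it contains the delimiter.
    date, sender, message = line.rstrip("\n").split(delimiter, 2)
    return {"date": date, "sender": sender, "message": message}

comma_line = "2018-04-01,Friend,Well, that was fun"
safe_line = "2018-04-01@@@Friend@@@Well, that was fun"

print(len(comma_line.split(",")))  # 4 pieces: one unwanted extra column
print(parse_line(safe_line))
```

Because the commas inside the message no longer match the delimiter, the message stays in a single column.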
With the encoding problem:
Those were supposed to be apostrophes, so I just replaced them all in Excel... which was easy. What was not easy were the emoticons, and just as I was about to give up entirely, I discovered something that I had overlooked completely:
XD
Which solved most of my issues.
The last problem was this:
There were so many errors.
Which... obviously should not be a date. After some digging, I found the source of the problem:
Basically, any Kakaotalk message that was not in a single line was exported as separate lines... and Excel couldn't read them in the format I was trying to coerce it into. So I went in, searched with Ctrl + F for all of the above instances, and fixed them manually.
I still haven't found out what the other errors were indicating, but I figured that 262 against 224k rows of data was negligible enough to discard them.
And this was the end of my confusing and frustrating preprocessing attempts :'D
I also realised much later on that, if I knew enough Python and even SQL, I could have done this much more quickly. Since I am still very much a complete newbie, however, manual labour was the way to go.
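A scripted version of that manual fix might look like the Python sketch below. It assumes that every real message starts with a date in `YYYY-MM-DD` form and that any other line is the continuation of the previous message. The actual Kakaotalk export format may differ, so the pattern is a placeholder, and `merge_continuations` is a hypothetical helper.

```python
import re

# Placeholder pattern: assumes each new message begins with a YYYY-MM-DD date.
STARTS_WITH_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}")

def merge_continuations(lines):
    """Join lines that don't start with a date onto the previous message."""
    merged = []
    for line in lines:
        line = line.rstrip("\n")
        if STARTS_WITH_DATE.match(line) or not merged:
            merged.append(line)
        else:
            # Continuation of a multi-line message: append it with a space.
            merged[-1] += " " + line
    return merged

sample = [
    "2018-04-01@@@Friend@@@first line",
    "second line of the same message",
    "2018-04-02@@@Me@@@a normal message",
]
print(merge_continuations(sample))
```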
Edit:
The tools that I will be using are:
VADER Sentiment Analysis
For sentiment analysis on text data
TouchDesigner
For visualisation
Initially, I wanted to make use of this amazing tutorial from a YouTube channel called Noto The Talking Ball to do my own project; however, I realised that while the end product would look "cool" and would technically be driven by data, it wouldn't be an entirely "true" interpretation of the data and would also be less intuitive than other forms of visualisation.
So I decided to just... dive right in and try to discover the visualisation in the process. In a way, this does feel more faithful to the original data, and it allows for a more personal involvement with the project itself.
2020-2 OMU Independent Project: Proposal
I started a project in 2018 where I wanted to create a visualisation of sentiment data extracted from my text conversations with a friend - but in the form of digital landscapes. Unfortunately, my lack of programming knowledge and a similar lack of familiarity with the necessary software meant that I was forced to abandon it very early on.
MY RATIONALE FOR THE PROJECT HERE
In short, I wanted to revive this project now that I have a little more room to work with in terms of the programming/software prerequisites.