jul-projects - Tumblr blog

jul-projects · 4 years

TouchDesigner: Data Visualisation (process)

This was the meat of my project. TouchDesigner is an absolutely massive program that can do practically anything with minimal coding knowledge. Even so, it is, again, a massive program, and 12 weeks of TouchDesigner tutorials and classes probably weren’t enough to scratch the surface.

Also, I forgot to document things while I was working on this, but... it was essentially days of YouTube tutorials, wiki help pages, an overheated laptop, and cursing at 3am.

When I first started messing around, I made this:

What I wanted was a linear bar graph of the sentiment data which would then bend into a circle, as shown below:

(^ defining position and shape of line/circle)

(^ feeding the data into the render network)

The above was created with the help of this tutorial.

But I realised that this wouldn’t work with datasets larger than the 23 rows I used to test things out: if I inputted, say, 10k rows of data, then the circle of rectangles would become hopelessly squished. I also had no idea how to rotate each of the boxes around so that they would all point towards the centre of the circle, which would have made it somewhat more bearable to look at.
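
For reference, the geometry behind both problems can be sketched in plain Python. This is not TouchDesigner code, and the function names are made up for illustration. It assumes the boxes are spaced evenly around the circle, in which case rotating each box by its own angle points it along the radius, and the arc available to each box shrinks as the row count grows.

```python
import math

def circle_layout(n, radius=1.0):
    """Return (x, y, rotation_degrees) for n items spaced evenly on a circle."""
    items = []
    for i in range(n):
        angle = 2 * math.pi * i / n  # angular position of item i, in radians
        x = radius * math.cos(angle)
        y = radius * math.sin(angle)
        # Rotating a box by its own angle lines its local x-axis up with the radius,
        # so it points towards (or away from) the centre.
        items.append((x, y, math.degrees(angle)))
    return items

def arc_per_item(n, radius=1.0):
    """Arc length available to each item; this is what gets squished as n grows."""
    return 2 * math.pi * radius / n
```

With a radius of 1, 23 rows get roughly 0.27 units of arc each, while 10k rows get roughly 0.0006, which is why a single circle stops being readable.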

So then I explored other methods:

...spheres looked better than boxes, at least.

Hm.

I liked this a lot, and I almost made this into my final... but then I realised that the points in that sphere were selected arbitrarily (relatively, at least) to fill its mesh. Whereas before I was able to select the number of divisions in the circle (and map the spheres to these divisions) based on the number of rows of the input table, now I found that some data were lost or cloned again and again to fill the empty spaces of the sphere’s mesh. In short, I couldn’t reliably control the number and position of points in the sphere, so I abandoned this method.

...but it still looked the best :(

I went back to the circle method, but I dropped the line-to-circle method completely and tried something else to distribute the points more comfortably:

If one circle isn’t enough, then try two - or five.

The first four operators on the top left of that screenshot are the circles that I used to determine the position of each of the data points. Then I split the rows of the input data across the circles by percentage of the total row count:

(36%, 24%, 20%, 16%, 4% of the data will be mapped onto each circle respectively)

...the percentages were arbitrarily chosen via trial and error, but y’know, I’m no mathematician :’)
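
A minimal sketch of that split in plain Python, assuming the percentages are applied to the total row count and any rows left over from rounding go to the circles with the largest remainders. The rounding rule and the function name are assumptions; the post doesn't say how leftovers were handled in TouchDesigner.

```python
def split_rows(total_rows, percents=(36, 24, 20, 16, 4)):
    """Return how many rows land on each circle, one count per percentage."""
    # Integer arithmetic avoids floating-point surprises
    counts = [total_rows * p // 100 for p in percents]
    remainders = [total_rows * p % 100 for p in percents]
    leftover = total_rows - sum(counts)
    # Hand out the rows lost to rounding, largest remainder first
    order = sorted(range(len(percents)), key=lambda i: remainders[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts

print(split_rows(23))     # [8, 5, 5, 4, 1]
print(split_rows(10000))  # [3600, 2400, 2000, 1600, 400]
```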

The problem now was that the circle was too large for some of the months. For example, if I input April’s data (which only had 23 rows compared to May’s... 10k something rows), then there was this big empty space with a few floating balls. Technically, it was effective in conveying just how little we spoke in April, but I figured that if the difference wasn’t as obvious as in the April example, then there’d be very little visible difference between, say, May and June’s texting data. Which wasn’t very interesting.

So I normalised the row counts of all four months to fit between 0.2 and 1.4 (the range I decided looked best).
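
One common way to do this kind of normalisation is linear min-max scaling, sketched below in plain Python. Whether the original network used exactly this formula is an assumption, and apart from April’s 23 rows the counts in the usage line are placeholders rather than the real data.

```python
def rescale(values, new_min=0.2, new_max=1.4):
    """Linearly map values so the smallest becomes new_min and the largest new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: nothing to spread out, so use the midpoint
        return [(new_min + new_max) / 2 for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]

# Placeholder row counts for four months (only the 23 comes from the post)
sizes = rescale([23, 10000, 6000, 3000])
```

The smallest month always ends up at 0.2 and the largest at 1.4, so even a 23-row month gets a circle small enough that its points don’t float in empty space.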

#omu #omu 2020-2 #touchdesigner #data vis

VADER Sentiment Analysis: The Process

Before I could do anything, I needed data. To acquire data, unfortunately, I needed Python knowledge... which I didn’t have.

It was therefore an interesting mix of Python crash course tutorials and VADER tutorials for a few days. However, even though I could understand much more than I did two years ago, it was still very slow going... and I still had no clue how to export the results into a .csv file in the format I wanted.

It was much later into this newbie rabbit hole that I found, at the bottom of the GitHub repository, an R port of VADER that I could use to run the analysis, which was absolutely amazing because I knew R better than Python.

Below is a screenshot of my RStudio:

This way, I could freely format the dataframe into whatever I wanted and save the results into .csv files:

(^ and separate them into months because 225k rows of data)
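
For anyone stuck at the same export step in Python, here is a rough standard-library sketch of the "split by month, save as CSV" idea. It is not the R code from the screenshot. The column names and the `YYYY-MM-DD` date format are assumptions, and the scores are placeholders; in the Python `vaderSentiment` package they would come from `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]`.

```python
import csv
import io
from collections import defaultdict

def split_by_month(rows):
    """Group (date, text, score) rows by the 'YYYY-MM' prefix of the date."""
    groups = defaultdict(list)
    for date, text, score in rows:
        groups[date[:7]].append((date, text, score))
    return dict(groups)

def to_csv_text(rows):
    """Serialise rows to CSV text; the csv module quotes commas inside messages."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["date", "text", "compound"])  # assumed column names
    writer.writerows(rows)
    return buffer.getvalue()

# Placeholder rows; real scores would come from the sentiment analysis step
rows = [("2018-04-01", "hi, there", 0.0), ("2018-05-02", "great!", 0.6)]
by_month = split_by_month(rows)
april_csv = to_csv_text(by_month["2018-04"])
```

Writing each month out is then a matter of saving each string to its own file, e.g. with `open(path, "w", encoding="utf-8", newline="")`.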

I also found out that the R port was made in September 2020. I’m pretty sure this saved me and this project several times over.

#omu #omu 2020-2 #alksjdkgkdjfnh #sentiment analysis

Text Data: preprocessing

In 2018, I exported my KakaoTalk chat history with my friend. It was... very long. Looking at it, I didn’t want to add two more years to that pile, so I decided to start with what I had.

...though what I had was apparently not good enough.

The above are encoding errors I got when Excel converted UTF-8 to ANSI while importing the text data. The weird black box things were supposed to be apostrophes, Korean letters, and emoticons. While the Korean letters weren’t exactly useful to VADER (and I could discard them), the apostrophes and emoticons were.
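
This kind of garbling is easy to reproduce in plain Python. The sketch below assumes "ANSI" means Windows-1252 (`cp1252`), which is common on Western-locale Windows but not guaranteed; on a Korean-locale machine it would more likely be `cp949`, and the garbled output would look different from what is shown here.

```python
original = "don’t"                   # curly apostrophe, as typed on a phone keyboard
raw_bytes = original.encode("utf-8")  # how the exported text file stores it

# Reading UTF-8 bytes as if they were Windows-1252 garbles the apostrophe
garbled = raw_bytes.decode("cp1252")

# Reading the bytes with the encoding they were written in round-trips cleanly
restored = raw_bytes.decode("utf-8")

print(garbled)   # donâ€™t
print(restored)  # don’t
```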

Trying to import the text data into Excel so that I could convert it into csv also produced extra, unwanted columns:

When I looked through the data, I found the culprits:

Since I imported the text data with a comma delimiter, Excel added a column every time there was a comma... even when the comma was just ordinary punctuation inside a message.

To fix this issue, I made up a completely ridiculous delimiter to use:

I don’t think I’ll ever use “@@@” consecutively in a conversation anywhere, so it was perfect.

And it worked...!
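
The same idea in a few lines of Python, as a sketch only: the three-field layout (date, sender, message) and the sample line are assumptions for illustration, not the actual export format.

```python
DELIMITER = "@@@"

def parse_line(line, expected_fields=3):
    """Split one exported line on the made-up delimiter instead of on commas."""
    fields = line.rstrip("\n").split(DELIMITER)
    if len(fields) != expected_fields:
        # Flag malformed rows instead of silently adding or dropping columns
        raise ValueError(f"expected {expected_fields} fields, got {len(fields)}: {line!r}")
    return fields

# Commas inside the message no longer create extra columns
print(parse_line("2018-04-01 10:00@@@A@@@well, okay, fine"))
# ['2018-04-01 10:00', 'A', 'well, okay, fine']
```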

As for the encoding problem:

Those were supposed to be apostrophes, so I just replaced them all in Excel... which was easy. What was not easy were the emoticons, and just as I was about to give up entirely, I discovered something that I had overlooked completely:

Which solved most of my issues.

The last problem was this:

There were so many errors.

Which... obviously should not be a date. After some digging, I found the source of the problem:

Basically, any KakaoTalk message that spanned more than one line was exported as separate lines... and Excel couldn’t read them in the format I was trying to coerce it into. So I went in, searched with Ctrl+F for all of the above instances, and fixed them manually.
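
A scripted alternative to the manual fix might look like the sketch below. It is not what was actually done here, and it assumes every real message starts with a date such as `2018-04-01`; the pattern would need adjusting to whatever the export really uses.

```python
import re

# Assumption: each genuine message line begins with a YYYY-MM-DD date
DATE_PREFIX = re.compile(r"^\d{4}-\d{2}-\d{2}")

def merge_continuations(lines):
    """Fold lines that don't start with a date back into the previous message."""
    merged = []
    for line in lines:
        line = line.rstrip("\n")
        if DATE_PREFIX.match(line) or not merged:
            merged.append(line)          # start of a new message
        else:
            merged[-1] += " " + line     # continuation of a multi-line message
    return merged

sample = [
    "2018-04-01@@@A@@@first line",
    "second line",
    "2018-04-02@@@B@@@hi",
]
print(merge_continuations(sample))
# ['2018-04-01@@@A@@@first line second line', '2018-04-02@@@B@@@hi']
```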

I still haven’t found out what the other errors were indicating, but I figured that 262 against 224k rows of data was negligible enough to discard them.

And this was the end of my confusing and frustrating preprocessing attempts :’D

I also realised much later on that, if I had known enough Python or even SQL, I could have done this much more quickly. Since I am still very much a complete newbie, however, manual labour was the way to go.

#omu #omu 2020-2 #preprocessing #:'(

Edit:

The tools that I will be using are:

VADER Sentiment Analysis

For sentiment analysis on text data

TouchDesigner

For visualisation

Initially, I wanted to make use of this amazing tutorial on a YouTube channel called Noto The Talking Ball to do my own project; however, I realised that while the end product would look ‘cool’ and would technically be driven by data, it wouldn’t be an entirely ‘true’ interpretation of the data and would also be less intuitive than other forms of visualisation. So I decided to just... dive right in and try to discover the visualisation in the process. In a way, this does feel more faithful to the original data, and it allows for a more personal involvement with the project itself.

#omu #omu 2020-2 #proposal #tools
