syen998-twenty-percent-project - Tumblr blog

syen998-twenty-percent-project · 3 months ago

3/24 Twenty Percent Project

It has now been two weeks since my last blog post on March 10. I am pleased to announce that I have accomplished quite a few things over that time! The most notable of these accomplishments is that I finally solved the problem of how to actually publish the code when it is finished.

Previously, GitHub was suggested as a way to share the code. However, as someone who has actually used GitHub before, I find it far too confusing. GitHub is not only confusing for the developer, but also confusing for any potential users who will actually run my code. If I were to share a GitHub repository, for example, people would be prompted to “Download ZIP.” This would download a ZIP file, which, when extracted, would create a confusing mess of files, in a manner that is not particularly intuitive for any layperson.

My other alternative was to use an online Java compiler, like Programiz. This does indeed work, as Programiz allows developers to easily share code via a share button, and users can then run my code. However, Programiz focuses more on sharing the code itself than on presenting the final project.

Instead, I have chosen to use CodePad, a website designed specifically for publishing finished projects rather than just sharing code. The translator is now available at https://codepad.app/pad/115ad533! However, please note that this code is still in beta. This means that many features may still be buggy, so proceed at your own risk. Still, it gets the job done!

syen998-twenty-percent-project · 4 months ago

3/10 Blog Post

I have come here today to inform you that I have recently been working on a website for my translator. The website is still under construction, but upon completion it will be a simple, basic website. It will comprise four main sections:

Home Page: The homepage will introduce the translator, giving some background information and explaining various use cases in which it provides an advantage.

Blog Posts: Users will soon also be able to view blog posts on the website, rather than being directed to the Tumblr page. In other words, more information will be available, in more places.

History Section: The website will have an extremely short page explaining how this project began and where we are now. It will provide a little context for the translator.

Frequently Asked Questions: I will also answer a few questions that could conceivably be commonly asked. This includes things like my inspiration for making this translator, why I chose to program in Java, and more!

I anticipate that this website could become an extremely helpful resource, giving everyone a simple, easy way to see the information provided in an accessible manner. Anyone with a computer will be able to view this website.

Unfortunately, this website is still under development, as websites can be extremely difficult to create. It is unclear when the website will ultimately become available, but I anticipate sometime within the next couple of months, in March or April. I will keep you posted on its development!

syen998-twenty-percent-project · 4 months ago

3/3 Blog Post

Today is the day that I must implement translations to Spanish. However, Spanish syntax differs greatly from English syntax. The following are a few key points I have gleaned from various sources on Spanish syntax.

One major difference in Spanish syntax is that adjectives usually go after the noun in Spanish, whereas in English, the adjectives go before the noun. For example, “lazy dog” would be translated to “perro perezoso” in Spanish. Note that “perro” means “dog” and “perezoso” means “lazy.”

However, there are some adjectives that appear before the noun. The first kind is numbers, like “one” or “two.” For example, “two dogs” in Spanish is “dos perros,” where “dos” means “two” and “perros” means “dogs.” The second kind is ordinal numbers used to rank things, like “first” or “second.” For example, “first dog” in Spanish is “primer perro,” where “primer” means “first” and “perro” means “dog.” The third kind is possessive adjectives, like “my” or “your.” For example, “my dog” in Spanish is “mi perro,” where “mi” means “my” and “perro” means “dog.” The fourth kind is quantities, like “enough” or “too many.” For example, “too many dogs” in Spanish is “demasiados perros,” where “demasiados” means “too many” and “perros” means “dogs.”

To implement this in the translator, we will first translate the text String word-for-word, with no thought for syntax. Then, we will need a list of all Spanish adjectives. If we see a long string of adjectives followed by nouns, simply reverse it. For example, “rápido marrón zorro” would become “zorro marrón rápido.”
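The reordering step described above could be sketched roughly as follows. This is only a minimal sketch, not the translator's actual code: the class name AdjectiveReorderer, the method reorder, and the tiny hard-coded adjective set are assumptions made for illustration, standing in for the full list of Spanish adjectives. Whenever a run of adjectives is followed by another word (assumed to be the noun), the whole run plus that word is reversed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

class AdjectiveReorderer {
    // Hypothetical stand-in for the full list of Spanish adjectives that follow the noun.
    static final Set<String> ADJECTIVES = Set.of("rápido", "marrón", "perezoso");

    // Takes a word-for-word translation and moves each run of adjectives behind its noun.
    static String reorder(String wordForWord) {
        List<String> words = new ArrayList<>(Arrays.asList(wordForWord.split(" ")));
        int i = 0;
        while (i < words.size()) {
            int start = i;
            // Skip over a run of consecutive adjectives.
            while (i < words.size() && ADJECTIVES.contains(words.get(i))) {
                i++;
            }
            // If the run is non-empty and a word follows it, treat that word as the
            // noun and reverse the run together with it.
            if (i > start && i < words.size()) {
                Collections.reverse(words.subList(start, i + 1));
            }
            i++;
        }
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        System.out.println(reorder("rápido marrón zorro"));
        System.out.println(reorder("el perezoso perro"));
    }
}
```

Words that belong before the noun, like “dos” or “mi,” are simply left out of the adjective set, so they stay where the word-for-word translation put them.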

syen998-twenty-percent-project · 4 months ago

2/24 Blog Post

Now that most of the bugs in the translator have been identified and rooted out, we can focus on considering which languages to add. These additions should align with one central mission:

Any additions must improve communications for all.

Thus, we should choose languages such that as many people as possible will be able to translate any foreign text they may encounter into a language that they can understand. However, some people may immediately jump to the conclusion that this is as simple as implementing the most widely spoken languages. This is when we need to keep in mind that our translator only needs to support one language that people can understand.

To explain what I mean, consider a room with one hundred people. Twenty of these people speak English as their first language. Seventeen speak Chinese. Eight speak Hindi. Six speak Spanish. Now, suppose you could only offer support for three languages. Naturally, you may want to offer support for the top three (English, Chinese, and Hindi) for a grand total of forty-five people. But what if I told you that all seventeen of the Chinese speakers also spoke English? Then English alone would be able to accommodate them, so we could instead support English, Hindi, and Spanish, for a grand total of fifty-one people.

Thus, when deciding which languages to support, it is necessary to consider not just the most popular languages, but also the most isolated languages.
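To check the arithmetic in the room example, here is a small sketch that counts how many people understand at least one supported language. The class LanguageCoverage, its Group helper, and the method covered are hypothetical names, and the sketch assumes that the four groups do not overlap except that every Chinese speaker also speaks English.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;

class LanguageCoverage {
    // A group of people in the room who all understand the same set of languages.
    static class Group {
        final int people;
        final Set<String> languages;

        Group(int people, Set<String> languages) {
            this.people = people;
            this.languages = languages;
        }
    }

    // Counts everyone who understands at least one of the supported languages.
    static int covered(List<Group> room, Set<String> supported) {
        int total = 0;
        for (Group group : room) {
            if (!Collections.disjoint(group.languages, supported)) {
                total += group.people;
            }
        }
        return total;
    }

    // The room from the example; the remaining people speak none of these languages.
    static List<Group> exampleRoom() {
        return List.of(
                new Group(20, Set.of("English")),
                new Group(17, Set.of("Chinese", "English")),
                new Group(8, Set.of("Hindi")),
                new Group(6, Set.of("Spanish")));
    }

    public static void main(String[] args) {
        System.out.println(covered(exampleRoom(), Set.of("English", "Chinese", "Hindi")));
        System.out.println(covered(exampleRoom(), Set.of("English", "Hindi", "Spanish")));
    }
}
```

Under those assumptions, the first choice reaches forty-five people and the second reaches fifty-one.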

syen998-twenty-percent-project · 5 months ago

2/10 Twenty Percent Project

In some languages, like English, there is a very clear delimiter for where one word ends and the next begins. For example, in the sentence “I have a guitar and a violin,” each word is very distinct from the others, because they are clearly separated by spaces. Thus, translating it is as simple as finding each word surrounded on both sides by a space, then translating word by word. But in other languages, this is not quite so simple. Consider the Chinese translation of that sentence, “我有一把吉他和一把小提琴.” Translating everything character by character works fine for the first few characters, producing translations like “I” for “我” and “have” for “有.” But when you get to “吉他” (guitar), translating “吉” and “他” individually produces “lucky” and “he,” which is not exactly the result we were hoping for.

But how can we avoid this? How do we know which characters are actually part of a greater phrase? I have actually developed a quite remarkable method for this. First, we begin with the longest possible phrase starting at the current character: “吉他和一把小提琴.” Then we check it against our dictionary of Chinese phrases. Since no phrase was found, we must next try the second longest: “吉他和一把小提.” Again, no phrase was found. Thus, we continue making the phrase shorter and shorter, through “吉他和一把小,” “吉他和一把,” “吉他和一,” and “吉他和,” until we finally get “吉他.” At this point a phrase would be found, and the correct translation of “guitar” would replace it.

Thus, this method ensures that we translate phrases wholly, not character by character. This ensures a much more accurate translation.
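The shrinking-phrase method above can be sketched as follows. This is only an illustration under stated assumptions: the class LongestMatchTranslator, the method translate, and the eight-entry dictionary are hypothetical, and the real translator would load a far larger phrase list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class LongestMatchTranslator {
    // Hypothetical miniature Chinese-to-English dictionary, just large enough for the example.
    static final Map<String, String> DICTIONARY = Map.of(
            "我", "I",
            "有", "have",
            "一把", "a",
            "吉他", "guitar",
            "和", "and",
            "小提琴", "violin",
            "吉", "lucky",
            "他", "he");

    static List<String> translate(String text) {
        List<String> result = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            // Begin with the longest possible phrase, then shrink it one character
            // at a time until the dictionary contains it or one character remains.
            int end = text.length();
            while (end > start + 1 && !DICTIONARY.containsKey(text.substring(start, end))) {
                end--;
            }
            String phrase = text.substring(start, end);
            // Characters the dictionary does not know are passed through unchanged.
            result.add(DICTIONARY.getOrDefault(phrase, phrase));
            start = end;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(translate("我有一把吉他和一把小提琴"));
    }
}
```

Because the longest candidate is always tried first, “吉他” is matched as a whole before “吉” and “他” are ever considered on their own.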

syen998-twenty-percent-project · 5 months ago

2/3 Twenty Percent Project

As you already know, my Twenty Percent Project is a translator. Some may wonder, what exactly is the point of this? Are there not already several translators and other tools that already exist? This is indeed a very good question, and will be addressed in this week’s blog post.

My translator is not designed to replace existing translation tools like Google Translate or Microsoft Translator. Rather, my translator is designed as a supplement to these tools. It simply offers another option for people. They can use it if they want, but they don’t need to if they don’t want to. No one is forcing them.

Some may wonder why anyone would choose to use my translator, since it is currently much weaker than existing tools like Google Translate. However, my translator does offer several advantages, and may actually be superior in a wide variety of use cases.

My translator, for example, works offline. Translation tools like the aforementioned Google Translate require an internet connection to function. Yes, select languages can be downloaded while the internet is still available, but this is only available on mobile. In contrast, my translator is very versatile and can work for a variety of devices.

Furthermore, there is a tradeoff between speed and translation quality. My translator trades translation quality for speed, meaning that long texts can be quickly translated on any device, although the translations may not always be high-quality. This is useful for those who just want a fast translation, and do not care about the quality.

syen998-twenty-percent-project · 5 months ago

1/27 Blog Post

Today, we will discuss the process of sharing the code I have written for my translator. A perfectly well-written program is of no use to anyone if no one can access it. In this blog post, we discuss different common ways to share code, and their advantages and disadvantages for my use case.

GitHub

GitHub is perhaps the most common way to share code. Sharing code through GitHub would be as simple as sending someone a link, then having them clone the repository and import it into an integrated development environment for Java, like Eclipse.

However, this actually presents a big problem: everyone who wishes to run my code must know how to get it off GitHub and must have an integrated development environment preinstalled. Neither of these is particularly intuitive, which can make this particular code sharing method extremely inaccessible.

Online Java Compiler

Another way is to use an online Java compiler, like the Programiz Online Java Compiler or JDoodle, and ask that people run my code from there.

This is beneficial in the sense that nothing needs to be installed, since the online Java compiler runs directly in the browser. However, these are not designed for sharing code, so users will need to copy and paste the code in from elsewhere, like a Google Doc. This can be quite a pain. Depending on the amount of code that needs to be copied and pasted, it could actually be easier to use the GitHub method.

Of course, different methods work best in different situations. Only time will tell which method is the best.

syen998-twenty-percent-project · 5 months ago

1/13 Twenty Percent Project

Today we will discuss the advantages and disadvantages of two common list storage methods: 𝙰𝚛𝚛𝚊𝚢s and 𝙰𝚛𝚛𝚊𝚢𝙻𝚒𝚜𝚝s. Both are useful, but each suits different situations.

Elements in an 𝙰𝚛𝚛𝚊𝚢 are stored in adjacent memory locations. Here is what a 𝚌𝚑𝚊𝚛[𝟽] might look like:

𝟶𝚡𝟺𝟽𝟿𝙰𝟼𝟿𝙴𝟻: 𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟷𝟶𝟶𝟶𝟶𝟶𝟷

𝟶𝚡𝟺𝟽𝟿𝙰𝟼𝟿𝙵𝟻: 𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟷𝟶𝟷𝟷𝟶𝟶𝟷

𝟶𝚡𝟺𝟽𝟿𝙰𝟼𝙰𝟶𝟻: 𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟷𝟶𝟶𝟷𝟶𝟶𝟶

𝟶𝚡𝟺𝟽𝟿𝙰𝟼𝙰𝟷𝟻: 𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟷𝟶𝟷𝟶𝟷𝟶𝟶

𝟶𝚡𝟺𝟽𝟿𝙰𝟼𝙰𝟸𝟻: 𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟷𝟶𝟶𝟶𝟷𝟶𝟶

𝟶𝚡𝟺𝟽𝟿𝙰𝟼𝙰𝟹𝟻: 𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟷𝟶𝟷𝟶𝟷𝟷𝟷

𝟶𝚡𝟺𝟽𝟿𝙰𝟼𝙰𝟺𝟻: 𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟷𝟶𝟷𝟶𝟶𝟷𝟷

This has one very distinct benefit: accessing list indices is extremely fast. In this simplified picture, where each address refers to a single bit and each 𝚌𝚑𝚊𝚛 occupies sixteen bits, accessing index n is as simple as multiplying n by sixteen and adding that to the address of the first element, 𝟶𝚡𝟺𝟽𝟿𝙰𝟼𝟿𝙴𝟻. (Real hardware addresses bytes rather than bits, so the multiplier would be two.) However, this also poses one distinct disadvantage: the list has a fixed size and cannot be resized. Once the original bits are allocated for this list, the computer is free to fill up the memory locations around it.

This brings us to the second list storage method: 𝙰𝚛𝚛𝚊𝚢𝙻𝚒𝚜𝚝s. 𝙰𝚛𝚛𝚊𝚢𝙻𝚒𝚜𝚝s are actually stored using a partially filled 𝙰𝚛𝚛𝚊𝚢:

𝟶𝚡𝟺𝟽𝟿𝙰𝟼𝟿𝙴𝟻: 𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟷𝟶𝟶𝟶𝟶𝟶𝟷

𝟶𝚡𝟺𝟽𝟿𝙰𝟼𝟿𝙵𝟻: 𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟷𝟶𝟷𝟷𝟶𝟶𝟷

𝟶𝚡𝟺𝟽𝟿𝙰𝟼𝙰𝟶𝟻: 𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟷𝟶𝟶𝟷𝟶𝟶𝟶

𝟶𝚡𝟺𝟽𝟿𝙰𝟼𝙰𝟷𝟻: 𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟷𝟶𝟷𝟶𝟷𝟶𝟶

𝟶𝚡𝟺𝟽𝟿𝙰𝟼𝙰𝟸𝟻: 𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟷𝟶𝟶𝟶𝟷𝟶𝟶

𝟶𝚡𝟺𝟽𝟿𝙰𝟼𝙰𝟹𝟻: 𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟷𝟶𝟷𝟶𝟷𝟷𝟷

𝟶𝚡𝟺𝟽𝟿𝙰𝟼𝙰𝟺𝟻: 𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟷𝟶𝟷𝟶𝟶𝟷𝟷

𝟶𝚡𝟺𝟽𝟿𝙰𝟼𝙰𝟻𝟻: 𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶

𝟶𝚡𝟺𝟽𝟿𝙰𝟼𝙰𝟼𝟻: 𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶

𝟶𝚡𝟺𝟽𝟿𝙰𝟼𝙰𝟽𝟻: 𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶

As you can see, this is an 𝙰𝚛𝚛𝚊𝚢 of length ten (𝚌𝚑𝚊𝚛[𝟷𝟶]). Yet only seven of those elements represent the actual 𝙰𝚛𝚛𝚊𝚢𝙻𝚒𝚜𝚝 (this “true length” is stored elsewhere, like 𝟶𝚡𝙳𝟶𝟿𝙵𝙰𝙲𝙴𝟻 = 𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟶𝟷𝟷𝟷). This means that growing the list is as simple as writing into a slot that has already been preallocated and updating the stored length. When we run out of preallocated slots, we just allocate a longer 𝙰𝚛𝚛𝚊𝚢 and copy the elements over. This has the advantage of being very versatile, but the unused slots waste some memory.
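The partially filled array idea can be sketched in a few lines of Java. This is a simplified illustration rather than the real java.util.ArrayList: the class name CharList and its methods are hypothetical, and doubling the capacity when the slots run out is an assumption chosen for simplicity.

```java
import java.util.Arrays;

class CharList {
    private char[] slots = new char[10]; // preallocated storage, like the char[10] above
    private int size = 0;                // the "true length", stored separately

    void add(char c) {
        // When every preallocated slot is used, switch to a longer array
        // and copy the existing elements over.
        if (size == slots.length) {
            slots = Arrays.copyOf(slots, slots.length * 2);
        }
        slots[size++] = c;
    }

    char get(int index) {
        // Only the first `size` slots hold real elements.
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return slots[index];
    }

    int size() {
        return size;
    }

    int capacity() {
        return slots.length;
    }

    public static void main(String[] args) {
        CharList list = new CharList();
        for (char c : "abcdefg".toCharArray()) {
            list.add(c);
        }
        System.out.println(list.size() + " of " + list.capacity() + " slots used");
    }
}
```

Adding the first ten elements never allocates anything new. Only the eleventh forces a copy into a longer array.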

This all leads to one question:

Which list storage method is better for our purposes?

The answer is an 𝙰𝚛𝚛𝚊𝚢𝙻𝚒𝚜𝚝. We don’t know what the length of the translation would be beforehand (“backpack” in Chinese, for example, is “背包”), so 𝙰𝚛𝚛𝚊𝚢𝙻𝚒𝚜𝚝s give us the freedom to change this later.

syen998-twenty-percent-project · 6 months ago

1/6 Twenty Percent Project Blog

I am pleased to report that I have worked on my Twenty Percent Project quite a bit since my last blog post over a month ago, back on December 2. I have listed some of these updates below:

First, the translator now supports two languages. Text can be translated both from English to Chinese, and from Chinese to English.

Secondly, the translator has actually been optimized quite a bit, both in terms of saving memory and saving time. Previously, the words were parsed from the input 𝚂𝚝𝚛𝚒𝚗𝚐 using a confusing mess of 𝚏𝚘𝚛 loops and conditional 𝚒𝚏 statements. Now the words are instead parsed using a simple, much less tedious, chunk of RegEx. This is superior both in terms of parse speed, and in minimizing the memory usage (in terms of the actual text part of the code).

Furthermore, the translating process has also been changed completely to allow for future improvements and features. The translated words are now stored in an 𝙰𝚛𝚛𝚊𝚢𝙻𝚒𝚜𝚝, rather than a 𝚂𝚝𝚛𝚒𝚗𝚐. This has a variety of advantages. For one, 𝙰𝚛𝚛𝚊𝚢𝙻𝚒𝚜𝚝s are much easier to edit and interpret than a 𝚂𝚝𝚛𝚒𝚗𝚐: getting the 𝚒th word from an 𝙰𝚛𝚛𝚊𝚢𝙻𝚒𝚜𝚝 is as simple as 𝚝𝚎𝚡𝚝.𝚐𝚎𝚝(𝚒). Additionally, 𝙰𝚛𝚛𝚊𝚢𝙻𝚒𝚜𝚝s also pose an advantage in terms of speed and memory. 𝙰𝚛𝚛𝚊𝚢𝙻𝚒𝚜𝚝s are actually partially filled 𝙰𝚛𝚛𝚊𝚢s, so an operation performed on an 𝙰𝚛𝚛𝚊𝚢𝙻𝚒𝚜𝚝 usually reuses the existing 𝙰𝚛𝚛𝚊𝚢, while an operation on a 𝚂𝚝𝚛𝚒𝚗𝚐 must return an entirely new 𝚂𝚝𝚛𝚒𝚗𝚐. In other words, with an 𝙰𝚛𝚛𝚊𝚢𝙻𝚒𝚜𝚝, less memory and time are wasted.
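To make the last two updates concrete, here is a rough sketch of parsing words with a regular expression and storing them in an ArrayList. The class name WordParser and the pattern itself are assumptions for illustration; the expression the translator actually uses is not shown in this post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordParser {
    // Assumed pattern: a word is a run of letters, optionally containing apostrophes.
    private static final Pattern WORD = Pattern.compile("[\\p{L}']+");

    static List<String> parse(String input) {
        List<String> words = new ArrayList<>();
        Matcher matcher = WORD.matcher(input);
        // Each call to find() moves to the next match, skipping spaces and punctuation.
        while (matcher.find()) {
            words.add(matcher.group());
        }
        return words;
    }

    public static void main(String[] args) {
        List<String> text = parse("I have a guitar, and a violin.");
        System.out.println(text.get(3)); // read the word at index 3
        text.set(3, "吉他");              // overwrite it in place, with no new list created
        System.out.println(text);
    }
}
```

Replacing one word with set only touches one slot of the list, whereas doing the same to a String would build a whole new String.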

syen998-twenty-percent-project · 7 months ago

12/2 Twenty Percent Project

Now that the questions about the way the actual translation part will be coded have been sorted out, we must now focus on the user interface. Currently, the user interface is console-based—users must type everything into the Java Integrated Development Environment’s text-only console, producing a rather inelegant and unsatisfying result.

Instead, it is probably best to use a more graphical interface. One way to do this is to use the Processing library, which has a lot of graphical functionality, such as drawing rectangles or circles. For the actual input, on the other hand, we could use the Swing library.

For the layout, we could put the language selection menu on the top left, then put the text field for the input text on the top right. Below it, we could do the same for the output language. At the very bottom, we would put the [Translate] button.* However, note that this is just early brainstorming, so the final product may not match this description exactly, or have a graphical display at all.

*Note that I decided to force users to specifically press a button to translate the text, as the actual translation part can be quite slow. If it were to be translated in real time, this would have the potential to waste a lot of not only time, but also memory and energy, especially since it would end up translating half-finished sentences most of the time. A specific button would instead allow the text to be translated only when the user is ready.

syen998-twenty-percent-project · 7 months ago

11/18 Twenty Percent Project

Now that we have a .𝚝𝚡𝚝 file with a list of English words, we also need a .𝚝𝚡𝚝 file with a list of words from other languages, like Chinese or Spanish, for the translation to be implemented. However, it can’t just be any list of words—the words need to match up. For example, if the word at index 934 of the English .𝚝𝚡𝚝 file is “dog” and the word at index 934 of the Chinese .𝚝𝚡𝚝 file is “書” (book), then the word “dog” will be translated to the Chinese word for book. We thus come to the following realization:

For every single .𝚝𝚡𝚝 file, the word at each index must match up with the word at said index for every other .𝚝𝚡𝚝 file.

Of course, I have already found an English .𝚝𝚡𝚝 file suitable for my purposes, as English is not only a common language, but also my primary language. However, finding a .𝚝𝚡𝚝 file that matches up with the words in the English .𝚝𝚡𝚝 file will be the difficult part. I have ten thousand words in my English .𝚝𝚡𝚝 file, so the chance that another randomly ordered .𝚝𝚡𝚝 file will match up with it is one in 10000 ⋅ 9999 ⋅ 9998 ⋅ ⋯ ⋅ 3 ⋅ 2 ⋅ 1 (that is, one in 10000!), or less than one in 2.84 ⋅ 10³⁵⁶⁵⁹.
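The size of that product can be checked without multiplying ten thousand numbers together, by adding up logarithms instead. The class name PermutationOdds and the method log10Factorial below are hypothetical, and this is only a quick sanity check of the estimate.

```java
class PermutationOdds {
    // log10(n!) equals log10(1) + log10(2) + ... + log10(n).
    static double log10Factorial(int n) {
        double sum = 0.0;
        for (int k = 2; k <= n; k++) {
            sum += Math.log10(k);
        }
        return sum;
    }

    public static void main(String[] args) {
        double log = log10Factorial(10000);
        long exponent = (long) Math.floor(log);           // the power of ten
        double mantissa = Math.pow(10.0, log - exponent); // the leading digits
        System.out.printf("10000! is about %.2f x 10^%d%n", mantissa, exponent);
    }
}
```

The whole part of the sum gives the power of ten, and the fractional part gives the leading digits.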

Instead, it is probably best to take the original English .𝚝𝚡𝚝 file and manually translate that, perhaps with the aid of a machine, to form the .𝚝𝚡𝚝 files for the other languages.

syen998-twenty-percent-project · 8 months ago

11/4 Twenty Percent Project

Now that we have discussed the memory impacts of the storage, we can now discuss the actual code itself. One way is to use Eclipse to store a list of words as a .𝚝𝚡𝚝 file, then convert it into a list of 𝚂𝚝𝚛𝚒𝚗𝚐s using 𝚓𝚊𝚟𝚊.𝚒𝚘.𝚁𝚎𝚊𝚍𝚎𝚛. We can also do this for each language because of the versatility of the 𝚂𝚝𝚛𝚒𝚗𝚐 object. Then, we can simply use a 𝚏𝚘𝚛 loop to run through each word in the input and replace it with the word from the desired language. Here is what the code snippet could look like, to translate an English 𝚒𝚗𝚙𝚞𝚝 into a Chinese 𝚘𝚞𝚝𝚙𝚞𝚝:

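// Start from the input and swap in the Chinese word for each English word.
String output = input;
for (int i = 0; i < englishWords.size(); i++)
{
    // A String cannot be changed in place: replace() returns a new String,
    // so the result must be assigned back to output.
    output = output.replace(englishWords.get(i), chineseWords.get(i));
}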

However, note that words that aren’t part of the words in 𝚎𝚗𝚐𝚕𝚒𝚜𝚑𝚆𝚘𝚛𝚍𝚜 will not be translated. There are two possible cases in which this could occur:

A typo on the user’s part: Other than asking my users to type more carefully, I can also check for typos by replacing certain letters with their often-typoed variants. For “tgis,” for example, I could replace the “g” with an “h” to get the actual, intended word, “this.” Alternatively, I can also slightly rearrange the letter order. For “tihs,” for example, I could swap the positions of the “i” and “h” to get “this.”

Input contains symbols that are not words: If the input were to contain something like “🤣🤣🤣,” its translation would be the same in any language. In this case, not translating the word would be the right move.
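The two typo checks described above could look something like this. Everything here is a sketch under assumptions: the class TypoFixer, the method fix, and the two-entry table of neighboring keys are hypothetical, and a real version would need a full keyboard layout and a faster lookup than a list.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

class TypoFixer {
    // Hypothetical table of keys that sit next to each other on a QWERTY keyboard.
    static final Map<Character, String> NEIGHBORS = Map.of(
            'g', "fhtyvb",
            'h', "gjyunb");

    static Optional<String> fix(String word, List<String> knownWords) {
        if (knownWords.contains(word)) {
            return Optional.of(word);
        }
        char[] letters = word.toCharArray();
        // Check 1: replace one letter with a neighboring key ("tgis" becomes "this").
        for (int i = 0; i < letters.length; i++) {
            char original = letters[i];
            for (char candidate : NEIGHBORS.getOrDefault(original, "").toCharArray()) {
                letters[i] = candidate;
                String attempt = new String(letters);
                if (knownWords.contains(attempt)) {
                    return Optional.of(attempt);
                }
            }
            letters[i] = original; // undo before trying the next position
        }
        // Check 2: swap two adjacent letters ("tihs" becomes "this").
        for (int i = 0; i + 1 < letters.length; i++) {
            swap(letters, i, i + 1);
            String attempt = new String(letters);
            if (knownWords.contains(attempt)) {
                return Optional.of(attempt);
            }
            swap(letters, i, i + 1); // undo the swap
        }
        // Nothing matched, so the caller should leave the word untranslated.
        return Optional.empty();
    }

    private static void swap(char[] letters, int a, int b) {
        char temp = letters[a];
        letters[a] = letters[b];
        letters[b] = temp;
    }

    public static void main(String[] args) {
        List<String> known = List.of("this", "dog");
        System.out.println(fix("tgis", known));
        System.out.println(fix("tihs", known));
        System.out.println(fix("🤣🤣🤣", known));
    }
}
```

An empty result covers the second case: when no known word is close, the input is passed through as it is.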

syen998-twenty-percent-project · 8 months ago

10/28 Twenty Percent Project

In our last blog post, we figured out how to store the translations, but not much else. In this blog post, we will discuss the memory impact of storing such a large quantity of data.

In ASCII, each character is stored using one byte, or eight bits. A byte can take 256 different values (standard ASCII defines only 128 of them), which is plenty for English text, but storing a large word list could still be somewhat memory intensive. Suppose, for example, we wish to store 100 thousand words, with an average length of 4.7 characters. Suppose we also need to reserve an extra character for a delimiter to separate each word, making the average length 5.7 characters. We can multiply 100 thousand and 5.7 together to find that the total will be 570 thousand bytes, or about 557 kilobytes. This is actually not that much! For comparison, this is less than the amount of memory that a medium-sized image takes up.

But what about other languages? Chinese characters cannot be stored as ASCII. In the common UTF-8 encoding of Unicode, most of them take three bytes, or twenty-four bits, each, which is three times as memory intensive as ASCII. Luckily, however, if we assume for now that each Chinese word is only one character (plus a delimiter of the same size), storing, say, 100 thousand Chinese words will only amount to 600 thousand bytes, or about 586 kilobytes. Again, this is still relatively few bytes.

Of course, if we wish to store multiple languages, the memory usage will increase, but is still unlikely to take up more than a few images’ worth of memory.
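These estimates are easy to reproduce in Java. The class StorageEstimate and the method estimateBytes are hypothetical names, and the sketch keeps the simplifying assumptions used above: one delimiter per word, and a fixed number of bytes per character.

```java
import java.nio.charset.StandardCharsets;

class StorageEstimate {
    // Every word is assumed to carry one extra character as a delimiter.
    static long estimateBytes(long words, double averageCharsPerWord, int bytesPerChar) {
        return Math.round(words * (averageCharsPerWord + 1) * bytesPerChar);
    }

    public static void main(String[] args) {
        // Actual encoded sizes of a single English word and a single Chinese word.
        System.out.println("dog".getBytes(StandardCharsets.US_ASCII).length);
        System.out.println("狗".getBytes(StandardCharsets.UTF_8).length);

        // The two estimates from this post, shown in bytes and in kilobytes.
        long english = estimateBytes(100_000, 4.7, 1);
        long chinese = estimateBytes(100_000, 1.0, 3);
        System.out.println(english + " bytes, about " + Math.round(english / 1024.0) + " KB");
        System.out.println(chinese + " bytes, about " + Math.round(chinese / 1024.0) + " KB");
    }
}
```

Both “dog” and “狗” come out to three bytes, which is why a list of short Chinese words ends up close in size to the English one.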

syen998-twenty-percent-project · 8 months ago

10/21

I have finally started my Twenty Percent Project today. To start off, I have already chosen to program my Twenty Percent Project in Java, which is an object-oriented programming language created in 1995. It is also high-level, so it would also be fairly easy for me to program in it, compared to other, lower-level programming languages such as machine language.

Now that we have chosen the programming language, we must consider the exact manner in which the program will run. The first step in translating a large chunk of text is to replace each word or phrase with its respective translation. To do this, we could simply run through each word in the passage using a simple 𝚏𝚘𝚛 loop, identifying the word, and using the 𝚛𝚎𝚙𝚕𝚊𝚌𝚎 command to replace the word or phrase with its translation. Of course, languages often have hundreds of thousands of words, so a 𝚂𝚝𝚛𝚒𝚗𝚐𝙻𝚒𝚜𝚝 (a resizable list of 𝚂𝚝𝚛𝚒𝚗𝚐s from the Processing library) will likely be the best bet for my specific use case. For example, a code snippet may look like this:

// StringList is the Processing library's resizable list of Strings.
StringList englishWords;
englishWords = new StringList();
englishWords.append("human");
englishWords.append("dog");
englishWords.append("cat");

This would create a 𝚂𝚝𝚛𝚒𝚗𝚐𝙻𝚒𝚜𝚝 named “englishWords” with the words “human,” “dog,” and “cat” in it.

For the translation, a code snippet may look like this:

StringList chineseWords;
chineseWords = new StringList();
// Append to chineseWords in the same order as englishWords, so the indices line up.
chineseWords.append("人");
chineseWords.append("狗");
chineseWords.append("貓");

This would create a 𝚂𝚝𝚛𝚒𝚗𝚐𝙻𝚒𝚜𝚝 named “chineseWords” with the words “人,” “狗,” and “貓” (the respective translations of “human,” “dog,” and “cat”) in it.

We will discuss further ramifications of this storage method in our next blog post.

syen998-twenty-percent-project · 9 months ago

10/7 Twenty Percent Project

It is my deepest regret to inform you that I still have not yet started my Twenty Percent Project, as 100% of my allocated time for working on the Twenty Percent Project is spent writing these weekly blog reports. If there is ever a day where writing these weekly blog reports takes up less than 100% of my allocated work time, that will be the day when I finally begin working on my Twenty Percent Project. Of course, I do not dare go over my allocated work time, as that would just be considered academic dishonesty, pure and simple.

Luckily, I did manage to squeeze in some brainstorming here and there. During my brainstorming, I noticed that a “good” translation has never been formally defined. This means that there is often no consistency between translators, as different translators would have different translations for the same passage of text. I guess what I’m really trying to say is that some translators may focus primarily on making the translation simply “sound good,” often cutting out huge swaths of information to make the translation “flow better,” while other translators focus on ensuring that every last chunk of information is preserved in the translation. By convention, my translator will do the latter. Some translators, such as Google Translate, may translate passages in a rather lossless manner, creating translations that are at best humorous and at worst confusing. I don’t want that for my translator, or to make a copy of a translator that already exists.

syen998-twenty-percent-project · 9 months ago

9/30 Twenty Percent Project

As you may know, I will be making a translator for my Twenty Percent Project. This has not yet been started, but here is how it is intended to work:

First, use a language-to-language dictionary to replace each word and phrase with its respective translation.

Next, identify each word type, such as nouns, verbs, adjectives, adverbs, or something else.

Using the above word types, arrange the words and phrases in a manner that is both grammatically and syntactically correct for said language.

Now, let’s see it in action. Let’s consider translating “The quick brown fox jumps over the lazy dog” into Chinese. The first step is to replace each word with its Chinese translation, so “The quick brown fox jumps over the lazy dog” becomes “那,” “快,” “棕色,” “狐狸,” “跳過去,” “這,” “懶惰的,” and “狗.” Now, notice that one might then translate this into “那快棕色狐狸跳過去這懶惰的狗.” However, “狐狸” (“fox”) and “狗” (“dog”) require the measure word “隻” before both the animal and the adjective, so the sentence would actually become “那隻快棕色狐狸跳過去這隻懶惰的狗.”

The same reasoning can also be applied to other languages, such as Spanish. However, we need to remember to also conform to Spanish grammar and syntax. For example, “brown fox” in Spanish is “zorro marrón.” However, “zorro” does not mean “brown” and “marrón” does not mean “fox.” It’s the other way around: “zorro” means “fox” and “marrón” means “brown.” This is because, in Spanish, the adjective is usually placed after the noun it is describing. In English, on the other hand, the adjective goes before the noun.
