#technically these ''excel'' files are csv files so all they need is a text editor that can handle really big files
Text
update: they had unzipped it!
but also some of the files they want to open are like 2.5 GB (aka 10x bigger than Excel can handle) so I can see why they were having trouble
I don't understand why they couldn't have figured out a solution in the 2 years that they've had this data, but...
client: can you pull out all the mutations on one chromosome from our analysis results? the excel files won't open on our computer ;-;
me: I mean, sure... [feels like something you should've already figured out since you've had this data for 2 years but whatever]
client: great! We uploaded all the files from the compressed folder you gave us 2 years ago to our google drive, we really need this by Friday!
me, growing suspicious about whether they've ever correctly unzipped that folder: cool, I'll get started, but out of curiosity, have you actually fully extracted the contents of that folder? With a third-party tool since Windows can't handle it?
#work haps#technically these ''excel'' files are csv files so all they need is a text editor that can handle really big files#which do exist!
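For the request itself, a text editor is not strictly needed either. A minimal sketch of pulling one chromosome out of a CSV that is too big for Excel, streaming it row by row so the whole file never has to fit in memory. The column name `chrom` and the sample rows are assumptions for illustration; the real layout of the client's files isn't shown here.

```python
import csv

def filter_rows(src, dst, column, wanted):
    """Copy the header plus every row whose `column` equals `wanted`.

    `src` and `dst` are open text-mode file objects. Rows are streamed one
    at a time, so memory use stays flat no matter how big the input is.
    Returns the number of data rows written (the header is not counted).
    """
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    kept = 0
    for row in reader:
        if row[column] == wanted:
            writer.writerow(row)
            kept += 1
    return kept
```

With real files, both would be opened with `newline=""` (as the `csv` module documentation recommends) and passed in as `src` and `dst`.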
Note
I'm thinking of trying my hand at some GBA Fire Emblem ROM hacking, and I was wondering, what software/program/whatever was used to make Sacred Echoes?
My workflow for Sacred Echoes ended up being pretty similar to how the original devs built FE8 - meaning I was mostly working with source code and the compiler with various command-line utilities to convert my asset files into a data format the GBA could handle. When I started the project in mid-2018, I already had some formal education and work experience in programming, so I was past the steepest part of the learning curve for these specific tools.
Please note that my methods are NOT the methods I would recommend for a first project unless you're already familiar with the software development process and using command-line tools. I went into Sacred Echoes knowing I would need to write a bunch of custom code to modify the game mechanics beyond what the beginner tools at the time allowed me to do, so I chose the more complex path. If you're looking for an all-in-one graphical editor that's more friendly to beginners, FEBuilder is amazing and constantly updated with new functions. Whichever method you decide on using, the FE Universe forum and discord are full of resources, tutorials, and helpful people, and I wouldn't have been able to succeed without them. Best of luck on your project!
That said, here's all the technical details and links to all the tools I used:
Sacred Echoes was built using a combination of GNU make (a build system used to automatically detect and compile changes to source code in large projects) and Event Assembler, a utility primarily built for editing the GBA Fire Emblem games. Event Assembler is used with a method called the buildfile, which is essentially a fancy text file with instructions for Event Assembler to insert source files into a ROM and link different parts together. This meant I used different tools for creating each type of data. Unlike with a ROM editor (such as FEBuilder), I wasn't constantly saving my changes to the same ROM file, but instead freshly building it each time I made a change and wanted to test. This meant that if I messed up (very common when writing custom code), I could just comment out the relevant code or instructions in the buildfile and rebuild from source, rather than try to pick through the ROM by hand to fix issues.
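The "detect changes" part of a make-style build comes down to comparing file timestamps. A rough Python sketch of that idea only, not of how GNU make or Event Assembler is actually implemented; `needs_rebuild` and any file names passed to it are hypothetical.

```python
import os

def needs_rebuild(target, sources):
    """Return True if `target` is missing or older than any of `sources`."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    # A source modified after the target was built makes the target stale.
    return any(os.path.getmtime(src) > target_mtime for src in sources)
```

A build script would call this once per output file and re-run the matching converter or assembler only when it returns True.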
There were cases where I would need to view and edit raw binary data with a hex editor (usually to find a pointer to compressed graphics or a data table); I prefer HxD for that.
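The same pointer hunt can also be scripted. A minimal sketch, assuming the standard GBA memory map (cartridge ROM starts at address 0x08000000) and pointers stored as 32-bit little-endian values. The function name is made up for illustration, and the word-alignment filter is an assumption that suits data tables but may miss unaligned references.

```python
import struct

ROM_BASE = 0x08000000  # GBA cartridge ROM is mapped starting at this address

def find_pointers(rom, target_offset):
    """Return file offsets of every 4-byte-aligned pointer to `target_offset`.

    `rom` is the ROM contents as bytes; `target_offset` is a file offset,
    e.g. where a block of compressed graphics begins.
    """
    needle = struct.pack("<I", ROM_BASE + target_offset)
    hits = []
    pos = rom.find(needle)
    while pos != -1:
        if pos % 4 == 0:  # assumption: pointers in tables are word-aligned
            hits.append(pos)
        pos = rom.find(needle, pos + 1)
    return hits
```

Each returned offset is a place where the data is referenced, which is what needs repointing if the data is moved.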
For graphics, use any program that can edit and save .PNG files (I used MS Paint and GIMP), and then a tool for game graphics called Usenti to put them into a format the GBA can read. If you need to find and rip graphics from a ROM to edit them, GBAGE is the gold standard (and comes built-in to FEBuilder).
Maps are built from the tileset graphics using a program called Tiled.
For music, the GBA uses MIDI sequences, so any audio program with MIDI support works fine for that. (I used Anvil Studio.) The MIDI file is then converted to a GBA-compatible format with a utility called midi2agb.
For unit data and other large data structures, I used a spreadsheet in CSV format, which can be edited with a program like Excel or LibreOffice Calc.
For map events and loading units, the GBA FE games use a scripting language called Event Assembler Language, which just gets written in a raw .txt file. A good plaintext editor like Notepad++ or SublimeText can help keep track of language syntax and keywords.
Assembly code is also written in a text editor, and then assembled into machine code with devkitARM. Most of it I wrote in raw ARM assembly language (which is specific to the GBA's CPU architecture), but in more complex cases towards the end of the project I wrote the code in the C programming language and compiled it with devkitARM.
To keep track of my source files and changes, and to make backups and version control easier, I just used GitHub because I already had an account, but you could also use GitLab or Bitbucket instead.
Finally, I used some tools made by the FE hacking community specifically for automating some tasks and formatting data - most of these are Python scripts, but some can be downloaded as compiled executables. I used "lyn", "TMX2EA", "C2EA", TextProcess and ParseFile, and AnimationAssembler. Ask on the FEU discord or check the forum's toolbox tag.
Text
A checklist for computer science undergrads
influenced by john regehr's 'basic toolbox' post about this topic, i thought i would throw my hat into the ring given that my experiences have been different than john's and seem to be at odds with what i have observed from working with many competent developers.
As i was leaving grad school, a friend of mine suggested to me that a winning strategy in Industrial Design had been to pick some medium that you worked well in and focus on doing all your work with that. The rationale here was that starting anew each project with a new medium invariably impacted the execution of the final deliverable distracting your prof/critic/peers from the high-level feedback you actually wanted on your work, creative vision, etc.
The advice there is to focus less on the tool and more on using a tool efficiently to communicate your ideas. In most cases it does not matter what the tool is as long as you can deploy it to solve problems in your domain.
Much of the tooling that exists in CS is directed at very specific users: working programmers. using these tools correctly as an undergrad is aspirational, but often their execution is distorted in academic contexts.
Every lab or workplace should expect to bootstrap new hires on internal tooling/workflows and almost none of them should assume prior knowledge. Depending on the aims, the only hard requirement should be ability to program in a language or framework similar to the one being used.
Core skills
A single programming language
You do not need to be ultimately proficient in every language, you just need to be able to sketch out and implement a solution to most problems you encounter in one language you enjoy working in. Which language you pick does not matter. If you are in john's classes, however, you should probably ensure that you know two languages: a compiled/systems-ey one (rust, go, c, java, swift, clojure, etc) and a scripting language (python, ruby, javascript, clojurescript, elm, mathematica, anything goes here as long as it has a repl or runtime that you can use to hammer out solutions to problems).
If you're not one of john's students, typically the scripting language will suffice (although it is generally rare to finish a cs program being exposed to only one language).
s/Text Editor/Touch Typing/i
The advice to be familiar with a text editor is largely a request from others who expect you to competently pair-program with them at their pace. The point of knowing an editor is much the same as knowing at least one language passably: it should not be something that gets in your way.
More essential than being comfortable with a specific editor (it honestly does not matter which one as long as you like using it and you are productive with it) is being comfortable touch typing. In the event that slack or other IM platforms have not made you a better touch typist, it is well worth investing time if only so that the act of writing anything is no longer a major time hindrance.
At some point, you may find yourself bored or in need of procrastination and decide you want to customize your editor: that is a perfect time to try something like sublime or atom or vi or emacs.
rough shell experience
you should be able to navigate around a filesystem, make directories, read directory listings and read the cli help documentation for most commands.
you absolutely do not need to know the details of your shell's preferences around glob expansion or how to write legible shell scripts. you can learn that, but after a certain point, all the obscure functionality ends up being more "dev-ops" style knowledge that rarely pays any dividends except when developing commercial developer-facing internal tooling.
incidentally, getting students past the hurdle of commandline BS is almost certainly a job of an advisor (or postdoc). Ignoring it helps nobody and if a research project's documentation (q.v. below) is poor or nonexistent, the PI only has themselves to blame for this ongoing time commitment.
reading documentation
this is probably the weakest skill i have seen from folks coming out of undergrad. nobody expects you to know all of a language, all of its quirks, etc etc. what you are expected to know is how to find the answer to any reasonable question around your language or toolchain of choice.
A useful skill: you should be able to, given a stylized block of shell commands, paste those into your terminal one-by-one in order to bootstrap some project, e.g. ./configure && make && make test. nobody should expect that you understand autoconf unless your research project is specifically devoted to it in some obscure way (i'm sorry if this is the case).
Specifically, you do not need to know how to parse an excel-formatted csv, but you should know where to look (or be able to find a solution) in order to do that in a reasonable amount of time. You do not need to know what an ideal runtime serialization format is for your language, you only need to call back on the terms you learned in your cs classes: marshalling, serialization, persistence, writing data, etc. although it can be useful at the extremes, be skeptical of the amount and quality of programming language trivia you know offhand.
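as an example of what "knowing where to look" tends to turn up: in python, the standard library's csv module already reads excel-style csv (its default dialect is literally named "excel"), including quoted fields that contain commas or doubled quotes. a small sketch with made-up sample data; the helper name is only for illustration:

```python
import csv
import io

def parse_excel_csv(text):
    """Parse Excel-style CSV text into a list of rows (lists of strings)."""
    # newline="" leaves line endings untouched so the csv module can
    # handle them itself, including newlines inside quoted fields.
    return list(csv.reader(io.StringIO(text, newline=""), dialect="excel"))

# Made-up sample: a comma inside quotes, and a doubled quote as an escape.
sample = 'name,comment\r\n"Smith, J.","said ""hi"""\r\n'
rows = parse_excel_csv(sample)
```

here `rows[1]` comes back as two fields, `Smith, J.` and `said "hi"`, which is exactly the case a naive `split(",")` gets wrong.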
writing documentation
no, this is not technical writing. this simply means you should be able to write a plain text file for each project that outlines
how to build some program
what its implicit dependencies are
what its arguments are
what the exposed/public api is
aside from being useful to others, in roughly six weeks or half a semester, this will invariably be of use to future-you as well.
a good acid test here is pointing a friend to the project and asking if they can build it and understand how they might use it. at some point you will embed this knowledge into a Makefile, shell script, or some other dsl, but until then it is infinitely more useful to write down the steps.
html
unless this is your job (or you intend it to be) you only need to know how to make an academic-level webpage which requires only the most basic knowledge of semantic html: h1, h2, ul/ol li, p, a, img, pre, strong, em (optionally hr, dl dd & dt). avoid css. if anyone gives you shit, you can invoke "Default Systems" giving you a perfectly valid excuse to stop devoting any more attention to design after you have mastered those tags.
reproducing errors
it is unclear when you are an undergrad or novice if you have encountered a truly exceptional case or if you simply have no idea what you're doing. Make a habit of reproducing and then writing down steps to reproduce edge cases you encounter and share them with people you ask for help from.
above and beyond, if you can identify the specific step (or code or whatever) that you invoke that (seemingly) causes the error, you will have an easier time teasing apart the nature of the bug as you are telling someone else about it.
the most basic of data visualization skills
all this means is that nobody is actually good at doing this and everyone thinks that two hours peeking at ggplot2 has made them wizards at communicating the complexity of some dataset or results. it hasn't.
in many cases it suffices to be able to graph something from mathematica, R, d3, matplotlib, or google sheets / excel. again, nobody cares how you do it as long as you do it and it doesn't take you all day. if your lab or workplace has some in-house style for doing this, they will need to train you how to do that anyway.
nonlinear spider-sense
the single reason "big o" notation is taught in school is so that at some point you can look at a performance regression and say "ha, that almost looks like a parab—o.m.g." the ability to recognize code or performance that appears nonlinear (or pathologically exponential) is probably one of the core things that i think undergrads should try to hone because during almost no other time will you be asked repeatedly, and at length, to explain the space/time complexity of arbitrary blocks of code.
computers are fast enough that you can usually be blasé about performance but eventually you'll start looking. being able to recognize something that is accidentally quadratic is often the most practical day-to-day application of cs theory—hone this spider sense.
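a small sketch of what "accidentally quadratic" tends to look like in practice. both functions are made up for illustration and return the same result; the only difference is the container used for the membership check.

```python
def dedupe_quadratic(items):
    """Keep first occurrences. `in` on a list scans it, so this is O(n^2)."""
    seen = []
    for item in items:
        if item not in seen:  # linear scan on every iteration
            seen.append(item)
    return seen

def dedupe_linear(items):
    """Same result, but set lookups are O(1) on average, so O(n) overall."""
    seen = set()
    out = []
    for item in items:
        if item not in seen:  # hash lookup instead of a scan
            seen.add(item)
            out.append(item)
    return out
```

on a few hundred items nobody will notice the difference; on a few hundred thousand, the first version is the one that shows up as a parabola. note the second version assumes the items are hashable.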
Nice to haves
Version Control
there is a large chasm between "git for one" and "using git as a team" and that harsh valley is almost certainly due to the large amount of human communication and coordination required to work on a project as a team. Most people stress learning git, but this is largely useless advice because most of git or hg's corner cases and weirdness only come up when you're trying to integrate your work successfully among your teammates. It is good advice to perhaps become vaguely competent using git or mercurial or rcs, but that experience will almost certainly pale in comparison to the massive flail when you are trying to set up multiple worktrees to create integration branches that contain the contents of multiple prs (each likely with their own rebase/merge/squash quirks).
to that end, you should learn to, say, create a commit and push your work, but everything else beyond that is almost certainly guaranteed to be complicated by whatever your team's workflow is (github prs, phabricator, gerrit, etc). i have rarely met people outside of professional or open source contexts that are capable of producing sensible chained commits or sane pull requests, it is simply not a skill that is required outside of contributing to open source or working on a commercial application. When people ask for git experience they secretly crave this flavor of professionalism that it took months to acquire at each of their prior jobs or internships.
A Presentation Tool
the baseline here is very low, you only need to be able to make a presentation and in all likelihood if you are still an undergrad, you easily have ten-plus years of doing this already. worry about fonts/design/transitions/etc once your content is solid.
most people produce terrible presentations making the needed baseline here quite low—it is more important that you know how to practice giving a presentation than it is to actually create the slides for it.
debugger knowledge
i have met many successful professional working programmers that have little to no idea how their language's debugging tools work. if you are a gdb wizard this sounds shocking on its face but lots of developers make do just fine without them. This is not to say that you should be willfully ignorant of debuggers or eschew them (especially if this is part of your curriculum), but nobody should look down on you if you learn (or are taught this) On The Job.
many of these tools are technically robust but have a ui only moderately less hostile than an opaque box of loose razorblades and chocolates. much like git, most developers internalize some form of stockholm affection for these tools despite their poor design, nonexistent editor integration, and often incomplete terminal support.
you should understand roughly what a debugger is and what it can (and can't) do, but it's almost certain that you won't need to have mastered debugger internals straight out of college.
build systems
this is honestly a "top of maslow" need. This is great knowledge if you are planning to distribute code or need it to build dependably/reliably on others' computers, but it is absolutely inessential for an undergrad to understand how to do this level of orchestration except as documentation for others to evaluate that your project actually builds etc etc. if your advisor or boss asks you to learn something like make or whatever, then by all means.
You should know what a make tool is for and when it is necessary, but you should not expect that to apply to the lion's share of work you do in school.
working for a period of time before asking for help
although this should be a core skill many adults are incapable of doing this effectively. there is a tradeoff between "i'm learning" and "i'm being unproductive." In an academic lab, arguably much of your experience will appear to be some quantum state that simultaneously inhabits both extremes but your goal should be attempting to independently arrive at a solution and after some time cut-off (which you should negotiate with your advisor/postdoc/pi/whatever) you should say "i tried $A, $B, and $C to accomplish $GOAL and was unable to make any progress because $ERR_A, $ERR_B, and $ERR_C."
even the act of noting down "what i am trying to accomplish, how i tried, what went wrong" may in itself lead you to a correct solution, but without having done that due diligence and outlined those aspects, it will be difficult to receive good feedback from somebody that is trying to help you.
unit/integration/etc testing
if you find that something like TDD is useful for you as a productivity or refactoring tool, keep doing that! most working software people cannot even agree on what the point of testing is, so it feels unfair to burden undergrads with this. in a professional context, you will be in a codebase with some established testing norms, you need only mimic those until you have determined what works for you.
there are lots of sane and sensible resources for writing tests or thinking about tests. understand that everyone does testing slightly differently so your best bet will be to figure out how testing plays a part wherever you go. in most cases, that codebase will have a specific incantation to invoke tests, your best bet is to ask how they do things there and just go from there if the setup is not obvious.
understanding scope
most academic projects are poorly managed because they have inconsistent pressure to be profitable beyond whatever funding inspired them. simultaneously, many academic advisors are not trained well to manage or lead a team (remember, most were hired to write grants and produce research papers (or possibly to teach)). management is something an advisor is literally picking up "on the job".
If you are unsure what exactly you are supposed to do, you should clarify as soon as possible what deliverable is expected and when it is due. This seems obvious, but because communication is complicated you may end up assuming you need to, for instance, resolve outstanding cli argument parsing bugs rather than only needing to add support for a new one. Understanding the scope of a project you've been assigned prevents you from doing redundant work or opening prs that will never get merged.
language idioms
If you are cozy with a programming language, the natural evolution here is to begin learning what idiomatic programming is like for it: what are common libraries, do people tend to program it functionally or imperatively, for or map?, what patterns are awkward or hard to read, what are common tools in its toolchain, how do people use it to write web services, how do people use it to avoid shell scripting, what are its performance pathologies, etc. this is the extension to knowing how to read the documentation: it is developing intuition about the language to avoid doing counterproductive work in the future.
Many developers learn one language and become fluent in its quirks then proceed to apply those to every language they see later on. if you encounter this as a novice, it may appear that they are simply Better Programmers and not, instead, people who are speaking a pidgin-python with a heavy haskell accent.
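to make the "accent" idea concrete, here is the same trivial task written three ways in python. all three are valid and return the same list; the function names are made up for illustration, and which one counts as idiomatic is a convention of the language's community rather than a rule.

```python
def squares_with_a_c_accent(values):
    """Index-based loop, the way it might be carried over from C."""
    result = []
    for i in range(len(values)):
        result.append(values[i] * values[i])
    return result

def squares_with_a_haskell_accent(values):
    """map() plus a lambda: fine, but reads like another language."""
    return list(map(lambda v: v * v, values))

def squares_idiomatic(values):
    """List comprehension, the form most Python readers expect."""
    return [v * v for v in values]
```

none of these is wrong, which is the point: idiom is about what the next reader expects to see, not about what runs.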
To recap
It is something of a mistake to hope that a cs student will have the gradually developed and refined skills of a professional tradesperson. Graduating cs students often do not have strong professional software development experience (this is what internships are meant to accomplish) but are good at thinking about design/architecture. if, at the very minimum, as an undergrad you can churn out some ruby and have the runtime execute it, you're usually in great shape.
most cs programs do not train students to develop tightly crafted applications with industry-tested documentation/syntax/structure/workflows etc. bootcamps, however, do stress this sort of thing, which causes a confusing periodic wave of "college is dead, long live bootcamps."
when looking at job descriptions or other checklists, it's useful to try to gaze back at the abyss and ask "why was this listed here?"
John's research is compiler-focused, deals with undefined behavior, and often invokes llvm, c, and other "low level" toolchains. a strong undergrad cs student will be able to intern with john productively because the core of his research focus is mostly general to computer science: correctness, compiler behavior, etc. someone with deep knowledge of C, llvm, compiler design/internals, etc is almost certainly in a position to become one of his graduate students or postdocs. I think john's list is interesting, but i think it emphasizes details that are often foreign to developers at all skill levels.
finally this list is biased itself, so take it with a grain of salt: all my work experience is in design and frontend/backend web development and the skills listed here represent the qualities i've observed from successful interns and developers i have interviewed and worked with in the past ~ eight years. my experience is clearly n=1, but among the things i've noticed is that it's easy to get people to learn git, but it's hard to get somebody to internalize recursion, nonlinear growth, or canonical architecture patterns within the same time period. i'm not saying it's impossible, but if you're a cs student, this is 100% what the point of most cs programs is.