#wouldn't be in a reference-able format for me to find the files I need...
Explore tagged Tumblr posts
Text
this chem software is infuriatingly proprietary
I knew I was going to have to deal with this down the line because it was obvious this data wasn't being managed correctly; I hoped it was. but it wasn't, and now I'm opening each file and writing down metadata.
Pretty sure there's python code to do this metadata collection, and I'm a little annoyed I'm not better at coding to whip something up to circumvent the way this was organized. It's one of those things where it'd take longer to put the script together than just cranking through. Plus learning a new library is a pain. Ideally I won't have to repeat this process either; when I do it myself I'll know how to manage it.
#I think 90% of research is data management. I got lucky that I failed so much at my job as practice tbh >.>#Aside though: any software that doesn't export a raw data file to txt/csv should be taken out back.#I should investigate opening these with different kinds of software tbh. But I'd still need to do this process because the metadata#wouldn't be in a reference-able format for me to find the files I need...#/makes frustrated hand gestures#c'est la vie~#ptxt#alright break over lol
5 notes
·
View notes
Text
While I was able to use the holiday Monday at work to catch up on stuff I haven't been able to do in the last six weeks I've been doing my job alone, I spent way more of that time than I wanted to in trying to find a way around Excel breaking my labor saving spreadsheet by refusing to follow its own rules.
I help out my supervisor by turning the reports our phone system generates into a chart of call times she can analyze for coverage purposes. There are two hurdles the way it generates the reports creates. One of them is that the date, beginning time, and ending time, are all in the same cell of its line, and the other is that if there's no active call time in that span, it will more likely than not skip that span.
The easiest way I've come up with to find those times it skipped is to break up the time stamps into separate date, start, and stop cells, and then use conditional formatting to highlight the start times that are different from the end time of the line above. Originally I was using text to columns to do this, but then I decided I wanted to automate that. I set up a spreadsheet that would take the file name of the report and fetch the date and time cell from it, then an array of cells using the MID function to pull out the individual pieces of it.
It worked great! I could just tell it the file to look at and it got the data I needed. And then I'd go to add a line for a skipped time span and all of the formulas would break, because they were referenced based on the line number, and Excel ever so helpfully updates those references when your data moves. But it's okay because if you don't want the reference to be updated, there's a character for that. To keep the same line number, use B$2 instead of B2.
I worked out a fancy formula with INDIRECT, LEFT, and the new to me FORMULATEXT function to automatically assemble a new version of the formula with the crucial absolute reference for each row, since the absolute meant it wouldn't update by line if I just filled down and I was not going into over 300 cells to add one character by hand.
EXCEPT! Marking the reference as absolute only freezes the reference for pasting and directional filling! It turns out it totally ignores the $ if you're shifting and inserting! Excel broke my plans because it doesn't follow its own rules!
After like two hours of beating my head against it and reading a bunch of forum help threads where the answer was "just use INDIRECT" when I was already using INDIRECT, using COUNTIF to count only the cells above that had data in them seemed promising, but it kept giving reference errors as part of the INDIRECT, probably because the COUNTIF syntax needs you to tell it what to look for, and I think the quotation marks around the asterisk weren't playing nicely with the quotation marks of the INDIRECT even though I was using " for the latter and ' for the former. Finally I started looking into other COUNT_____ functions and it turns out that plan old COUNT does exactly what I was looking for. Where "count cells that have data" with COUNTIF needs you to specify cells containing "*", COUNT just does it. By some miracle, I found the right syntax to have the INDIRECT assemble the COUNT with a range from B1 to (current cell) in only one or two tries, and now I finally have a formula that doesn't care if I add lines which are empty in the column it's looking at.
Now I just need to automate adding the missing rows and filling in the zero values in the column I'm doing this all for, but that seems beyond what I can do with just Excel on its own. Seems like something that would be simple to execute in Python if I export a CSV, and if I could get anything to work in VBasic I could probably do a macro, but I'd prefer not to step it out of Excel and back in, and I don't think our workstations have Python, and if they don't have it, I can't add it...
2 notes
·
View notes