pomodoro technique
Umesh recently discussed the Pomodoro Technique (slides), a time management method that uses a timer to break down tasks into 25-minute intervals called 'pomodori' (PDF book on PT, book on PT). The work intervals are separated by brief breaks, usually five minutes each. The name comes from the tomato-shaped kitchen timer that was used by Francesco Cirillo, the inventor of the method, when he was a university student. Why care? Because you have a lot to get done and using your time effectively is crucial to getting it done.
Related concepts that Umesh mentioned include Stephen Covey with The 7 Habits of Highly Effective People and David Allen with Getting Things Done (GTD). Covey makes four quadrants from the combinations of urgent and not urgent with important and not important. This framework, at the least, seems like a useful thing to keep in the back of your mind: which of the four quadrants is this activity in, and if it's in the wrong quadrant, consider switching to other activities. GTD is based on 'making it easy to store, track and retrieve all information related to the things that need to get done'. It uses things like a weekly review to, among other things, determine the priority of the individual tasks and commitments gathered during the workflow process. Another tool is mind maps, which you can use to 'represent words, ideas, tasks, or other items linked to and arranged around a central key word or idea'. Other tools that Umesh mentioned are the Unschedule and GQueues.
Next point was some reminders of why these tools are important. First, Parkinson's Law: Work expands so as to fill the time available for its completion. Second, mistaking activity for accomplishment. Umesh gave some examples of non-accomplishment activity: Gmail, Rotten Tomatoes, Cat Melon Head, YouTube, Roger Federer, Engadget, Facebook, OMG! Ubuntu!, etc.
You need a timer (and there are many software timers, see below); an Activity Inventory, where you'll make estimates of the number of pomodori needed for each task; and a To Do Today sheet. An example Activity Inventory entry: 'Answer questions on thermodynamics in Ch. 4', estimated at 3 pomodori. For To Do Today, you allocate tasks from the inventory, essentially making blanks for each of the tasks that you'll do; the normal thing is to plan a fixed number of pomodori for the day. Pick a task from the To Do Today. Set the timer to 25 minutes and work on the task until the timer rings, then mark a pomodoro done. Take a five-minute break and continue with the next pomodoro. After four pomodori, take a 30-minute break. Continue until the task is done.
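For flavor, here's a toy MATLAB sketch of that cycle -- purely illustrative, not a real timer app like the ones listed at the end of this post:

% Toy Pomodoro cycle: four 25-minute work intervals, each followed by a break.
for p = 1:4
    fprintf('Pomodoro %d started at %s\n', p, datestr(now, 'HH:MM'));
    pause(25*60);      % work until the timer rings
    beep;              % the ring: mark a pomodoro done
    if p < 4
        pause(5*60);   % short break
    else
        pause(30*60);  % long break after four pomodori
    end
end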
Pomodoro rules include: 1 pomodoro = 25 minutes work + 5 minutes break; a long break (30 minutes) after four pomodori; if you think a task will take more than 7 pomodori, break it down; and once a pomodoro begins, it has to ring. Further, protect the pomodoro -- for example, void it if it gets derailed by interruptions. Consider internal interruptions (yourself) and external interruptions (others).
Mark internal interruptions, such as with an apostrophe in your To Do Today. If the interrupting task is not urgent, add it to your Activity Inventory (perhaps marking it as not urgent). If it's urgent, add it to To Do Today, such as under 'Unplanned & Urgent'. For external interruptions: inform, negotiate and call back. The last thing is to plan, track and record: look at what you wanted to accomplish and compare it to what you did accomplish. Pros of the Pomodoro Technique include that it's easy to concentrate for 25 minutes, you get frequent breaks and it helps you track productivity (and postpone interruptions). Cons include that it focuses on effort and does not tell you how to prioritize.
Pomodoro tools include: Focus Booster; Pomodoro Crate; Pomodoro Pro; Pomodairo; Tomatoi.st; TimeBoxed; Menubar Countdown; Keep Focused; MaToMaTo; and a Pomodoro Windows 7 gadget.
scientific publishing
Niko Kriegeskorte from the MRC Cognition and Brain Sciences Unit at Cambridge recently discussed the future of scientific publishing (his blog on this). He considered these questions: What's good and bad about the current system? What features define the future system and how can we transition to it?
Good things about the current system include that it provides a signal of what to read. It does this through peer review and journal prestige. One issue is whether journal prestige is a good indicator of whether something should be read, an issue that's considered below.
Another good thing with the current system is that it provides an appealing layout for papers. Niko mentioned that this is desirable for, but not critical to, scientific progress. I think of this more broadly than just layout, in that publishers implement frameworks for dissemination of the content. They put it in the right place, provide it in different formats, keep track of downloads, etc. They also enforce a threshold on these qualities. I would agree that this is desirable, but not essential. In some sense, things like bad layout are self-correcting. In the absence of enforced layout, people would likely gravitate towards those papers/labs/people with good layout.
Something bad about the current system is that most journals are not open access. He makes the point that scientific papers benefit society only to the extent that they are accessible, and that if the public pays for scientific research, it should demand that the results are openly accessible. To me, this is a weird straw man and almost certainly not true. It seems to assume that all research that would be published is publicly supported, or at least that publications from all research should be open since that's in the interests of the public, even though they only pay for a subset of the research.
But what are the public paying for and why are they paying for it? Is the answer to that even singular? They may pay for things that benefit themselves or their society in the future, these being the assumptions behind the 'they pay for it so they should have access to it' line. But wanting to benefit doesn't mean they need access to the papers. In fact, most of those in the public who are paying for it may not care at all about accessing the papers. Some may care even less in the face of evidence that giving them access meant a worse system overall. In addition, other forms of communication might be better suited to benefiting them. One could say that the public may benefit by having the scribblings and marker board content of the labs that they are supporting. Should there likewise be a system for providing them with such content?
The technologies or knowledge that the research generates, however remotely connected, may be what actually matters to them. They wouldn't necessarily benefit by having access to all of the literature that contributes to the design of an A5 chip. They benefit from a faster, more useful iPhone.
Continuing with bad things about the current publishing system: the main evaluative signal provided to readers is a journal's prestige, its reputation. The point here is that although journal prestige is correlated with the quality of the papers, the signal it provides is one-dimensional, and therefore greatly impoverished. The content of the reviews and the ratings that reviewers provide to the journal are kept secret.
My view here is that it is unclear that this can only be fixed by doing away with the middleman or by going fully open. What's to stop someone from creating a place for rating and reviewing papers and charging for it? In fact, don't social-networking reference management tools, among other things, already do that? They are already tracking things like which people are reading which papers. They could also collect reviews; at least there is nothing stopping them from doing that right now.
A further point on journal prestige as an evaluative signal: it's compromised by circularity. Prestige is related to impact factors, which in turn depend on citation frequencies. Because something is published in a high-profile journal it will be cited more, which in turn makes the journal more prestigious. This is true regardless of whether the papers are actually any good. Impact factors give us a quality index that's distorted to an unknown degree by this self-fulfilling prophecy of citation frequency. Also, prestige attaches to the journal, not the individual papers, and it is only weakly correlated with paper quality. On a related point, having only two to four reviews provides only a noisy evaluation signal to justify the influence high-impact publications have on the attention of the scientific community, publication policy, science funding and an individual scientist's career.
Another bad aspect of the current system has to do with the review process, which Niko says lacks transparency. It relies on secret reviews that are visible only to editors and authors. As above, the selection is based on two to four peer reviewers. Niko's point is that the quality and originality of a paper cannot be reliably assessed by such a small number of reviewers, even in the ideal case where they are neutral experts. Niko points out that reviewers are rarely objective, considering, for example, that they may be invested in the theory supported in the paper or invested in some other theory that's challenged by the paper. The current publication system also comes with long publication delays. Pre-publication review can carry on for over a year. Niko's point is that this delay slows the progress of science.
Moving towards the future of scientific publishing, Niko points to some things that are in the right direction. These include arXiv.org, an open access paper repository; the PLoS journals, which are open access and invite post-publication commentary; Faculty of 1000, a commercial source for alternative paper evaluations; ResearchBlogging.org, which collects blog responses to peer-reviewed papers; and Frontiers, which combines open access and democratic post-publication selection for greater visibility.
So what about the system of the future? In Niko's view, it includes open access and open post-publication peer review. The idea is to immediately publish the paper and then allow open peer review and reception.
Some details about open post-publication review. Anyone can instantly publish a review and anyone can instantly access it. Every review is permanently linked to the paper. Reviews are digitally authenticated at different levels. There are signed reviews in which the author is authenticated and publicly identified, unsigned reviews by authenticated group members (e.g., a member of a professional group such as the Society for Neuroscience), and unauthenticated reviews.
Further points about this kind of review: in order for reviewing to be open, it has to be post-publication; review and reception are an integrated ongoing process, which takes place after publication; reviews do not decide about or delay publication; peer review is not perfect, but it is the best evaluation mechanism we have; the most serious drawbacks of peer review derive from the fact that it is currently a secret process.
A comparison between current and future systems. In current, a review is a secret communication to authors and editors. In future, it's an open letter to the community. In current, a review decides the fate of the paper. In future, it evaluates published work. In current, a reviewer's motivation includes selfless qualities, such as scientific objectivity, and selfish ones, such as scientific politics. In future, the reviewer's motivations include the same selfless qualities, but the selfish ones are replaced by looking smart and objective in public. In current, a weak argument can kill a paper. In future, an argument is as powerful as it is compelling.
In the future system, authors may ask a senior scientist to edit a paper, at which point the senior scientist would choose three reviewers. The editor asks them to openly review the paper. The editor is named on the paper.
Also, a paper evaluation function quantifies papers based on available meta-information. The simplest metric might be a weighted average of review ratings. These could be weighted by dimension or by reviewer information, such as expertise factor, time investment or independence from the authors. They can also optionally be normalized by error margin, so that papers with more reviews have higher scores. Individuals or groups can define their own paper evaluation functions to prioritize the literature according to their needs. Papers with very high scores could be showcased in Science or Nature.
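To make the simplest version concrete, here's a minimal MATLAB sketch of such a function; the ratings and weights are invented for illustration:

% Hypothetical paper evaluation function: a weighted average of review
% ratings, with per-reviewer weights reflecting, say, expertise and
% time investment. All numbers here are made up.
ratings = [8 6 9 7];          % ratings from four open reviews
weights = [2.0 0.5 1.0 1.0];  % reviewer weights (expertise factor, etc.)
score = sum(weights .* ratings) / sum(weights)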
How do we make this happen? A public website for open posting of digitally authenticated post-publication reviews. PubMed-scale investment to develop collaborative software and install the system. This could involve public funding and involvement from Google. Papers published in the current system can be reviewed using the new system. Original reviewers can publish the reviews they wrote for a traditional journal. This provides a platform for continual online evaluation of the scientific literature. Tipping point reached when the evaluative signal becomes more reliable than journal prestige. At that point, papers can be published instantly without journals, as authenticated digital documents, like the reviews.
What can we do now? Publish the reviews we write and receive online. This is useful activism, not the solution. View the problem as a grand challenge to cognitive and brain science. How to organize the collective cognition of the scientific community? Imagine how we want it to work, then talk and write about it.
version control
Rich recently discussed version control (slides, notes for Git with Ubuntu and NYU). Version control, which is also called revision control or source control, is 'the management of changes to documents, programs and other information stored as computer files'. Rich started with some ideas on what you might use it for and why you should use it. One use is for synchronization. You have your laptop, which you work on from home and while traveling, and you need to sync it with your server, which you use for simulations and backup.
Another use is for collaborating on writing papers with many authors. In this case, you need to share many different documents and version control is the right way to do it. Also good for record taking and reproducibility. In that direction, you'll be in the habit of commenting on the changes you make and you'll be able to return to previous points in the code, such as ones where things seemed to be working. Final motivator is that it's useful for releasing code.
Rich gave his desiderata as: synchronize between multiple machines; share between multiple local and nonlocal coauthors; make parts of the code public (e.g., development versus stable release); and record simulation settings, changes an author has made, etc. His point: version control does this and more. Some more background on version control: it's an analogue to 'track changes' in Microsoft Word, but for all files in a directory; it records who changed what, where and when; it can roll back changes; and it can share/synchronize all the files.
Rich focussed on one distributed version control system, Git. And he provided some explanation for why Git and not something else. First reason is that it's distributed. Some version control systems are centralized, so everything passes through a server. This means it's slow and you cannot work offline. In Git, everyone has a copy of the database, meaning they have the whole history. Everything is local. It's very fast and allows for offline work. It's distributed so there are many backups. Branching, where a set of files are separated so they can be independently worked on in different ways, is easy with Git. It's open source and free. Having said that, one limitation is that it only works well on certain kinds of files (it will store binary files, but it can't merge or meaningfully diff them).
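To make that concrete, a minimal local Git session might look like the following (the file and branch names are hypothetical; see the links below for real tutorials):

git init                       # create a repository in the current directory
git add simulate.m notes.tex   # stage the files you want tracked
git commit -m "first working version"
git checkout -b faster-solver  # branch off to try a risky change
# ...edit and commit freely; if the experiment fails, master is untouched...
git checkout master            # switch back to the main line of work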
Last thing is where to go for more information. American Scientist article which makes the point that 'scientists would do well to pick up some tools widely used in the software industry'. More on the bottleneck idea from HPC Wire. Software Carpentry, an awesome place for learning more about these kinds of things, includes lectures on version control. InfoWorld on why Git is on the up and up. Git project home. Git Reference. Git Wiki. Git tutorial. Git for scientific computing tutorial (part of Advanced Scientific Programming in Python). Version control in MATLAB. Unison (for file synchronization). Some Mac tools: Versions (for SVN); Gitbox; Tower; and SourceTree. Last but not least, GitHub, 'a web-based hosting service for software development projects that use the Git revision control system'.
speaking
Greg Steinbruner of Applied Speaking and Presenting recently discussed public speaking, in his words speaking to a purpose. He started by reminding us of some things. Liking speaking, or not liking speaking, does not correlate with how good you are at speaking. Speaking is a performing art. When you speak, you are simultaneously in a vulnerable and a powerful position.
Greg described a tripartite framework for public speaking, the vertices of which are: you, your point and your audience. The idea is that you should prepare for each of these vertices equally, and pay attention to each segment that connects the vertices.
On your audience, you serve them. Research your audience beforehand. Ensure that your audience is listening during your talk. Ensure that the audience can see you completely (e.g., don't hide hands in pockets, don't hide your body behind a podium, don't speak in the dark). The physical environment can make it hard to engage your audience -- something you should fight.
On you, you should speak regularly. Regular practice will make you better. It's easy to get better -- most people do not practice or even think about it. The difference between a professional and an amateur is that a professional practices out loud. Practicing out loud is the most efficient way to optimize your presentation. You need some anxiety to give a good talk. The fight-or-flight response has a downside (flight, making you run away, making you nervous) and an upside (fight, making you show confidence). Use the upside.
The opening is important - particularly the first seven seconds. People typically make a very quick, instinctive, decision about the speaker and the talk. It's hard to change their perception if this initial read is negative. Smile/provoke, etc. Hook your audience early on. Ask the audience to come to the table. Be kinetic in a controlled way. Keep your arms at your sides (not in your pockets and not folded). This may feel vulnerable, but it looks confident.
On your point, the audience will primarily retain the start and the end of the talk. Say what you're going to say, then say it, then say what you said. Repeat your point over and over. You are evaluated by your audience in a multitude of ways (the logic of your argument, the emotions you show and your appearance). So use lots of different ways to convince them (use the three forms of rhetoric: logos, pathos and ethos). Don't try to learn your talk verbatim. If you do, it's hard to sound engaging and hard to adapt to different circumstances.
Cicero Denounces Catiline, fresco by Cesare Maccari, 1882-1888.
preparing manuscripts
I recently discussed the preparation of manuscripts. My focus was on how you can increase the efficiency and accuracy of the process of collective writing. My thoughts are derived from, and geared towards, the kinds of writing that we do in LCV. I include some thoughts on using LaTeX and BibTeX, for example, which are widely used here, but may be relatively uncommon, or unheard-of, elsewhere. Having said that, most of my ideas, such as to use tools to allow all authors to simultaneously edit whenever possible, should apply to other kinds of writing and other collaborative processes.
I wanted to visualize what the steps in writing a manuscript are, what tools are used for doing each step and how the overall process could be improved. I also wanted to capture who does what. I made a popplet (see below). I include guesses at who normally does different things (left side with magenta popples and second-from-left column with red popples). Consider that a set of tools (blue popples) for these alone would probably be something like Google Docs, Skype, Preview, JabRef, TeXstudio (what used to be TexMakerX), MATLAB, Adobe Illustrator, Apple Numbers (iWork), Gmail, Time Machine. Thoughts about what you should do appear as orange popples, and some other notes to self as green popples. I've also made a table (here is a PDF) that lists different tools that you might find useful, some of which you may not know about.
For authors, I considered two groups, those who focus on adding content and commenting on existing content, and those who in addition to writing, complete smaller tasks, which have less to do with content, such as editing the references, or recompiling the paper from its source. I think of the latter as being one person, although it can be more than one. Sometimes two out of several authors would do those kinds of things, but normally everyone wouldn’t. I call these groups ‘everyone’ and ‘one person’.
So what are all the steps? They include: writing text; discussing everything; commenting on stuff; adding references; compiling the source (assuming there is one); exporting plots; making and editing figures; keeping track of stuff (e.g., different runs of an optimization); backing up everything; and sharing among ourselves and with others.
Use Google Docs (or some other cloud-based tool) for writing text. You could also consider Verbosus (good for TeX), Adobe Buzzword (which looks cool), and Pages (with iWork.com). This is the right idea because it allows everyone to work on it at the same time, and it eliminates all steps involving keeping stuff in sync (emailing back and forth), which in itself eliminates a lot of headache about things like naming conventions. I believe that these cloud-based tools are essential regardless of how many people are working on it. Even with only two, not having to email back and forth saves a lot of time. The point about editing simultaneously is not really refuted by saying you only really need to do that when you’re facing a deadline. Actually having a single document that everyone can edit in real time means not even having to worry about it. You can just add text.
Agree before working on the paper (or whatever) to only leave comments in one place. You could say, for example, that you’ll only comment on the text with ‘Insert>Comment’ in Google Docs. That’d be a cool choice because everyone can see it, it’ll stay there, it can be updated or resolved, you can get email reminders when stuff changes. The point is that if someone comments outside of that, such as by emailing a list of changes or thoughts, you’d remind each other that you want to keep everything on Google. The thing about not having a standard for that is stuff ends up all over, increasing the load on a subset of authors to put stuff in sync. It also means that everyone has to check many different sources to see the comments. Imagine you come back to the text a couple years later and you recall someone saying something about some chunk. But where did they say it? You may have to look in your email, you may have to look on things you’ve scribbled in your office, you  may have to look in previous downloads of the source (where it might be as a /comment). More than likely you’ll never be able to track them down. That might imply a good commenting system would be one where you can resolve stuff, but where the comments are not deleted. Something to consider.
Use a full-blown reference management tool like Mendeley or Zotero with a shared (cloud-based) folder for the paper rather than something like JabRef. The advantages to doing so are similar to those for Google Docs, including that everyone can edit in real time. Other options include colwiz, which should be similar. Another option is EndNote, which seems like it doesn't cover the same functionality (e.g., reading and annotating). It's not clear how to make it all work with the Google Docs to LaTeX thing. Integration with Google Docs might be helpful, but even that alone might not work with TeX documents? You could try exporting to BibTeX and having that export be in a shared (cloud-based) project folder, such as one in Dropbox.
Use TeXworks (or something like it) as a TeX front end, for doing compiles. It's probably more tool than you'll need, but at least it's stable and likely to work. Other options include TeXstudio and TeXShop. Remember to never edit in the .tex itself. Sometimes that's necessary, like when it won't compile and you need to fix something. The normal thing then would be to make the change and copy and paste back into Google. Seems like bad form. Keep the PDF in the shared folder for the paper.
Give everyone access to the MATLAB code, so everyone has the same scripts and outputs.
Consider saving all figures as EPS so people can use any vector graphics editor on them. It’s not clear whether that’s perfect, though, so you’d want to check it. It’s possible that there would be differences with editing in various editors and things that don’t get saved out. If everyone is cool with using Adobe Illustrator I’d go with that. Again agree that you’re not going to edit stuff with different tools or with tools of different versions.
Don’t share things via email. Use Dropbox on the main folder and put everything that’s not on Google Docs (or some other cloud) into the Dropbox. If you want to send something to someone outside use YouSendIt. You could also give them a link to the Google doc or put them on sharing there. It might be necessary for everyone involved to have a Dropbox account (and something like 50 gig), which is unlikely. If that applies, would need to figure out some other way to do this.
Have a plan for backup. One can assume that the stuff on Google Docs is backed up on their side, but you will also have copies of stuff from the downloads. You can probably make a Time Machine backup on the main project folder, although it's not clear how that works with Dropbox folders.
Steps and tools for preparing a manuscript. At left are the two kinds of authors (in magenta): everyone (those focussed on content) and one person (those who do content and other stuff). Then: some of the steps that authors take in writing a manuscript (e.g., editing figures) (in red); tools used for doing the associated steps (in blue); thoughts about how these things could be improved upon or some stuff that you might consider (in orange); and some notes to self (in green).
blogging
Jeanne Garbarino recently discussed her observations about the science blogosphere (her Prezi about it, her post about it).
Her first point was that there is science communication directed to scientists, which takes the form of journal articles (e.g., PLoS ONE), conferences (e.g., Cosyne), grants (e.g., NRSA Individual Postdoctoral Fellowships) or other stuff within a research institution (e.g., HHMI Bulletin). There is also science communication directed to everyone else, which traditionally takes the form of newspaper articles (e.g., The New York Times Science), magazines (e.g., National Geographic), TV/radio programs (e.g., Science Friday) and websites (e.g., Science Daily). Somewhere in the midst of these two clouds are blogs, which serve scientists and non-scientists alike.
One kind is The Niche Blog, which in Jeanne’s words is scientists writing for an interested crowd. They often unpack recent scientific papers. They typically keep to their topic. And their audience is likely to be familiar with relevant scientific terms (i.e., jargon-robust). Examples: Obesity Panacea; Neurophilosophy; and Research Blogging.
Another kind is The Lay Science Blog, which is scientists and/or science writers writing for the general public. They avoid technical jargon (is there non-technical jargon?), try to tackle seemingly complex issues, and help get people interested in science. Examples: The Loom, by Carl Zimmer; Science Sushi, by Christie Wilcox; Cocktail Party Physics, by Jennifer Ouellette; and Not Exactly Rocket Science, by Ed Yong.
Another kind is The Science-Life Blog, which focusses on life in science, either in academia or industry. These are cool for discussing issues relevant to scientists' lives as scientists. Examples: The Mother Geek, by Jeanne Garbarino (yes); The Tightrope, by Dr. O; and Athene Donald's Blog, by Athene Donald.
At another level are Blog Networks. Examples: Scientific American Blog Network; Nature Blog Network; PLoS Blogs Network; Wired Science; Discover Blogs; ScienceBlogs; Scientopia; LabSpaces; Occam’s Typewriter; and Science 3.0.
So why are science blogs important? She answered this by example. In November of 2010, NASA put out this statement, which said they were going to hold a news conference on an astrobiology discovery (e.g., life on Europa). Needless to say, ‘wild speculation ensued’. kottke.org: ‘Has NASA discovered extraterrestrial life?’. Gawker: ‘Did NASA Discover Life on One of Saturn’s Moons?’.
I guess the impression one gets to that point is that blogs are quick to pick things up, meaning a good channel to catch what’s going on. And with the speed comes the ability to amplify and speculate, also useful things to sample from.
Anyway, not to disappoint, what followed was the arsenic life debacle. Sciencexpress: ‘A Bacterium That Can Grow by Using Arsenic Instead of Phosphorus’. Wired Science: ‘NASA Unveils Arsenic Life Form’. The Huffington Post: ‘NASA Discovers New Life: Arsenic Bacteria With DNA Completely Alien To What We Know’ (notice the use of that A word). Guardian.co.uk: ‘Nasa reveals bacteria that can live on arsenic instead of phosphorus’.
The background and claims of the actual study: researchers headed to arsenic-laced Mono Lake in California; they isolated a strain of bacteria (dubbed GFAJ-1); when they cultured it in the presence of arsenic (low phosphorus), it incorporated arsenic into its DNA.
And then the purple-haired hero of this example emerged, Dr. Rosie Redfield of the University of British Columbia. By emerged, I mean started to blog about it: 'Arsenic-associated bacteria (NASA's claims)'. And in her post, the kind of paragraph that every scientist that I know could only dream of getting in response to their work: 'Bottom line: Lots of flim-flam, but very little reliable information. The mass spec measurements may be very well done (I lack expertise here), but their value is severely compromised by the poor quality of the inputs. If this data was presented by a PhD student at their committee meeting, I'd send them back to the bench to do more cleanup and controls'. I <3 that, and yet I do ask myself if she could have essentially summed up her reaction with another word that may be in the same family as flimflam, piffle.
Dr. Redfield wasn’t alone. Nature: ‘Will you take the ‘arsenic-life’ test?’. Science, Editor’s Note: ‘...the subject of extensive discussion and criticism following its online publication. Science received a wide range of correspondence that raised specific concerns about the Research Article’s methods and interpretations.'. The Loom: ‘Of arsenic and aliens: What the critics said’.
My take is that you’d have had to be online to catch most of that stuff. Certainly you’d need to be plugged in to the blogosphere to evaluate it for yourself. And while you can fault them for the early hype, you should also credit them for the rapid, crowdsourced, free, self-correction.
So what about Jeanne's answers to why science blogs are important? They help keep journalists in check (is that to say they aid in self-correction?). They help keep scientists honest (is that to say they have a beneficial chilling effect, partly via the threat of self-correction?). They help to broadly disseminate scientific information.
I could also say they’re cool, fun, interesting, but I’m afraid of someone blogging about this blog post and using words like flimflam and that'll go viral among people who go viral.
If you’re still not convinced, consider the personal development: it improves your writing skills, puts you in the habit of presenting your thoughts, and increases the breadth of your knowledge. It’s also good for professional development: it allows you to present your work to the greater scientific community, gets you to engage with fellow scientists, facilitates networking and collaborations, and can be like presenting at a conference - only it’s free. In terms of community engagement: it’s a good way to discuss science with the outside world, and getting more followers forces you to be creative.
adding publications to your cv
Umesh recently discussed how to automate the 'Publications' section of LaTeX-based CVs (associated files, example CV). His desiderata for Publications: reverse chronological order; organization by type (e.g., for journals, conferences, abstracts and talks); and his name in bold. He doesn't want to enter this stuff manually and he'd like it to match what shows up on his website.
For those who know Umesh, I'm not sure it really needs to be explained, but why LaTeX for this and not a WYSIWYG option like Microsoft Word? Well, for one thing Word is WYSIWYG, which means it can be challenging to keep the formatting consistent. I want to say that using styles in most WYSIWYG editors makes consistent formatting relatively easy, although maybe not as straightforward as in LaTeX. Anyway, separation of content and style is a good idea and it's something LaTeX is good at. And, though he didn't mention it, there are lots of free tools for writing with it, absolving you of a lifetime of Office purchases.
One thing you might consider, though, is that some places require that you submit these kinds of documents as a .doc. Once upon a time, we worked really hard to convert a .tex, either directly or through the .pdf, into a .doc. Our conclusion then: if it can be done, it's highly non-trivial.
So BibTeX (what we use for references in LaTeX). It's good for citing in papers, but getting it to do what we want (see list above) is tricky. Not to worry, it can be done with the datatool package. What follows is the version-by-version. Check the associated files for this post to see how this is done. You should be able to modify those files to make this stuff happen.
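For concreteness, the entries in your .bib look something like the following (all fields here are invented); the year, author and entry type are what the versions below sort, bold and group on:

@article{author2010example,
  author  = {Author, Some and Coauthor, Another},
  title   = {A made-up title, purely for illustration},
  journal = {Journal of Examples},
  year    = {2010}
}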
Version 1 looks OK, meaning it works, but it's not in reverse chronological order (2008, 2009, 2003, ...).
Version 2 has the sorting worked out, starts with 2010, but isn't grouped by type. It's just a big list.
Version 3 gets grouping, and is only lacking (with respect to our list) in that names aren't highlighted.
Version 4 has the name highlighting.
In summary, use BibTeX and datatool. Also, Rob's bib2php (which I wrote about here) can use the same BibTeX to add your references to your webpage. Other uses of datatool include creating a list of papers since a particular year, creating a list of the 10 most recent publications and managing multiple bibliographies. Problems include that output styles are limited and it may be slow to compile for many references.
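As a rough sketch of the core idea, the sorted list (roughly Version 2, before grouping and bolding) comes down to something like this -- the macro names are from the databib module of datatool as I remember its documentation, so check them against the associated files:

\documentclass{article}
\usepackage{databib}
\begin{document}
% load refs.bib (via a BibTeX run) into a datatool database called 'pubs'
\DTLloadbbl{pubs}{refs}
% sort newest first
\DTLsort{Year=descending}{pubs}
\section*{Publications}
\DTLbibliography{pubs}
\end{document}

Compile with latex + bibtex + latex as usual. Grouping by type and bolding your own name take more work (roughly, a \DTLforeachbibentry loop per entry type), which is what the later versions in the associated files do.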
adding publications to your website
Rob recently discussed how to add your publications to your website using his bib2php, which, as the name implies, converts BibTeX to PHP. bib2php was developed at LCV for this purpose, to add all of the lab’s publications to the lab’s website. Here is what the output looks like. I’ll describe some of this more below, but note that the main output shows lists of publications, with the standard stuff like title, authors, etc., and links to the ‘Abstract’ and a ‘PDF’ when it’s available. It gives you different options for sorting publications (e.g., ‘Year’ ‘Type’ or ‘First Author’) and allows you to exclude certain types, such as for ‘Conference Abstracts’.
Why should you care about this? It can save you time and effort. You won’t have to update them separately, meaning you can just update your BibTeX and everything will work with the webpage. It’s ‘feature-rich’ and ‘customizable’.
As I mentioned, the basic features work with just your BibTeX. Sort stuff by BibTeX type, date and author. Exclude BibTeX types. Get an abstract page (which shows the abstract) with author links and publication links, which connect a paper to related documents.
To get the most basic features working, you need to edit bib2php.conf. It’s a configuration file with three things that you need to change: the name of your .bib file, where it lives on your server and a setting to turn off display of the LCV header.
To get the advanced stuff working, you need to edit bibaux. It’s where you would handle supplemental stuff, downloads, related documents, awards, superseded papers, cover pictures and document types not defined by BibTeX (e.g., conference abstract/paper distinction, talks, posters).
Customization is possible through CSS and the printSelf function, which allows you to change the printing style for existing document types. You can also define the style for new document types.
Future versions should include topics, which will be tag-based and will allow you to sort and exclude by topic. You will also be able to create automated topic pages. Also planned: a GUI bibaux editor and a more generic BibTeX parser.
To get a copy, email Rob, and if you don’t have his email email me. And if you don’t have my email use ‘Ask me anything’.
sleeping
Pascal recently discussed sleep, the first part in a series on how to build a successful and sustainable life in science (or maybe even a life in life). He first covered cultural attitudes towards sleep. The attitudes he's observed are: sleep is for losers; at best, it's a waste of time; great people don't sleep very much; no sleep equals success; the real elites get by on little sleep; and people who sleep more are just lazy.
Pascal's evidence for these views included 'The Sleepless Elite' by Melinda Beck in the Health Journal of The Wall Street Journal. She notes that natural short sleepers are 'energetic, outgoing, optimistic and ambitious, according to the few researchers who have studied them'. The pattern (for natural short sleepers) starts in childhood and often runs in families. She also says it's unclear if they are high achievers, despite having more time to do stuff. In the context of finding natural short sleepers, she points out that out of every 100 people who believe they only need five or six hours of sleep a night, only about five really do. The rest end up chronically sleep deprived, part of the one-third of US adults who get less than the recommended seven hours of sleep per night. She alludes to work by Dr. Ying-Hui Fu from UCSF, who discovered a gene variation, hDEC2, in a pair of short sleepers in 2009. Mice with the gene variation need less sleep.
Anyway, to Pascal's point about cultural attitudes, I'd say this one only mildly supports it, despite the title. You might come away wishing you had that gene, or could be one of the naturals, but the focus is more on how short sleeping is done and among whom than on the claim that short sleepers achieve so much more, or form some sleepless elite.
Additional evidence comes from examples of great people who don't, or didn't, sleep very much. Pascal showed a bar plot of self-reported average sleep durations for different athletes, with the incredible Tiger Woods' half-height bar reflecting 4-5 hours per night. I should say this bar plot might be described as a Tuftian nightmare, but nevermind. Well, it is tough not to question this. What about all the other bars, which seem to be quite high? Are those all average people in their sport? And what's the light blue dark blue thing (maybe that's indicating performance)? And are self reports not a deeply flawed measure? N=1? Anyway, Woods doesn't sleep very much and is in the category of greats where other greats run out of superlatives for him.
Other great people that didn't need much sleep include Leonardo Da Vinci, roughly 1.5 hours per day; according to at least one website, legend has it he slept for 15 minutes every four hours. Napoleon Bonaparte, 4 hours per night. Thomas Edison, 5 hours per night. The less-is-more idea extends beyond great individuals to the group level, with Citi adopting the slogan 'Citi Never Sleeps'. Consider that Citi is headquartered in, and has a baseball park named after it in, The City that Never Sleeps.
Reviewing the ideas and some other facts: lack of sleep as a badge of honor; less sleep equals more out of life; people want to be part of this; people respond to incentives; average sleep duration has dropped from over 8 hours to under 6.75 hours in the past 100 years; and technology is meeting this demand for less sleep. There are over 250 Starbucks in Manhattan (people drink lots of stimulating drinks). There are also drugs like Provigil, which is widely used off-label to suppress the need for sleep.
There are also different types of sleepers, from the more conventional monophasic, which sleeps in one big chunk, to the Uberman (Pascal's term?) for sleeping in small increments at six regular intervals.
Pascal bought into this and wasn't sleeping very much. Thankfully this was during a period in which he was heavily engaged in tracking himself. Below is a plot of what I believe is his sleep in blue, his work in black, and his mood in red; the x-axis is time in days. To me, what stands out more than little sleep is lots of variability. It'd be interesting to know if he's computed correlations between these. Almost surely he has, but what are they?
[Figure: Pascal's self-tracking data -- sleep (blue), work (black) and mood (red) over days.]
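For what it's worth, given the three daily series the computation is a one-liner in MATLAB; the data below are random stand-ins, not Pascal's numbers:

% pairwise correlations between daily sleep, work and mood series
days = 365;
sleep = randn(days,1); work = randn(days,1); mood = randn(days,1);
R = corrcoef([sleep, work, mood])  % 3x3 matrix of pairwise correlations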
  So why sleep? Because there is evidence that it's to your benefit; all animals do it, and if they can afford to, they do it in a more phasic way; and chronic and complete sleep deprivation is fatal. Getting it helps with insight, which is supported by this paper (Wagner et al., 2004), which I believe shows that subjects who've slept well are better able to do cognitive tasks, such as connecting nine dots in a three by three grid with four continuous straight lines without lifting your pen from the page. Getting it helps with self-control, which is supported by this paper (Barnes et al. 2011), which I believe shows that well rested participants are much 'more persistent in assigned tasks and much less likely to cut corners. Sleep deprived subjects were much more likely to outright cheat on the assigned tasks'. And lastly, neurons will 'sleep' if you don't (Vyazovskiy et al. 2011).
If your goal is long term sustainability, respect sleep; it's important. Don't mess with it (e.g., Uberman, etc.). There is no substitute for it and there won't be one for a while (although people are working on it). Some things to consider with regard to making it happen: we're often overstimulated with things like coffee, and caffeine has a long half-life (roughly five hours, so half of a 4 p.m. cup is still in your system at 9 p.m.). We're exposed to bright light at night (e.g., LCD screens). Many of us are awoken by loud and unpleasant sounds, such as sirens and partiers. One approach that Pascal uses is light management. His setup involves an array of Philips goLITE BLU Light Therapy Devices and, last but not least, the Zeo Personal Sleep Coach. Quite cool.
making figures
Umesh recently discussed making figures with Inkscape (video tutorials), whose motto is 'Draw Freely', which we'll get to (associated files). Inkscape is another vector graphics editor. It's free and open source (information about open source alternatives). You can use it to edit figures generated using MATLAB and can also use it to make posters for conferences.
Umesh started his presentation by reviewing the differences between vector graphics (VGs) and raster graphics (RGs), which include that VGs are composed of fonts, line art and illustrations, whereas RGs are images. VG formats include EPS, PDF, AI and SVG, whereas those for RG are JPG, BMP, PNG and TIFF.
He also reviewed what we'd covered before: MATLAB, use code to generate your figures, try to do as much as possible in it (so before export), and what you see on the screen does not map to what prints in a straightforward way (meaning you need to do work to get it right); Adobe Illustrator, powerful vector graphics editor, but expensive (in Umesh's view); Intaglio, simple interface, but Mac only. This gets us to Inkscape. Well, how did he get to Inkscape?
He had been using Microsoft PowerPoint, which he thought of as convenient but constrained in having a maximum page size (bad for posters). That size constraint was hard-coded, meaning there was nothing you could do for conferences where you needed to go bigger. From there, to Adobe Illustrator, which he used and liked, but found too expensive to install on his personal machines. So he started to use Inkscape. Everything looks pretty simple, with a single bar of tools, similar to those in Adobe Illustrator and Intaglio.
Here is his first real Inkscape project, which was to make this greeting card for Prof. Al Bovik. The background is text from his work (papers, etc.).
Here is Umesh's first poster with Inkscape. He notes that it was very easy to add images; it takes EPS and PDF files. You can also use LaTeX equations (no problems with that). He probably made this poster on a Mac. As for running on a Mac, we believe it is possible to run Inkscape natively, and we found a video that seemed to support that, but we're not really sure.
Stepping back and reviewing, things Umesh likes about it are: that it's free, so you can use it anywhere (on any machine and for any purpose); it provides the features that he used in Adobe Illustrator; it has useful extensions, such as for equations (built-in LaTeX engine with editing), function plotters and replace font. Adobe Illustrator is $200 and works on Microsoft Windows and Mac OS X. Inkscape is free and works on the three major operating systems. More feature comparison is available from Inkscape. In some sense, these application wars are a variant of 'Get a Mac'.
Here are links to a couple things made in Inkscape, in case you're of the belief that it's impossible to use it to make professional looking things: a yellow Ferrari by Gilles Pinard and a lady named Megan by Luciano Lourenço.
Final thoughts from Umesh were to give Inkscape a shot. It's free! You might find it 'professional enough'. Current users in LCV are Umesh, Yan, Rich and Rob. Also, remember that there is proprietary software and there are proprietary formats. So don't send things as DOC or AI, but as something that everyone can open and edit. Umesh also wanted me to mention plot2svg.m, which converts 3D and 2D MATLAB plots to SVG.
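If memory serves, basic plot2svg usage is a one-liner on the current figure, though check the function's help for the exact signature:

% assumed usage of plot2svg.m (MATLAB File Exchange) -- see 'help plot2svg'
plot(1:10, (1:10).^2);     % any ordinary MATLAB plot
plot2svg('myfigure.svg');  % export the current figure as SVG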
making figures
Deep recently discussed making figures with Intaglio. Here is a summary of his thoughts on it in terms of pros and cons. It is fast and lightweight (i.e., not a resource hog). It's relatively cheap. We looked into pricing and believe that it costs $89 compared to $199 for the student version of Adobe Illustrator or $599 for the commercial version. It could be said that your lab or institution may have a license for Illustrator, which would mean that, at least while you're there, price is a nonissue. Having said that, the point is often raised, 'well what about when I leave?'. But even there, if one assumes that the next institution will not have Illustrator, why not switch to something cheaper at that point? In any case, an additional pro (in Deep's view) is that it's very easy to learn and has a simple and intuitive interface.
In cons, Deep and two other LCVers have experienced a strange error with MATLAB line plots. MATLAB exports lines as a set of anchor points (connected by a path) between the two endpoints. For some reason, in Intaglio, some of the extraneous anchor points will move out of the space spanned by the line, which creates jagged-looking lines. You can fix this by projecting the stray anchor points back onto the line, but these changes often revert without explanation. An additional con is that there is no 'transform each' feature, allowing you to modify individual elements. If you resize dots in a scatter, the x-y position of the dots will change as you scale their size. Deep noted that he has not found a work-around but believes there may be one. In Deep's experience, Intaglio does not deal with LaTeXiT equations better than Illustrator.
Other cons have less to do with Intaglio, but with using it as part of our lab. There is a limited knowledge base with it in our lab. Only Eero and Josh are using it. If you run into problems before a deadline, they are your only help. In our department, there are many more people using Illustrator, so the surrounding knowledge base for it is likewise much bigger. In a similar vein, many of Deep's collaborators use Illustrator. He's encountered serious issues opening/editing a PDF made in Illustrator with Intaglio and vice versa. It should be noted that similar issues exist when collaborating with others who have newer or older versions of Illustrator. It's best if everyone uses the same vector graphics editor.
Considering these factors, Deep concluded the cons of Intaglio far outweighed the pros. After making one poster in Intaglio, he switched to Illustrator for all subsequent data figures. For schematic figures, though, he sometimes still uses Intaglio for its ease of use. He then copies the finished schematic back into Illustrator if that's possible.
making figures
Jeremy recently discussed making figures with Adobe Illustrator (associated files). I'm going to summarize the points he made starting with some of his thoughts on writing, design and choice of software. Writing can be thought of in terms of tone, organization, conceptual clarity, concision and precision, all objective qualities. While style can be debated, consistent style gives you a voice. All of this applies to figure design. Design is not just subjective (you might want to reread that) and is arguably more important than writing since it shows up in talks and posters as well. Illustrator is a powerful and stable tool for making figures. It makes sense to do as much as possible in MATLAB to save time, an approach that was suggested in Rich's earlier discussion. But there are two kinds of figures: explanatory and data. Explanatory should be done almost entirely in Illustrator since there is usually no benefit to doing it in MATLAB. Illustrator brings you closer to the design. When you're good with Illustrator, it's much faster to try out different stuff (move things around, change colors, change sizes, etc.) and you often don't know what'll look good until you try it.
He demoed two things. The first was to take the first version of the scatter shown below (which was an EPS) and make it into the second version (which was then a PDF). The second was to create the second activation and task figure from components of it. Highlights of the steps for both follow with his more thoughtish stuff in brackets.
[Figures: the scatter plot, first (EPS from MATLAB) and second (finished PDF) versions.]
select everything; see all the junk?; highlight with direct select tool and delete
select everything; ungroup
select bottom axis; remove compound path [Work on the axis. I like to give them a little 'room to breathe'. I like to make the ticks a little longer]
direct select them; drag down 2 steps
group the dots by selecting one; select all with same color
make three circles; select all three; pick a triad; pick the color tool and edit the triad; create a folder in the swatch box
color the dots and lines; thicken strokes; add white outline around all dots; select lines and send to back
grab each group of dots; ungroup them, transform each by 150%; group back
make text 12 pt instead of 10; align labels using guides
drag something from LaTeXiT; align with text
[Figure: the finished activation and task figure.]
open all of the components; grab the color bar; grab stuff from map files; delete extra stuff
use direct select tool to just get map and boundaries; get both, then scale together; make small and put into position
lock the boundaries; make new ones; unlock all; deselect the maps; delete; select the maps and lines, group, then align
add hemisphere labels
create box for stimulus; copy grating in and put on top of box; align to center; group; create a bunch of copies, line them up and distribute; move and then redistribute
grab color bar, shift and rotate; create black line beneath and make label
make another, bigger, stimulus; make an arrow connecting them; add a magnifying glass logo
exporting figures
Rich recently discussed creating and exporting figures from MATLAB (associated files). He started by listing the things he wants: to produce a figure for a paper (e.g., for LaTeX); for fonts, line widths, etc. within the figure to be a specified size (e.g., 12 pts); for fonts and symbols in the figure to be identical in the text; to automate the process as much as possible since regenerating the figure is common {collect additional data; tweak simulations; reuse the figure in a new publication, talk or grant; make minor cosmetic alterations}; and to reuse the visualization code.
His thoughts on how to make these happen include: not to export figures using the 'export' menu function; not to modify figure properties using the mouse; to avoid using third party graphics manipulation programs where possible; to use functions and scripts to generate plots (allows for reusability); to specify fonts, line styles, axis positions and figure sizes as variables (allows for modifiability); and to export using the print command (allows for controllability).
He also listed the differences between vector graphics and raster graphics. In vector graphics (line art), the file stores object properties, whereas in raster graphics (bitmaps) it stores pixel values. The file size for line art is small in vector graphics, whereas it's large in raster graphics. Conversely, the file size for images is massive in vector graphics, whereas it's merely large in raster graphics. Tools for editing vector graphics include Adobe Illustrator, Inkscape and Intaglio. Those for raster graphics include Adobe Photoshop and GIMP. In most cases, you should export your MATLAB figures as EPS or PDF (vector graphics).
Here is some other useful stuff that Rich mentioned:
patch.m - 2D polygon plotting, useful for plotting error bars
plotyy.m - different y axes on each side of the plot
annotation.m - add arrows, textboxes etc. to your figure
legend.m, nudgeLegend.m - add legend to a plot, and tweak its appearance (my code)
set(gca,'box','off') - turns the figure bounding box off
set(gca,'layer','top') - bring the axis to the top to stop stuff being plotted over the black edges and tick marks
set(gca,'TickDir','out') - set the direction of the tick marks to point outwards
pu = get(gcf,'PaperUnits'); pp = get(gcf,'PaperPosition'); set(gcf,'Units',pu,'Position',pp) To set the screen size to be the same as the papersize - more WYSIWYG
for mixtures of vector and bitmaped graphics - export each part separately and overlay e.g. within latex itself
transparent background in eps file - comment out lines in the eps file that read: "X X X PR" or "X X X X MP", where X is some number.
LaTeX trick: use the layout package to find out how wide the page is for setting figure dimensions.
To get a better sense for how to actually do this stuff, here are some snippets of his demoFigs.m as well as the associated images from demoFig.pdf. The idea of demoFigs is to progressively improve a figure. I'm including only the good parts. I suggest you mess around with the script yourself.
Version 1--nothing special, just exported with MATLAB's Figure menu.
Version 2--automated export, using print.
filename = 'demo2.eps';
print(gcf,'-depsc',filename);
Note his use of the bang command ('!'), which steps out of MATLAB and calls some OS commands.
! epstopdf demo2.eps 
! acroread demo2.pdf&
Version 3--set size of EPS/PDF, by specifying the paper size.
PS = [14,10]; % paper size (in centimeters)
PP = [0,0,PS]; % paper position on the printed page (in centimeters)
set(gcf,'paperpositionmode','manual','paperposition', ...
        PP,'papersize',PS, 'paperunits','centimeters');
filename = 'demo3.eps';
print(gcf,'-depsc',filename,'-painters','-loose'); % '-loose' stops MATLAB cropping white space automatically; '-painters' forces the vector renderer
[figure: Version 3 output]
Version 4--make better axes with less white space, by setting their position and using axes to create them.
left = 0.12;  % space on LHS of figure, normalised units
right = 0.02; % space on RHS of figure
top = 0.05;   % space above figure
bottom = 0.1; % space below figure
hspace = 0.07;% horizontal space between axes
height = (1-top-bottom); % height of axis
width = (1-left-right-hspace)/2; % width of axis
across = [hspace+width,0,0,0]; % offset to shift the second axis across
pos1 = [left,1-top-height,width,height]; % position of axis
pos2 = pos1+across;
figure % t, s1 and s2 are the data vectors defined earlier in demoFigs.m
ax1 = axes('position',pos1); % create the first axis by hand; replaces subplot(1,2,1)
plot(t,s1,'-')
ylabel('y')
xlabel('\omega = 2 \pi c/\lambda')
ax2 = axes('position',pos2); % create the second axis; replaces subplot(1,2,2)
plot(t,s2,'-')
xlabel('\omega = 2 \pi c/\lambda')
[figure: Version 4 output]
Version 5--make better font sizes and linewidths, by setting them.
% Fonts
FontName = 'Times';
FSsm = 7; % small font size
FSmed = 10; % medium font size
% Line widths
LWthin = 1; % thin lines
% Colors
col1 = [0,0,1];
---->
ax1 = axes('position',pos1); % create the first axis by hand; replaces subplot(1,2,1)
plot(t,s1,'-','linewidth',LWthin)
ylabel('y','fontname',FontName,'FontSize',FSmed)
xlabel('\omega = 2 \pi c/\lambda','fontname',FontName,'FontSize',FSmed)
set(ax1,'FontName',FontName,'FontSize',FSsm)
ax2 = axes('position',pos2); % create the second axis; replaces subplot(1,2,2)
plot(t,s2,'-','linewidth',LWthin)
xlabel('\omega = 2 \pi c/\lambda','fontname',FontName,'FontSize',FSmed)
set(ax2,'FontName',FontName,'FontSize',FSsm)
[figure: Version 5 output]
Version 6--make better xlabel on second plot using PSfrag.
There's not much to say about the MATLAB code here, since the substitution happens on the LaTeX side. The one tip is to keep the placeholder text in the figure simple, so that you can locate and replace it from the LaTeX.
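For reference, here is a minimal sketch of the LaTeX side (the filename demo6.eps and the placeholder string 'w' are my assumptions, not from Rich's demo; note that PSfrag requires the latex + dvips route rather than pdflatex):
\documentclass{article}
\usepackage{graphicx}
\usepackage{psfrag}
\begin{document}
\begin{figure}
  % replace the literal string 'w' in the EPS with typeset math
  \psfrag{w}{$\omega = 2\pi c/\lambda$}
  \includegraphics{demo6.eps}
\end{figure}
\end{document}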
[figure: Version 6 output]
Version 7--make everything with a 'shell' function.
filename = 'demo7';
Plot1By2Demo(t,s1,s2,filename)
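His actual Plot1By2Demo isn't reproduced here, but a sketch of what such a shell function might look like, stitching together the pieces from Versions 3 through 5, is below (the details are assumptions based on those versions, not his code):
function Plot1By2Demo(t,s1,s2,filename)
% Sketch of a 'shell' plotting function: bundles the paper size,
% axis position, font and export settings from Versions 3-5.
FontName = 'Times'; FSsm = 7; FSmed = 10; LWthin = 1;
left = 0.12; right = 0.02; top = 0.05; bottom = 0.1; hspace = 0.07;
height = 1-top-bottom;
width = (1-left-right-hspace)/2;
pos1 = [left,bottom,width,height];
pos2 = pos1+[hspace+width,0,0,0];
figure
ax1 = axes('position',pos1);
plot(t,s1,'-','linewidth',LWthin)
ylabel('y','FontName',FontName,'FontSize',FSmed)
xlabel('\omega = 2 \pi c/\lambda','FontName',FontName,'FontSize',FSmed)
set(ax1,'FontName',FontName,'FontSize',FSsm)
ax2 = axes('position',pos2);
plot(t,s2,'-','linewidth',LWthin)
xlabel('\omega = 2 \pi c/\lambda','FontName',FontName,'FontSize',FSmed)
set(ax2,'FontName',FontName,'FontSize',FSsm)
PS = [14,10]; PP = [0,0,PS]; % paper size/position in centimeters
set(gcf,'paperpositionmode','manual','paperposition',PP, ...
    'papersize',PS,'paperunits','centimeters');
print(gcf,'-depsc',[filename '.eps'],'-painters','-loose');
end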
[figure: Version 7 output]
managing references
Jess and Alick from Mendeley recently visited us at NYU, with Jess leading a discussion on using Mendeley for reference management (slides). She covered these features: (1) adding content; (2) managing content; (3) sharing content; and (4) discovering content. I should say that, in light of an analysis of reference management tools that we sketched out yesterday, we're likely to call these 'functions', with the two other major 'objects' being uses and steps; but hold that for another post. I should also say that I'm a Mendeley Advisor and that I think it's the best tool for reference management. And, as you could guess, everything that follows is 'all me'.
I think the above features are probably described somewhere on the Mendeley page, and maybe in videos, and if not, then I think they will be soon. Nonetheless, I'd like to textualize some of what Jess discussed, and add some commentary or highlight some stuff.
Adding content
Adding stuff is easy. One route is to drag and drop content; you can do that with files or folders. Where it can, Mendeley will extract metadata, and it'll sync online for backup (I think that's conditioned on your preferences). Another option is to add the stuff to a watched folder and it'll sync. A third option is to use 'Import to Mendeley' from within a browser, which will add the stuff directly.
To me, this seems like a relatively broad set of ways to get stuff in, and my impression is most of it works pretty well. It'd be interesting to know more about how people are differentially using these. For example, is there some clean way to decide that you're going to use web-based importing for this kind of stuff and watched folders for a different kind of stuff, and drag and drop almost never? It can actually be pretty easy to forget all of your options, and having some bridge to them, like some way of saying what you'd like to do and then it choosing the best option, might be helpful. Another option there might just be for you to be able to visualize how you typically do things. That visualization could be the point at which suggestions are made. Then again, you don't want to have a modernized (and equally annoying) version of the 'Office Assistant' that showed up in some Microsoft product that I've pretty much forgotten about.
We have also noted, as you may have, that as acquiring content has gotten easier and easier, our stack of 'want to read but will never read' has gotten bigger and bigger. You could say that as the efficiency of getting stuff has increased, the ratio of read to have has diminished. This suggests that just after addition might be a good time, not only for sorting, but for prioritizing: like, 'these are super important and I need to look at them'. This could also be connected to something that runs at intervals, or that is set in your preferences, that comes up later and says 'you marked these as must reads and you've done nothing with them; why not read now or archive'.
As is the case with Gmail's Priority Inbox, the obvious way to go about this is for most of it to happen without your input, but in such a way that you'd be able to modify it later. An example might be where Mendeley marked a paper as a priority and put it in your must-read folder, but it's actually not important and is more of an optional read, so you change it to that and it goes to a different location. Here too, though, it might be nice to be able to create some rules, or as many conditional rules as you like. These might be things like, 'if I import something by Steve Jobs, put it in must read'. But even that might be weird, since you might import things after reading them.
Managing content
You can search as you type, and you can do it within documents. An example might be something like 'protein', where it'll pull up stuff that has 'protein' and highlight it in yellow. I actually don't find this terribly useful. I mean, it's obviously important to be able to search for stuff, but I don't like looking at what comes up, maybe because I'm not familiar with how the stuff will show up and it's visually complicated. It's also not clear why these searches should be any more limited than, say, searches with OS X's Finder. Like, 'Kind>Is>Image>JPEG + Last opened date>Is>this month + File label>green'. The equivalent of that could be something like wanting supplemental information that I've read in the last six months that has tags for 'retina' and 'graffiti'. How could this be done with what's there now? If it can be done, is the output easily parsed?
This reminds me that the green dot that shows whether you've read something is flawed. You double-click something to look at it, and the green dot goes away, meaning you read it; but you didn't, you just looked at it. It seems like opening something doesn't need an indicator, but reading does. If an estimate can be made automatically of whether a user has finished reading something, why not prompt them and say 'we think you've finished reading this, have you?'. Then show the outcome of that. This is weird, though, because we're not tied to readable things, so one would hope that wouldn't show up for things that are just a reference, or an image, for example. But with a video, would you want something such as 'we think you've finished watching this, have you?'. As it is now, you open something to read the abstract and it's then marked as read, and you end up putting it back to unread.
You can highlight (but only in yellow) and you can add Post-it Notes, which are called annotations. Not to be inconsistent, these can only be yellow, unlike the 'real' ones from 3M. These have an indication of who wrote it, 'You', and the date/time, as well as the page. And, something you might struggle to remember, the 'Notes' (the text box on the right side) is for your own comments. I find both of these rather limiting. You can't, for example, outline very well in 'Notes' since there are no styles, such as 'heading 1'. You can't resize this or reposition it. You can't have more than one. You can't make different saves of stuff. I also can't tell from the main 'My Library' which documents have this stuff, or who has added it. I might like to know, for example, that Umesh and I have a bunch of annotations on this document and to be able to see that from 'My Library'.
We have also thought that in some cases the right model would be to say that this content, such as this PDF, is a master, and all of this stuff is a layer; the layers would then be editable and easily visualized. The idea is that we could say 'these annotations and notes are a single layer', layer 1, and 'this summary and criticisms from our lab' is layer 2. These layers, as in Adobe Illustrator, can be rearranged, meaning we can change what's in front of what. Also, we can change their permissions and visibility. For example, we might want to lock layer 1 (James and Umesh's notes) and make it visible only to these people: James, Umesh, Rob and Rich; then keep layer 2 (group summary and criticisms) unlocked, so everyone can edit, and make it visible to all, but invisible on printing. You can imagine this would break down in cases where there is no obvious master, and in that way it could be connected to the 'find duplicates' problem.
You can interface with Mendeley from within some word processing tools. The currently supported ones are Microsoft Word, OpenOffice and NeoOffice. For our purposes, a much better set would be Pages, Google Docs and LaTeX; everyone uses these for preparing papers. One preference, which Mike raised, is to separate out the compile-bibliography step, as in LaTeX with BibTeX. We are all also under the impression that these integrations are, to the extent that they exist, only halfway there. Having said that, we think this is going to get much better soon. I mean, dump papers in a private group with colleagues working on a paper, read stuff as you go along, have a common set of tags for adding to the paper and to talks (meaning Keynote), add them from within Google Docs, click something there and it compiles a version of the bibliography...
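As a point of reference for the LaTeX route, the separated compile-bibliography step Mike was after is just standard BibTeX. A minimal sketch ('library.bib' stands in for a .bib file exported or synced from Mendeley, and Smith2010 is a made-up citation key):
\documentclass{article}
\begin{document}
As shown by Smith et al.~\cite{Smith2010}, ...
\bibliographystyle{plain}
\bibliography{library} % library.bib exported from Mendeley
\end{document}
You'd then run latex, bibtex, and latex twice more to resolve the citations.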
Sharing and discovering content
You can obviously create groups to share references and documents. This is really important, is cool, seems to be defined in a reasonable way and seems like it'll continue to get better. You can also use Mendeley to connect to others and to share your publications, which happens through your profile. I don't know if it's already there, but, assuming it isn't, it might be nice to have more stuff to mark what is yours, including stuff that isn't official publications, such as 'these are my notes on Linear Algebra 314'. This could also be linked to LinkedIn or similar tools. Another thing is to find stuff, such as with 'Most read articles' in an area or to do so via groups owned by others, such as the one for 'Bayesian MCMC'.
[embedded YouTube video featuring Michael Nielsen]
Michael Nielsen is one of the pioneers of quantum computation. Together with Ike Chuang of MIT, he wrote the standard text in the field, a text which is now one of the twenty most highly cited physics books of all time. He is the author of more than fifty scientific papers, including invited contributions to Nature and Scientific American. His research contributions include involvement in one of the first quantum teleportation experiments, named as one of Science Magazine's Top Ten Breakthroughs of the Year for 1998. Michael was a Fulbright Scholar at the University of New Mexico, and has worked at Los Alamos National Laboratory, as the Richard Chace Tolman Prize Fellow at Caltech, as Foundation Professor of Quantum Information Science at the University of Queensland, and as a Senior Faculty Member at the Perimeter Institute for Theoretical Physics. Michael left academia to write a book about open science, and the radical change that online tools are causing in the way scientific discoveries are made.
sharing code and data
I recently discussed an aspect of open research that is of particular relevance to our lab, which is releasing the code and data associated with a project (slides). The general topics of open research and open science are already quite broad, and are evolving. Sharing everything, the end of publishing as we know it (as reason for not worrying about being scooped), alternative means of review, were all 'showing up' at the last SoNYC, which was supposed to be about whether scientists are antisocial.
But my focus for ETS was more applied and narrow. I wanted only to consider these issues: why should we share our code and data; why shouldn't we; if we're going to, what are some things we should think about; if we write a plan for sharing, what should it include; where do we share the stuff; and who else out there is thinking or discussing these things.
Before getting to the first point, I want to say that I think sharing is the right thing to do. In saying that, I don't mean there aren't reasons not to do it, nor do I mean we are all in a position to share everything. I just mean that it seems right, that we're better off doing it, and that the field is better off with people doing it.
Reasons to share
Reasons to share include that generating verifiable knowledge is a central goal of science, something that is generally impossible based only on what's presented at conferences and in papers. Sharing allows future generations to build on the work of previous generations. These are kind of the same point, and they are both part of a larger point that sharing allows others to replicate, validate and correct your results. It's easy to imagine scenarios where you wouldn't want that. You just published something, and someone comes along and shows how you messed up. It's so bad that you need to retract something you just put out. But if you think about it, do you really think it's the right move to try to protect yourself from your own errors? Is that a 'strategy for the future'?
Some funding agencies require sharing (e.g., NSF, maybe NIH), as do some journals (e.g., Nature, PLoS). To the previous point, even if you had an interest in preventing others from discovering your mistakes, those who pay for the work and who work towards communicating it probably don't. One issue, though, is what the arsenal of rewards and punishments of these organizations is, or whether they even have one. And what happens when they run into conflicts? What happens, for example, if your university doesn't want you to share something, but your funder does, and of the two publications based on it, one journal requires it to happen one way, and the other another?
Sharing is also cool because it makes people more aware of your stuff. That seems especially true if you're known for sharing. I don't know of evidence that supports it, but it seems obvious that sharing increases the impact of your publications. It might also lead to an ever-growing set of microcitations, which couldn't hurt.
There are also many other more selfish reasons, such as that it preserves your stuff for your own future use. You'll be able to identify, retrieve and understand the data long after you've lost familiarity with it. As you prepare to share, your habits will likely facilitate you being able to go back to a particular point in the history of the project. You might do so with a particular figure. You'll see what you ran, what parameters you used, what version of the code you used, what data were there, whose data it was, whether you'd whitened it or not, whether you'd discussed it with your lab prior to that point, etc.
Sharing might also be a great way for your students to learn about these things. Misha made a point here that it could be bad for your students to have code and data from previous projects. He was essentially saying that it's part of their learning process to get to the point of being able to replicate things themselves. I see the point, but just don't agree. I'm not convinced anyway that it's even possible for students, or anyone else, to replicate things based only on what's out there. I also think it allows them to be better, by seeing what you did and being able to quickly build on it. Many of us have been handed other people's code. It's often quite confusing or doesn't do exactly what we want, but it's enough to get us going. That's made easier by sharing and it would be possible for many people, even people we don't know or who aren't willing to ask us via email.
Reasons not to share
You no longer have the code or the data or both. One reason for that might be that the code evolved to solve other problems. We all agree that evolving code is a good thing; it's connected to the unit testing stuff that Rich discussed. But not being able to go back is a bad thing, and there are tools that can help you solve this, namely version control software (e.g., Subversion, Mercurial and Git).
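As a toy illustration of going back in time (assuming git is installed and on your path, and using the bang command from the figures post; the file name and tag are made up):
! git init
! git add demoFigs.m
! git commit -m "code as used for Figure 3"
! git tag fig3-submitted
Later, '! git checkout fig3-submitted' gets you back exactly the code that made the figure.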
Another thing is that cleaning stuff up is a lot of work and might be only achievable by living in a more rigid world, such as one that includes lab-level standardization. True. But again, being motivated to keep cleaner code behooves you, and while it might seem slower in the near term, could actually save you time in the long run. You can build on clean code. Other people can help you with clean code, because they can read it. You can adopt clean code more easily. Having said that, even if you have zero time for cleaning what you have, sharing anything is probably quite helpful in and of itself. It would allow others to check the values of everything, including various things you forgot to mention in a paper or couldn't mention on a poster.
A related point is that your stuff may only run on commercial, proprietary or copyrighted software that cannot be distributed, or that takes special hardware to run. True again. But here again, anything is better than nothing. You can release it as text and someone can inspect and rewrite it. And, regardless, this seems unlikely to apply in most cases; the people who would be interested to run your MATLAB code probably have access to MATLAB themselves.
How to do it [first paragraph is from Yale Law School Roundtable on Data and Code Sharing]
When you publish work, provide links to the source code. Assign a unique ID to each version of released code, and update the ID whenever the code or data change. A version-control system can be used in conjunction with a unique identifier (e.g., Universal Numerical Fingerprint) for data. Include a statement describing the computing environment and software versions used in the publication, with stable links to the accompanying code and data. Use open licensing for code to facilitate reuse. Use an open access contract for published papers and make preprints available on a site such as arXiv.org, PubMed Central, or Harvard's Dataverse Network. Publish data and code in nonproprietary formats wherever reasonably concordant with established practices, opting for formats that you believe will be readable well into the future.
Besides the above stuff, the code/data sharing plan that you should write should include answers to these questions: what code and data will be shared; who will have access to it (it should be as broad as possible); where will you put it (it should go to places dedicated to hosting it); when will you share it (it should be shared as soon as possible and for as long as possible); how will people locate and access it.
You should put it on an institutional or university web page. You should put it in an openly accessible third-party archived website designed for this stuff. I only know of some of these places and am still discovering them. For code stuff, I've found GitHub (supposedly the one preferred by cool people); SourceForge (like others in this set, more for software, but could work); and Bitbucket. For data stuff, Dryad (seems pretty good, although not much there yet). Some other stuff that I'm not sure about or which may not be relevant: the NeuroCommons Project (Science Commons); Linked Open Data; OAIster; DSpace or Harvard-MIT Data Center (HMDC) (but NYU doesn't have an equivalent); CODATA (but not clear where to go with stuff); and Google Code.
[figure: a survey of willingness to share original data, from Savage and Vickers 2009]
People who are into this stuff
Victoria Stodden (Stanford)
David Donoho (Stanford)
Chris Wiggins (Columbia)
Randall LeVeque (U of Washington)
Roger Peng and Sandrah Eckel (Johns Hopkins+USC)
Sergey Fomel and Jon Claerbout (U of Texas+Stanford)
Kaitlin Thaney (Science Commons)
Michael Friedlander (Virginia Tech)
Mark Gerstein (Yale)
Ian Mitchell (U of British Columbia)
Lisa Larrimore Ouellette (Yale Law School)
Jelena Kovačević and Martin Vetterli (Berkeley)
Nikolaus Kriegeskorte (Cambridge)