Friday 22 April 2011

Open Community Research: cross-institutional integrative Bioinformatics - something for Debian Med to aim for in 2012+ ?

A few days ago this blog opened with a series of observations on the multi-directional education and collaboration that come with active or passive participation in Debian Med. My personal ambition is to find ways to further institutionalise this constructive exchange beyond packaging. It occurred to me that this may mean talking more about actually doing things with our packages.

This will lead us to discussing, optimising and specifying workflows, i.e. the graph connecting data sources with tools and their outputs with other tools, plus the optimisation of command line arguments and the evaluation of the findings. This all sounds very natural to me, since the desire to complete a particular workflow locally is what brings most packages into the distribution today. Until recently, we just did not have a way to talk formally about those workflows, except for exchanging shell scripts. This has changed with Alan's and Hajo's continued collaboration to get command line tools integrated with the workflow suite Taverna. It allows us to describe the inputs and outputs of our executables and presents them as regular workflow elements, right next to the (today :o) ) dominant remote web services. The myExperiment.org site is a repository of (frequently nested) workflows, with all the typical user comments and ranking. Having that extended to everything one can achieve with the various tools in Debian would be highly interesting. Admittedly, knowing about the rather limited success in uploading bits as trivial as screenshots, we need a strong positive feedback loop and should not expect this to be accepted by the community merely because it could be.
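To make the "graph" notion a bit more tangible, here is a minimal sketch in Python of a workflow written down as a dependency graph and put into a runnable order. It is not how Taverna represents workflows; the step names are invented for illustration and do not refer to particular packages. It assumes Python 3.9 or later for the standard library's `graphlib`.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each key is a step, each value is the set of
# steps whose outputs it consumes. All names are illustrative only.
workflow = {
    "fetch_sequences": set(),
    "align": {"fetch_sequences"},
    "build_tree": {"align"},
    "summarise": {"align", "build_tree"},
}


def execution_order(graph):
    """Return one valid order in which the steps could be run."""
    return list(TopologicalSorter(graph).static_order())


order = execution_order(workflow)
print(order)
```

A shell script fixes such an order implicitly; writing the graph down explicitly is what lets a workflow suite check it, nest it and share it.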

So, this leads us to my initial impetus: the community needs something to work on in order to develop itself and the technologies (like this blog) it has adopted. And this is where public data sets come in. We had previously discussed the integration of data with the distribution in the context of BioMaj/getData for curated protein, structure or interaction data. But when we extend that to some "weird stuff" as well, maybe something novel from the more clinical branch of Debian Med or the joint (re-?)analysis of a genome (a virus, maybe?), then I am fairly confident that our enormous heterogeneity as a community allows us to yield something that a regular institution's Bioinformatics service unit would find difficult to match.

So, we would apply Open Source principles to biomedical (re-)research. Beyond the further development of ourselves, this certainly has many benefits, directly through our findings and indirectly through the education it brings to all those who are following the development online. Such shared research efforts could start any time, in principle. The anticipated deeper integration of Taverna with our distribution will allow specifying many smallish workflows as legitimate subgoals. Let's hope for an additional posting soon with a tutorial on Taverna's external tools. With the advent of Ensembl or gbrowse in our distribution we have the sensation of some sort of "completeness" for the end users: once my genome has arrived in either, the work is perceived as done. This may be wrong or right, but just filling those web interfaces with data is a challenging workflow in itself. There is still quite a lot to do for all of it, and we should talk about it.

Tuesday 19 April 2011

Debian Med: individuals' expertise and their sharing of package build instructions

This is the very first post to a blog about Debian Med, a community of enthusiasts and professionals in computational biology and medical informatics. They all use the Linux distribution Debian or one of the friendly distributions like Ubuntu, with which Debian exchanges its packages.

The title of this post is that of an abstract just submitted to the Bioinformatics Open Source Conference (BOSC 2011). The readers of this blog will be among the first to learn about its acceptance :) The abstract stresses that Debian Med is more than the packages contributed to Debian. It is also those packages that were created only for local use, and the individuals behind them. Debian Med offers subversion and git repositories to share that local effort, allowing technically advanced users to finish the work or simply to benefit directly from the patches, compiler flags and the specification of build/run-time dependencies. This sharing is especially beneficial for software packages that are available as source code but may not be redistributed as binaries. VMD comes to mind; its build instructions are here.

This blog may help with some biocomputational infotainment and insights beyond what Debian Med already exchanges on its mailing list: may the more conventional (for us) sharing of code be augmented here with a sharing of thoughts.