Jamie's PhyloSoC 2007 Project: May 2007

Thursday, May 31, 2007

PhyInit: Creating Database if It Does Not Exist

I worked more on defining the command line options that I want to use. These were updated on the project web page.

I incorporated these command-line options into PhyInit. I added the ability to check for the existence of a MySQL database with the name passed from the command line. If no database with that name exists, a new one will be created. This assumes that the user has permission to create databases on the db host. I first ask if the user really wants to create the database before doing so. Creating a new database currently works only in MySQL.
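The check-then-create logic described above could be sketched roughly as follows. This is not the actual PhyInit code: the helper names database_exists and create_database_if_missing are made up for illustration, and the confirmation prompt is passed in as a code reference so the sketch does not depend on how the user is asked.

```perl
use strict;
use warnings;

# Hypothetical helper: true if a database called $name is visible on the
# server. $dbh is assumed to be a connected DBI handle for a MySQL server.
sub database_exists {
    my ($dbh, $name) = @_;
    # SHOW DATABASES is MySQL-specific; it returns one database name per row.
    my $names = $dbh->selectcol_arrayref('SHOW DATABASES');
    my $count = grep { $_ eq $name } @$names;
    return $count ? 1 : 0;
}

# Hypothetical helper: create $name only if it is missing and the
# confirmation callback returns true.
# Returns 'exists', 'declined', or 'created'.
sub create_database_if_missing {
    my ($dbh, $name, $confirm) = @_;
    return 'exists'   if database_exists($dbh, $name);
    return 'declined' unless $confirm->($name);
    # quote_identifier protects against odd characters in the name.
    $dbh->do('CREATE DATABASE ' . $dbh->quote_identifier($name));
    return 'created';
}
```

In a real script, $dbh would come from DBI->connect with a MySQL DSN, and the callback would print a yes/no question and read the answer from STDIN. As noted above, the CREATE DATABASE step only succeeds if the connecting user has that privilege.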

For the duration of GSoC, I will use a MySQL database back end.

Today I found out how to use perldoc to print help statements from POD documentation. This is pretty useful since I used to always write PrintHelp subfunctions that duplicated what is written in the POD documentation.

system("perldoc $0");

Simple but not something that I have used before. I have not tested this in Windoze to see if it works across platforms.

Symbolic Link Note

I always forget how to do symbolic links in Linux so I thought I would leave a note here. I needed to do this for the scripts that are in biosql-schema.

cd /usr/local/bin
ln -s /usr/bin/perl ./perl

Wednesday, May 30, 2007

Command Line Tweaks and Starting PhyInit.pl

I have been slightly modifying the command line arguments I initially suggested to make them consistent with the existing BioSQL related scripts (i.e., --dbuser in addition to -u will be a valid argument). This will require that I use Getopt::Long. I am also trying to decide if I want to mix caps into the program names (i.e., should it be PhyInit.pl, phyinit.pl, or phy_init.pl). These are pretty minor issues but something I want to make consistent with the preexisting bioperl code.

After looking through the existing BioSQL code, I really like the idea of defaulting many of the command line variables to environment variables, for example

 my $usrname = $ENV{DBI_USER};
 my $pass = $ENV{DBI_PASSWORD};

This is a somewhat obvious thing to do in retrospect, but not something that I have done before.
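Combining the two ideas from this post, a minimal sketch with Getopt::Long might look like the following. The --dbuser and -u names come from the notes above; the --dbpass and -p names and the parse_db_options helper are assumptions made for illustration, not the options PhyInit.pl actually settled on.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: parse an argument list, falling back to environment
# variables for anything not given on the command line.
sub parse_db_options {
    my ($argv, $env) = @_;

    # Start from the environment defaults.
    my %opt = (
        dbuser => $env->{DBI_USER},
        dbpass => $env->{DBI_PASSWORD},
    );

    # 'dbuser|u=s' accepts both --dbuser and -u, each taking a string value.
    # A value given on the command line overwrites the default.
    GetOptionsFromArray(
        $argv,
        'dbuser|u=s' => \$opt{dbuser},
        'dbpass|p=s' => \$opt{dbpass},
    ) or die "Could not parse command line options\n";

    return \%opt;
}
```

A script would call this as parse_db_options(\@ARGV, \%ENV). Passing both in as arguments, rather than reading the globals inside the helper, keeps it easy to try out with made-up values.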

On the project page I have been going through the existing code in the codebase that is relevant to my proposed programs and linking to the locations of the code in CVS. I wanted to finish this up tonight, but it looks like the wg_phyloinformatics wiki is down for the evening.

I have started the PhyInit.pl program and will make my first commit to subversion tonight. This will not be a finished program, but I want to be in the practice of making commits on a daily basis as much as possible.

Thursday, May 24, 2007

BioSQL Installed .. Connected Components

I have installed BioSQL on the machine that I am using for development. I had to modify the existing SQL code slightly to work on MySQL.

I have been working for the past few days with the Perl module Graph as part of the RepMiner project. I have been really happy with the connected components function. This provides a very quick way to generate some very basic clusters from All-By-All BLAST results. With 2GB of memory, I will be able to cluster most of my LTR retrotransposon data sets using this function.
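To illustrate what clustering by connected components means here, below is a small plain-Perl sketch. It does not use the Graph module and is not the RepMiner code; it is a simple breadth-first search over made-up hit pairs, where each pair stands for two sequences that matched in the All-By-All BLAST.

```perl
use strict;
use warnings;

# Group sequence IDs into clusters. Each argument is a [query, hit] pair,
# treated as an undirected edge; each cluster is one connected component.
sub connected_components {
    my (@pairs) = @_;

    # Build an adjacency list from the hit pairs.
    my %adj;
    for my $pair (@pairs) {
        my ($query, $hit) = @$pair;
        $adj{$query}{$hit} = 1;
        $adj{$hit}{$query} = 1;
    }

    # Breadth-first search from every node not yet assigned to a cluster.
    my (%seen, @clusters);
    for my $start (sort keys %adj) {
        next if $seen{$start};
        $seen{$start} = 1;
        my @queue = ($start);
        my @members;
        while (@queue) {
            my $node = shift @queue;
            push @members, $node;
            for my $next (sort keys %{ $adj{$node} }) {
                next if $seen{$next}++;
                push @queue, $next;
            }
        }
        push @clusters, [ sort @members ];
    }
    return @clusters;
}

# Made-up example: A-B and B-C form one cluster, D-E another.
my @clusters = connected_components(['A', 'B'], ['B', 'C'], ['D', 'E']);
print join(',', @$_), "\n" for @clusters;
```

Any two sequences joined by a chain of hits end up in the same cluster, even if they never hit each other directly, which is why this only gives very basic clusters.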

Thursday, May 17, 2007

Transitive Closure makes my brain hurt

The more I read about transitive closure the more my brain hurts. I am currently trying to implement a simple transitive closure algorithm in Perl (Warshall's) for basic classification of repeat families.
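For reference, a minimal sketch of Warshall's algorithm in Perl is below. It assumes the relationships are stored as a square 0/1 matrix (an array of array references), which may not be how the repeat data is actually held; the function name transitive_closure is just for illustration.

```perl
use strict;
use warnings;

# Warshall's algorithm: given a square 0/1 adjacency matrix, return a new
# matrix where entry [i][j] is 1 if j can be reached from i in one or
# more steps.
sub transitive_closure {
    my ($matrix) = @_;
    my $n = @$matrix;

    # Copy each row so the caller's matrix is left untouched.
    my @reach = map { [ @$_ ] } @$matrix;

    # Allow node k as an intermediate stop, one k at a time.
    for my $k (0 .. $n - 1) {
        for my $i (0 .. $n - 1) {
            next unless $reach[$i][$k];    # i cannot get to k, nothing to add
            for my $j (0 .. $n - 1) {
                $reach[$i][$j] = 1 if $reach[$k][$j];
            }
        }
    }
    return \@reach;
}
```

The three nested loops make this cubic in the number of nodes, so it suits small matrices better than large All-By-All result sets.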

Monday, May 14, 2007

RepMiner & Installing BioSQL Plans

I've been reading the "Trees and Hierarchies in SQL" book and I have been getting the RepMiner project ready for publication. I have been adding POD documentation to the RepMiner programs. The more I use POD, the more I like it and wonder why I have never used it before. I have also been really happy with Subversion. It seems like it is a lot easier to use than CVS.

I have looked over my CVS checkout of bioperl-db. I cleaned up some space on my machine to have the room to work this summer. I will be installing a BioSQL database this week to play around with. I have still been searching for good references on transitive closure in an SQL framework.

Thursday, May 3, 2007

Paperwork, Books and POD

I have submitted my paperwork to Google, and my copy of "Joe Celko's Trees and Hierarchies in SQL for Smarties" arrived a couple of days ago. I have been pretty happy with the results I get from including POD documentation in Perl code.
