Jamie's PhyloSoC 2007 Project: 2007

Monday, July 16, 2007

Weekly Update

Last Week:

Finished Delete branch query for PhyMod
Added subtree export to PhyExport
Began PhyQry
Been reading Perl Best Practices .. good book

This Week:

Finish PhyQry with basic database and tree overview information
Add --usage, --help, and --man options to the command line

Thursday, July 12, 2007

I installed RapidSVN today. It looks like this will really make working with svn a bit easier. I have been doing most of my SVN work from the command line and I like being able to have a nice visual overview of how everything stands. This was really easy to install from the RPMs that I found online.

I have uploaded the code that I wanted to have on Google by this point, so I feel like I am pretty much on track to get this project done by the end of the summer. I went ahead and did the GSoC survey too.

After reading "Perl Best Practices" (PBP) I plan on going back and editing the command line options of the programs. I really think I should change the way the help works; it currently just kicks into perldoc. The PBP book suggests a much better way to do this by having separate options:

--usage
Short usage statement.
--help
Short usage statement with one-liners describing the required arguments and options. This will be the default to go to when the user enters incorrect information at the command line.
--man
This will print the perldoc POD documentation

I will also be adding additional information to the POD doc as suggested by PBP.
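For reference, here is a minimal sketch of how the three options could be wired up with Getopt::Long and Pod::Usage, which both ship with Perl. The sub name doc_level is made up for this example, and a real script would declare its other options in the same call. The verbosity levels 0, 1 and 2 are the ones Pod::Usage documents for the synopsis only, the synopsis plus options, and the full manual page.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);
use Pod::Usage;

# Map the command line to a Pod::Usage verbosity level:
#   0 = SYNOPSIS only (--usage)
#   1 = SYNOPSIS plus OPTIONS/ARGUMENTS (--help, or bad input)
#   2 = the full POD, as perldoc shows it (--man)
# Returns undef when no documentation was requested.
# doc_level is a hypothetical helper name, not part of any module.
sub doc_level {
    my @args = @_;
    my ( $usage, $help, $man );
    my $ok = GetOptionsFromArray(
        \@args,
        'usage'  => \$usage,
        'help|h' => \$help,
        'man'    => \$man,
    );
    return 1 if !$ok;    # unknown option: fall back to --help
    return 2 if $man;
    return 1 if $help;
    return 0 if $usage;
    return;
}

my $level = doc_level(@ARGV);
pod2usage( -verbose => $level, -exitval => 0 ) if defined $level;
```

Keeping the decision in its own sub means it can be checked without actually printing the documentation and exiting.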

Wednesday, July 11, 2007

PhyExport: Subtree export

Subtree export is now included in PhyExport using the --parent-node option. This was actually pretty easy to implement. I just need to add some code to make sure that users cannot use --parent-node when they are exporting all of the trees in the database. The node ids from the database can also be exported now using the --db-node-id boolean.

A test tree image

The current test tree is above. Node labels are indicated by letters, and the node_id values from the database are indicated by numbers. This image is mainly for me to reference later when working with phyexport and phymod. I added the ability to include the database node_id in PhyExport so that it is easier to reference a given node by the identifier it has in the database.

Tuesday, July 10, 2007

PhyMod: Delete Query Available

A version of PhyMod that can do a delete query is now available from the project SVN. I ended up using the nested set values after tinkering with the transitive closure data. I will probably recode the SQL in these to avoid using nested SQL. The delete query is simply a cut at the command line where a paste is not indicated. The code counts the number of records that will be deleted from the tables and warns the user before wreaking havoc.
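As a note to self, the idea behind using nested set values for the cut can be sketched in plain Perl, without the database. This is only an illustration: the column names left_idx and right_idx and the sub name subtree_ids are assumptions for the example, and PhyMod itself does this work in SQL.

```perl
use strict;
use warnings;

# Return the node_ids of the subtree rooted at $root_id: every node whose
# interval lies inside the root's [left_idx, right_idx] interval.
# (subtree_ids is a hypothetical helper for this sketch.)
sub subtree_ids {
    my ( $nodes, $root_id ) = @_;
    my ($root) = grep { $_->{node_id} == $root_id } @{$nodes};
    return () if !$root;
    return map  { $_->{node_id} }
           grep { $_->{left_idx} >= $root->{left_idx}
               && $_->{right_idx} <= $root->{right_idx} } @{$nodes};
}

# Tree ((A,B)C,D)E numbered by a depth-first walk.
my @nodes = (
    { node_id => 1, left_idx => 1, right_idx => 10 },    # E (root)
    { node_id => 2, left_idx => 2, right_idx => 7 },     # C
    { node_id => 3, left_idx => 3, right_idx => 4 },     # A
    { node_id => 4, left_idx => 5, right_idx => 6 },     # B
    { node_id => 5, left_idx => 8, right_idx => 9 },     # D
);

# Count what a cut at node 2 would remove before touching anything.
my @doomed = subtree_ids( \@nodes, 2 );
printf "Cutting node 2 would delete %d nodes: %s\n",
    scalar @doomed, join( ', ', @doomed );
```

Counting first and deleting second is what makes it possible to warn the user with a real number.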

Monday, July 9, 2007

MySQL Query Browser

While doing some reading this weekend, I discovered the MySQL Query Browser, a GUI app for working with MySQL databases. I just downloaded it, and it seems really awesome. The 5.0 release appears to work with the 4.1 version of MySQL that I have installed.

Upgrading MySQL Notes

Upgrading MySQL from 4.0 to 4.1 on RHEL WS 3 was not too easy. I had to first back up all of my databases ... and then delete the existing version. I wanted to put my notes here in case I need to do this again since I had to put this together from a few places.

Backed up all of my existing databases

 mysqldump --user jestill --all-databases --password >/home/jestill/mysqldump/JamieDbs.dump

Check what version of MySQL I am actually running

mysql>SELECT version();

Stop the server

cd /etc/rc.d/init.d
sudo ./mysql stop

Copied the existing version to a new name.

Find what I have currently installed for MySQL

rpm -q -a | grep -i mysql

Delete what I had installed:

rpm -e MySQL-bench
rpm -e MySQL-server
rpm -e MySQL-devel
rpm -e MySQL-client
rpm -e mysql-connector-odbc-3.511.12-1

Downloaded the RPMs for my version of Linux from MySQL, then installed the new RPMs after logging in as root:

rpm -i MySQL-client-standard-4.1.11
rpm -i MySQL-devel-standard-4.1.22
rpm -i MySQL-server-standard-4.1.22
rpm -i MySQL-shared-standard-4.1.22
rpm -i MySQL-shared-compat-4.1.22

I then started MySQL back up again to check the version.

mysql>SELECT version();
+-----------------+
| version()       |
+-----------------+
| 4.1.22-standard |
+-----------------+
1 row in set (0.01 sec)

Since I updated MySQL I also updated DBD::mysql

cpan
cpan>force install DBD::mysql

Weekly Update

Hi all...been busy last few weeks.

PROJECT: Command Line Topological Query Apps for BioSQL

Last TWO weeks:

Completed a working version of PhyOpt. The core of this comes from H. Lapp's tree-precompute script. This put me ahead of my original timeline on the front of having working transitive closure precomputes available. I am using the precomputed values for PhyExport and PhyQry.
Upgraded my version of MySQL from 4.0 to 4.1. This allows me to use nested queries. It looks like nested queries are going to be the easiest way for me to do the optimization as well as use the optimization info. I may upgrade again to 5.0, but I would like the program to run on the oldest version of the database possible. (I also don't want to kill the 40 other databases I have on my machine.)
Started PhyMod to modify trees in the database .. this is making use of the precomputations from above in PhyOpt.
Updated info on the project web page to reflect some changes in the command line options.
I have been reading Perl Best Practices by D. Conway... I wish I had this book years ago!
I have also been dealing with genome annotation problems that have nothing to do with this project (I really would like bioperl to support computational results in game xml or Apollo to support Chado better..)

For this week:

Finish subtree Delete query in phymod based on a node ID passed to the program.
Add subtree export to PhyExport ... It made more sense for me to get PhyOpt fully working before doing this.
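To make the transitive closure precompute mentioned in the update above concrete, here is a small stand-alone sketch of what such a table contains. It is my own simplified illustration, not the method used by the tree-precompute script: the sub name transitive_closure and the child => parent hash are assumptions for the example.

```perl
use strict;
use warnings;

# Build (ancestor, descendant, distance) rows from child => parent edges by
# walking from every node up to the root.
# (transitive_closure is a hypothetical helper for this sketch.)
sub transitive_closure {
    my ($parent_of) = @_;
    my @paths;
    for my $node ( sort keys %{$parent_of} ) {
        my ( $ancestor, $distance ) = ( $parent_of->{$node}, 1 );
        while ( defined $ancestor ) {
            push @paths, [ $ancestor, $node, $distance ];
            $ancestor = $parent_of->{$ancestor};
            $distance++;
        }
    }
    return @paths;
}

# Tree ((A,B)C,D)E: the root E has no entry because it has no parent.
my %parent_of = ( A => 'C', B => 'C', C => 'E', D => 'E' );

# One tab-separated row per ancestor/descendant pair.
print join( "\t", @{$_} ), "\n" for transitive_closure( \%parent_of );
```

With rows like these stored in a table, "all descendants of node X" becomes a simple lookup on the ancestor column instead of a recursive walk.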

Monday, June 25, 2007

Week 5 Project Update

Below is the weekly update email I sent out this morning.

PROJECT: Command Line Topological Query Application for BioSQL

Last Week I:

Tried to work with Bio::Phylo .. decided to stick with Bio::Tree
Finished PhyExport - a command line program to export trees from the PhyloDB

This week:

Start PhyOpt - a phylogeny optimizer program to calculate nested sets
Extend PhyExport to export subtrees based on nested set values calculated by Ph
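For anyone wondering what "calculating nested sets" means here, this is a minimal sketch of the usual depth-first numbering in plain Perl. The data structure (a hash of parent => children) and the sub name nested_sets are assumptions for the example; the real program works against the node and edge tables.

```perl
use strict;
use warnings;

# Assign nested set values with one depth-first walk: a node gets its left
# value on the way down and its right value on the way back up.
# (nested_sets is a hypothetical helper for this sketch.)
sub nested_sets {
    my ( $children_of, $root ) = @_;
    my ( %left, %right );
    my $counter = 0;
    my $walk;
    $walk = sub {
        my ($node) = @_;
        $left{$node} = ++$counter;
        $walk->($_) for @{ $children_of->{$node} || [] };
        $right{$node} = ++$counter;
    };
    $walk->($root);
    undef $walk;    # break the closure's reference to itself
    return ( \%left, \%right );
}

# Tree ((A,B)C,D)E as parent => list of children.
my %children_of = ( E => [ 'C', 'D' ], C => [ 'A', 'B' ] );
my ( $left, $right ) = nested_sets( \%children_of, 'E' );
printf "%s\t%d\t%d\n", $_, $left->{$_}, $right->{$_} for sort keys %{$left};
```

A subtree is then every node whose left and right values fall between those of the chosen parent node.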

Friday, June 22, 2007

PhyExport: Working copy posted

I now have a working version of PhyExport (as phyexport.pl) posted on the project source code repository. This version uses the Bio::Tree object. The biggest problem I had was figuring out how to add data to the tree object in a recursive subfunction. The recursive subfunction was used to fetch all of the child nodes from the root.

I ended up giving the program the package name PhyloDB. I then used 'our $tree' to make the tree object a package-level variable. This allowed me to add nodes to the tree as $PhyloDB::tree.
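A stripped-down sketch of that pattern, so I remember how it works. A plain hash reference stands in for the Bio::Tree object and a hard-coded hash stands in for the edge table; both are assumptions for the example, as is the sub name load_children.

```perl
package PhyloDB;
use strict;
use warnings;

# Package-level variable: visible in every sub of the package and from
# outside as $PhyloDB::tree. A hash ref stands in for the tree object.
our $tree = {};

# Stand-in for the edge table: parent => list of children.
my %edges = ( E => [ 'C', 'D' ], C => [ 'A', 'B' ] );

# Recursively load all descendants of $parent into the shared $tree.
# (load_children is a hypothetical name for this sketch.)
sub load_children {
    my ($parent) = @_;
    for my $child ( @{ $edges{$parent} || [] } ) {
        $PhyloDB::tree->{$child} = $parent;    # record child => parent
        load_children($child);
    }
    return;
}

load_children('E');
print "$_ <- $PhyloDB::tree->{$_}\n" for sort keys %{$PhyloDB::tree};
```

Passing the tree into the recursive sub as an argument would also work and avoids the shared variable, but the package variable was the quickest way to get unstuck.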

PhyExport can now export node names, edge lengths, and bootstrap values in any export format that Bio::Tree can use.

I have a lot of clean up work to do with this, but at least I have something that works now.

Wednesday, June 20, 2007

PhyImport Bug: Child and parent switched

I just realized that the child and parent ids were getting switched in the edge table using the phyimport program. I did not realize this until I tried to extract trees back out of the database. Silly mistake.

Tuesday, June 19, 2007

Google Changing Midterm Evaluation Criteria

Although I plan on meeting my midterm deadlines for submitting code to my project's SVN repository on Google, I just saw that Google is changing the midterm evaluation requirements. The signal-to-noise ratio on the GSoC student and announce mailing lists is poor, so I am posting this here FYI.

So, we're actually changing the code submission requirement for the mid-term to a requirement that you fill out a survey instead - we haven't had the resources to implement the infrastructure we wanted to have in place before you submitted your code to us.

I'll also make a post about this to the announcement list, but:

1) No student mid-term code submission.
2) Students need to take a survey instead.
3) Instructions for completing the survey will be sent to the program
announcement list.
4) You will be able to complete the survey between July 9 - July 16th.

I'll also update the FAQ in the next few days.

Cheers,
LH

Bio::Phylo -- giving up for now

The ioctl problem in Bio::Phylo is not easy to fix .. I ran the script with strace:

strace perl -le phylotest.pl

and I get the following output that I really can't figure out:


execve("/usr/local/bin/perl", ["perl", "-le", "phylotest.pl"], [/* 41 vars */]) = 0
uname({sys="Linux", node="JamieHidThis", ...}) = 0
brk(0)                                  = 0x9643000
open("/etc/ld.so.preload", O_RDONLY)    = -1 ENOENT (No such file or directory)
open("/usr/lib/perl5/5.8.0/i386-linux-thread-multi/CORE/tls/i686/mmx/libperl.so", O_RDONLY) = -1 ENOENT (No such file or directory)
stat64("/usr/lib/perl5/5.8.0/i386-linux-thread-multi/CORE/tls/i686/mmx", 0xbfffa1a0) = -1 ENOENT (No such file or directory)
open("/usr/lib/perl5/5.8.0/i386-linux-thread-multi/CORE/tls/i686/libperl.so", O_RDONLY) = -1 ENOENT (No such file or directory)
stat64("/usr/lib/perl5/5.8.0/i386-linux-thread-multi/CORE/tls/i686", 0xbfffa1a0) = -1 ENOENT (No such file or directory)
open("/usr/lib/perl5/5.8.0/i386-linux-thread-multi/CORE/tls/mmx/libperl.so", O_RDONLY) = -1 ENOENT (No such file or directory)
stat64("/usr/lib/perl5/5.8.0/i386-linux-thread-multi/CORE/tls/mmx", 0xbfffa1a0) = -1 ENOENT (No such file or directory)
open("/usr/lib/perl5/5.8.0/i386-linux-thread-multi/CORE/tls/libperl.so", O_RDONLY) = -1 ENOENT (No such file or directory)
stat64("/usr/lib/perl5/5.8.0/i386-linux-thread-multi/CORE/tls", 0xbfffa1a0) = -1 ENOENT (No such file or directory)
open("/usr/lib/perl5/5.8.0/i386-linux-thread-multi/CORE/i686/mmx/libperl.so", O_RDONLY) = -1 ENOENT (No such file or directory)
stat64("/usr/lib/perl5/5.8.0/i386-linux-thread-multi/CORE/i686/mmx", 0xbfffa1a0) = -1 ENOENT (No such file or directory)
open("/usr/lib/perl5/5.8.0/i386-linux-thread-multi/CORE/i686/libperl.so", O_RDONLY) = -1 ENOENT (No such file or directory)
stat64("/usr/lib/perl5/5.8.0/i386-linux-thread-multi/CORE/i686", 0xbfffa1a0) = -1 ENOENT (No such file or directory)
open("/usr/lib/perl5/5.8.0/i386-linux-thread-multi/CORE/mmx/libperl.so", O_RDONLY) = -1 ENOENT (No such file or directory)
stat64("/usr/lib/perl5/5.8.0/i386-linux-thread-multi/CORE/mmx", 0xbfffa1a0) = -1 ENOENT (No such file or directory)
open("/usr/lib/perl5/5.8.0/i386-linux-thread-multi/CORE/libperl.so", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0@\10\2\000"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0555, st_size=1194580, ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb75f8000
old_mmap(NULL, 1205760, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x944000
old_mmap(0xa5e000, 45056, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x119000) = 0xa5e000
old_mmap(0xa69000, 5632, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xa69000
close(3)                                = 0
open("/usr/lib/perl5/5.8.0/i386-linux-thread-multi/CORE/libnsl.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)      = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=80436, ...}) = 0
old_mmap(NULL, 80436, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb75e4000
close(3)                                = 0
open("/lib/libnsl.so.1", O_RDONLY)      = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0 <\0\000"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=87563, ...}) = 0
old_mmap(NULL, 80480, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0xc5b000
old_mmap(0xc6c000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x11000) = 0xc6c000
old_mmap(0xc6d000, 6752, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xc6d000
close(3)                                = 0
open("/usr/lib/perl5/5.8.0/i386-linux-thread-multi/CORE/libdl.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/lib/libdl.so.2", O_RDONLY)       = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\260\32"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=13601, ...}) = 0
old_mmap(NULL, 12244, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x907000
old_mmap(0x909000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x1000) = 0x909000
close(3)                                = 0
open("/usr/lib/perl5/5.8.0/i386-linux-thread-multi/CORE/libm.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/lib/tls/libm.so.6", O_RDONLY)    = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\3604\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=185942, ...}) = 0
old_mmap(NULL, 135616, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0xee2000
old_mmap(0xf03000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x21000) = 0xf03000
close(3)                                = 0
open("/usr/lib/perl5/5.8.0/i386-linux-thread-multi/CORE/libpthread.so.0", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/lib/tls/libpthread.so.0", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\0G\0\000"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=86486, ...}) = 0
old_mmap(NULL, 65140, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x33d000
old_mmap(0x34a000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0xc000) = 0x34a000
old_mmap(0x34b000, 7796, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x34b000
close(3)                                = 0
open("/usr/lib/perl5/5.8.0/i386-linux-thread-multi/CORE/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/lib/tls/libc.so.6", O_RDONLY)    = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\200X\1"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=1516255, ...}) = 0
old_mmap(NULL, 1279980, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x15c000
old_mmap(0x28f000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x132000) = 0x28f000
old_mmap(0x292000, 10220, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x292000
close(3)                                = 0
open("/usr/lib/perl5/5.8.0/i386-linux-thread-multi/CORE/libcrypt.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/lib/libcrypt.so.1", O_RDONLY)    = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\220\t\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=22242, ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb75e3000
old_mmap(NULL, 181308, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x111000
old_mmap(0x116000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x4000) = 0x116000
old_mmap(0x117000, 156732, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x117000
close(3)                                = 0
open("/usr/lib/perl5/5.8.0/i386-linux-thread-multi/CORE/libutil.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/lib/libutil.so.1", O_RDONLY)     = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0000\16\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=11375, ...}) = 0
old_mmap(NULL, 11012, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0xabf000
old_mmap(0xac1000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x1000) = 0xac1000
close(3)                                = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb75e2000
set_thread_area({entry_number:-1 -> 6, base_addr:0xb75e2080, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
munmap(0xb75e4000, 80436)               = 0
set_tid_address(0xb75e20c8)             = 20752
rt_sigaction(SIGRTMIN, {0x341660, [], SA_RESTORER|SA_SIGINFO, 0x347f80}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN], NULL, 8) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=10240*1024, rlim_max=RLIM_INFINITY}) = 0
rt_sigaction(SIGFPE, {SIG_IGN}, {SIG_DFL}, 8) = 0
brk(0)                                  = 0x9643000
brk(0x9664000)                          = 0x9664000
brk(0)                                  = 0x9664000
getuid32()                              = 507
geteuid32()                             = 507
getgid32()                              = 507
getegid32()                             = 507
open("/usr/lib/locale/locale-archive", O_RDONLY|O_LARGEFILE) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=32148976, ...}) = 0
mmap2(NULL, 2097152, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb73e2000
close(3)                                = 0
mmap2(NULL, 135168, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb73c1000
time([1182268075])                      = 1182268075
stat64("/home/jestill/src/bioperl/bioperl-live/5.8.0/i386-linux-thread-multi", 0xbfffa800) = -1 ENOENT (No such file or directory)
stat64("/home/jestill/src/bioperl/bioperl-live/5.8.0", 0xbfffa800) = -1 ENOENT (No such file or directory)
stat64("/home/jestill/src/bioperl/bioperl-live/i386-linux-thread-multi", 0xbfffa800) = -1 ENOENT (No such file or directory)
stat64("/usr/lib/perl5/5.8.0/5.8.0/i386-linux-thread-multi", 0xbfffa800) = -1 ENOENT (No such file or directory)
stat64("/usr/lib/perl5/5.8.0/5.8.0", 0xbfffa800) = -1 ENOENT (No such file or directory)
stat64("/usr/lib/perl5/5.8.0/i386-linux-thread-multi", {st_mode=S_IFDIR|0755, st_size=8192, ...}) = 0
ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
_llseek(0, 0, 0xbfffa5f0, SEEK_CUR)     = -1 ESPIPE (Illegal seek)
ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
_llseek(1, 0, 0xbfffa5f0, SEEK_CUR)     = -1 ESPIPE (Illegal seek)
ioctl(2, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
_llseek(2, 0, 0xbfffa5f0, SEEK_CUR)     = -1 ESPIPE (Illegal seek)
open("/dev/null", O_RDONLY|O_LARGEFILE) = 3
ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, 0xbfffa688) = -1 ENOTTY (Inappropriate ioctl for device)
_llseek(3, 0, [0], SEEK_CUR)            = 0
fcntl64(3, F_SETFD, FD_CLOEXEC)         = 0
fstat64(3, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL}, 8) = 0
readlink("/proc/self/exe", "/usr/bin/perl", 4095) = 13
getpid()                                = 20752
getppid()                               = 20751
close(3)                                = 0
exit_group(0)                           = ?
Process 20752 detached

It looks like the problem is toward the bottom involving SNDCTL_TMR_TIMEBASE but I really have no clue what that is. I am giving up on using Bio::Phylo at least for now.

Monday, June 18, 2007

Bio::Phylo

I have installed Bio::Phylo from CPAN. I will first try to get phyimport.pl up and running using the Bio::Phylo object model for nodes. If this works without too much trouble, I will use Bio::Phylo for PhyExport as well. This should allow more of the information related to the tree to be added to the database and exported to output files.

Bio::Phylo documentation.

Update:
I am getting the following error when trying to parse a NEXUS file:
"Inappropriate ioctl for device"

I don't know what is going on with that. If I don't get this working quickly I will abandon Bio::Phylo.

Week 4 Project Update

I spent most of last week out of town for a meeting so I have some catching up to do this week.

Last week:

Tried to get a more stable dsn parser to work by using DBI subfunction parse_dsn
Finished up PhyImport
Updated code to better fit with existing bioperl coding standards
Fixed my installation of bioperl (This was to get NEXUS file import working)

I am now working from bioperl-live

Committed biosql-phylodb-mysql.sql to biosql-schema CVS

This is my first commit to a group project :)
I hope bioperl converts to SVN soon

This week:

Finish PhyExport to export trees from the database to text files
Start PhyOpt by getting a precomputed nested sets working
Try to figure out if I really want to continue to use Bio::Tree or switch to Bio::Phylo

The Bio::Phylo object seems more rich but it is not a bioperl module

PhyImport: Added root node info to tree table

I have added the root node to the tree table. This was not too hard to do using the Tree object in bioperl.

I tried to get the parse_dsn subfunction from DBI to work, but it is not parsing the dsn correctly, and I need to move forward on other aspects of the project. For now, only a specific dsn string will be properly parsed by PhyImport.
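For the record, a hand-rolled split along these lines handles the one dsn layout I am using. This is a minimal sketch, not the code in PhyImport: the sub name split_dsn and the example dsn string are assumptions, and it only understands the DBI:driver:key=value;key=value form.

```perl
use strict;
use warnings;

# Split a DBI-style dsn such as "DBI:mysql:database=biosql;host=localhost"
# into the driver name and a hash of its key=value attributes.
# Returns an empty list if the string does not look like a DBI dsn.
# (split_dsn is a hypothetical helper for this sketch.)
sub split_dsn {
    my ($dsn) = @_;
    my ( $scheme, $driver, $rest ) = split /:/, $dsn, 3;
    return if !defined($driver) || lc($scheme) ne 'dbi';
    my %attr;
    for my $pair ( split /;/, defined $rest ? $rest : q{} ) {
        my ( $key, $value ) = split /=/, $pair, 2;
        $attr{$key} = $value;
    }
    return ( $driver, \%attr );
}

my ( $driver, $attr ) = split_dsn('DBI:mysql:database=biosql;host=localhost');
print "driver=$driver db=$attr->{database} host=$attr->{host}\n";
```

Anything that deviates from this layout (for example a bare database name after the driver) would need extra handling.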

I have changed the name of programs to lowercase.

I am considering switching to using R. Vos's Bio::Phylo. It seems like a richer object model for phylogenies.

I am setting PhyImport aside for now to work on PhyExport but I will come back to it later.

Friday, June 15, 2007

PhyImport: Trying to fix import of NEXUS file

It seems like the trouble I am having with nexus file parsing has something to do with the installation of bioperl I was using. Running PhyImport.pl with the bioperl-live has fixed the problem.

Note to self:
To see what version of bioperl is being used, I need to do the following from the command line:

$ perl -MBio::Perl -le 'print Bio::Perl->VERSION;'

Thursday, June 14, 2007

Attempted to Post to CVS

I just tried to commit something (biosql-phylodb-mysql.sql) to the CVS server. This was my first attempt to commit with CVS and I am not sure if it worked. It seems like SVN is a bit easier to use.

PERL Coding Practices

I got a really good email from my mentor regarding coding suggestions and coding practices in PERL/bioperl. There have also been some recent discussions of coding practices on the bioperl mailing list. This has all had me looking up info on coding practices that I am linking here just so I know where to go for the links:

GNU standards for command-line interfaces
bioperl Wiki Pages:

PERL Critic Documentation

PERL Critic Module
I just installed this module

I am very self-taught when it comes to coding, and PERL is a very vulgar programming language, so it is good for me to see how other folks are implementing standards. This seems very important with group open source projects.

Tuesday, June 12, 2007

Week 3 Update Email

I am currently out of town for some wheat related work, below is the email update I sent out.

Week 3 project update: Command Line Topological Query Application for BioSQL

Last week I:

Made changes to MySQL phylo tables in the biosql-phylodb-mysql.sql (http://phylosoc2007jestill.googlecode.com/svn/trunk/sql/biosql-phylodb-mysql.sql) to get foreign keys working and to include recent changes in the biosql schema
Updated PhyInit.pl to include these schema changes
Completed a version of PhyImport that uses the TreeIO module of bioperl to import tree nodes and edges
Generated random trees to serve as test import trees for PhyImport

This week I will:

Figure out problems I am having importing NEXUS trees with PhyImport (something to do with my bioperl installation)
Add node and edge attribute information to PhyImport
Add tree root information to PhyImport
Begin PhyExport for whole tree export

Friday, June 8, 2007

PhyImport: Bio::TreeIO and Nexus problems

I was able to get nodes and edges loaded into the database for the example newick file using Bio::TreeIO. This is working for newick and New Hampshire extended files. However, I can't find a nexus file that Bio::TreeIO can seem to handle. :(. Perhaps this is due to the general chaos surrounding the NEXUS "standard", but I would like to get the import working for nearly all NEXUS files.

Maybe I should switch to the Bio::Phylo object, but I wanted to use the object model that was most tightly integrated with bioperl. I am trying to see if I can generate at least one "nexus" file that I can parse.

Since I can not even convert from a newick file to nexus, I seem to be having trouble with my installation of bioperl similar to a recent discussion: (http://portal.open-bio.org/pipermail/bioperl-l/2007-February/024829.html).

Thursday, June 7, 2007

PhyImport: Test Newick Format Tree

I added a randomly generated tree to the code repository. This is a simple newick format tree with 26 leaf nodes. The image links to the *.tre file.
The file is randtree_26.tre.

This will serve as a test file for the development of PhyImport.

The tree was generated using RandTree.pl. I have been having trouble with the NEXUS file I originally wanted to use.

I have stopped trying to get parseTreesPG.pl to work with the MySQL schema. I have moved on to using the Bio::TreeI object as I originally proposed. Since the tree above was generated with Bio::Tree::RandomFactory it works with Bio::TreeI.

PhyImport: Can't defer foreign keys in MySQL

The use of InnoDB with foreign keys now causes the following error in the parseTrees program:

DBD::mysql::st execute failed: Cannot add or update a child row: a foreign key constraint fails at ./parseTreesPG.pl line 710.

This is not a problem in PG because foreign key checks are deferrable. Since foreign keys are not deferrable in MySQL I am temporarily turning off FK checks in the PERL code:

$dbh->do("SET FOREIGN_KEY_CHECKS=0");
#UPDATE tree TABLE HERE
$dbh->do("SET FOREIGN_KEY_CHECKS=1");

to deal with this in MySQL.

This solves this problem, but now I am still getting problems with commit:
commit ineffective with AutoCommit enabled at ./parseTreesPG.pl line 316.
Commmit ineffective while AutoCommit is on at ./parseTreesPG.pl line 316.
DBD::mysql::db commit failed: Commmit ineffective while AutoCommit is on at ./parseTreesPG.pl line 316.

I therefore added a check to see if AutoCommit was on before attempting $dbh->commit:


unless ($dbh->{AutoCommit}) {
   $dbh->commit;
}

I am now trying to see if this will fix the problem without introducing new errors.

MySQL Schema Changes, Blog comments enabled

I've made changes to the MySQL schema to fit the changes made by H. Lapp to the Postgres version of the PhyloDB extensions. Since MySQL does not support booleans I used ENUM:

is_rooted ENUM ('FALSE', 'TRUE') DEFAULT 'TRUE'

in the tree table.

Comments are now enabled in the blog, I did not know that they were turned off.

Wednesday, June 6, 2007

PhyInit: INT(10) != INTEGER

I fixed the foreign key problems for some of the tables that make references to other PhyloDB tables. However, linking to the other BioSQL tables seems to be a problem because the INTEGER values in the Phylo tables are created as INT(11) while the INTEGER values in the other BioSQL tables are INT(10) UNSIGNED.

I am going to make all of the integer values in the PhyloDB tables INT(10) so that the foreign key values will work. This will also make the tables consistent with the rest of BioSQL.

PhyInit: Change to InnoDB tables causes ALTER TABLE errors

Changing the table types to InnoDB now causes problems with using ALTER TABLE to create foreign keys.
For example:

ALTER TABLE tree ADD CONSTRAINT FKNode
FOREIGN KEY (node_id) REFERENCES node (node_id);

is giving the error:

DBD::mysql::db do failed: Can't create table './biosql/#sql-cc7_bba.frm' (errno: 150) at ./PhyInit.pl line 363, <> line 1.

Typing the SQL code directly in the MySQL Command line gives:

ERROR 1005: Can't create table './biosql/#sql-cc7_ba6.frm' (errno: 150)

It looks like there may be some help in an online discussion of this issue. It is odd that it flags this as a "Can't create table" error when this is really an ALTER TABLE problem.

Info on Foreign Key constraints is also in the MySQL manual. The conditions for foreign key definitions that are listed in the MySQL manual are:

~~Both tables must be InnoDB tables and they must not be TEMPORARY tables.~~
All of my tables are now InnoDB tables so this is not the problem.
~~Corresponding columns in the foreign key and the referenced key must have similar internal data types inside InnoDB so that they can be compared without a type conversion. The size and sign of integer types must be the same. The length of string types need not be the same. For non-binary (character) string columns, the character set and collation must be the same.~~
Both columns in the broken SQL are INT(11) so this is probably not the problem.
In the referencing table, there must be an index where the foreign key columns are listed as the first columns in the same order. Such an index is created on the referencing table automatically if it does not exist.
This is it, adding indexes to the tables fixed the problem.
In the referenced table, there must be an index where the referenced columns are listed as the first columns in the same order.
This is it, adding indexes to the tables fixed the problem.
~~Index prefixes on foreign key columns are not supported. One consequence of this is that BLOB and TEXT columns cannot be included in a foreign key, because indexes on those columns must always include a prefix length.~~
This is not the problem since the columns are INT(11).
~~If the CONSTRAINT symbol clause is given, the symbol value must be unique in the database. If the clause is not given, InnoDB creates the name automatically.~~
This is not the problem since FKNode is a unique value in the database. As a test, I ran the SQL without specifying the symbol, and I still have the error.

I am crossing these off the list as I can ..
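Since the missing indexes turned out to be the culprit, here is a rough sketch of a helper that emits the index statement ahead of the constraint for each foreign key. It is only an illustration of the order of operations: the sub name fk_statements and the idx_ naming scheme are made up for the example, and the exact statements in PhyInit.pl may differ.

```perl
use strict;
use warnings;

# Build the two statements needed to add a foreign key on an InnoDB table:
# first an index on the referencing column, then the constraint itself.
# (fk_statements and the idx_ prefix are hypothetical choices.)
sub fk_statements {
    my ( $table, $column, $ref_table, $ref_column, $name ) = @_;
    return (
        "CREATE INDEX idx_${table}_${column} ON $table ($column)",
        "ALTER TABLE $table ADD CONSTRAINT $name "
          . "FOREIGN KEY ($column) REFERENCES $ref_table ($ref_column)",
    );
}

# Print the statements for the failing example from this post.
print "$_;\n"
    for fk_statements( 'tree', 'node_id', 'node', 'node_id', 'FKNode' );
```

The referenced column also needs an index; in this sketch that is assumed to come from node_id being the primary key of the node table.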

Transaction Support in MySQL

I am working with the parseTreesPG.pl script to make it work with MySQL and I am having trouble with transaction support. The use of

$dbh->commit();

is currently causing fatal errors with the message

commit ineffective with AutoCommit enabled at ./parseTreesPG.pl line 736

According to the documentation, this error message occurs when AutoCommit is on, which is always the case when transactions are not supported by the system you are using.

It looks like transaction support for MySQL has been around for a few years, but I have never worked with transactions before so this is new for me.

I am working through the Requirements for Transaction Support in MySQL to see where the trouble is.

The version of MySQL I am using (4.0.18-standard) should support transactions
The version of DBD::mysql I am using supports transactions
ISAM and MyISAM tables in MySQL do NOT support transactions.
The table types that do support transactions are: BDB, InnoDB and Gemini.

The code that I am using to create tables currently does not specify the table type, so MyISAM tables are being created. So .. my guess is that the MyISAM tables are the problem.

It looks like I will need to make sure that MySQL is creating InnoDB tables by modifying the CREATE TABLE syntax in the PhyInit.pl script to specify the table type as INNODB. This would be something like:

CREATE TABLE tree (
tree_id INTEGER NOT NULL auto_increment,
name VARCHAR(32) NOT NULL,
identifier VARCHAR(16),
node_id INTEGER NOT NULL
, PRIMARY KEY (tree_id)
, UNIQUE (name)
)TYPE=INNODB;

Monday, June 4, 2007

Week 2 Project update

Below is the weekly progress report email I sent to the Wg-phyloinformatics listserv.

Hi All --

Week 2 Update for: Command Line Topological Query Application for BioSQL

Last week:
* Updated project web page:
- to reflect changes in command line options
- linked to SVN source
- linked to existing code that is relevant to what I am working on
* Modified my original command line options to fit the standards used in the existing BioSQL scripts
* Wrote PhyInit.pl to initialize the phylogenetic data tables for BioSQL
- This currently assumes an existing BioSQL database
- DB handle info can be sent as:
(1) dsn string as ENV variable,
(2) dsn string at command line,
(3) separate command line vars (--host,--driver,--dbname) that are used to create dsn string
- A new DB will be created if a DB with the name in the dsn string does not exist
() This uses the --dbname or a series of split commands to get info from command line --dsn
() SQL create table code is hard coded in MySQL format
- Password can be entered in a 'secure' fashion if not passed at the command line.
* Given an existing DB, only the new tables will be created; any existing versions of these tables will be deleted first
- User is warned before deleting any existing data. This step does a record count to tell the user how many records would be deleted from any existing tables.
* Started PhyImport.pl to import phylogenetic data from NEXUS and Newick files
- Mainly just set up command line options and POD documentation
* Posted changes I made to the schema to get this to work in MySQL on the project code repository
* All new PERL code was placed in the project working repository listed below
* The -h or --help command line switch can be used to read POD documentation

This week:
* PhyImport.pl
- Add NEXUS file support
- Add Newick file support
* PhyInit.pl
- Check over POD documentation
- Add ability to use sql in the sqldir, this will create the full BioSQL schema if needed

Project Web: https://www.nescent.org/wg_phyloinformatics/PhyloSoC:Command_Line_Topological_Query_Application_for_BioSQL
Project blog: http://phylosoc2007jestill.blogspot.com
Working repository: http://code.google.com/p/phylosoc2007jestill

Friday, June 1, 2007

PhyInit: Create tables

I added the code to create the phylo tables if they do not exist. The other BioSQL tables are not currently created, and the current version uses hard coded SQL instead of running external SQL code. The current implementation will only work with MySQL.

Download of source available: PhyInit.pl

Thursday, May 31, 2007

PhyInit: Creating Database if It Does Not Exist

I worked more on defining the command line options that I want to use. These were updated on the project web page.

I incorporated these command line options into PhyInit. I added the ability to check for the existence of a MySQL database with the name passed from the command line. If a database with that name does not exist, a new database will be created. This assumes that the user has permission to create databases on the db host. I first ask if the user really wants to create the database before doing so. The creation of a new database will only work in MySQL.
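
The check-then-create logic can be sketched along these lines. This is an illustration, not the code in PhyInit.pl: both function names and the $confirm callback (standing in for the interactive yes/no prompt) are hypothetical, and $dbh is assumed to be a DBI handle connected to the MySQL server rather than to the database being tested for.

```perl
use strict;
use warnings;

# Return true if a database called $name is visible on the server.
# Uses MySQL's SHOW DATABASES, so this is MySQL specific.
sub database_exists {
    my ($dbh, $name) = @_;
    my $names = $dbh->selectcol_arrayref('SHOW DATABASES');
    return scalar( grep { $_ eq $name } @{$names} );
}

# Create the database only if it is missing and the user agrees.
# Returns 'exists', 'declined', or 'created'.
sub create_database_if_missing {
    my ($dbh, $name, $confirm) = @_;
    return 'exists'   if database_exists($dbh, $name);
    return 'declined' unless $confirm->($name);
    $dbh->do('CREATE DATABASE ' . $dbh->quote_identifier($name));
    return 'created';
}
```

Passing the prompt in as a code reference keeps the database logic separate from the terminal interaction, which also makes it easy to run without a user at the keyboard.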

For the duration of GSoC, I will use a MySQL database back end.

Today I found out how to use perldoc to print help statements from POD documentation. This is pretty useful since I used to always write PrintHelp subfunctions that duplicated what is written in the POD documentation.

system("perldoc $0");

Simple but not something that I have used before. I have not tested this in Windoze to see if it works across platforms.

Symbolic Link Note

I always forget how to do symbolic links in Linux so I thought I would leave a note here. I needed to do this for the scripts that are in biosql-schema.

cd /usr/local/bin
ln -s /usr/bin/perl ./perl

Wednesday, May 30, 2007

Command Line Tweaks and Starting PhyInit.pl

I have been slightly modifying the command line arguments I initially suggested to make the arguments consistent with the existing BioSQL related scripts (i.e. --dbuser in addition to -u will be valid arguments). This will require that I use Getopt::Long. I am also trying to decide if I want to mix caps in with the command line (i.e. should it be PhyInit.pl, phyinit.pl, OR phy_init.pl). These are pretty minor issues but something I want to make consistent with the preexisting bioperl code.

After looking through the existing BioSQL code, I really like the idea of defaulting many of the command line variables to environmental variables, for example:

 my $usrname = $ENV{DBI_USER};
 my $pass = $ENV{DBI_PASSWORD};

This is a somewhat obvious thing to do in retrospect, but not something that I have done before.
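
Combined with Getopt::Long, the environment defaults and the long/short option names could be handled roughly as follows. This is a sketch under assumptions rather than the actual PhyInit.pl code: parse_db_options is a hypothetical name, DBI_DSN is assumed as the environment variable holding the dsn, and the default driver and host values are made up for illustration.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Parse connection options from an argument list, falling back to
# environment variables when an option is not given on the command line.
sub parse_db_options {
    my @args = @_;
    my %opt = (
        dsn    => $ENV{DBI_DSN},     # assumed variable name
        dbuser => $ENV{DBI_USER},
        driver => 'mysql',           # assumed default
        host   => 'localhost',       # assumed default
    );

    # --dbuser and -u are both accepted; command line values
    # overwrite the defaults stored in %opt.
    GetOptionsFromArray(
        \@args, \%opt,
        'dsn=s', 'dbuser|u=s', 'driver=s', 'host=s', 'dbname=s',
    ) or die "Invalid command line options\n";

    # Build a dsn from the separate options if none was supplied.
    if (!defined $opt{dsn} && defined $opt{dbname}) {
        $opt{dsn} = "DBI:$opt{driver}:database=$opt{dbname};host=$opt{host}";
    }
    return \%opt;
}

# Typical call in a script: my $opt = parse_db_options(@ARGV);
```

Loading the environment values first and letting the command line overwrite them means an explicit option always wins over the environment.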

On the project page I have been going through the existing code in the codebase that is relevant to my proposed programs and linking to the locations of the code in CVS. I was wanting to finish this up tonight, but it looks like the wg_phyloinformatics wiki is down for the evening.

I have started the PhyInit.pl program and will make my first commit to subversion tonight. This will not be a finished program, but I want to be in the practice of making commits on a daily basis as much as possible.

Thursday, May 24, 2007

BioSQL Installed .. Connected Components

I have installed BioSQL on the machine that I am using for development. I had to modify the existing SQL code slightly to work on MySQL.

I have been working the past few days with the perl module Graph as part of the RepMiner project. I have been really happy with the connected components function. This provides a very quick way to generate some very basic clusters from All-By-All BLAST results. With 2GB of memory, I will be able to cluster most of my LTR retrotransposon data sets using this function.
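
To show what the clustering amounts to, here is a small dependency-free sketch of connected components over BLAST hit pairs. It is not the Graph module's implementation and not the RepMiner code: it is a plain breadth-first search, and the input format (a list of [query, subject] pairs) is an assumption made for illustration.

```perl
use strict;
use warnings;

# Group sequence ids into clusters: two ids share a cluster when a
# chain of BLAST hits links them. $pairs is a list of [query, subject].
sub connected_components {
    my ($pairs) = @_;

    # Build an undirected adjacency list from the hit pairs.
    my %adj;
    for my $pair (@{$pairs}) {
        my ($q, $s) = @{$pair};
        $adj{$q}{$s} = 1;
        $adj{$s}{$q} = 1;
    }

    # Breadth-first search from every id not yet assigned to a cluster.
    my (%seen, @clusters);
    for my $start (sort keys %adj) {
        next if $seen{$start};
        $seen{$start} = 1;
        my @queue = ($start);
        my @members;
        while (@queue) {
            my $node = shift @queue;
            push @members, $node;
            for my $next (sort keys %{ $adj{$node} }) {
                next if $seen{$next}++;
                push @queue, $next;
            }
        }
        push @clusters, [ sort @members ];
    }
    return \@clusters;
}
```

Memory use grows with the number of distinct hit pairs, since every pair is stored twice in the adjacency list.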

Thursday, May 17, 2007

Transitive Closure makes my brain hurt

The more I read about transitive closure the more my brain hurts. I am currently trying to implement a simple transitive closure algorithm in PERL (Warshall's) for basic classification of repeat families.
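
For reference, Warshall's algorithm on a boolean adjacency matrix fits in a few lines. The sketch below assumes the relationships are stored as an array of arrays of 0/1 values; the function name and the matrix representation are illustrative choices, not the RepMiner implementation.

```perl
use strict;
use warnings;

# Warshall's algorithm on a boolean adjacency matrix (array of arrays).
# $m->[$i][$j] is true when there is a direct edge from node $i to node $j.
# Returns a new matrix where entry [$i][$j] is 1 when node $j can be
# reached from node $i by a path of one or more edges.
sub transitive_closure {
    my ($m) = @_;
    my $n = scalar @{$m};

    # Work on a copy so the input matrix is left untouched.
    my @reach;
    for my $row (@{$m}) {
        push @reach, [ map { $_ ? 1 : 0 } @{$row} ];
    }

    # Allow node $k as an intermediate step: if $i reaches $k and
    # $k reaches $j, then $i reaches $j.
    for my $k (0 .. $n - 1) {
        for my $i (0 .. $n - 1) {
            next unless $reach[$i][$k];
            for my $j (0 .. $n - 1) {
                $reach[$i][$j] = 1 if $reach[$k][$j];
            }
        }
    }
    return \@reach;
}
```

The three nested loops make this cubic in the number of nodes, so it is only practical for fairly small matrices.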

Monday, May 14, 2007

RepMiner & Installing BioSQL Plans

I've been reading the "Trees and Hierarchies in SQL" book and I have been getting the RepMiner project ready for publication. I have been adding POD documentation to the RepMiner programs. The more I use POD, the more I like it and wonder why I have never used it before. I have also been really happy with subversion. It seems like it is a lot easier to use than CVS.

I have looked over my CVS checkout of bioperl-db. I cleaned up some space on my machine to have the room to work this summer. I will be installing a BioSQL database this week to play around with. I have still been searching for good references on transitive closure in an SQL framework.

Thursday, May 3, 2007

Paperwork, Books and POD

I have submitted my paperwork to google, and my copy of "Joe Celko's Trees and Hierarchies in SQL for Smarties" arrived a couple of days ago. I have been pretty happy with the results I get from including POD documentation in PERL code.

Monday, April 30, 2007

Genealogy Transitive Closure

This is just a reminder to myself to check out how the different open source genealogy software projects implement transitive closure type algorithms.

Sunday, April 29, 2007

CVS Account Activated

My CVS account for BioPerl/BioSQL was approved. I have logged on to the account and changed the password and signed up for the obf-developers mailing list.

After working with POD documentation yesterday, it seems like a pretty straightforward way to do inline documentation. I will probably switch a lot of my existing code over to POD.

Saturday, April 28, 2007

POD Documentation

Today I am playing around with including POD documentation in my PERL scripts. I am working loosely with the type of POD documentation that is used by the BioPerl project. Using POD should make it easier to extract documentation from the PERL code. It does not look as good in emacs as the internal documentation I usually make, but it will be easier to extract the relevant information from my PERL code.

Monday, April 23, 2007

CVS Accounts and Subversion.

I requested a CVS account from both BioSQL and BioPerl.

I checked out the bioperl-db from bioperl CVS to get familiar with this code.

I have been playing around with subversion as supplied by google at code.google. I have been using jperl as my test space. I will probably continue to use google for hosting the repository for the jperl code but will use open-bio CVS for my PhyloSoC project.

I don't like the wiki syntax that is available from google, so I will probably stick with the MediaWiki pages that are available through PhyloSoC and NESCent. The project homepage will therefore be hosted at the PhyloSoC wiki.

Friday, April 20, 2007

The Beginning

Started the blog for the google summer of code project "A Perl-based Command Line Interface to a Topological Query Application for BioSQL in Support of High Throughput Classification and Analysis of LTR Retrotransposons in Plant Genomes". I don't know if I will actually make use of this.

Today I worked on getting subversion working with Emacs on my work computer. I also worked a bit on the code.google.com website .. checking out the available features etc.