
    Quiet Times


    Things have been quiet here because I have been focusing my energy on work and family. To simplify things, I’m going to focus nearly all of my blogging on the Panda’s Thumb for the time being.

    I’ve had the good fortune of having some papers published recently. The first one is a methodology paper concerning a way of extracting phylogenetic information from regions of multiple sequence alignments that are full of indels and difficult to align:

    PICS-Ord: unlimited coding of ambiguous regions by pairwise identity and cost scores ordination. (link)

    Robert Lücking, Brendan P Hodkinson, Alexandros Stamatakis and Reed A Cartwright

    BMC Bioinformatics 2011, 12:10 doi:10.1186/1471-2105-12-10

    My co-author, Brendan Hodkinson, has already covered it on his blog.

    In molecular biology, an alignment is a partial reconstruction of the evolutionary history of a group of sequences. In an alignment, all residues found in the same column are considered to be descended from a single residue in the ancestral sequence. (Of course, insertions violate this description, but I won’t get into that.) Alignments are not direct observations. They are inferences based on the patterns of sequences found in the dataset. Often there are particular areas in which the alignment is difficult to resolve. Take this example:

    A typical problem in multiple sequence alignments where a section is full of gaps and contains a complicated phylogenetic signal. Dark red: high certainty that the alignment is accurate; dark blue: low certainty that the alignment is accurate.

    It was constructed via the GUIDANCE webserver. (A great resource that everyone should use.) In this example, we have a region defined by a lot of sequence variation created by many insertions and deletions. The alignment is not well defined here, and in most applications it will just be removed, and the data “thrown away”.

    But is this the only solution? In our paper we develop a methodology, dubbed PICS-Ord (download), that provides an easy solution for extracting phylogenetic information from problematic regions chosen by its user. PICS-Ord works through a three-step process:

    1. Realign the segments in pairs using Ngila, and calculate the likelihood of the alignment from an evolutionary model. This produces a distance matrix of the segments.
    2. Ordinate the distance matrix using principal coordinate analysis (PCoA). This assigns each segment to a point in n-1-dimensional space.
    3. Quantize each dimension into a set of characters.

    This might seem a bit odd at first. “Why not just use the distance matrix directly?” That would be great if we could, but there aren’t any phylogenetic programs that we know of that allow the mixing of distance matrices and sequence data. With our method, we get discrete, ordered characters that can be used in popular programs like RAxML.
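
To make steps 2 and 3 concrete, here is a minimal pure-Python sketch. It is not the PICS-Ord implementation: it assumes the distance matrix from step 1 is already available, finds the PCoA axes with a simple power iteration, and quantizes each axis with equal-width bins, which is a binning rule chosen only for illustration. The function names and the `states` parameter are hypothetical.

```python
import math

def double_center(dist):
    """Return B = -1/2 * J * D^2 * J for a symmetric distance matrix."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    return [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
             for j in range(n)] for i in range(n)]

def top_eigenpair(mat, iters=500):
    """Power iteration for the dominant eigenpair of a symmetric matrix."""
    n = len(mat)
    vec = [1.0 + i for i in range(n)]  # arbitrary non-constant start vector
    for _ in range(iters):
        nxt = [sum(mat[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm < 1e-12:  # matrix is numerically zero, nothing left to find
            return 0.0, vec
        vec = [x / norm for x in nxt]
    # Rayleigh quotient of the (unit-length) vector gives the eigenvalue.
    val = sum(vec[i] * sum(mat[i][j] * vec[j] for j in range(n))
              for i in range(n))
    return val, vec

def pcoa(dist, max_axes=None, tol=1e-9):
    """Classical PCoA: one coordinate list per item, at most n-1 axes."""
    n = len(dist)
    b = double_center(dist)
    axes = []
    for _ in range(max_axes or n - 1):
        val, vec = top_eigenpair(b)
        if val <= tol:  # stop at zero or negative eigenvalues
            break
        axes.append([math.sqrt(val) * x for x in vec])
        # Deflate so the next round finds the next axis.
        b = [[b[i][j] - val * vec[i] * vec[j] for j in range(n)]
             for i in range(n)]
    return [[axis[i] for axis in axes] for i in range(n)]

def quantize(coords, states=10):
    """Code each axis as integer states 0..states-1 (assumed equal-width bins)."""
    if not 1 <= states <= 10:
        raise ValueError("states must be 1..10 to keep one digit per character")
    coded = []
    for col in zip(*coords):
        lo, hi = min(col), max(col)
        width = (hi - lo) or 1.0
        coded.append([min(states - 1, int((x - lo) / width * states))
                      for x in col])
    return ["".join(str(s) for s in row) for row in zip(*coded)]

# Three segments whose pairwise distances fit on a line at 0, 1 and 3.
toy = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]
for name, row in zip(["seg_a", "seg_b", "seg_c"], quantize(pcoa(toy))):
    print(name, row)
```

Because the characters are ordered, two segments that sit close together on an axis get neighbouring states, which is the property a tree-building program can then exploit.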

    There are three example files in the PICS-Ord distribution, and I’ll illustrate its usage with example1.fas. The alignment of these sequence fragments is messy:

     100 114
    sequence_001 ----------------------------------------------------------------------------tatactatcta---------------------------
    sequence_002 -------------------------------------------------------------------aattgtatttatactatata---------------------------
    sequence_003 -------------------------------------------------------------------tttaagatttattctatatt---------------------------
    sequence_004 tttaggattaattttata--------------------------------------------------------taatactaatata---------------------------
    sequence_005 -------------gatgg--------------------------------------------------------ttttacctatata---------------------------
    sequence_006 ---------------------------------------------------------------------------tatcattatgca---------------------------
    sequence_007 ---------------------------------------------------------------------------tatcattatgca---------------------------
    sequence_008 -------------------------------------------------------------------------atatgtttaagata---------------------------
    sequence_009 -------------------------------------------------------------------------atatgtttaagata---------------------------
    sequence_010 -------------------------------------------------------------------------atatgtttaagata---------------------------
    sequence_011 -------------gtac----------------------------------------------------------aattataatata---------------------------
    sequence_012 -------------gtac----------------------------------------------------------aattataatata---------------------------
    sequence_013 -------------gtac----------------------------------------------------------taatttaatata---------------------------
    sequence_014 -------------ctac-----------------------------------------------------------aatataatata---------------------------
    sequence_015 -------------ctac-----------------------------------------------------------aatataatata---------------------------
    sequence_016 -------------ctac-----------------------------------------------------------attaaaatata---------------------------
    sequence_017 -------------ctac-----------------------------------------------------------attaaaatata---------------------------
    sequence_018 -------------gtat-----------------------------------------------------------aatttaatcta---------------------------
    sequence_019 -------------gtat-----------------------------------------------------------attttaatcta---------------------------
    sequence_020 -------------------------------------------------------------------------------ataagata---------------------------
    sequence_021 -------------------------------------------------------------------------------ataagata---------------------------
    sequence_022 --------------------------------------------------------------------------attataattaata---------------------------
    sequence_023 --------------------------------------------------------------------------attataattaata---------------------------
    sequence_024 -------------------------------------------------------------------------------ataagata---------------------------
    sequence_025 -------------------------------------------------------------------------------ataagata---------------------------
    sequence_026 ----------------------------------------------------------------------------aaaaaaaaata---------------------------
    sequence_027 -----------------------------------------------------------------------------aaaaaaaata---------------------------
    sequence_028 -------------------------------------------------------------------------------acaaaata---------------------------
    sequence_029 -------------------------------------------------------------------------------acaagata---------------------------
    sequence_030 --------------------------------------------------------------------------------acaaata---------------------------
    sequence_031 -------------------------------------------------------------------------------acaaaata---------------------------
    sequence_032 -------------gaat-----------------------------------------------------------aatattaaata---------------------------
    sequence_033 -------------gaat-----------------------------------------------------------aatattaaata---------------------------
    sequence_034 -------------gaaa-----------------------------------------------------------aatattaaata---------------------------
    sequence_035 -------------gtat-----------------------------------------------------------tctttaatata---------------------------
    sequence_036 -------------gtat-----------------------------------------------------------tatttaatcta---------------------------
    sequence_037 -------------gtat-----------------------------------------------------------tatttaatata---------------------------
    sequence_038 -------------gtat-----------------------------------------------------------tatttaatcta---------------------------
    sequence_039 -----------------------------------------------------------------------------gttttatata---------------------------
    sequence_040 -----------------------------------------------------------------------------gtttaatata---------------------------
    sequence_041 -------------------------------------------------------------------------atcagtttaatacg------------------ctgagtgat
    sequence_042 -------------------------------------------------------------------------accagtttaattta------------------ctgggtgat
    sequence_043 ----------------------------------------------------------------------------------------------ctcagtttctgctgagtggt
    sequence_044 ----------------------------------------------------------------------------agtttaatatg------------------ctgattgat
    sequence_045 --------------------------------------------------------------------------------atatgta---------------------------
    sequence_046 --------------------------------------------------------------------------------atatgta---------------------------
    sequence_047 --------------------------------------------------------------------------------ataagta---------------------------
    sequence_048 --------------------------------------------------------------------------------ataagta---------------------------
    sequence_049 --------------------------------------------------------------------------------ataagta---------------------------
    sequence_050 --------------------------------------------------------------------------------atatgta---------------------------
    sequence_051 -----------------------------------------------------------------------------gttttctaat---------------------------
    sequence_052 -----------------------------------------------------------------------------gtttactaaa---------------------------
    sequence_053 -----------------------------------------------------------------------------gtttactaat---------------------------
    sequence_054 -----------------------------------------------------------------------------gtttactaat---------------------------
    sequence_055 -------------------------------------------------------------------------------gcta-aaa---------------------------
    sequence_056 -------------------------------------------------------------------------------gcta-aaa---------------------------
    sequence_057 -------------------------------------------------------------------------------gcta-aaa---------------------------
    sequence_058 -----------------------------------------------------------------------------gtttactgaa---------------------------
    sequence_059 -----------------------------------------------------------------------------gtttactgaa---------------------------
    sequence_060 -----------------------------------------------------------------------------gtttactgaa---------------------------
    sequence_061 -----------------------------------------------------------------------------gttagctgaa---------------------------
    sequence_062 -----------------------------------------------------------------------------gttagctgaa---------------------------
    sequence_063 -----------------------------------------------------------------------------gttagctgaa---------------------------
    sequence_064 -------------------------------------------------------------------------------gttt-aaa---------------------------
    sequence_065 -------------------------------------------------------------------------------gttt-aaa---------------------------
    sequence_066 -------------------------------------------------------------------------------gttt-aaa---------------------------
    sequence_067 -------------------------------------------------------------------------------gcta-aaa---------------------------
    sequence_068 -------------------------------------------------------------------------------gcta-aaa---------------------------
    sequence_069 -----------------------------------------------------------------------------atttacttaa---------------------------
    sequence_070 -----------------------------------------------------------------------------atttacttaa---------------------------
    sequence_071 -----------------------------------------------------------------------------atttacttaa---------------------------
    sequence_072 ---------------------------------------------------------------------------------gttaaa---------------------------
    sequence_073 ---------------------------------------------------------------------------------gttaaa---------------------------
    sequence_074 aattttattaattactttagtaattaataaggttattttaagtaacagcaaaatattagttaaaagcgttgct-tgcaattagtaaagt--------------agca-ttatta
    sequence_075 aattatattaattactttagtaattaaatttgttatttttagtaacagcaaaatattagttacaagcgttgct-tgtaattagtaaagt--------------agca-ttatta
    sequence_076 ---------------------------------------------------------------------------------ttttta---------------------------
    sequence_077 ---------------------------------------------------------------------------------ttttta---------------------------
    sequence_078 ---------------------------------------------------------------------------------ttttta---------------------------
    sequence_079 ---------------------------------------------------------------------------------ttttta---------------------------
    sequence_080 -------------gaag-----------------------------------------------------------attaataacta---------------------------
    sequence_081 -----------------------------------------------------------------------------atttatatta---------------------------
    sequence_082 -----------------------------------------------------------------------------atttatatta---------------------------
    sequence_083 actcctact------ttaaacatttagtagtgtcgaacctactgatagcatctggttttctattgg--------tacttataacataaccactaaatatttagagtattaatta
    sequence_084 actcctact------ttaaacatttagtagtgtcgaacctactgatagcatctggttttctattgg--------tacttataacataaccactaaatatttagagtattaatta
    sequence_085 -------------gaaa----------------------------------------------------------taacagtaacta---------------------------
    sequence_086 -------------aaag-----------------------------------------------------------attagtaacta---------------------------
    sequence_087 aattttaca------tttagtttttaatctttatgtttaaaa----acatgtatgctatttatatg--------tatatataatatagt--------------agaacttacaa
    sequence_088 aattttact-------------------ttgggt-tttaaaa----actagtatgctatgtttatatattaatttatatatcatatagt--------------agaacttacaa
    sequence_089 aattttact------ctt--tttttaagttttat-atttaaa----atctgtatgctatgtttatatattaatttatatataatatagt--------------agaacttacaa
    sequence_090 aattttact------ctt--tttttaagttttat-atttaaa----atctgtatgctatgtttatatattaatttatatataatatagt--------------agaacttacaa
    sequence_091 -------------gtac-----------------------------------------------------------ataataatata---------------------------
    sequence_092 -------------gtaca--------------------------------------------------------taataataatata---------------------------
    sequence_093 -------------gtaca--------------------------------------------------------taataataatata---------------------------
    sequence_094 -------------gtac-----------------------------------------------------------ataataatata---------------------------
    sequence_095 ---------------------------------------------------------------ttttttataccaataaataatata---------------------------
    sequence_096 ---------------------------------------------------------------ttttttataccaataaataatata---------------------------
    sequence_097 ---------------------------------------------------------------ctatttata-taataaataatata---------------------------
    sequence_098 -------------ctat-----------------------------------------------------------ataaaaatata---------------------------
    sequence_099 -------------ctat-----------------------------------------------------------ataaaaatata---------------------------
    sequence_100 -------------ctat-----------------------------------------------------------ataaaaatata---------------------------
    

    But instead of throwing it away, you can process it with PICS-Ord and get a clean set of ordered characters that contain approximately the same phylogenetic information as the sequences above.

        100    20
    sequence_001 53221002101000000010
    sequence_002 44121113101010000000
    sequence_003 53211103102011000100
    sequence_004 53321103111000010100
    sequence_005 53211003101000001000
    sequence_006 53221002001000000000
    sequence_007 53221002001000000000
    sequence_008 43220112011000000000
    sequence_009 43220112011000000000
    sequence_010 43220112011000000000
    sequence_011 53221012011000000000
    sequence_012 53221012011000000000
    sequence_013 53221012011000100000
    sequence_014 53321012001010100000
    sequence_015 53321012001010100000
    sequence_016 53221013001000000000
    sequence_017 53221013001000000000
    sequence_018 53221012001000100000
    sequence_019 53221012001000001000
    sequence_020 53220102011000000000
    sequence_021 53220102011000000000
    sequence_022 53121012011010000000
    sequence_023 53121012011010000000
    sequence_024 53220102011000000000
    sequence_025 53220102011000000000
    sequence_026 53220002001000000000
    sequence_027 53220002001000000000
    sequence_028 53120102011000000000
    sequence_029 53220102011000000000
    sequence_030 53120002011000000000
    sequence_031 53120102011000000000
    sequence_032 53220002111100000000
    sequence_033 53220002111100000000
    sequence_034 53220002111000000000
    sequence_035 53221012011000001000
    sequence_036 53221012001000000000
    sequence_037 53221012011000000000
    sequence_038 53221012001000000000
    sequence_039 53221102111000000000
    sequence_040 53211002011000000000
    sequence_041 53300112011000000000
    sequence_042 53200112011010001010
    sequence_043 53300103001100001001
    sequence_044 53200112001110000000
    sequence_045 53120112001000000000
    sequence_046 53120112001000000000
    sequence_047 53120102001000000000
    sequence_048 53120102001000000000
    sequence_049 53120102001000000000
    sequence_050 53120112001000000000
    sequence_051 53211002011000000000
    sequence_052 53111002011000000000
    sequence_053 53211002011000000000
    sequence_054 53211002011000000000
    sequence_055 53110002001010000000
    sequence_056 53110002001010000000
    sequence_057 53110002001010000000
    sequence_058 43111002011000000000
    sequence_059 43111002011000000000
    sequence_060 43111002011000000000
    sequence_061 53201002011000000000
    sequence_062 53201002011000000000
    sequence_063 53201002011000000000
    sequence_064 43111002011000000000
    sequence_065 43111002011000000000
    sequence_066 43111002011000000000
    sequence_067 53110002001010000000
    sequence_068 53110002001010000000
    sequence_069 43111002011000000000
    sequence_070 43111002011000000000
    sequence_071 43111002011000000000
    sequence_072 53111002001000000000
    sequence_073 53111002001000000000
    sequence_074 59021102011001100000
    sequence_075 59020012001110001000
    sequence_076 53121102001100000000
    sequence_077 53121102001100000000
    sequence_078 53121102001100000000
    sequence_079 53121102001100000000
    sequence_080 53220002001100000000
    sequence_081 53121102001000000000
    sequence_082 53121102001000000000
    sequence_083 90021002001000000000
    sequence_084 90021002001000000000
    sequence_085 53220002000100000000
    sequence_086 53120002000101000000
    sequence_087 02121100001000000000
    sequence_088 02020003000010100000
    sequence_089 02021013011100000000
    sequence_090 02021013011100000000
    sequence_091 53321002001000000000
    sequence_092 53321002011000000000
    sequence_093 53321002011000000000
    sequence_094 53321002001000000000
    sequence_095 53321202010100001000
    sequence_096 53321202010100001000
    sequence_097 43321102011000001000
    sequence_098 53321003001000000000
    sequence_099 53321003001000000000
    sequence_100 53321003001000000000
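
As a rough illustration of what "ordered" buys you, the sketch below reads a matrix laid out like the one above (a header with the number of taxa and characters, then one name and one digit string per line) and compares rows by summing absolute state differences, which is how ordered multistate characters are usually scored under parsimony. The helper names are hypothetical, and the three rows are an excerpt from the output above with the header adjusted to match.

```python
def parse_matrix(text):
    """Parse 'ntax nchar' followed by 'name states' lines into a dict."""
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    ntax, nchar = int(lines[0][0]), int(lines[0][1])
    rows = {name: states for name, states in lines[1:]}
    if len(rows) != ntax:
        raise ValueError("taxon count does not match header")
    if any(len(states) != nchar for states in rows.values()):
        raise ValueError("character count does not match header")
    return rows

def ordered_distance(a, b):
    """Sum of absolute state differences: state 0 -> 2 costs two steps."""
    return sum(abs(int(x) - int(y)) for x, y in zip(a, b))

# Excerpt of the PICS-Ord output shown above (header changed to 3 taxa).
excerpt = """
3 20
sequence_001 53221002101000000010
sequence_006 53221002001000000000
sequence_083 90021002001000000000
"""

rows = parse_matrix(excerpt)
print(ordered_distance(rows["sequence_001"], rows["sequence_006"]))  # 2
print(ordered_distance(rows["sequence_006"], rows["sequence_083"]))  # 9
```

The short fragments 001 and 006 end up two steps apart, while the long sequence 083 is nine steps from 006, mirroring how different they look in the raw alignment.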
    

    I haven’t been keeping up with my Calix Cari polls this year for college football. But now that the regular season has ended, I have found time to produce one. The events of this season have been rather unpredictable. Of course, by the end of the season there were few surprises. Auburn appeared out of nowhere to become #1 on the strength of a once-in-a-decade player who fit perfectly into their offensive system. (Yay for the booster who had cash to spare in this economy. We will see if their season stands the test of time.) But in my calculation Auburn is only #3, behind Oregon and ¡Stanford! (I still think Harbaugh would make a smooth transition into the coach’s chair at UGA but wouldn’t be there long. It’s good that Richt was retained.)

    It appears my algorithm likes the Pac-10 over the SEC, and Auburn lost ground because of its tight victories early in the season.

    Rank  Team         Record  Quality
    1     Oregon       12-0    7.3937
    2     Stanford     11-1    7.2491
    3     Auburn       13-0    7.1707
    4     Oklahoma     11-2    7.0580
    5     TCU          12-0    6.9927
    6     Boise St     11-1    6.8977
    7     Missouri     10-2    6.7980
    8     Ohio State   11-1    6.5913
    9     Texas A&M    9-3     6.5043
    10    Oklahoma St  10-2    6.4876

    Ngila 1.3 Released

    It has been a long time coming, but I have finally released Ngila 1.3. This version fixes a few bugs and includes many new features.

    • Use CMake for compilation and installation
    • New scaling option enabled by default (identical sequences default to cost of 0)
    • Protein evolutionary models: aazeta and aageo
    • Fasta and Phylip format output support
    • Clustal and Phylip format input support
    • Report sequence identity measure
    • Matrix output formats for distance measures
    • Look for “ngilarc” file in the home directory.
    • New separator option
    • New const-align option
    • Replace arg-file option with ngilarc option.
    • Use custom zeta function if GSL not found.
    • Optimize size of travel table.
    • Ordering of --pairs-all fixed
    • bug fix for output of large alignments >10kb
    • minor bug fix for geo model

    The Working Life


    I apologize for things being slow on this blog. I’ve been knee deep in programming, manuscripts, grant proposals, and teaching. I’m hoping to have results to share in the near future. In the meantime, you can follow some of my activities on the Panda’s Thumb.

    I will say that the development version of Dawg now supports codon models, and Ngila has some new features as well.

    Odd Hack


    This machine got partially hacked over the weekend. From what I can tell, Ziproxy was compromised and used to submit spam email through my system. Because my mail server accepts local email, it was going out. It looks like only Yahoo addresses were being hit. Of course, the spam was coming from China.

    Since I turned off Ziproxy, I haven’t seen any odd email originating from my machine.

    Dawg 2


    Dawg created its first protein sequences today. Woot!

    Bama Wins 2009 Calix Cari


    The announcement is a bit late, but after beating Texas, Bama has won the 2009 Calix Cari.

    Rank  Team     Record  Quality
    1     Alabama  14-0    7.8239
    2     Florida  13-1    7.4268
    3     Texas    13-1    7.1820