I am a Brazilian researcher and developer who loves learning new technologies and taking on and accomplishing challenges.

My hobbies include reading, learning languages on Duolingo (French, German, Italian and Russian), swimming, drawing animals on paper and drawing buildings in AutoCAD.

Topics I have worked on: bioinformatics, semantic web, machine learning, data analysis, natural language processing, back-end and front-end development.

Portfolio

PredPrIn

PredPrIn is a scientific workflow that predicts protein-protein interactions (PPIs) by using machine learning to combine multiple PPI detection methods from three categories: structure-based, primary amino acid sequence-based and functional annotation-based. It is composed of three main modules:

  • First - Construction of a knowledge base from the available protein annotations. This base is reused in other prediction experiments, which saves time and makes the workflow more efficient.
  • Second - Generation of numerical features from several classes of evidence, such as Gene Ontology (GO) annotations, domain interactions, metabolic pathway participation and sequence-based interaction. For the GO branches, we conducted a study to identify the semantic similarity method that best improves the workflow's performance. In this module, I implemented a modified version of the original stacking ensemble technique, adapted to the biological domain of interest.
  • Third - Classification of the samples with the AdaBoost classifier, followed by the computation of performance evaluation metrics and the generation of the trained model.
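The GO semantic similarity step of the second module can be illustrated with a minimal sketch of Resnik similarity, a classic measure that scores two GO terms by the information content (IC) of their most informative common ancestor, combined with a best-match average (BMA) to compare the term sets of two proteins. This is an illustration only, not the PredPrIn code: the DAG, the term IDs and the annotation counts are hypothetical toy data, and Resnik with BMA is not necessarily the method the study selected.

```python
import math

# Hypothetical toy GO-like DAG: each term maps to its direct parents.
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "A1": ["A"],
    "A2": ["A"],
    "B1": ["B"],
}

# Hypothetical number of proteins annotated directly to each term.
DIRECT_COUNTS = {"root": 0, "A": 2, "B": 3, "A1": 4, "A2": 1, "B1": 2}


def ancestors(term):
    """Return the term itself plus all of its ancestors in the DAG."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = -log p(t), where p(t) also counts annotations to descendants."""
    totals = {t: 0 for t in PARENTS}
    for term, count in DIRECT_COUNTS.items():
        # An annotation to a term implies annotations to all its ancestors.
        for anc in ancestors(term):
            totals[anc] += count
    root_total = totals["root"]
    # Terms with no annotations have undefined IC and are skipped.
    return {t: math.log(root_total / n) for t, n in totals.items() if n > 0}


IC = information_content()


def resnik(t1, t2):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(t1) & ancestors(t2)
    return max(IC[a] for a in common)


def bma(terms1, terms2):
    """Best-match average of term-level similarities between two proteins."""
    best1 = [max(resnik(a, b) for b in terms2) for a in terms1]
    best2 = [max(resnik(b, a) for a in terms1) for b in terms2]
    return (sum(best1) + sum(best2)) / (len(best1) + len(best2))
```

For example, `bma(["A1"], ["A2", "B1"])` compares a protein annotated with `A1` against one annotated with `A2` and `B1`; the resulting score could serve as one numerical feature for the classifier.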

Tags: Python, Gene Ontology, semantic similarity, scientific workflow, parallelization, machine learning.

PPIVPro

Python pipelines that filter positively predicted protein interactions according to two criteria: (i) association rules over cellular components, mined from the gold-standard protein-protein interaction data in HINT; and (ii) text mining of scientific papers published on PubMed, extracting sentences in which the proteins of a PPI appear in an interaction context.
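Both criteria can be sketched in a few lines. This is a simplified illustration under stated assumptions, not PPIVPro's actual code: the gold-standard list is hypothetical toy data standing in for HINT interactions annotated with cellular components, rules have the form "if one partner is in X, the other is in Y", and the keyword list, the naive sentence splitter and the support/confidence thresholds are all assumptions.

```python
import re
from collections import Counter
from itertools import product

# Hypothetical gold-standard PPIs, each partner given as a set of
# cellular components (a stand-in for HINT plus GO annotations).
GOLD = [
    ({"nucleus"}, {"nucleus"}),
    ({"nucleus"}, {"cytoplasm"}),
    ({"nucleus", "cytoplasm"}, {"nucleus"}),
    ({"membrane"}, {"membrane"}),
]

# Hypothetical interaction keywords, matched as substrings (word stems).
INTERACTION_WORDS = ("interact", "bind", "complex", "associat", "phosphorylat")


def component_rules(pairs, min_support=0.2, min_confidence=0.5):
    """Mine rules X -> Y: 'one partner in X implies the other in Y'."""
    antecedent = Counter()
    joint = Counter()
    n = 0
    for left, right in pairs:
        # Count each interaction in both directions, since PPIs are undirected.
        for a_comps, b_comps in ((left, right), (right, left)):
            n += 1
            antecedent.update(a_comps)
            joint.update(product(a_comps, b_comps))
    rules = {}
    for (x, y), count in joint.items():
        support = count / n
        confidence = count / antecedent[x]
        if support >= min_support and confidence >= min_confidence:
            rules[(x, y)] = (support, confidence)
    return rules


def supported_by_rules(comps_a, comps_b, rules):
    """Keep a predicted PPI if any of its component pairs matches a rule."""
    return any((x, y) in rules for x, y in product(comps_a, comps_b))


def interaction_sentences(text, protein_a, protein_b):
    """Return sentences mentioning both proteins and an interaction keyword."""
    # Naive splitter: a sentence ends at '.', '!' or '?' followed by spaces.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    hits = []
    for sentence in sentences:
        lower = sentence.lower()
        if (
            protein_a.lower() in lower
            and protein_b.lower() in lower
            and any(word in lower for word in INTERACTION_WORDS)
        ):
            hits.append(sentence)
    return hits
```

With the toy data, `component_rules(GOLD)` keeps `nucleus -> nucleus`, `cytoplasm -> nucleus` and `membrane -> membrane`, so a predicted pair located in the cytoplasm and the nucleus passes the first filter, while a nucleus/membrane pair does not. A real pipeline would need proper tokenization and synonym handling for protein names; plain substring matching is used here only to keep the sketch short.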

Tags: Python, association rules, natural language processing, machine learning.

EpiCurator

This pipeline contains a series of functions to filter short amino acid sequences (peptides) predicted by epitope discovery tools. It parses output files from the BepiPred and DiscoTope tools and runs filtering and descriptive steps to refine the final list of epitopes. One of its modules (Epiminer) runs text mining on scientific papers, searching for these peptides in the context of epitope prediction and immunology, in order to check the originality of the user's epitopes.
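A typical filtering step turns per-residue prediction scores into candidate peptides. The sketch below is illustrative, not EpiCurator's code: it assumes the prediction file has already been parsed into a list of `(residue, score)` pairs in sequence order, and the function name `extract_peptides`, the 0.5 cutoff and the minimum and maximum peptide lengths are hypothetical parameters.

```python
def extract_peptides(scored_residues, threshold=0.5, min_len=5, max_len=25):
    """Group consecutive residues scoring >= threshold into candidate peptides.

    scored_residues: list of (residue, score) tuples in sequence order
    (a hypothetical pre-parsed form of a per-residue prediction file).
    Returns (start, peptide) tuples with 1-based start positions.
    """
    peptides = []
    current = []
    start = None
    # Append a sentinel that is always below the threshold,
    # so a run reaching the end of the sequence is still flushed.
    residues = list(scored_residues) + [("", float("-inf"))]
    for pos, (residue, score) in enumerate(residues, start=1):
        if score >= threshold:
            if not current:
                start = pos
            current.append(residue)
        elif current:
            # A run just ended: keep it only if its length is acceptable.
            if min_len <= len(current) <= max_len:
                peptides.append((start, "".join(current)))
            current = []
    return peptides
```

For instance, with residues `ACDEFGHIKL`, scores `[0.1, 0.6, 0.7, 0.8, 0.9, 0.6, 0.2, 0.7, 0.8, 0.3]` and `min_len=3`, the function keeps the run `CDEFG` starting at position 2 and discards the two-residue run `IK`.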

Tags: Python, data analysis, results export, natural language processing.

About Me

Academic Background

1 Graduated in Systems Analysis and Development from the Fluminense Federal Institute, where I developed a web system that applies cognitive tests of memory and attention and generates reports on the administration page. All data about these tests, including their terminology, is recorded following linked data principles, in order to encourage other systems to adopt the same terminology and to make analysis and querying easier through a shared file format.

2 Master's degree in Systems and Information at the Military Institute of Engineering. My dissertation addressed data interlinking on the Web of Data, in the Semantic Web context. It covered the whole process: ranking the best publicly available datasets to link to a source dataset that has no external links; mapping data items between the selected datasets and the source; validating these mappings with human input through an online crowdsourcing platform; and, finally, fusing the source dataset with the validated items.

3 Doctoral degree in Computational Modeling at the National Laboratory of Scientific Computing, where my thesis addressed protein interaction prediction using multiple types of biological evidence, each shedding light on different aspects of the physical and functional associations between proteins, with extensive use of machine learning to combine the detection methods and to classify the resulting samples.

4 From 12/2020 to 01/2022, I worked as a postdoctoral researcher in the bioinformatics laboratory of the National Laboratory of Scientific Computing, analyzing large amounts of genomic data for SARS-CoV-2 projects and using bioinformatics tools to map, align and annotate proteins. I also had the opportunity to learn and carry out structural analysis of proteins and to perform molecular docking experiments. Most of the analysis relied on data exploration, visualization, prediction and forecasting. The bioinformatics lab team and I published research articles to share our findings. I also have experience teaching short courses on the Python language using biological case studies. In 2016 and 2019, I took part in two international conferences, where I presented two of my research articles.

Web Development Experience

I worked full time as a full-stack developer at MTW from 05/2013 to 02/2014, and I occasionally collaborate with the company and act as a consultant. We developed PortalGov, a management system that organizes the administrative data of city halls and municipal chambers, such as news, tourist facilities, legislation, public bidding and organizational structure. PortalGov consists of a Laravel API and a Vue.js front end, storing data in a relational database. Other smaller projects at the same company used the CodeIgniter framework, Ruby on Rails, Flutter and Angular. At MTW, I also had the opportunity to work directly with clients during requirements gathering, functional validation and user training.