Year: 2024 | Month: April | Volume 14 | Issue 2

Gene Prediction in Rumen Metagenomic Reads of Cattle Using Machine Learning Based Approach

Safeer M. Saifudeen, Anilkumar K, T.V. Aravindakshan, Jamuna Valsalan, Ally K., Gleeja V.L.
DOI:10.30954/2277-940X.02.2024.4

Abstract:

The present study aimed to build a predictive model for protein-coding genes in rumen metagenomic data using
promising machine learning (ML) tools. We classified the sequence reads into coding genes and non-coding sequences,
converted the sequences into k-mers of various sizes (k = 3 to 6) and extracted k-mer counts as features
representative of the sequence reads. ML classifiers were trained using 16 genomes, 13 from the bacterial kingdom and 3
from the archaeal kingdom, selected from diverse environments and various systems. Among the five ML models for gene
prediction, the artificial neural network (ANN) performed best, with a maximum accuracy of 89 per cent for k-mer size three.
We observed that logistic regression and SVM required only reasonable computational time compared with ANN. DNA was
isolated from the rumen liquor of crossbred cattle and used for metagenomic sequencing. Annotated rumen metagenomic
sequences were used to validate the ML models created. Logistic regression performed best, with 85 per cent accuracy on
the minimum feature count (unigram) for k-mer size four. Out of 8718 coding sequences provided to the logistic regression
classifier, 8073 were correctly predicted as genes (true positives) and the remaining 645 coding sequences were predicted
as non-coding (false negatives). We concluded that the machine learning models created, namely artificial neural network,
support vector machine and logistic regression, show a strong and robust ability to classify coding and non-coding
sequences and represent a promising avenue for predicting rumen metagenomic genes.
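
As an illustration of the k-mer count features described above, the following minimal Python sketch turns a DNA read into a fixed-length vector of k-mer counts. It is not the authors' pipeline: the function names kmer_count_vector and sensitivity, the A/C/G/T-only vocabulary and the overlapping sliding window are assumptions made for this example. The second function only recomputes, from the counts reported in the abstract, the share of coding sequences that the logistic regression classifier recovered.

```python
from collections import Counter
from itertools import product


def kmer_count_vector(seq, k):
    """Count overlapping k-mers in a DNA read (hypothetical helper).

    Returns a list of length 4**k, ordered lexicographically over "ACGT".
    K-mers containing other symbols (e.g. N) are not in the vocabulary
    and are therefore ignored.
    """
    seq = seq.upper()
    # Slide a window of width k over the read, one base at a time.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    return [counts.get(kmer, 0) for kmer in vocab]


def sensitivity(true_positives, false_negatives):
    """Fraction of actual coding sequences predicted as coding."""
    return true_positives / (true_positives + false_negatives)


# Example read: the 3-mers are ATG, TGA, GAT, ATG.
features = kmer_count_vector("ATGATG", 3)

# Counts reported in the abstract for the logistic regression classifier.
coding_recall = sensitivity(8073, 645)
```

With the reported counts, 8073 of 8718 coding sequences corresponds to a sensitivity of about 92.6 per cent on coding sequences. This is a different quantity from the 85 per cent overall accuracy, which presumably also accounts for the non-coding sequences.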

Highlights

  • Classification of sequences into coding and non-coding based on k-mers.
  • Machine learning models for gene prediction in metagenomic DNA fragments.
  • Validation of the models using bovine rumen metagenomic sequences.


© This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited




@ Journal of Animal Research | In Association with Association of Mastitis
