UCL Discovery

Seqfam: A python package for analysis of Next Generation Sequencing DNA data in families [version 1]

Frampton, M; Schiff, E; Pontikos, N; Segal, A; Levine, A (2018) Seqfam: A python package for analysis of Next Generation Sequencing DNA data in families [version 1]. F1000Research, 7, Article 281. 10.12688/f1000research.13930.1. Green open access


Abstract

This article introduces seqfam, a Python package designed primarily for analysing next generation sequencing (NGS) DNA data from families with known pedigree information, in order to identify rare variants that are potentially causal for a disease or trait of interest. It uses the popular and versatile Pandas library and can be integrated straightforwardly into existing analysis code and pipelines. Seqfam can be used to verify pedigree information, to perform Monte Carlo gene dropping, to undertake regression-based gene burden testing, and to identify variants which segregate by affection status in families via user-defined pattern of occurrence rules. Additionally, it can generate scripts for running analyses in a “MapReduce pattern” on a computer cluster, which is usually desirable in NGS data analysis and in “big data” analysis in general. This article summarises how seqfam’s main user functions work and motivates their use. It also provides explanatory context for the example scripts and data included in the package, which demonstrate use cases. For verifying pedigree information, software already exists for efficiently calculating kinship coefficients, so seqfam performs the necessary extra steps: mapping pedigrees to expected degrees of relationship and kinship coefficients to observed degrees of relationship. Gene dropping and the application of variant pattern of occurrence rules in families can provide evidence that a variant is causal. The authors are unaware of other software which performs these tasks in familial cohorts, so seqfam fulfils this need. Gene burden tests, rather than single-marker tests, are often used to detect rare causal variants because of their greater power. Seqfam may be an attractive alternative to existing gene burden testing software because of its flexibility, particularly in grouping and aggregating variants.
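
The pedigree verification step described in the abstract can be illustrated with a minimal sketch. It does not use seqfam's own functions or reproduce its implementation. The function names kinship_to_degree and find_mismatches, the sample labels and the kinship values are all hypothetical, and the cut-offs are the widely used powers-of-two thresholds for kinship coefficients (as in KING-style inference), assumed here for illustration.

```python
# Illustrative sketch only: this is NOT seqfam's API.
# Cut-offs are the common powers-of-two kinship thresholds (an assumption here).

def kinship_to_degree(phi):
    """Map a kinship coefficient to an observed degree of relationship.

    Returns 0 for duplicates/monozygotic twins, 1 to 3 for first- to
    third-degree relatives, and None for pairs treated as unrelated.
    """
    if phi > 0.354:
        return 0
    if phi > 0.177:
        return 1
    if phi > 0.0884:
        return 2
    if phi > 0.0442:
        return 3
    return None


def find_mismatches(expected, observed_kinship):
    """Return pairs whose observed degree differs from the pedigree-expected one."""
    mismatches = {}
    for pair, expected_degree in expected.items():
        observed_degree = kinship_to_degree(observed_kinship[pair])
        if observed_degree != expected_degree:
            mismatches[pair] = (expected_degree, observed_degree)
    return mismatches


# Hypothetical inputs: expected degrees derived from a pedigree, and kinship
# coefficients as they might be reported by an external kinship tool.
expected = {
    ("child", "mother"): 1,
    ("child", "uncle"): 2,
    ("mother", "father"): None,
}
observed_kinship = {
    ("child", "mother"): 0.25,
    ("child", "uncle"): 0.02,
    ("mother", "father"): 0.001,
}

mismatches = find_mismatches(expected, observed_kinship)
print(mismatches)
```

In this made-up example only the child and uncle pair is flagged, because the pedigree implies a second-degree relationship while the kinship coefficient suggests the two samples are unrelated. A discrepancy of this kind would typically prompt a check for sample mix-ups or pedigree errors.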

Type: Article
Title: Seqfam: A python package for analysis of Next Generation Sequencing DNA data in families [version 1]
Open access status: An open access version is available from UCL Discovery
DOI: 10.12688/f1000research.13930.1
Publisher version: http://dx.doi.org/10.12688/f1000research.13930.1
Language: English
Additional information: Copyright: © 2018 Frampton M et al. This is an open access article distributed under the terms of the Creative Commons Attribution Licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Keywords: python, bioinformatics, NGS, DNA, pedigree-information, gene-drop, gene-burden, kinship, mapreduce
UCL classification: UCL
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Brain Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Brain Sciences > Institute of Ophthalmology
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Medical Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Medical Sciences > Div of Medicine
URI: https://discovery.ucl.ac.uk/id/eprint/10054003