Uploaded by Diyuan Lu

2022.12.01-project-proposal-Efficient interpretable embedding learning for gene expression data with deep neural networks

Efficient interpretable embedding learning for gene expression data
with deep neural networks for cancer
Project background
Recently, deep learning (DL) has demonstrated great potential in medical applications, such as
cardiovascular disease diagnosis, skin disease classification, protein structure prediction, and
single-cell-level understanding of gene expression data for diagnosis and prognosis. However, the
power of DL has been limited by the fact that gene expression data often has an extremely large
feature space, frequently tens of thousands of dimensions, while the number of samples is
comparatively small because of the constraints of data collection. Thus, it is very challenging to
learn an efficient representation of gene expression data that is interpretable in the context of
biology and clinical implications, especially for cancer subtype identification and precise
treatment design.
Project description
A few studies have explored the possibility of applying deep autoencoders to gene expression
data, but the interpretability of the learned models still requires further improvement. In this
project, we will approach the aim of learning an efficient representation of gene expression data
for the purpose of precision medicine in cancer patients by
1. exploring the existing literature and methods for gene expression data understanding
2. incorporating expert knowledge in feature selection and engineering
3. exploring different network structures of the deep autoencoder
4. developing methods to interpret and evaluate the latent dimensions of the learned
embedding in the context of cancer mechanisms
Ultimately, we aim to submit a manuscript to a fitting venue (a peer-reviewed, high-impact journal
or conference) and publish the code repository for reproducibility.
Project requirements
A strong background in computer science, engineering, or mathematics with experience in machine
learning or deep learning is desirable. A successful candidate should be a confident programmer and
curious about state-of-the-art deep learning advances as well as their biological applications.
Proficiency with Python and one of the machine learning frameworks, such as TensorFlow or PyTorch,
is required. What we can offer is a group of outstanding researchers from biology, bioinformatics,
and computer science who work together as a closely knit team. Through this project, the candidate
will explore deep learning methods, learn to understand genetic data in the context of cancer
research, and develop scientific thinking, writing, and problem-solving skills, and will be well
prepared for future career development.
Diversity
Women and people from underrepresented groups are strongly encouraged to apply. We are
committed to seeking and providing any support you require to complete the project.