Data Scientist Internship
Remote in ID, OR, UT or WA
This full-time, twelve-week internship position is scheduled to begin in May/June 2022.
At Cambia, our members need us more than ever, and to continue serving them, we need to be vaccinated. Following the federal mandate, all Cambia Health Solutions employees, including 100% remote workers, need to be fully vaccinated for COVID-19 by 1/1/22.
Cambia Health Solutions is working to create a seamless and frictionless health care experience for consumers nationwide. This presents a unique challenge and opportunity for innovative and disruptive solutions from the Cambia DTS Artificial Intelligence (AI) Team.
We are looking for a passionate, talented and inventive Data Scientist intern to help design, develop and build industry-leading data science solutions using machine learning technologies and advanced statistical analyses. Together with a multi-disciplinary team of scientists, engineers, strategic partners and subject domain experts, you will work on building a real product with machine learning at its core.
Essential Functions of the Data Scientist Internship:
- Design novel algorithms to solve complex healthcare problems presented by healthcare organizations; the work may include data cleaning, feature selection, statistical modeling, data clustering and classification, text processing, and other machine learning techniques
- Research, design, develop, and implement data-driven solutions using machine learning technologies and advanced statistical methods
- Identify and deploy existing machine learning, natural language processing, and information retrieval techniques and systems for knowledge management and discovery, such as using Electronic Medical Records (EMR) data, progress notes, and discharge summaries to identify admitting diagnosis, reason for consultation, clinical history, etc.
- Identify ways to analyze consumers’ experiences from various communication channels and improve customer satisfaction
- Generate and test working hypotheses: aggregate and mine data, conduct analyses, and extract actionable results. Apply statistical techniques to develop models utilizing large-scale database systems
- Cluster and analyze large amounts of user-generated content and process data in large-scale environments on AWS, using services and tools such as EC2, Amazon EMR, MapReduce, and PySpark
- Work as a key part of cross-functional teams with various internal customer groups to study quantified business cases, identify business problems and formulate desired outcomes
Key Qualifications and Experience:
- Currently enrolled in an undergraduate or graduate degree program focused on Big Data, Computer Science, Data Analytics, Data Science, Engineering, Math, Statistics, Science or a related field (preference will be given to graduate students)
- Candidates who have completed their degree in the last six months are also encouraged to apply
- Strong analytic and problem-solving skills, including the ability to apply quantitative analysis techniques to business situations including forecasting, descriptive statistics, statistical inference, and multivariate modeling techniques
- Ability to develop prototypes by manipulating and analyzing complex, high-volume, high-dimensionality data from various sources
- Expertise in producing, processing, evaluating, and utilizing unstructured/semi-structured data
- Practical ability to visualize data, communicate about data, and utilize data effectively
- Demonstrated ability to create complex SQL queries, analyze large amounts of data, create visualizations, and interpret qualitative data (research and feedback) and incorporate it into analyses
- Proficiency in machine learning toolkits such as TensorFlow, PyTorch, Stanford CoreNLP, NLTK, Gensim, OpenNLP, scikit-learn, NumPy, etc.
- Demonstrable knowledge of and practical experience applying AI methodologies such as natural language processing, personalization and machine learning algorithms (e.g., regression, clustering, neural networks, kernel methods, dimensionality reduction, ensemble methods, decision tree methods, recommendation systems)
- Must have ML algorithm implementation experience as well as the ability to modify standard algorithms, e.g., change objective functions, work out the math, and implement the result
- Strong programming skills in at least one object-oriented programming language, e.g., Java, Python, C++, Scala, etc.
- Eager to learn new algorithms, new application areas and new tools
- Excellent oral and written communication skills to effectively interface and communicate with a broad array of internal and external contacts including leadership
- Ability to think creatively and to work well both as part of a team and as an individual contributor
- Strong facilitation skills, including the ability to resolve issues and build consensus among groups of diverse stakeholders
- Ability to coordinate cross-functionally to drive solutions and resolve issues in a timely and effective manner
- Fluency with Linux/Unix
- Required minimum cumulative undergraduate GPA of 3.0
The following skills, experience, and knowledge are a plus:
- Expertise in one or more of the following AI specializations: natural language processing, personalization, information retrieval, recommendation systems, or knowledge bases
- Experience and/or motivation to work on modern deep learning approaches
- Experience with noisy and/or unstructured textual data, such as tweets and search queries
- Knowledge of or experience in building production-quality applications and deploying them at scale in AI, data science and machine learning
- Experience with large-scale data analysis tools in a cloud environment, such as Spark, Hadoop, MapReduce, Hive, Pig, etc.
- Experience with open-source search engines like ElasticSearch, Solr or Lucene
- Demonstrated knowledge of health plan operations, medical terminologies/ontologies and/or clinical informatics and healthcare systems
- Knowledge of RESTful APIs and web visualization technologies, such as HTML, CSS, JS, and D3.js
- General software development skills (source code management, debugging, testing, deployment, etc.)
- Publications in AI academic conferences or journals, or in industry venues, such as AAAI, IJCAI, ACL, EMNLP, NAACL, NeurIPS, ICML, KDD, SIGIR, WWW, etc.