The Data Science Essentials and Advanced Techniques course offered by Geneve Institute of Business Management is a compact, professionally oriented programme that equips practitioners with the analytical thinking, tooling familiarity and advanced methods required to deliver insight-driven products. Across ten instructional units, the syllabus moves from statistical foundations and data engineering through modelling and interpretability to productionisation. Participants will learn to select appropriate methods, reason about uncertainty, validate results robustly and prepare models for reliable operational use. Teaching favours clarity, reproducible workflows and practical judgement, so that graduates can design end-to-end data pipelines, present defensible conclusions and collaborate effectively with engineering and product teams.
Target group
- Early-career data analysts and junior data scientists wanting to progress into full data-science responsibilities and project ownership.
- Software engineers and data engineers seeking structured knowledge to build reliable pipelines and model-serving infrastructure.
- Product managers and business stakeholders who need to interpret analytics results and prioritise data-driven features.
- Statisticians and researchers transitioning to applied machine learning and large-scale data workflows.
- Analytics managers responsible for establishing team standards, reproducible processes and model governance.
- Professionals from domain fields (finance, healthcare, marketing) aiming to apply advanced analytical techniques in their sector.
Objectives
- Explain core statistical concepts, probabilistic reasoning and uncertainty quantification to underpin robust analysis.
- Construct reliable data pipelines: ingestion, cleaning, feature engineering and reproducible transformation flows.
- Build, evaluate and compare predictive models using classical and modern machine-learning algorithms.
- Implement model-interpretability, fairness checks and uncertainty measures to support transparent decisions.
- Prepare models for deployment: monitoring, retraining triggers, performance regression checks and lifecycle management.
- Choose appropriate tools and workflows that balance experimentation speed with production robustness.
Course Outline
- Statistical Foundations and Probability:
  - Descriptive statistics: central tendency, dispersion and distributional shape measures.
  - Probability basics: events, conditional probability and Bayes’ theorem intuition.
  - Sampling theory, confidence intervals and hypothesis testing principles.
  - Practical pitfalls: p-hacking, multiple comparisons and effect-size interpretation.
- Exploratory Data Analysis and Visualization:
  - Data summarisation techniques and exploratory visual patterns to surface structure.
  - Choosing visual encodings: when to use bars, lines, density or scatter variants.
  - Detecting outliers, missingness patterns and structural anomalies through visuals.
  - Storytelling with charts: framing insights, annotating salient observations and limitations.
- Data Wrangling and Feature Engineering:
  - Ingesting diverse sources, schema harmonisation and robust parsing strategies.
  - Handling missing data, imputation trade-offs and encoding categorical variables.
  - Feature creation: interaction terms, temporal features and domain-informed transformations.
  - Feature selection, dimensionality reduction and avoiding leakage in pipeline design.
- Databases and Data Engineering Principles:
  - Storage formats, columnar layouts and trade-offs for analytical workloads.
  - ETL versus ELT patterns and designing idempotent transformation jobs.
  - Data versioning, provenance and schema migration practices for reproducibility.
  - Query optimisation basics and practical partitioning strategies for large tables.
- Supervised Learning — Classical Methods:
  - Linear models, regularisation and interpreting coefficients in applied contexts.
  - Tree-based ensembles: decision trees, random forests and gradient boosting essentials.
  - Model selection workflows: cross-validation, hyperparameter search and validation leakage avoidance.
  - Evaluation metrics: precision/recall, ROC/AUC, calibration and business-aligned KPIs.
- Unsupervised Learning and Clustering:
  - Clustering algorithms: k-means, hierarchical and density-based approaches, with their use cases.
  - Dimensionality reduction: PCA, t-SNE and UMAP for structure discovery.
  - Topic modelling and representation techniques for text and categorical data.
  - Assessing cluster validity, stability and practical utility for downstream tasks.
- Modern Machine Learning and Neural Methods:
  - Neural network basics, activation choices and training dynamics at a conceptual level.
  - Regularisation, optimisers and practical recipes for stable training runs.
  - Use cases for deep learning: vision, text embeddings and time-series models.
  - Transfer learning, fine-tuning and model compression considerations for deployment.
- Time Series Analysis and Forecasting:
  - Decomposition, seasonality, trend detection and stationarity diagnostics.
  - Forecasting models: ARIMA family, exponential smoothing and state-space approaches.
  - Modern approaches: sequence models, temporal convolution and hybrid architectures.
  - Backtesting, rolling windows and evaluation metrics suited for temporal forecasts.
- Model Interpretability and Responsible AI:
  - Global versus local interpretability methods and when to apply each.
  - Feature importance, SHAP/LIME intuition and communicating explanations clearly.
  - Fairness checks: metrics, subgroup performance assessment and mitigation avenues.
  - Uncertainty quantification: predictive intervals, calibration and Bayesian considerations.
- Feature Store and Reproducibility Practices:
  - Principles for feature serving and synchronised offline/online representations.
  - Reproducible experiments: seed management, environment capture and artifact tracking.
  - Metadata, lineage and experiment registries as collaboration enablers.
  - Packaging models and features for repeatable production rollouts.
- Model Deployment and Serving Architectures:
  - Serving paradigms: batch, real-time, and hybrid scoring patterns with trade-offs.
  - Containerised deployment, model servers and RPC/HTTP serving considerations.
  - Latency, throughput and concurrency implications for different serving choices.
  - Canarying models, shadow deployments and graceful rollback mechanisms.
- Monitoring, Drift Detection and Model Maintenance:
  - Production telemetry: prediction distributions, input drift and performance degradation signals.
  - Automated alerting thresholds, anomaly detection and incident playbooks for model issues.
  - Retraining triggers, dataset shift strategies and continuous evaluation pipelines.
  - Logging for auditability and root-cause analysis of mispredictions.
- Advanced Feature Engineering and Representation Learning:
  - Embeddings for categorical and textual data and their integration into models.
  - Automated feature synthesis methods and the role of human-guided features.
  - Feature stability measurement and lifecycle management for long-lived features.
  - Evaluating incremental value of features and avoiding overfitting to historical quirks.
- Natural Language Processing Essentials:
  - Text preprocessing, tokenisation strategies and representation choices.
  - Word and contextual embeddings, sentence pooling and semantic features.
  - Common NLP tasks: classification, sequence tagging and retrieval-oriented models.
  - Evaluation protocols for NLP and pitfalls with domain shift.
- Probabilistic Modelling and Bayesian Techniques:
  - Bayesian thinking, priors, posteriors and reasoning about parameter uncertainty.
  - Probabilistic graphical models and structured latent-variable representations.
  - Practical approximate inference: an overview of variational methods and MCMC basics.
  - Use cases where probabilistic approaches improve decision-making under uncertainty.
- Optimization and Advanced Training Techniques:
  - Loss surfaces, batch sizing, learning-rate schedules and stability heuristics.
  - Dealing with class imbalance, sample weighting and custom loss functions.
  - Curriculum learning, multi-task training and transfer strategies for scarce-label scenarios.
  - Checkpointing, resumable training and resource-aware training designs.
- Ethics, Governance and Data Privacy:
  - Ethical frameworks for data use, consent and impact assessment.
  - Data minimisation, anonymisation techniques and re-identification risk awareness.
  - Governance constructs: model cards, data sheets and responsible-deployment checklists.
  - Regulatory touchpoints: GDPR-like obligations, audit trails and cross-border data flows.
- Scaling Data Science Teams and Processes:
  - Team roles, handoffs and collaboration patterns between analysts, engineers and product.
  - Operationalising best practices: code review, CI for models and shared tooling.
  - Prioritisation frameworks for experiments, feature work and technical debt reduction.
  - Onboarding, documentation standards and maintaining institutional knowledge.
- Tooling, Ecosystem and Production Stack Choices:
  - Comparing libraries and platforms for modelling, feature stores and serving.
  - Trade-offs between managed ML platforms and bespoke open-source stacks.
  - Data pipelines: scheduler choices, orchestration and workflow observability essentials.
  - Cost, scalability and vendor lock-in considerations for tooling selection.
- Career Paths and Continued Development:
  - Roles: data scientist, ML engineer, research engineer and analytics lead distinctions.
  - Building a portfolio: reproducible projects, notebooks, and demonstrable impact metrics.
  - Learning pathways: advanced coursework, conferences and community contribution strategies.
  - Networking, mentorship and certification options to accelerate professional growth.
