“[A] data scientist is someone who knows how to extract meaning from and interpret data, which requires both tools and methods from statistics and machine learning, as well as being human. They spend a lot of time in the process of collecting, cleaning, and munging data, because data is never clean. This process requires persistence, statistics, and software engineering skills—skills that are also necessary for understanding biases in the data, and for debugging logging output from code.” - O'Neil and Schutt, 2013.
“[An] academic data scientist is a scientist, trained in anything from social science to biology, who works with large amounts of data, and must grapple with computational problems posed by the structure, size, messiness, and the complexity and nature of the data, while simultaneously solving a real-world problem.” - O'Neil and Schutt, 2013.
Communication: Clear, readable results for domain and non-domain experts. Strong data visualization.
Domain Expertise: Deep understanding of the data and how it relates/applies to the problem
Heterogeneous Tools: A full computational toolchain involves multi-language tooling and expertise
Data Cleaning: All data needs to be clean!
Reproducibility: tools (git, hg, svn), analysis (jupyter notebooks), methods (papers, blogs), software (versioning, stability)
Scalability: Methods and analysis must scale from local machines to large, "production" compute nodes
Ethics: Use and/or implement tools/algorithms ethically
Programming, SQL, Math/Stats, ML Algorithms, DataViz, Communication, Cloud Computing, Software Engineering, Automated ML, Domain Expertise
Heterogeneous data sources, trustworthy AI/methods, Automation, Privacy, Ethics
Packaging, Production Code Development, Version Control, Reproducibility, Sharing Data/Analysis
Image Credit: X-5 Monte Carlo Team, MCNP–A General Purpose Monte Carlo N-Particle Transport Code, Version 5. LA-UR-03-1987, Los Alamos National Laboratory, April 2003.
General:
Monte Carlo specific:
Denovo specific:
Forward Flux
Adjoint Flux
Adjoint Flux
Used by CADIS
$\Omega$ Flux
Used by CADIS-$\Omega$
Anisotropy Metric 2 ( $\phi^{\dagger}_{\Omega}/\phi^{\dagger}$ ) Distribution, Group 26 for Steel Beam in Concrete
Above: Anisotropy Metric 4 Distribution ( $\psi^{c}_{max:avg}/\psi^{\dagger}_{max:avg}$ ), by Energy Group, in Regions where the Contributon Flux is High
Below: Trend Results for Anisotropy Metric 4 as Related to the Ratio of Relative Errors $RE_{\Omega}/RE_{CADIS}$
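As a rough illustration of how quantities like these could be computed from angular flux data, the sketch below reads the subscript max:avg as the ratio of the maximum angular flux to the angle-averaged flux in each cell, assumes the contributon flux is the product of the forward and adjoint angular fluxes, and takes the $\Omega$ flux to be the forward-weighted adjoint flux $\int \psi \psi^{\dagger} d\hat\Omega / \int \psi \, d\hat\Omega$. The array layout `(cells, angles)`, the quadrature `weights` argument, and all function names are assumptions for illustration, not the code behind the figures.

```python
import numpy as np


def angle_integral(psi, weights):
    """Quadrature-weighted integral over angle; psi has shape (cells, angles)."""
    return np.asarray(psi, dtype=float) @ np.asarray(weights, dtype=float)


def omega_flux(psi_fwd, psi_adj, weights):
    """Forward-weighted adjoint flux: int(psi * psi_adj) / int(psi), per cell."""
    psi_fwd = np.asarray(psi_fwd, dtype=float)
    psi_adj = np.asarray(psi_adj, dtype=float)
    return angle_integral(psi_fwd * psi_adj, weights) / angle_integral(psi_fwd, weights)


def max_to_avg(psi, weights):
    """Largest angular flux divided by the angle-averaged flux, per cell."""
    psi = np.asarray(psi, dtype=float)
    avg = angle_integral(psi, weights) / np.sum(weights)
    return psi.max(axis=1) / avg


def metric_4(psi_fwd, psi_adj, weights):
    """Max:avg of the contributon flux over max:avg of the adjoint flux."""
    contributon = np.asarray(psi_fwd, dtype=float) * np.asarray(psi_adj, dtype=float)
    return max_to_avg(contributon, weights) / max_to_avg(psi_adj, weights)
```

Under these assumptions a fully isotropic cell gives a max:avg ratio of 1 for both fluxes, so the metric is 1 there and grows as the contributon flux becomes more peaked in angle than the adjoint flux.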
Nuclear Data: Detection --> Data Reduction --> Format --> Human-Readable/Machine-Readable --> Simulation
Coupled Reactor Multiphysics: High Dimensionality + Coupled Fields, data selection, minimization + visualization
Real-Time Grid Monitoring/Response: Multiply-Sourced, Time-Series Data --> Cleaning --> Response Metrics --> GUI for human interaction
Materials Selection and Design: Using ML optimization techniques to minimize compute time for parametric studies
Using data science methods and best practices in scientific computing will help us do better, more reproducible, robust research
Practicing the skills we learn in scientific training will help us be better data scientists
Computing makes our lives better as scientists

Use MCNP to get the fluence data for the reflector
↓
Use fluence and experimental data to calculate radiation-induced strain
↓
Use strain calculation to impose a pseudo-temperature distribution in FEM
↓
Use FEM and fluence buildup to determine lifetime
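One way to impose a prescribed strain in a thermo-mechanical FEM solver is to convert it into the temperature change that would produce the same free thermal expansion, $\Delta T = \varepsilon / \alpha$. The sketch below assumes an isotropic linear strain and a constant expansion coefficient; the function name, its parameters, and any values passed to it are hypothetical, and the workflow above does not specify how the mapping is actually implemented.

```python
import numpy as np


def pseudo_temperature(strain, alpha, t_ref=0.0):
    """Temperature field whose free thermal expansion equals `strain`.

    strain : radiation-induced linear strain per element (dimensionless)
    alpha  : linear thermal expansion coefficient given to the FEM model (1/K)
    t_ref  : stress-free reference temperature of the FEM model (K)
    """
    if alpha <= 0:
        raise ValueError("expansion coefficient must be positive")
    return t_ref + np.asarray(strain, dtype=float) / alpha
```

Elements with zero strain stay at the reference temperature, so unirradiated regions contribute no artificial expansion.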
Idea: Perform lifetime evaluations on core components in emerging reactor designs based on coupled physics
Deliverable: software, lifetime characterization
Collaborators: Kairos Power, Westinghouse, INL
Funding: DOE-NE, ARPA-E, SBIR
Image Credit: Westinghouse eVinci micro reactor
The forward neutron transport equation:
\[ \begin{multline} \hat\Omega \cdot \nabla \psi (\vec {r} ,E,\:\hat\Omega)+\Sigma _{ t } (\vec{r},E)\psi (\vec { r } ,E,\:\hat\Omega) = \\ \int _{ 4\pi } \int _{ 0 }^{ \infty } \Sigma _{ s }(E'\rightarrow E, \hat\Omega'\rightarrow\hat\Omega)\psi (\vec { r } ,E',\: \hat\Omega')dE' \:d\hat\Omega' + q_{e}(\vec { r } ,E, \:\hat\Omega). \end{multline} \]
The adjoint neutron transport equation:
\[ \begin{multline} -\hat\Omega \cdot \nabla \color{teal}{\psi^{\dagger}} (\vec {r} ,E,\:\hat\Omega)+\Sigma _{ t } (\vec{r},E)\color{teal}{\psi^{\dagger}} (\vec { r } ,E,\:\hat\Omega) = \\ \int _{ 4\pi } \int _{ 0 }^{ \infty } \Sigma _{ s }(\color{violet}{E\rightarrow E'}, \color{purple}{\hat\Omega\rightarrow\hat\Omega'})\color{teal}{\psi^{\dagger}} (\vec { r } ,E',\: \hat\Omega')dE' \:d\hat\Omega' + \color{teal}{q_{e}^\dagger}(\vec { r } ,E, \:\hat\Omega). \end{multline} \]
The adjoint solution can be used to make an importance map for a desired outcome.
An exact adjoint solution can be used to obtain a zero variance Monte Carlo solution.
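The duality that makes the adjoint solution an importance map can be checked on a small discrete analog. In the sketch below a strictly diagonally dominant random matrix `A` stands in for a discretized transport operator (an assumption for illustration only, not a real transport discretization), and the adjoint problem uses its transpose. The detector response computed from the forward solution, the inner product of the adjoint source with $\psi$, matches the one computed from the adjoint solution, the inner product of the forward source with $\psi^{\dagger}$. All names are hypothetical.

```python
import numpy as np


def demo_operator(n=6, seed=0):
    """Strictly diagonally dominant matrix, hence nonsingular; a stand-in operator."""
    rng = np.random.default_rng(seed)
    return n * np.eye(n) - rng.random((n, n))


def forward_and_adjoint_response(A, q, q_adj):
    """Return (response from forward solve, response from adjoint solve, psi_adj)."""
    psi = np.linalg.solve(A, q)            # forward problem: A psi = q
    psi_adj = np.linalg.solve(A.T, q_adj)  # adjoint problem: A^T psi_adj = q_adj
    return q_adj @ psi, q @ psi_adj, psi_adj


if __name__ == "__main__":
    A = demo_operator()
    rng = np.random.default_rng(1)
    q, q_adj = rng.random(6), rng.random(6)
    r_fwd, r_adj, psi_adj = forward_and_adjoint_response(A, q, q_adj)
    print(np.isclose(r_fwd, r_adj))

    # A unit source in a single cell: the response equals the adjoint value there.
    point = np.zeros(6)
    point[2] = 1.0
    r_point, _, _ = forward_and_adjoint_response(A, point, q_adj)
    print(np.isclose(r_point, psi_adj[2]))
```

One adjoint solve therefore gives the response to a unit source placed in any cell, which is what lets the adjoint act as an importance map.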
Notation:
$\langle ab \rangle = \int a(P)b(P) dP$
Detector response
\[ \begin{aligned} R &= \langle \sigma_d \psi \rangle \\ &= \langle q^{\dagger} \psi \rangle \\ &= \langle q \psi^{\dagger} \rangle \end{aligned} \]
Point source
$q ( \vec{r}, E, \hat{\Omega}) = \delta(\vec{r}-\vec{r}_0)\delta(E-E_0)\delta(\hat{\Omega}-\hat{\Omega}_0)$
Response = adjoint flux
$R = \psi^{\dagger} (\vec{r}_0, E_0, \hat{\Omega}_0)$
Biased source distribution
$\hat{q}(\vec{r},E) = \frac{\phi^{\dagger}(\vec{r},E)q(\vec{r},E)}{R}$
Starting weight of the particles
$w_{0}(\vec{r},E) = \frac{q(\vec{r},E)}{\hat{q}(\vec{r},E)} = \frac{R}{\phi^{\dagger}(\vec{r},E)}$
Weight window target values
$w(\vec{r},E) = \frac{R}{\phi^{\dagger}(\vec{r},E)}$
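The three expressions above translate directly into array operations once the adjoint scalar flux is available on a mesh. The sketch below is a minimal version for a single energy group on a list of cells; the function name, the `volumes` argument, and the single-group simplification are assumptions for illustration, not how ADVANTG or any other production code organizes the calculation.

```python
import numpy as np


def cadis_parameters(q, phi_adj, volumes):
    """Return (R, biased source q_hat, weight-window targets) per mesh cell.

    q       : forward source density in each cell
    phi_adj : adjoint scalar flux in each cell (must be positive)
    volumes : cell volumes, used to approximate R = <q phi_adj>
    """
    q = np.asarray(q, dtype=float)
    phi_adj = np.asarray(phi_adj, dtype=float)
    volumes = np.asarray(volumes, dtype=float)
    if np.any(phi_adj <= 0):
        raise ValueError("adjoint flux must be positive in every cell")

    response = np.sum(q * phi_adj * volumes)  # R = <q phi_adj>
    if response <= 0:
        raise ValueError("source and adjoint flux do not overlap")

    q_hat = q * phi_adj / response            # biased source distribution
    ww_target = response / phi_adj            # weight-window target values
    return response, q_hat, ww_target
```

Two properties follow from the definitions and make useful checks: the biased source integrates to one over the mesh, and `q_hat * ww_target` recovers `q`, so a particle sampled from the biased source starts at the weight-window target for its birth cell.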
X-5 Monte Carlo Team, MCNP–A General Purpose Monte Carlo N-Particle Transport Code, Version 5. LA-UR-03-1987, Los Alamos National Laboratory, April 2003.
Evans, Thomas M., et al. "Denovo: A new three-dimensional parallel discrete ordinates code in SCALE." Nuclear Technology 171.2 (2010): 171-200.
Mosher, Scott W., et al. "ADVANTG—an automated variance reduction parameter generator." ORNL/TM-2013/416, Oak Ridge National Laboratory (2013).
Above: Figure of Merit results for CADIS and CADIS-$\Omega$ for various tally regions of interest
Below: Timing results for CADIS and CADIS-$\Omega$
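For reference, the figure of merit conventionally reported for Monte Carlo tallies is $FOM = 1/(RE^2 \, T)$, with $RE$ the tally relative error and $T$ the run time. The sketch below computes it and the ratio between two methods. The function names and the numbers in the demo are hypothetical, and whether $T$ should include the deterministic solve as well as the Monte Carlo time is a choice not specified here.

```python
def figure_of_merit(rel_err, minutes):
    """FOM = 1 / (RE^2 * T) for a tally relative error RE and run time T."""
    if rel_err <= 0 or minutes <= 0:
        raise ValueError("relative error and run time must be positive")
    return 1.0 / (rel_err ** 2 * minutes)


def fom_ratio(re_omega, t_omega, re_cadis, t_cadis):
    """FOM of CADIS-Omega divided by FOM of CADIS for the same tally."""
    return figure_of_merit(re_omega, t_omega) / figure_of_merit(re_cadis, t_cadis)


if __name__ == "__main__":
    # Hypothetical values chosen only to exercise the functions.
    print(figure_of_merit(0.05, 30.0))
    print(fom_ratio(0.05, 30.0, 0.10, 30.0))
```

At equal run time the ratio reduces to $(RE_{CADIS}/RE_{\Omega})^2$, so halving the relative error quadruples the figure of merit.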
Idea: Use anisotropy metrics to characterize beyond $\Omega$-methods, build out suite of novel analysis tools for nuclear engineering with PyNE+yt
Deliverable: Robust scheme to recommend method based on problem type, open-source package to calculate anisotropy metrics
Data Science Influence: Use methods from other domains (halo finding (astro)), cleaning and filtering methods
Idea: Use LDO quadrature set to do angle biasing in Monte Carlo
Deliverable: Angle biasing in Monte Carlo for low angular refinement
Collaborators: ORNL, UC Berkeley Neutronics
Funding: DOE-NE, ORNL subcontract