Richard Trotta, 07/22/2024 11:20 AM
ML beam test PID Meetings¶
Weekly Meeting Zoom Info¶
- Meeting ID: 160 823 2612
- Passcode: 575365
Summer 2024¶
May 28th, 2024¶
- Darren ran with Python 3.9
- Docker (containerization) Definition
- Containerization is a technology that allows developers to package and run applications along with all their dependencies in isolated environments called containers. This ensures that the application runs consistently across different computing environments, from a developer's laptop to testing, staging, and production.
- Docker is a popular platform that simplifies containerization. It provides tools to create, deploy, and manage containers. With Docker, developers can write code locally, share their work with colleagues, and deploy to production in a seamless and efficient manner. Docker containers are lightweight, fast, and portable, making them ideal for modern software development and deployment.
- A Crash Course for Summer Research
- GitHub and Python Introduction
- Navigate to python_tutorials and read through the two html files (Introduction and python_tutorial_2)
- Note, these are html files so you'll need
Near-term Goals¶
- Set up Windows Subsystem for Linux (WSL)
- Fork the ML Beam Testing GitHub repository
- Containerize ML Beam Testing GitHub repository with Docker
Homework¶
May 29th, 2024¶
Setting Up Jupyter Notebooks in Docker¶
- Create a directory in which to store, build, and run the Docker container
cd /path/to/directory
mkdir beamtest_dir
cd beamtest_dir
- Once in the new directory, create a Dockerfile
touch Dockerfile
- Using the text editor (e.g., vim, gedit, or emacs) of your choice, edit the Dockerfile with the following information
# Use the official Python 3.9 image as the base image
FROM python:3.9
# Install Jupyter Notebook and other dependencies
RUN pip install --no-cache-dir jupyter
# Create a working directory
WORKDIR /workspace
# Expose the Jupyter Notebook port
EXPOSE 8888
# Set the default command to start Jupyter Notebook
CMD ["jupyter", "notebook", "--ip=0.0.0.0", "--port=8888", "--no-browser", "--allow-root"]
- Build the Docker image (e.g., helloworld)
docker build -t helloworld .
- Run the Docker container
docker run -p 8888:8888 helloworld
- Jupyter Notebook is now running. Open a browser and either
- Go to the token authorization screen at
http://localhost:8888
- Or, to bypass the token screen, copy and paste the URL printed in the terminal into your browser
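The -p 8888:8888 option publishes container port 8888 on host port 8888, so docker run cannot start if something else (for example, an earlier container that was never stopped) is already listening there. A minimal Python sketch for checking the host port first; the helper name port_is_free is hypothetical, and it assumes the listener is reachable on the loopback address.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing is currently accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(1.0)
        # connect_ex returns 0 only when a listener accepts the connection
        return probe.connect_ex((host, port)) != 0


if __name__ == "__main__":
    print("port 8888 free:", port_is_free(8888))
```

If the port is busy, docker ps shows whether a running container is the one holding it.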
Near-term Goals¶
- Try to get the Python tutorial Jupyter notebooks working in a container.
Homework¶
May 30th, 2024¶
Setting Up Bash Script for Running Docker¶
- Create a bash script
touch run_docker.sh
- Using the text editor (e.g., vim, gedit, or emacs) of your choice, edit run_docker.sh
#!/bin/bash
# Build docker image ($1 is the first bash script argument)
docker build -t "$1" .
# Run the docker container (naming it with the '_container' suffix)
docker run -p 8888:8888 --name "${1}_container" "$1"
- Adjust permissions with chmod to allow script execution
chmod 755 run_docker.sh
- Execute run_docker.sh with container name (e.g., helloworld)
./run_docker.sh helloworld
The Docker image should now be built and the container should be running. Follow the May 29th instructions to open Jupyter Notebook in a browser.
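For reference, a Python sketch of the two commands run_docker.sh issues for a given name, with a check added because Docker image names must be lowercase. The function name docker_commands and the regular expression are illustrative assumptions; the pattern is a simplified version of Docker's rule for a single name component (no registry, slash, or tag).

```python
import re

# Simplified rule for one image-name component: lowercase letters and digits,
# separated by '.', '_', '__', or one or more hyphens (assumption: no tag or path).
_IMAGE_NAME = re.compile(r"[a-z0-9]+(?:(?:\.|_{1,2}|-+)[a-z0-9]+)*")


def docker_commands(name: str) -> list[list[str]]:
    """Return the build and run commands that run_docker.sh issues for `name`."""
    if not _IMAGE_NAME.fullmatch(name):
        raise ValueError(f"not a valid lowercase image name: {name!r}")
    build = ["docker", "build", "-t", name, "."]
    run = ["docker", "run", "-p", "8888:8888", "--name", f"{name}_container", name]
    return [build, run]


if __name__ == "__main__":
    for command in docker_commands("helloworld"):
        print(" ".join(command))
```

Note that a stopped container keeps its name, so running the script a second time with the same name requires removing the old container first (docker rm helloworld_container).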
Setting Up Bash Script for committing Docker container¶
Do this in a second terminal while the Docker container is running.
- Create a bash script
touch commit_docker.sh
- Using the text editor (e.g., vim, gedit, or emacs) of your choice, edit commit_docker.sh
#!/bin/bash
# Create a formatted date string to organize committed snapshots
# - `date` prints the system date and time; a leading `+` starts a format string.
# - `%Y` is the year, `%m` the month, `%d` the day.
# - `%H` is the hour (24-hour format), `%M` the minute, `%S` the second.
f_date=$(date +%Y-%m-%d_h%Hm%Ms%S)
# While a docker container is running (with root name given as first argument),
# commit it to a dated image so that changes in the container are saved
docker commit "${1}_container" "${1}_${f_date}"
- Adjust permissions with chmod to allow script execution
chmod 755 commit_docker.sh
- Execute commit_docker.sh with the same root name used for run_docker.sh (e.g., helloworld)
./commit_docker.sh helloworld
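To see what name the commit script produces, here is a small Python sketch that applies the same format codes with datetime.strftime (the codes match those of the date command). The function name snapshot_name is hypothetical.

```python
from datetime import datetime


def snapshot_name(root: str, when: datetime) -> str:
    """Build '<root>_<YYYY-MM-DD>_h<HH>m<MM>s<SS>', as commit_docker.sh does."""
    return f"{root}_{when.strftime('%Y-%m-%d_h%Hm%Ms%S')}"


print(snapshot_name("helloworld", datetime(2024, 5, 30, 14, 5, 9)))
# helloworld_2024-05-30_h14m05s09
```

Because every field is zero-padded and ordered from year down to second, the names sort chronologically when sorted as plain text.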
Useful Docker Commands¶
- To see all docker images
docker images
- To see all running Docker containers
docker ps
- To stop a specific Docker container
docker stop <container_id>
- To stop all running Docker containers
docker stop $(docker ps -q)
Near-term Goals¶
- Try to get the Python tutorial Jupyter notebooks working in a container.
- Take a look at the GitHub ML Beam Test Repository
- In particular, look at the script solid_beamtest_hallc_2022/aiml/Pencil_Beam/Pencil_Cher_ML.ipynb
- Try to get this notebook working in a container by creating a requirements.txt file (use this site as a general guide)
- We will cover this in detail tomorrow
Homework¶
May 31st, 2024¶
Clone a git repository into the Docker container by adding a line to the Dockerfile¶
- Using the text editor (e.g., vim, gedit, or emacs) of your choice, edit the Dockerfile with the following information (the python tutorial GitHub repo is used as an example)
# Use the official Python 3.9 image as the base image
FROM python:3.9
# Install Jupyter Notebook and other dependencies
RUN pip install --no-cache-dir jupyter
# Create a working directory
WORKDIR /workspace
# Clone GitHub repo
RUN git clone https://github.com/trottar/UVA_summer_students.git
# Expose the Jupyter Notebook port
EXPOSE 8888
# Set the default command to start Jupyter Notebook
CMD ["jupyter", "notebook", "--ip=0.0.0.0", "--port=8888", "--no-browser", "--allow-root"]
Quick requirements.txt introduction¶
- requirements.txt is a way to install many Python packages at once: pip reads the text file line by line and installs each package listed
- This text file can either be generated...
- Manually, by writing the required python packages into the text file.
- Note: it is good practice to pin the version of each package, for example
matplotlib==3.3.4
- Automatically, by running the following command
pip freeze > requirements.txt
- Note: This command lists every package installed in the current environment, not only the ones the project needs.
- A Docker container can install all these packages upon building its image by adding a few lines to the Dockerfile
- Using the text editor (e.g., vim, gedit, or emacs) of your choice, edit the Dockerfile with the following information
# Use the official Python 3.9 image as the base image
FROM python:3.9
# Use the most up-to-date version of pip3.9
RUN pip install --upgrade pip
# Install git and ssh
RUN apt-get update && apt-get install -y git openssh-client
# Create a working directory
WORKDIR /workspace
# Clone GitHub
RUN git clone https://github.com/trottar/solid_beamtest_hallc_2022.git
# copy the dependencies file to the working directory
COPY requirements.txt .
# install dependencies
RUN pip install -r requirements.txt
# Expose the Jupyter Notebook port
EXPOSE 8888
# Set the default command to start Jupyter Notebook
CMD ["jupyter", "notebook", "--ip=0.0.0.0", "--port=8888", "--no-browser", "--allow-root"]
- Note: I also made some other adjustments such as...
- Moved the pip installation of jupyter directly into the requirements.txt
- Added an installation for git and ssh to avoid completely clean image creations.
- Added a pip install --upgrade pip to ensure pip is the latest version for Python 3.9
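Since pinning versions is recommended above, a quick way to spot unpinned entries is sketched below in Python. This is a simplified check under stated assumptions: the function name unpinned is hypothetical, the example file contents are made up, and anything without an exact "==" pin is reported (pip options and comments are skipped).

```python
def unpinned(requirements_text: str) -> list[str]:
    """Return requirement lines that do not pin an exact version with '=='."""
    missing = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if "==" not in line:
            missing.append(line)
    return missing


example = """\
# hypothetical requirements.txt contents
matplotlib==3.3.4
jupyter
pandas>=1.2
"""
print(unpinned(example))  # ['jupyter', 'pandas>=1.2']
```

To check a real file, pass in the text read from requirements.txt instead of the example string.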
Useful Docker Commands¶
- Remove all stopped containers, unused networks, unused images, and build cache (use with caution)
docker system prune -a
Near-term Goals¶
- Take a look at the GitHub ML Beam Test Repository
- In particular, look at the script solid_beamtest_hallc_2022/aiml/Pencil_Beam/Pencil_Cher_ML.ipynb
- Try to get this notebook working in a container by creating a requirements.txt file (use this site as a general guide)
Homework¶
June 3rd, 2024¶
First Time General Docker Procedure¶
- Clear out previous images and containers (note that this also deletes committed snapshots that no existing container uses)
docker system prune -a
- Build the Docker image and run the container with bash script
./run_docker.sh beamtest
- While the container is running, create snapshots of changes by committing
./commit_docker.sh beamtest
- IMPORTANT: Properly stop a container by checking its ID with docker ps, then
docker stop <container_ID>
Running a snapshot¶
- Snapshots are already-built images, so no rebuilding is required
- To run a container from a snapshot...
- List all built images/snapshots
docker images
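- Run the specific snapshot, naming the container so that commit_docker.sh can find it (if an earlier container with that name still exists, remove it first with docker rm beamtest_container)
docker run -p 8888:8888 --name beamtest_container <snapshot_name>
- While the snapshot's container is running, create further snapshots of changes by committing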
./commit_docker.sh beamtest
- IMPORTANT: Properly stop a container by checking its ID with docker ps, then
docker stop <container_ID>
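Because commit_docker.sh names each snapshot with a zero-padded date and time, the most recent snapshot can be picked by plain text comparison. A Python sketch under that assumption; the function name latest_snapshot and the example image names are hypothetical.

```python
from typing import Optional


def latest_snapshot(image_names: list[str], root: str) -> Optional[str]:
    """Return the newest '<root>_<timestamp>' name, or None if there is none."""
    prefix = f"{root}_"
    snapshots = [name for name in image_names if name.startswith(prefix)]
    # Zero-padded year-month-day_hour-minute-second sorts chronologically as text
    return max(snapshots, default=None)


names = [
    "beamtest",
    "beamtest_2024-06-03_h09m15s00",
    "beamtest_2024-06-03_h16m02s41",
    "python",
]
print(latest_snapshot(names, "beamtest"))  # beamtest_2024-06-03_h16m02s41
```

The list of names would come from the docker images output; here it is typed in by hand for illustration.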
Near-term Goals¶
- Take a look at the GitHub ML Beam Test Repository
- In particular, look at the script solid_beamtest_hallc_2022/aiml/Pencil_Beam/Pencil_Cher_ML.ipynb
- Try to get this notebook working in a container by creating a requirements.txt file (use this site as a general guide)
- Compare the plots in this notebook with those in Darren's report to understand what they show.
Homework¶
June 4th, 2024¶
In a Docker container, grab CSV files from a download link to use in Jupyter scripts¶
- Using the text editor (e.g., vim, gedit, or emacs) of your choice, edit the Dockerfile with the following information
# Use the official Python 3.9 image as the base image
FROM python:3.9
# Use the most up-to-date version of pip3.9
RUN pip install --upgrade pip
# Install git and ssh
RUN apt-get update && apt-get install -y git openssh-client
# Create a working directory
WORKDIR /workspace
# Clone GitHub
RUN git clone https://github.com/trottar/solid_beamtest_hallc_2022.git
# copy the dependencies file to the working directory
COPY requirements.txt .
# install dependencies
RUN pip install -r requirements.txt
# Create new directory (Sim_CSV) to store data files for simulations
RUN mkdir -p /workspace/solid_beamtest_hallc_2022/aiml/Pencil_Beam/Sim_CSV
# Grab CSV files from OneDrive
RUN wget -O /workspace/solid_beamtest_hallc_2022/aiml/Pencil_Beam/Sim_CSV/Bkg_CherChannels.csv "https://myuva-my.sharepoint.com/:x:/g/personal/nar2rk_virginia_edu/Ee9RmPVcNwdIjUTFcES0XXIB7H0mNnieXicFxWfVobY0vg?e=8TJFbN&download=1"
RUN wget -O /workspace/solid_beamtest_hallc_2022/aiml/Pencil_Beam/Sim_CSV/Sim_CherChannels.csv "https://myuva-my.sharepoint.com/:x:/g/personal/nar2rk_virginia_edu/EbeTrS8gka9GpzQx3ZMjev4BvrqUJMogjMQ6WfiaVdn7JA?e=N50RL9&download=1"
RUN wget -O /workspace/solid_beamtest_hallc_2022/aiml/Pencil_Beam/Sim_CSV/Sim_CherChannels_500k.csv "https://myuva-my.sharepoint.com/:x:/g/personal/nar2rk_virginia_edu/ESutJhLkFyFLqOTwHNPYy9YB50Tvwmx2La2MZrgTLtTUww?e=PPbXln&download=1"
RUN wget -O /workspace/solid_beamtest_hallc_2022/aiml/Pencil_Beam/Sim_CSV/Sim_Pencil.csv "https://myuva-my.sharepoint.com/:x:/g/personal/nar2rk_virginia_edu/EbWm9NHCFh1MhpPM72nkXQIBF4D23Y62KGDNK38-hr_kgA?e=iGtYba&download=1"
RUN wget -O /workspace/solid_beamtest_hallc_2022/aiml/Pencil_Beam/Sim_CSV/Sim_Pencil_AllEvents.csv "https://myuva-my.sharepoint.com/:x:/g/personal/nar2rk_virginia_edu/EaBQxnnHPcZFgVI3gizyWtcBzjLd9NNvToYnUmFREopwSA?e=kLX1Sd&download=1"
RUN wget -O /workspace/solid_beamtest_hallc_2022/aiml/Pencil_Beam/Sim_CSV/Sim_Pencil_AllEvents_TID1.csv "https://myuva-my.sharepoint.com/:x:/g/personal/nar2rk_virginia_edu/EYqKKiXhdIxNo1EuUhzuDYcB7PhYbD0Cuv0fiG1CfiJwdQ?e=a6XeeS&download=1"
RUN wget -O /workspace/solid_beamtest_hallc_2022/aiml/Pencil_Beam/Sim_CSV/Sim_Pencil_Bkg.csv "https://myuva-my.sharepoint.com/:x:/g/personal/nar2rk_virginia_edu/EbDFEIyjgPxDkMEd4gJYHFEB6fmfdPkbuNYGGC6QBM4hBQ?e=n1gIEW&download=1"
RUN wget -O /workspace/solid_beamtest_hallc_2022/aiml/Pencil_Beam/Sim_CSV/Sim_Pencil_CherChannels_AllEvents.csv "https://myuva-my.sharepoint.com/:x:/g/personal/nar2rk_virginia_edu/EQBKDvJ9_TxLoNTXfZfMxcIBFmFaDHdhfTILSv-C5hExhA?e=BZ2dfM&download=1"
RUN wget -O /workspace/solid_beamtest_hallc_2022/aiml/Pencil_Beam/Sim_CSV/Sim_Pencil_CherChannels_AllEvents_TID1.csv "https://myuva-my.sharepoint.com/:x:/g/personal/nar2rk_virginia_edu/EZRZ9H7IDdNMukCrELRv7SwB_QIORnLSUCDUJgZ2s_tj6g?e=GjLuwX&download=1"
# Expose the Jupyter Notebook port
EXPOSE 8888
# Set the default command to start Jupyter Notebook
CMD ["jupyter", "notebook", "--ip=0.0.0.0", "--port=8888", "--no-browser", "--allow-root"]
- At this point, everything should be ready to fully run the first script, solid_beamtest_hallc_2022/aiml/Pencil_Beam/Pencil_Cher_ML.ipynb
- Note: These are huge CSV files (2-9 GB) so Jupyter may crash when reading these into Pandas. I am working on ways to limit the burden on your PCs, but, in the meantime, you can update the following cell
%%time
# Set the maximum number of rows to read in from the CSV
lim_rows = 50000
#Full
raw_sim_df = pd.read_csv("Sim_CSV/Sim_Pencil_AllEvents_TID1.csv", nrows=lim_rows)
raw_bkg_df = pd.read_csv("Sim_CSV/Sim_Pencil_Bkg.csv", nrows=lim_rows)
#Cher Arrays
raw_sim_cher = pd.read_csv("Sim_CSV/Sim_Pencil_CherChannels_AllEvents_TID1.csv", names=np.core.defchararray.add(np.array(16*["Cher"]), np.arange(0,16,1).astype(str)), nrows=lim_rows)
raw_sim_cher["pid"] = raw_sim_df["pid"]
raw_bkg_cher = pd.read_csv("Sim_CSV/Bkg_CherChannels.csv", names=np.core.defchararray.add(np.array(16*["Cher"]), np.arange(0,16,1).astype(str)), nrows=lim_rows)
raw_bkg_cher["pid"] = raw_bkg_df["pid"]
#raw_sim_df["Npesum"] = raw_sim_df.iloc[:,]
sim_df = raw_sim_df #[raw_sim_df["GEM00_Edep"]>35e-6]#[(raw_sim_df["PreShSum"]>0)].reset_index(drop=1)
bkg_df = raw_bkg_df #[GEM00_Edep >35e-6]#[(raw_bkg_df["PreShSum"]>0)&(raw_bkg_df["ShowerSum"]>0)].reset_index(drop=1)
sim_cher = raw_sim_cher
bkg_cher = raw_bkg_cher
sim_df
- WARNING: Limiting the number of entries in this way will break some things later on. In particular, see how this changes the plots compared to Darren's. If you're feeling daring, try to debug these errors. HINT: The number of entries in the pid column will change, which affects some loops.
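The names= argument in the cell above builds the 16 column labels Cher0 through Cher15 by concatenating NumPy string arrays. A plain-Python sketch that produces the same labels is shown below; it may be useful if the np.core path emits a deprecation warning on newer NumPy releases (an assumption about your installed version). The function name cher_column_names is hypothetical.

```python
def cher_column_names(n_channels: int = 16) -> list[str]:
    """Return ['Cher0', 'Cher1', ..., 'Cher<n_channels - 1>']."""
    return [f"Cher{i}" for i in range(n_channels)]


columns = cher_column_names()
print(columns[0], columns[-1], len(columns))  # Cher0 Cher15 16
```

The resulting list can be passed as names= to pd.read_csv in place of the NumPy expression.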
Near-term Goals¶
- Take a look at the GitHub ML Beam Test Repository
- In particular, look at the script solid_beamtest_hallc_2022/aiml/Pencil_Beam/Pencil_Cher_ML.ipynb
- Try to get this notebook working in a container by creating a requirements.txt file (use this site as a general guide)
- Compare the plots in this notebook with those in Darren's report to understand what they show.
Homework¶
June 11th, 2024¶
Introduction: PID Using Machine-Learning Methods for SoLID Beam Test Analysis¶
June 25th, 2024¶
SoLID Workshop slides¶
Today's presentation¶
- Mohhamed's slides
Near-term Goals¶
- Read through the physics material and collect some questions for next week's meeting.
Homework¶
July 8th, 2024¶
Slides¶
Uproot information¶
- Starting guide
- Tutorial presentation
- Tutorial lesson
- Uproot to Pandas/Numpy
- TTree handling
- Uproot and awkward array tutorial
Near-term Goals¶
- Mohhamed:
- Read root data with uproot and begin training data
- Make list of all necessary root leaves
- Taylor:
- Once Richard gives you the Beam Test CSV data, begin applying classical PID cuts. Eventually, the goal is to reproduce Spencer's results.
- Kadosa:
- Keep practicing Python for class. Once comfortable, move on to classical PID cuts.
Homework¶
Updated by Richard Trotta about 2 months ago · 70 revisions