Revision 57 (Richard Trotta, 06/11/2024 09:56 AM) → Revision 58/73 (Richard Trotta, 06/25/2024 10:28 AM)
{{>toc}}

h1. ML beam test PID Meetings

---

h2. "Weekly Meeting Zoom Info":https://jlab-org.zoomgov.com/j/1608232612?pwd=MjJlYXRUZzE3R2U2MC83WXRKMGt5QT09

* Meeting ID: 160 823 2612
* Passcode: 575365

---

h2. Summer 2024

---

h3. May 28th, 2024

* Darren ran with Python 3.9
* Docker (containerization) Definition
** Containerization is a technology that allows developers to package and run applications along with all their dependencies in isolated environments called containers. This ensures that the application runs consistently across different computing environments, from a developer's laptop to testing, staging, and production.
** Docker is a popular platform that simplifies containerization. It provides tools to create, deploy, and manage containers. With Docker, developers can write code locally, share their work with colleagues, and deploy to production in a seamless and efficient manner. Docker containers are lightweight, fast, and portable, making them ideal for modern software development and deployment.
* "A Crash Course for Summer Research":https://drive.google.com/file/d/1-TPLo5VGSwCUDjqj_j4g--p4FbBgYHbw/view?usp=sharing
* "GitHub and Python Introduction":https://github.com/trottar/UVA_summer_students
** Navigate to "python_tutorials":https://github.com/trottar/UVA_summer_students/tree/master/python_tutorials and read through the two html files ("Introduction":https://github.com/trottar/UVA_summer_students/blob/master/python_tutorials/Introduction.ipynb and "python_tutorial_2":https://github.com/trottar/UVA_summer_students/blob/master/python_tutorials/python_tutorial_2.ipynb)
*** Note, these are html files so you'll need

h4. Near-term Goals

# Set up the Windows Subsystem for Linux (WSL)
# Fork the ML Beam Testing GitHub repository
# Containerize the ML Beam Testing GitHub repository with Docker

h4. Homework

* "Read chapters 0-3 and finish exercises":https://drive.google.com/file/d/1-TPLo5VGSwCUDjqj_j4g--p4FbBgYHbw/view?usp=sharing

---

h3. May 29th, 2024

h4. Setting Up Jupyter Notebooks in Docker

# Create a directory to store, build, and run the Docker container
<pre><code class="bash">
cd /path/to/directory
mkdir beamtest_dir
cd beamtest_dir
</code></pre>
# Once in the new directory, create a Dockerfile
<pre><code class="bash">
touch Dockerfile
</code></pre>
# Using the text editor of your choice (e.g., vim, gedit, or emacs), edit the Dockerfile with the following information
<pre><code class="bash">
# Use the official Python 3.9 image as the base image
FROM python:3.9

# Install Jupyter Notebook and other dependencies
RUN pip install --no-cache-dir jupyter

# Create a working directory
WORKDIR /workspace

# Expose the Jupyter Notebook port
EXPOSE 8888

# Set the default command to start Jupyter Notebook
CMD ["jupyter", "notebook", "--ip=0.0.0.0", "--port=8888", "--no-browser", "--allow-root"]
</code></pre>
# Build the Docker image (e.g., helloworld)
<pre><code class="bash">
docker build -t helloworld .
</code></pre>
# Run the Docker container
<pre><code class="bash">
docker run -p 8888:8888 helloworld
</code></pre>
# Jupyter Notebook is now running. Open a browser and either
* go to the following address, which brings up the token authorization screen
<pre><code class="bash">
http://localhost:8888
</code></pre>
* or, to bypass the token screen, copy/paste the URL printed in the terminal (it includes the token) into your browser

h4. Near-term Goals

# Try to get the "python tutorial":https://github.com/trottar/UVA_summer_students Jupyter notebooks working in a container.

h4. Homework

* "Read chapters 0-3 and finish exercises":https://drive.google.com/file/d/1-TPLo5VGSwCUDjqj_j4g--p4FbBgYHbw/view?usp=sharing

---

h3. May 30th, 2024

h4. Setting Up Bash Script for Running Docker

# Create a bash script
<pre><code class="bash">
touch run_docker.sh
</code></pre>
# Using the text editor of your choice (e.g., vim, gedit, or emacs), edit run_docker.sh
<pre><code class="bash">
#!/bin/bash

# Build the docker image ($1 is the first bash script argument)
docker build -t "$1" .

# Run the docker container (naming it with the '_container' suffix)
docker run -p 8888:8888 --name "${1}_container" "$1"
</code></pre>
# Adjust permissions with "chmod":https://www.theserverside.com/blog/Coffee-Talk-Java-News-Stories-and-Opinions/how-permissions-chmod-with-numbers-command-explained-777-rwx-unix to allow script execution
<pre><code class="bash">
chmod 755 run_docker.sh
</code></pre>
# Execute run_docker.sh with an image name (e.g., helloworld)
<pre><code class="bash">
./run_docker.sh helloworld
</code></pre>

The docker image should now be built and the container running. Follow the instructions from "yesterday":https://redmine.jlab.org/projects/uva-phys-zheng/wiki/ML_beam_test_PID_Meetings#May-29th-2024 to continue.

h4. Setting Up Bash Script for committing Docker container

This should be done in a separate terminal while a docker container is running.

# Create a bash script
<pre><code class="bash">
touch commit_docker.sh
</code></pre>
# Using the text editor of your choice (e.g., vim, gedit, or emacs), edit commit_docker.sh
<pre><code class="bash">
#!/bin/bash

# Create a formatted string with the date to organize committed snapshots
# - `date` is a command that prints or sets the system date and time.
# - The leading `+` introduces the output format string.
# - `%H` extracts the hour in 24-hour format.
# - `%M` extracts the minute.
# - `%S` extracts the second.
# - `%Y` extracts the year.
# - `%m` extracts the month.
# - `%d` extracts the day.
f_date=$(date +%Y-%m-%d_h%Hm%Ms%S)

# While a docker container is running (with root name given as first argument),
# commit it to a dated image (snapshot) so that changes in the container are saved
docker commit "${1}_container" "${1}_${f_date}"
</code></pre>
# Adjust permissions with "chmod":https://www.theserverside.com/blog/Coffee-Talk-Java-News-Stories-and-Opinions/how-permissions-chmod-with-numbers-command-explained-777-rwx-unix to allow script execution
<pre><code class="bash">
chmod 755 commit_docker.sh
</code></pre>
# Execute commit_docker.sh with the same root name used for run_docker.sh (e.g., helloworld)
<pre><code class="bash">
./commit_docker.sh helloworld
</code></pre>

h4. Useful Docker Commands

* To see all docker images
<pre><code class="bash">
docker images
</code></pre>
* To see all running docker containers
<pre><code class="bash">
docker ps
</code></pre>
* To stop a specific docker container
<pre><code class="bash">
docker stop <container_id>
</code></pre>
* To stop all running docker containers
<pre><code class="bash">
docker stop $(docker ps -q)
</code></pre>

h4. Near-term Goals

# Try to get the "python tutorial":https://github.com/trottar/UVA_summer_students Jupyter notebooks working in a container.
# Take a look at the "GitHub ML Beam Test Repository":https://github.com/JeffersonLab/solid_beamtest_hallc_2022/tree/main/aiml
* In particular, look at the script "solid_beamtest_hallc_2022/aiml/Pencil_Beam/Pencil_Cher_ML.ipynb":https://github.com/JeffersonLab/solid_beamtest_hallc_2022/blob/main/aiml/Pencil_Beam/Pencil_Cher_ML.ipynb
** Try to get this notebook working in a container by creating a requirements.txt file (use this "site as a general guide":https://www.docker.com/blog/containerized-python-development-part-1/)
** We will cover this in detail tomorrow

h4. Homework

* "Read chapters 0-3 and finish exercises":https://drive.google.com/file/d/1-TPLo5VGSwCUDjqj_j4g--p4FbBgYHbw/view?usp=sharing

---

h3. May 31st, 2024

h4. Clone a git repository into the Docker container by adding a line to the Dockerfile

* Using the text editor of your choice (e.g., vim, gedit, or emacs), edit the Dockerfile with the following information (the python tutorial GitHub repo is used as an example)
<pre><code class="bash">
# Use the official Python 3.9 image as the base image
FROM python:3.9

# Install Jupyter Notebook and other dependencies
RUN pip install --no-cache-dir jupyter

# Create a working directory
WORKDIR /workspace

# Clone GitHub repo
RUN git clone https://github.com/trottar/UVA_summer_students.git

# Expose the Jupyter Notebook port
EXPOSE 8888

# Set the default command to start Jupyter Notebook
CMD ["jupyter", "notebook", "--ip=0.0.0.0", "--port=8888", "--no-browser", "--allow-root"]
</code></pre>

h4. Quick requirements.txt introduction

* requirements.txt is a way to install many python packages at once: list the packages in a text file, one per line, and pip reads through the file and installs each entry
* This text file can be generated either...
** Manually, by writing the required python packages into the text file.
*** Note: it is good practice to specify the version of each package, such as...
<pre><code class="bash">
matplotlib==3.3.4
</code></pre>
** Automatically, by running the following command
<pre><code class="bash">
pip freeze > requirements.txt
</code></pre>
*** *Note:* This command will include all packages installed in the current environment, not just the ones your project needs.
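
The pinning advice and the pip freeze caveat above can be combined in a small script. The sketch below is a hypothetical helper (it is not part of pip or of the beam test repository) and assumes Python 3.8+ for @importlib.metadata@: @pin_installed@ writes @name==version@ lines only for the packages you list, and @split_pinned@ flags entries in an existing requirements.txt that lack an exact @==@ pin. The package names in the example are illustrative.

<pre><code class="python">
# Hypothetical helpers (not part of pip) for keeping requirements.txt lean and pinned
import re
from importlib.metadata import PackageNotFoundError, version

# pip treats "#" at the start of a line, or after whitespace, as the start of a comment
COMMENT_RE = re.compile(r"(^|\s+)#.*$")


def pin_installed(packages):
    """Return 'name==version' lines for the listed packages only.

    Unlike `pip freeze`, only the packages you name are included. A package
    that is not installed in the current environment is left unpinned.
    """
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            lines.append(name)  # not installed here, so no version to pin
    return lines


def split_pinned(requirements_text):
    """Split requirements.txt content into (pinned, unpinned) entries."""
    pinned, unpinned = [], []
    for raw in requirements_text.splitlines():
        line = COMMENT_RE.sub("", raw).strip()
        if not line or line.startswith("-"):  # skip blank lines and options such as -r
            continue
        (pinned if "==" in line else unpinned).append(line)
    return pinned, unpinned


if __name__ == "__main__":
    # Print pinned lines for a few packages a notebook might import (illustrative list)
    print("\n".join(pin_installed(["jupyter", "numpy", "pandas", "matplotlib"])))
</code></pre>

Saved as, e.g., @pin_requirements.py@ (a hypothetical file name), it can be run inside the environment whose versions you want, with @python pin_requirements.py > requirements.txt@, which mirrors the @pip freeze@ redirect but only lists what you name. @split_pinned(open("requirements.txt").read())@ then shows which entries still lack a version.
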
* A Docker container can install all these packages when its image is built by adding a few lines to the Dockerfile
* Using the text editor of your choice (e.g., vim, gedit, or emacs), edit the Dockerfile with the following information
<pre><code class="bash">
# Use the official Python 3.9 image as the base image
FROM python:3.9

# Use the most up-to-date version of pip for Python 3.9
RUN pip install --upgrade pip

# Install git and an SSH client
RUN apt-get update && apt-get install -y git openssh-client

# Create a working directory
WORKDIR /workspace

# Clone GitHub repo
RUN git clone https://github.com/trottar/solid_beamtest_hallc_2022.git

# Copy the dependencies file to the working directory
COPY requirements.txt .

# Install dependencies
RUN pip install -r requirements.txt

# Expose the Jupyter Notebook port
EXPOSE 8888

# Set the default command to start Jupyter Notebook
CMD ["jupyter", "notebook", "--ip=0.0.0.0", "--port=8888", "--no-browser", "--allow-root"]
</code></pre>
* *Note:* I also made some other adjustments, such as...
** Moved the pip installation of jupyter directly into the requirements.txt
** Added an installation for git and ssh to avoid completely clean image creations.
** Added a pip install --upgrade pip to ensure pip is the latest version for Python 3.9

h4. Useful Docker Commands

* Remove all stopped containers, unused networks, all images not used by at least one container, and the build cache (use with caution)
<pre><code class="bash">
docker system prune -a
</code></pre>

h4. Near-term Goals

# Take a look at the "GitHub ML Beam Test Repository":https://github.com/JeffersonLab/solid_beamtest_hallc_2022/tree/main/aiml
* In particular, look at the script "solid_beamtest_hallc_2022/aiml/Pencil_Beam/Pencil_Cher_ML.ipynb":https://github.com/JeffersonLab/solid_beamtest_hallc_2022/blob/main/aiml/Pencil_Beam/Pencil_Cher_ML.ipynb
** Try to get this notebook working in a container by creating a requirements.txt file (use this "site as a general guide":https://www.docker.com/blog/containerized-python-development-part-1/)

h4. Homework

* "Read chapters 0-3 and finish exercises":https://drive.google.com/file/d/1-TPLo5VGSwCUDjqj_j4g--p4FbBgYHbw/view?usp=sharing

---

h3. June 3rd, 2024

h4. First Time General Docker Procedure

# Clear out previous images and containers
<pre><code class="bash">
docker system prune -a
</code></pre>
# Build the Docker image and run the container with the bash script
<pre><code class="bash">
./run_docker.sh beamtest
</code></pre>
# While the container is running, create snapshots of changes by committing (in a separate terminal)
<pre><code class="bash">
./commit_docker.sh beamtest
</code></pre>
# *IMPORTANT* Properly stop a container by checking its ID with _docker ps_, then run
<pre><code class="bash">
docker stop <container_ID>
</code></pre>

h4. Running a snapshot

* Snapshots are already-built images, so no rebuilding is required
* To run a container from a snapshot...
# Check out all built images/snapshots
<pre><code class="bash">
docker images
</code></pre>
# Run the specific snapshot, naming the container beamtest_container so that commit_docker.sh can find it (if a stopped container with that name already exists, remove it first with _docker rm beamtest_container_)
<pre><code class="bash">
docker run -p 8888:8888 --name beamtest_container <snapshot_name>
</code></pre>
# While the snapshot's container is running, create further snapshots of changes by committing
<pre><code class="bash">
./commit_docker.sh beamtest
</code></pre>
# *IMPORTANT* Properly stop a container by checking its ID with _docker ps_, then run
<pre><code class="bash">
docker stop <container_ID>
</code></pre>

h4. Near-term Goals

# Take a look at the "GitHub ML Beam Test Repository":https://github.com/JeffersonLab/solid_beamtest_hallc_2022/tree/main/aiml
* In particular, look at the script "solid_beamtest_hallc_2022/aiml/Pencil_Beam/Pencil_Cher_ML.ipynb":https://github.com/JeffersonLab/solid_beamtest_hallc_2022/blob/main/aiml/Pencil_Beam/Pencil_Cher_ML.ipynb
** Try to get this notebook working in a container by creating a requirements.txt file (use this "site as a general guide":https://www.docker.com/blog/containerized-python-development-part-1/)
* Compare the plots in this notebook with those in "Darren's report":https://wordpress.its.virginia.edu/zhenggroup/files/2023/10/SoLID_beamtest_ML_PID_Upton.pdf to try to get an understanding.

h4. Homework

* "Read chapters 0-4 and finish exercises":https://drive.google.com/file/d/1-TPLo5VGSwCUDjqj_j4g--p4FbBgYHbw/view?usp=sharing

---

h3. June 4th, 2024

h4. In a Docker container, grab CSV files from a download link to use in Jupyter scripts

* Using the text editor of your choice (e.g., vim, gedit, or emacs), edit the Dockerfile with the following information
<pre><code class="bash">
# Use the official Python 3.9 image as the base image
FROM python:3.9

# Use the most up-to-date version of pip for Python 3.9
RUN pip install --upgrade pip

# Install git and an SSH client
RUN apt-get update && apt-get install -y git openssh-client

# Create a working directory
WORKDIR /workspace

# Clone GitHub repo
RUN git clone https://github.com/trottar/solid_beamtest_hallc_2022.git

# Copy the dependencies file to the working directory
COPY requirements.txt .

# Install dependencies
RUN pip install -r requirements.txt

# Create new directory (Sim_CSV) to store data files for simulations
RUN mkdir -p /workspace/solid_beamtest_hallc_2022/aiml/Pencil_Beam/Sim_CSV

# Grab CSV files from OneDrive
RUN wget -O /workspace/solid_beamtest_hallc_2022/aiml/Pencil_Beam/Sim_CSV/Bkg_CherChannels.csv "https://myuva-my.sharepoint.com/:x:/g/personal/nar2rk_virginia_edu/Ee9RmPVcNwdIjUTFcES0XXIB7H0mNnieXicFxWfVobY0vg?e=8TJFbN&download=1"
RUN wget -O /workspace/solid_beamtest_hallc_2022/aiml/Pencil_Beam/Sim_CSV/Sim_CherChannels.csv "https://myuva-my.sharepoint.com/:x:/g/personal/nar2rk_virginia_edu/EbeTrS8gka9GpzQx3ZMjev4BvrqUJMogjMQ6WfiaVdn7JA?e=N50RL9&download=1"
RUN wget -O /workspace/solid_beamtest_hallc_2022/aiml/Pencil_Beam/Sim_CSV/Sim_CherChannels_500k.csv "https://myuva-my.sharepoint.com/:x:/g/personal/nar2rk_virginia_edu/ESutJhLkFyFLqOTwHNPYy9YB50Tvwmx2La2MZrgTLtTUww?e=PPbXln&download=1"
RUN wget -O /workspace/solid_beamtest_hallc_2022/aiml/Pencil_Beam/Sim_CSV/Sim_Pencil.csv "https://myuva-my.sharepoint.com/:x:/g/personal/nar2rk_virginia_edu/EbWm9NHCFh1MhpPM72nkXQIBF4D23Y62KGDNK38-hr_kgA?e=iGtYba&download=1"
RUN wget -O /workspace/solid_beamtest_hallc_2022/aiml/Pencil_Beam/Sim_CSV/Sim_Pencil_AllEvents.csv "https://myuva-my.sharepoint.com/:x:/g/personal/nar2rk_virginia_edu/EaBQxnnHPcZFgVI3gizyWtcBzjLd9NNvToYnUmFREopwSA?e=kLX1Sd&download=1"
RUN wget -O /workspace/solid_beamtest_hallc_2022/aiml/Pencil_Beam/Sim_CSV/Sim_Pencil_AllEvents_TID1.csv "https://myuva-my.sharepoint.com/:x:/g/personal/nar2rk_virginia_edu/EYqKKiXhdIxNo1EuUhzuDYcB7PhYbD0Cuv0fiG1CfiJwdQ?e=a6XeeS&download=1"
RUN wget -O /workspace/solid_beamtest_hallc_2022/aiml/Pencil_Beam/Sim_CSV/Sim_Pencil_Bkg.csv "https://myuva-my.sharepoint.com/:x:/g/personal/nar2rk_virginia_edu/EbDFEIyjgPxDkMEd4gJYHFEB6fmfdPkbuNYGGC6QBM4hBQ?e=n1gIEW&download=1"
RUN wget -O /workspace/solid_beamtest_hallc_2022/aiml/Pencil_Beam/Sim_CSV/Sim_Pencil_CherChannels_AllEvents.csv "https://myuva-my.sharepoint.com/:x:/g/personal/nar2rk_virginia_edu/EQBKDvJ9_TxLoNTXfZfMxcIBFmFaDHdhfTILSv-C5hExhA?e=BZ2dfM&download=1"
RUN wget -O /workspace/solid_beamtest_hallc_2022/aiml/Pencil_Beam/Sim_CSV/Sim_Pencil_CherChannels_AllEvents_TID1.csv "https://myuva-my.sharepoint.com/:x:/g/personal/nar2rk_virginia_edu/EZRZ9H7IDdNMukCrELRv7SwB_QIORnLSUCDUJgZ2s_tj6g?e=GjLuwX&download=1"

# Expose the Jupyter Notebook port
EXPOSE 8888

# Set the default command to start Jupyter Notebook
CMD ["jupyter", "notebook", "--ip=0.0.0.0", "--port=8888", "--no-browser", "--allow-root"]
</code></pre>
* At this point, everything should be ready to fully run the first script, "solid_beamtest_hallc_2022/aiml/Pencil_Beam/Pencil_Cher_ML.ipynb":https://github.com/JeffersonLab/solid_beamtest_hallc_2022/blob/main/aiml/Pencil_Beam/Pencil_Cher_ML.ipynb
* *Note:* These are huge CSV files (2-9 GB), so Jupyter may crash when reading them into Pandas. I am working on ways to limit the burden on your PCs, but, in the meantime, you can update the following cell
<pre><code class="python">
%%time
# Set the maximum number of rows to read in from each CSV
lim_rows = 50000

# Full event data
raw_sim_df = pd.read_csv("Sim_CSV/Sim_Pencil_AllEvents_TID1.csv", nrows=lim_rows)
raw_bkg_df = pd.read_csv("Sim_CSV/Sim_Pencil_Bkg.csv", nrows=lim_rows)

# Cherenkov channel arrays (columns Cher0 ... Cher15)
# np.char.add is the public alias of np.core.defchararray.add (np.core is deprecated in NumPy 2.x)
raw_sim_cher = pd.read_csv("Sim_CSV/Sim_Pencil_CherChannels_AllEvents_TID1.csv", names=np.char.add(np.array(16*["Cher"]), np.arange(0,16,1).astype(str)), nrows=lim_rows)
raw_sim_cher["pid"] = raw_sim_df["pid"]
raw_bkg_cher = pd.read_csv("Sim_CSV/Bkg_CherChannels.csv", names=np.char.add(np.array(16*["Cher"]), np.arange(0,16,1).astype(str)), nrows=lim_rows)
raw_bkg_cher["pid"] = raw_bkg_df["pid"]

#raw_sim_df["Npesum"] = raw_sim_df.iloc[:,]
sim_df = raw_sim_df #[raw_sim_df["GEM00_Edep"]>35e-6]#[(raw_sim_df["PreShSum"]>0)].reset_index(drop=1)
bkg_df = raw_bkg_df #[GEM00_Edep>35e-6]#[(raw_bkg_df["PreShSum"]>0)&(raw_bkg_df["ShowerSum"]>0)].reset_index(drop=1)
sim_cher = raw_sim_cher
bkg_cher = raw_bkg_cher
sim_df
</code></pre>
* *WARNING:* Limiting the number of entries in this way will break some things later on. In particular, see how this changes the plots compared to Darren's. If you're feeling daring, try to debug these errors. +HINT+: The number of entries in the column PID will change, which affects some loops.

h4. Near-term Goals

# Take a look at the "GitHub ML Beam Test Repository":https://github.com/JeffersonLab/solid_beamtest_hallc_2022/tree/main/aiml
* In particular, look at the script "solid_beamtest_hallc_2022/aiml/Pencil_Beam/Pencil_Cher_ML.ipynb":https://github.com/JeffersonLab/solid_beamtest_hallc_2022/blob/main/aiml/Pencil_Beam/Pencil_Cher_ML.ipynb
** Try to get this notebook working in a container by creating a requirements.txt file (use this "site as a general guide":https://www.docker.com/blog/containerized-python-development-part-1/)
* Compare the plots in this notebook with those in "Darren's report":https://wordpress.its.virginia.edu/zhenggroup/files/2023/10/SoLID_beamtest_ML_PID_Upton.pdf to try to get an understanding.

h4. Homework

* "Read chapters 0-4 and finish exercises":https://drive.google.com/file/d/1-TPLo5VGSwCUDjqj_j4g--p4FbBgYHbw/view?usp=sharing

---

h3. June 11th, 2024

h4. Introduction: PID Using Machine-Learning Methods for SoLID Beam Test Analysis

* attachment:IntroPIDUsingMachineLearningMethodsforSoLIDBeamTestAnalysis_2024.pdf

---

h3. June 25th, 2024

* "Richard's ECal/SPD SoLID Workshop slides":https://indico.phy.anl.gov/event/51/contributions/279/attachments/205/562/ECal_SPD_2024.pdf
* "Ye Tian's Simulations SoLID Workshop slides":https://indico.phy.anl.gov/event/51/contributions/264/attachments/197/545/06212024_collaboration_meeting_Ye.pdf
* "Mohammed Rafi's"