ML beam test PID Meetings » History » Revision 57

Richard Trotta, 06/11/2024 09:56 AM


ML beam test PID Meetings


Weekly Meeting Zoom Info

  • Meeting ID: 160 823 2612
  • Passcode: 575365

Summer 2024


May 28th, 2024

  • Darren ran with Python 3.9
  • Docker (containerization) Definition
    • Containerization is a technology that allows developers to package and run applications along with all their dependencies in isolated environments called containers. This ensures that the application runs consistently across different computing environments, from a developer's laptop to testing, staging, and production.
    • Docker is a popular platform that simplifies containerization. It provides tools to create, deploy, and manage containers. With Docker, developers can write code locally, share their work with colleagues, and deploy to production in a seamless and efficient manner. Docker containers are lightweight, fast, and portable, making them ideal for modern software development and deployment.

Near-term Goals

  1. Set up Windows Subsystem for Linux (WSL)
  2. Fork the ML Beam Testing GitHub repository
  3. Containerize ML Beam Testing GitHub repository with Docker

Homework


May 29th, 2024

Setting Up Jupyter Notebooks in Docker

  1. Create a directory in which to store, build, and run the Docker container
    cd /path/to/directory
    mkdir beamtest_dir
    cd beamtest_dir
    
  2. Once in the new directory, create a Dockerfile
    touch Dockerfile
    
  3. Using the text editor (e.g., vim, gedit, or emacs) of your choice, edit the Dockerfile with the following information
    # Use the official Python 3.9 image as the base image
    FROM python:3.9
    
    # Install Jupyter Notebook and other dependencies
    RUN pip install --no-cache-dir jupyter
    
    # Create a working directory
    WORKDIR /workspace
    
    # Expose the Jupyter Notebook port
    EXPOSE 8888
    
    # Set the default command to start Jupyter Notebook
    CMD ["jupyter", "notebook", "--ip=0.0.0.0", "--port=8888", "--no-browser", "--allow-root"]
    
  4. Build the Docker image (e.g., helloworld)
    docker build -t helloworld .
    
  5. Run the Docker container

    docker run -p 8888:8888 helloworld
    
  6. Jupyter Notebook is now running. Open a browser and do one of the following
  • Go to the token authorization screen and enter the token printed in the terminal
    http://localhost:8888
    
  • To bypass the token screen, copy and paste the full URL (including the token) that Jupyter prints in the terminal into your browser
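
Steps 1 to 3 above can also be done without opening a text editor. The following is a minimal sketch, assuming a bash-like shell and the same beamtest_dir name as in step 1; it writes the Dockerfile contents from step 3 with a here-document.

```shell
# Create the build directory from step 1 (no error if it already exists)
mkdir -p beamtest_dir

# Write the Dockerfile from step 3; the quoted 'EOF' keeps the text literal
cat > beamtest_dir/Dockerfile <<'EOF'
FROM python:3.9
RUN pip install --no-cache-dir jupyter
WORKDIR /workspace
EXPOSE 8888
CMD ["jupyter", "notebook", "--ip=0.0.0.0", "--port=8888", "--no-browser", "--allow-root"]
EOF

# Show the result before building
cat beamtest_dir/Dockerfile
```

After this, cd into beamtest_dir and continue with step 4.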

Near-term Goals

  1. Try to get the Python tutorial Jupyter notebooks working in a container.

Homework


May 30th, 2024

Setting Up Bash Script for Running Docker

  1. Create a bash script
    touch run_docker.sh
    
  2. Using the text editor (e.g., vim, gedit, or emacs) of your choice, edit run_docker.sh
    #!/bin/bash
    
    # Build the Docker image ($1 is the first argument passed to the script)
    docker build -t "$1" .
    
    # Run the Docker container, naming it with the '_container' suffix
    docker run -p 8888:8888 --name "${1}_container" "$1"
    
  3. Adjust permissions with chmod to allow script execution
    chmod 755 run_docker.sh
    
  4. Execute run_docker.sh with a name for the image (e.g., helloworld)
    ./run_docker.sh helloworld
    

The Docker image should now be built and the container running. Follow the May 29th instructions (step 6) to open Jupyter Notebook.
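
run_docker.sh passes its first argument straight to docker build -t, and Docker rejects image names that are empty or contain uppercase letters. The following is a sketch of a guard that could run before the docker commands. The function name is_valid_image_name is hypothetical, and the check is a simplification of Docker's full naming rules.

```shell
#!/bin/bash

# Hypothetical helper: succeed only for names that start with a lowercase
# letter or digit and contain only lowercase letters, digits, '.', '_', '-'.
# This is a simplified check; Docker's actual rules are stricter.
is_valid_image_name() {
    local LC_ALL=C  # make the a-z range match ASCII lowercase only
    case "$1" in
        "" | [!a-z0-9]* | *[!a-z0-9._-]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Try a few candidate names
for name in helloworld HelloWorld "hello world"; do
    if is_valid_image_name "$name"; then
        echo "ok:       $name"
    else
        echo "rejected: $name"
    fi
done
```

In run_docker.sh, the same function could be called on "$1" and the script could exit with a usage message when the check fails.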

Setting Up Bash Script for Committing Docker Container

Do this in a second terminal while the Docker container is running.

  1. Create a bash script
    touch commit_docker.sh
    
  2. Using the text editor (e.g., vim, gedit, or emacs) of your choice, edit commit_docker.sh
    #!/bin/bash
    
    # Create a formatted date string to organize committed snapshots
    # - `date` prints the system date and time; `+FORMAT` sets the output format.
    # - `%Y` is the year, `%m` the month, and `%d` the day.
    # - `%H` is the hour (24-hour format), `%M` the minute, and `%S` the second.
    f_date=$(date +%Y-%m-%d_h%Hm%Ms%S)
    
    # While a Docker container is running (root name given as the first argument),
    # commit it to a dated image so that changes made in the container are saved
    docker commit "${1}_container" "${1}_${f_date}"
    
  3. Adjust permissions with chmod to allow script execution
    chmod 755 commit_docker.sh
    
  4. Execute commit_docker.sh with the same name used for run_docker.sh (e.g., helloworld)
    ./commit_docker.sh helloworld
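
To see what name a snapshot will get, the date format from commit_docker.sh can be tried on its own, without Docker. This is a small sketch that uses helloworld as the root name.

```shell
# Same format string as in commit_docker.sh
f_date=$(date +%Y-%m-%d_h%Hm%Ms%S)

# Name the snapshot would get for the root name "helloworld"
snapshot="helloworld_${f_date}"

# Prints the root name followed by the current date and time,
# in the shape helloworld_YYYY-MM-DD_hHHmMMsSS
echo "$snapshot"
```

Because the year comes first and every field has a fixed width, snapshots with the same root name sort chronologically by name.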
    

Useful Docker Commands

  • To see all docker images
    docker images
    
  • To see all running Docker containers
    docker ps
    
  • To stop a specific Docker container
    docker stop <container_id>
    
  • To stop all running Docker containers
    docker stop $(docker ps -q)
    

Near-term Goals

  1. Try to get the Python tutorial Jupyter notebooks working in a container.
  2. Take a look at the GitHub ML Beam Test Repository

Homework


May 31st, 2024

Clone a git repository into the Docker container by adding a line to the Dockerfile

  • Using the text editor (e.g., vim, gedit, or emacs) of your choice, edit the Dockerfile with the following information (the python tutorial GitHub repo is used as an example)
    # Use the official Python 3.9 image as the base image
    FROM python:3.9
    
    # Install Jupyter Notebook and other dependencies
    RUN pip install --no-cache-dir jupyter
    
    # Create a working directory
    WORKDIR /workspace
    
    # Clone GitHub repo
    RUN git clone https://github.com/trottar/UVA_summer_students.git
    
    # Expose the Jupyter Notebook port
    EXPOSE 8888
    
    # Set the default command to start Jupyter Notebook
    CMD ["jupyter", "notebook", "--ip=0.0.0.0", "--port=8888", "--no-browser", "--allow-root"]
    

Quick requirements.txt introduction

  • requirements.txt is a text file that lists Python packages, one per line, so that pip can install them all with a single command (pip install -r requirements.txt)
  • This text file can either be generated...
    • Manually, by writing the required python packages into the text file.
      • Note: it is good practice to specify the version of the package. Such as...
        matplotlib==3.3.4
        
    • Automatically, by running the following command
      pip freeze > requirements.txt
      
      • Note: This command will include every package installed in the current Python environment, not only the ones the project needs.
  • A Docker container can install all these packages upon building its image by adding a few lines to the Dockerfile
  • Using the text editor (e.g., vim, gedit, or emacs) of your choice, edit the Dockerfile with the following information
    # Use the official Python 3.9 image as the base image
    FROM python:3.9
    
    # Use the most up-to-date version of pip3.9
    RUN pip install --upgrade pip
    
    # Install git and ssh
    RUN apt-get update && apt-get install -y git openssh-client
    
    # Create a working directory
    WORKDIR /workspace
    
    # Clone GitHub
    RUN git clone https://github.com/trottar/solid_beamtest_hallc_2022.git
    
    # copy the dependencies file to the working directory
    COPY requirements.txt .
    
    # install dependencies
    RUN pip install -r requirements.txt
    
    # Expose the Jupyter Notebook port
    EXPOSE 8888
    
    # Set the default command to start Jupyter Notebook
    CMD ["jupyter", "notebook", "--ip=0.0.0.0", "--port=8888", "--no-browser", "--allow-root"]
    
  • Note: I also made some other adjustments, such as...
    • Moved the pip installation of jupyter into requirements.txt
    • Added an installation of git and ssh so that both are available even when the image is built from scratch
    • Added a pip install --upgrade pip to ensure pip is the latest version for Python 3.9
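
Since pinning versions is recommended above, it can help to check which entries in a requirements file are not pinned. The following is a minimal sketch, assuming that a pinned entry uses ==. The file contents are hypothetical and written to a temporary file so that no real requirements.txt is touched.

```shell
# Write a small example file (hypothetical contents) to a temporary location
req_file=$(mktemp)
cat > "$req_file" <<'EOF'
# plotting
matplotlib==3.3.4
jupyter
EOF

# Drop comments and blank lines, then keep the lines without an exact '==' pin
unpinned=$(grep -vE '^[[:space:]]*(#|$)' "$req_file" | grep -v '==' || true)
echo "Unpinned packages: ${unpinned:-none}"
```

Pointing req_file at the real requirements.txt gives the same report for the project.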

Useful Docker Commands

  • Remove all stopped containers, all unused networks, all images not used by a container, and the build cache (use with caution)
    docker system prune -a
    

Near-term Goals

  1. Take a look at the GitHub ML Beam Test Repository

Homework


June 3rd, 2024

First Time General Docker Procedure

  1. Clear out previous images and containers (this also removes committed snapshots that no container is using)
    docker system prune -a
    
  2. Build the Docker image and run the container with bash script
    ./run_docker.sh beamtest
    
  3. While the container is running, create snapshots of changes by committing
    ./commit_docker.sh beamtest
    
  4. IMPORTANT: Properly stop the container by checking its ID with docker ps, then running
    docker stop <container_ID>
    

Running a snapshot

  • Snapshots are already-built images, so no rebuilding is required
  • To run a container of a snapshot...
    1. Check out all built images/snapshots
      docker images
      
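    2. Run the specific snapshot, naming the container beamtest_container so that commit_docker.sh can find it (if a stopped container with that name still exists, remove it first with docker rm beamtest_container)
      docker run -p 8888:8888 --name beamtest_container <snapshot_name>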
      
    3. While the container of snapshot is running, create further snapshots of changes by committing
      ./commit_docker.sh beamtest
      
    4. IMPORTANT: Properly stop the container by checking its ID with docker ps, then running
      docker stop <container_ID>
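
When many snapshots accumulate, the most recent one for a given root name can be picked by sorting the names, because commit_docker.sh puts the date in year-month-day order. The following is a sketch using a hypothetical list of snapshot names. With Docker available, a real list could be taken from docker images instead.

```shell
# Hypothetical snapshot names in the "<name>_<date>" format written by commit_docker.sh.
# With Docker available, a real list could come from:
#   docker images --format '{{.Repository}}'
snapshots="beamtest_2024-06-03_h09m15s02
beamtest_2024-06-03_h14m40s57
beamtest_2024-05-31_h16m05s33"

# Fixed-width year-month-day_hour-minute-second fields make a plain text sort chronological
latest=$(printf '%s\n' "$snapshots" | LC_ALL=C sort | tail -n 1)
echo "Most recent snapshot: $latest"
```

This only orders snapshots that share the same root name; names with different roots sort by root first.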
      

Near-term Goals

  1. Take a look at the GitHub ML Beam Test Repository

Homework


June 4th, 2024

In a Docker container, grab CSV files from a download link to use in Jupyter scripts

  • Using the text editor (e.g., vim, gedit, or emacs) of your choice, edit the Dockerfile with the following information
    # Use the official Python 3.9 image as the base image
    FROM python:3.9
    
    # Use the most up-to-date version of pip3.9
    RUN pip install --upgrade pip
    
    # Install git and ssh
    RUN apt-get update && apt-get install -y git openssh-client
    
    # Create a working directory
    WORKDIR /workspace
    
    # Clone GitHub
    RUN git clone https://github.com/trottar/solid_beamtest_hallc_2022.git
    
    # copy the dependencies file to the working directory
    COPY requirements.txt .
    
    # install dependencies
    RUN pip install -r requirements.txt
    
    # Create new directory (Sim_CSV) to store data files for simulations
    RUN mkdir -p /workspace/solid_beamtest_hallc_2022/aiml/Pencil_Beam/Sim_CSV
    
    # Grab CSV files from OneDrive
    RUN wget -O /workspace/solid_beamtest_hallc_2022/aiml/Pencil_Beam/Sim_CSV/Bkg_CherChannels.csv "https://myuva-my.sharepoint.com/:x:/g/personal/nar2rk_virginia_edu/Ee9RmPVcNwdIjUTFcES0XXIB7H0mNnieXicFxWfVobY0vg?e=8TJFbN&download=1" 
    RUN wget -O /workspace/solid_beamtest_hallc_2022/aiml/Pencil_Beam/Sim_CSV/Sim_CherChannels.csv "https://myuva-my.sharepoint.com/:x:/g/personal/nar2rk_virginia_edu/EbeTrS8gka9GpzQx3ZMjev4BvrqUJMogjMQ6WfiaVdn7JA?e=N50RL9&download=1" 
    RUN wget -O /workspace/solid_beamtest_hallc_2022/aiml/Pencil_Beam/Sim_CSV/Sim_CherChannels_500k.csv "https://myuva-my.sharepoint.com/:x:/g/personal/nar2rk_virginia_edu/ESutJhLkFyFLqOTwHNPYy9YB50Tvwmx2La2MZrgTLtTUww?e=PPbXln&download=1" 
    RUN wget -O /workspace/solid_beamtest_hallc_2022/aiml/Pencil_Beam/Sim_CSV/Sim_Pencil.csv "https://myuva-my.sharepoint.com/:x:/g/personal/nar2rk_virginia_edu/EbWm9NHCFh1MhpPM72nkXQIBF4D23Y62KGDNK38-hr_kgA?e=iGtYba&download=1" 
    RUN wget -O /workspace/solid_beamtest_hallc_2022/aiml/Pencil_Beam/Sim_CSV/Sim_Pencil_AllEvents.csv "https://myuva-my.sharepoint.com/:x:/g/personal/nar2rk_virginia_edu/EaBQxnnHPcZFgVI3gizyWtcBzjLd9NNvToYnUmFREopwSA?e=kLX1Sd&download=1" 
    RUN wget -O /workspace/solid_beamtest_hallc_2022/aiml/Pencil_Beam/Sim_CSV/Sim_Pencil_AllEvents_TID1.csv "https://myuva-my.sharepoint.com/:x:/g/personal/nar2rk_virginia_edu/EYqKKiXhdIxNo1EuUhzuDYcB7PhYbD0Cuv0fiG1CfiJwdQ?e=a6XeeS&download=1" 
    RUN wget -O /workspace/solid_beamtest_hallc_2022/aiml/Pencil_Beam/Sim_CSV/Sim_Pencil_Bkg.csv "https://myuva-my.sharepoint.com/:x:/g/personal/nar2rk_virginia_edu/EbDFEIyjgPxDkMEd4gJYHFEB6fmfdPkbuNYGGC6QBM4hBQ?e=n1gIEW&download=1" 
    RUN wget -O /workspace/solid_beamtest_hallc_2022/aiml/Pencil_Beam/Sim_CSV/Sim_Pencil_CherChannels_AllEvents.csv "https://myuva-my.sharepoint.com/:x:/g/personal/nar2rk_virginia_edu/EQBKDvJ9_TxLoNTXfZfMxcIBFmFaDHdhfTILSv-C5hExhA?e=BZ2dfM&download=1" 
    RUN wget -O /workspace/solid_beamtest_hallc_2022/aiml/Pencil_Beam/Sim_CSV/Sim_Pencil_CherChannels_AllEvents_TID1.csv "https://myuva-my.sharepoint.com/:x:/g/personal/nar2rk_virginia_edu/EZRZ9H7IDdNMukCrELRv7SwB_QIORnLSUCDUJgZ2s_tj6g?e=GjLuwX&download=1" 
    
    # Expose the Jupyter Notebook port
    EXPOSE 8888
    
    # Set the default command to start Jupyter Notebook
    CMD ["jupyter", "notebook", "--ip=0.0.0.0", "--port=8888", "--no-browser", "--allow-root"]
    
  • At this point, everything should be ready to fully run the first script, solid_beamtest_hallc_2022/aiml/Pencil_Beam/Pencil_Cher_ML.ipynb
  • Note: These are huge CSV files (2-9 GB), so Jupyter may crash when reading them into pandas. I am working on ways to limit the burden on your PCs but, in the meantime, you can update the following cell so that it reads only the first lim_rows rows of each file
    %%time
    
    import pandas as pd
    
    # Set the maximum number of rows to read in from each CSV
    lim_rows = 50000
    
    # Column names for the 16 Cher channels: Cher0, Cher1, ..., Cher15
    cher_names = [f"Cher{i}" for i in range(16)]
    
    #Full
    raw_sim_df = pd.read_csv("Sim_CSV/Sim_Pencil_AllEvents_TID1.csv", nrows=lim_rows)
    raw_bkg_df = pd.read_csv("Sim_CSV/Sim_Pencil_Bkg.csv", nrows=lim_rows)
    
    #Cher Arrays
    raw_sim_cher = pd.read_csv("Sim_CSV/Sim_Pencil_CherChannels_AllEvents_TID1.csv", names=cher_names, nrows=lim_rows)
    raw_sim_cher["pid"] = raw_sim_df["pid"]
    
    raw_bkg_cher = pd.read_csv("Sim_CSV/Bkg_CherChannels.csv", names=cher_names, nrows=lim_rows)
    raw_bkg_cher["pid"] = raw_bkg_df["pid"]
    
    #raw_sim_df["Npesum"] = raw_sim_df.iloc[:,]
    
    sim_df = raw_sim_df #[raw_sim_df["GEM00_Edep"]>35e-6]#[(raw_sim_df["PreShSum"]>0)].reset_index(drop=1)
    bkg_df = raw_bkg_df #[GEM00_Edep >35e-6]#[(raw_bkg_df["PreShSum"]>0)&(raw_bkg_df["ShowerSum"]>0)].reset_index(drop=1)
    
    sim_cher = raw_sim_cher
    bkg_cher = raw_bkg_cher
    
    sim_df
    
  • WARNING: Limiting the number of rows in this way will break some things later on. In particular, see how it changes the plots compared to Darren's. If you're feeling daring, try to debug these errors. HINT: The number of entries in the pid column will change, which affects some loops.
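
Another way to reduce the load, offered here only as a sketch, is to cut a smaller copy of a CSV on disk before the notebook reads it. The example below builds a tiny stand-in file with hypothetical data and assumes the file has a header row. It keeps the same first rows that nrows would read, so the warning above applies equally.

```shell
# Build a tiny stand-in CSV (hypothetical data) so the sketch runs anywhere
full_csv=$(mktemp)
small_csv=$(mktemp)
printf 'pid,edep\n' > "$full_csv"
for i in 1 2 3 4 5 6 7 8 9 10; do
    printf '%s,%s\n' "$i" "0.$i" >> "$full_csv"
done

# Keep the header plus the first lim_rows data rows
lim_rows=4
head -n "$((lim_rows + 1))" "$full_csv" > "$small_csv"

# Count data rows (total lines minus the header) in each file
full_rows=$(( $(grep -c '' "$full_csv") - 1 ))
small_rows=$(( $(grep -c '' "$small_csv") - 1 ))
echo "full: $full_rows rows, truncated: $small_rows rows"
```

For files without a header row, such as the Cher channel files that are read with names=, head -n "$lim_rows" keeps exactly lim_rows data rows.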

Near-term Goals

  1. Take a look at the GitHub ML Beam Test Repository

Homework


June 11th, 2024

Introduction: PID Using Machine-Learning Methods for SoLID Beam Test Analysis
