- First, clone this repository
- Create a new branch off of the `master` branch. Give it an informative name.
- Add the new software to the conda environment used by that module. Make sure to follow best practices (see the section below)!
  Note: Never use one conda `environment.yml` file for more than one module. Each module should have its own `.yml` file. Mixing modules into the same environment will make it difficult for future TAs to maintain the environment, since they won't be able to tell which packages to add or remove as the notebooks change.
- Check that the conda environment can still be solved:
  `conda env create --dry-run --file spatial-tx.yml`
- Commit and push your changes
- Once you're ready, create a pull request to merge it back into the `master` branch
- Wait at most 40 minutes for the image to be built and for the checks to pass
- You should see a green check-mark if all of the checks pass. If not, click on the red X and then "Details" to view the error message. Add additional commit(s) to fix the issue.
- Test your changes (see section below) and add any commits as needed
- Once all checks and tests pass, merge your pull request!
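The steps above can be sketched as a single terminal session. The repository URL, module file, branch name, and package below are placeholders for illustration, not the real values:

```bash
# Clone the repository and create an informatively named branch off master
git clone https://github.com/biom262/cmm262-notebook.git  # hypothetical URL
cd cmm262-notebook
git checkout master
git checkout -b add-squidpy-to-spatial-tx                 # example branch name

# Edit the module's environment file, then check that it still solves
conda env create --dry-run --file spatial-tx.yml

# Commit and push; then open a pull request against master on GitHub
git add spatial-tx.yml
git commit -m "Add squidpy to the spatial-tx environment"
git push --set-upstream origin add-squidpy-to-spatial-tx
```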
Note: This section is now outdated. There used to be a way to test actions before they went live, but now any successful change to the environments (even on an unmerged pull request) will immediately become live on DataHub! This can be dangerous, so proceed with caution.
After creating a pull request for changes to our Dockerfile or a conda environment within our notebook repository, GitHub Actions will automatically build an updated Docker image. The image will be tagged with the number assigned to your pull request.
- (If off-campus) Connect to the UCSD VPN. Then log into DataHub via `ssh` from your terminal: `ssh username@dsmlp-login.ucsd.edu`
- Run your container on DataHub:
  `launch-scipy-ml.sh -W BNFO262_WIXX_A00 -P Always -i ghcr.io/biom262/cmm262-notebook:pr-#`
  You should replace `#` with the number of the pull request and `XX` with the last two digits of the current year. For example, the number for this pull request is 11, and `XX` would be 24 for 2024.
- Executing that command will generate a URL to a DataHub environment that uses your updated changes. Open the URL in your browser, and use that notebook environment to test whether your changes work as expected. You should rerun your notebooks one more time here -- there's a possibility that they don't work here, even if they worked earlier!
If the URL isn't working, make sure you're connected to the UCSD VPN.
Note: If DataHub gives you "Error: ImagePullBackOff", then it probably means that your container image has yet to be pushed to the image repository. You can check the list of available images that have been pushed to the image repository here. If the tag does not appear there, then you will probably need to wait a bit and check back later.
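Putting the testing steps together, a typical session looks like this, using the pull-request number (11) and year (24) from the example above; `username` is a placeholder:

```bash
# From your own machine (connect to the UCSD VPN first if off-campus)
ssh username@dsmlp-login.ucsd.edu

# On dsmlp-login: launch a container from the image built for PR 11 in 2024
launch-scipy-ml.sh -W BNFO262_WI24_A00 -P Always \
    -i ghcr.io/biom262/cmm262-notebook:pr-11

# Open the URL it prints in your browser and rerun the module's notebooks
```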
- Write your environment file manually. Don't create an environment and then export it to a `.yml` file using `conda env export`. This will inevitably create `.yml` files that cannot be easily updated in future years. Also, the `.yml` file will be unlikely to work with other environments besides your own (or other base Docker images besides the one from which you exported it). If you absolutely must use `conda env export`, it is best to do it with the `--from-history` flag.
- Always specify conda-forge before bioconda in the channels list if both of them are needed. (Note that conda-forge is needed whenever bioconda is needed, but the opposite is not true.)
- Avoid using packages from any channels other than conda-forge and bioconda. Other channels (like anaconda and r) have been known to eventually purge old packages.
- You should also specify `nodefaults` as a channel in the channels list, since the defaults channel conflicts with conda-forge.
- When possible, you should specify exact package versions and channels to reduce the amount of time it takes for conda to find the correct versions and channels to use (aka "solve the environment"). This also makes the `.yml` file much more reproducible and less likely to break in the future. To pin to an exact package version, use a double equals `==` instead of a single equals `=` sign. Here's an example where we specify the channel name (conda-forge), the package name (r-base), and the package version (3.6.3):

  ```yaml
  dependencies:
    - conda-forge::r-base==3.6.3
  ```

- If a package can be installed via conda, do not specify it as a pip dependency in your environment file. Avoid pip dependencies if possible.
- Do not include dependencies of any packages already listed in your environment file unless you import or use those dependencies in your own code. For example, if you use `scanpy` and it imports `pytables`, you shouldn't add `pytables` to your conda environment file unless you directly import and use `pytables` in your code. This rule helps ensure the `.yml` file can be easily updated in future years.
- When checking whether your environment file will solve, make sure the `--strict-channel-priority` setting is turned on. (See here for more info.)
- If you are creating an environment that will be used from an R notebook, you must also specify the `r-irkernel` package as a dependency. This allows the environment to be detected by DataHub's nb_conda_kernels. If instead you are creating an environment that will be used from a Python notebook, specify the `ipykernel` package. In either case, be sure to specify a version. I would recommend using the most recent one unless the other packages in your environment are too old.

  For an R environment:

  ```yaml
  dependencies:
    - conda-forge::r-irkernel==1.3.1
  ```

  For a Python environment:

  ```yaml
  dependencies:
    - conda-forge::ipykernel==6.20.1
  ```
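Putting these practices together, a complete single-module environment file might look like the sketch below. The package names and version numbers are illustrative assumptions, not prescriptions; pin whatever versions your module actually needs:

```yaml
# Example module environment file (e.g. spatial-tx.yml).
# Channel order matters: conda-forge first, then bioconda, then
# nodefaults to keep the conflicting defaults channel out.
name: spatial-tx
channels:
  - conda-forge
  - bioconda
  - nodefaults
dependencies:
  # Pin exact versions with == and prefix each package with its channel
  - conda-forge::python==3.10.8
  - conda-forge::ipykernel==6.20.1   # required for Python notebooks
  - bioconda::scanpy==1.9.1          # the module's actual software
```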