EMPIARreader Examples ============================= Using the Python interface -------------------------- For this example, we open the `EMPIAR entry 10943 `_ and load an image dataset from its available directories. .. code:: python from empiarreader import EmpiarSource, EmpiarCatalog test_entry = 10943 Every EMPIAR entry has an associated xml file which contains the default order of the directory. This information can be accessed by loading the entry into an EmpiarCatalog. .. code:: python test_catalog = EmpiarCatalog(test_entry) To get the dataset from the catalog, one would need to specify which directory to load. In this case, there is only one so we choose the key in the position 0. .. code:: python test_catalog_dir = list(test_catalog.keys())[0] dataset_from_catalog = test_catalog[test_catalog_dir] However, the intended target is not always the directory present in the xml. We can further specify the directory to which directory we would like to get the images from. EMPIARreader can load the dataset from an EmpiarSource, using the EMPIAR entry number and the directory of the images. In this case, we also specify that we want the MRC files from the specified directory. .. code:: python ds = EmpiarSource( test_entry, directory="data/MotionCorr/job003/Tiff/EER/Images-Disc1/GridSquare_11149061/Data", filename=".*EER\\.mrc", regexp=True, ) The dataset is loaded lazily (using Dask), so the images are loaded one at a time when ``read_partition`` is called. To choose an image, one can just pick the partition - in this case, it was the partition 10. .. code:: python part = ds.read_partition(10) This example can be visualised in the `Jupyter Notebook `_ provided in the repository. Using the command line interface -------------------------------- You can use the EMPIARreader CLI to search the EMPIAR archive one directory at a time to find what you are looking for before then downloading those files to disk. First, you will need to choose an EMPIAR entry - in this example `EMPIAR entry 10934 `_ is used. Here we use a glob wildcard (``--select "*"``) to list every subdirectory and file in a readable format: .. code:: bash empiarreader search --entry 10934 --select "*" --verbose which returns: .. code:: Matching path #0: https://ftp.ebi.ac.uk/empiar/world_availability/10934//10934.xml Matching path #1: https://ftp.ebi.ac.uk/empiar/world_availability/10934//data/ Subdirectories are: https://ftp.ebi.ac.uk/empiar/world_availability/10934 Subdirectories are: https://ftp.ebi.ac.uk/empiar/world_availability/10934//data We've found the xml containing the metadata for the entry and a subdirectory called `data`. To look inside you can add the `--dir` argument and repeat recursively until you find the directory you are interested in: .. code:: bash empiarreader search --entry 10934 --select "*" --dir "data" --verbose Once you have found one or more files which you want to download from a directory in the EMPIAR archive you can create a list of URLs using the `--save_search` argument: .. code:: bash empiarreader search --entry 10934 --dir \ "data/CL44-1_20201106_111915/Images-Disc1/GridSquare_6089277/Data" \ --select "*gain.tiff.bz2" --save_search saved_search.txt Using the workflow described above, a user can quickly search and identify datasets that fulfill their criteria. These can then be downloaded using the download utility of the CLI. A user simply needs to specify the file list and a directory to download the files into: .. code:: bash empiarreader download --download saved_search.txt --save_dir new_dir --verbose