.. _basic-usage: =========== Basic usage =========== This section details a comprehensive example to illustrate usage of OpenTabulate. For example's sake, let's say you are interested in tabulating hospital location data, where the data of interest is latitude-longitude coordinates, hospital name and regional health authority. The data collected generally does not share the same schema (data attributes) but is structured enough to extract this information. How do we go about using OpenTabulate to achieve our goal of organizing and tabulating this data? First, an output schema is formalized. Say that we want to output CSVs to have the column names :: name, health_authority, longitude, latitude We must specify this in the OpenTabulate configuration, which will be made use of in a *source file*. Source files will be discussed in a bit, for now simply take each of them to be metadata and configuration for a specific dataset. The OpenTabulate configuration file is ``~/.config/opentabulate.conf``, which should have be in place after following the installation instructions. Its structure is simple, consisting of sections and key-value pairs. :: [section] key = value key = value ... [section] ... As provided, there should be exactly two sections, ``[general]`` and ``[labels]``. The *labels* section is where the output schema is specified. The key is the *group name*, which only serves for readability. The value is a Python tuple of strings. Based on the desired output schema, we edit the configuration to be :: [labels] hospital = ('name', 'health_authority') location = ('longitude', 'latitude') The next, and probably most involved step, is to write a *source file* for each dataset to process. Let's say we have two datasets ``hospitals-bc.csv`` :: HOSPITAL_NAME,RG_NAME,X,Y Burnaby Hospital,Fraser Health Authority,-123.016837,49.248776 Fort St. John Hospital,Northern Health Authority,-120.817122,56.256892 "UBC Hospital, Purdy Pavilion",Vancouver Coastal Health,-123.245945,49.263388 and ``hospitals-ab.xml`` :: Canmore General Hospital Calgary -115.35172 51.092455 Alberta Children's Hospital Calgary First we want to place this data where OpenTabulate will read them. They will be located in the directory:: $root_directory/data/input where ``$root_directory`` refers to the path in the ``root_directory`` variable in the configuration file. In the example provided in the installation section, ``$root_directory`` is ``/home/bob/opentabulate``. To direct OpenTabulate on how to tabulate this data into our schema, we need to write a source file for each dataset. A source file is written in JSON format, and is usually very brief. Minimal examples for our example datasets are shown below. For ``hospitals-bc.csv``, our source file is named ``src-hospitals-bc.json`` and contains :: { "localfile": "hospitals-bc.csv", "schema_groups": ["hospital", "location"], "format": { "type": "csv", "delimiter": ",", "quote": "\"" }, "schema": { "hospital" : { "name" : "HOSPITAL_NAME", "health_authority": "RG_NAME" }, "location" : { "longitude": "X", "latitude": "Y" } } } and for ``hospitals-ab.xml`` we have ``src-hospitals-ab.json`` which contains :: { "localfile": "hospitals-ab.xml", "schema_groups": ["hospital", "location"], "format": { "type": "xml", "header": "Hospital" }, "schema": { "name": "Name", "location" : { "longitude": "X", "latitude": "Y" } } } Each JSON key has a specific meaning in the source file: - ``localfile`` specifies the filename of the dataset in the input directory - ``schema_groups`` refer to which group names (those configured in the ``[labels]`` section of ``opentabulate.conf``) are allowed to be used - ``format`` detail the input file format parameters - ``schema`` describes the schema mapping that the tabulation will use In particular, the contents ``schema``of the form ``"output_column" : "input_attribute"`` tell OpenTabulate how to precisely map the input data to the output. The use of group names in ``schema`` are for organizational purposes. Where should source files be stored? They can be stored anywhere, but by our convention we organize them here:: $root_directory/sources Now we are ready to process! Simply run :: $ opentab src-hospitals-ab.json src-hospitals-bc.json Replace the source file names with paths to them as needed. The resulting output files should be :: $root_directory/data/output/hospitals-bc.csv $root_directory/data/output/hospitals-ab.csv with output (up to permutation of the columns) :: name,health_authority,longitude,latitude Burnaby Hospital,Fraser Health Authority,-123.016837,49.248776 Fort St. John Hospital,Northern Health Authority,-120.817122,56.256892 "UBC Hospital, Purdy Pavilion",Vancouver Coastal Health,-123.245945,49.263388 and :: name,health_authority,longitude,latitude Canmore General Hospital,,-115.35172,51.092455 Alberta Children's Hospital,,, respectively.