dataset

Dataset management operations

Usage

Usage: nemar dataset [options] [command]

Dataset management

Options:
  -h, --help                            display help for command

Commands:
  validate [options] [path]             Validate a BIDS dataset using the
                                        official BIDS validator (requires Deno)
  upload [options] <path>               Upload a BIDS dataset to NEMAR
  download [options] <dataset-id>       Download a dataset from NEMAR
  status [options] <dataset-id>         Check status of a dataset
  list [options]                        List publicly available datasets on
                                        NEMAR
  release [options] <dataset-id>        Create a version bump PR for a dataset
  update [options] [path]               Push local changes to a dataset via PR
  request-access <dataset-id>           Request collaborator access to a
                                        dataset
  invite <username> <dataset-id>        Invite a user as collaborator to your
                                        dataset
  collaborators [options] <dataset-id>  List collaborators for a dataset
  publish                               Publication workflow management
  clone [options] <dataset-id>          Clone a dataset from NEMAR
  get [options] [files...]              Download annexed data files for the
                                        current dataset
  save [options]                        Stage and commit changes in the current
                                        dataset
  push [options]                        Push commits and data to remotes
  drop [files...]                       Free local copies of annexed files
                                        (keeps remote copies)
  ci [dataset-id]                       Check BIDS validation CI status for the
                                        current dataset
  manifest [options] [version]          View version manifests for a dataset
  help [command]                        display help for command

Description:
  Manage BIDS datasets on NEMAR. Upload, download, validate, and version
  neurophysiology datasets in Brain Imaging Data Structure (BIDS) format.

Prerequisites:
  - git-annex (for upload/download)
  - Deno runtime (for BIDS validation)
  - NEMAR account (for upload)

Examples:
  $ nemar dataset validate ./my-dataset          # Validate locally
  $ nemar dataset upload ./my-dataset            # Upload to NEMAR
  $ nemar dataset download nm000104              # Download a dataset
  $ nemar dataset list --mine                    # List your datasets
  $ nemar dataset status nm000104                # Check dataset status
  $ nemar dataset request-access nm000104        # Request collaborator access
  $ nemar dataset invite johndoe nm000104        # Invite user as collaborator

Learn More:
  https://nemar-cli.pages.dev/commands/dataset/
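
Before running the commands listed under Prerequisites, it can help to confirm that the external tools are installed. The helper below is a minimal sketch, not part of nemar: `missing_tools` is a hypothetical function name, and the binary names `git-annex` and `deno` are assumed to be how those tools appear on PATH.

```shell
# Print each tool from the argument list that is not on PATH.
# Returns 0 when every tool is present, 1 otherwise.
# (Hypothetical helper; not provided by the nemar CLI.)
missing_tools() {
  local tool rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
      rc=1
    fi
  done
  return "$rc"
}
```

Usage: `missing_tools git-annex deno || echo "install the tools listed above first" >&2`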

Subcommands

dataset validate

Usage: nemar dataset validate [options] [path]

Validate a BIDS dataset using the official BIDS validator (requires Deno)

Arguments:
  path                 Path to BIDS dataset directory (default: ".")

Options:
  --ignore-warnings    Only report errors, not warnings
  -c, --config <file>  Validation config file (.bidsvalidatorrc)
  -r, --recursive      Validate derivatives subdirectories
  --prune              Skip sourcedata and derivatives for faster validation
  -v, --verbose        Show verbose output
  --json               Output results as JSON (for scripting)
  --version-info       Show BIDS validator version info
  --update             Force update the BIDS validator to the latest version
  -h, --help           display help for command

  Extra flags after known options are passed through to the BIDS validator.
  See all validator flags: deno run jsr:@bids/validator --help

  Examples:
    $ nemar dataset validate                            # Validate current directory
    $ nemar dataset validate ./ds --prune               # Skip derivatives
    $ nemar dataset validate ./ds --json > out.json     # JSON for scripting
    $ nemar dataset validate ./ds --ignoreNiftiHeaders  # Pass-through flag
    $ nemar dataset validate ./ds --max-rows 0          # Headers only
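
For scripted use, the `--json` output can be kept on disk while the exit status decides whether a pipeline continues. The sketch below assumes that `nemar dataset validate` exits non-zero when validation fails, which the help text above does not state. `validate_to_report` is a hypothetical helper name.

```shell
# Validate a dataset and keep the JSON report on disk.
# Assumption: the command exits non-zero when validation fails.
validate_to_report() {
  local dataset_dir="$1" report="$2"
  if nemar dataset validate "$dataset_dir" --json > "$report"; then
    echo "validation passed; report saved to $report"
  else
    echo "validation failed; see $report" >&2
    return 1
  fi
}
```

Usage: `validate_to_report ./ds out.json && echo "ready to upload"`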

dataset upload

Usage: nemar dataset upload [options] <path>

Upload a BIDS dataset to NEMAR

Arguments:
  path                      Path to BIDS dataset directory

Options:
  -n, --name <name>         Dataset name (defaults to BIDS Name, then directory
                            name)
  -d, --description <desc>  Dataset description
  --skip-validation         Skip BIDS validation (not recommended)
  --skip-orcid              Skip co-author ORCID collection
  --dry-run                 Show what would be uploaded without doing it
  -j, --jobs <number>       Parallel upload streams (default: 4)
  -y, --yes                 Skip confirmation and proceed
  --restart                 Clear upload progress and re-upload all files
  --no                      Skip confirmation and decline
  -h, --help                display help for command

Description:
  Upload a BIDS dataset to NEMAR. The dataset will be validated, assigned
  a unique ID (nm000XXX), and stored on GitHub (metadata) and S3 (data files).

Requirements:
  - NEMAR account (nemar auth login)
  - git-annex installed
  - GitHub SSH access configured

Process:
  1. Validates BIDS format (unless --skip-validation)
  2. Creates GitHub repository for metadata
  3. Uploads large files to S3 in parallel
  4. Enables PR-based versioning workflow

Examples:
  $ nemar dataset upload ./my-eeg-dataset
  $ nemar dataset upload ./ds -n "My EEG Study" -d "64-channel EEG data"
  $ nemar dataset upload ./ds --dry-run        # Preview without uploading
  $ nemar dataset upload ./ds -j 16            # More parallel streams
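
One cautious pattern is to run the `--dry-run` preview first and start the real upload only if the preview succeeds. The helper below is a sketch using only the flags documented above. `preview_then_upload` is a hypothetical name, and treating a non-zero exit from the dry run as "do not upload" is an assumption about how the CLI reports problems.

```shell
# Preview an upload, then run it only if the preview succeeded.
# Arguments: dataset directory, optional number of parallel streams (default 4).
# Assumption: a failed dry run exits non-zero.
preview_then_upload() {
  local dataset_dir="$1" jobs="${2:-4}"
  nemar dataset upload "$dataset_dir" --dry-run || return 1
  nemar dataset upload "$dataset_dir" -j "$jobs"
}
```

Usage: `preview_then_upload ./my-eeg-dataset 8`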

dataset download

Usage: nemar dataset download [options] <dataset-id>

Download a dataset from NEMAR

Arguments:
  dataset-id           Dataset ID (e.g., nm000104)

Options:
  -o, --output <path>  Output directory (default: ./<dataset-id>)
  -j, --jobs <number>  Parallel download streams (default: 4)
  --no-data            Download metadata only (skip large data files)
  -h, --help           display help for command

Description:
  Download a BIDS dataset from NEMAR. Uses git-annex for efficient
  data transfer with parallel streams.

  Private datasets require authentication (nemar auth login) and can
  only be downloaded by the owner or designated collaborators.
  After publishing, datasets become publicly available.

Requirements:
  - git-annex installed
  - NEMAR account (for private datasets)

Examples:
  $ nemar dataset download nm000104              # Download to ./nm000104
  $ nemar dataset download nm000104 -o ./data    # Custom output directory
  $ nemar dataset download nm000104 --no-data    # Metadata only (fast)
  $ nemar dataset download nm000104 -j 8         # More parallel streams
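
To survey several datasets without pulling their data files, the `--no-data` and `-o` options can be combined in a loop. This is a minimal sketch. `download_metadata` is a hypothetical helper, and the `<root>/<dataset-id>` layout is a choice made here, not something the CLI prescribes.

```shell
# Download only the metadata of several datasets into one root directory.
# Stops at the first failure so a partial run is easy to spot.
download_metadata() {
  local root="$1" id
  shift
  mkdir -p "$root" || return 1
  for id in "$@"; do
    nemar dataset download "$id" -o "$root/$id" --no-data || return 1
  done
}
```

Usage: `download_metadata ./nemar-meta nm000104`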

dataset status

Usage: nemar dataset status [options] <dataset-id>

Check status of a dataset

Arguments:
  dataset-id  Dataset ID (e.g., nm000104)

Options:
  --json      Output as JSON for scripting
  -h, --help  display help for command

Description:
  Show detailed information about a NEMAR dataset including owner,
  creation date, GitHub repository, and DOI information.

Examples:
  $ nemar dataset status nm000104
  $ nemar dataset status nm000104 --json | jq '.concept_doi'
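
The `jq` example above can be wrapped so that a missing DOI becomes a failed command rather than the literal string `null`. The sketch below relies only on the `.concept_doi` field shown in that example. `concept_doi` is a hypothetical helper name, and treating a null or absent value as "no DOI yet" is an assumption.

```shell
# Print the concept DOI of a dataset, or fail if none is set.
# Assumption: a null or missing .concept_doi means no DOI has been created.
# Requires jq.
concept_doi() {
  local doi
  doi=$(nemar dataset status "$1" --json | jq -r '.concept_doi // empty') || return 1
  [ -n "$doi" ] && printf '%s\n' "$doi"
}
```

Usage: `doi=$(concept_doi nm000104) && echo "Cite as: $doi"`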

dataset list

Usage: nemar dataset list [options]

List publicly available datasets on NEMAR

Options:
  --mine       List only your datasets (both private and public)
  --json       Output as JSON for scripting
  --limit <n>  Limit number of results (default: 50)
  -h, --help   display help for command

Description:
  By default, lists only PUBLIC datasets on NEMAR that anyone can access.

  To see your own datasets (including private ones), use the --mine flag.
  This requires authentication.

Visibility Rules:
  Without --mine:
    - Shows only public datasets (visible to everyone)
    - Does not show private datasets, even your own
    - Exception: Admins see ALL datasets for oversight

  With --mine:
    - Shows all YOUR datasets (both private and public)
    - Requires authentication (nemar auth login)

Examples:
  $ nemar dataset list                   # List public datasets only
  $ nemar dataset list --mine            # List YOUR datasets (private + public)
  $ nemar dataset list --json            # JSON output for scripting
  $ nemar dataset list --limit 10        # Show only 10 datasets

dataset release

Usage: nemar dataset release [options] <dataset-id>

Create a version bump PR for a dataset

Arguments:
  dataset-id           Dataset ID (e.g., nm000104)

Options:
  --type <type>        Bump type: patch, minor, or major
  --version <version>  Explicit version (e.g., 2.0.0)
  --dir <path>         Use existing local clone instead of cloning
  --monitor            Watch CI checks and offer to merge
  -y, --yes            Skip confirmation and proceed
  -h, --help           display help for command

Description:
  Create a pull request that bumps the dataset version in
  dataset_description.json. The PR triggers CI checks (BIDS validation,
  version check). On merge, GitHub Actions tags the release and
  publishes a version DOI (if a concept DOI exists).

Examples:
  $ nemar dataset release nm000104 --type patch
  $ nemar dataset release nm000104 --version 2.0.0
  $ nemar dataset release nm000104                   # interactive prompt
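
In automation it can be useful to reject an invalid bump type before the CLI is called at all. The wrapper below is a sketch. `release_bump` is a hypothetical name, and it uses only the `--type` and `--yes` options listed above.

```shell
# Create a version bump PR without the interactive prompt.
# Rejects anything other than the three documented bump types (exit code 2).
release_bump() {
  local dataset_id="$1" bump="$2"
  case "$bump" in
    patch|minor|major) ;;
    *)
      echo "bump type must be patch, minor, or major (got: $bump)" >&2
      return 2
      ;;
  esac
  nemar dataset release "$dataset_id" --type "$bump" --yes
}
```

Usage: `release_bump nm000104 minor`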

dataset update

Usage: nemar dataset update [options] [path]

Push local changes to a dataset via PR

Arguments:
  path                 Path to local dataset clone (default: current directory)

Options:
  --bump <type>        Version bump type: patch, minor, or major (default:
                       "patch")
  --branch <name>      Custom branch name
  -m, --message <msg>  Commit message
  --monitor            Watch CI checks and offer to merge
  -y, --yes            Skip confirmation and proceed
  -h, --help           display help for command

Description:
  Push local changes (metadata or data files) to a dataset via a pull
  request. Automatically bumps the version, commits, pushes, and creates
  a PR. For data files (annexed), copies them to S3 via git-annex.

  Run this from inside a dataset clone, or pass the path as an argument.

Examples:
  $ cd nm000104 && nemar dataset update
  $ nemar dataset update ./nm000104 --bump minor -m "Add new subjects"
  $ nemar dataset update --branch fix/metadata -m "Fix participant ages"
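
A small metadata fix can be scripted end to end by cloning, applying one edit command inside the clone, and then opening the PR. This is a sketch under assumptions: `edit_via_pr` is a hypothetical helper, it relies on `nemar dataset clone` writing to its default `./<dataset-id>` directory, and it assumes the edit touches only files that are present after a clone (no annexed data).

```shell
# Clone a dataset, run an edit command inside the clone, then open a PR.
# Arguments: dataset ID, commit message, then the edit command and its arguments.
# Assumption: the clone lands in ./<dataset-id> (the documented default).
edit_via_pr() {
  local dataset_id="$1" message="$2"
  shift 2
  nemar dataset clone "$dataset_id" || return 1
  ( cd "$dataset_id" && "$@" ) || return 1
  nemar dataset update "./$dataset_id" --bump patch -m "$message"
}
```

Usage: `edit_via_pr nm000104 "Fix participant ages" "$EDITOR" participants.tsv`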

dataset request-access

Usage: nemar dataset request-access [options] <dataset-id>

Request collaborator access to a dataset

Arguments:
  dataset-id  Dataset ID (e.g., nm000104)

Options:
  -h, --help  display help for command

Description:
  Request access to a NEMAR dataset to push data via git-annex.
  Access is automatically granted for public repositories.

  For metadata-only changes, you can fork and submit a PR without
  requesting access.

Requirements:
  - NEMAR account (nemar auth login)
  - Approved user status

Examples:
  $ nemar dataset request-access nm000104

dataset invite

Usage: nemar dataset invite [options] <username> <dataset-id>

Invite a user as collaborator to your dataset

Arguments:
  username    Username to invite
  dataset-id  Dataset ID (e.g., nm000104)

Options:
  -h, --help  display help for command

Description:
  Invite a NEMAR user as a collaborator to your dataset.
  Only dataset owners and admins can invite collaborators.

  Works for both public and private repositories.

Requirements:
  - NEMAR account (nemar auth login)
  - Dataset ownership or admin status

Examples:
  $ nemar dataset invite johndoe nm000104
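
Because the command takes one username at a time, inviting a whole team means looping. The helper below is a sketch. `invite_all` is a hypothetical name, and it assumes a failed invitation exits non-zero.

```shell
# Invite several users to one dataset, continuing past individual failures.
# Returns 0 only if every invitation succeeded.
# Assumption: a failed invitation exits non-zero.
invite_all() {
  local dataset_id="$1" user failed=0
  shift
  for user in "$@"; do
    if ! nemar dataset invite "$user" "$dataset_id"; then
      echo "could not invite $user" >&2
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]
}
```

Usage: `invite_all nm000104 johndoe`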

dataset collaborators

Usage: nemar dataset collaborators [options] <dataset-id>

List collaborators for a dataset

Arguments:
  dataset-id  Dataset ID (e.g., nm000104)

Options:
  --json      Output as JSON for scripting
  -h, --help  display help for command

Description:
  List all collaborators who have access to a dataset.
  Only dataset owners and admins can view collaborators.

Examples:
  $ nemar dataset collaborators nm000104
  $ nemar dataset collaborators nm000104 --json

dataset publish request

Usage: nemar dataset publish request [options] <dataset-id>

Request publication of a dataset

Arguments:
  dataset-id  Dataset ID (e.g., nm000104)

Options:
  -h, --help  display help for command

Description:
  Submit a publication request to make your private dataset publicly accessible.
  NEMAR admins will be notified and can approve or deny your request.

  Once approved, your dataset will:
  - Become publicly visible on GitHub
  - Receive a permanent DOI via Zenodo
  - Have tag protection enabled (prevents version manipulation)
  - Have S3 Object Lock enabled (prevents data deletion)

  You can only have one active publication request per dataset.

Status Flow:
  requested → approving → published (or denied)

Examples:
  $ nemar dataset publish request nm000104
  $ nemar dataset publish status nm000104     # Check request status

dataset publish status

Usage: nemar dataset publish status [options] <dataset-id>

Check publication status of a dataset

Arguments:
  dataset-id  Dataset ID (e.g., nm000104)

Options:
  -h, --help  display help for command

Description:
  Check the status of your publication request and see progress through
  the approval workflow.

Possible Statuses:
  requested  - Waiting for admin review
  approving  - Admin is running the publication process
  published  - Dataset is now public with DOI
  denied     - Request was denied (includes reason)

Steps in Approval Process:
   1. CI check          - Verify BIDS validation passes
   2. Make public       - Change repository visibility
   3. S3 public read    - Grant public read access to S3 data
   4. Tag protection    - Prevent version manipulation
   5. Create DOI        - Create concept DOI (EZID/Zenodo)
   6. Update metadata   - Update from BIDS description
   7. Update README     - Add DOI badge and citation
   8. Create tag        - Create version tag
   9. Create release    - Create GitHub release
  10. Upload to Zenodo  - Upload archive (if Zenodo provider)
  11. Publish DOI       - Make DOI public (permanent)
  12. S3 lock           - Enable Object Lock for data preservation
  13. Generate archive  - Create downloadable zip
  14. Notify user       - Send publication confirmation email

Examples:
  $ nemar dataset publish status nm000104
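
Since approval depends on an admin, a script may want to poll until the request reaches a final status. The sketch below is fragile by design: the command has no documented machine-readable output, so it simply searches the printed text for the status words `published` and `denied` from the list above. That matching, the helper name `wait_for_publication`, and the default interval are all assumptions.

```shell
# Poll the publication status until it is published or denied.
# Arguments: dataset ID, seconds between checks (default 300), max checks (default 12).
# Assumption: the status word appears verbatim in the command's output.
# Exit codes: 0 published, 1 denied, 2 command failed, 3 still pending.
wait_for_publication() {
  local dataset_id="$1" interval="${2:-300}" attempts="${3:-12}" output i=1
  while [ "$i" -le "$attempts" ]; do
    output=$(nemar dataset publish status "$dataset_id") || return 2
    case "$output" in
      *published*) echo "published"; return 0 ;;
      *denied*)    echo "denied";    return 1 ;;
    esac
    # Wait between checks, but not after the final one.
    [ "$i" -lt "$attempts" ] && sleep "$interval"
    i=$((i + 1))
  done
  echo "still pending after $attempts checks" >&2
  return 3
}
```

Usage: `wait_for_publication nm000104 600 && echo "dataset is public"`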

dataset publish resend

Usage: nemar dataset publish resend [options] <dataset-id>

Resend publication request notification to admins

Arguments:
  dataset-id  Dataset ID (e.g., nm000104)

Options:
  -h, --help  display help for command

Description:
  Resend the publication request notification email to all NEMAR admins.
  Use this if admins haven't responded to your original request.

  This does NOT create a duplicate request - it only sends a reminder
  email for your existing publication request.

When to Use:
  - Admins haven't responded after several days
  - You want to remind admins about your pending request
  - Your request status is still "requested"

Examples:
  $ nemar dataset publish resend nm000104

dataset clone

Usage: nemar dataset clone [options] <dataset-id>

Clone a dataset from NEMAR

Arguments:
  dataset-id           Dataset ID (e.g., nm000104)

Options:
  -o, --output <path>  Output directory (default: ./<dataset-id>)
  -h, --help           display help for command

Description:
  Clone a NEMAR dataset repository with git-annex initialized.
  Data files are not downloaded; use 'nemar dataset get' afterward.

  Private datasets require authentication (nemar auth login) and are
  only accessible to the owner or designated collaborators.

Requirements:
  - git-annex installed
  - NEMAR account (for private datasets)

Examples:
  $ nemar dataset clone nm000104
  $ nemar dataset clone nm000104 -o ./my-dataset

dataset get

Usage: nemar dataset get [options] [files...]

Download annexed data files for the current dataset

Arguments:
  files                Specific files/paths to get (default: all)

Options:
  -j, --jobs <number>  Parallel download streams (default: "4")
  -h, --help           display help for command

Description:
  Download data files from the remote for a cloned dataset.
  Must be run inside a git-annex dataset directory.

  For private datasets, credentials are fetched automatically
  if you are logged in (nemar auth login).

Examples:
  $ nemar dataset get                    # Get all files
  $ nemar dataset get sub-01/eeg/        # Get specific directory
  $ nemar dataset get *.edf -j 8         # Get EDF files with 8 streams
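
Because `get` must run inside the dataset directory, fetching part of a dataset is a two-step job: clone, then get from within the clone. The helper below sketches that sequence. `fetch_paths` is a hypothetical name, and it relies on the clone landing in the default `./<dataset-id>` directory.

```shell
# Clone a dataset and download only the requested paths.
# Arguments: dataset ID, then any arguments accepted by `nemar dataset get`.
# Assumption: the clone lands in ./<dataset-id> (the documented default).
fetch_paths() {
  local dataset_id="$1"
  shift
  nemar dataset clone "$dataset_id" || return 1
  # Subshell keeps the caller's working directory unchanged.
  ( cd "$dataset_id" && nemar dataset get "$@" )
}
```

Usage: `fetch_paths nm000104 sub-01/eeg/ -j 8`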

dataset save

Usage: nemar dataset save [options]

Stage and commit changes in the current dataset

Options:
  -m, --message <msg>  Commit message (default: "Save changes")
  -h, --help           display help for command

Description:
  Stage all changes (git add -A) and commit them. Large files are
  automatically handled by git-annex based on the dataset's largefiles config.

Examples:
  $ nemar dataset save
  $ nemar dataset save -m "Add new EEG recordings"

dataset push

Usage: nemar dataset push [options]

Push commits and data to remotes

Options:
  -j, --jobs <number>  Parallel upload streams for S3 (default: "4")
  --no-s3              Skip pushing data to S3 remote
  --pr                 Create a pull request after pushing
  -t, --title <title>  Pull request title (with --pr)
  -b, --body <body>    Pull request body (with --pr)
  -h, --help           display help for command

Description:
  Push git commits to GitHub (main + git-annex branches) and optionally
  copy annexed data to the S3 remote.

  With --pr, creates a pull request after pushing the current branch.

  S3 push uses temporary credentials from the NEMAR API. Falls back to
  environment AWS credentials if not logged in.

Examples:
  $ nemar dataset push
  $ nemar dataset push --no-s3      # Git only, skip S3
  $ nemar dataset push -j 8         # More parallel S3 streams
  $ nemar dataset push --pr -t "Add new recordings"
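
`save` and `push --pr` are often run back to back, so a small wrapper can reuse one message for both the commit and the pull request title. This is a sketch. `save_and_open_pr` is a hypothetical helper, and it assumes you are already inside a dataset clone and on the branch you want the PR opened from.

```shell
# Commit all changes, push them, and open a PR titled with the commit message.
# Assumption: run from inside a dataset clone, on the branch to be proposed.
save_and_open_pr() {
  local message="$1"
  nemar dataset save -m "$message" || return 1
  nemar dataset push --pr -t "$message"
}
```

Usage: `save_and_open_pr "Add new recordings"`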

dataset drop

Usage: nemar dataset drop [options] [files...]

Free local copies of annexed files (keeps remote copies)

Arguments:
  files       Specific files to drop (default: all)

Options:
  -h, --help  display help for command

Description:
  Remove local copies of annexed data files. Git-annex verifies that
  remote copies exist before dropping. Use 'nemar dataset get' to
  re-download later.

Examples:
  $ nemar dataset drop                   # Drop all local data
  $ nemar dataset drop sub-01/eeg/       # Drop specific directory
  $ nemar dataset drop *.edf             # Drop EDF files
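
Pairing `get` with `drop` keeps disk usage bounded when a dataset is larger than local storage: fetch one path, process it, free it, and move on. The loop below is a sketch. `process_each` is a hypothetical helper, and the processing command is whatever you supply.

```shell
# For each path: download it, run a command on it, then free the local copy.
# Arguments: command name, then one or more paths inside the dataset.
# A path is only dropped after its command succeeded.
process_each() {
  local cmd="$1" path
  shift
  for path in "$@"; do
    nemar dataset get "$path" || return 1
    "$cmd" "$path" || return 1
    nemar dataset drop "$path" || return 1
  done
}
```

Usage, from inside a dataset clone: `process_each du sub-01/eeg/`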

dataset ci

Usage: nemar dataset ci [options] [dataset-id]

Check BIDS validation CI status for the current dataset

Arguments:
  dataset-id  Dataset ID (auto-detected from git remote if omitted)

Options:
  -h, --help  display help for command

Description:
  Show the status of the BIDS validation CI workflow for a dataset.
  When run inside a cloned dataset, the dataset ID is auto-detected
  from the git remote URL.

Examples:
  $ nemar dataset ci              # Auto-detect from CWD
  $ nemar dataset ci nm000104     # Explicit dataset ID

dataset manifest

Usage: nemar dataset manifest [options] [version]

View version manifests for a dataset

Arguments:
  version             Version to view (lists available if omitted)

Options:
  -d, --dataset <id>  Dataset ID (auto-detected from git remote if omitted)
  --json              Output raw JSON
  -h, --help          display help for command

Description:
  View version manifests that map file paths to S3 annex keys.
  Manifests are generated when a version DOI is published.

  When run inside a dataset directory, the dataset ID is auto-detected.

Examples:
  $ nemar dataset manifest                    # List available versions
  $ nemar dataset manifest v1.0.0             # View specific version
  $ nemar dataset manifest v1.0.0 --json      # Raw JSON output
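
To archive a manifest alongside an analysis, the `--json` output can be written to a file. The sketch below writes to a temporary file first so that a failed command never leaves a truncated manifest behind. `export_manifest` is a hypothetical helper, and it assumes the command exits non-zero on failure.

```shell
# Save the JSON manifest of one dataset version to a file.
# Arguments: dataset ID, version (e.g. v1.0.0), output file.
# Assumption: the command exits non-zero when the manifest cannot be fetched.
export_manifest() {
  local dataset_id="$1" version="$2" out_file="$3"
  if nemar dataset manifest "$version" -d "$dataset_id" --json > "$out_file.tmp"; then
    mv "$out_file.tmp" "$out_file"
  else
    rm -f "$out_file.tmp"
    return 1
  fi
}
```

Usage: `export_manifest nm000104 v1.0.0 ./manifest-v1.0.0.json`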
  $ nemar dataset manifest -d nm000104        # Explicit dataset ID