realzza/xenopy

 
 


birdData

BirdData is a Python wrapper for the Xeno-canto API 2.0. It lets users download bird data with a single command and supports multithreaded downloads.

Environment

Download repo to local:

git clone git@github.com:realzza/birdData.git

Set up environment:

pip install -r requirement.txt

Download MetaData

A metadata file is a short record describing a single recording. Typically, it contains information such as the recordist, recording time, country, location, latitude, longitude, altitude, and recording length. Below is an example of a metadata file.

{
    "id": "426350", 
    "gen": "Abroscopus", 
    "sp": "superciliaris", 
    "ssp": "", 
    "en": "Yellow-bellied Warbler", 
    "rec": "Peter Boesman", 
    "cnt": "India", 
    "loc": "Eagle Nest, Sessni area and lower, Arrunachal Pradesh",
    "lat": "27.0223", 
    "lng": "92.4139",
    "alt": "", 
    "type": "song", 
    "url": "//xeno-canto.org/426350", 
    "file": "https://xeno-canto.org/426350/download"
}
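A record like the one above can be read with Python's standard json module. The following is only an illustrative sketch: the helper name summarize is hypothetical and not part of birdData, and the field meanings (gen = genus, sp = species, en = English name, file = download URL) are inferred from the example.

```python
import json

# A trimmed copy of the example record above.
record_text = """
{
    "id": "426350",
    "gen": "Abroscopus",
    "sp": "superciliaris",
    "en": "Yellow-bellied Warbler",
    "cnt": "India",
    "lat": "27.0223",
    "lng": "92.4139",
    "alt": "",
    "file": "https://xeno-canto.org/426350/download"
}
"""


def summarize(record):
    """Hypothetical helper: pull a few commonly used fields from one record."""

    def to_float(value):
        # Numeric fields arrive as strings and may be empty.
        return float(value) if value else None

    return {
        "scientific_name": f"{record['gen']} {record['sp']}".lower(),
        "english_name": record["en"],
        "latitude": to_float(record["lat"]),
        "longitude": to_float(record["lng"]),
        "altitude": to_float(record["alt"]),
        "download_url": record["file"],
    }


summary = summarize(json.loads(record_text))
print(summary["scientific_name"], summary["download_url"])
```

Empty fields such as "alt" are mapped to None rather than 0, so a missing altitude is not mistaken for sea level.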

Use download-meta.py to download metadata files. Customize your query by setting one or more of the parameters below before requesting metadata from the Xeno-canto API.

optional arguments:
  -h, --help           show this help message and exit
  --gen GEN            genus
  --ssp SSP            subspecies
  --cnt CNT            country
  --type TYPE          type
  --rmk RMK            remark
  --lat LAT            latitude
  --lon LON            longtitude
  --loc LOC            location
  --box BOX            box:LAT_MIN,LON_MIN,LAT_MAX,LON_MAX
  --area AREA          Continent
  --since SINCE        e.g. since:2012-11-09
  --year YEAR          year
  --month MONTH        month
  --output OUTPUT      directory to output directory. default: `dataset/metadata/`
  --attempts ATTEMPTS

A sample metadata download:

python download-meta.py --cnt China --loc Shanghai --since 2022-01-01 --output test/

Please refer to the Xeno-canto Search Tips for definitions of the parameters above.
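To make the link between the flags and the API concrete, here is a minimal sketch of how tag/value pairs could be combined into a request URL for the Xeno-canto API 2.0 recordings endpoint. How download-meta.py builds its request is not shown in this README, so the helper build_query_url and its quoting rule are assumptions for illustration only.

```python
from urllib.parse import urlencode

# Base URL of the Xeno-canto API 2.0 recordings endpoint.
API_URL = "https://xeno-canto.org/api/2/recordings"


def build_query_url(tags, page=1):
    """Hypothetical helper: turn {tag: value} pairs into an API request URL."""
    parts = []
    for tag, value in tags.items():
        if value is None:
            continue  # flag was not given on the command line
        value = str(value)
        # Quote values containing spaces so they stay attached to their tag.
        if " " in value:
            value = f'"{value}"'
        parts.append(f"{tag}:{value}")
    params = {"query": " ".join(parts), "page": page}
    return f"{API_URL}?{urlencode(params)}"


url = build_query_url(
    {"cnt": "China", "loc": "Shanghai", "since": "2022-01-01", "gen": None}
)
print(url)
```

Unset flags are skipped, so the query only contains the filters that were actually supplied.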

Download Recordings

Single-thread

Download audio data for a single bird species. Use the scientific name in lowercase, e.g. cettia cetti.

python download.py --name "cettia cetti"

Download audio data for every species listed in a file. Format requirement: one name per line (names separated by "\n").

python download.py --name name_file

General Usage:

usage: download.py [-h] --name NAME

download bird audios

optional arguments:
  -h, --help   show this help message and exit
  --name NAME  [1] name of one bird species; [2] file of bird species spaced
               by '\n'
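The two accepted forms of --name can be told apart by checking whether the argument is an existing file. The sketch below shows one way to do that; load_species_names is a hypothetical helper, and download.py may resolve the argument differently.

```python
import os
import tempfile


def load_species_names(name_arg):
    """Hypothetical helper: resolve a --name value into a list of species names."""
    if os.path.isfile(name_arg):
        with open(name_arg, encoding="utf-8") as handle:
            # One name per line; skip blank lines and stray whitespace.
            return [line.strip() for line in handle if line.strip()]
    # Otherwise treat the argument itself as a single species name.
    return [name_arg.strip()]


# Usage: a single name, then a temporary name file with two species.
single = load_species_names("cettia cetti")

with tempfile.TemporaryDirectory() as tmp_dir:
    name_file = os.path.join(tmp_dir, "name_file")
    with open(name_file, "w", encoding="utf-8") as handle:
        handle.write("cettia cetti\nabroscopus superciliaris\n")
    from_file = load_species_names(name_file)

print(single, from_file)
```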

Multi-thread

Usage

Speed up downloading using multiple threads.

python download-mult.py --name "cettia cetti" --process-ratio 0.6

Download multiple species listed in a file. Format requirement: one name per line (names separated by "\n").

python download-mult.py --name name_file --process-ratio 0.6

General Usage:

usage: download-mult.py [-h] --name NAME [--process-ratio PROCESS_RATIO]

download bird audios

optional arguments:
  -h, --help            show this help message and exit
  --name NAME           [1] name of one bird species; [2] file of bird species
                        spaced by '\n'
  --process-ratio PROCESS_RATIO
                        float[0~1], define cpu utilities in downloading audios
                        [default: 0.8]
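One plausible reading of --process-ratio is "use this fraction of the available CPU cores". The sketch below illustrates that reading; worker_count is a hypothetical helper, and the rounding and lower bound are assumptions, not the confirmed behaviour of download-mult.py.

```python
import multiprocessing


def worker_count(process_ratio, cpu_total=None):
    """Hypothetical helper: convert a ratio in (0, 1] into a number of workers."""
    if not 0 < process_ratio <= 1:
        raise ValueError("process_ratio must be in (0, 1]")
    if cpu_total is None:
        cpu_total = multiprocessing.cpu_count()
    # Round down, but always keep at least one worker.
    return max(1, int(cpu_total * process_ratio))


# With 8 cores, a ratio of 0.6 gives 4 workers under these assumptions.
workers = worker_count(0.6, cpu_total=8)
print(workers)
```

Passing cpu_total explicitly is only for demonstration; by default the helper asks the operating system for the core count.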

Kill multiprocess

Killing a multiprocess program by hand is tedious. To make this easier, download-mult.py automatically generates a kill.sh script once downloading has started. Stop the program with:

bash kill.sh

Badcase backup

Failed downloads are recorded in bad_urls.txt so that you can retry them later if necessary.
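A retry pass could start by reading that file back. The layout of bad_urls.txt is not documented here, so the sketch below assumes one URL per line; load_failed_urls is a hypothetical helper, not part of birdData.

```python
import os
import tempfile


def load_failed_urls(path="bad_urls.txt"):
    """Hypothetical helper: read failed URLs, assuming one URL per line."""
    seen = set()
    urls = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            url = line.strip()
            if url and url not in seen:  # skip blanks and duplicates
                seen.add(url)
                urls.append(url)
    return urls


# Usage with a throwaway file standing in for bad_urls.txt.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample = os.path.join(tmp_dir, "bad_urls.txt")
    with open(sample, "w", encoding="utf-8") as handle:
        handle.write(
            "https://xeno-canto.org/426350/download\n"
            "\n"
            "https://xeno-canto.org/426350/download\n"
        )
    retry_list = load_failed_urls(sample)

print(retry_list)
```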

Align Dataset

The recordings you download are in .mp3 format, which is not supported by lightweight audio-loading libraries such as soundfile and audiofile (librosa is much slower than these two). Use the alignDataset-mult.py script to convert the .mp3 files into .wav files that these libraries can read.

python alignDataset-mult.py --dataDir dataset/audio --outDir ./wavs --process 24 

Usage

usage: alignDataset-mult.py [-h] [--dataDir DATADIR] [--outDir OUTDIR]
                            [--process PROCESS]

align smaplerate of dataset

optional arguments:
  -h, --help         show this help message and exit
  --dataDir DATADIR  path to the input dir
  --outDir OUTDIR    path to the output dir
  --process PROCESS  number of process running
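The conversion backend used by alignDataset-mult.py is not stated in this README. As a rough sketch of the idea, the code below mirrors each .mp3 path under the output directory and builds an ffmpeg command that decodes to a fixed sample rate. The helpers wav_target and ffmpeg_command, the use of ffmpeg, the 16000 Hz rate, the mono downmix, and the dataset/audio/<species>/<file>.mp3 layout are all assumptions.

```python
from pathlib import Path


def wav_target(mp3_path, data_dir, out_dir):
    """Hypothetical helper: mirror an .mp3 path under out_dir with a .wav suffix."""
    relative = Path(mp3_path).relative_to(data_dir)
    return Path(out_dir) / relative.with_suffix(".wav")


def ffmpeg_command(mp3_path, wav_path, sample_rate=16000):
    """Build an ffmpeg call that decodes to mono WAV at a fixed sample rate."""
    return [
        "ffmpeg", "-y",            # overwrite existing output
        "-i", str(mp3_path),       # input file
        "-ar", str(sample_rate),   # target sample rate in Hz (assumed value)
        "-ac", "1",                # downmix to mono (assumed choice)
        str(wav_path),
    ]


source = "dataset/audio/cettia cetti/1.mp3"
target = wav_target(source, "dataset/audio", "wavs")
command = ffmpeg_command(source, target)
print(command)
```

The command is returned as a list so it can be passed to subprocess.run without shell quoting problems for species names that contain spaces.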

Kill multiprocess

bash kill_align.sh

Bad transformation backup

Failed conversions are recorded in bad_aligns.txt.

To-do

  • [12.29] multiprocess download
  • [1.1] Automated killing script for multiprocess program
  • [1.1] Bad url backup for trace back
  • define sample rate prior to download

Contact

Feel free to file an issue if you encounter any problems. Have fun!
