------------------------------------------------------------------------
csvkit ... a suite of utilities to display & manipulate csv files
------------------------------------------------------------------------
Homepage: https://csvkit.readthedocs.io/

csvkit is a suite of command-line tools for converting to and working
with comma-separated value ("csv") files, and for writing them out to
several common formats.

NOTE: csvkit also handles "tsv" or tab-separated value files with the
      "-t" option.

------------------------------------------------------------------------
I. Convert various formats into csv
------------------------------------------------------------------------
convert dbf, fixed, geojson, json, ndjson, xls, xlsx to csv
[with or without headers]

$ in2csv infile.xls > outfile.csv
$ in2csv --no-header-row infile.xls | tail -n +2

------------------------------------------------------------------------
II. First look and scalpel: csvlook, csvcut
------------------------------------------------------------------------
$ csvlook file.csv | less -S          #view the file
$ csvcut -n file.csv                  #enumerate column names
$ csvcut -c 2,5-7 file.csv            #extract columns 2,5,6,7
$ csvcut -c name1,name2 file.csv      #extract by column names
$ csvcut -C 6-10 file.csv             #extract all columns except 6-10

Note: the options for each tool can be found via help
$ csvcut --help

------------------------------------------------------------------------
III. Examining the data: csvstat, csvgrep, csvsort
------------------------------------------------------------------------
$ csvstat
$ csvgrep
$ csvsort

------------------------------------------------------------------------
IV. Power tools: csvjoin, csvstack, csvsql, sql2csv
------------------------------------------------------------------------
$ csvjoin
$ csvstack
$ csvsql
$ sql2csv

------------------------------------------------------------------------
V. To other formats: csvjson, csvpy, csvformat
------------------------------------------------------------------------
$ csvformat
$ csvjson
$ csvpy

========================================================================
Arguments common to all csvkit tools:

  -d DELIMITER, --delimiter DELIMITER
        Delimiting character of the input CSV file.
  -t, --tabs
        Specify that the input CSV file is delimited with tabs.
        Overrides "-d".
  -q QUOTECHAR, --quotechar QUOTECHAR
        Character used to quote strings in the input CSV file.
  -u {0,1,2,3}, --quoting {0,1,2,3}
        Quoting style used in the input CSV file.
        0 = Quote Minimal, 1 = Quote All,
        2 = Quote Non-numeric, 3 = Quote None.
  -b, --no-doublequote
        Whether or not double quotes are doubled in the input CSV file.
  -p ESCAPECHAR, --escapechar ESCAPECHAR
        Character used to escape the delimiter if --quoting 3
        ("Quote None") is specified and to escape the QUOTECHAR if
        --no-doublequote is specified.
  -z FIELD_SIZE_LIMIT, --maxfieldsize FIELD_SIZE_LIMIT
        Maximum length of a single field in the input CSV file.
  -e ENCODING, --encoding ENCODING
        Specify the encoding of the input CSV file.
  -L LOCALE, --locale LOCALE
        Specify the locale (en_US) of any formatted numbers.
  -S, --skipinitialspace
        Ignore whitespace immediately following the delimiter.
  --blanks
        Do not coerce empty, "na", "n/a", "none", "null", "." strings
        to NULL values.
  --date-format DATE_FORMAT
        Specify a strptime date format string like "%m/%d/%Y".
  --datetime-format DATETIME_FORMAT
        Specify a strptime datetime format string like
        "%m/%d/%Y %I:%M %p".
  -H, --no-header-row
        Specify that the input CSV file has no header row.
        Will create default headers (a,b,c,...).
  -K SKIP_LINES, --skip-lines SKIP_LINES
        Specify the number of initial lines to skip before the header
        row (e.g. comments, copyright notices, empty rows).
  -v, --verbose
        Print detailed tracebacks when errors occur.
  -l, --linenumbers
        Insert a column of line numbers at the front of the output.
        Useful when piping to grep or as a simple primary key.
  --zero
        When interpreting or displaying column numbers, use zero-based
        numbering instead of the default 1-based numbering.
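
The four quoting styles named under "-u" (Quote Minimal, Quote All, Quote
Non-numeric, Quote None) are easier to tell apart when seen side by side.
The sketch below is not csvkit itself: it uses only Python's standard csv
module and assumes the "-u" codes 0-3 correspond to that module's
QUOTE_MINIMAL, QUOTE_ALL, QUOTE_NONNUMERIC and QUOTE_NONE constants, which
carry the same names and values. The helper render_row and the sample row
are hypothetical, and the sketch writes rows only to show what each style
looks like, whereas "-u" describes how the input file is quoted.

```python
import csv
import io

# Assumption: csvkit's -u codes match Python's csv quoting constants.
QUOTING_STYLES = {
    0: csv.QUOTE_MINIMAL,     # quote only fields that need it
    1: csv.QUOTE_ALL,         # quote every field
    2: csv.QUOTE_NONNUMERIC,  # quote every non-numeric field
    3: csv.QUOTE_NONE,        # never quote; escape the delimiter instead
}


def render_row(row, quoting_code, escapechar="\\"):
    """Hypothetical helper: write one row in the given style, return the text."""
    buffer = io.StringIO()
    writer = csv.writer(
        buffer,
        quoting=QUOTING_STYLES[quoting_code],
        escapechar=escapechar,  # needed for style 3, like csvkit's -p
        lineterminator="\n",
    )
    writer.writerow(row)
    return buffer.getvalue()


# A sample row with a string, a number, and a field containing the delimiter.
sample_row = ["id", 7, "a,b"]

for code in sorted(QUOTING_STYLES):
    print(code, render_row(sample_row, code), end="")
```

Only style 3 relies on the escape character, which is why "-p" is described
in terms of "--quoting 3" in the list above.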