csvcount¶
Description¶
Count the number of rows, cells, empty rows and cells, and blank lines
Optionally, given a regex or list of regexes, count:
the number of rows with at least 1 match
the number of cells with at least 1 match
the number of total matches (e.g. some cell values may match the patterns more than once)
Examples¶
Basic counting of file records¶
Basic example:
$ csvcount examples/dummy4.csv
rows,cells,empty_rows,empty_cells,blank_lines
4,12,0,0,0
On real world data:
$ csvcount examples/realdata/osha-violation.csv | csvlook -I
| rows | cells | empty_rows | empty_cells | blank_lines |
| ----- | ------ | ---------- | ----------- | ----------- |
| 29999 | 299990 | 0 | 39035 | 0 |
On file with empty rows and cells:
$ csvcount examples/empties.csv
rows,cells,empty_rows,empty_cells,blank_lines
4,12,1,4,0
On file with blank lines:
$ csvcount examples/blankedlines.csv
rows,cells,empty_rows,empty_cells,blank_lines
3,9,0,0,4
Counting of patterns¶
Counting words with at least 5 letters:
$ csvcount -P '[A-z]{5,}' examples/longvals.csv | csvlook
| pattern | rows | cells | matches |
| --------- | ---- | ----- | ------- |
| [A-z]{5,} | 3 | 9 | 43 |
Limiting the match searching to the description
column:
$ csvcount -P '[A-z]{5,}' -c description examples/longvals.csv | csvlook
| pattern | rows | cells | matches |
| --------- | ---- | ----- | ------- |
| [A-z]{5,} | 3 | 3 | 31 |
Counting (naively) the number of @mentions, #hastags, and URLs in tweet texts:
$ csvcount -P '@\w+' -P '#\w+' -P 'https:' \
-c text examples/realdata/tweets-whitehouse.csv \
| csvlook
| pattern | rows | cells | matches |
| ------- | ----- | ----- | ------- |
| @\w+ | 1,591 | 1,591 | 2,266 |
| #\w+ | 409 | 409 | 560 |
| https: | 2,596 | 2,596 | 2,938 |