turicreate.SFrame.read_csv_with_errors
-
classmethod SFrame.read_csv_with_errors(url, delimiter=',', header=True, comment_char='', escape_char='\\', double_quote=True, quote_char='"', skip_initial_space=True, column_type_hints=None, na_values=['NA'], line_terminator='\n', usecols=[], nrows=None, skiprows=0, verbose=True, nrows_to_infer=100, true_values=[], false_values=[], _only_raw_string_substitutions=False, **kwargs)

Constructs an SFrame from a CSV file or a path to multiple CSVs, and returns a pair: the SFrame, and a dict mapping each filename to an SArray of the lines in that file that could not be parsed.
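To make the "good rows plus bad lines per file" contract concrete, here is a minimal standard-library sketch. It is not Turi Create's implementation: `read_csv_with_errors_sketch` is a hypothetical name, it takes in-memory file contents instead of a URL, assumes `header=True` with the same header in every file, treats a line as bad only when its field count differs from the header's, and keeps every value as a string (no type inference). It splits records with `str.splitlines()`, which, like the default `line_terminator`, accepts `\n`, `\r` and `\r\n` endings; unlike a real CSV reader it cannot handle quoted fields that span several lines.

```python
import csv


def read_csv_with_errors_sketch(sources, delimiter=","):
    """Hypothetical stdlib analogue of read_csv_with_errors (not Turi Create code).

    `sources` maps a filename to that file's text. Returns (good_rows, bad_lines):
    good_rows is a list of dicts keyed by column name; bad_lines maps a filename
    to the raw lines that did not split into exactly len(columns) fields.
    In this sketch, only files with at least one bad line appear in bad_lines.
    """
    columns = None
    good_rows = []
    bad_lines = {}
    for filename, text in sources.items():
        # splitlines() treats "\n", "\r" and "\r\n" as line ends; blank lines are skipped.
        lines = [line for line in text.splitlines() if line.strip()]
        if not lines:
            continue
        if columns is None:
            # The first file's header defines the columns; later headers are assumed identical.
            columns = next(csv.reader([lines[0]], delimiter=delimiter))
        for line in lines[1:]:
            fields = next(csv.reader([line], delimiter=delimiter))
            if len(fields) == len(columns):
                good_rows.append(dict(zip(columns, fields)))
            else:
                # Keep the unparsed text so it can be inspected later.
                bad_lines.setdefault(filename, []).append(line)
    return good_rows, bad_lines


data = {"ratings.csv": "user_id,movie_id,rating\n25904,1663,3\nx,y,z,a,b,c\n25907,1663,3\n"}
rows, bad = read_csv_with_errors_sketch(data)
print(len(rows))  # 2
print(bad)        # {'ratings.csv': ['x,y,z,a,b,c']}
```

Returning the rejected lines keyed by filename, rather than raising on the first malformed line, is what lets a caller load the usable data and audit the rest afterwards.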
Parameters:
- url : string
Location of the CSV file or directory to load. If URL is a directory or a “glob” pattern, all matching files will be loaded.
- delimiter : string, optional
The delimiter that separates fields in the CSV file(s). Defaults to ','.
- header : bool, optional
If True, use the first row as the column names. Otherwise, use the default column names 'X1', 'X2', and so on.
- comment_char : string, optional
The character which denotes that the remainder of the line is a comment.
- escape_char : string, optional
Character which begins a C escape sequence. Defaults to backslash (\). Set to None to disable.
- double_quote : bool, optional
If True, two consecutive quotes in a string are parsed to a single quote.
- quote_char : string, optional
Character sequence that indicates a quote.
- skip_initial_space : bool, optional
If True, ignore extra spaces at the start of a field.
- column_type_hints : None, type, list[type], dict[string, type], optional
This provides type hints for each column. By default, this method attempts to detect the type of each column automatically.
Supported types are int, float, str, list, dict, and array.array.
- If a single type is provided, the type will be applied to all columns. For instance, column_type_hints=float will force all columns to be parsed as float.
- If a list of types is provided, the types apply to the columns in order, e.g. [int, float, str] will parse the first column as int, the second as float and the third as str.
- If a dictionary mapping column names to types is provided, each type is applied to the column named by its key. For instance, {'user': int} hints that the column called "user" should be parsed as an integer; the types of the remaining columns are inferred.
- na_values : str | list of str, optional
A string or list of strings to be interpreted as missing values.
- true_values : str | list of str, optional
A string or list of strings to be interpreted as 1.
- false_values : str | list of str, optional
A string or list of strings to be interpreted as 0.
- line_terminator : str, optional
A string to be interpreted as the line terminator. Defaults to "\n", which also correctly matches Mac, Linux and Windows line endings ("\r", "\n" and "\r\n", respectively).
- usecols : list of str, optional
A subset of column names to output. If unspecified (default), all columns are read. Reading only a subset can improve performance when the number of columns is large. If the input file has no header, usecols=['X1', 'X3'] reads columns 1 and 3.
- nrows : int, optional
If set, only this many rows will be read from the file.
- skiprows : int, optional
If set, this number of rows at the start of the file are skipped.
- verbose : bool, optional
If True, print the progress.
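The three accepted forms of column_type_hints, and the na_values / true_values / false_values substitutions, can be sketched in plain Python as follows. The helpers `resolve_type_hints` and `convert_field` are hypothetical and are not part of Turi Create; `None` stands for "infer this column's type"; raising on a list whose length differs from the column count, and checking missing values before true/false values before the type hint, are this sketch's own choices rather than documented behavior. Only scalar hints (int, float, str) convert correctly this way; list, dict and array.array columns need a real parser.

```python
def _as_list(values):
    # Parameters documented as "str | list of str" accept either form.
    return [values] if isinstance(values, str) else list(values)


def resolve_type_hints(columns, hints):
    """Hypothetical helper: map each column name to its hinted type, or None meaning "infer"."""
    if hints is None:
        return {name: None for name in columns}
    if isinstance(hints, type):
        # A single type applies to every column.
        return {name: hints for name in columns}
    if isinstance(hints, list):
        # A list applies positionally; this sketch insists on one type per column.
        if len(hints) != len(columns):
            raise ValueError("expected one type per column")
        return dict(zip(columns, hints))
    if isinstance(hints, dict):
        # A dict hints only the named columns; the others are left to inference.
        return {name: hints.get(name) for name in columns}
    raise TypeError("unsupported column_type_hints value")


def convert_field(text, hint, na_values=("NA",), true_values=(), false_values=()):
    """Hypothetical conversion of one raw field; only scalar hints are handled."""
    if text in _as_list(na_values):
        return None
    if text in _as_list(true_values):
        return 1
    if text in _as_list(false_values):
        return 0
    if hint is None:
        return text  # a real reader would infer the column type here
    return hint(text)


hints = resolve_type_hints(["user", "item", "score"], {"user": int})
print(hints)  # {'user': <class 'int'>, 'item': None, 'score': None}
print(convert_field("42", hints["user"]))  # 42
print(convert_field("NA", hints["user"]))  # None
print(convert_field("yes", hints["item"], true_values="yes"))  # 1
```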
Returns:
- out : tuple
The first element is the SFrame containing the successfully parsed rows. The second element is a dict mapping each filename to an SArray of the lines in that file that could not be parsed.
Examples
```
>>> bad_url = 'https://static.turi.com/datasets/bad_csv_example.csv'
>>> (sf, bad_lines) = turicreate.SFrame.read_csv_with_errors(bad_url)
>>> sf
+---------+----------+--------+
| user_id | movie_id | rating |
+---------+----------+--------+
|  25904  |   1663   |   3    |
|  25907  |   1663   |   3    |
|  25923  |   1663   |   3    |
|  25924  |   1663   |   3    |
|  25928  |   1663   |   2    |
|   ...   |   ...    |  ...   |
+---------+----------+--------+
[98 rows x 3 columns]

>>> bad_lines
{'https://static.turi.com/datasets/bad_csv_example.csv': dtype: str
Rows: 1
['x,y,z,a,b,c']}
```
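When many files are loaded, a short per-file summary of the rejected lines is often more useful than printing the whole dict. The helper below is a hypothetical sketch, not a Turi Create function; it assumes only that each value in the mapping supports len() and iteration, which holds for SArrays as well as for plain lists (used here so the example runs without Turi Create).

```python
def summarize_bad_lines(bad_lines, max_examples=3):
    """Hypothetical helper: one summary string per file in a filename -> lines mapping."""
    report = []
    for filename, lines in bad_lines.items():
        examples = []
        # Stop after max_examples instead of converting the whole sequence to a list.
        for line in lines:
            if len(examples) == max_examples:
                break
            examples.append(line)
        report.append(f"{filename}: {len(lines)} bad line(s), e.g. {examples}")
    return report


example = {"bad_csv_example.csv": ["x,y,z,a,b,c"]}
for entry in summarize_bad_lines(example):
    print(entry)  # bad_csv_example.csv: 1 bad line(s), e.g. ['x,y,z,a,b,c']
```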