Turi Create  4.0
turi::file_line_count_estimator Class Reference

#include <core/util/file_line_count_estimator.hpp>

Public Member Functions

 file_line_count_estimator ()
 
 file_line_count_estimator (size_t file_size_in_bytes)
 
void set_file_size (size_t file_size_in_bytes)
 
void observe (file_line_count_estimator &other_estimator)
 
void observe (size_t line_count, size_t file_pos)
 
double number_of_lines () const
 
size_t num_lines_observed () const
 

Detailed Description

Estimate the number of lines in a file and the number of bytes used to represent each line.

We estimate the number of lines in a file by repeatedly observing the current file position and the number of lines read so far, under simple assumptions about buffering behavior.

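std::ifstream fin;
turi::file_line_count_estimator estimator(file_size_in_bytes);
while (...) {
  // read some lines from fin
  // report the lines read since the last call to observe, and the current position
  estimator.observe(lines_read_since_last_observe, fin.tellg());
  // estimator.number_of_lines() now contains the estimate of the
  // number of lines in the file
}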

Definition at line 32 of file file_line_count_estimator.hpp.
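The idea in the description above can be sketched as a simple proportion: the bytes consumed per observed line, scaled up to the full file size. The struct below is a hypothetical stand-in written for illustration. Its name, fields, and formula are assumptions, not the Turi Create implementation, which may account for buffering differently.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for illustration; not the Turi Create class.
struct sketch_line_estimator {
  std::size_t file_size = 0;   // total file size in bytes
  std::size_t lines_seen = 0;  // lines reported so far
  std::size_t last_pos = 0;    // most recent file position reported

  // Record `line_count` newly read lines and the current file position.
  void observe(std::size_t line_count, std::size_t file_pos) {
    assert(file_pos <= file_size);
    lines_seen += line_count;
    last_pos = file_pos;
  }

  // Estimated total number of lines: file size divided by the average
  // number of bytes per observed line. Returns 0 when no estimate exists.
  double number_of_lines() const {
    if (last_pos == 0 || lines_seen == 0) return 0.0;
    double bytes_per_line = static_cast<double>(last_pos) / lines_seen;
    return static_cast<double>(file_size) / bytes_per_line;
  }
};
```

Under this assumption, observing 10 lines at byte 100 of a 1000-byte file yields an estimate of 100 lines.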

Constructor & Destructor Documentation

◆ file_line_count_estimator() [1/2]

turi::file_line_count_estimator::file_line_count_estimator ( )
inline

The default constructor. If used, set_file_size must be called to set the file size in bytes.

Definition at line 38 of file file_line_count_estimator.hpp.

◆ file_line_count_estimator() [2/2]

turi::file_line_count_estimator::file_line_count_estimator ( size_t  file_size_in_bytes)
inline

Constructs a file line count estimator.

Parameters
file_size_in_bytes    The file size in bytes.

Definition at line 44 of file file_line_count_estimator.hpp.

Member Function Documentation

◆ num_lines_observed()

size_t turi::file_line_count_estimator::num_lines_observed ( ) const
inline

Total number of lines observed so far.

Definition at line 114 of file file_line_count_estimator.hpp.

◆ number_of_lines()

double turi::file_line_count_estimator::number_of_lines ( ) const
inline

The current estimate of the number of lines left in the file. This returns 0 if the estimate is not available. One call to observe is sufficient to get a rough estimate.

Definition at line 102 of file file_line_count_estimator.hpp.

◆ observe() [1/2]

void turi::file_line_count_estimator::observe ( file_line_count_estimator other_estimator)
inline

Integrates statistics from another estimator.

Definition at line 58 of file file_line_count_estimator.hpp.
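One plausible reading of "integrating statistics" is pooling the counts gathered by several readers, for example one per thread. The sketch below assumes exactly that: each reader tracks lines and bytes consumed, and merging sums both. The type `reader_stats` and its members are hypothetical; the page does not say which statistics the real class combines.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical per-reader statistics; names are illustrative only.
struct reader_stats {
  std::size_t lines_seen = 0;  // lines this reader has consumed
  std::size_t bytes_seen = 0;  // bytes this reader has consumed

  // Fold another reader's counts into this one.
  void merge(const reader_stats& other) {
    lines_seen += other.lines_seen;
    bytes_seen += other.bytes_seen;
  }

  // Pooled estimate of the total line count for a file of `file_size` bytes.
  // Returns 0 when nothing has been observed yet.
  double estimate(std::size_t file_size) const {
    assert(bytes_seen <= file_size);
    if (lines_seen == 0 || bytes_seen == 0) return 0.0;
    return static_cast<double>(file_size) * lines_seen / bytes_seen;
  }
};
```

Pooling in this way weights each reader by the number of bytes it has seen, so a reader that has covered more of the file contributes more to the estimate.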

◆ observe() [2/2]

void turi::file_line_count_estimator::observe ( size_t  line_count,
size_t  file_pos 
)
inline

This should be called for every block of read operations performed on the file. Missing observations will cause the estimate to drift. The more frequently this is called (preferably once for every line), the more accurate the estimate.

Definition at line 70 of file file_line_count_estimator.hpp.
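To illustrate the "once for every line" pattern, the sketch below reads a stream line by line and records the (line_count, file_pos) pair that would be passed to observe after each line. The `observation` struct and `collect_observations` function are hypothetical helpers. The position is tracked by counting bytes, which assumes single-byte "\n" line endings and a stream that starts at offset 0.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One (line_count, file_pos) pair, as it would be passed to observe().
struct observation {
  std::size_t line_count;
  std::size_t file_pos;
};

// Read `in` line by line and record one observation per line.
std::vector<observation> collect_observations(std::istream& in) {
  std::vector<observation> out;
  std::string line;
  std::size_t pos = 0;  // bytes consumed so far
  while (std::getline(in, line)) {
    // getline consumes the newline unless the last line has none, in
    // which case it stops at end of file and eof() is set.
    pos += line.size() + (in.eof() ? 0 : 1);
    out.push_back({1, pos});  // one line read since the last observation
  }
  return out;
}
```

With a real estimator, the `push_back` call would be replaced by a call to observe with the same two arguments.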

◆ set_file_size()

void turi::file_line_count_estimator::set_file_size ( size_t  file_size_in_bytes)
inline

Sets the file size in bytes.

Definition at line 50 of file file_line_count_estimator.hpp.
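The value passed to set_file_size has to come from the caller. One way to obtain it with the standard library alone is sketched below. The helper name `file_size_in_bytes` is an assumption chosen to match the parameter name, and returning 0 on failure is a simplification of this sketch.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: size of the file at `path` in bytes, or 0 if the
// file cannot be opened.
std::size_t file_size_in_bytes(const std::string& path) {
  // Open in binary mode, positioned at the end, so tellg() is the size.
  std::ifstream f(path, std::ios::binary | std::ios::ate);
  if (!f) return 0;
  return static_cast<std::size_t>(f.tellg());
}
```

The result could then be passed to set_file_size, or directly to the second constructor.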

