This dataset originally housed Department of Permits, Licenses, and Inspections (PLI) violations (2015-2020), and has now been expanded to include violations logged by other units (including DOMI) from 2020-06-01 until the present. City employees use these data to manage and track all updates to casefiles. The data can also be used to understand when citations, investigations, or court proceedings are issued, the nature and location of the violation, and the status of the casefile at any point in time. By using the addresses or parcel numbers contained in these data, users can also display information on geospatial maps.
Collection/Interpretation
It is important to understand the distinction between violations and casefiles, and how updates to a casefile are represented in the dataset. A casefile groups one or more violations. When an initial investigation is conducted, each of these violations is recorded separately. The investigation will result in a new status for all of the violations ("VIOLATIONS FOUND"). The subject of the investigation will be informed of this outcome and must address the problem(s). There will be a follow-up inspection at this point, and depending on the results, further steps will be taken (follow-up investigations, criminal complaints issued, court proceedings, etc.).
Each violation for each casefile is represented as a unique row in the dataset. As explained above, there will be a minimum of two updates for each violation (the initial and follow-up investigation). Though the investigation of all violations in a casefile is conducted simultaneously, each investigation is represented as a unique row. Thus, for a property with three violations there will be a minimum of six rows (both investigations for each violation). It is possible to track the entire case history by observing all rows for each casefile.
Each violation is cited according to the violation_code_section field.
The casefile_number is the unique identifier for each casefile (the entire group of violations). By using the casefile_number and violation_code_section fields in combination, one can track the history of each violation for a given casefile. Combining these two fields with investigation_date yields a unique identifier for each record.
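The keys described above can be sketched in Python as follows. This is a minimal illustration, not code from the City's pipeline: the sample rows, the code section values, the "status" field name, and the "ABATED" status are hypothetical, and dates are assumed to be ISO-formatted strings so that they sort chronologically.

```python
from collections import defaultdict

# Hypothetical sample rows: one casefile with two violations, each
# investigated twice. Only the three key field names come from the
# dataset description; everything else is made up for illustration.
rows = [
    {"casefile_number": "CF-PLI-2021-025422", "violation_code_section": "302.4",
     "investigation_date": "2021-06-01", "status": "ABATED"},
    {"casefile_number": "CF-PLI-2021-025422", "violation_code_section": "302.4",
     "investigation_date": "2021-05-01", "status": "VIOLATIONS FOUND"},
    {"casefile_number": "CF-PLI-2021-025422", "violation_code_section": "304.7",
     "investigation_date": "2021-05-01", "status": "VIOLATIONS FOUND"},
    {"casefile_number": "CF-PLI-2021-025422", "violation_code_section": "304.7",
     "investigation_date": "2021-06-01", "status": "ABATED"},
]


def record_key(row):
    """Composite key that should identify a single row."""
    return (row["casefile_number"],
            row["violation_code_section"],
            row["investigation_date"])


def violation_histories(rows):
    """Group rows per (casefile, violation) and order each group by date."""
    histories = defaultdict(list)
    for row in rows:
        key = (row["casefile_number"], row["violation_code_section"])
        histories[key].append(row)
    for history in histories.values():
        # Assumes ISO date strings, which sort chronologically as text.
        history.sort(key=lambda r: r["investigation_date"])
    return dict(histories)
```

Calling violation_histories(rows) on the sample returns one date-ordered list per violation, which is the per-violation history the text refers to. Comparing the number of distinct record_key values with the number of rows is a quick way to check whether the composite key is in fact unique in a given extract.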
DOMI (the Department of Mobility and Infrastructure), PLI, and Environmental Services (ES) all use this system to log violations. In most cases, the department involved in the casefile can be extracted from the casefile_number field (beginning with the 4th character). For instance, a casefile_number like CF-PLI-2021-025422 represents a violation reported by PLI. The remaining casefile IDs start with "O-"; these are PLI violation codes from an old ticketing system.
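One way to pull the department out of a casefile_number is sketched below. The function name is hypothetical, and the sketch assumes that newer IDs follow the "CF-<department>-<year>-<number>" shape of the example above, with the department code ending at the next hyphen. Since the description says this works only "in most cases", anything that does not fit is returned as None rather than guessed.

```python
def department_from_casefile(casefile_number):
    """Return the department code embedded in a casefile_number, or None."""
    if casefile_number.startswith("O-"):
        # Legacy IDs are described as PLI violations from an old ticketing system.
        return "PLI"
    if casefile_number.startswith("CF-"):
        # The department starts at the 4th character (index 3).
        # Assumption: it runs up to the next hyphen.
        department = casefile_number[3:].split("-", 1)[0]
        return department or None
    # Unrecognized format: do not guess.
    return None
```

For the example ID above, department_from_casefile("CF-PLI-2021-025422") gives "PLI". Rows that come back as None are worth inspecting by hand before being assigned to a department.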
The records from 2020-06 onward are obtained from the City's Computronix system, one of several independent systems used by the City to track property-level data.
Preprocessing/Formatting
All string fields (most fields in the dataset) were converted to UPPERCASE. The data are entered manually and often contain non-uniform formatting. Although there are several ways to clean such data, including leaving the cleaning to users after they access the data here, text values were transformed to UPPERCASE in this case to make the formatting uniform. Future improvements to this ETL pipeline may address this problem with a more sophisticated technique.
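The transformation described above amounts to something like the following sketch. It is not the actual ETL code, and the function name and sample field names are hypothetical. It uppercases every string value in a row and leaves other types untouched.

```python
def uppercase_text_fields(row):
    """Return a copy of row with every string value uppercased."""
    return {
        field: value.upper() if isinstance(value, str) else value
        for field, value in row.items()
    }


# Hypothetical row with mixed-case, manually entered text.
cleaned = uppercase_text_fields({"status": "Violations Found", "count": 3})
```

A practical consequence for users is that any text they match against these fields should be uppercased first, for example by comparing with query.upper() rather than the raw query string.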