Actual Extraction [Python]

An easy step-by-step tutorial on how to extract an Actual Time Series with the Artesian Python SDK.

Artesian gives you straightforward access to the data history, so you can analyze it and produce a plan in the way that best fits your workflow.

Let’s see how to proceed step by step.

The goal

Extract the data of an Actual Time Series Market Data.

The reference data is fictitious, created exclusively for this case. With Artesian, it is possible to write any data attribute to a Time Series, making it suitable for saving your data production.


Import of Artesian libraries and configuration

To use all the features of Artesian, we must first authenticate. First, we install the Artesian SDK in the Python environment in use with the “pip install artesian-SDK” command. Then we import the base library and the modules required to set up the authentication towards Artesian (line 4 of the script) and to read the data.

Once all the necessary libraries have been imported, we can configure Artesian by providing the base URL and the API key.

To obtain these two values, refer to the tutorial “How to Configure Artesian Python SDK”.

Once the Artesian configuration is complete, we can configure the Query Service (line 6).

				
from Artesian import ArtesianConfig, Granularity
from Artesian.Query import QueryService

cfg = ArtesianConfig("https://arkive.artesian.cloud/{tenantName}/", "{api-key}")

qs = QueryService(cfg)
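
Hard-coding the API key in a script makes it easy to leak. As a minimal sketch, assuming the tenant name and the API key are stored in two environment variables (the names ARTESIAN_TENANT and ARTESIAN_API_KEY are hypothetical and not defined by the SDK), the two values passed to ArtesianConfig could be assembled like this:

```python
import os

# URL pattern taken from the configuration script above
BASE_URL_TEMPLATE = "https://arkive.artesian.cloud/{tenant}/"


def load_artesian_settings(env=None):
    """Return (base_url, api_key) built from environment-style settings.

    ARTESIAN_TENANT and ARTESIAN_API_KEY are hypothetical variable names
    chosen for this example; they are not defined by the Artesian SDK.
    """
    env = os.environ if env is None else env
    tenant = env.get("ARTESIAN_TENANT")
    api_key = env.get("ARTESIAN_API_KEY")
    if not tenant or not api_key:
        raise RuntimeError("Set ARTESIAN_TENANT and ARTESIAN_API_KEY first")
    return BASE_URL_TEMPLATE.format(tenant=tenant), api_key


# Example with an explicit mapping instead of the real environment
url, key = load_artesian_settings(
    {"ARTESIAN_TENANT": "mytenant", "ARTESIAN_API_KEY": "secret"}
)
print(url)
```

The returned pair can then be passed to ArtesianConfig(url, key) in place of the literal strings used in the script.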
				
			


The creation of the Actual extraction

Once we have configured Artesian and the Query Service, we can decide which data we want to extract and how. The basic input is the ID, or a list of IDs, of the Market Data of interest, which can be obtained through the UI.

Fundamental parameters to decide:

Once we have decided which IDs to extract, we can evaluate how to extract them. The parameters to set are:

The Time Range of extraction: Artesian offers various possibilities; for each of them, keep in mind that the end of the extraction range is always exclusive. For this specific example, let’s consider the AbsoluteDateRange(“2022-03-01”, “2022-03-02”).

The data extraction TimeZone: selected according to your needs. Artesian takes care of converting the data as necessary.

The Granularity of data extraction: it can coincide with the original granularity of the data or differ from it, as long as an Aggregation Rule has been configured on the curve.

The Aggregation Rule: the Artesian feature that allows you to extract data at a granularity different from the original one. The value set for this property defines the aggregation/disaggregation operation applied to the data. The possible options are “Undefined”, “SumAndDivide” or “AverageAndReplicate”. With “Undefined”, extracting data at a granularity different from the original is not possible.
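
To make the difference between the two rules concrete, here is a minimal sketch of what their names suggest: sum when aggregating and divide equally when disaggregating, or average when aggregating and replicate when disaggregating. This reading is an assumption based on the rule names, and the functions below are illustrative helpers, not Artesian's implementation.

```python
def aggregate(values, rule):
    """Collapse finer-granularity values (e.g. 24 hourly) into one coarser value."""
    if rule == "SumAndDivide":
        return sum(values)
    if rule == "AverageAndReplicate":
        return sum(values) / len(values)
    # "Undefined": no conversion between granularities is possible
    raise ValueError(f"Cannot change granularity with rule {rule!r}")


def disaggregate(value, parts, rule):
    """Split one coarser value into `parts` finer-granularity values."""
    if rule == "SumAndDivide":
        return [value / parts] * parts
    if rule == "AverageAndReplicate":
        return [value] * parts
    raise ValueError(f"Cannot change granularity with rule {rule!r}")


# 24 hourly values: 10.0 for the first half of the day, 20.0 for the second
hourly = [10.0] * 12 + [20.0] * 12
print(aggregate(hourly, "SumAndDivide"))         # daily total
print(aggregate(hourly, "AverageAndReplicate"))  # daily average
```

Under this reading, SumAndDivide suits quantities that add up over time (such as volumes), while AverageAndReplicate suits levels (such as prices or power).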

With the extraction parameters chosen, the query is ready to be launched, and the data obtained can be viewed.

				
# Build and run the Actual extraction for three Market Data IDs
q = qs.createActual() \
    .forMarketData([100020707,100020706,100020705]) \
    .inAbsoluteDateRange("2022-03-01","2022-03-02") \
    .inTimeZone("UTC") \
    .inGranularity(Granularity.Hour) \
    .execute()

print(q)
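
Because the end of the range is exclusive, the query above covers the whole of 2022-03-01 and nothing of 2022-03-02. The following sketch, which uses only the standard library and does not call Artesian, enumerates the hourly UTC timestamps that fall inside such a range. The helper name hourly_points is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def hourly_points(start, end):
    """List the hourly UTC timestamps in the half-open range [start, end)."""
    current = datetime.fromisoformat(start).replace(tzinfo=timezone.utc)
    stop = datetime.fromisoformat(end).replace(tzinfo=timezone.utc)
    points = []
    while current < stop:  # strict comparison: the end is excluded
        points.append(current)
        current += timedelta(hours=1)
    return points


points = hourly_points("2022-03-01", "2022-03-02")
print(len(points), points[0].isoformat(), points[-1].isoformat())
# → 24 2022-03-01T00:00:00+00:00 2022-03-01T23:00:00+00:00
```

So, assuming data exists for every hour, each Market Data ID would be expected to contribute at most 24 hourly points to the result.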
				
			

Other options for data extraction

Regarding the selection of the extraction ranges, Artesian supports the following options:

AbsoluteDateRange: an absolute, fixed period of time (e.g., from “2018-08-01” to “2018-08-13” will extract data from “2018-08-01” to “2018-08-12”).

RelativePeriod: a period of time relative to today, before or after it (e.g., if today is “2021-03-31”, requesting the period “P-5D” extracts the data from “2021-03-26” to “2021-03-30”, while requesting the period “P5D” extracts the data from “2021-03-31” to “2021-04-04”). For the syntax, refer to the ISO 8601 standard. In addition to the simple RelativePeriod, it is possible to use the RelativePeriodRange (e.g., from “P-5D” to “P5D” extracts the data from “2021-03-26” to “2021-04-04”).

RelativeInterval: a fixed-size “rolling” time span. The possible options are “RollingWeek”, “RollingMonth”, “RollingQuarter” or “RollingYear”, i.e., the last 7, 30, 90 or 365 days of data (current day included), and “WeekToDate”, “MonthToDate”, “QuarterToDate” or “YearToDate”, i.e., from the beginning of the current week, month, quarter or year up to the current day.
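
The RelativePeriod examples above can be reproduced with a few lines of date arithmetic. This is a minimal sketch that handles only whole-day periods such as “P5D” or “P-5D”; the function names are hypothetical and this is not the SDK's own parser. Each function returns the start date and the exclusive end date.

```python
import re
from datetime import date, timedelta


def parse_days(period):
    """Parse a whole-day period such as 'P5D' or 'P-5D' into a signed day count."""
    match = re.fullmatch(r"P(-?\d+)D", period)
    if match is None:
        raise ValueError(f"Unsupported period: {period!r}")
    return int(match.group(1))


def relative_period(period, today):
    """Return (start, exclusive_end) for a period counted from today."""
    days = parse_days(period)
    if days < 0:
        return today + timedelta(days=days), today
    return today, today + timedelta(days=days)


def relative_period_range(period_from, period_to, today):
    """Return (start, exclusive_end) for a range between two relative periods."""
    start = today + timedelta(days=parse_days(period_from))
    end = today + timedelta(days=parse_days(period_to))
    return start, end


today = date(2021, 3, 31)
for label, (start, end) in [
    ("P-5D", relative_period("P-5D", today)),
    ("P5D", relative_period("P5D", today)),
    ("P-5D..P5D", relative_period_range("P-5D", "P5D", today)),
]:
    # The last day actually extracted is the day before the exclusive end
    print(label, start, end - timedelta(days=1))
```

With today set to 2021-03-31, the printed first and last days match the three examples given in the text.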

				
# Range selectors (alternatives): pick one per query
.inAbsoluteDateRange("2018-08-01", "2018-08-10")
.inRelativeInterval(RelativeInterval.RollingWeek)
.inRelativePeriod("P5D")
.inRelativePeriodRange("P-3D","P10D")
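
As a rough illustration of the RelativeInterval options, the sketch below computes the first and last day (both inclusive) covered by each interval for a given day. It follows the description in the text; the Monday week start and the helper names are assumptions made for this example, not behavior confirmed by the SDK.

```python
from datetime import date, timedelta

# Window sizes in days, as described in the text
ROLLING_DAYS = {
    "RollingWeek": 7,
    "RollingMonth": 30,
    "RollingQuarter": 90,
    "RollingYear": 365,
}


def rolling_interval(name, today):
    """Return (first_day, last_day) for a rolling window that includes today."""
    days = ROLLING_DAYS[name]
    return today - timedelta(days=days - 1), today


def to_date_interval(name, today):
    """Return (first_day, last_day) from the start of the period up to today."""
    if name == "WeekToDate":
        # Assumption: weeks start on Monday
        start = today - timedelta(days=today.weekday())
    elif name == "MonthToDate":
        start = today.replace(day=1)
    elif name == "QuarterToDate":
        first_month = 3 * ((today.month - 1) // 3) + 1
        start = today.replace(month=first_month, day=1)
    elif name == "YearToDate":
        start = today.replace(month=1, day=1)
    else:
        raise ValueError(f"Unknown interval: {name!r}")
    return start, today


today = date(2021, 3, 31)
print(rolling_interval("RollingWeek", today))
print(to_date_interval("MonthToDate", today))
```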
   
				
			

In addition to those previously mentioned, Artesian also offers the possibility to apply a filling strategy to the data to manage any missing data. The possible options are:

FillNull(): the default operation, which also returns empty values (null) in the extraction.

FillNone(): an operation that does not return empty values (null) in the extraction.

FillLatestValue(“P5D”): an operation that fills missing values with the latest available value, looking back over the period indicated in the call (in this case, 5 days).

FillCustomValue(): an operation that returns a custom value in the extraction in place of missing values (null). The custom values can apply to the settlement, the opening and/or closing price, the highest and/or lowest price, the volume paid, the volume sold and/or the volume.

				
					
# Filling strategies (alternatives): pick one per query
.withFillNull()
.withFillNone()
.withFillLatestValue("P5D")
.withFillCustomValue(123)
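
The effect of the four strategies can be pictured on a small daily series with gaps. The sketch below is a plain-Python illustration of the behavior described above, not the SDK's implementation; the helper names and the sample data are made up for this example.

```python
from datetime import date


def fill_null(series):
    """Keep every point, including the missing (None) ones."""
    return list(series)


def fill_none(series):
    """Drop the points whose value is missing."""
    return [(t, v) for t, v in series if v is not None]


def fill_latest_value(series, max_age_days):
    """Replace a missing value with the latest one, if it is recent enough."""
    filled = []
    last_t, last_v = None, None
    for t, v in series:
        if v is not None:
            last_t, last_v = t, v
        elif last_t is not None and (t - last_t).days <= max_age_days:
            v = last_v
        filled.append((t, v))
    return filled


def fill_custom_value(series, custom):
    """Replace every missing value with a fixed custom value."""
    return [(t, custom if v is None else v) for t, v in series]


series = [
    (date(2022, 3, 1), 10.0),
    (date(2022, 3, 2), None),   # gap one day after the latest value
    (date(2022, 3, 3), 12.0),
    (date(2022, 3, 10), None),  # gap seven days after the latest value
]

for t, v in fill_latest_value(series, 5):
    print(t, v)
```

In this sample, the gap on 2022-03-02 is filled with 10.0, while the gap on 2022-03-10 stays empty because the latest value is more than 5 days old.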
				
			


Alternative to the SDK extraction

As an alternative to the SDK extraction, we can extract data directly from the portal in Excel format.

 

With the SDK, by contrast, it is enough to write the extraction once to have it entirely reproducible and automated in our workflow.

Not only does it save you time, but it also allows you to minimize human errors caused by repeated operations on substantial amounts of data or different Excel files.

This is an undeniable advantage that allows us to focus on data analysis instead of data management and optimization.