Zdenek Svoboda
posted this on August 06, 2010 04:50
Update: for the most up-to-date documentation of the GoodData CL tool, please refer to the GoodData developer website. The GoodData CL commands are described in the Runtime Commands documentation.
I've recently received a few support requests reporting missing table errors. I've realized that we haven't described well how the CL commands work. I'll try to fix this quickly with this post, knowing that we need to extend our documentation ASAP.
The CL tool provides multiple connectors. Connectors allow you to transfer data from various data sources. For example, file (e.g. CSV), JDBC, Google Analytics, and Salesforce connectors are available in the CL tool.
Connectors always load data into a GoodData project. Most integration scenarios load one dataset and typically follow this sequence:
1. Generate the input file configuration (typically executed once per dataset):
GenerateCsvConfig | GenerateJdbcConfig | GenerateGoogleAnalyticsConfig | GenerateSfdcConfig
2. Initialize a new GoodData project and store the project context (typically executed once per dataset):
CreateProject
StoreProject
LoadCsv | LoadJdbc | LoadGoogleAnalytics | LoadSfdc
GenerateMaql
ExecuteMaql
3. Load data to the project (typically executed daily for a dataset):
RetrieveProject | OpenProject
LoadCsv | LoadJdbc | LoadGoogleAnalytics | LoadSfdc
TransferLastSnapshot | TransferAllSnapshots | TransferSnapshots
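The three phases above can be summarized in a short Python sketch. It only assembles the command names listed in this post for a chosen connector; it does not call the CL tool and supplies no command parameters. The helper name commands_for and the phase labels "configure", "initialize", and "load" are made up for this illustration, and RetrieveProject and TransferLastSnapshot are simply one pick from the alternatives listed in step 3.

```python
# Connector names as they appear in the CL command names in this post.
CONNECTORS = ("Csv", "Jdbc", "GoogleAnalytics", "Sfdc")


def commands_for(connector, phase):
    """Return the CL command names for one phase of a single-dataset integration.

    Hypothetical helper: it only builds a list of command names and
    knows nothing about the parameters each command takes.
    """
    if connector not in CONNECTORS:
        raise ValueError(f"unknown connector: {connector}")
    phases = {
        # Step 1, typically executed once per dataset.
        "configure": [f"Generate{connector}Config"],
        # Step 2, typically executed once per dataset.
        "initialize": [
            "CreateProject",
            "StoreProject",
            f"Load{connector}",
            "GenerateMaql",
            "ExecuteMaql",
        ],
        # Step 3, typically executed daily. RetrieveProject and
        # TransferLastSnapshot are chosen from the listed alternatives.
        "load": ["RetrieveProject", f"Load{connector}", "TransferLastSnapshot"],
    }
    if phase not in phases:
        raise ValueError(f"unknown phase: {phase}")
    return phases[phase]


if __name__ == "__main__":
    for phase in ("configure", "initialize", "load"):
        print(phase, "->", " ".join(commands_for("Csv", phase)))
```

Keeping the one-time phases separate from the daily phase mirrors the "once per dataset" versus "daily" split described in the steps.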
Many examples of the CL scripts are available in this document.
Here is an incorrect scenario that I have noticed a few times recently:
1. Generate the input file configuration for DATASET 1:
GenerateCsvConfig | GenerateJdbcConfig | GenerateGoogleAnalyticsConfig | GenerateSfdcConfig
2. Initialize a new GoodData project and store the project context for DATASET 1:
CreateProject
StoreProject
LoadCsv | LoadJdbc | LoadGoogleAnalytics | LoadSfdc
GenerateMaql
ExecuteMaql
3. Generate the input file configuration for DATASET 2, which references DATASET 1:
GenerateCsvConfig | GenerateJdbcConfig | GenerateGoogleAnalyticsConfig | GenerateSfdcConfig
4. Initialize a new GoodData project and store the project context for DATASET 2:
CreateProject
StoreProject
LoadCsv | LoadJdbc | LoadGoogleAnalytics | LoadSfdc
GenerateMaql
ExecuteMaql
5. Load data to the project for DATASET 2:
RetrieveProject | OpenProject
LoadCsv | LoadJdbc | LoadGoogleAnalytics | LoadSfdc
TransferLastSnapshot | TransferAllSnapshots | TransferSnapshots
This scenario fails because DATASET 1, which is referenced from DATASET 2, has never been transferred via a Transfer command. As a result, the DATASET 1 transformation structures haven't been created yet, and these structures are required when processing DATASET 2.
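The ordering rule can be illustrated with a small Python model. This is only a sketch of the reasoning above, not of how the CL tool works internally: Step, steps_for, find_missing_transfers, and the REFERENCES mapping are hypothetical names invented for this example, and the CSV connector is picked arbitrarily. The check walks a command sequence and reports every Transfer that runs for a dataset whose referenced dataset has not been transferred yet.

```python
from dataclasses import dataclass

# The three Transfer commands named in this post.
TRANSFER_COMMANDS = {"TransferLastSnapshot", "TransferAllSnapshots", "TransferSnapshots"}


@dataclass(frozen=True)
class Step:
    command: str  # CL command name
    dataset: str  # dataset the command is run for


def steps_for(dataset, commands):
    """Tag a list of command names with the dataset they are run for."""
    return [Step(command, dataset) for command in commands]


def find_missing_transfers(steps, references):
    """Return (dataset, referenced dataset) pairs where the dataset is
    transferred before the dataset it references has been transferred."""
    transferred = set()
    problems = []
    for step in steps:
        if step.command not in TRANSFER_COMMANDS:
            continue
        for referenced in references.get(step.dataset, []):
            if referenced not in transferred:
                problems.append((step.dataset, referenced))
        transferred.add(step.dataset)
    return problems


SETUP = ["GenerateCsvConfig", "CreateProject", "StoreProject",
         "LoadCsv", "GenerateMaql", "ExecuteMaql"]
LOAD = ["RetrieveProject", "LoadCsv", "TransferLastSnapshot"]

# DATASET 2 references DATASET 1, as in the scenario above.
REFERENCES = {"DATASET 2": ["DATASET 1"]}

# The incorrect scenario: DATASET 1 is set up but never transferred.
incorrect = (steps_for("DATASET 1", SETUP)
             + steps_for("DATASET 2", SETUP)
             + steps_for("DATASET 2", LOAD))

# The same scenario with a load and Transfer for DATASET 1 added
# before DATASET 2 is loaded.
with_transfer = (steps_for("DATASET 1", SETUP)
                 + steps_for("DATASET 1", LOAD)
                 + steps_for("DATASET 2", SETUP)
                 + steps_for("DATASET 2", LOAD))

if __name__ == "__main__":
    print("incorrect:", find_missing_transfers(incorrect, REFERENCES))
    print("with transfer:", find_missing_transfers(with_transfer, REFERENCES))
```

In this model the incorrect scenario is flagged because the only Transfer in the sequence belongs to DATASET 2, while the second sequence passes because DATASET 1 is transferred first.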
Comments
How does GoodData handle changes to any data sets that have already been uploaded? Incremental loads are fine for growing data sets, but not for changing ones. Is it expected that the entire data set be deleted, recreated, and uploaded again?
Thanks
Hi Gareth, This is handled by the "CONNECTION POINT" of the data set. If you update data which already has an existing record, the system will update the record based on this Connection Point (primary key).
Hope this helps!
Thanks Ray!