Thursday, August 23, 2012

How does pushData works

On a previous post I have shown how to use push data.
Since then there have been some changes to it and as for the user end they are shown here: http://www.youtube.com/watch?v=ZEf6JRb5OF8&feature=youtu.be
That video might not be the best one you watched but it is just a draft, final version will come soon enough.
However this post is not about how to make a new procedure, but what happens in the background.

When a new procedure is created
Obviously the app creates a whole lot of panels, labels, buttons and lists. For the most part they are the same as any other app would do.
Here I want to show one part of it (this is used in a few places)
1 var folder=DocsList.getFileById(ss.getId()).getParents()[0];2 var files=folder.getFilesByType("spreadsheet");3 for(var i=0; i<files.length; i++)4 sourceSpreadsheet.addItem(files[i].getName(), files[i].getId());

This piece of code is responsible for the list boxes where the user will choose source and destination spreadsheet.

The list has a on change handler which updates the list of sheets in that spreadsheet.



The rest of the user interface is quit simple. The real work starts when the user hits the finish button.

In the first steps all the user-provided data is mapped from various text and list boxes.

1 var newPush={}; 2 3 4 newPush.sourceSpreadsheetName=sourceSS.getName(); 5 newPush.sourceSpreadsheetId=sourceSS.getId(); 6 newPush.sourceSheetName=sourceSheetName; 7 newPush.headers=headers; 8 newPush.destinationSpreadsheetName=destinationSS.getName(); 9 newPush.destinationSpreadsheetId=destinationSS.getId();10 newPush.destinationSheet=destinationSheetName;11 newPush.row=startRow;12 newPush.column=startColumn;13 newPush.time=timeInterval;14 newPush.keepEmpty=e.parameter.keepEmpty;15 var id=1;16 while(db.query({type:"pushData", pushId:id}).hasNext()==true)17 id++;18 newPush.pushId=id;19 newPush.lastPush=new Date().toString();20 newPush.success="Done";21 newPush.type="pushData";22 var realTime=false 23 if(timeInterval==0){24 newPush.realTime="true";25 realTime=true;26 }27 else 28 newPush.realTime="false";

After all the data is collected the app makes a new object and saves the provided data in it

Most of this is just saving data, however lines 15-17 might seem a bit odd.

ScriptDb has an integrated function getSize() for query results, but from what I have seen getSize() sometimes returns false results.

That is why the id is calculated manually here.



The next part is to create the trigger that will track changes on the source spreadsheet.

And this was not as easy as it sounds. At the time of writing the app there was no way to determine the source of the onEdit trigger. Yes, there is a e.source for onEdit triggers

but that will return the app’s host spreadsheet, not the spreadsheet where the change occurred. There I used a little trick…

The trick is (and it is not very elegant) I have 10 hardcoded onEdit functions (called editOn)

1 function editOn1(){ 2 3 markEdit("editOn1"); 4 5 } 6 7 function editOn2(){ 8 9 markEdit("editOn2");10 11 }12 13 function editOn3(){14 15 markEdit("editOn3");16 17 }18 19 function editOn4(){20 21 markEdit("editOn4");22 23 }24 25 function editOn5(){26 27 markEdit("editOn5");28 29 }30 31 function editOn6(){32 33 markEdit("editOn6");34 35 }36 37 function editOn7(){38 39 markEdit("editOn7");40 41 }42 43 function editOn8(){44 45 markEdit("editOn8");46 47 }48 49 function editOn9(){50 51 markEdit("editOn9");52 53 }54 55 function editOn10(){56 57 markEdit("editOn10");58 59 }60 61 function markEdit(root){62 63 var db=ScriptDb.getMyDb();64 65 var results=db.query({type:"editHandler", editFunction:root});66 67 var obj=results.next();68 obj.haveChanged="true"; 69 db.save(obj);70 71 72 }



As You can see all of them have one line of code, they call the “markEdit” function sending as a parameter the function name.

And this is why…

…As I had no way to find out which spreadsheet was edited but wanted to avoid unnecessary database updates at all cost I made a little on edit web.

When a user defines a new process the source spreadsheet gets an onEdit handler assigned (that is IF it does not have one previously assigned).

When a spreadsheet is changed the onEdit (that is editOn) function will fire, and record the change to ScriptDb. Later on, when the app has to update the database

it will only update sheets that are assigned to the function which recorded the change. (the code is bellow in the part about the update of db)

After the tedious process of checking if the source spreadsheet has a onEdit handler then assigning it… Or if the push interval is set to real time assigning the real time trigger the app will go ahead, save the sheet to ScriptDb and finally perform the first data push.



Updating the database

As is stated in the post Big data sets and ScriptDb I had to assume a lot of spreadsheets with a lot of data and I had to find an alternative way of saving the data.

This solution works, and No I am not 100% happy with it and I intend to change it soon enough (working on a library special for cases such as pushData) but the app had to be released and this is what we have now.

1 function updateProces(){ 2 3 var db=ScriptDb.getMyDb(); 4 5 var changedSS=db.query({type:"editHandler", haveChanged:"true"}); 6 7 8 while(changedSS.hasNext()){ 9 10 var editObj=changedSS.next();11 12 var spreadsheets=Utilities.jsonParse(editObj.spreadsheets);13 14 for(var k=0; k<spreadsheets.length; k++){15 16 var ss=SpreadsheetApp.openById(spreadsheets[k]); 17 var sheets=ss.getSheets();18 19 for(var l=0; l<sheets.length; l++){20 21 var sheet=sheets[l]; 22 var hasProcedure=db.query({type:"pushData", sourceSheetName:sheet.getName()});23 if(hasProcedure.hasNext()==true)24 updateDbSingleSheet(ss, sheet)25 26 }27 }28 29 }30 31 }32

Remember the object edited in markEdit, well here the app will query for all of those objects who’s haveChanged attribute is set to true.

That is it will generate a list of all spreadsheets that are changed (at least potentially changed). Again I had no way of finding the actual source of the change so I work with arrays of spreadsheets.

All the spreadsheets associated with the trigger the fired onEdit are remapped in the database. updateDbSingleeSheet has been documented in the post bellow Big data sets and ScriptDb (hack)



Pushing the data

Another trick here, only data that has been changed will be pushed. And again some of it has not been actually changed but is associated with the onEdit handler which recorded a change.

First the app will generate a list of changed spreadsheets.


1 while(changedSS.hasNext()){2 3 var editObj=changedSS.next(); 4 mergeArray(listOfChangedSS,Utilities.jsonParse(editObj.spreadsheets));5 }6


Then it will cross reference that list with procedures who's push time interval has expired( or plainly it will check if it is time to push the data)

1 if( ( (timeStart-lastPush)/3600000)+0.25>=obj.time){ 2 if(underscoreGS._contains(listOfChangedSS, obj.sourceSpreadsheetId)==true)3 objectsToPush.push(obj);4 }






underscoreGS is a awesome library with a bunch of functions for manipulating arrays. (it can be found in the Google Apps Script documentation in the part Notable Libraries)

The app will then go through the list of objects to push, and do the push, sheet by sheet.


1 unction writteData(headers, sourceSheet, destinationSheet, sourceSpreadsheet, row, column, keepEmpty){ 2 3 var db=ScriptDb.getMyDb(); 4 var items=db.query({itemType:"sheetData", spreadsheetId:sourceSpreadsheet.getId(), sheetName:sourceSheet}).sortBy("itemId", db.NUMERIC); 5 6 var completeData=[]; 7 for(var i=0; i<headers.length; i++){ 8 9 completeData[i]=[];10 completeData[i][0]=headers[i];11 }12 13 while(items.hasNext()){14 15 var thisItem=items.next();16 for(var i=0; i<headers.length; i++){17 if(!completeData[i])18 completeData[i]=[];19 20 mergeArray(completeData[i], Utilities.jsonParse(thisItem[headers[i]]));21 }22 23 }24 25 completeData=ArrayLib.transpose(completeData);26 27 if(keepEmpty=="false")28 completeData=findEmpty(completeData);29 30 var mR=destinationSheet.getMaxRows();31 var lR=destinationSheet.getLastRow();32 if(lR>completeData.length)33 destinationSheet.getRange(row, column, lR, completeData[0].length).clear();34 destinationSheet.getRange(row,column,completeData.length, completeData[0].length).setValues(completeData); 35 }36


The arguments of this function is what the user provided while creating the process. Headers-headers to be pushed, row- the row where the data writing should start, column- the column where the data writing should start, keepEmpty-is weather the empty rows from the source sheet should be kept in the destination sheet or filed with the next available non-empty row.

First thing the function does is assemble all the pieces of data from ScriptDb (while saving the data it might be divided into smaller pieces)

Then it will clear the columns in which the data will be written. (why? well if you had 100 entries previously and now have only 50 this will delete the remaining rows)

Finally the data is written and the app will proceed to the next process.

Tuesday, August 14, 2012

pushData

The best way to explain what pushData does is by the words of the maker of original pushData , Andrew Stillman.

Here it is a nice picture explaining it:

pushDataAbout

 

Andrew has been my mentor on all the projects I did this summer and a very good and thorough teacher.

pushData was his idea and out there is a published version, the difference between the previous version and this is basically that here the user gets a graphical interface, this version uses ScriptDb and as suggested by users of the previous version there is a real time transfer option included.

 

How it looks

All parts of the look are not final, but here it is.

After installing the app the onOpen handler will add an additional menu to your spreadsheet:

pushData1

The above entries will do:

Start pushData – will show the user interface where you can define new processes, edit/delete existing processes do an immediate data push (overriding the time interval) and see all processes defined

About pushData - will display the picture first shown in this post

Update database now – will check for changes in your source sheets and update the database accordingly

Clear all data and triggers – will delete whatever you and the script made and give you a fresh start

Package push procedures for others to copy – will generate Google Apps Script code for you that you can share. Say you have a great system of processes and want to share it with someone, this is what you need.

Import copy procedures – will use the code generated by the function above to initialize the processes. NOTE: the spreadsheet names have to bee the same as from the original user

Manage user tracker settings – it is great to allow user tracking, the tracking will not allow anyone to steal the users identity but will help New Vision.

 

Start Screen

pushData2

The grid will provide data on any active processes, Push all data now button will provide an immediate push and New Push Procedure will get the user to the interface for defining a new procedure.

 

The starting point of creating a new procedure is to pick the source spreadsheet, afterwards the app will allow for picking the source sheet. After the source sheet has been picked the app will provide a grid with an overview of the first 5 rows in the source sheet.

pushData3

 

The next step is to chose the headers to be pushed. The app provides a set of list boxes, every box will contain all the headers from the source sheet. The order in which the headers are picked is the order they will appear in the destination sheet.

pushData4

After the headers have been picked the user is prompted to choose the destination spreadsheet and sheet. It will also allow for picking the starting point for the data to be written. A range check will be done, if there is a possibility for overwriting data in the destination sheet the user will get a warning. A grid will provide a overview of data currently in the destination sheet.

pushData5 

Finally the user will pick the push interval, options go from 1 to 24 hours for all processes, but will allow real time pushing for one process.

pushData7

After hitting the finish button the app will save the source sheet to ScriptDb, and preform the first data push taking the user back to the start screen,

pushData7

 

pushData is most definitely the best app I produced this summer of several reasons. On of the main is that Andrew took a very active role in developing this.

Additionally the previous version of pushData has received some feedback which are implemented in the new version.

 

Cheers

Big data sets and ScriptDb (hack)

First of all ScriptDb just may be the best thing that happened to GAS. It allows for fast data retrieval and storage(more or less).

But ScriptDb has its problems, and probably the biggest one is saving big data sets. This was a problem I encountered when working on pushData.

A quick intro:

pushData is supposed to save spreadsheets to ScriptDb and use that data to copy entries from some source spreadsheets to some destination spreadsheets.

To make it more simple, imagine having a huge spreadsheets with student data. The data going from first and last name all the way to place of birth.

But you also need smaller sheets which will have only parts of the data for each student. Say a roster for all the students in the school. pushData will allow you to define a process

for copying data, what data should be copied from where to where and in which order you want the columns to appear. You can decide when the copying should happen. Or rather the time interval.

The time interval can go from 1 to 24 hours. And for one process the time can be set to real time.(as soon as a change has been made to the source sheet it will be reflected to the destination sheet).

 

Now my problem is easily defined, I have 1,2,3..5? huge sheets a few smaller ones and all the data from those sheets has to be saved to ScriptDb. If you have ever tried to save a big sheet to ScriptDb you can see the problem.

  At the time of writing this saving that data is very very difficult, I am confident that GAS engineers will have a solution very soon but until then it is a real quest to do such a thing.

 

My initial approach was to create objects, every row would be represented by an object, the object keys would be the column names and all those object would be neatly saved to the database. (Used something very similar to the function in the documentation). The problem is that saving objects one by one is just very slow, so very slow it exceeds the execution time for any big data set. That I knew and my plan was to use the saveBatch function, however saveBatch will be able to save only small arrays of objects. By small I mean something ~20 objects at a time. I did try slicing and saving only parts but that did not go so well either.

NOTE: “Data Storage error” is the error most frequently shown, but it can (and does) mean very different things.

After a few days trying to solve this and a few periods of thinking: “Ooook it just is not possible” I did find a solution. This solution however will render some ScriptDb possibilities useless. But it still was working and it was acceptable for the script I am building.

 

My Solution (a simplified version)

My solution is a divide and conquer approach. Also I do not save rows any more I save columns.

The data from the sheet is taken by getDataRange().getValues and using 2DArrayLib’s transpose function I shift the rows and columns. 

I go though the array row by row turn the whole row into a JSON string(just temporarily) and find its length. (There is another approach without turning the data to strings but this will work much faster, and I needed fast)

In this process the largest array is found.

 

1 var db=ScriptDb.getMyDb();
2
3 var sheetData=sheet.getDataRange().getValues();
4 var keys=sheetData[0];
5
6 checkDates(sheetData);
7
8 sheetData=ArrayLib.transpose(sheetData);
9
10 var largestSize=0;
11 var indexOfLargest=0;
12
13 for(var i=0; i<sheetData.length; i++){
14 var thisSize=(Utilities.jsonStringify(sheetData[i]).length);
15 if(thisSize>largestSize){
16 indexOfLargest=i;
17 largestSize=thisSize;
18 }
19 }


checkDates(sheetData) is a function that will find columns containing dates and convert the dates to a ScriptDb-friendly format.


Now the "guess” work comes. I assume that all the rows have the same, largest, length. Knowing that there is a limit on the object size ScriptDb will save  I calculate how many objects it will take to save the whole sheet. From that I find the number of elements I can save for every sheet column per object.


1 var elementPerObject;
2 var numberOfObjects;
3 var storePerObject=5*8*1024; //to get size in bytes
4 var additional=0;
5 var type="sheetData";
6
7 numberOfObjects=parseInt(Math.ceil( (largestSize*keys.length+additional) / storePerObject));
8 elementPerObject=parseInt( Math.floor( (sheetData[indexOfLargest].length)/numberOfObjects) );
9

 



This is by no means a surgically correct calculation, far from it. But it will find a SAFE number of elements you can save. It kind of works like sending files over a network when you think of it (mush less sophisticated thou)


All that is left now is to loop from 0 to the number of objects needed and save those parts of the sheet.


1 for(var count=0; count<numberOfObjects; count++){
2
3 var thisItem={};
4 thisItem.headers=keys;
5 thisItem.itemType="sheetData";
6 thisItem.itemId=itemId;
7 thisItem.sheetName=sheet.getName();
8 thisItem.spreadsheetId=spreadsheet.getId();
9 itemId++;
10
11 for(var i=0; i<keys.length; i++){
12 thisItem[keys[i]]=Utilities.jsonStringify(sheetData[i].slice(begin, end));
13 }
14
15 items.push(thisItem);
16
17 begin=end;
18 end+=elementPerObject;
19
20 }

Initially begin and end are set to 0 and elementsPerObject. As you can see it will slice the array and load the parts to the object keys saving pushing all of the objects to an array. Every run of the loop will make a new object.


The number of objects can (and will) get somewhat large, some of the space will get wasted. But it will never cause the script to fail because an object was to large, and that is what I wanted to achieve.


Finally the array of items created has to be saved:


1 begin=0;
2 end=20;
3 if(end>items.length)
4 end=items.length;
5
6 while(end<=items.length){
7
8 var subItems=items.slice(begin, end);
9 var results = db.saveBatch(subItems, false);
10
11 if (!db.allOk(results)) {
12 // partial or no success
13 for (var i = 0 ; i < results.length; i++) {
14 Logger.log(i);
15 var item = results[i];
16 if ((typeof item.success) == 'function' && !item.success()) {
17 db.save(item);
18 }
19 }
20 }
21
22 if(end==items.length)
23 break;
24 else{
25
26 begin=end;
27 end+=18;
28
29 if(end>items.length)
30 end=items.length;
31 }
32
33 }

Again the array is sliced into more digestible pieces and saved, I used the fail-safe from the ScriptDb documentation to make sure all the elements will be saved.


And that is it, the sheet is saved. This function will save a sheet 10 000 rows x 13 columns in 10-14 seconds (tested it for a few days). And even better when using this to write data it finishes in 6-10 seconds.


Giving the possibility of saving several really big sheets Smile.


 


Usually when you save something you want to retrieve it, this is how I do it. The function will depend on the saving data function…


My function queries the database for all the objects associated with a sheet (made in the previous function), sorting the results by item ID. (In my case it was crucial to write the data in the same order it originally was)


Again in this case I am not interested in all the columns from the original sheet( it is not excluded)


 


1 var db=ScriptDb.getMyDb();
2
3 var items=db.query({itemType:"sheetData", spreadsheetId:sourceSpreadsheet.getId(), sheetName:sourceSheet}).sortBy("itemId", db.NUMERIC);
4
5 var completeData=[];
6
7 for(var i=0; i<headers.length; i++){
8
9 completeData[i]=[];
10 completeData[i][0]=headers[i];
11 }

In this case headers is an array filled with column names I want to have on my new Sheet . E.G. the original sheet could have “Name, Surname, ID, Date of Birth” but I only want to have “Name” and “ID” on my new sheet, then headers will contain “Name, ID”.


After setting every first element in the new array to the column name I go through the saved items and join what is inside the row with the values the new object contains.


1 while(items.hasNext()){
2
3
4 var thisItem=items.next();
5
6 for(var i=0; i<headers.length; i++){
7 if(!completeData[i])
8 completeData[i]=[];
9
10 mergeArray(completeData[i], Utilities.jsonParse(thisItem[headers[i]]));
11 }
12
13 }

Finally using 2DArayLib I again transpose the array and write the data to a new sheet. Simple enough, effective and in some cases useless. But for this app and similar ones this approach is gold.


 


Cheers

Long time no read

Hi there,

It has been a while since the last post here but that does not mean that I was slacking of. In the silence period 2 new apps have been written.

A lite weight Reservation app and pushData. Reservation app is meant to be embedded to your site and allow for your visitors to make reservations for any events you are hosting.

pushData is to be very short an app allowing you to copy certain columns from source to destination spreadsheets.  More on pushData in the next post.

 

Reservations

Again reservations is meant to be placed somewhere on your site. It is simple but allows for basic options. Your visitor can make a reservation, edit it or delete it.

Reservations will ask you to provide your calendar ID (the calendar is where you make events available for reserving). You allow for a certain number of available seats and the app will take care of the rest.

(a complete manual on how to set this app is available here: Manual )

The app will look like this:

app1

The settings button will be available only for the site owner.(obvious but I wanted to confirm that).

Now when the app is first installed you will want to go to the settings part and set it up and that will look like this:

First of all it will ask for the company/school name and for the calendar containing the event data.

After that it will ask for required fields, that is what the user needs to provide to make a reservation. There is no filed required by the app but you enter what ever it is you want.(the limit is 40 characters).

Finally it will ask for a confirmation message. That is after the visitor makes a reservation an email will be sent to them with the message you enter here.

A few neat options have been added here, you can use the key word “Name” in the message (if name is one of the fields you ask from the user) and the message will contain the visitors name.

And after hitting finish the setup is done.

 

What the visitor can see

First thing would be making a reservation, so if the user hits the “Make reservation” button it will take him to a new user interface.

app2

The green buttons represent dates with seats left, red buttons are displaying events that are full.  The app will show 5 events per page (if available). It will check for 120 days counting from current date to search for events.

It takes the user through the process step buy step, first being to choose a date from the list. After that has been picked it will show text boxes for information you asked for.

app3

When all the fields contain data the submit button will appear.

app4

The process for the user is finished after clicking submit. The app however will open/make a spreadsheet called reservation for you and fill the data provided by the user.

It will make a sheet for every date with reservation information for the visitors who picked that date and one sheet where you can preview all the reservations made. The app will also edit the calendar event decreasing available seats for the event by 1.

 

Canceling reservations is easy for the visitor, after hitting the “Cancel Reservation” button they will be asked to enter their reservation ID. The ID is randomly generated for every reservation and sent to the user in an email.

All they have to do is copy/paste the ID to a text box hit “Cancel” and the reservation will be cancelled. Along it the data from you spreadsheet will be deleted and the number of available seats will be increased by 1.

app5

 

Editing reservations is almost as easy.

First the event ID has to be provided. (the UI will look as the UI for cancellation). If the ID is recognized the app will provide a new interface.

The new UI will be identical to the interface for making a reservation except the text boxes will be filled with previously provided data.

When a change has been made to a reservation the data in your calendar and spreadsheet will be changed accordingly.

 

That is it about the Reservation app, small, simple and powerful.

 

Cheers