A few useful development tools

  1. Design & implement a website:
    Bootstrap: http://getbootstrap.com/
  2. A web-based UI for managing a MySQL database:
    Lazy Mofo: http://lazymofo.wdschools.com/
  3. Push notifications:
    Pushwoosh: http://docs.pushwoosh.com/
  4. Cross-platform app development:
    Corona: https://coronalabs.com/products/corona-sdk/
    Another option is Monkey (not tried).
  5. W3Schools tutorials for web design:
Posted in Uncategorized

[leetcode] Substring with Concatenation of All Words

You are given a string, s, and a list of words, words, that are all of the same length. Find all starting indices of substring(s) in s that are a concatenation of each word in words exactly once and without any intervening characters.

For example, given:
s: "barfoothefoobarman"
words: ["foo", "bar"]

You should return the indices: [0,9].
(order does not matter).
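A brute-force baseline is handy for checking the faster answers below: it tries every start index and counts the words inside a window of words.length * wordLen characters. This is a testing sketch, not either of the answers; the class name BruteForce is hypothetical.

```java
import java.util.*;

public class BruteForce {
    // Hypothetical reference checker: test every start index, O(n * words.length) substrings.
    public static List<Integer> findSubstring(String s, String[] words) {
        List<Integer> res = new ArrayList<>();
        if (s == null || words == null || words.length == 0) return res;
        int wordLen = words[0].length();
        int total = wordLen * words.length;
        // how many copies of each word are required (duplicates allowed)
        Map<String, Integer> need = new HashMap<>();
        for (String w : words) need.merge(w, 1, Integer::sum);
        for (int start = 0; start + total <= s.length(); start++) {
            Map<String, Integer> seen = new HashMap<>();
            int k = 0;
            for (; k < words.length; k++) {
                String w = s.substring(start + k * wordLen, start + (k + 1) * wordLen);
                int c = seen.merge(w, 1, Integer::sum);
                if (c > need.getOrDefault(w, 0)) break; // unknown word or one copy too many
            }
            if (k == words.length) res.add(start); // every slot matched
        }
        return res;
    }

    public static void main(String[] args) {
        // prints [0, 9]
        System.out.println(findSubstring("barfoothefoobarman", new String[]{"foo", "bar"}));
    }
}
```

Because it counts copies with a map, it also handles repeated words, e.g. "foofoofoo" with ["foo", "foo"] gives [0, 3].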


Correct Ans:

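import java.util.*;

public class Solution {
    public List<Integer> findSubstring(String s, String[] words) {
        List<Integer> res = new ArrayList<Integer>();
        if (s == null || words == null || words.length == 0
                || words.length * words[0].length() > s.length()) return res;
        int wordLen = words[0].length();
        int len = words.length;
        // dict: how many times each word must appear (handles duplicate words)
        Map<String, Integer> dict = new HashMap<String, Integer>();
        for (String word : words)
            if (dict.containsKey(word)) dict.put(word, dict.get(word) + 1);
            else dict.put(word, 1);
        for (int j = 0; j < wordLen; j++) { // try every offset: the first few chars may not be in dict, e.g. "bkapp" vs ["app"]
            Map<String, Integer> leftover = new HashMap<String, Integer>(dict); // copies still needed
            Queue<String> queue = new LinkedList<String>(); // words in the current window
            for (int i = j; i <= s.length() - wordLen; i += wordLen) {
                String cur = s.substring(i, i + wordLen);
                if (dict.containsKey(cur)) {
                    // no copy of cur left: drop words from the left until one is freed
                    while (leftover.get(cur) == 0) {
                        String oldStr = queue.remove();
                        leftover.put(oldStr, leftover.get(oldStr) + 1);
                    }
                    leftover.put(cur, leftover.get(cur) - 1);
                    queue.add(cur);
                    // window holds exactly len words: record its start index
                    if (queue.size() == len) res.add(i - (len - 1) * wordLen);
                } else {
                    // cur is not a word, so no match can span it: reset the window
                    queue.clear();
                    leftover = new HashMap<String, Integer>(dict);
                }
            }
        }
        return res;
    }
}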


My Ans:
1. The problem: it cannot handle duplicate words in the list, because dict is stored in a HashSet.
2. The advantage: a Map<String, Integer> stores each visited word together with its position (start index) in s, so when the word shows up again we can quickly move the left border (curPos, the start index of the substring currently being examined).

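 // assumes import java.util.*;
 public List<Integer> findSubstring(String s, String[] words) {
     List<Integer> res = new ArrayList<Integer>();
     if (s == null || words == null || words.length == 0
             || words.length * words[0].length() > s.length()) return res;
     int wordLen = words[0].length();
     HashSet<String> dict = new HashSet<String>(Arrays.asList(words)); // duplicate words collapse here
     for (int j = 0; j < wordLen; j++) { // try every offset within one word length
         Map<String, Integer> visited = new HashMap<String, Integer>(); // word -> its start index in s
         int curPos = j; // left border: start index of the substring being examined
         for (int i = j; i <= s.length() - wordLen; i += wordLen) {
             String cur = s.substring(i, i + wordLen);
             if (!dict.contains(cur)) { visited.clear(); curPos = i + wordLen; continue; }
             Integer prev = visited.get(cur);
             // seen again inside the window: jump the left border past the earlier copy
             if (prev != null && prev >= curPos) curPos = prev + wordLen;
             visited.put(cur, i);
             if ((i - curPos) / wordLen + 1 == dict.size()) { // every distinct word seen once
                 res.add(curPos);
                 curPos += wordLen;
             }
         }
     }
     return res;
 }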

Posted in coding

MATLAB Basics

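1. Submatrices:
One column:      A(:,j) extracts all elements in column j of A.
One row:         A(i,:) extracts all elements in row i of A.
One element:     A(i,j) extracts the element in row i, column j of A.
Several rows:    A(i:i+m,:) extracts rows i through i+m of A.
Several columns: A(:,j:j+n) extracts columns j through j+n of A.
A block:         A(i:i+m, j:j+n) extracts the (m+1)-by-(n+1) sub-block covering those rows and columns.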
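MATLAB ranges are 1-based and inclusive on both ends, which is why A(i:i+m, j:j+n) has m+1 rows and n+1 columns. When porting such code, a minimal Java sketch of the same block extraction could look like this, assuming a rectangular double[][]; the class SubMatrix and method subBlock are hypothetical names.

```java
import java.util.Arrays;

public class SubMatrix {
    // Hypothetical helper mimicking MATLAB A(r1:r2, c1:c2): 1-based, both ends inclusive.
    public static double[][] subBlock(double[][] a, int r1, int r2, int c1, int c2) {
        int rows = r2 - r1 + 1, cols = c2 - c1 + 1;
        double[][] out = new double[rows][cols];
        for (int r = 0; r < rows; r++) {
            // shift to Java's 0-based indices; copyOfRange's end index is exclusive
            out[r] = Arrays.copyOfRange(a[r1 - 1 + r], c1 - 1, c1 - 1 + cols);
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
        // like A(2:3, 1:2) in MATLAB; prints [[4.0, 5.0], [7.0, 8.0]]
        System.out.println(Arrays.deepToString(subBlock(a, 2, 3, 1, 2)));
    }
}
```

A single column A(:,j) corresponds to subBlock(a, 1, a.length, j, j).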

2. Deleting variables:
clear;    (removes all variables from the workspace)
clear m;  (removes only m)

3. Model comparison:
RMSE (root mean squared error): the closer to 0, the better.
regstats(y,yy): outputs statistics about the model.
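For reference, RMSE is the square root of the mean squared difference between the observed values y and the model predictions. A small Java sketch of the formula (the class Rmse is a hypothetical name, not a MATLAB function):

```java
public class Rmse {
    // RMSE = sqrt( (1/n) * sum_k (y_k - yhat_k)^2 ); 0 means a perfect fit.
    public static double rmse(double[] y, double[] yhat) {
        if (y.length != yhat.length || y.length == 0)
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        double sum = 0;
        for (int k = 0; k < y.length; k++) {
            double e = y[k] - yhat[k]; // residual
            sum += e * e;
        }
        return Math.sqrt(sum / y.length);
    }

    public static void main(String[] args) {
        double[] y = {1, 2, 3, 4};
        double[] yhat = {1, 2, 3, 6};
        // residuals 0, 0, 0, -2 -> sqrt(4 / 4) = 1.0
        System.out.println(rmse(y, yhat));
    }
}
```

Because RMSE is in the same units as y, it is easiest to compare between models fitted to the same data.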

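4. nlinfit/nlintool
When writing the model function, use the element-wise operators .*, ./ and .^; otherwise MATLAB treats them as matrix operators.
options = statset(...) creates the options passed to nlinfit (error tolerances, iteration limits, etc.).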
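The model function passed to nlinfit receives whole arrays of predictor values, so A.*B (entry by entry) and A*B (matrix product) give very different results. A Java sketch of the two operations on a 2-by-2 example; the method names times and mtimes mirror MATLAB's function names for .* and *, but the class itself is hypothetical.

```java
import java.util.Arrays;

public class ElementWise {
    // Like MATLAB A.*B: multiply matching entries (same shape required).
    public static double[][] times(double[][] a, double[][] b) {
        double[][] c = new double[a.length][a[0].length];
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < a[0].length; j++)
                c[i][j] = a[i][j] * b[i][j];
        return c;
    }

    // Like MATLAB A*B: matrix product (columns of a must equal rows of b).
    public static double[][] mtimes(double[][] a, double[][] b) {
        double[][] c = new double[a.length][b[0].length];
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < b[0].length; j++)
                for (int k = 0; k < b.length; k++)
                    c[i][j] += a[i][k] * b[k][j];
        return c;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}};
        System.out.println(Arrays.deepToString(times(a, a)));  // [[1.0, 4.0], [9.0, 16.0]]
        System.out.println(Arrays.deepToString(mtimes(a, a))); // [[7.0, 10.0], [15.0, 22.0]]
    }
}
```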

Posted in Uncategorized

Odroid Frequency Configuration

Verified on the Odroid-XU+E.

FILE_CPU_CURRENT_SCALING_FREQ = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq";
FILE_CPU_SCALING_FREQ = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed";
FILE_CPU_SCALING_GOVERNER = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor";
FILE_CPU_AVAILABLE_FREQS = "/sys/devices/system/cpu/cpufreq/iks-cpufreq/freq_table";
FILE_CPU_AVAILABLE_GOVERNERS = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors";

FILE_GPU_CURRENT_FREQ = "/sys/devices/platform/pvrsrvkm.0/sgx_dvfs_cur_clk";
FILE_GPU_MIN_FREQ = "/sys/devices/platform/pvrsrvkm.0/sgx_dvfs_min_lock";
FILE_GPU_MAX_FREQ = "/sys/devices/platform/pvrsrvkm.0/sgx_dvfs_max_lock";
FILE_GPU_AVAILABLE_FREQS = "/sys/devices/platform/pvrsrvkm.0/sgx_dvfs_table";

FILE_MEM_CURRENT_FREQ = "/sys/devices/platform/exynos5-busfreq-mif/devfreq/exynos5-busfreq-mif/cur_freq";
FILE_MEM_MIN_FREQ = "/sys/devices/platform/exynos5-busfreq-mif/devfreq/exynos5-busfreq-mif/min_freq";
FILE_MEM_MAX_FREQ = "/sys/devices/platform/exynos5-busfreq-mif/devfreq/exynos5-busfreq-mif/max_freq";
FILE_MEM_AVAILABLE_FREQS = "/sys/devices/platform/exynos5-busfreq-mif/devfreq/exynos5-busfreq-mif/freq_table";
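These nodes are ordinary sysfs files, so reading and writing them is plain file I/O. A minimal Java sketch, assuming the freq_table content is whitespace-separated numbers (the post does not show its format, so non-numeric tokens are skipped) and that writes run as root; the class and helper names are hypothetical. In mainline Linux cpufreq, scaling_setspeed only takes effect under the "userspace" governor, and the scaling_* frequencies are in kHz.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class OdroidFreq {
    static final String FILE_CPU_CURRENT_SCALING_FREQ =
        "/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq";
    static final String FILE_CPU_AVAILABLE_FREQS =
        "/sys/devices/system/cpu/cpufreq/iks-cpufreq/freq_table";

    // Hypothetical helper: read a sysfs node, or return null if missing/unreadable.
    public static String readNode(String path) {
        try {
            return new String(Files.readAllBytes(Paths.get(path))).trim();
        } catch (IOException e) {
            return null;
        }
    }

    // Hypothetical helper: parse whitespace-separated numbers, skipping anything else.
    public static List<Long> parseFreqs(String text) {
        List<Long> freqs = new ArrayList<>();
        if (text == null) return freqs;
        for (String tok : text.trim().split("\\s+"))
            if (!tok.isEmpty() && tok.chars().allMatch(Character::isDigit))
                freqs.add(Long.parseLong(tok));
        return freqs;
    }

    // Writing (e.g. "userspace" to the governor node, then a value to scaling_setspeed) needs root.
    public static void writeNode(String path, String value) throws IOException {
        Files.write(Paths.get(path), value.getBytes());
    }

    public static void main(String[] args) {
        System.out.println("cpu0 current freq: " + readNode(FILE_CPU_CURRENT_SCALING_FREQ));
        System.out.println("available: " + parseFreqs(readNode(FILE_CPU_AVAILABLE_FREQS)));
    }
}
```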

Posted in Uncategorized

NPB benchmark on Linux

NAS Parallel Benchmarks

Usually used on multi-core systems to compare performance. It comes in three implementations and runs on Linux and Android.

If there is no time to wait for the official version, another option is to download the Phoronix Test Suite (PTS) for Linux. It includes many benchmarks covering every aspect of the system, and NPB is one of them.

After downloading NPB through PTS, go to the top-level source folder; it contains instructions for running each benchmark individually or as a group.

1) For the MPI implementation, the default code compiles as-is. When running the program, specify the number of processes: mpirun -np x program-name

2) For the OpenMP implementation (motivation: UA and DC are not included in the MPI version), we need to modify the makefile to pass the desired thread number; otherwise, the default build compiles only a single-threaded version. With GCC, add the -fopenmp option: gcc -o XYZ -fopenmp XYZ.c. For DC, remember to clean up its data before the next run. The benchmarks are implemented in C and Fortran, so the OpenMP option has to be added in two places (the C and the Fortran compiler settings).
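Both launch styles can be scripted. A hedged Java sketch using ProcessBuilder: the MPI run takes its process count from mpirun -np, while a binary built with -fopenmp is given a thread count through the standard OMP_NUM_THREADS environment variable (honored by OpenMP runtimes unless the program overrides it in code). The class name and the program paths passed in are hypothetical placeholders.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class NpbLauncher {
    // MPI: mpirun -np <processes> <program>
    public static List<String> mpiCommand(int processes, String program) {
        List<String> cmd = new ArrayList<>();
        cmd.add("mpirun");
        cmd.add("-np");
        cmd.add(Integer.toString(processes));
        cmd.add(program);
        return cmd;
    }

    // OpenMP: run the binary directly, thread count taken from OMP_NUM_THREADS.
    public static ProcessBuilder ompProcess(int threads, String program) {
        ProcessBuilder pb = new ProcessBuilder(program);
        pb.environment().put("OMP_NUM_THREADS", Integer.toString(threads));
        return pb.inheritIO(); // show the benchmark's output in this console
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length < 3) {
            System.err.println("usage: NpbLauncher mpi|omp <count> <program>");
            return;
        }
        int n = Integer.parseInt(args[1]);
        ProcessBuilder pb = args[0].equals("mpi")
            ? new ProcessBuilder(mpiCommand(n, args[2])).inheritIO()
            : ompProcess(n, args[2]);
        System.exit(pb.start().waitFor()); // propagate the benchmark's exit code
    }
}
```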

Posted in Uncategorized

Android Benchmarks

1. 3DMark – The Gamer’s Benchmark:  https://play.google.com/store/apps/details?id=com.futuremark.dmandroid.application

2. Multi-threaded (not tried):
2.1 CompuBench:  http://compubench.com/result.jsp
2.2 Linpack:   https://play.google.com/store/apps/details?id=com.greenecomputing.linpack&hl=en
2.3 RGBench:  http://www.walkingrandomly.com/?p=3079

3. Benchmark suites (each includes a group of benchmarks focusing on different parts of the system)
3.1 Geekbench (cross-platform): http://www.primatelabs.com/geekbench/
3.2 AnTuTu: reports an overall score
3.3 Vellamo: CPU and browser
3.4 GFXBench: GPU-related
3.5 BBench: measures web browser performance

MobileBench: recommended by some big companies, e.g. Samsung; it lists a set of benchmarks for evaluating the performance of different devices.

Benchmark introduction: http://www.nextechage.com/how-to-test-smartphones-performance-using-benchmark-android-app/
Slides describing some of these benchmarks: http://www.slideshare.net/kstan2/benchmark-osdctw2014

Posted in Uncategorized

ARM big.LITTLE Coherency



The clusters are linked by a "Cache Coherent Interconnect", in this case the ARM CoreLink™ CCI-400 interconnect IP. The system is completed by the CoreLink GIC-400, which provides dynamically configurable interrupt distribution to all the cores.


The bus interfaces of the Cortex-A15 and Cortex-A7 processors use the AMBA® AXI Coherency Extensions (ACE) to the widely used AMBA AXI protocol. This protocol provides coherent data transfer at the bus level. The AMBA ACE protocol adds three coherency channels to the normal five channels of AMBA AXI. As an example, the lower part of the figure shows the steps in a coherent data read from the Cortex-A7 cluster to the Cortex-A15 cluster. First, the Cortex-A7 cluster issues a Coherent Read Request through the RADDR channel. The CCI-400 hands the request over to the Cortex-A15 processor’s ACADDR channel to snoop into the Cortex-A15 processor’s cache. On receiving the request from the CCI-400, the Cortex-A15 processor checks whether the data is available and reports this back through the CRRESP channel. If the requested data is in the cache, the Cortex-A15 processor places it on the CDATA channel. The CCI-400 then moves the data from the Cortex-A15 processor’s CDATA channel to the Cortex-A7 processor’s RDATA channel, resulting in a cache linefill in the Cortex-A7 processor. The CCI-400 and the ACE protocol enable full coherency between the Cortex-A15 and Cortex-A7 clusters, allowing data sharing to take place without external memory transactions.
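The snoop sequence above can be traced with a toy model. This Java sketch is a deliberate simplification, not the ACE specification: caches are plain maps with no line states or invalidation, the channel names appear only as log labels, and on a snoop miss the model simply reads memory. All class names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SnoopModel {
    // Toy cluster: its cache maps a line address to the cached data.
    static class Cluster {
        final String name;
        final Map<Long, Integer> cache = new HashMap<>();
        Cluster(String name) { this.name = name; }
    }

    // Toy stand-in for the CCI-400: routes a coherent read, snooping the other cluster first.
    static class Interconnect {
        final Map<Long, Integer> memory = new HashMap<>();
        final List<String> log = new ArrayList<>();
        int memoryReads = 0;

        int coherentRead(Cluster requester, Cluster other, long addr) {
            log.add(requester.name + " RADDR " + addr);     // coherent read request
            log.add(other.name + " ACADDR " + addr);        // snoop forwarded to the other cluster
            Integer data = other.cache.get(addr);
            log.add(other.name + " CRRESP " + (data != null ? "hit" : "miss"));
            if (data != null) {
                log.add(other.name + " CDATA " + data);     // snoop data returned
            } else {
                memoryReads++;                              // miss: fall back to external memory
                data = memory.getOrDefault(addr, 0);
            }
            log.add(requester.name + " RDATA " + data);
            requester.cache.put(addr, data);                // linefill in the requester
            return data;
        }
    }

    public static void main(String[] args) {
        Cluster a15 = new Cluster("A15"), a7 = new Cluster("A7");
        Interconnect cci = new Interconnect();
        a15.cache.put(0x40L, 42);
        int v = cci.coherentRead(a7, a15, 0x40L);
        System.out.println(v + " memoryReads=" + cci.memoryReads); // 42 memoryReads=0
        cci.log.forEach(System.out::println);
    }
}
```

The point the model makes is the one in the text: when the Cortex-A15 cache holds the line, the Cortex-A7 gets it without any external memory read.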

All interfaces support 128-bit-wide data, allowing systems to scale to data bandwidths of tens of GB/s to meet high-definition multimedia requirements and those of the latest high-performance networking interfaces.
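A rough check of the "tens of GB/s" figure: bandwidth is data width times transfer rate. Assuming, purely for illustration, one 128-bit transfer per cycle at a hypothetical 1 GHz interface clock (the text gives no clock):

```latex
B = \frac{128\ \text{bit}}{8\ \text{bit/B}} \times f
  = 16\ \text{B} \times 10^{9}\ \text{s}^{-1}
  = 16\ \text{GB/s per channel}
```

AXI has separate read and write data channels, so reads and writes together could reach roughly twice that.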

Without hardware coherency the software is responsible for cache maintenance including cleaning, flushing and invalidating caches. This takes significant processing cycles and energy as data is cleaned out from caches to external memory. The hardware coherency introduced with AMBA 4 ACE allows the different processing engines to view each other’s caches and removes or reduces the need for the cache maintenance operations. The hardware coherency ensures that any cached data in the small core can be passed seamlessly to the large core without having to access external memory.

Therefore, the Cortex-A15/Cortex-A7 system is designed to migrate a task in fewer than 20,000 cycles, i.e. 20 microseconds with processors running at 1 GHz. Fewer than 2,000 instructions are needed for save-restore, and because the two processors are architecturally identical, there is a one-to-one mapping between state registers in the inbound and outbound processors. (Source: http://www.arm.com/files/downloads/big_LITTLE_Final_Final.pdf)

Regarding the private cache warm-up penalty, prior work shows that performance often improves when the private LLCs of big and little cores are powered on together [Scheduling Heterogeneous Multi-Cores through Performance Impact Estimation (PIE)]. Thus, we ignore the warm-up penalty. Prior work also suggests that the power overhead of task migration is below 0.75% [Thread Motion: Fine-Grained Power Management for Multi-Core Systems]. Thus, we do not consider the additional energy consumption of our scheduling mechanism.

Posted in Uncategorized