【数据结构与算法】系列一 - 基本概念

为什么选择Java编程语言？

C：非面向对象，写法复杂，大量内存管理代码。

C++：写法复杂，大量内存管理代码。

Objective-C、Swift：需要Mac系统。

JavaScript、Python：依赖于脚本解析器，同一个逻辑使用不同写法会影响代码性能，影响算法性能测评。

Java：语法丰富严谨，更多的注意力可以放到业务逻辑上，建议使用至少Java8（JDK1.8）。

Windows、Mac系统，均可轻松搭建Java开发环境。学好数据结构与算法，与编程语言无关。

一、什么是算法？

算法是用于解决特定问题的一系列的执行步骤。

使用不同的算法解决同一个问题，效率可能相差非常大。比如：求第n个斐波那契数（fibonacci number）。

斐波那契数列：从第3项开始，每一项都等于前两项之和。0、1、1、2、3、5、8、13、21、34

public class complexity {
    public static void main(String[] args) {
        // 方法一
        System.out.println(fib(10)); // 输出：55
        System.out.println(fib(20)); // 输出：6765
        System.out.println(fib(40)); // 输出：102334155（等待大概4s出结果）
        System.out.println(fib(50)); // 输出：1830（等待1min以上无结果）

        // 方法二
        System.out.println(fib2(40)); // 输出：102334155
        System.out.println(fib2(50)); // 输出：-298632863
    }

    // 求第n个斐波那契数
    // 方法一：递归
    public static int fib(int n) {
        if (n <= 1) return n;
        return fib(n - 1) + fib(n - 2);
    }

    // 方法二：
    public static int fib2(int n) {
        if (n <= 1) return n;

        int first = 0;
        int second = 1;
        for (int i = 0; i < n - 1; i++) {
            int sum = first + second;
            // 上一次参与计算的second就是下一次的first
            first = second;
            // 上一次的两数之和就是下一次的second
            second = sum;
        }
        return second;
    }
}

通过上面的两个方法可以看出，不同的算法效率相差巨大。使用递归求斐波那契数会随着数值增大最终导致程序直接卡死，但使用算法2就会非常高效。

二、算法复杂度

2.1. 如何判断一个算法的好坏？

如果单从执行效率上进行评估，可能会想到这么一种方案：比较不同算法对同一组输入的执行处理时间，这种方案也叫做“事后统计法”。

上面的方案有比较明显的缺点：执行时间严重依赖硬件以及运行时各种不确定的环境因素，而且还需要编写相应的测算代码，测试数据的选择比较难保证公正性。

// 计算a+b的和
public static int plus(int a, int b) {
    return a + b;
}

// 计算1+2+3+...+n的和
public static int sum1(int n) {
    int result = 0;
    for (int i = 1; i <= n; i++) {
        result += i;
    }
    return result;
}

public static int sum2(int n) {
    return (n + 1) * n / 2;
}

上面示例代码中，如果n是100，可能sum1比较快，sum2比较慢；如果n是200，可能sum1比较慢，sum2比较快。所以我们一般从以下纬度来评估算法的优劣：

正确性、可读性、健壮性（对不合理输入的反应能力和处理能力）
时间复杂度（time complexity）：估算程序指令的执行次数（执行时间）
空间复杂度（space complexity）：估算所需占用的存储空间

2.2. 时间复杂度的估算

// O(1)
public static void test1(int n) {		
    /*
    代码片段一：
    只会执行一次输出语句，所以一共执行1次
    */
    if (n > 10) { 
        System.out.println("n > 10");
    } else if (n > 5) { // 2
        System.out.println("n > 5");
    } else {
        System.out.println("n <= 5"); 
    }
    
    /*
    代码片段二：
    int i = 0 									执行1次
    i < 4 											执行4次
    i++ 												执行4次
    System.out.println("test"); 执行4次
    所以，执行次数一共13次
    */
    for (int i = 0; i < 4; i++) {
        System.out.println("test");
    }

    // 上面的代码一共执行14次（代码片段一 + 代码片段二），即O(1)
}

// O(n)
public static void test2(int n) {
    /*
    int i = 0 									执行1次
    i < n 											执行n次
    i++ 												执行n次
    System.out.println("test"); 执行n次
    所以，执行次数一共(1 + 3n)次，即O(n)
    */
    for (int i = 0; i < n; i++) {
        System.out.println("test");
    }
}

// O(n)
public static void test3(int n) {
    /*
    外面for循环：
    int i = 0										执行1次
    i < n												执行n次
    i++													执行n次
    执行次数：(2n + 1)

    里面for循环：
    int j = 0										执行1次
    j < n												执行15次
    j++													执行15次
    System.out.println("test"); 执行15次
    执行次数：n * (1 + 45)

    所以，执行次数是外面for循环和里面for循环的总次数，一共(48n + 1)次，即O(n)
    */
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < 15; j++) {
            System.out.println("test");
        }
    }
}

// O(n)
public static void test4(int n) {
    /*
    int a = 10; 												执行1次
    int b = 20; 												执行1次
    int c = a + b;											执行1次
    int[] array = new int[n];						执行1次

    int i = 0 													执行1次
    i < array.length 										执行n次
    i++ 																执行n次
    System.out.println(array[i] + c);		执行n次

    所以，执行次数一共(5 + 3n)次，即O(n)
    */
    int a = 10;
    int b = 20;
    int c = a + b;
    int[] array = new int[n];
    for (int i = 0; i < array.length; i++) {
        System.out.println(array[i] + c);
    }
}

// O(n^2)
public static void test5(int n) {
    /*
    外面for循环：
    int i = 0										执行1次
    i < n												执行n次
    i++													执行n次
    执行次数：(2n + 1)

    里面for循环：
    int j = 0										执行1次
    j < n												执行n次
    j++													执行n次
    System.out.println("test"); 执行n次
    执行次数：n * (3n + 1)

    所以，执行次数是外面for循环和里面for循环的总次数，一共(3n^2 + 3n + 1)次，即O(n^2)
    */
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            System.out.println("test");
        }
    }
}

// O(logn)
public static void test6(int n) {
    /*
    n = n / 2  相当于每次都需要除以原来的数，然后再做判断（注意：“/”是取整的意思）。
    例如：
    n = 8 = 2^3	
    n = 16 = 2^4
    也就是说，执行次数是指数，只需要求指定值的对数即可。
    3 = log2(8)
    4 = log2(16)
    所以，执行次数：log2(n)，即O(logn)
    */
    while ((n = n / 2) > 0) {
        System.out.println("test");
    }
}

// O(logn)
public static void test7(int n) {
    /*
    根据上面的test6代码就能知道，下面代码的执行次数是：log5(n)，即O(logn)。

    注意：求对数时，除或乘以多少次，就是以什么数为底数。
    */
    while ((n = n / 5) > 0) {
        System.out.println("test");
    }
}

// O(nlogn)
public static void test8(int n) {
    /*
    外面for循环：
    int i = 1										执行1次
    i < n												执行log2(n)次
    i = i * 2										执行log2(n)次
    执行次数：(2*log2(n) + 1)

    里面for循环：
    int j = 0										执行1次
    j < n												执行n次
    j++													执行n次
    System.out.println("test"); 执行n次
    执行次数：log2(n) * (3n + 1)

    所以，执行次数是外面for循环和里面for循环的总次数，一共(1 + 3*log2(n) + 3n*log2(n))次，即O(nlogn)
    */
    for (int i = 1; i < n; i = i * 2) {
        // 1 + 3n
        for (int j = 0; j < n; j++) {
            System.out.println("test");
        }
    }
}

2.3. 大O表示法

一般用大O表示法来描述复杂度，它表示的是数据规模n对应的复杂度。

忽略常数、系数、低阶
- 9（常数） >> O(1)
- 3log2(n) >> O(logn)
- 2n + 3 >> O(n)
- 3n*log2(n) >> O(nlogn)
- n^2 + 2n + 6 >> O(n^2)
- 4n^3 + 3n^2 + 22n + 100 >> O(n^3)

注意：大O表示法仅仅是一种粗略的分析模型，是一种估算，能帮助我们短时间内了解一个算法的执行效率。

2.4. 常见的复杂度

执行次数： O(1) < O(logn) < O(n) < O(nlogn) < O(n^2 ) < O(n^3 ) < O(2^n ) < O(n!) < O(n^n)

可以借助函数生成工具对比复杂度的大小：https://zh.numberempire.com/graphingcalculator.php

2.4.1. 数据规模较小时

2.4.2. 数据规模较大时

2.5. 斐波那契函数的时间复杂度分析

// 方法一：
public static int fib(int n) {
    if (n <= 1) return n;
    return fib(n - 1) + fib(n - 2);
}

// 方法二：
public static int fib2(int n) {
    if (n <= 1) return n;

    int first = 0;
    int second = 1;
    for (int i = 0; i < n - 1; i++) {
        int sum = first + second;
        first = second;
        second = sum;
    }
    return second;
}

方法二可以直接得出复杂度是O(n)。

方法一的复杂度分析：

调用次数是1 + 2 + 4 + 8 = 2^0 + 2^1 + 2^2 + 2^3 = 2^4 − 1 = 2^(n−1) − 1 = 0.5 ∗ 2^n − 1，所以复杂度是O(2^n)。

方法一和方法二的的差别有多大呢？如果如果有一台1GHz的普通计算机，运算速度是10^9次每秒，假设n为64，O(n)大约耗时6.4 * 10^(−8) 秒，O(2^n) 大约耗时584.94年。所以有时候算法之间的差距，往往比硬件方面的差距还要大。

2.6. 算法的优化方向

用尽量少的存储空间；
用尽量少的执行步骤（执行时间）；
根据情况，可以空间换时间或时间换空间。

2.7. 多个数据规模怎么表示复杂度？

public static void test(int n, int k) {
    for (int i = 0; i < n; i++) {
        System.out.println("test");
    }

    for (int i = 0; i < k; i++) {
        System.out.println("test");
    }
}

上面示例代码的时间复杂度是O(n + k)。

三、LeetCode

LeetCode是一个用于练习算法的好网站。

官方网站：https://leetcode.com/
中文网站：https://leetcode-cn.com/

斐波那契数习题：https://leetcode-cn.com/problems/fibonacci-number/