2. HPSC MOOC¶

These projects are based on the course “High Performance Scientific Computing” by Professor Randall LeVeque.

This page contains the following codes:

Workspace 1: Testing Scripts and Dummy Code
- test1.py (python script to test for environment variables and python version)
- test2.sh (shell script to test for environment variables, gfortran and ipython version)
- test3.f90 (a dummy fortran code)
Workspace 2: Quadratic, Cubic and Polynomial Interpolation
- hw2a.py (python script for quadratic interpolation)
- hw2b.py (python script for quadratic, cubic and polynomial interpolation)
Workspace 3: Newton’s Method and Convergence Criteria
- newton.py (ipython module to compute square root of a number using Newton’s method)
- intersections.py (ipython script to plot a graph where two functions intersect - uses newton.py)
- test1.f90, newton.f90, intersections.f90, functions.f90, Makefile (fortran code for the intersection problem using max iterations for convergence)
- problem7/newton.f90, problem7/functions.f90, problem7/test_quartic.f90, problem7/Makefile, problem7/intersections.py (intersection problem using residual error for convergence)
Workspace 4: OpenMP
- quadrature.f90 (module for integrating a function)
- test1.f90 (cubic function - uses quadrature module)
- test2.f90 (sinusoidal function - uses quadrature module)
- quadrature_omp.f90, test2_omp.f90 (same as above but using OpenMP)
Workspace 5: Load Balancing
- Makefile (makefile for the 1D solution)
- functions.f90 (sinusoidal function module)
- quadrature.f90, test.f90 (trapezoid rule module + test - serial)
- quadrature2.f90, test2.f90 (simpson’s rule module + test - serial)
- quadrature3.f90, test3.f90 (trapezoid rule module + test - OpenMP)
- quad2d/functions.f90, quad2d/quadrature.f90, quad2d/test.f90, quad2d/Makefile (integrate a sinusoid with trapezoid rule in 2D with Open MP and load balancing)
Workspace 6: MPI
- Makefile (makefile for this directory)
- copyvalue.f90 (passes a value from Process 0 to Processes 1-3 using MPI)
- copyvalue2.f90 (passes a value from Process 3 to Processes 0-2 using MPI)
- part2/functions.f90, part2/quadrature.f90, part2/Makefile (the sinusoidal function, quadrature module and makefile)
- part2/test.f90 (compute the same integral over all Processes using MPI)
- part2/test2.f90 (integral using Master-Worker paradigm)
Project 1: 1D Integral and MPI
- functions.f90, quadrature.f90, test2.f90, Makefile (compute a 1D integral using MPI by dividing up intervals between an indeterminate number of processes)
Project 2: 20D Integral and the Monte Carlo Method
- functions.f90, quadrature_mc.f90, random_util.f90, test_quad_mc.f90, Makefile (compute integral using Monte Carlo method for a 20 dimensional integral - serial)
- plot_mc_quad_error.py (plot the result)
Project 3: Laplace’s Equation and the Monte Carlo Method
- laplace_mc.py (python version)
- problem_description.f90, laplace_mc.f90, mc_walk.f90, random_util.f90, Makefile (compute the solution to Laplace’s Equation using the Monte Carlo Method - serial)
- test1.f90, test2.f90 (trials I made along the way)
- plot_mc_quad_error.py (plot the result)
Project 4: Laplace’s Equation, MPI and the Monte Carlo Method
- problem_description.f90, laplace_mc.f90, mc_walk.f90, random_util.f90, Makefile (compute the solution to Laplace’s Equation using the Monte Carlo Method - using MPI)
- test1.f90, test1b.f90, test2.f90 (trials I made along the way)
- plot_mc_quad_error.py (plot the result)

2.1. Workspace 1: Testing Scripts and Dummy Code¶

test1.py (python script to test for environment variables and python version)

"""
test1.py file

Note: this is a Python script.  What you are reading now in the triple
quotes is a comment (a "docstring" since it's the first thing in this file).

This file is to test whether you have environment variables set properly and
are able to import numpy (for numerical computation), matplotlib, 
and pylab (for plotting).

Instructions:

Make sure your environment variables $UWHPSC and $MYHPSC are set properly.

Insert your name where instructed below.

Type
    $ python test1.py
at the Unix prompt ($ represents the prompt) to run the code.

If the output looks ok, change the file to set debug_mode = False
below and run it again.  This time the results will go into a file
test1output.txt.   Check that these look ok.

Commit the final version of this file along with test1output.txt to your git
repository.

"""

import os,sys

debug_mode = False

if debug_mode:
    output_file = sys.stdout  # send output to screen
else:
    # Once it's working, send output to a file rather than screen:
    output_file = open("test1output.txt","w")
    sys.stdout = output_file

print "Code run by **Andrew Roberts**"

UWHPSC = os.environ.get('UWHPSC','** not set **')
print "Environment variable UWHPSC is %s"  % UWHPSC

MYHPSC = os.environ.get('MYHPSC','** not set **')
print "Environment variable MYHPSC is %s"  % MYHPSC


# Check for various Python modules and print an error message if
# an exception occurs:

try:
    import numpy
    print "Imported numpy ok"
except:
    print "** Error importing numpy"

try:
    import matplotlib
    print "Imported matplotlib ok"
except:
    print "** Error importing matplotlib"

try:
    import pylab
    print "Imported pylab ok"
except:
    print "** Error importing pylab"

if not debug_mode:
    output_file.close()  # close the text file

test2.sh (shell script to test for environment variables, gfortran and ipython version)

# bash script to check for software 
# Insert your name where indicated below, then type
#    $ bash test2.sh
# When it looks good, redirect output to a file:
#    $ bash test2h.sh > test2output.txt

echo 
echo Code run by  **Andrew Roberts**
echo Environment variable UWHPSC is $UWHPSC
echo Environment variable MYHPSC is $MYHPSC

echo 
echo which ipython returns...
which ipython

echo 
echo which gfortran returns...
which gfortran

echo 
echo gfortran --version returns...
gfortran --version

echo
echo Compiling and running a Fortran code...
gfortran test3.f90
./a.out

test3.f90 (a dummy fortran code)

! Test fortran program.
! Insert your name where indicated below.  
! This code will be compiled and run when you execute the test2.sh script.

   program test
	   print *, "Code run by  **Andrew Roberts**"
       print *, "Successfully ran Fortran 90 program"
   end program test

2.2. Workspace 2: Quadratic, Cubic and Polynomial Interpolation¶

hw2a.py (python script for quadratic interpolation)

"""
Demonstration script for quadratic interpolation.
Update this docstring to describe your code.
Modified by: ** your name here **
"""

import numpy as np
import matplotlib.pyplot as plt
from numpy.linalg import solve

# Set up linear system to interpolate through data points:

# Data points:
xi = np.array([-1., 1., 2])
yi = np.array([0., 4., 3.])

# It would be better to define A in terms of the xi points.
# Doing this is part of the homework assignment.
A = np.array([[1., -1., 1.], [1., 1., 1.], [1., 2., 4.]])
b = yi

# Solve the system:
c = solve(A,b)

print "The polynomial coefficients are:"
print c

# Plot the resulting polynomial:
x = np.linspace(-2,3,1001)   # points to evaluate polynomial
y = c[0] + c[1]*x + c[2]*x**2

plt.figure(1)       # open plot figure window
plt.clf()           # clear figure
plt.plot(x,y,'b-')  # connect points with a blue line

# Add data points  (polynomial should go through these points!)
plt.plot(xi,yi,'ro')   # plot as red circles
plt.ylim(-2,8)         # set limits in y for plot

plt.title("Data points and interpolating polynomial")

plt.savefig('demo1plot.png')   # save figure as .png file

hw2b.py (python script for quadratic, cubic and polynomial interpolation)

"""
Demonstration module for quadratic interpolation.
Update this docstring to describe your code.
Modified by: ** Andrew Roberts **
"""


import numpy as np
import matplotlib.pyplot as plt
from numpy.linalg import solve

def quad_interp(xi,yi):
    """
    Quadratic interpolation.  Compute the coefficients of the polynomial
    interpolating the points (xi[i],yi[i]) for i = 0,1,2.
    Returns c, an array containing the coefficients of
      p(x) = c[0] + c[1]*x + c[2]*x**2.

    """

    # check inputs and print error message if not valid:

    error_message = "xi and yi should have type numpy.ndarray"
    assert (type(xi) is np.ndarray) and (type(yi) is np.ndarray), error_message

    error_message = "xi and yi should have length 3"
    assert len(xi)==3 and len(yi)==3, error_message

    # Set up linear system to interpolate through data points:

    A = np.vstack([np.ones(3), xi, xi**2]).T
    b = yi
    c = solve(A,b)

    return c

def cubic_interp(xi,yi):
    """
    Cubic interpolation.  Compute the coefficients of the polynomial
    interpolating the points (xi[i],yi[i]) for i = 0,1,2,3.
    Returns c, an array containing the coefficients of
      p(x) = c[0] + c[1]*x + c[2]*x**2 +c[3]x**3.

    """

    # check inputs and print error message if not valid:

    error_message = "xi and yi should have type numpy.ndarray"
    assert (type(xi) is np.ndarray) and (type(yi) is np.ndarray), error_message

    error_message = "xi and yi should have length 4"
    assert len(xi)==4 and len(yi)==4, error_message

    # Set up linear system to interpolate through data points:
    
    A = np.vstack([np.ones(4), xi, xi**2, xi**3]).T
    b = yi
    c = solve(A,b)

    return c

def poly_interp(xi,yi):
    
    """Polynomial interpolation.  Compute the coefficients of the polynomial
    interpolating the points (xi[i],yi[i]) for i = 0,1,2... n
    Returns c, an array containing the coefficients of
      p(x) = c[0] + c[1]*x + c[2]*x**2 + c[3]x**3 +...+ c[n]x**n.
    """

    # check inputs and print error message if not valid:

    error_message = "xi and yi should have type numpy.ndarray"
    assert (type(xi) is np.ndarray) and (type(yi) is np.ndarray), error_message

    error_message = "xi and yi should have the same length"
    assert len(xi)== len(yi), error_message

    # Set up linear system to interpolate through data points:

    #stacks them vertically and then takes transpose
    #A = np.empty([len(xi),len(xi)])
    #A[:,0] = 1.
    #for i in range(0,len(xi),1):
    #    for j in range(1,len(xi),1):
    #        A[i,j]=xi[i]**(j)

    # Uses a list comprehension:
    n = len(xi)
    A = np.vstack([xi**j for j in range(n)]).T
    b = yi
    c = solve(A,b)

    return c
    
def plot_quad(xi,yi):
    c = quad_interp(xi,yi)
    x = np.linspace(xi.min()-1, xi.max() + 1, 1000)
    y = c[0]+c[1]*x + c[2]*x**2
    plt.figure(1)
    plt.clf()
    plt.plot(x,y,'b-')
    plt.plot(xi,yi,'ro')
    plt.xlabel('x (-)')
    plt.ylabel('y (-)')
    plt.title("Data points and interpolated polynomial")
    plt.savefig('quadratic.png')

def plot_cubic(xi,yi):
    c = cubic_interp(xi,yi)
    x = np.linspace(xi.min()-1, xi.max() + 1, 1000)
    y = c[0]+c[1]*x + c[2]*x**2 + c[3]*x**3
    plt.figure(1)
    plt.clf()
    plt.plot(x,y,'b-')
    plt.plot(xi,yi,'ro')
    plt.xlabel('x (-)')
    plt.ylabel('y (-)')
    plt.title("Data points and interpolated polynomial")
    plt.savefig('cubic.png')

def plot_poly(xi,yi):
    c = poly_interp(xi,yi)
    x = np.linspace(xi.min()-1, xi.max() + 1, 1000)
    n = len(c)
    y = c[n-1]
    for j in range(n-1, 0, -1):
        y = y*x + c[j-1]

    plt.figure(1)
    plt.clf()
    plt.plot(x,y,'b-')
    plt.plot(xi,yi,'ro')
    plt.xlabel('x (-)')
    plt.ylabel('y (-)')
    plt.title("Data points and interpolated polynomial")
    plt.savefig('poly.png')

def test_quad1():
    """
    Test code, no return value or exception if test runs properly.
    """
    xi = np.array([-1.,  0.,  2.])
    yi = np.array([ 1., -1.,  7.])
    c = quad_interp(xi,yi)
    c_true = np.array([-1.,  0.,  2.])
    print "c =      ", c
    print "c_true = ", c_true
    # test that all elements have small error:
    assert np.allclose(c, c_true), \
        "Incorrect result, c = %s, Expected: c = %s" % (c,c_true)
        
def test_quad2():
    """
    Test code, no return value or exception if test runs properly.
    """
    xi = np.array([-10.,  1.,  20.])
    yi = np.array([ 15., -50.,  70.])
    c = quad_interp(xi,yi)
    c_true = np.array([-48.16586922,  -2.24162679,   0.40749601])
    print "c =      ", c
    print "c_true = ", c_true
    # test that all elements have small error:
    assert np.allclose(c, c_true), \
        "Incorrect result, c = %s, Expected: c = %s" % (c,c_true)

def test_cubic1():
    """
    Test code, no return value or exception if test runs properly.
    """
    xi = np.array([-1., 1., 2., 3.])
    yi = np.array([5., 2., 3., 5])
    c = cubic_interp(xi,yi)
    c_true = np.array([ 2.5, -1.41666667, 1., -0.08333333])
    print "c =      ", c
    print "c_true = ", c_true
    # test that all elements have small error:
    assert np.allclose(c, c_true), \
        "Incorrect result, c = %s, Expected: c = %s" % (c,c_true)

def test_poly1():
    """
    Test code, no return value or exception if test runs properly.
    """
    xi = np.array([-1., 1., 2., 3.])
    yi = np.array([5., 2., 3., 5])
    c = cubic_interp(xi,yi)
    c_true = np.array([ 2.5, -1.41666667, 1., -0.08333333])
    print "c =      ", c
    print "c_true = ", c_true
    # test that all elements have small error:
    assert np.allclose(c, c_true), \
        "Incorrect result, c = %s, Expected: c = %s" % (c,c_true)

def test_poly2():
    """
    Test code, no return value or exception if test runs properly.
    """
    xi = np.array([-1., 0., 2., 5., 8.])
    yi = np.array([ 1., -1., 7., 15., 25.])
    c = poly_interp(xi,yi)
    c_true = np.array([-1., 1.22777778, 2.51944444, -0.66111111, 0.04722222])
    print "c =      ", c
    print "c_true = ", c_true
    # test that all elements have small error:
    assert np.allclose(c, c_true), \
        "Incorrect result, c = %s, Expected: c = %s" % (c,c_true)

if __name__=="__main__":
    # "main program"
    # the code below is executed only if the module is executed at the command line,
    #    $ python demo2.py
    # or run from within Python, e.g. in IPython with
    #    In[ ]:  run demo2
    # not if the module is imported.
    print "Running test..."
    test_quad1()

2.3. Workspace 3: Newton’s Method and Convergence Criteria¶

newton.py (ipython module to compute square root of a number using Newton’s method)

"""
Module to compute the square root of a number
using Newton's method: f(x) = x**2 - n = 0

Parameters:
	max_iter :- maximum number of iterations for iterative method
	tol :- relative residual tolerance for iterative method
"""
import numpy as np
max_iter = 20
tol = 1.0e-14

def solve(x0, debug=False):
	"""
	Compute the solution to the function using the initial guess & allow debugging

	Inputs:
		fvals_sqrt :- function returning tuple containing values of f(x) and f'(x)
		x0 :- initial guess of the square root
		debug :- prints x0 outside loop and x within loop, default = False
	Outputs:
		x :- computed value of the square root
		iters :- the number of iterations taken 
	Example usage:
		import newton; solve(4., True)
	"""
	iters = 0
	x = x0	
	
	if(debug):
		print "Initial guess: x = %20.15e" %x0

	for i in range(max_iter):

		(fx,dfx) = fvals_sqrt(x)

		if(abs(fx)<tol):			
			break

		x = x - (fx/dfx)
		iters = iters + 1		
		
		if(debug):
			print "At iteration %d x = %20.15e" %(iters,x) 
	return (x, iters)

def fvals_sqrt(x):
	"""
	Return a tuple containing the value of the function & it's derivative at x
	
	Inputs:
		x :- value for which f(x) and f'(x) are to be returned 
	Outputs:
		f :- the value of the function, f(x)
		fp :- the derivative of the function, f'(x)
	Example usage:	
		(f, fp) = fvals_sqrt(10)
	"""
	
	#f = x**2 - 4.0
	#fp = 2.0*x
	f = x*np.cos(np.pi*x)-(1-(0.6*x**2))
	fp = (-x*np.pi)*np.sin(np.pi*x)+np.cos(np.pi*x)+(1.2*x)

	return (f, fp)

def test1(debug_solve=False):
	"""
	Test Newton's method using an array of initial gusses
	
	Inputs:
		None
	Outputs:
		Print x, iters, and f(x) to the screen
	Example usage:
		import newton; newton.test1()
	"""
	#for x in [1., 2., 100.]:
	for x in [-2.2, -1.6, -0.8, 1.4]:

		print ""

		(xout, iters) = solve(x, debug=debug_solve)
		print "After %d iterations, the solver returns x = %20.15e" %(iters,xout)
		
		(f, fp) = fvals_sqrt(xout)
		print "The function f(x) = %20.15e" %f

		#assert abs(xout-2.0) < tol, "***Unexpected result: x = %22.15e" %xout

intersections.py (ipython script to plot a graph where two functions intersect - uses newton.py)

"""
Script to plot g1(x) and g2(x) versus x
and to show where they intersect with black dots

Example use:
	>>> run intersections
"""

import newton
import matplotlib.pyplot as plt
import numpy as np

guesses = np.array([-2.2, -1.6, -0.8, 1.4])
xarray = np.zeros(4)
yarray = np.zeros(4)
i = 0

for xin in guesses:
	(xout, iters) = newton.solve(xin)
	xarray[i] = xout
	yarray[i] = (1-(0.6*xout**2))
	i = i+1

x = np.linspace(-5,5,1000)

g1 = x*np.cos(np.pi*x)

g2 = (1-(0.6*x**2))

plt.figure(1)
plt.clf()
plt.xlim(-5,5)
plt.xlabel('x (-)')
plt.ylabel('y (-)')
plt.title("Two functions and their intersecting points")
#p1 = plt.plot(x, g1, 'b', label=r'$y(x) = xcos(\pix)$')
p1 = plt.plot(x, g1, 'b', label=r'$y(x) = xcos(\pi x)$')
p2 = plt.plot(x, g2, 'r', label=r'$y(x) = 1-(0.6x^2)$')
p3 = plt.plot(xarray, yarray, 'ko', label=r'$intersection$')
plt.legend(loc="best")
#plt.legend([p1, p2, p3],["g1(x) = xcos(pi*x)", "g2(x) = 1-(0.6x**2)", "intersection"])
plt.show() #show the plot on the screen
plt.savefig('intersection.png')   # save figure as .png file

test1.f90, newton.f90, intersections.f90, functions.f90, Makefile (fortran code for the intersection problem using max iterations for convergence)

! $MYHPSC/homework3/test1.f90

program test1

    use newton, only: solve
    use functions, only: f_sqrt, fprime_sqrt

    implicit none
    real(kind=8) :: x, x0, fx
    real(kind=8) :: x0vals(3)
    integer :: iters, i
	logical :: debug         ! set to .true. or .false.

    print *, "Test routine for computing zero of f"
    debug = .true.

    ! values to test as x0:
    x0vals = (/1.d0, 2.d0, 100.d0 /)

    do i=1,3
        x0 = x0vals(i)
		print *, ' '  ! blank line
        call solve(f_sqrt, fprime_sqrt, x0, x, iters, debug)

        print 11, x, iters
11      format('solver returns x = ', es22.15, ' after', i3, ' iterations')

        fx = f_sqrt(x)
        print 12, fx
12      format('the value of f(x) is ', es22.15)

        if (abs(x-2.d0) > 1d-14) then
            print 13, x
13          format('*** Unexpected result: x = ', es22.15)
            endif
        enddo

end program test1

! $MYHPSC/homework3/newton.f90

module newton

    ! module parameters:
    implicit none
    integer, parameter :: maxiter = 20
    real(kind=8), parameter :: tol = 1.d-14

contains

subroutine solve(f, fp, x0, x, iters, debug)

    ! Estimate the zero of f(x) using Newton's method. 
    ! Input:
    !   f:  the function to find a root of
    !   fp: function returning the derivative f'
    !   x0: the initial guess
    !   debug: logical, prints iterations if debug=.true.
    ! Returns:
    !   the estimate x satisfying f(x)=0 (assumes Newton converged!) 
    !   the number of iterations iters
     
    implicit none
    real(kind=8), intent(in) :: x0
    real(kind=8), external :: f, fp
    logical, intent(in) :: debug
    real(kind=8), intent(out) :: x
    integer, intent(out) :: iters

    ! Declare any local variables:
    real(kind=8) :: deltax, fx, fxprime
    integer :: k


    ! initial guess
    x = x0

    if (debug) then
        print 11, x
 11     format('Initial guess: x = ', es22.15)
        endif

    ! Newton iteration to find a zero of f(x) 

    do k=1,maxiter

        ! evaluate function and its derivative:
        fx = f(x)
        fxprime = fp(x)

        if (abs(fx) < tol) then
            exit  ! jump out of do loop
            endif

        ! compute Newton increment x:
        deltax = fx/fxprime

        ! update x:
        x = x - deltax

        if (debug) then
            print 12, k,x
 12         format('After', i3, ' iterations, x = ', es22.15)
            endif

        enddo


    if (k > maxiter) then
        ! might not have converged

        fx = f(x)
        if (abs(fx) > tol) then
            print *, '*** Warning: has not yet converged'
            endif
        endif 

    ! number of iterations taken:
    iters = k-1


end subroutine solve

end module newton

! $MYHPSC/homework3/intersections.f90

program intersections

    use newton, only: solve
    use functions, only: f_function, fprime_function

    implicit none
    real(kind=8) :: x, x0, fx
    real(kind=8) :: x0vals(4)
    integer :: iters, i
	logical :: debug         ! set to .true. or .false.

    print *, "Test routine for computing zero of f"
    debug = .true.

    ! values to test as x0:
    x0vals = (/-2.2d0, -1.6d0, -0.8d0, 1.4d0 /)

    do i=1,4
        x0 = x0vals(i)
		print *, ' '  ! blank line
        call solve(f_function, fprime_function, x0, x, iters, debug)

        print 11, x, iters
11      format('solver returns x = ', es22.15, ' after', i3, ' iterations')

        fx = f_function(x)
        print 12, fx
12      format('the value of f(x) is ', es22.15)

!        if (abs(x-2.d0) > 1d-14) then
!            print 13, x
!13          format('*** Unexpected result: x = ', es22.15)
!            endif
        enddo

end program intersections

! $MYHPSC/homework3/functions.f90

module functions
	
	! parameters for module
!	implicit none
!	real(kind=8) :: pi
!	save
	

contains

real(kind=8) function f_sqrt(x)
    implicit none
    real(kind=8), intent(in) :: x

    f_sqrt = x**2 - 4.d0

end function f_sqrt


real(kind=8) function fprime_sqrt(x)
    implicit none
    real(kind=8), intent(in) :: x
    
    fprime_sqrt = 2.d0 * x

end function fprime_sqrt

real(kind=8) function f_function(x)
    implicit none
    real(kind=8), intent(in) :: x
	real(kind=8) :: pi = 3.141592653589793

    f_function = (x*cos(pi*x))-(1.0d0-(0.6d0*(x**2)))

end function f_function

	
real(kind=8) function fprime_function(x)
    implicit none
    real(kind=8), intent(in) :: x
    real(kind=8) :: pi = 3.141592653589793

    fprime_function = ((-x*pi)*sin(pi*x))+cos(pi*x)+(1.2d0*x)

end function fprime_function


end module functions

# $MYHPSC/homework3/Makefile
# Example usage:
#	$ make test1
#	$ make clean
#	$ make intersections

OBJECTS = functions.o newton.o test1.o
OBJECTS2 = functions.o newton.o intersections.o
MODULES = functions.mod newton.mod

FFLAGS = -g

.PHONY: test1 clean intersections

test1: test1.exe
	./test1.exe

intersections: intersections.exe
	./intersections.exe

test1.exe: $(MODULES) $(OBJECTS)
	gfortran $(FFLAGS) $(OBJECTS) -o test1.exe

intersections.exe: $(MODULES) $(OBJECTS2)
	gfortran $(FFLAGS) $(OBJECTS2) -o intersections.exe

%.o : %.f90
	gfortran $(FFLAGS) -c  $< 

%.mod: %.f90
	gfortran $(FFLAGS) -c $<

clean:
	rm -f *.o *.exe *.mod

problem7/newton.f90, problem7/functions.f90, problem7/test_quartic.f90, problem7/Makefile, problem7/intersections.py (intersection problem using residual error for convergence)

! $MYHPSC/homework3/problem7/newton.f90

module newton

    ! module parameters:
    implicit none
    integer, parameter :: maxiter = 40
    real(kind=8) :: tol
	save

contains

subroutine solve(f, fp, x0, x, iters, debug)

    ! Estimate the zero of f(x) using Newton's method. 
    ! Input:
    !   f:  the function to find a root of
    !   fp: function returning the derivative f'
    !   x0: the initial guess
    !   debug: logical, prints iterations if debug=.true.
    ! Returns:
    !   the estimate x satisfying f(x)=0 (assumes Newton converged!) 
    !   the number of iterations iters
     
    implicit none
    real(kind=8), intent(in) :: x0
    real(kind=8), external :: f, fp
    logical, intent(in) :: debug
    real(kind=8), intent(out) :: x
    integer, intent(out) :: iters

    ! Declare any local variables:
    real(kind=8) :: deltax, fx, fxprime
    integer :: k


    ! initial guess
    x = x0

    if (debug) then
        print 11, x
 11     format('Starting with initial guess: x = ', es22.15)
        endif

    ! Newton iteration to find a zero of f(x) 

    do k=1,maxiter

        ! evaluate function and its derivative:
        fx = f(x)
        fxprime = fp(x)

        if (abs(fx/fxprime) < tol) then
            exit  ! jump out of do loop
            endif

        ! compute Newton increment x:
        deltax = fx/fxprime

        ! update x:
        x = x - deltax

        if (debug) then
            print 12, k,x
 12         format('After', i3, ' iterations, x = ', es22.15)
            endif

        enddo


    if (k > maxiter) then
        ! might not have converged

        fx = f(x)
        if (abs(fx) > tol) then
            print *, '*** Warning: has not yet converged'
            endif
        endif 

    ! number of iterations taken:
    iters = k-1


end subroutine solve

end module newton

! $MYHPSC/homework3/problem7/functions.f90

module functions
	
	! parameters for module
	implicit none
	real(kind=8) :: alpha
	save
	

contains

real(kind=8) function f_sqrt(x)
    implicit none
    real(kind=8), intent(in) :: x

    f_sqrt = x**2 - 4.d0

end function f_sqrt


real(kind=8) function fprime_sqrt(x)
    implicit none
    real(kind=8), intent(in) :: x
    
    fprime_sqrt = 2.d0 * x

end function fprime_sqrt

real(kind=8) function f_function(x)
    implicit none
    real(kind=8), intent(in) :: x
	real(kind=8) :: pi = 3.141592653589793

    f_function = (x*cos(pi*x))-(1.0d0-(0.6d0*(x**2)))

end function f_function

	
real(kind=8) function fprime_function(x)
    implicit none
    real(kind=8), intent(in) :: x
    real(kind=8) :: pi = 3.141592653589793

    fprime_function = ((-x*pi)*sin(pi*x))+cos(pi*x)+(1.2d0*x)

end function fprime_function


real(kind=8) function f_quartic(x)
    implicit none
    real(kind=8), intent(in) :: x

    f_quartic = (x-1)**4 - alpha

end function f_quartic

	
real(kind=8) function fprime_quartic(x)
    implicit none
    real(kind=8), intent(in) :: x

    fprime_quartic = 4*(x-1)**3

end function fprime_quartic

end module functions

! $MYHPSC/homework3/problem7/test_quartic.f90

program test_quartic

    use newton, only: solve, tol
    use functions, only: f_quartic, fprime_quartic, alpha

    implicit none
    real(kind=8) :: x, x0, fx, dfx, xstar
    real(kind=8) :: tolvals(3), alphavals(3)
    integer :: iters, i, j
	logical :: debug         ! set to .true. or .false.

    ! values to test as x0:
    x0 = 4.0d0

    print 10, x0
10	format("Starting with initial guess x = ", es22.15)

    debug = .false.

	! values to use as tolerance:
	tolvals = (/1.0d-5, 1.0d-10, 1.0d-14/)

	! values to use as alpha:
	alphavals = (/1.0d-4, 1.0d-8, 1.0d-12/)

	print *, ' '  ! blank line
	print *, '   alpha        tol        iters          x              f(x)/f`(x)   x-xstar'

    do i=1,3
		do j =1,3
			tol = tolvals(j)
		    alpha = alphavals(i)
			xstar = 1+(alpha**0.25)
		    call solve(f_quartic, fprime_quartic, x0, x, iters, debug)
			fx = f_quartic(x)
			dfx = fprime_quartic(x)
		    print 11, alpha, tol, iters, x, fx/dfx, x-xstar
11 			format(2es13.3, i4, es24.15, 2es13.3)

        enddo
		print *, ' '  ! blank line
	enddo

end program test_quartic

# $MYHPSC/homework3/problem7/Makefile
# Example usage:
#	$ make clean
#	$ make test_quartic

OBJECTS = functions.o newton.o test_quartic.o
MODULES = functions.mod newton.mod

FFLAGS = -g

.PHONY: clean test_quartic

test_quartic: test_quartic.exe
	./test_quartic.exe

test_quartic.exe: $(MODULES) $(OBJECTS)
	gfortran $(FFLAGS) $(OBJECTS) -o test_quartic.exe

%.o : %.f90
	gfortran $(FFLAGS) -c  $< 

%.mod: %.f90
	gfortran $(FFLAGS) -c $<

clean:
	rm -f *.o *.exe *.mod

"""
Script to plot g1(x) and g2(x) versus x
and to show where they intersect with black dots

Example use:
	>>> run intersections
"""

import newton
import matplotlib.pyplot as plt
import numpy as np

guesses = np.array([-4,4])
xarray = np.zeros(2)
yarray = np.zeros(2)
i = 0

for xin in guesses:
	(xout, iters) = newton.solve(xin)
	xarray[i] = xout
	yarray[i] = ((xout-1.0)**4.0)-1.0e-4
	i = i+1

x = np.linspace(-5,5,1000)

g1 = ((x-1.0)**4.0)-1.0e-4
g2 = 0*x

plt.figure(1)
plt.clf()
plt.ylim(-5e-4,5e-4)
plt.xlim(0,2)
plt.xlabel('x (-)')
plt.ylabel('y (-)')
plt.title("Two functions and their intersecting points")
#p1 = plt.plot(x, g1, 'b', label=r'$y(x) = xcos(\pix)$')
p1 = plt.plot(x, g1, 'b', label=r'$y(x) = (x - 1)^4 - 1\times 10^{-4}$')
p2 = plt.plot(x, g2, 'r', label=r'$y(x) = 0$')
p3 = plt.plot(xarray, yarray, 'ko', label=r'$intersection$')
plt.legend(loc="best")
plt.show() #show the plot on the screen
plt.savefig('intersection.png')   # save figure as .png file

2.4. Workspace 4: OpenMP¶

quadrature.f90 (module for integrating a function)

module quadrature

! Performs integration using quadrature integration 
! and creates a table of the error between this and
! the known solution. 

contains

real(kind=8) function trapezoid(f,a,b,n)

	implicit none
	real(kind=8), external :: f
    real(kind=8), intent(in) :: a, b
	integer, intent(in) :: n
	real(kind=8) :: h
	integer :: i
	real(kind=8), dimension(n) :: xpoints, ypoints

	h=(b-a)/(n-1)
!	print *, " "  ! blank line
!	print 12, h

12  format("h: ", es22.14)
	xpoints=(/((i*h+a),i=0,n-1)/)
	ypoints=(/(f(xpoints(i)),i=1,n)/)
	
	trapezoid = h*sum(ypoints)-0.5*h*(ypoints(1)+ypoints(n))
!	do i=1,5	
!		print 13, xpoints(i), ypoints(i)
!13		format("x,y = " es22.14, es22.14)
!	enddo

end function trapezoid

subroutine error_table(f,a,b,nvals,int_true)
	implicit none
	real(kind=8), external :: f
	real(kind=8), intent(in) :: a, b, int_true
	integer, dimension(:), intent(in) :: nvals
	integer :: n
	real(kind=8) :: last_error, int_trap, error, ratio

	print *, "      n  trapezoid               error        ratio"
	last_error = 0.d0
	do n=1,size(nvals)
		int_trap = trapezoid(f, a, b, nvals(n))
		error = abs(int_trap - int_true)
		ratio = last_error / error
		last_error = error
		print 11, nvals(n), int_trap, error, ratio
11		format(i8, es22.14, es13.3, es13.3)
	enddo

end subroutine error_table

end module quadrature

test1.f90 (cubic function - uses quadrature module)

! $MYHPSC/homework4/test1.f90
! The purpose is to print a table for the effect of the number
! of integration points on the accuracy of the integration

! Example use: 
! $ gfortran quadrature.f90 test1.f90
! $ ./a.out

program test1

	! To provide input values to print error_table to the screen

    use quadrature, only: trapezoid, error_table

    implicit none
    real(kind=8) :: a,b,int_true
    integer :: nvals(7), i, n

    a = 0.d0
    b = 2.d0
    int_true = (b-a) + (b**4 - a**4) / 4.d0

!    print 10, int_true
! 10 format("true integral: ", es22.14)
!    print *, " "  ! blank line

!	n=5

!	print 11, trapezoid(f,a,b,n)
!11  format("calculated integral: ", es22.14)
!   values of n to test:
    do i=1,7
        nvals(i) = 5 * 2**(i-1)
    enddo

    call error_table(f, a, b, nvals, int_true)

contains

	! To return the the value of the function to integrate, f

    real(kind=8) function f(x)
        implicit none
        real(kind=8), intent(in) :: x 
        
        f = 1.d0 + x**3
    end function f

end program test1

test2.f90 (sinusoidal function - uses quadrature module)

! $MYHPSC/homework4/test2.f90
! The purpose is to print a table for the effect of the number
! of integration points on the accuracy of the integration

! Example use: 
! $ gfortran quadrature.f90 test2.f90
! $ ./a.out

program test2

	! To provide input values to print error_table to the screen

    use quadrature, only: trapezoid, error_table

    implicit none
    real(kind=8) :: a,b,int_true,k
    integer :: nvals(12), i, n

    a = 0.d0
    b = 2.d0
	k = 1000.d0
    int_true = (b-a) + (b**4 - a**4) / 4.d0 - &
			   ((cos(k*b) - cos(k*a)) / k)

!   values of n to test:
    do i=1,12
        nvals(i) = 5 * 2**(i-1)
    enddo

    call error_table(f, a, b, nvals, int_true)

contains

	! To return the the value of the function to integrate, f

    real(kind=8) function f(x)
        implicit none
        real(kind=8), intent(in) :: x 
        
        f = 1.d0 + x**3 + sin(1000*x)
    end function f

end program test2

quadrature_omp.f90, test2_omp.f90 (same as above but using OpenMP)

module quadrature_omp

! Performs integration using quadrature integration 
! and creates a table of the error between this and
! the known solution. 

contains

real(kind=8) function trapezoid(f,a,b,n)

	use omp_lib
	implicit none
	real(kind=8), external :: f
    real(kind=8), intent(in) :: a, b
	integer, intent(in) :: n
	real(kind=8) :: h
	integer :: i
	real(kind=8), dimension(n) :: xpoints, ypoints


	h=(b-a)/(n-1)
   
    !$omp parallel private(i)

	!$omp do reduction(+ : trapezoid)

	do i=1,n
		xpoints(i)=(i-1)*h+a
		ypoints(i)=f(xpoints(i))
	enddo

	trapezoid = h*sum(ypoints)-0.5*h*(ypoints(1)+ypoints(n))

	!$omp end parallel

end function trapezoid

subroutine error_table(f,a,b,nvals,int_true)
	implicit none
	real(kind=8), external :: f
	real(kind=8), intent(in) :: a, b, int_true
	integer, dimension(:), intent(in) :: nvals
	integer :: n
	real(kind=8) :: last_error, int_trap, error, ratio

	print *, "      n  trapezoid               error        ratio"
	last_error = 0.d0
	do n=1,size(nvals)
		int_trap = trapezoid(f, a, b, nvals(n))
		error = abs(int_trap - int_true)
		ratio = last_error / error
		last_error = error
		print 11, nvals(n), int_trap, error, ratio
11		format(i8, es22.14, es13.3, es13.3)
	enddo

end subroutine error_table

end module quadrature_omp

! $MYHPSC/homework4/test2_omp.f90
! The purpose is to print a table for the effect of the number
! of integration points on the accuracy of the integration

! Example use: 
! $ gfortran -fopenmp quadrature_omp.f90 test2_omp.f90
! $ ./a.out

program test2_omp

	! To provide input values to print error_table to the screen
	use omp_lib
    use quadrature_omp, only: trapezoid, error_table

    implicit none
    real(kind=8) :: a,b,int_true,k
    integer :: nvals(20), i, n, nthreads
	real(kind=8) :: t1, t2, elapsed_time
    integer(kind=8) :: tclock1, tclock2, clock_rate
    a = 0.d0
    b = 2.d0
	k = 1000.d0
    int_true = (b-a) + (b**4 - a**4) / 4.d0 - &
			   ((cos(k*b) - cos(k*a)) / k)

	! Specify number of threads to use:
    nthreads = 1       ! need this value in serial mode
	!$ nthreads = 2    
	!$ call omp_set_num_threads(nthreads)
	!$ print "('Using OpenMP with ',i3,' threads')", nthreads

	! Specify number of threads to use:
	!$ call omp_set_num_threads(2)

!   values of n to test:
    do i=1,20
        nvals(i) = 5 * 2**(i-1)
    enddo

	call system_clock(tclock1)  ! start wall timer
    call cpu_time(t1)   ! start cpu timer

    call error_table(f, a, b, nvals, int_true)

	call cpu_time(t2)   ! end cpu timer

    print 10, t2-t1
10  format("CPU time = ",es13.3, " seconds")
    
    call system_clock(tclock2, clock_rate)
    elapsed_time = float(tclock2 - tclock1) / float(clock_rate)
    print 11, elapsed_time
11  format("Elapsed time = ",es13.3, " seconds")

contains

	! To return the the value of the function to integrate, f

    real(kind=8) function f(x)
        implicit none
        real(kind=8), intent(in) :: x 
        
        f = 1.d0 + x**3 + sin(1000*x)
    end function f

end program test2_omp

2.5. Workspace 5: Load Balancing¶

Makefile (makefile for the 1D solution)

OBJECTS = functions.o quadrature.o test.o
OBJECTS2 = functions.o quadrature2.o test2.o
OBJECTS3 = functions.o quadrature3.o test3.o

FFLAGS = -fopenmp

.PHONY: test test2 test3 clean 

test: test.exe
	./test.exe

test2: test2.exe
	./test2.exe

test3: test3.exe
	./test3.exe

test.exe: $(OBJECTS)
	gfortran $(FFLAGS) $(OBJECTS) -o test.exe

test2.exe: $(OBJECTS2)
	gfortran $(FFLAGS) $(OBJECTS2) -o test2.exe

test3.exe: $(OBJECTS3)
	gfortran $(FFLAGS) $(OBJECTS3) -o test3.exe

%.o : %.f90
	gfortran $(FFLAGS) -c  $< 

clean:
	rm -f *.o *.exe *.mod

functions.f90 (sinusoial function module)

functions.f90

module functions

    use omp_lib
    implicit none
    integer :: fevals(0:7)
    real(kind=8) :: k
    save

contains

    real(kind=8) function f(x)
        implicit none
        real(kind=8), intent(in) :: x 
        integer thread_num

        ! keep track of number of function evaluations by
        ! each thread:
        thread_num = 0   ! serial mode
        !$ thread_num = omp_get_thread_num()
        fevals(thread_num) = fevals(thread_num) + 1
        
        f = 1.d0 + x**3 + sin(k*x)
        
    end function f

end module functions

quadrature.f90, test.f90 (trapezoid rule module + test - serial)

module quadrature

    use omp_lib

contains

real(kind=8) function trapezoid(f, a, b, n)

    ! Estimate the integral of f(x) from a to b using the
    ! Trapezoid Rule with n points.

    ! Input:
    !   f:  the function to integrate
    !   a:  left endpoint
    !   b:  right endpoint
    !   n:  number of points to use
    ! Returns:
    !   the estimate of the integral
     
    implicit none
    real(kind=8), intent(in) :: a,b
    real(kind=8), external :: f
    integer, intent(in) :: n

    ! Local variables:
    integer :: j
    real(kind=8) :: h, trap_sum, xj

    h = (b-a)/(n-1)
    trap_sum = 0.5d0*(f(a) + f(b))  ! endpoint contributions
    
    !$omp parallel do private(xj) reduction(+ : trap_sum) 
    do j=2,n-1
        xj = a + (j-1)*h
        trap_sum = trap_sum + f(xj)
        enddo

    trapezoid = h * trap_sum

end function trapezoid


subroutine error_table(f,a,b,nvals,int_true,method)

    ! Compute and print out a table of errors when the quadrature
    ! rule specified by the input function method is applied for
    ! each value of n in the array nvals.

    implicit none
    real(kind=8), intent(in) :: a,b, int_true
    real(kind=8), external :: f, method
    integer, dimension(:), intent(in) :: nvals

    ! Local variables:
    integer :: j, n
    real(kind=8) :: ratio, last_error, error, int_approx

    print *, "      n         approximation        error       ratio"
    last_error = 0.d0   
    do j=1,size(nvals)
        n = nvals(j)
        int_approx = method(f,a,b,n)
        error = abs(int_approx - int_true)
        ratio = last_error / error
        last_error = error  ! for next n

        print 11, n, int_approx, error, ratio
 11     format(i8, es22.14, es13.3, es13.3)
        enddo

end subroutine error_table


end module quadrature

program test

    use omp_lib

    use quadrature, only: trapezoid, error_table
    use functions, only: f, fevals, k

    implicit none
    real(kind=8) :: a,b,int_true
    integer :: nvals(12)
    integer :: i, nthreads

    real(kind=8) :: t1, t2, elapsed_time
    integer(kind=8) :: tclock1, tclock2, clock_rate

    nthreads = 1      ! for serial mode
    !$ nthreads = 4   ! for openmp
    !$ call omp_set_num_threads(nthreads)
    print 100, nthreads
100 format("Using ",i2," threads")

    fevals = 0

    k = 1.d3   ! functions module variable for function f2
    a = 0.d0
    b = 2.d0
    int_true = (b-a) + (b**4 - a**4) / 4.d0 - (1.d0/k) * (cos(k*b) - cos(k*a))

    print 10, int_true
 10 format("true integral: ", es22.14)
    print *, " "  ! blank line

    ! values of n to test:   (larger values than before)
    do i=1,12
        nvals(i) = 50 * 2**(i-1)
        enddo

    ! time the call to error_table:
    call system_clock(tclock1)  
    call cpu_time(t1)
    call error_table(f, a, b, nvals, int_true, trapezoid)
    call cpu_time(t2)   
    call system_clock(tclock2, clock_rate)

    elapsed_time = float(tclock2 - tclock1) / float(clock_rate)
    print *, " "
    print 11, elapsed_time
 11 format("Elapsed time = ",f12.8, " seconds")

    print 12, t2-t1
 12 format("CPU time = ",f12.8, " seconds")

    
    ! print the number of function evaluations by each thread:
    do i=0,nthreads-1
        print 101,  i, fevals(i)
101     format("fevals by thread ",i2,": ",i13)
        enddo

    print 102, sum(fevals)
102 format("Total number of fevals: ",i10)

end program test

quadrature2.f90, test2.f90 (simpson’s rule module + test - serial)

module quadrature2

    use omp_lib

contains

real(kind=8) function simpson(f, a, b, n)

    ! Estimate the integral of f(x) from a to b using the
    ! Simpson Rule with n points.

    ! Input:
    !   f:  the function to integrate
    !   a:  left endpoint
    !   b:  right endpoint
    !   n:  number of points to use
    ! Returns:
    !   the estimate of the integral
     
    implicit none
    real(kind=8), intent(in) :: a,b
    real(kind=8), external :: f
    integer, intent(in) :: n

    ! Local variables:
    integer :: j
    real(kind=8) :: h, simp_sum, xj, xj2

    h = (b-a)/(n-1)
    simp_sum = (1.d0/6.d0)*(f(a) + f(b))  ! endpoint contributions
    simp_sum = simp_sum + (2.d0/3.d0)*f(a+0.5d0*h) !for the half before the loop
    !$omp parallel do private(xj,xj2) reduction(+ : simp_sum) 
    do j=2,n-1
        xj = a + (j-1.d0)*h
		xj2 = a + (j-0.5d0)*h
        simp_sum = simp_sum + (1.d0/3.d0)*f(xj) + (2.d0/3.d0)*f(xj2)
        enddo

    simpson = h * simp_sum

end function simpson


subroutine error_table(f,a,b,nvals,int_true,method)

    ! Compute and print out a table of errors when the quadrature
    ! rule specified by the input function method is applied for
    ! each value of n in the array nvals.

    implicit none
    real(kind=8), intent(in) :: a,b, int_true
    real(kind=8), external :: f, method
    integer, dimension(:), intent(in) :: nvals

    ! Local variables:
    integer :: j, n
    real(kind=8) :: ratio, last_error, error, int_approx

    print *, "      n         approximation        error       ratio"
    last_error = 0.d0   
    do j=1,size(nvals)
        n = nvals(j)
        int_approx = method(f,a,b,n)
        error = abs(int_approx - int_true)
        ratio = last_error / error
        last_error = error  ! for next n

        print 11, n, int_approx, error, ratio
 11     format(i8, es22.14, es13.3, es13.3)
        enddo

end subroutine error_table


end module quadrature2

program test2

    use omp_lib

    use quadrature2, only: simpson, error_table
    use functions, only: f, fevals, k

    implicit none
    real(kind=8) :: a,b,int_true
    integer :: nvals(12)
    integer :: i, nthreads

    real(kind=8) :: t1, t2, elapsed_time
    integer(kind=8) :: tclock1, tclock2, clock_rate

    nthreads = 1      ! for serial mode
    !$ nthreads = 4   ! for openmp
    !$ call omp_set_num_threads(nthreads)
    print 100, nthreads
100 format("Using ",i2," threads")

    fevals = 0

    k = 1.d3   ! functions module variable for function f2
    a = 0.d0
    b = 2.d0
    int_true = (b-a) + (b**4 - a**4) / 4.d0 - (1.d0/k) * (cos(k*b) - cos(k*a))

    print 10, int_true
 10 format("true integral: ", es22.14)
    print *, " "  ! blank line

    ! values of n to test:   (larger values than before)
    do i=1,12
        nvals(i) = 50 * 2**(i-1)
        enddo

    ! time the call to error_table:
    call system_clock(tclock1)  
    call cpu_time(t1)
    call error_table(f, a, b, nvals, int_true, simpson)
    call cpu_time(t2)   
    call system_clock(tclock2, clock_rate)

    elapsed_time = float(tclock2 - tclock1) / float(clock_rate)
    print *, " "
    print 11, elapsed_time
 11 format("Elapsed time = ",f12.8, " seconds")

    print 12, t2-t1
 12 format("CPU time = ",f12.8, " seconds")

    
    ! print the number of function evaluations by each thread:
    do i=0,nthreads-1
        print 101,  i, fevals(i)
101     format("fevals by thread ",i2,": ",i13)
        enddo

    print 102, sum(fevals)
102 format("Total number of fevals: ",i10)

end program test2

quadrature3.f90, test3.f90 (trapezoid rule module + test - OpenMP)

module quadrature3

    use omp_lib

contains

real(kind=8) function trapezoid(f, a, b, n)

    ! Estimate the integral of f(x) from a to b using the
    ! Trapezoid Rule with n points.

    ! Input:
    !   f:  the function to integrate
    !   a:  left endpoint
    !   b:  right endpoint
    !   n:  number of points to use
    ! Returns:
    !   the estimate of the integral
     
    implicit none
    real(kind=8), intent(in) :: a,b
    real(kind=8), external :: f
    integer, intent(in) :: n

    ! Local variables:
    integer :: j
    real(kind=8) :: h, trap_sum, xj

    h = (b-a)/(n-1)
    trap_sum = 0.5d0*(f(a) + f(b))  ! endpoint contributions
    
    do j=2,n-1
        xj = a + (j-1)*h
        trap_sum = trap_sum + f(xj)
        enddo

    trapezoid = h * trap_sum

end function trapezoid


subroutine error_table(f,a,b,nvals,int_true,method)

    ! Compute and print out a table of errors when the quadrature
    ! rule specified by the input function method is applied for
    ! each value of n in the array nvals.

    implicit none
    real(kind=8), intent(in) :: a,b, int_true
    real(kind=8), external :: f, method
    integer, dimension(:), intent(in) :: nvals

    ! Local variables:
    integer :: j, n
    real(kind=8) :: ratio, last_error, error, int_approx
	integer thread_num

    print *, "      n         approximation        error       ratio     thread no."
    last_error = 0.d0   
	!$omp parallel do firstprivate(last_error) private(n, int_approx, error, ratio) & 
    !$omp schedule(dynamic)
	do j=size(nvals),1,-1
        n = nvals(j)
        int_approx = method(f,a,b,n)
        error = abs(int_approx - int_true)
        ratio = last_error / error
        last_error = error  ! for next n -- last_error is firstprivate variable - 
! each thead must start with last_error as zero - so ratio will be wrong.
	 	thread_num = 0   ! serial mode
        !$ thread_num = omp_get_thread_num()
        print 11, n, int_approx, error, ratio, thread_num
 11     format(i8, es22.14, es13.3, es13.3, i8)
        enddo

end subroutine error_table


end module quadrature3

program test3

    use omp_lib

    use quadrature3, only: trapezoid, error_table
    use functions, only: f, fevals, k

    implicit none
    real(kind=8) :: a,b,int_true
    integer :: nvals(12)
    integer :: i, nthreads

    real(kind=8) :: t1, t2, elapsed_time
    integer(kind=8) :: tclock1, tclock2, clock_rate

    nthreads = 1      ! for serial mode
    !$ nthreads = 4  ! for openmp
    !$ call omp_set_num_threads(nthreads)
    print 100, nthreads
100 format("Using ",i2," threads")

    fevals = 0

    k = 1.d3   ! functions module variable for function f2
    a = 0.d0
    b = 2.d0
    int_true = (b-a) + (b**4 - a**4) / 4.d0 - (1.d0/k) * (cos(k*b) - cos(k*a))

    print 10, int_true
 10 format("true integral: ", es22.14)
    print *, " "  ! blank line

    ! values of n to test:   (larger values than before)
    do i=1,12
        nvals(i) = 50 * 2**(i-1)
        enddo

    ! time the call to error_table:
    call system_clock(tclock1)  
    call cpu_time(t1)
    call error_table(f, a, b, nvals, int_true, trapezoid)
    call cpu_time(t2)   
    call system_clock(tclock2, clock_rate)

    elapsed_time = float(tclock2 - tclock1) / float(clock_rate)
    print *, " "
    print 11, elapsed_time
 11 format("Elapsed time = ",f12.8, " seconds")

    print 12, t2-t1
 12 format("CPU time = ",f12.8, " seconds")

    
    ! print the number of function evaluations by each thread:
    do i=0,nthreads-1
        print 101,  i, fevals(i)
101     format("fevals by thread ",i2,": ",i13)
        enddo

    print 102, sum(fevals)
102 format("Total number of fevals: ",i10)

end program test3

quad2d/functions.f90, quad2d/quadrature.f90, quad2d/test.f90, quad2d/Makefile (integrate a sinusoid with trapezoid rule in 2D with Open MP and load balancing)

module functions

    use omp_lib
    implicit none
    integer :: fevals(0:7), gevals(0:7)
    real(kind=8) :: k
    save

contains

    real(kind=8) function f(g, x, c, d)
        implicit none
		real(kind=8), intent(in) :: c,d
        real(kind=8), intent(in) :: x
		real(kind=8), external :: g

        integer thread_num, n, j
		real(kind=8) :: h, trap_sum, y

        ! keep track of number of function evaluations by 
        ! each thread: 
        thread_num = 0   ! serial mode
        !$ thread_num = omp_get_thread_num()
        fevals(thread_num) = fevals(thread_num) + 1
        n = 1000
		h = (d-c)/(n-1)
    	trap_sum = 0.5d0*(g(x,c) + g(x,d))  ! endpoint contributions
    
!    	!$omp parallel do private(y) reduction(+ : trap_sum) 
    	do j=2,n-1
       		y = c + (j-1)*h
        	trap_sum = trap_sum + g(x,y)
        	enddo

    	f = h * trap_sum

    end function f

	real(kind=8) function g(x,y)
		implicit none
		real(kind=8), intent(in) :: x,y
		integer thread_num_g

		thread_num_g = 0

		!$ thread_num_g = omp_get_thread_num()
		gevals(thread_num_g) = gevals(thread_num_g) + 1

		g = sin(x+y)

	end function g

end module functions

module quadrature

    use omp_lib

contains

real(kind=8) function trapezoid(f, g, a, b, c, d, n)

    ! Estimate the integral of f(x) from a to b using the
    ! Trapezoid Rule with n points.

    ! Input:
    !   f:  the function to integrate
    !   a:  left endpoint x
    !   b:  right endpoint x
    !   c:  left endpoint y
    !   d:  right endpoint y
    !   n:  number of points to use
    ! Returns:
    !   the estimate of the integral
     
    implicit none
    real(kind=8), intent(in) :: a,b,c,d
    real(kind=8), external :: f,g

    integer, intent(in) :: n

    ! Local variables:
    integer :: j
    real(kind=8) :: h, trap_sum, xj

    h = (b-a)/(n-1)
    trap_sum = 0.5d0*(f(g, a, c, d) + f(g, b, c, d))  ! endpoint contributions
    
    !$omp parallel do private(xj) reduction(+ : trap_sum) 
    do j=2,n-1
        xj = a + (j-1)*h
        trap_sum = trap_sum + f(g, xj, c, d)
        enddo

    trapezoid = h * trap_sum

end function trapezoid


subroutine error_table(f,g,a,b,c,d,nvals,int_true,method)

    ! Compute and print out a table of errors when the quadrature
    ! rule specified by the input function method is applied for
    ! each value of n in the array nvals.

    implicit none
    real(kind=8), intent(in) :: a,b,c,d, int_true
    real(kind=8), external :: f, g, method
    integer, dimension(:), intent(in) :: nvals

    ! Local variables:
    integer :: j, n
    real(kind=8) :: ratio, last_error, error, int_approx

    print *, "      n         approximation        error       ratio"
    last_error = 0.d0   
    do j=1,size(nvals)
        n = nvals(j)
        int_approx = method(f,g,a,b,c,d,n)
        error = abs(int_approx - int_true)
        ratio = last_error / error
        last_error = error  ! for next n

        print 11, n, int_approx, error, ratio
 11     format(i8, es22.14, es13.3, es13.3)
        enddo

end subroutine error_table


end module quadrature

program test

    use omp_lib

    use quadrature, only: trapezoid, error_table
    use functions, only: f, g, fevals, gevals, k

    implicit none
    real(kind=8) :: a,b,c,d,int_true
    integer :: nvals(10)
    integer :: i, nthreads

    real(kind=8) :: t1, t2, elapsed_time
    integer(kind=8) :: tclock1, tclock2, clock_rate

    nthreads = 1      ! for serial mode
    !$ nthreads = 4   ! for openmp
    !$ call omp_set_num_threads(nthreads)
    print 100, nthreads
100 format("Using ",i2," threads")

    fevals = 0

    k = 1.d3   ! functions module variable for function f2
    a = 0.d0
    b = 2.d0
	c = 1.d0
	d = 4.d0
!    int_true = (-sin(b+c)+sin(b+d))-(-sin(a+c)+sin(a+d))
	int_true = (-sin(b+d)+sin(b+c))-(-sin(a+d)+sin(a+c))
    print 10, int_true
 10 format("true integral: ", es22.14)
    print *, " "  ! blank line

    ! values of n to test:   (larger values than before)
    do i=1,10
        nvals(i) = 5 * 2**(i-1)
        enddo

    ! time the call to error_table:
    call system_clock(tclock1)  
    call cpu_time(t1)
    call error_table(f, g, a, b, c, d, nvals, int_true, trapezoid)
    call cpu_time(t2)   
    call system_clock(tclock2, clock_rate)

    elapsed_time = float(tclock2 - tclock1) / float(clock_rate)
    print *, " "
    print 11, elapsed_time
 11 format("Elapsed time = ",f12.8, " seconds")

    print 12, t2-t1
 12 format("CPU time = ",f12.8, " seconds")

    
    ! print the number of function evaluations by each thread:
    do i=0,nthreads-1
        print 101,  i, fevals(i)
101     format("fevals by thread ",i2,": ",i13)
        enddo

    print 102, sum(fevals)
102 format("Total number of fevals: ",i10)

    ! print the number of function evaluations by each thread:
    do i=0,nthreads-1
        print 103,  i, gevals(i)
103     format("gevals by thread ",i2,": ",i13)
        enddo

    print 104, sum(gevals)
104 format("Total number of gevals: ",i10)



end program test

2.6. Workspace 6: MPI¶

Makefile (makefile for this directory)

# $MYHPSC/codes/mpi/Makefile

# Usage:  to try out test1.f90, for example:
#     make test FNAME=test1
# or
#     make test FNAME=test1 NUM_PROCS=8

NUM_PROCS ?= 4   # default if not specified on command line or env variable

FC = mpif90
FFLAGS = 
LFLAGS = 

.PHONY: clean, test

FNAME ?= test

%.o : %.f90
	$(FC) $(FFLAGS) -c $<

$(FNAME).exe: $(FNAME).f90
	$(FC) $(FFLAGS) $(LFLAGS) $(FNAME).f90 -o $(FNAME).exe

test: $(FNAME).exe
	mpiexec -n $(NUM_PROCS) ./$(FNAME).exe

clean:
	rm -f *.o *.exe

copyvalue.f90 (passes a value from Process 0 to Processes 1-3 using MPI)

! $UWHPSC/codes/mpi/copyvalue.f90
!
! Set value in Process 0 and copy this through a chain of processes
! and finally print result from Process numprocs-1.
!

program copyvalue

    use mpi

    implicit none

    integer :: i, proc_num, num_procs,ierr
    integer, dimension(MPI_STATUS_SIZE) :: status

    call MPI_INIT(ierr)
    call MPI_COMM_SIZE(MPI_COMM_WORLD, num_procs, ierr)
    call MPI_COMM_RANK(MPI_COMM_WORLD, proc_num, ierr)

    if (num_procs==1) then
        print *, "Only one process, cannot do anything"
        call MPI_FINALIZE(ierr)
        stop
        endif


    if (proc_num==0) then
        i = 55
        print '("Process ",i3," setting      i = ",i3)', proc_num, i

        call MPI_SEND(i, 1, MPI_INTEGER, 1, 21, &
                      MPI_COMM_WORLD, ierr)

      else if (proc_num < num_procs - 1) then

        call MPI_RECV(i, 1, MPI_INTEGER, proc_num-1, 21, &
                      MPI_COMM_WORLD, status, ierr)

        print '("Process ",i3," receives     i = ",i3)', proc_num, i
        print '("Process ",i3," sends        i = ",i3)', proc_num, i

        call MPI_SEND(i, 1, MPI_INTEGER, proc_num+1, 21, &
                      MPI_COMM_WORLD, ierr)


      else if (proc_num == num_procs - 1) then

        call MPI_RECV(i, 1, MPI_INTEGER, proc_num-1, 21, &
                      MPI_COMM_WORLD, status, ierr)

        print '("Process ",i3," ends up with i = ",i3)', proc_num, i
      endif

    call MPI_FINALIZE(ierr)

end program copyvalue

copyvalue2.f90 (passes a value from Process 3 to Processes 0-2 using MPI)

! $UWHPSC/codes/mpi/copyvalue2.f90
!
! Set value in Process numprocs-1 and copy this through a chain of processes
! and finally print result from Process 0.
!

program copyvalue2

    use mpi

    implicit none

    integer :: i, proc_num, num_procs,ierr
    integer, dimension(MPI_STATUS_SIZE) :: status

    call MPI_INIT(ierr)
    call MPI_COMM_SIZE(MPI_COMM_WORLD, num_procs, ierr)
    call MPI_COMM_RANK(MPI_COMM_WORLD, proc_num, ierr)

    if (num_procs .EQ. 1) then
        print *, "Only one process, cannot do anything"
        call MPI_FINALIZE(ierr)
        stop
        endif


    if (proc_num .EQ. num_procs-1) then
        i = 55
        print '("Process ",i3," setting      i = ",i3)', proc_num, i

        call MPI_SEND(i, 1, MPI_INTEGER, num_procs-2, 21, &
                      MPI_COMM_WORLD, ierr)

      else if (proc_num .GT. 0) then

        call MPI_RECV(i, 1, MPI_INTEGER, proc_num+1, 21, &
                      MPI_COMM_WORLD, status, ierr)

        print '("Process ",i3," receives     i = ",i3)', proc_num, i
        print '("Process ",i3," sends        i = ",i3)', proc_num, i-1

        call MPI_SEND(i-1, 1, MPI_INTEGER, proc_num-1, 21, &
                      MPI_COMM_WORLD, ierr)


      else if (proc_num .EQ. 0) then

        call MPI_RECV(i, 1, MPI_INTEGER, 1, 21, &
                      MPI_COMM_WORLD, status, ierr)

        print '("Process ",i3," ends up with i = ",i3)', proc_num, i
      endif

    call MPI_FINALIZE(ierr)

end program copyvalue2

part2/functions.f90, part2/quadrature.f90, part2/Makefile (the sinusoidal function, quadrature module and makefile)

module functions

    implicit none
    integer :: fevals_proc
    real(kind=8) :: k
    save

contains

    real(kind=8) function f(x)
        implicit none
        real(kind=8), intent(in) :: x 
        integer proc_num, ierr

        ! keep track of number of function evaluations by
        ! each process:
        fevals_proc = fevals_proc + 1
        
        f = 1.d0 + x**3 + sin(k*x)
        
    end function f

end module functions

module quadrature

contains

real(kind=8) function trapezoid(f, a, b, n)

    ! Estimate the integral of f(x) from a to b using the
    ! Trapezoid Rule with n points.

    ! Input:
    !   f:  the function to integrate
    !   a:  left endpoint
    !   b:  right endpoint
    !   n:  number of points to use
    ! Returns:
    !   the estimate of the integral
     
    implicit none
    real(kind=8), intent(in) :: a,b
    real(kind=8), external :: f
    integer, intent(in) :: n

    ! Local variables:
    integer :: j
    real(kind=8) :: h, trap_sum, xj

    h = (b-a)/(n-1)
    trap_sum = 0.5d0*(f(a) + f(b))  ! endpoint contributions
    
    do j=2,n-1
        xj = a + (j-1)*h
        trap_sum = trap_sum + f(xj)
        enddo

    trapezoid = h * trap_sum

end function trapezoid

end module quadrature

OBJECTS = functions.o quadrature.o test.o
OBJECTS2 = functions.o quadrature.o test2.o
FFLAGS = 
NUM_PROCS ?= 4   # default if not specified on command line or env variable

.PHONY: test clean test2

test: test.exe
	mpiexec -n $(NUM_PROCS) test.exe

test2: test2.exe
	mpiexec -n $(NUM_PROCS) test2.exe

test.exe: $(OBJECTS)
	mpif90 $(FFLAGS) $(OBJECTS) -o test.exe

test2.exe: $(OBJECTS2)
	mpif90 $(FFLAGS) $(OBJECTS2) -o test2.exe

%.o : %.f90
	mpif90 $(FFLAGS) -c  $< 

clean:
	rm -f *.o *.exe *.mod

part2/test.f90 (compute the same integral over all Processes using MPI)

program test

    use mpi

    use quadrature, only: trapezoid
    use functions, only: f, fevals_proc, k

    implicit none
    real(kind=8) :: a,b,int_true, int_approx

    integer :: proc_num, num_procs, ierr, n, fevals_total
    integer, dimension(MPI_STATUS_SIZE) :: status

    call MPI_INIT(ierr)
    call MPI_COMM_SIZE(MPI_COMM_WORLD, num_procs, ierr)
    call MPI_COMM_RANK(MPI_COMM_WORLD, proc_num, ierr)

    ! All processes set these values so we don't have to broadcast:
    k = 1.d3   ! functions module variable 
    a = 0.d0
    b = 2.d0
    int_true = (b-a) + (b**4 - a**4) / 4.d0 - (1.d0/k) * (cos(k*b) - cos(k*a))
    n = 1000

    ! Each process keeps track of number of fevals:
    fevals_proc = 0

    if (proc_num==0) then
        print '("Using ",i3," processes")', num_procs
        print '("true integral: ", es22.14)', int_true
        print *, " "  ! blank line
        endif

    call MPI_BARRIER(MPI_COMM_WORLD,ierr) ! wait for process 0 to print

    ! Note: In this version all processes call trap and repeat the
    !       same work, so each should get the same answer.  
    int_approx = trapezoid(f,a,b,n)
    print '("Process ",i3," with n = ",i8," computes int_approx = ",es22.14)', &
            proc_num,n, int_approx

    call MPI_BARRIER(MPI_COMM_WORLD,ierr) ! wait for all process to print

    ! print the number of function evaluations by each thread:
    print '("fevals by Process ",i2,": ",i13)',  proc_num, fevals_proc

    call MPI_BARRIER(MPI_COMM_WORLD,ierr) ! wait for all process to print
	
	call MPI_REDUCE(fevals_proc, fevals_total, 1, &
			MPI_DOUBLE_PRECISION, MPI_SUM, 0, &
		    MPI_COMM_WORLD, ierr)

    if (proc_num==0) then
        ! This is wrong -- part of homework is to fix this:
		
    !    fevals_total = 0   !! need to fix
        print '("Total number of fevals:  ",i10)', fevals_total
        endif

    call MPI_FINALIZE(ierr)

end program test

part2/test2.f90 (integral using Master-Worker paradigm)Makefile for this directory

program test2

    use mpi

    use quadrature, only: trapezoid
    use functions, only: f, fevals_proc, k

    implicit none
    real(kind=8) :: a, b, int_true, int_approx, dx_sub, int_sub

    integer :: proc_num, num_procs, ierr, n, fevals_total, n_sub, j, jj
    integer, dimension(MPI_STATUS_SIZE) :: status
	real(kind=8), allocatable, dimension(:,:) :: ab_sub
	real(kind=8), allocatable, dimension(:) :: sub, int_array
 
    call MPI_INIT(ierr)
    call MPI_COMM_SIZE(MPI_COMM_WORLD, num_procs, ierr)
    call MPI_COMM_RANK(MPI_COMM_WORLD, proc_num, ierr)

    ! All processes set these values so we don't have to broadcast:
    k = 1.d3   ! functions module variable 
    a = 0.d0
    b = 2.d0
    int_true = (b-a) + (b**4 - a**4) / 4.d0 - (1.d0/k) * (cos(k*b) - cos(k*a))
    n = 1000
	
	n_sub = num_procs-1 ! Set the number of Worker Processes

    ! Each process keeps track of number of fevals:
    fevals_proc = 0

    if (proc_num .GT. 0) then
        allocate(sub(2))   ! to hold a column vector sent from master
    endif 

    if (proc_num .EQ. 0) then
        print '("Using ",i3," processes")', num_procs
        print '("true integral: ", es22.14)', int_true
        print *, " "  ! blank line

		allocate(ab_sub(2,n_sub))    ! to hold a-b values for Workers
		allocate(int_array(n_sub))	 ! to hold integral from Workers

		ab_sub = 1.d0

		! Process 0 (Master) sends other processes (Workers) left and right edges
		! of the integral      		

		dx_sub = (b-a) / n_sub

    	do j=1,n_sub
      		ab_sub(1,j) = a + (j-1)*dx_sub
      		ab_sub(2,j) = a + j*dx_sub
      		call MPI_SEND(ab_sub(1,j), 2, MPI_DOUBLE_PRECISION, j, j, &
                    MPI_COMM_WORLD, ierr)
      	enddo

   		! Master recieves part of the integral from the Workers and stores it in
		! an array from any process

		do j=1,n_sub
			call MPI_RECV(int_sub, 1, MPI_DOUBLE_PRECISION, &
                    MPI_ANY_SOURCE, MPI_ANY_TAG, &
                    MPI_COMM_WORLD, status, ierr)
			jj = status(MPI_TAG)
			int_array(jj) = int_sub       	
		enddo

		int_approx = sum(abs(int_array))

    endif

	! Process 0 (Master) sends other processes (Workers) left and right edges
	! of the integral 

	if(proc_num .GT. 0) then

		call MPI_RECV(sub, 2, MPI_DOUBLE_PRECISION, &
		              0, MPI_ANY_TAG, &
		              MPI_COMM_WORLD, status, ierr)

		int_sub = trapezoid(f,sub(1),sub(2),n)

		j = status(MPI_TAG)   ! this is the row number
		                      ! (should agree with proc_num)

	 	call MPI_SEND(int_sub, 1, MPI_DOUBLE_PRECISION, &
	 	               0, j, MPI_COMM_WORLD, ierr)

		! print the number of function evaluations by each thread:
		print '("fevals by Process ",i2,": ",i13)', proc_num, fevals_proc
	
	endif

	! Master Process prints out the result:
	call MPI_BARRIER(MPI_COMM_WORLD,ierr) ! wait for all processes to print

    if (proc_num .EQ. 0) then
   		print '("Trapezoid approximation with ",i8," total points: ",es22.14)',&
        n_sub*n, int_approx
    endif

    call MPI_FINALIZE(ierr)

end program test2

2.7. Project 1: 1D Integral and MPI¶

functions.f90, quadrature.f90, test2.f90, Makefile (compute a 1D integral using MPI by dividing up intervals between an indeterminate number of processes)

OBJECTS = functions.o quadrature.o test.o
OBJECTS2 = functions.o quadrature.o test2.o
OBJECTS3 = functions.o quadrature.o test3.o
FFLAGS = 
NUM_PROCS ?= 4   # default if not specified on command line or env variable

.PHONY: test clean test2 test3

test: test.exe
	mpiexec -n $(NUM_PROCS) test.exe

test2: test2.exe
	mpiexec -n $(NUM_PROCS) test2.exe

test3: test3.exe
	mpiexec -n $(NUM_PROCS) test3.exe

test.exe: $(OBJECTS)
	mpif90 $(FFLAGS) $(OBJECTS) -o test.exe

test2.exe: $(OBJECTS2)
	mpif90 $(FFLAGS) $(OBJECTS2) -o test2.exe

test3.exe: $(OBJECTS3)
	mpif90 $(FFLAGS) $(OBJECTS3) -o test3.exe

%.o : %.f90
	mpif90 $(FFLAGS) -c  $< 

clean:
	rm -f *.o *.exe *.mod

module functions

    implicit none
    integer :: fevals_proc
    real(kind=8) :: k
    save

contains

    real(kind=8) function f(x)
        implicit none
        real(kind=8), intent(in) :: x 
        integer proc_num, ierr

        ! keep track of number of function evaluations by
        ! each process:
        fevals_proc = fevals_proc + 1
        
        f = 1.d0 + x**3 + sin(k*x)
        
    end function f

end module functions

module quadrature

contains

real(kind=8) function trapezoid(f, a, b, n)

    ! Estimate the integral of f(x) from a to b using the
    ! Trapezoid Rule with n points.

    ! Input:
    !   f:  the function to integrate
    !   a:  left endpoint
    !   b:  right endpoint
    !   n:  number of points to use
    ! Returns:
    !   the estimate of the integral
     
    implicit none
    real(kind=8), intent(in) :: a,b
    real(kind=8), external :: f
    integer, intent(in) :: n

    ! Local variables:
    integer :: j
    real(kind=8) :: h, trap_sum, xj

    h = (b-a)/(n-1)
    trap_sum = 0.5d0*(f(a) + f(b))  ! endpoint contributions
    
    do j=2,n-1
        xj = a + (j-1)*h
        trap_sum = trap_sum + f(xj)
        enddo

    trapezoid = h * trap_sum

end function trapezoid

end module quadrature

program test2

    use mpi

    use quadrature, only: trapezoid
    use functions, only: f, fevals_proc, k

    implicit none
    real(kind=8) :: a, b, int_true, int_approx, dx_sub, int_sub

    integer :: proc_num, num_procs, ierr, n, fevals_total, n_sub, j, jj
    integer, dimension(MPI_STATUS_SIZE) :: status
	real(kind=8), allocatable, dimension(:,:) :: ab_sub
	real(kind=8), allocatable, dimension(:) :: sub, int_array
 
    call MPI_INIT(ierr)
    call MPI_COMM_SIZE(MPI_COMM_WORLD, num_procs, ierr)
    call MPI_COMM_RANK(MPI_COMM_WORLD, proc_num, ierr)

    ! All processes set these values so we don't have to broadcast:
    k = 1.d3   ! functions module variable 
    a = 0.d0
    b = 2.d0
    int_true = (b-a) + (b**4 - a**4) / 4.d0 - (1.d0/k) * (cos(k*b) - cos(k*a))
    n = 1000
	
	n_sub = num_procs-1 ! Set the number of Worker Processes

    ! Each process keeps track of number of fevals:
    fevals_proc = 0

    if (proc_num .GT. 0) then
        allocate(sub(2))   ! to hold a column vector sent from master
    endif 

    if (proc_num .EQ. 0) then
        print '("Using ",i3," processes")', num_procs
        print '("true integral: ", es22.14)', int_true
        print *, " "  ! blank line

		allocate(ab_sub(2,n_sub))    ! to hold a-b values for Workers
		allocate(int_array(n_sub))	 ! to hold integral from Workers

		ab_sub = 1.d0

		! Process 0 (Master) sends other processes (Workers) left and right edges
		! of the integral      		

		dx_sub = (b-a) / n_sub

    	do j=1,n_sub
      		ab_sub(1,j) = a + (j-1)*dx_sub
      		ab_sub(2,j) = a + j*dx_sub
      		call MPI_SEND(ab_sub(1,j), 2, MPI_DOUBLE_PRECISION, j, j, &
                    MPI_COMM_WORLD, ierr)
      	enddo

   		! Master recieves part of the integral from the Workers and stores it in
		! an array from any process

		do j=1,n_sub
			call MPI_RECV(int_sub, 1, MPI_DOUBLE_PRECISION, &
                    MPI_ANY_SOURCE, MPI_ANY_TAG, &
                    MPI_COMM_WORLD, status, ierr)
			jj = status(MPI_TAG)
			int_array(jj) = int_sub       	
		enddo

		int_approx = sum(abs(int_array))

    endif

	! Process 0 (Master) sends other processes (Workers) left and right edges
	! of the integral 

	if(proc_num .GT. 0) then

		call MPI_RECV(sub, 2, MPI_DOUBLE_PRECISION, &
		              0, MPI_ANY_TAG, &
		              MPI_COMM_WORLD, status, ierr)

		int_sub = trapezoid(f,sub(1),sub(2),n)

		j = status(MPI_TAG)   ! this is the row number
		                      ! (should agree with proc_num)

	 	call MPI_SEND(int_sub, 1, MPI_DOUBLE_PRECISION, &
	 	               0, j, MPI_COMM_WORLD, ierr)

		! print the number of function evaluations by each thread:
		print '("fevals by Process ",i2,": ",i13)', proc_num, fevals_proc
	
	endif

	! Master Process prints out the result:
	call MPI_BARRIER(MPI_COMM_WORLD,ierr) ! wait for all processes to print

    if (proc_num .EQ. 0) then
   		print '("Trapezoid approximation with ",i8," total points: ",es22.14)',&
        n_sub*n, int_approx
    endif

    call MPI_FINALIZE(ierr)

end program test2

2.8. Project 2: 20D Integral and the Monte Carlo Method¶

functions.f90, quadrature_mc.f90, random_util.f90, test_quad_mc.f90, Makefile (compute integral using Monte Carlo method for a 20 dimensional integral - serial)
plot_mc_quad_error.py (plot the result)

OBJECTS = random_util.o functions.o quadrature_mc.o test_quad_mc.o
FFLAGS = 

.PHONY: test plot clean clobber

test: test.exe
	./test.exe

test.exe: $(OBJECTS)
	gfortran $(FFLAGS) $(OBJECTS) -o test.exe

%.o : %.f90
	gfortran $(FFLAGS) -c  $< 

mc_quad_error.txt: test.exe
	./test.exe

plot: mc_quad_error.txt
	python plot_mc_quad_error.py

clean:
	rm -f *.o *.exe *.mod

clobber: clean
	rm -f mc_quad_error.txt mc_quad_error.png

! ---
! This module returns the value of a function:
! sum(x**2) over all dimensions
! it also sets the number of evaulations of this function
! ---
module functions
! ---
! gevals is a module variable
! Example use: use functions, only gevals
! ---
    implicit none
    integer :: gevals
    save

contains
! ---
! Multiple dimension function
! Example use: f = g(xin,ndim)
! Inputs: x (vector of length ndim), ndim (number of dimensions)
! Output: g (the value of the function over all dimensions)	
! ---		
    function g(x,ndim)
	! ---
	! Declare parameters and vectors
	! ---
		implicit none
		integer, intent(in) :: ndim
		real(kind=8), intent(in) :: x(ndim)
		real(kind=8) :: g
		integer :: i
	! ---
	! We need to initialise the number of evaluations to zero
	! ---
		g = 0.d0
	! ---
	! Evaluate function in 20 dimensions
	! ---
		do i=1,ndim
		    g = g + x(i)**2
		enddo
	! ---
	! Update the number of function evaluations
	! ---
		gevals = gevals + 1

	end function g

end module functions

! ---
! This module performs Monte Carlo Integration by:
! 1) Generating random input over a number of points
! 2) Evaluating the function at these points for 20 dimensions
! 3) Summing the function and dividing by the number of points for the average
! 4) Multiplying the average by the base for the integral
! ---
module quadrature_mc
! ---
! Uses random seed and functions modules
! ---
	use random_util, only: init_random_seed
	use functions, only: g

contains
! ---
! Monte Carlo Integration function
! Example use: int_mc = quad_mc(g,a,b,ndim,npoints)
! Inputs: g (from functions module), a and b (vectors of length ndim)
!		  ndim (number of dimensions) npoints (number of points)
! Output: quad_mc (the value of the integral)	
! ---		
	function quad_mc(g, a, b, ndim, npoints)
	! ---
	! Declare parameters and arrays.
	! ---
		implicit none
		real(kind=8), external :: g
		real(kind=8), dimension(ndim), intent(in) :: a, b
		integer, intent(in) :: ndim, npoints
		integer :: seed1, i, j
		real(kind=8), dimension(:), allocatable :: f, xin, x
		real(kind=8) :: quad_mc 
	! ---
	! Number of points & dimensions is unknown, so must allocate:
	! ---
		allocate(x(npoints)) ! random number array 0 to 1  
		allocate(xin(ndim))  ! random number array a+(b-a)x
		allocate(f(npoints)) ! result of sum(x**2) over all dimensions
							 ! for n points
	! ---
	! Generate random number array x once per call of quad_mc
	! Length of the array is the number of random points 
	! ---
		seed1 = 12345   ! seed1 = 12345 for repeatable seed
						! seed1 = 0 for random seed
 		call init_random_seed(seed1)
		call random_number(x)
	! ---
	! Generate random input based on a and b and 
	! compute sum(x**2) for all dimensions
	! ---
		do i=1,npoints
			do j=1,ndim
				xin(j)=a(j)+(b(j)-a(j))*x(i)
			enddo
			f(i) = g(xin,ndim)
		enddo
	! ---
	! Return the value of the Monte Carlo Integral by multiplying the
	! average value by the base
	! ---
		quad_mc = product(b-a)*(sum(f)/npoints)

	end function quad_mc

end module quadrature_mc

! ---
! This module returns seed for the random_number function
! ---
module random_util

contains

	subroutine init_random_seed(seed1)
	! ---
	! Declare the parameters
	! ---
		integer, intent(inout) :: seed1
		integer :: clock 
		integer, dimension(:), allocatable :: seed
	! ---
	! Determine how many numbers needed to seed and allocate
	! ---
		call random_seed(size = nseed)  
		allocate(seed(nseed))
	! ---  
	! If seed1 = 0 then set seed1 randomly using the system clock.
	! This will give different sequences of random numbers
	! from different runs, so results are not reproducible.
	! ---
		if (seed1 == 0) then
		    call system_clock(count = clock)
		    seed1 = clock
		endif
	! ---
	! If seed1 > 0 then results will be reproducible since calling this
	! twice with the same seed will initialize the random
	! number generator so the same sequence is generated.
	! Once seed1 is set, set the other elements of the seed array by adding
	! multiples of 37 as suggested in the documentation. 
	! --- 
		do i=1,nseed
		    seed(i) = seed1 + 37*(i-1)  
		enddo
	! ---
	! Seed the generator
	! ---
		call random_seed(put = seed)
	! ---
	! Deallocate the seed
	! ---
		deallocate(seed)

	end subroutine init_random_seed

end module random_util

! ---
! This program computes the Monte Carlo Integral by doubling the number
! of points every time
! ---
program test_quad_mc
! ---
! Uses functions, quadrature_mc and random_util modules
! ---
	use functions, only: g, gevals
    use quadrature_mc, only: quad_mc
    use random_util, only: init_random_seed
! ---
! Declare the parameters
! ---
    implicit none
    integer, parameter :: ndim = 20
    real(kind=8), dimension(ndim) :: a, b
    real(kind=8) :: volume, int_true, int_mc, int_mc_total, error
    integer :: i, seed1, n_total, npoints
! ---
! Need to initialize number of g evaulations to zero, because it's a module variable
! ---
    gevals = 0
! ---
! Open a file to write to
! ---
    open(unit=25, file='mc_quad_error.txt', status='unknown')
! ---
! a and b are the vectors for the integral limits in all dimensions
! and are the same in all directions
! ---
    do i=1,ndim
        a(i) = 2.d0
        b(i) = 4.d0
    enddo
! ---
! Compute the true integral for special case where
! g(x) = sum(x**2) over all dimensions
! True integral = base * height, 
! where height is the sum of the average value of the true integral in all dimensions
! ---
    volume = product(b-a)  ! =  product of b(i)-a(i) of ndim dimensions
    int_true = (volume) * sum((b**3 - a**3) / (3.d0*(b-a)))

    print '("Testing Monte Carlo quadrature in ",i2," dimensions")', ndim
    print '("True integral: ", es22.14)', int_true

! ---
! Start with Monte Carlo using only a few points.
! ---    
	npoints = 10
! ---
! Loop to successively and double the number of points used:
! ---
	do i=1,17
	! ---
    ! Compute integral using Monte Carlo method for a given set of points
	! compute relative error and write these to file with number of points
	! ---     
   		int_mc = quad_mc(g,a,b,ndim,npoints)
        error = abs(int_mc - int_true) / abs(int_true)
		write(25,'(i10,e23.15,e15.6)') npoints, int_mc, error
	! ---
	! Double the number of points
	! ---
        npoints = 2*npoints
    enddo

    print '("Final approximation to integral: ",es22.14)',int_mc
    print *, "Total g evaluations: ",gevals

end program test_quad_mc

from pylab import *

# read in three columns from file and unpack into 3 arrays:
n,int_approx,error = loadtxt('mc_quad_error.txt',unpack=True)

figure(1)
clf()
loglog(n,error,'-o',label='Monte-Carlo')
loglog([1,1e7],[1,sqrt(1e-7)],'k',label='1 / sqrt(N)')
legend()
xlabel('number of MC points used')
ylabel('abs(error)')
title('Log-log plot of relative error in MC quadrature')
savefig('mc_quad_error.png')

2.9. Project 3: Laplace’s Equation and the Monte Carlo Method¶

laplace_mc.py (python version)
problem_description.f90, laplace_mc.f90, mc_walk.f90, random_util.f90, Makefile (compute the solution to Laplace’s Equation using the Monte Carlo Method - serial)
test1.f90, test2.f90 (trials I made along the way)
plot_mc_quad_error.py (plot the result)

"""
Random walk approximate solution to Laplace's equation u_{xx} + u{yy} = 0.

Set demo==True to plot a few random walks interactively.
With demo==False many more walks are used to estimate the solution.

The boundary conditions are set for this test problem by evaluating the
true solution u(x,y) = x^2 - y^2 of Laplace's equation on the
boundary of the domain.

Moreover, the exact solution to the discrete equations
  U_{i-1,j} + U_{i+1,j} + U_{i,j-1} + U_{i,j+1} - 4u_{ij} = 0
with boundary values obtained in this way is easily computed. It is simply
given by evaluating the exact solution at the grid points,
  U_{ij} = x_i^2 - y_j^2
This is because the centered difference approximation to the second
derivative is exact when applied to a quadratic function.

This code implements a random walk on a lattice (rectangular grid) where in
each step the walk goes in one of 4 directions based on the value of a
random number that's uniformly distributed in [0,1].

"""

from pylab import *
from numpy.random import RandomState
import time, sys


# Set plot_walk to:
#   True ==> 10 walks will be done, plotting each for illustration.
#   False ==> Many more walks taken to investigate convergence.

plot_walk = False


# problem description:
ax = 0.
bx = 1.  
ay = 0.4
by = 1.

nx = 19
ny = 11

dx = (bx-ax)/float(nx+1)
dy = (by-ay)/float(ny+1)

debug = False     # to turn on some additional printing



def utrue(x,y):
    """
    Return true solution for a test case where this is known.
    This function should satisfy u_{xx} + u_{yy} = 0.
    """

    utrue = x**2 - y**2
    return utrue


def uboundary(x,y):
    """
    Return u(x,y) if (x,y) is on the boundary.
    """

    if (x-ax)*(x-bx)*(y-ay)*(y-by) != 0:
        print "*** Not on the boundary"
        raise Exception("*** uboundary called at non-boundary point")
        return nan

    # assume we know the true solution and can just evaluate there:
    ub = utrue(x,y)  
    return ub


def plot_initialize(i0,j0):
    """
    Set up plot for demo.
    """

    figure(1,figsize=(7,8))
    clf()
    axes([.1,.3,.8,.6]) # leave room for printing messages
    x = linspace(ax,bx,nx+2)
    y = linspace(ay,by,ny+2)
    X,Y = meshgrid(x,y)  # turn into 2d arrays
    plot(X,Y,'k')
    plot(X.T,Y.T,'k')
    plot([ax+i0*dx],[ay+j0*dy],'ro')
    axis('scaled')
    title("Initial point for random walk")
    draw()
    show()
    time.sleep(1)  # pause to see the initial location
    

def plot_ub(xb,yb,ub):
    """
    Called when we hit the boundary
    """
    plot([xb], [yb], 'ro')
    text(ax, ay-0.2, 'Hit boundary where u = %9.6f' % ub, fontsize=20)
    draw()
    time.sleep(2)


def plot_step(iold, jold, i, j, istep):
    """
    Called each step of the random walk to plot progress
    """

    # plot next segment of walk and new point in red:
    plot([ax+iold*dx, ax+i*dx], [ay+jold*dy, ay+j*dy], 'r',linewidth=2)
    plot([ax+i*dx], [ay+j*dy], 'ro')
    title("After %6i random steps" % (istep+1))
    draw()

    time.sleep(0.05)
    # redraw last segment and point in blue:
    plot([ax+iold*dx, ax+i*dx], [ay+jold*dy, ay+j*dy], 'b',linewidth=2)
    plot([ax+i*dx], [ay+j*dy], 'bo')
    draw()


def random_walk(i0, j0, max_steps):

    """
    Take one random walk starting at (i0,j0) until we reach the boundary or
    exceed max_steps steps.
    Return the value at the boundary point reached, or nan if we failed.
    """

    # starting point:
    i = i0
    j = j0
    if plot_walk:
        plot_initialize(i,j)

    # generate as many random numbers as we could possibly need
    # for this walk, since this is much faster than generating one at a time:
    r = random_generator.uniform(0., 1., size=max_steps)

    if debug:
        print "+++ generated r: ", r
    

    for istep in range(max_steps):
        iold,jold = i,j  # needed for plotting only
    
        # Take the next random step with equal probability in each
        # direction:

        if r[istep] < 0.25:
            i = i-1   # step left
        elif r[istep] < 0.5:
            i = i+1   # step right
        elif r[istep] < 0.75:
            j = j-1   # step down
        else:   
            j = j+1   # step up

        if plot_walk:
            plot_step(iold,jold, i,j, istep)

        # check if we hit the boundary:
        if i*j*(nx+1-i)*(ny+1-j) == 0:
            xb = ax + i*dx
            yb = ay + j*dy
            ub = uboundary(xb, yb)

            if plot_walk:
                plot_ub(xb,yb,ub)
            if debug:
                print "Hit boundary at (%7.3f, %7.3f) after %i7 steps, ub = %7.3f" \
                       % (xb,yb,istep+1,ub)
            break  # end the walk

        if istep==(max_steps-1):
            if debug:
                print "Did not hit boundary after %i steps" % max_steps
            if plot_walk:
                text(0, -0.2, "Did not hit boundary after %i steps" \
                        % max_steps, fontsize=20)
                draw()
                time.sleep(2)
            ub = nan
            
    return ub


def many_walks(i0, j0, max_steps, n_mc):

    ub_sum = 0.   # to accumulate boundary values reached from all walks
    n_success = 0    # to keep track of how many walks reached boundary

    for k in range(n_mc):
        i = i0
        j = j0
        ub = random_walk(i0, j0, max_steps)
        if not isnan(ub):
            # use this result unless walk didn't reach boundary
            ub_sum += ub
            n_success += 1

    u_mc = ub_sum / n_success   # average over successful walks

    return u_mc, n_success


if __name__ == "__main__":
    
    # MAIN PROGRAM
    
    # Try it out from a specific (x0,y0):
    x0 = 0.9
    y0 = 0.6
    
    i0 = round((x0-ax)/dx)
    j0 = round((y0-ay)/dy)

    # shift (x0,y0) to a grid point if it wasn't already:
    x0 = ax + i0*dx
    y0 = ay + j0*dy

    u_true = utrue(x0,y0)

    print "True solution of PDE: u(%7.3f, %7.3f) = %9.5f" % (x0,y0,u_true)
    print "Note: with solution used in demo this is also the solution to the"
    print "      the finite-difference equations on the same grid."

    ans = raw_input("Enter seed for random generator or <return> .... ")
    if ans == "":
        seed = None       # will cause a random seed to be used
    else:
        seed = int(ans)   # convert string returned from raw_input into integer

    random_generator = RandomState(seed)  

    # maximum number of step in each before giving up:
    max_steps = 100*max(nx, ny)

    # initial number of Monte-Carlo walks to take:
    n_mc = 10

    u_mc, n_success = many_walks(i0, j0, max_steps, n_mc)

    error = abs((u_mc - u_true) / u_true)

    print "After %8i random walks, u = %15.9f, rel. error = %15.6e" \
            % (n_success, u_mc, error)
    
    if plot_walk:
        sys.exit()   # quit after a few walks

    # start accumulating totals:
    u_mc_total = u_mc
    n_total = n_success

    for i in range(12):
        u_sum_old = u_mc_total * n_total
        u_mc, n_success = many_walks(i0, j0, max_steps, n_mc)
        u_sum_new = u_mc * n_success
        n_total = n_total + n_success
        u_mc_total = (u_sum_old + u_sum_new) / n_total
        error = abs((u_mc_total - u_true) / u_true)

        print "After %8i random walks, u = %15.9f, rel. error = %15.6e" \
                % (n_total, u_mc_total, error)
        n_mc = 2*n_mc   # double number of trials for next iteration
        
        

OBJECTS = random_util.o problem_description.o mc_walk.o test1.o
OBJECTS2 = random_util.o problem_description.o mc_walk.o test2.o
OBJECTS3 = random_util.o problem_description.o mc_walk.o laplace_mc.o
FFLAGS = 

.PHONY: test test2 laplace

test: test.exe
	./test.exe

test2: test2.exe
	./test2.exe

laplace: laplace.exe
	./laplace.exe

test.exe: $(OBJECTS)
	gfortran $(FFLAGS) $(OBJECTS) -o test.exe

test2.exe: $(OBJECTS2)
	gfortran $(FFLAGS) $(OBJECTS2) -o test2.exe

laplace.exe: $(OBJECTS3)
	gfortran $(FFLAGS) $(OBJECTS3) -o laplace.exe

%.o : %.f90
	gfortran $(FFLAGS) -c  $< 

clean:
	rm -f *.o *.exe *.mod

module problem_description

    implicit none
    real(kind=8), parameter :: ax = 0.0d0
    real(kind=8), parameter :: bx = 1.0d0
    real(kind=8), parameter :: ay = 0.4d0
    real(kind=8), parameter :: by = 1.0d0
    integer, parameter :: nx = 19
    integer, parameter :: ny = 11
    real(kind=8), parameter :: dx = (bx - ax) / (nx+1)
    real(kind=8), parameter :: dy = (by - ay) / (ny+1)
	save

contains

	function utrue(x, y)

		! True solution for comparison, if known.

		implicit none
		real(kind=8), intent(in) :: x,y
		real(kind=8) :: utrue

		utrue = x**2 - y**2

	end function utrue

	function uboundary(x, y)

		! Return u(x,y) assuming (x,y) is a boundary point.

		implicit none
		real(kind=8), intent(in) :: x,y
		real(kind=8) :: uboundary

		if ((x-ax)*(x-bx)*(y-ay)*(y-by) .ne. 0.d0) then
		    print *, "*** Error -- called uboundary at non-boundary point"
		    stop
		    endif

		uboundary = utrue(x,y)   ! assuming we know this

	end function uboundary
    
end module problem_description

program laplace_mc
! ---
! Uses random seed module
! ---
	use mc_walk, only: random_walk, many_walks, nwalks
	use problem_description, only: utrue, ax, ay, bx, by, nx, ny, dx, dy

		real(kind=8) :: u_mc, x0, y0, rel_error
		integer :: n_mc, n_success, i0, j0, max_steps

		! Try it out from a specific (x0,y0):
		x0 = 0.9
		y0 = 0.6
		
		i0 = nint((x0-ax)/dx)
		j0 = nint((y0-ay)/dy)

		! shift (x0,y0) to a grid point if it wasn't already:
		x0 = ax + i0*dx
		y0 = ay + j0*dy

		u_true = utrue(x0,y0)
		!nwalks = 0
		n_mc = 10
    	max_steps = 100*max(nx, ny)
		! do i=1,13
		! call many_walks(i0, j0, max_steps, n_mc, u_mc, n_success)

		! ---
		! Open a file to write to
		! ---
    		open(unit=25, file='laplace_mc.txt', status='unknown')
		!rel_error = abs(u_true-u_mc)/u_true
		print *, ""
		print '(" True solution of PDE at x =",f10.4," y =",f10.4)', x0, y0
		print '(" u_true =   ",es13.3)', u_true
		print *, ""
		print *, "    Walks    Monte Carlo Solution    Relative Error"
		do i=1,13
			nwalks=0
			call many_walks(i0, j0, max_steps, n_mc, u_mc, n_success)
			n_mc = 2.0d0 * n_mc
			rel_error = abs(u_true-u_mc)/u_true
			print '("  ",i8,"      ",es13.3,"        ",es13.3)', nwalks, u_mc, rel_error
			write(25,'(i10,e23.15,e15.6)') nwalks, u_mc, rel_error
		enddo
end program laplace_mc

! ---
! This has two functions random_walk and many_walks
! ---
module mc_walk
! ---
! Uses random seed module
! ---
	use random_util, only: init_random_seed
	use problem_description, only: uboundary, ax, ay, dx, dy, nx, ny
! ---
! nwalks is a module variable
! Example use: use mc_walk, only nwalks
! ---
    implicit none
    integer :: nwalks
	!integer :: iabort 
    save

contains

! ---
! A single random walk
! Inputs: i (step in x), j (step in y), max_steps (maximum number of steps)
! Outputs: ub (value at boundary), iabort (0 for success, 1 for failure)
! Example use: ub, iabort = random_walk(0, 0, 100, ub, iabort) 
!     
!    i:  0,  1,.. nx, nx+1
! j:
! 0,   (BC) (BC) (BC) (BC)
! 1,   (BC)           (BC)
! ...  (BC)           (BC)
! nj   (BC)           (BC)
! nj+1 (BC) (BC) (BC) (BC)
! ---
	subroutine random_walk(i0, j0, max_steps, ub, iabort)
	! ---
	! Declare parameters
	! ---
		implicit none
		integer, intent(in) :: i0, j0, max_steps 	! i0 and j0 are inputs (static)
		integer :: i, j 							! i and j are dummy variables
		real(kind=8), intent(out) :: ub
		integer, intent(out) :: iabort
		real(kind=8), dimension(:), allocatable :: x
		integer :: istep, seed1
		real(kind=8) :: xb, yb
	! ---
	! Number of steps is unknown, so we must allocate:
	! ---
		allocate(x(max_steps)) ! random number array 0 to 1 
	! ---
	! Generate random number array x (0 to 1)
	! ---
		seed1 = 0   ! seed1 = 12345 for repeatable seed
						! seed1 = 0 for random seed
		call init_random_seed(seed1)
		call random_number(x)
		i = i0
		j = j0
		!nwalks = 0
	! ---
	! Go through the vector of random numbers and
	! make moves depending on the value of x
	! ---
		do istep=1,max_steps
			if (x(istep) .LT. 0.25) then
				i = i-1    ! go left
				! print *, "go left"
				! print *, "i=", i
				! print *, "istep=", istep
				! print *, "x=", x(istep)
			else if (x(istep) .LT. 0.5) then
				i = i+1    ! go right
				! print *, "go right"
				! print *, "i=", i
				! print *, "istep=", istep
				! print *, "x=", x(istep)
			else if (x(istep) .LT. 0.75) then
				j = j-1    ! go down
				! print *, "go up"
				! print *, "j=", j
				! print *, "istep=", istep
				! print *, "x=", x(istep)
			else
				j = j+1    ! go up
				! print *, "go down"
				! print *, "j=", j
				! print *, "istep=", istep
				! print *, "x=", x(istep)
			endif
		
		! ---
		! What if we are at the boundary?
		! ---			
			if (i*j*(nx+1 - i)*(ny+1 - j) .EQ. 0) then
				! print *, "we are at the boundary"
				! print *, " "
				xb = ax + i*dx 			! x is ax or ax+i*dx
				yb = ay + j*dy			! y is ay or ay+j*dy
				! print *, "xb=",xb
				! print *, "yb=",yb
				ub = uboundary(xb, yb)	! the boundary value is then computed
				iabort = 0				! success
				go to 99				! end do loop
			endif
		! ---
		! What if we never reach the boundary within max steps?		
		! ---
			if (istep .EQ. max_steps) then 
			! 	print *, "we aborted"
			! 	print *, " "
				ub = 0					! set ub to zero
				iabort = 1				! failure
			endif
		enddo

99  continue


	end subroutine random_walk 

	subroutine many_walks(i0, j0, max_steps, n_mc, u_mc, n_success)
	! ---
	! Declare parameters
	! ---
		implicit none
		integer, intent(in) :: i0, j0, max_steps 		! i0 and j0 are inputs (static)
		integer, intent(in) :: n_mc
		integer :: i, j, k, iabort !, nwalks		! i and j are dummy variables
		real(kind=8), intent(out) :: u_mc
		integer, intent(out) :: n_success
		real(kind=8) :: ub_sum, ub

		ub_sum = 0
		n_success = 0
		!nwalks = 0
		do k=1,n_mc
			!nwalks = 0
			i = i0
			j = j0
			call random_walk(i, j, max_steps, ub, iabort)
			if(iabort .NE. 1) then
				ub_sum = ub_sum + ub
				n_success = n_success + 1
				nwalks = nwalks +1	
			endif
			u_mc = ub_sum/n_success
	
		enddo

	end subroutine many_walks

end module mc_walk

! ---
! This module returns seed for the random_number function
! ---
module random_util

contains

	subroutine init_random_seed(seed1)
	! ---
	! Declare the parameters
	! ---
		integer, intent(inout) :: seed1
		integer :: clock 
		integer, dimension(:), allocatable :: seed
	! ---
	! Determine how many numbers needed to seed and allocate
	! ---
		call random_seed(size = nseed)  
		allocate(seed(nseed))
	! ---  
	! If seed1 = 0 then set seed1 randomly using the system clock.
	! This will give different sequences of random numbers
	! from different runs, so results are not reproducible.
	! ---
		if (seed1 == 0) then
		    call system_clock(count = clock)
		    seed1 = clock
		endif
	! ---
	! If seed1 > 0 then results will be reproducible since calling this
	! twice with the same seed will initialize the random
	! number generator so the same sequence is generated.
	! Once seed1 is set, set the other elements of the seed array by adding
	! multiples of 37 as suggested in the documentation. 
	! --- 
		do i=1,nseed
		    seed(i) = seed1 + 37*(i-1)  
		enddo
	! ---
	! Seed the generator
	! ---
		call random_seed(put = seed)
	! ---
	! Deallocate the seed
	! ---
		deallocate(seed)

	end subroutine init_random_seed

end module random_util

program test1
! ---
! Uses random seed module
! ---
	use mc_walk, only: random_walk, nwalks
		real(kind=8) :: ub
		integer :: iabort, i, j, max_steps
 		integer, parameter :: nx = 19
    	integer, parameter :: ny = 11
		! range of i is 1 to nx
		! range of j is 1 to nj
		
		i = 1
		j = 1
    	max_steps = 100*max(nx, ny)
		call random_walk(i, j, max_steps, ub, iabort) 
		print *, "ub=", ub
		print *, "iabort=", iabort
		print *, "nwalks=", nwalks

end program test1

program test2
! ---
! Uses random seed module
! ---
	use mc_walk, only: random_walk, many_walks, nwalks
		real(kind=8) :: u_mc
		integer :: n_mc, n_success, i, j, max_steps
 		integer, parameter :: nx = 19
    	integer, parameter :: ny = 11
		! range of i is 1 to nx
		! range of j is 1 to nj
		
		i = 1
		j = 1
    	max_steps = 100*max(nx, ny)
		call many_walks(i, j, max_steps, n_mc, u_mc, n_success)
		print *, "u_mc=", u_mc
		print *, "n_success=", n_success

end program test2

from pylab import *

# read in three columns from file and unpack into 3 arrays:
n,int_approx,error = loadtxt('laplace_mc.txt',unpack=True)

figure(1)
clf()
loglog(n,error,'-o',label='Monte-Carlo')
loglog([1,1e7],[1,sqrt(1e-7)],'k',label='1 / sqrt(N)')
legend()
xlabel('number of MC points used')
ylabel('abs(error)')
title('Log-log plot of relative error in MC quadrature')
savefig('mc_quad_error.png')

2.10. Project 4: Laplace’s Equation, MPI and the Monte Carlo Method¶

problem_description.f90, laplace_mc.f90, mc_walk.f90, random_util.f90, Makefile (compute the solution to Laplace’s Equation using the Monte Carlo Method - using MPI)
test1.f90, test1b.f90, test2.f90 (trials I made along the way)
plot_mc_quad_error.py (plot the result)

# $MYHPSC/project/part4/Makefile

OBJECTS = random_util.o problem_description.o mc_walk.o laplace_mc.o
OBJECTS2 = random_util.o problem_description.o mc_walk.o test1.o
OBJECTS2b = random_util.o problem_description.o mc_walk.o test1b.o
OBJECTS3 = random_util.o problem_description.o mc_walk.o test2.o
NUMPROCS ?= 4
FLAGS ?= 

.PHONY: mpi laplace test1 test1b test2

mpi: testquad.exe
	mpiexec -n $(NUMPROCS) ./testquad.exe

test1: test1.exe
	./test1.exe

test1b: test1b.exe
	./test1b.exe

test2: test2.exe
	./test2.exe

testquad.exe: $(OBJECTS)
	mpif90 $(FLAGS) $(OBJECTS) -o testquad.exe

test1.exe: $(OBJECTS2)
	gfortran $(FFLAGS) $(OBJECTS2) -o test1.exe

test1b.exe: $(OBJECTS2b)
	gfortran $(FFLAGS) $(OBJECTS2b) -o test1b.exe

test2.exe: $(OBJECTS3)
	gfortran $(FFLAGS) $(OBJECTS3) -o test2.exe

%.o : %.f90
	mpif90 -c $(FLAGS) $< 

laplace: laplace.exe
	./laplace.exe

laplace.exe: $(OBJECTS)
	gfortran $(FFLAGS) $(OBJECTS) -o laplace.exe

#%.o : %.f90
#	gfortran $(FFLAGS) -c  $<

clean:
	rm -f *.o *.exe *.mod

! ---
! This module returns the true value and the value at the boundary
! ---
module problem_description
! ---
! Initilize parameters
! ---
    implicit none
    real(kind=8), parameter :: ax = 0.0d0
    real(kind=8), parameter :: bx = 1.0d0
    real(kind=8), parameter :: ay = 0.4d0
    real(kind=8), parameter :: by = 1.0d0
    integer, parameter :: nx = 19
    integer, parameter :: ny = 11
    real(kind=8), parameter :: dx = (bx - ax) / (nx+1)
    real(kind=8), parameter :: dy = (by - ay) / (ny+1)
	save

contains
! ---
! Function for the true solution (if known)
! Inputs: x and y
! Output: z
! ---
	function utrue(x, y)

		implicit none
		real(kind=8), intent(in) :: x,y
		real(kind=8) :: utrue

		utrue = x**2 - y**2

	end function utrue
! ---
! Function for the value at the boundary
! Inputs: x and y
! Output: z
! ---
	function uboundary(x, y)

		implicit none
		real(kind=8), intent(in) :: x,y
		real(kind=8) :: uboundary
	! ---
	! If we are not at the boundary we print an error and stop
	! ---
		if ((x-ax)*(x-bx)*(y-ay)*(y-by) .ne. 0.d0) then
		    print *, "*** Error -- called uboundary at non-boundary point"
		    stop
		endif
	! ---
	! If we are at the boundary we compute the solution
	! ---
		uboundary = utrue(x,y)   ! assuming we know this

	end function uboundary
    
end module problem_description

! ---
! This program loops through the Monte Carlo solution of Laplace's equation,
! doubling the number of sample points every loop using MPI
! ---
program laplace_mc
! ---
! ALL PROCESSES: Modules and functions
! ---
    use mpi
	use random_util, only: init_random_seed
	use mc_walk, only: random_walk, many_walks, n_walks, x, n_success_proc
	use problem_description, only: utrue, ax, ay, bx, by, nx, ny, dx, dy
! ---
! ALL PROCESSES: Parameters, vectors and arrays
! ---	
	implicit none
	real(kind=8) :: u_mc, x0, y0, rel_error, u_true
	integer :: n_mc, n_success, i0, j0, max_steps, i, &
	process_num, num_processes, ierr, &
	istep, seed1, n_success_total
	logical :: debug_many
! ---
! ALL PROCESSES: Number of steps is unknown, so we must allocate -
! 200000 is the maximum number of moves
! ---
	n_mc = 10		! This must be at least 3 as there are 3 processes
					! Use n_mc = 10 if not debugging random_walk
	max_steps = 200000*max(nx, ny)
	allocate(x(max_steps)) ! random number array 0 to 1 
! ---- 
! ALL PROCESSES: Initialize MPI
! ----
    call MPI_INIT(ierr)
    call MPI_COMM_SIZE(MPI_COMM_WORLD, num_processes, ierr)
    call MPI_COMM_RANK(MPI_COMM_WORLD, process_num, ierr) 
! ---
! ALL PROCESSES: Generate random number array x (0 to 1)
! ---
	seed1 = 12345   ! seed1 = 12345 for repeatable seed
					! seed1 = 0 for random seed
	seed1 = seed1 + 97*process_num 	! unique for each process
	print '(" Seed for process",i2, " is ",i5)', process_num, seed1
	call MPI_BARRIER(MPI_COMM_WORLD,ierr) ! wait for processes to print
	call init_random_seed(seed1)
	call random_number(x)
! ----
! ALL PROCESSES: Try it out from a specific (x0,y0):
! ----
	x0 = 0.9
	y0 = 0.6
	i0 = nint((x0-ax)/dx)
	j0 = nint((y0-ay)/dy)
! ----
! ALL PROCESSES: Shift (x0,y0) to a grid point if it wasn't already:
! ----
	x0 = ax + i0*dx
	y0 = ay + j0*dy
	u_true = utrue(x0,y0)
	n_walks = 0
! ---
! MASTER PROCESS: Open a file to write to, provide true solution and print output headers
! ---
	if (process_num .EQ. 0) then
		open(unit=25, file='laplace_mc.txt', status='unknown')
		print '(" True solution of PDE at x =",f10.4," y =",f10.4)', x0, y0
		print '(" u_true =",e23.15)', u_true
		print *, ""
		print *, "  Successes    Monte Carlo Solution   Relative Error"
	endif
! ---
! ALL PROCESSES: Loop through and call many_walks, doubling the number
! of Monte Carlo points by two each time
! ---
! MASTER PROCESS: Compute relative error and print output of many_walks  
! ---
	debug_many = .False.		! Best to remove the do loop if you're trying this
	do i=1,13
		call many_walks(i0, j0, max_steps, n_mc, u_mc, n_success, debug_many)
		n_mc = 2.0d0 * n_mc
		if (process_num .EQ. 0) then
			rel_error = abs(u_true-u_mc)/u_true
			print '("  ",i10," ",e23.15," ",e23.15)', n_success, u_mc, rel_error
			write(25,'(i10,e23.15,e15.6)') n_success, u_mc, rel_error
		endif
	enddo
! ---
! ALL PROCESSES: Wait for process 0 to print
! ---
	call MPI_BARRIER(MPI_COMM_WORLD,ierr) 
! ---
! MASTER PROCESS: Reduce n_success_proc to n_success_total  
! ---
	call MPI_REDUCE(n_success_proc, n_success_total, 1, &
		MPI_DOUBLE_PRECISION, MPI_SUM, 0, MPI_COMM_WORLD, ierr)
! ---
! MASTER PROCESS: Reduce n_success_proc to n_success_total  
! ---
	if (process_num .EQ. 0) then
		print '("Final approximation to u(x0,y0):",e23.15)', u_mc
		print '("Total number of walks by all processes:",i10)', n_success_total
	endif
! ---
! ALL PROCESSES: Wait for process 0 to print
! ---
	call MPI_BARRIER(MPI_COMM_WORLD,ierr)
! ---
! ALL PROCESSES: Print out process number and number of successes
! ---
	print '("Walks by process ",i1,":",i8)', process_num, n_success_proc
! ----
! ALL PROCESSES: COMPLETE MPI
! ----
    call MPI_FINALIZE(ierr)

end program laplace_mc

! ---
! This module performs a random walk and also many random walks
! ---
module mc_walk
! ---
! ALL PROCESSES: Use random seed module & mpi
! ---
 	use mpi
	use problem_description, only: uboundary, ax, ay, dx, dy, nx, ny
! ---
! ALL PROCESSES: nwalks & n_success_proc are module variables
! Example use: use mc_walk, only nwalks, n_success_proc
! ---
    implicit none
    integer :: n_walks, n_success_proc
	real(kind=8), dimension(:), allocatable :: x
    save

contains
! ---
! ALL PROCESSES: A single random walk
! Inputs: i0 (step in x), j0 (step in y), max_steps (maximum number of steps)
! Outputs: ub (value at boundary), iabort (0 for success, 1 for failure)
! Example use: random_walk(0, 0, 100, ub, iabort) 
!     
!    i:  0,  1,.. nx, nx+1
! j:
! 0,   (BC) (BC) (BC) (BC)
! 1,   (BC)           (BC)
! ...  (BC)           (BC)
! nj   (BC)           (BC)
! nj+1 (BC) (BC) (BC) (BC)
! ---
	subroutine random_walk(i0, j0, max_steps, ub, iabort, debug_random)
	! ---
	! ALL PROCESSES: Declare parameters
	! ---
		implicit none
		integer, intent(in) :: i0, j0, max_steps
		logical, intent(in) :: debug_random
		real(kind=8), intent(out) :: ub
		integer, intent(out) :: iabort
		integer :: i, j, istep, seed1, start
		real(kind=8) :: xb, yb
	! ---
	! ALL PROCESSES: Initialize the walk from it's starting point
	! ---
		i = i0
		j = j0
		start = n_walks
	! ---
	! ALL PROCESSES: Go through the vector of random numbers and
	! make moves depending on the value of x
	! ---
		do istep=start,max_steps
			if (x(istep) .LT. 0.25) then
				i = i-1    ! go left
				if (debug_random) then
					print *, "go left"
					print *, "i=", i
					print *, "istep=", istep
					print *, "x=", x(istep)
				endif
				n_walks = n_walks + 1
			else if (x(istep) .LT. 0.5) then
				i = i+1    ! go right
				if (debug_random) then
					print *, "go right"
					print *, "i=", i
					print *, "istep=", istep
					print *, "x=", x(istep)
				endif
				n_walks = n_walks + 1
			else if (x(istep) .LT. 0.75) then
				j = j-1    ! go down
				if (debug_random) then
					print *, "go up"
					print *, "j=", j
					print *, "istep=", istep
					print *, "x=", x(istep)
				endif
				n_walks = n_walks + 1
			else
				j = j+1    ! go up
				if (debug_random) then
					print *, "go down"
					print *, "j=", j
					print *, "istep=", istep
					print *, "x=", x(istep)
				endif
				n_walks = n_walks + 1
			endif
		! ---
		! ALL PROCESSES: What if we are at the boundary?
		! ---			
			if (i*j*(nx+1 - i)*(ny+1 - j) .EQ. 0) then
				xb = ax + i*dx 			! x is ax or ax+i*dx
				yb = ay + j*dy			! y is ay or ay+j*dy
				ub = uboundary(xb, yb)	! the boundary value is then computed
				iabort = 0				! success
				n_success_proc = n_success_proc + 1
				if (debug_random) then
					print *, "Reached the boundary"
				endif				
				go to 99				! end do loop
			endif
		! ---
		! ALL PROCESSES: What if we never reach the boundary within max steps?		
		! ---
			if (istep .EQ. max_steps) then 
				ub = 0					! set ub to zero
				iabort = 1				! failure
				n_success_proc = n_success_proc
				if (debug_random) then
					print *, "Failed to reach boundary"
				endif	
			endif
		enddo

99  continue

	end subroutine random_walk 
! ---
! ALL PROCESSES: A single random walk
! Inputs: i0 (step in x), j0 (step in y), max_steps (maximum number of steps)
! Outputs: ub (value at boundary), iabort (0 for success, 1 for failure)
! Example use: many_walks(0, 0, 100, ub, iabort)
! ---
	subroutine many_walks(i0, j0, max_steps, n_mc, u_mc, n_success, debug_many)
	! ---
	! ALL PROCESSES: Declare parameters
	! ---
		implicit none
		integer, intent(in) :: i0, j0, max_steps, n_mc
		real(kind=8), intent(out) :: u_mc
		integer, intent(out) :: n_success
		logical, intent(in) :: debug_many
		real(kind=8) :: ub_sum, ub, answer_r
		real(kind=8), allocatable, dimension(:) :: int_array
		integer :: i, j, k, iabort, num_processes, process_num, &
		ierr, m, r, sender, iabort_r, answer, num_sent, next_walk, jj
		integer, dimension(MPI_STATUS_SIZE) :: status
		logical :: debug_random
    	call MPI_COMM_SIZE(MPI_COMM_WORLD, num_processes, ierr)
    	call MPI_COMM_RANK(MPI_COMM_WORLD, process_num, ierr)

		ub_sum = 0
		n_success = 0
		num_sent = 0
		n_success_proc = 0
	! ---
	! MASTER PROCESS
	! ---
		if(process_num .EQ. 0) then
			allocate(int_array(n_mc))	 ! to hold answer from Workers
 			! ---
			! INITIALIZE THE PROCESS
			! Send: 0, Tag: 1, To: Processes 1 to 3
			! Increment number sent by 1
 			! ---
				do m=1,num_processes-1
					call MPI_SEND(MPI_BOTTOM, 0, MPI_DOUBLE_PRECISION, &
				    	m, 1, MPI_COMM_WORLD, ierr)
					num_sent = num_sent + 1
					if (debug_many) then
						print '("Process ",i1," sent tag = ",i1, " to process ",i1)',process_num,1,m
					endif				
				enddo
				if (debug_many) then
					print '("Number of values sent initially ",i1)',num_sent
				endif			
			! ---
			! RECIEVE ALL POSSIBLE CALLS TO RANDOM WALK
			! Receive: answer_r, Tag: iabort (0 or 1), From: Any Process (1 to 3)
			! Sender is now free
			! ---
				do m=1,n_mc
					call MPI_RECV(answer_r, 1, MPI_DOUBLE_PRECISION, &
				        MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, status, ierr)
					iabort_r = status(MPI_TAG)
					sender = status(MPI_SOURCE)    ! Whichever process sent it is now free
				! ---
				! HAVE WE REACHED THE BOUNDARY?
				! If the walk reached the boundary, iabort = 0, so add it to the array
				! ---
					if(iabort_r .EQ. 0) then
						int_array(m) = answer_r
						n_success = n_success + 1
					else
						int_array(m) = 0
						n_success = n_success
					endif
					if (debug_many) then
						print '("Process ",i1," recieved answer = ",es22.14," tag = ",i3, " from process ",i1)', &
						process_num,int_array(m),iabort_r,sender
					endif
				! ---
				! SEND ADDITIONAL CALLS TO RANDOM WALK 
				! If the number sent is less than the number of Monte Carlo points
				! Send: 0, Tag: 1, To: Processes 1 to 3 (whichever is free)
				! Increment number sent by 1
				! ---
					if (num_sent .LT. n_mc) then
						call MPI_SEND(MPI_BOTTOM, 0, MPI_DOUBLE_PRECISION, &
				    	sender, 1, MPI_COMM_WORLD, ierr)
						num_sent = num_sent + 1
						if (debug_many) then
							print '("Process ",i1," sent tag = ",i1, " to process ",i1)', &
							process_num,1,sender
						endif
				! ---
				! END WORKER PROCESSES
				! Send: 0, Tag: 0, To: Processes 1 to 3 (whichever is free)
				! ---
				  	else
						call MPI_SEND(MPI_BOTTOM, 0, MPI_DOUBLE_PRECISION, &
						sender, 0, MPI_COMM_WORLD, ierr)
				  	endif
				enddo
			! ---
			! SUM OVER THE ARRAY TO RETURN U_MC TO MANY_WALKS
			! ---
				ub_sum=sum(int_array) 		! Summation of the values from each process
				u_mc = ub_sum/n_success		! Returned to many_walks
				if (debug_many) then
			 		print '("Returned answer ",es22.14)', u_mc
		 			print '("Number of values sent finally ",i10)', num_sent
				endif
 		endif

	! ---
	! WORKER PROCESS
	! ---
 		if(process_num .GT. 0) then

			do while (.true.)
				! ---
				! RECIEVE CALLS FROM MASTER PROCESS
				! Receive: 0, Tag: 1, From: Master (0)
				! Sender is now free
				! ---
					call MPI_RECV(MPI_BOTTOM, 0, MPI_DOUBLE_PRECISION, &
				        MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, status, ierr)
					r = status(MPI_TAG)
					if (debug_many) then
				 		print '("Process ",i1," recieved tag = ",i1, " from process ",i1)',process_num,r,0
					endif
				! ---
				! THERE IS ANOTHER WALK TO DO
				! Call Random Walk again
				! ---
					if(r .EQ. 1) then
						i = i0
						j = j0
						debug_random = .False.	! Change n_mc to 3 if debugging this
						call random_walk(i, j, max_steps, ub, iabort, debug_random)
					! ---
					! SEND THE ANSWER TO THE MASTER
					! Send: ub, Tag: iabort(0 or 1), To: Master (Process 0)
					! ---
						call MPI_SEND(ub, 1, MPI_DOUBLE_PRECISION, &
							0, iabort, MPI_COMM_WORLD, ierr)
						if (debug_many) then
							print '("Process ",i1," sent answer = ",es22.14, " & tag = ",i3, " to process ",i1)', &
						 	process_num,ub,iabort,0
						endif
				! ---
				! THERE ISN'T ANOTHER WALK TO DO
				! Worker finishes
				! ---
					else if(r .EQ. 0) then
						if (debug_many) then
							print '("Process ",i1," ended ")', process_num
						endif
						go to 100
					endif
			enddo
 		endif
! ----
! COMPLETE MPI
! ----
100 	continue ! Jumps here if the Worker finishes
 
	end subroutine many_walks

end module mc_walk

! ---
! This module returns seed for the random_number function
! ---
module random_util

contains

	subroutine init_random_seed(seed1)
	! ---
	! Declare the parameters
	! ---
		integer, intent(inout) :: seed1
		integer :: clock 
		integer, dimension(:), allocatable :: seed
	! ---
	! Determine how many numbers needed to seed and allocate
	! ---
		call random_seed(size = nseed)  
		allocate(seed(nseed))
	! ---  
	! If seed1 = 0 then set seed1 randomly using the system clock.
	! This will give different sequences of random numbers
	! from different runs, so results are not reproducible.
	! ---
		if (seed1 == 0) then
		    call system_clock(count = clock)
		    seed1 = clock
		endif
	! ---
	! If seed1 > 0 then results will be reproducible since calling this
	! twice with the same seed will initialize the random
	! number generator so the same sequence is generated.
	! Once seed1 is set, set the other elements of the seed array by adding
	! multiples of 37 as suggested in the documentation. 
	! --- 
		do i=1,nseed
		    seed(i) = seed1 + 37*(i-1)  
		enddo
	! ---
	! Seed the generator
	! ---
		call random_seed(put = seed)
	! ---
	! Deallocate the seed
	! ---
		deallocate(seed)

	end subroutine init_random_seed

end module random_util

program test1
! ---
! Uses random seed module
! ---
	use mc_walk, only: random_walk, n_walks, x
	use random_util, only: init_random_seed
	use problem_description, only: utrue, ax, ay, bx, by, nx, ny, dx, dy
		real(kind=8) :: ub, rel_error, x0, y0
		integer :: iabort, i, j, max_steps, seed1, n_success, i0, j0

	! ---
	! Number of steps is unknown, so we must allocate:
	! ---
		max_steps = 100*max(nx, ny)
		allocate(x(max_steps)) ! random number array 0 to 1 

	! Try it out from a specific (x0,y0):
		x0 = 0.9
		y0 = 0.6
		
		i0 = nint((x0-ax)/dx)
		j0 = nint((y0-ay)/dy)

		! shift (x0,y0) to a grid point if it wasn't already:
		x0 = ax + i0*dx
		y0 = ay + j0*dy

		u_true = utrue(x0,y0)

	! ---
	! Generate random number array x (0 to 1)
	! ---
		seed1 = 12345   	! seed1 = 12345 for repeatable seed
						! seed1 = 0 for random seed
		call init_random_seed(seed1)
		call random_number(x)

		i = i0
		j = j0

    	n_success = 0
		print *, ""
		print '(" True solution of PDE at x =",f10.4," y =",f10.4)', x0, y0
		print '(" i =",i8," j =",i8)', i, j
		print '(" u_true =   ",es13.3)', u_true
		call random_walk(i, j, max_steps, ub, iabort) 
		print *, "ub=", ub
		print *, "x=",x(1)
		print *, "iabort=", iabort
		print *, "nwalks=", n_walks

end program test1

program test1b
! ---
! Uses random seed module
! ---
	use mc_walk, only: random_walk, n_walks, x
	use random_util, only: init_random_seed
	use problem_description, only: utrue, ax, ay, bx, by, nx, ny, dx, dy
	real(kind=8) :: ub, rel_error, x0, y0
	integer :: iabort, i, j, max_steps, seed1, n_success, n_mc
!	integer, parameter :: nx = 19
!	integer, parameter :: ny = 11
! ---
! Set the number of samples for the Monte Carlo Solution
! ---
	n_mc=10
! ---
! Number of steps is unknown, so we must allocate:
! ---
	max_steps = 20*max(nx, ny)*n_mc
!	max_steps = 10
	allocate(x(max_steps)) ! random number array 0 to 1 
! ---
! Generate random number array x (0 to 1)
! ---
	seed1 = 12345   	! seed1 = 12345 for repeatable seed
						! seed1 = 0 for random seed
	call init_random_seed(seed1)
	call random_number(x)
	! Try it out from a specific (x0,y0):
		x0 = 0.9
		y0 = 0.6
		
		i0 = nint((x0-ax)/dx)
		j0 = nint((y0-ay)/dy)

		! shift (x0,y0) to a grid point if it wasn't already:
		x0 = ax + i0*dx
		y0 = ay + j0*dy

		u_true = utrue(x0,y0)

		i = i0
		j = j0
! ---
! Record the number of times we successfully hit the boundary
! ---
	n_success = 0
! ---
! Record the number of random walks we make
! ---
	n_walks = 0
! ---
! Loop through and call a single random walk each time, indexing
! You must start the walk using the last successful walk point
! ---
	do k=1,n_mc 0
		call random_walk(i, j, max_steps, ub, iabort) 
		if(iabort .NE. 1) then
			ub_sum = ub_sum + ub
			n_success = n_success + 1
		endif
		u_mc = ub_sum/n_success
		print *, ""
		print *, "ub=", ub
	!	print *, "x=",x(k)
		print *, "iabort=", iabort
		print *, "nwalks=", n_walks
		print *, "n_success=", n_success
		print *, "u_mc=", u_mc
	enddo
end program test1b

program test2
! ---
! Uses random seed module
! ---
	use mc_walk, only: random_walk, many_walks, n_walks, x
	use random_util, only: init_random_seed
	use problem_description, only: utrue, ax, ay, bx, by, nx, ny, dx, dy

		real(kind=8) :: u_mc, rel_error, x0, y0
		integer :: n_mc, n_success, i, j, max_steps, seed1, k, i0, j0
	! ---
	! Number of steps is unknown, so we must allocate
	! 200000 is the maximum number of moves
	! ---
		n_mc = 10
		max_steps = 200000*max(nx, ny)
		!max_steps = 25
		allocate(x(max_steps)) ! random number array 0 to 1 
	! Try it out from a specific (x0,y0):
		x0 = 0.9
		y0 = 0.6
		
		i0 = nint((x0-ax)/dx)
		j0 = nint((y0-ay)/dy)

		! shift (x0,y0) to a grid point if it wasn't already:
		x0 = ax + i0*dx
		y0 = ay + j0*dy

		u_true = utrue(x0,y0)

	! ---
	! Generate random number array x (0 to 1)
	! ---
		seed1 = 12345   	! seed1 = 12345 for repeatable seed
							! seed1 = 0 for random seed
		call init_random_seed(seed1)
		call random_number(x)
	! ---
	! Set the point of interest
	! ---
		i = i0
		j = j0
	! ---
	! Record the number of random walks we make
	! ---
		n_walks = 0
		print *, ""
		print '(" True solution of PDE at x =",f10.4," y =",f10.4)', x0, y0
		print '(" u_true =   ",es13.3)', u_true
		print *, ""
		print *, "    Successes  Walks    Monte Carlo Solution    Relative Error"
		do k=1,13
			call many_walks(i, j, max_steps, n_mc, u_mc, n_success)
			n_mc = 2 * n_mc
			rel_error = abs(u_true-u_mc)/u_true
			print '("  ",i8," ",i8,"      ",es13.3,"        ",es13.3)', &
			n_success, n_walks, u_mc, rel_error
		enddo

end program test2

from pylab import *

# read in three columns from file and unpack into 3 arrays:
n,int_approx,error = loadtxt('laplace_mc.txt',unpack=True)

figure(1)
clf()
loglog(n,error,'-o',label='Monte-Carlo')
loglog([1,1e7],[1,sqrt(1e-7)],'k',label='1 / sqrt(N)')
legend()
xlabel('number of MC points used')
ylabel('abs(error)')
title('Log-log plot of relative error in MC quadrature')
savefig('mc_quad_error.png')