<feed xmlns='http://www.w3.org/2005/Atom'>
<title>glusterfs.git/rpc/rpc-transport/socket/src, branch v3.11.0rc0</title>
<subtitle></subtitle>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/'/>
<entry>
<title>Halo Replication feature for AFR translator</title>
<updated>2017-05-08T05:37:07+00:00</updated>
<author>
<name>Kevin Vigor</name>
<email>kvigor@fb.com</email>
</author>
<published>2017-03-21T15:23:25+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=b6cc5261d5809aa509eecd082aefb7a0a14ca74b'/>
<id>b6cc5261d5809aa509eecd082aefb7a0a14ca74b</id>
<content type='text'>
	Backport of https://review.gluster.org/16177
		    https://review.gluster.org/17174

Merged both these patches to make sure the IPv6 changes don't make it into 3.11 at all.

Summary:
Halo Geo-replication is a feature which allows Gluster or NFS clients to write
locally to their region (as defined by a latency "halo" or threshold, if you
like), and have their writes asynchronously propagate from their origin to the
rest of the cluster.  Clients can also write synchronously to the cluster
simply by specifying a very large halo-latency (e.g. 10 seconds), which
will include all bricks.

In other words, it allows clients to decide at mount time whether they want
synchronous or asynchronous IO into a cluster, and the cluster can support
both of these modes for any number of clients simultaneously.

There are a few new volume options due to this feature:
  halo-shd-latency:  The threshold below which self-heal daemons will
  consider children (bricks) connected.

  halo-nfsd-latency: The threshold below which NFS daemons will consider
  children (bricks) connected.

  halo-latency: The threshold below which all other clients will
  consider children (bricks) connected.

  halo-min-replicas: The minimum number of replicas which are to
  be enforced regardless of the latency specified in the above 3 options.
  If the number of children falls below this threshold, the next
  best (chosen by latency) shall be swapped in; see the sketch below.

New FUSE mount options:
  halo-latency &amp; halo-min-replicas: As descripted above.

This feature combined with multi-threaded SHD support (D1271745) results in
some pretty cool geo-replication possibilities.

Operational Notes:
- Global consistency is guaranteed for synchronous clients; this is provided by
  the existing entry-locking mechanism.
- Asynchronous clients, on the other hand, are merely consistent within their region.
  Writes &amp; deletes will be protected via entry-locks as usual preventing
  concurrent writes into files which are undergoing replication.  Read operations
  on the other hand should never block.
- Writes are allowed from _any_ region and propagated from the origin to all
  other regions.  The takeaway is that care should be taken to ensure
  multiple writers do not write to the same files, since that results in a gfid
  split-brain which will require resolution via split-brain policies (majority,
  mtime &amp; size).  The recommended way to prevent this is to use the nfs-auth
  feature to define which region has RW permissions for each share; tiers not
  in the origin region should have RO perms.

TODO:
- Synchronous clients (including the SHD) should choose clients from their own
  region as preferred sources for reads.  Most of the plumbing is in place for
  this via the child_latency array.
- Better GFID split brain handling &amp; better dent type split brain handling
  (i.e. create a trash can and move the offending files into it).
- Tagging in addition to latency as a means of defining which children you wish
  to synchronously write to.

Test Plan:
- The usual suspects, clang, gcc w/ address sanitizer &amp; valgrind
- Prove tests

Reviewers: jackl, dph, cjh, meyering

Reviewed By: meyering

Subscribers: ethanr

Differential Revision: https://phabricator.fb.com/D1272053

Tasks: 4117827

 &gt;Change-Id: I694a9ab429722da538da171ec528406e77b5e6d1
 &gt;BUG: 1428061
 &gt;Signed-off-by: Kevin Vigor &lt;kvigor@fb.com&gt;
 &gt;Reviewed-on: http://review.gluster.org/16099
 &gt;Reviewed-on: https://review.gluster.org/16177
 &gt;Tested-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
 &gt;Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
 &gt;NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
 &gt;CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
 &gt;Reviewed-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;

BUG: 1448416
Change-Id: I694a9ab429722da538da171ec528406e77b5e6d1
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
Reviewed-on: https://review.gluster.org/17192
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Kaushal M &lt;kaushal@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
	Backport of https://review.gluster.org/16177
		    https://review.gluster.org/17174

Merged both these patches to make sure the IPv6 changes don't make it into 3.11 at all.

Summary:
Halo Geo-replication is a feature which allows Gluster or NFS clients to write
locally to their region (as defined by a latency "halo" or threshold, if you
like), and have their writes asynchronously propagate from their origin to the
rest of the cluster.  Clients can also write synchronously to the cluster
simply by specifying a very large halo-latency (e.g. 10 seconds), which
will include all bricks.

In other words, it allows clients to decide at mount time whether they want
synchronous or asynchronous IO into a cluster, and the cluster can support
both of these modes for any number of clients simultaneously.

There are a few new volume options due to this feature:
  halo-shd-latency:  The threshold below which self-heal daemons will
  consider children (bricks) connected.

  halo-nfsd-latency: The threshold below which NFS daemons will consider
  children (bricks) connected.

  halo-latency: The threshold below which all other clients will
  consider children (bricks) connected.

  halo-min-replicas: The minimum number of replicas which are to
  be enforced regardless of the latency specified in the above 3 options.
  If the number of children falls below this threshold, the next
  best (chosen by latency) shall be swapped in; see the sketch below.

New FUSE mount options:
  halo-latency &amp; halo-min-replicas: As descripted above.

This feature combined with multi-threaded SHD support (D1271745) results in
some pretty cool geo-replication possibilities.

Operational Notes:
- Global consistency is guaranteed for synchronous clients; this is provided by
  the existing entry-locking mechanism.
- Asynchronous clients, on the other hand, are merely consistent within their region.
  Writes &amp; deletes will be protected via entry-locks as usual preventing
  concurrent writes into files which are undergoing replication.  Read operations
  on the other hand should never block.
- Writes are allowed from _any_ region and propagated from the origin to all
  other regions.  The takeaway is that care should be taken to ensure
  multiple writers do not write to the same files, since that results in a gfid
  split-brain which will require resolution via split-brain policies (majority,
  mtime &amp; size).  The recommended way to prevent this is to use the nfs-auth
  feature to define which region has RW permissions for each share; tiers not
  in the origin region should have RO perms.

TODO:
- Synchronous clients (including the SHD) should choose clients from their own
  region as preferred sources for reads.  Most of the plumbing is in place for
  this via the child_latency array.
- Better GFID split brain handling &amp; better dent type split brain handling
  (i.e. create a trash can and move the offending files into it).
- Tagging in addition to latency as a means of defining which children you wish
  to synchronously write to.

Test Plan:
- The usual suspects, clang, gcc w/ address sanitizer &amp; valgrind
- Prove tests

Reviewers: jackl, dph, cjh, meyering

Reviewed By: meyering

Subscribers: ethanr

Differential Revision: https://phabricator.fb.com/D1272053

Tasks: 4117827

 &gt;Change-Id: I694a9ab429722da538da171ec528406e77b5e6d1
 &gt;BUG: 1428061
 &gt;Signed-off-by: Kevin Vigor &lt;kvigor@fb.com&gt;
 &gt;Reviewed-on: http://review.gluster.org/16099
 &gt;Reviewed-on: https://review.gluster.org/16177
 &gt;Tested-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
 &gt;Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
 &gt;NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
 &gt;CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
 &gt;Reviewed-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;

BUG: 1448416
Change-Id: I694a9ab429722da538da171ec528406e77b5e6d1
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
Reviewed-on: https://review.gluster.org/17192
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Kaushal M &lt;kaushal@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rpc: add options to manage socket keepalive lifespan</title>
<updated>2017-04-12T09:44:01+00:00</updated>
<author>
<name>Milind Changire</name>
<email>mchangir@redhat.com</email>
</author>
<published>2017-04-11T07:00:06+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=6b8df081b46ac4f485c86a5052fc30472e74bfbb'/>
<id>6b8df081b46ac4f485c86a5052fc30472e74bfbb</id>
<content type='text'>
Problem:
Default values for handling socket timeouts for brick responses are
insufficient for aggressive applications such as databases.

Solution:
Add 1:1 gluster options for keepalive, keepalive-idle,
keepalive-interval and keepalive-timeout, corresponding to the
socket-level options described in the tcp(7) man page.

The default values for these options are NOT aggressive and continue to
be values which result in the default timeout when only the keepalive
option is turned on.

These options are Linux-specific and do not apply to the
*BSDs.
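
As a rough illustration (a hedged sketch, not the actual transport code; the
mapping of the timeout onto a probe count is an assumption), the underlying
Linux socket options from tcp(7) can be set like this:

    #include &lt;netinet/in.h&gt;
    #include &lt;netinet/tcp.h&gt;
    #include &lt;sys/socket.h&gt;

    /* Enable keepalive on a connected socket and tune its lifespan. */
    static int set_keepalive(int sock, int idle, int interval, int timeout)
    {
            int on = 1;
            int cnt = timeout / interval;   /* timeout as a probe count */

            if (setsockopt(sock, SOL_SOCKET, SO_KEEPALIVE, &amp;on,
                           sizeof(on)) ||
                setsockopt(sock, IPPROTO_TCP, TCP_KEEPIDLE, &amp;idle,
                           sizeof(idle)) ||
                setsockopt(sock, IPPROTO_TCP, TCP_KEEPINTVL, &amp;interval,
                           sizeof(interval)) ||
                setsockopt(sock, IPPROTO_TCP, TCP_KEEPCNT, &amp;cnt,
                           sizeof(cnt)))
                    return -1;
            return 0;
    }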

Change-Id: I2a08ecd949ca8ceb3e090d336ad634341e2dbf14
BUG: 1426059
Signed-off-by: Milind Changire &lt;mchangir@redhat.com&gt;
Reviewed-on: https://review.gluster.org/16731
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Raghavendra G &lt;rgowdapp@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem:
Default values for handling socket timeouts for brick responses are
insufficient for aggressive applications such as databases.

Solution:
Add 1:1 gluster options for keepalive, keepalive-idle,
keepalive-interval and keepalive-timeout, corresponding to the
socket-level options described in the tcp(7) man page.

The default values for these options are NOT aggressive and continue to
be values which result in the default timeout when only the keepalive
option is turned on.

These options are Linux-specific and do not apply to the
*BSDs.
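
As a rough illustration (a hedged sketch, not the actual transport code; the
mapping of the timeout onto a probe count is an assumption), the underlying
Linux socket options from tcp(7) can be set like this:

    #include &lt;netinet/in.h&gt;
    #include &lt;netinet/tcp.h&gt;
    #include &lt;sys/socket.h&gt;

    /* Enable keepalive on a connected socket and tune its lifespan. */
    static int set_keepalive(int sock, int idle, int interval, int timeout)
    {
            int on = 1;
            int cnt = timeout / interval;   /* timeout as a probe count */

            if (setsockopt(sock, SOL_SOCKET, SO_KEEPALIVE, &amp;on,
                           sizeof(on)) ||
                setsockopt(sock, IPPROTO_TCP, TCP_KEEPIDLE, &amp;idle,
                           sizeof(idle)) ||
                setsockopt(sock, IPPROTO_TCP, TCP_KEEPINTVL, &amp;interval,
                           sizeof(interval)) ||
                setsockopt(sock, IPPROTO_TCP, TCP_KEEPCNT, &amp;cnt,
                           sizeof(cnt)))
                    return -1;
            return 0;
    }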

Change-Id: I2a08ecd949ca8ceb3e090d336ad634341e2dbf14
BUG: 1426059
Signed-off-by: Milind Changire &lt;mchangir@redhat.com&gt;
Reviewed-on: https://review.gluster.org/16731
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Raghavendra G &lt;rgowdapp@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>transport: allow OS to assign us a port</title>
<updated>2017-03-12T15:28:45+00:00</updated>
<author>
<name>Kevin Vigor</name>
<email>kvigor@fb.com</email>
</author>
<published>2016-12-15T21:41:14+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=eac6dfc314abe4dc50c54bf6c6cc004dfd73d5ac'/>
<id>eac6dfc314abe4dc50c54bf6c6cc004dfd73d5ac</id>
<content type='text'>
Replace the complex and slow port-selection code with bind(0), which
already respects privileged ports.
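
For illustration, a minimal hypothetical sketch of the idiom (not the
transport code itself):

    #include &lt;arpa/inet.h&gt;
    #include &lt;netinet/in.h&gt;
    #include &lt;string.h&gt;
    #include &lt;sys/socket.h&gt;

    /* Binding to port 0 asks the kernel for a free port;
     * getsockname() then reveals which one was chosen. */
    static int bind_any_port(int sock, struct sockaddr_in *out)
    {
            struct sockaddr_in addr;
            socklen_t len = sizeof(*out);

            memset(&amp;addr, 0, sizeof(addr));
            addr.sin_family = AF_INET;
            addr.sin_addr.s_addr = htonl(INADDR_ANY);
            addr.sin_port = htons(0);       /* OS assigns the port */

            if (bind(sock, (struct sockaddr *)&amp;addr, sizeof(addr)) != 0)
                    return -1;
            return getsockname(sock, (struct sockaddr *)out, &amp;len);
    }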

Change-Id: I408a8528e58e1aafcd32eba6a8f1a759e0bf274e
BUG: 1405628
Reviewed-on-release-3.8-fb: http://review.gluster.org/16150
Signed-off-by: Jeff Darcy &lt;jdarcy@redhat.com&gt;
Reviewed-on: https://review.gluster.org/16178
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Niels de Vos &lt;ndevos@redhat.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Vijay Bellur &lt;vbellur@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Replace the complex and slow port-selection code with bind(0), which
already respects privileged ports.
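
For illustration, a minimal hypothetical sketch of the idiom (not the
transport code itself):

    #include &lt;arpa/inet.h&gt;
    #include &lt;netinet/in.h&gt;
    #include &lt;string.h&gt;
    #include &lt;sys/socket.h&gt;

    /* Binding to port 0 asks the kernel for a free port;
     * getsockname() then reveals which one was chosen. */
    static int bind_any_port(int sock, struct sockaddr_in *out)
    {
            struct sockaddr_in addr;
            socklen_t len = sizeof(*out);

            memset(&amp;addr, 0, sizeof(addr));
            addr.sin_family = AF_INET;
            addr.sin_addr.s_addr = htonl(INADDR_ANY);
            addr.sin_port = htons(0);       /* OS assigns the port */

            if (bind(sock, (struct sockaddr *)&amp;addr, sizeof(addr)) != 0)
                    return -1;
            return getsockname(sock, (struct sockaddr *)out, &amp;len);
    }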

Change-Id: I408a8528e58e1aafcd32eba6a8f1a759e0bf274e
BUG: 1405628
Reviewed-on-release-3.8-fb: http://review.gluster.org/16150
Signed-off-by: Jeff Darcy &lt;jdarcy@redhat.com&gt;
Reviewed-on: https://review.gluster.org/16178
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Niels de Vos &lt;ndevos@redhat.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Vijay Bellur &lt;vbellur@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rpc: avoid logging success on failure</title>
<updated>2017-03-07T12:05:38+00:00</updated>
<author>
<name>Milind Changire</name>
<email>mchangir@redhat.com</email>
</author>
<published>2017-03-05T16:09:20+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=89c6bedc1c2e978f67ca29f212a357984cd8a2dd'/>
<id>89c6bedc1c2e978f67ca29f212a357984cd8a2dd</id>
<content type='text'>
Avoid logging "Success" in the event of failure, especially when errno
carries no meaningful value w.r.t. the failure. In this case errno is
set to zero even though there is indeed a failure at the RPC level.
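
The pattern, as a hedged sketch (illustrative only; the real code logs via
the gluster logging macros rather than fprintf):

    #include &lt;errno.h&gt;
    #include &lt;stdio.h&gt;
    #include &lt;string.h&gt;

    static void log_reply_error(int ret)
    {
            if (ret &gt;= 0)
                    return;
            /* Before: with errno == 0 this printed "... (Success)".
             * After: fall back to a real reason when errno is unset. */
            fprintf(stderr, "reply failed (%s)\n",
                    errno ? strerror(errno) : "error at RPC layer");
    }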

Change-Id: If2cc81aa1e590023ed22892dacbef7cac213e591
BUG: 1426032
Signed-off-by: Milind Changire &lt;mchangir@redhat.com&gt;
Reviewed-on: https://review.gluster.org/16730
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: N Balachandran &lt;nbalacha@redhat.com&gt;
Reviewed-by: Jeff Darcy &lt;jdarcy@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Avoid logging "Success" in the event of failure, especially when errno
carries no meaningful value w.r.t. the failure. In this case errno is
set to zero even though there is indeed a failure at the RPC level.
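
The pattern, as a hedged sketch (illustrative only; the real code logs via
the gluster logging macros rather than fprintf):

    #include &lt;errno.h&gt;
    #include &lt;stdio.h&gt;
    #include &lt;string.h&gt;

    static void log_reply_error(int ret)
    {
            if (ret &gt;= 0)
                    return;
            /* Before: with errno == 0 this printed "... (Success)".
             * After: fall back to a real reason when errno is unset. */
            fprintf(stderr, "reply failed (%s)\n",
                    errno ? strerror(errno) : "error at RPC layer");
    }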

Change-Id: If2cc81aa1e590023ed22892dacbef7cac213e591
BUG: 1426032
Signed-off-by: Milind Changire &lt;mchangir@redhat.com&gt;
Reviewed-on: https://review.gluster.org/16730
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: N Balachandran &lt;nbalacha@redhat.com&gt;
Reviewed-by: Jeff Darcy &lt;jdarcy@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rpc: log more about socket disconnects</title>
<updated>2017-03-01T17:27:54+00:00</updated>
<author>
<name>Milind Changire</name>
<email>mchangir@redhat.com</email>
</author>
<published>2017-02-23T12:28:46+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=67a35ac54bfd61a920c1919fbde588a04ac3358a'/>
<id>67a35ac54bfd61a920c1919fbde588a04ac3358a</id>
<content type='text'>
Log more about the different paths leading to socket disconnect for ease
of debugging.

Log via gf_log_callingfn() in __socket_disconnect() at loglevel TRACE if
the socket connection is being torn down.

Change-Id: I1e551c2d685784b5ec747f481179f64d524c0461
BUG: 1426125
Signed-off-by: Milind Changire &lt;mchangir@redhat.com&gt;
Reviewed-on: https://review.gluster.org/16732
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Jeff Darcy &lt;jdarcy@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Log more about the different paths leading to socket disconnect for ease
of debugging.

Log via gf_log_callingfn() in __socket_disconnect() at loglevel TRACE if
the socket connection is being torn down.

Change-Id: I1e551c2d685784b5ec747f481179f64d524c0461
BUG: 1426125
Signed-off-by: Milind Changire &lt;mchangir@redhat.com&gt;
Reviewed-on: https://review.gluster.org/16732
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Jeff Darcy &lt;jdarcy@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>socket: Avoid flooding of SSL messages in case of failure/success</title>
<updated>2017-02-28T04:27:28+00:00</updated>
<author>
<name>Mohit Agrawal</name>
<email>moagrawa@redhat.com</email>
</author>
<published>2017-02-27T05:50:14+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=753f6dec4c99fb29d4b386b96b364c5a142ca547'/>
<id>753f6dec4c99fb29d4b386b96b364c5a142ca547</id>
<content type='text'>
Problem: Avoid flooding of SSL messages in case of failure/success

Solution: 1) Ideally ssl_setup_connection should be called only after connect
             succeeds, so update the condition before calling socket_spawn in
             socket_connect.
          2) Change the message type to DEBUG in case of success.

BUG: 1427018
Change-Id: Icb6101e49304d5fe539609b4afacfb1b50b62f84
Signed-off-by: Mohit Agrawal &lt;moagrawa@redhat.com&gt;
Reviewed-on: https://review.gluster.org/16767
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Jeff Darcy &lt;jdarcy@redhat.com&gt;
Reviewed-by: Raghavendra G &lt;rgowdapp@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem: Avoid flooding of SSL messages in case of failure/success

Solution: 1) Ideally ssl_setup_connection should be called only after connect
             succeeds, so update the condition before calling socket_spawn in
             socket_connect.
          2) Change the message type to DEBUG in case of success.

BUG: 1427018
Change-Id: Icb6101e49304d5fe539609b4afacfb1b50b62f84
Signed-off-by: Mohit Agrawal &lt;moagrawa@redhat.com&gt;
Reviewed-on: https://review.gluster.org/16767
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Jeff Darcy &lt;jdarcy@redhat.com&gt;
Reviewed-by: Raghavendra G &lt;rgowdapp@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Free iobuf after using it, not before</title>
<updated>2017-02-26T16:30:55+00:00</updated>
<author>
<name>Michael Scherer</name>
<email>misc@redhat.com</email>
</author>
<published>2017-02-22T16:58:24+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=730f520a1ee246a5f4a08a321b6f97bdf93536dc'/>
<id>730f520a1ee246a5f4a08a321b6f97bdf93536dc</id>
<content type='text'>
Coverity warns of a use-after-free here. I assume that
under pressure, this might crash the whole process.

Change-Id: I15fb5cfc9b509705e96e4156b739988d816bbef5
BUG: 789278
Signed-off-by: Michael Scherer &lt;misc@redhat.com&gt;
Reviewed-on: https://review.gluster.org/16719
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Tested-by: Michael Scherer &lt;misc@fedoraproject.org&gt;
Reviewed-by: Shyamsundar Ranganathan &lt;srangana@redhat.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Coverity warns of a use-after-free here. I assume that
under pressure, this might crash the whole process.

Change-Id: I15fb5cfc9b509705e96e4156b739988d816bbef5
BUG: 789278
Signed-off-by: Michael Scherer &lt;misc@redhat.com&gt;
Reviewed-on: https://review.gluster.org/16719
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Tested-by: Michael Scherer &lt;misc@fedoraproject.org&gt;
Reviewed-by: Shyamsundar Ranganathan &lt;srangana@redhat.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>socket: GF_REF_PUT should be called outside lock</title>
<updated>2017-02-06T11:13:46+00:00</updated>
<author>
<name>Rajesh Joseph</name>
<email>rjoseph@redhat.com</email>
</author>
<published>2017-01-05T18:28:21+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=b3188c61d248526a070b1b18df1ea1d181b349d6'/>
<id>b3188c61d248526a070b1b18df1ea1d181b349d6</id>
<content type='text'>
GF_REF_PUT was called inside a lock; it can call
socket_poller_mayday, which in turn tries to take the
same lock. This can lead to a deadlock scenario.
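
A self-contained sketch of this bug class (hypothetical names, not the
socket code itself):

    #include &lt;pthread.h&gt;
    #include &lt;stdlib.h&gt;

    struct obj {
            pthread_mutex_t lock;
            int ref;
    };

    static void obj_put(struct obj *o)
    {
            int last;

            /* Deadlocks on a non-recursive mutex if the caller
             * already holds o-&gt;lock. */
            pthread_mutex_lock(&amp;o-&gt;lock);
            last = (--o-&gt;ref == 0);
            pthread_mutex_unlock(&amp;o-&gt;lock);
            if (last)
                    free(o);        /* release path */
    }

    static void broken(struct obj *o)
    {
            pthread_mutex_lock(&amp;o-&gt;lock);
            /* ... work ... */
            obj_put(o);             /* BUG: lock still held */
            pthread_mutex_unlock(&amp;o-&gt;lock);
    }

    static void fixed(struct obj *o)
    {
            pthread_mutex_lock(&amp;o-&gt;lock);
            /* ... work ... */
            pthread_mutex_unlock(&amp;o-&gt;lock);
            obj_put(o);             /* put outside the lock */
    }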

BUG: 1410701
Change-Id: Ib3b161bcfeac810bd3593dc04c10ef984f996b17
Signed-off-by: Rajesh Joseph &lt;rjoseph@redhat.com&gt;
Reviewed-on: https://review.gluster.org/16343
Reviewed-by: Raghavendra G &lt;rgowdapp@redhat.com&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
GF_REF_PUT was called inside a lock; it can call
socket_poller_mayday, which in turn tries to take the
same lock. This can lead to a deadlock scenario.
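
A self-contained sketch of this bug class (hypothetical names, not the
socket code itself):

    #include &lt;pthread.h&gt;
    #include &lt;stdlib.h&gt;

    struct obj {
            pthread_mutex_t lock;
            int ref;
    };

    static void obj_put(struct obj *o)
    {
            int last;

            /* Deadlocks on a non-recursive mutex if the caller
             * already holds o-&gt;lock. */
            pthread_mutex_lock(&amp;o-&gt;lock);
            last = (--o-&gt;ref == 0);
            pthread_mutex_unlock(&amp;o-&gt;lock);
            if (last)
                    free(o);        /* release path */
    }

    static void broken(struct obj *o)
    {
            pthread_mutex_lock(&amp;o-&gt;lock);
            /* ... work ... */
            obj_put(o);             /* BUG: lock still held */
            pthread_mutex_unlock(&amp;o-&gt;lock);
    }

    static void fixed(struct obj *o)
    {
            pthread_mutex_lock(&amp;o-&gt;lock);
            /* ... work ... */
            pthread_mutex_unlock(&amp;o-&gt;lock);
            obj_put(o);             /* put outside the lock */
    }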

BUG: 1410701
Change-Id: Ib3b161bcfeac810bd3593dc04c10ef984f996b17
Signed-off-by: Rajesh Joseph &lt;rjoseph@redhat.com&gt;
Reviewed-on: https://review.gluster.org/16343
Reviewed-by: Raghavendra G &lt;rgowdapp@redhat.com&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>socket: retry connect immediately if it fails</title>
<updated>2017-02-02T20:21:36+00:00</updated>
<author>
<name>Jeff Darcy</name>
<email>jdarcy@redhat.com</email>
</author>
<published>2017-02-02T03:00:32+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=5a57c1592a34ee6632ca1fb38e076dde381d1ae2'/>
<id>5a57c1592a34ee6632ca1fb38e076dde381d1ae2</id>
<content type='text'>
Previously we relied on a complex dance of setting flags, shutting
down the socket, tearing stuff down, getting an event, tearing more
stuff down, and waiting for a higher-level retry.  What we really
need, in the case where we're just trying to connect prematurely e.g.
to a brick that hasn't fully come up yet, is a simple retry of the
connect(2) call.
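
A minimal hedged sketch of the idea (names invented, not the actual
transport code):

    #include &lt;errno.h&gt;
    #include &lt;sys/socket.h&gt;
    #include &lt;unistd.h&gt;

    /* Retry connect(2) a few times when the listener is not up yet.
     * A fresh socket is used per attempt because a failed connect
     * leaves the old one in an unspecified state. */
    static int connect_with_retry(const struct sockaddr *addr,
                                  socklen_t len, int attempts)
    {
            while (attempts-- &gt; 0) {
                    int sock = socket(addr-&gt;sa_family, SOCK_STREAM, 0);
                    int err;

                    if (sock &lt; 0)
                            return -1;
                    if (connect(sock, addr, len) == 0)
                            return sock;
                    err = errno;
                    close(sock);
                    if (err != ECONNREFUSED)  /* retry only "not up yet" */
                            return -1;
                    usleep(100000);           /* brief pause, then retry */
            }
            return -1;
    }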

This was discovered by observing failures in ec-new-entry.t with
multiplexing enabled, but probably fixes other random failures as
well.

Change-Id: Ibedb8942060bccc96b02272a333c3002c9b77d4c
BUG: 1385758
Signed-off-by: Jeff Darcy &lt;jdarcy@redhat.com&gt;
Reviewed-on: https://review.gluster.org/16510
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Shyamsundar Ranganathan &lt;srangana@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Previously we relied on a complex dance of setting flags, shutting
down the socket, tearing stuff down, getting an event, tearing more
stuff down, and waiting for a higher-level retry.  What we really
need, in the case where we're just trying to connect prematurely e.g.
to a brick that hasn't fully come up yet, is a simple retry of the
connect(2) call.
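
A minimal hedged sketch of the idea (names invented, not the actual
transport code):

    #include &lt;errno.h&gt;
    #include &lt;sys/socket.h&gt;
    #include &lt;unistd.h&gt;

    /* Retry connect(2) a few times when the listener is not up yet.
     * A fresh socket is used per attempt because a failed connect
     * leaves the old one in an unspecified state. */
    static int connect_with_retry(const struct sockaddr *addr,
                                  socklen_t len, int attempts)
    {
            while (attempts-- &gt; 0) {
                    int sock = socket(addr-&gt;sa_family, SOCK_STREAM, 0);
                    int err;

                    if (sock &lt; 0)
                            return -1;
                    if (connect(sock, addr, len) == 0)
                            return sock;
                    err = errno;
                    close(sock);
                    if (err != ECONNREFUSED)  /* retry only "not up yet" */
                            return -1;
                    usleep(100000);           /* brief pause, then retry */
            }
            return -1;
    }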

This was discovered by observing failures in ec-new-entry.t with
multiplexing enabled, but probably fixes other random failures as
well.

Change-Id: Ibedb8942060bccc96b02272a333c3002c9b77d4c
BUG: 1385758
Signed-off-by: Jeff Darcy &lt;jdarcy@redhat.com&gt;
Reviewed-on: https://review.gluster.org/16510
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Shyamsundar Ranganathan &lt;srangana@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>libglusterfs+transport+io-threads: fix 256KB stack abuse</title>
<updated>2017-02-02T00:59:25+00:00</updated>
<author>
<name>Jeff Darcy</name>
<email>jdarcy@redhat.com</email>
</author>
<published>2016-10-27T15:51:47+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=c8a23cc6cd289dd28deb136bf2550f28e2761ef3'/>
<id>c8a23cc6cd289dd28deb136bf2550f28e2761ef3</id>
<content type='text'>
Some functions were allocating 64K booleans, which are (crazily) mapped to
4-byte ints, for a total of 256KB per call.  Changed to use bitfields instead,
so usage is now only 8KB per call.  This was the impediment to changing the
io-threads stack size, so that has been adjusted too.
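
The arithmetic behind those numbers, as an illustrative fragment (not the
patched functions themselves):

    #include &lt;limits.h&gt;

    #define NFLAGS 65536

    /* before: a local "int flags[NFLAGS]" costs 64K x 4 bytes = 256KB
     * of stack per call; packed one bit per flag, a local
     * "unsigned char bits[NFLAGS / CHAR_BIT]" costs only 8KB. */

    static inline void flag_set(unsigned char *bits, unsigned i)
    {
            bits[i / CHAR_BIT] |= 1u &lt;&lt; (i % CHAR_BIT);
    }

    static inline int flag_test(const unsigned char *bits, unsigned i)
    {
            return (bits[i / CHAR_BIT] &gt;&gt; (i % CHAR_BIT)) &amp; 1;
    }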

Change-Id: I8781c4f2c8f2b830f4535e366995fac8dd0a8653
BUG: 1418095
Signed-off-by: Jeff Darcy &lt;jdarcy@redhat.com&gt;
Reviewed-on: https://review.gluster.org/15745
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: N Balachandran &lt;nbalacha@redhat.com&gt;
Reviewed-by: Shyamsundar Ranganathan &lt;srangana@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Some functions were allocating 64K booleans, which are (crazily) mapped to
4-byte ints, for a total of 256KB per call.  Changed to use bitfields instead,
so usage is now only 8KB per call.  This was the impediment to changing the
io-threads stack size, so that has been adjusted too.
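
The arithmetic behind those numbers, as an illustrative fragment (not the
patched functions themselves):

    #include &lt;limits.h&gt;

    #define NFLAGS 65536

    /* before: a local "int flags[NFLAGS]" costs 64K x 4 bytes = 256KB
     * of stack per call; packed one bit per flag, a local
     * "unsigned char bits[NFLAGS / CHAR_BIT]" costs only 8KB. */

    static inline void flag_set(unsigned char *bits, unsigned i)
    {
            bits[i / CHAR_BIT] |= 1u &lt;&lt; (i % CHAR_BIT);
    }

    static inline int flag_test(const unsigned char *bits, unsigned i)
    {
            return (bits[i / CHAR_BIT] &gt;&gt; (i % CHAR_BIT)) &amp; 1;
    }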

Change-Id: I8781c4f2c8f2b830f4535e366995fac8dd0a8653
BUG: 1418095
Signed-off-by: Jeff Darcy &lt;jdarcy@redhat.com&gt;
Reviewed-on: https://review.gluster.org/15745
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: N Balachandran &lt;nbalacha@redhat.com&gt;
Reviewed-by: Shyamsundar Ranganathan &lt;srangana@redhat.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
